Now Jack, you know "it depends" <G>.... Just answer
the questions "how many simultaneous cores can you
open on your hardware?" and "what's the maximum percentage
of the cores you expect to be open at any one time?".
Do some math and you have your answer.....
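By way of illustration, here's a back-of-the-envelope version of that math. Every number below is a made-up assumption you'd replace with your own measurements:

```python
# Back-of-the-envelope capacity estimate. All numbers are purely
# illustrative assumptions -- substitute your own measurements.
heap_bytes = 8 * 1024**3          # heap you can give Solr: 8 GB (assumed)
mem_per_open_core = 50 * 1024**2  # working memory per open core: 50 MB (assumed)
open_fraction = 0.10              # fraction of cores open at any one time (assumed)

# How many cores can be open simultaneously before memory blows up:
max_open_cores = heap_bytes // mem_per_open_core

# If only 10% are ever open at once, total cores you could host:
total_cores = round(max_open_cores / open_fraction)

print(max_open_cores, total_cores)
```

With those (invented) numbers you'd get roughly 163 simultaneously open cores and ~1,630 total cores per node. The real per-core cost depends entirely on your index sizes and caches, which is why "it depends".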

The meta-data, essentially anything in the <core> tag
or the core.properties file, is kept in an in-memory structure. At
startup time, that structure has to be filled. I haven't measured
exactly, but it's relatively small (GUESS: 256 bytes) plus control
structures. So _theoretically_ you could put millions on a single
node. But you don't want to because:
1> if you're doing core discovery, you have to walk millions of
     directories every time you start up.
2> otherwise you're maintaining a huge solr.xml file (which will be
    going away anyway).
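For reference, the per-core metadata really is tiny; a discovery-mode core.properties can be as small as this (the core name is hypothetical; loadOnStartup and transient are the flags the LotsOfCores work relies on):

```
name=customer_00042
loadOnStartup=false
transient=true
```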

Aleksey's use case also calls for "less than a million" or so open
at once. I can't imagine fitting that many cores into memory
simultaneously on one machine.

The design goal is 10-15K cores on a machine. The theory
is that pretty soon you're going to have a big enough percentage
of them open that you'll blow memory up.

And this is always governed by the size of the transient cache.
If requests come in for more unique cores than your cache can
hold, pretty soon you'll be opening a core for each and every
query.
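The effect is easy to see with a toy LRU model. This is just a sketch, not Solr's actual core-container code; names and numbers are made up:

```python
from collections import OrderedDict

def simulate(cache_size, requests):
    """Toy LRU cache of open cores; returns how many core opens occur."""
    cache = OrderedDict()
    opens = 0
    for core in requests:
        if core in cache:
            cache.move_to_end(core)        # cache hit: just mark recently used
        else:
            opens += 1                     # cache miss: we must open the core
            cache[core] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least-recently-used core
    return opens

# 1,000 queries cycling over 50 unique cores, cache of 100:
# everything fits, so only 50 opens total.
print(simulate(100, [f"core{i % 50}" for i in range(1000)]))   # 50

# Same 1,000 queries cycling over 200 unique cores, cache of 100:
# round-robin over more cores than the cache holds is LRU's worst
# case, so every single query opens a core.
print(simulate(100, [f"core{i % 200}" for i in range(1000)]))  # 1000
```

Once the working set of requested cores exceeds the cache size, open cost is paid on essentially every request, which is the memory/latency trade-off described above.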

So, as usual, it's a matter of the usage pattern to determine how
many cores you can put on the machine.

FWIW,
Erick

On Thu, Jun 6, 2013 at 4:13 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> So, is that a clear yes or a clear no for Aleksey's use case - 10's of
> millions of cores, not all active but each loadable on demand?
>
> I asked this same basic question months ago and there was no answer
> forthcoming.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Erick Erickson
> Sent: Thursday, June 06, 2013 3:53 PM
> To: solr-user@lucene.apache.org
> Subject: Re: LotsOfCores feature
>
>
> 100K is really not the limit, it's just hard to imagine
> 100K cores on a single machine unless some were
> really rarely used. And it's per node, not cluster-wide.
>
> The current state is that everything is in place, including
> transient cores, auto-discovery, etc. So you should be
> able to go ahead and try it out.
>
> The next bit that will help with efficiency is sharing named
> config sets. The intent here is that <solrhome>/configs will
> contain sub-dirs like "conf1", "conf2" etc. Then your cores
> can reference configName=conf1 and only one copy of
> the configuration data will be used rather than re-loading one
> for each core as it comes up and down.
>
> Do note that the _first_ query in to one of the not-yet-loaded
> cores will be slow. The model here is that you can tolerate
> some queries taking more time at first than you might like
> in exchange for the hardware savings. This pre-supposes that
> you simply cannot fit all the cores into memory at once.
>
> The "won't fix" bits are there because, as we got farther into this
> process, the approach changed and the functionality of the
> won't fix JIRAs was subsumed by other changes by and large.
>
> I've got to update that documentation sometime, but just haven't
> had time yet. If you go down this route, we'll be happy to
> add your name to the authorized editors of the wiki list if you'd
> like.
>
> Best
> Erick
>
> On Thu, Jun 6, 2013 at 3:08 PM, Aleksey <bitterc...@gmail.com> wrote:
>>
>> I was looking at this wiki and linked issues:
>> http://wiki.apache.org/solr/LotsOfCores
>>
>> they talk about a limit being 100K cores. Is that per server or per
>> entire fleet because zookeeper needs to manage that?
>>
>> I was considering a use case where I have tens of millions of indices
>> but less than a million need to be active at any time, so they need
>> to be loaded on demand and evicted when not used for a while.
>> Also since number one requirement is efficient loading of course I
>> assume I will store a prebuilt index somewhere so Solr will just
>> download it and strap it in, right?
>>
>> The root issue is marked as "won't fix" but some other important
>> subissues are marked as resolved. What's the overall status of the
>> effort?
>>
>> Thank you in advance,
>>
>> Aleksey
>
>
