I would not try putting tens of millions of cores on one machine. My
question (and I think Jack's as well) was about having them across a
fleet: if I need 1M, I'd get 100 machines appropriately sized for 10K
each. I was asking because there was some talk about ZooKeeper only
being able to store a small amount of configuration, and there were
concerns that it won't be able to track which core lives where once
there are millions of them.

This question is still open in my mind, since I haven't yet
familiarized myself with how ZK works.




On Thu, Jun 6, 2013 at 3:23 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Now, Jack, you know "it depends" <G>... Just answer
> the questions "how many simultaneous cores can you
> open on your hardware?" and "what's the maximum percentage
> of the cores you expect to be open at any one time?"
> Do some math and you have your answer.
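>
> For instance, a back-of-envelope sketch (the figures below are
> assumptions for illustration, not measurements):
>
>     10,000 cores on disk x 10% open at peak  = 1,000 open cores
>     1,000 open cores x ~50MB heap per core   = ~50GB of heap
>
> If that's more heap than the box has, cut the core count or the
> transient cache size until it fits.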
>
> The meta-data (essentially anything in the <core> tag
> or the core.properties file) is kept in an in-memory structure. At
> startup time, that structure has to be filled. I haven't measured
> exactly, but it's relatively small (GUESS: 256 bytes) plus control
> structures. So _theoretically_ you could put millions on a single
> node. But you don't want to, because:
> 1> if you're doing core discovery, you have to walk millions of
>      directories every time you start up.
> 2> otherwise you're maintaining a huge solr.xml file (which will be
>     going away anyway).
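>
> For reference, the core.properties for a transient, lazily-loaded
> core is tiny, something like this (the core name is made up for
> illustration):
>
>     name=customer_00042
>     loadOnStartup=false
>     transient=true
>
> Core discovery finds cores by walking the directory tree looking
> for exactly these files, which is why millions of directories
> hurts startup.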
>
> Aleksey's use case also calls for "less than a million" or so open
> at once. I can't imagine fitting that many cores into memory
> simultaneously on one machine.
>
> The design goal is 10-15K cores on a machine. The theory
> is that pretty soon you're going to have a big enough percentage
> of them open that you'll blow memory up.
>
> And this is always governed by the size of the transient cache.
> If requests come in for more unique cores than the cache can hold,
> pretty soon you'll be opening a core for each and every query.
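>
> The cache size is set in solr.xml; in the new-style format it's
> something like this (512 is just an example value):
>
>     <solr>
>       <int name="transientCacheSize">512</int>
>     </solr>
>
> Once more transient cores than that are open, the least-recently-
> used core gets closed to make room.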
>
> So, as usual, it's a matter of the usage pattern to determine how
> many cores you can put on the machine.
>
> FWIW,
> Erick
>
> On Thu, Jun 6, 2013 at 4:13 PM, Jack Krupansky <j...@basetechnology.com> 
> wrote:
>> So, is that a clear yes or a clear no for Aleksey's use case - 10's of
>> millions of cores, not all active but each loadable on demand?
>>
>> I asked this same basic question months ago and there was no answer
>> forthcoming.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Erick Erickson
>> Sent: Thursday, June 06, 2013 3:53 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: LotsOfCores feature
>>
>>
>> 100K is really not the limit, it's just hard to imagine
>> 100K cores on a single machine unless some were
>> really rarely used. And it's per node, not cluster-wide.
>>
>> The current state is that everything is in place, including
>> transient cores, auto-discovery, etc. So you should be
>> able to go ahead and try it out.
>>
>> The next bit that will help with efficiency is sharing named
>> config sets. The intent here is that <solrhome>/configs will
>> contain sub-dirs like "conf1", "conf2", etc. Your cores can
>> then reference configName=conf1, and only one copy of the
>> configuration data will be used rather than re-loading a copy
>> for each core as it comes up and down.
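>>
>> Roughly, the layout would be (the directory and core names here
>> are just examples):
>>
>>     <solrhome>/configs/conf1/solrconfig.xml
>>     <solrhome>/configs/conf1/schema.xml
>>     <solrhome>/cores/coreA/core.properties  (configName=conf1)
>>     <solrhome>/cores/coreB/core.properties  (configName=conf1)
>>
>> so coreA and coreB share one loaded copy of conf1 instead of
>> each parsing its own.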
>>
>> Do note that the _first_ query in to one of the not-yet-loaded
>> cores will be slow. The model here is that you can tolerate
>> some queries taking more time at first than you might like
>> in exchange for the hardware savings. This pre-supposes that
>> you simply cannot fit all the cores into memory at once.
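>>
>> You can see that first-hit cost by timing an ordinary query
>> against a core that isn't loaded yet ("core123" below is a
>> placeholder name):
>>
>>     curl "http://localhost:8983/solr/core123/select?q=*:*"
>>
>> The first request pays the core-open cost; repeats are fast
>> until the core ages out of the transient cache.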
>>
>> The "won't fix" bits are there because, as we got farther into this
>> process, the approach changed and the functionality of the
>> won't fix JIRAs was subsumed by other changes by and large.
>>
>> I've got to update that documentation sometime, but just haven't
>> had time yet. If you go down this route, we'll be happy to add
>> your name to the list of authorized wiki editors if you'd like.
>>
>> Best
>> Erick
>>
>> On Thu, Jun 6, 2013 at 3:08 PM, Aleksey <bitterc...@gmail.com> wrote:
>>>
>>> I was looking at this wiki and linked issues:
>>> http://wiki.apache.org/solr/LotsOfCores
>>>
>>> they talk about the limit being 100K cores. Is that per server,
>>> or per entire fleet, given that ZooKeeper needs to manage all
>>> of them?
>>>
>>> I was considering a use case where I have tens of millions of
>>> indices, but less than a million need to be active at any time,
>>> so they need to be loaded on demand and evicted when not used
>>> for a while.
>>> Also, since the number one requirement is efficient loading, I
>>> assume I would store a prebuilt index somewhere so Solr can
>>> just download it and strap it in, right?
>>>
>>> The root issue is marked as "won't fix", but some other
>>> important sub-issues are marked as resolved. What's the overall
>>> status of the effort?
>>>
>>> Thank you in advance,
>>>
>>> Aleksey
>>
>>
