I should have been clearer, and others have mentioned... the "lots of cores"
stuff is really outside ZooKeeper/SolrCloud at present. I don't think it's
incompatible, but it wasn't part of the design so it'll need some effort to
make it play nice with SolrCloud. I'm not sure there's actually a compelling
use-case for combining the two.

bq: Also, instead of managing cores is it not possible to manage servers
which will be in tens and hundreds?

Well, tens to hundreds of servers will work with SolrCloud. You could
theoretically take over routing documents (i.e. custom hashing) and
simply use SolrCloud without the "lots of cores" stuff. So the scenario
is that you have, say, 250 machines that will hold all your data and use
custom routing to get the right docs to the right core. The upcoming SolrJ
work on sending requests only to the proper shard would certainly help
here. But this too is rather unexplored territory. I don't think
ZooKeeper would really have a problem here because it's not moving much
data back and forth; the 1MB limit for data in ZooKeeper is per core
and really applies only to the conf data, NOT the index.
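
To make the custom-routing idea concrete, here is a minimal, hypothetical
sketch of what the client side could do: map each user/tenant id to one of N
shards with a stable hash. The function name and shard count are illustrative,
not anything SolrCloud provides:

```python
import hashlib

def shard_for_user(user_id: str, num_shards: int = 250) -> int:
    """Deterministically map a tenant/user id to a shard number.

    md5 is used only because it is stable across processes and versions;
    Python's built-in hash() is salted per process, so it is not suitable.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Every client that indexes or queries for a given user computes the same
# shard, so that user's documents and requests always land on the same core.
```

The same function would be used both when indexing a user's documents and
when directing that user's queries, so no cluster-wide lookup is needed.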

But the current approach does lend itself to Jack's scenario. Essentially your
ClusterKeeper could send the index to one of the machines and create the
core there.

The current approach addresses the case where you are essentially doing
what Jack outlined semi-manually. That is, you're distributing your cores
around your cluster based on historical access patterns. It's pretty easy to
move the cores around by copying the dirs and using the auto-discovery
stuff to keep things in balance, but it's in no way automatic and probably
requires a restart (or at least core unload/load). Jack's idea
of doing this dynamically should work in that kind of scenario.
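
For illustration, the unload/load step could be driven through Solr's
CoreAdmin API rather than a full restart. This sketch only builds the request
URLs (host names, core names, and paths are hypothetical); nothing is
actually sent:

```python
from urllib.parse import urlencode

def core_admin_url(base_url: str, action: str, **params: str) -> str:
    """Build a Solr CoreAdmin request URL; the caller would HTTP GET it."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/cores?{query}"

# Create a core on a chosen host, pointing it at a pre-existing index dir.
create = core_admin_url("http://host1:8983/solr", "CREATE",
                        name="user_12345",
                        instanceDir="cores/user_12345")

# Later, unload the core once the user has been idle for a while.
unload = core_admin_url("http://host1:8983/solr", "UNLOAD", core="user_12345")
```

A "ClusterKeeper"-style process could issue exactly these two calls to move a
core between machines, copying or sharing the index directory underneath.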

I can imagine, for instance, some relatively small number of physical
machines and all the user's indexes actually being kept on a networked
filesystem. The startup process is simply finding a machine with spare
capacity and telling it to create the core and pointing it at the pre-existing
index. On the assumption that the indexes fit into memory, you'd pay a
small penalty for start-up but wouldn't need to copy indexes around. You
could elaborate on this as necessary, tuning the transient caches so that
you "fit" the number/size of users to particular hardware. If the store were
an HDFS file system, redundancy/backup/error recovery would come along
"for free".

But under any scenario, one of the hurdles will be figuring out how many
simultaneous users of whatever size can actually be comfortably handled
by a particular piece of hardware. And usually there's some kind of long
tail just to make it worse. Most of your users will be under X documents,
and some users will be 100X.... And updating would be "interesting".

But I should emphasize that anything elaborate like this dynamic shuffling
is kind of theoretical at this point, meaning we haven't actually tested it. It
_should_ work, but I'm sure there will be some issues to flush out.

Best
Erick

On Fri, Jun 7, 2013 at 6:38 AM, Noble Paul നോബിള്‍  नोब्ळ्
<noble.p...@gmail.com> wrote:
> The Wiki page was not built for Cloud Solr.
>
> We have done such a deployment where less than a tenth of the cores were active
> at any given point in time. Though there were tens of millions of indices, they
> were split among a large number of hosts.
>
>
> If you don't insist on a Cloud deployment, it is possible. I'm not sure if it
> is possible with Cloud.
>
>
> On Fri, Jun 7, 2013 at 12:38 AM, Aleksey <bitterc...@gmail.com> wrote:
>
>> I was looking at this wiki and linked issues:
>> http://wiki.apache.org/solr/LotsOfCores
>>
>> they talk about a limit of 100K cores. Is that per server or per
>> entire fleet, given that ZooKeeper needs to manage all of that?
>>
>> I was considering a use case where I have tens of millions of indices
>> but less than a million need to be active at any time, so they need
>> to be loaded on demand and evicted when not used for a while.
>> Also, since the number one requirement is efficient loading, I of course
>> assume I will store a prebuilt index somewhere so Solr will just
>> download it and strap it in, right?
>>
>> The root issue is marked as "won't fix" but some other important
>> subissues are marked as resolved. What's the overall status of the
>> effort?
>>
>> Thank you in advance,
>>
>> Aleksey
>>
>
>
>
> --
> -----------------------------------------------------
> Noble Paul
