> This is really about doing not-so-much in the very near term,
> while thinking ahead to the longer term.

How about a page dedicated to release 1.0 of cloud? I feel
uncomfortable editing the existing wiki because I don't know
what the plans are for the first release.

I need to revisit Katta, as my short-term plans include using
ZooKeeper (not for failover) simply for deploying shards/cores
to servers, and nothing else. I can use the core admin
interface to bring them online, update them, etc. Or I'll just
implement something and make a patch to Solr... Thinking out
loud:

/anyname/shardlist-v1.txt
/anyname/shardlist-v2.txt

where shardlist-v1.txt contains:
corename,coredownloadpath,instanceDir

Where coredownloadpath can be any URL, including hftp, hdfs, ftp, http, or https.

Where the system automagically uninstalls cores that should no
longer exist on a given server. A core deployed under a name
that already exists on the server would use the reload command;
otherwise, the create command.
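The create/reload/uninstall decision could be sketched like this (hypothetical class and names, not Solr code; the line format assumed is the corename,coredownloadpath,instanceDir one above):

```java
import java.util.*;

// Sketch: given the previously installed shardlist and the new one,
// decide which cores to CREATE, RELOAD, or UNINSTALL on this server.
public class ShardlistDiff {

    public static Map<String, String> actions(List<String> oldLines, List<String> newLines) {
        Map<String, String> oldCores = parse(oldLines);
        Map<String, String> newCores = parse(newLines);
        Map<String, String> actions = new LinkedHashMap<>();
        for (String core : newCores.keySet()) {
            // Same core name already deployed -> reload; otherwise create.
            actions.put(core, oldCores.containsKey(core) ? "RELOAD" : "CREATE");
        }
        for (String core : oldCores.keySet()) {
            // Cores that should no longer exist on this server are removed.
            if (!newCores.containsKey(core)) actions.put(core, "UNINSTALL");
        }
        return actions;
    }

    private static Map<String, String> parse(List<String> lines) {
        Map<String, String> cores = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            cores.put(f[0], f[1]);  // corename -> coredownloadpath
        }
        return cores;
    }

    public static void main(String[] args) {
        List<String> v1 = Arrays.asList("shard1,http://host/shard1.zip,/solr/shard1",
                                        "shard2,http://host/shard2.zip,/solr/shard2");
        List<String> v2 = Arrays.asList("shard1,http://host/shard1b.zip,/solr/shard1",
                                        "shard3,http://host/shard3.zip,/solr/shard3");
        System.out.println(actions(v1, v2));
        // shard1 -> RELOAD, shard3 -> CREATE, shard2 -> UNINSTALL
    }
}
```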

Where there's a ZK listener on the /anyname directory watching
for new files with a version greater than that of the last
installed shardlist.txt.
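The listener's selection logic might look like the following (a sketch with hypothetical names; in a real implementation this would run inside a ZooKeeper Watcher registered via getChildren on /anyname, and the "shardlist-vN.txt" naming is just the convention from the example above):

```java
import java.util.*;
import java.util.regex.*;

// Sketch: from the children of the watched directory, pick the highest
// shardlist version that is newer than the last one this server installed.
public class ShardlistWatcher {
    private static final Pattern VERSION = Pattern.compile("shardlist-v(\\d+)\\.txt");

    // Returns the filename to install next, or null if we're up to date.
    public static String next(List<String> children, int lastInstalled) {
        String best = null;
        int bestVersion = lastInstalled;
        for (String child : children) {
            Matcher m = VERSION.matcher(child);
            if (m.matches() && Integer.parseInt(m.group(1)) > bestVersion) {
                bestVersion = Integer.parseInt(m.group(1));
                best = child;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "shardlist-v1.txt", "shardlist-v3.txt", "shardlist-v2.txt");
        System.out.println(next(files, 1));  // shardlist-v3.txt
    }
}
```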

Alternatively, an even simpler design would be uploading a
solr.xml file per server, something like:
/anyname/solr-prod01.solr.xml

A directory listener on each server would parse this file and
make the necessary changes (without restarting Tomcat).
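Pulling the core definitions out of such a per-server file could be sketched as below (assuming the solr.xml-style cores/core layout; the class name and sample attributes are made up):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;
import java.io.ByteArrayInputStream;
import java.util.*;

// Sketch: extract core name -> instanceDir pairs from a solr.xml-style
// document, which the listener would then reconcile against running cores.
public class SolrXmlCores {

    public static Map<String, String> cores(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList nodes = doc.getElementsByTagName("core");
        Map<String, String> cores = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            cores.put(e.getAttribute("name"), e.getAttribute("instanceDir"));
        }
        return cores;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<solr><cores>"
                + "<core name=\"shard1\" instanceDir=\"shard1/\"/>"
                + "<core name=\"shard2\" instanceDir=\"shard2/\"/>"
                + "</cores></solr>";
        System.out.println(cores(xml));
    }
}
```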

On the search side in this system, I'd need to wait for the
cores to complete their install, then swap in a new core on the
search proxy that represents the new version of the corelist;
then the old cores could go away. This isn't very different
from the SegmentInfos system used in Lucene, IMO.
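The swap itself could be as simple as this sketch (hypothetical class, not Solr code): searches always read the current corelist from an atomic reference, and once the new cores finish installing, a single swap makes the new version visible and hands back the old list for teardown, loosely analogous to Lucene committing a new SegmentInfos and then dropping unreferenced segments.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: the search proxy serves whatever corelist is current;
// a swap atomically publishes the new version and returns the old one.
public class CorelistSwapper {
    private final AtomicReference<List<String>> current =
            new AtomicReference<>(Collections.<String>emptyList());

    public List<String> currentCores() {
        return current.get();
    }

    // Called only after all cores in newCores have completed their install.
    // Returns the previous corelist, whose cores may now be uninstalled.
    public List<String> swapIn(List<String> newCores) {
        return current.getAndSet(Collections.unmodifiableList(new ArrayList<>(newCores)));
    }

    public static void main(String[] args) {
        CorelistSwapper proxy = new CorelistSwapper();
        proxy.swapIn(Arrays.asList("shard1-v1", "shard2-v1"));
        List<String> old = proxy.swapIn(Arrays.asList("shard1-v2", "shard2-v2"));
        System.out.println("serving: " + proxy.currentCores() + ", retire: " + old);
    }
}
```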

On Fri, Jan 15, 2010 at 1:53 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Fri, Jan 15, 2010 at 4:12 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
>> The page is huge, which signals to me maybe we're trying to do
>> too much
>
> This is really about doing not-so-much in the very near term, while
> thinking ahead to the longer term.
>
>> Revamping distributed search could be in a different branch
>> (this includes partial results)
>
> That could just be a separate patch - its scope is not that broad (I
> think there may already be a JIRA issue open for it).
>
>> Having a single solrconfig and schema for each core/shard in a
>> collection won't work for me. I need to define each core
>> externally, and I don't want Solr-Cloud to manage this, how will
>> this scenario work?
>
> We do plan on each core being able to have its own schema (so one
> could try out a version of a schema and gradually migrate the
> cluster).
>
> It could also be possible to define a schema as "local" (i.e. use the
> one on the local file system)
>
>> A host is about the same as node, I don't see the difference, or
>> enough of one
>
> A host is the hardware. It will have limited disk, limited CPU, etc.
> At some point we will want to model this... multiple nodes could be
> launched on one box.  We're not doing anything with it now, and won't
> in the near future.
>
>> Cluster resizing and rebalancing can and should be built
>> externally and hopefully after an initial release that does the
>> basics well
>
> The initial release will certainly not be doing any resizing or rebalancing.
> We should allow this to be done externally.  In the future, we
> shouldn't require that this be done externally though (i.e. we should
> somehow allow the cluster to grow w/o people having to write code).
>
>> Collection is a group of cores?
>
> A collection of documents - the complete search index.  It has a
> single schema, etc.
>
> -Yonik
> http://www.lucidimagination.com
>
