bq. do all 100 replicas move to the one remaining node?

No. The replicas stay in a "down" state until the Solr instances are brought back up (I'm skipping autoscaling here, but even that wouldn't move all the replicas to the one remaining node).
bq. what the collection *should* look like based on the hardware I am deploying to.

With the caveat that the Solr instances have to be up, this is entirely possible. First of all, you can provide a "createNodeSet" to the create command to specify exactly which Solr nodes you want used for your collection. There's a special "EMPTY" value that _almost_ does what you want: it creates no replicas, just the configuration in ZooKeeper. Thereafter, though, you have to ADDREPLICA (which you can do with the "node" parameter to place each replica exactly where you want).

bq. how many shards are at least partially dependent on the available hardware

Not if you're using compositeId routing. The number of shards is fixed at creation time, although you can split them later. I don't think you can use bin/solr create_collection with the EMPTY createNodeSet, so you need at least one Solr node running to create your skeleton collection.

I think the thing I'm getting stuck on is how in the world the Solr code could know enough to "do the right thing". How many docs do you have? How big are they? How much do you expect to grow? What kinds of searches do you want to support?

But more power to you if you can figure out how to support the kind of thing you want. Personally I think it's harder than you might think and not broadly useful. I've been wrong more times than I like to recall, so maybe you have an approach that would get around the tigers hiding in the grass I think are out there...

Best,
Erick

On Wed, Jan 9, 2019 at 3:04 PM Frank Greguska <fg...@apache.org> wrote:
>
> Thanks for the response. You do raise good points.
>
> Say I reverse your example and I have a 10-node cluster with a 10-shard
> collection and a replication factor of 10. Now I kill 9 of my nodes, do all
> 100 replicas move to the one remaining node? I believe the answer is, well,
> that depends on the configuration.
>
> I'm thinking about it from the initial cluster planning side of things.
> The decisions about auto-scaling, how many replicas, and even how many
> shards are at least partially dependent on the available hardware. So at
> deployment time I would think there would be a way of defining what the
> collection *should* look like based on the hardware I am deploying to.
> Obviously this could change during runtime and I may need to add nodes,
> split shards, etc...
>
> As it is now it seems like I need to deploy my cluster, then write a custom
> script to ensure each node I expect to be there is running, and only then
> create my collection with desired shards and replication.
>
> - Frank
>
> On Wed, Jan 9, 2019 at 2:14 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > How would you envision that working? When would the
> > replicas actually be created, and under what heuristics?
> >
> > Imagine this is possible, and there are a bunch of
> > placeholders in ZK for a 10-shard collection with
> > a replication factor of 10 (100 replicas all told). Now
> > I bring up a single Solr instance. Should all 100 replicas
> > be created immediately? Wait for N Solr nodes to be
> > brought online? On some command?
> >
> > My gut feel is that this would be fraught with problems
> > and not very valuable to many people. If you could create
> > the "template" in ZK without any replicas actually being created,
> > then at some other point say "make it so", I don't see the advantage
> > over the current setup. And I do think that it would take
> > considerable effort.
> >
> > Net-net is I'd like to see a much stronger justification
> > before anyone embarks on something like this. First, as
> > I mentioned above, I think it'd be a lot of effort; second, I
> > virtually guarantee it'd introduce significant bugs. How
> > would it interact with autoscaling, for instance?
> >
> > Best,
> > Erick
> >
> > On Wed, Jan 9, 2019 at 9:59 AM Frank Greguska <fg...@apache.org> wrote:
> > >
> > > Hello,
> > >
> > > I am trying to bootstrap a SolrCloud installation and I ran into an
> > > issue that seems rather odd. I see it is possible to bootstrap a
> > > configuration set from an existing SOLR_HOME using
> > >
> > > ./server/scripts/cloud-scripts/zkcli.sh -zkhost ${ZK_HOST} -cmd bootstrap
> > > -solrhome ${SOLR_HOME}
> > >
> > > but this does not create a collection, it just uploads a configuration
> > > set.
> > >
> > > Furthermore, I can not use
> > >
> > > bin/solr create
> > >
> > > to create a collection and link it to my bootstrapped configuration set,
> > > because it requires Solr to already be running.
> > >
> > > I'm hoping someone can shed some light on why this is the case? It seems
> > > like a collection is just some znodes stored in ZooKeeper that contain
> > > configuration settings and such. Why should I not be able to create those
> > > nodes before Solr is running?
> > >
> > > I'd like to open a feature request for this if one does not already
> > > exist and if I am not missing something obvious.
> > >
> > > Thank you,
> > >
> > > Frank Greguska
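
The createNodeSet / ADDREPLICA workflow Erick describes at the top of the thread can be sketched with the Collections API. This is a sketch only: it assumes at least one Solr node is already up, and the collection name "mycoll", configset name "myconf", and node name "host1:8983_solr" are placeholders, not anything from the thread.

```shell
SOLR="http://localhost:8983/solr"

# 1. Create the collection "skeleton": shard layout and config linkage go
#    into ZooKeeper, but createNodeSet=EMPTY means no replicas are placed.
curl "$SOLR/admin/collections?action=CREATE&name=mycoll&numShards=10&replicationFactor=10&collection.configName=myconf&createNodeSet=EMPTY"

# 2. Later, place each replica explicitly; the "node" parameter pins the
#    new replica to a specific Solr node. Repeat per shard/replica.
curl "$SOLR/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=host1:8983_solr"
```

These commands are only illustrative; they require a live SolrCloud cluster, and exact parameters should be checked against the Collections API reference for your Solr version.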