Thanks so much for your reply. That's clarified a few things for me.

Erick Erickson wrote:

> Where SolrCloud becomes compelling is when you _do_ need to
> shard, and deal with HA/DR.

I'm not using shards since the indices are small enough, but I use 
master/slave with 6 nodes for two reasons: having a single master poll the 
database means less load on the database than having every node poll 
separately. And of course we still want HA and performance, so we balance 
load with haproxy.
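
For example, a slave's replication state can be checked against its master 
with something like this (the hostname, port and core name are placeholders):

    # Ask a slave for its replication details; the response includes the
    # master URL, both index versions and the poll interval, so an
    # out-of-date slave is easy to spot.
    curl 'http://solr-slave-1:8983/solr/products/replication?command=details'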

> Then the added step of maintaining things
> in Zookeeper is a small price to pay for _not_ having to be sure that
> all the configs on all the servers are all the same. Imagine a cluster
> with several hundred replicas out there. Being absolutely sure that
> all of them have the same configs, have been restarted and the like
> becomes daunting. So having to do an "upconfig" is a good tradeoff
> IMO.

Saltstack (like Ansible, Puppet, Chef, etc.) makes distributed configuration 
management trivial. So it isn't solving any problem for me, but I understand 
how people without a configuration management tool would like it.
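
For instance, pushing out core configs and verifying that every node is 
identical is a couple of Salt commands (the state name and file paths here 
are made up):

    # Apply the (hypothetical) state that templates out each core's conf dir
    salt 'solr-*' state.apply solr.config
    # Confirm all nodes hold identical files
    salt 'solr-*' cmd.run 'md5sum /data/solr/*/conf/schema.xml'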



> The bin/solr script has a "zk -upconfig" parameter that'll take care
> of pushing the configs up. Since you already have the configs in VCS,
> your process is just to pull them from vcs to "somewhere" then
> bin/solr zk -upconfig -z zookeeper_address -n configset_name -d
> directory_you_downloaded_to_from_VCS.

Yep, that confirms my guess at how people are expected to use this. 
It's pretty cumbersome for me because:

1. I don't want production machines to require VCS checkout credentials
2. I don't want to have to install Solr (and keep the version in sync with 
production) on our build or configuration management machines
3. I still need files on disk in order to version control them and tie that 
into our QA processes. Now I need another step to take those files and inject 
them into the Zookeeper black box (sketched below), ensuring they are always 
up to date.
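
Spelled out, that extra step looks something like this (the repository URL, 
ZooKeeper addresses and configset name are all placeholders):

    # Pull the configs out of VCS, then push them into Zookeeper
    git clone ssh://vcs.example.com/solr-configs.git /tmp/solr-configs
    bin/solr zk -upconfig -z zk1:2181,zk2:2181,zk3:2181 \
        -n products -d /tmp/solr-configs/products/conf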

I do understand that people who manage hundreds of nodes completely by hand 
would find it useful. But I am surprised that there are any such people.

I was hoping that Zookeeper had some hidden features that would make my life 
easier.


> Thereafter you simply refer to them by name when you create a
> collection and the rest of it is automatic. Every time a core reloads
> it gets the new configs.
> 
> If you're trying to manipulate _cores_, that may be where you're going
> wrong. Think of them as _collections_. What's not clear from your
> problem statement is whether these cores on the various machines are
> part of the same collection or not.

I was unaware of the concept of a collection until now. We use one core for 
each type of entity we are indexing, and that works well.

> Do you have multiple shards in one
> logical index?

No shards. Every Solr node contains the complete set of all data.

>  Or do you have multiple collections that have
> masters/slaves (in which case the master and all the slaves that point
> to it will be a "collection")?

I'm not understanding from https://wiki.apache.org/solr/SolrTerminology what a 
Collection is that makes it different from the old concept of a Core.


> Do all of the cores you have use the
> same configurations? Or is each set of master/slaves using a different
> configuration?

Each core has a different configuration (which is what makes it a different 
core... different source data, different synonyms, etc.). But every node is 
identical and kept that way with Saltstack.



Anyhow, the bottom line appears to be that 130MB of jars are needed to deploy 
my configuration to Zookeeper. In that case, I think I'll do it by building a 
new deployment project, with a gradle task (so I don't need to worry about all 
those Solr dependencies for zkcli.sh), and a Jenkins job that can be triggered 
to run the deployment to either staging or production. A few new holes in my 
firewall and I'll be done.
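
Under the hood, that gradle task would boil down to an invocation of Solr's 
ZkCLI, roughly like this (the classpath variable, ZooKeeper addresses, 
configset name and paths are illustrative):

    # Gradle resolves the solr-core jars onto $SOLR_JARS, so no full
    # Solr install is needed on the build machine.
    java -classpath "$SOLR_JARS" org.apache.solr.cloud.ZkCLI \
        -cmd upconfig \
        -zkhost zk1:2181,zk2:2181,zk3:2181 \
        -confname products \
        -confdir ./configs/products/conf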

Unfortunate new points of failure and complexity, but I can't think of anything 
simpler.


Thanks

Ari



-- 
-------------------------->
Aristedes Maniatis
CEO, ish
https://www.ish.com.au
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A


