Re: Required local configuration with ZK solr.xml?
On 1/29/2014 12:48 PM, Jeff Wartes wrote:
> And that, I think, is my misunderstanding. I had assumed that the link between a node and the collections it belongs to would be the (possibly chroot'ed) zookeeper reference *itself*, not the node's directory structure. Instead, it appears that ZK is simply a repository for the collection configuration, where nodes may look up what they need based on filesystem core references.

Work is underway towards a new mode where zookeeper is the ultimate source of truth, and each node will behave accordingly to implement and maintain that truth. I can't seem to locate a Jira issue for it, unfortunately. It's possible that one doesn't exist yet, or that it has an obscure title. Mark Miller is the one who really understands the full details, as he's a primary author of SolrCloud code.

Currently, what SolrCloud considers to be truth is dictated by both zookeeper and an amalgamation of which cores each server actually has present. The collections API modifies both. With an older config (all current and future 4.x versions), the latter is in solr.xml. If you're using the new solr.xml format (available in 4.4 and later, mandatory in 5.0), it's done with core discovery. Zookeeper has a list of everything and coordinates the cluster state, but has no real control over the cores that actually exist on each server. When the two sources of truth disagree, nothing happens to fix the situation; manual intervention is required.

Any errors in my understanding of SolrCloud are my own. I don't claim that what I just wrote is error-free, but I am pretty sure that it's essentially correct.

Thanks,
Shawn
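For anyone following along: with the new-style solr.xml, "core discovery" means Solr scans the solr home at startup for directories containing a core.properties file. A minimal stub for a SolrCloud replica might look like the sketch below (all names here are illustrative, not taken from this thread):

```
# $SOLR_HOME/collection1_shard1_replica1/core.properties
name=collection1_shard1_replica1
collection=collection1
shard=shard1
coreNodeName=core_node1
```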
Re: Required local configuration with ZK solr.xml?
> Work is underway towards a new mode where zookeeper is the ultimate source of truth, and each node will behave accordingly to implement and maintain that truth. [...]
> When the two sources of truth disagree, nothing happens to fix the situation; manual intervention is required.

Thanks Shawn, this was exactly the confirmation I was looking for. I think I have a much better understanding now.

The takeaway I have is that SolrCloud's current automation assumes relatively static clusters, and that if I want anything like dynamic scaling, I'm going to have to write my own tooling to add nodes safely. Fortunately, it appears that the necessary CoreAdmin commands don't need much besides the collection name, so it smells like a simple thing to query zookeeper's /collections path (or clusterstate.json) and issue GET requests accordingly when I spin up a new node.

If you (or anyone) does happen to recall a reference to the work you alluded to, I'd certainly be interested. I googled around myself for a few minutes, but haven't found anything so far.
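The tooling described above could start as small as this: list the collection names (e.g. the children of zookeeper's /collections path) and build one CoreAdmin CREATE request per collection for the new node. This is a hedged sketch, not a verified procedure; the host, port, and core-naming scheme are assumptions for illustration:

```python
from urllib.parse import urlencode

def core_create_url(solr_base, collection, core_name):
    """Build a CoreAdmin CREATE request for one collection on a new node.

    In SolrCloud mode, naming the collection should be enough for Solr
    to assign the core to a shard and pull its config from zookeeper.
    """
    params = urlencode({
        "action": "CREATE",
        "name": core_name,
        "collection": collection,
    })
    return "%s/admin/cores?%s" % (solr_base, params)

# Collection names as they might be read from zookeeper's /collections path:
collections = ["collection1", "collection2"]
urls = [core_create_url("http://localhost:8983/solr", c, c + "_replica")
        for c in collections]
```

Issuing a GET on each URL (with urllib or curl) is then the whole "add a node" step the message above sketches out.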
Re: Required local configuration with ZK solr.xml?
Found it. In case anyone else cares, this appears to be the root issue: https://issues.apache.org/jira/browse/SOLR-5128

Thanks again.

On 1/30/14, 9:01 AM, Jeff Wartes jwar...@whitepages.com wrote:
> [...]
> If you (or anyone) does happen to recall a reference to the work you
> alluded to, I'd certainly be interested. I googled around myself for a
> few minutes, but haven't found anything so far.
Re: Required local configuration with ZK solr.xml?
> ...the difference between that example and what you are doing here is that in that example, because both of the nodes already had collection1 instance dirs, they expected to be part of collection1 when they joined the cluster.

And that, I think, is my misunderstanding. I had assumed that the link between a node and the collections it belongs to would be the (possibly chroot'ed) zookeeper reference *itself*, not the node's directory structure. Instead, it appears that ZK is simply a repository for the collection configuration, where nodes may look up what they need based on filesystem core references.

So assume I have a ZK running separately, and it already has a config uploaded which all my collections will use. I then add two nodes with empty solr home dirs (no cores found) and use the collections API to create a new collection "collection1" with numShards=1 and replicationFactor=2. Please check my assertions:

WRONG: If I take the second node down, wipe the solr home dir on it, and add it back, the collection will properly replicate back onto the node.
RIGHT: If I replace the second node, it must have a $SOLR_HOME/collection1_shard1_replica2/core.properties file in order to properly act as the replacement.
RIGHT: If I replace the first node, it must have a $SOLR_HOME/collection1_shard1_replica1/core.properties file in order to properly act as the replacement.

If correct, this sounds kinda painful, because you have to know the exact list of directory names for the set of collections/shards/replicas the node was participating in if you want to replace a given node. Worse, if you're replacing a node whose disk failed, you need to manually reconstruct the contents of those core.properties files?

This also leaves me a bit confused about how to increase the replicationFactor on an existing collection, but I guess that's tangential. Thanks.
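If those assertions hold, replacing a dead node amounts to recreating each core directory with a core.properties stub before starting Solr on it. A rough sketch of that reconstruction, assuming the replica details have already been parsed out of clusterstate.json (the property keys and values below are illustrative assumptions, not verified against this thread):

```python
import os
import tempfile

def write_core_properties(solr_home, replica):
    """Recreate the core.properties stub for one replica. The index data
    itself should be re-replicated from the shard leader once the core
    registers, so only this stub needs reconstructing by hand."""
    core_dir = os.path.join(solr_home, replica["core"])
    os.makedirs(core_dir, exist_ok=True)
    lines = [
        "name=" + replica["core"],
        "collection=" + replica["collection"],
        "shard=" + replica["shard"],
        "coreNodeName=" + replica["node_name"],
    ]
    path = os.path.join(core_dir, "core.properties")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path

# Replica details as they might be parsed from clusterstate.json for the
# node being replaced (values made up for the sketch):
replica = {"core": "collection1_shard1_replica2",
           "collection": "collection1",
           "shard": "shard1",
           "node_name": "core_node2"}
solr_home = tempfile.mkdtemp()   # stand-in for the real $SOLR_HOME
path = write_core_properties(solr_home, replica)
```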
Required local configuration with ZK solr.xml?
It was my hope that storing solr.xml would mean I could spin up a Solr node pointing it to a properly configured zookeeper ensemble, and that no further local configuration or knowledge would be necessary. However, I'm beginning to wonder if that's sufficient. It's looking like I may also need each node to be preconfigured with at least a directory and a core.properties for each collection/core the node intends to participate in? Is that correct?

I figured I'd test this by starting a stand-alone ZK, configuring it by issuing a zkCli bootstrap against the solr dir in the Solr examples, then manually putfile-ing the (new-style) solr.xml. I then attempted to connect two solr instances that referenced that zookeeper, but did NOT use the solr examples dir as the base. I essentially used empty directories for the solr home. Although both connected and zk shows both in /live_nodes, both report "0 cores discovered" in the logs, and don't seem to find and participate in the collection as happens when you follow the SolrCloud example verbatim. (http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster)

I may have some other configuration issues at present, but I'm going to be disappointed if I need to have preknowledge of what collections/cores may have been dynamically created in a cluster in order to add a node that participates in that cluster. It feels like I might be missing something. Any clarifications would be appreciated.
Re: Required local configuration with ZK solr.xml?
Maybe I'm missing something, but everything you are describing sounds correct and working properly -- the disconnect between what I think is supposed to happen and what you seem to be expecting seems to be right around here:

: essentially used empty directories for the solr home. Although both
: connected and zk shows both in the /live_nodes, both report "0 cores
: discovered" in the logs, and don't seem to find and participate in the
: collection as happens when you follow the SolrCloud example verbatim.
: (http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster)

...the difference between that example and what you are doing here is that in that example, because both of the nodes already had collection1 instance dirs, they expected to be part of collection1 when they joined the cluster.

In your situation however, it sounds like you have created a cluster w/o any collections (which is fine) and you have a config set ready to go in ZK. If you now send a CREATE command to the collections API, referring to the config set, it should automatically create the necessary cores on your nodes.

If that's not the case, or if I've misunderstood what you have done and are trying to do, please elaborate.

-Hoss
http://www.lucidworks.com/
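For concreteness, the Collections API CREATE call suggested above might be built like this. The collection name, config set name ("myconf"), and host/port are placeholders, not details from this thread; collection.configName is what points the new collection at the config set already uploaded to ZK:

```python
from urllib.parse import urlencode

# Parameters for a CREATE against the Collections API; issuing a GET on
# create_url asks SolrCloud to lay out the cores across the live nodes.
params = urlencode({
    "action": "CREATE",
    "name": "collection1",
    "numShards": 1,
    "replicationFactor": 2,
    "collection.configName": "myconf",
})
create_url = "http://localhost:8983/solr/admin/collections?" + params
```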