Missed the list in my last reply: This used to work properly - I'm guessing that the zk layout refactoring right before 4.0 broke it. We likely need a JIRA issue, a fix, and a test.
Mark

On Nov 14, 2012, at 6:43 AM, Gilles Comeau <gilles.com...@polecat.co> wrote:

> Hi all,
>
> I just wanted to make the simplest repro of this issue, which now I am
> thinking might be related to the decision made in:
> https://issues.apache.org/jira/browse/SOLR-3080 ? And this is the expected
> behaviour?
>
> 1. Download SOLR 4 production and extract.
> 2. Replace solr.xml in apache-solr-4.0.0/example/solr/solr.xml with:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <solr persistent="true">
> <cores adminPath="/admin/cores" defaultCoreName="collection1"
> host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}"
> zkClientTimeout="${zkClientTimeout:15000}">
> <core shard="shard1" instanceDir="collection1/" name="collection1"
> collection="polecat"/>
> <core shard="shard1" instanceDir="collection2/" name="collection2"
> collection="polecat"/>
> <core schema="schema.xml" shard="core3" instanceDir="core3/" name="core3"
> config="solrconfig.xml" collection="polecat" dataDir="data"/>
> </cores>
> </solr>
>
> 3. Start solr with: java -Dbootstrap_confdir=./solr/collection1/conf
> -Dcollection.configName=myconf -DzkRun -Dsolrcloud.skip.autorecovery=true
> -jar start.jar
> (skip.autorecovery is used because the shards don't exist previously)
>
> Then run this:
> Sanity query:
> http://localhost:8983/solr/polecat/select?q=*%3A*&wt=xml&distrib=true
> Remove the core:
> http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core3&deleteIndex=true
> Error query:
> http://localhost:8983/solr/polecat/select?q=*%3A*&wt=xml&distrib=true
>
> And the sanity query, we will receive 0 records, the error query "no servers
> hosting shard:".
> And in the clusterstate.json: "core3":{"replicas":{}}}}
>
> Regards,
>
> Gilles
>
> -----Original Message-----
> From: Gilles Comeau [mailto:gilles.com...@polecat.co]
> Sent: 13 November 2012 16:39
> To: solr-user@lucene.apache.org; markrmil...@gmail.com
> Subject: RE: Removing Shards from Zookeeper - no servers hosting shard
>
> Sorry forgot.. pictures are no good.. From cluster.json, the same
> information, the core I unloaded shard sticks around:
> “"solrexperiment:8080_solr_experiment_02_10_2012":{"replicas":{}}}}”
>
> Do I need a special command to delete the shard or something? I’ve never
> seen a command that does that?
>
> Regards, Gilles
>
> "experiment":{
> "solrexperiment:8080_solr_experiment_master":{"replicas":{"IS-17093:9090_solr_experiment_master":{
> "shard":"solrexperiment:8080_solr_experiment_master",
> "roles":null,
> "state":"active","core":"experiment_master","collection":"experiment","node_name":"IS-17093:9090_solr","base_url":"http://IS-17093:9090/solr","leader":"true"}}},
> "solrexperiment:8080_solr_experiment_01_10_2012":{"replicas":{"IS-17093:9090_solr_01_10_2012_experiment":{
> "shard":"solrexperiment:8080_solr_experiment_01_10_2012","roles":null,"state":"active","core":"01_10_2012_experiment",
> "collection":"experiment","node_name":"IS-17093:9090_solr","base_url":"http://IS-17093:9090/solr","leader":"true"}}},
> "solrexperiment:8080_solr_experiment_02_10_2012":{"replicas":{}}}}
>
> From: Gilles Comeau [mailto:gilles.com...@polecat.co]
> Sent: 13 November 2012 16:29
> To: solr-user@lucene.apache.org; markrmil...@gmail.com
> Subject: RE: Removing Shards from Zookeeper - no servers hosting shard
>
> When I do the unload through the UI, I see the below messages in the solr
> log. Nothing in the zookeeper log.
>
> Then right after I try:
> http://217.147.83.124:9090/solr/experiment_master/select?q=*%3A*&wt=xml&distrib=true
> and get <str name="msg">no servers hosting shard:</str>.
> Also, I still see the shard being referenced in the cloud tab in the UI.
>
> [cid:image001.png@01CDC1BB.FD2BE590]
>
> Does this work for anyone else using SOLR 4.0 production with external
> zookeeper and distributed queries and if so, can you let me know exactly what
> versions and steps you take to not get this error? ☺ Anyone else have any
> problems getting this to work?
>
> My setup is pretty basic: Local external zookeeper 3.3.6, solr 4.0 with
> three cores seen above.
>
> Regards, Gilles
>
> INFO: [02_10_2012_experiment] CLOSING SolrCore
> org.apache.solr.core.SolrCore@11e3c2c6
> 13-Nov-2012 16:19:13 org.apache.solr.core.SolrCore closeSearcher
> INFO: [02_10_2012_experiment] Closing main searcher on request.
> 13-Nov-2012 16:19:13 org.apache.solr.search.SolrIndexSearcher close
> FINE: Closing Searcher@7cd47880 main
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=7,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=1,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> queryResultCache{lookups=4,hits=3,hitratio=0.75,inserts=2,evictions=0,size=2,warmupTime=0,cumulative_lookups=4,cumulative_hits=3,cumulative_hitratio=0.75,cumulative_inserts=1,cumulative_evictions=0}
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 13-Nov-2012 16:19:13 org.apache.solr.core.CachingDirectoryFactory close
> FINE: Closing:
> CachedDir<<org.apache.lucene.store.MMapDirectory@/solr2/cores/02_10_2012/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@717757ad;refCount=1;path=/solr2/cores/02_10_2012/data/index;done=false>>
> 13-Nov-2012 16:19:13 org.apache.solr.update.DirectUpdateHandler2 close
> INFO: closing DirectUpdateHandler2{commits=0,autocommits=0,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0}
> 13-Nov-2012 16:19:13 org.apache.solr.update.DefaultSolrCoreState decref
> INFO: SolrCoreState ref count has reached 0 - closing IndexWriter
> 13-Nov-2012 16:19:13 org.apache.solr.update.DefaultSolrCoreState decref
> INFO: Closing SolrCoreState - canceling any ongoing recovery
> 13-Nov-2012 16:19:13 org.apache.solr.core.CoreContainer persistFile
> INFO: Persisting cores config to /solr2/solr.xml
> 13-Nov-2012 16:19:13 org.apache.solr.core.Config getVal
> FINE: null solr/cores/@adminPath=/admin/cores
> 13-Nov-2012 16:19:13 org.apache.solr.core.Config getNode
> FINE: null missing optional solr/cores/@shareSchema
> 13-Nov-2012 16:19:13 org.apache.solr.core.Config getVal
> FINE: null solr/cores/@hostPort=9090
> 13-Nov-2012 16:19:13 org.apache.solr.core.Config getVal
> FINE: null solr/cores/@zkClientTimeout=10000
> 13-Nov-2012 16:19:13 org.apache.solr.core.Config getVal
> FINE: null solr/cores/@hostContext=solr
> 13-Nov-2012 16:19:13 org.apache.solr.core.Config getNode
> FINE: null missing optional solr/cores/@leaderVoteWait
> 13-Nov-2012 16:19:13 org.apache.solr.core.SolrXMLSerializer persistFile
> INFO: Persisting cores config to /solr2/solr.xml
> 13-Nov-2012 16:19:13 org.apache.solr.common.cloud.ZkStateReader updateClusterState
> INFO:
> Updating cloud state from ZooKeeper...
> 13-Nov-2012 16:19:13 org.apache.solr.common.cloud.ZkStateReader$2 process
> INFO: A cluster state change has occurred - updating...
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: 13 November 2012 14:13
> To: solr-user@lucene.apache.org
> Subject: Re: Removing Shards from Zookeeper - no servers hosting shard
>
> Odd...the unload command should be enough...
>
> On Tue, Nov 13, 2012 at 5:26 AM, Gilles Comeau
> <gilles.com...@polecat.co> wrote:
>
>> Hi all,
>>
>> We've just updated to SOLR 4.0 production and Zookeeper 3.3.6 from SOLR 4.0
>> development version circa November 2011. We keep 6 months of data online in
>> our primary cluster, and archive off old stuff to a slower disk archive
>> cluster. We used to remove SOLR cores with the following code, but
>> everything has changed in Zookeeper now.
>>
>> Old code to remove cores from Zookeeper:
>>
>> curl http://127.0.0.1:8080/solr/admin/cores?action=UNLOAD&core=${SHARD}
>>
>> echo "Removing indexes from all Zookeeper hosts"
>> for (( i=0; i<${#ZK_HOSTS[*]}; i++ ))
>> do
>> $JAVA -cp
>> .:/apps/zookeeper-3.3.5/zookeeper-3.3.5.jar:/apps/zookeeper-3.3.5/lib/jline-0.9.94.jar:/apps/zookeeper-3.3.5/lib/log4j-1.2.15.jar
>> org.apache.zookeeper.ZooKeeperMain -server ${ZK_HOSTS[$i]} delete
>> /collections/polecat/shards/solrenglish:8080_solr_$SHARD/$HOSTNAME:8080_solr_$SHARD
>> $JAVA -cp
>> .:/apps/zookeeper-3.3.5/zookeeper-3.3.5.jar:/apps/zookeeper-3.3.5/lib/jline-0.9.94.jar:/apps/zookeeper-3.3.5/lib/log4j-1.2.15.jar
>> org.apache.zookeeper.ZooKeeperMain -server ${ZK_HOSTS[$i]} delete
>> /collections/polecat/shards/solrenglish:8080_solr_$SHARD
>> done
>>
>> curl http://solrmaster01:8080/solr/admin/cores?action=RELOAD&core=master
>>
>> Now that we have migrated, I have tried removing cores from Zookeeper by
>> removing the stuff for the unloaded core in "leaders" and "leader_elect",
>> but for some reason SOLR keeps sending the requests to the shard, and I end
>> up with the "no servers hosting shard" error.
>>
>> Does anyone know how to remove a SOLR core from a SOLR server and have
>> Zookeeper updated, and have distributed queries still work? The only thing
>> I know how to do now is stop tomcat, stop zookeeper, clear out the data
>> directory and then restart both. This isn't really ideal for a process I'd
>> like to have running each night, and surely it is something others have hit.
>> I've tried google searching, and what I find is references to the bug where
>> solr notifies zookeeper on core unloads which is marked as fixed, and people
>> talking about how it doesn't work but if you run reloads on each core, it
>> will work. (also doesn't work when I do it)
>>
>> Regards,
>>
>> Gilles Comeau
>
> --
> - Mark
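
For anyone trying to confirm the same symptom, the leftover shard entries reported in this thread can be spotted mechanically. The following is only a rough sketch: it assumes clusterstate.json has the collection -> shard -> "replicas" layout shown in the excerpts quoted above, and the function name find_empty_shards and the sample data are made up for illustration. It does not talk to ZooKeeper; it only inspects JSON that has already been fetched.

```python
import json


def find_empty_shards(cluster_state):
    """Return (collection, shard) pairs whose "replicas" map is empty.

    Assumes the layout seen in this thread:
    {collection: {shard: {"replicas": {...}}}}
    """
    empty = []
    for collection, shards in cluster_state.items():
        for shard, info in shards.items():
            # A shard with no replicas left is what triggers
            # "no servers hosting shard" on distributed queries.
            if not info.get("replicas"):
                empty.append((collection, shard))
    return empty


# Hypothetical, trimmed-down sample modelled on the repro in this thread.
sample = json.loads("""
{"polecat": {
  "shard1": {"replicas": {"host:8983_solr_collection1": {"state": "active"}}},
  "core3": {"replicas": {}}
}}
""")

print(find_empty_shards(sample))
```

Any pair it reports is a shard that is still listed in the cluster state even though every core hosting it has been unloaded.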