Re: Solrcloud not reachable and after restart just a no servers hosting shard

2012-12-21 Thread gumatias
I'm getting the same error. I followed the SolrCloud examples and it didn't
work. Here's basically what I've done:

EXPERIMENT 1: start shards and index documents, search for documents in all
replicas

# Starting Shards
- Shard1 Leader (with zookeeper)
  java -Dbootstrap_confdir=./solr/collection1/conf \
    -Dcollection.configName=myconf -DzkRun \
    -DzkHost=localhost:9983,localhost:8574,localhost:9900 -DnumShards=2 \
    -jar start.jar
- Shard1 Replica (with zookeeper)
  java -Djetty.port=7574 -DzkRun \
    -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
- Shard2 Leader (with zookeeper)
  java -Djetty.port=8900 -DzkRun \
    -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
- Shard2 Replica
  java -Djetty.port=7500 \
    -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
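
For reference, the zkHost ports follow from the Jetty ports: with -DzkRun, the
embedded ZooKeeper listens on the Jetty port plus 1000, and the first node uses
the default Jetty port 8983. A minimal Python sketch of that arithmetic (the
helper name zk_host_for is made up for illustration, it is not part of Solr):

```python
# Hypothetical helper: derive the zkHost string from the Jetty ports of the
# nodes started with -DzkRun (embedded ZooKeeper listens on Jetty port + 1000).
def zk_host_for(jetty_ports, host="localhost", offset=1000):
    return ",".join(f"{host}:{port + offset}" for port in jetty_ports)


# 8983 is the default Jetty port of the first node; 7574 and 8900 are the two
# other nodes that were started with -DzkRun.
ensemble_ports = [8983, 7574, 8900]
print(zk_host_for(ensemble_ports))
```

The fourth node (port 7500) is started without -DzkRun, so it only connects to
the ensemble and does not add a port to the list.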

clusterstate.json: http://dl.dropbox.com/u/7570330/clusterstate.txt

# Indexing sample document
java -jar post.jar hd.xml

# search in all Shards: number of results found: 2
Note: all shards have the same result

EXPERIMENT 2: Kill current Shard1 Leader, expect Shard1 Replica to become
leader, search should still work and results return (is that right?)

# Killing Shard2 Leader

Shard2 Replica logs:
...
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest
INFO: Processed session termination for sessionid: 0x3bbe3403c1
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest
INFO: Got user-level KeeperException when processing sessionid:0x3bbe3403c0 type:delete cxid:0x4dea zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/collections/collection1/leaders/shard1 Error:KeeperErrorCode = NoNode for /collections/collection1/leaders/shard1
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
INFO: Running the leader process.
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest
INFO: Got user-level KeeperException when processing sessionid:0x3bbe3403c0 type:create cxid:0x4dec zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = NodeExists for /overseer
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
INFO: Checking if I should try and be the leader.
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
INFO: My last published State was Active, it's okay to be the leader.
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
INFO: I may be the new leader - try and sync
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.SyncStrategy sync
INFO: Sync replicas to http://Gustavos-MacBook-Pro.local:8900/solr/collection1/
Dec 21, 2012 11:57:39 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=collection1 url=http://Gustavos-MacBook-Pro.local:8900/solr START replicas=[http://Gustavos-MacBook-Pro.local:8983/solr/collection1/] nUpdates=100
Dec 21, 2012 11:57:39 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=collection1 url=http://Gustavos-MacBook-Pro.local:8900/solr DONE.  We have no versions. sync failed.
Dec 21, 2012 11:57:39 AM org.apache.solr.common.SolrException log
SEVERE: Sync Failed
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.ShardLeaderElectionContext rejoinLeaderElection
INFO: There is a better leader candidate than us - going back into recovery
Dec 21, 2012 11:57:39 AM org.apache.solr.update.DefaultSolrCoreState doRecovery
INFO: Running recovery - first canceling any ongoing recovery
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.RecoveryStrategy run
INFO: Starting recovery process.  core=collection1 recoveringAfterStartup=false
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Attempting to PeerSync from http://Gustavos-MacBook-Pro.local:8983/solr/collection1/ core=collection1 - recoveringAfterStartup=false
Dec 21, 2012 11:57:39 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=collection1 url=http://Gustavos-MacBook-Pro.local:8900/solr START replicas=[http://Gustavos-MacBook-Pro.local:8983/solr/collection1/] nUpdates=100
Dec 21, 2012 11:57:39 AM org.apache.zookeeper.server.PrepRequestProcessor pRequest
INFO: Got user-level KeeperException when processing sessionid:0x3bbe3403c0 type:delete cxid:0x4df3 zxid:0xfffe txntype:unknown reqpath:n/a Error Path:/collections/collection1/leaders/shard1 Error:KeeperErrorCode = NoNode for /collections/collection1/leaders/shard1
Dec 21, 2012 11:57:39 AM org.apache.solr.update.PeerSync sync
WARNING: no frame of reference to tell of we've missed updates
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: PeerSync Recovery was not successful - trying replication. core=collection1
Dec 21, 2012 11:57:39 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Starting Replication Recovery. core=collection1
Dec 21, 2012 11:57:39 AM 

Re: Solrcloud not reachable and after restart just a no servers hosting shard

2012-12-21 Thread Mark Miller
You're hitting https://issues.apache.org/jira/browse/SOLR-3939

The luck of hashing must have left the guy trying to become the leader without 
any docs. Due to SOLR-3939, a node with an empty index cannot become the leader.
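
As a rough illustration of the sequence in the log (a toy model, not Solr's
actual code), the Python sketch below treats each replica as a list of recent
update versions. A candidate whose list is empty cannot complete the sync
step, so it gives up leadership and goes into recovery. The class name, the
function names and the localhost URLs are invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    url: str
    versions: List[int] = field(default_factory=list)  # recent update versions


def peer_sync(candidate: Replica) -> bool:
    """Toy stand-in for the sync step: fails when the candidate has no versions."""
    if not candidate.versions:
        # Corresponds to "We have no versions. sync failed." in the log.
        return False
    return True


def try_become_leader(candidate: Replica) -> str:
    """Return the outcome of one simplified election attempt."""
    if peer_sync(candidate):
        return "leader"
    # Corresponds to "going back into recovery" in the log.
    return "recovery"


empty = Replica("http://localhost:8900/solr/collection1/")
full = Replica("http://localhost:8983/solr/collection1/", versions=[101, 102])
print(try_become_leader(empty))
print(try_become_leader(full))
```

If the only surviving replica of a shard is the empty one, no candidate is left
that can pass the sync step, which is how the shard ends up without a leader in
this model.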

- Mark


Re: Solrcloud not reachable and after restart just a no servers hosting shard

2012-12-21 Thread Mark Miller
At least it looks like you're hitting that, based on it mentioning no frame of
reference to use to sync with. More importantly, though, you're also hitting
another issue - see my email to the user list:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3cd0994a2d-04b0-4a80-af07-9add49b85...@gmail.com%3E

- Mark


Solrcloud not reachable and after restart just a no servers hosting shard

2012-09-24 Thread Daniel Brügge
Hi,

I am running SolrCloud 4.0-BETA and during the weekend it 'crashed' somehow,
so that it wasn't reachable. CPU load was 100%.

After a restart I couldn't access the data; it just told me:

no servers hosting shard

Is there a way to get the data back?

Thanks & regards

Daniel


Re: Solrcloud not reachable and after restart just a no servers hosting shard

2012-09-24 Thread Sami Siren
hi,

Can you share a little bit more about your configuration: how many
shards, # of replicas, what does your clusterstate.json look like,
anything suspicious in the logs?
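
One way to answer these questions from a saved copy of clusterstate.json is a
small script that lists the shards with no replica in the "active" state. This
is only a sketch: the sample data is made up, and the nesting
collection -> shard -> replica with a "state" field is an assumption about the
file layout, which may differ between Solr versions.

```python
import json

# Made-up sample; the nesting collection -> shard -> replica is an assumption
# about the clusterstate.json layout and may differ between Solr versions.
SAMPLE = """
{"collection1": {
  "shard1": {
    "node1": {"state": "active", "leader": "true", "base_url": "http://host1:8983/solr"},
    "node2": {"state": "recovering", "base_url": "http://host2:7574/solr"}
  },
  "shard2": {
    "node3": {"state": "down", "base_url": "http://host3:8900/solr"}
  }
}}
"""


def shards_without_active_replica(cluster_state):
    """Return (collection, shard) pairs in which no replica is 'active'."""
    missing = []
    for collection, shards in cluster_state.items():
        for shard, replicas in shards.items():
            states = [r.get("state") for r in replicas.values()]
            if "active" not in states:
                missing.append((collection, shard))
    return missing


print(shards_without_active_replica(json.loads(SAMPLE)))
```

A shard reported by this check is a likely candidate for the "no servers
hosting shard" message, although the recorded state alone does not show
whether the node is still alive.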

--
 Sami Siren


Re: Solrcloud not reachable and after restart just a no servers hosting shard

2012-09-24 Thread Mark Miller
Right - we need logs, admin-cloud dump to clipboard info, anything
else to go on.


-- 
- Mark