Re: RecoveryStrategy overseer session expired

2012-07-23 Thread Mark Miller
https://issues.apache.org/jira/browse/SOLR-3647 DistributedQueue should use our 
Solr zk client rather than the std zk client.

I didn't add a CHANGES entry, but there should be one under bugs. I hadn't 
fully thought through the problems, other than that it would not deal with 
connection loss correctly.

Session expiration is even worse, though: the std ZooKeeper client is useless 
after an expiration, and you have to create a new client. Otherwise it will 
keep throwing a session expired exception every time you try to use it.

The solr zk client handles this transparently and you can keep using the same 
client instance once it reconnects.
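
To make the difference concrete, here is a minimal, self-contained sketch of the two behaviours. Everything in it is hypothetical and for illustration only: `RawSession`, `SessionExpired` and `ReconnectingClient` are stand-ins, not the ZooKeeper or Solr zk client APIs, and retrying on the exception is a simplification of however the real client detects expiry.

```java
import java.util.function.Supplier;

/** Hypothetical exception, standing in for a session-expired error. */
class SessionExpired extends RuntimeException {
    SessionExpired(String message) { super(message); }
}

/** Hypothetical raw client handle: once expired, every call fails for good. */
class RawSession {
    private boolean expired = false;

    void expire() { expired = true; }

    String create(String path) {
        if (expired) {
            throw new SessionExpired("Session expired for " + path);
        }
        return path;
    }
}

/** Hypothetical wrapper: swaps in a fresh session so callers keep one instance. */
class ReconnectingClient {
    private final Supplier<RawSession> factory;
    private RawSession session;
    private int reconnects = 0;

    ReconnectingClient(Supplier<RawSession> factory) {
        this.factory = factory;
        this.session = factory.get();
    }

    RawSession currentSession() { return session; }

    int reconnects() { return reconnects; }

    String create(String path) {
        try {
            return session.create(path);
        } catch (SessionExpired e) {
            // An expired session never recovers: build a new one and retry once.
            session = factory.get();
            reconnects++;
            return session.create(path);
        }
    }
}

class ExpiryDemo {
    public static void main(String[] args) {
        RawSession raw = new RawSession();
        raw.expire();
        try {
            raw.create("/overseer/queue/qn-");
        } catch (SessionExpired e) {
            System.out.println("raw client failed: " + e.getMessage());
        }

        ReconnectingClient client = new ReconnectingClient(RawSession::new);
        client.currentSession().expire();
        String path = client.create("/overseer/queue/qn-");
        System.out.println("wrapper created " + path + " after "
                + client.reconnects() + " reconnect(s)");
    }
}
```

Code that holds a `RawSession` directly has to notice the expiry and rebuild the handle itself, while code that holds the wrapper keeps working with the same instance.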

- Mark


On Jul 23, 2012, at 10:53 AM, Mark Miller wrote:

> Hey - out of the house for a bit so I don't have the issue number, but a few 
> days ago I resolved an issue around the distrib queue using the straight zk 
> client and not the solr zk client. 
> 
> I'm not 100% sure since I'm out on the street, but I think that will probably 
> solve your issue. 
> 
> Sent from my iPhone
> 
> On Jul 23, 2012, at 9:58 AM, "Trym R. Møller"  wrote:
> 
>> Hi
>> 
>> When a Solr node in SolrCloud that hosts a replica loses its ZooKeeper 
>> connection, I see the log message below repeatedly and the shard never 
>> recovers. The Solr node has successfully reconnected to ZooKeeper, and 
>> ZooKeeper is running fine.
>> I know that the cause is the loss of the ZooKeeper connection and I will 
>> work on that, but I can guarantee that one of my ZooKeepers will go down at 
>> some point (e.g. by a system admin), so I need the recovery to work. I can 
>> see the code has changed recently just in this area.
>> 
>> Does anyone have a hint of what I may do to get more information about this?
>> 
>> Thanks in advance for any comments.
>> 
>> Best regards Trym
>> 
>> SEVERE: Error while trying to recover.
>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue/qn-
>>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
>>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>   at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:643)
>>   at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:236)
>>   at org.apache.solr.cloud.ZkController.publish(ZkController.java:745)
>>   at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:288)
>>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:210)

- Mark Miller
lucidimagination.com


Re: RecoveryStrategy overseer session expired

2012-07-23 Thread Mark Miller
Hey - out of the house for a bit so I don't have the issue number, but a few 
days ago I resolved an issue around the distrib queue using the straight zk 
client and not the solr zk client. 

I'm not 100% sure since I'm out on the street, but I think that will probably solve 
your issue. 

Sent from my iPhone

On Jul 23, 2012, at 9:58 AM, "Trym R. Møller"  wrote:

> Hi
> 
> When a Solr node in SolrCloud that hosts a replica loses its ZooKeeper 
> connection, I see the log message below repeatedly and the shard never 
> recovers. The Solr node has successfully reconnected to ZooKeeper, and 
> ZooKeeper is running fine.
> I know that the cause is the loss of the ZooKeeper connection and I will work 
> on that, but I can guarantee that one of my ZooKeepers will go down at some 
> point (e.g. by a system admin), so I need the recovery to work. I can see the 
> code has changed recently just in this area.
> 
> Does anyone have a hint of what I may do to get more information about this?
> 
> Thanks in advance for any comments.
> 
> Best regards Trym
> 
> SEVERE: Error while trying to recover.
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue/qn-
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>   at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:643)
>   at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:236)
>   at org.apache.solr.cloud.ZkController.publish(ZkController.java:745)
>   at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:288)
>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:210)


RecoveryStrategy overseer session expired

2012-07-23 Thread Trym R. Møller

Hi

When a Solr node in SolrCloud that hosts a replica loses its ZooKeeper 
connection, I see the log message below repeatedly and the shard never 
recovers. The Solr node has successfully reconnected to ZooKeeper, and 
ZooKeeper is running fine.
I know that the cause is the loss of the ZooKeeper connection and I will 
work on that, but I can guarantee that one of my ZooKeepers will go down 
at some point (e.g. by a system admin), so I need the recovery to work. 
I can see the code has changed recently just in this area.


Does anyone have a hint of what I may do to get more information about this?

Thanks in advance for any comments.

Best regards Trym

SEVERE: Error while trying to recover.
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue/qn-
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:643)
  at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:236)
  at org.apache.solr.cloud.ZkController.publish(ZkController.java:745)
  at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:288)
  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:210)