Re: InterruptedException handling between solr->zk interactions

2018-04-14 Thread Mikhail Khludnev
OK, I've found a fix for the loop that spins on the expired /autoscaling.json
session in OverseerTriggerThread (OTT):

diff --git a/solr/core/src/java/org/apache/solr/cloud/autoscaling/OverseerTriggerThread.java b/solr/core/src/java/org/apache/solr/cloud/autoscaling/OverseerTriggerThread.java
index ece4c4c..6fe2057 100644
--- a/solr/core/src/java/org/apache/solr/cloud/autoscaling/OverseerTriggerThread.java
+++ b/solr/core/src/java/org/apache/solr/cloud/autoscaling/OverseerTriggerThread.java
@@ -142,8 +142,14 @@
           Thread.currentThread().interrupt();
           log.warn("Interrupted", e);
           break;
-        } catch (IOException | KeeperException e) {
+        }
+        catch (IOException | KeeperException e) {
           log.error("A ZK error has occurred", e);
+          if (e.getCause()!=null && e.getCause() instanceof KeeperException.SessionExpiredException) {
+            log.warn("Solr cannot talk to ZK, exiting " +
+                getClass().getSimpleName() + " main queue loop", e);
+            return;
+          }
         }
       }
I'll include it as part of SOLR-12200.


-- 
Sincerely yours
Mikhail Khludnev


Re: InterruptedException handling between solr->zk interactions

2018-04-13 Thread Tomás Fernández Löbbe
Yes, I've seen these issues too. The right thing to do is to close all
resources (in some cases, finish anything that can't be left in a bad state)
and exit. In this particular case, I think the InterruptedException is
swallowed unintentionally by the catch (Exception). I suspect that for
the OverseerTaskProcessor the right thing to do is to close and exit. We
should at the very least restore the interrupted flag (so that
Mikhail's fix would make the thread exit immediately).
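
As a rough illustration of the pattern described above (restore the
interrupted flag, close resources, and exit), here is a minimal sketch. It is
not the OverseerTaskProcessor code: the class InterruptAwareLoop and its
process() and close() methods are hypothetical names, and queue.take() merely
stands in for a blocking ZooKeeper call.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical worker loop; none of these names come from Solr.
class InterruptAwareLoop implements Runnable {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final AtomicBoolean closed = new AtomicBoolean(false);

  public void offer(String task) { queue.offer(task); }

  public boolean isClosed() { return closed.get(); }

  @Override
  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          // Blocking call; an assumption standing in for a ZooKeeper request.
          String task = queue.take();
          process(task);
        } catch (InterruptedException e) {
          // Handle the interrupt before the generic catch below can swallow it:
          // restore the flag so callers can see it, then leave the loop.
          Thread.currentThread().interrupt();
          break;
        } catch (Exception e) {
          // Other failures are reported and the loop keeps going.
          System.err.println("Task failed, continuing: " + e);
        }
      }
    } finally {
      // Release resources on every exit path.
      close();
    }
  }

  private void process(String task) throws Exception {
    if (task.isEmpty()) {
      throw new IllegalStateException("empty task");
    }
  }

  private void close() { closed.set(true); }

  public static void main(String[] args) throws Exception {
    InterruptAwareLoop loop = new InterruptAwareLoop();
    Thread worker = new Thread(loop, "worker");
    worker.start();
    loop.offer("");        // fails inside process(), loop continues
    loop.offer("task-1");  // succeeds
    worker.interrupt();    // request shutdown
    worker.join(5000);
    System.out.println("closed=" + loop.isClosed() + ", alive=" + worker.isAlive());
  }
}
```

The ordering of the catch clauses is the point of the sketch: with only a
catch (Exception), the InterruptedException would be logged like any other
failure, the flag would stay cleared, and the loop would keep running.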


Re: InterruptedException handling between solr->zk interactions

2018-04-13 Thread Varun Thacker
Hi Mikhail,

My checkout already had that commit when I ran into this issue. I'll reply
on SOLR-7736 with some more details.



Re: InterruptedException handling between solr->zk interactions

2018-04-13 Thread Mikhail Khludnev
Hello, Varun.

If what bothers you is
--- Thousands of "Session expired for /autoscaling.json" messages before I
had to manually kill the test run
then it should be resolved by
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=commitdiff;h=a4789db


On Sat, Apr 14, 2018 at 12:31 AM, Varun Thacker  wrote:

> Is there a general strategy for dealing with InterruptedException while
> issuing a ZooKeeper call from Solr?
>
> Here's a more concrete example; I am unsure whether it's doing the right
> thing or not:
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java#L180
>
> This code simply catches Exception. So if InterruptedException is thrown,
> we simply log an ERROR and move on.
>
> Excerpt logs from a local failed test run:
> https://gist.github.com/vthacker/5dcb8978ba177d8725e98c5d433ee6c2
>
>


-- 
Sincerely yours
Mikhail Khludnev