Re: InterruptedException handling between solr->zk interactions
OK, I've found a fix for the expired /autoscaling.json spin in OTT
(OverseerTriggerThread):

diff --git a/solr/core/src/java/org/apache/solr/cloud/autoscaling/OverseerTriggerThread.java b/solr/core/src/java/org/apache/solr/cloud/autoscaling/OverseerTriggerThread.java
index ece4c4c..6fe2057 100644
--- a/solr/core/src/java/org/apache/solr/cloud/autoscaling/OverseerTriggerThread.java
+++ b/solr/core/src/java/org/apache/solr/cloud/autoscaling/OverseerTriggerThread.java
@@ -142,8 +142,14 @@
         Thread.currentThread().interrupt();
         log.warn("Interrupted", e);
         break;
-      } catch (IOException | KeeperException e) {
+      }
+      catch (IOException | KeeperException e) {
         log.error("A ZK error has occurred", e);
+if (e.getCause()!=null && e.getCause() instanceof KeeperException.SessionExpiredException) {
+  log.warn("Solr cannot talk to ZK, exiting " +
+      getClass().getSimpleName() + " main queue loop", e);
+  return;
+}
       }
     }

I'll include it as part of SOLR-12200.

On Sat, Apr 14, 2018 at 1:12 AM, Varun Thacker wrote:

> Hi Mikhail,
>
> My checkout already had that commit when I ran into this issue. I'll reply
> on SOLR-7736 with some more details.

--
Sincerely yours
Mikhail Khludnev
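The patch above inspects only the direct cause of the caught exception. As a minimal sketch, and assuming the session expiry might also surface as the exception itself or deeper in the cause chain (the thread does not confirm this), a small helper could walk the whole chain. The SessionExpiredException class below is a hypothetical stand-in for KeeperException.SessionExpiredException so that the example compiles without ZooKeeper on the classpath; the class and method names are illustrative and not part of Solr.

```java
import java.io.IOException;

public class SessionExpiryCheck {

    // Hypothetical stand-in for KeeperException.SessionExpiredException,
    // declared here only to keep the sketch self-contained.
    public static class SessionExpiredException extends Exception {
        public SessionExpiredException(String message) {
            super(message);
        }
    }

    // Returns true if t, or any exception in its cause chain, is a session expiry.
    public static boolean isSessionExpired(Throwable t) {
        int depth = 0;
        // Bound the walk so a cyclic cause chain cannot loop forever.
        while (t != null && depth++ < 32) {
            if (t instanceof SessionExpiredException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        Exception direct = new SessionExpiredException("Session expired for /autoscaling.json");
        Exception wrapped = new IOException("A ZK error has occurred", direct);
        Exception unrelated = new IOException("unrelated failure");

        System.out.println(isSessionExpired(direct));    // true
        System.out.println(isSessionExpired(wrapped));   // true
        System.out.println(isSessionExpired(unrelated)); // false
    }
}
```

With a helper like this, the catch block in the main queue loop could return when isSessionExpired(e) is true, however the expiry happens to be wrapped.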
Re: InterruptedException handling between solr->zk interactions
Yes, I've seen these issues too. The right thing to do is to close all
resources (in some cases finishing anything that can't be left in a bad
state) and exit. In this particular case I think the InterruptedException is
swallowed unintentionally because of the catch (Exception). I suspect that
for OverseerTaskProcessor the right thing to do is to close and exit. We
should at the very least restore the interrupted flag (so that Mikhail's fix
would make the thread exit immediately).

On Fri, Apr 13, 2018 at 3:02 PM, Mikhail Khludnev wrote:

> Hello, Varun.
>
> If you are bothered by
> --- Thousands of "Session expired for /autoscaling.json" messages before I
> had to manually kill the test run
> it should be resolved by
> https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=commitdiff;h=a4789db
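The handling suggested in the reply above (catch InterruptedException ahead of the generic catch (Exception), restore the interrupted flag, close resources, exit) can be sketched roughly as follows. This is not the OverseerTaskProcessor code: InterruptAwareLoop and its methods are hypothetical names, and BlockingQueue.take() merely stands in for a blocking ZooKeeper call.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptAwareLoop implements Runnable, AutoCloseable {
    private final BlockingQueue<String> queue;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    public InterruptAwareLoop(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    // Blocking call; stands in for a ZooKeeper operation.
                    String task = queue.take();
                    process(task);
                } catch (InterruptedException e) {
                    // Handled before the generic catch so it is not swallowed.
                    // Restore the flag for callers further up, then leave the loop.
                    Thread.currentThread().interrupt();
                    return;
                } catch (Exception e) {
                    // Any other failure is logged and the loop keeps going.
                    System.err.println("Task failed: " + e);
                }
            }
        } finally {
            // Release resources on every exit path.
            close();
        }
    }

    // Illustrative task handler; rejects empty tasks to exercise the generic catch.
    void process(String task) {
        if (task.isEmpty()) {
            throw new IllegalArgumentException("empty task");
        }
    }

    @Override
    public void close() {
        closed.set(true);
    }

    public boolean isClosed() {
        return closed.get();
    }

    public static void main(String[] args) throws Exception {
        InterruptAwareLoop loop = new InterruptAwareLoop(new LinkedBlockingQueue<>());
        Thread worker = new Thread(loop, "interrupt-aware-loop");
        worker.start();
        worker.interrupt();
        worker.join(5000);
        System.out.println("alive=" + worker.isAlive() + " closed=" + loop.isClosed());
    }
}
```

Because the InterruptedException branch comes first and re-sets the flag, the interrupt ends the loop even when it arrives inside the blocking call, and the finally block still runs the cleanup.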
Re: InterruptedException handling between solr->zk interactions
Hi Mikhail,

My checkout already had that commit when I ran into this issue. I'll reply
on SOLR-7736 with some more details.

On Fri, Apr 13, 2018 at 3:02 PM, Mikhail Khludnev wrote:

> Hello, Varun.
>
> If you are bothered by
> --- Thousands of "Session expired for /autoscaling.json" messages before I
> had to manually kill the test run
> it should be resolved by
> https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=commitdiff;h=a4789db
Re: InterruptedException handling between solr->zk interactions
Hello, Varun.

If you are bothered by
--- Thousands of "Session expired for /autoscaling.json" messages before I
had to manually kill the test run
it should be resolved by
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=commitdiff;h=a4789db

On Sat, Apr 14, 2018 at 12:31 AM, Varun Thacker wrote:

> Is there a general strategy for dealing with InterruptedException while
> issuing a ZooKeeper call from Solr?
>
> Here's a more concrete example, where I am unsure whether it's doing the
> right thing:
>
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java#L180
>
> This code simply catches Exception. So if InterruptedException is thrown,
> we simply log an ERROR and move on.
>
> Excerpt logs from a local failed test run:
> https://gist.github.com/vthacker/5dcb8978ba177d8725e98c5d433ee6c2

--
Sincerely yours
Mikhail Khludnev