[jira] [Updated] (SOLR-8914) ZkStateReader's refreshLiveNodes(Watcher) is not thread safe

2016-05-09 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-8914:
---
Fix Version/s: (was: 6.0)
   master (7.0)


Manually correcting fixVersion per Step #S5 of LUCENE-7271


> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> 
>
> Key: SOLR-8914
> URL: https://issues.apache.org/jira/browse/SOLR-8914
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.4.1, 5.5, 6.0
>Reporter: Hoss Man
>Assignee: Scott Blum
> Fix For: 5.5.1, 5.6, 6.0.1, 6.1, master (7.0)
>
> Attachments: SOLR-8914.patch, SOLR-8914.patch, SOLR-8914.patch, 
> SOLR-8914.patch, jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what i'm (not) seeing here in 
> the logs, so i'm filing this jira with my analysis in the hopes that someone 
> else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8914) ZkStateReader's refreshLiveNodes(Watcher) is not thread safe

2016-04-20 Thread Scott Blum (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Blum updated SOLR-8914:
-
Fix Version/s: 5.5.1

> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> 
>
> Key: SOLR-8914
> URL: https://issues.apache.org/jira/browse/SOLR-8914
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.5, master, 6.0, 5.4.1
>Reporter: Hoss Man
>Assignee: Scott Blum
> Fix For: master, 5.5.1, 6.1, 5.6, 6.0.1
>
> Attachments: SOLR-8914.patch, SOLR-8914.patch, SOLR-8914.patch, 
> SOLR-8914.patch, jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what i'm (not) seeing here in 
> the logs, so i'm filing this jira with my analysis in the hopes that someone 
> else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8914) ZkStateReader's refreshLiveNodes(Watcher) is not thread safe

2016-04-20 Thread Scott Blum (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Blum updated SOLR-8914:
-
Fix Version/s: 5.6
   6.0.1

> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> 
>
> Key: SOLR-8914
> URL: https://issues.apache.org/jira/browse/SOLR-8914
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.5, master, 6.0, 5.4.1
>Reporter: Hoss Man
>Assignee: Scott Blum
> Fix For: master, 6.1, 5.6, 6.0.1
>
> Attachments: SOLR-8914.patch, SOLR-8914.patch, SOLR-8914.patch, 
> SOLR-8914.patch, jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what i'm (not) seeing here in 
> the logs, so i'm filing this jira with my analysis in the hopes that someone 
> else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8914) ZkStateReader's refreshLiveNodes(Watcher) is not thread safe

2016-04-19 Thread Scott Blum (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Blum updated SOLR-8914:
-
Fix Version/s: 6.1
   master

> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> 
>
> Key: SOLR-8914
> URL: https://issues.apache.org/jira/browse/SOLR-8914
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.5, master, 6.0, 5.4.1
>Reporter: Hoss Man
>Assignee: Scott Blum
> Fix For: master, 6.1
>
> Attachments: SOLR-8914.patch, SOLR-8914.patch, SOLR-8914.patch, 
> SOLR-8914.patch, jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what i'm (not) seeing here in 
> the logs, so i'm filing this jira with my analysis in the hopes that someone 
> else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8914) ZkStateReader's refreshLiveNodes(Watcher) is not thread safe

2016-04-19 Thread Scott Blum (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Blum updated SOLR-8914:
-
Affects Version/s: 6.0
   master
   5.5
   5.4.1

> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> 
>
> Key: SOLR-8914
> URL: https://issues.apache.org/jira/browse/SOLR-8914
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.5, master, 6.0, 5.4.1
>Reporter: Hoss Man
>Assignee: Scott Blum
> Attachments: SOLR-8914.patch, SOLR-8914.patch, SOLR-8914.patch, 
> SOLR-8914.patch, jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what i'm (not) seeing here in 
> the logs, so i'm filing this jira with my analysis in the hopes that someone 
> else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8914) ZkStateReader's refreshLiveNodes(Watcher) is not thread safe

2016-04-13 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-8914:
--
   Assignee: Mark Miller  (was: Shalin Shekhar Mangar)
Description: 
Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
weekend

{noformat}
http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
(refs/remotes/origin/branch_6x)
Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
{noformat}

The failure happened during the static setup of the test, when a 
MiniSolrCloudCluster & several clients are initialized -- before any code 
related to TolerantUpdateProcessor is ever used.

I can't reproduce this, or really make sense of what i'm (not) seeing here in 
the logs, so i'm filing this jira with my analysis in the hopes that someone 
else can help make sense of it.


  was:

Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
weekend

{noformat}
http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
(refs/remotes/origin/branch_6x)
Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
{noformat}

The failure happened during the static setup of the test, when a 
MiniSolrCloudCluster & several clients are initialized -- before any code 
related to TolerantUpdateProcessor is ever used.

I can't reproduce this, or really make sense of what i'm (not) seeing here in 
the logs, so i'm filing this jira with my analysis in the hopes that someone 
else can help make sense of it.



I'll take it. I set my test beasting script on it, and it comes out solid for 
me.

> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> 
>
> Key: SOLR-8914
> URL: https://issues.apache.org/jira/browse/SOLR-8914
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Mark Miller
> Attachments: SOLR-8914.patch, SOLR-8914.patch, SOLR-8914.patch, 
> SOLR-8914.patch, jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what i'm (not) seeing here in 
> the logs, so i'm filing this jira with my analysis in the hopes that someone 
> else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8914) ZkStateReader's refreshLiveNodes(Watcher) is not thread safe

2016-03-30 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-8914:
---
Attachment: SOLR-8914.patch

i've updated the patch to cleanup the test a bit -- besdies some cosmetic stuff 
it now does more iterations of smaller "bursts" with more variability in the 
number of threads used in each burst (which should increase the odds of it 
failing, eventually, on diff machines regardless of CPU count.

bq. I'm beasting your latest patch too, I'll report anything that comes up. 
Just to make sure, I should be beasting StressTestLiveNodes, right?

TestStressLiveNodes, but otherwise yes.

It would also be helpful to know if (and how quickly) you can get 
TestStressLiveNodes to fail on your machine when beasting w/o the rest of the 
patch (so far i'm the only one that's been able to confirm the bug in practice 
w/o Scott's patch - hopefully these changes increase those odds)

> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> 
>
> Key: SOLR-8914
> URL: https://issues.apache.org/jira/browse/SOLR-8914
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
> Attachments: SOLR-8914.patch, SOLR-8914.patch, SOLR-8914.patch, 
> SOLR-8914.patch, jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what i'm (not) seeing here in 
> the logs, so i'm filing this jira with my analysis in the hopes that someone 
> else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8914) ZkStateReader's refreshLiveNodes(Watcher) is not thread safe

2016-03-30 Thread Scott Blum (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Blum updated SOLR-8914:
-
Attachment: SOLR-8914.patch

Updated patch, with a refinement of the channel formulation.  NOTE: I could not 
get TestStressLiveNodes to fail for me locally, but I believe this should fix 
the deadlock.

> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> 
>
> Key: SOLR-8914
> URL: https://issues.apache.org/jira/browse/SOLR-8914
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
> Attachments: SOLR-8914.patch, SOLR-8914.patch, SOLR-8914.patch, 
> jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what i'm (not) seeing here in 
> the logs, so i'm filing this jira with my analysis in the hopes that someone 
> else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8914) ZkStateReader's refreshLiveNodes(Watcher) is not thread safe

2016-03-29 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-8914:
---
Attachment: SOLR-8914.patch

I wrote up a stress test to demonstrate the bug.  I've added it to the patch 
Scott already worked up & attached.

Scott: Prior to incorporating your changes, hammering on this stress test would 
fail within the first 20 attempts.  But with your changes I'm seeing deadlocks 
within the first 5 attempts every time i hammer on it...

{noformat}
Found one Java-level deadlock:
=
"zkCallback-7-thread-2-processing-n:127.0.0.1:48312_solr":
  waiting to lock monitor 0x7f82d40076b8 (object 0xff3b5b38, a 
java.lang.Object),
  which is held by "zkCallback-7-thread-1-processing-n:127.0.0.1:48312_solr"
"zkCallback-7-thread-1-processing-n:127.0.0.1:48312_solr":
  waiting to lock monitor 0x7f82d400be38 (object 0xff3b5800, a 
org.apache.solr.common.cloud.ZkStateReader),
  which is held by 
"OverseerStateUpdate-95637266046386179-127.0.0.1:48312_solr-n_00"
"OverseerStateUpdate-95637266046386179-127.0.0.1:48312_solr-n_00":
  waiting to lock monitor 0x7f82d40076b8 (object 0xff3b5b38, a 
java.lang.Object),
  which is held by "zkCallback-7-thread-1-processing-n:127.0.0.1:48312_solr"
{noformat}


> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> 
>
> Key: SOLR-8914
> URL: https://issues.apache.org/jira/browse/SOLR-8914
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
> Attachments: SOLR-8914.patch, SOLR-8914.patch, 
> jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what i'm (not) seeing here in 
> the logs, so i'm filing this jira with my analysis in the hopes that someone 
> else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8914) ZkStateReader's refreshLiveNodes(Watcher) is not thread safe

2016-03-29 Thread Scott Blum (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Blum updated SOLR-8914:
-
Attachment: SOLR-8914.patch

Here's a pretty simple patch that I believe should fix.  I think #3 is probably 
the safest option here.

> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> 
>
> Key: SOLR-8914
> URL: https://issues.apache.org/jira/browse/SOLR-8914
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
> Attachments: SOLR-8914.patch, 
> jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what i'm (not) seeing here in 
> the logs, so i'm filing this jira with my analysis in the hopes that someone 
> else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8914) ZkStateReader's refreshLiveNodes(Watcher) is not thread safe

2016-03-29 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-8914:
---
Description: 

Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
weekend

{noformat}
http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
(refs/remotes/origin/branch_6x)
Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
{noformat}

The failure happened during the static setup of the test, when a 
MiniSolrCloudCluster & several clients are initialized -- before any code 
related to TolerantUpdateProcessor is ever used.

I can't reproduce this, or really make sense of what i'm (not) seeing here in 
the logs, so i'm filing this jira with my analysis in the hopes that someone 
else can help make sense of it.


  was:


Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
weekend

{noformat}
http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
(refs/remotes/origin/branch_6x)
Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
{noformat}

The failure happened during the static setup of the test, when a 
MiniSolrCloudCluster & several clients are initialized -- before any code 
related to TolerantUpdateProcessor is ever used.

I can't reproduce this, or really make sense of what i'm (not) seeing here in 
the logs, so i'm filing this jira with my analysis in the hopes that someone 
else can help make sense of it.


Summary: ZkStateReader's refreshLiveNodes(Watcher) is not thread safe  
(was: inexplicable "no servers hosting shard: shard2" using 
MiniSolrCloudCluster)

Updating summary.

I suspect we either need to move the {{zkClient.getChildren(...)}} call inside 
the existing {{synchronized (getUpdateLock())}} block, or the entire 
{{refreshLiveNodes(Watcher watcher)}} method needs to synchronize on some new 
"liveNodesLock".

> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> 
>
> Key: SOLR-8914
> URL: https://issues.apache.org/jira/browse/SOLR-8914
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
> Attachments: jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkin's encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what i'm (not) seeing here in 
> the logs, so i'm filing this jira with my analysis in the hopes that someone 
> else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org