[jira] [Commented] (SOLR-7613) solrcore.properties file should be loaded if it resides in ZooKeeper
[ https://issues.apache.org/jira/browse/SOLR-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575908#comment-14575908 ]

Noble Paul commented on SOLR-7613:
----------------------------------

Let's get rid of solrcore.properties in cloud. We don't need it. It is not just reading that thing; we need to manage the lifecycle as well (editing, refreshing, etc.). This is the right way to do properties in SolrCloud: https://cwiki.apache.org/confluence/display/solr/Config+API#ConfigAPI-CommandsforUser-DefinedProperties

solrcore.properties file should be loaded if it resides in ZooKeeper
--------------------------------------------------------------------
                Key: SOLR-7613
                URL: https://issues.apache.org/jira/browse/SOLR-7613
            Project: Solr
         Issue Type: Bug
           Reporter: Steve Davids
            Fix For: 5.3

The solrcore.properties file is used to load user-defined properties, primarily for use in the solrconfig.xml file. However, the file is only loaded if it resides in the core/conf directory on the physical disk; it is not loaded if it is in ZK's core/conf directory. There should be a mechanism to allow a core properties file to be specified in ZK, to update it appropriately, and to reload the properties when the file changes (or via a core reload).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
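[Editor's note] For readers unfamiliar with the Config API page linked above: user-defined properties are set by POSTing a set-user-property command to a collection's /config endpoint and are then referenced as ${propertyName} in solrconfig.xml. A minimal sketch of building that request body — the property name my.custom.prop, its value, and the host/collection in the comment are illustrative, not taken from this thread:

```python
import json

def set_user_property_payload(name, value):
    """Build the JSON body for the Config API's set-user-property command.

    The body would be POSTed to http://<host>:8983/solr/<collection>/config
    (hypothetical endpoint for illustration); afterwards solrconfig.xml can
    reference the value as ${<name>}.
    """
    return json.dumps({"set-user-property": {name: value}})

# Hypothetical property name/value, for illustration only.
payload = set_user_property_payload("my.custom.prop", "42")
print(payload)
```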
[jira] [Updated] (SOLR-7493) Requests aren't distributed evenly if the collection isn't present locally
[ https://issues.apache.org/jira/browse/SOLR-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-7493:
----------------------------------------
    Attachment: SOLR-7493.patch

* Uses a Random seeded with tests.seed (or System.currentTimeMillis) for shuffling
* Added a simple test which creates 3 jettys and 2 collections A, B, such that A has replicas on node1 and node2, and collection B has a replica on node3. The test fires 10 search requests to node3 intended for collection A and asserts that the requests do not all go to the same replica of collection A.

Requests aren't distributed evenly if the collection isn't present locally
--------------------------------------------------------------------------
                Key: SOLR-7493
                URL: https://issues.apache.org/jira/browse/SOLR-7493
            Project: Solr
         Issue Type: Bug
         Components: SolrCloud
   Affects Versions: 5.0
           Reporter: Jeff Wartes
           Assignee: Shalin Shekhar Mangar
        Attachments: SOLR-7493.patch, SOLR-7493.patch

I had a SolrCloud cluster where every node is behind a simple round-robin load balancer. This cluster had two collections (A, B), and the slices of each were partitioned such that one collection (A) used two thirds of the nodes, and the other collection (B) used the remaining third of the nodes. I observed that every request for collection B that the load balancer sent to a node with (only) slices for collection A got proxied to one *specific* node hosting a slice for collection B. This node started running pretty hot, for obvious reasons: it meant that one specific node was handling the fan-out for slightly more than two thirds of the requests against collection B.
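[Editor's note] The shuffling the patch notes describe can be illustrated with a seeded PRNG: the same seed reproduces the same replica order (which is what makes a tests.seed-driven test run repeatable), while seeding from the clock in production varies the order between requests so they don't all pin one replica. A language-neutral sketch; the replica names are made up:

```python
import random

def shuffle_replicas(replicas, seed):
    """Return a shuffled copy of the replica list.

    An explicit seed makes the order reproducible for tests; a clock-derived
    seed (the fallback described in the patch notes) varies it per request.
    """
    rnd = random.Random(seed)
    shuffled = list(replicas)
    rnd.shuffle(shuffled)
    return shuffled

replicas = ["node1/replica1", "node2/replica2", "node3/replica3"]
# Same seed -> same order, so a failing test run can be replayed exactly.
assert shuffle_replicas(replicas, 42) == shuffle_replicas(replicas, 42)
# Shuffling only reorders; every replica is still a candidate target.
assert sorted(shuffle_replicas(replicas, 7)) == sorted(replicas)
```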
[jira] [Commented] (SOLR-7493) Requests aren't distributed evenly if the collection isn't present locally
[ https://issues.apache.org/jira/browse/SOLR-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575913#comment-14575913 ]

ASF subversion and git services commented on SOLR-7493:
-------------------------------------------------------

Commit 1683946 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1683946 ]
SOLR-7493: Requests aren't distributed evenly if the collection isn't present locally
[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 3197 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/3197/

No tests ran.

Build Log:
[...truncated 161 lines...]
BUILD FAILED
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/build.xml:536: The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/build.xml:484: The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/build.xml:61: The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/extra-targets.xml:39: The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/lucene/build.xml:50: The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/lucene/common-build.xml:1436: The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/lucene/common-build.xml:991: Could not read or create hints file: /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/.caches/test-stats/core/timehints.txt

Total time: 16 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Sending artifact delta relative to Lucene-Solr-Tests-5.x-Java7 #3186
Archived 1 artifacts
Archive block size is 32768
Received 0 blocks and 464 bytes
Compression is 0.0%
Took 1.8 sec
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
[jira] [Commented] (SOLR-7613) solrcore.properties file should be loaded if it resides in ZooKeeper
[ https://issues.apache.org/jira/browse/SOLR-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575915#comment-14575915 ]

Noble Paul commented on SOLR-7613:
----------------------------------

bq. In my particular case the core won't load without some of the properties being specified. Is there a way to get those properties into ZK before you even create the new collection?

It looks like you are adding properties to an already existing collection... In these cases we normally provide a sane default in the config, example {prop_name:prop_default_val}. The other option is to write directly to ZK before creating the collection, till we have a way to handle these outside of the collection.
[jira] [Comment Edited] (SOLR-7613) solrcore.properties file should be loaded if it resides in ZooKeeper
[ https://issues.apache.org/jira/browse/SOLR-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575915#comment-14575915 ]

Noble Paul edited comment on SOLR-7613 at 6/6/15 8:06 PM:
----------------------------------------------------------

bq. In my particular case the core won't load without some of the properties being specified. Is there a way to get those properties into ZK before you even create the new collection?

It looks like you are adding properties to an already existing collection... In these cases we normally provide a sane default in the config, example {prop_name:prop_default_val}. After the core is loaded you can use the Config API to update the values. The other option is to write the {{configoverlay.json}} directly to ZK before creating the collection, till we have a way to handle these outside of the collection.
Jenkins problems with timehints.txt [was: Re: [JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 3197 - Still Failing]
On Jun 6, 2015, at 4:05 PM, Apache Jenkins Server jenk...@builds.apache.org wrote:

> Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/3197/
> No tests ran.
> [...]
> /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/lucene/common-build.xml:991: Could not read or create hints file: /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/.caches/test-stats/core/timehints.txt

I’m trying to restart jobs on ASF Jenkins, and Subversion seems to be working fine, but the timehints.txt file is causing trouble that I don’t understand.

There’s enough space on the device:

$ df -k /x1
Filesystem  1K-blocks     Used  Available  Use%  Mounted on
/dev/sdb1    82437808 49183712   29043460   63%  /x1

I can read the file, and the jenkins user owns it, so I don’t understand what’s happening:

$ ls -l /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/.caches/test-stats/core/timehints.txt
-rw-rw-r-- 1 jenkins jenkins 49152 Jun 3 13:34 /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/.caches/test-stats/core/timehints.txt

Uwe? Dawid?
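[Editor's note] The manual df/ls checks above can be scripted. A small sketch that gathers the same facts (existence, read/write access for the current user, free space on the volume) for a given path; note that os.access can report a file as readable even when an actual open() still fails — consistent with the later guess that the file was corrupted while being written:

```python
import os
import shutil
import tempfile

def diagnose(path):
    """Collect the facts checked manually with df and ls: existence,
    read/write permission for the current user, and free disk space."""
    probe = path if os.path.exists(path) else (os.path.dirname(path) or ".")
    return {
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
        "free_bytes": shutil.disk_usage(probe).free,
    }

# Demonstrate on a throwaway file rather than the Jenkins workspace path.
with tempfile.NamedTemporaryFile() as tmp:
    print(diagnose(tmp.name))
```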
Re: Jenkins problems with timehints.txt [was: Re: [JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 3197 - Still Failing]
I would nuke the workspace through the UI. Maybe the build was killed while writing the file.

Uwe

On 6 June 2015 at 22:09:52 MESZ, Steve Rowe sar...@gmail.com wrote:

> I’m trying to restart jobs on ASF Jenkins, and Subversion seems to be working fine, but the timehints.txt file is causing trouble that I don’t understand.
> [...]
> Uwe? Dawid?

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
[jira] [Commented] (LUCENE-6523) IW commit without commit user-data changes should also be reflected in NRT reopen
[ https://issues.apache.org/jira/browse/LUCENE-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575916#comment-14575916 ]

ASF subversion and git services commented on LUCENE-6523:
---------------------------------------------------------

Commit 1683947 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1683947 ]
LUCENE-6523: a new commit, even without user-data changes, is visible to reopened NRT reader

IW commit without commit user-data changes should also be reflected in NRT reopen
---------------------------------------------------------------------------------
                Key: LUCENE-6523
                URL: https://issues.apache.org/jira/browse/LUCENE-6523
            Project: Lucene - Core
         Issue Type: Bug
           Reporter: Michael McCandless
           Assignee: Michael McCandless
            Fix For: Trunk, 5.3
        Attachments: LUCENE-6523.patch

In LUCENE-6505 we fixed NRT readers to properly reflect changes from the last commit (new segments_N filename, new commit user-data), but I missed the case where a commit is done immediately after opening an NRT reader with no changes to the commit user-data.
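[Editor's note] The behaviour being fixed can be modeled with a toy index: a new commit must be visible to a reopened NRT reader even when the commit user-data did not change, so the reopen check has to compare the commit generation, not just the user-data. This is a conceptual sketch, not Lucene's actual implementation:

```python
class Index:
    """Toy stand-in for an index: tracks a commit generation and user-data."""
    def __init__(self):
        self.gen = 0          # analogous to the segments_N generation
        self.user_data = {}

    def commit(self, user_data=None):
        # Every commit bumps the generation, even when user_data is unchanged.
        self.gen += 1
        if user_data is not None:
            self.user_data = dict(user_data)

class NrtReader:
    """Toy NRT reader: snapshots the generation it was opened against."""
    def __init__(self, index):
        self.index = index
        self.gen = index.gen

    def open_if_changed(self):
        # The fix: compare the commit generation, so a commit with unchanged
        # user-data is still detected; comparing only user_data would miss it.
        if self.index.gen != self.gen:
            return NrtReader(self.index)
        return None

idx = Index()
reader = NrtReader(idx)
idx.commit()                              # commit with no user-data change
assert reader.open_if_changed() is not None  # new commit is still visible
```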
[jira] [Commented] (SOLR-7493) Requests aren't distributed evenly if the collection isn't present locally
[ https://issues.apache.org/jira/browse/SOLR-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575917#comment-14575917 ]

ASF subversion and git services commented on SOLR-7493:
-------------------------------------------------------

Commit 1683948 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1683948 ]
SOLR-7493: Initialize random correctly
[jira] [Commented] (SOLR-7493) Requests aren't distributed evenly if the collection isn't present locally
[ https://issues.apache.org/jira/browse/SOLR-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575919#comment-14575919 ]

ASF subversion and git services commented on SOLR-7493:
-------------------------------------------------------

Commit 1683950 from sha...@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1683950 ]
SOLR-7493: Requests aren't distributed evenly if the collection isn't present locally. Merges r1683946 and r1683948 from trunk.
[jira] [Resolved] (SOLR-7493) Requests aren't distributed evenly if the collection isn't present locally
[ https://issues.apache.org/jira/browse/SOLR-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-7493.
-----------------------------------------
       Resolution: Fixed
    Fix Version/s: 5.3
                   Trunk

Thanks Jeff!
[jira] [Commented] (LUCENE-6523) IW commit without commit user-data changes should also be reflected in NRT reopen
[ https://issues.apache.org/jira/browse/LUCENE-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575926#comment-14575926 ]

ASF subversion and git services commented on LUCENE-6523:
---------------------------------------------------------

Commit 1683954 from [~mikemccand] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1683954 ]
LUCENE-6523: a new commit, even without user-data changes, is visible to reopened NRT reader
[jira] [Resolved] (LUCENE-6523) IW commit without commit user-data changes should also be reflected in NRT reopen
[ https://issues.apache.org/jira/browse/LUCENE-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-6523.
----------------------------------------
    Resolution: Fixed
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.7.0) - Build # 2343 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2343/
Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED: org.apache.solr.client.solrj.TestLBHttpSolrClient.testReliability

Error Message:
No live SolrServers available to handle this request

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
        at __randomizedtesting.SeedInfo.seed([ADCD8757DD0F6D78:6C055A117C69BCD1]:0)
        at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:576)
        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
        at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)
        at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)
        at org.apache.solr.client.solrj.TestLBHttpSolrClient.testReliability(TestLBHttpSolrClient.java:219)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
        at
Re: Jenkins problems with timehints.txt [was: Re: [JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 3197 - Still Failing]
Thanks, I nuked it.

On Jun 6, 2015, at 4:12 PM, Uwe Schindler u...@thetaphi.de wrote:

> I would nuke workspace through UI. Maybe build was killed while writing file.
> [...]
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_60-ea-b12) - Build # 12957 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12957/
Java: 64bit/jdk1.8.0_60-ea-b12 -XX:+UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED: org.apache.solr.cloud.ShardSplitTest.test

Error Message:
Timeout occured while waiting response from server at: http://127.0.0.1:43944/_qpz/ic

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://127.0.0.1:43944/_qpz/ic
        at __randomizedtesting.SeedInfo.seed([5B5F3742B530787F:D30B08981BCC1587]:0)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:572)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:235)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:227)
        at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1220)
        at org.apache.solr.cloud.ShardSplitTest.splitShard(ShardSplitTest.java:490)
        at org.apache.solr.cloud.ShardSplitTest.incompleteOrOverlappingCustomRangeTest(ShardSplitTest.java:100)
        at org.apache.solr.cloud.ShardSplitTest.test(ShardSplitTest.java:76)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886)
        at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960)
        at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
        at
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shikhar Bhushan updated LUCENE-6482: Attachment: CodecLoadingDeadlockTest.java Class loading deadlock relating to NamedSPILoader - Key: LUCENE-6482 URL: https://issues.apache.org/jira/browse/LUCENE-6482 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.9.1 Reporter: Shikhar Bhushan Assignee: Uwe Schindler Attachments: CodecLoadingDeadlockTest.java This issue came up for us several times with Elasticsearch 1.3.4 (Lucene 4.9.1), with many threads seeming deadlocked but RUNNABLE: {noformat} elasticsearch[search77-es2][generic][T#43] #160 daemon prio=5 os_prio=0 tid=0x7f79180c5800 nid=0x3d1f in Object.wait() [0x7f79d9289000] java.lang.Thread.State: RUNNABLE at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359) at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:457) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:912) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:758) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:453) at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:98) at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:126) at org.elasticsearch.index.store.Store.access$300(Store.java:76) at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:465) at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:456) at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:281) at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:186) at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:140) at 
org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:61) at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:277) at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:268) at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} It didn't really make sense to see RUNNABLE threads in Object.wait(), but this seems to be symptomatic of deadlocks in static initialization (http://ternarysearch.blogspot.ru/2013/07/static-initialization-deadlock.html). I found LUCENE-5573 as an instance of this having come up with Lucene code before. 
I'm not sure what exactly is going on, but the deadlock in this case seems to involve these threads: {noformat} elasticsearch[search77-es2][clusterService#updateTask][T#1] #79 daemon prio=5 os_prio=0 tid=0x7f7b155ff800 nid=0xd49 in Object.wait() [0x7f79daed8000] java.lang.Thread.State: RUNNABLE at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at java.lang.Class.newInstance(Class.java:433) at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67) - locked 0x00061fef4968 (a org.apache.lucene.util.NamedSPILoader) at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:47) at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37) at org.apache.lucene.codecs.PostingsFormat.<clinit>(PostingsFormat.java:44) at org.elasticsearch.index.codec.postingsformat.PostingFormats.<clinit>(PostingFormats.java:67) at org.elasticsearch.index.codec.CodecModule.configurePostingsFormats(CodecModule.java:126) at org.elasticsearch.index.codec.CodecModule.configure(CodecModule.java:178) at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60) - locked 0x00061fef49e8 (a org.elasticsearch.index.codec.CodecModule) at
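The static-initialization deadlock pattern described in the blog post linked above can be sketched in plain Java (hypothetical classes `A` and `B`, not Lucene code): each class's static initializer forces initialization of the other. A single thread runs this to completion, because the JVM allows the initiating thread to re-enter a class it is already initializing; two threads that trigger `A` and `B` concurrently, however, can each hold one class-initialization lock while waiting forever on the other's, which is exactly the shape of the NamedSPILoader stacks above.

```java
// Hypothetical minimal sketch of the static-initialization deadlock pattern
// (illustration only, not Lucene/Elasticsearch code).
class A {
    static final String NAME;
    static {
        NAME = "A(sees " + B.id() + ")"; // forces B.<clinit>
    }
    static String id() { return "A"; }
}

class B {
    static final String NAME;
    static {
        // Re-enters A, whose <clinit> is in progress on this same thread:
        // the JVM permits recursive initialization by the initiating thread.
        NAME = "B(sees " + A.id() + ")";
    }
    static String id() { return "B"; }
}

public class ClinitDeadlockSketch {
    public static void main(String[] args) {
        // Single-threaded: completes fine.
        System.out.println(A.NAME); // A(sees B)
        System.out.println(B.NAME); // B(sees A)
        // Two threads, e.g. new Thread(() -> A.id()) and new Thread(() -> B.id()),
        // can instead deadlock: each thread holds its own class-init lock while
        // blocking on the other class's, and both show as RUNNABLE in a dump.
    }
}
```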
Re: [VOTE] 5.2.0 RC4
Thanks everyone! The vote for 5.2.0 RC4 has now passed. I'll start working on publishing and releasing this. On Thu, Jun 4, 2015 at 11:02 AM, Yonik Seeley ysee...@gmail.com wrote: +1 -Yonik On Tue, Jun 2, 2015 at 11:12 PM, Anshum Gupta ans...@anshumgupta.net wrote: Please vote for the fourth (and hopefully final) release candidate for Apache Lucene/Solr 5.2.0. The artifacts can be downloaded from: https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.2.0-RC4-rev1683206/ You can run the smoke tester directly with this command: python3 -u dev-tools/scripts/smokeTestRelease.py https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.2.0-RC4-rev1683206/ Here's my +1 SUCCESS! [0:32:56.564985] -- Anshum Gupta - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Anshum Gupta
Re: ASF Policeman Jenkins jobs disabled [was: Re: [JENKINS] Solr-Artifacts-5.2 - Build # 13 - Still Failing]
Thanks for creating this Steve. I'm blocked on the release process due to this. On Fri, Jun 5, 2015 at 7:02 PM, Steve Rowe sar...@gmail.com wrote: I took the Policeman Jenkins nodes back online, and jobs seem to be building fine there. However, after I re-enabled all the ASF Jenkins jobs, Subversion can’t update checkouts. For now I’m re-disabling all the ASF Jenkins jobs. Also, I can’t update or checkout locally. At Infra’s request on Hipchat, I created a JIRA: https://issues.apache.org/jira/browse/INFRA-9775 Steve On Jun 5, 2015, at 7:37 PM, Steve Rowe sar...@gmail.com wrote: ASF Infrastructure wrote in an email to committ...@apache.org that Subversion will be down for as much as 6 hours from 22:00 UTC. I temporarily took all three Policeman Jenkins nodes offline. I apparently don’t have such permission on ASF Jenkins, so I instead manually disabled all 19 jobs. I’ll re-enable things once Subversion is back up, if I’m still awake. Otherwise, anybody else should please feel free to do the same. Steve On Jun 5, 2015, at 6:56 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Solr-Artifacts-5.2/13/ No tests ran. Build Log: [...truncated 8 lines...] 
ERROR: Failed to update http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_2 org.tmatesoft.svn.core.SVNException: svn: E175002: OPTIONS /repos/asf/lucene/dev/branches/lucene_solr_5_2 failed at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:388) at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:373) at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:361) at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.performHttpRequest(DAVConnection.java:707) at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:627) at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:102) at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1020) at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getRepositoryUUID(DAVRepository.java:148) at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.createRepository(SVNBasicDelegate.java:339) at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.createRepository(SVNBasicDelegate.java:328) at org.tmatesoft.svn.core.internal.wc16.SVNUpdateClient16.update(SVNUpdateClient16.java:482) at org.tmatesoft.svn.core.internal.wc16.SVNUpdateClient16.doUpdate(SVNUpdateClient16.java:364) at org.tmatesoft.svn.core.internal.wc16.SVNUpdateClient16.doUpdate(SVNUpdateClient16.java:274) at org.tmatesoft.svn.core.internal.wc2.old.SvnOldUpdate.run(SvnOldUpdate.java:27) at org.tmatesoft.svn.core.internal.wc2.old.SvnOldUpdate.run(SvnOldUpdate.java:11) at org.tmatesoft.svn.core.internal.wc2.SvnOperationRunner.run(SvnOperationRunner.java:20) at org.tmatesoft.svn.core.wc2.SvnOperationFactory.run(SvnOperationFactory.java:1238) at org.tmatesoft.svn.core.wc2.SvnOperation.run(SvnOperation.java:294) at org.tmatesoft.svn.core.wc.SVNUpdateClient.doUpdate(SVNUpdateClient.java:311) at 
org.tmatesoft.svn.core.wc.SVNUpdateClient.doUpdate(SVNUpdateClient.java:291) at org.tmatesoft.svn.core.wc.SVNUpdateClient.doUpdate(SVNUpdateClient.java:387) at hudson.scm.subversion.UpdateUpdater$TaskImpl.perform(UpdateUpdater.java:157) at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:161) at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:1030) at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:1011) at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:987) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2474) at hudson.remoting.UserRequest.perform(UserRequest.java:118) at hudson.remoting.UserRequest.perform(UserRequest.java:48) at hudson.remoting.Request$2.run(Request.java:328) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: svn: E175002: OPTIONS /repos/asf/lucene/dev/branches/lucene_solr_5_2 failed at org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:208) at org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:154) at org.tmatesoft.svn.core.SVNErrorMessage.create(SVNErrorMessage.java:97) ... 35 more
RE: ASF Policeman Jenkins jobs disabled [was: Re: [JENKINS] Solr-Artifacts-5.2 - Build # 13 - Still Failing]
Policeman Jenkins mainly goes to the EU mirror of SVN; that may explain the difference. Many thanks @ Steve for temporarily disabling the builds. I am not sure what you did on Policeman Jenkins, but all is back up again. The easiest is generally to click on Prepare shutdown, then it runs no jobs anymore. Unfortunately you cannot do this on ASF Jenkins, maybe Infra should have done this. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Steve Rowe [mailto:sar...@gmail.com] Sent: Saturday, June 06, 2015 4:02 AM To: dev@lucene.apache.org Subject: Re: ASF Policeman Jenkins jobs disabled [was: Re: [JENKINS] Solr- Artifacts-5.2 - Build # 13 - Still Failing] I took the Policeman Jenkins nodes back online, and jobs seem to be building fine there. However, after I re-enabled all the ASF Jenkins jobs, Subversion can’t update checkouts. For now I’m re-disabling all the ASF Jenkins jobs. Also, I can’t update or checkout locally. At Infra’s request on Hipchat, I created a JIRA: https://issues.apache.org/jira/browse/INFRA-9775 Steve On Jun 5, 2015, at 7:37 PM, Steve Rowe sar...@gmail.com wrote: ASF Infrastructure wrote in an email to committ...@apache.org that Subversion will be down for as much as 6 hours from 22:00 UTC. I temporarily took all three Policeman Jenkins nodes offline. I apparently don’t have such permission on ASF Jenkins, so I instead manually disabled all 19 jobs. I’ll re-enable things once Subversion is back up, if I’m still awake. Otherwise, anybody else should please feel free to do the same. Steve On Jun 5, 2015, at 6:56 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Solr-Artifacts-5.2/13/ No tests ran. Build Log: [...truncated 8 lines...] 
ERROR: Failed to update http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_2 org.tmatesoft.svn.core.SVNException: svn: E175002: OPTIONS /repos/asf/lucene/dev/branches/lucene_solr_5_2 failed at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:388) at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:373) at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPConnection.java:361) at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.performHttpRequest(DAVConnection.java:707) at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities(DAVConnection.java:627) at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnection.java:102) at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAVRepository.java:1020) at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getRepositoryUUID(DAVRepository.java:148) at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.createRepository(SVNBasicDelegate.java:339) at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.createRepository(SVNBasicDelegate.java:328) at org.tmatesoft.svn.core.internal.wc16.SVNUpdateClient16.update(SVNUpdateClient16.java:482) at org.tmatesoft.svn.core.internal.wc16.SVNUpdateClient16.doUpdate(SVNUpdateClient16.java:364) at org.tmatesoft.svn.core.internal.wc16.SVNUpdateClient16.doUpdate(SVNUpdateClient16.java:274) at org.tmatesoft.svn.core.internal.wc2.old.SvnOldUpdate.run(SvnOldUpdate.java:27) at org.tmatesoft.svn.core.internal.wc2.old.SvnOldUpdate.run(SvnOldUpdate.java:11) at org.tmatesoft.svn.core.internal.wc2.SvnOperationRunner.run(SvnOperationRunner.java:20) at org.tmatesoft.svn.core.wc2.SvnOperationFactory.run(SvnOperationFactory.java:1238) at org.tmatesoft.svn.core.wc2.SvnOperation.run(SvnOperation.java:294) at org.tmatesoft.svn.core.wc.SVNUpdateClient.doUpdate(SVNUpdateClient.java:311) at 
org.tmatesoft.svn.core.wc.SVNUpdateClient.doUpdate(SVNUpdateClient.java:291) at org.tmatesoft.svn.core.wc.SVNUpdateClient.doUpdate(SVNUpdateClient.java:387) at hudson.scm.subversion.UpdateUpdater$TaskImpl.perform(UpdateUpdater.java:157) at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(WorkspaceUpdater.java:161) at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:1030) at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:1011) at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:987) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2474) at hudson.remoting.UserRequest.perform(UserRequest.java:118) at hudson.remoting.UserRequest.perform(UserRequest.java:48) at hudson.remoting.Request$2.run(Request.java:328) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at
[jira] [Commented] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575639#comment-14575639 ] Shikhar Bhushan commented on LUCENE-6482: - Thanks Uwe. Every single run hits the deadlock for me; just these lines do the trick every time: {noformat} public static void main(String... args) { final Thread t1 = new Thread(() -> Codec.getDefault()); final Thread t2 = new Thread(() -> new SimpleTextCodec()); t1.start(); t2.start(); } {noformat} I am using JDK8u25. Class loading deadlock relating to NamedSPILoader - Key: LUCENE-6482 URL: https://issues.apache.org/jira/browse/LUCENE-6482 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.9.1 Reporter: Shikhar Bhushan Assignee: Uwe Schindler Attachments: CodecLoadingDeadlockTest.java
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shikhar Bhushan updated LUCENE-6482: Attachment: (was: CodecLoadingDeadlockTest.java) Class loading deadlock relating to NamedSPILoader - Key: LUCENE-6482 URL: https://issues.apache.org/jira/browse/LUCENE-6482 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.9.1 Reporter: Shikhar Bhushan Assignee: Uwe Schindler Attachments: CodecLoadingDeadlockTest.java
Re: Lucene / Solr 5.2 release notes
Could we replace the CHANGES.txt links with the .html links instead? They are much better formatted. (I just realised I don't have edit access to the wiki, could someone add me? Name: Ramkumar Aiyengar, Alias: andyetitmoves) I’ve made drafts for the Lucene and Solr release notes - please feel free to edit or suggest edits: Lucene: https://wiki.apache.org/lucene-java/ReleaseNote52 Solr: http://wiki.apache.org/solr/ReleaseNote52 -- Anshum Gupta
[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_45) - Build # 4898 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4898/ Java: 32bit/jdk1.8.0_45 -server -XX:+UseConcMarkSweepGC 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.HttpPartitionTest Error Message: ObjectTracker found 1 object(s) that were not released!!! [TransactionLog] Stack Trace: java.lang.AssertionError: ObjectTracker found 1 object(s) that were not released!!! [TransactionLog] at __randomizedtesting.SeedInfo.seed([812DD08DF8CEA274]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertNull(Assert.java:551) at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:235) at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:799) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at java.lang.Thread.run(Thread.java:745) FAILED: junit.framework.TestSuite.org.apache.solr.cloud.HttpPartitionTest Error Message: Could not remove the following files (in the order of attempts): C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.HttpPartitionTest 812DD08DF8CEA274-001\shard-3-001\cores\c8n_1x2_leader_session_loss_shard1_replica2\data\tlog\tlog.000: java.nio.file.FileSystemException: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.HttpPartitionTest 812DD08DF8CEA274-001\shard-3-001\cores\c8n_1x2_leader_session_loss_shard1_replica2\data\tlog\tlog.000: The process cannot access the file because it is being used by another process. 
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.HttpPartitionTest 812DD08DF8CEA274-001\shard-3-001\cores\c8n_1x2_leader_session_loss_shard1_replica2\data\tlog: java.nio.file.DirectoryNotEmptyException: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.HttpPartitionTest 812DD08DF8CEA274-001\shard-3-001\cores\c8n_1x2_leader_session_loss_shard1_replica2\data\tlog C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.HttpPartitionTest 812DD08DF8CEA274-001\shard-3-001\cores\c8n_1x2_leader_session_loss_shard1_replica2\data: java.nio.file.DirectoryNotEmptyException: C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.HttpPartitionTest 812DD08DF8CEA274-001\shard-3-001\cores\c8n_1x2_leader_session_loss_shard1_replica2\data C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.HttpPartitionTest 812DD08DF8CEA274-001\shard-3-001\cores\c8n_1x2_leader_session_loss_shard1_replica2: java.nio.file.DirectoryNotEmptyException:
[jira] [Commented] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575654#comment-14575654 ] Uwe Schindler commented on LUCENE-6482: --- I can easily reproduce this! I will now try to synchronize the ServiceLoader scanning, so we make sure that the classpath scanning is done sequentially. Class loading deadlock relating to NamedSPILoader - Key: LUCENE-6482 URL: https://issues.apache.org/jira/browse/LUCENE-6482 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.9.1 Reporter: Shikhar Bhushan Assignee: Uwe Schindler Attachments: CodecLoadingDeadlockTest.java
[jira] [Comment Edited] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575656#comment-14575656 ] Uwe Schindler edited comment on LUCENE-6482 at 6/6/15 10:27 AM: It fails with every codec. The issue happens only if you call {{Codec.forName()}} at the same time as using the constructor of any Codec subclass. I have no idea how we should prevent that. I tried to synchronize NamedSPILoader, but that did not help. The problem is that this is a special type of deadlock that does not really involve standard Java locks: it is the JVM-internal rule that a subclass may not be initialized before its parent class is initialized.
[jira] [Comment Edited] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575660#comment-14575660 ] Uwe Schindler edited comment on LUCENE-6482 at 6/6/15 10:40 AM: From digging around, the main problem is basically the following: - The JVM requires that a parent class be initialized before the child class. - But there is a special case: a parent class is allowed to initialize its subclasses from its own static initializers (so a setup like ours works). This is documented in the JLS. - But another thread is not allowed to initialize those subclasses at the same time. This basically leads to the deadlock we are seeing here. We have a not yet fully initialized Codec parent class. The other thread is creating an instance of a subclass directly, but while initializing this subclass, it waits for the parent class to become available. The parent class, however, is currently scanning the classpath and creating instances of all available codecs. While doing this it tries to create an instance of exactly the same class that the other thread is instantiating directly using new(). And this is the deadlock.
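The hazardous pattern described above can be sketched in a few lines of plain Java. This is a minimal illustration with hypothetical names (`Base`, `Impl`, `Demo` are not Lucene classes): a parent class whose static initializer instantiates one of its own subclasses, exactly the shape that {{Codec}} and its SPI registry have.

```java
import java.util.HashMap;
import java.util.Map;

// Parent class whose <clinit> instantiates a subclass -- the JLS permits this
// for the thread that is performing the initialization itself.
abstract class Base {
    static final Map<String, Base> REGISTRY = new HashMap<>();
    static {
        // Analogous to Codec's classpath scan registering all codec instances.
        REGISTRY.put("impl", new Impl());
    }
}

final class Impl extends Base {
    // If a SECOND thread runs `new Impl()` while Base's <clinit> is still in
    // progress on the first thread, it blocks waiting for Base to finish
    // initializing -- while Base's <clinit> blocks trying to initialize Impl.
    // That is the deadlock described in the comment above.
}

public class Demo {
    public static void main(String[] args) {
        // Single-threaded access is safe; the deadlock needs two racing threads,
        // so this demo only shows the structure, not the hang.
        System.out.println(Base.REGISTRY.keySet());
    }
}
```

Note that neither thread holds an ordinary Java lock here; the blocking happens on the JVM's per-class initialization state, which is why thread dumps show the threads as RUNNABLE inside Object.wait().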
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-6482: -- Attachment: LUCENE-6482.patch Attached you will find a patch that solves the issue. I may only need to check the SmokeTester so that it can identify the new pattern. With this patch the deadlock is prevented, because a separate, hidden Helper class is used that holds two things: the ServiceLoader and the default Codec. Initialization is delayed until it is first accessed, so a deadlock can never happen. I had to remove the checks that prevented code from calling forName() from a subclass constructor (which now works, but may reintroduce the deadlock), so I'll find a way to detect this (using asserts on the stack trace or whatever). [~shikhar]: Can you apply the patch and try to check on your side?
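The "hidden Helper class" approach in the patch is the classic initialization-on-demand holder idiom. A minimal sketch, with illustrative names (`Format`, `PlainFormat`, `Holder` are stand-ins, not the actual patch code): the expensive SPI-style scan lives in a nested holder class, so the parent's own static initializer stays trivial and constructing a subclass never races with it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

abstract class Format {
    // Counts how many times the "classpath scan" ran (for demonstration only).
    static final AtomicInteger SCANS = new AtomicInteger();

    // The registry lives in a separate nested class: its <clinit> runs only on
    // first access via forName(), never as part of Format's own <clinit>.
    private static final class Holder {
        static final Map<String, Format> REGISTRY = scan();
    }

    private static Map<String, Format> scan() {
        SCANS.incrementAndGet();
        Map<String, Format> m = new HashMap<>();
        m.put("plain", new PlainFormat()); // safe: Format is fully initialized here
        return m;
    }

    static Format forName(String name) {
        return Holder.REGISTRY.get(name); // triggers Holder's lazy <clinit>
    }
}

final class PlainFormat extends Format {}

public class HolderDemo {
    public static void main(String[] args) {
        new PlainFormat();                       // subclass ctor: no scan triggered
        System.out.println(Format.SCANS.get());  // 0
        Format.forName("plain");                 // first lookup triggers the scan
        System.out.println(Format.SCANS.get());  // 1
    }
}
```

Because `Format`'s own static initializer does almost nothing, another thread calling `new PlainFormat()` concurrently with `forName()` blocks (at worst) on `Holder`'s initialization, not on a half-initialized parent class, which removes the deadlock window described earlier.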
[jira] [Commented] (LUCENE-6482) Class loading deadlock relating to Codec initialization, default codec and SPI discovery
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575706#comment-14575706 ] Uwe Schindler commented on LUCENE-6482: --- This also fixes the bug that was investigated during LUCENE-4440: a FilterCodec using forName() to create its delegate. This no longer leads to issues (see above). Class loading deadlock relating to Codec initialization, default codec and SPI discovery Key: LUCENE-6482 URL: https://issues.apache.org/jira/browse/LUCENE-6482 Project: Lucene - Core Issue Type: Bug Components: core/codecs Affects Versions: 4.9.1 Reporter: Shikhar Bhushan Assignee: Uwe Schindler Priority: Critical Fix For: Trunk, 5.3 Attachments: CodecLoadingDeadlockTest.java, LUCENE-6482.patch, LUCENE-6482.patch
[jira] [Commented] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575651#comment-14575651 ] Uwe Schindler commented on LUCENE-6482: --- Thanks! I will also look into SimpleTextCodec, because in all your stack traces, this codec was affected. Did you also try with other codecs?
[jira] [Commented] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575656#comment-14575656 ] Uwe Schindler commented on LUCENE-6482: --- It fails with every codec. The issue happens only if you call {{Codec.forName()}} at the same time as using the constructor of any Codec subclass. I have no idea how we should prevent that. I tried to synchronize NamedSPILoader, but that did not help. The problem is that this is a special type of deadlock that does not really involve standard Java locks: it is the JVM-internal rule that a subclass may not be initialized before its parent class is initialized.
[jira] [Commented] (LUCENE-6529) NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can results in incorrect SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575716#comment-14575716 ] Robert Muir commented on LUCENE-6529: - In the case of all 3 provided seeds we have: {noformat} [junit4] 2 NOTE: test params are: codec=Asserting(Lucene50): {foo=PostingsFormat(name=MockRandom)} {noformat} If I disable the ord-sharing optimization in DocTermOrds, all 3 seeds pass. So I think there is a bug in e.g. the FixedGap/BlockTerms dictionary or something like that. Maybe BasePostingsFormatTestCase does not adequately exercise methods like size()/ord()/seek(ord). It should be failing! NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can results in incorrect SortedSetDocValues - Key: LUCENE-6529 URL: https://issues.apache.org/jira/browse/LUCENE-6529 Project: Lucene - Core Issue Type: Bug Reporter: Hoss Man Attachments: LUCENE-6529.patch Digging into SOLR-7631 and SOLR-7605 I became fairly confident that the only explanation of the behavior I was seeing was some sort of bug in either the randomized codec/postings-format or the UninvertedReader, that was only evident when the two were combined and used on a multivalued numeric field using precision steps. But since I couldn't find any -Dtests.codec or -Dtests.postings.format options that would cause the bug 100% regardless of seed, I switched tactics and focused on reproducing the problem using UninvertedReader directly and checking the SortedSetDocValues.getValueCount().
I now have a test that fails frequently (and consistently for any seed I find), but only with -Dtests.codec=random -- override it with -Dtests.codec=default and everything works fine (based on the exhaustive testing I did in the linked issues, I suspect every named codec works fine - but I didn't redo that testing here). The failures only seem to happen when checking the SortedSetDocValues.getValueCount() of a SlowCompositeReaderWrapper around the UninvertedReader -- which suggests the root bug may actually be in SlowCompositeReaderWrapper? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6529) NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can result in incorrect SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575718#comment-14575718 ] Robert Muir commented on LUCENE-6529: - All BasePostingsFormatTestCase has is a measly check that ord() is the correct value when next()'ing through all the terms sequentially. It does not test seek(ord) and other possibilities. I will try to fix the test...
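The missing coverage described above can be sketched with a standalone analogue: random seek(ord) calls checked against the dictionary, not just ord() checks during sequential next(). ArrayTermsEnum below is a hypothetical stand-in for a term dictionary, not Lucene's TermsEnum; it only illustrates the shape of the stronger check.

```java
import java.util.Random;

// Hypothetical term dictionary supporting ord()/seekExact(ord), used to
// illustrate the coverage the comment says BasePostingsFormatTestCase lacks.
final class ArrayTermsEnum {
    private final String[] terms;   // sorted, unique
    private int ord = -1;
    ArrayTermsEnum(String[] terms) { this.terms = terms; }
    String next() { return ++ord < terms.length ? terms[ord] : null; }
    long ord() { return ord; }
    void seekExact(long targetOrd) { ord = (int) targetOrd; }
    String term() { return terms[ord]; }
}

public class SeekOrdCheck {
    public static void main(String[] args) {
        String[] dict = {"a", "b", "c", "d", "e"};
        // 1) The "measly" sequential check: ord() agrees while next()'ing.
        ArrayTermsEnum te = new ArrayTermsEnum(dict);
        for (int i = 0; te.next() != null; i++) {
            if (te.ord() != i) throw new AssertionError("ord mismatch at " + i);
        }
        // 2) The missing coverage: random seek(ord), then verify that
        //    term() and ord() round-trip against the known dictionary.
        Random r = new Random(42);
        for (int iter = 0; iter < 100; iter++) {
            long target = r.nextInt(dict.length);
            te.seekExact(target);
            if (te.ord() != target || !te.term().equals(dict[(int) target]))
                throw new AssertionError("seek(ord) broken at " + target);
        }
        System.out.println("all seek(ord) checks passed");
    }
}
```

A real test would run this against each postings format's TermsEnum, interleaving seeks and sequential iteration so optimizations like ord-sharing get exercised.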
[jira] [Commented] (LUCENE-6529) NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can result in incorrect SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575724#comment-14575724 ] ASF subversion and git services commented on LUCENE-6529: - Commit 1683913 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1683913 ] LUCENE-6529: add asserts
[jira] [Commented] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575681#comment-14575681 ] Uwe Schindler commented on LUCENE-6482: --- The only workaround I see is the following: - Move the ServiceLoader, forName, and classpath scanning out of {{clinit}} of Codec (same for PostingsFormat and DocValuesFormat) into separate pkg-private classes (or static inner classes). Let Codec.forName() delegate there. In addition, don't call Codec.forName() inside the clinit of Codec; instead, initialize the default codec via {{new}}. I will try to implement this; unfortunately the code will no longer be as nice as it is now. I dug around: IBM's ICU has similar hacks to prevent this type of problem while loading Locales or Charsets. Class loading deadlock relating to NamedSPILoader - Key: LUCENE-6482 URL: https://issues.apache.org/jira/browse/LUCENE-6482 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.9.1 Reporter: Shikhar Bhushan Assignee: Uwe Schindler Attachments: CodecLoadingDeadlockTest.java This issue came up for us several times with Elasticsearch 1.3.4 (Lucene 4.9.1), with many threads seeming deadlocked but RUNNABLE: {noformat} elasticsearch[search77-es2][generic][T#43] #160 daemon prio=5 os_prio=0 tid=0x7f79180c5800 nid=0x3d1f in Object.wait() [0x7f79d9289000] java.lang.Thread.State: RUNNABLE at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359) at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:457) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:912) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:758) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:453) at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:98) at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:126) at org.elasticsearch.index.store.Store.access$300(Store.java:76) at 
org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:465) at org.elasticsearch.index.store.Store$MetadataSnapshot.init(Store.java:456) at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:281) at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:186) at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:140) at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:61) at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:277) at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:268) at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} It didn't really make sense to see RUNNABLE threads in Object.wait(), but this seems to be symptomatic of deadlocks in static initialization (http://ternarysearch.blogspot.ru/2013/07/static-initialization-deadlock.html). I found LUCENE-5573 as an instance of this having come up with Lucene code before. 
I'm not sure what exactly is going on, but the deadlock in this case seems to involve these threads: {noformat} elasticsearch[search77-es2][clusterService#updateTask][T#1] #79 daemon prio=5 os_prio=0 tid=0x7f7b155ff800 nid=0xd49 in Object.wait() [0x7f79daed8000] java.lang.Thread.State: RUNNABLE at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at java.lang.Class.newInstance(Class.java:433) at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67) - locked 0x00061fef4968 (a org.apache.lucene.util.NamedSPILoader) at org.apache.lucene.util.NamedSPILoader.init(NamedSPILoader.java:47) at org.apache.lucene.util.NamedSPILoader.init(NamedSPILoader.java:37) at
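The workaround proposed in the comment above (moving SPI discovery out of Codec's {{clinit}} into a separate pkg-private class, and creating the default codec via {{new}}) matches the initialization-on-demand holder idiom. A minimal standalone sketch follows; the class names and the string-keyed map are hypothetical simplifications, not the actual Lucene patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the initialization-on-demand holder idiom: the expensive
// registry only initializes on the first forName() call, so merely
// loading the Codec class no longer triggers SPI-style scanning inside
// Codec's own static initialization.
abstract class Codec {
    private final String name;
    protected Codec(String name) { this.name = name; }
    public String getName() { return name; }

    // Holder's <clinit> runs on first forName() call, not when Codec
    // itself is initialized.
    private static final class Holder {
        static final Map<String, Codec> LOADER = new ConcurrentHashMap<>();
        static {
            // Real code would run ServiceLoader/classpath scanning here.
            LOADER.put("Default", new Codec("Default") {});
        }
    }

    public static Codec forName(String name) {
        Codec c = Holder.LOADER.get(name);
        if (c == null) throw new IllegalArgumentException("Unknown codec: " + name);
        return c;
    }

    // Default codec created via `new`, never via forName(), so Codec's
    // own <clinit> does not depend on the holder.
    private static final Codec DEFAULT = new Codec("Default") {};
    public static Codec getDefault() { return DEFAULT; }
}

public class HolderDemo {
    public static void main(String[] args) {
        System.out.println(Codec.getDefault().getName()); // no scanning yet
        System.out.println(Codec.forName("Default").getName()); // holder inits here
    }
}
```

With this shape, touching getDefault() never runs the registry's static block, which is exactly what breaks the initialization cycle the comment describes.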
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-6482: -- Component/s: core/codecs
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-6482: -- Fix Version/s: 5.3 Trunk
[jira] [Comment Edited] (LUCENE-6482) Class loading deadlock relating to Codec initialization, default codec and SPI discovery
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575706#comment-14575706 ] Uwe Schindler edited comment on LUCENE-6482 at 6/6/15 12:26 PM: -This also fixes the bug that was investigated during LUCENE-4440: A FilterCodec using forName() to create the delegate. This is no longer leading to issues (see above).- Ignore this, the issue is still there on FilterCodec. But we have the warning in the Javadocs of FilterCodec already. I will improve the error message again... was (Author: thetaphi): This also fixes the bug that was investigated during LUCENE-4440: A FilterCodec using forName() to create the delegate. This is no longer leading to issues (see above).
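The FilterCodec caveat above is an instance of the general JLS rule (§12.4.2): a thread that touches a class currently being initialized by another thread blocks until that initialization completes, which is also why the stack traces in this issue show threads RUNNABLE yet stuck. A small self-contained demo (class names hypothetical, no Lucene dependency) makes the blocking deterministic:

```java
// Demonstrates that a second thread touching a class whose <clinit> is
// still running on the first thread blocks until initialization finishes.
class ClinitBlockDemo {
    static volatile boolean otherThreadBlockedDuringInit;
    static int value;
    static {
        // This thread tries to read a static field of ClinitBlockDemo,
        // so it must wait for this <clinit> to complete (JLS 12.4.2).
        Thread t = new Thread(() -> { int unused = ClinitBlockDemo.value; });
        t.start();
        try { t.join(500); } catch (InterruptedException ignored) {}
        // Still alive after the timed join: it was blocked on our init.
        otherThreadBlockedDuringInit = t.isAlive();
        value = 42;
    }
}

public class ClinitBlockMain {
    public static void main(String[] args) {
        int v = ClinitBlockDemo.value; // triggers <clinit> on the main thread
        System.out.println("blocked during <clinit>: "
                + ClinitBlockDemo.otherThreadBlockedDuringInit);
    }
}
```

Turn the one-way wait into a cycle (two classes whose initializers each touch the other from different threads, as with Codec.forName() called from inside a codec's own static initialization) and both threads block forever while the JVM still reports them RUNNABLE.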
[jira] [Commented] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575652#comment-14575652 ] Uwe Schindler commented on LUCENE-6482: --- SimpleTextCodec alone cannot be the bad guy... I'm still digging.
[jira] [Commented] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575696#comment-14575696 ] Uwe Schindler commented on LUCENE-6482: --- I checked SPI on the TokenStream factories: there is no such issue: - we don't initialize the classes - we don't initialize instances until requested - we have no default tokenstreams or similar
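The three properties listed for the TokenStream factories (no class initialization, no instances until requested, no defaults) amount to a fully lazy registry. A hypothetical standalone sketch of that pattern, not the actual NamedSPILoader code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Lazy registry: names map to suppliers, and no factory instance is
// created until someone actually asks for it by name. There is also no
// "default" entry whose creation could recurse into the registry.
final class LazyFactoryRegistry<T> {
    private final Map<String, Supplier<? extends T>> suppliers = new ConcurrentHashMap<>();
    private final Map<String, T> instances = new ConcurrentHashMap<>();

    void register(String name, Supplier<? extends T> supplier) {
        suppliers.put(name, supplier);
    }

    T lookup(String name) {
        return instances.computeIfAbsent(name, n -> {
            Supplier<? extends T> s = suppliers.get(n);
            if (s == null) throw new IllegalArgumentException("Unknown: " + n);
            return s.get();   // instance created only on first lookup
        });
    }
}

public class LazyRegistryDemo {
    static int created = 0;
    public static void main(String[] args) {
        LazyFactoryRegistry<String> reg = new LazyFactoryRegistry<>();
        reg.register("lowercase", () -> { created++; return "LowerCaseFilterFactory"; });
        System.out.println("created after register: " + created);  // still 0
        reg.lookup("lowercase");
        reg.lookup("lowercase");
        System.out.println("created after lookups: " + created);   // 1 (cached)
    }
}
```

Because nothing runs at registration time, this shape cannot participate in a static-initialization cycle the way the Codec/PostingsFormat clinit scanning can.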
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-6482: -- Priority: Critical (was: Major)
[jira] [Comment Edited] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575698#comment-14575698 ] Uwe Schindler edited comment on LUCENE-6482 at 6/6/15 11:50 AM: New patch. I renamed the inner class to Holder and added Javadocs. The checks I removed are no longer an issue. This patch also prevents the deadlock that might happen if you call Codec.forName() from the constructor of a subclass. I would still not do this, but there is no more reason to check this - it cannot deadlock or NPE. Unfortunately this is too late to get into 5.2, so I delay to 5.3 (or maybe 5.2.1). We should maybe put this into a 4.10.x release for those people that are affected by this and are still on 4.x. I will raise this issue's priority to Critical. Elasticsearch should maybe for now use the workaround of calling Codec.availableCodecs() in its Bootstrap class before init. was (Author: thetaphi): New patch. I renamed the inner class to Holder and added Javadocs. The checks I removed are no longer an issue. This patch also prevents the deadlock that might happen if you call Codec.forName() from the constructor of a subclass. I would still not do this, but there is no more reason to check this - it cannot deadlock or NPE. Unfortunately this is too late to get into 5.2, so I delay to 5.2. We should maybe put this into a 4.10.x release for those people that are affected by this and are still on 4.x. I will raise this issue's priority to Critical. Elasticsearch should maybe for now use the workaround of calling Codec.availableCodecs() in its Bootstrap class before init. 
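The eager-initialization workaround Uwe suggests has a simple generic shape. The sketch below uses a hypothetical `Registry` class as a stand-in for Lucene's real `Codec` (whose static initializer performs the SPI classpath scan); with Lucene, the equivalent call would be `Codec.availableCodecs()` from a single-threaded bootstrap path before any index I/O starts.

```java
import java.util.HashSet;
import java.util.Set;

public class Main {
    // Hypothetical stand-in for a class whose static initializer does
    // expensive work (like Codec's SPI classpath scan and instance creation).
    static class Registry {
        static final Set<String> NAMES = new HashSet<>();
        static {
            NAMES.add("default");
        }
    }

    public static void main(String[] args) throws Exception {
        // The shape of the workaround: force the registry's static
        // initializer to run on the startup thread, before any other
        // thread can race it by instantiating subclasses directly.
        Class.forName(Registry.class.getName(), true,
                      Main.class.getClassLoader());
        System.out.println("codecs known: " + Registry.NAMES.size()); // prints "codecs known: 1"
    }
}
```

Once initialization has completed on one thread, later concurrent class use can no longer interleave with the initializer, which is what makes the bootstrap call a safe (if inelegant) mitigation.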
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-6482: -- Attachment: LUCENE-6482.patch New patch. I renamed the inner class to Holder and added Javadocs. The checks I removed are no longer an issue. This patch also prevents the deadlock that might happen if you call Codec.forName() from the constructor of a subclass. I would still not do this, but there is no more reason to check this - it cannot deadlock or NPE. Unfortunately this is too late to get into 5.2, so I delay to 5.3. We should maybe put this into a 4.10.x release for those people that are affected by this and are still on 4.x. I will raise this issue's priority to Critical. Elasticsearch should maybe for now use the workaround of calling Codec.availableCodecs() in its Bootstrap class before init. 
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to Codec initialization, default codec and SPI discovery
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-6482: -- Summary: Class loading deadlock relating to Codec initialization, default codec and SPI discovery (was: Class loading deadlock relating to NamedSPILoader) 
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to Codec initialization, default codec and SPI discovery
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-6482: -- Attachment: LUCENE-6482.patch Revised patch which brings back the IllegalStateExceptions for people misusing FilterCodec. I think it's ready. 
[jira] [Commented] (LUCENE-6529) NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can results in incorrect SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575713#comment-14575713 ] Robert Muir commented on LUCENE-6529: - I saw this; I didn't have a chance to look at it until now. Thanks for narrowing it down to this test! When fields actually have multiple values (which is the situation you test), DocTermsOrds is used, and, in case the codec supports the optional ord() and seek(ord), it will use them. So maybe there is a bug in one of the term dictionaries there, which may be why it's only provoked with random codecs. I will play with the test and try to narrow it further. NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can results in incorrect SortedSetDocValues - Key: LUCENE-6529 URL: https://issues.apache.org/jira/browse/LUCENE-6529 Project: Lucene - Core Issue Type: Bug Reporter: Hoss Man Attachments: LUCENE-6529.patch Digging into SOLR-7631 and SOLR-7605 I became fairly confident that the only explanation for the behavior I was seeing was some sort of bug in either the randomized codec/postings-format or the UninvertedReader, that was only evident when the two were combined and used on a multivalued Numeric Field using precision steps. But since I couldn't find any -Dtests.codec or -Dtests.postings.format options that would cause the bug 100% regardless of seed, I switched tactics and focused on reproducing the problem using UninvertedReader directly and checking the SortedSetDocValues.getValueCount(). 
I now have a test that fails frequently (and consistently for any seed I find), but only with -Dtests.codec=random -- override it with -Dtests.codec=default and everything works fine (based on the exhaustive testing I did in the linked issues, I suspect every named codec works fine - but I didn't re-do that testing here). The failures only seem to happen when checking the SortedSetDocValues.getValueCount() of a SlowCompositeReaderWrapper around the UninvertedReader -- which suggests the root bug may actually be in SlowCompositeReaderWrapper? (but it still has some dependency on the random codec) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575660#comment-14575660 ] Uwe Schindler commented on LUCENE-6482: --- From digging around, the main problem is basically the following:
- The JVM requires that a parent class must be initialized before the child class.
- But there is a special case: a parent class is allowed to initialize subclasses from its static initializers (so setups like ours work). This is documented in the JLS.
- But another thread is not allowed to initialize those subclasses at the same time.

This basically leads to the deadlock we are seeing here. We have a not-yet-fully-initialized Codec parent class. The other thread is creating an instance of a subclass directly, but while initializing this subclass, it waits for the parent class to become available. The parent class, however, is currently scanning the classpath and creating instances of all available codecs; while doing this it tries to create the very class the other thread is instantiating directly using new(). And this is the deadlock. 
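The rules above can be illustrated with a minimal, single-threaded sketch. `BaseFormat` and `ChildFormat` are hypothetical stand-ins for `Codec` and a codec subclass, not Lucene classes; the comments describe the second thread needed to actually trigger the deadlock (the sketch itself runs to completion).

```java
public class Main {
    // Stand-in for Codec: its static initializer instantiates a subclass,
    // which the JLS explicitly permits for the thread running <clinit>.
    static class BaseFormat {
        static final BaseFormat DEFAULT;
        static {
            DEFAULT = new ChildFormat();
        }
    }

    static class ChildFormat extends BaseFormat { }

    public static void main(String[] args) {
        // Single-threaded, this completes fine. The LUCENE-6482 deadlock
        // needs a second thread calling `new ChildFormat()` while
        // BaseFormat.<clinit> is still running: that thread blocks on
        // BaseFormat's initialization, while the <clinit> thread blocks
        // on ChildFormat's -- and both show up as RUNNABLE in
        // Object.wait() in a thread dump, exactly as in the traces above.
        System.out.println(BaseFormat.DEFAULT.getClass().getSimpleName()); // prints "ChildFormat"
    }
}
```

The "RUNNABLE in Object.wait()" oddity in the stack traces is the JVM's class-initialization lock at work: threads blocked on a class being initialized do not report a normal WAITING state.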
[jira] [Commented] (LUCENE-6482) Class loading deadlock relating to Codec initialization, default codec and SPI discovery
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575701#comment-14575701 ] Uwe Schindler commented on LUCENE-6482: --- I checked the smoke tester. It does not validate the Codec.forName() in Codec.java, so no changes are needed. I think the patch is ready. Many thanks to [~shikhar] for reporting this - it is really a nasty bug :( 
[jira] [Commented] (SOLR-7555) Display total space and available space in Admin
[ https://issues.apache.org/jira/browse/SOLR-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575717#comment-14575717 ] Eric Pugh commented on SOLR-7555: - So this is a great example of why having an awesome unit-testing framework helps us! I didn't know about the increment/decrement of directory objects; the unit test plus your help showed me. So now it appears to be passing. I'm updating the patch. Display total space and available space in Admin Key: SOLR-7555 URL: https://issues.apache.org/jira/browse/SOLR-7555 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.1 Reporter: Eric Pugh Assignee: Erik Hatcher Priority: Minor Fix For: 5.3 Attachments: DiskSpaceAwareDirectory.java, SOLR-7555-display_disk_space.patch, SOLR-7555-display_disk_space_v2.patch, SOLR-7555-display_disk_space_v3.patch, SOLR-7555-display_disk_space_v4.patch, SOLR-7555.patch, SOLR-7555.patch, SOLR-7555.patch Frequently I have access to the Solr Admin console, but not the underlying server, and I'm curious how much space remains available. This little patch exposes the total volume size as well as the usable space remaining: !https://monosnap.com/file/VqlReekCFwpK6utI3lP18fbPqrGI4b.png! I'm not sure if this is the best place to put this, as every shard will share the same data, so maybe it should be on the top-level Dashboard? Also not sure what to call the fields! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene / Solr 5.2 release notes
I added you to both the Lucene and Solr admin groups. I'm pretty sure that automatically gives you edit privileges; let me know if I'm wrong there. That should allow you to add users to the contributors' groups too. Erick On Sat, Jun 6, 2015 at 1:53 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: Could we replace the CHANGES.txt links with the .html links instead? They are much better formatted. (I just realised I don't have edit access to the wiki, could someone add me? Name: Ramkumar Aiyengar, Alias: andyetitmoves) I've made drafts for the Lucene and Solr release notes - please feel free to edit or suggest edits: Lucene: https://wiki.apache.org/lucene-java/ReleaseNote52 Solr: http://wiki.apache.org/solr/ReleaseNote52 -- Anshum Gupta
Re: ASF Policeman Jenkins jobs disabled [was: Re: [JENKINS] Solr-Artifacts-5.2 - Build # 13 - Still Failing]
For no apparent reason this just started working for me. Honest, I didn't change nuttin'. On Sat, Jun 6, 2015 at 1:51 AM, Uwe Schindler u...@thetaphi.de wrote: Policeman Jenkins mainly goes to the EU mirror of SVN; that may explain the difference. Many thanks @ Steve for temporarily disabling the builds. I am not sure what you did on Policeman Jenkins, but all is back up again. The easiest is generally to click on Prepare shutdown, then it runs no jobs anymore. Unfortunately you cannot do this on ASF Jenkins, maybe Infra should have done this. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Steve Rowe [mailto:sar...@gmail.com] Sent: Saturday, June 06, 2015 4:02 AM To: dev@lucene.apache.org Subject: Re: ASF Policeman Jenkins jobs disabled [was: Re: [JENKINS] Solr- Artifacts-5.2 - Build # 13 - Still Failing] I took the Policeman Jenkins nodes back online, and jobs seem to be building fine there. However, after I re-enabled all the ASF Jenkins jobs, Subversion can’t update checkouts. For now I’m re-disabling all the ASF Jenkins jobs. Also, I can’t update or checkout locally. At Infra’s request on Hipchat, I created a JIRA: https://issues.apache.org/jira/browse/INFRA-9775 Steve On Jun 5, 2015, at 7:37 PM, Steve Rowe sar...@gmail.com wrote: ASF Infrastructure wrote in an email to committ...@apache.org that Subversion will be down for as much as 6 hours from 22:00 UTC. I temporarily took all three Policeman Jenkins nodes offline. I apparently don’t have such permission on ASF Jenkins, so I instead manually disabled all 19 jobs. I’ll re-enable things once Subversion is back up, if I’m still awake. Otherwise, anybody else should please feel free to do the same. Steve On Jun 5, 2015, at 6:56 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Solr-Artifacts-5.2/13/ No tests ran. Build Log: [...truncated 8 lines...] 
ERROR: Failed to update http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_2 org.tmatesoft.svn.core.SVNException: svn: E175002: OPTIONS /repos/asf/lucene/dev/branches/lucene_solr_5_2 failed at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPC onnection.java:388) at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPC onnection.java:373) at org.tmatesoft.svn.core.internal.io.dav.http.HTTPConnection.request(HTTPC onnection.java:361) at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.performHttpRequest (DAVConnection.java:707) at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.exchangeCapabilities (DAVConnection.java:627) at org.tmatesoft.svn.core.internal.io.dav.DAVConnection.open(DAVConnectio n.java:102) at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.openConnection(DAV Repository.java:1020) at org.tmatesoft.svn.core.internal.io.dav.DAVRepository.getRepositoryUUID(D AVRepository.java:148) at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.createRepository(S VNBasicDelegate.java:339) at org.tmatesoft.svn.core.internal.wc16.SVNBasicDelegate.createRepository(S VNBasicDelegate.java:328) at org.tmatesoft.svn.core.internal.wc16.SVNUpdateClient16.update(SVNUpdat eClient16.java:482) at org.tmatesoft.svn.core.internal.wc16.SVNUpdateClient16.doUpdate(SVNUp dateClient16.java:364) at org.tmatesoft.svn.core.internal.wc16.SVNUpdateClient16.doUpdate(SVNUp dateClient16.java:274) at org.tmatesoft.svn.core.internal.wc2.old.SvnOldUpdate.run(SvnOldUpdate.ja va:27) at org.tmatesoft.svn.core.internal.wc2.old.SvnOldUpdate.run(SvnOldUpdate.ja va:11) at org.tmatesoft.svn.core.internal.wc2.SvnOperationRunner.run(SvnOperation Runner.java:20) at org.tmatesoft.svn.core.wc2.SvnOperationFactory.run(SvnOperationFactory.j ava:1238) at org.tmatesoft.svn.core.wc2.SvnOperation.run(SvnOperation.java:294) at org.tmatesoft.svn.core.wc.SVNUpdateClient.doUpdate(SVNUpdateClient.ja va:311) at 
org.tmatesoft.svn.core.wc.SVNUpdateClient.doUpdate(SVNUpdateClient.ja va:291) at org.tmatesoft.svn.core.wc.SVNUpdateClient.doUpdate(SVNUpdateClient.ja va:387) at hudson.scm.subversion.UpdateUpdater$TaskImpl.perform(UpdateUpdater. java:157) at hudson.scm.subversion.WorkspaceUpdater$UpdateTask.delegateTo(Works paceUpdater.java:161) at hudson.scm.SubversionSCM$CheckOutTask.perform(SubversionSCM.java:1 030) at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:101 1) at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:987 ) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2474) at hudson.remoting.UserRequest.perform(UserRequest.java:118) at hudson.remoting.UserRequest.perform(UserRequest.java:48) at
Re: ASF Policeman Jenkins jobs disabled [was: Re: [JENKINS] Solr-Artifacts-5.2 - Build # 13 - Still Failing]
Ahh, working in this case means I can do an SVN checkout. On Sat, Jun 6, 2015 at 9:45 AM, Erick Erickson erickerick...@gmail.com wrote: For no apparent reason this just started working for me. Honest, I didn't change nuttin'. [...quoted thread truncated...]
[jira] [Commented] (LUCENE-6529) NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can results in incorrect SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575756#comment-14575756 ] Robert Muir commented on LUCENE-6529: - I have to run for now, but one thing to investigate too: is the problem the extra terms introduced by precision step? Maybe crank precisionStep down and see if expected/actual change. Maybe the current optimization is unsafe in that case and yields a bogus valueCount including the range terms, which screws up things down the road. NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can results in incorrect SortedSetDocValues - Key: LUCENE-6529 URL: https://issues.apache.org/jira/browse/LUCENE-6529 Project: Lucene - Core Issue Type: Bug Reporter: Hoss Man Attachments: LUCENE-6529.patch Digging into SOLR-7631 and SOLR-7605 I became fairly confident that the only explanation of the behavior I was seeing was some sort of bug in either the randomized codec/postings-format or the UninvertedReader, one that was only evident when the two were combined and used on a multivalued numeric field using precision steps. But since I couldn't find any -Dtests.codec or -Dtests.postings.format options that would cause the bug 100% regardless of seed, I switched tactics and focused on reproducing the problem using UninvertedReader directly and checking SortedSetDocValues.getValueCount(). I now have a test that fails frequently (and consistently for any seed I find), but only with -Dtests.codec=random -- override it with -Dtests.codec=default and everything works fine (based on the exhaustive testing I did in the linked issues, I suspect every named codec works fine - but I didn't re-do that testing here). The failures only seem to happen when checking the SortedSetDocValues.getValueCount() of a SlowCompositeReaderWrapper around the UninvertedReader -- which suggests the root bug may actually be in SlowCompositeReaderWrapper? (but still has some dependency on the random codec)
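As a rough illustration of the "extra terms introduced by precision step" point above: trie-encoded numeric fields index one term per precision-step shift level, so a smaller precisionStep multiplies the number of terms per value. The sketch below is a back-of-the-envelope model, not Lucene's actual NumericUtils encoding; the point is only that a valueCount derived from raw terms would be inflated by these range-acceleration terms.

```java
// Back-of-the-envelope model (not Lucene's actual NumericUtils trie encoding):
// a 64-bit value indexed with precisionStep P produces one term per shift
// level 0, P, 2P, ... below 64, i.e. ceil(64 / P) terms per value. A smaller
// precisionStep therefore multiplies the term count, and any valueCount
// computed from raw terms would include these range-acceleration terms.
public class PrecisionStepTerms {

    static int termsPerValue(int bitsPerValue, int precisionStep) {
        // ceil(bitsPerValue / precisionStep)
        return (bitsPerValue + precisionStep - 1) / precisionStep;
    }

    public static void main(String[] args) {
        System.out.println("precisionStep=4  -> " + termsPerValue(64, 4));  // 16 terms
        System.out.println("precisionStep=8  -> " + termsPerValue(64, 8));  // 8 terms
        System.out.println("precisionStep=64 -> " + termsPerValue(64, 64)); // 1 term (the value itself)
    }
}
```

This is consistent with why cranking precisionStep down makes the expected/actual mismatch easier to observe: more extra terms per value, more opportunity for a bogus valueCount.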
[jira] [Commented] (SOLR-7642) Should launching Solr in cloud mode using a ZooKeeper chroot create the chroot znode if it doesn't exist?
[ https://issues.apache.org/jira/browse/SOLR-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575810#comment-14575810 ] Mark Miller commented on SOLR-7642: --- I think we have discussed this before. Perhaps in an issue by [~tomasflobbe], if I remember right. I think we ended up only doing it when bootstrapping or something. What I don't like about it is that things like typos in connection strings just create new zk nodes and start a fresh zk state tree. It kind of seems like if you want a node to be a chroot, you should explicitly create it. If you are using a chroot you probably have other processes using ZK, and it's probably best if you manage the root nodes of that directly rather than automagically. Should launching Solr in cloud mode using a ZooKeeper chroot create the chroot znode if it doesn't exist? - Key: SOLR-7642 URL: https://issues.apache.org/jira/browse/SOLR-7642 Project: Solr Issue Type: Improvement Reporter: Timothy Potter Priority: Minor Launching Solr for the first time in cloud mode using a ZooKeeper connection string that includes a chroot leads to the following initialization error: {code} ERROR - 2015-06-05 17:15:50.410; [ ] org.apache.solr.common.SolrException; null:org.apache.solr.common.cloud.ZooKeeperException: A chroot was specified in ZkHost but the znode doesn't exist.
localhost:2181/lan at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:113) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:339) at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:140) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:110) at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:138) at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:852) at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:298) at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1349) at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1342) at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741) at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:505) {code} The work-around for this is to use the scripts/cloud-scripts/zkcli.sh script to create the chroot znode (the bootstrap action does this). I'm wondering if we shouldn't just create the znode if it doesn't exist? Or is that some violation of using a chroot?
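For reference, the work-around described above can be sketched as follows. The /lan chroot and localhost host are taken from the error message and are examples only; zkcli.sh's makepath command creates the chroot znode explicitly before Solr starts against the chrooted connection string.

```shell
# Sketch of the zkcli.sh work-around (example paths, adjust for your setup):
# create the chroot znode up front, then start Solr in cloud mode pointing
# at the chrooted ZooKeeper connection string.
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd makepath /lan
bin/solr start -c -z localhost:2181/lan
```

This keeps the decision to create a chroot explicit, which is the behavior Mark argues for above.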
[jira] [Updated] (SOLR-7555) Display total space and available space in Admin
[ https://issues.apache.org/jira/browse/SOLR-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-7555: Attachment: SOLR-7555-display_disk_space_v5.patch This version of the patch incorporates the improvements to the test case that Erik provided, and properly releases the Directory object after it is used, so it passes the tests. Display total space and available space in Admin Key: SOLR-7555 URL: https://issues.apache.org/jira/browse/SOLR-7555 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 5.1 Reporter: Eric Pugh Assignee: Erik Hatcher Priority: Minor Fix For: 5.3 Attachments: DiskSpaceAwareDirectory.java, SOLR-7555-display_disk_space.patch, SOLR-7555-display_disk_space_v2.patch, SOLR-7555-display_disk_space_v3.patch, SOLR-7555-display_disk_space_v4.patch, SOLR-7555-display_disk_space_v5.patch, SOLR-7555.patch, SOLR-7555.patch, SOLR-7555.patch [...issue description truncated...]
[jira] [Commented] (LUCENE-6529) NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can results in incorrect SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575806#comment-14575806 ] Hoss Man commented on LUCENE-6529: -- thanks for looking into this rmuir. i haven't tried out your patch yet, but in response to your questions... bq. is the problem the extra terms introduced by precision step? ... almost certainly. The test from SOLR-7631 that inspired this one never fails unless a precisionStep is used, but we definitely can/should beef this test up to demonstrate that as well. NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can results in incorrect SortedSetDocValues - Key: LUCENE-6529 URL: https://issues.apache.org/jira/browse/LUCENE-6529 Project: Lucene - Core Issue Type: Bug Reporter: Hoss Man Attachments: LUCENE-6529.patch [...issue description truncated...]
[JENKINS] Lucene-Solr-5.x-Linux (64bit/jdk1.7.0_80) - Build # 12789 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/12789/ Java: 64bit/jdk1.7.0_80 -XX:-UseCompressedOops -XX:+UseG1GC 1 tests failed. FAILED: org.apache.lucene.spatial.spatial4j.Geo3dRptTest.testOperations {#5 seed=[ADFCC7193C72FA89:9BDCDB8859624E4]} Error Message: [Intersects] qIdx:34 Shouldn't match I#1:Rect(minX=131.0,maxX=143.0,minY=39.0,maxY=54.0) Q:Geo3dShape{planetmodel=PlanetModel.SPHERE, shape=GeoPath: {planetmodel=PlanetModel.SPHERE, width=0.5061454830783556(29.0), points={[[X=0.5155270860898133, Y=-0.25143936017440033, Z=0.8191520442889918], [X=-6.047846824324981E-17, Y=9.57884834439237E-18, Z=-1.0], [X=-0.5677569555011356, Y=0.1521300177236823, Z=0.8090169943749475], [X=5.716531405282095E-17, Y=2.1943708116382607E-17, Z=-1.0]]}}} Stack Trace: java.lang.AssertionError: [Intersects] qIdx:34 Shouldn't match I#1:Rect(minX=131.0,maxX=143.0,minY=39.0,maxY=54.0) Q:Geo3dShape{planetmodel=PlanetModel.SPHERE, shape=GeoPath: {planetmodel=PlanetModel.SPHERE, width=0.5061454830783556(29.0), points={[[X=0.5155270860898133, Y=-0.25143936017440033, Z=0.8191520442889918], [X=-6.047846824324981E-17, Y=9.57884834439237E-18, Z=-1.0], [X=-0.5677569555011356, Y=0.1521300177236823, Z=0.8090169943749475], [X=5.716531405282095E-17, Y=2.1943708116382607E-17, Z=-1.0]]}}} at __randomizedtesting.SeedInfo.seed([ADFCC7193C72FA89:9BDCDB8859624E4]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.fail(RandomSpatialOpStrategyTestCase.java:127) at org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.testOperation(RandomSpatialOpStrategyTestCase.java:116) at org.apache.lucene.spatial.prefix.RandomSpatialOpStrategyTestCase.testOperationRandomShapes(RandomSpatialOpStrategyTestCase.java:56) at org.apache.lucene.spatial.spatial4j.Geo3dRptTest.testOperations(Geo3dRptTest.java:100) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
Re: VOTE: RC0 release apache-solr-ref-guide-5.2.pdf
: Please VOTE to release these files as the Solr Ref Guide 5.2... : : https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-5.2-RC0/ The VOTE has passed, I'll start pushing to the mirrors. -Hoss http://www.lucidworks.com/
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.9.0-ea-b60) - Build # 12967 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12967/ Java: 32bit/jdk1.9.0-ea-b60 -client -XX:+UseParallelGC 1 tests failed. FAILED: org.apache.solr.cloud.CollectionsAPIAsyncDistributedZkTest.testSolrJAPICalls Error Message: Shard split did not complete. Last recorded state: running expected:[completed] but was:[running] Stack Trace: org.junit.ComparisonFailure: Shard split did not complete. Last recorded state: running expected:[completed] but was:[running] at __randomizedtesting.SeedInfo.seed([1A58B13484DE1995:423C3D5582B4B141]:0) at org.junit.Assert.assertEquals(Assert.java:125) at org.apache.solr.cloud.CollectionsAPIAsyncDistributedZkTest.testSolrJAPICalls(CollectionsAPIAsyncDistributedZkTest.java:90) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:502) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_60-ea-b12) - Build # 12968 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/12968/
Java: 64bit/jdk1.8.0_60-ea-b12 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED: org.apache.lucene.index.TestDemoParallelLeafReader.testRandom

Error Message:

Stack Trace:
java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([46D022636CAB2412:349C076CDDCB9261]:0)
        at org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat.newLiveDocs(AssertingLiveDocsFormat.java:44)
        at org.apache.lucene.index.FreqProxTermsWriter.applyDeletes(FreqProxTermsWriter.java:66)
        at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:102)
        at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:112)
        at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:421)
        at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:513)
        at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:625)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3005)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2980)
        at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:973)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1018)
        at org.apache.lucene.index.TestDemoParallelLeafReader$ReindexingReader.close(TestDemoParallelLeafReader.java:244)
        at org.apache.lucene.index.TestDemoParallelLeafReader.testRandom(TestDemoParallelLeafReader.java:1291)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
        at
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 2391 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/2391/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED: junit.framework.TestSuite.org.apache.solr.cloud.HttpPartitionTest

Error Message:
ObjectTracker found 1 object(s) that were not released!!! [TransactionLog]

Stack Trace:
java.lang.AssertionError: ObjectTracker found 1 object(s) that were not released!!! [TransactionLog]
        at __randomizedtesting.SeedInfo.seed([DD454AC9D552DCDA]:0)
        at org.junit.Assert.fail(Assert.java:93)
        at org.junit.Assert.assertTrue(Assert.java:43)
        at org.junit.Assert.assertNull(Assert.java:551)
        at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:235)
        at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:799)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
        at java.lang.Thread.run(Thread.java:745)

Build Log:
[...truncated 9984 lines...]
   [junit4] Suite: org.apache.solr.cloud.HttpPartitionTest
   [junit4]   2> Creating dataDir: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/J0/temp/solr.cloud.HttpPartitionTest DD454AC9D552DCDA-001/init-core-data-001
   [junit4]   2> 1327990 INFO  (SUITE-HttpPartitionTest-seed#[DD454AC9D552DCDA]-worker) [] o.a.s.BaseDistributedSearchTestCase Setting hostContext system property: /tfu/f
   [junit4]   2> 1327994 INFO  (TEST-HttpPartitionTest.test-seed#[DD454AC9D552DCDA]) [] o.a.s.c.ZkTestServer STARTING ZK TEST SERVER
   [junit4]   2> 1327995 INFO  (Thread-2631) [] o.a.s.c.ZkTestServer client port:0.0.0.0/0.0.0.0:0
   [junit4]   2> 1327995 INFO  (Thread-2631) [] o.a.s.c.ZkTestServer Starting server
   [junit4]   2> 1328105 INFO  (TEST-HttpPartitionTest.test-seed#[DD454AC9D552DCDA]) [] o.a.s.c.ZkTestServer start zk server on port:61935
   [junit4]   2> 1328106 INFO  (TEST-HttpPartitionTest.test-seed#[DD454AC9D552DCDA]) [] o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider
   [junit4]   2> 1328108 INFO  (TEST-HttpPartitionTest.test-seed#[DD454AC9D552DCDA]) [] o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper
   [junit4]   2> 1328130 INFO  (zkCallback-1605-thread-1) [] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@19927120 name:ZooKeeperConnection Watcher:127.0.0.1:61935 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
   [junit4]   2> 1328131 INFO  (TEST-HttpPartitionTest.test-seed#[DD454AC9D552DCDA]) [] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
   [junit4]   2> 1328132 INFO  (TEST-HttpPartitionTest.test-seed#[DD454AC9D552DCDA]) [] o.a.s.c.c.SolrZkClient Using default ZkACLProvider
   [junit4]   2> 1328132 INFO  (TEST-HttpPartitionTest.test-seed#[DD454AC9D552DCDA]) [] o.a.s.c.c.SolrZkClient makePath: /solr
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to Codec initialization, default codec and SPI discovery
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-6482:
----------------------------------
    Attachment: LUCENE-6482.patch
                LUCENE-6482-failingtest.patch

New patch with a testcase that fails if the rest of the patch is not applied (just apply LUCENE-6482-failingtest.patch to a plain checkout). The test spawns another JVM and several threads that load codecs in parallel. It is similar to [~shikhar]'s test, but can run inside the normal test suite.

Class loading deadlock relating to Codec initialization, default codec and SPI discovery
Key: LUCENE-6482
URL: https://issues.apache.org/jira/browse/LUCENE-6482
Project: Lucene - Core
Issue Type: Bug
Components: core/codecs
Affects Versions: 4.9.1
Reporter: Shikhar Bhushan
Assignee: Uwe Schindler
Priority: Critical
Fix For: Trunk, 5.3
Attachments: CodecLoadingDeadlockTest.java, LUCENE-6482-failingtest.patch, LUCENE-6482.patch, LUCENE-6482.patch, LUCENE-6482.patch, LUCENE-6482.patch

This issue came up for us several times with Elasticsearch 1.3.4 (Lucene 4.9.1), with many threads seemingly deadlocked but RUNNABLE:

{noformat}
elasticsearch[search77-es2][generic][T#43] #160 daemon prio=5 os_prio=0 tid=0x7f79180c5800 nid=0x3d1f in Object.wait() [0x7f79d9289000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
        at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:457)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:912)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:758)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:453)
        at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:98)
        at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:126)
        at org.elasticsearch.index.store.Store.access$300(Store.java:76)
        at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:465)
        at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:456)
        at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:281)
        at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:186)
        at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:140)
        at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:61)
        at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:277)
        at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:268)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

It didn't really make sense to see RUNNABLE threads in Object.wait(), but this seems to be symptomatic of deadlocks in static initialization (http://ternarysearch.blogspot.ru/2013/07/static-initialization-deadlock.html). I found LUCENE-5573 as an instance of this having come up in Lucene code before.

I'm not sure what exactly is going on, but the deadlock in this case seems to involve these threads:

{noformat}
elasticsearch[search77-es2][clusterService#updateTask][T#1] #79 daemon prio=5 os_prio=0 tid=0x7f7b155ff800 nid=0xd49 in Object.wait() [0x7f79daed8000]
   java.lang.Thread.State: RUNNABLE
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
        at java.lang.Class.newInstance(Class.java:433)
        at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67)
        - locked <0x00061fef4968> (a org.apache.lucene.util.NamedSPILoader)
        at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:47)
        at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
        at org.apache.lucene.codecs.PostingsFormat.<clinit>(PostingsFormat.java:44)
        at org.elasticsearch.index.codec.postingsformat.PostingFormats.<clinit>(PostingFormats.java:67)
        at org.elasticsearch.index.codec.CodecModule.configurePostingsFormats(CodecModule.java:126)
        at
{noformat}
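The static-initialization deadlock pattern described in the linked blog post can be reproduced with a minimal sketch. The classes below are hypothetical stand-ins (Lucene's real cycle involves the Codec, PostingsFormat and NamedSPILoader static initializers): each class's `<clinit>` forces initialization of the other, so two threads initializing them concurrently each hold one class-init lock while waiting on the other's, and the stuck threads report RUNNABLE just as in the traces above.

```java
// StaticInitDeadlock.java -- minimal sketch of a class-initialization deadlock.
// Classes A and B are hypothetical; not Lucene's actual code.
public class StaticInitDeadlock {
    static class A {
        static { pause(); B.touch(); }  // A's <clinit> forces B's initialization
        static void touch() {}
    }
    static class B {
        static { pause(); A.touch(); }  // B's <clinit> forces A's initialization
        static void touch() {}
    }

    // Widen the race window so both threads enter their <clinit> first.
    static void pause() {
        try { Thread.sleep(300); } catch (InterruptedException ignored) {}
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> A.touch());  // takes A's init lock, then wants B's
        Thread t2 = new Thread(() -> B.touch());  // takes B's init lock, then wants A's
        t1.setDaemon(true);                       // daemon so the JVM can exit
        t2.setDaemon(true);                       // despite the permanently stuck threads
        t1.start(); t2.start();
        t1.join(2000); t2.join(2000);
        // If either thread is still alive after the timeout, they deadlocked.
        System.out.println("deadlocked: " + (t1.isAlive() || t2.isAlive()));
    }
}
```

Cross-thread class-initialization cycles are not detected by the JVM (only single-thread recursion is permitted), which is why the fix has to break the cycle in the initializers themselves.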
[JENKINS] Lucene-Solr-Tests-5.x-Java7 - Build # 3196 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-5.x-Java7/3196/

No tests ran.

Build Log:
[...truncated 167 lines...]
BUILD FAILED
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/build.xml:536: The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/build.xml:484: The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/build.xml:61: The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/extra-targets.xml:39: The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/lucene/build.xml:50: The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/lucene/common-build.xml:1436: The following error occurred while executing this line:
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/lucene/common-build.xml:991: Could not read or create hints file: /x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-5.x-Java7/.caches/test-stats/core/timehints.txt

Total time: 18 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Sending artifact delta relative to Lucene-Solr-Tests-5.x-Java7 #3186
Archived 1 artifacts
Archive block size is 32768
Received 0 blocks and 464 bytes
Compression is 0.0%
Took 1.8 sec
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to Codec initialization, default codec and SPI discovery
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-6482:
----------------------------------
    Attachment: (was: LUCENE-6482.patch)
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to Codec initialization, default codec and SPI discovery
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-6482:
----------------------------------
    Attachment: LUCENE-6482.patch
                LUCENE-6482-failingtest.patch
[jira] [Updated] (LUCENE-6482) Class loading deadlock relating to Codec initialization, default codec and SPI discovery
[ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-6482:
----------------------------------
    Attachment: (was: LUCENE-6482-failingtest.patch)
[jira] [Commented] (SOLR-7642) Should launching Solr in cloud mode using a ZooKeeper chroot create the chroot znode if it doesn't exist?
[ https://issues.apache.org/jira/browse/SOLR-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575861#comment-14575861 ]

Tomás Fernández Löbbe commented on SOLR-7642:
---------------------------------------------

Yes, this was discussed in SOLR-4028, and the decision there was to only create the chroot when bootstrap_conf was used; I don't know what the correct approach should be now that we have the script. Creating the znode automatically would make getting started easier, at the cost of increasing the chance of a development/production headache (e.g. a typo, or hitting the wrong ZooKeeper ensemble). On the other hand, it would be nice not to need zkCli.sh or an equivalent just to create the chroot.

Should launching Solr in cloud mode using a ZooKeeper chroot create the chroot znode if it doesn't exist?
Key: SOLR-7642
URL: https://issues.apache.org/jira/browse/SOLR-7642
Project: Solr
Issue Type: Improvement
Reporter: Timothy Potter
Priority: Minor

Launching Solr for the first time in cloud mode using a ZooKeeper connection string that includes a chroot leads to the following initialization error:

{code}
ERROR - 2015-06-05 17:15:50.410; [   ] org.apache.solr.common.SolrException; null:org.apache.solr.common.cloud.ZooKeeperException: A chroot was specified in ZkHost but the znode doesn't exist. localhost:2181/lan
        at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:113)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:339)
        at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:140)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:110)
        at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:138)
        at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:852)
        at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:298)
        at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1349)
        at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1342)
        at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741)
        at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:505)
{code}

The work-around is to use the scripts/cloud-scripts/zkcli.sh script to create the chroot znode (the bootstrap action does this). I'm wondering if we shouldn't just create the znode if it doesn't exist? Or would that violate the contract of using a chroot?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
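For context, the chroot is simply the path suffix of the ZkHost connect string, so auto-creating it comes down to splitting the string and issuing a create for the path part before connecting with the chroot applied. A minimal sketch of that split (illustrative helper names, not Solr's actual code; actually creating the znode would additionally mean connecting ZooKeeper with the chroot-free server list and calling `create()` for each element of the chroot path):

```java
// ChrootCheck.java -- illustrative sketch (not Solr's code) of separating a
// ZooKeeper connect string such as "host1:2181,host2:2181/solr" into the
// server list and the chroot path that would need to exist up front.
public class ChrootCheck {
    /** Returns the chroot path (e.g. "/solr"), or null when none is given. */
    static String chrootOf(String zkHost) {
        int slash = zkHost.indexOf('/');
        return slash < 0 ? null : zkHost.substring(slash);
    }

    /** Returns the connect string without the chroot suffix. */
    static String serversOf(String zkHost) {
        int slash = zkHost.indexOf('/');
        return slash < 0 ? zkHost : zkHost.substring(0, slash);
    }

    public static void main(String[] args) {
        // The failing case from the error above: chroot "/lan" does not exist yet.
        String zkHost = "localhost:2181/lan";
        System.out.println(serversOf(zkHost)); // localhost:2181
        System.out.println(chrootOf(zkHost));  // /lan
    }
}
```

The design question in the issue is then whether Solr should perform that create itself on startup, or keep requiring an explicit zkcli.sh makepath so that a typo in the chroot fails fast instead of silently creating an empty tree.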