[jira] [Commented] (SOLR-9029) regular fails since ZkStateReaderTest.testStateFormatUpdateWithExplicitRefreshLazy
[ https://issues.apache.org/jira/browse/SOLR-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293944#comment-15293944 ]

Steve Rowe commented on SOLR-9029:
----------------------------------

Not backporting to 6.0.1, since the modifications are to {{ZkStateReader.forceUpdateCollection()}}, introduced by SOLR-8745, which won't be backported to branch_6_0.

> regular fails since ZkStateReaderTest.testStateFormatUpdateWithExplicitRefreshLazy
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-9029
>                 URL: https://issues.apache.org/jira/browse/SOLR-9029
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Assignee: Scott Blum
>             Fix For: 6.1, master (7.0)
>
> jenkins started to semi-regularly complain about
> ZkStateReaderTest.testStateFormatUpdateWithExplicitRefreshLazy on march 7 (53
> failures in 45 days at current count)
> March 7th is not-coincidently when commit
> 093a8ce57c06f1bf2f71ddde52dcc7b40cbd6197 for SOLR-8745 was made, modifying
> both the test & a bunch of ClusterState code.
>
> Sample failure...
> https://builds.apache.org/job/Lucene-Solr-Tests-master/1096
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test -Dtestcase=ZkStateReaderTest -Dtests.method=testStateFormatUpdateWithExplicitRefreshLazy -Dtests.seed=78F99EDE682EC04B -Dtests.multiplier=2 -Dtests.slow=true -Dtests.locale=tr-TR -Dtests.timezone=Europe/Tallinn -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>    [junit4] ERROR   0.45s J0 | ZkStateReaderTest.testStateFormatUpdateWithExplicitRefreshLazy <<<
>    [junit4]    > Throwable #1: org.apache.solr.common.SolrException: Could not find collection : c1
>    [junit4]    >    at __randomizedtesting.SeedInfo.seed([78F99EDE682EC04B:13B63EA311211D71]:0)
>    [junit4]    >    at org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:170)
>    [junit4]    >    at org.apache.solr.cloud.overseer.ZkStateReaderTest.testStateFormatUpdate(ZkStateReaderTest.java:135)
>    [junit4]    >    at org.apache.solr.cloud.overseer.ZkStateReaderTest.testStateFormatUpdateWithExplicitRefreshLazy(ZkStateReaderTest.java:46)
>    [junit4]    >    at java.lang.Thread.run(Thread.java:745)
> {noformat}
> ...i've also seen this fail locally, but i've never been able to reproduce it
> with the same seed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9029) regular fails since ZkStateReaderTest.testStateFormatUpdateWithExplicitRefreshLazy
[ https://issues.apache.org/jira/browse/SOLR-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257478#comment-15257478 ]

ASF subversion and git services commented on SOLR-9029:
-------------------------------------------------------

Commit 89857653cafdafe5396abe946cc3d7f4fec1377d in lucene-solr's branch refs/heads/branch_6x from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8985765 ]

SOLR-9029: fix rare ZkStateReader visibility race during collection state format update
[jira] [Commented] (SOLR-9029) regular fails since ZkStateReaderTest.testStateFormatUpdateWithExplicitRefreshLazy
[ https://issues.apache.org/jira/browse/SOLR-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257477#comment-15257477 ]

ASF subversion and git services commented on SOLR-9029:
-------------------------------------------------------

Commit 89c65af2a6e5f1c8216c1202f65e8d670ef14385 in lucene-solr's branch refs/heads/master from [~dragonsinth]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=89c65af ]

SOLR-9029: fix rare ZkStateReader visibility race during collection state format update
[jira] [Commented] (SOLR-9029) regular fails since ZkStateReaderTest.testStateFormatUpdateWithExplicitRefreshLazy
[ https://issues.apache.org/jira/browse/SOLR-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257379#comment-15257379 ]

Scott Blum commented on SOLR-9029:
----------------------------------

Testing a fix now: https://github.com/fullstorydev/lucene-solr/tree/SOLR-9029

[~hossman] [~shalinmangar] if you'd like to look at the change.
[jira] [Commented] (SOLR-9029) regular fails since ZkStateReaderTest.testStateFormatUpdateWithExplicitRefreshLazy
[ https://issues.apache.org/jira/browse/SOLR-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257375#comment-15257375 ]

Scott Blum commented on SOLR-9029:
----------------------------------

Finally found it... there's a very rare edge case in forceUpdateCollection() that only occurs when a collection moves from being in the legacy collection state straight to being a lazy collection, without ever being observed missing. Basically, it requires you to not see any ZK events during the execution of the test method. I can repro this by putting early exits in LegacyClusterStateWatcher and CollectionsChildWatcher to prevent any watch events from taking effect.
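
The edge case described in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions: `FakeZk`, `ToyReader`, and the `recheckAfterLegacyRefresh` flag are hypothetical names invented for illustration, not Solr's classes or fields, and the "recheck" branch is one plausible shape for a fix rather than a description of what the actual commit does. The model assumes, as in the comment, that no watch events reach the reader between the state format update and the forced refresh.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only: none of these names are Solr's real classes or fields. */
public class StateFormatRaceSketch {

    /** Hypothetical stand-in for ZooKeeper: records where each collection's state lives. */
    static final class FakeZk {
        final Set<String> legacyNode = new HashSet<>();          // shared legacy cluster state
        final Set<String> perCollectionNodes = new HashSet<>();  // one state node per collection

        /** Simulates the state format update: the collection leaves the legacy node. */
        void migrate(String name) {
            legacyNode.remove(name);
            perCollectionNodes.add(name);
        }
    }

    /** Hypothetical reader that caches what it last saw and never receives watch events. */
    static final class ToyReader {
        private final FakeZk zk;
        private final Set<String> cachedLegacy = new HashSet<>();
        private final Set<String> cachedLazy = new HashSet<>();

        ToyReader(FakeZk zk) {
            this.zk = zk;
            refreshLegacy();
        }

        boolean hasCollection(String name) {
            return cachedLegacy.contains(name) || cachedLazy.contains(name);
        }

        private void refreshLegacy() {
            cachedLegacy.clear();
            cachedLegacy.addAll(zk.legacyNode);
        }

        private void tryLazy(String name) {
            if (zk.perCollectionNodes.contains(name)) {
                cachedLazy.add(name);
            }
        }

        void forceUpdate(String name, boolean recheckAfterLegacyRefresh) {
            if (cachedLegacy.contains(name)) {
                // Known as a legacy collection: refresh the legacy state.
                refreshLegacy();
                // Without the recheck, a collection that just left the legacy
                // state is dropped from the cache and never looked up elsewhere.
                if (recheckAfterLegacyRefresh && !cachedLegacy.contains(name)) {
                    tryLazy(name);
                }
            } else if (!cachedLazy.contains(name)) {
                // Unknown collection: look in the legacy state first, then per-collection.
                refreshLegacy();
                if (!cachedLegacy.contains(name)) {
                    tryLazy(name);
                }
            }
        }
    }

    /** Runs the legacy-to-lazy transition with no intermediate "missing" observation. */
    static boolean visibleAfterMigration(boolean recheck) {
        FakeZk zk = new FakeZk();
        zk.legacyNode.add("c1");
        ToyReader reader = new ToyReader(zk);  // reader first sees c1 as legacy
        zk.migrate("c1");                      // format update, no watch event delivered
        reader.forceUpdate("c1", recheck);
        return reader.hasCollection("c1");
    }

    public static void main(String[] args) {
        System.out.println("without recheck: " + visibleAfterMigration(false)); // false
        System.out.println("with recheck:    " + visibleAfterMigration(true));  // true
    }
}
```

In this model the collection is only lost when the reader still believes it is a legacy collection at the moment of the forced refresh, which would match the observation that the failure needs a run in which no ZK events are seen during the test method.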
[jira] [Commented] (SOLR-9029) regular fails since ZkStateReaderTest.testStateFormatUpdateWithExplicitRefreshLazy
[ https://issues.apache.org/jira/browse/SOLR-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257329#comment-15257329 ]

Scott Blum commented on SOLR-9029:
----------------------------------

Super puzzling. We've tested that the ZK node exists, and the fact that reader.forceUpdateCollection() is called on the same thread that subsequently checks that the collection exists practically eliminates data visibility problems.
[jira] [Commented] (SOLR-9029) regular fails since ZkStateReaderTest.testStateFormatUpdateWithExplicitRefreshLazy
[ https://issues.apache.org/jira/browse/SOLR-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252998#comment-15252998 ]

Scott Blum commented on SOLR-9029:
----------------------------------

Scanned through the code, nothing jumps out at me. I'll dig deeper at some point.
[jira] [Commented] (SOLR-9029) regular fails since ZkStateReaderTest.testStateFormatUpdateWithExplicitRefreshLazy
[ https://issues.apache.org/jira/browse/SOLR-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252978#comment-15252978 ]

Hoss Man commented on SOLR-9029:
--------------------------------

[~shalinmangar] & [~dragonsinth] - anything jump out at you?