[jira] [Updated] (SOLR-9980) Expose configVersion in core admin status
[ https://issues.apache.org/jira/browse/SOLR-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-9980: --- Assignee: Tomás Fernández Löbbe > Expose configVersion in core admin status > - > > Key: SOLR-9980 > URL: https://issues.apache.org/jira/browse/SOLR-9980 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jessica Cheng Mallet >Assignee: Tomás Fernández Löbbe >Priority: Minor > Labels: admin, status > Attachments: SOLR-9980.diff > > > Expose the loaded znode version of the solr config for a core via the core > admin status call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-9980) Expose configVersion in core admin status
[ https://issues.apache.org/jira/browse/SOLR-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-9980: --- Attachment: SOLR-9980.diff
[jira] [Updated] (SOLR-9980) Expose configVersion in core admin status
[ https://issues.apache.org/jira/browse/SOLR-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-9980: --- Issue Type: Task (was: Bug)
[jira] [Created] (SOLR-9980) Expose configVersion in core admin status
Jessica Cheng Mallet created SOLR-9980: -- Summary: Expose configVersion in core admin status Key: SOLR-9980 URL: https://issues.apache.org/jira/browse/SOLR-9980 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Jessica Cheng Mallet Priority: Minor Expose the loaded znode version of the solr config for a core via the core admin status call.
[jira] [Updated] (SOLR-9823) CoreContainer incorrectly setting MDCLoggingContext for core
[ https://issues.apache.org/jira/browse/SOLR-9823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-9823: --- Attachment: SOLR-9823.diff > CoreContainer incorrectly setting MDCLoggingContext for core > > > Key: SOLR-9823 > URL: https://issues.apache.org/jira/browse/SOLR-9823 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: logging > Reporter: Jessica Cheng Mallet > Priority: Minor > Labels: logging > Attachments: SOLR-9823.diff > > > One line bug fix for setting up the MDCLoggingContext for core in > CoreContainer. Currently the code is always setting "null".
[jira] [Created] (SOLR-9823) CoreContainer incorrectly setting MDCLoggingContext for core
Jessica Cheng Mallet created SOLR-9823: -- Summary: CoreContainer incorrectly setting MDCLoggingContext for core Key: SOLR-9823 URL: https://issues.apache.org/jira/browse/SOLR-9823 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: logging Reporter: Jessica Cheng Mallet Priority: Minor One-line bug fix for setting up the MDCLoggingContext for core in CoreContainer. Currently the code always sets "null".
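The fix above is a single line in CoreContainer; as a rough illustration of the failure mode (using a plain ThreadLocal map as a stand-in for SLF4J's MDC — class and method names here are hypothetical, not Solr's actual MDCLoggingContext API), the bug amounts to recording a null core name in the per-thread logging context:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the per-thread logging context that MDCLoggingContext manages.
// Names are illustrative, not Solr's real API.
public class CoreLoggingContext {
    private static final ThreadLocal<Map<String, String>> MDC =
            ThreadLocal.withInitial(HashMap::new);

    // Buggy shape: the caller passes the core name, but the context records null,
    // so every log line is tagged "null" instead of the core.
    public static void setCoreBuggy(String coreName) {
        MDC.get().put("core", null);
    }

    // Fixed shape: record the core name that was actually passed in.
    public static void setCore(String coreName) {
        MDC.get().put("core", coreName);
    }

    public static String get(String key) {
        return MDC.get().get(key);
    }

    public static void main(String[] args) {
        setCoreBuggy("collection1_shard1_replica1");
        System.out.println("buggy tag: " + get("core"));
        setCore("collection1_shard1_replica1");
        System.out.println("fixed tag: " + get("core"));
    }
}
```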
[jira] [Updated] (SOLR-9707) DeleteByQuery forward requests to down replicas and set it in LiR
[ https://issues.apache.org/jira/browse/SOLR-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-9707: --- Attachment: SOLR-9707.diff > DeleteByQuery forward requests to down replicas and set it in LiR > - > > Key: SOLR-9707 > URL: https://issues.apache.org/jira/browse/SOLR-9707 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: SolrCloud > Reporter: Jessica Cheng Mallet > Labels: solrcloud > Attachments: SOLR-9707.diff > > > DeleteByQuery, unlike other requests, does not filter out the down replicas. > Thus, the update is still forwarded to the down replica and fails, and the > leader then sets the replica in LiR. In a cluster where there are lots of > deleteByQuery requests, this can flood the /overseer/queue.
[jira] [Created] (SOLR-9707) DeleteByQuery forward requests to down replicas and set it in LiR
Jessica Cheng Mallet created SOLR-9707: -- Summary: DeleteByQuery forward requests to down replicas and set it in LiR Key: SOLR-9707 URL: https://issues.apache.org/jira/browse/SOLR-9707 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Reporter: Jessica Cheng Mallet DeleteByQuery, unlike other requests, does not filter out the down replicas. Thus, the update is still forwarded to the down replica and fails, and the leader then sets the replica in LiR. In a cluster where there are lots of deleteByQuery requests, this can flood the /overseer/queue.
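A sketch of the kind of filtering the attached patch implies: before fanning a deleteByQuery out to a shard's replicas, drop any replica whose node is absent from live_nodes, so no request is sent to a down node and no LiR entry gets created for it. All names below are hypothetical; the real change lives in Solr's distributed update path.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: compute forward targets for an update by intersecting
// the shard's replica->node map with live_nodes. Not DistributedUpdateProcessor code.
public class ReplicaFilter {
    public static List<String> forwardTargets(Map<String, String> replicaToNode,
                                              Set<String> liveNodes) {
        return replicaToNode.entrySet().stream()
                .filter(e -> liveNodes.contains(e.getValue())) // skip down nodes
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> replicas = Map.of(
                "core_node1", "host1:8983_solr",
                "core_node2", "host2:8983_solr",
                "core_node3", "host3:8983_solr");
        Set<String> live = Set.of("host1:8983_solr", "host3:8983_solr");
        // core_node2's node is down: no forward, no failed update, no LiR entry
        System.out.println(forwardTargets(replicas, live));
    }
}
```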
[jira] [Updated] (SOLR-9408) Add solr commit data in TreeMergeRecordWriter
[ https://issues.apache.org/jira/browse/SOLR-9408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-9408: --- Attachment: SOLR-9408.patch > Add solr commit data in TreeMergeRecordWriter > - > > Key: SOLR-9408 > URL: https://issues.apache.org/jira/browse/SOLR-9408 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: contrib - MapReduce > Reporter: Jessica Cheng Mallet > Labels: mapreduce, solrcloud > Attachments: SOLR-9408.patch, SOLR-9408.patch > > > The lucene index produced by TreeMergeRecordWriter when the segments are > merged doesn't contain Solr's commit data, specifically, commitTimeMsec. > This means that when this index is subsequently loaded into SolrCloud and if > the index stays unchanged so no newer commits occurs, ADDREPLICA will appear > to succeed but will not actually do any full sync due to SOLR-9369, resulting > in adding an empty index as a replica.
[jira] [Updated] (SOLR-9408) Add solr commit data in TreeMergeRecordWriter
[ https://issues.apache.org/jira/browse/SOLR-9408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-9408: --- Attachment: SOLR-9408.patch
[jira] [Created] (SOLR-9408) Add solr commit data in TreeMergeRecordWriter
Jessica Cheng Mallet created SOLR-9408: -- Summary: Add solr commit data in TreeMergeRecordWriter Key: SOLR-9408 URL: https://issues.apache.org/jira/browse/SOLR-9408 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: contrib - MapReduce Reporter: Jessica Cheng Mallet The lucene index produced by TreeMergeRecordWriter when the segments are merged doesn't contain Solr's commit data, specifically commitTimeMsec. This means that when this index is subsequently loaded into SolrCloud, and if the index stays unchanged so no newer commits occur, ADDREPLICA will appear to succeed but will not actually do any full sync due to SOLR-9369, resulting in adding an empty index as a replica.
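The missing step can be sketched as follows: when the merged segments are committed, attach Solr-style commit user data (a commitTimeMsec entry) so a later ADDREPLICA can compare commit points. MiniIndexWriter below is a stand-in for Lucene's IndexWriter, which exposes setLiveCommitData/commit for this purpose; the stand-in exists only so the sketch is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of writing commitTimeMsec into the commit user data at merge time.
// MiniIndexWriter is an illustrative stand-in, not Lucene's IndexWriter.
public class CommitDataSketch {
    static class MiniIndexWriter {
        Map<String, String> liveCommitData = new HashMap<>();
        Map<String, String> committedData = new HashMap<>();
        void setLiveCommitData(Map<String, String> data) { liveCommitData = data; }
        void commit() { committedData = new HashMap<>(liveCommitData); }
    }

    public static Map<String, String> mergeAndCommit(MiniIndexWriter writer) {
        Map<String, String> commitData = new HashMap<>();
        // Without this entry, the merged index carries no commitTimeMsec,
        // which is the gap SOLR-9408 describes.
        commitData.put("commitTimeMsec", String.valueOf(System.currentTimeMillis()));
        writer.setLiveCommitData(commitData);
        writer.commit();
        return writer.committedData;
    }

    public static void main(String[] args) {
        MiniIndexWriter w = new MiniIndexWriter();
        System.out.println("has commitTimeMsec: "
                + mergeAndCommit(w).containsKey("commitTimeMsec"));
    }
}
```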
[jira] [Updated] (SOLR-9136) Separate out the error statistics into server-side error vs client-side error
[ https://issues.apache.org/jira/browse/SOLR-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-9136: --- Attachment: SOLR-9136.patch > Separate out the error statistics into server-side error vs client-side error > - > > Key: SOLR-9136 > URL: https://issues.apache.org/jira/browse/SOLR-9136 > Project: Solr > Issue Type: Improvement > Reporter: Jessica Cheng Mallet > Priority: Minor > Attachments: SOLR-9136.patch, SOLR-9136.patch > > > Currently Solr counts both server-side errors (5xx) and client-side errors > (4xx) under the same statistic "errors". Operationally it's beneficial to > have those errors separated out so different teams can be alerted depending > on if Solr is seeing lots of server errors vs. client errors.
[jira] [Commented] (SOLR-9136) Separate out the error statistics into server-side error vs client-side error
[ https://issues.apache.org/jira/browse/SOLR-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297077#comment-15297077 ] Jessica Cheng Mallet commented on SOLR-9136: Sure, I like that. I'll make the change. Thanks!
[jira] [Updated] (SOLR-9136) Separate out the error statistics into server-side error vs client-side error
[ https://issues.apache.org/jira/browse/SOLR-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-9136: --- Attachment: SOLR-9136.patch
[jira] [Created] (SOLR-9136) Separate out the error statistics into server-side error vs client-side error
Jessica Cheng Mallet created SOLR-9136: -- Summary: Separate out the error statistics into server-side error vs client-side error Key: SOLR-9136 URL: https://issues.apache.org/jira/browse/SOLR-9136 Project: Solr Issue Type: Improvement Reporter: Jessica Cheng Mallet Priority: Minor Currently Solr counts both server-side errors (5xx) and client-side errors (4xx) under the same statistic "errors". Operationally it's beneficial to have those errors separated out so different teams can be alerted depending on if Solr is seeing lots of server errors vs. client errors.
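The split can be sketched as two counters keyed on the HTTP status class, with the old aggregate kept for back-compat. Counter names below are illustrative, not the exact names in the patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of splitting the single "errors" statistic by HTTP status class.
public class ErrorStats {
    final AtomicLong serverErrors = new AtomicLong(); // 5xx: page the Solr/ops team
    final AtomicLong clientErrors = new AtomicLong(); // 4xx: alert the client team
    final AtomicLong errors = new AtomicLong();       // old aggregate, kept for back-compat

    public void record(int httpStatus) {
        if (httpStatus >= 500) {
            serverErrors.incrementAndGet();
        } else if (httpStatus >= 400) {
            clientErrors.incrementAndGet();
        }
        if (httpStatus >= 400) {
            errors.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        ErrorStats stats = new ErrorStats();
        stats.record(400);
        stats.record(503);
        stats.record(200); // success: no counter moves
        System.out.println("client=" + stats.clientErrors
                + " server=" + stats.serverErrors
                + " total=" + stats.errors);
    }
}
```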
[jira] [Updated] (SOLR-9116) Race condition causing occasional SolrIndexSearcher leak when SolrCore is reloaded
[ https://issues.apache.org/jira/browse/SOLR-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-9116: --- Description: Fix a leak of SolrIndexSearcher when a SolrCore is reloaded. Added a test to expose this leak when run in many iterations (pretty reliable failure with iters=1K), which passes with the fix (ran iters=10K twice). The fundamental issue is that when an invocation of SolrCore#openNewSearcher is racing with SolrCore#close, if this synchronized block (https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrCore.java#L1611) in openNewSearcher doesn't check for whether or not the core is closed, it can possibly run after the core runs closeSearcher and assign the newly constructed searcher to realtimeSearcher again, which will never be cleaned up. The fix is to check if the SolrCore is closed inside the synchronized block, and if so, clean up the newly constructed searcher and throw an Exception. was: Fix a leak of SolrIndexSearcher when a SolrCore is reloaded. Added a test to expose this leak when run in many iterations (pretty reliable failure with iters=1K), which passes with the fix (ran iters=10K twice). The fundamental issue is that when an invocation of SolrCore#openNewSearcher is racing with SolrCore#close, if this synchronized block (https://github.com/apache/lucene-solr/blob/master/solr/ core/src/java/org/apache/solr/core/SolrCore.java#L1611) in openNewSearcher doesn't check for whether or not the core is closed, it can possibly run after the core runs closeSearcher and assign the newly constructed searcher to realtimeSearcher again, which will never be cleaned up. The fix is to check if the SolrCore is closed inside the synchronized block, and if so, clean up the newly constructed searcher and throw an Exception. 
> Race condition causing occasional SolrIndexSearcher leak when SolrCore is > reloaded > -- > > Key: SOLR-9116 > URL: https://issues.apache.org/jira/browse/SOLR-9116 > Project: Solr > Issue Type: Bug > Reporter: Jessica Cheng Mallet > Labels: leak, searcher > Attachments: SOLR-9116.patch > > > Fix a leak of SolrIndexSearcher when a SolrCore is reloaded. Added a test to > expose this leak when run in many iterations (pretty reliable failure with > iters=1K), which passes with the fix (ran iters=10K twice). > The fundamental issue is that when an invocation of SolrCore#openNewSearcher > is racing with SolrCore#close, if this synchronized block > (https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrCore.java#L1611) > in openNewSearcher doesn't check for whether or not the core is closed, it > can possibly run after the core runs closeSearcher and assign the newly > constructed searcher to realtimeSearcher again, which will never be cleaned > up. The fix is to check if the SolrCore is closed inside the synchronized > block, and if so, clean up the newly constructed searcher and throw an > Exception.
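The race and the fix described above can be modeled minimally: openNewSearcher must re-check the closed flag inside the same lock that close() takes, otherwise a searcher assigned after closeSearcher() has run is never released. Names below are illustrative, not SolrCore's actual fields or signatures.

```java
// Minimal model of the SOLR-9116 race: the close-check inside the lock is the fix.
public class CoreCloseRace {
    public interface Releasable { void release(); }

    private final Object searcherLock = new Object();
    private boolean closed = false;
    private Releasable realtimeSearcher;

    public void close() {
        synchronized (searcherLock) {
            closed = true;
            if (realtimeSearcher != null) {
                realtimeSearcher.release();
                realtimeSearcher = null;
            }
        }
    }

    public void openNewSearcher(Releasable newSearcher) {
        synchronized (searcherLock) {
            if (closed) {               // the added check from the issue description
                newSearcher.release();  // clean up instead of leaking the searcher
                throw new IllegalStateException("core is closed");
            }
            realtimeSearcher = newSearcher;
        }
    }

    public static void main(String[] args) {
        CoreCloseRace core = new CoreCloseRace();
        core.close();
        try {
            core.openNewSearcher(() -> System.out.println("new searcher released"));
        } catch (IllegalStateException expected) {
            System.out.println("openNewSearcher refused: " + expected.getMessage());
        }
    }
}
```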
[jira] [Created] (SOLR-9117) Leaking the first SolrCore after reload
Jessica Cheng Mallet created SOLR-9117: -- Summary: Leaking the first SolrCore after reload Key: SOLR-9117 URL: https://issues.apache.org/jira/browse/SOLR-9117 Project: Solr Issue Type: Bug Reporter: Jessica Cheng Mallet Attachments: SOLR-9117.patch When a SolrCore for a particular index is created for the first time, it's added to the SolrCores#createdCores map. However, this map doesn't get updated when this core is reloaded, leading to the first SolrCore being leaked. Taking a look at how createdCores is used, it seems like it doesn't serve any purpose (its only read is in SolrCores#getAllCoreNames, which includes entries from SolrCores.cores anyway), so I'm proposing a patch to remove the createdCores map completely. However, if someone else knows that createdCores exists for a reason, I'll be happy to change the fix to updating the createdCores map when reload is called.
[jira] [Updated] (SOLR-9117) Leaking the first SolrCore after reload
[ https://issues.apache.org/jira/browse/SOLR-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-9117: --- Attachment: SOLR-9117.patch > Leaking the first SolrCore after reload > --- > > Key: SOLR-9117 > URL: https://issues.apache.org/jira/browse/SOLR-9117 > Project: Solr > Issue Type: Bug > Reporter: Jessica Cheng Mallet > Labels: core, leak > Attachments: SOLR-9117.patch > > > When a SolrCore for a particular index is created for the first time, it's > added to the SolrCores#createdCores map. However, this map doesn't get > updated when this core is reloaded, leading to the first SolrCore being > leaked. > Taking a look at how createdCores is used, it seems like it doesn't serve any > purpose (its only read is in SolrCores#getAllCoreNames, which includes > entries from SolrCores.cores anyway), so I'm proposing a patch to remove the > createdCores map completely. However, if someone else knows that createdCores > exist for a reason, I'll be happy to change the fix to updating the > createdCores map when reload is called.
[jira] [Updated] (SOLR-9116) Race condition causing occasional SolrIndexSearcher leak when SolrCore is reloaded
[ https://issues.apache.org/jira/browse/SOLR-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-9116: --- Attachment: SOLR-9116.patch
[jira] [Created] (SOLR-9116) Race condition causing occasional SolrIndexSearcher leak when SolrCore is reloaded
Jessica Cheng Mallet created SOLR-9116: -- Summary: Race condition causing occasional SolrIndexSearcher leak when SolrCore is reloaded Key: SOLR-9116 URL: https://issues.apache.org/jira/browse/SOLR-9116 Project: Solr Issue Type: Bug Reporter: Jessica Cheng Mallet Fix a leak of SolrIndexSearcher when a SolrCore is reloaded. Added a test to expose this leak when run in many iterations (pretty reliable failure with iters=1K), which passes with the fix (ran iters=10K twice). The fundamental issue is that when an invocation of SolrCore#openNewSearcher is racing with SolrCore#close, if this synchronized block (https://github.com/apache/lucene-solr/blob/master/solr/ core/src/java/org/apache/solr/core/SolrCore.java#L1611) in openNewSearcher doesn't check for whether or not the core is closed, it can possibly run after the core runs closeSearcher and assign the newly constructed searcher to realtimeSearcher again, which will never be cleaned up. The fix is to check if the SolrCore is closed inside the synchronized block, and if so, clean up the newly constructed searcher and throw an Exception.
[jira] [Commented] (SOLR-9092) Add safety checks to delete replica/shard/collection commands
[ https://issues.apache.org/jira/browse/SOLR-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282925#comment-15282925 ] Jessica Cheng Mallet commented on SOLR-9092:

We do want to delete the replica from cluster state even if the node doesn't appear under live_nodes, but it shouldn't send the core admin UNLOAD to the node unless it's appearing under live_nodes. In the situation where the node is down and we want to remove it from the cluster state, we would rely on the deleteCoreNode call to have it removed: https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/OverseerCollectionMessageHandler.java#L634

> Add safety checks to delete replica/shard/collection commands
> -------------------------------------------------------------
>
> Key: SOLR-9092
> URL: https://issues.apache.org/jira/browse/SOLR-9092
> Project: Solr
> Issue Type: Improvement
> Reporter: Varun Thacker
> Assignee: Varun Thacker
> Priority: Minor
>
> We should verify the delete commands against live_nodes to make sure the API can at least be executed correctly.
> If we have a two node cluster, a collection with 1 shard 2 replicas: call the delete replica command for the replica whose node is currently down. You get an exception:
> {code}
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">5173</int>
>   </lst>
>   <lst name="failure">
>     <str name="192.168.1.101:7574_solr">org.apache.solr.client.solrj.SolrServerException:Server refused connection at: http://192.168.1.101:7574/solr</str>
>   </lst>
> </response>
> {code}
> At this point the entry for the replica is gone from state.json. The client application retries since an error was thrown, but the delete command will never succeed now and an error like this will be seen:
> {code}
> <response>
>   <lst name="responseHeader">
>     <int name="status">400</int>
>     <int name="QTime">137</int>
>   </lst>
>   <str name="failure">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Invalid replica : core_node3 in shard/collection : shard1/gettingstarted available replicas are core_node1</str>
>   <lst name="exception">
>     <str name="msg">Invalid replica : core_node3 in shard/collection : shard1/gettingstarted available replicas are core_node1</str>
>     <int name="rspCode">400</int>
>   </lst>
>   <lst name="error">
>     <lst name="metadata">
>       <str name="error-class">org.apache.solr.common.SolrException</str>
>       <str name="root-error-class">org.apache.solr.common.SolrException</str>
>     </lst>
>     <str name="msg">Invalid replica : core_node3 in shard/collection : shard1/gettingstarted available replicas are core_node1</str>
>     <int name="code">400</int>
>   </lst>
> </response>
> {code}
> For create collection/add-replica we check the "createNodeSet" and "node" params respectively against live_nodes to make sure it has a chance of succeeding.
> We should add a check against live_nodes for the delete commands as well.
> Another situation where I saw this can be a problem - a second solr cluster cloned from the first, but the script didn't correctly change the hostnames in the state.json file. When a delete command was issued against the second cluster, Solr deleted the replica from the first cluster.
> In the above case the script was obviously buggy, but if we verify against live_nodes then Solr wouldn't have gone ahead and deleted replicas not belonging to its cluster.
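The behavior proposed in the comment above can be sketched as follows: always remove the replica from cluster state, but only send the core admin UNLOAD when the replica's node is present in live_nodes. The Actions interface and method names are hypothetical, not the OverseerCollectionMessageHandler API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: skip the UNLOAD for down nodes, but still clean up cluster state.
public class DeleteReplicaSketch {
    public interface Actions {
        void unloadCore(String node, String core);              // HTTP core admin call
        void deleteCoreNode(String collection, String replica); // cluster-state removal
    }

    public static void deleteReplica(String collection, String replica, String node,
                                     String core, Set<String> liveNodes, Actions actions) {
        if (liveNodes.contains(node)) {
            actions.unloadCore(node, core); // only reachable nodes get the request
        }
        // state cleanup happens regardless, so a downed node's replica still goes away
        actions.deleteCoreNode(collection, replica);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Actions recorder = new Actions() {
            public void unloadCore(String node, String core) { log.add("unload:" + node); }
            public void deleteCoreNode(String collection, String replica) { log.add("deleteCoreNode:" + replica); }
        };
        deleteReplica("gettingstarted", "core_node3", "downhost:7574_solr", "core3",
                Set.of("livehost:8983_solr"), recorder);
        System.out.println(log); // no unload sent, but the state entry is removed
    }
}
```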
[jira] [Updated] (SOLR-8948) OverseerTaskQueue.containsTaskWithRequestId encounters json parse error if a SolrResponse node is in the overseer queue
[ https://issues.apache.org/jira/browse/SOLR-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-8948: --- Attachment: SOLR-8948.patch > OverseerTaskQueue.containsTaskWithRequestId encounters json parse error if a > SolrResponse node is in the overseer queue > --- > > Key: SOLR-8948 > URL: https://issues.apache.org/jira/browse/SOLR-8948 > Project: Solr > Issue Type: Bug > Components: SolrCloud > Reporter: Jessica Cheng Mallet > Labels: overseer, solrcloud > Attachments: SOLR-8948.patch > > > Currently OverseerTaskQueue.containsTaskWithRequestId doesn't skip through > the response nodes in the queue and this causes a parse error since response > nodes are written in a different serialization format. > The code fix is one line. The rest is adding a test that exposes the bug and > slight refactoring that makes writing the test possible.
[jira] [Created] (SOLR-8948) OverseerTaskQueue.containsTaskWithRequestId encounters json parse error if a SolrResponse node is in the overseer queue
Jessica Cheng Mallet created SOLR-8948: -- Summary: OverseerTaskQueue.containsTaskWithRequestId encounters json parse error if a SolrResponse node is in the overseer queue Key: SOLR-8948 URL: https://issues.apache.org/jira/browse/SOLR-8948 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Currently OverseerTaskQueue.containsTaskWithRequestId doesn't skip the response nodes in the queue, and this causes a parse error since response nodes are written in a different serialization format. The code fix is one line. The rest is adding a test that exposes the bug and slight refactoring that makes writing the test possible.
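The shape of the one-line fix can be sketched like this: while scanning the overseer work queue's children for a task carrying a given requestid, skip the response nodes before attempting to parse node data as JSON. The "qnr-" prefix for response nodes and the child-name layout below are assumptions made for the sketch, not necessarily the queue's exact conventions.

```java
import java.util.List;

// Sketch: skip response nodes (different serialization) during the queue scan.
public class QueueScan {
    public static boolean containsTaskWithRequestId(List<String> childNames, String requestId) {
        for (String child : childNames) {
            if (child.startsWith("qnr-")) {
                continue; // response node: different format, would fail the JSON parse
            }
            // in the real code the znode's data would be parsed as JSON here;
            // the suffix match is a stand-in for reading the requestid field
            if (child.endsWith("-" + requestId)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> children = List.of("qn-0000000001-42", "qnr-0000000001-42", "qn-0000000002-7");
        System.out.println(containsTaskWithRequestId(children, "42"));
    }
}
```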
[jira] [Commented] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134515#comment-15134515 ] Jessica Cheng Mallet commented on SOLR-5670: Hi [~steff1193], do you have any sense of at what document size per core using DocValues for _version_ starts being more beneficial? > _version_ either indexed OR docvalue > > > Key: SOLR-5670 > URL: https://issues.apache.org/jira/browse/SOLR-5670 > Project: Solr > Issue Type: Improvement > Components: SolrCloud > Affects Versions: 4.7 > Reporter: Per Steffensen > Assignee: Per Steffensen > Labels: solr, solrcloud, version > Fix For: 4.7, Trunk > > Attachments: SOLR-5670.patch, SOLR-5670.patch > > > As far as I can see there is no good reason to require that "_version_" field > has to be indexed if it is docvalued. So I guess it will be ok with a rule > saying "_version_ has to be either indexed or docvalue (allowed to be both)".
[jira] [Created] (SOLR-8651) The commitWithin parameter is not passed on for deleteById in UpdateRequest
Jessica Cheng Mallet created SOLR-8651: -- Summary: The commitWithin parameter is not passed on for deleteById in UpdateRequest Key: SOLR-8651 URL: https://issues.apache.org/jira/browse/SOLR-8651 Project: Solr Issue Type: Bug Components: SolrJ Reporter: Jessica Cheng Mallet Priority: Minor The commitWithin parameter is not passed on for deleteById in UpdateRequest, resulting in it not working. Adding a one-line fix plus a test that fails before adding the line and passes after.
[jira] [Updated] (SOLR-8651) The commitWithin parameter is not passed on for deleteById in UpdateRequest
[ https://issues.apache.org/jira/browse/SOLR-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-8651: --- Attachment: fix-solrj-delete-commitWithin.patch > The commitWithin parameter is not passed on for deleteById in UpdateRequest > --- > > Key: SOLR-8651 > URL: https://issues.apache.org/jira/browse/SOLR-8651 > Project: Solr > Issue Type: Bug > Components: SolrJ > Reporter: Jessica Cheng Mallet > Priority: Minor > Labels: commitWithin, solrj > Attachments: fix-solrj-delete-commitWithin.patch > > > The commitWithin parameter is not passed on for deleteById in UpdateRequest, > resulting in it not working. > Adding a one-line fix plus a test that fails before adding the line and passes > after.
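The kind of one-line fix described can be sketched as follows: when serializing a deleteById, the commitWithin value must be written into the request just as it already is for adds. The simplified XML writer below is a stand-in for UpdateRequest's actual XML/JavaBin serialization, not SolrJ code.

```java
import java.util.List;

// Sketch: include commitWithin on the serialized <delete> element.
public class DeleteSerializer {
    public static String serializeDelete(List<String> ids, int commitWithin) {
        StringBuilder sb = new StringBuilder("<delete");
        if (commitWithin > 0) {
            // the kind of line the patch adds: without it, commitWithin is silently dropped
            sb.append(" commitWithin=\"").append(commitWithin).append('"');
        }
        sb.append('>');
        for (String id : ids) {
            sb.append("<id>").append(id).append("</id>");
        }
        return sb.append("</delete>").toString();
    }

    public static void main(String[] args) {
        System.out.println(serializeDelete(List.of("doc1"), 5000));
    }
}
```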
[jira] [Created] (SOLR-8327) SolrDispatchFilter is not caching new state format, which results in live fetch from ZK per request if node does not contain core from collection
Jessica Cheng Mallet created SOLR-8327: -- Summary: SolrDispatchFilter is not caching new state format, which results in live fetch from ZK per request if node does not contain core from collection Key: SOLR-8327 URL: https://issues.apache.org/jira/browse/SOLR-8327 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.3 Reporter: Jessica Cheng Mallet While perf testing with a non-solrj client (requests can be sent to any solr node), we noticed a huge amount of data from Zookeeper in our tcpdump (~1G for a 20-second dump). From the thread dump, we noticed this:
java.lang.Object.wait (Native Method)
java.lang.Object.wait (Object.java:503)
org.apache.zookeeper.ClientCnxn.submitRequest (ClientCnxn.java:1309)
org.apache.zookeeper.ZooKeeper.getData (ZooKeeper.java:1152)
org.apache.solr.common.cloud.SolrZkClient$7.execute (SolrZkClient.java:345)
org.apache.solr.common.cloud.SolrZkClient$7.execute (SolrZkClient.java:342)
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation (ZkCmdExecutor.java:61)
org.apache.solr.common.cloud.SolrZkClient.getData (SolrZkClient.java:342)
org.apache.solr.common.cloud.ZkStateReader.getCollectionLive (ZkStateReader.java:841)
org.apache.solr.common.cloud.ZkStateReader$7.get (ZkStateReader.java:515)
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull (ClusterState.java:175)
org.apache.solr.common.cloud.ClusterState.getLeader (ClusterState.java:98)
org.apache.solr.servlet.HttpSolrCall.getCoreByCollection (HttpSolrCall.java:784)
org.apache.solr.servlet.HttpSolrCall.init (HttpSolrCall.java:272)
org.apache.solr.servlet.HttpSolrCall.call (HttpSolrCall.java:417)
org.apache.solr.servlet.SolrDispatchFilter.doFilter (SolrDispatchFilter.java:210)
org.apache.solr.servlet.SolrDispatchFilter.doFilter (SolrDispatchFilter.java:179)
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter (ServletHandler.java:1652)
org.eclipse.jetty.servlet.ServletHandler.doHandle (ServletHandler.java:585)
org.eclipse.jetty.server.handler.ScopedHandler.handle (ScopedHandler.java:143)
org.eclipse.jetty.security.SecurityHandler.handle (SecurityHandler.java:577)
org.eclipse.jetty.server.session.SessionHandler.doHandle (SessionHandler.java:223)
org.eclipse.jetty.server.handler.ContextHandler.doHandle (ContextHandler.java:1127)
org.eclipse.jetty.servlet.ServletHandler.doScope (ServletHandler.java:515)
org.eclipse.jetty.server.session.SessionHandler.doScope (SessionHandler.java:185)
org.eclipse.jetty.server.handler.ContextHandler.doScope (ContextHandler.java:1061)
org.eclipse.jetty.server.handler.ScopedHandler.handle (ScopedHandler.java:141)
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle (ContextHandlerCollection.java:215)
org.eclipse.jetty.server.handler.HandlerCollection.handle (HandlerCollection.java:110)
org.eclipse.jetty.server.handler.HandlerWrapper.handle (HandlerWrapper.java:97)
org.eclipse.jetty.server.Server.handle (Server.java:499)
org.eclipse.jetty.server.HttpChannel.handle (HttpChannel.java:310)
org.eclipse.jetty.server.HttpConnection.onFillable (HttpConnection.java:257)
org.eclipse.jetty.io.AbstractConnection$2.run (AbstractConnection.java:540)
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob (QueuedThreadPool.java:635)
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run (QueuedThreadPool.java:555)
java.lang.Thread.run (Thread.java:745)
It looks like SolrDispatchFilter doesn't have caching similar to the collectionStateCache in CloudSolrClient, so if the node doesn't know about a collection in the new state format, it just live-fetches it from Zookeeper on every request. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
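The cache the report describes can be sketched roughly as follows. This is a hypothetical, simplified stand-in (the class and method names are not Solr's, and the ZooKeeper read is simulated with a callback), showing the collectionStateCache-style TTL cache that would keep a cache miss from hitting ZK on every request:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch (not Solr code): a per-node TTL cache so that an
// unknown collection's state is fetched from ZooKeeper at most once per
// expiry window instead of on every request.
class CollectionStateCache {
    private static final class Entry {
        final String state;
        final long fetchedAtMs;
        Entry(String state, long fetchedAtMs) { this.state = state; this.fetchedAtMs = fetchedAtMs; }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMs;
    private final Function<String, String> zkFetch; // stand-in for the live ZK read

    CollectionStateCache(long ttlMs, Function<String, String> zkFetch) {
        this.ttlMs = ttlMs;
        this.zkFetch = zkFetch;
    }

    String get(String collection) {
        long now = System.currentTimeMillis();
        Entry e = cache.get(collection);
        if (e == null || now - e.fetchedAtMs > ttlMs) {
            e = new Entry(zkFetch.apply(collection), now); // miss or expired: go to ZK once
            cache.put(collection, e);
        }
        return e.state;
    }
}
```

A TTL bounds staleness for state-format-2 collections the node hosts no core of; a watch-based invalidation would be the stricter alternative.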
[jira] [Commented] (SOLR-8069) Ensure that only the valid ZooKeeper registered leader can put a replica into Leader Initiated Recovery.
[ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902021#comment-14902021 ] Jessica Cheng Mallet commented on SOLR-8069: bq. I don't think we really protect against such cases where there is only a single leader that can accept an update This is not the scenario I'm describing. If you have 3 replicas and the one that was the leader gets partitioned off, one of the other 2 will get elected and they can carry on. However, during this transition time, because the cluster state update hasn't been completed or propagated through watches, the old leader can still get trailing updates from the client. In a normal case where the updates are successfully forwarded to all replicas, no one cares. But in this case, the old leader cannot forward the update to the others (because it's partitioned off), so it should not reply success to the client, because that would be wrong (it is not the leader and it does not have the right to tell the others to recover). > Ensure that only the valid ZooKeeper registered leader can put a replica into > Leader Initiated Recovery. > > > Key: SOLR-8069 > URL: https://issues.apache.org/jira/browse/SOLR-8069 > Project: Solr > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Critical > Attachments: SOLR-8069.patch, SOLR-8069.patch > > > I've seen this twice now. Need to work on a test. > When some issues hit all the replicas at once, you can end up in a situation > where the rightful leader was put or put itself into LIR. Even on restart, > this rightful leader won't take leadership and you have to manually clear the > LIR nodes. > It seems that if all the replicas participate in election on startup, LIR > should just be cleared.
[jira] [Commented] (SOLR-8069) Leader Initiated Recovery can put the replica with the latest data into LIR and a shard will have no leader even on restart.
[ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876879#comment-14876879 ] Jessica Cheng Mallet commented on SOLR-8069: bq. I think it does. A leader can do this. It doesn't matter if it had a valid reason to do it or not. If you believe that this is true, I do agree that your patch will accomplish the check that at the moment you're setting someone else down, you're the leader. If we're going with this policy though, I think that if at that moment it realizes it's not the leader, it should actually fail the request, because it shouldn't accept it on the real leader's behalf. E.g. if it's a node that was a leader but has just been network-partitioned off (but the clusterstate change hasn't been made yet, since it's asynchronous) and wasn't able to actually forward the request to the real leader.
[jira] [Commented] (SOLR-8069) Leader Initiated Recovery can put the replica with the latest data into LIR and a shard will have no leader even on restart.
[ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14875931#comment-14875931 ] Jessica Cheng Mallet commented on SOLR-8069: bq. I think of course it is. It's valid for the leader and only the leader to set anyone as down. It's definitely only valid for the leader to set anyone down, but that doesn't mean the leader should set someone down based on an old leadership decision. This is the only place I'm unsure about. bq. I don't see an easy way to do that in this case. Almost all the solutions that fit with the code have the exact same holes / races. If we're willing to make more changes, one way I see this working is to write down the election node path as a prop in the leader znode (this is now written via a zk transaction from your other commit). Then, have the isLeader logic in DistributedUpdateProcessor be based on reading the leader znode, and at that point record the election node path as well. Then, when setting LiR, predicate the ZK transaction on the election node path read at the beginning of DistributedUpdateProcessor.
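The predicate being proposed is essentially a fencing check: the LiR write is allowed only if the election node path held when the decision was made is still the current one. A minimal, self-contained sketch of that idea (all names hypothetical; the election znode is simulated with an in-memory reference, where real code would use a ZooKeeper multi with a check op on the election node):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: only the holder of the unchanged election node may
// write the LiR marker, mirroring multi(check(electionPath), create(lirNode)).
class LirFencing {
    // stand-in for the live election znode path (really read from ZooKeeper)
    final AtomicReference<String> currentElectionNode = new AtomicReference<>();
    String lirMarker = null;

    // returns true only if leadership is unchanged since the decision was made
    synchronized boolean markReplicaDown(String electionNodeAtDecision, String replica) {
        if (!electionNodeAtDecision.equals(currentElectionNode.get())) {
            return false; // leadership changed in between: fail instead of demoting for the new leader
        }
        lirMarker = replica + "=down";
        return true;
    }
}
```

The point of predicating on the path captured at decision time, rather than re-reading it at write time, is that it rejects exactly the stale-decision case discussed in the thread.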
[jira] [Commented] (SOLR-8069) Leader Initiated Recovery can put the replica with the latest data into LIR and a shard will have no leader even on restart.
[ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803172#comment-14803172 ] Jessica Cheng Mallet commented on SOLR-8069: We have definitely seen this as well, even after the commit for SOLR-7109 added a zookeeper multi transaction to ZkController.markShardAsDownIfLeader, which is supposed to predicate setting the LiR node on the setter still having the same election znode it thinks it has as leader. Hmmm, reading the code now, I'm not sure it's doing exactly the right thing, since it calls getLeaderSeqPath, which just takes the current ElectionContext from electionContexts, and that isn't necessarily the one the node had when it decided to mark someone else down, right? [~shalinmangar] thoughts?
[jira] [Commented] (SOLR-8069) Leader Initiated Recovery can put the replica with the latest data into LIR and a shard will have no leader even on restart.
[ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804438#comment-14804438 ] Jessica Cheng Mallet commented on SOLR-8069: Actually, thinking about it -- why do we have the leader property in cluster state at all? If it's simply to publish leadership to solrj, it seems that on the server side we should still use the leader znode as the "source of truth" so that we can have guarantees of a consistent view along with the zk transactions. If solrj's view falls behind due to the asynchronous nature of having the Overseer update the state, at least on the server side we can check the leader znode. Any historical reason why leadership information is in two places?
[jira] [Commented] (SOLR-8069) Leader Initiated Recovery can put the replica with the latest data into LIR and a shard will have no leader even on restart.
[ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804407#comment-14804407 ] Jessica Cheng Mallet commented on SOLR-8069: I still struggle with the safety of getting the ElectionContext from electionContexts, because what's mapped there could change from under this thread. What about writing down the election node path (e.g. 238121947050958365-core_node2-n_06) into the leader znode as a leader prop, so that whenever we're actually checking that we're the leader, we can get that election node path back and do the zk multi check against that particular election node path? Ugh, but then I guess lots of places are actually looking at the cluster state's leader instead of the leader node. >_< Why are there separate places for marking the leader? I don't know how to reason about the asynchronous nature of the cluster state's update wrt actual leader election...
[jira] [Commented] (SOLR-8069) Leader Initiated Recovery can put the replica with the latest data into LIR and a shard will have no leader even on restart.
[ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804677#comment-14804677 ] Jessica Cheng Mallet commented on SOLR-8069: Yes, I think this is definitely an improvement. I'm just not sure it gets everything covered. I suppose "we have near real confidence that we are the leader and can still do as we please" is probably good enough -- though I haven't convinced myself yet through playing with complex scenarios of repeated leadership changes -- thus I prefer the simple logic of "do this action only if our zookeeper session state is exactly what it was when we decided to do it". Anyhow, this is probably beyond the scope of this JIRA. BTW, we tend to see this most when a "bad" query is issued (e.g. doing non-cursorMark deep paging to page 50,000). Presumably it creates GC pressure on each replica it hits (since the request is retried), and a series of leadership changes happen. Along with the complication of GC pauses, the states are quite difficult to reason through.
[jira] [Commented] (SOLR-8069) Leader Initiated Recovery can put the replica with the latest data into LIR and a shard will have no leader even on restart.
[ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804934#comment-14804934 ] Jessica Cheng Mallet commented on SOLR-8069: The scenario I have in mind is if somehow we're switching leadership back and forth due to nodes going into GC after receiving retries of an expensive query. What if a node is the leader at time T1 and decides to set another node in LiR, but goes into GC before it does, so that it loses the leadership? Then the other node briefly gains leadership at T2 but then also goes into GC and loses its leadership. Then the first node wakes up from GC and becomes the leader once more at T3 -- and then this code executes. My question is whether it's absolutely safe for this node to set the other node in LiR simply because it's the leader now, even though it decided to set the LiR when it was the leader at T1.
[jira] [Comment Edited] (SOLR-8069) Leader Initiated Recovery can put the replica with the latest data into LIR and a shard will have no leader even on restart.
[ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804934#comment-14804934 ] Jessica Cheng Mallet edited comment on SOLR-8069 at 9/18/15 4:17 AM: - The scenario I have in mind is if somehow we're switching leadership back and forth due to nodes going into GC after receiving retries of an expensive query. What if a node is the leader at time T1 and decides to set another node in LiR, but goes into GC before it does, so that it loses the leadership? Then the other node briefly gains leadership at T2, maybe processes an update or two, but then also goes into GC and loses its leadership. Then the first node wakes up from GC and becomes the leader once more at T3 -- and then this code executes. My question is whether it's absolutely safe for this node to set the other node in LiR simply because it's the leader now, even though it decided to set the LiR when it was the leader at T1.
[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery
[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745746#comment-14745746 ] Jessica Cheng Mallet commented on SOLR-8034: Oops, sorry! Didn't know there's another Tim Potter. :P > If minRF is not satisfied, leader should not put replicas in recovery > - > > Key: SOLR-8034 > URL: https://issues.apache.org/jira/browse/SOLR-8034 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Jessica Cheng Mallet >Assignee: Anshum Gupta > Labels: solrcloud > Attachments: SOLR-8034.patch, SOLR-8034.patch > > > If the minimum replication factor parameter (minRf) in a solr update request > is not satisfied -- i.e. if the update was not successfully applied on at > least n replicas where n >= minRf -- the shard leader should not put the > failed replicas in "leader initiated recovery" and the client should retry > the update instead. > This is so that in the scenario where minRf is not satisfied, the failed > replicas can still be eligible to become a leader in case of leader failure, > since in the client's perspective this update did not succeed. > This came up from a network partition scenario where the leader becomes > sectioned off from its two followers, but they all could still talk to > zookeeper. The partitioned leader set its two followers as in leader > initiated recovery, so we couldn't just kill off the partitioned node and > have a follower take over leadership. For a minRf=1 case, this is the correct > behavior because the partitioned leader would have accepted updates that the > followers don't have, and therefore we can't switch leadership or we'd lose > those updates. However, in the case of minRf=2, solr never accepted any > update in the client's point of view, so in fact the partitioned leader > doesn't have any accepted update that the followers don't have, and therefore > the followers should be eligible to become leaders. Thus, I'm proposing > modifying the leader initiated recovery logic to not put the followers in > recovery if the minRf parameter is present and is not satisfied.
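The proposed change to the leader's decision can be sketched as a single predicate. This is a hypothetical illustration, not Solr's implementation (the class and parameter names are invented):

```java
// Hypothetical sketch (names are not Solr's) of the proposed policy: the
// leader starts leader-initiated recovery for a failed replica only when no
// min_rf was requested, or when the achieved replication factor met it --
// otherwise the whole update is reported as failed and the client retries.
class MinRfPolicy {
    static boolean shouldStartLir(Integer requestedMinRf, int achievedRf) {
        if (requestedMinRf == null) {
            return true; // no min_rf on the request: keep the existing LIR behavior
        }
        // min_rf met: the update "counts", so recovering stragglers is safe;
        // min_rf missed: skip LIR so the followers stay leader-eligible
        return achievedRf >= requestedMinRf;
    }
}
```

The minRf=2 partition scenario above lands in the `achievedRf < requestedMinRf` branch: no replica is fenced, so a follower can still take over leadership.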
[jira] [Updated] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery
[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-8034: --- Attachment: SOLR-8034.patch
[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery
[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745722#comment-14745722 ] Jessica Cheng Mallet commented on SOLR-8034: Ah, and regarding {quote} I'm kind of split on this as the replica here would be out of sync from the leader and would never know about it, increasing the odds of inconsistency when the client doesn't handle it the right way i.e. it kind of self-heals at this point, and that would stop happening. {quote} I'd hope that if the user is explicitly using minRf, they handle it the right way (i.e. retry if minRf isn't achieved). The contract would be that if the request fails, it needs to be retried or we can possibly see inconsistent state. I think this is already true for a normal update: if the forwarded parallel update to the followers succeeds but it somehow fails on the leader, a failure would be returned to the user but the update could be present on the followers.
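The client-side half of that contract can be sketched as a retry loop. This is a hypothetical stand-in, not SolrJ code (the `Update` interface and names are invented; it only assumes the server reports the achieved replication factor back to the client):

```java
// Hypothetical sketch of the client contract from the comment: retry the
// update while the achieved replication factor is below the requested
// minimum; a persistent shortfall is surfaced to the caller as a failure.
class MinRfRetryClient {
    interface Update { int send(); } // stand-in for a real request; returns the achieved rf

    static int sendWithRetry(Update update, int minRf, int maxAttempts) {
        int rf = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            rf = update.send();
            if (rf >= minRf) {
                break; // enough replicas applied the update
            }
        }
        return rf; // caller checks rf < minRf to treat the update as failed
    }
}
```

Since the update is idempotent from the cluster's perspective (replicas that already applied it just re-apply the same version), retrying until minRf is met is the self-healing step the quoted concern asks for.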
[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery
[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745706#comment-14745706 ] Jessica Cheng Mallet commented on SOLR-8034: [~anshumg], I fixed the comment for the assertion, but I didn't add the test that the replica is down after the first network partition, because the point is that the replica will not realize it's down on its own, since the partition is between the leader and the replica, not between the replica and zookeeper -- so it won't be set to down until the leader tries to forward the document to it and fails, and then puts it in leader-initiated recovery. [~tpot], we discussed this in ticket 4072.
[jira] [Commented] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery
[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741536#comment-14741536 ] Jessica Cheng Mallet commented on SOLR-8034: [~tpot] This is what we discussed a while ago. Will you please give it a look? Thanks!
[jira] [Created] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery
Jessica Cheng Mallet created SOLR-8034: -- Summary: If minRF is not satisfied, leader should not put replicas in recovery Key: SOLR-8034 URL: https://issues.apache.org/jira/browse/SOLR-8034 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet If the minimum replication factor parameter (minRf) in a solr update request is not satisfied--i.e. if the update was not successfully applied on at least n replicas where n >= minRf--the shard leader should not put the failed replicas in "leader initiated recovery" and the client should retry the update instead. This is so that in the scenario where minRf is not satisfied, the failed replicas can still be eligible to become a leader in case of leader failure, since in the client's perspective this update did not succeed. This came up from a network partition scenario where the leader becomes sectioned off from its two followers, but they all could still talk to zookeeper. The partitioned leader set its two followers as in leader initiated recovery, so we couldn't just kill off the partitioned node and have a follower take over leadership. For a minRf=1 case, this is the correct behavior because the partitioned leader would have accepted updates that the followers don't have, and therefore we can't switch leadership or we'd lose those updates. However, in the case of minRf=2, solr never accepted any update in the client's point of view, so in fact the partitioned leader doesn't have any accepted update that the followers don't have, and therefore the followers should be eligible to become leaders. Thus, I'm proposing modifying the leader initiated recovery logic to not put the followers in recovery if the minRf parameter is present and is not satisfied. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
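The gating the description proposes can be sketched as a small decision function. This is a hypothetical, simplified illustration (class and method names are invented here, not the actual Solr patch): leader-initiated recovery is only triggered when the achieved replication factor satisfies the requested minRf, so a failed update leaves followers leader-eligible and the client retries.

```java
// Hypothetical sketch of the proposed minRf gating (names are illustrative,
// not the actual Solr implementation).
public class MinRfPolicy {
    /**
     * @param achievedRf     number of replicas (including the leader) that
     *                       successfully applied the update
     * @param requestedMinRf the minRf request parameter, or null if absent
     * @return true if the leader should put failed replicas into
     *         leader-initiated recovery
     */
    public static boolean shouldStartLeaderInitiatedRecovery(int achievedRf,
                                                             Integer requestedMinRf) {
        if (requestedMinRf == null) {
            // No minRf on the request: preserve the existing behavior.
            return true;
        }
        // minRf not met: from the client's perspective the update failed and
        // will be retried, so followers must remain eligible for leadership.
        return achievedRf >= requestedMinRf;
    }
}
```

With minRf=2 and only the partitioned leader applying the update (achievedRf=1), the followers are not put into recovery; without minRf the old behavior is unchanged.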
[jira] [Updated] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery
[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-8034: --- Attachment: SOLR-8034.patch > If minRF is not satisfied, leader should not put replicas in recovery > - > > Key: SOLR-8034 > URL: https://issues.apache.org/jira/browse/SOLR-8034 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Jessica Cheng Mallet > Labels: solrcloud > Attachments: SOLR-8034.patch > > > If the minimum replication factor parameter (minRf) in a solr update request > is not satisfied--i.e. if the update was not successfully applied on at least > n replicas where n >= minRf--the shard leader should not put the failed > replicas in "leader initiated recovery" and the client should retry the > update instead. > This is so that in the scenario where minRf is not satisfied, the failed > replicas can still be eligible to become a leader in case of leader failure, > since in the client's perspective this update did not succeed. > This came up from a network partition scenario where the leader becomes > sectioned off from its two followers, but they all could still talk to > zookeeper. The partitioned leader set its two followers as in leader > initiated recovery, so we couldn't just kill off the partitioned node and > have a follower take over leadership. For a minRf=1 case, this is the correct > behavior because the partitioned leader would have accepted updates that the > followers don't have, and therefore we can't switch leadership or we'd lose > those updates. However, in the case of minRf=2, solr never accepted any > update in the client's point of view, so in fact the partitioned leader > doesn't have any accepted update that the followers don't have, and therefore > the followers should be eligible to become leaders. 
Thus, I'm proposing > modifying the leader initiated recovery logic to not put the followers in > recovery if the minRf parameter is present and is not satisfied. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8034) If minRF is not satisfied, leader should not put replicas in recovery
[ https://issues.apache.org/jira/browse/SOLR-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-8034: --- Description: If the minimum replication factor parameter (minRf) in a solr update request is not satisfied -- i.e. if the update was not successfully applied on at least n replicas where n >= minRf -- the shard leader should not put the failed replicas in "leader initiated recovery" and the client should retry the update instead. This is so that in the scenario where minRf is not satisfied, the failed replicas can still be eligible to become a leader in case of leader failure, since in the client's perspective this update did not succeed. This came up from a network partition scenario where the leader becomes sectioned off from its two followers, but they all could still talk to zookeeper. The partitioned leader set its two followers as in leader initiated recovery, so we couldn't just kill off the partitioned node and have a follower take over leadership. For a minRf=1 case, this is the correct behavior because the partitioned leader would have accepted updates that the followers don't have, and therefore we can't switch leadership or we'd lose those updates. However, in the case of minRf=2, solr never accepted any update in the client's point of view, so in fact the partitioned leader doesn't have any accepted update that the followers don't have, and therefore the followers should be eligible to become leaders. Thus, I'm proposing modifying the leader initiated recovery logic to not put the followers in recovery if the minRf parameter is present and is not satisfied. was: If the minimum replication factor parameter (minRf) in a solr update request is not satisfied--i.e. if the update was not successfully applied on at least n replicas where n >= minRf--the shard leader should not put the failed replicas in "leader initiated recovery" and the client should retry the update instead. 
This is so that in the scenario where minRf is not satisfied, the failed replicas can still be eligible to become a leader in case of leader failure, since in the client's perspective this update did not succeed. This came up from a network partition scenario where the leader becomes sectioned off from its two followers, but they all could still talk to zookeeper. The partitioned leader set its two followers as in leader initiated recovery, so we couldn't just kill off the partitioned node and have a follower take over leadership. For a minRf=1 case, this is the correct behavior because the partitioned leader would have accepted updates that the followers don't have, and therefore we can't switch leadership or we'd lose those updates. However, in the case of minRf=2, solr never accepted any update in the client's point of view, so in fact the partitioned leader doesn't have any accepted update that the followers don't have, and therefore the followers should be eligible to become leaders. Thus, I'm proposing modifying the leader initiated recovery logic to not put the followers in recovery if the minRf parameter is present and is not satisfied. > If minRF is not satisfied, leader should not put replicas in recovery > - > > Key: SOLR-8034 > URL: https://issues.apache.org/jira/browse/SOLR-8034 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Reporter: Jessica Cheng Mallet > Labels: solrcloud > Attachments: SOLR-8034.patch > > > If the minimum replication factor parameter (minRf) in a solr update request > is not satisfied -- i.e. if the update was not successfully applied on at > least n replicas where n >= minRf -- the shard leader should not put the > failed replicas in "leader initiated recovery" and the client should retry > the update instead. 
> This is so that in the scenario where minRf is not satisfied, the failed > replicas can still be eligible to become a leader in case of leader failure, > since in the client's perspective this update did not succeed. > This came up from a network partition scenario where the leader becomes > sectioned off from its two followers, but they all could still talk to > zookeeper. The partitioned leader set its two followers as in leader > initiated recovery, so we couldn't just kill off the partitioned node and > have a follower take over leadership. For a minRf=1 case, this is the correct > behavior because the partitioned leader would have accepted updates that the > followers don't have, and therefore we can't switch leadership or we'd lose > those updates. However, in the case of minRf=2, solr never accepted any > update in the client's point of view, so in fact the partitioned leader > doesn't have any accepted update that the followers don't have, and therefore > the followers should be eligible to become leaders.
[jira] [Commented] (SOLR-7844) Zookeeper session expiry during shard leader election can cause multiple leaders.
[ https://issues.apache.org/jira/browse/SOLR-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723682#comment-14723682 ] Jessica Cheng Mallet commented on SOLR-7844: This looks good to me. The two comments I have are the following: 1. It'd be good to add some "explanation/documentation" type comment in ElectionContext to describe what we're trying to accomplish with the zk multi-transaction. For example, why can we rely on the parent version, etc. 2. Will the change to the leaderProps in ZkController (adding CORE_NODE_NAME_PROP) make this change non-backwards compatible? Related but unrelated--how do you guys usually make line comments in a JIRA patch situation? Here I only have two comments so it's pretty tractable, but I can see it being difficult if it's a large change, etc. Thanks! > Zookeeper session expiry during shard leader election can cause multiple > leaders. > - > > Key: SOLR-7844 > URL: https://issues.apache.org/jira/browse/SOLR-7844 > Project: Solr > Issue Type: Bug >Affects Versions: 4.10.4 >Reporter: Mike Roberts >Assignee: Mark Miller > Fix For: Trunk, 5.4 > > Attachments: SOLR-7844.patch, SOLR-7844.patch, SOLR-7844.patch, > SOLR-7844.patch, SOLR-7844.patch, SOLR-7844.patch, SOLR-7844.patch, > SOLR-7844.patch > > > If the ZooKeeper session expires for a host during shard leader election, the > ephemeral leader_elect nodes are removed. However the threads that were > processing the election are still present (and could believe the host won the > election). They will then incorrectly create leader nodes once a new > ZooKeeper session is established. > This introduces a subtle race condition that could cause two hosts to become > leader. > Scenario: > a three machine cluster, all of the machines are restarting at approximately > the same time. > The first machine starts, writes a leader_elect ephemeral node, it's the only > candidate in the election so it wins and starts the leadership process. 
As it > knows it has peers, it begins to block waiting for the peers to arrive. > During this period of blocking[1] the ZK connection drops and the session > expires. > A new ZK session is established, and ElectionContext.cancelElection is > called. Then register() is called and a new set of leader_elect ephemeral > nodes are created. > During the period between the ZK session expiring, and new set of > leader_elect nodes being created the second machine starts. > It creates its leader_elect ephemeral nodes, as there are no other nodes it > wins the election and starts the leadership process. As it's still missing one > of its peers, it begins to block waiting for the third machine to join. > There is now a race between machine1 & machine2, both of whom think they are > the leader. > So far, this isn't too bad, because the machine that loses the race will fail > when it tries to create the /collection/name/leader/shard1 node (as it > already exists), and will rejoin the election. > While this is happening, machine3 has started and has queued for leadership > behind machine2. > If the loser of the race is machine2, when it rejoins the election it cancels > the current context, deleting its leader_elect ephemeral nodes. > At this point, machine3 believes it has become leader (the watcher it has on > the leader_elect node fires), and it runs the LeaderElector::checkIfIAmLeader > method. This method DELETES the current /collection/name/leader/shard1 node, > then starts the leadership process (as all three machines are now running, it > does not block to wait). > So, machine1 won the race with machine2 and declared its leadership and > created the nodes. However, machine3 has just deleted them, and recreated > them for itself. So machine1 and machine3 both believe they are the leader. 
> I am thinking that the fix should be to cancel & close all election contexts > immediately on reconnect (we do cancel them, however it's run serially which > has blocking issues, and just canceling does not cause the wait loop to > exit). That election context logic already has checks on the closed flag, so > they should exit if they see it has been closed. > I'm working on a patch for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7844) Zookeeper session expiry during shard leader election can cause multiple leaders.
[ https://issues.apache.org/jira/browse/SOLR-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720891#comment-14720891 ] Jessica Cheng Mallet commented on SOLR-7844: The parent's cversion will change if the child node expires, whereas in this case the version won't change--so they won't be in sync. But that's ok. Are you seeing any case where while the leader node stayed there (didn't change) that the parent version changed? Zookeeper session expiry during shard leader election can cause multiple leaders. - Key: SOLR-7844 URL: https://issues.apache.org/jira/browse/SOLR-7844 Project: Solr Issue Type: Bug Affects Versions: 4.10.4 Reporter: Mike Roberts Assignee: Mark Miller Fix For: Trunk, 5.4 Attachments: SOLR-7844.patch, SOLR-7844.patch, SOLR-7844.patch, SOLR-7844.patch If the ZooKeeper session expires for a host during shard leader election, the ephemeral leader_elect nodes are removed. However the threads that were processing the election are still present (and could believe the host won the election). They will then incorrectly create leader nodes once a new ZooKeeper session is established. This introduces a subtle race condition that could cause two hosts to become leader. Scenario: a three machine cluster, all of the machines are restarting at approximately the same time. The first machine starts, writes a leader_elect ephemeral node, it's the only candidate in the election so it wins and starts the leadership process. As it knows it has peers, it begins to block waiting for the peers to arrive. During this period of blocking[1] the ZK connection drops and the session expires. A new ZK session is established, and ElectionContext.cancelElection is called. Then register() is called and a new set of leader_elect ephemeral nodes are created. During the period between the ZK session expiring, and new set of leader_elect nodes being created the second machine starts. 
It creates its leader_elect ephemeral nodes, as there are no other nodes it wins the election and starts the leadership process. As it's still missing one of its peers, it begins to block waiting for the third machine to join. There is now a race between machine1 & machine2, both of whom think they are the leader. So far, this isn't too bad, because the machine that loses the race will fail when it tries to create the /collection/name/leader/shard1 node (as it already exists), and will rejoin the election. While this is happening, machine3 has started and has queued for leadership behind machine2. If the loser of the race is machine2, when it rejoins the election it cancels the current context, deleting its leader_elect ephemeral nodes. At this point, machine3 believes it has become leader (the watcher it has on the leader_elect node fires), and it runs the LeaderElector::checkIfIAmLeader method. This method DELETES the current /collection/name/leader/shard1 node, then starts the leadership process (as all three machines are now running, it does not block to wait). So, machine1 won the race with machine2 and declared its leadership and created the nodes. However, machine3 has just deleted them, and recreated them for itself. So machine1 and machine3 both believe they are the leader. I am thinking that the fix should be to cancel & close all election contexts immediately on reconnect (we do cancel them, however it's run serially which has blocking issues, and just canceling does not cause the wait loop to exit). That election context logic already has checks on the closed flag, so they should exit if they see it has been closed. I'm working on a patch for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7844) Zookeeper session expiry during shard leader election can cause multiple leaders.
[ https://issues.apache.org/jira/browse/SOLR-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720117#comment-14720117 ] Jessica Cheng Mallet commented on SOLR-7844: [~markrmil...@gmail.com], not sure if this is intended--looks like the newly added ShardLeaderElectionContextBase.cancelElection now blindly deletes the leader node, which sounds just as dangerous. From your comment it seems like you just wanted it to expire out, so I'm wondering if it's just a merge bug or something. In general, I think it'd make a lot of sense to predicate the writing of the leader node on the election node still having the same session as the thread thinks (using the same zookeeper multi-transactional semantics as in ZkController.markShardAsDownIfLeader), so that a thread that went GCing before writing the leader node will fail when it comes back since its election node will have expired. Zookeeper session expiry during shard leader election can cause multiple leaders. - Key: SOLR-7844 URL: https://issues.apache.org/jira/browse/SOLR-7844 Project: Solr Issue Type: Bug Affects Versions: 4.10.4 Reporter: Mike Roberts Assignee: Mark Miller Fix For: Trunk, 5.4 Attachments: SOLR-7844.patch, SOLR-7844.patch If the ZooKeeper session expires for a host during shard leader election, the ephemeral leader_elect nodes are removed. However the threads that were processing the election are still present (and could believe the host won the election). They will then incorrectly create leader nodes once a new ZooKeeper session is established. This introduces a subtle race condition that could cause two hosts to become leader. Scenario: a three machine cluster, all of the machines are restarting at approximately the same time. The first machine starts, writes a leader_elect ephemeral node, it's the only candidate in the election so it wins and starts the leadership process. As it knows it has peers, it begins to block waiting for the peers to arrive. 
During this period of blocking[1] the ZK connection drops and the session expires. A new ZK session is established, and ElectionContext.cancelElection is called. Then register() is called and a new set of leader_elect ephemeral nodes are created. During the period between the ZK session expiring, and new set of leader_elect nodes being created the second machine starts. It creates its leader_elect ephemeral nodes, as there are no other nodes it wins the election and starts the leadership process. As it's still missing one of its peers, it begins to block waiting for the third machine to join. There is now a race between machine1 & machine2, both of whom think they are the leader. So far, this isn't too bad, because the machine that loses the race will fail when it tries to create the /collection/name/leader/shard1 node (as it already exists), and will rejoin the election. While this is happening, machine3 has started and has queued for leadership behind machine2. If the loser of the race is machine2, when it rejoins the election it cancels the current context, deleting its leader_elect ephemeral nodes. At this point, machine3 believes it has become leader (the watcher it has on the leader_elect node fires), and it runs the LeaderElector::checkIfIAmLeader method. This method DELETES the current /collection/name/leader/shard1 node, then starts the leadership process (as all three machines are now running, it does not block to wait). So, machine1 won the race with machine2 and declared its leadership and created the nodes. However, machine3 has just deleted them, and recreated them for itself. So machine1 and machine3 both believe they are the leader. I am thinking that the fix should be to cancel & close all election contexts immediately on reconnect (we do cancel them, however it's run serially which has blocking issues, and just canceling does not cause the wait loop to exit). 
That election context logic already has checks on the closed flag, so they should exit if they see it has been closed. I'm working on a patch for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7844) Zookeeper session expiry during shard leader election can cause multiple leaders.
[ https://issues.apache.org/jira/browse/SOLR-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720220#comment-14720220 ] Jessica Cheng Mallet commented on SOLR-7844: {quote}We do want to make sure we only remove our own registration though. We should be able to track that with the znode version and ensure we don't remove another candidate's entry with an optimistic delete?{quote} Sounds good! Thanks for clarifying! Zookeeper session expiry during shard leader election can cause multiple leaders. - Key: SOLR-7844 URL: https://issues.apache.org/jira/browse/SOLR-7844 Project: Solr Issue Type: Bug Affects Versions: 4.10.4 Reporter: Mike Roberts Assignee: Mark Miller Fix For: Trunk, 5.4 Attachments: SOLR-7844.patch, SOLR-7844.patch If the ZooKeeper session expires for a host during shard leader election, the ephemeral leader_elect nodes are removed. However the threads that were processing the election are still present (and could believe the host won the election). They will then incorrectly create leader nodes once a new ZooKeeper session is established. This introduces a subtle race condition that could cause two hosts to become leader. Scenario: a three machine cluster, all of the machines are restarting at approximately the same time. The first machine starts, writes a leader_elect ephemeral node, it's the only candidate in the election so it wins and starts the leadership process. As it knows it has peers, it begins to block waiting for the peers to arrive. During this period of blocking[1] the ZK connection drops and the session expires. A new ZK session is established, and ElectionContext.cancelElection is called. Then register() is called and a new set of leader_elect ephemeral nodes are created. During the period between the ZK session expiring, and new set of leader_elect nodes being created the second machine starts. 
It creates its leader_elect ephemeral nodes, as there are no other nodes it wins the election and starts the leadership process. As it's still missing one of its peers, it begins to block waiting for the third machine to join. There is now a race between machine1 & machine2, both of whom think they are the leader. So far, this isn't too bad, because the machine that loses the race will fail when it tries to create the /collection/name/leader/shard1 node (as it already exists), and will rejoin the election. While this is happening, machine3 has started and has queued for leadership behind machine2. If the loser of the race is machine2, when it rejoins the election it cancels the current context, deleting its leader_elect ephemeral nodes. At this point, machine3 believes it has become leader (the watcher it has on the leader_elect node fires), and it runs the LeaderElector::checkIfIAmLeader method. This method DELETES the current /collection/name/leader/shard1 node, then starts the leadership process (as all three machines are now running, it does not block to wait). So, machine1 won the race with machine2 and declared its leadership and created the nodes. However, machine3 has just deleted them, and recreated them for itself. So machine1 and machine3 both believe they are the leader. I am thinking that the fix should be to cancel & close all election contexts immediately on reconnect (we do cancel them, however it's run serially which has blocking issues, and just canceling does not cause the wait loop to exit). That election context logic already has checks on the closed flag, so they should exit if they see it has been closed. I'm working on a patch for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
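The "optimistic delete" agreed on above can be illustrated with a toy in-memory model. This is not the ZooKeeper client API, just a hypothetical sketch of the version-conditioned delete semantics (ZooKeeper's `delete(path, expectedVersion)`): a candidate only removes its registration if the znode version still matches what it last saw, so it cannot clobber another candidate's entry.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model (not the ZooKeeper API) of a version-conditioned
// delete: removal succeeds only if the caller's expected version matches.
public class VersionedStore {
    private final Map<String, Integer> versions = new HashMap<>(); // path -> version

    public void create(String path) { versions.put(path, 0); }

    /** Models another candidate re-registering at the same path (bumps version). */
    public void set(String path) { versions.merge(path, 1, Integer::sum); }

    /** Analogue of ZooKeeper delete(path, expectedVersion): fails on mismatch. */
    public boolean delete(String path, int expectedVersion) {
        Integer v = versions.get(path);
        if (v == null || v != expectedVersion) {
            return false; // stale view: leave the other candidate's entry alone
        }
        versions.remove(path);
        return true;
    }
}
```

A delete attempted with a stale version is rejected, which is exactly the safety property the comment asks for.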
[jira] [Commented] (SOLR-7844) Zookeeper session expiry during shard leader election can cause multiple leaders.
[ https://issues.apache.org/jira/browse/SOLR-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720410#comment-14720410 ] Jessica Cheng Mallet commented on SOLR-7844: Ah, well, create means version is 0 (or whatever initial version is) right? Otherwise you get a NodeExists back. Hmm... Zookeeper session expiry during shard leader election can cause multiple leaders. - Key: SOLR-7844 URL: https://issues.apache.org/jira/browse/SOLR-7844 Project: Solr Issue Type: Bug Affects Versions: 4.10.4 Reporter: Mike Roberts Assignee: Mark Miller Fix For: Trunk, 5.4 Attachments: SOLR-7844.patch, SOLR-7844.patch, SOLR-7844.patch If the ZooKeeper session expires for a host during shard leader election, the ephemeral leader_elect nodes are removed. However the threads that were processing the election are still present (and could believe the host won the election). They will then incorrectly create leader nodes once a new ZooKeeper session is established. This introduces a subtle race condition that could cause two hosts to become leader. Scenario: a three machine cluster, all of the machines are restarting at approximately the same time. The first machine starts, writes a leader_elect ephemeral node, it's the only candidate in the election so it wins and starts the leadership process. As it knows it has peers, it begins to block waiting for the peers to arrive. During this period of blocking[1] the ZK connection drops and the session expires. A new ZK session is established, and ElectionContext.cancelElection is called. Then register() is called and a new set of leader_elect ephemeral nodes are created. During the period between the ZK session expiring, and new set of leader_elect nodes being created the second machine starts. It creates its leader_elect ephemeral nodes, as there are no other nodes it wins the election and starts the leadership process. 
As it's still missing one of its peers, it begins to block waiting for the third machine to join. There is now a race between machine1 & machine2, both of whom think they are the leader. So far, this isn't too bad, because the machine that loses the race will fail when it tries to create the /collection/name/leader/shard1 node (as it already exists), and will rejoin the election. While this is happening, machine3 has started and has queued for leadership behind machine2. If the loser of the race is machine2, when it rejoins the election it cancels the current context, deleting its leader_elect ephemeral nodes. At this point, machine3 believes it has become leader (the watcher it has on the leader_elect node fires), and it runs the LeaderElector::checkIfIAmLeader method. This method DELETES the current /collection/name/leader/shard1 node, then starts the leadership process (as all three machines are now running, it does not block to wait). So, machine1 won the race with machine2 and declared its leadership and created the nodes. However, machine3 has just deleted them, and recreated them for itself. So machine1 and machine3 both believe they are the leader. I am thinking that the fix should be to cancel & close all election contexts immediately on reconnect (we do cancel them, however it's run serially which has blocking issues, and just canceling does not cause the wait loop to exit). That election context logic already has checks on the closed flag, so they should exit if they see it has been closed. I'm working on a patch for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7844) Zookeeper session expiry during shard leader election can cause multiple leaders.
[ https://issues.apache.org/jira/browse/SOLR-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720713#comment-14720713 ] Jessica Cheng Mallet commented on SOLR-7844: Since you're doing a setData on the parent (and thereby bumping the parent's version) each time you create the leaderPath, you should be able to rely on the parent's version as well, instead of its cversion. Since you're doing the multi already, might as well add ops.add(Op.check(leaderSeqPath, -1)); right before the Op.create? Zookeeper session expiry during shard leader election can cause multiple leaders. - Key: SOLR-7844 URL: https://issues.apache.org/jira/browse/SOLR-7844 Project: Solr Issue Type: Bug Affects Versions: 4.10.4 Reporter: Mike Roberts Assignee: Mark Miller Fix For: Trunk, 5.4 Attachments: SOLR-7844.patch, SOLR-7844.patch, SOLR-7844.patch, SOLR-7844.patch If the ZooKeeper session expires for a host during shard leader election, the ephemeral leader_elect nodes are removed. However the threads that were processing the election are still present (and could believe the host won the election). They will then incorrectly create leader nodes once a new ZooKeeper session is established. This introduces a subtle race condition that could cause two hosts to become leader. Scenario: a three machine cluster, all of the machines are restarting at approximately the same time. The first machine starts, writes a leader_elect ephemeral node, it's the only candidate in the election so it wins and starts the leadership process. As it knows it has peers, it begins to block waiting for the peers to arrive. During this period of blocking[1] the ZK connection drops and the session expires. A new ZK session is established, and ElectionContext.cancelElection is called. Then register() is called and a new set of leader_elect ephemeral nodes are created. 
During the period between the ZK session expiring, and new set of leader_elect nodes being created the second machine starts. It creates its leader_elect ephemeral nodes, as there are no other nodes it wins the election and starts the leadership process. As it's still missing one of its peers, it begins to block waiting for the third machine to join. There is now a race between machine1 & machine2, both of whom think they are the leader. So far, this isn't too bad, because the machine that loses the race will fail when it tries to create the /collection/name/leader/shard1 node (as it already exists), and will rejoin the election. While this is happening, machine3 has started and has queued for leadership behind machine2. If the loser of the race is machine2, when it rejoins the election it cancels the current context, deleting its leader_elect ephemeral nodes. At this point, machine3 believes it has become leader (the watcher it has on the leader_elect node fires), and it runs the LeaderElector::checkIfIAmLeader method. This method DELETES the current /collection/name/leader/shard1 node, then starts the leadership process (as all three machines are now running, it does not block to wait). So, machine1 won the race with machine2 and declared its leadership and created the nodes. However, machine3 has just deleted them, and recreated them for itself. So machine1 and machine3 both believe they are the leader. I am thinking that the fix should be to cancel & close all election contexts immediately on reconnect (we do cancel them, however it's run serially which has blocking issues, and just canceling does not cause the wait loop to exit). That election context logic already has checks on the closed flag, so they should exit if they see it has been closed. I'm working on a patch for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
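The multi-transaction idea discussed above (check the election node, then create the leader node, atomically) can be sketched with a toy in-memory model. This is deliberately not the ZooKeeper API; it only models why bundling an `Op.check` on the caller's election znode with the `Op.create` of the leader path makes a thread that resumed after session expiry fail safely: its election node is gone, so the whole multi fails.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model (not the ZooKeeper API) of the multi-transaction:
// Op.check(electionNode) + Op.create(leaderPath) succeed or fail together.
public class ElectionStore {
    private final Map<String, Integer> nodes = new HashMap<>(); // path -> version

    public void create(String path) { nodes.put(path, 0); }
    public void delete(String path) { nodes.remove(path); } // models session expiry

    /** Both checks pass or the whole "transaction" fails; nothing partial happens. */
    public boolean checkAndCreateLeader(String electionNode, String leaderPath) {
        if (!nodes.containsKey(electionNode)) {
            return false; // election znode expired: stale thread must not win
        }
        if (nodes.containsKey(leaderPath)) {
            return false; // NodeExists: another machine already claimed leadership
        }
        nodes.put(leaderPath, 0);
        return true;
    }
}
```

In the machine1/machine3 race above, a thread whose session expired between winning the election and creating the leader node would fail the check instead of overwriting a live leader.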
[jira] [Created] (SOLR-7694) Allow setting an overall client request timeout that includes retries
Jessica Cheng Mallet created SOLR-7694: -- Summary: Allow setting an overall client request timeout that includes retries Key: SOLR-7694 URL: https://issues.apache.org/jira/browse/SOLR-7694 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Jessica Cheng Mallet Currently we're able to set a socket timeout on the underlying httpClient of an LBHttpSolrServer (used by CloudSolrServer). However, this timeout only applies to a single request issued from LBHttpSolrServer; LBHttpSolrServer will go on to try all eligible candidate servers when a SocketTimeoutException is thrown, so the request can in fact take (socketTimeout * number of eligible servers) time to return from the caller's perspective. This is hard to predict. We should allow setting an overall client request timeout apart from the single-request socketTimeout, so that the request call is guaranteed to terminate by this timeout (either via success or via a timeout exception). This allows the client application to properly size its timeout and request thread pools to avoid request-thread exhaustion if Solr is experiencing issues.
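The overall-timeout idea can be sketched with plain java.util.concurrent: run the (possibly retrying) request on an executor and bound the total wall-clock time with Future.get, independent of how many candidate servers a single attempt fans out to. This is a sketch under assumptions, not SolrJ code; `callWithDeadline` and the Callable stand in for the LBHttpSolrServer request path.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class OverallTimeoutSketch {
    // Bound the whole request (retries included) by one deadline.
    static <T> T callWithDeadline(Callable<T> request, long overallMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<T> f = pool.submit(request);
            try {
                return f.get(overallMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                f.cancel(true); // interrupt the in-flight attempt
                throw e;        // caller sees a timeout regardless of retry count
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A production version would reuse a shared pool rather than creating one per call; the point is only that the deadline is enforced outside the retry loop.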
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540687#comment-14540687 ] Jessica Cheng Mallet commented on SOLR-6220: This doesn't seem to handle addReplica. I think it'd be nice to merge in the logic of the Assign class and get rid of it completely so there's just one place to handle any kind of replica assignment. Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch h1.Objective Most cloud-based systems allow specifying rules on how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control allocation of replicas, or later change it to suit the needs of the system. All configurations are on a per-collection basis. The rules are applied whenever a replica is created in any of the shards in a given collection during: * collection creation * shard splitting * add replica * createshard There are two aspects to how replicas are placed: snitch and placement. h2.snitch How to identify the tags of nodes. Snitches are configured through the collection create command with the snitch param, e.g. snitch=EC2Snitch or snitch=class:EC2Snitch h2.ImplicitSnitch This is shipped by default with Solr. Users do not need to specify {{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are present in the rules, it is automatically used. Tags provided by ImplicitSnitch: # cores : no. of cores in the node # disk : disk space available in the node # host : host name of the node # node : node name # D.* : these are values available from system properties. 
{{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during node startup. It is possible to use rules like {{D.key:expectedVal,shard:*}} h2.Rules This tells how many replicas for a given shard need to be assigned to nodes with the given key-value pairs. These parameters will be passed on to the collection CREATE API as a multivalued parameter rule. The values will be saved in the state of the collection as follows {code:Javascript} { "mycollection":{ "snitch": { class:"ImplicitSnitch" } "rules":[{cores:4-}, {replica:1, shard:*, node:*}, {disk:100}] } {code} A rule is specified in a pseudo-JSON syntax, which is a map of keys and values. * Each collection can have any number of rules. As long as the rules do not conflict with each other it should be OK; otherwise an error is thrown. * In each rule, shard and replica can be omitted ** the default value of replica is {{\*}}, meaning ANY, or you can specify a count and an operand such as {{<}} (less than) or {{>}} (greater than) ** and the value of shard can be a shard name, or {{\*}} meaning EACH, or {{**}} meaning ANY. The default value is {{\*\*}} (ANY). * There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}. * All keys other than {{shard}} and {{replica}} are called tags, and the tags are nothing but values provided by the snitch for each node. * By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly. h3.How are nodes picked up? Nodes are not picked at random. The rules are used to first sort the nodes according to affinity. For example, if there is a rule that says {{disk:100+}}, nodes with more disk space are given higher preference. And if the rule is {{disk:100-}}, nodes with less disk space will be given priority. 
If everything else is equal, nodes with fewer cores are given higher priority. h3.Fuzzy match Fuzzy match can be applied when strict matches fail. The values can be suffixed with {{~}} to specify fuzziness. Example rules: {noformat} #Example requirement: use only one replica of a shard in a host if possible; if no matches are found, relax that rule. rack:*,shard:*,replica:2~ #Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if not possible. This will ensure that if no node exists with a 100GB disk, nodes are picked in order of size, say an 85GB node would be picked over an 80GB node. disk:100~ {noformat} Examples: {noformat} #in each rack there can be max two replicas of A given shard rack:*,shard:*,replica:3 //in each rack there can be max two replicas of ANY replica rack:*,shard:**,replica:2 rack:*,replica:3 #in each node there should be a max one
[jira] [Commented] (SOLR-6220) Replica placement strategy for solrcloud
[ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526889#comment-14526889 ] Jessica Cheng Mallet commented on SOLR-6220: It'll also be nice to have a new collection API to modify the rule for a collection so that we can add rules for an existing collection or modify a bad rule set. Replica placement strategy for solrcloud Key: SOLR-6220 URL: https://issues.apache.org/jira/browse/SOLR-6220 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch h1.Objective Most cloud based systems allow to specify rules on how the replicas/nodes of a cluster are allocated . Solr should have a flexible mechanism through which we should be able to control allocation of replicas or later change it to suit the needs of the system All configurations are per collection basis. The rules are applied whenever a replica is created in any of the shards in a given collection during * collection creation * shard splitting * add replica * createsshard There are two aspects to how replicas are placed: snitch and placement. h2.snitch How to identify the tags of nodes. Snitches are configured through collection create command with the snitch param . eg: snitch=EC2Snitch or snitch=class:EC2Snitch h2.ImplicitSnitch This is shipped by default with Solr. user does not need to specify {{ImplicitSnitch}} in configuration. If the tags known to ImplicitSnitch are present in the rules , it is automatically used, tags provided by ImplicitSnitch # cores : No:of cores in the node # disk : Disk space available in the node # host : host name of the node # node: node name # D.* : These are values available from systrem propertes. {{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during the node startup. 
It is possible to use rules like {{D.key:expectedVal,shard:*}} h2.Rules This tells how many replicas for a given shard needs to be assigned to nodes with the given key value pairs. These parameters will be passed on to the collection CREATE api as a multivalued parameter rule . The values will be saved in the state of the collection as follows {code:Javascript} { “mycollection”:{ “snitch”: { class:“ImplicitSnitch” } “rules”:[{cores:4-}, {replica:1 ,shard :* ,node:*}, {disk:100}] } {code} A rule is specified as a pseudo JSON syntax . which is a map of keys and values *Each collection can have any number of rules. As long as the rules do not conflict with each other it should be OK. Or else an error is thrown * In each rule , shard and replica can be omitted ** default value of replica is {{\*}} means ANY or you can specify a count and an operand such as {{}} (less than) or {{}} (greater than) ** and the value of shard can be a shard name or {{\*}} means EACH or {{**}} means ANY. default value is {{\*\*}} (ANY) * There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}. * all keys other than {{shard}} and {{replica}} are called tags and the tags are nothing but values provided by the snitch for each node * By default certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly h3.How are nodes picked up? Nodes are not picked up in random. The rules are used to first sort the nodes according to affinity. For example, if there is a rule that says {{disk:100+}} , nodes with more disk space are given higher preference. And if the rule is {{disk:100-}} nodes with lesser disk space will be given priority. 
If everything else is equal , nodes with fewer cores are given higher priority h3.Fuzzy match Fuzzy match can be applied when strict matches fail .The values can be prefixed {{~}} to specify fuzziness example rule {noformat} #Example requirement use only one replica of a shard in a host if possible, if no matches found , relax that rule. rack:*,shard:*,replica:2~ #Another example, assign all replicas to nodes with disk space of 100GB or more,, or relax the rule if not possible. This will ensure that if a node does not exist with 100GB disk, nodes are picked up the order of size say a 85GB node would be picked up over 80GB disk node disk:100~ {noformat} Examples: {noformat} #in each rack there can be max two replicas of A given shard rack:*,shard:*,replica:3 //in each rack there can be max two replicas of ANY replica rack:*,shard:**,replica:2 rack:*,replica:3 #in each node there should be a max one replica of EACH shard
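The node-affinity sort described above can be sketched minimally, assuming a hypothetical Node holder with disk and core counts (these names are illustrative, not Solr's): a {{disk:100+}}-style rule prefers nodes with more free disk, and fewer cores break ties.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NodeSortSketch {
    // Hypothetical node descriptor; a snitch would supply these tag values.
    static class Node {
        final String name; final long diskGb; final int cores;
        Node(String name, long diskGb, int cores) {
            this.name = name; this.diskGb = diskGb; this.cores = cores;
        }
    }

    // "disk:100+" affinity: larger disk first; if disk is equal, fewer cores win.
    static List<Node> sortForDiskPlus(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingLong((Node n) -> -n.diskGb)
                              .thenComparingInt(n -> n.cores));
        return sorted;
    }
}
```

A "disk:100-" rule would simply flip the sign on the first key.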
[jira] [Commented] (SOLR-7361) Main Jetty thread blocked by core loading delays HTTP listener from binding if core loading is slow
[ https://issues.apache.org/jira/browse/SOLR-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483985#comment-14483985 ] Jessica Cheng Mallet commented on SOLR-7361: I think I have also seen cases where if we bounced two nodes holding two replicas of a particular collection/shard, then they both can't complete their recovery because they can't talk to each other. This fixes itself eventually when they time out waiting for each other, but before that happens they're basically deadlocked. (Unfortunately I don't have logs to back that up anymore, so it's more of an anecdotal account.) Main Jetty thread blocked by core loading delays HTTP listener from binding if core loading is slow --- Key: SOLR-7361 URL: https://issues.apache.org/jira/browse/SOLR-7361 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Timothy Potter During server startup, the CoreContainer uses an ExecutorService to load cores in multiple background threads but then blocks until cores are loaded; see CoreContainer#load around line 290 on trunk (invokeAll). From the JavaDoc on that method, we have: {quote} Executes the given tasks, returning a list of Futures holding their status and results when all complete. Future.isDone() is true for each element of the returned list. {quote} In other words, this is a blocking call. This delays the Jetty HTTP listener from binding and accepting requests until all cores are loaded. Do we need to block the main thread? Also, prior to this happening, the node is registered as a live node in ZK, which makes it a candidate for receiving requests from the Overseer, such as to service a create-collection request. The problem of course is that the node listed in /live_nodes isn't accepting requests yet. So we either need to unblock the main thread during server loading or maybe wait longer before we register as a live node ... not sure which is the better way forward? 
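The non-blocking alternative to invokeAll discussed above can be sketched as follows. `loadAllAsync` and the string tasks are hypothetical stand-ins for core loading; the point is only that `submit` returns futures immediately, so the caller (e.g. the thread that binds the HTTP listener) can proceed while cores load in the background.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class AsyncCoreLoadSketch {
    // Unlike invokeAll, this returns as soon as all tasks are queued,
    // not when they complete. Callers can inspect the futures later.
    static List<Future<String>> loadAllAsync(ExecutorService pool, List<String> coreNames) {
        List<Future<String>> futures = new ArrayList<>();
        for (String name : coreNames) {
            futures.add(pool.submit(() -> name + ":loaded")); // stand-in for loading a core
        }
        return futures;
    }
}
```

The trade-off the issue raises remains: a request may arrive for a core whose future has not completed, so live-node registration would need to be ordered accordingly.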
[jira] [Closed] (SOLR-6722) Distributed function query sumtotaltermfreq does not return correct aggregated result
[ https://issues.apache.org/jira/browse/SOLR-6722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet closed SOLR-6722. -- Resolution: Won't Fix We implemented our own plugin to handle the aggregation correctly using a different field than maxScore. Distributed function query sumtotaltermfreq does not return correct aggregated result - Key: SOLR-6722 URL: https://issues.apache.org/jira/browse/SOLR-6722 Project: Solr Issue Type: Bug Reporter: Jessica Cheng Mallet Labels: aggregate, distributed, function The relevancy function query sumtotaltermfreq uses maxScore to return its result. However, in distributed mode, max is the incorrect aggregation function for sumtotaltermfreq. Instead, the sum should be returned. For example, in the following break-down of 3 shards, we expect the sumtotaltermfreq to be 1802.0 + 1693.0 + 1693.0, but instead the overall query returns a maxScore of 1802.0, which is the max but not the answer we want, and the sum is not returned anywhere. 
{ responseHeader:{ status:0, QTime:4, params:{ debugQuery:true), indent:true, q:sumtotaltermfreq(field1), wt:json, rows:0, defType:func}}, response:{numFound”:477,”start:0,maxScore:1802.0,docs:[] }, debug:{ track:{ rid:-collection1_shard1_replica1-1415238629909-9, EXECUTE_QUERY:[ http://host1 ip:8983/solr/collection1_shard2_replica1/|http://host2 ip:8984/solr/collection1_shard2_replica2/,[ QTime,1, ElapsedTime,2, RequestPurpose,GET_TOP_IDS, NumFound,165, Response,{responseHeader={status=0,QTime=1,params={distrib=false,debug=track,wt=javabin,requestPurpose=GET_TOP_IDS,version=2,rows=0,defType=func,NOW=1415238629908,shard.url=http://host1 ip:8983/solr/collection1_shard2_replica1/|http://host2 ip:8984/solr/collection1_shard2_replica2/,df=text,debugQuery=false,fl=uuid,score,rid=-collection1_shard1_replica1-1415238629909-9,start=0,q=sumtotaltermfreq(field1),isShard=true,fsv=true}},response={numFound=165,start=0,maxScore=1802.0,docs=[]},sort_values={},debug={}}], http://host2 ip:8985/solr/collection1_shard1_replica1/|http://host1 ip:8986/solr/collection1_shard1_replica2/,[ QTime,0, ElapsedTime,2, RequestPurpose,GET_TOP_IDS, NumFound,145, Response,{responseHeader={status=0,QTime=0,params={distrib=false,debug=track,wt=javabin,requestPurpose=GET_TOP_IDS,version=2,rows=0,defType=func,NOW=1415238629908,shard.url=http://host2 ip:8985/solr/collection1_shard1_replica1/|http://host1 ip:8986/solr/collection1_shard1_replica2/,df=text,debugQuery=false,fl=uuid,score,rid=-collection1_shard1_replica1-1415238629909-9,start=0,q=sumtotaltermfreq(field1),isShard=true,fsv=true}},response={numFound=145,start=0,maxScore=1693.0,docs=[]},sort_values={},debug={}}], http://host2 ip:8988/solr/collection1_shard3_replica1/|http://host1 ip:8987/solr/collection1_shard3_replica2/,[ QTime,0, ElapsedTime,2, RequestPurpose,GET_TOP_IDS, NumFound,167, 
Response,{responseHeader={status=0,QTime=0,params={distrib=false,debug=track,wt=javabin,requestPurpose=GET_TOP_IDS,version=2,rows=0,defType=func,NOW=1415238629908,shard.url=http://host2 ip:8988/solr/collection1_shard3_replica1/|http://host1 ip:8987/solr/collection1_shard3_replica2/,df=text,debugQuery=false,fl=uuid,score,rid=-collection1_shard1_replica1-1415238629909-9,start=0,q=sumtotaltermfreq(field1),isShard=true,fsv=true}},response={numFound=167,start=0,maxScore=1693.0,docs=[]},sort_values={},debug={}}]]}, explain:{}}}
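The aggregation mismatch can be shown with plain doubles standing in for the per-shard sumtotaltermfreq values from the debug output above: merging with max (which is what piggybacking on maxScore effectively does) loses the answer that a sum merge recovers. A sketch only, not the actual distributed-search merge code:

```java
import java.util.List;

public class ShardMergeSketch {
    // What the coordinator effectively does today when it reuses maxScore.
    static double mergeAsMax(List<Double> shardValues) {
        return shardValues.stream().mapToDouble(Double::doubleValue).max().orElse(0);
    }

    // What sumtotaltermfreq actually needs: the sum across shards.
    static double mergeAsSum(List<Double> shardValues) {
        return shardValues.stream().mapToDouble(Double::doubleValue).sum();
    }
}
```

With the shard values from this report (1802.0, 1693.0, 1693.0), the max merge yields 1802.0 while the sum merge yields 5188.0, the expected answer.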
[jira] [Commented] (SOLR-6521) CloudSolrServer should synchronize cache cluster state loading
[ https://issues.apache.org/jira/browse/SOLR-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285925#comment-14285925 ] Jessica Cheng Mallet commented on SOLR-6521: The patch is locking the entire cache for all loading, which might not be an ideal solution for a cluster with many, many collections. Guava's implementation of LocalCache would only lock and wait on Segments, which increases the concurrency level (which is tunable). CloudSolrServer should synchronize cache cluster state loading -- Key: SOLR-6521 URL: https://issues.apache.org/jira/browse/SOLR-6521 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Priority: Critical Labels: SolrCloud Fix For: 5.0, Trunk Attachments: SOLR-6521.patch Under heavy load-testing with the new solrj client that caches the cluster state instead of setting a watcher, I started seeing lots of zk connection loss on the client side when refreshing the CloudSolrServer collectionStateCache, and this was causing crazy client-side 99.9% latency (~15 sec). I swapped the cache out with guava's LoadingCache (which does locking to ensure only one thread loads the content under one key while the other threads that want the same key wait) and the connection loss went away and the 99.9% latency also went down to just about 1 sec.
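The per-key loading behavior credited to Guava's LoadingCache above can also be sketched with only the JDK: ConcurrentHashMap.computeIfAbsent guarantees the loader runs at most once per key while other readers of that key wait, and it contends per bin rather than on one global lock. All names here are hypothetical, and the String "state" stands in for a collection's cluster state.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class PerKeyLoadSketch {
    final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger(); // counts real loads, for illustration

    // Only one thread runs the loader for a given collection; concurrent
    // callers for the same key block until that load completes.
    String getState(String collection, Function<String, String> loader) {
        return cache.computeIfAbsent(collection, c -> {
            loads.incrementAndGet();
            return loader.apply(c);
        });
    }
}
```

Unlike a single synchronized cache, loads for different collections proceed in parallel, which addresses the "many, many collections" concern.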
[jira] [Commented] (SOLR-6521) CloudSolrServer should synchronize cache cluster state loading
[ https://issues.apache.org/jira/browse/SOLR-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286062#comment-14286062 ] Jessica Cheng Mallet commented on SOLR-6521: bq. I agree that the concurrency can be dramatically improved. Using Guava may not be an option because it is not yet a dependency of SolrJ. The other option would be to make the cache pluggable through an API. So, if you have Guava or something else in your package you can plug it in through an API. That'd be awesome! CloudSolrServer should synchronize cache cluster state loading -- Key: SOLR-6521 URL: https://issues.apache.org/jira/browse/SOLR-6521 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Priority: Critical Labels: SolrCloud Fix For: 5.0, Trunk Attachments: SOLR-6521.patch Under heavy load-testing with the new solrj client that caches the cluster state instead of setting a watcher, I started seeing lots of zk connection loss on the client side when refreshing the CloudSolrServer collectionStateCache, and this was causing crazy client-side 99.9% latency (~15 sec). I swapped the cache out with guava's LoadingCache (which does locking to ensure only one thread loads the content under one key while the other threads that want the same key wait) and the connection loss went away and the 99.9% latency also went down to just about 1 sec.
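The pluggable-cache API floated in the quoted comment might look like the following. `StateCache`, `getOrLoad`, and `MapBacked` are hypothetical names, and the map-backed class is only a no-eviction placeholder where a Guava-backed (or any other) implementation could be plugged in.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal cache abstraction the client could depend on, keeping Guava optional.
public interface StateCache<K, V> {
    V getOrLoad(K key, Function<K, V> loader);

    // Trivial implementation with no eviction, for illustration only.
    class MapBacked<K, V> implements StateCache<K, V> {
        private final Map<K, V> map = new HashMap<>();
        public synchronized V getOrLoad(K key, Function<K, V> loader) {
            return map.computeIfAbsent(key, loader);
        }
    }
}
```

A Guava implementation would wrap a LoadingCache behind the same interface without adding the dependency to the core client.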
[jira] [Created] (SOLR-6854) Stale cached state in CloudSolrServer
Jessica Cheng Mallet created SOLR-6854: -- Summary: Stale cached state in CloudSolrServer Key: SOLR-6854 URL: https://issues.apache.org/jira/browse/SOLR-6854 Project: Solr Issue Type: Bug Components: SolrCloud, SolrJ Reporter: Jessica Cheng Mallet CloudSolrServer's cached state is not being updated for a newly created collection if we started polling for the collection state too early and a down state is cached. Requests to the newly created collection continue to fail with No live SolrServers available to handle this request until the cache is invalidated by time. Logging on the client side reveals that while the state in ZkStateReader is updated to active, the cached state in CloudSolrServer remains down. {quote} CloudSolrServer cached state: DocCollection(collection-1418250319268)={ shards:{shard1:{ range:8000-7fff, state:active, replicas:{core_node1:{ state:down, base_url:http://localhost:8983/solr, core:collection-1418250319268_shard1_replica1, node_name:localhost:8983_solr, maxShardsPerNode:1, external:true, router:{name:compositeId}, replicationFactor:1} ZkStateReader state: DocCollection(collection-1418250319268)={ shards:{shard1:{ range:8000-7fff, state:active, replicas:{core_node1:{ state:active, base_url:http://localhost:8983/solr, core:collection-1418250319268_shard1_replica1, node_name:localhost:8983_solr, leader:true, maxShardsPerNode:1, router:{name:compositeId}, external:true, replicationFactor:1} {quote}
[jira] [Updated] (SOLR-6854) Stale cached state in CloudSolrServer
[ https://issues.apache.org/jira/browse/SOLR-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-6854: --- Description: CloudSolrServer’s cached state is not being updated for a newly created collection if we started polling for the collection state too early and a down state is cached. Requests to the newly created collection continues to fail with No live SolrServers available to handle this request until the cache is invalidated by time. Logging on the client side reveals that while the state in ZkStateReader is updated to active, the cached state in CloudSolrServer remains in down. {quote} CloudSolrServer cached state: DocCollection(collection-1418250319268)={ shards:{shard1:{ range:8000-7fff, state:active, replicas:{core_node1:{ state:down, base_url:http://localhost:8983/solr;, core:collection-1418250319268_shard1_replica1, node_name:localhost:8983_solr, maxShardsPerNode:1, external:true, router:{ name:compositeId}, replicationFactor:1”} ZkStateReader state: DocCollection(collection-1418250319268)={ shards:{shard1:{ range:8000-7fff, state:active, replicas:{core_node1:{ state:active, base_url:http://localhost:8983/solr;, core:collection-1418250319268_shard1_replica1, node_name:localhost:8983_solr, leader:true, maxShardsPerNode:1, router:{ name:compositeId}, external:true, replicationFactor:1”} {quote} was: CloudSolrServer’s cached state is not being updated for a newly created collection if we started polling for the collection state too early and a down state is cached. Requests to the newly created collection continues to fail with No live SolrServers available to handle this request until the cache is invalidated by time. Logging on the client side reveals that while the state in ZkStateReader is updated to active, the cached state in CloudSolrServer remains in down. 
{quote} CloudSolrServer cached state: DocCollection(collection-1418250319268)={ shards:{shard1:{ range:8000-7fff, state:active, replicas:{core_node1:{ state:down, base_url:http://localhost:8983/solr;, core:collection-1418250319268_shard1_replica1, node_name:localhost:8983_solr, maxShardsPerNode:1, external:true, router:{name:compositeId}, replicationFactor:1”} ZkStateReader state: DocCollection(collection-1418250319268)={ shards:{shard1:{ range:8000-7fff, state:active, replicas:{core_node1:{ state:active, base_url:http://localhost:8983/solr;, core:collection-1418250319268_shard1_replica1, node_name:localhost:8983_solr, leader:true, maxShardsPerNode:1, router:{name:compositeId}, external:true, replicationFactor:1”} {quote} Stale cached state in CloudSolrServer - Key: SOLR-6854 URL: https://issues.apache.org/jira/browse/SOLR-6854 Project: Solr Issue Type: Bug Components: SolrCloud, SolrJ Reporter: Jessica Cheng Mallet Labels: cache, solrcloud, solrj CloudSolrServer’s cached state is not being updated for a newly created collection if we started polling for the collection state too early and a down state is cached. Requests to the newly created collection continues to fail with No live SolrServers available to handle this request until the cache is invalidated by time. Logging on the client side reveals that while the state in ZkStateReader is updated to active, the cached state in CloudSolrServer remains in down. 
{quote} CloudSolrServer cached state: DocCollection(collection-1418250319268)={ shards:{shard1:{ range:8000-7fff, state:active, replicas:{core_node1:{ state:down, base_url:http://localhost:8983/solr, core:collection-1418250319268_shard1_replica1, node_name:localhost:8983_solr, maxShardsPerNode:1, external:true, router:{ name:compositeId}, replicationFactor:1} ZkStateReader state: DocCollection(collection-1418250319268)={ shards:{shard1:{ range:8000-7fff, state:active, replicas:{core_node1:{ state:active, base_url:http://localhost:8983/solr, core:collection-1418250319268_shard1_replica1, node_name:localhost:8983_solr, leader:true, maxShardsPerNode:1, router:{ name:compositeId}, external:true, replicationFactor:1} {quote}
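The "invalidated by time" behavior the report relies on can be sketched as a TTL cache with an injected clock (all names hypothetical): each entry carries a load timestamp, and once it is older than the TTL the stale down state gets reloaded. Until then, the client keeps serving the stale entry, which is exactly the failure window described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

public class TtlCacheSketch<K, V> {
    static final class Entry<V> {
        final V value; final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so staleness is testable

    public TtlCacheSketch(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    public V get(K key, Function<K, V> loader) {
        Entry<V> e = map.get(key);
        long now = clock.getAsLong();
        if (e == null || now - e.loadedAt >= ttlMillis) { // stale or missing: reload
            e = new Entry<>(loader.apply(key), now);
            map.put(key, e);
        }
        return e.value; // within the TTL, a cached "down" state keeps being returned
    }
}
```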
[jira] [Commented] (SOLR-6626) NPE in FieldMutatingUpdateProcessor when indexing a doc with null field value
[ https://issues.apache.org/jira/browse/SOLR-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244683#comment-14244683 ] Jessica Cheng Mallet commented on SOLR-6626: A similar NPE is happening in AllValuesOrNoneFieldMutatingUpdateProcessor. Should I open a new JIRA for that? NPE in FieldMutatingUpdateProcessor when indexing a doc with null field value - Key: SOLR-6626 URL: https://issues.apache.org/jira/browse/SOLR-6626 Project: Solr Issue Type: Bug Affects Versions: 4.9 Reporter: Paul Baclace Assignee: Noble Paul Fix For: 5.0, Trunk NullPointerException when indexing a JSON doc with a null field. 1. run the example-schemaless 2. visit http://localhost:8983/solr/#/collection1/documents 3. put a doc { id:fooop } and it succeeds 4. put a doc { id:fooop, exampleField:null } and an NPE results. This could be considered a regression of SOLR-2714, which was resolved in v3.6, but the error occurs when the null-containing doc is added instead of during parsing. 
Stacktrace: ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.NullPointerException at org.apache.solr.update.processor.FieldValueMutatingUpdateProcessor.mutate(FieldValueMutatingUpdateProcessor.java:65) at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:97) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:867) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1021) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:690) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:94) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:141) at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:106) at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:68) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:99) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
[jira] [Commented] (SOLR-6626) NullPointerException when indexing a JSON doc with null field
[ https://issues.apache.org/jira/browse/SOLR-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243071#comment-14243071 ] Jessica Cheng Mallet commented on SOLR-6626: We're also seeing this with the javabin codec: null:java.lang.NullPointerException at org.apache.solr.update.processor.FieldValueMutatingUpdateProcessor.mutate(FieldValueMutatingUpdateProcessor.java:65) at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:97) at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:96) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:166) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:190) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173) at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106) at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1956) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:799) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:422) at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at com.apple.cie.search.plugin.auth.TrustFilter.doFilter(TrustFilter.java:43) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:745) I wonder if we shouldn't just add a null check to FieldValueMutatingUpdateProcessor.mutate for null field values instead of making the codec skip serializing/deserializing nulls.
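For illustration, the null check suggested in the comment could look roughly like the sketch below. `NullSafeMutate` and its signature are simplified stand-ins invented for this example, not Solr's actual `FieldValueMutatingUpdateProcessor`; the point is only where the guard would sit.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.function.Function;

// Simplified stand-in for the processor's mutate path: pass null field
// values through untouched instead of letting the mutator NPE on them.
public class NullSafeMutate {

    /** Applies mutator to each field value, skipping nulls. */
    public static Collection<Object> mutate(Collection<Object> values,
                                            Function<Object, Object> mutator) {
        Collection<Object> out = new ArrayList<>();
        for (Object v : values) {
            // The proposed guard: null values are kept as-is, not mutated.
            out.add(v == null ? null : mutator.apply(v));
        }
        return out;
    }
}
```

With this guard, a document carrying a null field value would survive the update-processor chain regardless of whether the codec serializes nulls.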
[jira] [Created] (SOLR-6722) Distributed function query sumtotaltermfreq does not return correct aggregated result
Jessica Cheng Mallet created SOLR-6722: -- Summary: Distributed function query sumtotaltermfreq does not return correct aggregated result Key: SOLR-6722 URL: https://issues.apache.org/jira/browse/SOLR-6722 Project: Solr Issue Type: Bug Reporter: Jessica Cheng Mallet The relevancy function query sumtotaltermfreq uses maxScore to return its result. However, in distributed mode, max is the incorrect aggregation function for sumtotaltermfreq. Instead, the sum should be returned. For example, in the following break-down of 3 shards, we expect the sumtotaltermfreq to be 1802.0 + 1693.0 + 1693.0, but instead the overall query returns a maxScore of 1802.0, which is the max but not the answer we want, and the sum is not returned anywhere. { responseHeader:{ status:0, QTime:4, params:{ debugQuery:true, indent:true, q:sumtotaltermfreq(field1), wt:json, rows:0, defType:func}}, response:{ numFound:477, start:0, maxScore:1802.0, docs:[] }, debug:{ track:{ rid:-collection1_shard1_replica1-1415238629909-9, EXECUTE_QUERY:[ http://host1 ip:8983/solr/collection1_shard2_replica1/|http://host2 ip:8984/solr/collection1_shard2_replica2/,[ QTime,1, ElapsedTime,2, RequestPurpose,GET_TOP_IDS, NumFound,165, Response,{responseHeader={status=0,QTime=1,params={distrib=false,debug=track,wt=javabin,requestPurpose=GET_TOP_IDS,version=2,rows=0,defType=func,NOW=1415238629908,shard.url=http://host1 ip:8983/solr/collection1_shard2_replica1/|http://host2 ip:8984/solr/collection1_shard2_replica2/,df=text,debugQuery=false,fl=uuid,score,rid=-collection1_shard1_replica1-1415238629909-9,start=0,q=sumtotaltermfreq(field1),isShard=true,fsv=true}},response={numFound=165,start=0,maxScore=1802.0,docs=[]},sort_values={},debug={}}], http://host2 ip:8985/solr/collection1_shard1_replica1/|http://host1 ip:8986/solr/collection1_shard1_replica2/,[ QTime,0, ElapsedTime,2, RequestPurpose,GET_TOP_IDS, NumFound,145, 
Response,{responseHeader={status=0,QTime=0,params={distrib=false,debug=track,wt=javabin,requestPurpose=GET_TOP_IDS,version=2,rows=0,defType=func,NOW=1415238629908,shard.url=http://host2 ip:8985/solr/collection1_shard1_replica1/|http://host1 ip:8986/solr/collection1_shard1_replica2/,df=text,debugQuery=false,fl=uuid,score,rid=-collection1_shard1_replica1-1415238629909-9,start=0,q=sumtotaltermfreq(field1),isShard=true,fsv=true}},response={numFound=145,start=0,maxScore=1693.0,docs=[]},sort_values={},debug={}}], http://host2 ip:8988/solr/collection1_shard3_replica1/|http://host1 ip:8987/solr/collection1_shard3_replica2/,[ QTime,0, ElapsedTime,2, RequestPurpose,GET_TOP_IDS, NumFound,167, Response,{responseHeader={status=0,QTime=0,params={distrib=false,debug=track,wt=javabin,requestPurpose=GET_TOP_IDS,version=2,rows=0,defType=func,NOW=1415238629908,shard.url=http://host2 ip:8988/solr/collection1_shard3_replica1/|http://host1 ip:8987/solr/collection1_shard3_replica2/,df=text,debugQuery=false,fl=uuid,score,rid=-collection1_shard1_replica1-1415238629909-9,start=0,q=sumtotaltermfreq(field1),isShard=true,fsv=true}},response={numFound=167,start=0,maxScore=1693.0,docs=[]},sort_values={},debug={}}]]}, explain:{}}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
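The fix described in the report amounts to choosing a different merge function when combining per-shard values. A minimal sketch, with `mergeBySum` as a hypothetical name (the real merge lives in Solr's distributed search path, not shown here):

```java
// Illustrative merge for sumtotaltermfreq across shards: the correct
// aggregate is the sum of per-shard values, not the max that the
// response above reports (1802.0 instead of 1802.0 + 1693.0 + 1693.0).
public class MergeSketch {

    /** Sums one value per shard into the collection-wide total. */
    public static double mergeBySum(double[] shardValues) {
        double total = 0.0;
        for (double v : shardValues) {
            total += v;  // sum, not Math.max
        }
        return total;
    }
}
```

For the three shards in the debug output, this yields 1802.0 + 1693.0 + 1693.0 = 5188.0 rather than the 1802.0 maxScore the distributed query currently returns.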
[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()
[ https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190522#comment-14190522 ] Jessica Cheng Mallet commented on SOLR-6631: +1, thanks Tim! DistributedQueue spinning on calling zookeeper getChildren() Key: SOLR-6631 URL: https://issues.apache.org/jira/browse/SOLR-6631 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Timothy Potter Labels: solrcloud Attachments: SOLR-6631.patch, SOLR-6631.patch The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? 
DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is not None. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()
[ https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188554#comment-14188554 ] Jessica Cheng Mallet commented on SOLR-6631: I originally thought NodeChildrenChanged would be enough too, but it made the tests hang forever. That's when I realized that the zk.exist() call in offer() also uses this watcher, so it's not enough to just watch for NodeChildrenChanged. We can either make the watcher set all not None events (None events don't remove watches, so they need to be excluded), or use a different kind of watch in the zk.exist() call. DistributedQueue spinning on calling zookeeper getChildren() Key: SOLR-6631 URL: https://issues.apache.org/jira/browse/SOLR-6631 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Timothy Potter Labels: solrcloud Attachments: SOLR-6631.patch The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that 
once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is not None. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
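The watcher fix discussed in this thread (latch on all event types except None, since None events come from session state changes and do not consume the watch) could be sketched as below. This is a self-contained stand-in, not Solr's real LatchChildWatcher, and `EventType` here mimics ZooKeeper's `Watcher.Event.EventType`:

```java
// Sketch of the proposed guard: only non-None events set the member event
// and release waiters, so a stray session event can no longer satisfy the
// latch and send getChildren() into a tight loop with no children.
public class LatchWatcherSketch {

    enum EventType { None, NodeChildrenChanged, NodeDataChanged, NodeDeleted }

    private EventType event;  // last real watched event; null until one fires

    public synchronized void process(EventType type) {
        if (type != EventType.None) {  // ignore connection-state events
            this.event = type;
            notifyAll();               // wake threads blocked in await()
        }
    }

    public synchronized void await(long timeoutMillis) throws InterruptedException {
        if (event == null) {
            wait(timeoutMillis);
        }
    }

    public synchronized EventType getWatchedEvent() {
        return event;
    }
}
```

Because `zk.exist()` in `offer()` shares this watcher, the guard keys off the event type rather than checking only for NodeChildrenChanged, matching the second option raised in the comment.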
[jira] [Commented] (SOLR-6610) In stateFormat=2, ZkController.publishAndWaitForDownStates always times out
[ https://issues.apache.org/jira/browse/SOLR-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187156#comment-14187156 ] Jessica Cheng Mallet commented on SOLR-6610: Shalin, I think you're right. I misread the code in that publishAndWaitForDownStates's call to clusterState.getCollection(collectionName) doesn't actually require a watch since it'll call out to zookeeper on-demand. This also explains why most of our complaints come for 1 node dev clusters. In stateFormat=2, ZkController.publishAndWaitForDownStates always times out --- Key: SOLR-6610 URL: https://issues.apache.org/jira/browse/SOLR-6610 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Labels: solrcloud Using stateFormat=2, our solr always takes a while to start up and spits out this warning line: {quote} WARN - 2014-10-08 17:30:24.290; org.apache.solr.cloud.ZkController; Timed out waiting to see all nodes published as DOWN in our cluster state. {quote} Looking at the code, this is probably because ZkController.publishAndWaitForDownStates is called in ZkController.init, which gets called via ZkContainer.initZookeeper in CoreContainer.load before any of the stateFormat=2 collection watches are set in the CoreContainer.preRegisterInZk call a few lines later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6650) Add optional slow request logging at WARN level
[ https://issues.apache.org/jira/browse/SOLR-6650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185361#comment-14185361 ] Jessica Cheng Mallet commented on SOLR-6650: Hi Tim, It should be the latest. Do you see the default being changed from 1000 to -1 (https://github.com/apache/lucene-solr/pull/102/files)? Add optional slow request logging at WARN level --- Key: SOLR-6650 URL: https://issues.apache.org/jira/browse/SOLR-6650 Project: Solr Issue Type: Improvement Reporter: Jessica Cheng Mallet Assignee: Timothy Potter Labels: logging Fix For: 5.0 At super high request rates, logging all the requests can become a bottleneck and therefore INFO logging is often turned off. However, it is still useful to be able to set a latency threshold above which a request is considered slow and log that request at WARN level so we can easily identify slow queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6650) Add optional slow request logging at WARN level
[ https://issues.apache.org/jira/browse/SOLR-6650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183804#comment-14183804 ] Jessica Cheng Mallet commented on SOLR-6650: Updated the PR to have this be disabled by default. Add optional slow request logging at WARN level --- Key: SOLR-6650 URL: https://issues.apache.org/jira/browse/SOLR-6650 Project: Solr Issue Type: Improvement Reporter: Jessica Cheng Mallet Assignee: Timothy Potter Labels: logging Fix For: 5.0 At super high request rates, logging all the requests can become a bottleneck and therefore INFO logging is often turned off. However, it is still useful to be able to set a latency threshold above which a request is considered slow and log that request at WARN level so we can easily identify slow queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6650) Add optional slow request logging at WARN level
Jessica Cheng Mallet created SOLR-6650: -- Summary: Add optional slow request logging at WARN level Key: SOLR-6650 URL: https://issues.apache.org/jira/browse/SOLR-6650 Project: Solr Issue Type: Improvement Reporter: Jessica Cheng Mallet At super high request rates, logging all the requests can become a bottleneck and therefore INFO logging is often turned off. However, it is still useful to be able to set a latency threshold above which a request is considered slow and log that request at WARN level so we can easily identify slow queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
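The thresholding described here is simple; a hedged sketch of the decision logic follows, with `isSlow` as a hypothetical helper (the comment thread suggests -1 as the disabled default):

```java
// Sketch of slow-request detection: a request is logged at WARN only when
// its elapsed time exceeds the configured threshold; a negative threshold
// disables the feature entirely, so normal INFO logging can stay off.
public class SlowRequestLog {

    /** True if the request should be escalated to WARN. */
    public static boolean isSlow(long elapsedMillis, long thresholdMillis) {
        return thresholdMillis >= 0 && elapsedMillis > thresholdMillis;
    }
}
```

A handler would call this after completing the request and, only on `true`, emit the same request line it would have logged at INFO, but at WARN.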
[jira] [Created] (SOLR-6639) Failed CREATE leaves behind on-disk cruft that DELETE does not remove
Jessica Cheng Mallet created SOLR-6639: -- Summary: Failed CREATE leaves behind on-disk cruft that DELETE does not remove Key: SOLR-6639 URL: https://issues.apache.org/jira/browse/SOLR-6639 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet When a CREATE api call fails (due to bad config etc.), it leaves behind on-disk core directories that a DELETE api call does not delete, which causes future CREATE calls with the same collection name to fail. The only way to get around this is to go onto the host and manually remove the orphaned directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()
[ https://issues.apache.org/jira/browse/SOLR-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng Mallet updated SOLR-6631: --- Description: The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is not None. 
was: The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is a NodeChildrenChanged. 
DistributedQueue spinning on calling zookeeper getChildren() Key: SOLR-6631 URL: https://issues.apache.org/jira/browse/SOLR-6631 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Labels: solrcloud The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I
[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches
[ https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174420#comment-14174420 ] Jessica Cheng Mallet commented on SOLR-6336: Please let me know if I'm supposed to open a new issue (not sure what the policy is). I'm encountering a bug from this patch where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member, regardless of its type, but the problem is that once an event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE will come back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) { if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) { children = orderedChildren(null); } if (wait != Long.MAX_VALUE) break; } {quote} I think the fix would be to only set the event in the watcher if the type is a NodeChildrenChanged. 
DistributedQueue (and it's use in OCP) leaks ZK Watches --- Key: SOLR-6336 URL: https://issues.apache.org/jira/browse/SOLR-6336 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Ramkumar Aiyengar Assignee: Mark Miller Fix For: 4.10, Trunk The current {{DistributedQueue}} implementation leaks ZK watches whenever it finds children or times out on finding one. OCP uses this in its event loop and can loop tight in some conditions (when exclusivity checks fail), leading to lots of watches which get triggered together on the next event (could be a while for some activities like shard splitting). This gets exposed by SOLR-6261 which spawns a new thread for every parallel watch event. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches
[ https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174519#comment-14174519 ] Jessica Cheng Mallet commented on SOLR-6336: Thanks for the clarification [~elyograg]! I'll open a new issue. Thanks! DistributedQueue (and it's use in OCP) leaks ZK Watches --- Key: SOLR-6336 URL: https://issues.apache.org/jira/browse/SOLR-6336 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Ramkumar Aiyengar Assignee: Mark Miller Fix For: 4.10, Trunk The current {{DistributedQueue}} implementation leaks ZK watches whenever it finds children or times out on finding one. OCP uses this in its event loop and can loop tight in some conditions (when exclusivity checks fail), leading to lots of watches which get triggered together on the next event (could be a while for some activities like shard splitting). This gets exposed by SOLR-6261 which spawns a new thread for every parallel watch event. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6631) DistributedQueue spinning on calling zookeeper getChildren()
Jessica Cheng Mallet created SOLR-6631: -- Summary: DistributedQueue spinning on calling zookeeper getChildren() Key: SOLR-6631 URL: https://issues.apache.org/jira/browse/SOLR-6631 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet The change from SOLR-6336 introduced a bug where now I'm stuck in a loop making getChildren() request to zookeeper with this thread dump: {quote} Thread-51 [WAITING] CPU time: 1d 15h 0m 57s java.lang.Object.wait() org.apache.zookeeper.ClientCnxn.submitRequest(RequestHeader, Record, Record, ZooKeeper$WatchRegistration) org.apache.zookeeper.ZooKeeper.getChildren(String, Watcher) org.apache.solr.common.cloud.SolrZkClient$6.execute()2 recursive calls org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkOperation) org.apache.solr.common.cloud.SolrZkClient.getChildren(String, Watcher, boolean) org.apache.solr.cloud.DistributedQueue.orderedChildren(Watcher) org.apache.solr.cloud.DistributedQueue.getChildren(long) org.apache.solr.cloud.DistributedQueue.peek(long) org.apache.solr.cloud.DistributedQueue.peek(boolean) org.apache.solr.cloud.Overseer$ClusterStateUpdater.run() java.lang.Thread.run() {quote} Looking at the code, I think the issue is that LatchChildWatcher#process always sets the event to its member variable event, regardless of its type, but the problem is that once the member event is set, the await no longer waits. In this state, the while loop in getChildren(long), when called with wait being Integer.MAX_VALUE will loop back, NOT wait at await because event != null, but then it still will not get any children. {quote} while (true) \{ if (!children.isEmpty()) break; watcher.await(wait == Long.MAX_VALUE ? DEFAULT_TIMEOUT : wait); if (watcher.getWatchedEvent() != null) \{ children = orderedChildren(null); \} if (wait != Long.MAX_VALUE) break; \} {quote} I think the fix would be to only set the event in the watcher if the type is a NodeChildrenChanged. 
[jira] [Commented] (SOLR-6336) DistributedQueue (and it's use in OCP) leaks ZK Watches
[ https://issues.apache.org/jira/browse/SOLR-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174556#comment-14174556 ] Jessica Cheng Mallet commented on SOLR-6336: Got it! Thanks! DistributedQueue (and it's use in OCP) leaks ZK Watches --- Key: SOLR-6336 URL: https://issues.apache.org/jira/browse/SOLR-6336 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Ramkumar Aiyengar Assignee: Mark Miller Fix For: 4.10, Trunk The current {{DistributedQueue}} implementation leaks ZK watches whenever it finds children or times out on finding one. OCP uses this in its event loop and can loop tight in some conditions (when exclusivity checks fail), leading to lots of watches which get triggered together on the next event (could be a while for some activities like shard splitting). This gets exposed by SOLR-6261 which spawns a new thread for every parallel watch event. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6610) In stateFormat=2, ZkController.publishAndWaitForDownStates always times out
[ https://issues.apache.org/jira/browse/SOLR-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165613#comment-14165613 ] Jessica Cheng Mallet commented on SOLR-6610: We're seeing it manifested in our own build, but looks like the relevant code in trunk is the same. I did mis-describe it in that I said ZkController.init is called in ZkContainer.initZookeeper, but actually it's called in the constructor of ZKController, which is constructed in ZkContainer.initZookeeper. In stateFormat=2, ZkController.publishAndWaitForDownStates always times out --- Key: SOLR-6610 URL: https://issues.apache.org/jira/browse/SOLR-6610 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Labels: solrcloud Using stateFormat=2, our solr always takes a while to start up and spits out this warning line: {quote} WARN - 2014-10-08 17:30:24.290; org.apache.solr.cloud.ZkController; Timed out waiting to see all nodes published as DOWN in our cluster state. {quote} Looking at the code, this is probably because ZkController.publishAndWaitForDownStates is called in ZkController.init, which gets called via ZkContainer.initZookeeper in CoreContainer.load before any of the stateFormat=2 collection watches are set in the CoreContainer.preRegisterInZk call a few lines later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6610) In stateFormat=2, ZkController.publishAndWaitForDownStates always times out
Jessica Cheng Mallet created SOLR-6610: -- Summary: In stateFormat=2, ZkController.publishAndWaitForDownStates always times out Key: SOLR-6610 URL: https://issues.apache.org/jira/browse/SOLR-6610 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Jessica Cheng Mallet Using stateFormat=2, our solr always takes a while to start up and spits out this warning line: {quote} WARN - 2014-10-08 17:30:24.290; org.apache.solr.cloud.ZkController; Timed out waiting to see all nodes published as DOWN in our cluster state. {quote} Looking at the code, this is probably because ZkController.publishAndWaitForDownStates is called in ZkController.init, which gets called via ZkContainer.initZookeeper in CoreContainer.load before any of the stateFormat=2 collection watches are set in the CoreContainer.preRegisterInZk call a few lines later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6426) SolrZkClient clean can fail due to a race with children nodes.
[ https://issues.apache.org/jira/browse/SOLR-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157255#comment-14157255 ] Jessica Cheng Mallet commented on SOLR-6426: Hey Mark, just took a look at this patch and there is a risk of stack overflow if children nodes are actively being added. Would you please comment on where you saw the race happen that necessitated this change? Is it better to eliminate that risk instead? SolrZkClient clean can fail due to a race with children nodes. -- Key: SOLR-6426 URL: https://issues.apache.org/jira/browse/SOLR-6426 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6426) SolrZkClient clean can fail due to a race with children nodes.
[ https://issues.apache.org/jira/browse/SOLR-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157445#comment-14157445 ] Jessica Cheng Mallet commented on SOLR-6426: The only thing I'm worried about regarding the racing clients is that before, this code would fail (which is what you're trying to fix here), but now there might be a risk of infinite recursion causing stack overflow if it keeps coming back to this point and finding more children after it thinks it deleted all of them. In practice it probably won't happen, but it just feels a bit scary. Maybe that part can be made iterative instead (with a maximum bail-out number of tries)? SolrZkClient clean can fail due to a race with children nodes. -- Key: SOLR-6426 URL: https://issues.apache.org/jira/browse/SOLR-6426 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
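The iterative alternative with a bounded number of tries could look like the sketch below. `listChildren` and `deleteNode` are hypothetical stand-ins for the ZooKeeper calls; this is not SolrZkClient's actual clean method:

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Bounded, iterative clean: re-list and delete children in passes instead
// of recursing, so clients racing to add children can at worst make us
// bail out after maxPasses attempts -- never overflow the stack.
public class BoundedClean {

    /** Returns true if no children remained within maxPasses attempts. */
    public static boolean clean(Supplier<List<String>> listChildren,
                                Consumer<String> deleteNode,
                                int maxPasses) {
        for (int pass = 0; pass < maxPasses; pass++) {
            List<String> kids = listChildren.get();
            if (kids.isEmpty()) {
                return true;           // nothing left to delete
            }
            kids.forEach(deleteNode);  // best-effort pass; a race may re-add
        }
        return false;                  // bail out rather than loop forever
    }
}
```

A caller could treat `false` as the same failure the pre-patch code produced, but now with a hard bound on work instead of unbounded recursion.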
[jira] [Commented] (SOLR-6521) CloudSolrServer should synchronize cache cluster state loading
[ https://issues.apache.org/jira/browse/SOLR-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135713#comment-14135713 ] Jessica Cheng Mallet commented on SOLR-6521: I worry that this will not be performant enough, since it makes lookups of different keys serial; in a system with lots of active collections, lookups will be unnecessarily blocked. In the same vein, I feel that the codebase could benefit from a LoadingCache-style cache that only synchronizes loading of the same key. Incidentally, we also found that the FilterCache population suffers from the same problem. In our load test, when a filter query is heavily used, it actually hurts performance to cache the fq, because whenever the filter cache is invalidated, all the request threads try to build the cache entry for that exact same fq and lock the system up. We would see 30-second 99th-percentile request times when we cached the fq; when we added {!cache=false} to the fq, the 99th percentile went back down to the 100ms range. I understand that adding guava to solrj would probably require a lot of discussion and voting, etc., but I think the benefit of a LoadingCache-style cache is high enough that even if we can't include guava, it might make sense for Solr/Apache to implement its own. CloudSolrServer should synchronize cache cluster state loading -- Key: SOLR-6521 URL: https://issues.apache.org/jira/browse/SOLR-6521 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jessica Cheng Mallet Assignee: Noble Paul Labels: SolrCloud Under heavy load-testing with the new solrj client that caches the cluster state instead of setting a watcher, I started seeing lots of zk connection loss on the client side when refreshing the CloudSolrServer collectionStateCache, and this was causing crazy client-side 99.9% latency (~15 sec).
I swapped the cache out with guava's LoadingCache (which does locking to ensure only one thread loads the content under one key while the other threads that want the same key wait) and the connection loss went away and the 99.9% latency also went down to just about 1 sec.
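The per-key synchronization described above doesn't strictly require guava; assuming Java 8+, ConcurrentHashMap.computeIfAbsent already gives LoadingCache-style behavior (one loader runs per key, lookups of different keys stay parallel). The class and method names here are illustrative, not Solr's:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Per-key loading cache sketch: at most one thread runs the loader for a given
// key; threads asking for *other* keys are not blocked behind it.
class PerKeyCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    PerKeyCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent locks only the map bin holding `key`, so a slow load
        // of one collection's state doesn't serialize lookups of the others
        return cache.computeIfAbsent(key, loader);
    }
}
```

This sketch omits eviction and expiry, which a real collectionStateCache would need; it only demonstrates the load-once-per-key property the comment asks for.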
[jira] [Commented] (SOLR-6521) CloudSolrServer should synchronize cache cluster state loading
[ https://issues.apache.org/jira/browse/SOLR-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136363#comment-14136363 ] Jessica Cheng Mallet commented on SOLR-6521: Xu, I think it's unlikely you've hit the same issue unless you are using the new split clusterstate feature in https://issues.apache.org/jira/browse/SOLR-5473.
[jira] [Created] (SOLR-6521) CloudSolrServer should synchronize cache cluster state loading
Jessica Cheng Mallet created SOLR-6521: -- Summary: CloudSolrServer should synchronize cache cluster state loading Key: SOLR-6521 URL: https://issues.apache.org/jira/browse/SOLR-6521 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jessica Cheng Mallet
[jira] [Commented] (SOLR-5473) Split clusterstate.json per collection and watch states selectively
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134449#comment-14134449 ] Jessica Cheng Mallet commented on SOLR-5473: [~noble.paul] Here: https://issues.apache.org/jira/browse/SOLR-6521. Thanks! Split clusterstate.json per collection and watch states selectively Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Labels: SolrCloud Fix For: 4.10, 5.0 Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74_POC.patch, SOLR-5473-configname-fix.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473_no_ui.patch, SOLR-5473_undo.patch, ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node and watches state changes selectively.
https://reviews.apache.org/r/24220/
[jira] [Commented] (SOLR-5473) Split clusterstate.json per collection and watch states selectively
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132126#comment-14132126 ] Jessica Cheng Mallet commented on SOLR-5473: Hey guys, under heavy load-testing yesterday with this, I started seeing lots of zk connection loss on the client side when refreshing the CloudSolrServer collectionStateCache, and this was causing crazy client-side 99.9% latency (~15 sec). I swapped the cache out with guava's LoadingCache (which does locking to ensure only one thread loads the content under one key while the other threads that want the same key wait) and the connection loss went away and the 99.9% latency also went down to just about 1 sec.
[jira] [Commented] (SOLR-6405) ZooKeeper calls can easily not be retried enough on ConnectionLoss.
[ https://issues.apache.org/jira/browse/SOLR-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107153#comment-14107153 ] Jessica Cheng Mallet commented on SOLR-6405: Should the if (attemptCount > 0) check be removed in retryDelay, now that the sleep is (attemptCount + 1) * retryDelay? I think in practice we'd never miss the initial 1.5s sleep, since the padding on the retryDelay is enough to make up for it, but it's slightly harder to reason about. (The way I thought it worked was that the retry count is calculated so that 1+2+3+4+...+retryCount ~= timeoutSec, so when that's multiplied by 1.5x (the retryDelay), we have the timeout covered. Is this right?) ZooKeeper calls can easily not be retried enough on ConnectionLoss. --- Key: SOLR-6405 URL: https://issues.apache.org/jira/browse/SOLR-6405 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.10 Attachments: SOLR-6405.patch The current design requires that we are sure we retry on connection loss until session expiration.
[jira] [Commented] (SOLR-6405) ZooKeeper calls can easily not be retried enough on ConnectionLoss.
[ https://issues.apache.org/jira/browse/SOLR-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107242#comment-14107242 ] Jessica Cheng Mallet commented on SOLR-6405: {quote} but not so long as we are waiting for no reason and tying up threads {quote} This is actually the other thing that I was worried about. With the padding being on a multiplier, for the default 15s timeout we're already doing 7.5s of total extra sleep (1.5+3+4.5+6+7.5 = 22.5s against a 15s timeout). Is that too much? With your change of the comment to // 1500 ms over for padding, did you actually mean to do something like (attemptCount + 1) * 1000 + retryDelay?
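The arithmetic in this comment is easy to check; a throwaway sketch (helper names hypothetical, not ZkCmdExecutor's) of the total back-off sleep when the sleep after attempt i is (i + offset) * retryDelay, with retryDelay = 1.5s and five retries:

```java
// Total sleep across `retries` attempts when the sleep after attempt i is
// (i + offset) * delayMs. offset=1 models the (attemptCount + 1) * retryDelay
// schedule discussed above; offset=0 models a schedule whose first sleep is 0.
class RetrySchedule {
    static long totalSleepMs(int retries, int offset, long delayMs) {
        long total = 0;
        for (int i = 0; i < retries; i++) {
            total += (i + offset) * delayMs;
        }
        return total;
    }
}
```

With offset=1 and delayMs=1500 this sums to 22.5s, i.e. 7.5s over the 15s session timeout, which is the padding being questioned.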
[jira] [Commented] (SOLR-6405) ZooKeeper calls can easily not be retried enough on ConnectionLoss.
[ https://issues.apache.org/jira/browse/SOLR-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107289#comment-14107289 ] Jessica Cheng Mallet commented on SOLR-6405: Right, most likely the first time it hits the ConnectionLoss it's not time=0 of the connection loss, so by loop i=4 it would've slept for 15s since i=0 and therefore hit a SessionExpired. But then, thinking about it again, why be clever at all about the padding or back-off? Not to propose that we change this now, but let's pretend we don't do back-off and just sleep 1s between each loop. If we get ConnectionLoss back on the next attempt, there's no harm in trying at all, because if we're disconnected the attempt wouldn't be hitting zookeeper anyway. If we get SessionExpired back, great, we can break out now and throw the exception. If we've reconnected, then yay, we succeeded. Because with each call we're expecting to get either success, failure (SessionExpired), or in progress (ConnectionLoss), we can really just retry forever without limiting the loop count (unless we're worried that somehow we'll keep getting ConnectionLoss even though the session has expired, but that'd be a pretty serious zookeeper client bug. And if we're really worried about that, we can always do 10 more loops after we have slept a total of the timeout already). In the end, it's really weird that this method should ever semantically allow throwing a ConnectionLoss exception, if we got the math wrong, because the intent is to retry until we get a SessionExpired, isn't it?
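The retry-forever-with-bail-out idea in the comment above might look like the following sketch; Result, the op supplier, and the loop constants are hypothetical stand-ins, not ZooKeeper client types:

```java
import java.util.function.Supplier;

// Hypothetical outcome of one ZK call attempt
enum Result { SUCCESS, CONNECTION_LOSS, SESSION_EXPIRED }

class RetryUntilExpired {
    static final int EXTRA_TRIES = 10; // paranoia budget after the session must be dead

    // Retry on CONNECTION_LOSS with a flat 1s sleep; stop on SUCCESS or
    // SESSION_EXPIRED, or after sleeping `timeoutMs` total plus EXTRA_TRIES more loops.
    static Result run(Supplier<Result> op, long timeoutMs) {
        long sleptMs = 0;
        int extras = 0;
        while (true) {
            Result r = op.get();
            if (r != Result.CONNECTION_LOSS) {
                return r; // success, or the definitive SESSION_EXPIRED failure
            }
            if (sleptMs >= timeoutMs && ++extras > EXTRA_TRIES) {
                // still ConnectionLoss past the session timeout: likely a client bug
                return Result.CONNECTION_LOSS;
            }
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return Result.CONNECTION_LOSS;
            }
            sleptMs += 1000;
        }
    }
}
```

As the comment argues, the flat 1s sleep never waits long past the moment a definitive answer (success or SessionExpired) becomes available, and the extra attempts during disconnection put no load on ZooKeeper.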
[jira] [Comment Edited] (SOLR-6405) ZooKeeper calls can easily not be retried enough on ConnectionLoss.
[ https://issues.apache.org/jira/browse/SOLR-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107289#comment-14107289 ] Jessica Cheng Mallet edited comment on SOLR-6405 at 8/22/14 7:01 PM: - Right, most likely the first time it hits the ConnectionLoss it's not time=0 of the connection loss, so by loop i=4 it would've slept for 15s since i=0 and therefore hit a SessionExpired. But then, thinking about it again, why be clever at all about the padding or back-off? Not to propose that we change this now, but let's pretend we don't do back-off and just sleep 1s between each loop. If we get ConnectionLoss back on the next attempt, there's no harm in trying at all, because if we're disconnected the attempt wouldn't be hitting zookeeper anyway. If we get SessionExpired back, great, we can break out now and throw the exception. If we've reconnected, then yay, we succeeded. Because with each call we're expecting to get either success, failure (SessionExpired), or in progress (ConnectionLoss), we can really just retry forever without limiting the loop count (unless we're worried that somehow we'll keep getting ConnectionLoss even though the session has expired, but that'd be a pretty serious zookeeper client bug. And if we're really worried about that, we can always do 10 more loops after we have slept a total of the timeout already). The advantage of this approach is to never sleep for too long before finding out the definitive answer of success or SessionExpired, while if the answer is ConnectionLoss, it's not really incurring any extra load on zookeeper anyway. In the end, it's really weird that this method should ever semantically allow throwing a ConnectionLoss exception, if we got the math wrong, because the intent is to retry until we get a SessionExpired, isn't it?
[jira] [Comment Edited] (SOLR-6405) ZooKeeper calls can easily not be retried enough on ConnectionLoss.
[ https://issues.apache.org/jira/browse/SOLR-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107289#comment-14107289 ] Jessica Cheng Mallet edited comment on SOLR-6405 at 8/22/14 7:02 PM: - Right, most likely the first time it hits the ConnectionLoss it's not time=0 of the connection loss, so by loop i=4 it would've slept for 15s since i=0 and therefore hit a SessionExpired. But then, thinking about it again, why be clever at all about the padding or back-off? Not to propose that we change this now, but let's pretend we don't do back-off and just sleep 1s between each loop. If we get ConnectionLoss back on the next attempt, there's no harm in trying at all, because if we're disconnected the attempt wouldn't be hitting zookeeper anyway. If we get SessionExpired back, great, we can break out now and throw the exception. If we've reconnected, then yay, we succeeded. Because with each call we're expecting to get either success, failure (SessionExpired), or in progress (ConnectionLoss), we can really just retry forever without limiting the loop count (unless we're worried that somehow we'll keep getting ConnectionLoss even though the session has expired, but that'd be a pretty serious zookeeper client bug. And if we're really worried about that, we can always do 10 more loops after we have slept a total of the timeout already). The advantage of this approach is to never sleep for too long before finding out the definitive answer of success or SessionExpired, while if the answer is ConnectionLoss, it's not really incurring any extra load on zookeeper anyway. In the end, it's really weird that this method should ever semantically allow throwing a ConnectionLoss exception, if we got the math wrong, because the intent is to retry until we get a SessionExpired, isn't it? (Oh, or success of course.
:))
[jira] [Commented] (SOLR-6405) ZooKeeper calls can easily not be retried enough on ConnectionLoss.
[ https://issues.apache.org/jira/browse/SOLR-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107382#comment-14107382 ] Jessica Cheng Mallet commented on SOLR-6405: OK, thanks for the explanations! They're really helpful.
[jira] [Commented] (SOLR-6402) OverseerCollectionProcessor should not exit for ZK ConnectionLoss
[ https://issues.apache.org/jira/browse/SOLR-6402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106325#comment-14106325 ] Jessica Cheng Mallet commented on SOLR-6402: And thanks for fixing! :) OverseerCollectionProcessor should not exit for ZK ConnectionLoss - Key: SOLR-6402 URL: https://issues.apache.org/jira/browse/SOLR-6402 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.8, 5.0 Reporter: Jessica Cheng Mallet Assignee: Mark Miller Fix For: 5.0, 4.10 We saw an occurrence where we had some ZK connection blip and the OverseerCollectionProcessor thread stopped, but the ClusterStateUpdater output some error and kept running, and the node didn't lose its leadership. This caused our collection work queue to back up. Right now OverseerCollectionProcessor's run method has on trunk: {quote} 344 if (e.code() == KeeperException.Code.SESSIONEXPIRED 345 || e.code() == KeeperException.Code.CONNECTIONLOSS) \{ 346 log.warn("Overseer cannot talk to ZK"); 347 return; 348 \} {quote} I think this if statement should only be for SESSIONEXPIRED. If it just experiences a connection loss but then reconnects before the session expires, it'll keep being the leader.
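The narrowing proposed in this issue can be sketched with a stand-in enum rather than ZooKeeper's real KeeperException.Code; the names here are illustrative only:

```java
// Stand-in for KeeperException.Code, just to illustrate the proposed check
enum ZkCode { SESSIONEXPIRED, CONNECTIONLOSS, OK }

class OverseerExitCheck {
    // Proposed: the Overseer loop gives up only on session expiry. A bare
    // ConnectionLoss may be followed by a reconnect within the session timeout,
    // in which case this node is still the leader and must keep processing
    // the collection work queue.
    static boolean shouldExit(ZkCode code) {
        return code == ZkCode.SESSIONEXPIRED;
    }
}
```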
[jira] [Commented] (SOLR-6402) OverseerCollectionProcessor should not exit for ZK ConnectionLoss
[ https://issues.apache.org/jira/browse/SOLR-6402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106324#comment-14106324 ] Jessica Cheng Mallet commented on SOLR-6402: Unfortunately, since it just logs and returns, I only have the log line {quote} 2014-08-21 10:51:39,060 WARN [Overseer-164353762238923913-scrubbed IP:8983_solr-n_000757] OverseerCollectionProcessor.java (line 350) Overseer cannot talk to ZK {quote} And even though amILeader() tries to handle connection loss, there are lots of other operations past the if check that don't, e.g. all those workqueue manipulations.
[jira] [Commented] (SOLR-6402) OverseerCollectionProcessor should not exit for ZK ConnectionLoss
[ https://issues.apache.org/jira/browse/SOLR-6402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106433#comment-14106433 ] Jessica Cheng Mallet commented on SOLR-6402: {quote} All ZK manipulation should be through SolrZkClient, which should use ZkCmdExecutor to retry on connection loss past expiration unless explicitly asked not to. {quote} Ah, I missed that. So I took a look at ZkCmdExecutor.retryOperation(); we have this effect (for the default 15s timeout and therefore retryCount=5): sleep at i=0: 0s, i=1: 1.5s, i=2: 3s, i=3: 4.5s, i=4: 6s, which adds up to 15s, the timeout. However, what if on loop i=4 the operation threw connection loss again, but then, since the sleep is at the end of the catch block, while it slept that last 6s the client reconnected, so the session didn't expire? Maybe the intended thing is to do retryDelay(i+1), so that it would've slept 1.5s when i=0, ..., and 6s when i=3, but retried i=4 at the end of 15s? Disclaimer that I actually don't know that what I think may have happened happened at all, since, like I said, I only have that one log message and the fact that while the OverseerCollectionProcessor died, the ClusterStateUpdater didn't.
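The timing concern in the comment above can be made concrete; this small sketch (a hypothetical helper, not ZkCmdExecutor itself) computes when attempt i actually happens under the current schedule versus the proposed retryDelay(i+1) variant, with retryDelay = 1.5s:

```java
// Cumulative sleep completed before attempt `attempt` (attempts numbered from 0),
// when the sleep taken after attempt j is (j + offset) * delayMs.
// offset=0 models the current schedule (first sleep is 0s);
// offset=1 models the proposed retryDelay(i + 1) schedule.
class RetryTiming {
    static long timeOfAttemptMs(int attempt, int offset, long delayMs) {
        long t = 0;
        for (int j = 0; j < attempt; j++) {
            t += (j + offset) * delayMs;
        }
        return t;
    }
}
```

Under the current schedule the final attempt (i=4) fires after only 9s of cumulative sleep, with the last 6s sleep wasted after it; under the shifted schedule the final attempt fires at the full 15s timeout, which is the gap the comment is pointing at.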