[jira] [Commented] (SOLR-11475) Endless loop and OOM in PeerSync
[ https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215049#comment-16215049 ]

Pushkar Raste commented on SOLR-11475:
--------------------------------------

If you are blocked, you can try turning off versionRanges and falling back to individual versions.
If you can wait for a code fix, I will take a stab at it this weekend.

The solution I am thinking of is to keep a counter, increment it on every iteration, and throw an exception if we have not broken out of the outermost `while` loop before `counter > Math.max(ourUpdates.size(), otherVersions.size())`.
Alternatively, in the `else` branch, before we create a new range, add a check for X and -X and throw an exception if it is true.

> Endless loop and OOM in PeerSync
> --------------------------------
>
>                 Key: SOLR-11475
>                 URL: https://issues.apache.org/jira/browse/SOLR-11475
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Andrey Kudryavtsev
>
> After the problem described in SOLR-11459, I restarted the cluster and got an OOM on start.
> [PeerSync#handleVersionsWithRanges|https://github.com/apache/lucene-solr/blob/68bda0be421ce18811e03b229781fd6152fcc04a/solr/core/src/java/org/apache/solr/update/PeerSync.java#L539]
> contains this logic:
> {code}
> while (otherUpdatesIndex >= 0) {
>   // we have run out of ourUpdates, pick up all the remaining versions from the other versions
>   if (ourUpdatesIndex < 0) {
>     String range = otherVersions.get(otherUpdatesIndex) + "..." + otherVersions.get(0);
>     rangesToRequest.add(range);
>     totalRequestedVersions += otherUpdatesIndex + 1;
>     break;
>   }
>   // stop when the entries get old enough that reorders may lead us to see updates we don't need
>   if (!completeList && Math.abs(otherVersions.get(otherUpdatesIndex)) < ourLowThreshold) break;
>   if (ourUpdates.get(ourUpdatesIndex).longValue() == otherVersions.get(otherUpdatesIndex).longValue()) {
>     ourUpdatesIndex--;
>     otherUpdatesIndex--;
>   } else if (Math.abs(ourUpdates.get(ourUpdatesIndex)) < Math.abs(otherVersions.get(otherUpdatesIndex))) {
>     ourUpdatesIndex--;
>   } else {
>     long rangeStart = otherVersions.get(otherUpdatesIndex);
>     while ((otherUpdatesIndex < otherVersions.size())
>         && (Math.abs(otherVersions.get(otherUpdatesIndex)) < Math.abs(ourUpdates.get(ourUpdatesIndex)))) {
>       otherUpdatesIndex--;
>       totalRequestedVersions++;
>     }
>     // construct range here
>     rangesToRequest.add(rangeStart + "..." + otherVersions.get(otherUpdatesIndex + 1));
>   }
> }
> {code}
> If at some point
> {code} ourUpdates.get(ourUpdatesIndex) = -otherVersions.get(otherUpdatesIndex) {code}
> holds, the loop will never end. The two values are not equal and their absolute values are equal, so the
> {{else}} branch is taken, the inner {{while}} does not run, and neither index changes. The loop then adds
> the same string to {{rangesToRequest}} again and again until the process runs out of memory.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
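A minimal Java sketch of the two guards proposed in the comment above, for illustration only. It is not the actual Solr code or patch: the class and method names are hypothetical, the `completeList` / `ourLowThreshold` check and the `totalRequestedVersions` bookkeeping are left out, and the inner loop is bounded with `otherIdx >= 0` as an assumption. The iteration cap uses the sum of the two list sizes rather than `Math.max(...)`: every iteration that makes progress consumes at least one entry from one list, so the sum is a safe bound, whereas interleaved lists can legitimately need more than `max` iterations.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeGuardSketch {

    // Simplified stand-in for the loop quoted in the issue (hypothetical, not Solr's code).
    // Both lists are assumed sorted by absolute value, highest first, and are
    // walked from the last index (smallest magnitude) toward index 0.
    public static List<String> rangesToRequest(List<Long> ourUpdates, List<Long> otherVersions) {
        List<String> ranges = new ArrayList<>();
        int ourIdx = ourUpdates.size() - 1;
        int otherIdx = otherVersions.size() - 1;

        // Guard 1: cap the number of iterations of the outer loop.
        int maxIterations = ourUpdates.size() + otherVersions.size();
        int iterations = 0;

        while (otherIdx >= 0) {
            if (++iterations > maxIterations) {
                throw new IllegalStateException("No progress while computing version ranges");
            }
            if (ourIdx < 0) {
                // We ran out of our updates: request everything that is left.
                ranges.add(otherVersions.get(otherIdx) + "..." + otherVersions.get(0));
                break;
            }
            long ours = ourUpdates.get(ourIdx);
            long theirs = otherVersions.get(otherIdx);
            if (ours == theirs) {
                ourIdx--;
                otherIdx--;
            } else if (Math.abs(ours) < Math.abs(theirs)) {
                ourIdx--;
            } else if (ours == -theirs) {
                // Guard 2: same magnitude, opposite sign (X and -X).
                // This is the case in which the quoted loop never advances.
                throw new IllegalStateException(
                        "Conflicting versions " + ours + " and " + theirs);
            } else {
                long rangeStart = theirs;
                while (otherIdx >= 0
                        && Math.abs(otherVersions.get(otherIdx)) < Math.abs(ours)) {
                    otherIdx--;
                }
                ranges.add(rangeStart + "..." + otherVersions.get(otherIdx + 1));
            }
        }
        return ranges;
    }
}
```

With the explicit X / -X check in place, the iteration cap should never fire in this sketch; it is kept only as a second line of defence against other ways the loop might stop making progress.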
[jira] [Commented] (SOLR-11475) Endless loop and OOM in PeerSync
[ https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214845#comment-16214845 ]

Andrey Kudryavtsev commented on SOLR-11475:
-------------------------------------------

[~praste],

In SOLR-11459 I described how I got into this X / -X situation. In short, it was caused by a (probable) defect in distributed in-place updates.

But getting an OOM here because of some other bug is not a good idea, imho. I would even prefer an exception on start with a message like {{Your index is corrupted, pls clear your tlog...}} instead.
[jira] [Commented] (SOLR-11475) Endless loop and OOM in PeerSync
[ https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212935#comment-16212935 ]

Pushkar Raste commented on SOLR-11475:
--------------------------------------

Version numbers are monotonically increasing sequence numbers, and for deletes the sequence number is multiplied by -1.

I don't think we would ever have version number X in a replica's tlog and -X in the leader's (or any other replica's) tlog.

Can you provide a valid test case for your issue? I am not in front of a computer right now; however, IIRC the tests have the token PeerSync in their names.

On Oct 20, 2017 5:54 AM, "Andrey Kudryavtsev (JIRA)" wrote:

> [ https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Andrey Kudryavtsev mentioned you on SOLR-11475
>
> I think throwing an exception in case of
> {{ourUpdates.get(ourUpdatesIndex) = -otherVersions.get(otherUpdatesIndex)}}
> is better than OOM
>
> [~praste], [~shalinmangar] What do you think?
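To illustrate the sign convention described in this comment, here is a small Java sketch. The helper names are hypothetical (they are not Solr APIs) and the version value is made up. It shows why a pair X / -X is awkward for code that compares versions by absolute value: the two values are different, yet neither is smaller in magnitude than the other.

```java
public class VersionSignSketch {

    // A negative version marks a delete (sequence number multiplied by -1).
    public static boolean isDelete(long version) {
        return version < 0;
    }

    // True when two versions carry the same sequence number but disagree
    // on whether the operation was an add or a delete (X versus -X).
    public static boolean conflicts(long a, long b) {
        return a != b && Math.abs(a) == Math.abs(b);
    }

    public static void main(String[] args) {
        long add = 1581762435727294464L; // made-up version number
        long delete = -add;              // the same sequence number as a delete

        System.out.println("isDelete(add)    = " + isDelete(add));
        System.out.println("isDelete(delete) = " + isDelete(delete));
        System.out.println("conflicts        = " + conflicts(add, delete));
    }
}
```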
[jira] [Commented] (SOLR-11475) Endless loop and OOM in PeerSync
[ https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212444#comment-16212444 ]

Andrey Kudryavtsev commented on SOLR-11475:
-------------------------------------------

I think throwing an exception in case of {{ourUpdates.get(ourUpdatesIndex) = -otherVersions.get(otherUpdatesIndex)}} is better than OOM.

[~praste], [~shalinmangar], what do you think?
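The stall discussed in this thread can be reproduced in isolation. The Java sketch below is a simplified copy of the comparison logic from the issue description, not Solr's code: the class and method names are hypothetical, the `completeList` / `ourLowThreshold` check and `totalRequestedVersions` are omitted, the inner loop is bounded with `otherIdx >= 0` as an assumption, and a step cap is added so that the demo terminates. With an X / -X pair in the lists, every step after the first one adds the same range string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StallReproSketch {

    // Runs the comparison logic for at most maxSteps iterations.
    // The step cap exists only in this demo; the quoted loop has none.
    public static List<String> boundedRun(List<Long> ourUpdates, List<Long> otherVersions, int maxSteps) {
        List<String> rangesToRequest = new ArrayList<>();
        int ourIdx = ourUpdates.size() - 1;
        int otherIdx = otherVersions.size() - 1;

        for (int step = 0; step < maxSteps && otherIdx >= 0; step++) {
            if (ourIdx < 0) {
                rangesToRequest.add(otherVersions.get(otherIdx) + "..." + otherVersions.get(0));
                break;
            }
            long ours = ourUpdates.get(ourIdx);
            long theirs = otherVersions.get(otherIdx);
            if (ours == theirs) {
                ourIdx--;
                otherIdx--;
            } else if (Math.abs(ours) < Math.abs(theirs)) {
                ourIdx--;
            } else {
                long rangeStart = theirs;
                // When ours == -theirs the magnitudes are equal, so this body never runs.
                while (otherIdx >= 0
                        && Math.abs(otherVersions.get(otherIdx)) < Math.abs(ours)) {
                    otherIdx--;
                }
                // In the X / -X case neither index has moved, so the next step lands here again.
                rangesToRequest.add(rangeStart + "..." + otherVersions.get(otherIdx + 1));
            }
        }
        return rangesToRequest;
    }

    public static void main(String[] args) {
        // Both lists sorted by absolute value, highest first; 5 and -5 disagree.
        List<Long> ours = Arrays.asList(7L, 5L, 2L);
        List<Long> theirs = Arrays.asList(7L, -5L, 2L);
        System.out.println(boundedRun(ours, theirs, 10));
    }
}
```

Raising `maxSteps` only makes the returned list longer, which mirrors how the unbounded loop keeps growing `rangesToRequest` until memory runs out.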