[jira] [Commented] (SOLR-11475) Endless loop and OOM in PeerSync

2017-10-23 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215049#comment-16215049
 ] 

Pushkar Raste commented on SOLR-11475:
--

If you are blocked, you can try turning off the use of versionRanges and 
falling back to individual versions. 

If you can wait for a code fix, I will take a stab at it this weekend. The 
solution I am thinking of is to keep a counter, increment it on every 
iteration, and throw an exception if we have not broken out of the outermost 
`while` loop by the time `counter > Math.max(ourUpdates.size(), 
otherVersions.size())`. 

Alternatively, in the `else` branch, before we create a new range, add a check 
for X and -X and throw an exception if it holds.
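
Both guards could look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual PeerSync patch: the class `RangeLoopGuardSketch` and the method `buildRanges` are hypothetical, both lists are assumed to be sorted by absolute value with the highest first, the inner loop is bounded with `otherIdx >= 0`, and the threshold and counting logic from the real method are left out. The pass limit used here is `ourUpdates.size() + otherVersions.size() + 1` rather than the `Math.max(...)` bound, because each pass moves at least one of the two indexes and interleaved lists can legitimately need more than `max` passes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RangeLoopGuardSketch {

  // Hypothetical stand-in for the range-building loop; not the real PeerSync code.
  // Assumes both lists are sorted by absolute value, highest first.
  static List<String> buildRanges(List<Long> ourUpdates, List<Long> otherVersions) {
    List<String> rangesToRequest = new ArrayList<>();
    int ourIdx = ourUpdates.size() - 1;
    int otherIdx = otherVersions.size() - 1;

    // Guard 1: every healthy pass moves at least one index, so this many passes suffice.
    int maxPasses = ourUpdates.size() + otherVersions.size() + 1;
    int passes = 0;

    while (otherIdx >= 0) {
      if (++passes > maxPasses) {
        throw new IllegalStateException("No progress while building version ranges");
      }
      if (ourIdx < 0) {
        // We ran out of our own updates: request everything that is left.
        rangesToRequest.add(otherVersions.get(otherIdx) + "..." + otherVersions.get(0));
        break;
      }
      long ours = ourUpdates.get(ourIdx);
      long other = otherVersions.get(otherIdx);
      if (ours == other) {
        ourIdx--;
        otherIdx--;
      } else if (Math.abs(ours) < Math.abs(other)) {
        ourIdx--;
      } else {
        // Guard 2: X on one side and -X on the other can never advance either index.
        if (ours == -other) {
          throw new IllegalStateException(
              "Conflicting versions " + ours + " and " + other + " in update logs");
        }
        long rangeStart = other;
        while (otherIdx >= 0 && Math.abs(otherVersions.get(otherIdx)) < Math.abs(ours)) {
          otherIdx--;
        }
        rangesToRequest.add(rangeStart + "..." + otherVersions.get(otherIdx + 1));
      }
    }
    return rangesToRequest;
  }
}
```

With these guards, a call such as `buildRanges(Arrays.asList(3L, 2L, 1L), Arrays.asList(3L, -2L, 1L))` fails fast with an exception instead of growing the list until memory runs out.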

> Endless loop and OOM in PeerSync
> 
>
> Key: SOLR-11475
> URL: https://issues.apache.org/jira/browse/SOLR-11475
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrey Kudryavtsev
>
> After the problem described in SOLR-11459, I restarted the cluster and got 
> an OOM on start. 
> [PeerSync#handleVersionsWithRanges|https://github.com/apache/lucene-solr/blob/68bda0be421ce18811e03b229781fd6152fcc04a/solr/core/src/java/org/apache/solr/update/PeerSync.java#L539]
>  contains this logic: 
> {code}
> while (otherUpdatesIndex >= 0) {
>   // we have run out of ourUpdates, pick up all the remaining versions from the other versions
>   if (ourUpdatesIndex < 0) {
>     String range = otherVersions.get(otherUpdatesIndex) + "..." + otherVersions.get(0);
>     rangesToRequest.add(range);
>     totalRequestedVersions += otherUpdatesIndex + 1;
>     break;
>   }
>   // stop when the entries get old enough that reorders may lead us to see updates we don't need
>   if (!completeList && Math.abs(otherVersions.get(otherUpdatesIndex)) < ourLowThreshold) break;
>   if (ourUpdates.get(ourUpdatesIndex).longValue() == otherVersions.get(otherUpdatesIndex).longValue()) {
>     ourUpdatesIndex--;
>     otherUpdatesIndex--;
>   } else if (Math.abs(ourUpdates.get(ourUpdatesIndex)) < Math.abs(otherVersions.get(otherUpdatesIndex))) {
>     ourUpdatesIndex--;
>   } else {
>     long rangeStart = otherVersions.get(otherUpdatesIndex);
>     while ((otherUpdatesIndex < otherVersions.size())
>         && (Math.abs(otherVersions.get(otherUpdatesIndex)) < Math.abs(ourUpdates.get(ourUpdatesIndex)))) {
>       otherUpdatesIndex--;
>       totalRequestedVersions++;
>     }
>     // construct range here
>     rangesToRequest.add(rangeStart + "..." + otherVersions.get(otherUpdatesIndex + 1));
>   }
> }
> {code}
> If at some point
> {code}ourUpdates.get(ourUpdatesIndex) == -otherVersions.get(otherUpdatesIndex){code}
> holds, the loop will never end. It will add the same string to 
> {{rangesToRequest}} again and again until the process runs out of memory.
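
The stall described in the report can be traced by looking at how far a single pass of the loop body moves the two indexes. The sketch below is a simplified model, not code from Solr: the class `StallTrace` and the method `minProgress` are hypothetical, and the model ignores the threshold check and the case where our updates are exhausted.

```java
public class StallTrace {

  // Returns the minimum number of index decrements one pass of the loop body
  // performs for the current pair of versions (simplified, hypothetical model).
  static int minProgress(long ours, long other) {
    if (ours == other) {
      return 2; // both ourUpdatesIndex and otherUpdatesIndex move
    }
    if (Math.abs(ours) < Math.abs(other)) {
      return 1; // only ourUpdatesIndex moves
    }
    // else branch: the inner while loop moves otherUpdatesIndex only while
    // |other| < |ours|; when the magnitudes are equal it never runs.
    return Math.abs(other) < Math.abs(ours) ? 1 : 0;
  }
}
```

For a pair such as `7` and `-7` the model returns `0`: the first two branches do not match, the inner loop condition is false from the start, and the pass ends by appending one more range string without moving either index.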



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11475) Endless loop and OOM in PeerSync

2017-10-23 Thread Andrey Kudryavtsev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214845#comment-16214845
 ] 

Andrey Kudryavtsev commented on SOLR-11475:
---

[~praste], in SOLR-11459 I described how I got into this X / -X situation. In 
short: because of a (probable) defect in distributed in-place updates. 

But getting an OOM here because of some other bug is not a good idea, IMHO. I 
would even prefer an exception on start with a message like {{Your index is 
corrupted, pls clear your tlog...}} instead. 




[jira] [Commented] (SOLR-11475) Endless loop and OOM in PeerSync

2017-10-20 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212935#comment-16212935
 ] 

Pushkar Raste commented on SOLR-11475:
--

Version numbers are monotonically increasing sequence numbers, and for
deletes the sequence number is multiplied by -1.

I don't think we would ever have version number X in a replica's tlog and -X
in the leader's (or any other replica's) tlog.

Can you provide a valid test case for your issue? I am not in front of a
computer right now; however, IIRC the tests have the token PeerSync in their
names.
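
The sign convention can be illustrated with a small sketch. It assumes, based on the `Math.abs` comparisons in the quoted loop, that versions are ordered by magnitude with the sign only marking a delete; the class `VersionSignSketch` and its members are hypothetical names, not Solr APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class VersionSignSketch {

  // Assumed convention: a delete carries the negated sequence number.
  static long asDelete(long sequenceNumber) {
    return -sequenceNumber;
  }

  static boolean isDelete(long version) {
    return version < 0;
  }

  // Orders versions by magnitude, highest first, ignoring the add/delete sign.
  static final Comparator<Long> ABS_DESCENDING =
      (a, b) -> Long.compare(Math.abs(b), Math.abs(a));

  // Returns a sorted copy so the caller's list is left untouched.
  static List<Long> sortedByMagnitude(List<Long> versions) {
    List<Long> copy = new ArrayList<>(versions);
    copy.sort(ABS_DESCENDING);
    return copy;
  }
}
```

Under this ordering X and -X occupy the same position, which is why the same sequence number appearing as an add in one tlog and as a delete in another is not expected in a healthy cluster.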


On Oct 20, 2017 5:54 AM, "Andrey Kudryavtsev (JIRA)" 
wrote:


 [ https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Kudryavtsev mentioned you on SOLR-11475
--

I think throwing an exception in case of {{ourUpdates.get(ourUpdatesIndex) ==
-otherVersions.get(otherUpdatesIndex)}} is better than an OOM.

[~praste], [~shalinmangar] What do you think?





[jira] [Commented] (SOLR-11475) Endless loop and OOM in PeerSync

2017-10-20 Thread Andrey Kudryavtsev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212444#comment-16212444
 ] 

Andrey Kudryavtsev commented on SOLR-11475:
---

I think throwing an exception in case of {{ourUpdates.get(ourUpdatesIndex) == 
-otherVersions.get(otherUpdatesIndex)}} is better than an OOM.

[~praste], [~shalinmangar] What do you think?
