[jira] [Updated] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2015-01-07 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7704:

Fix Version/s: (was: 2.1.0)
   (was: 2.0.10)
   2.1.3
   2.0.12

> FileNotFoundException during STREAM-OUT triggers 100% CPU usage
> ---
>
> Key: CASSANDRA-7704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Rick Branson
>Assignee: Benedict
> Fix For: 2.0.12, 2.1.3
>
> Attachments: 7704-2.1.txt, 7704.txt, backtrace.txt, other-errors.txt
>
>
> See the attached backtrace, which shows what triggered this. The stream failed, and 
> then ~12 seconds later it emitted that exception. At that point, all CPUs 
> went to 100%. A thread dump shows all the ReadStage threads stuck inside 
> IntervalTree.searchInternal, called from CFS.markReferenced().
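For readers unfamiliar with this symptom, here is a minimal Java sketch, assuming markReferenced() retries until it can take a reference on every sstable in the current view. RefCounted, tryAcquire() and this markReferenced() are hypothetical simplifications, not Cassandra's actual code. If a reader's count has already been driven to zero or below by an extra release while it is still in the view, every retry fails, which would be consistent with threads spinning in the interval-tree search at 100% CPU.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of a reference-counted sstable reader (not Cassandra's class).
final class RefCounted {
    private final AtomicInteger refs = new AtomicInteger(1); // 1 = the reference held by the live view

    // "Try acquire": succeeds only while the count is still positive.
    boolean tryAcquire() {
        while (true) {
            int n = refs.get();
            if (n <= 0) return false;                       // already fully released: cannot be revived
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    void release() { refs.decrementAndGet(); }

    int count() { return refs.get(); }
}

public final class MarkReferencedSketch {
    // Retries until every reader in the view is referenced. The attempt cap exists only so this
    // demo terminates; an uncapped loop would spin forever on a reader whose count is <= 0.
    public static int markReferenced(List<RefCounted> view, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int acquired = 0;
            boolean all = true;
            for (RefCounted r : view) {
                if (!r.tryAcquire()) { all = false; break; }
                acquired++;
            }
            if (all) return attempt;
            // Roll back the partial acquisitions before retrying.
            for (int i = 0; i < acquired; i++) view.get(i).release();
        }
        return -1; // gave up: an uncapped loop would still be spinning here
    }

    public static void main(String[] args) {
        RefCounted healthy = new RefCounted();
        RefCounted doubleReleased = new RefCounted();
        doubleReleased.release(); // first release
        doubleReleased.release(); // second, erroneous release: count is now -1
        System.out.println(markReferenced(List.of(healthy), 1_000));                 // 1
        System.out.println(markReferenced(List.of(healthy, doubleReleased), 1_000)); // -1
    }
}
```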



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-15 Thread Benedict (JIRA)


Benedict updated CASSANDRA-7704:


Attachment: (was: 7704.20.v2.txt)

> FileNotFoundException during STREAM-OUT triggers 100% CPU usage
> ---
>
> Key: CASSANDRA-7704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Rick Branson
>Assignee: Benedict
> Fix For: 2.0.10, 2.1.0
>
> Attachments: 7704-2.1.txt, 7704.txt, backtrace.txt, other-errors.txt
>
>


[jira] [Updated] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7704:


Attachment: 7704-2.1.txt

Attaching a new version that does not cancel the task that was run, and that 
updates the unit tests to match the new behaviour.

> FileNotFoundException during STREAM-OUT triggers 100% CPU usage
> ---
>
> Key: CASSANDRA-7704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Rick Branson
>Assignee: Benedict
> Fix For: 2.0.10, 2.1.0
>
> Attachments: 7704-2.1.txt, 7704.txt, backtrace.txt, other-errors.txt
>
>


[jira] [Updated] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-14 Thread Benedict (JIRA)


Benedict updated CASSANDRA-7704:


Fix Version/s: 2.1.0
   2.0.10

> FileNotFoundException during STREAM-OUT triggers 100% CPU usage
> ---
>
> Key: CASSANDRA-7704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Rick Branson
>Assignee: Benedict
> Fix For: 2.0.10, 2.1.0
>
> Attachments: 7704.20.v2.txt, 7704.txt, backtrace.txt, other-errors.txt
>
>


[jira] [Updated] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-11 Thread Benedict (JIRA)


Benedict updated CASSANDRA-7704:


Attachment: 7704.20.v2.txt

FTR, there was a (probably innocuous) mistake in that patch; fixed version 
attached.

> FileNotFoundException during STREAM-OUT triggers 100% CPU usage
> ---
>
> Key: CASSANDRA-7704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Rick Branson
>Assignee: Benedict
> Attachments: 7704.20.v2.txt, 7704.txt, backtrace.txt, other-errors.txt
>
>


[jira] [Updated] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-08 Thread Rick Branson (JIRA)


Rick Branson updated CASSANDRA-7704:


Attachment: other-errors.txt

There wasn't anything in the logs that indicated *why* the failure happened, 
but I've attached anything that looked suspect. The IndexOutOfBoundsException 
occurred on the bootstrapping node *after* the stream failure occurred on the 
node that was streaming out.

A CompactionTask ran at 2014-08-05 18:00:25,804 (4 minutes before the 
StreamOut task) and tried to compact the SSTable referenced in the 
FileNotFoundException. There were no other log messages related to that file, though.

> FileNotFoundException during STREAM-OUT triggers 100% CPU usage
> ---
>
> Key: CASSANDRA-7704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Rick Branson
>Assignee: Benedict
> Attachments: 7704.txt, backtrace.txt, other-errors.txt
>
>


[jira] [Updated] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-06 Thread Benedict (JIRA)


Benedict updated CASSANDRA-7704:


Attachment: 7704.txt

Attaching a patch that I think addresses this. There are a number of 
concurrency bugs here, and whilst we could fix them with more advanced 
lock-freedom, there is no compelling reason for this class not to use 
synchronized everywhere, which would probably have avoided this problem in the 
first place. There is only one place where execution is not guaranteed to be 
prompt, and I have left it out of the synchronization. At the same time, I have 
simplified the logic, fixed the logic for cancelling timeouts, and made the 
scheduled executor for timeouts globally shared (there's no good reason to 
spin up a new executor for each set of transfers).
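A rough Java sketch of that approach, under stated assumptions: TransferTracker, add(), complete(), timeout() and abort() are hypothetical names, not the patch's actual code. Every state change goes through one monitor, so complete/abort/timeout cannot interleave, and a single daemon scheduler is shared by all trackers instead of being created per set of transfers. Any slow work (the one non-prompt step) would be done by the caller after complete() returns, outside the lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical tracker for one session's outgoing transfers; names are illustrative only.
public final class TransferTracker {
    // One daemon scheduler shared by every tracker, rather than one executor per set of transfers.
    private static final ScheduledExecutorService TIMEOUTS =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "transfer-timeouts");
            t.setDaemon(true);
            return t;
        });

    private final Map<Integer, ScheduledFuture<?>> pending = new HashMap<>();
    private boolean aborted;

    // All methods below lock the same monitor, so their state changes are atomic w.r.t. each other.
    public synchronized void add(int seq, long delay, TimeUnit unit) {
        if (aborted) throw new IllegalStateException("aborted");
        pending.put(seq, TIMEOUTS.schedule(() -> { timeout(seq); }, delay, unit));
    }

    // Returns true only for the single caller that actually finishes this transfer.
    public synchronized boolean complete(int seq) {
        ScheduledFuture<?> f = pending.remove(seq);
        if (f == null) return false; // already completed, timed out, or aborted
        f.cancel(false);             // cancel the pending timeout; never interrupt a running one
        return true;
    }

    // Invoked by the scheduled task itself, so it has no need to cancel its own future.
    public synchronized boolean timeout(int seq) {
        return pending.remove(seq) != null;
    }

    public synchronized void abort() {
        aborted = true;
        for (ScheduledFuture<?> f : pending.values()) f.cancel(false);
        pending.clear();
    }

    public synchronized int pendingCount() { return pending.size(); }
}
```

Because complete() removes the entry under the same lock that abort() holds while clearing, an ACK that races with abort() finds nothing to finish and returns false instead of releasing anything a second time.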

In this particular instance, the issue seems to have been a lack of atomicity 
between abort() and complete(): an ACK arrived while abort() was cancelling all 
transfers, causing a reference to be released twice. This could also happen 
with the timeouts, but since they occur only every 12 hours, the risk is low.
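To make that failure mode concrete, a minimal Java sketch with hypothetical names (Ref, racyInterleaving(), atomicClaim()): racyInterleaving() writes the abort()/ACK interleaving out sequentially to show the double release, and atomicClaim() shows one way to guarantee a single release by letting whichever caller removes the entry own it. Note that atomicClaim() uses ConcurrentHashMap.remove() as the atomic claim, a swapped-in alternative to the synchronized approach the patch takes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public final class DoubleReleaseSketch {
    // Hypothetical stand-in for the reference an outgoing transfer holds on an sstable.
    static final class Ref {
        final AtomicInteger count = new AtomicInteger(1);
        void release() { count.decrementAndGet(); }
    }

    // Check-then-act, with the bad interleaving spelled out step by step: abort() and the
    // incoming ACK both see the transfer as pending before either one removes it.
    public static int racyInterleaving() {
        Ref ref = new Ref();
        Map<Integer, Ref> transfers = new ConcurrentHashMap<>(Map.of(1, ref));
        Ref seenByAbort = transfers.get(1);    // abort(): still pending
        Ref seenByComplete = transfers.get(1); // complete(): still pending
        if (seenByAbort != null) { transfers.remove(1); seenByAbort.release(); }
        if (seenByComplete != null) { transfers.remove(1); seenByComplete.release(); }
        return ref.count.get(); // -1: the reference was released twice
    }

    // Atomic claim: remove() hands the entry to exactly one caller, and only that caller releases.
    public static int atomicClaim() {
        Ref ref = new Ref();
        Map<Integer, Ref> transfers = new ConcurrentHashMap<>(Map.of(1, ref));
        Runnable complete = () -> { Ref r = transfers.remove(1); if (r != null) r.release(); };
        Runnable abort = () -> transfers.keySet().forEach(k -> {
            Ref r = transfers.remove(k);
            if (r != null) r.release();
        });
        Thread a = new Thread(abort);
        Thread c = new Thread(complete);
        a.start();
        c.start();
        try {
            a.join();
            c.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ref.count.get(); // 0 regardless of which thread wins
    }

    public static void main(String[] args) {
        System.out.println(racyInterleaving()); // -1
        System.out.println(atomicClaim());      // 0
    }
}
```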

> FileNotFoundException during STREAM-OUT triggers 100% CPU usage
> ---
>
> Key: CASSANDRA-7704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Rick Branson
> Attachments: 7704.txt, backtrace.txt
>
>