[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197195#comment-16197195 ]

Christian Esken commented on CASSANDRA-13265:
---------------------------------------------

PR closed: https://github.com/apache/cassandra/pull/95

> Expiration in OutboundTcpConnection can block the reader Thread
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-13265
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 3.0.9
>                      Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 1.8.0_112-b15)
>                      Linux 3.16
>            Reporter: Christian Esken
>            Assignee: Christian Esken
>             Fix For: 3.0.14, 3.11.0, 4.0
>
>         Attachments: cassandra-13265-2.2-dtest_stdout.txt, cassandra-13265-trun-dtest_stdout.txt, cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to communicate with the other nodes. This can happen at any time, during peak load or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the situation and am already developing a possible fix. Here is the analysis so far:
> - A thread dump in this situation showed 324 threads in the OutboundTcpConnection class that want to lock the backlog queue for doing expiration.
> - A class histogram shows 262508 instances of OutboundTcpConnection$QueuedMessage.
> What is the effect of this? As soon as the Cassandra node has reached a certain number of queued messages, it starts thrashing itself to death. Each of the threads fully locks the queue for reading and writing by calling iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can a thread progress with actually writing to the queue.
> - Reading: Also blocked, as 324 threads try to do iterator.next() and fully lock the queue.
> This means: writing blocks the queue for reading, and readers might even be starved, which makes the situation even worse.
>
> The setup is:
> - 3-node cluster
> - replication factor 2
> - consistency LOCAL_ONE
> - no remote DCs
> - high write throughput (10 INSERT statements per second and more during peak times).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
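The failure mode described in the report can be illustrated in isolation. This is a hedged sketch, not Cassandra code: the QueuedMessage stand-in and the timeout values are invented for the example. The locking behavior it relies on is real in OpenJDK 8, where each operation of LinkedBlockingQueue's iterator (hasNext, next, remove) fully locks the queue, taking both the put lock and the take lock, so hundreds of threads scanning the backlog for expired messages serialize every reader and writer behind them.

```java
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;

public class BacklogExpirationSketch {
    // Hypothetical stand-in for OutboundTcpConnection$QueuedMessage.
    static final class QueuedMessage {
        final long timestampNanos;
        QueuedMessage(long timestampNanos) { this.timestampNanos = timestampNanos; }
        boolean isTimedOut(long nowNanos, long timeoutNanos) {
            return nowNanos - timestampNanos > timeoutNanos;
        }
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<QueuedMessage> backlog = new LinkedBlockingQueue<>();
        long now = System.nanoTime();
        backlog.add(new QueuedMessage(now - 2_000_000_000L)); // 2 s old: expired
        backlog.add(new QueuedMessage(now));                  // fresh

        // The expiration scan: in OpenJDK 8, every it.hasNext()/it.next()/
        // it.remove() call below acquires BOTH internal locks of the
        // LinkedBlockingQueue ("fully locks" it). With 324 threads doing this
        // over 262508 entries, all puts and takes stall behind the scans.
        int removed = 0;
        for (Iterator<QueuedMessage> it = backlog.iterator(); it.hasNext(); ) {
            if (it.next().isTimedOut(System.nanoTime(), 1_000_000_000L)) {
                it.remove();
                removed++;
            }
        }
        System.out.println(removed + " expired, " + backlog.size() + " left");
    }
}
```

The patch discussed in this thread moves expiration off this hot path so that a backed-up queue no longer multiplies its own lock traffic.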
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998277#comment-15998277 ]

Christian Esken commented on CASSANDRA-13265:
---------------------------------------------

It is fine, I do not use 2.2. I was just wondering because you asked me to start with 2.2, which required more effort and made things a bit more complicated. If you hadn't asked, I would have done patches just for the HEAD of version 3 (cassandra-3.0) and version 4 (trunk). So, mission complete. Thanks, Ariel, for guiding me through my first Cassandra patch. :-)
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996322#comment-15996322 ]

Christian Esken commented on CASSANDRA-13265:
---------------------------------------------

Thanks. I have seen your commit in three branches. I did not yet see the changes in cassandra-2.2 when looking at https://github.com/apache/cassandra/commits/cassandra-2.2 . Is this an omission, or is the GitHub repo not current?
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15986426#comment-15986426 ]

Christian Esken edited comment on CASSANDRA-13265 at 5/3/17 1:05 PM:
---------------------------------------------------------------------

I am fixing the branches while you work on the dtests. I will continue updating this comment as long as I work on it.

|| branch || squashed? || Unit Tests OK? || comment ||
| cassandra-13265-3.0 | yes | (/) / (?) | No stress-test in build.xml. I patched circle.yml to match that: https://github.com/christian-esken/cassandra/commit/1a776e299c76093eb3edf20e0d9054e14549a667 . CircleCI still kicks off a 4th test, which fails but can likely be ignored for now. |
| cassandra-13265-3.11 | yes | CircleCI (/) | |
| cassandra-13265-2.2 | yes | ant test (/) | CircleCI hasn't kicked off tests for the branch |
| cassandra-13265-trunk | yes | CircleCI (/) / (?) | My unit test works. But there is a strange unrelated unit test failure: ClassNotFoundException: org.apache.cassandra.stress.CompactionStress |
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994820#comment-15994820 ]

Christian Esken commented on CASSANDRA-13265:
---------------------------------------------

Done. Squashed and pushed. I also removed my "stress-test" patch in the 3.0 branch, as it is not related and also does not look like a proper fix. For reference, here is the patch:

{code}
- case $CIRCLE_NODE_INDEX in 0) ant eclipse-warnings; ant test ;; 1) ant long-test ;; 2) ant test-compression ;; 3) ant stress-test ;; esac:
+ case $CIRCLE_NODE_INDEX in 0) ant eclipse-warnings; ant test ;; 1) ant long-test ;; 2) ant test-compression ;; esac:
{code}
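For context, the case statement in the circle.yml patch is CircleCI 1.x's convention for splitting work across parallel containers: every container runs the same configuration but sees its own CIRCLE_NODE_INDEX. A minimal stand-alone sketch of the dispatch pattern (the echoed ant targets are illustrative; a real build would execute them):

```shell
#!/bin/sh
# Each CircleCI container evaluates the same case statement with its own
# CIRCLE_NODE_INDEX; the loop below merely simulates four containers.
# With the stress-test arm removed, index 3 falls through to the default
# arm and has no work assigned.
for CIRCLE_NODE_INDEX in 0 1 2 3; do
  case $CIRCLE_NODE_INDEX in
    0) echo "node 0: ant eclipse-warnings; ant test" ;;
    1) echo "node 1: ant long-test" ;;
    2) echo "node 2: ant test-compression" ;;
    *) echo "node $CIRCLE_NODE_INDEX: idle" ;;
  esac
done
```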
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15986426#comment-15986426 ]

Christian Esken edited comment on CASSANDRA-13265 at 5/3/17 1:01 PM:
---------------------------------------------------------------------

I am fixing the branches while you work on the dtests. I will continue updating this comment as long as I work on it.

|| branch || squashed? || Unit Tests OK? || comment ||
| cassandra-13265-3.0 | yes | (/) / (?) | No stress-test in build.xml. I patched circle.yml to match that: https://github.com/christian-esken/cassandra/commit/1a776e299c76093eb3edf20e0d9054e14549a667 . CircleCI still kicks off a 4th test, which fails but can likely be ignored for now. |
| cassandra-13265-3.11 | yes | CircleCI (/) | |
| cassandra-13265-2.2 | yes | ant test (/) | CircleCI hasn't kicked off tests for the branch |
| cassandra-13265-trunk | yes | CircleCI (/) / (?) | My unit test works. But there is a strange unrelated unit test failure: ClassNotFoundException: org.apache.cassandra.stress.CompactionStress |
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15986426#comment-15986426 ]

Christian Esken edited comment on CASSANDRA-13265 at 5/3/17 12:02 PM:
----------------------------------------------------------------------

I am fixing the branches while you work on the dtests. I will continue updating this comment as long as I work on it.

|| branch || squashed? || Unit Tests OK? || comment ||
| cassandra-13265-3.0 | no | (/) / (?) | No stress-test in build.xml. I patched circle.yml to match that: https://github.com/christian-esken/cassandra/commit/1a776e299c76093eb3edf20e0d9054e14549a667 . CircleCI still kicks off a 4th test, which fails but can likely be ignored for now. |
| cassandra-13265-3.11 | yes | CircleCI (/) | |
| cassandra-13265-2.2 | yes | ant test (/) | CircleCI hasn't kicked off tests for the branch |
| cassandra-13265-trunk | yes | CircleCI (?) | My unit test works. But there is a strange unrelated unit test failure: ClassNotFoundException: org.apache.cassandra.stress.CompactionStress |
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15986426#comment-15986426 ]

Christian Esken edited comment on CASSANDRA-13265 at 4/27/17 1:24 PM:
----------------------------------------------------------------------

I am fixing the branches while you work on the dtests. I will continue updating this comment as long as I work on it.

|| branch || squashed? || Unit Tests OK? || comment ||
| cassandra-13265-3.0 | no | (/) / (?) | No stress-test in build.xml. I patched circle.yml to match that: https://github.com/christian-esken/cassandra/commit/1a776e299c76093eb3edf20e0d9054e14549a667 . CircleCI still kicks off a 4th test, which fails but can likely be ignored for now. |
| cassandra-13265-3.11 | no | CircleCI (/) | |
| cassandra-13265-2.2 | yes | ant test (/) | CircleCI hasn't kicked off tests for the branch |
| cassandra-13265-trunk | no | CircleCI (?) | My unit test works. But there is a strange unrelated unit test failure: ClassNotFoundException: org.apache.cassandra.stress.CompactionStress |
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15986426#comment-15986426 ]

Christian Esken edited comment on CASSANDRA-13265 at 4/27/17 11:53 AM:
-----------------------------------------------------------------------

I am fixing the branches while you work on the dtests. I will continue updating this comment as long as I work on it.

|| branch || squashed? || Unit Tests OK? || comment ||
| cassandra-13265-3.0 | no | (CircleCI currently running) | No stress-test in build.xml. I patched circle.yml to match that: https://github.com/christian-esken/cassandra/commit/1a776e299c76093eb3edf20e0d9054e14549a667 |
| cassandra-13265-3.11 | no | CircleCI (/) | |
| cassandra-13265-2.2 | yes | ant test (/) | CircleCI hasn't kicked off tests for the branch |
| cassandra-13265-trunk | no | CircleCI (?) | My unit test works. But there is a strange unrelated unit test failure: ClassNotFoundException: org.apache.cassandra.stress.CompactionStress |
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15986426#comment-15986426 ] Christian Esken commented on CASSANDRA-13265: - I am fixing the branches, while you work on the dtests. I will continue updating this comment as long as I work on it. || branch || sqaushed? || Unit Tests OK? || comment || | cassandra-13265-3.0 | no | (CircleCI currently running) | | | cassandra-13265-3.11 | no | CircleCI (/) | | | cassandra-13265-2.2 | yes | ant test (/) | CicrleCI hasn't kicked off tests for the branch | | cassandra-13265-trunk | no | CircleCI (?) | My unit test works. Bu there is a strange unrelated unit test failure: ClassNotFoundException: org.apache.cassandra.stress.CompactionStress | > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Fix For: 3.0.x > > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate to the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node from the cluster fixes the issue. > Before going in to details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A Threaddump in this situation showed 324 Threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. > - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? 
As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. Each of the > Thread fully locks the Queue for reading and writing by calling > iterator.next(), making the situation worse and worse. > - Writing: Only after 262508 locking operation it can progress with actually > writing to the Queue. > - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and > fully lock the Queue > This means: Writing blocks the Queue for reading, and readers might even be > starved which makes the situation even worse. > - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DC's > - high write throughput (10 INSERT statements per second and more during > peak times). > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
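The analysis above (hundreds of threads each iterating the whole backlog queue to expire messages, blocking readers and writers alike) suggests letting at most one thread perform expiration at a time. Below is a minimal, self-contained sketch of that idea; the class and method names are hypothetical and this is not the actual Cassandra patch, just an illustration of the single-expirer technique.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch: at most one thread walks the backlog to expire messages; others skip. */
public class SingleThreadExpiry {
    static class QueuedMessage {
        final long createdAtNanos;
        QueuedMessage(long createdAtNanos) { this.createdAtNanos = createdAtNanos; }
        boolean isTimedOut(long nowNanos, long timeoutNanos) {
            return nowNanos - createdAtNanos > timeoutNanos;
        }
    }

    final ConcurrentLinkedQueue<QueuedMessage> backlog = new ConcurrentLinkedQueue<>();
    final AtomicBoolean expiring = new AtomicBoolean(false);

    /**
     * In the problematic scheme every producer iterated the whole queue.
     * Here, only the thread that wins the CAS does the full scan; the rest
     * return immediately instead of piling up on the iterator.
     */
    int maybeExpire(long nowNanos, long timeoutNanos) {
        if (!expiring.compareAndSet(false, true))
            return 0; // someone else is already expiring
        try {
            int dropped = 0;
            for (Iterator<QueuedMessage> it = backlog.iterator(); it.hasNext(); ) {
                if (it.next().isTimedOut(nowNanos, timeoutNanos)) {
                    it.remove(); // ConcurrentLinkedQueue iterators support remove()
                    dropped++;
                }
            }
            return dropped;
        } finally {
            expiring.set(false);
        }
    }
}
```

ConcurrentLinkedQueue's weakly consistent iterator never blocks concurrent offer/poll, so with a single expirer the readers described in the ticket cannot be starved by expiration scans.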
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981236#comment-15981236 ] Christian Esken edited comment on CASSANDRA-13265 at 4/24/17 2:32 PM: -- First, a summary and the question I have: the tests work if I add "DatabaseDescriptor.daemonInitialization();" to the unit test of the affected branches. Is this a good idea, [~aweisberg]? Now the long story. Status for branch cassandra-13265-3.0: - (/) Running unit tests in Eclipse: works - (/)/(?) CircleCI: all normal tests work fine ("Your build ran 4754 tests in junit with 0 failures"), but the build fails for me with: Target "stress-test" does not exist in the project "apache-cassandra". As "ant test" worked, I would guess that the patch is fine. I will reverify the specific unit test locally. Status for branches cassandra-13265-3.11 and cassandra-13265-trunk: - (/) Running unit tests in Eclipse: works - (x) Running unit tests with CircleCI or "ant test" fails due to a non-initialized DatabaseDescriptor. When I add the following to the unit test of cassandra-13265-3.11, the unit test works:
{code}
DatabaseDescriptor.daemonInitialization();
{code}
Without it, the test fails with:
{code}
[junit] Null Test: Caused an ERROR
[junit] null
[junit] java.lang.ExceptionInInitializerError
[junit] at java.lang.Class.forName0(Native Method)
[junit] at java.lang.Class.forName(Class.java:264)
[junit] Caused by: java.lang.NullPointerException
[junit] at org.apache.cassandra.config.DatabaseDescriptor.getWriteRpcTimeout(DatabaseDescriptor.java:1400)
[junit] at org.apache.cassandra.net.MessagingService$Verb$1.getTimeout(MessagingService.java:121)
[junit] at org.apache.cassandra.net.OutboundTcpConnectionTest.(OutboundTcpConnectionTest.java:43)
{code}
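The ExceptionInInitializerError in the trace above comes from a static field that consults the DatabaseDescriptor while the test class is being loaded, before daemonInitialization() has run. Below is a self-contained sketch of that class-initialization-order trap, using stand-in classes (Config and TimeoutHolder are hypothetical, not Cassandra's real types):

```java
/** Stand-ins demonstrating the failure mode, not Cassandra code. */
public class StaticInitOrder {
    static class Config {
        static Long writeRpcTimeout;                 // stays null until initialized
        static void daemonInitialization() { writeRpcTimeout = 2000L; }
    }

    /** Mirrors a test class whose static field reads the config at class-load time. */
    static class TimeoutHolder {
        // Unboxing the null Long throws NPE during class init,
        // which the JVM wraps in ExceptionInInitializerError.
        static final long TIMEOUT = Config.writeRpcTimeout;
    }

    public static void main(String[] args) {
        // Calling Config.daemonInitialization() BEFORE first touching
        // TimeoutHolder would avoid the error; here we deliberately don't.
        try {
            System.out.println("timeout = " + TimeoutHolder.TIMEOUT);
        } catch (ExceptionInInitializerError e) {
            System.out.println("init failed as in the trace: " + e.getCause());
        }
    }
}
```

This is why adding DatabaseDescriptor.daemonInitialization() at the top of the test (before any static field of the test class forces config access) makes the failure disappear.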
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978748#comment-15978748 ] Christian Esken commented on CASSANDRA-13265: - I pulled the changes, fixed the CHANGES.txt, and pushed everything again. Now CircleCI is kicking off the builds for the branches. Looks like we are getting somewhere. :-)
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978730#comment-15978730 ] Christian Esken commented on CASSANDRA-13265: - Hmm, I rebased on 2.2 and trunk. I am surprised that 3.11 is not current, as 3.11 is not that old. I will now clean my repo, including deleting the bad "13625" branches, and rebase 3.0 and 3.11.
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976279#comment-15976279 ] Christian Esken commented on CASSANDRA-13265: - Unfortunately some tests failed, not because of bugs but due to technical issues, mostly with "com.datastax.driver.core.exceptions.NoHostAvailableException". Are these the "dtest" issues in CircleCI you mentioned? I tried to run the tests locally, but even "ant test" runs for more than an hour and keeps failing with Timeout, NoHostAvailableException, or similar. I don't know why the tests fail, as my laptop should be capable of running them: I frequently run a 3-node Cassandra cluster on it via ccm and that works properly. For now I think I have done all I can do. Let me know if I can check something else. What is your proposal for how we continue here?
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974885#comment-15974885 ] Christian Esken edited comment on CASSANDRA-13265 at 4/19/17 3:14 PM: -- No problem. I was away for Easter, so I did not even notice you being busy. I just started my CircleCI test for the first time. It has been working on the first branch (trunk) for an hour and is not complete yet, so I guess with all the branches it can take a day to complete. I have restarted the build with more parallelism, and hopefully that will create a more acceptable turnaround time. I will send an update whenever it is complete. https://circleci.com/gh/christian-esken/cassandra/3
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974590#comment-15974590 ] Christian Esken commented on CASSANDRA-13265: - bq. For CHANGES.TXT the entry should go at the top of the list of entries for the version the change is for.
I don't know why, and I also haven't seen this mentioned anywhere. Probably someone could add that to https://wiki.apache.org/cassandra/HowToContribute or http://cassandra.apache.org/doc/latest/development/how_to_commit.html . Anyhow, I have fixed that.
bq. set up with CircleCI [...] Also you transposed 13625 and 13265
I changed the branches to correct the transposed 13625 and 13265; I didn't find the transposition in any place other than the branch names. I will try to find out how to do the CircleCI setup. Meanwhile, here are the updated links: https://github.com/christian-esken/cassandra/commits/cassandra-13265-2.2 https://github.com/christian-esken/cassandra/commits/cassandra-13265-3.0 https://github.com/christian-esken/cassandra/commits/cassandra-13265-3.11 https://github.com/christian-esken/cassandra/commits/cassandra-13265-trunk
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967484#comment-15967484 ] Christian Esken edited comment on CASSANDRA-13265 at 4/13/17 11:59 AM: --- There were different reasons why the build failed; e.g., somehow Eclipse did not pick up the build parameters for 2.2 after "ant generate-eclipse-files", and the build was done at the Java 8 language level (lambdas). It looks like building and testing in Eclipse alone is not enough, so I redid everything manually in the console and fixed the issues. As you recommended, I have created branches that follow your naming (cassandra-13625-3.0) with squashed commits. The new branches are: https://github.com/christian-esken/cassandra/commits/cassandra-13625-2.2 https://github.com/christian-esken/cassandra/commits/cassandra-13625-3.11 https://github.com/christian-esken/cassandra/commits/cassandra-13625-3.0 https://github.com/christian-esken/cassandra/commits/cassandra-13625-trunk About CHANGES.TXT: I added changes in the "matching" release versions that were listed in the individual branches. Please check, as the naming conventions within Cassandra are still not clear to me (e.g. there exists a 3.11 branch, a 3.0.11 release and a 3.11.0 changelog entry).
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967484#comment-15967484 ] Christian Esken commented on CASSANDRA-13265: - There were different reasons why the build failed, e.g. somehow Eclipse did not pick up the build parameters for 2.2 after "ant generate-eclipse-files" and the build was done with Java 8 language level (lambdas). Looks like building and testing in Eclipse alone is not enough, so I redid everything manually in the console and fixed the issues. As you recommended, I have created branches that follow your naming (cassandra-13625-3.0) with squashed commits. The new branches are: https://github.com/christian-esken/cassandra/commits/cassandra-13625-2.2 https://github.com/christian-esken/cassandra/commits/cassandra-13625-3.11 https://github.com/christian-esken/cassandra/commits/cassandra-13625-3.0 https://github.com/christian-esken/cassandra/commits/cassandra-13625-trunk About CHANGES.TXT: I added changes to all branches where in the appropriate versions. Please check, as the naming conventions within Cassandra are still not clear to me(e.g. there exists a 3.11 branch, a 3.0.11 release and a 3.11.0 changelog entry). > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Fix For: 3.0.x > > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate to the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node from the cluster fixes the issue. 
> Before going into details, I want to state that I have analyzed the situation and am already developing a possible fix. Here is the analysis so far:
> - A thread dump in this situation showed 324 Threads in the OutboundTcpConnection class that want to lock the backlog queue to do expiration.
> - A class histogram shows 262508 instances of OutboundTcpConnection$QueuedMessage.
> What is the effect of this? As soon as the Cassandra node has reached a certain amount of queued messages, it starts thrashing itself to death. Each of the Threads fully locks the Queue for reading and writing by calling iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can it progress with actually writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and fully lock the Queue.
> This means: Writing blocks the Queue for reading, and readers might even be starved, which makes the situation even worse.
> -
> The setup is:
> - 3-node cluster
> - replication factor 2
> - Consistency LOCAL_ONE
> - No remote DC's
> - high write throughput (10 INSERT statements per second and more during peak times)
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
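The failure mode described in the analysis above can be sketched in a few lines. This is a hypothetical illustration, not Cassandra code: every enqueueing thread scans the whole backlog to expire entries, so each write pays a cost proportional to the queue size, and readers compete for the same traversal. The class and method names are invented; the wrap-safe `System.nanoTime()` comparison shown in `isTimedOut` is the subtraction idiom the eventual fix also uses.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration of the reported anti-pattern: every writer expires
// the whole backlog inline, so write cost grows with queue size.
public class BacklogContentionDemo
{
    static final class QueuedMessage
    {
        final long enqueuedAtNanos = System.nanoTime();

        boolean isTimedOut(long timeoutNanos)
        {
            // Subtraction-based comparison is safe against nanoTime wrap-around;
            // a plain "a < b" comparison of nanoTime values is not.
            return System.nanoTime() - enqueuedAtNanos > timeoutNanos;
        }
    }

    static final ConcurrentLinkedQueue<QueuedMessage> backlog = new ConcurrentLinkedQueue<>();

    // Anti-pattern: each enqueue first walks the entire backlog (O(queue size)
    // work on EVERY write), which is what piles up when 324 threads do it at once.
    static void enqueueWithInlineExpiration(QueuedMessage m, long timeoutNanos)
    {
        for (Iterator<QueuedMessage> it = backlog.iterator(); it.hasNext(); )
            if (it.next().isTimedOut(timeoutNanos))
                it.remove();
        backlog.add(m);
    }

    public static void main(String[] args)
    {
        for (int i = 0; i < 1000; i++)
            enqueueWithInlineExpiration(new QueuedMessage(), Long.MAX_VALUE);
        System.out.println(backlog.size()); // 1000: nothing expired, but every add scanned the queue
    }
}
```

With 262508 queued messages and hundreds of threads running this loop concurrently, the per-write scan dominates and readers are starved, matching the thread-dump observation.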
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964522#comment-15964522 ] Christian Esken commented on CASSANDRA-13265: - Done. Two organizational topics are left: I will add the required line to the commit message. Does this look OK?
bq. patch by Christian Esken; reviewed by Ariel Weisberg and Jason Brown for CASSANDRA-13265
My proposal for the CHANGES.txt entry is the following text. Can you add it, Ariel? I do not know which versions to add it to, as they are upcoming versions.
bq. Expire OutboundTcpConnection messages by a single Thread
Here are the branches. The cassandra-3.0 branch is already squashed. If that branch is OK, I will also squash the other 3 branches.
https://github.com/christian-esken/cassandra/commits/cassandra-3.0
https://github.com/christian-esken/cassandra/commits/cassandra-3.11
https://github.com/christian-esken/cassandra/commits/trunk
https://github.com/christian-esken/cassandra/commits/cassandra-2.2
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963095#comment-15963095 ] Christian Esken edited comment on CASSANDRA-13265 at 4/10/17 4:20 PM:
--
Done. My highest priority is the 3.0 branch. I created a patch (a single, squashed file) for 3.0, which I also applied to my GitHub fork https://github.com/christian-esken/cassandra/commits/cassandra-3.0 . I attached the patch using the Submit Patch button at the top.

was (Author: cesken):
Done. My highest priority is the 3.0 branch. I created a patch (a single, squashed file) for 3.0, which I also applied to my GitHub fork https://github.com/christian-esken/cassandra/commits/cassandra-3.0 . Please have a look at the attached file 0001-3.0-Expire-OTC-messages-by-a-single-Thread.patch .
[jira] [Updated] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13265: Status: Patch Available (was: Open)

From 6bd3f3fc3b2da3a66b53a94a819446a9ea8ea2cf Mon Sep 17 00:00:00 2001
From: Christian Esken
Date: Wed, 1 Mar 2017 15:56:36 +0100
Subject: [PATCH] Expire OTC messages by a single Thread

This patch consists of the following aspects related to OutboundTcpConnection:
- Backlog queue expiration by a single Thread
- Drop count statistics
- QueuedMessage.isTimedOut() fix

When backlog queue expiration is done, one single Thread is elected to do the work. Previously, all Threads would go in and do the same work, producing high lock contention. The Thread reading from the Queue could even be starved by not being able to acquire the read lock. The backlog queue is inspected every otc_backlog_expiration_interval_ms milliseconds if its size exceeds BACKLOG_PURGE_SIZE. Added unit tests for OutboundTcpConnection.

Timed-out messages are counted in the dropped statistics. Additionally, dropped messages are counted when it is not possible to write to the socket, e.g. if there is no connection because a target node is down.

Fix QueuedMessage.isTimedOut(), which used an "a < b" comparison on nano time values; this can be wrong due to wrapping of System.nanoTime().

CASSANDRA-13265
---
 conf/cassandra.yaml                                |   9 ++
 src/java/org/apache/cassandra/config/Config.java   |   6 +
 .../cassandra/config/DatabaseDescriptor.java       |  10 ++
 .../cassandra/net/OutboundTcpConnection.java       | 113 +++---
 .../org/apache/cassandra/service/StorageProxy.java |  10 +-
 .../cassandra/service/StorageProxyMBean.java       |   3 +
 .../cassandra/net/OutboundTcpConnectionTest.java   | 170 +
 7 files changed, 294 insertions(+), 27 deletions(-)
 create mode 100644 test/unit/org/apache/cassandra/net/OutboundTcpConnectionTest.java

diff --git a/conf/cassandra.yaml b/conf/cassandra.yaml
index 790dfd743b..9c1510b66a 100644
--- a/conf/cassandra.yaml
+++ b/conf/cassandra.yaml
@@ -985,3 +985,12 @@ windows_timer_interval: 1
 # Do not try to coalesce messages if we already got that many messages. This should be more than 2 and less than 128.
 # otc_coalescing_enough_coalesced_messages: 8
+
+# How many milliseconds to wait between two expiration runs on the backlog (queue) of the OutboundTcpConnection.
+# Expiration is done if messages are piling up in the backlog. Droppable messages are expired to free the memory
+# taken by expired messages. The interval should be between 0 and 1000, and in most installations the default value
+# will be appropriate. A smaller value could potentially expire messages slightly sooner at the expense of more CPU
+# time and queue contention while iterating the backlog of messages.
+# An interval of 0 disables any wait time, which is the behavior of former Cassandra versions.
+#
+# otc_backlog_expiration_interval_ms: 200

diff --git a/src/java/org/apache/cassandra/config/Config.java b/src/java/org/apache/cassandra/config/Config.java
index 9aaf7ae33e..6a99cd3cbd 100644
--- a/src/java/org/apache/cassandra/config/Config.java
+++ b/src/java/org/apache/cassandra/config/Config.java
@@ -298,6 +298,12 @@ public class Config
     public int otc_coalescing_window_us = otc_coalescing_window_us_default;
     public int otc_coalescing_enough_coalesced_messages = 8;
+
+    /**
+     * Backlog expiration interval in milliseconds for the OutboundTcpConnection.
+     */
+    public static final int otc_backlog_expiration_interval_ms_default = 200;
+    public volatile int otc_backlog_expiration_interval_ms = otc_backlog_expiration_interval_ms_default;
+
     public int windows_timer_interval = 0;

     public boolean enable_user_defined_functions = false;

diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index 602214f3c6..e9e54c3e20 100644
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@ -1967,6 +1967,16 @@ public class DatabaseDescriptor
         conf.otc_coalescing_enough_coalesced_messages = otc_coalescing_enough_coalesced_messages;
     }
+
+    public static int getOtcBacklogExpirationInterval()
+    {
+        return conf.otc_backlog_expiration_interval_ms;
+    }
+
+    public static void setOtcBacklogExpirationInterval(int intervalInMillis)
+    {
+        conf.otc_backlog_expiration_interval_ms = intervalInMillis;
+    }
+
     public static int getWindowsTimerInterval()
     {
         return conf.windows_timer_interval;

diff --git a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
index 46083994df..99ad194b94 100644
---
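The single-thread election described in the commit message above can be sketched as follows. This is a hedged illustration, not the actual OutboundTcpConnection code: the field names (`backlogExpirationActive`, `backlogNextExpirationTime`) are borrowed from the review discussion, while `tryExpireBacklog` and the counter are invented stand-ins. The idea: a CAS on an AtomicBoolean elects exactly one thread to run expiration, and a deadline rate-limits runs to one per interval.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of electing a single expiration thread via compareAndSet, with runs
// rate-limited to one per interval. Illustrative names, not Cassandra code.
public class ExpirationElection
{
    static final long EXPIRATION_INTERVAL_NANOS = TimeUnit.MILLISECONDS.toNanos(200);

    final AtomicBoolean backlogExpirationActive = new AtomicBoolean(false);
    volatile long backlogNextExpirationTime = System.nanoTime();

    int expirationRuns = 0; // stand-in for "messages expired" bookkeeping

    boolean tryExpireBacklog()
    {
        // Wrap-safe deadline check: subtraction, never a direct "<" on nanoTime values.
        if (System.nanoTime() - backlogNextExpirationTime < 0)
            return false;                                   // not due yet
        if (!backlogExpirationActive.compareAndSet(false, true))
            return false;                                   // another thread won the election
        try
        {
            expirationRuns++;                               // the real code would scan the backlog here
        }
        finally
        {
            // Schedule the next run before releasing, so late arrivals see the new deadline.
            backlogNextExpirationTime = System.nanoTime() + EXPIRATION_INTERVAL_NANOS;
            backlogExpirationActive.set(false);
        }
        return true;
    }

    public static void main(String[] args)
    {
        ExpirationElection e = new ExpirationElection();
        boolean first = e.tryExpireBacklog();   // due immediately: runs
        boolean second = e.tryExpireBacklog();  // within the interval: skipped
        System.out.println(first + " " + second + " " + e.expirationRuns);
    }
}
```

All losing threads return immediately instead of contending on the queue's locks, which is what removes the O(threads × queue size) thrashing from the write path.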
[jira] [Updated] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13265: Status: Open (was: Patch Available)
[jira] [Updated] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13265: Fix Version/s: (was: 3.11.x) (was: 4.x) (was: 2.2.x) Status: Patch Available (was: Reopened)
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963095#comment-15963095 ] Christian Esken commented on CASSANDRA-13265: - Done. My highest priority is the 3.0 branch. I created a patch (a single, squashed file) for 3.0, which I also applied to my GitHub fork https://github.com/christian-esken/cassandra/commits/cassandra-3.0 . Please have a look at the attached file 0001-3.0-Expire-OTC-messages-by-a-single-Thread.patch .
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926348#comment-15926348 ] Christian Esken commented on CASSANDRA-13265: -
bq. I still think it's a good idea to avoid hard coding this kind of value so operators have options without recompiling. [...] A java property.
(/) static final int BACKLOG_PURGE_SIZE = Integer.getInteger("OTC_BACKLOG_PURGE_SIZE", 1024);
bq. I think we should log the drops especially due to timeouts as they happen rather than at the end.
(/) I agree, and I did not change that. After the loop I simply add the unprocessed messages, which remain because of {{break inner;}}. I'll push today, so you have a chance to see it.
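The quoted BACKLOG_PURGE_SIZE line above uses the standard JDK pattern for operator tunables without a config file: `Integer.getInteger` reads a JVM system property and falls back to a default when the property is unset or unparseable. A minimal self-contained sketch (only the property name comes from the comment; the class name is invented):

```java
// Reads a tunable from a JVM system property with a compile-time default,
// exactly the pattern quoted in the review comment above.
public class PurgeSizeTunable
{
    // Overridable at startup with: java -DOTC_BACKLOG_PURGE_SIZE=2048 ...
    static final int BACKLOG_PURGE_SIZE = Integer.getInteger("OTC_BACKLOG_PURGE_SIZE", 1024);

    public static void main(String[] args)
    {
        // Without the -D flag, the default of 1024 applies.
        System.out.println(BACKLOG_PURGE_SIZE);
    }
}
```

This gives operators an escape hatch without recompiling, while keeping cassandra.yaml uncluttered, which is the trade-off debated in this and the following comment.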
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925815#comment-15925815 ] Christian Esken commented on CASSANDRA-13265: -
bq. it's very low effort to add a property
I see. I thought it would just clutter cassandra.yaml, as nobody would ever change the value. But if you feel it is important enough, I can do so.
bq. Not a huge deal, but I think we should log the drops especially due to timeouts as they happen
OK, I can follow your argument. I will rewrite it, also adding comments with an explanation.
PS: I will soon be on vacation for two weeks, so please don't wonder if you see no updates from me.
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924135#comment-15924135 ] Christian Esken edited comment on CASSANDRA-13265 at 3/14/17 1:55 PM:
--
Done. Hint: Not everything is committed yet, as I have to remove my debug code from it.
- (/) A smaller value could potentially ...
- (/) You shouldn't need the check for null? Usually we "just" make sure its not null
OK. I thought it might be possible to set this to null, but even JConsole refuses it.
- (/) Using a boxed integer makes it a bit confusing ...
ACK. Happily changed that. Looks like I followed bad examples.
- (/) Avoid unrelated whitespace changes.
OK. I missed that after moving the field.
- (?) I still think it's a good idea to avoid hard coding this kind of value so operators have options without recompiling.
I would like BACKLOG_PURGE_SIZE to stay hard-coded for now. It has been hard-coded for quite some time, and in the long term I do not think it should be kept as-is. For example, it would be better to purge based on the number of actually DROPPABLE messages in the queue (or their weight, if you want to extend it even further).
- (/) Fun fact. You don't need backlogNextExpirationTime to be volatile. You can piggyback on backlogExpirationActive to get the desired effects from the Java memory model. [...] I wouldn't change it ...
Yes, I am aware of that and use that technique often. Here I did not like it, as the visibility effects would not be obvious unless explicitly documented. You are probably aware of what Brian Goetz says about piggybacking in his JCIP book. BTW: A more obvious usage for me is in status fields, e.g. to make the results of a Future visible. I won't change it either, so I am marking this as done.
- (?) Breaking out the uber bike shedding this could be maybeExpireMessages.
Nope, I am not going back down that road. I had expireMessagesConditionally() before and changed it on request. By that logic, a Set should not have an add() method but only a maybeAdd(), because it might not add the entry. Also, I added clear documentation, so it should be fine.
- (/) Swap the order of these two stores so it doesn't do extra expirations.
Ouch. That hurts. I wanted to protect against Exceptions inside the throw-Block, which would disable expiration infinitely. I was quite tired yesterday. I am swapping it back; TimeUnit conversions never throw Exceptions, so it is safe. :-|
- (/) This is not quite correct. You can't count drainCount as dropped, because some of the drained messages may have been sent during iteration.
In progress. I am wondering if we should include fixing the drop count in this patch, as it will likely create even more conflicts. OTOH I have to touch some related methods anyhow. I will think about it.
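The "piggybacking" debated above is the Java memory model guarantee that a write to a volatile field publishes all of the writer's earlier plain writes to any thread that subsequently reads that volatile (the happens-before rule Goetz describes in JCIP). A minimal sketch with hypothetical names, unrelated to the Cassandra fields:

```java
// Piggybacking on a volatile write: the plain field `payload` is published by
// the volatile write to `ready`; a reader that observes ready == true is
// guaranteed by the JMM to also observe payload == 42.
public class Piggyback
{
    int payload;              // plain, non-volatile field
    volatile boolean ready;   // the volatile "carrier"

    void publish()
    {
        payload = 42;         // ordinary write...
        ready = true;         // ...made visible by this volatile write
    }

    Integer tryRead()
    {
        // The volatile read establishes happens-before with publish().
        return ready ? payload : null;
    }

    public static void main(String[] args)
    {
        Piggyback p = new Piggyback();
        p.publish();
        System.out.println(p.tryRead());
    }
}
```

The objection raised in the comment is exactly the weakness of this idiom: the correctness depends on the store order inside publish(), which is invisible at the field declarations unless documented, so keeping the second field volatile is a defensible readability choice.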
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924135#comment-15924135 ] Christian Esken edited comment on CASSANDRA-13265 at 3/14/17 1:54 PM: -- Done. Hint: Not everything is committed yet, as I have to remove my debug code from it.
- (/) A smaller value could potentially ...
- (/) You shouldn't need the check for null? Usually we "just" make sure it's not null
OK. I thought it might be possible to set this to null, but even JConsole refuses it.
- (/) Using a boxed integer makes it a bit confusing ...
ACK. Happily changed that. Looks like I followed bad examples.
- (/) Avoid unrelated whitespace changes.
OK. I missed that after moving the field.
- (?) I still think it's a good idea to avoid hard coding this kind of value so operators have options without recompiling.
I would like BACKLOG_PURGE_SIZE to be kept hard coded for now. It has been hard coded for quite some time, and in the long term I do not think it should be kept as-is. For example, it would be better to purge based on the number of actually DROPPABLE messages in the queue (or their weight, if you want to extend even further).
- (/) Fun fact. You don't need backlogNextExpirationTime to be volatile. You can piggyback on backlogExpirationActive to get the desired effects from the Java memory model. [...] I wouldn't change it ...
Yes, I am aware of that and use that technique often. Here I did not like it, as the visibility effects would not be obvious unless explicitly documented. You are probably aware of what Brian Goetz says about piggybacking in his JCIP book. BTW: a more obvious usage, for me, is in status fields, e.g. to make the result of a Future visible. I won't change it either, so marking this as done.
- (?) Breaking out the uber bike shedding this could be maybeExpireMessages.
Nope, I am not going back down that road. I had expireMessagesConditionally() before and changed it on request. If we did this, then a Set should not have an add() method but only a maybeAdd(), because it might not add the entry. Also, I added clear documentation, so it should be fine.
- (x) Swap the order of these two stores so it doesn't do extra expirations.
Ouch. That hurts. I wanted to protect against Exceptions inside that block, which would disable expiration indefinitely. I was quite tired yesterday. I am swapping it back; TimeUnit conversions never throw Exceptions, so it is safe. :-|
- (/) This is not quite correct: you can't count drainCount as dropped, because some of the drained messages may have been sent during iteration.
In progress. I am wondering if we should include fixing the drop count in this patch, as it will likely create even more conflicts. OTOH I have to touch some related methods anyhow. I will think about it.
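The volatile "piggybacking" debated above can be sketched as follows. This is a minimal illustration of the Java memory model effect only, not Cassandra's actual code; the class and method names are hypothetical, with the field names chosen to mirror the ones under discussion.

```java
// Sketch (hypothetical class): a write to a plain field that happens-before
// a volatile write becomes visible to any thread that subsequently reads
// the volatile field. The plain field "piggybacks" on the volatile one.
public class ExpirationState {
    // Plain (non-volatile) field: its visibility relies on the volatile below.
    private long backlogNextExpirationTime = -1L;
    private volatile boolean backlogExpirationActive;

    public void scheduleExpiration(long nextTimeNanos) {
        backlogNextExpirationTime = nextTimeNanos; // 1. plain write
        backlogExpirationActive = true;            // 2. volatile write publishes step 1
    }

    public long readNextExpiration() {
        if (backlogExpirationActive) {             // volatile read
            // Guaranteed to observe the value stored before the volatile write.
            return backlogNextExpirationTime;
        }
        return -1L;
    }
}
```

Note that correctness depends entirely on the ordering of the two stores in scheduleExpiration(), which is exactly the kind of non-obvious subtlety the comment argues against relying on without explicit documentation.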
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15923915#comment-15923915 ] Christian Esken commented on CASSANDRA-13265: -
bq. This is not quite correct you can't count drainCount as dropped because some of the drained messages may have been sent during iteration.
I looked in more detail, and I think this is a flaw in the original code that "suggested" doing it this way: {{drainedMessages.clear()}} is called twice, and one time would be enough. IMO it would be better to keep only the one at the end of the method and also do the drop-counting for the drained messages there. This would also cover a rather exotic case of the {{catch (Exception e)}} in the {{run()}} method: if an Exception is thrown, there is a danger of nothing being counted.
bq. Using a boxed integer
bq. You shouldn't need the check for null?
From a brief check, this refers to a similar point. I saw many configuration options that allow null and followed that route. I am absolutely happy to make it non-boxed.
bq. The right way to do it is create a branch for all the versions where this is going to be fixed. Start at 2.2, merge to 3.0, merge to 3.11, then merge to trunk.
At GitHub? I can do so. But no PR, right? I saw it mentioned that one should not open PRs for Cassandra on GitHub, as they cannot be handled (it's just a mirror).
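The drop-counting flaw quoted above can be illustrated with a small sketch (hypothetical names, not the actual OutboundTcpConnection code): messages drained from the backlog may be either sent or expired during iteration, so only the expired ones should count as dropped; using the raw drain count overstates the drops. Per the comment, counting at the single clear() at the end of the method also survives an Exception thrown mid-iteration.

```java
import java.util.List;

// Hypothetical sketch of per-message drop counting; not Cassandra's code.
public class DropCounting {
    static final class QueuedMessage {
        final long expiresAtMillis;
        QueuedMessage(long expiresAtMillis) { this.expiresAtMillis = expiresAtMillis; }
        boolean isTimedOut(long nowMillis) { return nowMillis >= expiresAtMillis; }
    }

    /** Returns the number of actually dropped (expired) messages, not the drain count. */
    static int processDrained(List<QueuedMessage> drainedMessages, long nowMillis) {
        int dropped = 0;
        for (QueuedMessage qm : drainedMessages) {
            if (qm.isTimedOut(nowMillis))
                dropped++; // expired during iteration: counts as dropped
            // else: message would be sent and must NOT be counted as dropped
        }
        // Single clear at the end of the method, so the batch is released
        // exactly once regardless of how the iteration above went.
        drainedMessages.clear();
        return dropped;
    }
}
```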
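The single-accounting-point idea discussed above (drop-count everything drained but not sent, once, in a {{finally}} block, so an exception cannot skip the counting) can be sketched as follows. This is an illustrative standalone example, not the actual OutboundTcpConnection code; all class and method names besides {{QueuedMessage}} are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Stand-in for OutboundTcpConnection.QueuedMessage (hypothetical shape).
class QueuedMessage
{
    final long payload;
    QueuedMessage(long payload) { this.payload = payload; }
}

class DrainCountSketch
{
    private final BlockingQueue<QueuedMessage> backlog = new LinkedBlockingQueue<>();
    private long droppedCount = 0;

    void offer(QueuedMessage m) { backlog.offer(m); }

    // Drain a batch, "send" part of it, and count only the messages that were
    // drained but never sent as dropped -- in exactly one place, at the end.
    long processBatch(int maxBatch, int sendLimit)
    {
        List<QueuedMessage> drained = new ArrayList<>(maxBatch);
        backlog.drainTo(drained, maxBatch);
        int sent = 0;
        try
        {
            for (QueuedMessage m : drained)
            {
                if (sent >= sendLimit)
                    break;          // simulate the connection giving up mid-batch
                sent++;             // real code would serialize and write m here
            }
        }
        finally
        {
            // Single accounting point: runs even if sending threw an exception.
            droppedCount += drained.size() - sent;
            drained.clear();
        }
        return droppedCount;
    }
}
```

Because the accounting lives in the {{finally}} block, the "exception thrown, nothing counted" case mentioned in the comment cannot occur.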
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907493#comment-15907493 ] Christian Esken edited comment on CASSANDRA-13265 at 3/13/17 2:33 PM:

The change is now finished, including configuration, MBean and testing. I also tested an interval of 0 ms, which is close to what Cassandra does today. Please have a look.

What is the recommended way of getting this into the official Cassandra repo? A patch, or would someone with write access take it directly from Github? This also has an impact on how the merge conflicts will be resolved, as there are now 3 files with merge conflicts according to https://github.com/apache/cassandra/pull/95. I did not want to rebase my branch without asking.
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907493#comment-15907493 ] Christian Esken edited comment on CASSANDRA-13265 at 3/13/17 1:56 PM:

The change is now finished, including configuration, MBean and testing. I also tested an interval of 0 ms, which is close to what Cassandra does today. Please have a look.

What is the recommended way of getting this into the official Cassandra repo? A patch, or would someone with write access take it directly from Github? This also has an impact on how the merge conflicts will be resolved, as there are now 3 files with merge conflicts according to https://github.com/apache/cassandra/pull/95. I did not want to rebase my branch without asking.
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907493#comment-15907493 ] Christian Esken commented on CASSANDRA-13265:

The change is now finished, including configuration, MBean and testing. I also tested an interval of 0 ms, which is close to what Cassandra does today. Please have a look.

What is the recommended way of getting this into the official Cassandra repo? A patch, or would someone with write access take it directly from Github? This also has an impact on how the merge conflicts will be resolved, as my branch now has merge conflicts on 3 files according to https://github.com/apache/cassandra/pull/95. I did not want to rebase my branch without asking.
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905409#comment-15905409 ] Christian Esken edited comment on CASSANDRA-13265 at 3/10/17 5:05 PM:

I committed the change containing the configuration, as I would like some feedback on whether I am on the right path. Please note that I did not yet have time for tests (planned for next Monday), but I thought it better to give you a chance to review the current changes.

I also had to add back {{AtomicBoolean backlogExpirationActive}}; otherwise I cannot guarantee that only a single Thread is iterating the Queue, especially if a small expiration interval (1 ms, or 0 ms) is configured. The {{AtomicLong backlogNextExpirationTime}} could now be a {{volatile long}}.
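The guard described above (an {{AtomicBoolean}} ensuring only one thread iterates the backlog at a time, plus a next-expiration timestamp that can be a plain {{volatile long}}) can be sketched roughly as below. The field names mirror the comment, but the surrounding class is a hypothetical simplification, not the actual Cassandra patch:

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified backlog: each entry is an expiration deadline in nanoseconds.
class BacklogExpirer
{
    private final ConcurrentLinkedQueue<Long> backlog = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean backlogExpirationActive = new AtomicBoolean(false);
    private volatile long backlogNextExpirationTime = 0; // replaces the AtomicLong
    private final long intervalNanos;

    BacklogExpirer(long intervalMillis) { this.intervalNanos = intervalMillis * 1_000_000L; }

    void add(long expiresAtNanos) { backlog.add(expiresAtNanos); }

    int size() { return backlog.size(); }

    // Removes expired entries; even with a 0 ms interval, at most one thread
    // iterates the queue at a time because of the compareAndSet gate.
    int expireMessages(long nowNanos)
    {
        if (nowNanos < backlogNextExpirationTime)
            return 0;                                    // not due yet
        if (!backlogExpirationActive.compareAndSet(false, true))
            return 0;                                    // another thread is expiring
        int removed = 0;
        try
        {
            backlogNextExpirationTime = nowNanos + intervalNanos;
            for (Iterator<Long> it = backlog.iterator(); it.hasNext(); )
            {
                if (it.next() <= nowNanos)
                {
                    it.remove();
                    removed++;
                }
            }
        }
        finally
        {
            backlogExpirationActive.set(false);          // always release the gate
        }
        return removed;
    }
}
```

A single {{volatile long}} suffices for the timestamp here because only the thread that won the {{compareAndSet}} ever writes it; other threads only read it.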
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905409#comment-15905409 ] Christian Esken commented on CASSANDRA-13265:

I committed the change containing the configuration, as I would like some feedback on whether I am on the right path. Please note that I did not yet have time for tests (planned for next Monday), but I thought it better to give you a chance to review the current changes.

Please note that I had to add back {{AtomicBoolean backlogExpirationActive}}; otherwise I cannot guarantee that only a single Thread is iterating the Queue, especially if a small expiration interval (1 ms, or 0 ms) is configured. The {{AtomicLong backlogNextExpirationTime}} could now be a {{volatile long}}.
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904949#comment-15904949 ] Christian Esken edited comment on CASSANDRA-13265 at 3/10/17 3:48 PM:

I am nearly done with the configuration, and have two questions about it:

1. How to handle the default value? My approach is to pre-configure the default value in Config:
{code}
public static final int otc_backlog_expiration_interval_in_ms_default = 200;
public volatile Integer otc_backlog_expiration_interval_in_ms = otc_backlog_expiration_interval_in_ms_default;
{code}
Additionally, I will handle null values (which might have been set via JMX) in the getter of DatabaseDescriptor:
{code}
public static Integer getOtcBacklogExpirationInterval()
{
    Integer confValue = conf.otc_backlog_expiration_interval_in_ms;
    return confValue != null ? confValue : Config.otc_backlog_expiration_interval_in_ms_default;
}
{code}
Is that OK? Should I also handle other illegal values (negative values) in that getter, or reject them in the setter? I have not found a code example in Cassandra that handles bad values uniformly for MBean and Config.

2. How to read the config value? I am seeing some {{Integer.getInteger(propName, defaultValue)}}, but this looks strange to me: I think changes from JMX would not even be reflected. Thus I am calling the getter from above, {{DatabaseDescriptor.getOtcBacklogExpirationInterval()}}. Is the latter OK?
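The non-boxed variant the review asks about could look roughly like this: a primitive {{volatile int}} in Config (so there is no null to handle), with illegal values rejected in the setter rather than silently corrected in the getter. This is an illustrative sketch; the class name and method shapes are hypothetical, not the committed patch:

```java
// Hypothetical stand-in for the Config/DatabaseDescriptor pair discussed above.
class OtcConfigSketch
{
    static final int OTC_BACKLOG_EXPIRATION_INTERVAL_MS_DEFAULT = 200;

    // volatile so a JMX update is immediately visible to the connection threads;
    // primitive int, so no null check is ever needed in the getter.
    private volatile int otcBacklogExpirationIntervalMs = OTC_BACKLOG_EXPIRATION_INTERVAL_MS_DEFAULT;

    int getOtcBacklogExpirationInterval()
    {
        return otcBacklogExpirationIntervalMs;
    }

    // Validate once, at the entry point (yaml parsing or the MBean setter),
    // so getters stay trivial and both paths are handled uniformly.
    void setOtcBacklogExpirationInterval(int intervalMs)
    {
        if (intervalMs < 0)
            throw new IllegalArgumentException(
                "otc_backlog_expiration_interval_in_ms must be >= 0, got " + intervalMs);
        otcBacklogExpirationIntervalMs = intervalMs;
    }
}
```

Rejecting bad values in the setter also answers question 2: code that needs the value always reads the (already validated) getter, never a system property snapshot like {{Integer.getInteger(...)}}, so JMX changes take effect on the next read.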
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904949#comment-15904949 ] Christian Esken edited comment on CASSANDRA-13265 at 3/10/17 3:42 PM:

I am nearly done with the configuration, and have two questions about it:

1. How to handle the default value? My approach is to pre-configure the default value in Config:
{code}
public static final int otc_backlog_expiration_interval_in_ms_default = 200;
public volatile Integer otc_backlog_expiration_interval_in_ms = otc_backlog_expiration_interval_in_ms_default;
{code}
Additionally, I will handle null values (which might have been set via JMX) in the getter of DatabaseDescriptor:
{code}
public static Integer getOtcBacklogExpirationInterval()
{
    Integer confValue = conf.otc_backlog_expiration_interval_in_ms;
    return confValue != null ? confValue : Config.otc_backlog_expiration_interval_in_ms_default;
}
{code}
2. How to read the config value? I am seeing some {{Integer.getInteger(propName, defaultValue)}}, but this looks strange to me: I think changes from JMX would not even be reflected. Thus I am calling the getter from above, {{DatabaseDescriptor.getOtcBacklogExpirationInterval()}}. Is the latter OK?
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904949#comment-15904949 ] Christian Esken edited comment on CASSANDRA-13265 at 3/10/17 11:46 AM:

I am nearly done with the configuration, and have two questions about it:

1. How to handle the default value? My approach is to pre-configure the default value in Config:
{code}
public static final int otc_backlog_expiration_interval_in_ms_default = 200;
public volatile Integer otc_backlog_expiration_interval_in_ms = otc_backlog_expiration_interval_in_ms_default;
{code}
Additionally, I will handle null values (which might have been set via JMX) in the getter of DatabaseDescriptor:
{code}
public static Integer getOtcBacklogExpirationInterval()
{
    Integer confValue = conf.otc_backlog_expiration_interval_in_ms;
    return confValue != null ? confValue : Config.otc_backlog_expiration_interval_in_ms_default;
}
{code}
2. How to read the config value? I am seeing some {{Integer.getInteger(propName, defaultValue)}}, but this looks strange to me: I think changes from JMX would not even be reflected. Thus I am calling the getter from above, {{DatabaseDescriptor.getOtcBacklogExpirationInterval()}}. Is the latter OK?
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904949#comment-15904949 ] Christian Esken edited comment on CASSANDRA-13265 at 3/10/17 11:46 AM: --- I am nearly done with the configuration, and have two questions about it:
1. How should the default value be handled? My approach is to pre-configure the default value in Config:
{code}
public static final int otc_backlog_expiration_interval_in_ms_default = 200;
public volatile Integer otc_backlog_expiration_interval_in_ms = otc_backlog_expiration_interval_in_ms_default;
{code}
Additionally, I will handle null values that might have been set via JMX in the getter of DatabaseDescriptor:
{code}
public static Integer getOtcBacklogExpirationInterval()
{
    Integer confValue = conf.otc_backlog_expiration_interval_in_ms;
    return confValue != null ? confValue : Config.otc_backlog_expiration_interval_in_ms_default;
}
{code}
2. How should the config value be read? I am seeing some Integer.getInteger(propName, defaultValue) calls, but this looks strange to me: changes made via JMX would not even be reflected. Thus I am calling the getter from above, {{DatabaseDescriptor.getOtcBacklogExpirationInterval()}}. Is the latter OK?
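The concern behind question 2 can be demonstrated: Integer.getInteger reads a JVM system property, so a value later changed on the config object (e.g. through JMX) is never picked up, while a getter over a volatile field sees the update. A minimal sketch — the class, field, and property names below are illustrative stand-ins, not Cassandra's actual ones:

```java
class ConfigSketch {
    static final int DEFAULT_INTERVAL_MS = 200;

    // volatile so an update made by a JMX/MBean thread is visible to readers
    static volatile Integer backlogExpirationIntervalMs = DEFAULT_INTERVAL_MS;

    static int getBacklogExpirationIntervalMs() {
        Integer v = backlogExpirationIntervalMs;
        return v != null ? v : DEFAULT_INTERVAL_MS; // null-safe fallback to the compiled-in default
    }

    public static void main(String[] args) {
        // Integer.getInteger consults system properties, not this field;
        // no such property is set here, so it always yields the default
        int fromProperty = Integer.getInteger("otc_backlog_expiration_interval_in_ms", DEFAULT_INTERVAL_MS);

        backlogExpirationIntervalMs = 500; // simulate a runtime change via JMX
        System.out.println(fromProperty);                     // still 200
        System.out.println(getBacklogExpirationIntervalMs()); // 500
    }
}
```

This illustrates why a getter over the config field, rather than a system-property lookup, is the variant that reflects runtime changes.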
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901495#comment-15901495 ] Christian Esken commented on CASSANDRA-13265: - I pushed the changes noted in the former comment. I am planning to do "Configurability and default value" tomorrow.
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901414#comment-15901414 ] Christian Esken edited comment on CASSANDRA-13265 at 3/8/17 4:12 PM: - I will update the status while working on the individual topics:
(x) This needs to be configurable from the YAML and via JMX.
(/) It should include drained messages as well.
(/) Typo, "thus letting"
(/) Extra line break
(/) We don't do/allow author tags
(/) Use TimeUnit => Additionally, I am now determining the timeout value automatically.
(/) It isn't using the constant.
(/) Just in case, maybe assert the droppable/non-droppable status of the verbs. Or does it not matter, since the tests will fail anyway? => It wouldn't matter, but I added a check to make it more explicit.
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901047#comment-15901047 ] Christian Esken edited comment on CASSANDRA-13265 at 3/8/17 10:36 AM: --
bq. It should include drained messages as well
OMG, right. I thought about that and concluded it was correct to exclude them. But obviously the messages are already drained from the queue, so they must be added.
bq. We don't do/allow author tags
OOPS, always this oversmart IDE :-)
bq. This needs to be configurable
Oh. Even more work. This is getting bigger than anticipated, but I am happy to do it. Thanks for the hints on how to do the configuration. I will work on it later today or tomorrow.
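The drained-messages point can be sketched: once messages have been drained out of the backlog, an expiration pass must also count timed-out entries within the drained batch, or they vanish from the dropped statistics. A minimal stand-in — this QueuedMessage is a simplified illustration, not Cassandra's actual OutboundTcpConnection$QueuedMessage:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class DrainedCountSketch {
    // simplified stand-in for the real queued-message type
    static class QueuedMessage {
        final long expiresAtNanos;
        QueuedMessage(long expiresAtNanos) { this.expiresAtNanos = expiresAtNanos; }
        boolean isTimedOut(long nowNanos) {
            // wrap-safe comparison: subtract first, then compare the difference to zero
            return nowNanos - expiresAtNanos >= 0;
        }
    }

    static final AtomicLong dropped = new AtomicLong();

    // count expired messages among the already-drained batch as dropped,
    // since they were removed from the backlog before this scan ran
    static void expire(List<QueuedMessage> drainedMessages, long nowNanos) {
        for (QueuedMessage qm : drainedMessages)
            if (qm.isTimedOut(nowNanos))
                dropped.incrementAndGet();
    }

    public static void main(String[] args) {
        List<QueuedMessage> drained = new ArrayList<>();
        drained.add(new QueuedMessage(100)); // expired at t=100
        drained.add(new QueuedMessage(900)); // still valid at t=500
        expire(drained, 500);
        System.out.println(dropped.get()); // prints 1
    }
}
```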
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899773#comment-15899773 ] Christian Esken commented on CASSANDRA-13265: - I added three changes:
- Implemented a unit test
- Count dropped messages if Cassandra cannot write to the socket
- Fixed QueuedMessage.isTimedOut(), which was prone to a System.nanoTime() wrap bug, as it used a comparison of the form aNanos < bNanos
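The nanoTime wrap bug mentioned above comes from comparing absolute values: System.nanoTime() can be negative and may overflow, so the idiom sanctioned by its Javadoc is to subtract first and compare the difference to zero. A hedged sketch (method and parameter names are illustrative, not the patch's actual code):

```java
class TimeoutSketch {
    // broken: fails once nanoTime wraps past Long.MAX_VALUE, because the
    // absolute ordering of the two values is then meaningless
    static boolean isTimedOutBroken(long deadlineNanos, long nowNanos) {
        return deadlineNanos < nowNanos;
    }

    // correct: the difference wraps consistently, so comparing it to zero
    // works across the overflow boundary
    static boolean isTimedOut(long deadlineNanos, long nowNanos) {
        return nowNanos - deadlineNanos >= 0;
    }

    public static void main(String[] args) {
        long deadline = Long.MAX_VALUE - 10; // just before the wrap
        long now = deadline + 20;            // overflows to a large negative value
        System.out.println(isTimedOutBroken(deadline, now)); // false: misses the timeout
        System.out.println(isTimedOut(deadline, now));       // true: correct
    }
}
```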
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897641#comment-15897641 ] Christian Esken commented on CASSANDRA-13265: - I have one question about a code fragment. When the socket is not available the backlog is cleared, but no drops are counted. Looks like an omission to me, or is it intentional? {{dropped.addAndGet(backlog.size(); )}} would be an approximation. We likely cannot get closer as {{backlog.clear();}} does not tell how much elements were removed. {code} if (qm.isTimedOut()) dropped.incrementAndGet(); else if (socket != null || connect()) writeConnected(qm, count == 1 && backlog.isEmpty()); else { // clear out the queue, else gossip messages back up. drainedMessages.clear(); // dropped.addAndGet(backlog.size()); // TODO Should dropped statistics be counted in this case? backlog.clear(); break inner; } {code} > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate to the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node from the cluster fixes the issue. > Before going in to details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A Threaddump in this situation showed 324 Threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. 
> - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. Each of the > Threads fully locks the Queue for reading and writing by calling > iterator.next(), making the situation worse and worse. > - Writing: Only after 262508 locking operations can it progress with actually > writing to the Queue. > - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and > fully lock the Queue. > This means: Writing blocks the Queue for reading, and readers might even be > starved, which makes the situation even worse. > - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DC's > - high write throughput (10 INSERT statements per second and more during > peak times). > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
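The approximation discussed in the comment above (counting {{backlog.size()}} as dropped before {{backlog.clear()}}) can be sketched in isolation. This is an illustrative example only; the {{Backlog}} class and {{drainBacklog()}} method are invented names, not Cassandra's actual OutboundTcpConnection code.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: Backlog and drainBacklog() are invented names,
// not Cassandra's OutboundTcpConnection code.
class Backlog {
    final ConcurrentLinkedQueue<String> backlog = new ConcurrentLinkedQueue<>();
    final AtomicLong dropped = new AtomicLong();

    // backlog.clear() does not report how many elements it removed, so the
    // best we can do is snapshot size() first. Under concurrent writes the
    // count is only an approximation, as noted in the comment above.
    long drainBacklog() {
        long approx = backlog.size();
        dropped.addAndGet(approx);
        backlog.clear();
        return approx;
    }
}
```

Note that {{ConcurrentLinkedQueue.size()}} is itself an O(n) traversal and only weakly consistent, which is another reason the dropped count can only ever be approximate here.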
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897121#comment-15897121 ] Christian Esken edited comment on CASSANDRA-13265 at 3/6/17 11:23 AM: -- I was already looking into writing a unit test, but it requires access to the queue, which means giving it package-level access and using {{@VisibleForTesting}}. I will do that tomorrow, unless there are arguments against it. I will also check alternatives. was (Author: cesken): I was already looking into writing a unit test, but it requires access to the queue, which means giving it package-level access and using {{@VisibleForTesting}}. I will do that tomorrow, unless there are arguments against it.
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897121#comment-15897121 ] Christian Esken commented on CASSANDRA-13265: - I was already looking into writing a unit test, but it requires access to the queue, which means giving it package-level access and using {{@VisibleForTesting}}. I will do that tomorrow, unless there are arguments against it.
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892471#comment-15892471 ] Christian Esken commented on CASSANDRA-13265: - Change to System.nanoTime() is done. I kept the logging, but stripped it down and guarded it with an {{isTraceEnabled()}} check.
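As background for the {{System.nanoTime()}} change, here is a minimal hedged sketch of a monotonic timeout check; this {{QueuedMessage}} is an invented stand-in for the example, not the actual Cassandra class.

```java
// Illustrative sketch only: this QueuedMessage is not the actual Cassandra
// class. System.nanoTime() is monotonic (on Linux typically backed by
// clock_gettime(CLOCK_MONOTONIC, ...)), so unlike System.currentTimeMillis()
// it cannot jump backwards when the wall clock is adjusted by NTP or an admin.
class QueuedMessage {
    final long enqueuedNanos = System.nanoTime();
    final long timeoutNanos;

    QueuedMessage(long timeoutMillis) {
        this.timeoutNanos = timeoutMillis * 1_000_000L;
    }

    // Compare via subtraction rather than "now > deadline": the subtraction
    // stays correct even if nanoTime() numerically wraps around.
    boolean isTimedOut(long nowNanos) {
        return nowNanos - enqueuedNanos > timeoutNanos;
    }
}
```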
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892332#comment-15892332 ] Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 2:36 PM: - bq. use System.nanoTime() instead of System.currentTimeMillis(). Agreed, {{System.nanoTime()}} is slightly better here. In real life it won't make a terrific difference even with the worst clocks, but "_let's do things right_". :-) I never looked up the native code for nanoTime(), but I bet on Unix it uses the POSIX {{clock_gettime(CLOCK_MONOTONIC, ...)}}. bq. I don't think we want to traverse the entire backlog. [...] Your argument "reasonably in ascending timestamp order" would make sense if all entries had the same expiration time. But the Verbs have different timeouts, the defaults ranging from 2 to 60 seconds. Thus the whole Queue should be iterated, as in the worst case we would otherwise remove nothing even though most entries are timed out. was (Author: cesken): bq. use System.nanoTime() instead of System.currentTimeMillis(). Agreed, {{System.nanoTime()}} is slightly better here. In real life it won't make a terrific difference even with the worst clocks, but "_let's do things right_". :-) I never looked up the native code for nanoTime(), but I bet on Unix it uses the POSIX {{clock_gettime(CLOCK_MONOTONIC, ...)}}. bq. I don't think we want to traverse the entire backlog. [...] Your argument "reasonably in ascending timestamp order" would make sense if all entries had the same expiration time. But the Verbs have different timeouts. Thus the whole Queue should be iterated, as in the worst case we would otherwise remove nothing even though most entries are timed out.
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892332#comment-15892332 ] Christian Esken commented on CASSANDRA-13265: - bq. use System.nanoTime() instead of System.currentTimeMillis(). Agreed, {{System.nanoTime()}} is slightly better here. In real life it won't make a terrific difference even with the worst clocks, but "_let's do things right_". :-) I never looked up the native code for nanoTime(), but I bet on Unix it uses the POSIX {{clock_gettime(CLOCK_MONOTONIC, ...)}}. bq. I don't think we want to traverse the entire backlog. [...] Your argument "reasonably in ascending timestamp order" would make sense if all entries had the same expiration time. But the Verbs have different timeouts. Thus the whole Queue should be iterated, as in the worst case we would otherwise remove nothing even though most entries are timed out.
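The point about mixed per-Verb timeouts can be illustrated with a small sketch: because an unexpired entry may be followed by expired ones with shorter timeouts, expiration must walk the whole backlog rather than stop at the first live entry. The {{FullScanExpiration}} helper and the {{long[]}} message encoding are invented for this example.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch only. Messages are encoded as {enqueuedAtNanos,
// timeoutNanos} pairs; the real code stores QueuedMessage objects.
class FullScanExpiration {
    // Walk the entire backlog: because different Verbs have different
    // timeouts, an entry that is still alive may be followed by expired
    // ones, so we must not bail out at the first unexpired entry.
    static int expire(ConcurrentLinkedQueue<long[]> backlog, long nowNanos) {
        int removed = 0;
        for (Iterator<long[]> it = backlog.iterator(); it.hasNext(); ) {
            long[] msg = it.next();
            if (nowNanos - msg[0] > msg[1]) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

Usage: with a backlog holding one entry with a long timeout and one already-expired entry behind it, a bail-out-early scan would remove nothing, while the full scan removes the expired one.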
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892332#comment-15892332 ] Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 2:34 PM: - bq. use System.nanoTime() instead of System.currentTimeMillis(). Agreed, {{System.nanoTime()}} is slightly better here. In real life it won't make a terrific difference even with the worst clocks, but "_let's do things right_". :-) I never looked up the native code for nanoTime(), but I bet on Unix it uses the POSIX {{clock_gettime(CLOCK_MONOTONIC, ...)}}. bq. I don't think we want to traverse the entire backlog. [...] Your argument "reasonably in ascending timestamp order" would make sense if all entries had the same expiration time. But the Verbs have different timeouts. Thus the whole Queue should be iterated, as in the worst case we would otherwise remove nothing even though most entries are timed out. was (Author: cesken): bq. use System.nanoTime() instead of System.currentTimeMillis(). Agreed, {{System.nanoTime()}} is slightly better here. In real life it won't make a terrific difference even with the worst clocks, but "_let's do things right_". :-) I never looked up the native code for nanoTime(), but I bet on Unix it uses the POSIX {{clock_gettime(CLOCK_MONOTONIC, ...)}}. bq. I don't think we want to traverse the entire backlog. [...] Your argument "reasonably in ascendingascending timestamp order" would make sense if all entries had the same expiration time. But the Verbs have different timeouts. Thus the whole Queue should be iterated, as in the worst case we would otherwise remove nothing even though most entries are timed out.
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891905#comment-15891905 ] Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 12:36 PM: -- Ariel wrote: {quote} Expiration is based on time. There is no point in attempting expiration again immediately because almost nothing will have expired. It allows one bad connection to consume resources it shouldn't in the form of hijacking a thread to iterate a list. I don't see the downside of switching from a boolean to a long and CASing that instead. If we aren't confident in it we can set a small interval so that it still checks for expiration often, although I think that just generates useless work. We can't make timeouts pass faster. {quote} [~aweisberg], I understand that you want to CAS on "lastExpirationTime", right? I am also for doing this. It fits better and still keeps the change simple. In that case the Thread should iterate the whole Queue, and not bail out on the first hit. I will change it in the PR. was (Author: cesken): Ariel wrote: {quote} Expiration is based on time. There is no point in attempting expiration again immediately because almost nothing will have expired. It allows one bad connection to consume resources it shouldn't in the form of hijacking a thread to iterate a list. I don't see the downside of switching from a boolean to a long and CASing that instead. If we aren't confident in it we can set a small interval so that it still checks for expiration often, although I think that just generates useless work. We can't make timeouts pass faster. {quote} [~aweisberg]: I understand that you want to CAS on "lastExpirationTime", right? I am also for doing this. It fits better and still keeps the change simple. In that case the Thread should iterate the whole Queue, and not bail out on the first hit. I will change it in the PR.
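The CAS-on-timestamp idea discussed above can be sketched as follows. This is a hedged illustration, not the actual patch: the {{ExpirationGate}} name, the {{tryExpire}} method, and the 10-second interval are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of CASing a long timestamp ("lastExpirationTime")
// instead of a boolean. ExpirationGate and the 10-second interval are
// assumptions for this example, not Cassandra's actual code.
class ExpirationGate {
    static final long INTERVAL_NANOS = 10_000_000_000L; // assumed: 10 seconds
    final AtomicLong lastExpirationNanos = new AtomicLong(Long.MIN_VALUE / 2);

    // Returns true for at most one caller per interval: that thread then
    // iterates the whole queue, while everyone else skips expiration, so
    // hundreds of threads can no longer pile up on the backlog iterator.
    boolean tryExpire(long nowNanos) {
        long last = lastExpirationNanos.get();
        if (nowNanos - last < INTERVAL_NANOS)
            return false; // too soon since the last expiration run
        return lastExpirationNanos.compareAndSet(last, nowNanos);
    }
}
```

The interval is a trade-off, as Ariel notes: a small interval checks for expiration more often but mostly generates useless work, since timeouts cannot be made to pass faster.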
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891905#comment-15891905 ] Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 12:35 PM: -- Ariel wrote: {quote} Expiration is based on time. There is no point in attempting expiration again immediately because almost nothing will have expired. It allows one bad connection to consume resources it shouldn't in the form of hijacking a thread to iterate a list. I don't see the downside of switching from a boolean to a long and CASing that instead. If we aren't confident in it we can set a small interval so that it still checks for expiration often, although I think that just generates useless work. We can't make timeouts pass faster. {quote} [~aweisberg]: I understand that you want to CAS on "lastExpirationTime", right? I am also for doing this. It fits better and still keeps the change simple. In that case the Thread should iterate the whole Queue, and not bail out on the first hit. I will change it in the PR. was (Author: cesken): Ariel wrote: {quote} Expiration is based on time. There is no point in attempting expiration again immediately because almost nothing will have expired. It allows one bad connection to consume resources it shouldn't in the form of hijacking a thread to iterate a list. I don't see the downside of switching from a boolean to a long and CASing that instead. If we aren't confident in it we can set a small interval so that it still checks for expiration often, although I think that just generates useless work. We can't make timeouts pass faster. {quote} I understand that you want to CAS on "lastExpirationTime", right? I am also for doing this. It fits better and still keeps the change simple. In that case the Thread should iterate the whole Queue, and not bail out on the first hit. I will change it in the PR.
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892114#comment-15892114 ] Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 11:47 AM: -- I updated the PR with the following changes: - Variable names / modifiers (static) - Expiration is based on time - Expiration inspects the whole Queue (no bailing out) This is really hard to reproduce and test. Because of that, I have not yet removed BACKLOG_EXPIRATION_DEBUG. If you have a hint about testing possibilities, let me know. was (Author: cesken): I updated the PR with the following changes: - Variable names / modifiers (static) - Expiration is based on time - Expiration inspects the whole Queue (no bailing out)
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892114#comment-15892114 ] Christian Esken commented on CASSANDRA-13265: - I updated the PR with the following changes: - Variable names / modifiers (static) - Expiration is based on time - Expiration inspects the whole Queue (no bailing out) > Epxiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate to the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node from the cluster fixes the issue. > Before going in to details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A Threaddump in this situation showed 324 Threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. > - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. Each of the > Thread fully locks the Queue for reading and writing by calling > iterator.next(), making the situation worse and worse. > - Writing: Only after 262508 locking operation it can progress with actually > writing to the Queue. 
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and > fully lock the Queue > This means: Writing blocks the Queue for reading, and readers might even be > starved which makes the situation even worse. > - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DC's > - high write throughput (10 INSERT statements per second and more during > peak times). > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
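The time-based, whole-queue expiration described in the comment above could look roughly like the following sketch. All names here (ExpirationSketch, EXPIRATION_INTERVAL_NANOS, the QueuedMessage fields) are illustrative assumptions, not the actual patch code in OutboundTcpConnection:

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of time-based expiration guarded by a CAS on a timestamp:
// at most one thread per interval scans the backlog, and the winner
// inspects the whole queue instead of bailing out on the first hit.
class ExpirationSketch
{
    static final long EXPIRATION_INTERVAL_NANOS = 10_000_000L; // 10 ms, illustrative

    static class QueuedMessage
    {
        final boolean droppable;
        final long expiresAtNanos;
        QueuedMessage(boolean droppable, long expiresAtNanos)
        {
            this.droppable = droppable;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    final ConcurrentLinkedQueue<QueuedMessage> backlog = new ConcurrentLinkedQueue<>();
    final AtomicLong lastExpirationTime = new AtomicLong();

    void expireMessagesConditionally(long nowNanos)
    {
        long last = lastExpirationTime.get();
        if (nowNanos - last < EXPIRATION_INTERVAL_NANOS)
            return; // too soon: almost nothing can have expired since the last scan
        if (!lastExpirationTime.compareAndSet(last, nowNanos))
            return; // another thread won the CAS; let it do the scan alone
        // The winner scans the entire queue (no early bail-out).
        for (Iterator<QueuedMessage> it = backlog.iterator(); it.hasNext(); )
        {
            QueuedMessage qm = it.next();
            if (qm.droppable && qm.expiresAtNanos <= nowNanos)
                it.remove();
        }
    }
}
```

The key property is that threads losing the CAS return immediately instead of queueing up on the backlog lock, which is exactly the pile-up described in the issue.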
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891896#comment-15891896 ] Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 9:26 AM: - bq. How often does this issue occur? Not very often, but it happens. I assume that special scenarios trigger this: - High write throughput (especially with write spikes it is easy to get above 1024 messages) - A long stop-the-world GC phase (because then even more threads could start to write and iterate the Queue) - Temporary network overload to the target node (because nothing is taken from the Queue in that case) - Many non-droppable entries in the Queue (because then the loop does not bail out: "if (! qm.droppable) continue;") Temporary overloads usually resolve themselves, but in this case they do not. As soon as the Queue has reached a certain size limit, most time is spent iterating the Queue, and the reader is starved (1 reader thread fights against 324 threads that take the queue lock by calling iterator.next()).
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891905#comment-15891905 ] Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 9:25 AM: - Ariel wrote: {quote} Expiration is based on time. There is no point in attempting expiration again immediately because almost nothing will have expired. It allows one bad connection to consume resources it shouldn't in the form of hijacking a thread to iterate a list. I don't see the downside of switching from a boolean to a long and CASing that instead. If we aren't confident in it we can set a small interval so that it still checks for expiration often, although I think that just generates useless work. We can't make timeouts pass faster. {quote} I understand that you want to CAS on "lastExpirationTime", right? I am also for doing this. It fits better and still keeps the change simple. In that case the thread should iterate the whole Queue and not bail out on the first hit. I will change it in the PR.
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891905#comment-15891905 ] Christian Esken commented on CASSANDRA-13265: - Ariel wrote: {quote} Expiration is based on time. There is no point in attempting expiration again immediately because almost nothing will have expired. It allows one bad connection to consume resources it shouldn't in the form of hijacking a thread to iterate a list. I don't see the downside of switching from a boolean to a long and CASing that instead. If we aren't confident in it we can set a small interval so that it still checks for expiration often, although I think that just generates useless work. We can't make timeouts pass faster. {quote} I understand that you want to CAS on "lastExpirationTime", right? I am also for doing this. It fits better and still keeps the change simple. In that case the thread should iterate the whole Queue and not bail out on the first hit. I will change it in the PR.
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891896#comment-15891896 ] Christian Esken commented on CASSANDRA-13265: - bq. How often does this issue occur? Not very often, but it happens. I assume that special scenarios trigger this: - High write throughput (especially with write spikes it is easy to get above 1024 messages) - A long stop-the-world GC phase (because then even more threads could start to write and iterate the Queue) - Temporary network overload to the target node (because nothing is taken from the Queue in that case) - Many non-droppable entries in the Queue (because then the loop does not bail out: "if (! qm.droppable) continue;") Temporary overloads usually resolve themselves, but in this case they do not. As soon as the Queue has reached a certain size limit, most time is spent iterating the Queue, and the reader is starved (1 reader thread fights against 324 threads that take the queue lock by calling iterator.next()).
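The non-droppable bail-out behaviour quoted above ("if (! qm.droppable) continue;") can be sketched as follows. This is a simplified illustration of the loop shape under discussion, with illustrative names, not the actual OutboundTcpConnection source:

```java
import java.util.Iterator;
import java.util.List;

// Simplified expiration loop: non-droppable entries are skipped with
// "continue", so the scan cannot bail out early and a backlog full of
// non-droppable messages is iterated end to end while the queue is locked.
class ExpireLoopSketch
{
    static class QueuedMessage
    {
        final boolean droppable;
        final long expiresAtNanos;
        QueuedMessage(boolean droppable, long expiresAtNanos)
        {
            this.droppable = droppable;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    /** Returns the number of entries dropped; visits every entry in the backlog. */
    static int expireMessages(List<QueuedMessage> backlog, long nowNanos)
    {
        int dropped = 0;
        for (Iterator<QueuedMessage> it = backlog.iterator(); it.hasNext(); )
        {
            QueuedMessage qm = it.next();
            if (!qm.droppable)
                continue; // must keep scanning past this entry
            if (qm.expiresAtNanos <= nowNanos)
            {
                it.remove();
                dropped++;
            }
        }
        return dropped;
    }
}
```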
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890316#comment-15890316 ] Christian Esken commented on CASSANDRA-13265: - I created a fresh branch to have a clean commit. The pull request is open: https://github.com/apache/cassandra/pull/95
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890199#comment-15890199 ] Christian Esken commented on CASSANDRA-13265: - Will do. Thanks for your quick feedback.
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890155#comment-15890155 ] Christian Esken commented on CASSANDRA-13265: - Here is the patch for reducing contention in the queue expiration. The patch wraps expireMessages() in expireMessagesConditionally(), which makes sure that only a single thread will do expiration at a time: https://github.com/apache/cassandra/compare/trunk...christian-esken:13265-3.0?expand=1 PS: Commits in this patch are not yet squashed. If the patch is good, I will create a proper branch to have a cleaner history.
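A minimal sketch of such a single-expirer gate, assuming an AtomicBoolean-based design (the names and the counter are illustrative, not the actual patch code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Gate that lets at most one thread run the expiration scan at a time;
// every other caller returns immediately instead of piling up on the
// backlog queue's lock.
class GateSketch
{
    final AtomicBoolean expiring = new AtomicBoolean(false);
    int scans = 0; // illustration only: counts how many scans actually ran

    void expireMessagesConditionally()
    {
        if (!expiring.compareAndSet(false, true))
            return; // another thread is already expiring; skip the scan
        try
        {
            scans++; // the real expireMessages() scan would run here
        }
        finally
        {
            expiring.set(false); // always release the gate
        }
    }
}
```

The try/finally is the important detail: without it, an exception during the scan would leave the gate closed forever and expiration would silently stop.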
[jira] [Updated] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13265: Summary: Expiration in OutboundTcpConnection can block the reader Thread (was: Communication breakdown in OutboundTcpConnection)
[jira] [Reopened] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken reopened CASSANDRA-13265: - Reopening. While the "averageGap == 0" issue has been fixed, I still want to fix the issue from the description: multiple threads do the expiration, which leads to unnecessary locking, more CPU usage and possible starvation of the reader thread. I will prepare a patch that fixes that.
[jira] [Commented] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888439#comment-15888439 ] Christian Esken commented on CASSANDRA-13265: - Here is one possibly very important observation. It looks like coalescing is doing an infinite loop inside maybeSleep(). I checked 10 thread dumps, and in each of them the thread was at the same location. Is it possible that averageGap is 0? This would lead to an infinite loop:
{code}
private static boolean maybeSleep(int messages, long averageGap, long maxCoalesceWindow, Parker parker)
{
    // only sleep if we can expect to double the number of messages we're sending in the time interval
    long sleep = messages * averageGap; // TODO can averageGap be 0 ?
    if (sleep > maxCoalesceWindow)
        return false;

    // assume we receive as many messages as we expect; apply the same logic to the future batch:
    // expect twice as many messages to consider sleeping for "another" interval; this basically translates
    // to doubling our sleep period until we exceed our max sleep window
    while (sleep * 2 < maxCoalesceWindow)
        sleep *= 2; // CoalescingStrategies:106

    parker.park(sleep);
    return true;
}
{code}
If sum is bigger than MEASURED_INTERVAL, then averageGap() returns 0 due to integer division. I am aware that this is highly unlikely, but I cannot otherwise explain the apparent hang in maybeSleep() line 106.
{code}
private long averageGap()
{
    if (sum == 0)
        return Integer.MAX_VALUE;
    return MEASURED_INTERVAL / sum;
}
{code}
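The suspicion above checks out: with averageGap == 0, sleep starts at 0, the `sleep > maxCoalesceWindow` early return does not fire, and `sleep * 2` stays 0 forever, so the doubling loop never terminates. A guarded variant (hypothetical, not the committed fix; the Parker callback is omitted) would bail out on a zero gap:

```java
// Demonstrates the averageGap == 0 hazard in maybeSleep(): 0 * 2 stays 0,
// so the doubling loop can never reach maxCoalesceWindow. The guard below
// returns false instead of looping forever.
class CoalesceSketch
{
    static boolean maybeSleepGuarded(int messages, long averageGap, long maxCoalesceWindow)
    {
        long sleep = messages * averageGap;
        if (sleep == 0 || sleep > maxCoalesceWindow)
            return false; // zero gap would otherwise loop forever below

        // double the sleep period until we exceed the max sleep window
        while (sleep * 2 < maxCoalesceWindow)
            sleep *= 2;

        // parker.park(sleep) would go here in the real code
        return true;
    }
}
```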
[jira] [Comment Edited] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887685#comment-15887685 ] Christian Esken edited comment on CASSANDRA-13265 at 2/28/17 10:10 AM: --- I see your argument. On larger clusters this may get problematic. I will try to summarize the alternative solutions: - Offload expiration to a "random" regular thread, but only a single one. If one thread is already expiring ... -- ... let the other threads continue (1) -- ... let the other threads wait (2) - Use an "Expiration Thread Pool" (3). I am not (currently) in favor of it, and if I understood you correctly it is also not your preference. I will implement option (1) today. Please see the attached thread dump to see which threads are blocking. Here are two examples from the thread dumps. Mainly they are SharedPool-Worker threads that do either iterator.remove() or iterator.next(). I think in the thread dump there is also a HintDispatcher thread parking on the same lock.
java.util.concurrent.LinkedBlockingQueue$Itr.remove:
{code}
"SharedPool-Worker-294" #587 daemon prio=5 os_prio=0 tid=0x7fb69b11e260 nid=0x6090 waiting on condition [0x7fb162c0e000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00023a426218> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:225)
	at java.util.concurrent.LinkedBlockingQueue$Itr.remove(LinkedBlockingQueue.java:840)
	at org.apache.cassandra.net.OutboundTcpConnection.expireMessages(OutboundTcpConnection.java:555)
	at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:165)
	at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:771)
	at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:744)
	at org.apache.cassandra.hints.HintVerbHandler.reply(HintVerbHandler.java:99)
	at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:94)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
	at java.lang.Thread.run(Thread.java:745)
{code}
java.util.concurrent.LinkedBlockingQueue$Itr.next:
{code}
"SharedPool-Worker-295" #590 daemon prio=5 os_prio=0 tid=0x7fb69b1135b0 nid=0x608d waiting on condition [0x7fb162cd1000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00023a426218> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:225)
	at java.util.concurrent.LinkedBlockingQueue$Itr.next(LinkedBlockingQueue.java:823)
	at org.apache.cassandra.net.OutboundTcpConnection.expireMessages(OutboundTcpConnection.java:550)
	at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:165)
	at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:771)
	at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:744)
	at org.apache.cassandra.hints.HintVerbHandler.reply(HintVerbHandler.java:99)
	at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:94)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
	at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
	at java.lang.Thread.run(Thread.java:745)
{code}
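Option (1) above — let a single Thread expire while the others continue — could be sketched roughly as follows. This is a minimal illustration, not the actual patch: the class name BacklogExpirer, the AtomicBoolean guard, and the use of bare expiration timestamps instead of QueuedMessage objects are all invented for the sketch.

```java
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of option (1): the first caller expires, later callers skip.
class BacklogExpirer {
    private final LinkedBlockingQueue<Long> backlog = new LinkedBlockingQueue<>();
    private final AtomicBoolean expiring = new AtomicBoolean(false);

    void enqueue(long expiresAtNanos) {
        backlog.offer(expiresAtNanos);
        tryExpire(System.nanoTime());
    }

    // Returns true if this thread performed expiration; false if it skipped
    // because another thread already held the expiration "token".
    boolean tryExpire(long nowNanos) {
        if (!expiring.compareAndSet(false, true))
            return false;              // option (1): let the other Threads continue
        try {
            for (Iterator<Long> it = backlog.iterator(); it.hasNext(); ) {
                if (it.next() < nowNanos)
                    it.remove();       // still takes the queue's fullyLock, but only in one thread
            }
        } finally {
            expiring.set(false);
        }
        return true;
    }

    int size() { return backlog.size(); }
}
```

Option (2) would replace the early return with a wait on the flag; option (3) would hand the expiration loop to a dedicated executor instead.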
[jira] [Commented] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883172#comment-15883172 ] Christian Esken commented on CASSANDRA-13265:
---
Link to the current patch: https://github.com/apache/cassandra/compare/trunk...christian-esken:13265-3.0?expand=1

> Communication breakdown in OutboundTcpConnection
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
> Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 1.8.0_112-b15)
> Linux 3.16
> Reporter: Christian Esken
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
> I observed that sometimes a single node in a Cassandra cluster fails to communicate to the other nodes. This can happen at any time, during peak load or low load. Restarting that single node from the cluster fixes the issue.
> Before going in to details, I want to state that I have analyzed the situation and am already developing a possible fix. Here is the analysis so far:
> - A Threaddump in this situation showed 324 Threads in the OutboundTcpConnection class that want to lock the backlog queue for doing expiration.
> - A class histogram shows 262508 instances of OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain amount of queued messages, it starts thrashing itself to death. Each of the Threads fully locks the Queue for reading and writing by calling iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can it progress with actually writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and fully lock the Queue.
> This means: Writing blocks the Queue for reading, and readers might even be starved, which makes the situation even worse.
>
> The setup is:
> - 3-node cluster
> - replication factor 2
> - Consistency LOCAL_ONE
> - No remote DC's
> - high write throughput (10 INSERT statements per second and more during peak times).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883047#comment-15883047 ] Christian Esken commented on CASSANDRA-13265:
---
The Thread dumps show that several Threads park on the same objects:
- 324 Threads are waiting on the same object, trying to iterate over the Queue (expiration)
- 24 Threads wait on a different object; as far as we can see, they try to read from the Queue

{code}
--- cassandra.pb-cache4-dus.2017-02-20-01-41-14.td ---
      1 - parking to wait for <0x0001c04b1748> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c056d4f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c0579c60> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     24 - parking to wait for <0x0001c058ce50> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c058e520> (a java.util.concurrent.Semaphore$NonfairSync)
      1 - parking to wait for <0x0001c058ee50> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c0592bc0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c0593058> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c0593ae0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c05958d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c059f788> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      4 - parking to wait for <0x0001c07f5ea8> (a java.util.concurrent.SynchronousQueue$TransferStack)
      1 - parking to wait for <0x0001c0df0548> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c4b52790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c56a7ca8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c56beea8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c56bf2d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    324 - parking to wait for <0x0001c5d5a150> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      1 - parking to wait for <0x0001c628edb0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c6290b78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c62958a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c6295b08> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c72343a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c7581d58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001c8dd5738> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001ccdc3b80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001cd22e1b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001f3c39428> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0001fb43f5d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x0002003b6018> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
{code}
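The per-lock aggregation shown above (number of parked Threads per lock address) can be recomputed from any jstack-style thread dump with a few lines of Java; the helper class name below is invented for illustration:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Count how many threads park on each lock address in jstack-style output.
class ParkTargetHistogram {
    static Map<String, Integer> count(String threadDump) {
        Pattern p = Pattern.compile("parking to wait for <(0x[0-9a-fA-F]+)>");
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = p.matcher(threadDump);
        while (m.find())
            counts.merge(m.group(1), 1, Integer::sum);  // one hit per parked thread
        return counts;
    }
}
```

A histogram like this makes the pathological entries (here 324 Threads on one ReentrantLock) stand out immediately.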
[jira] [Updated] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13265:
---
Attachment: cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz

Thread Dump
[jira] [Updated] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13265:
---
Attachment: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz

Class Histogram
[jira] [Updated] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13265:
---
Description:
I observed that sometimes a single node in a Cassandra cluster fails to communicate to the other nodes. This can happen at any time, during peak load or low load. Restarting that single node from the cluster fixes the issue.
Before going in to details, I want to state that I have analyzed the situation and am already developing a possible fix. Here is the analysis so far:
- A Threaddump in this situation showed 324 Threads in the OutboundTcpConnection class that want to lock the backlog queue for doing expiration.
- A class histogram shows 262508 instances of OutboundTcpConnection$QueuedMessage.
What is the effect of it? As soon as the Cassandra node has reached a certain amount of queued messages, it starts thrashing itself to death. Each of the Threads fully locks the Queue for reading and writing by calling iterator.next(), making the situation worse and worse.
- Writing: Only after 262508 locking operations can it progress with actually writing to the Queue.
- Reading: Is also blocked, as 324 Threads try to do iterator.next(), and fully lock the Queue.
This means: Writing blocks the Queue for reading, and readers might even be starved, which makes the situation even worse.

The setup is:
- 3-node cluster
- replication factor 2
- Consistency LOCAL_ONE
- No remote DC's
- high write throughput (10 INSERT statements per second and more during peak times).

was:
I observed that sometimes a single node in a Cassandra cluster fails to communicate to the other nodes. This can happen at any time, during peak load or low load. Restarting that single node from the cluster fixes the issue.
Before going in to details, I want to state that I have analyzed the situation and am already developing a possible fix. Here is the analysis so far:
- A Threaddump in this situation showed 324 Threads in the OutboundTcpConnection class that want to lock the backlog queue for doing expiration.
- A class histogram shows 262508 instances of OutboundTcpConnection$QueuedMessage.
What is the effect of it? As soon as the Cassandra node has reached that state, it never gets out of it by itself; it is thrashing itself to death instead, as each of the Threads fully locks the Queue for reading and writing by calling iterator.next().
- Writing: Only after 262508 locking operations can it progress with actually writing to the Queue.
- Reading: Is also blocked, as 324 Threads try to do iterator.next(), and fully lock the Queue.
This means: Writing blocks the Queue for reading, and readers might even be starved, which makes the situation even worse.

The setup is:
- 3-node cluster
- replication factor 2
- Consistency LOCAL_ONE
- No remote DC's
- high write throughput (10 INSERT statements per second and more during peak times).
[jira] [Updated] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13265:
---
Description:
I observed that sometimes a single node in a Cassandra cluster fails to communicate to the other nodes. This can happen at any time, during peak load or low load. Restarting that single node from the cluster fixes the issue.
Before going in to details, I want to state that I have analyzed the situation and am already developing a possible fix. Here is the analysis so far:
- A Threaddump in this situation showed 324 Threads in the OutboundTcpConnection class that want to lock the backlog queue for doing expiration.
- A class histogram shows 262508 instances of OutboundTcpConnection$QueuedMessage.
What is the effect of it? As soon as the Cassandra node has reached that state, it never gets out of it by itself; it is thrashing itself to death instead, as each of the Threads fully locks the Queue for reading and writing by calling iterator.next().
- Writing: Only after 262508 locking operations can it progress with actually writing to the Queue.
- Reading: Is also blocked, as 324 Threads try to do iterator.next(), and fully lock the Queue.
This means: Writing blocks the Queue for reading, and readers might even be starved, which makes the situation even worse.

The setup is:
- 3-node cluster
- replication factor 2
- Consistency LOCAL_ONE
- No remote DC's
- high write throughput (10 INSERT statements per second and more during peak times).

was:
I observed that sometimes a single node in a Cassandra cluster fails to communicate to the other nodes. This can happen at any time, during peak load or low load. Restarting that single node from the cluster fixes the issue.
Before going in to details, I want to state that I have analyzed the situation and am already developing a possible fix. Here is the analysis so far:
- A Threaddump in this situation showed that 324 Threads in the OutboundTcpConnection class wanted to lock the backlog queue for doing expiration.
- A class histogram shows 262508 instances of OutboundTcpConnection$QueuedMessage.
What is the effect of it? As soon as the Cassandra node has reached that state, it never gets out of it by itself; it is thrashing itself to death instead, as each of the Threads fully locks the Queue for reading and writing by calling iterator.next().
- Writing: Only after 262508 locking operations can it progress with actually writing to the Queue.
- Reading: Is also blocked, as 324 Threads try to do iterator.next(), and fully lock the Queue.
This means: Writing blocks the Queue for reading, and readers might even be starved, which makes the situation even worse.

The setup is:
- 3-node cluster
- replication factor 2
- Consistency LOCAL_ONE
- No remote DC's
- high write throughput (10 INSERT statements per second and more during peak times).
[jira] [Created] (CASSANDRA-13265) Communication breakdown in OutboundTcpConnection
Christian Esken created CASSANDRA-13265:
---
Summary: Communication breakdown in OutboundTcpConnection
Key: CASSANDRA-13265
URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
Project: Cassandra
Issue Type: Bug
Environment: Cassandra 3.0.9
Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 1.8.0_112-b15)
Linux 3.16
Reporter: Christian Esken

I observed that sometimes a single node in a Cassandra cluster fails to communicate to the other nodes. This can happen at any time, during peak load or low load. Restarting that single node from the cluster fixes the issue.
Before going in to details, I want to state that I have analyzed the situation and am already developing a possible fix. Here is the analysis so far:
- A Threaddump in this situation showed that 324 Threads in the OutboundTcpConnection class wanted to lock the backlog queue for doing expiration.
- A class histogram shows 262508 instances of OutboundTcpConnection$QueuedMessage.
What is the effect of it? As soon as the Cassandra node has reached that state, it never gets out of it by itself; it is thrashing itself to death instead, as each of the Threads fully locks the Queue for reading and writing by calling iterator.next().
- Writing: Only after 262508 locking operations can it progress with actually writing to the Queue.
- Reading: Is also blocked, as 324 Threads try to do iterator.next(), and fully lock the Queue.
This means: Writing blocks the Queue for reading, and readers might even be starved, which makes the situation even worse.

The setup is:
- 3-node cluster
- replication factor 2
- Consistency LOCAL_ONE
- No remote DC's
- high write throughput (10 INSERT statements per second and more during peak times).
[jira] [Updated] (CASSANDRA-13264) NullPointerException in sstabledump
[ https://issues.apache.org/jira/browse/CASSANDRA-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13264: Description: sstabledump can fail with NullPointerException, when not all files are present on Disk. This behavior is frequently present on Cassandra nodes with a high write-throughput and much disk I/O. It looks like Cassandra writes the Data file fast, and the other files (Statistics, ...) later when it has time to do so. This can be even a minute later. Technically it is OK for the Cassandra DB itself, but the sstabledump tool does not handle this gracefully. Current behavior: - NullPointerException (see below) Expected behavior: - More graceful behavior, e.g. a message to STDERR PS: This is minor priority, I may pick up the ticket myself if nobody else is faster. - {code} ~/apache-cassandra-3.9/tools/bin/sstabledump /appdata/mc-53346-big-Data.db Exception in thread "main" java.lang.NullPointerException at org.apache.cassandra.utils.FBUtilities.newPartitioner(FBUtilities.java:429) at org.apache.cassandra.tools.SSTableExport.metadataFromSSTable(SSTableExport.java:104) at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:180) {code} FBUtilities.java:429 is the following line, and validationMetadata == null: bq. if (validationMetadata.partitioner.endsWith("LocalPartitioner")) was: sstabledump can fail with NullPointerException, when not all files are present on Disk. This behavior is frequently present on Cassandra nodes with a high write-throughput and much disk I/O. It looks like Cassandra writes the Data file fast, and the other files (Statistics, ...) when it has time to do so (e.g. a minute later). This is OK, but sstabledump does not handle this gracefully. Current behavior: - NullPointerException (see below) Expected behavior: - More graceful behavior, e.g. a message to STDERR PS: This is minor priority, I may pick up the ticket myself if nobody else is faster. 
- {code} ~/apache-cassandra-3.9/tools/bin/sstabledump /appdata/mc-53346-big-Data.db Exception in thread "main" java.lang.NullPointerException at org.apache.cassandra.utils.FBUtilities.newPartitioner(FBUtilities.java:429) at org.apache.cassandra.tools.SSTableExport.metadataFromSSTable(SSTableExport.java:104) at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:180) {code} FBUtilities.java:429 is the following line, and validationMetadata == null: bq. if (validationMetadata.partitioner.endsWith("LocalPartitioner")) > NullPointerException in sstabledump > --- > > Key: CASSANDRA-13264 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13264 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Priority: Minor > Labels: twcs > > sstabledump can fail with NullPointerException, when not all files are > present on Disk. This behavior is frequently present on Cassandra nodes with > a high write-throughput and much disk I/O. It looks like Cassandra writes the > Data file fast, and the other files (Statistics, ...) later when it has time > to do so. This can be even a minute later. Technically it is OK for the > Cassandra DB itself, but the sstabledump tool does not handle this gracefully. > Current behavior: > - NullPointerException (see below) > Expected behavior: > - More graceful behavior, e.g. a message to STDERR > PS: This is minor priority, I may pick up the ticket myself if nobody else is > faster. 
> - > {code} > ~/apache-cassandra-3.9/tools/bin/sstabledump /appdata/mc-53346-big-Data.db > Exception in thread "main" java.lang.NullPointerException > at > org.apache.cassandra.utils.FBUtilities.newPartitioner(FBUtilities.java:429) > at > org.apache.cassandra.tools.SSTableExport.metadataFromSSTable(SSTableExport.java:104) > at > org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:180) > {code} > FBUtilities.java:429 is the following line, and validationMetadata == null: > bq. if (validationMetadata.partitioner.endsWith("LocalPartitioner")) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
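The "more graceful behavior" requested above could look like the following hypothetical sketch. It is not the actual FBUtilities code; the `ValidationMetadata` class here is a stand-in for the metadata read from the Statistics.db component, which, per the report, may not have been written yet.

```java
// Hypothetical sketch of a null-safe variant of the failing check.
final class PartitionerCheck {
    /** Stand-in for the metadata loaded from Statistics.db; null when the file is missing. */
    static final class ValidationMetadata {
        final String partitioner;
        ValidationMetadata(String partitioner) { this.partitioner = partitioner; }
    }

    /** Reports the missing component on STDERR instead of throwing a NullPointerException. */
    static boolean isLocalPartitioner(ValidationMetadata validationMetadata, String sstablePath) {
        if (validationMetadata == null) {
            // On a busy node the Statistics file can lag behind the Data file.
            System.err.println("Cannot read partitioner metadata for " + sstablePath
                               + " (Statistics component missing?)");
            return false;
        }
        return validationMetadata.partitioner.endsWith("LocalPartitioner");
    }
}
```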
[jira] [Updated] (CASSANDRA-13264) NullPointerException in sstabledump
[ https://issues.apache.org/jira/browse/CASSANDRA-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13264: Description: sstabledump can fail with NullPointerException, when not all files are present on Disk. This behavior is frequently present on Cassandra nodes with a high write-throughput and much disk I/O. It looks like Cassandra writes the Data file fast, and the other files (Statistics, ...) when it has time to do so (e.g. a minute later). This is OK, but sstabledump does not handle this gracefully. Current behavior: - NullPointerException (see below) Expected behavior: - More graceful behavior, e.g. a message to STDERR PS: This is minor priority, I may pick up the ticket myself if nobody else is faster. - {code} ~/apache-cassandra-3.9/tools/bin/sstabledump /appdata/mc-53346-big-Data.db Exception in thread "main" java.lang.NullPointerException at org.apache.cassandra.utils.FBUtilities.newPartitioner(FBUtilities.java:429) at org.apache.cassandra.tools.SSTableExport.metadataFromSSTable(SSTableExport.java:104) at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:180) {code} FBUtilities.java:429 is the following line, and validationMetadata == null: bq. if (validationMetadata.partitioner.endsWith("LocalPartitioner")) was: I have a table where all columns are stored with TTL of maximum 4 hours. Usually TWCS compaction properly removes expired data via tombstone compaction and also removes fully expired tables. The number of SSTables is nearly constant since weeks. Good. The problem: Suddenly TWCS does not remove old SSTables any longer. They are being recreated frequently (judging form the file creation timestamp), but the number of tables is growing. Analysis and actions take so far: - sstablemetadata shows strange data, as if the table is completely empty. 
- sstabledump throws an Exception when running it on such a SSTable - Even triggering a manual major compaction will not remove the old SSTable's. To be more precise: They are recreated with new id and timestamp (not sure whether they are identical as I cannot inspect content due to the sstabledump crash) {color:blue}edit 2017-01-19: This ticket may be obsolete. See the later comments for more information.{color} > NullPointerException in sstabledump > --- > > Key: CASSANDRA-13264 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13264 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Priority: Minor > Labels: twcs > > sstabledump can fail with NullPointerException, when not all files are > present on Disk. This behavior is frequently present on Cassandra nodes with > a high write-throughput and much disk I/O. It looks like Cassandra writes the > Data file fast, and the other files (Statistics, ...) when it has time to do > so (e.g. a minute later). This is OK, but sstabledump does not handle this > gracefully. > Current behavior: > - NullPointerException (see below) > Expected behavior: > - More graceful behavior, e.g. a message to STDERR > PS: This is minor priority, I may pick up the ticket myself if nobody else is > faster. > - > {code} > ~/apache-cassandra-3.9/tools/bin/sstabledump /appdata/mc-53346-big-Data.db > Exception in thread "main" java.lang.NullPointerException > at > org.apache.cassandra.utils.FBUtilities.newPartitioner(FBUtilities.java:429) > at > org.apache.cassandra.tools.SSTableExport.metadataFromSSTable(SSTableExport.java:104) > at > org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:180) > {code} > FBUtilities.java:429 is the following line, and validationMetadata == null: > bq. 
if (validationMetadata.partitioner.endsWith("LocalPartitioner")) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13264) NullPointerException in sstabledump
Christian Esken created CASSANDRA-13264: --- Summary: NullPointerException in sstabledump Key: CASSANDRA-13264 URL: https://issues.apache.org/jira/browse/CASSANDRA-13264 Project: Cassandra Issue Type: Bug Components: Compaction Environment: Cassandra 3.0.9 Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 1.8.0_112-b15) Linux 3.16 Reporter: Christian Esken Priority: Minor I have a table where all columns are stored with a TTL of maximum 4 hours. Usually TWCS compaction properly removes expired data via tombstone compaction and also removes fully expired tables. The number of SSTables has been nearly constant for weeks. Good. The problem: Suddenly TWCS does not remove old SSTables any longer. They are being recreated frequently (judging from the file creation timestamp), but the number of tables is growing. Analysis and actions taken so far: - sstablemetadata shows strange data, as if the table is completely empty. - sstabledump throws an Exception when running it on such an SSTable - Even triggering a manual major compaction will not remove the old SSTables. To be more precise: They are recreated with a new id and timestamp (not sure whether they are identical, as I cannot inspect the content due to the sstabledump crash) {color:blue}edit 2017-01-19: This ticket may be obsolete. See the later comments for more information.{color} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (CASSANDRA-13005) Cassandra TWCS is not removing fully expired tables
[ https://issues.apache.org/jira/browse/CASSANDRA-13005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken resolved CASSANDRA-13005. - Resolution: Cannot Reproduce I cannot reproduce this issue any longer. As explained in the last comment, it might not actually be a real bug, except for the NPE in sstabledump. For the latter I will create a followup ticket, and will close this ticket. > Cassandra TWCS is not removing fully expired tables > --- > > Key: CASSANDRA-13005 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13005 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Priority: Minor > Labels: twcs > Attachments: sstablemetadata-empty-type-that-is-3GB.txt > > > I have a table where all columns are stored with TTL of maximum 4 hours. > Usually TWCS compaction properly removes expired data via tombstone > compaction and also removes fully expired tables. The number of SSTables is > nearly constant since weeks. Good. > The problem: Suddenly TWCS does not remove old SSTables any longer. They are > being recreated frequently (judging form the file creation timestamp), but > the number of tables is growing. Analysis and actions take so far: > - sstablemetadata shows strange data, as if the table is completely empty. > - sstabledump throws an Exception when running it on such a SSTable > - Even triggering a manual major compaction will not remove the old > SSTable's. To be more precise: They are recreated with new id and timestamp > (not sure whether they are identical as I cannot inspect content due to the > sstabledump crash) > {color:blue}edit 2017-01-19: This ticket may be obsolete. See the later > comments for more information.{color} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13005) Cassandra TWCS is not removing fully expired tables
[ https://issues.apache.org/jira/browse/CASSANDRA-13005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13005: Description: I have a table where all columns are stored with TTL of maximum 4 hours. Usually TWCS compaction properly removes expired data via tombstone compaction and also removes fully expired tables. The number of SSTables is nearly constant since weeks. Good. The problem: Suddenly TWCS does not remove old SSTables any longer. They are being recreated frequently (judging form the file creation timestamp), but the number of tables is growing. Analysis and actions take so far: - sstablemetadata shows strange data, as if the table is completely empty. - sstabledump throws an Exception when running it on such a SSTable - Even triggering a manual major compaction will not remove the old SSTable's. To be more precise: They are recreated with new id and timestamp (not sure whether they are identical as I cannot inspect content due to the sstabledump crash) {color:blue}edit 2017-01-19: This ticket may be obsolete. See the later comments for more information.{color} was: I have a table where all columns are stored with TTL of maximum 4 hours. Usually TWCS compaction properly removes expired data via tombstone compaction and also removes fully expired tables. The number of SSTables is nearly constant since weeks. Good. The problem: Suddenly TWCS does not remove old SSTables any longer. They are being recreated frequently (judging form the file creation timestamp), but the number of tables is growing. Analysis and actions take so far: - sstablemetadata shows strange data, as if the table is completely empty. - sstabledump throws an Exception when running it on such a SSTable - Even triggering a manual major compaction will not remove the old SSTable's. 
To be more precise: They are recreated with new id and timestamp (not sure whether they are identical as I cannot inspect content due to the sstabledump crash) {color:blue}edit 2017-0-19: This ticket may be obsolete. See the later comments for more information.{color} > Cassandra TWCS is not removing fully expired tables > --- > > Key: CASSANDRA-13005 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13005 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Priority: Minor > Labels: twcs > Attachments: sstablemetadata-empty-type-that-is-3GB.txt > > > I have a table where all columns are stored with TTL of maximum 4 hours. > Usually TWCS compaction properly removes expired data via tombstone > compaction and also removes fully expired tables. The number of SSTables is > nearly constant since weeks. Good. > The problem: Suddenly TWCS does not remove old SSTables any longer. They are > being recreated frequently (judging form the file creation timestamp), but > the number of tables is growing. Analysis and actions take so far: > - sstablemetadata shows strange data, as if the table is completely empty. > - sstabledump throws an Exception when running it on such a SSTable > - Even triggering a manual major compaction will not remove the old > SSTable's. To be more precise: They are recreated with new id and timestamp > (not sure whether they are identical as I cannot inspect content due to the > sstabledump crash) > {color:blue}edit 2017-01-19: This ticket may be obsolete. See the later > comments for more information.{color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-13005) Cassandra TWCS is not removing fully expired tables
[ https://issues.apache.org/jira/browse/CASSANDRA-13005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13005: Description: I have a table where all columns are stored with TTL of maximum 4 hours. Usually TWCS compaction properly removes expired data via tombstone compaction and also removes fully expired tables. The number of SSTables is nearly constant since weeks. Good. The problem: Suddenly TWCS does not remove old SSTables any longer. They are being recreated frequently (judging form the file creation timestamp), but the number of tables is growing. Analysis and actions take so far: - sstablemetadata shows strange data, as if the table is completely empty. - sstabledump throws an Exception when running it on such a SSTable - Even triggering a manual major compaction will not remove the old SSTable's. To be more precise: They are recreated with new id and timestamp (not sure whether they are identical as I cannot inspect content due to the sstabledump crash) {color:blue}edit 2017-0-19: This ticket may be obsolete. See the later comments for more information.{color} was: I have a table where all columns are stored with TTL of maximum 4 hours. Usually TWCS compaction properly removes expired data via tombstone compaction and also removes fully expired tables. The number of SSTables is nearly constant since weeks. Good. The problem: Suddenly TWCS does not remove old SSTables any longer. They are being recreated frequently (judging form the file creation timestamp), but the number of tables is growing. Analysis and actions take so far: - sstablemetadata shows strange data, as if the table is completely empty. - sstabledump throws an Exception when running it on such a SSTable - Even triggering a manual major compaction will not remove the old SSTable's. 
To be more precise: They are recreated with new id and timestamp (not sure whether they are identical as I cannot inspect content due to the sstabledump crash) > Cassandra TWCS is not removing fully expired tables > --- > > Key: CASSANDRA-13005 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13005 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Priority: Minor > Labels: twcs > Attachments: sstablemetadata-empty-type-that-is-3GB.txt > > > I have a table where all columns are stored with TTL of maximum 4 hours. > Usually TWCS compaction properly removes expired data via tombstone > compaction and also removes fully expired tables. The number of SSTables is > nearly constant since weeks. Good. > The problem: Suddenly TWCS does not remove old SSTables any longer. They are > being recreated frequently (judging form the file creation timestamp), but > the number of tables is growing. Analysis and actions take so far: > - sstablemetadata shows strange data, as if the table is completely empty. > - sstabledump throws an Exception when running it on such a SSTable > - Even triggering a manual major compaction will not remove the old > SSTable's. To be more precise: They are recreated with new id and timestamp > (not sure whether they are identical as I cannot inspect content due to the > sstabledump crash) > {color:blue}edit 2017-0-19: This ticket may be obsolete. See the later > comments for more information.{color} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13005) Cassandra TWCS is not removing fully expired tables
[ https://issues.apache.org/jira/browse/CASSANDRA-13005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829587#comment-15829587 ] Christian Esken commented on CASSANDRA-13005: - I am no longer convinced that my bug report is fully accurate. - It is true that files are missing. - It is true that a NullPointerException happens within sstabledump. - OTOH, missing files get created after some time. It may be the case that this always happens, even though it sometimes takes a long time (I have not measured this yet, but I encountered cases where the files were not there even after 1 minute). - I inspected some of the data files after the missing files were created. At that point in time they were correct and contained non-expired data. Thus, this may not be a bug. I will lower the priority, but will keep this bug report open for some time. > Cassandra TWCS is not removing fully expired tables > --- > > Key: CASSANDRA-13005 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13005 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken > Labels: twcs > Attachments: sstablemetadata-empty-type-that-is-3GB.txt > > > I have a table where all columns are stored with a TTL of maximum 4 hours. > Usually TWCS compaction properly removes expired data via tombstone > compaction and also removes fully expired tables. The number of SSTables has > been nearly constant for weeks. Good. > The problem: Suddenly TWCS does not remove old SSTables any longer. They are > being recreated frequently (judging from the file creation timestamp), but > the number of tables is growing. Analysis and actions taken so far: > - sstablemetadata shows strange data, as if the table is completely empty. 
> - sstabledump throws an Exception when running it on such an SSTable > - Even triggering a manual major compaction will not remove the old > SSTables. To be more precise: They are recreated with a new id and timestamp > (not sure whether they are identical, as I cannot inspect the content due to the > sstabledump crash) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-13005) Cassandra TWCS is not removing fully expired tables
[ https://issues.apache.org/jira/browse/CASSANDRA-13005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Esken updated CASSANDRA-13005: Priority: Minor (was: Major) > Cassandra TWCS is not removing fully expired tables > --- > > Key: CASSANDRA-13005 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13005 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Priority: Minor > Labels: twcs > Attachments: sstablemetadata-empty-type-that-is-3GB.txt > > > I have a table where all columns are stored with TTL of maximum 4 hours. > Usually TWCS compaction properly removes expired data via tombstone > compaction and also removes fully expired tables. The number of SSTables is > nearly constant since weeks. Good. > The problem: Suddenly TWCS does not remove old SSTables any longer. They are > being recreated frequently (judging form the file creation timestamp), but > the number of tables is growing. Analysis and actions take so far: > - sstablemetadata shows strange data, as if the table is completely empty. > - sstabledump throws an Exception when running it on such a SSTable > - Even triggering a manual major compaction will not remove the old > SSTable's. To be more precise: They are recreated with new id and timestamp > (not sure whether they are identical as I cannot inspect content due to the > sstabledump crash) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13005) Cassandra TWCS is not removing fully expired tables
[ https://issues.apache.org/jira/browse/CASSANDRA-13005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823645#comment-15823645 ] Christian Esken commented on CASSANDRA-13005: - Adding an additional observation on the event of Jan 13: Most or all "missing" files appeared after some time. It seems like some background process creates those files, but sometimes it takes quite some time, and it happens for multiple tables. For example, at one point files from 13 SSTables were missing (78 files overall). > Cassandra TWCS is not removing fully expired tables > --- > > Key: CASSANDRA-13005 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13005 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken > Labels: twcs > Attachments: sstablemetadata-empty-type-that-is-3GB.txt > > > I have a table where all columns are stored with a TTL of maximum 4 hours. > Usually TWCS compaction properly removes expired data via tombstone > compaction and also removes fully expired tables. The number of SSTables has > been nearly constant for weeks. Good. > The problem: Suddenly TWCS does not remove old SSTables any longer. They are > being recreated frequently (judging from the file creation timestamp), but > the number of tables is growing. Analysis and actions taken so far: > - sstablemetadata shows strange data, as if the table is completely empty. > - sstabledump throws an Exception when running it on such an SSTable > - Even triggering a manual major compaction will not remove the old > SSTables. To be more precise: They are recreated with a new id and timestamp > (not sure whether they are identical, as I cannot inspect the content due to the > sstabledump crash) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13005) Cassandra TWCS is not removing fully expired tables
[ https://issues.apache.org/jira/browse/CASSANDRA-13005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821041#comment-15821041 ] Christian Esken commented on CASSANDRA-13005: - While the issue was not present in the last 4 weeks, it now has happened again. I am adding the sstableexpiredblockers output, as requested: {code} # sstableexpiredblockers cachestore bookinglinkentries [BigTableReader(path='/data/cassandra/data/cachestore/bookinglinkentries-a2502c60bba511e6917fcda6eb6df2bb/mc-204995-big-Data.db') (minTS = 1484213395148001, maxTS = 1484214299952743, maxLDT = 1484228699)], blocks 1 expired sstables from getting dropped: [BigTableReader(path='/data/cassandra/data/cachestore/bookinglinkentries-a2502c60bba511e6917fcda6eb6df2bb/mc-205731-big-Data.db') (minTS = 1484212495197503, maxTS = 1484213395210562, maxLDT = 1484227795)], {code} The broken SSTables do not appear in the sstableexpiredblockers output. As last time, the number of SSTables keeps increasing once the problem has occurred initially. > Cassandra TWCS is not removing fully expired tables > --- > > Key: CASSANDRA-13005 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13005 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken > Labels: twcs > Attachments: sstablemetadata-empty-type-that-is-3GB.txt > > > I have a table where all columns are stored with TTL of maximum 4 hours. > Usually TWCS compaction properly removes expired data via tombstone > compaction and also removes fully expired tables. The number of SSTables is > nearly constant since weeks. Good. > The problem: Suddenly TWCS does not remove old SSTables any longer. They are > being recreated frequently (judging form the file creation timestamp), but > the number of tables is growing. 
Analysis and actions taken so far: > - sstablemetadata shows strange data, as if the table is completely empty. > - sstabledump throws an Exception when running it on such an SSTable > - Even triggering a manual major compaction will not remove the old > SSTables. To be more precise: They are recreated with a new id and timestamp > (not sure whether they are identical, as I cannot inspect the content due to the > sstabledump crash) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
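The blocking relationship that sstableexpiredblockers prints (minTS, maxTS, maxLDT) can be illustrated with a simplified, hypothetical sketch. This is not the actual Cassandra compaction code, only the general rule as I understand it: a fully expired sstable is droppable only when its maximum local deletion time is in the past and no other sstable still holds data older than the candidate's newest cell.

```java
import java.util.List;

/** Simplified, hypothetical sketch of the fully-expired check behind sstableexpiredblockers. */
final class ExpiredCheck {
    static final class SSTable {
        final long minTimestamp;        // oldest cell timestamp (microseconds)
        final long maxTimestamp;        // newest cell timestamp (microseconds)
        final int maxLocalDeletionTime; // latest expiry, seconds since epoch
        SSTable(long minTimestamp, long maxTimestamp, int maxLocalDeletionTime) {
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
            this.maxLocalDeletionTime = maxLocalDeletionTime;
        }
    }

    /**
     * The candidate can be dropped when everything in it is expired
     * (maxLocalDeletionTime < gcBefore) and no other sstable holds data older
     * than the candidate's newest cell; such an sstable "blocks" the drop,
     * which is what the tool output above reports.
     */
    static boolean canDrop(SSTable candidate, List<SSTable> others, int gcBefore) {
        if (candidate.maxLocalDeletionTime >= gcBefore)
            return false; // not everything in the candidate is expired yet
        for (SSTable other : others)
            if (other.minTimestamp <= candidate.maxTimestamp)
                return false; // 'other' blocks the drop
        return true;
    }
}
```

Plugging in the minTS/maxTS/maxLDT values from the output above reproduces the reported blocking: mc-204995's minTS is below mc-205731's maxTS, so the expired mc-205731 cannot be dropped.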
[jira] [Comment Edited] (CASSANDRA-13005) Cassandra TWCS is not removing fully expired tables
[ https://issues.apache.org/jira/browse/CASSANDRA-13005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735819#comment-15735819 ] Christian Esken edited comment on CASSANDRA-13005 at 12/9/16 5:13 PM: -- I have imported some of the old defective SSTables in a test installation via sstableloader: {code} # sstableloader -d 127.0.0.1 cachestore/entries Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of /home/cesken/cachestore/entries/mc-50789-big-Data.db /home/cesken/cachestore/entries/mc-51223-big-Data.db /home/cesken/cachestore/entries/mc-51351-big-Data.db to [/127.0.0.1] progress: [/127.0.0.1]0:0/3 0 % total: 0% 3,152MiB/s (avg: 3,152MiB/s) progress: [/127.0.0.1]0:0/3 0 % total: 0% 1,908GiB/s (avg: 6,294MiB/s) progress: [/127.0.0.1]0:0/3 0 % total: 0% 1,599GiB/s (avg: 9,423MiB/s) [...] progress: [/127.0.0.1]0:2/3 99 % total: 99% 3,177MiB/s (avg: 6,227MiB/s) progress: [/127.0.0.1]0:3/3 100% total: 100% 3,436MiB/s (avg: 6,214MiB/s) progress: [/127.0.0.1]0:3/3 100% total: 100% 0,000KiB/s (avg: 6,102MiB/s) Summary statistics: Connections per host: 1 Total files transferred : 3 Total bytes transferred : 3,783GiB Total duration : 634779 ms Average transfer rate : 6,102MiB/s Peak transfer rate : 9,423MiB/s {code} As seen above, the 3 files were loaded, but Cassandra did not import any rows. Probably because the files are defective, or because everything in there is expired. A SELECT on the table also does not return any data. 
{code} # sstableexpiredblockers cachestore entries No sstables for cachestore.entries {code} was (Author: cesken): I have imported some of the old defective SSTables in a test installation via sstableloader: {code} # sstableloader -d 127.0.0.1 cachestore/entries Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of /home/cesken/cachestore/entries/mc-50789-big-Data.db /home/cesken/cachestore/entries/mc-51223-big-Data.db /home/cesken/cachestore/entries/mc-51351-big-Data.db to [/127.0.0.1] progress: [/127.0.0.1]0:0/3 0 % total: 0% 3,152MiB/s (avg: 3,152MiB/s) progress: [/127.0.0.1]0:0/3 0 % total: 0% 1,908GiB/s (avg: 6,294MiB/s) progress: [/127.0.0.1]0:0/3 0 % total: 0% 1,599GiB/s (avg: 9,423MiB/s) [...] progress: [/127.0.0.1]0:2/3 99 % total: 99% 3,177MiB/s (avg: 6,227MiB/s) progress: [/127.0.0.1]0:3/3 100% total: 100% 3,436MiB/s (avg: 6,214MiB/s) progress: [/127.0.0.1]0:3/3 100% total: 100% 0,000KiB/s (avg: 6,102MiB/s) Summary statistics: Connections per host: 1 Total files transferred : 3 Total bytes transferred : 3,783GiB Total duration : 634779 ms Average transfer rate : 6,102MiB/s Peak transfer rate : 9,423MiB/s {code} As seen above, the 3 files were loaded, but Cassandra did not import any rows. Probably because the files are defective, of because everything in there is expired. A SELECT on the table also does not return any data. 
{code} # sstableexpiredblockers cachestore entries No sstables for cachestore.entries {code} > Cassandra TWCS is not removing fully expired tables > --- > > Key: CASSANDRA-13005 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13005 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken > Labels: twcs > Attachments: sstablemetadata-empty-type-that-is-3GB.txt > > > I have a table where all columns are stored with TTL of maximum 4 hours. > Usually TWCS compaction properly removes expired data via tombstone > compaction and also removes fully expired tables. The number of SSTables is > nearly constant since weeks. Good. > The problem: Suddenly TWCS does not remove old SSTables any longer. They are > being recreated frequently (judging form the file creation timestamp), but > the number of tables is growing. Analysis and actions take so far: > - sstablemetadata shows strange data, as if the table is completely empty. > - sstabledump throws an Exception when running it on such a SSTable > - Even triggering a manual major compaction will not remove the old > SSTable's. To be more precise: They are recreated with new id and timestamp > (not sure whether they are identical as I cannot inspect content due to the > sstabledump crash) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13005) Cassandra TWCS is not removing fully expired tables
[ https://issues.apache.org/jira/browse/CASSANDRA-13005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735820#comment-15735820 ] Christian Esken commented on CASSANDRA-13005: - Is there anything else that I could try?
[jira] [Commented] (CASSANDRA-13005) Cassandra TWCS is not removing fully expired tables
[ https://issues.apache.org/jira/browse/CASSANDRA-13005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735819#comment-15735819 ] Christian Esken commented on CASSANDRA-13005: - I have imported some of the old defective SSTables into a test installation via sstableloader:

{code}
# sstableloader -d 127.0.0.1 cachestore/entries
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /home/cesken/cachestore/entries/mc-50789-big-Data.db /home/cesken/cachestore/entries/mc-51223-big-Data.db /home/cesken/cachestore/entries/mc-51351-big-Data.db to [/127.0.0.1]
progress: [/127.0.0.1]0:0/3 0  % total: 0%   3,152MiB/s (avg: 3,152MiB/s)
progress: [/127.0.0.1]0:0/3 0  % total: 0%   1,908GiB/s (avg: 6,294MiB/s)
progress: [/127.0.0.1]0:0/3 0  % total: 0%   1,599GiB/s (avg: 9,423MiB/s)
[...]
progress: [/127.0.0.1]0:2/3 99 % total: 99%  3,177MiB/s (avg: 6,227MiB/s)
progress: [/127.0.0.1]0:3/3 100% total: 100% 3,436MiB/s (avg: 6,214MiB/s)
progress: [/127.0.0.1]0:3/3 100% total: 100% 0,000KiB/s (avg: 6,102MiB/s)

Summary statistics:
   Connections per host    : 1
   Total files transferred : 3
   Total bytes transferred : 3,783GiB
   Total duration          : 634779 ms
   Average transfer rate   : 6,102MiB/s
   Peak transfer rate      : 9,423MiB/s
{code}

As seen above, the three files were streamed, but Cassandra did not import any rows, probably because the files are defective, or because everything in them is expired. A SELECT on the table also returns no data.
{code}
# sstableexpiredblockers cachestore entries
No sstables for cachestore.entries
{code}
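The sstableexpiredblockers output above ("No sstables for cachestore.entries") is notable because the tool's job is to report which SSTables, if any, prevent fully expired ones from being dropped. The overlap rule it reports on can be sketched loosely like this; this is a hypothetical simplification of the real check, with timestamps reduced to plain seconds and illustrative names throughout.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    name: str
    min_timestamp: int      # oldest write contained in the file
    max_deletion_time: int  # newest expiry/tombstone among its cells


def find_blockers(candidate: SSTable, others: list[SSTable],
                  now: int) -> list[SSTable]:
    """A fully expired candidate can only be dropped whole if no other
    SSTable contains data older than the candidate's newest expiry: such
    older data might be shadowed by the candidate's tombstones, and
    dropping the candidate first could resurrect deleted rows."""
    if candidate.max_deletion_time >= now:
        return []  # not fully expired yet, so nothing to drop
    return [s for s in others
            if s is not candidate
            and s.min_timestamp < candidate.max_deletion_time]


# An old, fully expired file is blocked by a newer file that still holds
# writes from before the old file's newest expiry:
old = SSTable("mc-50789", min_timestamp=100, max_deletion_time=200)
overlapping = SSTable("mc-99999", min_timestamp=150, max_deletion_time=9999)
disjoint = SSTable("mc-88888", min_timestamp=300, max_deletion_time=9999)
assert find_blockers(old, [overlapping, disjoint], now=1000) == [overlapping]
```

Under this model, an empty blocker report combined with week-old undropped files points away from overlap blocking and toward the corrupt metadata described in this ticket (the sstablemetadata output that looks "completely empty" could make the droppability check itself misjudge the file).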
[jira] [Commented] (CASSANDRA-13005) Cassandra TWCS is not removing fully expired tables
[ https://issues.apache.org/jira/browse/CASSANDRA-13005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732007#comment-15732007 ] Christian Esken commented on CASSANDRA-13005: - I can do that. I could provide a patch for it. The HowToContribute page mentions that I can also do it via GitHub, which I would prefer.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)