[jira] [Commented] (CASSANDRA-14610) Flaky dtest: nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
[ https://issues.apache.org/jira/browse/CASSANDRA-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640961#comment-16640961 ]

Jay Zhuang commented on CASSANDRA-14610:

I'm unable to reproduce the problem locally. For the failed job in Jenkins, the failure seems to be caused mostly by a timeout while populating 6 nodes:
{noformat}
Error Message
ccmlib.node.NodeError: Error starting node1.

Stacktrace
self =

    @since('4.0')
    def test_describecluster_more_information_three_datacenters(self):
        """
        nodetool describecluster should be more informative. It should include detailes for total node count,
        list of datacenters, RF, number of nodes per dc, how many are down and version(s).
        @jira_ticket CASSANDRA-13853
        @expected_result This test invokes nodetool describecluster and matches the output with the expected one
        """
        cluster = self.cluster
>       cluster.populate([2, 3, 1]).start(wait_for_binary_proto=True)
{noformat}
Other tests that require 6 nodes are all marked as {{@pytest.mark.resource_intensive}} (so these tests are skipped). I think reducing the number of nodes from 6 to 4 should help. +1 for the patch (it also passed 100 local runs).

> Flaky dtest: nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
> ---
>
> Key: CASSANDRA-14610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14610
> Project: Cassandra
> Issue Type: Task
> Components: Testing, Tools
> Reporter: Jason Brown
> Assignee: Marcus Eriksson
> Priority: Minor
> Labels: dtest
>
> @jay zhuang observed nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters being flaky in Apache Jenkins. I ran locally and got a different flaky behavior:
> {noformat}
> out_node1_dc3, err, _ = node1_dc3.nodetool('describecluster')
> assert 0 == len(err), err
> > assert out_node1_dc1 == out_node1_dc3
> E AssertionError: assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n'
> E Cluster Information:
> E Name: test
> E Snitch: org.apache.cassandra.locator.PropertyFileSnitch
> E DynamicEndPointSnitch: enabled
> E Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> E Schema versions:
> E fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]...
> E
> E ...Full output truncated (26 lines hidden), use '-vv' to show
> 09:58:14,357 ccm DEBUG Log-watching thread exiting.
> ===Flaky Test Report===
> test_describecluster_more_information_three_datacenters failed and was not selected for rerun.
>
> assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n'
> Cluster Information:
> Name: test
> Snitch: org.apache.cassandra.locator.PropertyFileSnitch
> DynamicEndPointSnitch: enabled
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Schema versions:
> fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]...
>
> ...Full output truncated (26 lines hidden), use '-vv' to show
> [ /opt/orig/1/opt/dev/cassandra-dtest/nodetool_test.py:373>]
> ===End Flaky Test Report===
> {noformat}
> As this test is for a patch that was introduced for 4.0, this dtest (should) only be failing on trunk.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
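The assertion in the quoted report compares the describecluster output of two nodes, and pytest truncates the difference unless it is run with '-vv'. As a minimal sketch, the two strings could be diffed with Python's standard difflib module to see exactly which lines differ. The helper name describe_diff and both sample outputs below are made up for illustration; they are not the outputs from this ticket, whose actual difference is hidden in the truncated report.

```python
import difflib


def describe_diff(left: str, right: str,
                  left_name: str = "node1_dc1",
                  right_name: str = "node1_dc3") -> str:
    """Return a unified diff of two multi-line command outputs.

    An empty string means the two outputs are identical.
    """
    diff = difflib.unified_diff(
        left.splitlines(),
        right.splitlines(),
        fromfile=left_name,
        tofile=right_name,
        lineterm="",
    )
    return "\n".join(diff)


# Hypothetical outputs, invented for this example only.
out_a = ("Cluster Information:\n"
         "\tName: test\n"
         "\tSchema versions:\n"
         "\t\tabc: [127.0.0.2, 127.0.0.1]\n")
out_b = ("Cluster Information:\n"
         "\tName: test\n"
         "\tSchema versions:\n"
         "\t\tabc: [127.0.0.1, 127.0.0.2]\n")

report = describe_diff(out_a, out_b)
print(report)
```

Printing the diff before the equality assertion would keep the differing lines visible even when the assertion message itself is truncated.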
[jira] [Updated] (CASSANDRA-14610) Flaky dtest: nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
[ https://issues.apache.org/jira/browse/CASSANDRA-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-14610:
---
Reviewer: Jay Zhuang
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640770#comment-16640770 ]

Abdul Patel commented on CASSANDRA-14495:
-
Also, is nodetool info the best place to check heap usage?

> Memory Leak /High Memory usage post 3.11.2 upgrade
> --
>
> Key: CASSANDRA-14495
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14495
> Project: Cassandra
> Issue Type: Bug
> Components: Metrics
> Reporter: Abdul Patel
> Priority: Major
> Attachments: cas_heap.txt
>
> Hi All,
>
> I recently upgraded my non-prod Cassandra cluster (4 nodes, single DC) from 3.10 to 3.11.2.
> No issues were reported, apart from nodetool info reporting 80% usage.
> I initially had 16GB of memory on each node; later I bumped it up to 20GB and rebooted all nodes.
> I waited for a week and have now again seen memory usage of more than 80% (16GB+).
> This means some memory leaks are happening over time.
> Has anyone faced such an issue, or do we have any workaround? My 3.11.2 upgrade rollout has been halted because of this bug.
> ===
> ID                     : 65b64f5a-7fe6-4036-94c8-8da9c57718cc
> Gossip active          : true
> Thrift active          : true
> Native Transport active: true
> Load                   : 985.24 MiB
> Generation No          : 1526923117
> Uptime (seconds)       : 1097684
> Heap Memory (MB)       : 16875.64 / 20480.00
> Off Heap Memory (MB)   : 20.42
> Data Center            : DC7
> Rack                   : rac1
> Exceptions             : 0
> Key Cache              : entries 3569, size 421.44 KiB, capacity 100 MiB, 7931933 hits, 8098632 requests, 0.979 recent hit rate, 14400 save period in seconds
> Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
> Counter Cache          : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
> Chunk Cache            : entries 2361, size 147.56 MiB, capacity 3.97 GiB, 2412803 misses, 72594047 requests, 0.967 recent hit rate, NaN microseconds miss latency
> Percent Repaired       : 99.88086234106282%
> Token                  : (invoke with -T/--tokens to see all 256 tokens)
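The 80% figure in the report corresponds to the "Heap Memory (MB)" line of nodetool info, which lists used and maximum heap. Below is a minimal sketch of that arithmetic, assuming the line layout shown in the ticket; the function name heap_usage_percent is hypothetical. Note that this is a point-in-time reading of used heap, which can include garbage that has not been collected yet, so a single high value does not by itself confirm a leak.

```python
import re

# Matches e.g. "Heap Memory (MB)       : 16875.64 / 20480.00".
# match() anchors at the start, so "Off Heap Memory (MB)" is not picked up.
HEAP_LINE = re.compile(r"Heap Memory \(MB\)\s*:\s*([\d.]+)\s*/\s*([\d.]+)")


def heap_usage_percent(nodetool_info_output: str) -> float:
    """Return used heap as a percentage of maximum heap."""
    for line in nodetool_info_output.splitlines():
        match = HEAP_LINE.match(line.strip())
        if match:
            used = float(match.group(1))
            maximum = float(match.group(2))
            return 100.0 * used / maximum
    raise ValueError("no 'Heap Memory (MB)' line found")


# The two memory lines from the output quoted in the ticket.
sample = """\
Heap Memory (MB)       : 16875.64 / 20480.00
Off Heap Memory (MB)   : 20.42
"""

usage = heap_usage_percent(sample)
print(f"{usage:.1f}% of heap in use")
```

Sampling this value repeatedly over time, rather than once, gives a better picture of whether usage keeps growing after collections.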
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640766#comment-16640766 ]

Abdul Patel commented on CASSANDRA-14495:
-
How frequently did you have the full GC job? nodetool garbagecollect, right?
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640749#comment-16640749 ]

Ma Dega commented on CASSANDRA-14495:
-
Look for large partitions. They show up during compaction as "Writing large partition" warnings. In my case, once I dealt with these, the constant full GCs and OOMs were resolved.
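One way to follow the suggestion above is to search a node's log for the "Writing large partition" message. The sketch below only matches that substring; the helper name large_partition_warnings and the sample log lines (including their layout, keyspace, table, key, and size) are invented for illustration and are not taken from this ticket.

```python
from typing import Iterable, List

# Substring quoted in the comment above.
MARKER = "Writing large partition"


def large_partition_warnings(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a large partition being written."""
    return [line.rstrip("\n") for line in log_lines if MARKER in line]


# Hypothetical log lines; the exact layout of a real log may differ.
sample_log = [
    "INFO  [main] Startup complete\n",
    "WARN  [CompactionExecutor:1] Writing large partition "
    "ks/tbl:key1 (150.000MiB) to sstable /path/to/example-Data.db\n",
    "INFO  [CompactionExecutor:1] Compacted 4 sstables\n",
]

hits = large_partition_warnings(sample_log)
for hit in hits:
    print(hit)
```

Because the function accepts any iterable of strings, an open file handle for the node's log can be passed in directly instead of the sample list.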