[jira] [Commented] (CASSANDRA-14610) Flaky dtest: nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters

2018-10-06 Thread Jay Zhuang (JIRA)


[https://issues.apache.org/jira/browse/CASSANDRA-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640961#comment-16640961]

Jay Zhuang commented on CASSANDRA-14610:


I'm unable to reproduce the problem locally. The failed Jenkins jobs seem to be 
caused mostly by a timeout while populating 6 nodes:
{noformat}
Error Message
ccmlib.node.NodeError: Error starting node1.
Stacktrace
self = 

@since('4.0')
def test_describecluster_more_information_three_datacenters(self):
"""
nodetool describecluster should be more informative. It should include detailes
for total node count, list of datacenters, RF, number of nodes per dc, how many
are down and version(s).
@jira_ticket CASSANDRA-13853
@expected_result This test invokes nodetool describecluster and matches the output with the expected one
"""
cluster = self.cluster
>   cluster.populate([2, 3, 1]).start(wait_for_binary_proto=True)
{noformat}

All other tests that require 6 nodes are marked as 
{{@pytest.mark.resource_intensive}} (and are therefore skipped). I think 
reducing the number of nodes from 6 to 4 should help.

+1 for the patch (it also passed 100 runs locally).
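To make the node arithmetic concrete: {{cluster.populate([2, 3, 1])}} asks ccm for 2 + 3 + 1 = 6 nodes across three data centers, which matches the six loopback addresses (127.0.0.1 to 127.0.0.6) in the schema-versions output quoted in this issue. The sketch below is a hypothetical helper ({{expand_topology}} is not part of ccm or the dtest suite) that expands such a list, assuming ccm's sequential node numbering, dc1/dc2/... naming and 127.0.0.N addressing. It shows that a layout like {{[2, 1, 1]}} (one possible 4-node layout, not necessarily the one used in the patch) still spans three data centers with two fewer nodes.

```python
from typing import List, NamedTuple


class NodeSpec(NamedTuple):
    """One node in a hypothetical ccm-style topology."""

    name: str     # e.g. "node1"
    dc: str       # e.g. "dc1"
    address: str  # e.g. "127.0.0.1"


def expand_topology(nodes_per_dc: List[int]) -> List[NodeSpec]:
    """Expand a populate()-style list of per-DC node counts into node specs.

    Assumed convention (believed to match ccm, not verified here): nodes are
    numbered sequentially across data centers, DCs are named dc1, dc2, ...,
    and node N binds to 127.0.0.N.
    """
    specs: List[NodeSpec] = []
    node_number = 0
    for dc_index, count in enumerate(nodes_per_dc, start=1):
        for _ in range(count):
            node_number += 1
            specs.append(
                NodeSpec(f"node{node_number}", f"dc{dc_index}", f"127.0.0.{node_number}")
            )
    return specs
```

For example, {{len(expand_topology([2, 3, 1]))}} is 6 and {{len(expand_topology([2, 1, 1]))}} is 4, while both cover dc1, dc2 and dc3.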

> Flaky dtest: 
> nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
> ---
>
> Key: CASSANDRA-14610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14610
> Project: Cassandra
>  Issue Type: Task
>  Components: Testing, Tools
>Reporter: Jason Brown
>Assignee: Marcus Eriksson
>Priority: Minor
>  Labels: dtest
>
> @jay zhuang observed 
> nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
> being flaky in Apache Jenkins. I ran it locally and got a different flaky 
> behavior:
> {noformat}
> out_node1_dc3, err, _ = node1_dc3.nodetool('describecluster')
> assert 0 == len(err), err
> >   assert out_node1_dc1 == out_node1_dc3
> E   AssertionError: assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n'
> E   Cluster Information:
> E Name: test
> E Snitch: org.apache.cassandra.locator.PropertyFileSnitch
> E DynamicEndPointSnitch: enabled
> E Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> E Schema versions:
> E fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]...
> E 
> E ...Full output truncated (26 lines hidden), use '-vv' to show
> 09:58:14,357 ccm DEBUG Log-watching thread exiting.
> ===Flaky Test Report===
> test_describecluster_more_information_three_datacenters failed and was not selected for rerun.
>   
>   assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n'
> Cluster Information:
>   Name: test
>   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
>   DynamicEndPointSnitch: enabled
>   Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>   Schema versions:
>   fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]...
>   
>   ...Full output truncated (26 lines hidden), use '-vv' to show
>   [ /opt/orig/1/opt/dev/cassandra-dtest/nodetool_test.py:373>]
> ===End Flaky Test Report===
> {noformat}
> As this test covers a patch that was introduced in 4.0, this dtest should 
> only be failing on trunk.
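The assertion in the report above compares {{describecluster}} output from two nodes, and pytest hides 26 lines of the difference. One way to investigate, assuming the mismatch is transient (for example, nodes that have not yet agreed on cluster state when the command runs), is to poll both nodes until their output matches and, on timeout, report a full unified diff. {{wait_for_matching_output}} below is a hypothetical helper, not part of the dtest framework, and its timeout and interval defaults are arbitrary.

```python
import difflib
import time
from typing import Callable, Optional, Tuple


def wait_for_matching_output(
    fetch_a: Callable[[], str],
    fetch_b: Callable[[], str],
    timeout: float = 30.0,
    interval: float = 1.0,
    labels: Tuple[str, str] = ("a", "b"),
) -> Optional[str]:
    """Poll two output sources until they are equal or the timeout expires.

    Returns None once both outputs match. On timeout, returns a unified diff
    of the last pair of outputs, so an assertion message shows every
    differing line instead of a truncated summary.
    """
    deadline = time.monotonic() + timeout
    while True:
        out_a, out_b = fetch_a(), fetch_b()
        if out_a == out_b:
            return None
        if time.monotonic() >= deadline:
            return "".join(
                difflib.unified_diff(
                    out_a.splitlines(keepends=True),
                    out_b.splitlines(keepends=True),
                    fromfile=labels[0],
                    tofile=labels[1],
                )
            )
        time.sleep(interval)
```

In the dtest, each fetcher could be a lambda returning the first element of the tuple that {{nodetool('describecluster')}} returns (the test unpacks it as {{out_node1_dc3, err, _}}), followed by {{assert diff is None, diff}}. Polling only helps if the difference is transient; if the nodes genuinely report different information, the diff at least shows which of the hidden lines differ.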



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14610) Flaky dtest: nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters

2018-10-06 Thread Jay Zhuang (JIRA)


[https://issues.apache.org/jira/browse/CASSANDRA-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Jay Zhuang updated CASSANDRA-14610:
---
Reviewer: Jay Zhuang







[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade

2018-10-06 Thread Abdul Patel (JIRA)


[https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640770#comment-16640770]

Abdul Patel commented on CASSANDRA-14495:
-

Also, is nodetool info the best place to check heap usage?
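nodetool info does report heap usage, as a {{used / max}} pair in MB on its {{Heap Memory (MB)}} line (see the output quoted in the issue description). A minimal sketch for turning that line into a percentage, assuming the line keeps that label and format; {{heap_usage}} is a hypothetical helper. Keep in mind that the JVM's "used" heap figure is a point-in-time reading that can include objects not yet collected, so a single high reading is not by itself proof of a leak.

```python
import re
from typing import Optional, Tuple

# Matches the heap line of `nodetool info`, e.g.
# "Heap Memory (MB)   : 16875.64 / 20480.00" (used / maximum, in MB).
# Anchored at line start so "Off Heap Memory (MB)" is not matched.
HEAP_LINE = re.compile(
    r"^\s*Heap Memory \(MB\)\s*:\s*([\d.]+)\s*/\s*([\d.]+)", re.MULTILINE
)


def heap_usage(nodetool_info_output: str) -> Optional[Tuple[float, float, float]]:
    """Return (used_mb, max_mb, percent_used), or None if the line is absent."""
    match = HEAP_LINE.search(nodetool_info_output)
    if match is None:
        return None
    used_mb, max_mb = float(match.group(1)), float(match.group(2))
    return used_mb, max_mb, 100.0 * used_mb / max_mb
```

For the output quoted below (16875.64 / 20480.00) this gives about 82.4%. On a live node the input could come from {{subprocess.run(["nodetool", "info"], capture_output=True, text=True).stdout}}; sampling it periodically shows whether usage keeps climbing after full collections or drops back, which says more than one reading.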

> Memory Leak /High Memory usage post 3.11.2 upgrade
> --
>
> Key: CASSANDRA-14495
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14495
> Project: Cassandra
>  Issue Type: Bug
>  Components: Metrics
>Reporter: Abdul Patel
>Priority: Major
> Attachments: cas_heap.txt
>
>
> Hi All,
>  
> I recently upgraded my non-prod Cassandra cluster (4 nodes, single DC) from 
> 3.10 to 3.11.2.
> No issues were reported apart from nodetool info reporting 80% heap usage.
> I initially had 16GB of memory on each node; later I bumped it up to 20GB and 
> rebooted all nodes.
> After waiting a week, I again see memory usage above 80% (16GB+).
> This means memory is leaking over time.
> Has anyone faced such an issue, or is there a workaround? My 3.11.2 
> upgrade rollout has been halted because of this bug.
> ===
> ID : 65b64f5a-7fe6-4036-94c8-8da9c57718cc
> Gossip active  : true
> Thrift active  : true
> Native Transport active: true
> Load   : 985.24 MiB
> Generation No  : 1526923117
> Uptime (seconds)   : 1097684
> Heap Memory (MB)   : 16875.64 / 20480.00
> Off Heap Memory (MB)   : 20.42
> Data Center    : DC7
> Rack   : rac1
> Exceptions : 0
> Key Cache  : entries 3569, size 421.44 KiB, capacity 100 MiB, 7931933 hits, 8098632 requests, 0.979 recent hit rate, 14400 save period in seconds
> Row Cache  : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
> Counter Cache  : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
> Chunk Cache    : entries 2361, size 147.56 MiB, capacity 3.97 GiB, 2412803 misses, 72594047 requests, 0.967 recent hit rate, NaN microseconds miss latency
> Percent Repaired   : 99.88086234106282%
> Token  : (invoke with -T/--tokens to see all 256 tokens)






[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade

2018-10-06 Thread Abdul Patel (JIRA)


[https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640766#comment-16640766]

Abdul Patel commented on CASSANDRA-14495:
-

How frequently did you run the full GC job?
Via nodetool garbagecollect, right?







[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade

2018-10-06 Thread Ma Dega (JIRA)


[https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640749#comment-16640749]

Ma Dega commented on CASSANDRA-14495:
-

Look for large partitions. They show up during compaction as "Writing large
partition" warnings.

In my case, once I dealt with these, the constant full GCs and OOMs were resolved.
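Following up on the large-partition suggestion: in 3.x, Cassandra logs a "Writing large partition" warning during compaction when a partition exceeds {{compaction_large_partition_warning_threshold_mb}} in cassandra.yaml. A minimal sketch for ranking those warnings from a {{system.log}}, assuming the message has the shape {{Writing large partition <ks>/<table>:<key> (<size>)}} with sizes printed in B/KiB/MiB/GiB/TiB. The exact wording may vary between versions, so check it against your own log first; {{largest_partitions}} is a hypothetical helper.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed shape of the compaction warning (may differ between versions):
# "... Writing large partition <keyspace>/<table>:<key> (<size>) to sstable <file>"
LARGE_PARTITION = re.compile(
    r"Writing large partition (?P<partition>\S+) "
    r"\((?P<size>[\d.]+)\s*(?P<unit>B|KiB|MiB|GiB|TiB)\)"
)
UNIT_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def largest_partitions(log_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (partition, approx_bytes) pairs parsed from large-partition warnings.

    Keeps the largest size seen for each partition and sorts largest first.
    """
    largest: Dict[str, int] = {}
    for line in log_lines:
        match = LARGE_PARTITION.search(line)
        if match is None:
            continue  # not a large-partition warning
        size = int(float(match.group("size")) * UNIT_BYTES[match.group("unit")])
        partition = match.group("partition")
        largest[partition] = max(size, largest.get(partition, 0))
    return sorted(largest.items(), key=lambda item: item[1], reverse=True)
```

Usage: open the node's log (the path depends on your installation) and pass the file object directly, e.g. {{largest_partitions(open(path))[:10]}}, to see the top offenders to remodel or split.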






