[jira] [Commented] (CASSANDRA-13538) Cassandra tasks permanently block after the following assertion occurs during compaction: "java.lang.AssertionError: Interval min > max "
[ https://issues.apache.org/jira/browse/CASSANDRA-13538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501322#comment-16501322 ] Jai Bheemsen Rao Dhanwada commented on CASSANDRA-13538:
---
Noticed a similar issue in one of our environments; does anyone have a workaround?

> Cassandra tasks permanently block after the following assertion occurs during compaction: "java.lang.AssertionError: Interval min > max"
> -
>
> Key: CASSANDRA-13538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13538
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Environment: This happens on a 7-node system with 2 data centers. We're using Cassandra version 2.1.15. I upgraded to 2.1.17 and it still occurs.
> Reporter: Andy Klages
> Priority: Major
> Fix For: 2.1.x
>
> Attachments: cassandra.yaml, jstack.out, schema.cql3, system.log, tpstats.out
>
> We noticed this problem because the commitlogs proliferate to the point that we eventually run out of disk space.
> nodetool tpstats shows several of the tasks backed up:
> {code}
> Pool Name                    Active   Pending   Completed   Blocked  All time blocked
> MutationStage                     0         0   134335315         0                 0
> ReadStage                         0         0   643986790         0                 0
> RequestResponseStage              0         0      114298         0                 0
> ReadRepairStage                   0         0          36         0                 0
> CounterMutationStage              0         0           0         0                 0
> MiscStage                         0         0           0         0                 0
> AntiEntropySessions               1         1       79357         0                 0
> HintedHandoff                     0         0          90         0                 0
> GossipStage                       0         0     6595098         0                 0
> CacheCleanupExecutor              0         0           0         0                 0
> InternalResponseStage             0         0     1638369         0                 0
> CommitLogArchiver                 0         0           0         0                 0
> CompactionExecutor                2       175     2922542         0                 0
> ValidationExecutor                0         0     1465374         0                 0
> MigrationStage                    1      7660           0         0                 0
> AntiEntropyStage                  1       923     8291098         0                 0
> PendingRangeCalculator            0         0          20         0                 0
> Sampler                           0         0           0         0                 0
> MemtableFlushWriter               0         0       53017         0                 0
> MemtablePostFlush                 1      4584     1545141         0                 0
> MemtableReclaimMemory             0         0       70639         0                 0
> Native-Transport-Requests         0         0      352559         0                 0
> {code}
> This all starts after the following exception is raised in Cassandra:
> {code}
> ERROR [MemtableFlushWriter:2437] 2017-05-15 01:53:23,380 CassandraDaemon.java:231 - Exception in thread Thread[MemtableFlushWriter:2437,5,main]
> java.lang.AssertionError: Interval min > max
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:249) ~[apache-cassandra-2.1.15.jar:2.1.15]
>         at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:72) ~[apache-cassandra-2.1.15.jar:2.1.15]
>         at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:603) ~[apache-cassandra-2.1.15.jar:2.1.15]
>         at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:597) ~[apache-cassandra-2.1.15.jar:2.1.15]
>         at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:578) ~[apache-cassandra-2.1.15.jar:2.1.15]
>         at org.apache.cassandra.db.DataTracker$View.replaceFlushed(DataTracker.java:740) ~[apache-cassandra-2.1.15.jar:2.1.15]
>         at org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:172) ~[apache-cassandra-2.1.15.jar:2.1.15]
>         at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234) ~[apache-cassandra-2.1.15.jar:2.1.15]
>         at org.apache.cassandra.db.ColumnFamilyStore.r
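For context, the assertion fires when an interval is built with inverted bounds, i.e. an sstable whose recorded first key sorts after its last key, so the interval tree over sstable ranges cannot be constructed during the flush. A toy illustration of the invariant (hypothetical names, not Cassandra's actual IntervalTree code):

```java
// Toy illustration of the invariant behind "Interval min > max": an interval
// node requires min <= max, and inverted bounds fail fast instead of
// producing a malformed tree.
public class IntervalSketch {
    final int min, max;

    IntervalSketch(int min, int max) {
        if (min > max)
            throw new AssertionError("Interval min > max");
        this.min = min;
        this.max = max;
    }

    // Helper that reports either success or the assertion message.
    public static String tryBuild(int min, int max) {
        try {
            new IntervalSketch(min, max);
            return "ok";
        } catch (AssertionError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryBuild(1, 5)); // prints ok
        System.out.println(tryBuild(9, 2)); // prints Interval min > max
    }
}
```

Once one flush hits this error, the post-flush stages behind it back up, which matches the blocked MemtablePostFlush queue in the tpstats output above.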
[jira] [Updated] (CASSANDRA-14497) Add Role login cache
[ https://issues.apache.org/jira/browse/CASSANDRA-14497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14497:
---
Description: The [{{ClientState.login()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L313] function is used for all auth messages: [{{AuthResponse.java:82}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/AuthResponse.java#L82]. But the [{{role.canLogin}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L521] information is not cached, so it hits the database every time: [{{CassandraRoleManager.java:407}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L407]. For a cluster with lots of new connections, this causes performance issues. Our mitigation was to increase the {{system_auth}} replication factor to match the number of nodes, so [{{local_one}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L488] reads would be very cheap. The P99 dropped immediately, but I don't think it is a good solution. I propose adding {{Role.canLogin}} to the RolesCache to improve auth performance.

was: The [{{ClientState.login()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L313] function is used for all auth message: [{{AuthResponse.java:82}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/AuthResponse.java#L82]. But the [{{role.canLogin}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L521] information is not cached. So it hits database everytime: [{{CassandraRoleManager.java:407}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L407]. For a cluster with lots of new connections, it's causing performance issue. The mitigation for us is to increase the {{system_auth}} replication factor to match the number of nodes, so [{{local_one}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L488] would be very cheap. The P99 dropped immediately, but I don't think it is not a good solution. I would purpose to add {{Role.canLogin}} to the RolesCache to improve the auth performance.

> Add Role login cache
> -
>
> Key: CASSANDRA-14497
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14497
> Project: Cassandra
> Issue Type: Improvement
> Components: Auth
> Reporter: Jay Zhuang
> Priority: Major
>
> The [{{ClientState.login()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L313] function is used for all auth messages: [{{AuthResponse.java:82}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/AuthResponse.java#L82]. But the [{{role.canLogin}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L521] information is not cached, so it hits the database every time: [{{CassandraRoleManager.java:407}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L407]. For a cluster with lots of new connections, this causes performance issues. Our mitigation was to increase the {{system_auth}} replication factor to match the number of nodes, so [{{local_one}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L488] reads would be very cheap. The P99 dropped immediately, but I don't think it is a good solution.
> I propose adding {{Role.canLogin}} to the RolesCache to improve auth performance.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
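The caching proposal above can be sketched as a memoizing lookup in front of the role-table read. The names below are hypothetical stand-ins (not Cassandra's actual {{RolesCache}} API); the sketch only illustrates why caching {{canLogin}} turns a read-per-login into a read-per-role:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: memoize canLogin per role so repeated logins do not
// query the system_auth tables every time.
public class CanLoginCacheSketch {
    public static final AtomicInteger dbHits = new AtomicInteger();
    static final Map<String, Boolean> cache = new ConcurrentHashMap<>();

    // Stand-in for CassandraRoleManager's read of the roles table.
    static boolean fetchCanLoginFromDb(String role) {
        dbHits.incrementAndGet();
        return !role.equals("service_account"); // arbitrary demo data
    }

    public static boolean canLogin(String role) {
        // computeIfAbsent invokes the loader at most once per absent key
        return cache.computeIfAbsent(role, CanLoginCacheSketch::fetchCanLoginFromDb);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++)
            canLogin("app_user");         // 1000 logins for one role...
        System.out.println(dbHits.get()); // ...one database read: prints 1
    }
}
```

With a cache like this in place, a burst of new connections for the same role costs a single {{system_auth}} read instead of one read per login, which is the behavior the ticket asks for.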
[jira] [Updated] (CASSANDRA-14497) Add Role login cache
[ https://issues.apache.org/jira/browse/CASSANDRA-14497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14497:
---
Description: The [{{ClientState.login()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L313] function is used for all auth messages: [{{AuthResponse.java:82}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/AuthResponse.java#L82]. But the [{{role.canLogin}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L521] information is not cached, so it hits the database every time: [{{CassandraRoleManager.java:407}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L407]. For a cluster with lots of new connections, this causes performance issues. The mitigation for us is to increase the {{system_auth}} replication factor to match the number of nodes, so [{{local_one}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L488] would be very cheap, which is a good solution. I propose adding {{Role.canLogin}} to the RolesCache to improve auth performance.

was: The

Summary: Add Role login cache (was: Add ole login cache)

> Add Role login cache
> -
>
> Key: CASSANDRA-14497
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14497
> Project: Cassandra
> Issue Type: Improvement
> Components: Auth
> Reporter: Jay Zhuang
> Priority: Major
>
> The [{{ClientState.login()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L313] function is used for all auth messages: [{{AuthResponse.java:82}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/AuthResponse.java#L82]. But the [{{role.canLogin}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L521] information is not cached, so it hits the database every time: [{{CassandraRoleManager.java:407}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L407]. For a cluster with lots of new connections, this causes performance issues. The mitigation for us is to increase the {{system_auth}} replication factor to match the number of nodes, so [{{local_one}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L488] would be very cheap, which is a good solution.
> I propose adding {{Role.canLogin}} to the RolesCache to improve auth performance.
[jira] [Created] (CASSANDRA-14497) Add ole login cache
Jay Zhuang created CASSANDRA-14497:
--
Summary: Add ole login cache
Key: CASSANDRA-14497
URL: https://issues.apache.org/jira/browse/CASSANDRA-14497
Project: Cassandra
Issue Type: Improvement
Components: Auth
Reporter: Jay Zhuang

The
[jira] [Commented] (CASSANDRA-14451) Infinity ms Commit Log Sync
[ https://issues.apache.org/jira/browse/CASSANDRA-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501213#comment-16501213 ] Jordan West commented on CASSANDRA-14451:
---
[~jasobrown] change LGTM. A few questions and minor comments:
* Are the ArchiveCommitLog dtest failures expected on the 3.0 branch?
* The "sleep any time we have left" comment would be more appropriate above the assignment of {{wakeUpAt}}.
* Mark {{maybeLogFlushLag}} and {{getTotalSyncDuration}} as {{@VisibleForTesting}}.
* Just wanted to check that the change in behavior of updating {{totalSyncDuration}} is intentional. It makes sense to me that we only increment it if a sync actually occurs, but that wasn't the case before.
* Is there a reason you opted for the "excessTimeToFlush" approach in 3.0 but the "maxFlushTimestamp" approach on 3.11 and trunk? The only difference I see is the unit of time.

> Infinity ms Commit Log Sync
> ---
>
> Key: CASSANDRA-14451
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14451
> Project: Cassandra
> Issue Type: Bug
> Environment: 3.11.2 - 2 DC
> Reporter: Harry Hough
> Assignee: Jason Brown
> Priority: Minor
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
> It's giving commit log sync warnings where there were apparently zero syncs and therefore reports "Infinityms" as the average duration:
> {code:java}
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:11:14,294 NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with average duration of Infinityms, 1 have exceeded the configured commit interval by an average of 74.40ms
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:16:57,844 NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with average duration of Infinityms, 1 have exceeded the configured commit interval by an average of 198.69ms
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:24:46,325 NoSpamLogger.java:94 - Out of 0 commit log syncs over the past 0.00s with average duration of Infinityms, 1 have exceeded the configured commit interval by an average of 264.11ms
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-16 21:29:46,393 NoSpamLogger.java:94 - Out of 32 commit log syncs over the past 268.84s with average duration of 17.56ms, 1 have exceeded the configured commit interval by an average of 173.66ms
> {code}
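For background, the "Infinityms" figure is a floating-point artifact: the average is computed as a double division by the sync count, and a positive total divided by a zero count yields Infinity rather than throwing. A toy sketch of the symptom and the obvious guard (not the actual {{AbstractCommitLogService}} code):

```java
// Sketch of why the warning prints "Infinityms": averaging over zero syncs
// divides by zero in double arithmetic, which silently yields Infinity.
public class SyncReportSketch {
    // Mirrors the shape of the warning's "average duration" computation.
    public static String averageDuration(double totalSyncMillis, int syncCount) {
        double avg = totalSyncMillis / syncCount; // positive / 0 -> Infinity
        return avg + "ms";
    }

    // Guarded variant: only report an average when a sync actually occurred.
    public static String guardedAverage(double totalSyncMillis, int syncCount) {
        return syncCount > 0 ? (totalSyncMillis / syncCount) + "ms" : "n/a";
    }

    public static void main(String[] args) {
        System.out.println(averageDuration(74.4, 0)); // prints Infinityms
        System.out.println(guardedAverage(74.4, 0));  // prints n/a
    }
}
```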
[jira] [Updated] (CASSANDRA-14451) Infinity ms Commit Log Sync
[ https://issues.apache.org/jira/browse/CASSANDRA-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan West updated CASSANDRA-14451:
---
Reviewer: Jordan West

> Infinity ms Commit Log Sync
> ---
>
> Key: CASSANDRA-14451
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14451
> Project: Cassandra
> Issue Type: Bug
> Environment: 3.11.2 - 2 DC
> Reporter: Harry Hough
> Assignee: Jason Brown
> Priority: Minor
> Fix For: 3.0.x, 3.11.x, 4.0.x
[jira] [Updated] (CASSANDRA-14496) TWCS erroneously disabling tombstone compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Tarrall updated CASSANDRA-14496:
---
Priority: Minor (was: Major)

> TWCS erroneously disabling tombstone compactions
> -
>
> Key: CASSANDRA-14496
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14496
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Reporter: Robert Tarrall
> Priority: Minor
>
> This code:
> {code:java}
> this.options = new TimeWindowCompactionStrategyOptions(options);
> if (!options.containsKey(AbstractCompactionStrategy.TOMBSTONE_COMPACTION_INTERVAL_OPTION) && !options.containsKey(AbstractCompactionStrategy.TOMBSTONE_THRESHOLD_OPTION))
> {
>     disableTombstoneCompactions = true;
>     logger.debug("Disabling tombstone compactions for TWCS");
> }
> else
>     logger.debug("Enabling tombstone compactions for TWCS");
> }
> {code}
> ... in TimeWindowCompactionStrategy.java disables tombstone compactions in TWCS if you have not *explicitly* set either tombstone_compaction_interval or tombstone_threshold. Adding 'tombstone_compaction_interval': '86400' to the compaction stanza in a table definition has the (to me unexpected) side effect of enabling tombstone compactions.
> This is surprising and does not appear to be mentioned in the docs.
> I would suggest that tombstone compactions should be run unless these options are both set to 0.
> If the concern is that (as with DTCS in CASSANDRA-9234) we don't want to waste time on tombstone compactions when we expect the tables to eventually be expired away, perhaps we should also check unchecked_tombstone_compaction and still enable tombstone compactions if that's set to true.
> May also make sense to set defaults for interval & threshold to 0 & disable if they're nonzero so that setting non-default values, rather than setting ANY value, is what determines whether tombstone compactions are enabled?
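The behavior suggested in the report (run tombstone compactions unless both options are explicitly set to 0, and always honor unchecked_tombstone_compaction) could look roughly like the following. This is a hypothetical sketch of the option handling, not the shipped fix:

```java
import java.util.Map;

// Hypothetical sketch of the proposed option handling: tombstone compactions
// stay enabled unless both options are explicitly zeroed, and
// unchecked_tombstone_compaction=true always keeps them enabled.
public class TwcsOptionSketch {
    public static boolean disableTombstoneCompactions(Map<String, String> options) {
        if (Boolean.parseBoolean(options.getOrDefault("unchecked_tombstone_compaction", "false")))
            return false; // operator asked for tombstone compactions regardless
        return "0".equals(options.get("tombstone_compaction_interval"))
            && "0".equals(options.get("tombstone_threshold"));
    }

    public static void main(String[] args) {
        // Enabled by default, even when neither option is set:
        System.out.println(disableTombstoneCompactions(Map.of())); // prints false
        // Disabled only when both are explicitly zeroed:
        System.out.println(disableTombstoneCompactions(
            Map.of("tombstone_compaction_interval", "0",
                   "tombstone_threshold", "0")));                  // prints true
    }
}
```

Compared with the current check, setting a single option such as tombstone_compaction_interval no longer flips the behavior; only an explicit opt-out does.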
[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501081#comment-16501081 ] Joseph Lynch commented on CASSANDRA-14346: -- Just to give an update here, I'm actively working on porting our Priam specific (guice, netflix metrics/logging/monitoring, etc) sidecar to a generic java sidecar that works with Cassandra trunk per option B in the [design doc|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit#heading=h.5f10ng8gzle8]. I'll hopefully hit the JMX tickets as I need them in the port. > Scheduled Repair in Cassandra > - > > Key: CASSANDRA-14346 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14346 > Project: Cassandra > Issue Type: Improvement > Components: Repair >Reporter: Joseph Lynch >Priority: Major > Labels: CommunityFeedbackRequested > Fix For: 4.0 > > Attachments: ScheduledRepairV1_20180327.pdf > > > There have been many attempts to automate repair in Cassandra, which makes > sense given that it is necessary to give our users eventual consistency. Most > recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked > for ways to solve this problem. > At Netflix we've built a scheduled repair service within Priam (our sidecar), > which we spoke about last year at NGCC. Given the positive feedback at NGCC > we focussed on getting it production ready and have now been using it in > production to repair hundreds of clusters, tens of thousands of nodes, and > petabytes of data for the past six months. Also based on feedback at NGCC we > have invested effort in figuring out how to integrate this natively into > Cassandra rather than open sourcing it as an external service (e.g. in Priam). 
> As such, [~vinaykumarcse] and I would like to re-work and merge our implementation into Cassandra, and have created a [design document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing] showing how we plan to make it happen, including the user interface. As we work on the code migration from Priam to Cassandra, any feedback would be greatly appreciated about the interface or v1 implementation features. I have tried to call out in the document the features which we explicitly consider future work (as well as a path forward to implement them in the future), because I would very much like to get this done before the 4.0 merge window closes, and to do that I think aggressively pruning scope is going to be a necessity.
[jira] [Created] (CASSANDRA-14496) TWCS erroneously disabling tombstone compactions
Robert Tarrall created CASSANDRA-14496: -- Summary: TWCS erroneously disabling tombstone compactions Key: CASSANDRA-14496 URL: https://issues.apache.org/jira/browse/CASSANDRA-14496 Project: Cassandra Issue Type: Bug Components: Compaction Reporter: Robert Tarrall

This code in TimeWindowCompactionStrategy.java:
{code:java}
this.options = new TimeWindowCompactionStrategyOptions(options);
if (!options.containsKey(AbstractCompactionStrategy.TOMBSTONE_COMPACTION_INTERVAL_OPTION) && !options.containsKey(AbstractCompactionStrategy.TOMBSTONE_THRESHOLD_OPTION))
{
    disableTombstoneCompactions = true;
    logger.debug("Disabling tombstone compactions for TWCS");
}
else
    logger.debug("Enabling tombstone compactions for TWCS");
}
{code}
... disables tombstone compactions in TWCS if you have not *explicitly* set either tombstone_compaction_interval or tombstone_threshold. Adding 'tombstone_compaction_interval': '86400' to the compaction stanza in a table definition has the (to me unexpected) side effect of enabling tombstone compactions. This is surprising and does not appear to be mentioned in the docs.

I would suggest that tombstone compactions should be run unless these options are both set to 0. If the concern is that (as with DTCS in CASSANDRA-9234) we don't want to waste time on tombstone compactions when we expect the tables to eventually be expired away, perhaps we should also check unchecked_tombstone_compaction and still enable tombstone compactions if that's set to true. It may also make sense to set the defaults for interval & threshold to 0 and disable tombstone compactions only when they are left at zero, so that setting non-default values, rather than setting ANY value, is what determines whether tombstone compactions are enabled.
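The check described above can be modeled in a few lines (a hypothetical Python sketch of the quoted logic, not Cassandra's actual implementation; the two option names are the real table-level compaction options):

```python
# Model of the quoted TWCS constructor check: tombstone compactions stay
# disabled unless the user *explicitly* set either option, so adding just
# one of them to the compaction stanza re-enables tombstone compactions.
TOMBSTONE_COMPACTION_INTERVAL = "tombstone_compaction_interval"
TOMBSTONE_THRESHOLD = "tombstone_threshold"

def tombstone_compactions_disabled(options):
    """Mirror of the two containsKey() checks in the quoted Java."""
    return (TOMBSTONE_COMPACTION_INTERVAL not in options
            and TOMBSTONE_THRESHOLD not in options)

# No explicit option: tombstone compactions disabled.
assert tombstone_compactions_disabled({})
# Setting the interval, even to a plausible value like one day,
# flips tombstone compactions on as a side effect.
assert not tombstone_compactions_disabled({TOMBSTONE_COMPACTION_INTERVAL: "86400"})
```

This makes the surprise concrete: presence of the key, not its value, is what the quoted code tests.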
[jira] [Commented] (CASSANDRA-14466) Enable Direct I/O
[ https://issues.apache.org/jira/browse/CASSANDRA-14466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500942#comment-16500942 ] Ariel Weisberg commented on CASSANDRA-14466: We have thought about doing that, but it's tricky to replace buffered I/O and manage read-ahead, and when I looked at it there wasn't a huge benefit because the page cache is scan resistant. > Enable Direct I/O > -- > > Key: CASSANDRA-14466 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14466 > Project: Cassandra > Issue Type: New Feature > Components: Local Write-Read Paths >Reporter: Mulugeta Mammo >Priority: Major > Attachments: direct_io.patch > > > Hi, > JDK 10 introduced a new API for Direct I/O that enables applications to bypass the file system cache and potentially improve performance. Details of this feature can be found at [https://bugs.openjdk.java.net/browse/JDK-8164900]. > This patch uses the JDK 10 API to enable Direct I/O for the Cassandra read path. By default, we have disabled this feature; but it can be enabled using a new configuration parameter, enable_direct_io_for_read_path. We have conducted a Cassandra read-only stress test and measured a throughput gain of up to 60% on flash drives. > The patch requires JDK 10 Cassandra Support - https://issues.apache.org/jira/browse/CASSANDRA-9608 > Please review the patch and let us know your feedback. > Thanks, > [^direct_io.patch]
[jira] [Comment Edited] (CASSANDRA-14466) Enable Direct I/O
[ https://issues.apache.org/jira/browse/CASSANDRA-14466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500942#comment-16500942 ] Ariel Weisberg edited comment on CASSANDRA-14466 at 6/4/18 10:16 PM: - We have thought about doing that, but it's tricky to replace buffered I/O and manage read-ahead, and when I looked at it there wasn't a huge benefit because the page cache is scan resistant. Read-ahead matters a LOT less with flash, especially in the compressed case, where we read 64k at a time, which is not terrible. was (Author: aweisberg): We have thought about doing that, but it's tricky to replace buffered IO, manage read ahead, and when I looked at it there wasn't a huge benefit because the page cache is scan resistant. > Enable Direct I/O
[jira] [Commented] (CASSANDRA-14466) Enable Direct I/O
[ https://issues.apache.org/jira/browse/CASSANDRA-14466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500910#comment-16500910 ] Jon Haddad commented on CASSANDRA-14466: One thing I realized over the weekend: at the very least, we should be using direct I/O for compaction. There's no benefit to pulling a bunch of data into the cache that we're about to delete. > Enable Direct I/O
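The cache-pollution point can be illustrated outside Cassandra. A minimal sketch (assuming Linux; this is not Cassandra's code, and true direct I/O would open with O_DIRECT and aligned buffers): after a sequential compaction-style read, posix_fadvise(DONTNEED) asks the kernel to evict the file's cached pages, approximating what direct I/O would have avoided caching in the first place.

```python
import os

def read_then_drop_cache(path):
    """Read a file sequentially, then advise the kernel to evict its pages.

    Sketch only: real direct I/O (O_DIRECT) bypasses the page cache
    entirely; fadvise is the simpler after-the-fact hint.
    """
    fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        while True:
            chunk = os.read(fd, 1 << 16)  # 64 KiB reads, as in the compressed case
            if not chunk:
                break
            total += len(chunk)
        # offset=0, length=0 means "the whole file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```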
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500633#comment-16500633 ] Chris Lohfink commented on CASSANDRA-14495: --- If you don't have long GC pauses or OOM exceptions, I wouldn't start trying to tune the GC settings if I were you. You're solving a problem you don't have. > Memory Leak /High Memory usage post 3.11.2 upgrade > -- > > Key: CASSANDRA-14495 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14495 > Project: Cassandra > Issue Type: Bug > Components: Metrics >Reporter: Abdul Patel >Priority: Major > Attachments: cas_heap.txt > > > Hi All, > > I recently upgraded my non-prod Cassandra cluster (4 nodes, single DC) from 3.10 to 3.11.2. > No issues reported, apart from nodetool info reporting 80% usage. > I initially had 16GB memory on each node; later I bumped it up to 20GB and rebooted all nodes. > Waited for a week, and now again I have seen memory usage of more than 80%, 16GB+. > This means some memory leaks are happening over time. > Has anyone faced such an issue, or do we have any workaround? My 3.11.2 upgrade rollout has been halted because of this bug.
> ===
> ID                     : 65b64f5a-7fe6-4036-94c8-8da9c57718cc
> Gossip active          : true
> Thrift active          : true
> Native Transport active: true
> Load                   : 985.24 MiB
> Generation No          : 1526923117
> Uptime (seconds)       : 1097684
> Heap Memory (MB)       : 16875.64 / 20480.00
> Off Heap Memory (MB)   : 20.42
> Data Center            : DC7
> Rack                   : rac1
> Exceptions             : 0
> Key Cache              : entries 3569, size 421.44 KiB, capacity 100 MiB, 7931933 hits, 8098632 requests, 0.979 recent hit rate, 14400 save period in seconds
> Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
> Counter Cache          : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
> Chunk Cache            : entries 2361, size 147.56 MiB, capacity 3.97 GiB, 2412803 misses, 72594047 requests, 0.967 recent hit rate, NaN microseconds miss latency
> Percent Repaired       : 99.88086234106282%
> Token                  : (invoke with -T/--tokens to see all 256 tokens)
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500541#comment-16500541 ] Abdul Patel commented on CASSANDRA-14495: - It says the below for G1GC:
## Optional G1 Settings
# Save CPU time on large (>= 16GB) heaps by delaying region scanning
# until the heap is 70% full. The default in Hotspot 8u40 is 40%.
#-XX:InitiatingHeapOccupancyPercent=70
> Memory Leak /High Memory usage post 3.11.2 upgrade
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500440#comment-16500440 ] Chris Lohfink commented on CASSANDRA-14495: --- Possibly more CPU than necessary, in exchange for keeping the heap emptier. Honestly I don't think you need to do anything. I personally like that value lower just to reduce the impact of fragmentation of the old space when using CMS; if you're using G1 it has a completely different meaning. > Memory Leak /High Memory usage post 3.11.2 upgrade
[jira] [Commented] (CASSANDRA-14457) Add a virtual table with current compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500434#comment-16500434 ] Chris Lohfink commented on CASSANDRA-14457: --- minor note: renamed compaction_id to task_id since we're not calling them compactions anymore > Add a virtual table with current compactions > > > Key: CASSANDRA-14457 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14457 > Project: Cassandra > Issue Type: New Feature >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Minor > Fix For: 4.x >
[jira] [Commented] (CASSANDRA-14457) Add a virtual table with current compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500429#comment-16500429 ] Chris Lohfink commented on CASSANDRA-14457: --- Latest looks something like:
{code}
cqlsh> desc table system_views.sstable_tasks;

CREATE TABLE system_views.sstable_tasks (
    compaction_id uuid,
    keyspace_name text,
    kind text,
    progress bigint,
    table_name text,
    total bigint,
    unit text,
    PRIMARY KEY (keyspace_name, table_name, compaction_id)
) WITH CLUSTERING ORDER BY (table_name ASC, compaction_id ASC)
    AND compaction = {'class': 'None'}
    AND compression = {};

cqlsh> SELECT * FROM system_views.sstable_tasks;

 keyspace_name | table_name | compaction_id                        | kind       | progress | total    | unit
---------------+------------+--------------------------------------+------------+----------+----------+-------
         basic | wide3      | 066ba210-6811-11e8-ade8-f5df16641a9d | compaction |  2266347 | 33091208 | bytes

(1 rows)
{code}
> Add a virtual table with current compactions
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500403#comment-16500403 ] Abdul Patel commented on CASSANDRA-14495: - Got it. You are recommending this parameter be set at 55? What would be the downside of that? -XX:InitiatingHeapOccupancyPercent=70 > Memory Leak /High Memory usage post 3.11.2 upgrade
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500398#comment-16500398 ] Chris Lohfink commented on CASSANDRA-14495: --- So what exactly is the problem? You're concerned because in the past the heap usage was lower and now it's higher, but it actually causes no issues? For what it's worth, the way the JVM works, it's expected behavior for this to slowly creep up over time. > Memory Leak /High Memory usage post 3.11.2 upgrade
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500389#comment-16500389 ] Abdul Patel commented on CASSANDRA-14495: - I am more concerned about why this sudden behaviour appears in this version only. Could you please share inputs on how I can "decrease initiating occupancy (55% I'd recommend)"? We already have the G1GC setup; is there anything else that can be done? > Memory Leak /High Memory usage post 3.11.2 upgrade
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500378#comment-16500378 ] Chris Lohfink commented on CASSANDRA-14495: --- Do you have GC thrashing and OOM exceptions, or are you just worried about high utilization? With a 75% (default) initiating occupancy and probably a few GB of young gen, you would perfectly well expect the heap memory to reach over 80 or even 90%. That's not an issue, but perfectly expected and functioning behavior. If it really concerns you (it shouldn't), you can decrease the initiating occupancy (55% I'd recommend) and kick off old-gen cleanup earlier. With G1 you can increase the reserve space to something like 15% to keep it under 85%. > Memory Leak /High Memory usage post 3.11.2 upgrade
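For reference, the two knobs mentioned in that comment would look roughly like this in Cassandra's jvm.options / cassandra-env.sh (a sketch using standard HotSpot flag names; the 55% and 15% values are the suggestions from the comment, not defaults):

{code}
# CMS: start old-gen collection when the old gen is 55% full (default 75)
-XX:CMSInitiatingOccupancyFraction=55
-XX:+UseCMSInitiatingOccupancyOnly

# G1: reserve 15% of the heap as free space (default 10),
# which keeps reported usage below roughly 85%
-XX:G1ReservePercent=15
{code}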
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500366#comment-16500366 ] Abdul Patel commented on CASSANDRA-14495: - Attached the output; out of 20 GB, 14 GB is used on one of the nodes for now. > Memory Leak /High Memory usage post 3.11.2 upgrade
[jira] [Updated] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdul Patel updated CASSANDRA-14495: Attachment: cas_heap.txt > Memory Leak /High Memory usage post 3.11.2 upgrade
[jira] [Comment Edited] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500321#comment-16500321 ]

Chris Lohfink edited comment on CASSANDRA-14495 at 6/4/18 2:49 PM:
-------------------------------------------------------------------
Can you include a heap histogram ({{jmap -histo CASSANDRA_PID}}) to see what's in the heap?

was (Author: cnlwsu):
Can you include heap histogram ({{jmap -histo CASSANDRA_PID}}) to see whats in the heap?}}
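[Editor's note] {{jmap -histo}} gives a per-class histogram from outside the process. As a hedged aside (not something requested in the ticket), overall heap occupancy can also be checked in-process via the standard {{java.lang.management}} API; this complements, but does not replace, the histogram Chris asks for:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// In-process heap check via the standard MemoryMXBean. This shows overall
// occupancy only; a per-class breakdown still requires `jmap -histo <pid>`
// or an equivalent heap-dump analysis.
public class HeapSnapshot {
    static double usedFraction(MemoryUsage heap) {
        // Note: getMax() can be -1 when undefined; for the heap it is
        // normally the configured -Xmx value.
        return (double) heap.getUsed() / heap.getMax();
    }

    public static void main(String[] args) {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        System.out.printf("heap used: %d MB of %d MB (%.1f%%)%n",
                          heap.getUsed() >> 20, heap.getMax() >> 20,
                          100 * usedFraction(heap));
    }
}
```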
[jira] [Comment Edited] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500321#comment-16500321 ]

Chris Lohfink edited comment on CASSANDRA-14495 at 6/4/18 2:48 PM:
-------------------------------------------------------------------
Can you include heap histogram ({{jmap -histo CASSANDRA_PID}}) to see whats in the heap?}}

was (Author: cnlwsu):
Can you include heap histogram (jmap -histo CASSANDRA_PID}}) to see whats in the heap?}}
[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500321#comment-16500321 ]

Chris Lohfink commented on CASSANDRA-14495:
-------------------------------------------
Can you include heap histogram (jmap -histo CASSANDRA_PID}}) to see whats in the heap?}}
[jira] [Commented] (CASSANDRA-14355) Memory leak
[ https://issues.apache.org/jira/browse/CASSANDRA-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500249#comment-16500249 ]

Abdul Patel commented on CASSANDRA-14355:
-----------------------------------------
If the 3.11.2 release has memory leak issues, what's the next best patch version?

> Memory leak
> -----------
>
>                 Key: CASSANDRA-14355
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14355
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Debian Jessie, OpenJDK 1.8.0_151
>            Reporter: Eric Evans
>            Priority: Major
>             Fix For: 3.11.3
>
>         Attachments: 01_Screenshot from 2018-04-04 14-24-00.png, 02_Screenshot from 2018-04-04 14-28-33.png, 03_Screenshot from 2018-04-04 14-24-50.png
>
> We're seeing regular, frequent {{OutOfMemoryError}} exceptions. Similar to CASSANDRA-13754, an analysis of the heap dumps shows the heap consumed by the {{threadLocals}} member of the instances of {{io.netty.util.concurrent.FastThreadLocalThread}}.
[jira] [Created] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade
Abdul Patel created CASSANDRA-14495:
---------------------------------------
             Summary: Memory Leak /High Memory usage post 3.11.2 upgrade
                 Key: CASSANDRA-14495
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14495
             Project: Cassandra
          Issue Type: Bug
          Components: Metrics
            Reporter: Abdul Patel

Hi All,

I recently upgraded my non-prod cassandra cluster (4 nodes, single DC) from 3.10 to 3.11.2.
No issues were reported, apart from nodetool info showing 80% memory usage.
I initially had 16GB memory on each node; later I bumped it up to 20GB and rebooted all nodes.
After waiting a week, I have again seen memory usage of more than 80% (16GB+).
This means some memory is leaking over time.
Has anyone faced this issue, or is there any workaround? My 3.11.2 upgrade rollout has been halted because of this bug.

===
ID                     : 65b64f5a-7fe6-4036-94c8-8da9c57718cc
Gossip active          : true
Thrift active          : true
Native Transport active: true
Load                   : 985.24 MiB
Generation No          : 1526923117
Uptime (seconds)       : 1097684
Heap Memory (MB)       : 16875.64 / 20480.00
Off Heap Memory (MB)   : 20.42
Data Center            : DC7
Rack                   : rac1
Exceptions             : 0
Key Cache              : entries 3569, size 421.44 KiB, capacity 100 MiB, 7931933 hits, 8098632 requests, 0.979 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache            : entries 2361, size 147.56 MiB, capacity 3.97 GiB, 2412803 misses, 72594047 requests, 0.967 recent hit rate, NaN microseconds miss latency
Percent Repaired       : 99.88086234106282%
Token                  : (invoke with -T/--tokens to see all 256 tokens)
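[Editor's note] The ">80% usage" figure in the report is consistent with the nodetool output above; the percentage follows directly from the reported heap numbers:

```java
// Sanity check on the heap figures from the nodetool info output above.
// The used/max values come straight from the report; only the percentage
// is derived here.
public class HeapUsageCheck {
    static double usedPercent(double usedMb, double maxMb) {
        return 100.0 * usedMb / maxMb;
    }

    public static void main(String[] args) {
        double used = 16875.64;  // "Heap Memory (MB) : 16875.64 / 20480.00"
        double max = 20480.00;
        System.out.printf("heap usage: %.1f%%%n", usedPercent(used, max));
        // 16875.64 / 20480.00 ≈ 82.4%, matching the reported ">80% usage"
    }
}
```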
[jira] [Commented] (CASSANDRA-14457) Add a virtual table with current compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500102#comment-16500102 ]

Sam Tunnicliffe commented on CASSANDRA-14457:
---------------------------------------------
Nice, this looks really good. [~cnlwsu] if you don't mind me chipping in, I have a couple of comments on naming.

Could we avoid the requirement to quote some column names by adopting the pattern used in system tables? {{keyspace_name/table_name}} would remove the need to quote and be consistent with existing tables.

Using {{undefined}} for tasks without a specific keyspace/table won't really fly, as that's a perfectly valid identifier. For tasks without a specific target, how about using values which are totally illegal for actual keyspace/table names, like {{"all keyspaces"}} and {{"all tables"}}?

I understand that the naming of the non-primary-key columns is designed to make the layout in cqlsh user friendly, but I think we could achieve the same thing with more meaningful naming ({{progess_total}}, for example, is a bit unintuitive). OTOMH, a set of column names like {{kind, progress, size, unit}} would produce a usable display and communicate what the columns represent a bit better.

Lastly, I'm a bit bothered by this being named the {{compactions}} table and described as a "{{List of current compactions}}" when a whole bunch of operations are going to show up in this table that have little to do with compaction, except for the fact that they may be utilising some resources from the compaction pool. I'm not suggesting that they shouldn't be included here; they definitely should be if this table is to be used in the same way as nodetool compactionstats is currently. But seeing as this is both user facing and something of a blank canvas, I think we should try to get this sort of thing as right as we can up front.
My vote would be to rename this to something like {{sstable_tasks}} and open a separate jira to create an actual {{compactions}} table which would "contain" a strict subset of the data exposed here.

> Add a virtual table with current compactions
> --------------------------------------------
>
>                 Key: CASSANDRA-14457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14457
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Chris Lohfink
>            Assignee: Chris Lohfink
>            Priority: Minor
>             Fix For: 4.x
>
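[Editor's note] Sam's sentinel argument can be made concrete: Cassandra restricts keyspace and table names to alphanumerics and underscores (at most 48 characters; quoting only preserves case), so a sentinel containing a space can never collide with a real name, whereas {{undefined}} can. A minimal sketch, assuming that documented naming rule (the regex below mirrors the rule, not Cassandra's actual validation code):

```java
// Hedged sketch: keyspace/table names are limited to [A-Za-z0-9_]{1,48},
// so a sentinel containing a space ("all keyspaces") cannot collide with
// any real name, while a plain word ("undefined") can.
public class SentinelNameCheck {
    static boolean isValidTableName(String name) {
        // Mirrors the documented naming rule, not Cassandra's internal validator.
        return name.matches("\\w{1,48}");
    }

    public static void main(String[] args) {
        System.out.println(isValidTableName("undefined"));      // valid name -> can collide
        System.out.println(isValidTableName("all keyspaces"));  // illegal name -> safe sentinel
    }
}
```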
[jira] [Commented] (CASSANDRA-14457) Add a virtual table with current compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500052#comment-16500052 ]

Aleksey Yeschenko commented on CASSANDRA-14457:
-----------------------------------------------
Just noticed that we are doing some unnecessary conversions, followed by reversals, here. It isn't a big deal, but it feels a little silly: {{CompactionManager.getCompactions()}} converts {{CompactionInfo}} objects to maps of strings to strings, then we convert those strings back to proper types in {{CompactionsTable}}.

It would be better if {{CompactionManager}} had a method returning a collection, or just an iterable, of {{CompactionInfo}} objects that we could work with directly, without a redundant ser-deser cycle.
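[Editor's note] The round-trip Aleksey describes can be sketched with a deliberately simplified, hypothetical model (the names echo the real {{CompactionInfo}}/{{CompactionManager}} classes, but this is not the actual Cassandra API):

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the redundant ser-deser cycle: typed
// values are flattened to strings, then immediately parsed back.
public class CompactionInfoExample {
    record CompactionInfo(String keyspace, String table, long completed, long total) {}

    // Current shape: values flattened to a Map<String, String>...
    static Map<String, String> asStringMap(CompactionInfo info) {
        return Map.of("keyspace", info.keyspace(), "table", info.table(),
                      "completed", Long.toString(info.completed()),
                      "total", Long.toString(info.total()));
    }

    // ...which the consumer must immediately parse back to proper types.
    static long completedFrom(Map<String, String> row) {
        return Long.parseLong(row.get("completed")); // redundant round-trip
    }

    // Preferred shape: hand the typed objects to the consumer directly.
    static Iterable<CompactionInfo> activeCompactions(List<CompactionInfo> active) {
        return active;
    }

    public static void main(String[] args) {
        CompactionInfo info = new CompactionInfo("ks", "tbl", 512, 2048);
        System.out.println(completedFrom(asStringMap(info)));                   // via string round-trip
        System.out.println(activeCompactions(List.of(info)).iterator().next()); // typed, no parsing
    }
}
```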
[jira] [Updated] (CASSANDRA-14442) Let nodetool import take a list of directories
[ https://issues.apache.org/jira/browse/CASSANDRA-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-14442:
----------------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: 4.x)
                   4.0
           Status: Resolved  (was: Patch Available)

committed with a variation on your suggestion (using anyMatch instead of filter+findAny) as {{0f79427758c58cf9768898115c1da28fd71e3550}}, thanks!

> Let nodetool import take a list of directories
> ----------------------------------------------
>
>                 Key: CASSANDRA-14442
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14442
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Major
>             Fix For: 4.0
>
> It should be possible to load sstables from several input directories when running nodetool import. Directories that failed to import should be output.
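[Editor's note] The "anyMatch instead of filter+findAny" variation mentioned in the commit note is a standard stream simplification; a generic illustration (not the actual patch code):

```java
import java.util.List;

// Both forms answer "does any element match?"; anyMatch states the intent
// directly and avoids materializing an Optional just to test for presence.
public class AnyMatchExample {
    static boolean containsPrefix(List<String> dirs, String prefix) {
        return dirs.stream().anyMatch(d -> d.startsWith(prefix));
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("/data1/ks/tbl", "/data2/ks/tbl");

        // filter + findAny: builds an Optional only to call isPresent()
        boolean verbose = dirs.stream()
                              .filter(d -> d.startsWith("/data2"))
                              .findAny()
                              .isPresent();

        // anyMatch: same result, short-circuits on the first match
        boolean concise = containsPrefix(dirs, "/data2");

        System.out.println(verbose == concise); // true
    }
}
```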
cassandra git commit: Let nodetool import take a list of directories to import
Repository: cassandra
Updated Branches:
  refs/heads/trunk 4d8fc5b05 -> 0f7942775

Let nodetool import take a list of directories to import

Patch by marcuse; reviewed by Jordan West for CASSANDRA-14442

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0f794277
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0f794277
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0f794277

Branch: refs/heads/trunk
Commit: 0f79427758c58cf9768898115c1da28fd71e3550
Parents: 4d8fc5b
Author: Marcus Eriksson
Authored: Fri May 4 14:59:26 2018 +0200
Committer: Marcus Eriksson
Committed: Mon Jun 4 09:47:18 2018 +0200

----------------------------------------------------------------------
 CHANGES.txt                                     |   1 +
 .../apache/cassandra/db/ColumnFamilyStore.java  | 360 ++
 .../cassandra/db/ColumnFamilyStoreMBean.java    |  22 +-
 .../apache/cassandra/db/SSTableImporter.java    | 463 +++
 .../cassandra/db/compaction/Verifier.java       |  52 +--
 .../io/sstable/format/SSTableReader.java        |  45 ++
 .../org/apache/cassandra/tools/NodeProbe.java   |   4 +-
 .../apache/cassandra/tools/nodetool/Import.java |  31 +-
 .../org/apache/cassandra/db/ImportTest.java     | 364 +++
 .../cassandra/io/sstable/SSTableReaderTest.java |  72 ++-
 10 files changed, 934 insertions(+), 480 deletions(-)

----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/cassandra/blob/0f794277/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 179e7bb..86842d0 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0
+ * Let nodetool import take a list of directories (CASSANDRA-14442)
  * Avoid unneeded memory allocations / cpu for disabled log levels (CASSANDRA-14488)
  * Implement virtual keyspace interface (CASSANDRA-7622)
  * nodetool import cleanup and improvements (CASSANDRA-14417)

----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/cassandra/blob/0f794277/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index 4be65c6..9c4921e 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -225,6 +225,8 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
     private final TableRepairManager repairManager;
 
+    private final SSTableImporter sstableImporter;
+
     private volatile boolean compactionSpaceCheck = true;
 
     @VisibleForTesting
@@ -458,6 +460,7 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
         writeHandler = new CassandraTableWriteHandler(this);
         streamManager = new CassandraStreamManager(this);
         repairManager = new CassandraTableRepairManager(this);
+        sstableImporter = new SSTableImporter(this);
     }
 
     public void updateSpeculationThreshold()
@@ -701,239 +704,43 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean
     @Deprecated
     public void loadNewSSTables()
     {
-        ImportOptions options = ImportOptions.options().resetLevel(true).build();
-        importNewSSTables(options);
-    }
-
-    /**
-     * Iterates over all keys in the sstable index and invalidates the row cache
-     *
-     * also counts the number of tokens that should be on each disk in JBOD-config to minimize the amount of data compaction
-     * needs to move around
-     */
-    @VisibleForTesting
-    static File findBestDiskAndInvalidateCaches(ColumnFamilyStore cfs, Descriptor desc, String srcPath, boolean clearCaches, boolean jbodCheck) throws IOException
-    {
-        int boundaryIndex = 0;
-        DiskBoundaries boundaries = cfs.getDiskBoundaries();
-        boolean shouldCountKeys = boundaries.positions != null && jbodCheck;
-        if (!cfs.isRowCacheEnabled() || !clearCaches)
-        {
-            if (srcPath == null) // user has dropped the sstables in the data directory, use it directly
-                return desc.directory;
-            if (boundaries.directories != null && boundaries.directories.size() == 1) // only a single data directory, use it without counting keys
-                return cfs.directories.getLocationForDisk(boundaries.directories.get(0));
-            if (!shouldCountKeys) // for non-random partitioners positions can be null, get the directory with the most space available
-                return cfs.directories.getWriteableLocationToLoadFile(new File(desc.baseFilename()));
-        }
-
-        long count = 0;
-        int maxIndex = 0;
-        long maxCount = 0;
-
-        try (KeyIterator iter = new KeyIterator(desc, cfs.metadata()))
-