[jira] [Updated] (CASSANDRA-15295) Running into a deadlock during CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zephyr Guo updated CASSANDRA-15295: --- Description: Recently, I found a Cassandra (3.11.4) node stuck in STARTING status for a long time. I used jstack to see what was happening. The main thread was stuck in *AbstractCommitLogSegmentManager.awaitAvailableSegment* !screenshot-1.png! The strange thing is that the COMMIT-LOG-ALLOCATOR thread's state was RUNNABLE, but it was not actually running. !screenshot-2.png! I then used pstack to troubleshoot and found COMMIT-LOG-ALLOCATOR blocked on Java class initialization. !screenshot-3.png! This is clearly a deadlock. CommitLog waits for a CommitLogSegment while initializing. At that moment, the CommitLog class is not yet initialized and the main thread holds the class-initialization lock. COMMIT-LOG-ALLOCATOR then hits an exception while creating a CommitLogSegment and calls *CommitLog.handleCommitError* (a static method). COMMIT-LOG-ALLOCATOR blocks on that call because the CommitLog class is still initializing. (The previous description was identical except that the final sentence read "will stick on this line".)
> Running into a deadlock during CommitLog initialization > -- > > Key: CASSANDRA-15295 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15295 > Project: Cassandra > Issue Type: Bug > Components: Local/Commit Log > Reporter: Zephyr Guo > Assignee: Zephyr Guo > Priority: Normal > Attachments: jstack.log, pstack.log, screenshot-1.png, > screenshot-2.png, screenshot-3.png > > > Recently, I found a Cassandra (3.11.4) node stuck in STARTING status for a > long time. > I used jstack to see what was happening. The main thread was stuck in > *AbstractCommitLogSegmentManager.awaitAvailableSegment* > !screenshot-1.png! > The strange thing is that the COMMIT-LOG-ALLOCATOR thread's state was RUNNABLE, > but it was not actually running. > !screenshot-2.png! > I then used pstack to troubleshoot and found COMMIT-LOG-ALLOCATOR blocked on > Java class initialization. > !screenshot-3.png! > This is clearly a deadlock. CommitLog waits for a CommitLogSegment while > initializing. At that moment, the CommitLog class is not yet initialized and the > main thread holds the class-initialization lock. COMMIT-LOG-ALLOCATOR then hits an exception > while creating a CommitLogSegment and calls *CommitLog.handleCommitError* (a static > method). COMMIT-LOG-ALLOCATOR blocks on that call because the CommitLog > class is still initializing. > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
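The reported interaction can be reproduced in isolation. The sketch below (all class and method names hypothetical, not the actual Cassandra code) shows the same pattern: a static initializer running on the main thread starts a worker that calls back into a static method of the still-initializing class; per JLS 12.4.2, the JVM blocks the worker on the class-initialization lock the main thread holds, so an unbounded wait in the initializer would deadlock exactly as described.

```java
// Minimal sketch of the class-initialization deadlock described above.
// Log stands in for CommitLog; the ALLOCATOR thread stands in for
// COMMIT-LOG-ALLOCATOR calling the static handleCommitError.
class InitDeadlockDemo {
    static class Log {
        static volatile boolean allocatorBlockedDuringInit;
        static final Log INSTANCE = createDuringClassInit(); // runs in <clinit> on main

        // Stand-in for a static error handler: invoking it requires Log's
        // class initialization to be complete.
        static void handleError() {}

        static Log createDuringClassInit() {
            Thread allocator = new Thread(Log::handleError, "ALLOCATOR");
            allocator.start();
            try {
                // A real deadlock would be allocator.join() with no timeout;
                // the bound lets this demo terminate and report what happened.
                allocator.join(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Still alive: the worker is parked on the class-init lock we hold.
            allocatorBlockedDuringInit = allocator.isAlive();
            return new Log();
        }
    }

    public static void main(String[] args) {
        Log ignored = Log.INSTANCE; // triggers class initialization on this thread
        System.out.println("allocator was blocked during init: "
                           + Log.allocatorBlockedDuringInit);
    }
}
```

Once main finishes the class initializer, the ALLOCATOR thread unblocks and completes normally; the hang only exists while initialization is in progress, which is why the node sat in STARTING indefinitely.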
[jira] [Created] (CASSANDRA-15296) ZstdCompressor compression_level setting
DeepakVohra created CASSANDRA-15296: --- Summary: ZstdCompressor compression_level setting Key: CASSANDRA-15296 URL: https://issues.apache.org/jira/browse/CASSANDRA-15296 Project: Cassandra Issue Type: Bug Components: Dependencies, Feature/Compression Reporter: DeepakVohra The DEFAULT_COMPRESSION_LEVEL for ZstdCompressor is set to 3, but the documented range for compression_level is between -131072 and 2, which puts the default outside the documented range. Is this by design, or a bug? {code:java} - ``compression_level`` is only applicable for ``ZstdCompressor`` and accepts values between ``-131072`` and ``2``. // Compressor Defaults public static final int DEFAULT_COMPRESSION_LEVEL = 3; {code} https://github.com/apache/cassandra/commit/dccf53061a61e7c632669c60cd94626e405518e9
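The inconsistency the report describes can be stated as a simple bounds check. The numeric values below are exactly those quoted in the ticket (documentation range vs. code default); the class and helper names are hypothetical.

```java
// The documented range and the shipped default, as quoted in the report.
class ZstdLevelCheck {
    static final int DOCUMENTED_MIN_LEVEL = -131072; // from the docs snippet
    static final int DOCUMENTED_MAX_LEVEL = 2;       // from the docs snippet
    static final int DEFAULT_COMPRESSION_LEVEL = 3;  // from the code snippet

    static boolean inDocumentedRange(int level) {
        return level >= DOCUMENTED_MIN_LEVEL && level <= DOCUMENTED_MAX_LEVEL;
    }

    public static void main(String[] args) {
        // The shipped default fails its own documented bounds check.
        System.out.println(inDocumentedRange(DEFAULT_COMPRESSION_LEVEL)); // false
    }
}
```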
[jira] [Updated] (CASSANDRA-15281) help for clearsnapshot needs to be updated to indicate requirement for --all
[ https://issues.apache.org/jira/browse/CASSANDRA-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Turner updated CASSANDRA-15281: Attachment: 15281.trunk > help for clearsnapshot needs to be updated to indicate requirement for --all > > > Key: CASSANDRA-15281 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15281 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool > Reporter: DeepakVohra > Assignee: Josh Turner > Priority: Normal > Fix For: 4.x > > Attachments: 15281.trunk > > > According to _CASSANDRA-13391_ > _nodetool clearsnapshot should require --all to clear all snapshots_ > But the help for clearsnapshot does not indicate the same. > {code:java} > [ec2-user@ip-10-0-2-238 ~]$ nodetool help clearsnapshot > NAME nodetool clearsnapshot - Remove the snapshot with the given name from > the given keyspaces. If no snapshotName is specified we will remove all > snapshots{code} > The help for clearsnapshot needs to be updated to indicate the requirement for > --all to remove all snapshots.
[jira] [Updated] (CASSANDRA-15281) help for clearsnapshot needs to be updated to indicate requirement for --all
[ https://issues.apache.org/jira/browse/CASSANDRA-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Turner updated CASSANDRA-15281: Test and Documentation Plan: Updated help message to match functionality. Status: Patch Available (was: Open)
[jira] [Updated] (CASSANDRA-15281) help for clearsnapshot needs to be updated to indicate requirement for --all
[ https://issues.apache.org/jira/browse/CASSANDRA-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Turner updated CASSANDRA-15281: Component/s: (was: CQL/Syntax) Tool/nodetool
[jira] [Updated] (CASSANDRA-15281) help for clearsnapshot needs to be updated to indicate requirement for --all
[ https://issues.apache.org/jira/browse/CASSANDRA-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Turner updated CASSANDRA-15281: Bug Category: Parent values: Code(13163) Level 1 values: Bug - Unclear Impact(13164) Complexity: Low Hanging Fruit Discovered By: User Report Fix Version/s: 4.x Severity: Low Status: Open (was: Triage Needed)
[jira] [Assigned] (CASSANDRA-15281) help for clearsnapshot needs to be updated to indicate requirement for --all
[ https://issues.apache.org/jira/browse/CASSANDRA-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Turner reassigned CASSANDRA-15281: --- Assignee: Josh Turner
[jira] [Updated] (CASSANDRA-15295) Running into a deadlock during CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zephyr Guo updated CASSANDRA-15295: --- Attachment: screenshot-3.png
[jira] [Updated] (CASSANDRA-15295) Running into a deadlock during CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zephyr Guo updated CASSANDRA-15295: --- Description: (updated the inline image references to the attached screenshot-1/2/3.png files; text otherwise unchanged)
[jira] [Updated] (CASSANDRA-15295) Running into a deadlock during CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zephyr Guo updated CASSANDRA-15295: --- Attachment: screenshot-2.png
[jira] [Updated] (CASSANDRA-15295) Running into a deadlock during CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zephyr Guo updated CASSANDRA-15295: --- Attachment: (was: image-2019-08-30-21-30-11-683.png)
[jira] [Updated] (CASSANDRA-15295) Running into a deadlock during CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zephyr Guo updated CASSANDRA-15295: --- Attachment: (was: image-2019-08-30-21-40-43-748.png)
[jira] [Updated] (CASSANDRA-15295) Running into a deadlock during CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zephyr Guo updated CASSANDRA-15295: --- Attachment: screenshot-1.png
[jira] [Updated] (CASSANDRA-15295) Running into a deadlock during CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zephyr Guo updated CASSANDRA-15295: --- Attachment: (was: image-2019-08-30-21-29-48-623.png)
[jira] [Updated] (CASSANDRA-15295) Running into a deadlock during CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zephyr Guo updated CASSANDRA-15295: --- Attachment: (was: image-2019-08-30-21-25-17-638.png)
[jira] [Updated] (CASSANDRA-15295) Running into a deadlock during CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zephyr Guo updated CASSANDRA-15295: --- Attachment: pstack.log
[jira] [Updated] (CASSANDRA-15295) Running into a deadlock during CommitLog initialization
[ https://issues.apache.org/jira/browse/CASSANDRA-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zephyr Guo updated CASSANDRA-15295: --- Attachment: jstack.log
[jira] [Created] (CASSANDRA-15295) Running into a deadlock during CommitLog initialization
Zephyr Guo created CASSANDRA-15295: -- Summary: Running into a deadlock during CommitLog initialization Key: CASSANDRA-15295 URL: https://issues.apache.org/jira/browse/CASSANDRA-15295 Project: Cassandra Issue Type: Bug Components: Local/Commit Log Reporter: Zephyr Guo Assignee: Zephyr Guo Attachments: image-2019-08-30-21-25-17-638.png, image-2019-08-30-21-29-48-623.png, image-2019-08-30-21-30-11-683.png, image-2019-08-30-21-40-43-748.png
[jira] [Updated] (CASSANDRA-15223) OutboundTcpConnection leaks direct memory
[ https://issues.apache.org/jira/browse/CASSANDRA-15223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venkata Harikrishna Nukala updated CASSANDRA-15223: --- Test and Documentation Plan: Ran unit tests and tested basic things using ccm with compression enabled. Status: Patch Available (was: Open) > OutboundTcpConnection leaks direct memory > - > > Key: CASSANDRA-15223 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15223 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Benedict >Assignee: Venkata Harikrishna Nukala >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > On disconnect we set {{out}} to null without first closing it -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
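The leak pattern in this ticket — nulling {{out}} without closing it — can be sketched as follows. This is an illustrative toy (hypothetical `Connection` class, not the actual OutboundTcpConnection code): close the stream first so whatever it holds (for OutboundTcpConnection, direct memory) is released, then drop the reference.

```java
import java.io.IOException;
import java.io.OutputStream;

class Connection
{
    private OutputStream out;

    Connection(OutputStream out)
    {
        this.out = out;
    }

    // The leak pattern from the ticket: the stream is abandoned without
    // being released, so resources it holds are only reclaimed (if ever)
    // by finalization.
    void disconnectLeaky()
    {
        out = null;
    }

    // The fix pattern: close first, then null the reference.
    void disconnect()
    {
        if (out != null)
        {
            try { out.close(); }
            catch (IOException ignore) { }
            out = null;
        }
    }
}

// Small tracking stream used only to observe the difference.
class TrackingStream extends java.io.ByteArrayOutputStream
{
    boolean closed;

    @Override
    public void close() throws IOException
    {
        closed = true;
        super.close();
    }
}
```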
[jira] [Updated] (CASSANDRA-15260) Add `allocate_tokens_for_dc_rf` yaml option for token allocation
[ https://issues.apache.org/jira/browse/CASSANDRA-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mck updated CASSANDRA-15260: Impacts: Docs Test and Documentation Plan: unit test, manual testing Status: Patch Available (was: In Progress) Have added a unit test in BootStrapperTest. It does not do that much, as SummaryStatistics is not available when using `allocate_tokens_for_local_replication_factor`.
> Add `allocate_tokens_for_dc_rf` yaml option for token allocation
> Key: CASSANDRA-15260
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15260
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config
> Reporter: mck
> Assignee: mck
> Priority: Normal
> Fix For: 4.x
>
> Similar to DSE's option: {{allocate_tokens_for_local_replication_factor}}
> Currently the [ReplicationAwareTokenAllocator|https://www.datastax.com/dev/blog/token-allocation-algorithm] requires a defined keyspace and a replication factor specified for the current datacenter.
> This is problematic in a number of ways. The real keyspace cannot be used when adding new datacenters because, in practice, all of a datacenter's nodes need to be up and running before it has the capacity to replicate data. New datacenters (or lift-and-shifting a cluster via datacenter migration) therefore have to be bootstrapped using a dummy keyspace that duplicates the replication strategy and factor of the real keyspace. This gets even more difficult come version 4.0, as the replication factor cannot even be defined for new datacenters before those datacenters are up and running.
> These issues are removed by avoiding the keyspace definition and lookup, and presuming the replication strategy is per datacenter, i.e. NTS. This can be done with an {{allocate_tokens_for_dc_rf}} option.
> It may also be worth considering whether {{allocate_tokens_for_dc_rf=3}} should become the default, as this is the replication factor for the vast majority of datacenters in production. I suspect this would be a good improvement over the existing randomly generated tokens algorithm.
> Initial patch is available in [https://github.com/thelastpickle/cassandra/commit/fc4865b0399570e58f11215565ba17dc4a53da97]
> The patch does not remove the existing {{allocate_tokens_for_keyspace}} option, as that provides the codebase for handling different replication strategies.
>
> fyi [~blambov] [~jay.zhuang] [~chovatia.jayd...@gmail.com] [~alokamvenki] [~alexchueshev]
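As a usage illustration, the proposed option would let a brand-new node in a brand-new datacenter compute well-balanced tokens before any keyspace exists there. A hedged sketch of the relevant cassandra.yaml fragment, assuming the option keeps the name and semantics described in this ticket (the `num_tokens` value is illustrative):

```yaml
# Proposed in this ticket: allocate tokens for an assumed NTS replication
# factor in the local datacenter, with no keyspace definition or lookup
# required. (DSE's equivalent is allocate_tokens_for_local_replication_factor.)
allocate_tokens_for_dc_rf: 3

# The pre-existing option this proposal complements; it requires the target
# keyspace to already be defined:
# allocate_tokens_for_keyspace: my_keyspace

num_tokens: 16
```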
[jira] [Comment Edited] (CASSANDRA-15260) Add `allocate_tokens_for_dc_rf` yaml option for token allocation
[ https://issues.apache.org/jira/browse/CASSANDRA-15260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905336#comment-16905336 ] mck edited comment on CASSANDRA-15260 at 8/30/19 10:48 AM: --- Thanks [~blambov]. The rename is done. ||branch||circleci||asf jenkins testall|| |[CASSANDRA-15260|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk__allocate_tokens_for_dc_rf]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Ftrunk__allocate_tokens_for_dc_rf]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/45//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/45/]| I've opened the ticket, and will 'Submit Patch' it after I get some unit tests in. was (Author: michaelsembwever): Thanks [~blambov]. The rename is done. ||branch||circleci||asf jenkins testall|| |[CASSANDRA-15260|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk__allocate_tokens_for_dc_rf]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Ftrunk__allocate_tokens_for_dc_rf]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/43//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/43/]| I've opened the ticket, and will 'Submit Patch' it after I get some unit tests in. > Add `allocate_tokens_for_dc_rf` yaml option for token allocation > > > Key: CASSANDRA-15260 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15260 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: mck >Assignee: mck >Priority: Normal > Fix For: 4.x > > > Similar to DSE's option: {{allocate_tokens_for_local_replication_factor}} > Currently the > [ReplicationAwareTokenAllocator|https://www.datastax.com/dev/blog/token-allocation-algorithm] > requires a defined keyspace and a replica factor specified in the current > datacenter. 
> This is problematic in a number of ways. The real keyspace cannot be used when adding new datacenters because, in practice, all of a datacenter's nodes need to be up and running before it has the capacity to replicate data. New datacenters (or lift-and-shifting a cluster via datacenter migration) therefore have to be bootstrapped using a dummy keyspace that duplicates the replication strategy and factor of the real keyspace. This gets even more difficult come version 4.0, as the replication factor cannot even be defined for new datacenters before those datacenters are up and running.
> These issues are removed by avoiding the keyspace definition and lookup, and presuming the replication strategy is per datacenter, i.e. NTS. This can be done with an {{allocate_tokens_for_dc_rf}} option.
> It may also be worth considering whether {{allocate_tokens_for_dc_rf=3}} should become the default, as this is the replication factor for the vast majority of datacenters in production. I suspect this would be a good improvement over the existing randomly generated tokens algorithm.
> Initial patch is available in [https://github.com/thelastpickle/cassandra/commit/fc4865b0399570e58f11215565ba17dc4a53da97]
> The patch does not remove the existing {{allocate_tokens_for_keyspace}} option, as that provides the codebase for handling different replication strategies.
>
> fyi [~blambov] [~jay.zhuang] [~chovatia.jayd...@gmail.com] [~alokamvenki] [~alexchueshev]
[jira] [Updated] (CASSANDRA-14772) Fix issues in audit / full query log interactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14772: - Fix Version/s: 4.0-rc > Fix issues in audit / full query log interactions > - > > Key: CASSANDRA-14772 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14772 > Project: Cassandra > Issue Type: Bug > Components: Legacy/CQL, Legacy/Tools >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 4.0, 4.0-rc > > > There are some problems with the audit + full query log code that need to be > resolved before 4.0 is released: > * Fix performance regression in FQL that makes it less usable than it should > be. > * move full query log specific code to a separate package > * do some audit log class renames (I keep reading {{BinLogAuditLogger}} vs > {{BinAuditLogger}} wrong for example) > * avoid parsing the CQL queries twice in {{QueryMessage}} when audit log is > enabled. > * add a new tool to dump audit logs (ie, let fqltool be full query log > specific). fqltool crashes when pointed to them. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release
[ https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-13994: - Fix Version/s: (was: 4.x) 4.0-rc > Remove COMPACT STORAGE internals before 4.0 release > --- > > Key: CASSANDRA-13994 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13994 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Local Write-Read Paths >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Low > Fix For: 4.0-rc > > > 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after > [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of > the related functionality is useless. > There are still some things to consider: > 1. One of the system tables (built indexes) was compact. For now, we just > added {{value}} column to it to make sure it's backwards-compatible, but we > might want to make sure it's just a "normal" table and doesn't have redundant > columns. > 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is > trivial, but this would mean that all built indexes will be defunct. We could > log a warning for now and ask users to migrate off those for now and > completely remove it from future releases. It's just a couple of classes > though. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements
[ https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14801: - Fix Version/s: 4.0 > calculatePendingRanges no longer safe for multiple adjacent range movements > --- > > Key: CASSANDRA-14801 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14801 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination, Legacy/Distributed Metadata >Reporter: Benedict >Priority: Normal > Fix For: 4.0, 4.0-beta > > > Correctness depended upon the narrowing to a {{Set}}, > which we no longer do - we maintain a collection of all {{Replica}}. Our > {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result > contain the same endpoint multiple times; and our {{EndpointsForToken}} > obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, > resulting in cluster-wide failures for writes to the affected token ranges > for the duration of the range movement. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-15214: - Fix Version/s: 4.0 > OOMs caught and not rethrown > > > Key: CASSANDRA-15214 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15214 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client, Messaging/Internode >Reporter: Benedict >Priority: Normal > Fix For: 4.0, 4.0-rc > > Attachments: oom-experiments.zip > > > Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, > so presently there is no way to ensure that an OOM reaches the JVM handler to > trigger a crash/heapdump. > It may be that the simplest most consistent way to do this would be to have a > single thread spawned at startup that waits for any exceptions we must > propagate to the Runtime. > We could probably submit a patch upstream to Netty, but for a guaranteed > future proof approach, it may be worth paying the cost of a single thread. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
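The "single thread spawned at startup" idea above can be sketched as follows. This is a hypothetical illustration under invented names (not the eventual Cassandra patch): catch sites that would otherwise swallow a fatal error hand it to a dedicated thread, which rethrows it outside Netty's catch-all so the JVM-level handling (crash/heap dump) can apply.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a dedicated propagation thread for fatal errors.
final class FatalErrorPropagator
{
    private static final BlockingQueue<Throwable> QUEUE = new LinkedBlockingQueue<>();
    private static final CountDownLatch SEEN = new CountDownLatch(1); // observation hook for the demo

    static
    {
        Thread t = new Thread(() -> {
            try
            {
                Throwable fatal = QUEUE.take();
                SEEN.countDown();
                // Rethrow on a thread whose stack contains no catch-all,
                // so the uncaught-exception machinery (and any OOM policy
                // such as -XX:+ExitOnOutOfMemoryError) can take effect.
                throw new RuntimeException("propagating fatal error", fatal);
            }
            catch (InterruptedException ignored)
            {
            }
        }, "fatal-error-propagator");
        t.setDaemon(true);
        t.start();
    }

    // Called from catch blocks (e.g. inside Netty handlers) that would
    // otherwise swallow the error.
    static void report(Throwable t)
    {
        QUEUE.offer(t);
    }

    static boolean awaitSeen()
    {
        try { return SEEN.await(5, TimeUnit.SECONDS); }
        catch (InterruptedException e) { return false; }
    }
}
```

The cost is one mostly-idle thread, which is the trade-off the ticket proposes paying for a guaranteed, Netty-version-independent propagation path.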
[jira] [Updated] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14888: - Fix Version/s: 4.0 > Several mbeans are not unregistered when dropping a keyspace and table > -- > > Key: CASSANDRA-14888 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14888 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Ariel Weisberg >Assignee: Alex Deparvu >Priority: Urgent > Labels: patch-available > Fix For: 4.0, 4.0-rc > > Attachments: CASSANDRA-14888.patch > > > CasCommit, CasPrepare, CasPropose, ReadRepairRequests, > ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, > PartitionsValidated, RepairPrepareTime, RepairSyncTime, > RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, > WriteFailedIdealCL > Basically for 3 years people haven't known what they are doing because the > entire thing is kind of obscure. Fix it and also add a dtest that detects if > any mbeans are left behind after dropping a table and keyspace. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15234) Standardise config and JVM parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-15234: - Fix Version/s: 4.0
> Standardise config and JVM parameters
> -
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
> Issue Type: Bug
> Reporter: Benedict
> Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> We have a bunch of inconsistent names and config patterns in the codebase, both from the yamls and JVM properties. It would be nice to standardise the naming (such as otc_ vs internode_) as well as the provision of values with units - while maintaining perpetual backwards compatibility with the old parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {code}
> u|micros(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {code}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, MiB/s, GiB/s, TiB/s}}.
> Perhaps, to avoid ambiguity, we could refuse bit rates ({{b/s, Mbps}}) or powers of 1000 such as {{KB/s}}, given these are regularly used for either their old or new definition, e.g. as {{KiB/s}}; or we could support them and simply log the value in bytes/s.
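The suffix grammar proposed above can be sketched as a parser — an illustration only, with invented names; the actual grammar and API are whatever the eventual patch settles on. One subtlety the sketch makes explicit: the alternation must try {{mo}} before {{m}} so months are not parsed as minutes, and {{micros}} must not fall into the minutes arm just because it starts with "m".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the duration suffixes proposed in the ticket.
final class DurationConfig
{
    // "mo(nths?)?" is tried before "m(inutes?)?" to disambiguate the prefixes.
    private static final Pattern DURATION = Pattern.compile(
        "(\\d+)\\s*(mo(?:nths?)?|u|micros(?:econds?)?|s(?:econds?)?|m(?:inutes?)?|h(?:ours?)?|d(?:ays?)?)");

    static long toMicros(String value)
    {
        Matcher m = DURATION.matcher(value.trim());
        if (!m.matches())
            throw new IllegalArgumentException("unparseable duration: " + value);

        long n = Long.parseLong(m.group(1));
        String unit = m.group(2);
        if (unit.startsWith("mo"))
            return n * 30L * 86_400L * 1_000_000L;          // months, as 30 days
        if (unit.startsWith("u") || unit.startsWith("micro"))
            return n;                                        // microseconds
        switch (unit.charAt(0))
        {
            case 's': return n * 1_000_000L;                 // seconds
            case 'm': return n * 60L * 1_000_000L;           // minutes
            case 'h': return n * 3_600L * 1_000_000L;        // hours
            case 'd': return n * 86_400L * 1_000_000L;       // days
            default:  throw new IllegalArgumentException("unknown unit: " + unit);
        }
    }
}
```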
[jira] [Updated] (CASSANDRA-10190) Python 3 support for cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-10190: - Fix Version/s: 4.0 > Python 3 support for cqlsh > -- > > Key: CASSANDRA-10190 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10190 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Tools >Reporter: Andrew Pennebaker >Assignee: Patrick Bannister >Priority: Normal > Labels: cqlsh > Fix For: 4.0, 4.0-alpha > > Attachments: coverage_notes.txt > > > Users who operate in a Python 3 environment may have trouble launching cqlsh. > Could we please update cqlsh's syntax to run in Python 3? > As a workaround, users can setup pyenv, and cd to a directory with a > .python-version containing "2.7". But it would be nice if cqlsh supported > modern Python versions out of the box. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release
[ https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-13994: - Fix Version/s: 4.0 > Remove COMPACT STORAGE internals before 4.0 release > --- > > Key: CASSANDRA-13994 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13994 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Local Write-Read Paths >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Low > Fix For: 4.0, 4.0-rc > > > 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after > [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of > the related functionality is useless. > There are still some things to consider: > 1. One of the system tables (built indexes) was compact. For now, we just > added {{value}} column to it to make sure it's backwards-compatible, but we > might want to make sure it's just a "normal" table and doesn't have redundant > columns. > 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is > trivial, but this would mean that all built indexes will be defunct. We could > log a warning for now and ask users to migrate off those for now and > completely remove it from future releases. It's just a couple of classes > though. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default
[ https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-15216: - Fix Version/s: 4.0 > Cross node message creation times are disabled by default > - > > Key: CASSANDRA-15216 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15216 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Benedict >Priority: Normal > Fix For: 4.0, 4.0-alpha > > > This can cause a lot of wasted work for messages that have timed out on the > coordinator. We should generally assume that our users have setup NTP on > their clusters, and that clocks are modestly in sync, since it’s a > requirement for general correctness of last write wins. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14973) Bring v5 driver out of beta, introduce v6 before 4.0 release is cut
[ https://issues.apache.org/jira/browse/CASSANDRA-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14973: - Fix Version/s: 4.0 > Bring v5 driver out of beta, introduce v6 before 4.0 release is cut > --- > > Key: CASSANDRA-14973 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14973 > Project: Cassandra > Issue Type: Task >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Urgent > Fix For: 4.0, 4.0-rc > > > In http://issues.apache.org/jira/browse/CASSANDRA-12142, we’ve introduced > Beta flag for v5 protocol. However, up till now, v5 is in beta both in > [Cassandra|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/ProtocolVersion.java#L46] > and in > [java-driver|https://github.com/datastax/java-driver/blob/3.x/driver-core/src/main/java/com/datastax/driver/core/ProtocolVersion.java#L35]. > > Before the final 4.0 release is cut, we need to bring v5 out of beta and > finalise native protocol spec, and start bringing all new changes to v6 > protocol, which will be in beta. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15229) BufferPool Regression
[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-15229: - Fix Version/s: 4.0 > BufferPool Regression > - > > Key: CASSANDRA-15229 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15229 > Project: Cassandra > Issue Type: Bug > Components: Local/Caching >Reporter: Benedict >Priority: Normal > Fix For: 4.0, 4.0-beta > > > The BufferPool was never intended to be used for a {{ChunkCache}}, and we > need to either change our behaviour to handle uncorrelated lifetimes or use > something else. This is particularly important with the default chunk size > for compressed sstables being reduced. If we address the problem, we should > also utilise the BufferPool for native transport connections like we do for > internode messaging, and reduce the number of pooling solutions we employ. > Probably the best thing to do is to improve BufferPool’s behaviour when used > for things with uncorrelated lifetimes, which essentially boils down to > tracking those chunks that have not been freed and re-circulating them when > we run out of completely free blocks. We should probably also permit > instantiating separate {{BufferPool}}, so that we can insulate internode > messaging from the {{ChunkCache}}, or at least have separate memory bounds > for each, and only share fully-freed chunks. > With these improvements we can also safely increase the {{BufferPool}} chunk > size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce > the amount of global coordination and per-allocation overhead. We don’t need > 1KiB granularity for allocations, nor 16 byte granularity for tiny > allocations. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14973) Bring v5 driver out of beta, introduce v6 before 4.0 release is cut
[ https://issues.apache.org/jira/browse/CASSANDRA-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14973: - Fix Version/s: (was: 4.0) 4.0-rc > Bring v5 driver out of beta, introduce v6 before 4.0 release is cut > --- > > Key: CASSANDRA-14973 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14973 > Project: Cassandra > Issue Type: Task >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Urgent > Fix For: 4.0-rc > > > In http://issues.apache.org/jira/browse/CASSANDRA-12142, we’ve introduced > Beta flag for v5 protocol. However, up till now, v5 is in beta both in > [Cassandra|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/ProtocolVersion.java#L46] > and in > [java-driver|https://github.com/datastax/java-driver/blob/3.x/driver-core/src/main/java/com/datastax/driver/core/ProtocolVersion.java#L35]. > > Before the final 4.0 release is cut, we need to bring v5 out of beta and > finalise native protocol spec, and start bringing all new changes to v6 > protocol, which will be in beta. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14775) StreamingTombstoneHistogramBuilder overflows if > 2B in a single bucket/stabile
[ https://issues.apache.org/jira/browse/CASSANDRA-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14775: - Fix Version/s: 4.0 > StreamingTombstoneHistogramBuilder overflows if > 2B in a single > bucket/stabile > --- > > Key: CASSANDRA-14775 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14775 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Benedict >Assignee: Alex Deparvu >Priority: Normal > Fix For: 4.0, 4.0-beta > > > This may be unlikely, but is certainly not impossible. In this event, the > count for the bucket will be reset to zero, and the time distorted to 1s in > the future. If MAX_DELETION_TIME were encountered through overflow, this > might result in a bucket with NO_DELETION_TIME. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14773) Overflow of 32-bit integer during compaction.
[ https://issues.apache.org/jira/browse/CASSANDRA-14773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14773: - Fix Version/s: 4.0
> Overflow of 32-bit integer during compaction.
> -
> Key: CASSANDRA-14773
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14773
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction
> Reporter: Vladimir Bukhtoyarov
> Assignee: Vladimir Bukhtoyarov
> Priority: Urgent
> Fix For: 4.0, 4.0-beta
>
> In scope of CASSANDRA-13444, compaction was significantly improved from a CPU and memory perspective. However, this improvement introduces a bug in rounding. When rounding an expiration time close to *Cell.MAX_DELETION_TIME* (which is just *Integer.MAX_VALUE*), a math overflow happens, because in scope of CASSANDRA-13444 the data type for the point was changed from Long to Integer in order to reduce the memory footprint. As a result, the point becomes negative and acts as silent poison for internal structures of StreamingTombstoneHistogramBuilder such as *DistanceHolder* and *DataHolder*.
> Then, depending on the point intervals:
> * The TombstoneHistogram produces wrong values when the interval of points is less than binSize; this is not critical.
> * Compaction crashes with ArrayIndexOutOfBoundsException if the number of point intervals is greater than binSize; this case is very critical.
>
> This pull request [https://github.com/apache/cassandra/pull/273] reproduces the issue and provides the fix. 
> > The stacktrace when running(on codebase without fix) > *testMathOverflowDuringRoundingOfLargeTimestamp* without -ea JVM flag > {noformat} > java.lang.ArrayIndexOutOfBoundsException > at java.lang.System.arraycopy(Native Method) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$DistanceHolder.add(StreamingTombstoneHistogramBuilder.java:208) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.flushValue(StreamingTombstoneHistogramBuilder.java:140) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$$Lambda$1/1967205423.consume(Unknown > Source) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$Spool.forEach(StreamingTombstoneHistogramBuilder.java:574) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.flushHistogram(StreamingTombstoneHistogramBuilder.java:124) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.build(StreamingTombstoneHistogramBuilder.java:184) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilderTest.testMathOverflowDuringRoundingOfLargeTimestamp(StreamingTombstoneHistogramBuilderTest.java:183) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41) > at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) > at org.junit.runners.ParentRunner.run(ParentRunner.java:220) > at org.junit.runner.JUnitCore.run(JUnitCore.java:159) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {noformat} > > The stacktrace when running(on codebase without fix) > *testMathOverflowDuringRoundingOfLargeTimestamp* with
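The overflow described in this ticket is easy to demonstrate in isolation. The sketch below is hypothetical (it is not the actual StreamingTombstoneHistogramBuilder code): rounding a point near Integer.MAX_VALUE up to the next bucket boundary in `int` arithmetic wraps negative, while widening to `long` and clamping avoids the poison value.

```java
// Hypothetical demonstration of the int-rounding overflow; not Cassandra code.
final class RoundingOverflowDemo
{
    // Buggy pattern: for points near Integer.MAX_VALUE (Cell.MAX_DELETION_TIME),
    // the multiplication overflows int and yields a negative point.
    static int roundUp(int point, int binSize)
    {
        return (point / binSize + 1) * binSize;
    }

    // Safe variant: widen to long before multiplying, then clamp to the maximum.
    static int roundUpSafe(int point, int binSize)
    {
        long rounded = ((long) point / binSize + 1) * binSize;
        return rounded > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) rounded;
    }
}
```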
[jira] [Updated] (CASSANDRA-14748) Recycler$WeakOrderQueue occupies Heap
[ https://issues.apache.org/jira/browse/CASSANDRA-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14748: - Fix Version/s: 4.0
> Recycler$WeakOrderQueue occupies Heap
> -
> Key: CASSANDRA-14748
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14748
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Environment: The netty version Cassandra is using is netty-all-4.0.39.Final.jar
> Reporter: HX
> Priority: Normal
> Fix For: 4.0, 3.11.x, 4.0-rc
>
> Heap usage is constantly high on some of the nodes in the cluster. I dumped the heap and opened it in Eclipse Memory Analyzer; it looks like Recycler$WeakOrderQueue occupies most of the heap.
>
> ||Package||Retained Heap||Retained Heap, %||# Top Dominators||
> |(total)|7,078,140,136|100.00%|379,627|
> |io|5,665,035,800|80.04%|13,306|
> |netty|5,665,035,800|80.04%|13,306|
> |util|5,568,107,344|78.67%|2,965|
> |Recycler$WeakOrderQueue|4,950,021,544|69.93%|2,169|
[jira] [Updated] (CASSANDRA-14834) Avoid keeping StreamingTombstoneHistogramBuilder.Spool in memory during the whole compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14834: - Fix Version/s: 4.0 > Avoid keeping StreamingTombstoneHistogramBuilder.Spool in memory during the > whole compaction > > > Key: CASSANDRA-14834 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14834 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Low > Fix For: 4.0, 4.0-beta > > > Since CASSANDRA-13444 {{StreamingTombstoneHistogramBuilder.Spool}} is > allocated to keep around an array with 131072 * 2 * 2 integers *per written > sstable* during the whole compaction. With LCS at times creating 1000s of > sstables during a compaction it kills the node. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15229) BufferPool Regression
[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-15229: - Fix Version/s: (was: 4.0) 4.0-beta > BufferPool Regression > - > > Key: CASSANDRA-15229 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15229 > Project: Cassandra > Issue Type: Bug > Components: Local/Caching >Reporter: Benedict >Priority: Normal > Fix For: 4.0-beta > > > The BufferPool was never intended to be used for a {{ChunkCache}}, and we > need to either change our behaviour to handle uncorrelated lifetimes or use > something else. This is particularly important with the default chunk size > for compressed sstables being reduced. If we address the problem, we should > also utilise the BufferPool for native transport connections like we do for > internode messaging, and reduce the number of pooling solutions we employ. > Probably the best thing to do is to improve BufferPool’s behaviour when used > for things with uncorrelated lifetimes, which essentially boils down to > tracking those chunks that have not been freed and re-circulating them when > we run out of completely free blocks. We should probably also permit > instantiating separate {{BufferPool}}, so that we can insulate internode > messaging from the {{ChunkCache}}, or at least have separate memory bounds > for each, and only share fully-freed chunks. > With these improvements we can also safely increase the {{BufferPool}} chunk > size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce > the amount of global coordination and per-allocation overhead. We don’t need > 1KiB granularity for allocations, nor 16 byte granularity for tiny > allocations.
[jira] [Updated] (CASSANDRA-14775) StreamingTombstoneHistogramBuilder overflows if > 2B in a single bucket/stabile
[ https://issues.apache.org/jira/browse/CASSANDRA-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14775: - Fix Version/s: (was: 4.0) 4.0-beta > StreamingTombstoneHistogramBuilder overflows if > 2B in a single > bucket/stabile > --- > > Key: CASSANDRA-14775 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14775 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Benedict >Assignee: Alex Deparvu >Priority: Normal > Fix For: 4.0-beta > > > This may be unlikely, but is certainly not impossible. In this event, the > count for the bucket will be reset to zero, and the time distorted to 1s in > the future. If MAX_DELETION_TIME were encountered through overflow, this > might result in a bucket with NO_DELETION_TIME.
[jira] [Updated] (CASSANDRA-14834) Avoid keeping StreamingTombstoneHistogramBuilder.Spool in memory during the whole compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14834: - Fix Version/s: (was: 4.0) 4.0-beta > Avoid keeping StreamingTombstoneHistogramBuilder.Spool in memory during the > whole compaction > > > Key: CASSANDRA-14834 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14834 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Low > Fix For: 4.0-beta > > > Since CASSANDRA-13444 {{StreamingTombstoneHistogramBuilder.Spool}} is > allocated to keep around an array with 131072 * 2 * 2 integers *per written > sstable* during the whole compaction. With LCS at times creating 1000s of > sstables during a compaction it kills the node.
[jira] [Updated] (CASSANDRA-14748) Recycler$WeakOrderQueue occupies Heap
[ https://issues.apache.org/jira/browse/CASSANDRA-14748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14748: - Fix Version/s: 4.0-rc > Recycler$WeakOrderQueue occupies Heap > - > > Key: CASSANDRA-14748 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14748 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core > Environment: The netty version Cassandra is using is netty-all-4.0.39.Final.jar >Reporter: HX >Priority: Normal > Fix For: 3.11.x, 4.0-rc > > > Heap usage is constantly high on some of the nodes in the cluster. I dumped the heap and > opened it in Eclipse Memory Analyzer; it looks like Recycler$WeakOrderQueue > occupies most of the heap. > > ||Package||Retained Heap||Retained Heap, %||# Top Dominators|| > |!/jira/icons/i5.gif! |7,078,140,136|100.00%|379,627| > |io|5,665,035,800|80.04%|13,306| > |netty|5,665,035,800|80.04%|13,306| > |util|5,568,107,344|78.67%|2,965| > |Recycler$WeakOrderQueue|4,950,021,544|69.93%|2,169|
[jira] [Updated] (CASSANDRA-14773) Overflow of 32-bit integer during compaction.
[ https://issues.apache.org/jira/browse/CASSANDRA-14773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14773: - Fix Version/s: (was: 4.x) 4.0-beta > Overflow of 32-bit integer during compaction. > - > > Key: CASSANDRA-14773 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14773 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Vladimir Bukhtoyarov >Assignee: Vladimir Bukhtoyarov >Priority: Urgent > Fix For: 4.0-beta > > > In scope of CASSANDRA-13444 the compaction was significantly improved from a > CPU and memory perspective. However, this improvement introduces a bug in > rounding. When rounding an expiration time that is close to > *Cell.MAX_DELETION_TIME* (which is just *Integer.MAX_VALUE*), a math overflow > happens (because, in scope of CASSANDRA-13444, the data type for the point was > changed from Long to Integer in order to reduce memory footprint); as a result the > point becomes negative and acts as silent poison for the internal structures of > StreamingTombstoneHistogramBuilder like *DistanceHolder* and *DataHolder*. > Then, depending on the point intervals: > * The TombstoneHistogram produces wrong values when the interval of points is > less than binSize; this is not critical. > * Compaction crashes with ArrayIndexOutOfBoundsException if the number of point > intervals is greater than binSize; this case is very critical. > > This pull request [https://github.com/apache/cassandra/pull/273] > reproduces the issue and provides the fix. 
> > The stacktrace when running(on codebase without fix) > *testMathOverflowDuringRoundingOfLargeTimestamp* without -ea JVM flag > {noformat} > java.lang.ArrayIndexOutOfBoundsException > at java.lang.System.arraycopy(Native Method) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$DistanceHolder.add(StreamingTombstoneHistogramBuilder.java:208) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.flushValue(StreamingTombstoneHistogramBuilder.java:140) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$$Lambda$1/1967205423.consume(Unknown > Source) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$Spool.forEach(StreamingTombstoneHistogramBuilder.java:574) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.flushHistogram(StreamingTombstoneHistogramBuilder.java:124) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.build(StreamingTombstoneHistogramBuilder.java:184) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilderTest.testMathOverflowDuringRoundingOfLargeTimestamp(StreamingTombstoneHistogramBuilderTest.java:183) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41) > at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) > at org.junit.runners.ParentRunner.run(ParentRunner.java:220) > at org.junit.runner.JUnitCore.run(JUnitCore.java:159) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {noformat}
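The overflow class described in the ticket can be reproduced in isolation. The sketch below is not Cassandra's actual rounding code; `roundUp` and the 60-second interval are stand-ins that show how int arithmetic near Integer.MAX_VALUE wraps negative, which is the "silent poison" the report describes:

```java
// Hypothetical illustration of rounding an expiration time near Integer.MAX_VALUE
// (Cell.MAX_DELETION_TIME) up to a bucket boundary using 32-bit arithmetic.
public class RoundingOverflowDemo {
    // Round up to the next multiple of `interval`; the addition can overflow int.
    static int roundUp(int value, int interval) {
        return ((value + interval - 1) / interval) * interval;
    }

    public static void main(String[] args) {
        int nearMax = Integer.MAX_VALUE - 10;            // close to Cell.MAX_DELETION_TIME
        System.out.println(roundUp(nearMax, 60));        // -2147483580: the addition wrapped
        // The same arithmetic widened to long stays positive:
        long safe = ((long) nearMax + 59) / 60 * 60;
        System.out.println(safe);                        // 2147483640
    }
}
```

The negative "point" then corrupts index calculations in structures like DistanceHolder, producing the ArrayIndexOutOfBoundsException in the stack trace above.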
[jira] [Updated] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-15214: - Fix Version/s: (was: 4.0-rc) 4.0-beta > OOMs caught and not rethrown > > > Key: CASSANDRA-15214 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15214 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client, Messaging/Internode >Reporter: Benedict >Priority: Normal > Fix For: 4.0-beta > > Attachments: oom-experiments.zip > > > Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, > so presently there is no way to ensure that an OOM reaches the JVM handler to > trigger a crash/heapdump. > It may be that the simplest most consistent way to do this would be to have a > single thread spawned at startup that waits for any exceptions we must > propagate to the Runtime. > We could probably submit a patch upstream to Netty, but for a guaranteed > future proof approach, it may be worth paying the cost of a single thread. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
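The "single thread spawned at startup" idea can be sketched as follows. All names here are hypothetical, not Cassandra's actual API; the point is only that the rethrow happens on a thread with no framework catch block above it, so the error reaches the JVM's uncaught-exception path (crash/heap dump):

```java
// Sketch: catch sites that would otherwise swallow an OutOfMemoryError hand it to a
// dedicated thread, which rethrows it where no Netty/Executor catch can intercept it.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FatalErrorPropagator {
    private static final BlockingQueue<Throwable> FATAL = new LinkedBlockingQueue<>();

    public static Thread start() {
        Thread t = new Thread(() -> {
            try {
                Throwable cause = FATAL.take();                    // wait for a fatal error
                throw new RuntimeException("fatal error", cause);  // escapes to uncaught handler
            } catch (InterruptedException ignore) {
            }
        }, "fatal-error-propagator");
        t.start();
        return t;
    }

    /** Call from catch blocks that must not swallow OOMs; returns true if handed off. */
    public static boolean propagate(Throwable t) {
        return t instanceof OutOfMemoryError && FATAL.offer(t);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread propagator = start();
        // A handler standing in for the JVM's default crash/heap-dump behaviour.
        propagator.setUncaughtExceptionHandler(
                (thread, err) -> System.out.println("would crash JVM: " + err.getCause()));
        propagate(new OutOfMemoryError("simulated"));
        propagator.join();
    }
}
```

In a real deployment the thread would have no custom handler installed, so the rethrow would hit the JVM's default machinery, which is exactly what the framework catch blocks currently prevent.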
[jira] [Updated] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-15214: - Fix Version/s: (was: 4.0-beta) 4.0-rc > OOMs caught and not rethrown > > > Key: CASSANDRA-15214 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15214 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client, Messaging/Internode >Reporter: Benedict >Priority: Normal > Fix For: 4.0-rc > > Attachments: oom-experiments.zip > > > Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, > so presently there is no way to ensure that an OOM reaches the JVM handler to > trigger a crash/heapdump. > It may be that the simplest most consistent way to do this would be to have a > single thread spawned at startup that waits for any exceptions we must > propagate to the Runtime. > We could probably submit a patch upstream to Netty, but for a guaranteed > future proof approach, it may be worth paying the cost of a single thread.
[jira] [Updated] (CASSANDRA-14888) Several mbeans are not unregistered when dropping a keyspace and table
[ https://issues.apache.org/jira/browse/CASSANDRA-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14888: - Fix Version/s: (was: 4.0.x) 4.0-rc > Several mbeans are not unregistered when dropping a keyspace and table > -- > > Key: CASSANDRA-14888 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14888 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Ariel Weisberg >Assignee: Alex Deparvu >Priority: Urgent > Labels: patch-available > Fix For: 4.0-rc > > Attachments: CASSANDRA-14888.patch > > > CasCommit, CasPrepare, CasPropose, ReadRepairRequests, > ShortReadProtectionRequests, AntiCompactionTime, BytesValidated, > PartitionsValidated, RepairPrepareTime, RepairSyncTime, > RepairedDataInconsistencies, ViewLockAcquireTime, ViewReadTime, > WriteFailedIdealCL > Basically for 3 years people haven't known what they are doing because the > entire thing is kind of obscure. Fix it and also add a dtest that detects if > any mbeans are left behind after dropping a table and keyspace.
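The dtest the ticket asks for boils down to querying the platform MBeanServer after the drop. A minimal sketch, assuming the usual `org.apache.cassandra.metrics` domain layout (the exact ObjectName pattern and class name here are illustrative, not from the patch):

```java
// Illustrative leak check: after dropping a table, any metric MBeans still matching
// its keyspace/scope pattern were left behind and should have been unregistered.
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanLeakCheck {
    /** Returns MBeans whose ObjectName still references the given keyspace and table. */
    static Set<ObjectName> leakedBeans(String keyspace, String table) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Property-list pattern: matches any bean with these keyspace/scope properties.
        ObjectName pattern = new ObjectName(
                "org.apache.cassandra.metrics:keyspace=" + keyspace + ",scope=" + table + ",*");
        return server.queryNames(pattern, null);
    }

    public static void main(String[] args) throws Exception {
        // Outside a running Cassandra JVM this is simply empty; in the dtest it would
        // be asserted empty after DROP TABLE / DROP KEYSPACE.
        System.out.println(MBeanLeakCheck.leakedBeans("ks", "tbl").size());
    }
}
```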
[jira] [Updated] (CASSANDRA-15214) OOMs caught and not rethrown
[ https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-15214: - Fix Version/s: (was: 4.0) 4.0-rc > OOMs caught and not rethrown > > > Key: CASSANDRA-15214 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15214 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client, Messaging/Internode >Reporter: Benedict >Priority: Normal > Fix For: 4.0-rc > > Attachments: oom-experiments.zip > > > Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, > so presently there is no way to ensure that an OOM reaches the JVM handler to > trigger a crash/heapdump. > It may be that the simplest most consistent way to do this would be to have a > single thread spawned at startup that waits for any exceptions we must > propagate to the Runtime. > We could probably submit a patch upstream to Netty, but for a guaranteed > future proof approach, it may be worth paying the cost of a single thread.
[jira] [Updated] (CASSANDRA-15234) Standardise config and JVM parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-15234: - Fix Version/s: 4.0-beta > Standardise config and JVM parameters > - > > Key: CASSANDRA-15234 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15234 > Project: Cassandra > Issue Type: Bug >Reporter: Benedict >Priority: Normal > Fix For: 4.0-beta > > > We have a bunch of inconsistent names and config patterns in the codebase, > both from the yamls and JVM properties. It would be nice to standardise the > naming (such as otc_ vs internode_) as well as the provision of values with > units - while maintaining perpetual backwards compatibility with the old > parameter names, of course. > For temporal units, I would propose parsing strings with suffixes of: > {code} > u|micros(econds?)? > s(econds?)? > m(inutes?)? > h(ours?)? > d(ays?)? > mo(nths?)? > {code} > For rate units, I would propose parsing any of the standard {{B/s, KiB/s, > MiB/s, GiB/s, TiB/s}}. > Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or > powers of 1000 such as {{KB/s}}, given these are regularly used for either > their old or new definition e.g. {{KiB/s}}, or we could support them and > simply log the value in bytes/s.
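The proposed temporal suffixes can be exercised with a small parser sketch. `DurationSuffix` and `toSeconds` are made-up names, and the months-as-30-days conversion is an assumption of this sketch, not part of the ticket's proposal:

```java
// Sketch of parsing the suffix grammar proposed in CASSANDRA-15234:
// u|micros(econds?)?, s(econds?)?, m(inutes?)?, h(ours?)?, d(ays?)?, mo(nths?)?
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationSuffix {
    private static final Pattern DURATION = Pattern.compile(
        "(\\d+)\\s*(u|micros(?:econds?)?|s(?:econds?)?|m(?:inutes?)?|h(?:ours?)?|d(?:ays?)?|mo(?:nths?)?)");

    /** Converts e.g. "10m" or "2hours" to seconds; months approximated as 30 days. */
    static long toSeconds(String input) {
        Matcher m = DURATION.matcher(input.trim());
        if (!m.matches())
            throw new IllegalArgumentException("unparseable duration: " + input);
        long v = Long.parseLong(m.group(1));
        String unit = m.group(2);
        if (unit.equals("u") || unit.startsWith("micro")) return v / 1_000_000;
        if (unit.startsWith("mo")) return v * 86_400L * 30;  // check before minutes' "m"
        switch (unit.charAt(0)) {
            case 's': return v;
            case 'm': return v * 60;
            case 'h': return v * 3_600;
            case 'd': return v * 86_400;
            default:  throw new AssertionError(unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSeconds("10m"));     // 600
        System.out.println(toSeconds("2hours")); // 7200
        System.out.println(toSeconds("3mo"));    // 7776000
    }
}
```

Note the ordering concern the grammar itself raises: `mo` must be distinguished from `m`, and `micros` from `m`, before dispatching on the first character.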
[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default
[ https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-15216: - Fix Version/s: (was: 4.0) 4.0-alpha > Cross node message creation times are disabled by default > - > > Key: CASSANDRA-15216 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15216 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Benedict >Priority: Normal > Fix For: 4.0-alpha > > > This can cause a lot of wasted work for messages that have timed out on the > coordinator. We should generally assume that our users have set up NTP on > their clusters, and that clocks are modestly in sync, since it’s a > requirement for general correctness of last write wins.
[jira] [Updated] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements
[ https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14801: - Fix Version/s: (was: 4.0) 4.0-beta > calculatePendingRanges no longer safe for multiple adjacent range movements > --- > > Key: CASSANDRA-14801 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14801 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination, Legacy/Distributed Metadata >Reporter: Benedict >Priority: Normal > Fix For: 4.0-beta > > > Correctness depended upon the narrowing to a {{Set}}, > which we no longer do - we maintain a collection of all {{Replica}}. Our > {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result > contain the same endpoint multiple times; and our {{EndpointsForToken}} > obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, > resulting in cluster-wide failures for writes to the affected token ranges > for the duration of the range movement.