[jira] [Commented] (CASSANDRA-7124) Use JMX Notifications to Indicate Success/Failure of Long-Running Operations
[ https://issues.apache.org/jira/browse/CASSANDRA-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242415#comment-14242415 ]

Rajanarayanan Thottuvaikkatumana commented on CASSANDRA-7124:
-------------------------------------------------------------

[~yukim], Sure, I can do that. Can you please explain the branching part a bit? Did you mean that I should get the latest version from trunk, create a branch locally, apply all the changes, push that to a different repository such as my own GitHub repository, and share the link with you?

One other opinion: if you look at all the operations for which I am making changes, the changes are independent and the implementations are different. In other words, the changes for compact and decommission are two totally different things and can be committed separately, just as the original repair task was implemented and committed separately earlier. For that reason alone, dealing with them independently would be ideal. Moreover, if the changes land in all the implementations at once, verifying them together and making revisions after the review process will be more difficult than dealing with them one by one. So if you can have a look at the submitted patches one by one at your convenience, I think that will be easier for both of us. Feel free to be opinionated and let me know; I will do it accordingly. Thanks

Use JMX Notifications to Indicate Success/Failure of Long-Running Operations
-----------------------------------------------------------------------------

                Key: CASSANDRA-7124
                URL: https://issues.apache.org/jira/browse/CASSANDRA-7124
            Project: Cassandra
         Issue Type: Improvement
         Components: Tools
           Reporter: Tyler Hobbs
           Assignee: Rajanarayanan Thottuvaikkatumana
           Priority: Minor
             Labels: lhf
            Fix For: 3.0
        Attachments: 7124-wip.txt, cassandra-trunk-compact-7124.txt, cassandra-trunk-decommission-7124.txt

If {{nodetool cleanup}} or some other long-running operation takes too long to complete, you'll see an error like the one in CASSANDRA-2126, so you can't tell if the operation completed successfully or not. CASSANDRA-4767 fixed this for repairs with JMX notifications. We should do something similar for nodetool cleanup, compact, decommission, move, relocate, etc.
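For background, the repair-style flow from CASSANDRA-4767 that this ticket extends amounts to broadcasting a JMX notification when an operation starts, succeeds, or fails. A minimal sketch, assuming a NotificationBroadcasterSupport-backed MBean; the class name, notification type, and message format below are illustrative, not the actual StorageService API:

{code}
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative MBean fragment: broadcast the outcome of a long-running
// operation so nodetool can subscribe for completion instead of blocking
// on a JMX call that may time out (the CASSANDRA-2126 failure mode).
public class LongRunningOperationNotifier extends NotificationBroadcasterSupport
{
    private final AtomicLong sequence = new AtomicLong();

    // type and source strings are hypothetical; repair uses its own
    // notification type and session-specific messages
    public void notifyOperation(String operation, String outcome)
    {
        Notification n = new Notification("operation." + operation,
                                          "org.apache.cassandra.service",
                                          sequence.incrementAndGet(),
                                          operation + ": " + outcome);
        sendNotification(n);
    }
}
{code}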
[jira] [Commented] (CASSANDRA-8449) Allow zero-copy reads again
[ https://issues.apache.org/jira/browse/CASSANDRA-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242439#comment-14242439 ]

Benedict commented on CASSANDRA-8449:
-------------------------------------

bq. Isn't the existing use of OpOrder technically arbitrarily long due to GC for instance

Any delay caused by GC to the termination of an OpOrder.Group is instantaneous from the point of view of the waiter, since it is also delayed by GC. Either way, GC is not as arbitrarily long as I was referring to. Mostly I'm thinking about network consumers that haven't died but are, perhaps, in the process of doing so (GC death spiral), or where the network socket has frozen due to some other problem; i.e. where the problem is isolated from the rest of the host's functionality, but by being guarded by an OpOrder could conceivably cause the problem to infect the whole host's functionality. In reality we can probably guard against most of the risk, but I would still be reticent to use this scheme with that risk even minimally present, without the ramifications being constrained as they are here.

Allow zero-copy reads again
---------------------------

                Key: CASSANDRA-8449
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8449
            Project: Cassandra
         Issue Type: Improvement
           Reporter: T Jake Luciani
           Assignee: T Jake Luciani
           Priority: Minor
             Labels: performance
            Fix For: 3.0

We disabled zero-copy reads in CASSANDRA-3179 due to in-flight reads accessing a ByteBuffer when the data was unmapped by compaction. Currently this code path is only used for uncompressed reads. The actual bytes are in fact copied to the client output buffers for both netty and thrift before being sent over the wire, so the only issue really is the time it takes to process the read internally.

This patch adds a slow network read test and changes the tidy() method to actually delete an sstable only once the readTimeout has elapsed, giving plenty of time to serialize the read.

Removing this copy causes significantly less GC on the read path and improves the tail latencies: http://cstar.datastax.com/graph?stats=c0c8ce16-7fea-11e4-959d-42010af0688f&metric=gc_count&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=109.34&ymin=0&ymax=5.5
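The tidy()-with-delay idea described above reduces to scheduling the final deletion after the read timeout instead of performing it when the last reference drops. A rough sketch under that assumption; the class and executor choice are illustrative, not the patch's actual implementation:

{code}
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative tidy-style cleanup: defer deletion of an unreferenced
// sstable by the read timeout, so an in-flight zero-copy read finishes
// serializing before the mapped bytes disappear out from under it.
public final class DeferredDeleter
{
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor();

    public static void deleteAfterReadTimeout(final File sstable, long readTimeoutMillis)
    {
        SCHEDULER.schedule(new Runnable()
        {
            public void run()
            {
                if (!sstable.delete())
                    System.err.println("failed to delete " + sstable);
            }
        }, readTimeoutMillis, TimeUnit.MILLISECONDS);
    }
}
{code}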
[jira] [Commented] (CASSANDRA-8376) Add support for multiple configuration files (or conf.d)
[ https://issues.apache.org/jira/browse/CASSANDRA-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242445#comment-14242445 ]

Marcus Olsson commented on CASSANDRA-8376:
------------------------------------------

I think option 2 could be a nice feature: using cassandra.yaml as a default config and then having files with cluster- and node-specific settings separated in conf.d that override some or all of cassandra.yaml. This would also remove the need to special-case the node-specific configuration when changing things related to the whole cluster. It could also make it easier to migrate between versions in case cassandra.yaml has changed, e.g. added a new setting (assuming that cassandra.yaml is the default before upgrading).

Add support for multiple configuration files (or conf.d)
---------------------------------------------------------

                Key: CASSANDRA-8376
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8376
            Project: Cassandra
         Issue Type: New Feature
           Reporter: Omri Bahumi

I'm using Chef to generate cassandra.yaml. Part of this file is the seed_provider, which is based on the Chef inventory. Changes to this file (due to Chef inventory changes, when adding/removing Cassandra nodes) cause a restart, which is not desirable.

The Chef way of handling this is to split the config file into two config files, one containing only the seed_provider and the other containing the rest of the config. Only the latter would cause a restart to Cassandra. This is achievable by either:
1. Specifying multiple config files to Cassandra
2. Specifying a conf.d directory
[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242462#comment-14242462 ]

Benedict commented on CASSANDRA-8447:
-------------------------------------

[~yangzhe1991]: I don't think your problem is related, since it looks to me like you're running 2.1? If so, if you could file another ticket and upload a heap dump from one of your smaller nodes, its config yaml, and a full system log from startup until the problem was encountered, I'll see if I can help pinpoint the problem.

Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
--------------------------------------------------------------------------------

                Key: CASSANDRA-8447
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
            Project: Cassandra
         Issue Type: Bug
         Components: Core
        Environment: Cluster size - 4 nodes
                     Node size - 12 CPU (hyperthreaded to 24 cores), 192 GB RAM, 2 RAID 0 arrays (Data - 10 disk, spinning 10k drives | CL - 2 disk, spinning 10k drives)
                     OS - RHEL 6.5
                     JVM - Oracle 1.7.0_71
                     Cassandra version 2.0.11
           Reporter: jonathan lacefield
        Attachments: Node_with_compaction.png, Node_without_compaction.png, cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, results.tar.gz, visualvm_screenshot

Behavior - If autocompaction is enabled, nodes will become unresponsive due to a full Old Gen heap which is not cleared during CMS GC.

Test methodology - disabled autocompaction on 3 nodes, left autocompaction enabled on 1 node. Executed different Cassandra stress loads, using write-only operations. Monitored visualvm and jconsole for heap pressure. Captured iostat and dstat for most tests. Captured heap dump from the 50-thread load. Hints were disabled for testing on all nodes to alleviate GC noise due to hints backing up.

Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=<different threads tested> -schema replication\(factor=3\) keyspace=Keyspace1 -node <all nodes listed>

Data load thread count and results:
* 1 thread - Still running, but it looks like the node can sustain this load (approx 500 writes per second per node)
* 5 threads - Nodes become unresponsive due to full Old Gen heap. CMS measured in the 60-second range (approx 2k writes per second per node)
* 10 threads - Nodes become unresponsive due to full Old Gen heap. CMS measured in the 60-second range
* 50 threads - Nodes become unresponsive due to full Old Gen heap. CMS measured in the 60-second range (approx 10k writes per second per node)
* 100 threads - Nodes become unresponsive due to full Old Gen heap. CMS measured in the 60-second range (approx 20k writes per second per node)
* 200 threads - Nodes become unresponsive due to full Old Gen heap. CMS measured in the 60-second range (approx 25k writes per second per node)

Note - the observed behavior was the same for all tests except the single-threaded test, which does not appear to show this behavior.

Tested different GC and Linux OS settings with a focus on the 50- and 200-thread loads.
JVM settings tested:
# default, out of the box, env-sh settings
# 10 G Max | 1 G New - default env-sh settings
# 10 G Max | 1 G New - default env-sh settings
#* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
# 20 G Max | 10 G New
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
# 20 G Max | 1 G New
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
[jira] [Created] (CASSANDRA-8458) Avoid streaming from tmplink files
Marcus Eriksson created CASSANDRA-8458:
-------------------------------------------

             Summary: Avoid streaming from tmplink files
                 Key: CASSANDRA-8458
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8458
             Project: Cassandra
          Issue Type: Bug
            Reporter: Marcus Eriksson
            Assignee: Marcus Eriksson
             Fix For: 2.1.3

Looks like we include tmplink sstables in streams in 2.1+, and when we do, we sometimes get this error message on the receiving side: {{java.io.IOException: Corrupt input data, block did not start with 2 byte signature ('ZV') followed by type byte, 2-byte length)}}. I've only seen this happen when a tmplink sstable is included in the stream.

We cannot just exclude the tmplink files when starting the stream; we need to include the original file, which we might miss since we check whether the requested stream range intersects the sstable range.
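The substitution rule described above can be sketched as follows; TmpLinkAware and its methods are hypothetical stand-ins for the sstable reader API, not actual Cassandra classes:

{code}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the sstable reader; isTmpLink/original/intersects
// are illustrative names, not the real SSTableReader API.
interface TmpLinkAware
{
    boolean isTmpLink();
    TmpLinkAware original();              // the file the tmplink points at
    boolean intersects(Object requestedRanges);
}

final class StreamSelection
{
    // Substitute the original sstable for any tmplink *before* the
    // range-intersection check, so we never stream from a tmplink file
    // but also never drop a range the original does cover.
    static Set<TmpLinkAware> select(List<? extends TmpLinkAware> candidates, Object ranges)
    {
        Set<TmpLinkAware> selected = new HashSet<>();
        for (TmpLinkAware sstable : candidates)
        {
            TmpLinkAware source = sstable.isTmpLink() ? sstable.original() : sstable;
            if (source.intersects(ranges))
                selected.add(source);
        }
        return selected;
    }
}
{code}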
[jira] [Commented] (CASSANDRA-8376) Add support for multiple configuration files (or conf.d)
[ https://issues.apache.org/jira/browse/CASSANDRA-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242470#comment-14242470 ]

Aleksey Yeschenko commented on CASSANDRA-8376:
----------------------------------------------

We have pluggable configuration loaders now. Maybe that would help (by writing your own)?
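To make the pluggable-loader suggestion concrete: a minimal sketch, assuming Cassandra's ConfigurationLoader interface and the cassandra.config.loader system property used to select an implementation; the overlay rule and override property below are hypothetical, and a real conf.d loader would merge every file in the directory in order:

{code}
import org.apache.cassandra.config.Config;
import org.apache.cassandra.config.ConfigurationLoader;
import org.apache.cassandra.config.YamlConfigurationLoader;
import org.apache.cassandra.exceptions.ConfigurationException;

// Sketch: load the stock cassandra.yaml with the default loader, then
// overlay a node-specific setting. The overlay source below (a system
// property) is purely illustrative.
public class OverlayConfigurationLoader implements ConfigurationLoader
{
    @Override
    public Config loadConfig() throws ConfigurationException
    {
        Config config = new YamlConfigurationLoader().loadConfig();
        String clusterName = System.getProperty("cassandra.cluster_name.override");
        if (clusterName != null)
            config.cluster_name = clusterName;
        return config;
    }
}
{code}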
[jira] [Updated] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jonathan lacefield updated CASSANDRA-8447:
------------------------------------------

    Attachment: output.svg

Added a flame graph taken from the time the node started until its heap became full and the node became unresponsive. The flame graph was created using hprof and https://github.com/cykl/hprof2flamegraph. It's understood that cpu sampling with hprof can be flawed, as Brendan Gregg mentions here - http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html. We used hprof to leverage the same jvm version currently in use for Cassandra. We will provide another set of flame graphs today that show a healthy node as well as the node which has a full heap, for comparison purposes. Please note the many epoll wait items in the graph on the right-hand side.
[jira] [Commented] (CASSANDRA-7124) Use JMX Notifications to Indicate Success/Failure of Long-Running Operations
[ https://issues.apache.org/jira/browse/CASSANDRA-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242492#comment-14242492 ]

Rajanarayanan Thottuvaikkatumana commented on CASSANDRA-7124:
-------------------------------------------------------------

[~yukim], One question on the move operation. In StorageService.java there is a method {{private void move(Token newToken) throws IOException}} which requires changes to return a ListenableFuture; for that we need to make the logic of the above-mentioned method runnable, submit it, and return the ListenableFuture to its caller. Since it is not a Runnable implementation, should I go ahead and implement a class like {{PendingRangeCalculatorService.java}}, with a private static class implementing Runnable and its corresponding run method in it? Or can I include the logic of the {{private void move(Token newToken) throws IOException}} method in one of the existing classes like {{PendingRangeCalculatorService.java}}? Please confirm. Thanks
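For reference, the wrapping under discussion can be sketched with Guava's listening executors: keep the existing private move(Token) logic intact, submit it as a Runnable, and hand the resulting ListenableFuture to the caller. The class name, executor choice, and Token stand-in are illustrative:

{code}
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;
import java.io.IOException;
import java.util.concurrent.Executors;

// Sketch: the existing move logic stays private and unchanged; callers
// get a ListenableFuture to attach success/failure notifications to.
public class MoveRunner
{
    private final ListeningExecutorService executor =
            MoreExecutors.listeningDecorator(Executors.newSingleThreadExecutor());

    public ListenableFuture<?> submitMove(final Token newToken)
    {
        return executor.submit(new Runnable()
        {
            public void run()
            {
                try
                {
                    move(newToken); // existing private logic, unchanged
                }
                catch (IOException e)
                {
                    throw new RuntimeException(e);
                }
            }
        });
    }

    private void move(Token newToken) throws IOException { /* existing logic */ }

    static class Token { } // stand-in for org.apache.cassandra.dht.Token
}
{code}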
[jira] [Created] (CASSANDRA-8459) autocompaction on reads can prevent memtable space reclamation
Benedict created CASSANDRA-8459:
-----------------------------------

             Summary: autocompaction on reads can prevent memtable space reclamation
                 Key: CASSANDRA-8459
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8459
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Benedict
            Assignee: Benedict
             Fix For: 2.1.3

Memtable memory reclamation depends on reads always making progress; however, on the collectTimeOrderedData critical path it is possible for the read to perform a _write_ inline, and for this write to block waiting for memtable space to be reclaimed, while that reclamation is itself blocked waiting for the read to complete.

There are a number of solutions to this, but the simplest is to make the defragmentation happen asynchronously, so the read terminates normally.
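The shape of the simplest fix described above can be sketched in a few lines; this is an illustration of the asynchronous-defragmentation idea, not the attached 8459.txt patch:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the deadlock fix: the read path queues the defragmenting
// write instead of applying it inline, so a read can never block on
// memtable space that is itself waiting for the read to finish.
public class AsyncDefragmenter
{
    private final ExecutorService defragExecutor = Executors.newSingleThreadExecutor();

    public void maybeDefragment(final Runnable defragWrite)
    {
        // fire and forget: the read returns immediately; the write
        // applies whenever memtable space is actually available
        defragExecutor.execute(defragWrite);
    }
}
{code}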
[jira] [Updated] (CASSANDRA-8459) autocompaction on reads can prevent memtable space reclamation
[ https://issues.apache.org/jira/browse/CASSANDRA-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-8459:
--------------------------------

    Attachment: 8459.txt

Attaching simple fix.
[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242502#comment-14242502 ]

Benedict commented on CASSANDRA-8447:
-------------------------------------

[~yangzhe1991]: Your thread dump allowed me to trace the problem to CASSANDRA-8459.
[jira] [Comment Edited] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242502#comment-14242502 ]

Benedict edited comment on CASSANDRA-8447 at 12/11/14 1:20 PM:
---------------------------------------------------------------

[~yangzhe1991]: Your thread dump allowed me to trace the (your) problem to CASSANDRA-8459. This is a 2.1-specific issue, and not related to this ticket.

was (Author: benedict):
[~yangzhe1991]: Your thread dump allowed me to trace the problem to CASSANDRA-8459.
[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242506#comment-14242506 ]

Philo Yang commented on CASSANDRA-8447:
---------------------------------------

[~benedict] Got it, let's discuss this in CASSANDRA-8459.
[jira] [Commented] (CASSANDRA-8418) Queries that require allow filtering are working without it
[ https://issues.apache.org/jira/browse/CASSANDRA-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242509#comment-14242509 ]

Sylvain Lebresne commented on CASSANDRA-8418:
---------------------------------------------

Actually, I do think the initial reasoning was correct. The query *should* require {{ALLOW FILTERING}}, since the partition key is not provided. Because the index cell names start with the partition key (before the clustering column), without the partition key we do have to filter.

Queries that require allow filtering are working without it
------------------------------------------------------------

                Key: CASSANDRA-8418
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8418
            Project: Cassandra
         Issue Type: Bug
           Reporter: Philip Thompson
           Assignee: Benjamin Lerer
           Priority: Minor
            Fix For: 3.0
        Attachments: CASSANDRA-8418.txt

The trunk dtest {{cql_tests.py:TestCQL.composite_index_with_pk_test}} has begun failing after the changes to CASSANDRA-7981. With the schema
{code}
CREATE TABLE blogs (
    blog_id int,
    time1 int,
    time2 int,
    author text,
    content text,
    PRIMARY KEY (blog_id, time1, time2)
)
{code}
and {code}CREATE INDEX ON blogs(author){code}, the query {code}SELECT blog_id, content FROM blogs WHERE time1 > 0 AND author='foo'{code} now requires ALLOW FILTERING, but did not before the refactor.
[jira] [Commented] (CASSANDRA-8459) autocompaction on reads can prevent memtable space reclamation
[ https://issues.apache.org/jira/browse/CASSANDRA-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242511#comment-14242511 ]

Philo Yang commented on CASSANDRA-8459:
---------------------------------------

Hi, [~benedict], there is no node in my cluster that is unresponsive to dump the heap. But there are some hprof files dumped by +HeapDumpOnOutOfMemoryError automatically, are they helpful to you? If so I'll upload one of them.
[jira] [Commented] (CASSANDRA-8459) autocompaction on reads can prevent memtable space reclamation
[ https://issues.apache.org/jira/browse/CASSANDRA-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242513#comment-14242513 ]

Benedict commented on CASSANDRA-8459:
-------------------------------------

No need, already sussed the problem and attached the fix.
[jira] [Commented] (CASSANDRA-8456) Some valid index queries can be considered as invalid
[ https://issues.apache.org/jira/browse/CASSANDRA-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242514#comment-14242514 ]

Sylvain Lebresne commented on CASSANDRA-8456:
---------------------------------------------

I don't think that's correct, for the reason explained in CASSANDRA-8418: those queries do require filtering, since the partition key is not provided.

Some valid index queries can be considered as invalid
------------------------------------------------------

                Key: CASSANDRA-8456
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8456
            Project: Cassandra
         Issue Type: Bug
           Reporter: Benjamin Lerer
           Assignee: Benjamin Lerer

Some secondary index queries are rejected or need ALLOW FILTERING but should not be. It seems that in certain cases {{SelectStatement}} uses index filtering for clustering column restrictions while it should be using clustering column slices.

The following unit tests can be used to reproduce the problem in 3.0:
{code}
@Test
public void testMultipleClusteringWithIndex() throws Throwable
{
    createTable("CREATE TABLE %s (a int, b int, c int, d int, e int, PRIMARY KEY (a, b, c, d))");
    createIndex("CREATE INDEX ON %s (b)");
    createIndex("CREATE INDEX ON %s (e)");
    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 0, 0, 0, 0);
    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 0, 1, 0, 1);
    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 0, 1, 1, 2);
    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 1, 0, 0, 0);
    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 1, 1, 0, 1);
    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 1, 1, 1, 2);
    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 2, 0, 0, 0);

    assertRows(execute("SELECT * FROM %s WHERE (b, c) = (?, ?)", 1, 1),
               row(0, 1, 1, 0, 1),
               row(0, 1, 1, 1, 2));
}

@Test
public void testMultiplePartitionKeyAndMultiClusteringWithIndex() throws Throwable
{
    createTable("CREATE TABLE %s (a int, b int, c int, d int, e int, f int, PRIMARY KEY ((a, b), c, d, e))");
    createIndex("CREATE INDEX ON %s (c)");
    createIndex("CREATE INDEX ON %s (f)");
    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 0, 0, 0, 0);
    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 0, 1, 0, 1);
    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 0, 1, 1, 2);
    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 1, 0, 0, 3);
    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 1, 1, 0, 4);
    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 1, 1, 1, 5);
    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 2, 0, 0, 6);

    assertRows(execute("SELECT * FROM %s WHERE a = ? AND (c) IN ((?), (?)) AND f = ?", 0, 1, 2, 5),
               row(0, 0, 1, 1, 1, 5));
    assertRows(execute("SELECT * FROM %s WHERE a = ? AND (c, d) IN ((?, ?)) AND f = ?", 0, 1, 1, 5),
               row(0, 0, 1, 1, 1, 5));
    assertRows(execute("SELECT * FROM %s WHERE a = ? AND (c) = (?) AND f = ?", 0, 1, 5),
               row(0, 0, 1, 1, 1, 5));
}
{code}
[jira] [Comment Edited] (CASSANDRA-8459) autocompaction on reads can prevent memtable space reclamation
[ https://issues.apache.org/jira/browse/CASSANDRA-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242511#comment-14242511 ]

Philo Yang edited comment on CASSANDRA-8459 at 12/11/14 1:27 PM:
-----------------------------------------------------------------

Hi, [~benedict], there is no node in my cluster that is unresponsive now to dump the heap. But there are some hprof files dumped by +HeapDumpOnOutOfMemoryError automatically, are they helpful to you? If so I'll upload one of them.

was (Author: yangzhe1991):
Hi, [~benedict], there is no node in my cluster that is unresponsive to dump the heap. But there are some hprof files dumped by +HeapDumpOnOutOfMemoryError automatically, are they helpful to you? If so I'll upload one of them.
[jira] [Commented] (CASSANDRA-8459) autocompaction on reads can prevent memtable space reclamation
[ https://issues.apache.org/jira/browse/CASSANDRA-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242523#comment-14242523 ]

Philo Yang commented on CASSANDRA-8459:
---------------------------------------

Ok, Thanks!
[jira] [Commented] (CASSANDRA-4139) Add varint encoding to Messaging service
[ https://issues.apache.org/jira/browse/CASSANDRA-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242529#comment-14242529 ]

Sylvain Lebresne commented on CASSANDRA-4139:
---------------------------------------------

To offer some kind of counter-point, I don't think this ticket would require lots of effort, since we already have code to do the vint encoding/decoding. I might be missing something, but from what I can tell, it should be enough to pass the {{TypeSizes}} in {{IVersionedSerializer.serializedSize}}, plus make sure both sides agree on whether vint is enabled or not, none of which is terribly involved (nor would add much complexity to the code). And since the investment is not that big, I do think it's not completely worthless to evaluate it. It will probably not help in all cases, or even with the default configuration, but I suspect it's faster than generic compression, and so it could be interesting when you want a middle ground between no compression at all and full message compression.

Anyway, not trying to convince anyone to prioritize this in any way, but just to say that unless someone beats me to it, I do intend to give this a shot at some point in the future (especially because some parts I made in CASSANDRA-8099 would benefit more from vints than the current format probably does).

Add varint encoding to Messaging service
-----------------------------------------

                Key: CASSANDRA-4139
                URL: https://issues.apache.org/jira/browse/CASSANDRA-4139
            Project: Cassandra
         Issue Type: Sub-task
         Components: Core
           Reporter: Vijay
           Assignee: Ariel Weisberg
            Fix For: 3.0
        Attachments: 0001-CASSANDRA-4139-v1.patch, 0001-CASSANDRA-4139-v2.patch, 0001-CASSANDRA-4139-v4.patch, 0002-add-bytes-written-metric.patch, 4139-Test.rtf, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-4139-v3.patch
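For readers unfamiliar with the technique, here is a self-contained zigzag varint sketch of the kind of encoding being discussed: small magnitudes (common for sizes and deltas) take one or two bytes instead of a fixed four or eight. This is a generic illustration, not Cassandra's own vint implementation:

{code}
import java.io.ByteArrayOutputStream;

public final class VInt
{
    public static void writeVInt(long value, ByteArrayOutputStream out)
    {
        long v = (value << 1) ^ (value >> 63); // zigzag: move sign bit to LSB
        while ((v & ~0x7FL) != 0)
        {
            out.write((int) ((v & 0x7F) | 0x80)); // 7 payload bits + continuation bit
            v >>>= 7;
        }
        out.write((int) v);
    }

    public static long readVInt(byte[] in, int offset)
    {
        long v = 0;
        int shift = 0, i = offset;
        long b;
        do
        {
            b = in[i++] & 0xFF;
            v |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);    // continuation bit still set
        return (v >>> 1) ^ -(v & 1);  // undo zigzag
    }
}
{code}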
[jira] [Updated] (CASSANDRA-8261) Clean up schema metadata classes
[ https://issues.apache.org/jira/browse/CASSANDRA-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-8261:
-----------------------------------------

    Attachment: 8261-isolate-serialization-code-v2.txt

Clean up schema metadata classes
--------------------------------

                Key: CASSANDRA-8261
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8261
            Project: Cassandra
         Issue Type: Improvement
           Reporter: Aleksey Yeschenko
           Assignee: Aleksey Yeschenko
           Priority: Minor
            Fix For: 3.0
        Attachments: 8261-isolate-hadcoded-system-tables.txt, 8261-isolate-serialization-code-v2.txt, 8261-isolate-serialization-code.txt, 8261-isolate-thrift-code.txt

While working on CASSANDRA-6717, I've made some general cleanup changes to schema metadata classes, distracted from the core purpose. Also, being distracted from it by other things, every time I come back to it gives me a bit of a rebase hell. Thus I'm isolating those changes into a separate issue here, hoping to commit them one by one, before I go back and finalize CASSANDRA-6717.

The changes include:
- moving all the toThrift/fromThrift conversion code to ThriftConversion, where it belongs
- moving the compiled system CFMetaData objects away from CFMetaData (to SystemKeyspace and TracesKeyspace)
- isolating legacy toSchema/fromSchema code into a separate class (LegacySchemaTables - former DefsTables)
- refactoring CFMetaData/KSMetaData fields to match CQL CREATE TABLE syntax, and encapsulating more things in CompactionOptions/CompressionOptions/ReplicationOptions classes
- moving the definition classes to the new 'schema' package
[jira] [Commented] (CASSANDRA-8261) Clean up schema metadata classes
[ https://issues.apache.org/jira/browse/CASSANDRA-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242539#comment-14242539 ]

Aleksey Yeschenko commented on CASSANDRA-8261:
----------------------------------------------

Attached a rebased v2 with the renames. The TODO is there for CASSANDRA-6717 to resolve (all of these 8261 patches are extracts from the 6717 branch, actually). Didn't touch javadoc, b/c many of those methods will be gone (all the ones that serialize schema the old way, and some others).

This is the last 8261 patch. The rest of the work will be completed in 6717.
cassandra git commit: Fix error message on read repair timeouts
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.0 578430952 -> 451c514a3

Fix error message on read repair timeouts

patch by Sam Tunnicliffe; reviewed by Aleksey Yeschenko for CASSANDRA-7947

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/451c514a
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/451c514a
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/451c514a

Branch: refs/heads/cassandra-2.0
Commit: 451c514a3a02f4e889f040176453beefbcd75843
Parents: 5784309
Author: Sam Tunnicliffe <s...@beobal.com>
Authored: Thu Dec 11 15:17:29 2014 +0100
Committer: Aleksey Yeschenko <alek...@apache.org>
Committed: Thu Dec 11 15:17:29 2014 +0100

----------------------------------------------------------------------
 CHANGES.txt                                         |  1 +
 .../org/apache/cassandra/service/StorageProxy.java  | 16 +++-
 2 files changed, 16 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/451c514a/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 385af01..cd302fb 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.12:
+ * Fix error message on read repair timeouts (CASSANDRA-7947)
  * Default DTCS base_time_seconds changed to 60 (CASSANDRA-8417)
  * Refuse Paxos operation with more than one pending endpoint (CASSANDRA-8346)
  * Throw correct exception when trying to bind a keyspace or table

http://git-wip-us.apache.org/repos/asf/cassandra/blob/451c514a/src/java/org/apache/cassandra/service/StorageProxy.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/service/StorageProxy.java b/src/java/org/apache/cassandra/service/StorageProxy.java
index f877aee..1e1a2a3 100644
--- a/src/java/org/apache/cassandra/service/StorageProxy.java
+++ b/src/java/org/apache/cassandra/service/StorageProxy.java
@@ -1368,6 +1368,17 @@ public class StorageProxy implements StorageProxyMBean
             {
                 throw new AssertionError(e); // full data requested from each node here, no digests should be sent
             }
+            catch (ReadTimeoutException e)
+            {
+                if (Tracing.isTracing())
+                    Tracing.trace("Timed out waiting on digest mismatch repair requests");
+                else
+                    logger.debug("Timed out waiting on digest mismatch repair requests");
+                // the caught exception here will have CL.ALL from the repair command,
+                // not whatever CL the initial command was at (CASSANDRA-7947)
+                int blockFor = consistencyLevel.blockFor(Keyspace.open(command.getKeyspace()));
+                throw new ReadTimeoutException(consistencyLevel, blockFor-1, blockFor, true);
+            }

             RowDataResolver resolver = (RowDataResolver)handler.resolver;
             try
@@ -1378,7 +1389,10 @@ public class StorageProxy implements StorageProxyMBean
             }
             catch (TimeoutException e)
             {
-                Tracing.trace("Timed out on digest mismatch retries");
+                if (Tracing.isTracing())
+                    Tracing.trace("Timed out waiting on digest mismatch repair acknowledgements");
+                else
+                    logger.debug("Timed out waiting on digest mismatch repair acknowledgements");
                 int blockFor = consistencyLevel.blockFor(Keyspace.open(command.getKeyspace()));
                 throw new ReadTimeoutException(consistencyLevel, blockFor-1, blockFor, true);
             }
[jira] [Created] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS
Marcus Eriksson created CASSANDRA-8460:
-------------------------------------------

             Summary: Make it possible to move non-compacting sstables to slow/big storage in DTCS
                 Key: CASSANDRA-8460
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Marcus Eriksson

It would be nice if we could configure DTCS to have a set of extra data directories where we move the sstables once they are older than max_sstable_age_days. This would enable users to have a quick, small SSD for hot, new data, and big spinning disks for data that is rarely read and never compacted.
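A minimal sketch of the eligibility test such a feature would need, assuming the sstable's newest timestamp is available; the class and method names are illustrative:

{code}
import java.util.concurrent.TimeUnit;

// Sketch of the selection rule the ticket proposes: an sstable whose
// newest data is older than max_sstable_age_days will never be compacted
// again by DTCS, so it is safe to relocate to the big/slow directories.
public final class ColdSSTableMover
{
    public static boolean eligibleForSlowStorage(long newestTimestampMillis,
                                                 long maxSSTableAgeDays)
    {
        long cutoff = System.currentTimeMillis()
                      - TimeUnit.DAYS.toMillis(maxSSTableAgeDays);
        return newestTimestampMillis < cutoff;
    }
}
{code}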
[1/2] cassandra git commit: Fix error message on read repair timeouts
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 27c67ad85 -> 745ddd1c2

Fix error message on read repair timeouts

patch by Sam Tunnicliffe; reviewed by Aleksey Yeschenko for CASSANDRA-7947

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/451c514a
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/451c514a
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/451c514a

Branch: refs/heads/cassandra-2.1
Commit: 451c514a3a02f4e889f040176453beefbcd75843
Parents: 5784309
Author: Sam Tunnicliffe <s...@beobal.com>
Authored: Thu Dec 11 15:17:29 2014 +0100
Committer: Aleksey Yeschenko <alek...@apache.org>
Committed: Thu Dec 11 15:17:29 2014 +0100

----------------------------------------------------------------------
 CHANGES.txt                                         |  1 +
 .../org/apache/cassandra/service/StorageProxy.java  | 16 +++-
 2 files changed, 16 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
[2/2] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
Merge branch 'cassandra-2.0' into cassandra-2.1

Conflicts:
	CHANGES.txt

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/745ddd1c
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/745ddd1c
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/745ddd1c

Branch: refs/heads/cassandra-2.1
Commit: 745ddd1c2c2156a43097934941b7d160cc6a981c
Parents: 27c67ad 451c514
Author: Aleksey Yeschenko alek...@apache.org
Authored: Thu Dec 11 15:20:26 2014 +0100
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Thu Dec 11 15:20:26 2014 +0100

--
 CHANGES.txt                                         |  1 +
 .../org/apache/cassandra/service/StorageProxy.java | 16 +++-
 2 files changed, 16 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/745ddd1c/CHANGES.txt
--
diff --cc CHANGES.txt
index 25e0f47,cd302fb..71a6642
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,24 -1,5 +1,25 @@@
-2.0.12:
+2.1.3
+ * Remove tmplink files for offline compactions (CASSANDRA-8321)
+ * Reduce maxHintsInProgress (CASSANDRA-8415)
+ * BTree updates may call provided update function twice (CASSANDRA-8018)
+ * Release sstable references after anticompaction (CASSANDRA-8386)
+ * Handle abort() in SSTableRewriter properly (CASSANDRA-8320)
+ * Fix high size calculations for prepared statements (CASSANDRA-8231)
+ * Centralize shared executors (CASSANDRA-8055)
+ * Fix filtering for CONTAINS (KEY) relations on frozen collection
+   clustering columns when the query is restricted to a single
+   partition (CASSANDRA-8203)
+ * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
+ * Add more log info if readMeter is null (CASSANDRA-8238)
+ * add check of the system wall clock time at startup (CASSANDRA-8305)
+ * Support for frozen collections (CASSANDRA-7859)
+ * Fix overflow on histogram computation (CASSANDRA-8028)
+ * Have paxos reuse the timestamp generation of normal queries (CASSANDRA-7801)
+ * Fix incremental repair not remove parent session on remote (CASSANDRA-8291)
+ * Improve JBOD disk utilization (CASSANDRA-7386)
+ * Log failed host when preparing incremental repair (CASSANDRA-8228)
+Merged from 2.0:
+ * Fix error message on read repair timeouts (CASSANDRA-7947)
  * Default DTCS base_time_seconds changed to 60 (CASSANDRA-8417)
  * Refuse Paxos operation with more than one pending endpoint (CASSANDRA-8346)
  * Throw correct exception when trying to bind a keyspace or table

http://git-wip-us.apache.org/repos/asf/cassandra/blob/745ddd1c/src/java/org/apache/cassandra/service/StorageProxy.java
--
[1/3] cassandra git commit: Fix error message on read repair timeouts
Repository: cassandra
Updated Branches: refs/heads/trunk 6ce8b3fcb -> 857de5540

Fix error message on read repair timeouts

patch by Sam Tunnicliffe; reviewed by Aleksey Yeschenko for CASSANDRA-7947

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/451c514a
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/451c514a
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/451c514a

Branch: refs/heads/trunk
Commit: 451c514a3a02f4e889f040176453beefbcd75843
Parents: 5784309
Author: Sam Tunnicliffe s...@beobal.com
Authored: Thu Dec 11 15:17:29 2014 +0100
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Thu Dec 11 15:17:29 2014 +0100

--
 CHANGES.txt                                         |  1 +
 .../org/apache/cassandra/service/StorageProxy.java | 16 +++-
 2 files changed, 16 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/451c514a/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 385af01..cd302fb 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.12:
+ * Fix error message on read repair timeouts (CASSANDRA-7947)
  * Default DTCS base_time_seconds changed to 60 (CASSANDRA-8417)
  * Refuse Paxos operation with more than one pending endpoint (CASSANDRA-8346)
  * Throw correct exception when trying to bind a keyspace or table

http://git-wip-us.apache.org/repos/asf/cassandra/blob/451c514a/src/java/org/apache/cassandra/service/StorageProxy.java
--
diff --git a/src/java/org/apache/cassandra/service/StorageProxy.java b/src/java/org/apache/cassandra/service/StorageProxy.java
index f877aee..1e1a2a3 100644
--- a/src/java/org/apache/cassandra/service/StorageProxy.java
+++ b/src/java/org/apache/cassandra/service/StorageProxy.java
@@ -1368,6 +1368,17 @@ public class StorageProxy implements StorageProxyMBean
             {
                 throw new AssertionError(e); // full data requested from each node here, no digests should be sent
             }
+            catch (ReadTimeoutException e)
+            {
+                if (Tracing.isTracing())
+                    Tracing.trace("Timed out waiting on digest mismatch repair requests");
+                else
+                    logger.debug("Timed out waiting on digest mismatch repair requests");
+                // the caught exception here will have CL.ALL from the repair command,
+                // not whatever CL the initial command was at (CASSANDRA-7947)
+                int blockFor = consistencyLevel.blockFor(Keyspace.open(command.getKeyspace()));
+                throw new ReadTimeoutException(consistencyLevel, blockFor-1, blockFor, true);
+            }

             RowDataResolver resolver = (RowDataResolver)handler.resolver;
             try
@@ -1378,7 +1389,10 @@ public class StorageProxy implements StorageProxyMBean
         }
         catch (TimeoutException e)
         {
-            Tracing.trace("Timed out on digest mismatch retries");
+            if (Tracing.isTracing())
+                Tracing.trace("Timed out waiting on digest mismatch repair acknowledgements");
+            else
+                logger.debug("Timed out waiting on digest mismatch repair acknowledgements");
             int blockFor = consistencyLevel.blockFor(Keyspace.open(command.getKeyspace()));
             throw new ReadTimeoutException(consistencyLevel, blockFor-1, blockFor, true);
         }
[3/3] cassandra git commit: Merge branch 'cassandra-2.1' into trunk
Merge branch 'cassandra-2.1' into trunk

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/857de554
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/857de554
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/857de554

Branch: refs/heads/trunk
Commit: 857de554066ecbe507ebab13b9cfe6a2749c403f
Parents: 6ce8b3f 745ddd1
Author: Aleksey Yeschenko alek...@apache.org
Authored: Thu Dec 11 15:20:56 2014 +0100
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Thu Dec 11 15:20:56 2014 +0100

--
 CHANGES.txt                                         |  1 +
 .../org/apache/cassandra/service/StorageProxy.java | 16 +++-
 2 files changed, 16 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/857de554/CHANGES.txt
--
http://git-wip-us.apache.org/repos/asf/cassandra/blob/857de554/src/java/org/apache/cassandra/service/StorageProxy.java
--
[2/3] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1
Merge branch 'cassandra-2.0' into cassandra-2.1

Conflicts:
	CHANGES.txt

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/745ddd1c
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/745ddd1c
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/745ddd1c

Branch: refs/heads/trunk
Commit: 745ddd1c2c2156a43097934941b7d160cc6a981c
Parents: 27c67ad 451c514
Author: Aleksey Yeschenko alek...@apache.org
Authored: Thu Dec 11 15:20:26 2014 +0100
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Thu Dec 11 15:20:26 2014 +0100

--
 CHANGES.txt                                         |  1 +
 .../org/apache/cassandra/service/StorageProxy.java | 16 +++-
 2 files changed, 16 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/745ddd1c/CHANGES.txt
--
diff --cc CHANGES.txt
index 25e0f47,cd302fb..71a6642
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,24 -1,5 +1,25 @@@
-2.0.12:
+2.1.3
+ * Remove tmplink files for offline compactions (CASSANDRA-8321)
+ * Reduce maxHintsInProgress (CASSANDRA-8415)
+ * BTree updates may call provided update function twice (CASSANDRA-8018)
+ * Release sstable references after anticompaction (CASSANDRA-8386)
+ * Handle abort() in SSTableRewriter properly (CASSANDRA-8320)
+ * Fix high size calculations for prepared statements (CASSANDRA-8231)
+ * Centralize shared executors (CASSANDRA-8055)
+ * Fix filtering for CONTAINS (KEY) relations on frozen collection
+   clustering columns when the query is restricted to a single
+   partition (CASSANDRA-8203)
+ * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
+ * Add more log info if readMeter is null (CASSANDRA-8238)
+ * add check of the system wall clock time at startup (CASSANDRA-8305)
+ * Support for frozen collections (CASSANDRA-7859)
+ * Fix overflow on histogram computation (CASSANDRA-8028)
+ * Have paxos reuse the timestamp generation of normal queries (CASSANDRA-7801)
+ * Fix incremental repair not remove parent session on remote (CASSANDRA-8291)
+ * Improve JBOD disk utilization (CASSANDRA-7386)
+ * Log failed host when preparing incremental repair (CASSANDRA-8228)
+Merged from 2.0:
+ * Fix error message on read repair timeouts (CASSANDRA-7947)
  * Default DTCS base_time_seconds changed to 60 (CASSANDRA-8417)
  * Refuse Paxos operation with more than one pending endpoint (CASSANDRA-8346)
  * Throw correct exception when trying to bind a keyspace or table

http://git-wip-us.apache.org/repos/asf/cassandra/blob/745ddd1c/src/java/org/apache/cassandra/service/StorageProxy.java
--
[jira] [Updated] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jonathan lacefield updated CASSANDRA-8447: -- Attachment: output.1.svg output.2.svg

output.1.svg represents the unhealthy node (compaction enabled); output.2.svg represents the healthy node (no compaction). These flame graphs were created to compare healthy and unhealthy nodes. They were captured after clearing out all CL replays (stop DSE, start DSE, flush nodes, stop DSE, restart DSE), after which we validated through system.log that CL replay was not occurring. Both flame graphs come from the same test execution.

Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled --- Key: CASSANDRA-8447 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447 Project: Cassandra Issue Type: Bug Components: Core Environment: Cluster size - 4 nodes Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives) OS - RHEL 6.5 jvm - oracle 1.7.0_71 Cassandra version 2.0.11 Reporter: jonathan lacefield Attachments: Node_with_compaction.png, Node_without_compaction.png, cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, output.1.svg, output.2.svg, output.svg, results.tar.gz, visualvm_screenshot

Behavior - If autocompaction is enabled, nodes will become unresponsive due to a full Old Gen heap which is not cleared during CMS GC.

Test methodology - disabled autocompaction on 3 nodes, left autocompaction enabled on 1 node. Executed different Cassandra stress loads, using write only operations. Monitored visualvm and jconsole for heap pressure. Captured iostat and dstat for most tests. Captured heap dump from 50 thread load. Hints were disabled for testing on all nodes to alleviate GC noise due to hints backing up.

Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=different threads tested -schema replication\(factor=3\) keyspace=Keyspace1 -node all nodes listed

Data load thread count and results:
* 1 thread - Still running but looks like the node can sustain this load (approx 500 writes per second per node)
* 5 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 2k writes per second per node)
* 10 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range
* 50 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 10k writes per second per node)
* 100 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 20k writes per second per node)
* 200 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 25k writes per second per node)

Note - the observed behavior was the same for all tests except for the single threaded test. The single threaded test does not appear to show this behavior.

Tested different GC and Linux OS settings with a focus on the 50 and 200 thread loads.

JVM settings tested:
# default, out of the box, env-sh settings
# 10 G Max | 1 G New - default env-sh settings
# 10 G Max | 1 G New - default env-sh settings
#* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
# 20 G Max | 10 G New
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
# 20 G Max | 1 G New
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
[jira] [Updated] (CASSANDRA-8419) NPE in SelectStatement
[ https://issues.apache.org/jira/browse/CASSANDRA-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8419: --- Labels: qa-resolved (was: ) NPE in SelectStatement -- Key: CASSANDRA-8419 URL: https://issues.apache.org/jira/browse/CASSANDRA-8419 Project: Cassandra Issue Type: Bug Reporter: Philip Thompson Assignee: Benjamin Lerer Labels: qa-resolved Fix For: 3.0 Attachments: CASSANDRA-8419.txt

The dtest {{cql_tests.py:TestCQL.empty_in_test}} is failing in trunk with a NullPointerException. The stack trace is:
{code}
ERROR [SharedPool-Worker-1] 2014-12-03 16:24:16,274 ErrorMessage.java:243 - Unexpected exception during request
java.lang.NullPointerException: null
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213) ~[guava-16.0.jar:na]
	at com.google.common.collect.Lists$TransformingSequentialList.<init>(Lists.java:525) ~[guava-16.0.jar:na]
	at com.google.common.collect.Lists.transform(Lists.java:508) ~[guava-16.0.jar:na]
	at org.apache.cassandra.db.composites.Composites.toByteBuffers(Composites.java:45) ~[main/:na]
	at org.apache.cassandra.cql3.restrictions.SingleColumnPrimaryKeyRestrictions.values(SingleColumnPrimaryKeyRestrictions.java:257) ~[main/:na]
	at org.apache.cassandra.cql3.restrictions.StatementRestrictions.getPartitionKeys(StatementRestrictions.java:362) ~[main/:na]
	at org.apache.cassandra.cql3.statements.SelectStatement.getSliceCommands(SelectStatement.java:296) ~[main/:na]
	at org.apache.cassandra.cql3.statements.SelectStatement.getPageableCommand(SelectStatement.java:205) ~[main/:na]
	at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:165) ~[main/:na]
	at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:72) ~[main/:na]
	at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:239) ~[main/:na]
	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:261) ~[main/:na]
	at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:118) ~[main/:na]
	at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) [main/:na]
	at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [main/:na]
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
	at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
	at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_67]
	at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [main/:na]
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
{code}
The error occurred while executing {{SELECT v FROM test_compact WHERE k1 IN ()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
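[Editor's note] For context on the trace: Guava's {{Lists.transform}} rejects a null input list up front, which is consistent with the NPE originating in {{Preconditions.checkNotNull}}. A minimal standalone illustration of that failure mode (illustrative only, not the Cassandra call site):
{code}
import java.util.List;

import com.google.common.base.Functions;
import com.google.common.collect.Lists;

public class TransformNullDemo
{
    public static void main(String[] args)
    {
        // Passing a null list mirrors what happens when the empty IN ()
        // restriction leaves the composites list unset upstream.
        Lists.transform((List<Object>) null, Functions.identity()); // NPE via Preconditions.checkNotNull
    }
}
{code}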
[jira] [Updated] (CASSANDRA-8321) SStablesplit behavior changed
[ https://issues.apache.org/jira/browse/CASSANDRA-8321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8321: --- Labels: qa-resolved (was: ) SStablesplit behavior changed - Key: CASSANDRA-8321 URL: https://issues.apache.org/jira/browse/CASSANDRA-8321 Project: Cassandra Issue Type: Bug Reporter: Philip Thompson Assignee: Marcus Eriksson Priority: Minor Labels: qa-resolved Fix For: 2.1.3 Attachments: 0001-ccm-fix-file-finding.patch, 0001-remove-tmplink-for-offline-compactions.patch The dtest sstablesplit_test.py has begun failing due to an incorrect number of sstables being created after running sstablesplit. http://cassci.datastax.com/job/cassandra-2.1_dtest/559/changes#detail1 is the run where the failure began. In 2.1.x, the test expects 7 sstables to be created after split, but instead 12 are being created. All of the data is there, and the sstables add up to the expected size, so this simply may be a change in default behavior. The test runs sstablesplit without the --size argument, and the default has not changed, so it is unexpected that the behavior would change in a minor point release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7124) Use JMX Notifications to Indicate Success/Failure of Long-Running Operations
[ https://issues.apache.org/jira/browse/CASSANDRA-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242677#comment-14242677 ] Yuki Morishita commented on CASSANDRA-7124: --- bq. Did you mean to say that, get the latest version from trunk, create a branch locally, apply all the changes and then push that to a different repository such as to my own Github repository and share the link with you? Yes. Trunk code will likely change often, so keeping your work in a branch is preferable. At this point, I think it will be easier to work and review if we create subtasks. Compaction-related tasks (scrub, upgradesstables, compact, etc.) and other tasks like move, decommission, etc. need different code, so why don't we focus on the compaction-related tasks first? For the latter question above, I prefer keeping classes small and focused on their own responsibility. You can just go ahead and implement whatever you think is good. Use JMX Notifications to Indicate Success/Failure of Long-Running Operations Key: CASSANDRA-7124 URL: https://issues.apache.org/jira/browse/CASSANDRA-7124 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Tyler Hobbs Assignee: Rajanarayanan Thottuvaikkatumana Priority: Minor Labels: lhf Fix For: 3.0 Attachments: 7124-wip.txt, cassandra-trunk-compact-7124.txt, cassandra-trunk-decommission-7124.txt If {{nodetool cleanup}} or some other long-running operation takes too long to complete, you'll see an error like the one in CASSANDRA-2126, so you can't tell if the operation completed successfully or not. CASSANDRA-4767 fixed this for repairs with JMX notifications. We should do something similar for nodetool cleanup, compact, decommission, move, relocate, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8337) mmap underflow during validation compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242680#comment-14242680 ] Philip Thompson commented on CASSANDRA-8337: [~sterligovak], any chance you can attach one of the corrupt sstables? That would help with reproducing this and #8061, which you also ran into. Thank you.

mmap underflow during validation compaction --- Key: CASSANDRA-8337 URL: https://issues.apache.org/jira/browse/CASSANDRA-8337 Project: Cassandra Issue Type: Bug Reporter: Alexander Sterligov Assignee: Joshua McKenzie Fix For: 2.1.3 Attachments: 8337_v1.txt, thread_dump

During full parallel repair I often get errors like the following
{quote}
[2014-11-19 01:02:39,355] Repair session 116beaf0-6f66-11e4-afbb-c1c082008cbe for range (3074457345618263602,-9223372036854775808] failed with error org.apache.cassandra.exceptions.RepairException: [repair #116beaf0-6f66-11e4-afbb-c1c082008cbe on iss/target_state_history, (3074457345618263602,-9223372036854775808]] Validation failed in /95.108.242.19
{quote}
The node's log always shows the same exceptions:
{quote}
ERROR [ValidationExecutor:2] 2014-11-19 01:02:10,847 JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: mmap segment underflow; remaining is 15 but 47 requested
	at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1518) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1385) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:1315) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1706) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1694) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:276) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getScanners(WrappingCompactionStrategy.java:320) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:917) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:97) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:557) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
	at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.io.IOException: mmap segment underflow; remaining is 15 but 47 requested
	at org.apache.cassandra.io.util.MappedFileDataInput.readBytes(MappedFileDataInput.java:135) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:348) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:327) ~[apache-cassandra-2.1.2.jar:2.1.2]
	at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1460) ~[apache-cassandra-2.1.2.jar:2.1.2]
	... 13 common frames omitted
{quote}
Now I'm using the 'die' disk_failure_policy to detect such conditions faster, but I get them even with the 'stop' policy. Streams involving a host that hit this exception hang; a thread dump is attached. Only a restart helps. On retry I get errors from other nodes. scrub doesn't help and reports that the sstables are ok. Sequential repairs don't cause such exceptions. Load is about 1000 write rps and 50 read rps per node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8452) Add missing systems to FBUtilities.isUnix, add FBUtilities.isWindows
[ https://issues.apache.org/jira/browse/CASSANDRA-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston updated CASSANDRA-8452: --- Attachment: CASSANDRA-8452-v2.patch It looks like there's already a patch in the works for 2.1 in CASSANDRA-6993; should I close this as a duplicate? If not, +1 on calculating at startup and calling it posix. The attached v2 patch determines the OS on class initialization and renames {{isUnix}} to {{isPosix}}. It also replaces a few {{!FBUtilities.isUnix()}} calls with {{FBUtilities.isWindows()}} where the comments indicate the check exists to support Windows. Also, IMO isPosix implies that the system is POSIX compliant, so I just changed it to isPosix, but let me know if isPosixCompliant is strongly preferred and I'll rename it. Add missing systems to FBUtilities.isUnix, add FBUtilities.isWindows Key: CASSANDRA-8452 URL: https://issues.apache.org/jira/browse/CASSANDRA-8452 Project: Cassandra Issue Type: Bug Reporter: Blake Eggleston Assignee: Blake Eggleston Priority: Minor Fix For: 2.1.3 Attachments: CASSANDRA-8452-v2.patch, CASSANDRA-8452.patch The isUnix method leaves out a few unix systems, which, after the changes in CASSANDRA-8136, causes some unexpected behavior during shutdown. It would also be clearer if FBUtilities had an isWindows method for branching into Windows-specific logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
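[Editor's note] A minimal sketch of the determine-once-at-class-initialization approach described above (illustrative only, not the patch itself; the actual change enumerates the specific unix variants Cassandra supports):
{code}
// Sketch: compute the platform once when the class loads, instead of
// re-parsing the os.name property on every call.
public final class PlatformSketch
{
    private static final String OS = System.getProperty("os.name").toLowerCase();

    public static final boolean isWindows = OS.contains("windows");
    // Treating "not Windows" as POSIX is a simplification for illustration.
    public static final boolean isPosix = !isWindows;
}
{code}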
[jira] [Commented] (CASSANDRA-8459) autocompaction on reads can prevent memtable space reclaimation
[ https://issues.apache.org/jira/browse/CASSANDRA-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242734#comment-14242734 ] Jonathan Ellis commented on CASSANDRA-8459: --- +1 Do we also need this in 2.0? autocompaction on reads can prevent memtable space reclaimation - Key: CASSANDRA-8459 URL: https://issues.apache.org/jira/browse/CASSANDRA-8459 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Benedict Fix For: 2.1.3 Attachments: 8459.txt Memtable memory reclamation depends on reads always making progress; however, on the collectTimeOrderedData critical path it is possible for the read to perform a _write_ inline, and for this write to block waiting for memtable space to be reclaimed, while that reclamation is itself blocked waiting for this read to complete. There are a number of solutions to this, but the simplest is to make the defragmentation happen asynchronously, so the read terminates normally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
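[Editor's note] A sketch of the shape of the fix described above, assuming a hypothetical single-thread executor for the defragmenting write (the attached 8459.txt is authoritative; this only illustrates moving the write off the read path):
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncDefragSketch
{
    private static final ExecutorService DEFRAG = Executors.newSingleThreadExecutor();

    // Instead of applying the defragmenting write inline (which can block the
    // read on memtable space that the read itself is holding up), hand it off
    // so the read terminates normally.
    static void scheduleDefrag(final Runnable defragWrite)
    {
        DEFRAG.execute(defragWrite);
    }
}
{code}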
[jira] [Commented] (CASSANDRA-8459) autocompaction on reads can prevent memtable space reclaimation
[ https://issues.apache.org/jira/browse/CASSANDRA-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242738#comment-14242738 ] Benedict commented on CASSANDRA-8459: - It's probably not a *bad idea* for 2.0 as it stops a read touching the write path, but it isn't necessary for correctness. autocompaction on reads can prevent memtable space reclaimation - Key: CASSANDRA-8459 URL: https://issues.apache.org/jira/browse/CASSANDRA-8459 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Benedict Fix For: 2.1.3 Attachments: 8459.txt Memtable memory reclamation depends on reads always making progress; however, on the collectTimeOrderedData critical path it is possible for the read to perform a _write_ inline, and for this write to block waiting for memtable space to be reclaimed, while that reclamation is itself blocked waiting for this read to complete. There are a number of solutions to this, but the simplest is to make the defragmentation happen asynchronously, so the read terminates normally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8459) autocompaction on reads can prevent memtable space reclaimation
[ https://issues.apache.org/jira/browse/CASSANDRA-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242747#comment-14242747 ] Jonathan Ellis commented on CASSANDRA-8459: --- Let's leave 2.0 alone then. autocompaction on reads can prevent memtable space reclaimation - Key: CASSANDRA-8459 URL: https://issues.apache.org/jira/browse/CASSANDRA-8459 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Benedict Fix For: 2.1.3 Attachments: 8459.txt Memtable memory reclamation depends on reads always making progress; however, on the collectTimeOrderedData critical path it is possible for the read to perform a _write_ inline, and for this write to block waiting for memtable space to be reclaimed, while that reclamation is itself blocked waiting for this read to complete. There are a number of solutions to this, but the simplest is to make the defragmentation happen asynchronously, so the read terminates normally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242743#comment-14242743 ] Jonathan Ellis commented on CASSANDRA-8447: --- I don't think this is *the* problem, but I suggest patching with CASSANDRA-8164 just to eliminate those effects from the equation. Have you tried bisecting with earlier C* releases? "Let's throw stress at this" isn't exactly an untested scenario.

Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled --- Key: CASSANDRA-8447 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447 Project: Cassandra Issue Type: Bug Components: Core Environment: Cluster size - 4 nodes Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives) OS - RHEL 6.5 jvm - oracle 1.7.0_71 Cassandra version 2.0.11 Reporter: jonathan lacefield Attachments: Node_with_compaction.png, Node_without_compaction.png, cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, output.1.svg, output.2.svg, output.svg, results.tar.gz, visualvm_screenshot

Behavior - If autocompaction is enabled, nodes will become unresponsive due to a full Old Gen heap which is not cleared during CMS GC.

Test methodology - disabled autocompaction on 3 nodes, left autocompaction enabled on 1 node. Executed different Cassandra stress loads, using write only operations. Monitored visualvm and jconsole for heap pressure. Captured iostat and dstat for most tests. Captured heap dump from 50 thread load. Hints were disabled for testing on all nodes to alleviate GC noise due to hints backing up.

Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=different threads tested -schema replication\(factor=3\) keyspace=Keyspace1 -node all nodes listed

Data load thread count and results:
* 1 thread - Still running but looks like the node can sustain this load (approx 500 writes per second per node)
* 5 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 2k writes per second per node)
* 10 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range
* 50 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 10k writes per second per node)
* 100 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 20k writes per second per node)
* 200 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 25k writes per second per node)

Note - the observed behavior was the same for all tests except for the single threaded test. The single threaded test does not appear to show this behavior.

Tested different GC and Linux OS settings with a focus on the 50 and 200 thread loads.

JVM settings tested:
# default, out of the box, env-sh settings
# 10 G Max | 1 G New - default env-sh settings
# 10 G Max | 1 G New - default env-sh settings
#* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
# 20 G Max | 10 G New
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
# 20 G Max | 1 G New
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242779#comment-14242779 ] Benedict commented on CASSANDRA-8447: - The problem is pretty simple: MeteredFlusher runs on StorageService.optionalTasks, and other events that run there can take a long time - in particular hint delivery scheduling, which is preceded by a blocking compaction of the hints table, during which no other optional task can make progress. MeteredFlusher should have its own dedicated thread, as responding promptly is essential; under this workload, running every couple of seconds is pretty much necessary to avoid a rapid, catastrophic build-up of state in memtables.

Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled --- Key: CASSANDRA-8447 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447 Project: Cassandra Issue Type: Bug Components: Core Environment: Cluster size - 4 nodes Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives) OS - RHEL 6.5 jvm - oracle 1.7.0_71 Cassandra version 2.0.11 Reporter: jonathan lacefield Attachments: Node_with_compaction.png, Node_without_compaction.png, cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, output.1.svg, output.2.svg, output.svg, results.tar.gz, visualvm_screenshot

Behavior - If autocompaction is enabled, nodes will become unresponsive due to a full Old Gen heap which is not cleared during CMS GC.

Test methodology - disabled autocompaction on 3 nodes, left autocompaction enabled on 1 node. Executed different Cassandra stress loads, using write only operations. Monitored visualvm and jconsole for heap pressure. Captured iostat and dstat for most tests. Captured heap dump from 50 thread load. Hints were disabled for testing on all nodes to alleviate GC noise due to hints backing up.

Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=different threads tested -schema replication\(factor=3\) keyspace=Keyspace1 -node all nodes listed

Data load thread count and results:
* 1 thread - Still running but looks like the node can sustain this load (approx 500 writes per second per node)
* 5 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 2k writes per second per node)
* 10 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range
* 50 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 10k writes per second per node)
* 100 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 20k writes per second per node)
* 200 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 25k writes per second per node)

Note - the observed behavior was the same for all tests except for the single threaded test. The single threaded test does not appear to show this behavior.

Tested different GC and Linux OS settings with a focus on the 50 and 200 thread loads.

JVM settings tested:
# default, out of the box, env-sh settings
# 10 G Max | 1 G New - default env-sh settings
# 10 G Max | 1 G New - default env-sh settings
#* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
# 20 G Max | 10 G New
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
# 20 G Max | 1 G New
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
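[Editor's note] A sketch of the dedicated-thread idea from the comment above, with hypothetical names and an illustrative interval (the point is only that flush decisions stop competing with long-running optionalTasks work like hint compaction):
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FlusherThreadSketch
{
    // Give the flusher its own thread so a blocking hints compaction on
    // optionalTasks can no longer delay memtable flush decisions.
    static ScheduledExecutorService startFlusher(final Runnable meteredFlusher)
    {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleWithFixedDelay(meteredFlusher, 1, 1, TimeUnit.SECONDS);
        return exec;
    }
}
{code}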
[jira] [Comment Edited] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242779#comment-14242779 ] Benedict edited comment on CASSANDRA-8447 at 12/11/14 4:59 PM: --- The problem is pretty simple: MeteredFlusher runs on StorageService.optionalTasks, and other events that run there can take a long time - in particular hint delivery scheduling, which is preceded by a blocking compaction of the hints table, during which no other optional task can make progress. MeteredFlusher should have its own dedicated thread, as responding promptly is essential; under this workload, running every couple of seconds is pretty much necessary to avoid a rapid, catastrophic build-up of state in memtables. (edit: in case there's any ambiguity, this isn't a hypothesis. The heap dump clearly shows optionalTasks blocked waiting on the result of a FutureTask executing a runnable defined in CompactionManager (as far as I can tell, in submitUserDefined); the current live memtable is retaining 6M records at 6Gb of retained heap, so MeteredFlusher hasn't had its turn in a long time) was (Author: benedict): The problem is pretty simple: MeteredFlusher runs on StorageService.optionalTasks, and other events that run there can take a long time - in particular hint delivery scheduling, which is preceded by a blocking compaction of the hints table, during which no other optional task can make progress. MeteredFlusher should have its own dedicated thread, as responding promptly is essential; under this workload, running every couple of seconds is pretty much necessary to avoid a rapid, catastrophic build-up of state in memtables.

Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled --- Key: CASSANDRA-8447 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447 Project: Cassandra Issue Type: Bug Components: Core Environment: Cluster size - 4 nodes Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives) OS - RHEL 6.5 jvm - oracle 1.7.0_71 Cassandra version 2.0.11 Reporter: jonathan lacefield Attachments: Node_with_compaction.png, Node_without_compaction.png, cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, output.1.svg, output.2.svg, output.svg, results.tar.gz, visualvm_screenshot

Behavior - If autocompaction is enabled, nodes will become unresponsive due to a full Old Gen heap which is not cleared during CMS GC.

Test methodology - disabled autocompaction on 3 nodes, left autocompaction enabled on 1 node. Executed different Cassandra stress loads, using write only operations. Monitored visualvm and jconsole for heap pressure. Captured iostat and dstat for most tests. Captured heap dump from 50 thread load. Hints were disabled for testing on all nodes to alleviate GC noise due to hints backing up.

Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=different threads tested -schema replication\(factor=3\) keyspace=Keyspace1 -node all nodes listed

Data load thread count and results:
* 1 thread - Still running but looks like the node can sustain this load (approx 500 writes per second per node)
* 5 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 2k writes per second per node)
* 10 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range
* 50 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 10k writes per second per node)
* 100 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 20k writes per second per node)
* 200 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 25k writes per second per node)

Note - the observed behavior was the same for all tests except for the single threaded test. The single threaded test does not appear to show this behavior.

Tested different GC and Linux OS settings with a focus on the 50 and 200 thread loads.

JVM settings tested:
# default, out of the box, env-sh settings
# 10 G Max | 1 G New - default env-sh settings
# 10 G Max | 1 G New - default env-sh settings
#* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
# 20 G Max | 10 G New
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
[jira] [Updated] (CASSANDRA-8452) Add missing systems to FBUtilities.isUnix, add FBUtilities.isWindows
[ https://issues.apache.org/jira/browse/CASSANDRA-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-8452: --- Reviewer: Joshua McKenzie Add missing systems to FBUtilities.isUnix, add FBUtilities.isWindows Key: CASSANDRA-8452 URL: https://issues.apache.org/jira/browse/CASSANDRA-8452 Project: Cassandra Issue Type: Bug Reporter: Blake Eggleston Assignee: Blake Eggleston Priority: Minor Fix For: 2.1.3 Attachments: CASSANDRA-8452-v2.patch, CASSANDRA-8452.patch The isUnix method leaves out a few unix systems, which, after the changes in CASSANDRA-8136, causes some unexpected behavior during shutdown. It would also be clearer if FBUtilities had an isWindows method for branching into Windows specific logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8458) Avoid streaming from tmplink files
[ https://issues.apache.org/jira/browse/CASSANDRA-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242832#comment-14242832 ] Benedict commented on CASSANDRA-8458: - We could also try and figure out how/why this happens, as it should be able to stream safely. Does it only happen if streaming a range that wraps zero (i.e. from +X, to -Y)? Avoid streaming from tmplink files -- Key: CASSANDRA-8458 URL: https://issues.apache.org/jira/browse/CASSANDRA-8458 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Assignee: Marcus Eriksson Fix For: 2.1.3 Looks like we include tmplink sstables in streams in 2.1+, and when we do, sometimes we get this error message on the receiving side: {{java.io.IOException: Corrupt input data, block did not start with 2 byte signature ('ZV') followed by type byte, 2-byte length)}}. I've only seen this happen when a tmplink sstable is included in the stream. We can not just exclude the tmplink files when starting the stream - we need to include the original file, which we might miss since we check if the requested stream range intersects the sstable range. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8458) Avoid streaming from tmplink files
[ https://issues.apache.org/jira/browse/CASSANDRA-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242832#comment-14242832 ] Benedict edited comment on CASSANDRA-8458 at 12/11/14 5:45 PM: --- We could also try to figure out how/why this happens, as it should be possible to stream safely. Does it only happen when streaming a range that wraps zero (i.e. from +X to -Y)? edit: To elaborate, I suspect the broken bit is that our dfile/ifile objects don't actually truncate the readable range - only our indexed decoratedkey range is truncated. In sstable.getPositionsForRanges we just return the end of the file if the range goes past the range of the file; in this case we could stream partially written data. If so, we could fix it by simply making sstable.getPositionsForRanges() look up the start position of the last key in the file, and always ensuring we leave a key's overlap between the dropped sstables and the replacement. was (Author: benedict): We could also try to figure out how/why this happens, as it should be possible to stream safely. Does it only happen when streaming a range that wraps zero (i.e. from +X to -Y)? Avoid streaming from tmplink files -- Key: CASSANDRA-8458 URL: https://issues.apache.org/jira/browse/CASSANDRA-8458 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Assignee: Marcus Eriksson Fix For: 2.1.3 Looks like we include tmplink sstables in streams in 2.1+, and when we do, sometimes we get this error message on the receiving side: {{java.io.IOException: Corrupt input data, block did not start with 2 byte signature ('ZV') followed by type byte, 2-byte length)}}. I've only seen this happen when a tmplink sstable is included in the stream. We cannot just exclude the tmplink files when starting the stream - we need to include the original file, which we might miss since we check if the requested stream range intersects the sstable range. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
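[Editor's note] A sketch of the bounding idea suggested in the comment above, with hypothetical names (lastKeyStart standing in for the start position of the last fully written key); the real logic lives in the sstable's index handling, this only shows the clamp:
{code}
// If a requested range runs past what has been fully written (as with a
// tmplink file still being rewritten), clamp the stream's end position to
// the start of the last complete key rather than the raw end of file.
public final class StreamBoundSketch
{
    static long clampEndPosition(long requestedEnd, long lastKeyStart, long endOfFile)
    {
        return Math.min(Math.min(requestedEnd, lastKeyStart), endOfFile);
    }
}
{code}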
[jira] [Commented] (CASSANDRA-8390) The process cannot access the file because it is being used by another process
[ https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242835#comment-14242835 ] Alexander Radzin commented on CASSANDRA-8390: - FYI: I have Windows Defender on my computer. I have just tried to turn it off and got the same result. The process cannot access the file because it is being used by another process -- Key: CASSANDRA-8390 URL: https://issues.apache.org/jira/browse/CASSANDRA-8390 Project: Cassandra Issue Type: Bug Reporter: Ilya Komolkin Assignee: Joshua McKenzie Fix For: 2.1.3

21:46:27.810 [NonPeriodicTasks:1] ERROR o.a.c.service.CassandraDaemon - Exception in thread Thread[NonPeriodicTasks:1,5,main]
org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db: The process cannot access the file because it is being used by another process.
	at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) ~[cassandra-all-2.1.1.jar:2.1.1]
	at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) ~[cassandra-all-2.1.1.jar:2.1.1]
	at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) ~[cassandra-all-2.1.1.jar:2.1.1]
	at org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94) ~[cassandra-all-2.1.1.jar:2.1.1]
	at org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) ~[cassandra-all-2.1.1.jar:2.1.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_71]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_71]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) ~[na:1.7.0_71]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) ~[na:1.7.0_71]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_71]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.nio.file.FileSystemException: E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db: The process cannot access the file because it is being used by another process.
	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[na:1.7.0_71]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[na:1.7.0_71]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) ~[na:1.7.0_71]
	at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269) ~[na:1.7.0_71]
	at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103) ~[na:1.7.0_71]
	at java.nio.file.Files.delete(Files.java:1079) ~[na:1.7.0_71]
	at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) ~[cassandra-all-2.1.1.jar:2.1.1]
	... 11 common frames omitted
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
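[Editor's note] The failure mode above is Windows refusing to unlink a file while another handle is still open on it. A minimal, self-contained illustration (not Cassandra code; the real case involves sstable handles and mmapped segments held by in-flight readers):
{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class WindowsDeleteDemo
{
    public static void main(String[] args) throws IOException
    {
        Path p = Files.createTempFile("demo", ".db");
        try (FileInputStream in = new FileInputStream(p.toFile()))
        {
            // On Windows this throws FileSystemException ("being used by
            // another process") because the open handle was not created with
            // delete sharing; on POSIX systems the unlink simply succeeds.
            Files.delete(p);
        }
    }
}
{code}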
[1/2] cassandra git commit: Support for user-defined aggregate functions
Repository: cassandra
Updated Branches: refs/heads/trunk 857de5540 -> e2f35c767

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e2f35c76/src/java/org/apache/cassandra/cql3/statements/DropAggregateStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/DropAggregateStatement.java b/src/java/org/apache/cassandra/cql3/statements/DropAggregateStatement.java
new file mode 100644
index 000..118f89d
--- /dev/null
+++ b/src/java/org/apache/cassandra/cql3/statements/DropAggregateStatement.java
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql3.statements;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.cassandra.auth.Permission;
+import org.apache.cassandra.cql3.CQL3Type;
+import org.apache.cassandra.cql3.functions.*;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.exceptions.InvalidRequestException;
+import org.apache.cassandra.exceptions.RequestValidationException;
+import org.apache.cassandra.exceptions.UnauthorizedException;
+import org.apache.cassandra.service.ClientState;
+import org.apache.cassandra.service.MigrationManager;
+import org.apache.cassandra.thrift.ThriftValidation;
+import org.apache.cassandra.transport.Event;
+
+/**
+ * A <code>DROP AGGREGATE</code> statement parsed from a CQL query.
+ */
+public final class DropAggregateStatement extends SchemaAlteringStatement
+{
+    private FunctionName functionName;
+    private final boolean ifExists;
+    private final List<CQL3Type.Raw> argRawTypes;
+    private final boolean argsPresent;
+
+    public DropAggregateStatement(FunctionName functionName,
+                                  List<CQL3Type.Raw> argRawTypes,
+                                  boolean argsPresent,
+                                  boolean ifExists)
+    {
+        this.functionName = functionName;
+        this.argRawTypes = argRawTypes;
+        this.argsPresent = argsPresent;
+        this.ifExists = ifExists;
+    }
+
+    public void prepareKeyspace(ClientState state) throws InvalidRequestException
+    {
+        if (!functionName.hasKeyspace() && state.getRawKeyspace() != null)
+            functionName = new FunctionName(state.getKeyspace(), functionName.name);
+
+        if (!functionName.hasKeyspace())
+            throw new InvalidRequestException("Functions must be fully qualified with a keyspace name if a keyspace is not set for the session");
+
+        ThriftValidation.validateKeyspaceNotSystem(functionName.keyspace);
+    }
+
+    public void checkAccess(ClientState state) throws UnauthorizedException, InvalidRequestException
+    {
+        // TODO CASSANDRA-7557 (function DDL permission)
+
+        state.hasKeyspaceAccess(functionName.keyspace, Permission.DROP);
+    }
+
+    public void validate(ClientState state) throws RequestValidationException
+    {
+    }
+
+    public Event.SchemaChange changeEvent()
+    {
+        return null;
+    }
+
+    public boolean announceMigration(boolean isLocalOnly) throws RequestValidationException
+    {
+        List<Function> olds = Functions.find(functionName);
+
+        if (!argsPresent && olds != null && olds.size() > 1)
+            throw new InvalidRequestException(String.format("'DROP AGGREGATE %s' matches multiple function definitions; " +
+                                                            "specify the argument types by issuing a statement like " +
+                                                            "'DROP AGGREGATE %s (type, type, ...)'. Hint: use cqlsh " +
+                                                            "'DESCRIBE AGGREGATE %s' command to find all overloads",
+                                                            functionName, functionName, functionName));
+
+        List<AbstractType<?>> argTypes = new ArrayList<>(argRawTypes.size());
+        for (CQL3Type.Raw rawType : argRawTypes)
+            argTypes.add(rawType.prepare(functionName.keyspace).getType());
+
+        Function old;
+        if (argsPresent)
+        {
+            old = Functions.find(functionName, argTypes);
+            if (old == null || !(old
[2/2] cassandra git commit: Support for user-defined aggregate functions
Support for user-defined aggregate functions Patch by Robert Stupp; reviewed by Tyler Hobbs for CASSANDRA-8053 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e2f35c76 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e2f35c76 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e2f35c76 Branch: refs/heads/trunk Commit: e2f35c767e479da9761628578299b54872d7eea9 Parents: 857de55 Author: Robert Stupp sn...@snazy.de Authored: Thu Dec 11 11:46:28 2014 -0600 Committer: Tyler Hobbs ty...@datastax.com Committed: Thu Dec 11 11:46:28 2014 -0600 -- CHANGES.txt | 1 + pylib/cqlshlib/cql3handling.py | 28 +- src/java/org/apache/cassandra/auth/Auth.java| 12 + .../org/apache/cassandra/config/KSMetaData.java | 1 + src/java/org/apache/cassandra/cql3/Cql.g| 61 ++ .../apache/cassandra/cql3/QueryProcessor.java | 15 + .../cql3/functions/AbstractFunction.java| 10 + .../cassandra/cql3/functions/AggregateFcts.java | 64 +- .../cql3/functions/AggregateFunction.java | 10 +- .../cassandra/cql3/functions/Function.java | 4 + .../cassandra/cql3/functions/FunctionCall.java | 2 +- .../cassandra/cql3/functions/Functions.java | 24 +- .../cql3/functions/JavaSourceUDFFactory.java| 6 +- .../cassandra/cql3/functions/UDAggregate.java | 280 .../cassandra/cql3/functions/UDFunction.java| 193 ++ .../cassandra/cql3/functions/UDHelper.java | 123 .../selection/AbstractFunctionSelector.java | 4 +- .../selection/AggregateFunctionSelector.java| 6 +- .../cassandra/cql3/selection/FieldSelector.java | 2 +- .../cassandra/cql3/selection/Selection.java | 8 +- .../cassandra/cql3/selection/Selector.java | 2 +- .../cql3/selection/SelectorFactories.java | 2 +- .../statements/CreateAggregateStatement.java| 194 ++ .../statements/CreateFunctionStatement.java | 11 +- .../cql3/statements/DropAggregateStatement.java | 136 .../cql3/statements/DropFunctionStatement.java | 17 +- .../org/apache/cassandra/db/DefsTables.java | 89 ++- .../org/apache/cassandra/db/SystemKeyspace.java | 21 +- .../cassandra/service/IMigrationListener.java | 3 + .../cassandra/service/MigrationManager.java | 45 +- .../org/apache/cassandra/transport/Server.java | 12 + .../apache/cassandra/cql3/AggregationTest.java | 640 ++- .../org/apache/cassandra/cql3/CQLTester.java| 14 + test/unit/org/apache/cassandra/cql3/UFTest.java | 8 - 34 files changed, 1795 insertions(+), 253 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/e2f35c76/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 34e740e..6ff61e7 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.0 + * Support for user-defined aggregation functions (CASSANDRA-8053) * Fix NPE in SelectStatement with empty IN values (CASSANDRA-8419) * Refactor SelectStatement, return IN results in natural order instead of IN value list order (CASSANDRA-7981) http://git-wip-us.apache.org/repos/asf/cassandra/blob/e2f35c76/pylib/cqlshlib/cql3handling.py -- diff --git a/pylib/cqlshlib/cql3handling.py b/pylib/cqlshlib/cql3handling.py index f8a3069..84af796 100644 --- a/pylib/cqlshlib/cql3handling.py +++ b/pylib/cqlshlib/cql3handling.py @@ -41,7 +41,7 @@ class Cql3ParsingRuleSet(CqlParsingRuleSet): 'select', 'from', 'where', 'and', 'key', 'insert', 'update', 'with', 'limit', 'using', 'use', 'set', 'begin', 'apply', 'batch', 'truncate', 'delete', 'in', 'create', -'function', 'keyspace', 'schema', 'columnfamily', 'table', 'index', 'on', 'drop', +'function', 'aggregate', 'keyspace', 'schema', 'columnfamily', 'table', 'index', 'on', 
'drop', 'primary', 'into', 'values', 'timestamp', 'ttl', 'alter', 'add', 'type', 'compact', 'storage', 'order', 'by', 'asc', 'desc', 'clustering', 'token', 'writetime', 'map', 'list', 'to', 'custom', 'if', 'not' @@ -209,7 +209,10 @@ JUNK ::= /([ \t\r\f\v]+|(--|[/][/])[^\n\r]*([\n\r]|$)|[/][*].*?[*][/])/ ; mapLiteral ::= { term : term ( , term : term )* } ; -functionName ::= identifier ( . identifier )? +userFunctionName ::= identifier ( . identifier )? + ; + +functionName ::= userFunctionName | TOKEN ; @@ -233,12 +236,14 @@ JUNK ::= /([ \t\r\f\v]+|(--|[/][/])[^\n\r]*([\n\r]|$)|[/][*].*?[*][/])/ ; | createIndexStatement | createUserTypeStatement |
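As a quick orientation for this commit: the patch adds CQL-level syntax for user-defined aggregates. The following is a rough sketch of that surface driven through the DataStax Java driver; the keyspace, table, and function names are invented, and the exact grammar may differ from what this commit accepts (later releases also added a null-input clause to function definitions).
{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class UdaSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
            Session session = cluster.connect("ks");  // hypothetical keyspace
            // State function: folds each input value into the running state.
            session.execute("CREATE FUNCTION ks.sum_state(state bigint, val int) "
                          + "RETURNS bigint LANGUAGE java AS 'return state + val;'");
            // Aggregate: wires the state function to a state type and an initial condition.
            session.execute("CREATE AGGREGATE ks.my_sum(int) "
                          + "SFUNC sum_state STYPE bigint INITCOND 0");
            // The new aggregate is then usable in a selection clause:
            session.execute("SELECT my_sum(val) FROM ks.t");  // hypothetical table
        }
    }
}
{code}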
[jira] [Updated] (CASSANDRA-8053) Support for user defined aggregate functions
[ https://issues.apache.org/jira/browse/CASSANDRA-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-8053: --- Attachment: 8053-final.txt Support for user defined aggregate functions Key: CASSANDRA-8053 URL: https://issues.apache.org/jira/browse/CASSANDRA-8053 Project: Cassandra Issue Type: New Feature Reporter: Robert Stupp Assignee: Robert Stupp Labels: cql, udf Fix For: 3.0 Attachments: 8053-final.txt, 8053v1.txt, 8053v2.txt CASSANDRA-4914 introduces aggregate functions. This ticket is about deciding how we can support user-defined aggregate functions. UD aggregate functions should be supported for all UDF flavors (class, java, jsr223). Things to consider:
* Special implementations for each scripting language should be omitted
* No exposure of internal APIs (e.g. {{AggregateFunction}} interface)
* No need for users to deal with serializers / codecs
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7124) Use JMX Notifications to Indicate Success/Failure of Long-Running Operations
[ https://issues.apache.org/jira/browse/CASSANDRA-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242851#comment-14242851 ] Rajanarayanan Thottuvaikkatumana commented on CASSANDRA-7124: - [~yukim], Please find the branch and the commit reference for the compact task - https://github.com/rnamboodiri/cassandra/commit/3e1a49c511eb23d0e9b5bc854de1316d3be9fd86 Please review it and let me know. Since I have done the decommission part as well, I will commit that also and send you the link. From then on, I will focus on the compaction-related tasks (scrub, upgradesstables, etc.). Thanks Use JMX Notifications to Indicate Success/Failure of Long-Running Operations Key: CASSANDRA-7124 URL: https://issues.apache.org/jira/browse/CASSANDRA-7124 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Tyler Hobbs Assignee: Rajanarayanan Thottuvaikkatumana Priority: Minor Labels: lhf Fix For: 3.0 Attachments: 7124-wip.txt, cassandra-trunk-compact-7124.txt, cassandra-trunk-decommission-7124.txt If {{nodetool cleanup}} or some other long-running operation takes too long to complete, you'll see an error like the one in CASSANDRA-2126, so you can't tell if the operation completed successfully or not. CASSANDRA-4767 fixed this for repairs with JMX notifications. We should do something similar for nodetool cleanup, compact, decommission, move, relocate, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
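For readers unfamiliar with the mechanism being extended here: CASSANDRA-4767 reports repair progress by sending JMX notifications from an MBean, which nodetool subscribes to instead of relying on a long-lived RMI call. A minimal sketch of that pattern using only the standard javax.management API; the class name and notification type are illustrative, not the actual patch:
{code}
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;

public class OperationNotifier extends NotificationBroadcasterSupport {
    private final AtomicLong sequence = new AtomicLong();

    /** Emit a JMX notification so nodetool can report success/failure instead of timing out. */
    public void notifyCompletion(String operation, boolean success) {
        Notification n = new Notification(
                "operation",                  // notification type (illustrative)
                this,                         // source MBean
                sequence.incrementAndGet(),   // monotonically increasing sequence number
                operation + (success ? " completed successfully" : " failed"));
        sendNotification(n);
    }
}
{code}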
[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242852#comment-14242852 ] Jonathan Ellis commented on CASSANDRA-8447: --- There is a patch on CASSANDRA-8285 that moves hint delivery off of optionalTasks to leave it free for MeteredFlusher. Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled --- Key: CASSANDRA-8447 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447 Project: Cassandra Issue Type: Bug Components: Core Environment: Cluster size - 4 nodes Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives) OS - RHEL 6.5 jvm - oracle 1.7.0_71 Cassandra version 2.0.11 Reporter: jonathan lacefield Attachments: Node_with_compaction.png, Node_without_compaction.png, cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, output.1.svg, output.2.svg, output.svg, results.tar.gz, visualvm_screenshot
Behavior - If autocompaction is enabled, nodes will become unresponsive due to a full Old Gen heap which is not cleared during CMS GC.
Test methodology - disabled autocompaction on 3 nodes, left autocompaction enabled on 1 node. Executed different Cassandra stress loads, using write only operations. Monitored visualvm and jconsole for heap pressure. Captured iostat and dstat for most tests. Captured heap dump from 50 thread load. Hints were disabled for testing on all nodes to alleviate GC noise due to hints backing up.
Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=different threads tested -schema replication\(factor=3\) keyspace=Keyspace1 -node all nodes listed
Data load thread count and results:
* 1 thread - Still running but looks like the node can sustain this load (approx 500 writes per second per node)
* 5 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 2k writes per second per node)
* 10 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range
* 50 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 10k writes per second per node)
* 100 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 20k writes per second per node)
* 200 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 25k writes per second per node)
Note - the observed behavior was the same for all tests except for the single threaded test. The single threaded test does not appear to show this behavior.
Tested different GC and Linux OS settings with a focus on the 50 and 200 thread loads.
JVM settings tested:
# default, out of the box, env-sh settings
# 10 G Max | 1 G New - default env-sh settings
# 10 G Max | 1 G New - default env-sh settings
#* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
# 20 G Max | 10 G New
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
# 20 G Max | 1 G New
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
JVM_OPTS=$JVM_OPTS
[jira] [Commented] (CASSANDRA-8457) nio MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242859#comment-14242859 ] Benedict commented on CASSANDRA-8457: - FTR, I strongly doubt _context switching_ is actually as much of a problem as we think, although constraining it is never a bad thing. The big hit we have is _thread signalling_ costs, which is a different but related beast. Certainly the talking point that raised this was discussing system time spent serving context switches, which would definitely be referring to signalling, not the switching itself. Now, we do use a BlockingQueue for OutboundTcpConnection, which will incur these costs; however, I strongly suspect the impact will be much lower than predicted - especially as the testing done to flag this up was on small clusters with RF=1, where these threads would not be being exercised at all. The costs of going to the network itself are likely to exceed the context switching costs, and naturally permit messages to accumulate in the queue, reducing the number of signals actually needed. There are also the negative performance implications we have found from small numbers of connections under NIO to consider, so that this change could have significant downsides for the majority of deployed clusters (although if we get batching in the client driver we may see these penalties disappear). To establish if there's likely a benefit to exploit, we could most likely refactor this code comparatively minimally (compared to rewriting to NIO/Netty) to make use of the SharedExecutorPool to establish if such a positive effect is indeed to be had, as this would reduce the number of threads in flight to those actually serving work on the OTCs. This wouldn't affect the ITC, but I am dubious of their contribution. We should probably also actually test if this is indeed a problem for clusters at scale performing in-memory CL1 reads. nio MessagingService Key: CASSANDRA-8457 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Labels: performance Fix For: 3.0 Thread-per-peer (actually two each, incoming and outbound) is a big contributor to context switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
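To make the signalling point concrete: a blocking consumer only pays a wakeup signal when it actually sleeps, and a take-then-drain loop claims everything that accumulated during the previous write without any further signals. A simplified sketch of that pattern (not the actual OutboundTcpConnection code; the message type is a placeholder):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BatchingWriter<M> implements Runnable {
    private final BlockingQueue<M> queue = new LinkedBlockingQueue<>();

    void enqueue(M message) { queue.add(message); }  // may signal a sleeping consumer

    @Override
    public void run() {
        List<M> batch = new ArrayList<>();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                batch.add(queue.take());  // blocks: the only point that needs a signal
                queue.drainTo(batch);     // grab whatever accumulated, signal-free
                for (M m : batch) write(m);
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void write(M message) { /* serialize to the socket */ }
}
{code}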
[jira] [Commented] (CASSANDRA-7937) Apply backpressure gently when overloaded with writes
[ https://issues.apache.org/jira/browse/CASSANDRA-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242880#comment-14242880 ] Michaël Figuière commented on CASSANDRA-7937: - The StreamIDs, introduced in the native protocol to multiplex several pending requests on a single connection, could actually serve as a backpressure mechanism. Before protocol v2 we had just 128 IDs per connection, with drivers typically allowing just a few connections per node. This therefore already acts as a throttling mechanism on the client side. With protocol v3 we've increased this limit, but the driver still lets the user define a value for the max requests per host that will have the same effect. A simple way to handle backpressure could therefore be to introduce a Window (similar to the TCP Window) of the currently allowed concurrent requests for each client. Just like in TCP, the Window Size could be included in each response header to the client. This Window Size could then be adjusted using a magic formula, yet to be defined, probably based on the load of each Stage of the Cassandra architecture, state of compaction, etc... I agree with [~jbellis]'s point: backpressure in a distributed system like Cassandra, with a coordinator forwarding traffic to replicas, is confusing. But in practice, most recent CQL Drivers now do Token Aware Balancing by default (since 2.0.2 in the Java Driver), which will send the queries to the replicas for any PreparedStatement (expected to be used under the high pressure condition described here). So in this situation the backpressure information received by the client could be used properly, as it would just be understood by the client as a request to slow down for *this* particular replica; it could therefore pick another replica. Thus we end up with a system in which we avoid doing Load Shedding (which is a waste of time, bandwidth and workload) and that, I believe, could behave more smoothly when the cluster is overloaded. Note that this StreamID Window could be considered as a mandatory limit or just as a hint in the protocol specification. The driver could then adjust its strategy to use it or not depending on the settings or type of request. Apply backpressure gently when overloaded with writes - Key: CASSANDRA-7937 URL: https://issues.apache.org/jira/browse/CASSANDRA-7937 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0 Reporter: Piotr Kołaczkowski Labels: performance When writing huge amounts of data into C* cluster from analytic tools like Hadoop or Apache Spark, we can see that often C* can't keep up with the load. This is because analytic tools typically write data as fast as they can in parallel, from many nodes, and they are not artificially rate-limited, so C* is the bottleneck here. Also, increasing the number of nodes doesn't really help, because in a collocated setup this also increases the number of Hadoop/Spark nodes (writers) and although possible write performance is higher, the problem still remains. We observe the following behavior:
1. data is ingested at an extreme fast pace into memtables and flush queue fills up
2. the available memory limit for memtables is reached and writes are no longer accepted
3. the application gets hit by write timeout, and retries repeatedly, in vain
4. after several failed attempts to write, the job gets aborted
Desired behaviour:
1. data is ingested at an extreme fast pace into memtables and flush queue fills up
2. after exceeding some memtable fill threshold, C* applies adaptive rate limiting to writes - the more the buffers are filled up, the fewer writes/s are accepted, however writes still occur within the write timeout.
3. thanks to slowed down data ingestion, now flush can finish before all the memory gets used
Of course, the details of how rate limiting could be done are up for discussion. It may also be worth considering putting such logic into the driver, not C* core, but then C* needs to expose at least the following information to the driver, so we could calculate the desired maximum data rate:
1. current amount of memory available for writes before they would completely block
2. total amount of data queued to be flushed and flush progress (amount of data to flush remaining for the memtable currently being flushed)
3. average flush write speed
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
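A rough sketch of what the driver-side half of such a window could look like, assuming the server advertises a window size on each response; the class name and the shrink/grow bookkeeping are hypothetical, not part of any existing driver API:
{code}
import java.util.concurrent.Semaphore;

class RequestWindow {
    private final Semaphore permits;
    private int size;     // last window size advertised by the server
    private int deficit;  // permits still to absorb after the window shrank

    RequestWindow(int initialSize) {
        this.size = initialSize;
        this.permits = new Semaphore(initialSize);
    }

    /** Block until the window has room for one more in-flight request. */
    void beforeRequest() throws InterruptedException {
        permits.acquire();
    }

    /** Called once per response; the response header would carry the new window size. */
    synchronized void onResponse(int advertisedSize) {
        int delta = advertisedSize - size;
        size = advertisedSize;
        int toRelease = 1 + delta - deficit;  // the 1 is the permit this request held
        if (toRelease > 0) {
            deficit = 0;
            permits.release(toRelease);       // steady state or growing window
        } else {
            deficit = -toRelease;             // shrinking: swallow permits as requests finish
        }
    }
}
{code}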
[jira] [Commented] (CASSANDRA-8418) Queries that require allow filtering are working without it
[ https://issues.apache.org/jira/browse/CASSANDRA-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242878#comment-14242878 ] Benjamin Lerer commented on CASSANDRA-8418: --- What is happening is that non-matching clustering columns are filtered out from the index row in {{CompositesSearcher.getIndexedIterator}}, as it is more efficient than loading the row and then filtering. This filtering can only work today if the clustering columns have been specified as a slice, and it causes the behavior to not always be consistent, especially if some indices exist on the clustering columns. I agree that this behaviour is just a filtering optimization and that ALLOW FILTERING should be required. Queries that require allow filtering are working without it --- Key: CASSANDRA-8418 URL: https://issues.apache.org/jira/browse/CASSANDRA-8418 Project: Cassandra Issue Type: Bug Reporter: Philip Thompson Assignee: Benjamin Lerer Priority: Minor Fix For: 3.0 Attachments: CASSANDRA-8418.txt The trunk dtest {{cql_tests.py:TestCQL.composite_index_with_pk_test}} has begun failing after the changes to CASSANDRA-7981. With the schema {code}CREATE TABLE blogs ( blog_id int, time1 int, time2 int, author text, content text, PRIMARY KEY (blog_id, time1, time2)){code} and {code}CREATE INDEX ON blogs(author){code}, then the query {code}SELECT blog_id, content FROM blogs WHERE time1 > 0 AND author='foo'{code} now requires ALLOW FILTERING, but did not before the refactor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
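For reference, once the filtering requirement is enforced, the dtest query from the description would have to spell the clause out. A sketch using the DataStax Java driver; the keyspace name ks is assumed:
{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class AllowFilteringExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
            Session session = cluster.connect();
            // Without ALLOW FILTERING this would (after the fix) be rejected, because
            // time1 > 0 filters on a clustering column through the 'author' index.
            ResultSet rows = session.execute(
                "SELECT blog_id, content FROM ks.blogs "
              + "WHERE time1 > 0 AND author = 'foo' ALLOW FILTERING");
            rows.forEach(row -> System.out.println(row.getInt("blog_id")));
        }
    }
}
{code}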
[jira] [Commented] (CASSANDRA-8370) cqlsh doesn't handle LIST statements correctly
[ https://issues.apache.org/jira/browse/CASSANDRA-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242889#comment-14242889 ] Tyler Hobbs commented on CASSANDRA-8370: bq. v2 wfm, one tiny nit - is it worth checking the querystring first to avoid unnecessary parsing? I couldn't see a case where we'd want to parse here just to validate the query, but I could be missing something. +1, it's better to avoid parsing if we can. cqlsh doesn't handle LIST statements correctly -- Key: CASSANDRA-8370 URL: https://issues.apache.org/jira/browse/CASSANDRA-8370 Project: Cassandra Issue Type: Bug Reporter: Sam Tunnicliffe Assignee: Sam Tunnicliffe Priority: Minor Labels: cqlsh Fix For: 2.1.3 Attachments: 8370.txt, 8370v2.patch {{LIST USERS}} and {{LIST PERMISSIONS}} statements are not handled correctly by cqlsh in 2.1 (since CASSANDRA-6307). Running such a query results in errors along the lines of:
{noformat}
sam@easy:~/projects/cassandra$ bin/cqlsh --debug -u cassandra -p cassandra
Using CQL driver: module 'cassandra' from '/home/sam/projects/cassandra/bin/../lib/cassandra-driver-internal-only-2.1.2.zip/cassandra-driver-2.1.2/cassandra/__init__.py'
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.2-SNAPSHOT | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cassandra@cqlsh> list users;
Traceback (most recent call last):
  File "bin/cqlsh", line 879, in onecmd
    self.handle_statement(st, statementtext)
  File "bin/cqlsh", line 920, in handle_statement
    return self.perform_statement(cqlruleset.cql_extract_orig(tokens, srcstr))
  File "bin/cqlsh", line 953, in perform_statement
    result = self.perform_simple_statement(stmt)
  File "bin/cqlsh", line 989, in perform_simple_statement
    self.print_result(rows, self.parse_for_table_meta(statement.query_string))
  File "bin/cqlsh", line 970, in parse_for_table_meta
    return self.get_table_meta(ks, cf)
  File "bin/cqlsh", line 732, in get_table_meta
    ksmeta = self.get_keyspace_meta(ksname)
  File "bin/cqlsh", line 717, in get_keyspace_meta
    raise KeyspaceNotFound('Keyspace %r not found.' % ksname)
KeyspaceNotFound: Keyspace None not found.
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8461) java.lang.AssertionError when running select queries
Chamila Dilshan Wijayarathna created CASSANDRA-8461: --- Summary: java.lang.AssertionError when running select queries Key: CASSANDRA-8461 URL: https://issues.apache.org/jira/browse/CASSANDRA-8461 Project: Cassandra Issue Type: Bug Components: API Environment: ubuntu 14.04 Reporter: Chamila Dilshan Wijayarathna I have a column family with following schema.
CREATE TABLE corpus.trigram_category_ordered_frequency (
id bigint,
word1 varchar,
word2 varchar,
word3 varchar,
category varchar,
frequency int,
PRIMARY KEY(category,frequency,word1,word2,word3)
);
When I run
select word1,word2,word3 from corpus.trigram_category_ordered_frequency where category IN ('N','A','C','S','G') order by frequency DESC LIMIT 10;
I am getting error saying
ErrorMessage code= [Server error] message=java.lang.AssertionError
But when I ran
select * from corpus.trigram_category_ordered_frequency where category IN ('N','A','C','S','G') order by frequency DESC LIMIT 10;
it works without any error. system log for this error is as follows.
ERROR [SharedPool-Worker-1] 2014-12-11 20:42:20,152 Message.java:538 - Unexpected exception during request; channel = [id: 0xea57d8b6, /127.0.0.1:35624 => /127.0.0.1:9042]
java.lang.AssertionError: null
at org.apache.cassandra.cql3.ResultSet.addRow(ResultSet.java:63) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.cql3.statements.Selection$ResultSetBuilder.newRow(Selection.java:333) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:1227) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:1161) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:290) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:267) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:215) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:64) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:226) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:248) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) ~[apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) [apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [apache-cassandra-2.1.2.jar:2.1.2]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_72]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [apache-cassandra-2.1.2.jar:2.1.2]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.2.jar:2.1.2]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7937) Apply backpressure gently when overloaded with writes
[ https://issues.apache.org/jira/browse/CASSANDRA-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242880#comment-14242880 ] Michaël Figuière edited comment on CASSANDRA-7937 at 12/11/14 6:37 PM: --- The StreamIDs, introduced in the native protocol to multiplex several pending requests on a single connection, could actually serve as a backpressure mechanism. Before protocol v2 we had just 128 IDs per connection, with drivers typically allowing just a few connections per node. This therefore already acts as a throttling mechanism on the client side. With protocol v3 we've increased this limit, but the driver still lets the user define a value for the max requests per host that will have the same effect. A simple way to handle backpressure could therefore be to introduce a Window (similar to the TCP Window) of the currently allowed concurrent requests for each client. Just like in TCP, the Window Size could be included in each response header to the client. This Window Size could then be adjusted using a magic formula, yet to be defined, probably based on the load of each Stage of the Cassandra architecture, state of compaction, etc... I agree with [~jbellis]'s point: backpressure in a distributed system like Cassandra, with a coordinator forwarding traffic to replicas, is confusing. But in practice, most recent CQL Drivers now do Token Aware Balancing by default (since 2.0.2 in the Java Driver), which will send the request to the replicas for any PreparedStatement (expected to be used under the high pressure condition described here). So in this situation the backpressure information received by the client could be used properly, as it would just be understood by the client as a request to slow down for *this* particular replica; it could therefore pick another replica. Thus we end up with a system in which we avoid doing Load Shedding (which is a waste of time, bandwidth and workload) and that, I believe, could behave more smoothly when the cluster is overloaded. Note that this StreamID Window could be considered as a mandatory limit or just as a hint in the protocol specification. The driver could then adjust its strategy to use it or not depending on the settings or type of request. was (Author: mfiguiere): The StreamIDs, introduced in the native protocol to multiplex several pending requests on a single connection, could actually serve as a backpressure mechanism. Before protocol v2 we had just 128 IDs per connection with drivers typically allowing just a few connection per node. This therefore already acts as a throttling mechanism on the client side. With protocol v3 we've increased this limit but the driver still let the user define a value for the max requests per host that will have the same effect. A simple way the handle backpressure could therefore be to introduce a Window (similar as TCP Window) of the currently allowed concurrent requests for each client. Just like in TCP, the Window Size could be included in each response header to the client. This Window Size could then be adjusted using a magic formula to define, probably based on the load of each Stage of the Cassandra architecture, state of compaction, etc... I agree with [~jbellis]'s point: backpressure in a distributed system like Cassandra, with a coordinator fowarding traffic to replicas, is confusing.
But in practice, most recent CQL Drivers now do Token Aware Balancing by default (since 2.0.2 in the Java Driver), which will send the queries to the replicas any PreparedStatement (expected to be used under the high pressure condition described here). So in this situation the backpressure information received by the client could be used properly, as it would just be understood by the client as a request to slow down for *this* particular replica, it could therefore pick another replica. Thus we end up with a system in which we avoid doing Load Shedding (which is a waste of time, bandwidth and workload) and that, I believe, could behave more smoothly when the cluster is overloaded. Note that this StreamID Window could be considered as a mandatory limit or just as a hint in the protocol specification. The driver could then adjust its strategy to use it or not depending on the settings or type of request. Apply backpressure gently when overloaded with writes - Key: CASSANDRA-7937 URL: https://issues.apache.org/jira/browse/CASSANDRA-7937 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0 Reporter: Piotr Kołaczkowski Labels: performance When writing huge amounts of data into C* cluster from analytic tools like Hadoop or Apache Spark, we can see that often C* can't keep up with the load. This is
[jira] [Updated] (CASSANDRA-8461) java.lang.AssertionError when running select queries
[ https://issues.apache.org/jira/browse/CASSANDRA-8461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8461: --- Description: I have a column family with following schema. CREATE TABLE corpus.trigram_category_ordered_frequency ( id bigint, word1 varchar, word2 varchar, word3 varchar, category varchar, frequency int, PRIMARY KEY(category,frequency,word1,word2,word3) ); When I run select word1,word2,word3 from corpus.trigram_category_ordered_frequency where category IN ('N','A','C','S','G') order by frequency DESC LIMIT 10; I am getting error saying ErrorMessage code= [Server error] message=java.lang.AssertionError But when I ran select * from corpus.trigram_category_ordered_frequency where category IN ('N','A','C','S','G') order by frequency DESC LIMIT 10; it works without any error. system log for this error is as follows. {code} ERROR [SharedPool-Worker-1] 2014-12-11 20:42:20,152 Message.java:538 - Unexpected exception during request; channel = [id: 0xea57d8b6, /127.0.0.1:35624 = /127.0.0.1:9042] java.lang.AssertionError: null at org.apache.cassandra.cql3.ResultSet.addRow(ResultSet.java:63) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.Selection$ResultSetBuilder.newRow(Selection.java:333) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:1227) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:1161) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:290) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:267) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:215) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:64) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:226) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:248) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) [apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [apache-cassandra-2.1.2.jar:2.1.2] at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final] at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final] at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_72] at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [apache-cassandra-2.1.2.jar:2.1.2] at 
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.2.jar:2.1.2] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]{code} was: I have a column family with following schema. CREATE TABLE corpus.trigram_category_ordered_frequency ( id bigint, word1 varchar, word2 varchar, word3 varchar, category varchar, frequency int, PRIMARY KEY(category,frequency,word1,word2,word3) ); When I run select word1,word2,word3 from corpus.trigram_category_ordered_frequency where category IN ('N','A','C','S','G') order by frequency DESC LIMIT 10; I am getting error saying ErrorMessage code= [Server error] message=java.lang.AssertionError But when I ran select * from corpus.trigram_category_ordered_frequency where category IN ('N','A','C','S','G') order by frequency DESC LIMIT 10; it works without any error. system log for this error is as follows. ERROR [SharedPool-Worker-1] 2014-12-11 20:42:20,152 Message.java:538 - Unexpected exception during request; channel = [id: 0xea57d8b6, /127.0.0.1:35624 = /127.0.0.1:9042] java.lang.AssertionError: null at
[jira] [Assigned] (CASSANDRA-8461) java.lang.AssertionError when running select queries
[ https://issues.apache.org/jira/browse/CASSANDRA-8461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson reassigned CASSANDRA-8461: -- Assignee: Philip Thompson java.lang.AssertionError when running select queries Key: CASSANDRA-8461 URL: https://issues.apache.org/jira/browse/CASSANDRA-8461 Project: Cassandra Issue Type: Bug Components: API Environment: ubuntu 14.04 Reporter: Chamila Dilshan Wijayarathna Assignee: Philip Thompson -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8461) java.lang.AssertionError when running select queries
[ https://issues.apache.org/jira/browse/CASSANDRA-8461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242942#comment-14242942 ] Philip Thompson commented on CASSANDRA-8461: [~thobbs], I will verify that this was not fixed by other work. java.lang.AssertionError when running select queries Key: CASSANDRA-8461 URL: https://issues.apache.org/jira/browse/CASSANDRA-8461 Project: Cassandra Issue Type: Bug Components: API Environment: ubuntu 14.04 Reporter: Chamila Dilshan Wijayarathna Assignee: Philip Thompson -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8462) Upgrading a 2.0 to 2.1 breaks CFMetaData on 2.0 nodes
Rick Branson created CASSANDRA-8462: --- Summary: Upgrading a 2.0 to 2.1 breaks CFMetaData on 2.0 nodes Key: CASSANDRA-8462 URL: https://issues.apache.org/jira/browse/CASSANDRA-8462 Project: Cassandra Issue Type: Bug Components: Core Reporter: Rick Branson Added a 2.1.2 node to a cluster running 2.0.11. Didn't make any schema changes. When I tried to reboot one of the 2.0 nodes, it failed to boot with this exception. Besides an obvious fix, any workarounds for this?
{code}
java.lang.IllegalArgumentException: No enum constant org.apache.cassandra.config.CFMetaData.Caching.{"keys":"ALL", "rows_per_partition":"NONE"}
at java.lang.Enum.valueOf(Enum.java:236)
at org.apache.cassandra.config.CFMetaData$Caching.valueOf(CFMetaData.java:286)
at org.apache.cassandra.config.CFMetaData.fromSchemaNoColumnsNoTriggers(CFMetaData.java:1713)
at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1793)
at org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:307)
at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:288)
at org.apache.cassandra.db.DefsTables.loadFromKeyspace(DefsTables.java:131)
at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:529)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:270)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
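The failure mode is straightforward to reproduce in isolation: 2.1 rewrites the caching schema column as a JSON map, while the 2.0 code path still feeds the raw string to Enum.valueOf. A reduced illustration; the enum is an abbreviated stand-in for 2.0's CFMetaData.Caching, and the JSON string is an assumption about what 2.1 stores:
{code}
public class CachingParseExample {
    // Abbreviated stand-in for org.apache.cassandra.config.CFMetaData.Caching in 2.0.
    enum Caching { ALL, KEYS_ONLY, ROWS_ONLY, NONE }

    public static void main(String[] args) {
        // A 2.0-written value parses fine:
        System.out.println(Caching.valueOf("KEYS_ONLY"));
        // A 2.1-written value is a JSON map, not an enum constant name, so valueOf throws
        // java.lang.IllegalArgumentException: No enum constant Caching.{"keys":"ALL", ...}
        System.out.println(Caching.valueOf("{\"keys\":\"ALL\", \"rows_per_partition\":\"NONE\"}"));
    }
}
{code}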
[jira] [Updated] (CASSANDRA-8461) java.lang.AssertionError when running select queries
[ https://issues.apache.org/jira/browse/CASSANDRA-8461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8461: --- Reproduced In: 2.1.2 java.lang.AssertionError when running select queries Key: CASSANDRA-8461 URL: https://issues.apache.org/jira/browse/CASSANDRA-8461 Project: Cassandra Issue Type: Bug Components: API Environment: ubuntu 14.04 Reporter: Chamila Dilshan Wijayarathna Assignee: Philip Thompson -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-8461) java.lang.AssertionError when running select queries
[ https://issues.apache.org/jira/browse/CASSANDRA-8461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs resolved CASSANDRA-8461. Resolution: Duplicate [~philipthompson] before I saw your comment, I went ahead and verified that CASSANDRA-8286 was the patch that fixed this, so I'm resolving it as a duplicate of that. java.lang.AssertionError when running select queries Key: CASSANDRA-8461 URL: https://issues.apache.org/jira/browse/CASSANDRA-8461 Project: Cassandra Issue Type: Bug Components: API Environment: ubuntu 14.04 Reporter: Chamila Dilshan Wijayarathna Assignee: Philip Thompson -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242954#comment-14242954 ] jonathan lacefield commented on CASSANDRA-8447: --- [~benedict] Interesting about hints. Just verified hints on the cluster.
* CQLSH shows 0 for count
* data directory locally is empty under hints for all nodes.
* For all healthy nodes, tpstats shows no pending/active hints.
* For the unhealthy node, TPstats shows 2 Active and 3 Pending hint ops
From unhealthy node:
cqlsh> use system;
cqlsh:system> select count(*) from hints;
 count
-------
     0

Pool Name                 Active  Pending  Completed  Blocked  All time blocked
ReadStage                 0       0        2          0        0
RequestResponseStage      0       0        9          0        0
MutationStage             0       0        16471703   0        0
ReadRepairStage           0       0        0          0        0
ReplicateOnWriteStage     0       0        0          0        0
GossipStage               0       0        439        0        0
CacheCleanupExecutor      0       0        0          0        0
MigrationStage            0       0        0          0        0
MemoryMeter               0       0        24         0        0
FlushWriter               0       0        175        0        0
ValidationExecutor        0       0        0          0        0
InternalResponseStage     0       0        0          0        0
AntiEntropyStage          0       0        0          0        0
MemtablePostFlusher       0       0        194        0        0
MiscStage                 0       0        0          0        0
PendingRangeCalculator    0       0        6          0        0
CompactionExecutor        1       17       18         0        0
commitlog_archiver        0       0        0          0        0
HintedHandoff             2       3        0          0        0

Here is the excerpt from the current hints config items in the .yaml from all 4 nodes:
hinted_handoff_enabled: false
# this defines the maximum amount of time a dead host will have hints
# generated. After it has been dead this long, new hints for it will not be
max_hint_window_in_ms: 10800000 # 3 hours
# since we expect two nodes to be delivering hints simultaneously.)
hinted_handoff_throttle_in_kb: 1024
# Number of threads with which to deliver hints;
max_hints_delivery_threads: 2
Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled --- Key: CASSANDRA-8447 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447 Project: Cassandra Issue Type: Bug Components: Core Environment: Cluster size - 4 nodes Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives) OS - RHEL 6.5 jvm - oracle 1.7.0_71 Cassandra version 2.0.11 Reporter: jonathan lacefield Attachments: Node_with_compaction.png, Node_without_compaction.png, cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, output.1.svg, output.2.svg, output.svg, results.tar.gz, visualvm_screenshot
Behavior - If autocompaction is enabled, nodes will become unresponsive due to a full Old Gen heap which is not cleared during CMS GC.
Test methodology - disabled autocompaction on 3 nodes, left autocompaction enabled on 1 node. Executed different Cassandra stress loads, using write only operations. Monitored visualvm and jconsole for heap pressure. Captured iostat and dstat for most tests. Captured heap dump from 50 thread load. Hints were disabled for testing on all nodes to alleviate GC noise due to hints backing up.
Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=different threads tested -schema replication\(factor=3\) keyspace=Keyspace1 -node all nodes listed
Data load thread count and results:
* 1 thread - Still running but looks like the node can sustain this load (approx 500 writes per second per node)
* 5 threads - Nodes become unresponsive due to full Old Gen Heap. CMS measured in the 60 second range (approx 2k writes per second per node)
* 10 threads - Nodes become unresponsive due to full Old Gen Heap. CMS
[jira] [Commented] (CASSANDRA-8461) java.lang.AssertionError when running select queries
[ https://issues.apache.org/jira/browse/CASSANDRA-8461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242955#comment-14242955 ] Philip Thompson commented on CASSANDRA-8461: [~thobbs], thanks. [~cdwijayarathna], this issue has already been fixed in 2.1.3. When that minor release comes out, upgrading should fix your problem. Thank you for filing the JIRA! java.lang.AssertionError when running select queries Key: CASSANDRA-8461 URL: https://issues.apache.org/jira/browse/CASSANDRA-8461 Project: Cassandra Issue Type: Bug Components: API Environment: ubuntu 14.04 Reporter: Chamila Dilshan Wijayarathna Assignee: Philip Thompson I have a column family with following schema. CREATE TABLE corpus.trigram_category_ordered_frequency ( id bigint, word1 varchar, word2 varchar, word3 varchar, category varchar, frequency int, PRIMARY KEY(category,frequency,word1,word2,word3) ); When I run select word1,word2,word3 from corpus.trigram_category_ordered_frequency where category IN ('N','A','C','S','G') order by frequency DESC LIMIT 10; I am getting error saying ErrorMessage code= [Server error] message=java.lang.AssertionError But when I ran select * from corpus.trigram_category_ordered_frequency where category IN ('N','A','C','S','G') order by frequency DESC LIMIT 10; it works without any error. system log for this error is as follows. {code} ERROR [SharedPool-Worker-1] 2014-12-11 20:42:20,152 Message.java:538 - Unexpected exception during request; channel = [id: 0xea57d8b6, /127.0.0.1:35624 = /127.0.0.1:9042] java.lang.AssertionError: null at org.apache.cassandra.cql3.ResultSet.addRow(ResultSet.java:63) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.Selection$ResultSetBuilder.newRow(Selection.java:333) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:1227) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:1161) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:290) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:267) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:215) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:64) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:226) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:248) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) [apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [apache-cassandra-2.1.2.jar:2.1.2] at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final] at 
io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final] at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_72] at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.2.jar:2.1.2] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
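For anyone stuck on 2.1.2 until that release, a possible client-side workaround (speculative - it is not mentioned in this ticket, so treat it as an untested assumption): since {{select *}} succeeds where the column subset fails, selecting the ORDER BY column ({{frequency}}) alongside the wanted columns and ignoring it client-side may sidestep the assertion. A minimal sketch with the DataStax Java driver:
{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class TrigramQuery
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("corpus");
        // Select the ORDER BY column too, mirroring why "select *" works,
        // then simply don't read it from the rows.
        ResultSet rs = session.execute(
            "SELECT frequency, word1, word2, word3 " +
            "FROM trigram_category_ordered_frequency " +
            "WHERE category IN ('N','A','C','S','G') " +
            "ORDER BY frequency DESC LIMIT 10");
        for (Row row : rs)
            System.out.println(row.getString("word1") + " " +
                               row.getString("word2") + " " +
                               row.getString("word3"));
        cluster.close();
    }
}
{code}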
[jira] [Updated] (CASSANDRA-8461) java.lang.AssertionError when running select queries
[ https://issues.apache.org/jira/browse/CASSANDRA-8461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8461: --- Assignee: (was: Philip Thompson) java.lang.AssertionError when running select queries Key: CASSANDRA-8461 URL: https://issues.apache.org/jira/browse/CASSANDRA-8461 Project: Cassandra Issue Type: Bug Components: API Environment: ubuntu 14.04 Reporter: Chamila Dilshan Wijayarathna I have a column family with the following schema. CREATE TABLE corpus.trigram_category_ordered_frequency ( id bigint, word1 varchar, word2 varchar, word3 varchar, category varchar, frequency int, PRIMARY KEY(category,frequency,word1,word2,word3) ); When I run select word1,word2,word3 from corpus.trigram_category_ordered_frequency where category IN ('N','A','C','S','G') order by frequency DESC LIMIT 10; I am getting an error saying ErrorMessage code= [Server error] message=java.lang.AssertionError But when I run select * from corpus.trigram_category_ordered_frequency where category IN ('N','A','C','S','G') order by frequency DESC LIMIT 10; it works without any error. The system log for this error is as follows. {code} ERROR [SharedPool-Worker-1] 2014-12-11 20:42:20,152 Message.java:538 - Unexpected exception during request; channel = [id: 0xea57d8b6, /127.0.0.1:35624 => /127.0.0.1:9042] java.lang.AssertionError: null at org.apache.cassandra.cql3.ResultSet.addRow(ResultSet.java:63) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.Selection$ResultSetBuilder.newRow(Selection.java:333) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:1227) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:1161) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:290) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:267) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:215) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:64) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:226) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:248) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119) ~[apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439) [apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [apache-cassandra-2.1.2.jar:2.1.2] at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final] at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final] at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final] at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_72] at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [apache-cassandra-2.1.2.jar:2.1.2] at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.2.jar:2.1.2] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242954#comment-14242954 ] jonathan lacefield edited comment on CASSANDRA-8447 at 12/11/14 6:58 PM: - [~benedict] Interesting about hints. Just verified hints on the cluster.
* CQLSH shows 0 for count
* data directory locally is empty under hints for all nodes.
* For all healthy nodes, tpstats shows no pending/active hints.
* For the unhealthy node, tpstats shows 2 Active and 3 Pending hint ops
From the unhealthy node:
cqlsh> use system;
cqlsh:system> select count(*) from hints;
 count
 -----
     0
Pool Name                  Active  Pending  Completed  Blocked  All time blocked
ReadStage                       0        0          2        0                 0
RequestResponseStage            0        0          9        0                 0
MutationStage                   0        0   16471703        0                 0
ReadRepairStage                 0        0          0        0                 0
ReplicateOnWriteStage           0        0          0        0                 0
GossipStage                     0        0        439        0                 0
CacheCleanupExecutor            0        0          0        0                 0
MigrationStage                  0        0          0        0                 0
MemoryMeter                     0        0         24        0                 0
FlushWriter                     0        0        175        0                 0
ValidationExecutor              0        0          0        0                 0
InternalResponseStage           0        0          0        0                 0
AntiEntropyStage                0        0          0        0                 0
MemtablePostFlusher             0        0        194        0                 0
MiscStage                       0        0          0        0                 0
PendingRangeCalculator          0        0          6        0                 0
CompactionExecutor              1       17         18        0                 0
commitlog_archiver              0        0          0        0                 0
HintedHandoff                   2        3          0        0                 0
Here is the excerpt from the current hints config items in the .yaml from all 4 nodes:
hinted_handoff_enabled: false
# this defines the maximum amount of time a dead host will have hints
# generated. After it has been dead this long, new hints for it will not be
max_hint_window_in_ms: 10800000 # 3 hours
# since we expect two nodes to be delivering hints simultaneously.)
hinted_handoff_throttle_in_kb: 1024
# Number of threads with which to deliver hints;
max_hints_delivery_threads: 2
(edited - even after restarting dse, the unhealthy node shows 2 active and 3 pending hints via tpstats)
[jira] [Commented] (CASSANDRA-8452) Add missing systems to FBUtilities.isUnix, add FBUtilities.isWindows
[ https://issues.apache.org/jira/browse/CASSANDRA-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242969#comment-14242969 ] Joshua McKenzie commented on CASSANDRA-8452: I'm not sure if isPosix is correct in retrospect ([reference|http://en.wikipedia.org/wiki/POSIX#POSIX-oriented_operating_systems]). Not only are we not actually checking if something's truly posix compliant (though being mostly compliant serves our needs for now...), but our checks are more oriented around granular platform-specific differences, i.e. whether or not the underlying filesystem is an ext/hfs vs. ntfs, or whether or not the platform has a /proc filesystem (Note: I don't think the /proc filesystem is actually defined in the posix standard ([reference 2|http://pubs.opengroup.org/onlinepubs/9699919799/])). While having methods like 'isNTFS' or 'hasProcFilesystem' would be arguably more correct, at this point if we slice the ecosystem into Windows vs. non-Windows it seems like it would satisfy our requirements. I could be off on that though - do we have areas in the code-base where we support specific sub-types of the *nix world w/ different checks? i.e. is Cassandra run on any systems that are currently missing a /proc filesystem, or have wacky hard-link behavior so early re-open is a problem, etc? Add missing systems to FBUtilities.isUnix, add FBUtilities.isWindows Key: CASSANDRA-8452 URL: https://issues.apache.org/jira/browse/CASSANDRA-8452 Project: Cassandra Issue Type: Bug Reporter: Blake Eggleston Assignee: Blake Eggleston Priority: Minor Fix For: 2.1.3 Attachments: CASSANDRA-8452-v2.patch, CASSANDRA-8452.patch The isUnix method leaves out a few unix systems, which, after the changes in CASSANDRA-8136, causes some unexpected behavior during shutdown. It would also be clearer if FBUtilities had an isWindows method for branching into Windows specific logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
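For reference, the Windows vs. non-Windows slice discussed above usually boils down to a one-line check. A minimal sketch (assuming the standard {{os.name}} JVM property; the actual FBUtilities implementation may differ):
{code}
public final class OsCheck
{
    // "os.name" is a standard JVM system property; it starts with
    // "Windows" on all Windows variants.
    private static final String OS = System.getProperty("os.name").toLowerCase();

    public static boolean isWindows()
    {
        return OS.contains("windows");
    }
}
{code}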
[jira] [Comment Edited] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242954#comment-14242954 ] jonathan lacefield edited comment on CASSANDRA-8447 at 12/11/14 7:21 PM: - [~benedict] Interesting about hints. Just verified hints on the cluster.
* CQLSH shows 0 for count
* data directory locally is empty under hints for all nodes.
* For all healthy nodes, tpstats shows no pending/active hints.
* For the unhealthy node, tpstats shows 2 Active and 3 Pending hint ops
From the unhealthy node:
cqlsh> use system;
cqlsh:system> select count(*) from hints;
 count
 -----
     0
Pool Name                  Active  Pending  Completed  Blocked  All time blocked
ReadStage                       0        0          2        0                 0
RequestResponseStage            0        0          9        0                 0
MutationStage                   0        0   16471703        0                 0
ReadRepairStage                 0        0          0        0                 0
ReplicateOnWriteStage           0        0          0        0                 0
GossipStage                     0        0        439        0                 0
CacheCleanupExecutor            0        0          0        0                 0
MigrationStage                  0        0          0        0                 0
MemoryMeter                     0        0         24        0                 0
FlushWriter                     0        0        175        0                 0
ValidationExecutor              0        0          0        0                 0
InternalResponseStage           0        0          0        0                 0
AntiEntropyStage                0        0          0        0                 0
MemtablePostFlusher             0        0        194        0                 0
MiscStage                       0        0          0        0                 0
PendingRangeCalculator          0        0          6        0                 0
CompactionExecutor              1       17         18        0                 0
commitlog_archiver              0        0          0        0                 0
HintedHandoff                   2        3          0        0                 0
Here is the excerpt from the current hints config items in the .yaml from all 4 nodes:
hinted_handoff_enabled: false
# this defines the maximum amount of time a dead host will have hints
# generated. After it has been dead this long, new hints for it will not be
max_hint_window_in_ms: 10800000 # 3 hours
# since we expect two nodes to be delivering hints simultaneously.)
hinted_handoff_throttle_in_kb: 1024
# Number of threads with which to deliver hints;
max_hints_delivery_threads: 2
(edited - even after restarting dse, the unhealthy node shows 2 active and 3 pending hints via tpstats)
(edit 2 - was able to clear pending and active hints by dropping the keyspace through cqlsh as well as dropping the keyspace folder on this node. new test is executing to see if behavior persists)
[jira] [Created] (CASSANDRA-8463) Upgrading 2.0 to 2.1 causes LCS to recompact all files
Rick Branson created CASSANDRA-8463: --- Summary: Upgrading 2.0 to 2.1 causes LCS to recompact all files Key: CASSANDRA-8463 URL: https://issues.apache.org/jira/browse/CASSANDRA-8463 Project: Cassandra Issue Type: Bug Components: Core Environment: Hardware is recent 2-socket, 16-core (x2 Hyperthreaded), 144G RAM, solid-state storage. Platform is Linux 3.2.51, Oracle JDK 64-bit 1.7.0_65. Heap is 32G total, 4G newsize. 8G/8G on-heap/off-heap memtables, offheap_buffer allocator, 0.5 memtable_cleanup_threshold concurrent_compactors: 20 Reporter: Rick Branson It appears that tables configured with LCS will completely re-compact themselves over some period of time after upgrading from 2.0 to 2.1 (2.0.11 - 2.1.2, specifically). It starts out with 10 pending tasks for an hour or so, then starts building up, now with 50-100 tasks pending across the cluster after 12 hours. These nodes are under heavy write load, but were easily able to keep up in 2.0 (they rarely had 5 pending compaction tasks), so I don't think it's LCS in 2.1 actually being worse, just perhaps some different LCS behavior that causes the layout of tables from 2.0 to prompt the compactor to reorganize them? The nodes flushed ~11MB SSTables under 2.0. They're currently flushing ~36MB SSTables due to the improved memtable setup in 2.1. Before I upgraded the entire cluster to 2.1, I noticed the problem and tried several variations on the flush size, thinking perhaps the larger tables in L0 were causing some kind of cascading compactions. Even if they're sized roughly like the 2.0 flushes were, the same behavior occurs. I also tried both enabling and disabling STCS in L0 with no real change other than L0 began to back up faster, so I left the STCS in L0 enabled. Tables are configured with 32MB sstable_size_in_mb, which was found to be an improvement on the 160MB table size for compaction performance. Maybe this is wrong now? Otherwise, the tables are configured with defaults. Compaction has been unthrottled to help them catch up. The compaction threads stay very busy, with the cluster-wide CPU at 45% nice time. No nodes have completely caught up yet. I'll update JIRA with status about their progress if anything interesting happens. From a node around 12 hours ago, around an hour after the upgrade, with 19 pending compaction tasks:
SSTables in each level: [6/4, 10, 105/100, 268, 0, 0, 0, 0, 0]
SSTables in each level: [6/4, 10, 106/100, 271, 0, 0, 0, 0, 0]
SSTables in each level: [1, 16/10, 105/100, 269, 0, 0, 0, 0, 0]
SSTables in each level: [5/4, 10, 103/100, 272, 0, 0, 0, 0, 0]
SSTables in each level: [4, 11/10, 105/100, 270, 0, 0, 0, 0, 0]
SSTables in each level: [1, 12/10, 105/100, 271, 0, 0, 0, 0, 0]
SSTables in each level: [1, 14/10, 104/100, 267, 0, 0, 0, 0, 0]
SSTables in each level: [9/4, 10, 103/100, 265, 0, 0, 0, 0, 0]
Recently, with 41 pending compaction tasks:
SSTables in each level: [4, 13/10, 106/100, 269, 0, 0, 0, 0, 0]
SSTables in each level: [4, 12/10, 106/100, 273, 0, 0, 0, 0, 0]
SSTables in each level: [5/4, 11/10, 106/100, 271, 0, 0, 0, 0, 0]
SSTables in each level: [4, 12/10, 103/100, 275, 0, 0, 0, 0, 0]
SSTables in each level: [2, 13/10, 106/100, 273, 0, 0, 0, 0, 0]
SSTables in each level: [3, 10, 104/100, 275, 0, 0, 0, 0, 0]
SSTables in each level: [6/4, 11/10, 103/100, 269, 0, 0, 0, 0, 0]
SSTables in each level: [4, 16/10, 105/100, 264, 0, 0, 0, 0, 0]
More information about the use case: writes are roughly uniform across these tables. The data is sharded across these 8 tables by key to improve compaction parallelism. 
Each node receives up to 75,000 writes/sec sustained at peak, and a small number of reads. This is a pre-production cluster that's being warmed up with new data, so the low volume of reads (~100/sec per node) is just from automatic sampled data checks, otherwise we'd just use STCS :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
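For context, the {{sstable_size_in_mb}} knob the reporter mentions is a per-table compaction option; reverting to the larger size would look something like the following (a sketch only, using the DataStax Java driver with placeholder keyspace/table names, not a recommendation either way):
{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class LcsSettings
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        // "ks.shard_0" stands in for one of the 8 sharded tables above.
        session.execute("ALTER TABLE ks.shard_0 WITH compaction = " +
                        "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160}");
        cluster.close();
    }
}
{code}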
[jira] [Commented] (CASSANDRA-8457) nio MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243008#comment-14243008 ] Ariel Weisberg commented on CASSANDRA-8457: --- bq. To establish if there's likely a benefit to exploit, we could most likely refactor this code comparatively minimally (than rewriting to NIO/Netty) to make use of the SharedExecutorPool to establish if such a positive effect is indeed to be had, as this would reduce the number of threads in flight to those actually serving work on the OTCs. This wouldn't affect the ITC, but I am dubious of their contribution. We should probably also actually test if this is indeed a problem from clusters at scale performing in-memory CL1 reads. I wonder what there is to be gained by having a single socket for inbound/outbound? Running a representative test will take some doing. cstar doesn't support multiple stress clients and it seems like the clusters only have 3 nodes? This is another argument for getting some decent-sized performance runs in CI working rather than doing one-off manual tests. Having profiling artifacts collected as part of this would also make doing performance research and validation easier. I feel pretty under-informed when we discuss what to do next due to the lack of profiling information and the lack of canonical/repeatable performance data and workloads. nio MessagingService Key: CASSANDRA-8457 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Labels: performance Fix For: 3.0 Thread-per-peer (actually two each incoming and outbound) is a big contributor to context switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
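To make the thread-per-peer vs. nio trade-off concrete, here is a minimal Netty 4 sketch of the direction the ticket describes (illustrative only, not the eventual patch; the port and pool size are placeholders): one small shared event loop group multiplexes all inbound peer connections instead of parking a thread on every socket.
{code}
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class NioMessagingSketch
{
    public static void main(String[] args) throws InterruptedException
    {
        EventLoopGroup acceptor = new NioEventLoopGroup(1);
        EventLoopGroup workers = new NioEventLoopGroup(4); // shared across all peers

        ServerBootstrap bootstrap = new ServerBootstrap()
            .group(acceptor, workers)
            .channel(NioServerSocketChannel.class)
            .childHandler(new ChannelInitializer<SocketChannel>()
            {
                protected void initChannel(SocketChannel ch)
                {
                    ch.pipeline().addLast(new ChannelInboundHandlerAdapter()
                    {
                        public void channelRead(ChannelHandlerContext ctx, Object msg)
                        {
                            // deserialize and dispatch the message here
                            ctx.fireChannelRead(msg);
                        }
                    });
                }
            });

        bootstrap.bind(7000).sync().channel().closeFuture().sync();
    }
}
{code}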
[jira] [Updated] (CASSANDRA-8429) Stress on trunk fails mixed workload on missing keys
[ https://issues.apache.org/jira/browse/CASSANDRA-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-8429: --- Fix Version/s: 2.1.3 Stress on trunk fails mixed workload on missing keys Key: CASSANDRA-8429 URL: https://issues.apache.org/jira/browse/CASSANDRA-8429 Project: Cassandra Issue Type: Bug Environment: Ubuntu 14.04 Reporter: Ariel Weisberg Assignee: Marcus Eriksson Fix For: 2.1.3 Attachments: cluster.conf, run_stress.sh Starts as part of merge commit 25be46497a8df46f05ffa102bc645bfd684ea48a Stress will say that a key wasn't validated because it isn't returned even though it's loaded. The key will eventually appear and can be queried using cqlsh. Reproduce with:
#!/bin/sh
ROWCOUNT=1000
SCHEMA='-col n=fixed(1) -schema compaction(strategy=LeveledCompactionStrategy) compression=LZ4Compressor'
./cassandra-stress write n=$ROWCOUNT -node xh61 -pop seq=1..$ROWCOUNT no-wrap -rate threads=25 $SCHEMA
./cassandra-stress mixed ratio(read=2) n=1 -node xh61 -pop dist=extreme(1..$ROWCOUNT,0.6) -rate threads=25 $SCHEMA
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7304) Ability to distinguish between NULL and UNSET values in Prepared Statements
[ https://issues.apache.org/jira/browse/CASSANDRA-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oded Peer updated CASSANDRA-7304: - Attachment: 7304-04.patch I appreciate your comments, they are very helpful and I am learning a lot from them. bq. In the spec, instead of changing the meaning of {{\[bytes\]}}, I would rather add a new {{\[value\]}} definition that supports 'unset', and use that exclusively in the definition of values for bind variables in QUERY and EXECUTE messages, so as to make it clear that it makes no sense in any other place. I would then add a specific {{CBUtil.readBoundValue()}} to read those. Done bq. Making {{UNSET_CONSTANT_VALUE}} be {{new Value(null)}} is somewhat incorrect, it should be {{new Value(UNSET_BYTE_BUFFER)}} so that we don't lose the information that it's 'unset' if {{bindAndGet}} is used. For this reason, I'd prefer using {{Constants.UNSET_CONSTANT_VALUE}} (renamed as {{UNSET_VALUE}}) in collections too (instead of adding {{Lists.UNSET_LIST_VALUE}}, ...). Done bq. We can't have an 'unset' value inside a collection since we don't allow bind markers in the first place, and so there is a bit of useless code/validation related to that. I verified the changes aren’t useless with unit tests. bq. There is a bunch of places that don't handle 'UNSET_BYTE_BUFFER' properly: Tuples, ColumnCondition (we might want to reject queries for which all conditions are 'unset' as going through the paxos code for no reason feels like the user is doing something wrong) and SelectStatement where we could get an 'unset' pretty much anywhere where the {{values()}} or {{bound()}} method of a {{Restriction}} is used (and validation might be tricky in SelectStatement: if we have {{SELECT * FROM foo WHERE k1 = ? AND k2 = ? AND k3 = ?}}, then we shouldn't accept an 'unset' for {{k2}} unless {{k3}} is also unset; note that I'd be fine just refusing 'unset' in selects for now to simplify, but we at least need the validation code to reject them). I opted for rejecting unset values in selects. It’s not only to simplify; I think it’s the right thing to do. Having a variable assignment or a condition with unset variables is undefined. bq. I'd reject 'unset' indexes in {{UPDATE ... SET l\[?\] = ?}} since it's rejected for map keys. Unless maybe if both the key/index and value are 'unset', but that should be coherent for lists and maps. Done bq. In Constants.Marker.bindAndGet, we should skip validation if 'unset' (even though the validation will never fail because empty values are always accepted, it's still dodgy). Done bq. We should have separate error messages when we reject both {{null}} and {{unset}}. Done bq. I'd prefer rejecting 'unset' inside UDTs (and tuples). Making it equivalent to {{null}} gives it a different meaning than usual and we should avoid that. Done bq. For the limit in SelectStatement, it would make sense to accept unset and to have it mean no limit (instead of being rejected). The same applies for the timestamp and ttl in {{Attributes}}. Done bq. In CBUtil.readValue(), we should throw a ProtocolException instead of an IllegalArgumentException. Done bq. I might have put {{UNSET_BYTE_BUFFER}} in {{ByteBufferUtil}} since it's a {{ByteBuffer}}. Done bq. The patch appears to have Windows end-of-line and a few weird indentations. Could you check that? My apologies. I switched to Ubuntu. bq. I'd have added an unset() in CQLTester to use in tests to make the tests terser. 
Done Ability to distinguish between NULL and UNSET values in Prepared Statements --- Key: CASSANDRA-7304 URL: https://issues.apache.org/jira/browse/CASSANDRA-7304 Project: Cassandra Issue Type: Sub-task Reporter: Drew Kutcharian Assignee: Oded Peer Labels: cql, protocolv4 Fix For: 3.0 Attachments: 7304-03.patch, 7304-04.patch, 7304-2.patch, 7304.patch Currently Cassandra inserts tombstones when a value of a column is bound to NULL in a prepared statement. At higher insert rates managing all these tombstones becomes an unnecessary overhead. This limits the usefulness of the prepared statements since developers have to either create multiple prepared statements (each with a different combination of column names, which at times is just unfeasible because of the sheer number of possible combinations) or fall back to using regular (non-prepared) statements. This JIRA is here to explore the possibility of either: A. Have a flag on prepared statements that once set, tells Cassandra to ignore null columns or B. Have an UNSET value which makes Cassandra skip the null columns and not tombstone them Basically, in the context of a prepared statement, a null value means delete, but we don’t have anything that
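For a sense of what 'unset' buys clients once this ships, a hedged sketch of the intended behaviour from the driver side (the table and columns are invented, and the driver surface for protocol v4 had not shipped when this was written, so the comments describe the proposal rather than an existing API):
{code}
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class UnsetExample
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("ks");
        PreparedStatement ps =
            session.prepare("INSERT INTO users (id, name, email) VALUES (?, ?, ?)");

        BoundStatement bs = ps.bind();
        bs.setLong("id", 42L);
        bs.setString("name", "alice");
        // 'email' is deliberately left unbound: under the proposed v4
        // semantics it is sent as 'unset', so the column is skipped
        // entirely instead of being written as a null (i.e. a tombstone).
        session.execute(bs);
        cluster.close();
    }
}
{code}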
[jira] [Commented] (CASSANDRA-7124) Use JMX Notifications to Indicate Success/Failure of Long-Running Operations
[ https://issues.apache.org/jira/browse/CASSANDRA-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243076#comment-14243076 ] Rajanarayanan Thottuvaikkatumana commented on CASSANDRA-7124: - [~yukim], Please find the changes for the decommission task - https://github.com/rnamboodiri/cassandra/commit/ca6c8e3788f6bdd54f21007524d10e287614cc88 Thanks Use JMX Notifications to Indicate Success/Failure of Long-Running Operations Key: CASSANDRA-7124 URL: https://issues.apache.org/jira/browse/CASSANDRA-7124 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Tyler Hobbs Assignee: Rajanarayanan Thottuvaikkatumana Priority: Minor Labels: lhf Fix For: 3.0 Attachments: 7124-wip.txt, cassandra-trunk-compact-7124.txt, cassandra-trunk-decommission-7124.txt If {{nodetool cleanup}} or some other long-running operation takes too long to complete, you'll see an error like the one in CASSANDRA-2126, so you can't tell if the operation completed successfully or not. CASSANDRA-4767 fixed this for repairs with JMX notifications. We should do something similar for nodetool cleanup, compact, decommission, move, relocate, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8457) nio MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243095#comment-14243095 ] T Jake Luciani commented on CASSANDRA-8457: --- bq. Running a representative test will take some doing. cstar doesn't support multiple stress clients and it seems like the clusters only have 3 nodes? But if you run with RF 1 you can stress the internal network which is what we are changing in this ticket nio MessagingService Key: CASSANDRA-8457 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Labels: performance Fix For: 3.0 Thread-per-peer (actually two each incoming and outbound) is a big contributor to context switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation
[ https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243148#comment-14243148 ] Jon Haddad commented on CASSANDRA-7032: --- Was the original 256 number chosen in hopes that it would minimize the chance of imbalance? If so, would this patch result in recommending fewer vnodes, say 16? If so, I imagine that would result in less time-consuming repairs as well as improvements to the spark driver, which, as of the last time I checked, did 1 query per token to achieve data locality. I would assume 16 nodes streaming data to a single one would still achieve the benefits of vnodes, but I'm just picking a number out of the air. Improve vnode allocation Key: CASSANDRA-7032 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Branimir Lambov Labels: performance, vnodes Fix For: 3.0 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, TestVNodeAllocation.java It's been known for a little while that random vnode allocation causes hotspots of ownership. It should be possible to improve dramatically on this with deterministic allocation. I have quickly thrown together a simple greedy algorithm that allocates vnodes efficiently, and will repair hotspots in a randomly allocated cluster gradually as more nodes are added, and also ensures that token ranges are fairly evenly spread between nodes (somewhat tunably so). The allocation still permits slight discrepancies in ownership, but it is bound by the inverse of the size of the cluster (as opposed to random allocation, which strangely gets worse as the cluster size increases). I'm sure there is a decent dynamic programming solution to this that would be even better. If on joining the ring a new node were to CAS a shared table where a canonical allocation of token ranges lives after running this (or a similar) algorithm, we could then get guaranteed bounds on the ownership distribution in a cluster. This will also help for CASSANDRA-6696. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
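As a toy illustration of the greedy idea from the description (my own sketch, unrelated to the attached TestVNodeAllocation.java; it ignores replication and per-node ownership weighting): place each new token at the midpoint of the widest remaining range, so the worst ownership imbalance shrinks as tokens are added.
{code}
import java.util.TreeSet;

public class GreedyTokens
{
    static final long RING = 1L << 32; // toy ring size

    public static void main(String[] args)
    {
        TreeSet<Long> ring = new TreeSet<>();
        ring.add(0L); // seed token
        for (int i = 0; i < 16; i++)
            ring.add(midpointOfWidestRange(ring));
        System.out.println(ring);
    }

    // Find the widest gap between consecutive tokens (wrapping around
    // the ring) and return its midpoint.
    static long midpointOfWidestRange(TreeSet<Long> ring)
    {
        long bestStart = 0, bestWidth = -1, prev = ring.last();
        for (long t : ring)
        {
            long width = ring.size() == 1 ? RING : Math.floorMod(t - prev, RING);
            if (width > bestWidth) { bestWidth = width; bestStart = prev; }
            prev = t;
        }
        return Math.floorMod(bestStart + bestWidth / 2, RING);
    }
}
{code}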
[jira] [Created] (CASSANDRA-8464) Support direct buffer decompression for reads
T Jake Luciani created CASSANDRA-8464: - Summary: Support direct buffer decompression for reads Key: CASSANDRA-8464 URL: https://issues.apache.org/jira/browse/CASSANDRA-8464 Project: Cassandra Issue Type: Improvement Reporter: T Jake Luciani Assignee: T Jake Luciani Fix For: 3.0 Currently when we read a compressed sstable we copy the data on heap then send it to be de-compressed to another on heap buffer (albeit pooled). But now both snappy and lz4 (with CASSANDRA-7039) allow decompression of direct byte buffers. This lets us mmap the data and decompress completely off heap (and avoids moving bytes over JNI). One issue is performing the checksum offheap, but Adler32 does support this in Java 8 (it's also in Java 7 but marked private?!) This change yields a 10% boost in read performance on cstar. Locally I see up to 30% improvement. http://cstar.datastax.com/graph?stats=5ebcdd70-816b-11e4-aed6-42010af0688fmetric=op_rateoperation=2_readsmoothing=1show_aggregates=truexmin=0xmax=200.09ymin=0ymax=135908.3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
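A rough sketch of that read path (an illustration under stated assumptions, not the attached patch: the file name, chunk offset and chunk size are placeholders that real code would take from CompressionInfo; it relies on lz4-java's ByteBuffer overloads from CASSANDRA-7039 and Java 8's {{Adler32.update(ByteBuffer)}}):
{code}
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.zip.Adler32;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;

public class DirectDecompress
{
    public static void main(String[] args) throws Exception
    {
        try (RandomAccessFile raf = new RandomAccessFile("Data.db", "r"))
        {
            // mmap the compressed chunk: no on-heap copy of the raw bytes.
            ByteBuffer compressed =
                raf.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, raf.length());

            // Checksum directly against the mapped buffer (Java 8 API).
            Adler32 checksum = new Adler32();
            checksum.update(compressed.duplicate());

            // Decompress straight into a direct buffer, off heap end to end.
            LZ4FastDecompressor lz4 = LZ4Factory.fastestInstance().fastDecompressor();
            ByteBuffer uncompressed = ByteBuffer.allocateDirect(65536); // chunk size placeholder
            lz4.decompress(compressed, 0, uncompressed, 0, uncompressed.capacity());
        }
    }
}
{code}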
[jira] [Commented] (CASSANDRA-7937) Apply backpressure gently when overloaded with writes
[ https://issues.apache.org/jira/browse/CASSANDRA-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243181#comment-14243181 ] Jonathan Ellis commented on CASSANDRA-7937: --- bq. So in this situation the backpressure information received by the client could be used properly, as it would just be understood by the client as a request to slow down for this particular replica, it could therefore pick another replica. That is a good point. However, it's only really useful for reads, since writes are always sent to all replicas. And unfortunately writes are by far a bigger problem because of the memory pressure they generate (in queues, as well as in the memtable). I've never seen a node OOM and fall over from too many reads. Apply backpressure gently when overloaded with writes - Key: CASSANDRA-7937 URL: https://issues.apache.org/jira/browse/CASSANDRA-7937 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0 Reporter: Piotr Kołaczkowski Labels: performance When writing huge amounts of data into a C* cluster from analytic tools like Hadoop or Apache Spark, we can see that often C* can't keep up with the load. This is because analytic tools typically write data as fast as they can in parallel, from many nodes, and they are not artificially rate-limited, so C* is the bottleneck here. Also, increasing the number of nodes doesn't really help, because in a collocated setup this also increases the number of Hadoop/Spark nodes (writers) and although the possible write performance is higher, the problem still remains. We observe the following behavior:
1. data is ingested at an extremely fast pace into memtables and the flush queue fills up
2. the available memory limit for memtables is reached and writes are no longer accepted
3. the application gets hit by write timeout, and retries repeatedly, in vain
4. after several failed attempts to write, the job gets aborted
Desired behaviour:
1. data is ingested at an extremely fast pace into memtables and the flush queue fills up
2. after exceeding some memtable fill threshold, C* applies adaptive rate limiting to writes - the more the buffers are filled up, the less writes/s are accepted, however writes still occur within the write timeout.
3. thanks to slowed down data ingestion, now flush can finish before all the memory gets used
Of course the details of how rate limiting could be done are up for discussion. It may also be worth considering putting such logic into the driver, not C* core, but then C* needs to expose at least the following information to the driver, so we could calculate the desired maximum data rate:
1. current amount of memory available for writes before they would completely block
2. total amount of data queued to be flushed and flush progress (amount of data to flush remaining for the memtable currently being flushed)
3. average flush write speed
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
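The desired behaviour described above is essentially an adaptive rate limiter keyed off memtable fill. A minimal sketch of that idea (my own, using Guava's RateLimiter; the thresholds and rates are invented):
{code}
import com.google.common.util.concurrent.RateLimiter;

public class AdaptiveWriteLimiter
{
    private static final double MAX_WRITES_PER_SEC = 50_000;
    private final RateLimiter limiter = RateLimiter.create(MAX_WRITES_PER_SEC);

    /** @param memtableFill fraction of memtable memory in use, 0.0 .. 1.0 */
    void adjust(double memtableFill)
    {
        // Above 75% full, back off linearly toward 10% of the normal rate,
        // so writes slow down instead of suddenly timing out.
        double factor = memtableFill < 0.75
                      ? 1.0
                      : Math.max(0.1, 1.0 - (memtableFill - 0.75) * 4);
        limiter.setRate(MAX_WRITES_PER_SEC * factor);
    }

    void write(Runnable mutation)
    {
        limiter.acquire(); // blocks briefly rather than rejecting the write
        mutation.run();
    }
}
{code}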
[jira] [Updated] (CASSANDRA-8462) Upgrading a 2.0 to 2.1 breaks CFMetaData on 2.0 nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8462: -- Assignee: Aleksey Yeschenko Upgrading a 2.0 to 2.1 breaks CFMetaData on 2.0 nodes - Key: CASSANDRA-8462 URL: https://issues.apache.org/jira/browse/CASSANDRA-8462 Project: Cassandra Issue Type: Bug Components: Core Reporter: Rick Branson Assignee: Aleksey Yeschenko Added a 2.1.2 node to a cluster running 2.0.11. Didn't make any schema changes. When I tried to reboot one of the 2.0 nodes, it failed to boot with this exception. Besides an obvious fix, any workarounds for this? {code} java.lang.IllegalArgumentException: No enum constant org.apache.cassandra.config.CFMetaData.Caching.{keys:ALL, rows_per_partition:NONE} at java.lang.Enum.valueOf(Enum.java:236) at org.apache.cassandra.config.CFMetaData$Caching.valueOf(CFMetaData.java:286) at org.apache.cassandra.config.CFMetaData.fromSchemaNoColumnsNoTriggers(CFMetaData.java:1713) at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1793) at org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:307) at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:288) at org.apache.cassandra.db.DefsTables.loadFromKeyspace(DefsTables.java:131) at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:529) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:270) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
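To make the failure mode concrete: 2.1 writes the caching option as a JSON-style map, while the 2.0 code path in the trace hands the raw string to {{Enum.valueOf}}, which only accepts the old constant names. A self-contained illustration (the enum below is a stand-in for the real {{CFMetaData.Caching}}):
{code}
public class CachingParse
{
    enum Caching { ALL, KEYS_ONLY, ROWS_ONLY, NONE }

    public static void main(String[] args)
    {
        // 2.0-style value: parses fine.
        System.out.println(Caching.valueOf("KEYS_ONLY"));
        // 2.1-style value: throws IllegalArgumentException, matching the trace.
        System.out.println(Caching.valueOf("{\"keys\":\"ALL\", \"rows_per_partition\":\"NONE\"}"));
    }
}
{code}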
[jira] [Updated] (CASSANDRA-8463) Upgrading 2.0 to 2.1 causes LCS to recompact all files
[ https://issues.apache.org/jira/browse/CASSANDRA-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8463: -- Assignee: Marcus Eriksson Upgrading 2.0 to 2.1 causes LCS to recompact all files -- Key: CASSANDRA-8463 URL: https://issues.apache.org/jira/browse/CASSANDRA-8463 Project: Cassandra Issue Type: Bug Components: Core Environment: Hardware is recent 2-socket, 16-core (x2 Hyperthreaded), 144G RAM, solid-state storage. Platform is Linux 3.2.51, Oracle JDK 64-bit 1.7.0_65. Heap is 32G total, 4G newsize. 8G/8G on-heap/off-heap memtables, offheap_buffer allocator, 0.5 memtable_cleanup_threshold concurrent_compactors: 20 Reporter: Rick Branson Assignee: Marcus Eriksson Fix For: 2.1.3 It appears that tables configured with LCS will completely re-compact themselves over some period of time after upgrading from 2.0 to 2.1 (2.0.11 - 2.1.2, specifically). It starts out with 10 pending tasks for an hour or so, then starts building up, now with 50-100 tasks pending across the cluster after 12 hours. These nodes are under heavy write load, but were easily able to keep up in 2.0 (they rarely had 5 pending compaction tasks), so I don't think it's LCS in 2.1 actually being worse, just perhaps some different LCS behavior that causes the layout of tables from 2.0 to prompt the compactor to reorganize them? The nodes flushed ~11MB SSTables under 2.0. They're currently flushing ~36MB SSTables due to the improved memtable setup in 2.1. Before I upgraded the entire cluster to 2.1, I noticed the problem and tried several variations on the flush size, thinking perhaps the larger tables in L0 were causing some kind of cascading compactions. Even if they're sized roughly like the 2.0 flushes were, the same behavior occurs. I also tried both enabling and disabling STCS in L0 with no real change other than L0 began to back up faster, so I left the STCS in L0 enabled. Tables are configured with 32MB sstable_size_in_mb, which was found to be an improvement on the 160MB table size for compaction performance. Maybe this is wrong now? Otherwise, the tables are configured with defaults. Compaction has been unthrottled to help them catch up. The compaction threads stay very busy, with the cluster-wide CPU at 45% nice time. No nodes have completely caught up yet. I'll update JIRA with status about their progress if anything interesting happens. 
From a node around 12 hours ago, around an hour after the upgrade, with 19 pending compaction tasks:
SSTables in each level: [6/4, 10, 105/100, 268, 0, 0, 0, 0, 0]
SSTables in each level: [6/4, 10, 106/100, 271, 0, 0, 0, 0, 0]
SSTables in each level: [1, 16/10, 105/100, 269, 0, 0, 0, 0, 0]
SSTables in each level: [5/4, 10, 103/100, 272, 0, 0, 0, 0, 0]
SSTables in each level: [4, 11/10, 105/100, 270, 0, 0, 0, 0, 0]
SSTables in each level: [1, 12/10, 105/100, 271, 0, 0, 0, 0, 0]
SSTables in each level: [1, 14/10, 104/100, 267, 0, 0, 0, 0, 0]
SSTables in each level: [9/4, 10, 103/100, 265, 0, 0, 0, 0, 0]
Recently, with 41 pending compaction tasks:
SSTables in each level: [4, 13/10, 106/100, 269, 0, 0, 0, 0, 0]
SSTables in each level: [4, 12/10, 106/100, 273, 0, 0, 0, 0, 0]
SSTables in each level: [5/4, 11/10, 106/100, 271, 0, 0, 0, 0, 0]
SSTables in each level: [4, 12/10, 103/100, 275, 0, 0, 0, 0, 0]
SSTables in each level: [2, 13/10, 106/100, 273, 0, 0, 0, 0, 0]
SSTables in each level: [3, 10, 104/100, 275, 0, 0, 0, 0, 0]
SSTables in each level: [6/4, 11/10, 103/100, 269, 0, 0, 0, 0, 0]
SSTables in each level: [4, 16/10, 105/100, 264, 0, 0, 0, 0, 0]
More information about the use case: writes are roughly uniform across these tables. The data is sharded across these 8 tables by key to improve compaction parallelism. Each node receives up to 75,000 writes/sec sustained at peak, and a small number of reads. This is a pre-production cluster that's being warmed up with new data, so the low volume of reads (~100/sec per node) is just from automatic sampled data checks, otherwise we'd just use STCS :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8463) Upgrading 2.0 to 2.1 causes LCS to recompact all files
[ https://issues.apache.org/jira/browse/CASSANDRA-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8463: -- Fix Version/s: 2.1.3 Upgrading 2.0 to 2.1 causes LCS to recompact all files -- Key: CASSANDRA-8463 URL: https://issues.apache.org/jira/browse/CASSANDRA-8463 Project: Cassandra Issue Type: Bug Components: Core Environment: Hardware is recent 2-socket, 16-core (x2 Hyperthreaded), 144G RAM, solid-state storage. Platform is Linux 3.2.51, Oracle JDK 64-bit 1.7.0_65. Heap is 32G total, 4G newsize. 8G/8G on-heap/off-heap memtables, offheap_buffer allocator, 0.5 memtable_cleanup_threshold concurrent_compactors: 20 Reporter: Rick Branson Assignee: Marcus Eriksson Fix For: 2.1.3 It appears that tables configured with LCS will completely re-compact themselves over some period of time after upgrading from 2.0 to 2.1 (2.0.11 - 2.1.2, specifically). It starts out with 10 pending tasks for an hour or so, then starts building up, now with 50-100 tasks pending across the cluster after 12 hours. These nodes are under heavy write load, but were easily able to keep up in 2.0 (they rarely had 5 pending compaction tasks), so I don't think it's LCS in 2.1 actually being worse, just perhaps some different LCS behavior that causes the layout of tables from 2.0 to prompt the compactor to reorganize them? The nodes flushed ~11MB SSTables under 2.0. They're currently flushing ~36MB SSTables due to the improved memtable setup in 2.1. Before I upgraded the entire cluster to 2.1, I noticed the problem and tried several variations on the flush size, thinking perhaps the larger tables in L0 were causing some kind of cascading compactions. Even if they're sized roughly like the 2.0 flushes were, the same behavior occurs. I also tried both enabling and disabling STCS in L0 with no real change other than L0 began to back up faster, so I left the STCS in L0 enabled. Tables are configured with 32MB sstable_size_in_mb, which was found to be an improvement on the 160MB table size for compaction performance. Maybe this is wrong now? Otherwise, the tables are configured with defaults. Compaction has been unthrottled to help them catch up. The compaction threads stay very busy, with the cluster-wide CPU at 45% nice time. No nodes have completely caught up yet. I'll update JIRA with status about their progress if anything interesting happens. 
From a node around 12 hours ago, around an hour after the upgrade, with 19 pending compaction tasks:
SSTables in each level: [6/4, 10, 105/100, 268, 0, 0, 0, 0, 0]
SSTables in each level: [6/4, 10, 106/100, 271, 0, 0, 0, 0, 0]
SSTables in each level: [1, 16/10, 105/100, 269, 0, 0, 0, 0, 0]
SSTables in each level: [5/4, 10, 103/100, 272, 0, 0, 0, 0, 0]
SSTables in each level: [4, 11/10, 105/100, 270, 0, 0, 0, 0, 0]
SSTables in each level: [1, 12/10, 105/100, 271, 0, 0, 0, 0, 0]
SSTables in each level: [1, 14/10, 104/100, 267, 0, 0, 0, 0, 0]
SSTables in each level: [9/4, 10, 103/100, 265, 0, 0, 0, 0, 0]
Recently, with 41 pending compaction tasks:
SSTables in each level: [4, 13/10, 106/100, 269, 0, 0, 0, 0, 0]
SSTables in each level: [4, 12/10, 106/100, 273, 0, 0, 0, 0, 0]
SSTables in each level: [5/4, 11/10, 106/100, 271, 0, 0, 0, 0, 0]
SSTables in each level: [4, 12/10, 103/100, 275, 0, 0, 0, 0, 0]
SSTables in each level: [2, 13/10, 106/100, 273, 0, 0, 0, 0, 0]
SSTables in each level: [3, 10, 104/100, 275, 0, 0, 0, 0, 0]
SSTables in each level: [6/4, 11/10, 103/100, 269, 0, 0, 0, 0, 0]
SSTables in each level: [4, 16/10, 105/100, 264, 0, 0, 0, 0, 0]
More information about the use case: writes are roughly uniform across these tables. The data is sharded across these 8 tables by key to improve compaction parallelism. Each node receives up to 75,000 writes/sec sustained at peak, and a small number of reads. This is a pre-production cluster that's being warmed up with new data, so the low volume of reads (~100/sec per node) is just from automatic sampled data checks, otherwise we'd just use STCS :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8457) nio MessagingService
[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243188#comment-14243188 ] Ariel Weisberg commented on CASSANDRA-8457: --- Thanks Jake, that is a good point. 3 nodes is still a problem as that allows these threads to get a lot hotter than they normally would in a larger cluster. I will try Benedict's suggestion since that would be easy to put in. nio MessagingService Key: CASSANDRA-8457 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Labels: performance Fix For: 3.0 Thread-per-peer (actually two each incoming and outbound) is a big contributor to context switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7937) Apply backpressure gently when overloaded with writes
[ https://issues.apache.org/jira/browse/CASSANDRA-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243239#comment-14243239 ] Michaël Figuière commented on CASSANDRA-7937: - bq. That is a good point. However, it's only really useful for reads, since writes are always sent to all replicas. And unfortunately writes are by far a bigger problem because of the memory pressure they generate (in queues, as well as in the memtable). I've never seen a node OOM and fall over from too many reads. Indeed for Reads with CL=1 this will bring an appropriate backpressure for each replica. For Writes, the appropriate backpressure that you'd want to see is the clients slowing down their rate for all the replicas, that is for the entire partition, as you don't want to lose it. And we could actually have it with this mechanism, as the Window Size of each of the replicas would be reduced due to the heavy load they experience; and when Token Awareness is enabled on the client, it could avoid balancing to another node when reaching the maximum allowed concurrent requests threshold for each Replica, if configured to do so. Now if the entire cluster starts to be overloaded, this mechanism would make sure that the clients slow down their traffic, as there's no point in hammering an already overloaded cluster. Apply backpressure gently when overloaded with writes - Key: CASSANDRA-7937 URL: https://issues.apache.org/jira/browse/CASSANDRA-7937 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0 Reporter: Piotr Kołaczkowski Labels: performance When writing huge amounts of data into a C* cluster from analytic tools like Hadoop or Apache Spark, we can see that often C* can't keep up with the load. This is because analytic tools typically write data as fast as they can in parallel, from many nodes, and they are not artificially rate-limited, so C* is the bottleneck here. Also, increasing the number of nodes doesn't really help, because in a collocated setup this also increases the number of Hadoop/Spark nodes (writers) and although the possible write performance is higher, the problem still remains. We observe the following behavior:
1. data is ingested at an extremely fast pace into memtables and the flush queue fills up
2. the available memory limit for memtables is reached and writes are no longer accepted
3. the application gets hit by write timeout, and retries repeatedly, in vain
4. after several failed attempts to write, the job gets aborted
Desired behaviour:
1. data is ingested at an extremely fast pace into memtables and the flush queue fills up
2. after exceeding some memtable fill threshold, C* applies adaptive rate limiting to writes - the more the buffers are filled up, the less writes/s are accepted, however writes still occur within the write timeout.
3. thanks to slowed down data ingestion, now flush can finish before all the memory gets used
Of course the details of how rate limiting could be done are up for discussion. It may also be worth considering putting such logic into the driver, not C* core, but then C* needs to expose at least the following information to the driver, so we could calculate the desired maximum data rate:
1. current amount of memory available for writes before they would completely block
2. total amount of data queued to be flushed and flush progress (amount of data to flush remaining for the memtable currently being flushed)
3. average flush write speed
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7708) UDF schema change events/results
[ https://issues.apache.org/jira/browse/CASSANDRA-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-7708: Attachment: 7708-1.txt UDF schema change events/results Key: CASSANDRA-7708 URL: https://issues.apache.org/jira/browse/CASSANDRA-7708 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Labels: protocolv4 Fix For: 3.0 Attachments: 7708-1.txt Schema change notifications for UDF might be interesting for clients. This covers both the result of {{CREATE}} + {{DROP}} statements, and events. Just adding {{FUNCTION}} as a new target for these events breaks the previous native protocol contract. The proposal is to introduce a new target {{FUNCTION}} in native protocol v4. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7708) UDF schema change events/results
[ https://issues.apache.org/jira/browse/CASSANDRA-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243243#comment-14243243 ] Robert Stupp commented on CASSANDRA-7708: - Who can review the patch? UDF schema change events/results Key: CASSANDRA-7708 URL: https://issues.apache.org/jira/browse/CASSANDRA-7708 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Labels: protocolv4 Fix For: 3.0 Attachments: 7708-1.txt Schema change notifications for UDF might be interesting for clients. This covers both the result of {{CREATE}} + {{DROP}} statements, and events. Just adding {{FUNCTION}} as a new target for these events breaks the previous native protocol contract. The proposal is to introduce a new target {{FUNCTION}} in native protocol v4. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8374) Better support of null for UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-8374: Assignee: Robert Stupp Better support of null for UDF -- Key: CASSANDRA-8374 URL: https://issues.apache.org/jira/browse/CASSANDRA-8374 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Robert Stupp Fix For: 3.0 Currently, every function needs to deal with its argument potentially being {{null}}. There are very many cases where that's just annoying; users should be able to define a function like: {noformat} CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;' {noformat} without having this crash as soon as a column it's applied to doesn't have a value for some rows (I'll note that this definition apparently cannot be compiled currently, which should be looked into). In fact, I think that by default methods shouldn't have to care about {{null}} values: if the value is {{null}}, we should not call the method at all and return {{null}}. There are still methods that may explicitly want to handle {{null}} (to return a default value for instance), so maybe we can add an {{ALLOW NULLS}} to the creation syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8192) AssertionError in Memory.java
[ https://issues.apache.org/jira/browse/CASSANDRA-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-8192: --- Attachment: 8192_v1.txt Looking at the raw data on these files, in both attached cases the file size is similar to the others, however they are filled with 0's, meaning none of the parameters that CompressionMetadata needs to pull out are correctly set. This worked fine on 2.0.10 as we didn't query the chunkOffsetsSize during the constructor and only queried that data lazily - we changed our behavior to storing it at construction for CASSANDRA-6916. On 2.0.10, a simple select * from sstable_activity should get you an exception that would uncover that you have corrupt data. Attaching a patch for 2.1 to check for 0 chunks in a compressed file and throw an IOException if that's encountered. AssertionError in Memory.java - Key: CASSANDRA-8192 URL: https://issues.apache.org/jira/browse/CASSANDRA-8192 Project: Cassandra Issue Type: Bug Components: Core Environment: Windows-7-32 bit, 3GB RAM, Java 1.7.0_67 Reporter: Andreas Schnitzerling Assignee: Joshua McKenzie Fix For: 2.1.3 Attachments: 8192_v1.txt, cassandra.bat, cassandra.yaml, logdata-onlinedata-ka-196504-CompressionInfo.zip, printChunkOffsetErrors.txt, system-compactions_in_progress-ka-47594-CompressionInfo.zip, system-sstable_activity-jb-25-Filter.zip, system.log, system_AssertionTest.log Since the update of 1 of 12 nodes from 2.1.0-rel to 2.1.1-rel, an exception occurs during start up. {panel:title=system.log} ERROR [SSTableBatchOpen:1] 2014-10-27 09:44:00,079 CassandraDaemon.java:153 - Exception in thread Thread[SSTableBatchOpen:1,5,main] java.lang.AssertionError: null at org.apache.cassandra.io.util.Memory.size(Memory.java:307) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:135) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:83) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:50) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:48) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:766) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:725) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:402) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:302) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:438) ~[apache-cassandra-2.1.1.jar:2.1.1] at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_55] at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_55] at java.lang.Thread.run(Unknown Source) [na:1.7.0_55] {panel} In the attached log you can also still see CASSANDRA-8069 and CASSANDRA-6283. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
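A sketch of the guard the patch description calls for (my own illustration; the header reads below are a simplified, assumed layout rather than the real CompressionInfo serialization):
{code}
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class CompressionInfoCheck
{
    // Fail fast if a CompressionInfo component describes zero chunks,
    // instead of tripping an assertion deep in Memory later on.
    static void validate(String path) throws IOException
    {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path)))
        {
            in.readUTF();   // compressor class name (assumed layout)
            in.readInt();   // chunk length (assumed layout)
            in.readLong();  // uncompressed data length (assumed layout)
            int chunkCount = in.readInt();
            if (chunkCount <= 0)
                throw new IOException("Compressed file with 0 chunks, likely corrupt: " + path);
        }
    }
}
{code}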
[jira] [Created] (CASSANDRA-8465) Phase 1: Remove Singleton and break statics into classes
Joshua McKenzie created CASSANDRA-8465: -- Summary: Phase 1: Remove Singleton and break statics into classes Key: CASSANDRA-8465 URL: https://issues.apache.org/jira/browse/CASSANDRA-8465 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Joshua McKenzie Fix For: 3.0
1: Convert StorageProxy into a non-singleton
2: Writes
* Regular
* Counter
* RegularBatch
* CounterBatch
* AtomicBatch
3: Reads
* Regular
* Range
4: LightweightTransaction
* Write
* Read
5: Truncate
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Deleted] (CASSANDRA-8443) Phase 1: Refactor StorageProxy read path into separate classes
[ https://issues.apache.org/jira/browse/CASSANDRA-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie deleted CASSANDRA-8443: --- Phase 1: Refactor StorageProxy read path into separate classes -- Key: CASSANDRA-8443 URL: https://issues.apache.org/jira/browse/CASSANDRA-8443 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Refactor the read path inside StorageProxy into separate classes. All are Request/Response pairs: * Regular * Range Keep them synchronous for now and just break it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Deleted] (CASSANDRA-8442) Phase 1: Refactor StorageProxy write path into separate classes
[ https://issues.apache.org/jira/browse/CASSANDRA-8442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie deleted CASSANDRA-8442: --- Phase 1: Refactor StorageProxy write path into separate classes --- Key: CASSANDRA-8442 URL: https://issues.apache.org/jira/browse/CASSANDRA-8442 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Refactor the write path inside StorageProxy into separate classes. All are Request/Response pairs: * Regular * Counter * RegularBatch * CounterBatch * AtomicBatch Keep them sync for now and just break it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Deleted] (CASSANDRA-8445) Phase 1: Refactor StorageProxy truncate into separate class
[ https://issues.apache.org/jira/browse/CASSANDRA-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie deleted CASSANDRA-8445: --- Phase 1: Refactor StorageProxy truncate into separate class --- Key: CASSANDRA-8445 URL: https://issues.apache.org/jira/browse/CASSANDRA-8445 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Refactor truncation into separate class, keep it synchronous. Should be pretty trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Deleted] (CASSANDRA-8441) Phase 1: Un-Singleton the StorageProxy
[ https://issues.apache.org/jira/browse/CASSANDRA-8441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie deleted CASSANDRA-8441: --- Phase 1: Un-Singleton the StorageProxy -- Key: CASSANDRA-8441 URL: https://issues.apache.org/jira/browse/CASSANDRA-8441 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie To test and refactor StorageProxy, it will help for it not to be a singleton. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Deleted] (CASSANDRA-8444) Phase 1: Refactor StorageProxy LightWeightTransactions into separate classes
[ https://issues.apache.org/jira/browse/CASSANDRA-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie deleted CASSANDRA-8444: --- Phase 1: Refactor StorageProxy LightWeightTransactions into separate classes Key: CASSANDRA-8444 URL: https://issues.apache.org/jira/browse/CASSANDRA-8444 Project: Cassandra Issue Type: Sub-task Reporter: Joshua McKenzie Refactor the lightweight transaction paths inside StorageProxy into separate classes. Request/Response pairs for: LightWeightTransactionRead LightWeightTransactionWrite Keep them synchronous for now and just break it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8374) Better support of null for UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243354#comment-14243354 ] Robert Stupp edited comment on CASSANDRA-8374 at 12/11/14 11:32 PM: Added optional {{ALLOW NULL}} to arguments and return type. For _java_ source UDFs {{ALLOW NULL}} also means that the Java primitive types (e.g. {{int}} instead of {{j.l.Integer}}) are used, making the Java source much nicer: {code:title=current state} return val == null ? null : Double.valueOf(Math.sin(val.doubleValue())); {code} becomes {code:title=with ALLOW NULL for arg and return} return Math.sin(val); {code} I'm still deciding whether to use {{ALLOW NULL}} for each individual argument and the return type or to use {{ALLOW NULLS}} globally. It's not much effort to additionally allow something like {{DEFAULT a_value}} for a UDF argument. Seems to be a nice option. (The linked git branch is not ready for review yet) Better support of null for UDF -- Key: CASSANDRA-8374 URL: https://issues.apache.org/jira/browse/CASSANDRA-8374 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Robert Stupp Fix For: 3.0 Currently, every function needs to deal with its arguments potentially being {{null}}. There are many cases where that's just annoying; users should be able to define a function like: {noformat} CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;' {noformat} without it crashing as soon as a column it's applied to doesn't have a value for some rows (I'll note that this definition apparently cannot be compiled currently, which should be looked into). In fact, I think that by default methods shouldn't have to care about {{null}} values: if the value is {{null}}, we should not call the method at all and return {{null}}. There are still methods that may explicitly want to handle {{null}} (to return a default value, for instance), so maybe we can add an {{ALLOW NULLS}} to the creation syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7708) UDF schema change events/results
[ https://issues.apache.org/jira/browse/CASSANDRA-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-7708: -- Reviewer: Tyler Hobbs [~thobbs]? UDF schema change events/results Key: CASSANDRA-7708 URL: https://issues.apache.org/jira/browse/CASSANDRA-7708 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Labels: protocolv4 Fix For: 3.0 Attachments: 7708-1.txt Schema change notifications for UDFs might be interesting for clients. This covers both the results of {{CREATE}} and {{DROP}} statements and the corresponding events. Just adding {{FUNCTION}} as a new target for these events would break the previous native protocol contract. The proposal is to introduce the new target {{FUNCTION}} in native protocol v4. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
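As a hedged illustration of the proposal - the enum below is illustrative, not the server's actual type:
{code:title=illustrative only}
public enum SchemaChangeTarget
{
    KEYSPACE, TABLE, TYPE,
    FUNCTION // new target proposed for protocol v4; sending it to a v3 client would break the contract
}
{code}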
[jira] [Commented] (CASSANDRA-8374) Better support of null for UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243444#comment-14243444 ] Robert Stupp commented on CASSANDRA-8374: - To clarify - the syntax would be:
{code}
CREATE FUNCTION foo (
    argOne int DEFAULT 42,
    argTwo double ALLOW NULL,
    argThree float
) RETURNS float LANGUAGE java ...
{code}
which would generate a Java method with this signature:
{code}
public float execute(int argOne, java.lang.Double argTwo, float argThree)
{code}
If any argument is {{null}} and neither {{ALLOW NULL}} nor {{DEFAULT x}} has been declared for it, the method isn't executed at all. There are also implications for aggregates (they need some additional checks - {{INITCOND}} might be required, otherwise the state/final function might never be called at all). Better support of null for UDF -- Key: CASSANDRA-8374 URL: https://issues.apache.org/jira/browse/CASSANDRA-8374 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Robert Stupp Fix For: 3.0 Currently, every function needs to deal with its arguments potentially being {{null}}. There are many cases where that's just annoying; users should be able to define a function like: {noformat} CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;' {noformat} without it crashing as soon as a column it's applied to doesn't have a value for some rows (I'll note that this definition apparently cannot be compiled currently, which should be looked into). In fact, I think that by default methods shouldn't have to care about {{null}} values: if the value is {{null}}, we should not call the method at all and return {{null}}. There are still methods that may explicitly want to handle {{null}} (to return a default value, for instance), so maybe we can add an {{ALLOW NULLS}} to the creation syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
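A sketch of that execution rule under assumed names (not the linked branch):
{code:title=sketch, assumed names}
import java.nio.ByteBuffer;
import java.util.List;
import java.util.function.Function;

final class NullSkippingInvoker
{
    // nullableOrDefaulted[i] is true when argument i was declared ALLOW NULL or DEFAULT x
    static ByteBuffer maybeInvoke(List<ByteBuffer> args,
                                  boolean[] nullableOrDefaulted,
                                  Function<List<ByteBuffer>, ByteBuffer> udfBody)
    {
        for (int i = 0; i < args.size(); i++)
            if (args.get(i) == null && !nullableOrDefaulted[i])
                return null; // the method body is never called
        return udfBody.apply(args);
    }
}
{code}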
[jira] [Updated] (CASSANDRA-8452) Add missing systems to FBUtilities.isUnix, add FBUtilities.isWindows
[ https://issues.apache.org/jira/browse/CASSANDRA-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston updated CASSANDRA-8452: --- Attachment: CASSANDRA-8452-v3.patch Well, I'm glad you brought up procfs, since it turns out osx doesn't have one. The rest of the posix systems we're checking for do, [according to wikipedia | http://en.wikipedia.org/wiki/Procfs]. I guess adding hasProcFS would make some sense, just for the sake of being correct. At the moment, it only affects whether some warnings are logged: we eagerly try to open a proc file, and use isPosix to decide whether to log anything if that fails. Relying on isPosix alone causes erroneous startup warnings in mac dev environments. I would think isWindows is enough for ntfs-specific logic. Linux _can_ mount and write to ntfs disks, but I don't know how common it is for C* to be run on one, outside of maybe some dual-boot dev environments. I'm also not sure it would have the same behaviors we're coding around with isWindows, since Linux's ntfs support is written against a spec that was reverse-engineered from the Windows implementation. Add missing systems to FBUtilities.isUnix, add FBUtilities.isWindows Key: CASSANDRA-8452 URL: https://issues.apache.org/jira/browse/CASSANDRA-8452 Project: Cassandra Issue Type: Bug Reporter: Blake Eggleston Assignee: Blake Eggleston Priority: Minor Fix For: 2.1.3 Attachments: CASSANDRA-8452-v2.patch, CASSANDRA-8452-v3.patch, CASSANDRA-8452.patch The isUnix method leaves out a few unix systems, which, after the changes in CASSANDRA-8136, causes some unexpected behavior during shutdown. It would also be clearer if FBUtilities had an isWindows method for branching into Windows-specific logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
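For illustration, os.name-based checks of the kind discussed here - the concrete system list is an assumption, and this is not the attached patch:
{code:title=illustrative only}
import java.io.File;

public final class OsChecks
{
    private static final String OS = System.getProperty("os.name").toLowerCase();

    public static boolean isWindows() { return OS.contains("windows"); }

    public static boolean isUnix()
    {
        // the real list lives in FBUtilities; this set is assumed
        return OS.contains("linux") || OS.contains("mac")
            || OS.contains("sunos") || OS.contains("aix")
            || OS.contains("bsd")   || OS.contains("nix");
    }

    public static boolean hasProcFs()
    {
        // osx is posix-like but has no procfs, hence a separate check
        return isUnix() && new File("/proc").isDirectory();
    }
}
{code}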
[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243645#comment-14243645 ] jonathan lacefield commented on CASSANDRA-8447: --- Please find a new heap dump attached here - https://drive.google.com/file/d/0B4Imdpu2YrEbdVRNMGM4X3BTS3M/view?usp=sharing The test was executed after ensuring hints were disabled and did not exist in the cluster. Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled --- Key: CASSANDRA-8447 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447 Project: Cassandra Issue Type: Bug Components: Core Environment: Cluster size - 4 nodes Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives) OS - RHEL 6.5 jvm - oracle 1.7.0_71 Cassandra version 2.0.11 Reporter: jonathan lacefield Attachments: Node_with_compaction.png, Node_without_compaction.png, cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, output.1.svg, output.2.svg, output.svg, results.tar.gz, visualvm_screenshot Behavior - If autocompaction is enabled, nodes become unresponsive due to a full Old Gen heap which is not cleared during CMS GC. Test methodology - disabled autocompaction on 3 nodes, left autocompaction enabled on 1 node. Executed different Cassandra stress loads, using write-only operations. Monitored visualvm and jconsole for heap pressure. Captured iostat and dstat for most tests. Captured a heap dump from the 50-thread load. Hints were disabled for testing on all nodes to alleviate GC noise due to hints backing up. Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=different threads tested -schema replication\(factor=3\) keyspace=Keyspace1 -node all nodes listed Data load thread count and results:
* 1 thread - still running, but it looks like the node can sustain this load (approx 500 writes per second per node)
* 5 threads - nodes become unresponsive due to full Old Gen Heap; CMS measured in the 60 second range (approx 2k writes per second per node)
* 10 threads - nodes become unresponsive due to full Old Gen Heap; CMS measured in the 60 second range
* 50 threads - nodes become unresponsive due to full Old Gen Heap; CMS measured in the 60 second range (approx 10k writes per second per node)
* 100 threads - nodes become unresponsive due to full Old Gen Heap; CMS measured in the 60 second range (approx 20k writes per second per node)
* 200 threads - nodes become unresponsive due to full Old Gen Heap; CMS measured in the 60 second range (approx 25k writes per second per node)
Note - the observed behavior was the same for all tests except the single-threaded one, which does not appear to show this behavior. Tested different GC and Linux OS settings with a focus on the 50 and 200 thread loads.
JVM settings tested:
# default, out of the box, env-sh settings
# 10 G Max | 1 G New - default env-sh settings
# 10 G Max | 1 G New - default env-sh settings
#* JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50
# 20 G Max | 10 G New
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=3
JVM_OPTS=$JVM_OPTS -XX:ParallelGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions
JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity
JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs
JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768
JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking
# 20 G Max | 1 G New
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=3
JVM_OPTS=$JVM_OPTS
[jira] [Commented] (CASSANDRA-8463) Upgrading 2.0 to 2.1 causes LCS to recompact all files
[ https://issues.apache.org/jira/browse/CASSANDRA-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243740#comment-14243740 ] Marcus Eriksson commented on CASSANDRA-8463: [~rbranson] could you attach logs for the upgraded node? Upgrading 2.0 to 2.1 causes LCS to recompact all files -- Key: CASSANDRA-8463 URL: https://issues.apache.org/jira/browse/CASSANDRA-8463 Project: Cassandra Issue Type: Bug Components: Core Environment: Hardware is recent 2-socket, 16-core (x2 Hyperthreaded), 144G RAM, solid-state storage. Platform is Linux 3.2.51, Oracle JDK 64-bit 1.7.0_65. Heap is 32G total, 4G newsize. 8G/8G on-heap/off-heap memtables, offheap_buffer allocator, 0.5 memtable_cleanup_threshold concurrent_compactors: 20 Reporter: Rick Branson Assignee: Marcus Eriksson Fix For: 2.1.3 It appears that tables configured with LCS will completely re-compact themselves over some period of time after upgrading from 2.0 to 2.1 (2.0.11 to 2.1.2, specifically). It starts out with 10 pending tasks for an hour or so, then starts building up; now there are 50-100 tasks pending across the cluster after 12 hours. These nodes are under heavy write load, but were easily able to keep up in 2.0 (they rarely had 5 pending compaction tasks), so I don't think LCS in 2.1 is actually worse; perhaps some different LCS behavior causes the layout of tables from 2.0 to prompt the compactor to reorganize them? The nodes flushed ~11MB SSTables under 2.0. They're currently flushing ~36MB SSTables due to the improved memtable setup in 2.1. Before I upgraded the entire cluster to 2.1, I noticed the problem and tried several variations on the flush size, thinking perhaps the larger tables in L0 were causing some kind of cascading compactions. Even when they're sized roughly like the 2.0 flushes were, the same behavior occurs. I also tried both enabling and disabling STCS in L0, with no real change other than L0 backing up faster, so I left STCS in L0 enabled. Tables are configured with 32MB sstable_size_in_mb, which was found to be an improvement over the 160MB table size for compaction performance. Maybe this is wrong now? Otherwise, the tables are configured with defaults. Compaction has been unthrottled to help them catch up. The compaction threads stay very busy, with the cluster-wide CPU at 45% nice time. No nodes have completely caught up yet. I'll update JIRA with status about their progress if anything interesting happens.
From a node around 12 hours ago, around an hour after the upgrade, with 19 pending compaction tasks:
SSTables in each level: [6/4, 10, 105/100, 268, 0, 0, 0, 0, 0]
SSTables in each level: [6/4, 10, 106/100, 271, 0, 0, 0, 0, 0]
SSTables in each level: [1, 16/10, 105/100, 269, 0, 0, 0, 0, 0]
SSTables in each level: [5/4, 10, 103/100, 272, 0, 0, 0, 0, 0]
SSTables in each level: [4, 11/10, 105/100, 270, 0, 0, 0, 0, 0]
SSTables in each level: [1, 12/10, 105/100, 271, 0, 0, 0, 0, 0]
SSTables in each level: [1, 14/10, 104/100, 267, 0, 0, 0, 0, 0]
SSTables in each level: [9/4, 10, 103/100, 265, 0, 0, 0, 0, 0]
Recently, with 41 pending compaction tasks:
SSTables in each level: [4, 13/10, 106/100, 269, 0, 0, 0, 0, 0]
SSTables in each level: [4, 12/10, 106/100, 273, 0, 0, 0, 0, 0]
SSTables in each level: [5/4, 11/10, 106/100, 271, 0, 0, 0, 0, 0]
SSTables in each level: [4, 12/10, 103/100, 275, 0, 0, 0, 0, 0]
SSTables in each level: [2, 13/10, 106/100, 273, 0, 0, 0, 0, 0]
SSTables in each level: [3, 10, 104/100, 275, 0, 0, 0, 0, 0]
SSTables in each level: [6/4, 11/10, 103/100, 269, 0, 0, 0, 0, 0]
SSTables in each level: [4, 16/10, 105/100, 264, 0, 0, 0, 0, 0]
More information about the use case: writes are roughly uniform across these tables. The data is sharded across these 8 tables by key to improve compaction parallelism. Each node receives up to 75,000 writes/sec sustained at peak, and a small number of reads. This is a pre-production cluster that's being warmed up with new data, so the low volume of reads (~100/sec per node) is just from automatic sampled data checks, otherwise we'd just use STCS :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
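For readers of the level lines above: an entry like 105/100 is count/cap for that level, so any level shown over its cap represents pending compaction work. A tiny sketch (not Cassandra code) that flags over-cap levels from one of the reported lines:
{code:title=sketch, caps taken from the report}
public final class LevelLineSketch
{
    public static void main(String[] args)
    {
        int[] counts = {4, 13, 106, 269};  // first four levels of one line above
        int[] caps   = {4, 10, 100, 1000}; // caps implied by the n/cap entries in the report
        for (int i = 0; i < counts.length; i++)
            if (counts[i] > caps[i])
                System.out.printf("L%d over cap: %d/%d%n", i, counts[i], caps[i]);
    }
}
{code}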