[jira] [Commented] (CASSANDRA-8061) tmplink files are not removed
[ https://issues.apache.org/jira/browse/CASSANDRA-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260951#comment-14260951 ] Benedict commented on CASSANDRA-8061:
--------------------------------------

Regrettably not. It was rebased to trunk when it was decided to bump it to a 3.0 release. If we decide (as I have proposed) to release for 2.1, I can rebase back again, but it's a somewhat painful job otherwise.

tmplink files are not removed
-----------------------------

Key: CASSANDRA-8061
URL: https://issues.apache.org/jira/browse/CASSANDRA-8061
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Linux
Reporter: Gianluca Borello
Assignee: Joshua McKenzie
Priority: Critical
Fix For: 2.1.3
Attachments: 8061_v1.txt, 8248-thread_dump.txt

After installing 2.1.0, I'm experiencing a bunch of tmplink files that are filling my disk. I found https://issues.apache.org/jira/browse/CASSANDRA-7803, which is very similar, and I confirm the problem occurs both on 2.1.0 and on the latest commit on the cassandra-2.1 branch (https://github.com/apache/cassandra/commit/aca80da38c3d86a40cc63d9a122f7d45258e4685).

Even starting with a clean keyspace, after a few hours I get:

{noformat}
$ sudo find /raid0 | grep tmplink | xargs du -hs
2.7G  /raid0/cassandra/data/draios/protobuf1-ccc6dce04beb11e4abf997b38fbf920b/draios-protobuf1-tmplink-ka-4515-Data.db
13M   /raid0/cassandra/data/draios/protobuf1-ccc6dce04beb11e4abf997b38fbf920b/draios-protobuf1-tmplink-ka-4515-Index.db
1.8G  /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-1788-Data.db
12M   /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-1788-Index.db
5.2M  /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-2678-Index.db
822M  /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-2678-Data.db
7.3M  /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3283-Index.db
1.2G  /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3283-Data.db
6.7M  /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3951-Index.db
1.1G  /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3951-Data.db
11M   /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-4799-Index.db
1.7G  /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-4799-Data.db
812K  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-234-Index.db
122M  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-208-Data.db
744K  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-739-Index.db
660K  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-193-Index.db
796K  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-230-Index.db
137M  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-230-Data.db
161M  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-269-Data.db
139M  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-234-Data.db
940K  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-786-Index.db
936K  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-269-Index.db
161M  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-786-Data.db
672K  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-197-Index.db
113M  /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-193-Data.db
{noformat}
[jira] [Updated] (CASSANDRA-8546) RangeTombstoneList becoming bottleneck on tombstone heavy tasks
[ https://issues.apache.org/jira/browse/CASSANDRA-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Letz updated CASSANDRA-8546:
------------------------------------
Attachment: cassandra-2.1-8546.txt

Makes sense. I'm attaching a rebased version for 2.1 - the same issues and solution still apply.

RangeTombstoneList becoming bottleneck on tombstone heavy tasks
---------------------------------------------------------------

Key: CASSANDRA-8546
URL: https://issues.apache.org/jira/browse/CASSANDRA-8546
Project: Cassandra
Issue Type: Improvement
Components: Core
Environment: 2.0.11 / 2.1
Reporter: Dominic Letz
Fix For: 2.1.3
Attachments: cassandra-2.0.11-8546.txt, cassandra-2.1-8546.txt, tombstone_test.tgz

I would like to propose changing the data structure used in RangeTombstoneList to store and insert tombstone ranges to something with at least O(log N) insert in the middle and near O(1) insert at start AND end. Here is why:

With tombstone-heavy workloads, the current implementation of RangeTombstoneList becomes a bottleneck for slice queries. Scanning up to the default maximum number of tombstones (100k) can take up to 3 minutes because of how addInternal() scales on insertion of middle and start elements.

The attached test demonstrates this with 50k deletes from both sides of a range:

INSERT 1...11
flush()
DELETE 1...5
DELETE 11...6

While one direction performs OK (~400ms on my notebook):
{code}
SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp DESC LIMIT 1
{code}
The other direction underperforms (~7 seconds on my notebook):
{code}
SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp ASC LIMIT 1
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
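The scaling complaint above is easy to reproduce outside Cassandra. The following is a minimal, self-contained Java sketch (illustrative only - not Cassandra's RangeTombstoneList, and the element count is arbitrary) contrasting front-insertion into a flat sorted array, the O(N)-per-insert pattern addInternal() hits, with a balanced tree that keeps every insert at O(log N):

{code}
import java.util.ArrayList;
import java.util.TreeMap;

// Illustrative micro-benchmark: N front-insertions into an array-backed list
// cost O(N^2) overall, while a balanced tree costs O(N log N).
public class InsertScaling
{
    public static void main(String[] args)
    {
        final int n = 50_000; // mirrors the 50k deletes in the attached test

        long t0 = System.nanoTime();
        ArrayList<Integer> list = new ArrayList<>();
        for (int i = n; i > 0; i--)
            list.add(0, i); // every insert shifts the whole backing array
        long arrayMs = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        TreeMap<Integer, Integer> tree = new TreeMap<>();
        for (int i = n; i > 0; i--)
            tree.put(i, i); // O(log N) rebalancing insert
        long treeMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.printf("array front-insert: %d ms, tree insert: %d ms%n", arrayMs, treeMs);
    }
}
{code}

The asymmetry between the two loops has the same shape as the ~400ms DESC versus ~7s ASC timings reported in the description.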
[jira] [Commented] (CASSANDRA-6559) cqlsh should warn about ALLOW FILTERING
[ https://issues.apache.org/jira/browse/CASSANDRA-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260964#comment-14260964 ] Mike Adamson commented on CASSANDRA-6559:
------------------------------------------

If we kept the (Y/n) prompt for every ALLOW FILTERING query, could we have a cqlshrc / command-line option to not ask the question but just print the warning?

{noformat}
[options]
allow_dangerous_queries = true
{noformat}

cqlsh should warn about ALLOW FILTERING
---------------------------------------

Key: CASSANDRA-6559
URL: https://issues.apache.org/jira/browse/CASSANDRA-6559
Project: Cassandra
Issue Type: Bug
Components: Tools
Reporter: Tupshin Harper
Priority: Minor
Labels: cqlsh
Fix For: 2.0.12

ALLOW FILTERING can be a convenience for preliminary exploration of your data, and can be useful for batch jobs, but it is such an anti-pattern for regular production queries that cqlsh should provide an explicit warning whenever such a query is performed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8399) Reference Counter exception when dropping user type
[ https://issues.apache.org/jira/browse/CASSANDRA-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261044#comment-14261044 ] Benedict commented on CASSANDRA-8399:
--------------------------------------

Ah. I now see that CASSANDRA-8019 was the source of these woes, which was itself triggered by CASSANDRA-7932. I didn't have the context.

It looks to me like the underlying problem is that the set of iscompacting can be disjoint from the set of isreferenced, and we should ensure it is always a subset. What I mean by this is that if an sstable is marked compacting, it should not be possible for it to be considered unreferenced. So I propose always acquiring a reference prior to successfully marking compacting, and releasing on unmarking.

To relate to what you were saying about multiple concepts: we actually have three right now, not two, and two of these are extremely similar. By making one of the two similar concepts a subset of the other, we reduce the potential for mistakes. I have no doubt the 7932 mistake was down to my belief that a reference was maintained by AbstractCompactionTask, when in fact it is only the iscompacting state.

Reference Counter exception when dropping user type
---------------------------------------------------

Key: CASSANDRA-8399
URL: https://issues.apache.org/jira/browse/CASSANDRA-8399
Project: Cassandra
Issue Type: Bug
Reporter: Philip Thompson
Assignee: Joshua McKenzie
Fix For: 2.1.3
Attachments: 8399_fix_empty_results.txt, 8399_v2.txt, node2.log, ubuntu-8399.log

When running the dtest {{user_types_test.py:TestUserTypes.test_type_keyspace_permission_isolation}} with the current 2.1-HEAD code, very frequently, but not always, when dropping a type, the following exception is seen:

{code}
ERROR [MigrationStage:1] 2014-12-01 13:54:54,824 CassandraDaemon.java:170 - Exception in thread Thread[MigrationStage:1,5,main]
java.lang.AssertionError: Reference counter -1 for /var/folders/v3/z4wf_34n1q506_xjdy49gb78gn/T/dtest-eW2RXj/test/node2/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-14-Data.db
	at org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1662) ~[main/:na]
	at org.apache.cassandra.io.sstable.SSTableScanner.close(SSTableScanner.java:164) ~[main/:na]
	at org.apache.cassandra.utils.MergeIterator.close(MergeIterator.java:62) ~[main/:na]
	at org.apache.cassandra.db.ColumnFamilyStore$8.close(ColumnFamilyStore.java:1943) ~[main/:na]
	at org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:2116) ~[main/:na]
	at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:2029) ~[main/:na]
	at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1963) ~[main/:na]
	at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:744) ~[main/:na]
	at org.apache.cassandra.db.SystemKeyspace.serializedSchema(SystemKeyspace.java:731) ~[main/:na]
	at org.apache.cassandra.config.Schema.updateVersion(Schema.java:374) ~[main/:na]
	at org.apache.cassandra.config.Schema.updateVersionAndAnnounce(Schema.java:399) ~[main/:na]
	at org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:167) ~[main/:na]
	at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49) ~[main/:na]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_67]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_67]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_67]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
{code}

Log of the node with the error is attached.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
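As a rough illustration of the subset invariant Benedict proposes ("marked compacting" implies "referenced"), here is a hedged Java sketch; the class and method names are hypothetical rather than the actual SSTableReader API, and the only point is the ordering - acquire a reference before marking compacting, release it on unmarking, so a compacting sstable can never hit a zero reference count:

{code}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the compacting set stays a subset of the referenced set.
class RefCountedTable
{
    private final AtomicInteger references = new AtomicInteger(1); // 1 while live
    private final AtomicBoolean compacting = new AtomicBoolean(false);

    boolean acquireReference()
    {
        for (;;)
        {
            int refs = references.get();
            if (refs <= 0)
                return false; // already released; cannot resurrect
            if (references.compareAndSet(refs, refs + 1))
                return true;
        }
    }

    void releaseReference()
    {
        int refs = references.decrementAndGet();
        assert refs >= 0 : "Reference counter " + refs;
        if (refs == 0)
            deleteFiles();
    }

    boolean markCompacting()
    {
        // Reference first, so "compacting but unreferenced" is impossible.
        if (!acquireReference())
            return false;
        if (!compacting.compareAndSet(false, true))
        {
            releaseReference(); // lost the race to another compaction
            return false;
        }
        return true;
    }

    void unmarkCompacting()
    {
        compacting.set(false);
        releaseReference();
    }

    private void deleteFiles() { /* remove on-disk components */ }
}
{code}

With this shape, the assertion in the stack trace above (a reference counter reaching -1) cannot be triggered from a compaction path, because the compaction itself always holds one reference.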
[jira] [Commented] (CASSANDRA-8409) Node generating a huge number of tiny sstable_activity flushes
[ https://issues.apache.org/jira/browse/CASSANDRA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261133#comment-14261133 ] Benedict commented on CASSANDRA-8409:
--------------------------------------

This is likely fixed in the latest repository version, but I'm not sure if that is generally ready for production deployment. Given the holiday period, we are unlikely to be deploying a production release for a couple of weeks or more. You could try applying the patch for CASSANDRA-8018 only and see if this solves your problem. Alternatively, ensure your clients are batching updates to a single partition and that multiple clients are not aggressively modifying the same partition simultaneously.

Node generating a huge number of tiny sstable_activity flushes
--------------------------------------------------------------

Key: CASSANDRA-8409
URL: https://issues.apache.org/jira/browse/CASSANDRA-8409
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Cassandra 2.1.0, Oracle JDK 1.8.0_25, Ubuntu 12.04
Reporter: Fred Wulff
Fix For: 2.1.3
Attachments: system-sstable_activity-ka-67802-Data.db

On one of my nodes, I'm seeing hundreds per second of "INFO 21:28:05 Enqueuing flush of sstable_activity: 0 (0%) on-heap, 33 (0%) off-heap". tpstats shows a steadily climbing number of pending MemtableFlushWriter/MemtablePostFlush tasks until the node OOMs. When the flushes actually happen, the sstable written is invariably 121 bytes. I'm writing pretty aggressively to one of my user tables (sev.mdb_group_pit), but that table's flushing behavior seems reasonable. tpstats:

{quote}
frew@hostname:~/s_dist/apache-cassandra-2.1.0$ bin/nodetool -h hostname tpstats
Pool Name               Active  Pending  Completed  Blocked  All time blocked
MutationStage           128     4429     36810      0        0
ReadStage               0       0        1205       0        0
RequestResponseStage    0       0        24910      0        0
ReadRepairStage         0       0        26         0        0
CounterMutationStage    0       0        0          0        0
MiscStage               0       0        0          0        0
HintedHandoff           2       2        9          0        0
GossipStage             0       0        5157       0        0
CacheCleanupExecutor    0       0        0          0        0
InternalResponseStage   0       0        0          0        0
CommitLogArchiver       0       0        0          0        0
CompactionExecutor      428429  0        0
ValidationExecutor      0       0        0          0        0
MigrationStage          0       0        0          0        0
AntiEntropyStage        0       0        0          0        0
PendingRangeCalculator  0       0        11         0        0
MemtableFlushWriter     8       38644    8987       0        0
MemtablePostFlush       1       38940    8735       0        0
MemtableReclaimMemory   0       0        8987       0        0

Message type      Dropped
READ                    0
RANGE_SLICE             0
_TRACE                  0
MUTATION            10457
COUNTER_MUTATION        0
BINARY                  0
REQUEST_RESPONSE        0
PAGED_RANGE             0
READ_REPAIR           208
{quote}

I've attached one of the produced sstables.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
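Benedict's last suggestion - batching updates to a single partition - looks roughly like the hedged sketch below, written against the DataStax Java driver 2.x API. The keyspace, table, and column names are assumptions loosely modeled on the sev.mdb_group_pit table mentioned in the description, not its real schema:

{code}
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Hedged sketch: many rows for ONE partition go out as a single unlogged
// batch instead of many individual mutations.
public class SinglePartitionBatch
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Hypothetical schema: mdb_group_pit(group_id bigint, ts bigint, payload text)
        PreparedStatement insert = session.prepare(
            "INSERT INTO sev.mdb_group_pit (group_id, ts, payload) VALUES (?, ?, ?)");

        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        long groupId = 42L; // the same partition key for every row in the batch
        for (int i = 0; i < 100; i++)
            batch.add(insert.bind(groupId, (long) i, "payload-" + i));

        session.execute(batch); // one round trip, one mutation per replica
        cluster.close();
    }
}
{code}

Unlogged batches only help when all statements target the same partition; spreading a batch across partitions puts extra coordination load on a single node.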
[jira] [Resolved] (CASSANDRA-8409) Node generating a huge number of tiny sstable_activity flushes
[ https://issues.apache.org/jira/browse/CASSANDRA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict resolved CASSANDRA-8409.
---------------------------------
Resolution: Duplicate

Node generating a huge number of tiny sstable_activity flushes
--------------------------------------------------------------

Key: CASSANDRA-8409
URL: https://issues.apache.org/jira/browse/CASSANDRA-8409
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Cassandra 2.1.0, Oracle JDK 1.8.0_25, Ubuntu 12.04
Reporter: Fred Wulff
Fix For: 2.1.3
Attachments: system-sstable_activity-ka-67802-Data.db

On one of my nodes, I'm seeing hundreds per second of "INFO 21:28:05 Enqueuing flush of sstable_activity: 0 (0%) on-heap, 33 (0%) off-heap". tpstats shows a steadily climbing number of pending MemtableFlushWriter/MemtablePostFlush tasks until the node OOMs. When the flushes actually happen, the sstable written is invariably 121 bytes. I'm writing pretty aggressively to one of my user tables (sev.mdb_group_pit), but that table's flushing behavior seems reasonable. tpstats:

{quote}
frew@hostname:~/s_dist/apache-cassandra-2.1.0$ bin/nodetool -h hostname tpstats
Pool Name               Active  Pending  Completed  Blocked  All time blocked
MutationStage           128     4429     36810      0        0
ReadStage               0       0        1205       0        0
RequestResponseStage    0       0        24910      0        0
ReadRepairStage         0       0        26         0        0
CounterMutationStage    0       0        0          0        0
MiscStage               0       0        0          0        0
HintedHandoff           2       2        9          0        0
GossipStage             0       0        5157       0        0
CacheCleanupExecutor    0       0        0          0        0
InternalResponseStage   0       0        0          0        0
CommitLogArchiver       0       0        0          0        0
CompactionExecutor      428429  0        0
ValidationExecutor      0       0        0          0        0
MigrationStage          0       0        0          0        0
AntiEntropyStage        0       0        0          0        0
PendingRangeCalculator  0       0        11         0        0
MemtableFlushWriter     8       38644    8987       0        0
MemtablePostFlush       1       38940    8735       0        0
MemtableReclaimMemory   0       0        8987       0        0

Message type      Dropped
READ                    0
RANGE_SLICE             0
_TRACE                  0
MUTATION            10457
COUNTER_MUTATION        0
BINARY                  0
REQUEST_RESPONSE        0
PAGED_RANGE             0
READ_REPAIR           208
{quote}

I've attached one of the produced sstables.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8325) Cassandra 2.1.x fails to start on FreeBSD (JVM crash)
[ https://issues.apache.org/jira/browse/CASSANDRA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261151#comment-14261151 ] graham sanderson commented on CASSANDRA-8325:
----------------------------------------------

My first patch was kind of ugly (shortest path to probable correctness); whilst I could do the same thing for the comparators, at that point the code would cease to be very elegant, and would probably need some refactoring to make it more approachable. [~benedict], do you want to take this issue? I was really only taking a look out of curiosity (we hadn't seen it in our own testing with 2.1.x), but it just seemed odd that there should be an issue with sun.misc.Unsafe on FreeBSD. We don't use FreeBSD or plan to, but as mentioned above, this is actually incorrect code, although it happens to work on most other platforms. It seems like you took great pains to get the {{@Inline}} mix right, and you have much more context than me on what you were thinking (again, as per {{MIN_COPY_THRESHOLD}} and its only partial use - which may not have been intended).

Cassandra 2.1.x fails to start on FreeBSD (JVM crash)
-----------------------------------------------------

Key: CASSANDRA-8325
URL: https://issues.apache.org/jira/browse/CASSANDRA-8325
Project: Cassandra
Issue Type: Bug
Environment: FreeBSD 10.0 with openjdk version 1.7.0_71, 64-Bit Server VM
Reporter: Leonid Shalupov
Attachments: hs_err_pid1856.log, system.log, unsafeCopy1.txt

See attached error file after JVM crash.

{quote}
FreeBSD xxx.intellij.net 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu Jan 16 22:34:59 UTC 2014 r...@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
{quote}

{quote}
% java -version
openjdk version 1.7.0_71
OpenJDK Runtime Environment (build 1.7.0_71-b14)
OpenJDK 64-Bit Server VM (build 24.71-b01, mixed mode)
{quote}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8464) Support direct buffer decompression for reads
[ https://issues.apache.org/jira/browse/CASSANDRA-8464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261154#comment-14261154 ] T Jake Luciani commented on CASSANDRA-8464:
--------------------------------------------

No, I don't think we would remove mmap for 3.0. And this ticket does allow mmapping of compressed data.

Support direct buffer decompression for reads
---------------------------------------------

Key: CASSANDRA-8464
URL: https://issues.apache.org/jira/browse/CASSANDRA-8464
Project: Cassandra
Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Labels: performance
Fix For: 3.0
Attachments: compression_direct.png

Currently, when we read a compressed sstable, we copy the data on heap and then send it to be decompressed into another on-heap buffer (albeit pooled). But now both snappy and lz4 (with CASSANDRA-7039) allow decompression of direct byte buffers. This lets us mmap the data and decompress completely off heap (and avoids moving bytes over JNI).

One issue is performing the checksum off heap, but Adler32 does support this in Java 8 (it's also in Java 7, but marked private?!).

This change yields a 10% boost in read performance on cstar. Locally I see up to 30% improvement.

http://cstar.datastax.com/graph?stats=5ebcdd70-816b-11e4-aed6-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=200.09&ymin=0&ymax=135908.3

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
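For reference, decompressing between direct buffers - the mechanism this ticket relies on - looks roughly like the hedged sketch below, assuming lz4-java 1.3+ (the release whose ByteBuffer overloads were added alongside CASSANDRA-7039). Neither buffer lives on the Java heap:

{code}
import java.nio.ByteBuffer;
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;

// Sketch: compress and decompress entirely between direct ByteBuffers.
public class DirectDecompress
{
    public static void main(String[] args)
    {
        LZ4Factory factory = LZ4Factory.fastestInstance();
        LZ4Compressor compressor = factory.fastCompressor();
        LZ4FastDecompressor decompressor = factory.fastDecompressor();

        byte[] raw = "contents of one compressed sstable chunk".getBytes();
        ByteBuffer src = ByteBuffer.allocateDirect(raw.length);
        src.put(raw);
        src.flip();

        ByteBuffer compressed = ByteBuffer.allocateDirect(compressor.maxCompressedLength(raw.length));
        compressor.compress(src, compressed);
        compressed.flip();

        // In the sstable read path the compressed buffer would instead be a
        // slice of an mmap'd region, so no bytes are copied onto the heap.
        ByteBuffer decompressed = ByteBuffer.allocateDirect(raw.length);
        decompressor.decompress(compressed, 0, decompressed, 0, raw.length);
        System.out.println("decompressed " + raw.length + " bytes off heap");
    }
}
{code}

The checksum half of the story is java.util.zip.Adler32.update(ByteBuffer), which became public API in Java 8, matching the ticket's note.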
[jira] [Commented] (CASSANDRA-8325) Cassandra 2.1.x fails to start on FreeBSD (JVM crash)
[ https://issues.apache.org/jira/browse/CASSANDRA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261161#comment-14261161 ] Benedict commented on CASSANDRA-8325:
--------------------------------------

I agree this should be addressed, but while it's reasonably trivial to knock out a patch that makes FreeBSD work, it's not so trivial to make it work neatly without penalty on the other systems we target. Since it's not a supported system and I have a lot of other pressing work to do, I can't really spare the time right now. I'll certainly take a look if nobody else has once some time frees up, however.

As to {{MIN_COPY_THRESHOLD}}: the behaviour was largely adopted from the JDK, which doesn't apply the limit to offheap-to-offheap copies. Whether this is a good or bad distinction, I haven't investigated.

Cassandra 2.1.x fails to start on FreeBSD (JVM crash)
-----------------------------------------------------

Key: CASSANDRA-8325
URL: https://issues.apache.org/jira/browse/CASSANDRA-8325
Project: Cassandra
Issue Type: Bug
Environment: FreeBSD 10.0 with openjdk version 1.7.0_71, 64-Bit Server VM
Reporter: Leonid Shalupov
Attachments: hs_err_pid1856.log, system.log, unsafeCopy1.txt

See attached error file after JVM crash.

{quote}
FreeBSD xxx.intellij.net 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu Jan 16 22:34:59 UTC 2014 r...@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
{quote}

{quote}
% java -version
openjdk version 1.7.0_71
OpenJDK Runtime Environment (build 1.7.0_71-b14)
OpenJDK 64-Bit Server VM (build 24.71-b01, mixed mode)
{quote}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
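For context on the copy threshold: the JDK bounds the size of each individual heap-array copy so that one huge copy cannot delay a GC safepoint for the whole VM. Below is a hedged, illustrative Java sketch of that chunking pattern; the 1 MB threshold and the names are assumptions for illustration, not Cassandra's actual constants:

{code}
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Illustrative sketch: copy in bounded chunks so no single copyMemory call
// keeps a thread from reaching a safepoint for too long.
public class ChunkedUnsafeCopy
{
    private static final Unsafe UNSAFE = loadUnsafe();
    private static final long MIN_COPY_THRESHOLD = 1024 * 1024; // assumed 1 MB chunk size

    static void copyMemory(Object src, long srcOffset, Object dst, long dstOffset, long length)
    {
        while (length > 0)
        {
            long size = Math.min(length, MIN_COPY_THRESHOLD);
            UNSAFE.copyMemory(src, srcOffset, dst, dstOffset, size);
            srcOffset += size;
            dstOffset += size;
            length -= size;
        }
    }

    private static Unsafe loadUnsafe()
    {
        try
        {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        }
        catch (Exception e)
        {
            throw new AssertionError(e);
        }
    }
}
{code}

Skipping the chunking for offheap-to-offheap copies, as Benedict describes the JDK doing, trades that latency bound for fewer loop iterations.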
[jira] [Commented] (CASSANDRA-8409) Node generating a huge number of tiny sstable_activity flushes
[ https://issues.apache.org/jira/browse/CASSANDRA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261191#comment-14261191 ] Rahul Bhardwaj commented on CASSANDRA-8409:
--------------------------------------------

Thanks for your reply. I'm also facing an issue where my Cassandra daemon is consuming the whole 64 GB of RAM. Can you please tell me how I can cope with this issue?

Node generating a huge number of tiny sstable_activity flushes
--------------------------------------------------------------

Key: CASSANDRA-8409
URL: https://issues.apache.org/jira/browse/CASSANDRA-8409
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Cassandra 2.1.0, Oracle JDK 1.8.0_25, Ubuntu 12.04
Reporter: Fred Wulff
Fix For: 2.1.3
Attachments: system-sstable_activity-ka-67802-Data.db

On one of my nodes, I'm seeing hundreds per second of "INFO 21:28:05 Enqueuing flush of sstable_activity: 0 (0%) on-heap, 33 (0%) off-heap". tpstats shows a steadily climbing number of pending MemtableFlushWriter/MemtablePostFlush tasks until the node OOMs. When the flushes actually happen, the sstable written is invariably 121 bytes. I'm writing pretty aggressively to one of my user tables (sev.mdb_group_pit), but that table's flushing behavior seems reasonable. tpstats:

{quote}
frew@hostname:~/s_dist/apache-cassandra-2.1.0$ bin/nodetool -h hostname tpstats
Pool Name               Active  Pending  Completed  Blocked  All time blocked
MutationStage           128     4429     36810      0        0
ReadStage               0       0        1205       0        0
RequestResponseStage    0       0        24910      0        0
ReadRepairStage         0       0        26         0        0
CounterMutationStage    0       0        0          0        0
MiscStage               0       0        0          0        0
HintedHandoff           2       2        9          0        0
GossipStage             0       0        5157       0        0
CacheCleanupExecutor    0       0        0          0        0
InternalResponseStage   0       0        0          0        0
CommitLogArchiver       0       0        0          0        0
CompactionExecutor      428429  0        0
ValidationExecutor      0       0        0          0        0
MigrationStage          0       0        0          0        0
AntiEntropyStage        0       0        0          0        0
PendingRangeCalculator  0       0        11         0        0
MemtableFlushWriter     8       38644    8987       0        0
MemtablePostFlush       1       38940    8735       0        0
MemtableReclaimMemory   0       0        8987       0        0

Message type      Dropped
READ                    0
RANGE_SLICE             0
_TRACE                  0
MUTATION            10457
COUNTER_MUTATION        0
BINARY                  0
REQUEST_RESPONSE        0
PAGED_RANGE             0
READ_REPAIR           208
{quote}

I've attached one of the produced sstables.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8536) Wrong cluster information and replication
[ https://issues.apache.org/jira/browse/CASSANDRA-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-8536:
-----------------------------------
Description:

Two machine cluster - Cassandra 2.1.2, GossipingPropertyFileSnitch, one data center with one rack.
Seed - 10.0.0.2
Node - 10.0.0.3

-start seed
-start node

Run nodetool status on any machine:

{noformat}
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns  Host ID                               Rack
UN  10.0.0.3  107.15 KB  256     ?     ad29cd96-d21e-4d02-94e7-0fd68ef5fbad  RAC1
UN  10.0.0.2  87.73 KB   256     ?     c26fdffc-6df5-4d1a-8eda-6d585d2178c1  RAC1
{noformat}

-stop both instances
-run seed
-run nodetool status on seed

{noformat}
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns  Host ID                               Rack
UN  10.0.0.2  113.31 KB  256     ?     c26fdffc-6df5-4d1a-8eda-6d585d2178c1  RAC1
{noformat}

So no information about node 10.0.0.3 at all.

Actually, the main problem is not the wrong info but a replication/synchronization problem: on the seed (after restart, when the 2nd node is down), create a keyspace with replication factor 2 (strategy doesn't matter), create a table, and insert something into the table:

{noformat}
CREATE KEYSPACE Excelsior WITH REPLICATION={'class':'SimpleStrategy','replication_factor':2};
CREATE TABLE Excelsior.users (name text PRIMARY KEY, id int);
INSERT INTO excelsior.users (name, id ) VALUES ( 'node',123);
SELECT * FROM excelsior.users;

 name | id
------+-----
 node | 123

(1 rows)
{noformat}

Start node; now nodetool status shows both nodes UN on both machines again. The created keyspace and table are now seen on node (the create was propagated from seed), but the table is empty from node's point of view:

{noformat}
SELECT * FROM excelsior.users;

 name | id
------+----

(0 rows)
{noformat}

I guess the synchronization problem is probably not a different bug, but stems from the wrong cluster information. Version 2.0.11 works fine.

was:
Two machine cluster - Cassandra 2.1.2, GossipingPropertyFileSnitch, one data center with one rack.
Seed - 10.0.0.2
Node - 10.0.0.3

-start seed
-start node

Run nodetool status on any machine:

{quote}
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns  Host ID                               Rack
UN  10.0.0.3  107.15 KB  256     ?     ad29cd96-d21e-4d02-94e7-0fd68ef5fbad  RAC1
UN  10.0.0.2  87.73 KB   256     ?     c26fdffc-6df5-4d1a-8eda-6d585d2178c1  RAC1
{quote}

-stop both instances
-run seed
-run nodetool status on seed

{quote}
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns  Host ID                               Rack
UN  10.0.0.2  113.31 KB  256     ?     c26fdffc-6df5-4d1a-8eda-6d585d2178c1  RAC1
{quote}

So no information about node 10.0.0.3 at all.

Actually, the main problem is not the wrong info but a replication/synchronization problem: on the seed (after restart, when the 2nd node is down), create a keyspace with replication factor 2 (strategy doesn't matter), create a table, and insert something into the table:

CREATE KEYSPACE Excelsior WITH REPLICATION={'class':'SimpleStrategy','replication_factor':2};
CREATE TABLE Excelsior.users (name text PRIMARY KEY, id int);
INSERT INTO excelsior.users (name, id ) VALUES ( 'node',123);
SELECT * FROM excelsior.users;

 name | id
------+-----
 node | 123

(1 rows)

Start node; now nodetool status shows both nodes UN on both machines again. The created keyspace and table are now seen on node (the create was propagated from seed), but the table is empty from node's point of view:

{quote}
SELECT * FROM excelsior.users;

 name | id
------+----

(0 rows)
{quote}

I guess the synchronization problem is probably not a different bug, but stems from the wrong cluster information. Version 2.0.11 works fine.

Wrong cluster information and replication
-----------------------------------------

Key: CASSANDRA-8536
URL: https://issues.apache.org/jira/browse/CASSANDRA-8536
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: CentOS 7 x64
Reporter: Vova

Two machine cluster - Cassandra 2.1.2, GossipingPropertyFileSnitch, one data center with one rack.
Seed - 10.0.0.2
Node - 10.0.0.3

-start seed
-start node

Run nodetool status on any machine:

{noformat}
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns  Host ID                               Rack
UN  10.0.0.3  107.15 KB  256     ?     ad29cd96-d21e-4d02-94e7-0fd68ef5fbad  RAC1
UN  10.0.0.2  87.73 KB   256     ?     c26fdffc-6df5-4d1a-8eda-6d585d2178c1  RAC1
{noformat}

-stop both instances
-run seed
-run
[jira] [Updated] (CASSANDRA-8536) Wrong cluster information and replication
[ https://issues.apache.org/jira/browse/CASSANDRA-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-8536:
-----------------------------------
Assignee: Brandon Williams

Wrong cluster information and replication
-----------------------------------------

Key: CASSANDRA-8536
URL: https://issues.apache.org/jira/browse/CASSANDRA-8536
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: CentOS 7 x64
Reporter: Vova
Assignee: Brandon Williams

Two machine cluster - Cassandra 2.1.2, GossipingPropertyFileSnitch, one data center with one rack.
Seed - 10.0.0.2
Node - 10.0.0.3

-start seed
-start node

Run nodetool status on any machine:

{noformat}
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns  Host ID                               Rack
UN  10.0.0.3  107.15 KB  256     ?     ad29cd96-d21e-4d02-94e7-0fd68ef5fbad  RAC1
UN  10.0.0.2  87.73 KB   256     ?     c26fdffc-6df5-4d1a-8eda-6d585d2178c1  RAC1
{noformat}

-stop both instances
-run seed
-run nodetool status on seed

{noformat}
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns  Host ID                               Rack
UN  10.0.0.2  113.31 KB  256     ?     c26fdffc-6df5-4d1a-8eda-6d585d2178c1  RAC1
{noformat}

So no information about node 10.0.0.3 at all.

Actually, the main problem is not the wrong info but a replication/synchronization problem: on the seed (after restart, when the 2nd node is down), create a keyspace with replication factor 2 (strategy doesn't matter), create a table, and insert something into the table:

{noformat}
CREATE KEYSPACE Excelsior WITH REPLICATION={'class':'SimpleStrategy','replication_factor':2};
CREATE TABLE Excelsior.users (name text PRIMARY KEY, id int);
INSERT INTO excelsior.users (name, id ) VALUES ( 'node',123);
SELECT * FROM excelsior.users;

 name | id
------+-----
 node | 123

(1 rows)
{noformat}

Start node; now nodetool status shows both nodes UN on both machines again. The created keyspace and table are now seen on node (the create was propagated from seed), but the table is empty from node's point of view:

{noformat}
SELECT * FROM excelsior.users;

 name | id
------+----

(0 rows)
{noformat}

I guess the synchronization problem is probably not a different bug, but stems from the wrong cluster information. Version 2.0.11 works fine.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261238#comment-14261238 ] Donald Smith commented on CASSANDRA-8245:
------------------------------------------

We're seeing a similar increase in the number of pending Gossip stage tasks, followed by OutOfMemory. This happens once a day or so on some node of our 38-node DC. Other nodes have increases in pending Gossip stage tasks, but they recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on all nodes. But all nodes in one DC are down now.

Cassandra nodes periodically die in 2-DC configuration
------------------------------------------------------

Key: CASSANDRA-8245
URL: https://issues.apache.org/jira/browse/CASSANDRA-8245
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Scientific Linux release 6.5, java version 1.7.0_51, Cassandra 2.0.9
Reporter: Oleg Poleshuk
Assignee: Brandon Williams
Priority: Minor
Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, stack5.txt

We have 2 DCs with 3 nodes in each. The second DC periodically has 1-2 nodes down. It looks like a node loses connectivity with the other nodes, and then the Gossiper starts to accumulate tasks until Cassandra dies with OOM:

WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting live ratio to maximum of 64.0 instead of Infinity
WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip stage has 4 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip stage has 8 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip stage has 11 pending tasks; skipping status check (no nodes will be marked down)
...
WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip stage has 1014764 pending tasks; skipping status check (no nodes will be marked down)
WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space

Also these lines, but not sure they are relevant:

DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) Ignoring interval time of 2085963047

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8546) RangeTombstoneList becoming bottleneck on tombstone heavy tasks
[ https://issues.apache.org/jira/browse/CASSANDRA-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-8546:
-----------------------------------
Assignee: Dominic Letz

RangeTombstoneList becoming bottleneck on tombstone heavy tasks
---------------------------------------------------------------

Key: CASSANDRA-8546
URL: https://issues.apache.org/jira/browse/CASSANDRA-8546
Project: Cassandra
Issue Type: Improvement
Components: Core
Environment: 2.0.11 / 2.1
Reporter: Dominic Letz
Assignee: Dominic Letz
Fix For: 2.1.3
Attachments: cassandra-2.0.11-8546.txt, cassandra-2.1-8546.txt, tombstone_test.tgz

I would like to propose changing the data structure used in RangeTombstoneList to store and insert tombstone ranges to something with at least O(log N) insert in the middle and near O(1) insert at start AND end. Here is why:

With tombstone-heavy workloads, the current implementation of RangeTombstoneList becomes a bottleneck for slice queries. Scanning up to the default maximum number of tombstones (100k) can take up to 3 minutes because of how addInternal() scales on insertion of middle and start elements.

The attached test demonstrates this with 50k deletes from both sides of a range:

INSERT 1...11
flush()
DELETE 1...5
DELETE 11...6

While one direction performs OK (~400ms on my notebook):
{code}
SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp DESC LIMIT 1
{code}
The other direction underperforms (~7 seconds on my notebook):
{code}
SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp ASC LIMIT 1
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8546) RangeTombstoneList becoming bottleneck on tombstone heavy tasks
[ https://issues.apache.org/jira/browse/CASSANDRA-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261256#comment-14261256 ] Tyler Hobbs commented on CASSANDRA-8546:
-----------------------------------------

I think it'd be better for the perf team to take a look at this than me. [~benedict], do you want to take a look at this or assign somebody to review?

RangeTombstoneList becoming bottleneck on tombstone heavy tasks
---------------------------------------------------------------

Key: CASSANDRA-8546
URL: https://issues.apache.org/jira/browse/CASSANDRA-8546
Project: Cassandra
Issue Type: Improvement
Components: Core
Environment: 2.0.11 / 2.1
Reporter: Dominic Letz
Assignee: Dominic Letz
Fix For: 2.1.3
Attachments: cassandra-2.0.11-8546.txt, cassandra-2.1-8546.txt, tombstone_test.tgz

I would like to propose changing the data structure used in RangeTombstoneList to store and insert tombstone ranges to something with at least O(log N) insert in the middle and near O(1) insert at start AND end. Here is why:

With tombstone-heavy workloads, the current implementation of RangeTombstoneList becomes a bottleneck for slice queries. Scanning up to the default maximum number of tombstones (100k) can take up to 3 minutes because of how addInternal() scales on insertion of middle and start elements.

The attached test demonstrates this with 50k deletes from both sides of a range:

INSERT 1...11
flush()
DELETE 1...5
DELETE 11...6

While one direction performs OK (~400ms on my notebook):
{code}
SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp DESC LIMIT 1
{code}
The other direction underperforms (~7 seconds on my notebook):
{code}
SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp ASC LIMIT 1
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8390) The process cannot access the file because it is being used by another process
[ https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-8390:
---------------------------------------
Description:

{code}
21:46:27.810 [NonPeriodicTasks:1] ERROR o.a.c.service.CassandraDaemon - Exception in thread Thread[NonPeriodicTasks:1,5,main]
org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db: The process cannot access the file because it is being used by another process.
	at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) ~[cassandra-all-2.1.1.jar:2.1.1]
	at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) ~[cassandra-all-2.1.1.jar:2.1.1]
	at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) ~[cassandra-all-2.1.1.jar:2.1.1]
	at org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94) ~[cassandra-all-2.1.1.jar:2.1.1]
	at org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) ~[cassandra-all-2.1.1.jar:2.1.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_71]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_71]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) ~[na:1.7.0_71]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) ~[na:1.7.0_71]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_71]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.nio.file.FileSystemException: E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db: The process cannot access the file because it is being used by another process.
	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[na:1.7.0_71]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[na:1.7.0_71]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) ~[na:1.7.0_71]
	at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269) ~[na:1.7.0_71]
	at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103) ~[na:1.7.0_71]
	at java.nio.file.Files.delete(Files.java:1079) ~[na:1.7.0_71]
	at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) ~[cassandra-all-2.1.1.jar:2.1.1]
	... 11 common frames omitted
{code}

was:
21:46:27.810 [NonPeriodicTasks:1] ERROR o.a.c.service.CassandraDaemon - Exception in thread Thread[NonPeriodicTasks:1,5,main]
org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db: The process cannot access the file because it is being used by another process.
	at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) ~[cassandra-all-2.1.1.jar:2.1.1]
	at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) ~[cassandra-all-2.1.1.jar:2.1.1]
	at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) ~[cassandra-all-2.1.1.jar:2.1.1]
	at org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94) ~[cassandra-all-2.1.1.jar:2.1.1]
	at org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) ~[cassandra-all-2.1.1.jar:2.1.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_71]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_71]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) ~[na:1.7.0_71]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) ~[na:1.7.0_71]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_71]
	at
[jira] [Comment Edited] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261238#comment-14261238 ] Donald Smith edited comment on CASSANDRA-8245 at 12/30/14 5:20 PM:
-------------------------------------------------------------------

We're seeing a similar increase in the number of pending Gossip stage tasks, followed by OutOfMemory. This happens once a day or so on some node of our 38-node DC. Other nodes have increases in pending Gossip stage tasks, but they recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on all nodes. But all nodes in one DC are down now.

What's odd is that the cassandra process continues running despite the OutOfMemory exception. You'd expect it to exit.

{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip stage has 2695 pending tasks; skipping status check (no nodes will be marked down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}

was (Author: thinkerfeeler):
We're getting a similar increase in the number of pending Gossip stage tasks, followed by OutOfMemory. This happens once a day or so on some node of our 38-node DC. Other nodes have increases in pending Gossip stage tasks, but they recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on all nodes. But all nodes in one DC are down now.

Cassandra nodes periodically die in 2-DC configuration
------------------------------------------------------

Key: CASSANDRA-8245
URL: https://issues.apache.org/jira/browse/CASSANDRA-8245
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Scientific Linux release 6.5, java version 1.7.0_51, Cassandra 2.0.9
Reporter: Oleg Poleshuk
Assignee: Brandon Williams
Priority: Minor
Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, stack5.txt

We have 2 DCs with 3 nodes in each. The second DC periodically has 1-2 nodes down. It looks like a node loses connectivity with the other nodes, and then the Gossiper starts to accumulate tasks until Cassandra dies with OOM:

WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting live ratio to maximum of 64.0 instead of Infinity
WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip stage has 4 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip stage has 8 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip stage has 11 pending tasks; skipping status check (no nodes will be marked down)
...
WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip stage has 1014764 pending tasks; skipping status check (no nodes will be marked down)
WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space

Also these lines, but not sure they are relevant:

DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) Ignoring interval time of 2085963047

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8544) Cassandra could not start with NPE in ColumnFamilyStore.removeUnfinishedCompactionLeftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-8544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-8544:
---------------------------------------
Attachment: 8544_show_npe.txt

Some questions:

bq. It happens sometimes after restarts caused by undeletable files under Windows.

How frequently is sometimes? What files are undeletable? How are you resolving / working around this? Does it work after another attempt to restart? Do you have steps to reproduce this?

Attaching a patch that should print out some more details on the NPE if you can reproduce it consistently.

Cassandra could not start with NPE in ColumnFamilyStore.removeUnfinishedCompactionLeftovers
-------------------------------------------------------------------------------------------

Key: CASSANDRA-8544
URL: https://issues.apache.org/jira/browse/CASSANDRA-8544
Project: Cassandra
Issue Type: Bug
Environment: Windows
Reporter: Leonid Shalupov
Assignee: Joshua McKenzie
Labels: windows
Fix For: 2.1.3
Attachments: 8544_show_npe.txt

It happens sometimes after restarts caused by undeletable files under Windows.

{quote}
Caused by: java.lang.NullPointerException
	at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:579)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:232)
	at org.apache.cassandra.service.CassandraDaemon.init(CassandraDaemon.java:377)
	at com.jetbrains.cassandra.service.CassandraServiceMain.start(CassandraServiceMain.java:81)
	... 6 more
{quote}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
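Pending answers to those questions, here is a hedged sketch of the kind of diagnostic the attached 8544_show_npe.txt presumably adds; the method shape and names are hypothetical, not the actual patch. The idea is to surface which unfinished-compaction entry produced a null lookup instead of letting an opaque NPE escape:

{code}
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: log the offending cfId rather than dereferencing null.
class LeftoverCheck
{
    static void removeUnfinishedLeftovers(Map<UUID, String> unfinishedCompactions,
                                          Map<UUID, String> knownTables)
    {
        for (Map.Entry<UUID, String> entry : unfinishedCompactions.entrySet())
        {
            String table = knownTables.get(entry.getKey());
            if (table == null)
            {
                // previously an opaque NullPointerException from this lookup
                System.err.printf("Unfinished compaction references unknown cfId %s (%s); skipping%n",
                                  entry.getKey(), entry.getValue());
                continue;
            }
            // ... clean up leftover sstables for 'table' ...
        }
    }
}
{code}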
[jira] [Commented] (CASSANDRA-6559) cqlsh should warn about ALLOW FILTERING
[ https://issues.apache.org/jira/browse/CASSANDRA-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261286#comment-14261286 ] Aaron Ploetz commented on CASSANDRA-6559:
------------------------------------------

Not a bad idea. But then some users would always have that option enabled. And with that option enabled:
- if the warning text was output before the result set, it would be completely lost/ignored with larger sets.
- if the warning text was output after the result set, it would push up the desired results... again, only really an issue with larger sets. Although in this case, the user would actually see it.

cqlsh should warn about ALLOW FILTERING
---------------------------------------

Key: CASSANDRA-6559
URL: https://issues.apache.org/jira/browse/CASSANDRA-6559
Project: Cassandra
Issue Type: Bug
Components: Tools
Reporter: Tupshin Harper
Priority: Minor
Labels: cqlsh
Fix For: 2.0.12

ALLOW FILTERING can be a convenience for preliminary exploration of your data, and can be useful for batch jobs, but it is such an anti-pattern for regular production queries that cqlsh should provide an explicit warning whenever such a query is performed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8535) java.lang.RuntimeException: Failed to rename XXX to YYY
[ https://issues.apache.org/jira/browse/CASSANDRA-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261297#comment-14261297 ] Joshua McKenzie commented on CASSANDRA-8535:
---------------------------------------------

There are a large number of tests that will currently error out on the 2.1 branch due to the RandomAccessReader (see CASSANDRA-4050). Are you having problems running a Cassandra cluster, or is this ticket about failing unit tests on the 2.1 branch?

Regardless, we're still sorting out some issues with the CompactionTask that show up on Windows only right now due to file system sharing violations (see CASSANDRA-8399). If you run with the v2 from that ticket (which is not the correct solution, mind you, but should alleviate the current symptoms), it should address most of these unable-to-delete and/or rename issues with unit tests. Let me know if you still reproduce this error with v2 from that ticket - my guess is that this is a duplicate, but I'd prefer confirmation before closing it out.

java.lang.RuntimeException: Failed to rename XXX to YYY
-------------------------------------------------------

Key: CASSANDRA-8535
URL: https://issues.apache.org/jira/browse/CASSANDRA-8535
Project: Cassandra
Issue Type: Bug
Environment: Windows 2008 X64
Reporter: Leonid Shalupov
Assignee: Joshua McKenzie

{code}
java.lang.RuntimeException: Failed to rename build\test\cassandra\data;0\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-tmp-ka-5-Index.db to build\test\cassandra\data;0\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-5-Index.db
	at org.apache.cassandra.io.util.FileUtils.renameWithConfirm(FileUtils.java:170) ~[main/:na]
	at org.apache.cassandra.io.util.FileUtils.renameWithConfirm(FileUtils.java:154) ~[main/:na]
	at org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:569) ~[main/:na]
	at org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:561) ~[main/:na]
	at org.apache.cassandra.io.sstable.SSTableWriter.close(SSTableWriter.java:535) ~[main/:na]
	at org.apache.cassandra.io.sstable.SSTableWriter.finish(SSTableWriter.java:470) ~[main/:na]
	at org.apache.cassandra.io.sstable.SSTableRewriter.finishAndMaybeThrow(SSTableRewriter.java:349) ~[main/:na]
	at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:324) ~[main/:na]
	at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:304) ~[main/:na]
	at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:200) ~[main/:na]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:75) ~[main/:na]
	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[main/:na]
	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:226) ~[main/:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_45]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_45]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_45]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
	at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
Caused by: java.nio.file.FileSystemException: build\test\cassandra\data;0\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-tmp-ka-5-Index.db -> build\test\cassandra\data;0\system\schema_keyspaces-b0f2235744583cdb9631c43e59ce3676\system-schema_keyspaces-ka-5-Index.db: The process cannot access the file because it is being used by another process.
	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[na:1.7.0_45]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[na:1.7.0_45]
	at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[na:1.7.0_45]
	at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[na:1.7.0_45]
	at java.nio.file.Files.move(Files.java:1345) ~[na:1.7.0_45]
	at org.apache.cassandra.io.util.FileUtils.atomicMoveWithFallback(FileUtils.java:184) ~[main/:na]
	at org.apache.cassandra.io.util.FileUtils.renameWithConfirm(FileUtils.java:166) ~[main/:na]
	... 18 common frames omitted
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
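The bottom of the stack trace goes through FileUtils.atomicMoveWithFallback, whose strategy the hedged Java sketch below mirrors (the body is inferred from the method name and the java.nio calls visible in the trace, not copied from Cassandra): try an atomic rename first, and fall back to a plain replacing move when the filesystem refuses:

{code}
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: atomic rename with a non-atomic fallback.
public final class AtomicMove
{
    static void atomicMoveWithFallback(Path from, Path to) throws IOException
    {
        try
        {
            Files.move(from, to, StandardCopyOption.REPLACE_EXISTING,
                                 StandardCopyOption.ATOMIC_MOVE);
        }
        catch (AtomicMoveNotSupportedException e)
        {
            // e.g. crossing filesystems; fall back to a plain replacing move
            Files.move(from, to, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
{code}

Note the failure here is not in the fallback path: Windows raises a sharing violation for either flavor of move while another handle is open on the file, which is why the FILE_SHARE_DELETE work in CASSANDRA-4050 matters.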
[jira] [Commented] (CASSANDRA-6559) cqlsh should warn about ALLOW FILTERING
[ https://issues.apache.org/jira/browse/CASSANDRA-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261318#comment-14261318 ] Mike Adamson commented on CASSANDRA-6559:
------------------------------------------

I have to say that I'm with [~philipthompson] on this anyway. It seems to me that if we get CASSANDRA-8303, then this becomes redundant, because an admin could block these queries if they don't want them. If they aren't blocked, then the user isn't going to want to see a warning.

cqlsh should warn about ALLOW FILTERING
---------------------------------------

Key: CASSANDRA-6559
URL: https://issues.apache.org/jira/browse/CASSANDRA-6559
Project: Cassandra
Issue Type: Bug
Components: Tools
Reporter: Tupshin Harper
Priority: Minor
Labels: cqlsh
Fix For: 2.0.12

ALLOW FILTERING can be a convenience for preliminary exploration of your data, and can be useful for batch jobs, but it is such an anti-pattern for regular production queries that cqlsh should provide an explicit warning whenever such a query is performed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
cassandra git commit: Schema change events/results for UDFs and aggregates
Repository: cassandra
Updated Branches:
  refs/heads/trunk cfee3da90 -> dcc3bb054

Schema change events/results for UDFs and aggregates

Patch by Robert Stupp; reviewed by Tyler Hobbs for CASSANDRA-7708

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/dcc3bb05
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/dcc3bb05
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/dcc3bb05

Branch: refs/heads/trunk
Commit: dcc3bb054167eb5f408cea79935855780fd56285
Parents: cfee3da
Author: Robert Stupp <sn...@snazy.de>
Authored: Tue Dec 30 12:25:17 2014 -0600
Committer: Tyler Hobbs <ty...@datastax.com>
Committed: Tue Dec 30 12:25:17 2014 -0600

----------------------------------------------------------------------
 CHANGES.txt                                     |  7 +-
 doc/native_protocol_v4.spec                     | 33 ---
 src/java/org/apache/cassandra/auth/Auth.java    | 58 +---
 .../apache/cassandra/cql3/QueryProcessor.java   | 24 ++---
 .../cassandra/cql3/functions/Functions.java     | 20 +---
 .../cassandra/cql3/functions/UDHelper.java      | 23 +
 .../statements/CreateAggregateStatement.java    | 14 ++-
 .../statements/CreateFunctionStatement.java     | 13 ++-
 .../cql3/statements/DropAggregateStatement.java |  8 +-
 .../cql3/statements/DropFunctionStatement.java  |  8 +-
 .../cassandra/db/marshal/AbstractType.java      |  9 ++
 .../cassandra/schema/LegacySchemaTables.java    | 43 +++--
 .../cassandra/service/IMigrationListener.java   | 40
 .../cassandra/service/MigrationListener.java    | 85 +
 .../cassandra/service/MigrationManager.java     | 49 +--
 .../org/apache/cassandra/transport/Event.java   | 96 +---
 .../org/apache/cassandra/transport/Server.java  | 28 --
 .../apache/cassandra/cql3/AggregationTest.java  | 62 -
 .../org/apache/cassandra/cql3/CQLTester.java    | 43 +++--
 test/unit/org/apache/cassandra/cql3/UFTest.java | 46 +-
 .../cassandra/transport/SerDeserTest.java       | 14 +++
 21 files changed, 483 insertions(+), 240 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/dcc3bb05/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 1468693..ac63fb3 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -10,7 +10,8 @@
  * Fix aggregate fn results on empty selection, result column name, and
    cqlsh parsing (CASSANDRA-8229)
  * Mark sstables as repaired after full repair (CASSANDRA-7586)
- * Extend Descriptor to include a format value and refactor reader/writer apis (CASSANDRA-7443)
+ * Extend Descriptor to include a format value and refactor reader/writer
+   APIs (CASSANDRA-7443)
  * Integrate JMH for microbenchmarks (CASSANDRA-8151)
  * Keep sstable levels when bootstrapping (CASSANDRA-7460)
  * Add Sigar library and perform basic OS settings check on startup (CASSANDRA-7838)
@@ -22,8 +23,8 @@
  * Improve compaction logging (CASSANDRA-7818)
  * Remove YamlFileNetworkTopologySnitch (CASSANDRA-7917)
  * Do anticompaction in groups (CASSANDRA-6851)
- * Support pure user-defined functions (CASSANDRA-7395, 7526, 7562, 7740, 7781, 7929,
-   7924, 7812, 8063, 7813)
+ * Support user-defined functions (CASSANDRA-7395, 7526, 7562, 7740, 7781, 7929,
+   7924, 7812, 8063, 7813, 7708)
  * Permit configurable timestamps with cassandra-stress (CASSANDRA-7416)
  * Move sstable RandomAccessReader to nio2, which allows using the
    FILE_SHARE_DELETE flag on Windows (CASSANDRA-4050)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/dcc3bb05/doc/native_protocol_v4.spec
----------------------------------------------------------------------
diff --git a/doc/native_protocol_v4.spec b/doc/native_protocol_v4.spec
index 02aac3b..3764e91 100644
--- a/doc/native_protocol_v4.spec
+++ b/doc/native_protocol_v4.spec
@@ -669,18 +669,25 @@ Table of Contents
   the rest of the message will be <change_type><target><options> where:
   - <change_type> is a [string] representing the type of changed involved. It
     will be one of CREATED, UPDATED or DROPPED.
-  - <target> is a [string] that can be one of KEYSPACE, TABLE or TYPE
-    and describes what has been modified (TYPE stands for modifications
-    related to user types).
-  - <options> depends on the preceding <target>. If <target> is
-    KEYSPACE, then <options> will be a single [string] representing the
-    keyspace changed. Otherwise, if <target> is TABLE or TYPE, then
-    <options> will be 2 [string]: the first one will be the keyspace
-    containing the affected object, and the second one will be the name
-    of said affected object (so either the table name or the user type
-    name).
-
-  All EVENT message have a streamId of -1 (Section 2.3).
+  - <target>
[jira] [Commented] (CASSANDRA-6559) cqlsh should warn about ALLOW FILTERING
[ https://issues.apache.org/jira/browse/CASSANDRA-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261328#comment-14261328 ] Philip Thompson commented on CASSANDRA-6559: I don't know if we should rely on CASSANDRA-8303 to close this, but if that is implemented, this does feel redundant. I'm +0 on adding a cqlshrc option to not question the user. The warning text should probably be printed immediately, before the result set. If the query really is that large, they'll notice from how long it takes to filter that it's problematic. The problem with ALLOW FILTERING is that it looks fine with small result sets, before moving to production. cqlsh should warn about ALLOW FILTERING --- Key: CASSANDRA-6559 URL: https://issues.apache.org/jira/browse/CASSANDRA-6559 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Tupshin Harper Priority: Minor Labels: cqlsh Fix For: 2.0.12 ALLOW FILTERING can be a convenience for preliminary exploration of your data, and can be useful for batch jobs, but it is such an anti-pattern for regular production queries that cqlsh should provide an explicit warning whenever such a query is performed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-6559) cqlsh should warn about ALLOW FILTERING
[ https://issues.apache.org/jira/browse/CASSANDRA-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261328#comment-14261328 ] Philip Thompson edited comment on CASSANDRA-6559 at 12/30/14 6:30 PM: -- I don't know if we should rely on CASSANDRA-8303 to close this, but if that is implemented, this does feel redundant. I'm +0 on adding a cqlshrc option to not question the user. was (Author: philipthompson): I don't know if we should rely on CASSANDRA-8303 to close this, but if that is implemented, this does feel redundant. I'm +0 on adding a cqlshrc option to not question the user. The warning text should probably be printed immediately, before the result set. If the query really is that large, they'll notice from how long it takes to filter that it's problematic. The problem with ALLOW FILTERING is that it looks fine with small result sets, before moving to production. cqlsh should warn about ALLOW FILTERING --- Key: CASSANDRA-6559 URL: https://issues.apache.org/jira/browse/CASSANDRA-6559 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Tupshin Harper Priority: Minor Labels: cqlsh Fix For: 2.0.12 ALLOW FILTERING can be a convenience for preliminary exploration of your data, and can be useful for batch jobs, but it is such an anti-pattern for regular production queries that cqlsh should provide an explicit warning whenever such a query is performed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6559) cqlsh should warn about ALLOW FILTERING
[ https://issues.apache.org/jira/browse/CASSANDRA-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261329#comment-14261329 ] Philip Thompson commented on CASSANDRA-6559: The warning text should probably be printed immediately, before the result set. If the query really is that large, they'll notice from how long it takes to filter that it's problematic. The problem with ALLOW FILTERING is that it looks fine with small result sets, before moving to production. cqlsh should warn about ALLOW FILTERING --- Key: CASSANDRA-6559 URL: https://issues.apache.org/jira/browse/CASSANDRA-6559 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Tupshin Harper Priority: Minor Labels: cqlsh Fix For: 2.0.12 ALLOW FILTERING can be a convenience for preliminary exploration of your data, and can be useful for batch jobs, but it is such an anti-pattern for regular production queries that cqlsh should provide an explicit warning whenever such a query is performed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
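To make the failure mode discussed above concrete: a filter on a column that is neither part of the primary key nor indexed forces a scan of the whole table, which is exactly the query shape the warning would target. A hypothetical example (table and warning text are illustrative, not current cqlsh output):

{noformat}
cqlsh> SELECT * FROM users WHERE age = 30 ALLOW FILTERING;
Warning: this query uses ALLOW FILTERING and may scan the entire table.
{noformat}

On a small development dataset this returns instantly; at production scale the same statement reads every row before filtering.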
[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes
[ https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261376#comment-14261376 ] Jeremy Hanna commented on CASSANDRA-5220: - I think it's important to reiterate that the project devs recognize that these inefficiencies are impacting many users. However, lots of parallel work is getting done on repair. As Yuki pointed out, with incremental repair (CASSANDRA-5351) already in 2.1 and improving the concurrency of the repair process (CASSANDRA-6455) coming in 3.0, many of the problems seen in this ticket will be resolved. Until 2.1/3.0, sub-range repair (CASSANDRA-5280) is helpful to parallelize and repair more efficiently with virtual nodes. See http://www.datastax.com/dev/blog/advanced-repair-techniques for details about efficiency gains with sub-range repair. It's just more tedious to track. Saving repair data to a system table (CASSANDRA-5839) will help track that in Cassandra itself. Repair improvements when using vnodes - Key: CASSANDRA-5220 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.2.0 beta 1 Reporter: Brandon Williams Assignee: Yuki Morishita Labels: performance, repair Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2 Currently when using vnodes, repair takes much longer to complete than without them. This appears at least in part because it's using a session per range and processing them sequentially. This generates a lot of log spam with vnodes, and while being gentler and lighter on hard disk deployments, ssd-based deployments would often prefer that repair be as fast as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
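For reference, the sub-range repair mentioned above (CASSANDRA-5280) is driven by explicit token bounds via nodetool's start/end token options; a hypothetical invocation (keyspace name and token values are illustrative):

{noformat}
nodetool repair -st -9223372036854775808 -et -3074457345618258603 my_keyspace
{noformat}

Each such call repairs only the given token range, so a driving script can walk the ring range by range instead of letting repair open one session per vnode range, which is the tedious-to-track part the comment refers to.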
[jira] [Comment Edited] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261238#comment-14261238 ] Donald Smith edited comment on CASSANDRA-8245 at 12/30/14 7:36 PM: --- We're getting a similar increase in the number of pending Gossip stage tasks, followed by OutOfMemory. This happens once a day or so on some node of our 38-node DC. Other nodes have increases in pending Gossip stage tasks but they recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on all nodes. But all nodes on one DC are down now. What's odd is that the Cassandra process continues running despite the OutOfMemory exception. You'd expect it to exit. Prior to getting OutOfMemory, I notice that such nodes are slow in responding to commands and queries (e.g., jmx).
{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip stage has 2695 pending tasks; skipping status check (no nodes will be marked down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}
was (Author: thinkerfeeler): We're getting a similar increase in the number of pending Gossip stage tasks, followed by OutOfMemory. This happens once a day or so on some node of our 38-node DC. Other nodes have increases in pending Gossip stage tasks but they recover. This is with C* 2.0.11. We have two other DCs. ntpd is running on all nodes. But all nodes on one DC are down now. What's odd is that the Cassandra process continues running despite the OutOfMemory exception. You'd expect it to exit.
{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip stage has 2695 pending tasks; skipping status check (no nodes will be marked down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}
Cassandra nodes periodically die in 2-DC configuration -- Key: CASSANDRA-8245 URL: https://issues.apache.org/jira/browse/CASSANDRA-8245 Project: Cassandra Issue Type: Bug Components: Core Environment: Scientific Linux release 6.5 java version 1.7.0_51 Cassandra 2.0.9 Reporter: Oleg Poleshuk Assignee: Brandon Williams Priority: Minor Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, stack5.txt We have 2 DCs with 3 nodes in each. Second DC periodically has 1-2 nodes down. Looks like it loses connectivity with other nodes and then Gossiper starts to accumulate tasks until Cassandra dies with OOM.
WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting live ratio to maximum of 64.0 instead of Infinity
WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip stage has 4 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip stage has 8 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip stage has 11 pending tasks; skipping status check (no nodes will be marked down)
...
WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip stage has 1014764 pending tasks; skipping status check (no nodes will be marked down)
WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
Also these lines, though I'm not sure they are relevant:
DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) Ignoring interval time of 2085963047
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8390) The process cannot access the file because it is being used by another process
[ https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261444#comment-14261444 ] Philip Thompson commented on CASSANDRA-8390: No luck reproducing. I was able to run the tests for a few hours with no problems; I increased the number of iterations in the loop and tried changing the year used. Based on the hypothesis that it was an antivirus issue, I even ran a few active scans during the tests, and that didn't affect anything. The process cannot access the file because it is being used by another process -- Key: CASSANDRA-8390 URL: https://issues.apache.org/jira/browse/CASSANDRA-8390 Project: Cassandra Issue Type: Bug Reporter: Ilya Komolkin Assignee: Joshua McKenzie Fix For: 2.1.3 Attachments: NoHostAvailableLogs.zip
{code}
21:46:27.810 [NonPeriodicTasks:1] ERROR o.a.c.service.CassandraDaemon - Exception in thread Thread[NonPeriodicTasks:1,5,main]
org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db: The process cannot access the file because it is being used by another process.
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) ~[cassandra-all-2.1.1.jar:2.1.1]
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) ~[cassandra-all-2.1.1.jar:2.1.1]
    at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) ~[cassandra-all-2.1.1.jar:2.1.1]
    at org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94) ~[cassandra-all-2.1.1.jar:2.1.1]
    at org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) ~[cassandra-all-2.1.1.jar:2.1.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_71]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_71]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) ~[na:1.7.0_71]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) ~[na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.nio.file.FileSystemException: E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db: The process cannot access the file because it is being used by another process.
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) ~[na:1.7.0_71]
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[na:1.7.0_71]
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) ~[na:1.7.0_71]
    at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269) ~[na:1.7.0_71]
    at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103) ~[na:1.7.0_71]
    at java.nio.file.Files.delete(Files.java:1079) ~[na:1.7.0_71]
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) ~[cassandra-all-2.1.1.jar:2.1.1]
    ... 11 common frames omitted
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7886) Coordinator should not wait for read timeouts when replicas hit Exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261460#comment-14261460 ] Tyler Hobbs commented on CASSANDRA-7886:
bq. Regarding TOE: Currently I throw TOEs as exceptions and they get logged just like any other exception. I am not sure if this is desirable and would like to hear your feedback. I think we have the following options:
bq. Leave as it is in v5, meaning TOEs get logged with stacktraces.
Hmm, I forgot that with the previous setup, we wouldn't have stacktraces logged for TOEs under normal circumstances.
bq. Add catch blocks where necessary and log it in a user-friendly way. But it might be in many places. Also in this case I would prefer making TOE a checked exception. Imho TOE should not be unchecked.
I believe TOEs should remain unchecked. They are closer in nature to an IOError than something that calling methods should explicitly account for. They would also add a lot of noise to the entire read path.
bq. Add TOE logging to C* default exception handler. (I did not investigate yet, but I assume there is an exception handler)
We do have an unhandled exception handler (in {{CassandraDaemon}}), but I'm not sure that's the best solution either. It might be okay to suppress stacktraces for TOEs on the normal read path, but in unexpected cases (like, say, dealing with hints or other system tables internally) we would want to see the stacktrace. Unfortunately we can't reliably distinguish the two at this level.
bq. Leave it as it was before
I think it's a toss-up between this (catching TOEs in a few places and suppressing) and always allowing stacktraces to be logged for TombstoneOverwhelmingExceptions. Coordinator should not wait for read timeouts when replicas hit Exceptions -- Key: CASSANDRA-7886 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886 Project: Cassandra Issue Type: Improvement Components: Core Environment: Tested with Cassandra 2.0.8 Reporter: Christian Spriegel Assignee: Christian Spriegel Priority: Minor Labels: protocolv4 Fix For: 3.0 Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt, 7886_v4_trunk.txt, 7886_v5_trunk.txt *Issue* When you have TombstoneOverwhelmingExceptions occurring in queries, this will cause the query to be simply dropped on every data-node, but no response is sent back to the coordinator. Instead the coordinator waits for the specified read_request_timeout_in_ms. On the application side this can cause memory issues, since the application is waiting for the timeout interval for every request. Therefore, if our application runs into TombstoneOverwhelmingExceptions, then (sooner or later) our entire application cluster goes down :-( *Proposed solution* I think the data nodes should send an error message to the coordinator when they run into a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
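To make the checked-vs-unchecked point concrete, here is a minimal sketch of the unchecked shape being argued for (a hypothetical class, not Cassandra's actual TombstoneOverwhelmingException):

{code}
// Sketch only: extending RuntimeException keeps the read path free of
// throws declarations, much like IOError; only the layer that builds the
// client response needs to catch it and translate it into an error message.
public class TombstoneOverwhelmingExample extends RuntimeException
{
    public TombstoneOverwhelmingExample(int tombstonesScanned, int threshold)
    {
        super(String.format("Scanned %d tombstones (threshold %d); query aborted",
                            tombstonesScanned, threshold));
    }
}
{code}

The trade-off described above follows from this shape: because nothing is forced to catch it, an uncaught instance falls through to the default handler and gets logged with a full stacktrace unless some layer deliberately suppresses it.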
[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261468#comment-14261468 ] Tyler Hobbs commented on CASSANDRA-8245:
bq. What's odd is that the Cassandra process continues running despite the OutOfMemory exception. You'd expect it to exit.
bq. Prior to getting OutOfMemory, I notice that such nodes are slow in responding to commands and queries (e.g., jmx).
OOMs are handled in a better (more consistent) way with CASSANDRA-7507. That ticket may answer a few questions for you. Cassandra nodes periodically die in 2-DC configuration -- Key: CASSANDRA-8245 URL: https://issues.apache.org/jira/browse/CASSANDRA-8245 Project: Cassandra Issue Type: Bug Components: Core Environment: Scientific Linux release 6.5 java version 1.7.0_51 Cassandra 2.0.9 Reporter: Oleg Poleshuk Assignee: Brandon Williams Priority: Minor Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, stack5.txt We have 2 DCs with 3 nodes in each. Second DC periodically has 1-2 nodes down. Looks like it loses connectivity with other nodes and then Gossiper starts to accumulate tasks until Cassandra dies with OOM.
WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting live ratio to maximum of 64.0 instead of Infinity
WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip stage has 1 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip stage has 4 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip stage has 8 pending tasks; skipping status check (no nodes will be marked down)
WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip stage has 11 pending tasks; skipping status check (no nodes will be marked down)
...
WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip stage has 1014764 pending tasks; skipping status check (no nodes will be marked down)
WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
Also those lines, but not sure they are relevant:
DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) Ignoring interval time of 2085963047
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
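On the "process survives OOM" observation above: one general JVM-level mitigation is HotSpot's OnOutOfMemoryError hook, which runs a command on the first OutOfMemoryError. Whether a given Cassandra 2.0.x install sets this by default is not established here; the snippet is illustrative only:

{noformat}
# e.g. in cassandra-env.sh (illustrative placement)
JVM_OPTS="$JVM_OPTS -XX:OnOutOfMemoryError=\"kill -9 %p\""
{noformat}

With this in place the node dies cleanly on OOM instead of limping along half-functional, which is usually preferable in a replicated cluster.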
[jira] [Created] (CASSANDRA-8548) Nodetool Cleanup - IllegalArgumentException
Sebastian Estevez created CASSANDRA-8548: Summary: Nodetool Cleanup - IllegalArgumentException Key: CASSANDRA-8548 URL: https://issues.apache.org/jira/browse/CASSANDRA-8548 Project: Cassandra Issue Type: Bug Reporter: Sebastian Estevez Fix For: 2.0.11 Needed to free up some space on a node but getting the dump below when running nodetool cleanup. Tried turning on debug to try to obtain additional details in the logs but nothing gets added to the logs when running cleanup. Added: log4j.logger.org.apache.cassandra.db=DEBUG in log4j-server.properties This is especially frustrating because of having recently upgraded due to a different--DSE specific--cleanup related bug (DSP-4310). See the stack trace below:
root@cassandra-019:~# nodetool cleanup
Error occurred during cleanup
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:228)
    at org.apache.cassandra.db.compaction.CompactionManager.performCleanup(CompactionManager.java:266)
    at org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:1112)
    at org.apache.cassandra.service.StorageService.forceKeyspaceCleanup(StorageService.java:2162)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
    at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
    at sun.rmi.transport.Transport$1.run(Transport.java:177)
    at sun.rmi.transport.Transport$1.run(Transport.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:267)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:108)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:87)
    at org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:41)
    at
[jira] [Updated] (CASSANDRA-8548) Nodetool Cleanup - IllegalArgumentException
[ https://issues.apache.org/jira/browse/CASSANDRA-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Estevez updated CASSANDRA-8548: - Description: Needed to free up some space on a node but getting the dump below when running nodetool cleanup. Tried turning on debug to try to obtain additional details in the logs but nothing gets added to the logs when running cleanup. Added: log4j.logger.org.apache.cassandra.db=DEBUG in log4j-server.properties This is especially frustrating because of having recently upgraded due to a different-DSE specific-cleanup related bug (DSP-4310). See the stack trace below:
root@cassandra-019:~# nodetool cleanup
Error occurred during cleanup
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:228)
    at org.apache.cassandra.db.compaction.CompactionManager.performCleanup(CompactionManager.java:266)
    at org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:1112)
    at org.apache.cassandra.service.StorageService.forceKeyspaceCleanup(StorageService.java:2162)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
    at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
    at sun.rmi.transport.Transport$1.run(Transport.java:177)
    at sun.rmi.transport.Transport$1.run(Transport.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:267)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:108)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:87)
    at org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:41)
    at org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:283)
    at
[jira] [Updated] (CASSANDRA-8548) Nodetool Cleanup - IllegalArgumentException
[ https://issues.apache.org/jira/browse/CASSANDRA-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Estevez updated CASSANDRA-8548: - Description: Needed to free up some space on a node but getting the dump below when running nodetool cleanup. Tried turning on debug to try to obtain additional details in the logs but nothing gets added to the logs when running cleanup. Added: log4j.logger.org.apache.cassandra.db=DEBUG in log4j-server.properties This is especially frustrating because of having recently upgraded due to a different (DSE specific) cleanup related bug (DSP-4310). See the stack trace below:
root@cassandra-019:~# nodetool cleanup
Error occurred during cleanup
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:228)
    at org.apache.cassandra.db.compaction.CompactionManager.performCleanup(CompactionManager.java:266)
    at org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:1112)
    at org.apache.cassandra.service.StorageService.forceKeyspaceCleanup(StorageService.java:2162)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
    at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
    at sun.rmi.transport.Transport$1.run(Transport.java:177)
    at sun.rmi.transport.Transport$1.run(Transport.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:267)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:108)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:87)
    at org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:41)
    at org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:283)
    at
[jira] [Updated] (CASSANDRA-8548) Nodetool Cleanup - IllegalArgumentException
[ https://issues.apache.org/jira/browse/CASSANDRA-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Estevez updated CASSANDRA-8548: - Description: Needed to free up some space on a node but getting the dump below when running nodetool cleanup. Tried turning on debug to try to obtain additional details in the logs but nothing gets added to the logs when running cleanup. Added: log4j.logger.org.apache.cassandra.db=DEBUG in log4j-server.properties See the stack trace below:
root@cassandra-019:~# nodetool cleanup
Error occurred during cleanup
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:228)
    at org.apache.cassandra.db.compaction.CompactionManager.performCleanup(CompactionManager.java:266)
    at org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:1112)
    at org.apache.cassandra.service.StorageService.forceKeyspaceCleanup(StorageService.java:2162)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
    at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
    at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
    at sun.rmi.transport.Transport$1.run(Transport.java:177)
    at sun.rmi.transport.Transport$1.run(Transport.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:267)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:108)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:87)
    at org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:41)
    at org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:283)
    at org.apache.cassandra.io.sstable.SSTableScanner.seekToCurrentRangeStart(SSTableScanner.java:123)
    at org.apache.cassandra.io.sstable.SSTableScanner.access$200(SSTableScanner.java:45)
    at
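For context on the root cause shown in these traces: java.nio.Buffer.limit(int) throws IllegalArgumentException whenever the requested limit exceeds the buffer's capacity, so a bad (for example corrupt or oversized) chunk length reaching decompressChunk() would fail in exactly this way. That interpretation is an assumption, but the JDK behavior itself is trivial to reproduce:

{code}
import java.nio.ByteBuffer;

// Minimal reproduction of the root exception above: asking a buffer for a
// limit beyond its capacity throws IllegalArgumentException.
public class BufferLimitDemo
{
    public static void main(String[] args)
    {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.limit(128); // throws java.lang.IllegalArgumentException
    }
}
{code}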
[jira] [Commented] (CASSANDRA-8303) Provide strict mode for CQL Queries
[ https://issues.apache.org/jira/browse/CASSANDRA-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261885#comment-14261885 ] Jonathan Shook commented on CASSANDRA-8303: --- A permission that might be helpful to add to the list: UNPREPARED_STATEMENTS. I can easily see unprepared statements being disallowed in some environments, for prod app accounts. Provide strict mode for CQL Queries - Key: CASSANDRA-8303 URL: https://issues.apache.org/jira/browse/CASSANDRA-8303 Project: Cassandra Issue Type: Improvement Reporter: Anupam Arora Fix For: 3.0 Please provide a strict mode option in Cassandra that will reject any CQL queries that are expensive, e.g. any query with ALLOW FILTERING, multi-partition queries, secondary index queries, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8546) RangeTombstoneList becoming bottleneck on tombstone heavy tasks
[ https://issues.apache.org/jira/browse/CASSANDRA-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Letz updated CASSANDRA-8546: Attachment: cassandra-2.1-8546.txt There was seemingly a rebase error on my side; I've uploaded the fixed version. I got confused by the existing errors here: http://cassci.datastax.com/view/cassandra-2.1/job/cassandra-2.1_utest/791/testReport/ as I get the same on my machine. RangeTombstoneList becoming bottleneck on tombstone heavy tasks --- Key: CASSANDRA-8546 URL: https://issues.apache.org/jira/browse/CASSANDRA-8546 Project: Cassandra Issue Type: Improvement Components: Core Environment: 2.0.11 / 2.1 Reporter: Dominic Letz Assignee: Dominic Letz Fix For: 2.1.3 Attachments: cassandra-2.0.11-8546.txt, cassandra-2.1-8546.txt, cassandra-2.1-8546.txt, tombstone_test.tgz I would like to propose changing the data structure used in RangeTombstoneList to store and insert tombstone ranges to something with at least O(log N) insertion in the middle and near O(1) insertion at the start and end. Here is why: with tombstone-heavy workloads, the current implementation of RangeTombstoneList becomes a bottleneck for slice queries. Scanning the number of tombstones up to the default maximum (100k) can take up to 3 minutes because of how addInternal() scales on insertion of middle and start elements. The attached test shows this with 50k deletes from both sides of a range:
INSERT 1...11
flush()
DELETE 1...5
DELETE 11...6
While one direction performs OK (~400ms on my notebook):
{code}
SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp DESC LIMIT 1
{code}
The other direction underperforms (~7 seconds on my notebook):
{code}
SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp ASC LIMIT 1
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
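Illustrating only the data-structure argument (this is not the attached patch): a balanced-tree index over range starts gives O(log N) insertion anywhere, whereas an array-backed list pays O(N) element shifting for front and middle inserts.

{code}
import java.util.TreeMap;

// Sketch: ranges keyed by start position. Inserting at the front or in the
// middle costs O(log N) with no element shifting.
public class RangeIndexSketch
{
    public static void main(String[] args)
    {
        TreeMap<Long, Long> ranges = new TreeMap<>(); // start -> end
        ranges.put(6L, 11L);
        ranges.put(1L, 5L);               // front insert, O(log N)
        Long start = ranges.floorKey(7L); // candidate range covering 7
        System.out.println(start + ".." + ranges.get(start)); // prints 6..11
    }
}
{code}

The ticket states the requirement in complexity terms rather than naming a container, so any structure with these bounds (balanced tree, skip list, etc.) would fit.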
[jira] [Updated] (CASSANDRA-8546) RangeTombstoneList becoming bottleneck on tombstone heavy tasks
[ https://issues.apache.org/jira/browse/CASSANDRA-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dominic Letz updated CASSANDRA-8546: Attachment: (was: cassandra-2.1-8546.txt) RangeTombstoneList becoming bottleneck on tombstone heavy tasks --- Key: CASSANDRA-8546 URL: https://issues.apache.org/jira/browse/CASSANDRA-8546 Project: Cassandra Issue Type: Improvement Components: Core Environment: 2.0.11 / 2.1 Reporter: Dominic Letz Assignee: Dominic Letz Fix For: 2.1.3 Attachments: cassandra-2.0.11-8546.txt, cassandra-2.1-8546.txt, tombstone_test.tgz I would like to propose changing the data structure used in RangeTombstoneList to store and insert tombstone ranges to something with at least O(log N) insertion in the middle and near O(1) insertion at the start and end. Here is why: with tombstone-heavy workloads, the current implementation of RangeTombstoneList becomes a bottleneck for slice queries. Scanning the number of tombstones up to the default maximum (100k) can take up to 3 minutes because of how addInternal() scales on insertion of middle and start elements. The attached test shows this with 50k deletes from both sides of a range:
INSERT 1...11
flush()
DELETE 1...5
DELETE 11...6
While one direction performs OK (~400ms on my notebook):
{code}
SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp DESC LIMIT 1
{code}
The other direction underperforms (~7 seconds on my notebook):
{code}
SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp ASC LIMIT 1
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)