[jira] [Resolved] (CASSANDRA-9957) Unable to build Apache Cassandra Under Debian 8 OS with the provided ant script
[ https://issues.apache.org/jira/browse/CASSANDRA-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-9957. --- Resolution: Not A Problem Something is broken in your environment, but this is not a C* bug. Unable to build Apache Cassandra Under Debian 8 OS with the provided ant script --- Key: CASSANDRA-9957 URL: https://issues.apache.org/jira/browse/CASSANDRA-9957 Project: Cassandra Issue Type: Bug Environment: PRETTY_NAME=Debian GNU/Linux 8 (jessie) NAME=Debian GNU/Linux VERSION_ID=8 VERSION=8 (jessie) ID=debian HOME_URL=http://www.debian.org/; SUPPORT_URL=http://www.debian.org/support/; BUG_REPORT_URL=https://bugs.debian.org/; java version 1.8.0_45 Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) Apache Ant(TM) version 1.9.5 compiled on May 31 2015 Reporter: Adelin M.Ghanayem Labels: Cassandra, ant, build, build.xml Trying to use the tool CCM (Cassandra Cluster Manager) I've been blocked by an issue related to compiling the Cassandra source. CCM installs Cassandra and builds its source before anything else. However, CCM threw an error: https://gist.github.com/AdelinGhanaem/593d1c8a63857113d0a7 has all the info you need. I then tried to download the source and compile it using ant jar, but I got the same error. Basically, the jars that are downloaded when running ant jar are corrupted! Extracting them with jar xf throws an error. The only way I could build the source was by downloading the jars by hand from Maven. I've described the error and the process in this post: http://mradelin.blogspot.com/2015/07/error-packaging-cassandra-220-db-source_31.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
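The corrupted-jar symptom above can be confirmed mechanically: a truncated or damaged download fails to open as a zip archive, which is the same failure jar xf reports. A minimal sketch (the class and method names here are hypothetical, not part of the build):

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;

public class JarCheck {
    // Returns true if the jar opens cleanly as a zip archive, i.e. its
    // central directory is readable; a corrupted download fails here just
    // as it does under `jar xf`.
    public static boolean isReadableJar(File jar) {
        try (ZipFile zf = new ZipFile(jar)) {
            return zf.size() >= 0; // entry count read successfully
        } catch (IOException e) {
            return false;          // truncated or corrupted archive
        }
    }
}
```

Running this over everything in the build/lib directory would pinpoint which downloads were damaged before ant ever tries to use them.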
[jira] [Updated] (CASSANDRA-5220) Repair improvements when using vnodes
[ https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-5220: -- Assignee: Marcus Olsson Repair improvements when using vnodes - Key: CASSANDRA-5220 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.2.0 beta 1 Reporter: Brandon Williams Assignee: Marcus Olsson Labels: performance, repair Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, cassandra-3.0-5220.patch Currently when using vnodes, repair takes much longer to complete than without them. This appears at least in part because it's using a session per range and processing them sequentially. This generates a lot of log spam with vnodes, and while being gentler and lighter on hard disk deployments, ssd-based deployments would often prefer that repair be as fast as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-5220) Repair improvements when using vnodes
[ https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-5220: -- Reviewer: Stefania (was: Yuki Morishita) Reassigning review to [~Stefania] Repair improvements when using vnodes - Key: CASSANDRA-5220 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.2.0 beta 1 Reporter: Brandon Williams Assignee: Marcus Olsson Labels: performance, repair Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, cassandra-3.0-5220.patch Currently when using vnodes, repair takes much longer to complete than without them. This appears at least in part because it's using a session per range and processing them sequentially. This generates a lot of log spam with vnodes, and while being gentler and lighter on hard disk deployments, ssd-based deployments would often prefer that repair be as fast as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9947) nodetool verify is broken
Jonathan Ellis created CASSANDRA-9947: - Summary: nodetool verify is broken Key: CASSANDRA-9947 URL: https://issues.apache.org/jira/browse/CASSANDRA-9947 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Priority: Critical Fix For: 2.2.x Raised these issues on CASSANDRA-5791, but didn't revert/re-open, so they were ignored: We mark sstables that fail verification as unrepaired, but that's not going to do what you think. What it means is that the local node will use that sstable in the next repair, but other nodes will not. So all we'll end up doing is streaming whatever data we can read from it, to the other replicas. If we could magically mark whatever sstables correspond on the remote nodes, to the data in the local sstable, that would work, but we can't. IMO what we should do is:
* scrub, because it's quite likely we'll fail reading from the sstable otherwise, and
* full repair across the data range covered by the sstable
Additionally,
* I'm not sure that keeping extended verify code around is worth it. Since the point is to work around not having a checksum, we could just scrub instead. This is slightly more heavyweight but it would be a one-time cost (scrub would build a new checksum) and we wouldn't have to worry about keeping two versions of almost-the-same-code in sync.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9947) nodetool verify is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649199#comment-14649199 ] Jonathan Ellis commented on CASSANDRA-9947: --- IMO we should disable verify for 2.2.1 until we can rearchitect it since this is a nontrivial change. nodetool verify is broken - Key: CASSANDRA-9947 URL: https://issues.apache.org/jira/browse/CASSANDRA-9947 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Priority: Critical Fix For: 2.2.x Raised these issues on CASSANDRA-5791, but didn't revert/re-open, so they were ignored: We mark sstables that fail verification as unrepaired, but that's not going to do what you think. What it means is that the local node will use that sstable in the next repair, but other nodes will not. So all we'll end up doing is streaming whatever data we can read from it, to the other replicas. If we could magically mark whatever sstables correspond on the remote nodes, to the data in the local sstable, that would work, but we can't. IMO what we should do is:
* scrub, because it's quite likely we'll fail reading from the sstable otherwise, and
* full repair across the data range covered by the sstable
Additionally,
* I'm not sure that keeping extended verify code around is worth it. Since the point is to work around not having a checksum, we could just scrub instead. This is slightly more heavyweight but it would be a one-time cost (scrub would build a new checksum) and we wouldn't have to worry about keeping two versions of almost-the-same-code in sync.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node
[ https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649200#comment-14649200 ] Jonathan Ellis commented on CASSANDRA-5791: --- Created CASSANDRA-9947 to follow up. A nodetool command to validate all sstables in a node - Key: CASSANDRA-5791 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791 Project: Cassandra Issue Type: New Feature Components: Core Reporter: sankalp kohli Assignee: Jeff Jirsa Priority: Minor Fix For: 2.2.0 beta 1 Attachments: cassandra-5791-20150319.diff, cassandra-5791-patch-3.diff, cassandra-5791.patch-2 Currently there is no nodetool command to validate all sstables on disk. The only way to do this is to run a repair and see if it succeeds, but we cannot repair the system keyspace. We can also run upgradesstables, but that rewrites all the sstables. This command should check the hash of all sstables and return whether all data is readable or not. This should NOT care about consistency. The compressed sstables do not have a hash, so not sure how it will work there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (CASSANDRA-8143) Partitioner should not be accessed through StorageService
[ https://issues.apache.org/jira/browse/CASSANDRA-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis reopened CASSANDRA-8143: --- Partitioner should not be accessed through StorageService - Key: CASSANDRA-8143 URL: https://issues.apache.org/jira/browse/CASSANDRA-8143 Project: Cassandra Issue Type: Improvement Reporter: Branimir Lambov Assignee: Branimir Lambov Fix For: 3.0 beta 1 The configured partitioner is no longer the only partitioner in use in the database, as e.g. index tables use LocalPartitioner. To make sure the correct partitioner is used for each table, accesses of StorageService.getPartitioner() should be replaced with retrieval of the CFS-specific partitioner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9031) nodetool info -T throws ArrayOutOfBounds when the node has not joined the cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9031: -- Reviewer: Stefania [~Stefania] to review nodetool info -T throws ArrayOutOfBounds when the node has not joined the cluster - Key: CASSANDRA-9031 URL: https://issues.apache.org/jira/browse/CASSANDRA-9031 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Ron Kuris Assignee: Yuki Morishita Fix For: 2.1.x Attachments: patch.txt To reproduce, bring up a node that does not join the cluster, either using -Dcassandra.write_survey=true or -Dcassandra.join_ring=false, then run 'nodetool info -T'. You'll get the following stack trace:
{code}
ID : e384209f-f7a9-4cff-8fd5-03adfaa0d846
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 76.69 KB
Generation No : 1427229938
Uptime (seconds) : 728
Heap Memory (MB) : 109.93 / 826.00
Off Heap Memory (MB) : 0.01
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at org.apache.cassandra.tools.NodeProbe.getEndpoint(NodeProbe.java:676)
    at org.apache.cassandra.tools.NodeProbe.getDataCenter(NodeProbe.java:694)
    at org.apache.cassandra.tools.NodeCmd.printInfo(NodeCmd.java:666)
    at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1277)
{code}
After applying the attached patch, the new error is:
{code}
ID : a7d76a2a-82d2-4faa-94e1-a30df6663ebb
Gossip active : true
Thrift active : false
Native Transport active: false
Load : 89.36 KB
Generation No : 1427231804
Uptime (seconds) : 12
Heap Memory (MB) : 135.49 / 826.00
Off Heap Memory (MB) : 0.01
Exception in thread "main" java.lang.RuntimeException: This node does not have any tokens. Perhaps it is not part of the ring?
    at org.apache.cassandra.tools.NodeProbe.getEndpoint(NodeProbe.java:678)
    at org.apache.cassandra.tools.NodeProbe.getDataCenter(NodeProbe.java:698)
    at org.apache.cassandra.tools.NodeCmd.printInfo(NodeCmd.java:676)
    at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1313)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9483) Document incompatibilities with -XX:+PerfDisableSharedMem
[ https://issues.apache.org/jira/browse/CASSANDRA-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649730#comment-14649730 ] Jonathan Ellis commented on CASSANDRA-9483: --- {{working:}} instead of {{working.}}, otherwise +1 Document incompatibilities with -XX:+PerfDisableSharedMem - Key: CASSANDRA-9483 URL: https://issues.apache.org/jira/browse/CASSANDRA-9483 Project: Cassandra Issue Type: Task Components: Config, Documentation website Reporter: Tyler Hobbs Assignee: T Jake Luciani Priority: Minor Fix For: 3.0 beta 1 Attachments: news_update.txt We recently discovered that [the Jolokia agent is incompatible with the -XX:+PerfDisableSharedMem JVM option|https://github.com/rhuss/jolokia/issues/198]. I assume that this may affect other monitoring tools as well. If we are going to leave this enabled by default, we should document the potential problems with it. A combination of a comment in {{cassandra-env.sh}} (and the Windows equivalent) and a comment in NEWS.txt should suffice, I think. If possible, it would be good to figure out what other tools are affected and also mention them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9949) maxPurgeableTimestamp needs to check memtables too
[ https://issues.apache.org/jira/browse/CASSANDRA-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9949: -- Assignee: Stefania maxPurgeableTimestamp needs to check memtables too -- Key: CASSANDRA-9949 URL: https://issues.apache.org/jira/browse/CASSANDRA-9949 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Assignee: Stefania Fix For: 2.1.x, 2.2.x overlapIterator/maxPurgeableTimestamp don't include the memtables, so a very-out-of-order write could be ignored -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8143) Partitioner should not be accessed through StorageService
[ https://issues.apache.org/jira/browse/CASSANDRA-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649505#comment-14649505 ] Jonathan Ellis commented on CASSANDRA-8143: --- From IRC:
{noformat}
exlt  this is going to be a hold the presses moment - we're going to jack up jenkins really quickly with this not being fixed
exlt  ... the last run (as well as several before that) of that branch job was aborted for running out of control - http://cassci.datastax.com/view/Dev/view/blambov/job/blambov-8143-partitioner-dtest/17/
exlt  this shouldn't have been merged
{noformat}
Reverted pending a fix.
Partitioner should not be accessed through StorageService - Key: CASSANDRA-8143 URL: https://issues.apache.org/jira/browse/CASSANDRA-8143 Project: Cassandra Issue Type: Improvement Reporter: Branimir Lambov Assignee: Branimir Lambov Fix For: 3.0 beta 1 The configured partitioner is no longer the only partitioner in use in the database, as e.g. index tables use LocalPartitioner. To make sure the correct partitioner is used for each table, accesses of StorageService.getPartitioner() should be replaced with retrieval of the CFS-specific partitioner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9949) maxPurgeableTimestamp needs to check memtables too
Jonathan Ellis created CASSANDRA-9949: - Summary: maxPurgeableTimestamp needs to check memtables too Key: CASSANDRA-9949 URL: https://issues.apache.org/jira/browse/CASSANDRA-9949 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Fix For: 2.1.x, 2.2.x overlapIterator/maxPurgeableTimestamp don't include the memtables, so a very-out-of-order write could be ignored -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9949) maxPurgeableTimestamp needs to check memtables too
[ https://issues.apache.org/jira/browse/CASSANDRA-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649931#comment-14649931 ] Jonathan Ellis commented on CASSANDRA-9949: --- Nit: should probably reverse the order of the predicates in {{timestamp < getMaxPurgeableTimestamp() && localDeletionTime < gcBefore}} since the former is expensive while the latter is not. maxPurgeableTimestamp needs to check memtables too -- Key: CASSANDRA-9949 URL: https://issues.apache.org/jira/browse/CASSANDRA-9949 Project: Cassandra Issue Type: Bug Components: Core Reporter: Jonathan Ellis Fix For: 2.1.x, 2.2.x overlapIterator/maxPurgeableTimestamp don't include the memtables, so a very-out-of-order write could be ignored -- This message was sent by Atlassian JIRA (v6.3.4#6332)
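The nit above is ordinary short-circuit evaluation: with {{&&}}, putting the cheap comparison first means the expensive call is skipped whenever the cheap test already fails. A hypothetical sketch (the supplier stands in for the real, expensive lookup; names only mirror the ticket's):

```java
import java.util.function.LongSupplier;

public class PurgeCheck {
    // Cheap predicate first: the expensive maxPurgeableTimestamp computation
    // (which scans overlapping sstables in the real code) only runs when the
    // cheap localDeletionTime < gcBefore test has already passed.
    public static boolean purgeable(int localDeletionTime, int gcBefore,
                                    long timestamp, LongSupplier maxPurgeableTimestamp) {
        return localDeletionTime < gcBefore
            && timestamp < maxPurgeableTimestamp.getAsLong();
    }
}
```

During a compaction that touches millions of cells, skipping the expensive branch on every non-expired cell is exactly the saving the comment asks for.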
[jira] [Updated] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9935: -- Assignee: Yuki Morishita Repair fails with RuntimeException -- Key: CASSANDRA-9935 URL: https://issues.apache.org/jira/browse/CASSANDRA-9935 Project: Cassandra Issue Type: Bug Environment: C* 2.1.8, Debian Wheezy Reporter: mlowicki Assignee: Yuki Morishita Attachments: db1.sync.lati.osa.cassandra.log We had problems with slow repair in 2.1.7 (CASSANDRA-9702); after upgrading to 2.1.8 it started to work faster, but now it fails with:
{code}
...
[2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
[2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
[2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
[2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
[2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
[2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
[2015-07-29 20:44:03,957] Repair command #4 finished
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
    at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
    at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
{code}
After running:
{code}
nodetool repair --partitioner-range --parallel --in-local-dc sync
{code}
The last records in the logs regarding repair are:
{code}
INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range (-7695808664784761779,-7693529816291585568] finished
INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range (806371695398849,8065203836608925992] finished
INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
{code}
but a bit above I see (at least two times in the attached log):
{code}
ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range (5765414319217852786,5781018794516851576] failed with error org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_80]
    at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_80]
    at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2950) ~[apache-cassandra-2.1.8.jar:2.1.8]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.8.jar:2.1.8]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
    at
[jira] [Commented] (CASSANDRA-8325) Cassandra 2.1.x fails to start on FreeBSD (JVM crash)
[ https://issues.apache.org/jira/browse/CASSANDRA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647671#comment-14647671 ] Jonathan Ellis commented on CASSANDRA-8325: --- Basically we're waiting for someone who wants to run it badly enough to do the work. Cassandra 2.1.x fails to start on FreeBSD (JVM crash) - Key: CASSANDRA-8325 URL: https://issues.apache.org/jira/browse/CASSANDRA-8325 Project: Cassandra Issue Type: Bug Environment: FreeBSD 10.0 with openjdk version 1.7.0_71, 64-Bit Server VM Reporter: Leonid Shalupov Fix For: 2.1.x Attachments: hs_err_pid1856.log, system.log, unsafeCopy1.txt, untested_8325.patch See attached error file after JVM crash {quote} FreeBSD xxx.intellij.net 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu Jan 16 22:34:59 UTC 2014 r...@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64 {quote} {quote} % java -version openjdk version 1.7.0_71 OpenJDK Runtime Environment (build 1.7.0_71-b14) OpenJDK 64-Bit Server VM (build 24.71-b01, mixed mode) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9944) sstablesplit.bat does not split sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-9944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9944: -- Assignee: Stefania sstablesplit.bat does not split sstables Key: CASSANDRA-9944 URL: https://issues.apache.org/jira/browse/CASSANDRA-9944 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Philip Thompson Assignee: Stefania Fix For: 2.2.x The dtest {{sstablesplit_test.py:TestSStableSplit.split_test}} is failing on Windows on 2.2-head. An sstable of approximately 280MB is created, and then we run sstablesplit.bat on it. By default, it should be split into 50MB sstables, giving us six new sstables. Instead, nothing happens, and we are left with the original. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9942) sstableofflinerelevel and sstablelevelreset don't have Windows versions
[ https://issues.apache.org/jira/browse/CASSANDRA-9942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9942: -- Assignee: Paulo Motta sstableofflinerelevel and sstablelevelreset don't have Windows versions - Key: CASSANDRA-9942 URL: https://issues.apache.org/jira/browse/CASSANDRA-9942 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Philip Thompson Assignee: Paulo Motta Fix For: 2.2.x These two tools, located in tools/bin, do not have corresponding .bat versions, so they do not run on Windows. This is also breaking their related dtests on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9946) use ioprio_set instead of throttling by default
[ https://issues.apache.org/jira/browse/CASSANDRA-9946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648573#comment-14648573 ] Jonathan Ellis commented on CASSANDRA-9946: --- Things I don't know:
# What is the Windows equivalent? /cc [~JoshuaMcKenzie]
# Should we pick a one-size-fits-all priority, or allow the user to override class/priority? /cc [~a...@ooyala.com]
use ioprio_set instead of throttling by default --- Key: CASSANDRA-9946 URL: https://issues.apache.org/jira/browse/CASSANDRA-9946 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Fix For: 3.x Compaction throttling works as designed, but it has two drawbacks: * it requires manual tuning to choose the right value for a given machine * it does not allow compaction to burst above its limit if there is additional i/o capacity available while there are less application requests to serve Using ioprio_set instead solves both of these problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9946) use ioprio_set on compaction threads by default instead of manually throttling
[ https://issues.apache.org/jira/browse/CASSANDRA-9946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9946: -- Summary: use ioprio_set on compaction threads by default instead of manually throttling (was: use ioprio_set instead of throttling by default) use ioprio_set on compaction threads by default instead of manually throttling -- Key: CASSANDRA-9946 URL: https://issues.apache.org/jira/browse/CASSANDRA-9946 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Fix For: 3.x Compaction throttling works as designed, but it has two drawbacks: * it requires manual tuning to choose the right value for a given machine * it does not allow compaction to burst above its limit if there is additional i/o capacity available while there are less application requests to serve Using ioprio_set instead solves both of these problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9946) use ioprio_set instead of throttling by default
Jonathan Ellis created CASSANDRA-9946: - Summary: use ioprio_set instead of throttling by default Key: CASSANDRA-9946 URL: https://issues.apache.org/jira/browse/CASSANDRA-9946 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Fix For: 3.x Compaction throttling works as designed, but it has two drawbacks: * it requires manual tuning to choose the right value for a given machine * it does not allow compaction to burst above its limit if there is additional i/o capacity available while there are less application requests to serve Using ioprio_set instead solves both of these problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9843) Augment or replace partition index with adaptive range filters
[ https://issues.apache.org/jira/browse/CASSANDRA-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648568#comment-14648568 ] Jonathan Ellis commented on CASSANDRA-9843: --- [~danchia], the first thing we'd need is an ARF implementation that supports Cell. Augment or replace partition index with adaptive range filters -- Key: CASSANDRA-9843 URL: https://issues.apache.org/jira/browse/CASSANDRA-9843 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: T Jake Luciani Labels: performance Adaptive range filters are, in principle, bloom filters for range queries. They provide a space-efficient way to avoid scanning a partition when we can tell that we do not contain any data for the range requested. Like BF, they can return false positives but not false negatives. The implementation is of course totally different from BF. ARF is a tree where each leaf of the tree is a range of data and a bit, either on or off, denoting whether we have *some* data in that range. ARF are described here: http://www.vldb.org/pvldb/vol6/p1714-kossmann.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
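For intuition, here is a deliberately simplified, non-adaptive cousin of the ARF idea (fixed-width leaves instead of the paper's adaptive tree; all names are hypothetical): each leaf covers a value range and a single bit records whether any data falls in it, so a negative answer is definite while a positive one may be a false positive.

```java
import java.util.TreeMap;

public class RangeFilter {
    // leaf start -> "some data exists in [start, start + width)"
    private final TreeMap<Long, Boolean> leaves = new TreeMap<>();
    private final long width;

    public RangeFilter(long min, long max, int leafCount) {
        this.width = Math.max(1, (max - min) / leafCount);
        for (long s = min; s < max; s += width)
            leaves.put(s, false);
    }

    // Mark the leaf covering `value`; caller must stay within [min, max).
    public void insert(long value) {
        leaves.put(leaves.floorKey(value), true);
    }

    // true  => the range MAY contain data (possible false positive)
    // false => the range definitely contains no data (no false negatives)
    public boolean mayContain(long lo, long hi) {
        for (boolean occupied : leaves.subMap(leaves.floorKey(lo), true, hi, true).values())
            if (occupied) return true;
        return false;
    }
}
```

The real ARF additionally splits and merges leaves based on the query workload, spending its space budget where queries actually land; that adaptivity (and support for Cell-typed keys, per the comment above) is the part Cassandra would need to build.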
[jira] [Commented] (CASSANDRA-9946) use ioprio_set on compaction threads by default instead of manually throttling
[ https://issues.apache.org/jira/browse/CASSANDRA-9946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648652#comment-14648652 ] Jonathan Ellis commented on CASSANDRA-9946: --- CFQ is the default on Debian and RHEL. Is there a syscall that can check whether it's enabled first? use ioprio_set on compaction threads by default instead of manually throttling -- Key: CASSANDRA-9946 URL: https://issues.apache.org/jira/browse/CASSANDRA-9946 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Ariel Weisberg Fix For: 3.x Compaction throttling works as designed, but it has two drawbacks: * it requires manual tuning to choose the right value for a given machine * it does not allow compaction to burst above its limit if there is additional i/o capacity available while there are less application requests to serve Using ioprio_set instead solves both of these problems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
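On the detection question: there is no dedicated syscall, but Linux exposes the active scheduler per block device in /sys/block/<dev>/queue/scheduler, with the active one shown in brackets (e.g. "noop deadline [cfq]"). A hedged sketch of the parsing half (reading the sysfs file is the only platform-specific part; the class name is hypothetical):

```java
public class SchedulerProbe {
    // Extracts the bracketed (active) scheduler from the contents of
    // /sys/block/<dev>/queue/scheduler, e.g. "noop deadline [cfq]" -> "cfq".
    // Returns null if the contents don't match the expected format.
    public static String activeScheduler(String schedulerFileContents) {
        int open = schedulerFileContents.indexOf('[');
        int close = schedulerFileContents.indexOf(']');
        if (open < 0 || close <= open)
            return null;
        return schedulerFileContents.substring(open + 1, close);
    }
}
```

Startup could read the file for the device backing each data directory and fall back to manual throttling when the result is not "cfq", since ioprio_set's priority classes are only honored by CFQ.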
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646843#comment-14646843 ] Jonathan Ellis commented on CASSANDRA-6477: --- Reverted 3bdcaa336a6e6a9727c333b433bb9f5d3afc0fb1 and b93f05d7d1490c6146576a35f5a572d9d0e72399 pending a fix. Materialized Views (was: Global Indexes) Key: CASSANDRA-6477 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 Project: Cassandra Issue Type: New Feature Components: API, Core Reporter: Jonathan Ellis Assignee: Carl Yeksigian Labels: cql Fix For: 3.0 alpha 1 Attachments: test-view-data.sh, users.yaml Local indexes are suitable for low-cardinality data, where spreading the index across the cluster is a Good Thing. However, for high-cardinality data, local indexes require querying most nodes in the cluster even if only a handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9894) Serialize the header only once per message
[ https://issues.apache.org/jira/browse/CASSANDRA-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9894: -- Reviewer: Ariel Weisberg [~aweisberg] to review Serialize the header only once per message -- Key: CASSANDRA-9894 URL: https://issues.apache.org/jira/browse/CASSANDRA-9894 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Sylvain Lebresne Assignee: Benedict Fix For: 3.0 beta 1 One last improvement I'd like to do on the serialization side is that we currently serialize the {{SerializationHeader}} for each partition. That header contains the serialized columns in particular and for range queries, serializing that for every partition is wasted (note that it's only a problem for the messaging protocol as for sstable we only write the header once per sstable). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis reopened CASSANDRA-6477: --- Materialized Views (was: Global Indexes) Key: CASSANDRA-6477 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 Project: Cassandra Issue Type: New Feature Components: API, Core Reporter: Jonathan Ellis Assignee: Carl Yeksigian Labels: cql Fix For: 3.0 alpha 1 Attachments: test-view-data.sh, users.yaml Local indexes are suitable for low-cardinality data, where spreading the index across the cluster is a Good Thing. However, for high-cardinality data, local indexes require querying most nodes in the cluster even if only a handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8931) IndexSummary (and Index) should store the token, and the minimal key to unambiguously direct a query
[ https://issues.apache.org/jira/browse/CASSANDRA-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646225#comment-14646225 ] Jonathan Ellis commented on CASSANDRA-8931: --- Good idea. This will save a lot of memory. IndexSummary (and Index) should store the token, and the minimal key to unambiguously direct a query Key: CASSANDRA-8931 URL: https://issues.apache.org/jira/browse/CASSANDRA-8931 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Stefania Labels: performance Since these files are likely sticking around a little longer, it is probably worth optimising them. A relatively simple change to Index and IndexSummary could reduce the amount of space required significantly, reduce the CPU burden of lookup, and hopefully bound the amount of space needed as key size grows. On writing first we always store the token before the key (if it is different to the key); then we simply truncate the whole record to the minimum length necessary to answer an inequality search. Since the data file contains the key also, we can corroborate we have the right key once we've looked up. Since BFs are used to reduce unnecessary lookups, we don't save much by ruling the false positives out one step earlier. An improved follow up version would be to use a trie of shortest length to answer inequality lookups, as this would also ensure very long keys with common prefixes would not significantly increase the size of the index or summary. This would translate to a trie index for the summary keying into a static trie page for the index. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8931) IndexSummary (and Index) should store the token, and the minimal key to unambiguously direct a query
[ https://issues.apache.org/jira/browse/CASSANDRA-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646126#comment-14646126 ] Jonathan Ellis commented on CASSANDRA-8931: --- bq. then we simply truncate the whole record to the minimum length necessary to answer an inequality search Meaning, we only store enough to disambiguate from the records before and after? IndexSummary (and Index) should store the token, and the minimal key to unambiguously direct a query Key: CASSANDRA-8931 URL: https://issues.apache.org/jira/browse/CASSANDRA-8931 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Stefania Labels: performance Since these files are likely sticking around a little longer, it is probably worth optimising them. A relatively simple change to Index and IndexSummary could reduce the amount of space required significantly, reduce the CPU burden of lookup, and hopefully bound the amount of space needed as key size grows. On writing first we always store the token before the key (if it is different to the key); then we simply truncate the whole record to the minimum length necessary to answer an inequality search. Since the data file contains the key also, we can corroborate we have the right key once we've looked up. Since BFs are used to reduce unnecessary lookups, we don't save much by ruling the false positives out one step earlier. An improved follow up version would be to use a trie of shortest length to answer inequality lookups, as this would also ensure very long keys with common prefixes would not significantly increase the size of the index or summary. This would translate to a trie index for the summary keying into a static trie page for the index. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
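The truncation scheme discussed above can be reduced to a few lines. This is an illustrative sketch with invented names, not the actual Index/IndexSummary code: given two adjacent keys in sorted order, keep only the shortest prefix of the later key that still sorts strictly after the earlier one, so an inequality search still resolves to the right entry (the data file's full key is then used to corroborate the match).

```java
import java.util.Arrays;

// Hypothetical sketch of minimal-key truncation for an index entry.
public class KeyTruncation {
    // Returns the shortest prefix of `key` that is strictly greater than
    // `prev` under lexicographic comparison. Assumes the keys are distinct
    // and `prev` sorts before `key`.
    static byte[] minimalSeparator(byte[] prev, byte[] key) {
        int i = 0;
        // Walk past the common prefix of the two keys.
        while (i < prev.length && i < key.length && prev[i] == key[i]) i++;
        // One byte past the common prefix is enough to disambiguate
        // (or the whole key, if `prev` is a proper prefix of it).
        int len = Math.min(i + 1, key.length);
        return Arrays.copyOf(key, len);
    }
}
```

For example, between "apple" and "apricot" only "apr" needs to be stored; very long keys with short distinguishing prefixes shrink dramatically, which is the space bound the description is after.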
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646091#comment-14646091 ] Jonathan Ellis commented on CASSANDRA-9738: --- Did you mean to link a different issue? Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.x Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9895) Batchlog RF1 writes to a single node but not itself.
[ https://issues.apache.org/jira/browse/CASSANDRA-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646142#comment-14646142 ] Jonathan Ellis commented on CASSANDRA-9895: --- Writing to localhost doesn't really improve our durability though, since it's already involved as the coordinator pushing the batch through. (To be clear, it improves durability more than logging to nothing, but much less than logging to a different node.) Batchlog RF1 writes to a single node but not itself. - Key: CASSANDRA-9895 URL: https://issues.apache.org/jira/browse/CASSANDRA-9895 Project: Cassandra Issue Type: Bug Reporter: T Jake Luciani Assignee: Aleksey Yeschenko Fix For: 2.1.x, 3.0 beta 1 In the BatchlogManager, when selecting the endpoints to write the batchlog to, for RF1, we filter out any down nodes and the local node. This means we require two nodes up but only write to one. Why? This affects availability since we need two nodes to write at CL.ONE. If we *require* two copies of the batchlog then we should include ourselves in the calculation. If we allow a batchlog write with only a single node up then we should write to the local batchlog. The code is here: https://github.com/apache/cassandra/blob/1c80b04be1d47d03bbde888cea960f5ff8a95d58/src/java/org/apache/cassandra/db/BatchlogManager.java#L530 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
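The filtering behavior under discussion can be reduced to a toy sketch. The names and signatures below are invented for illustration, not the linked BatchlogManager code: candidates are filtered to exclude down nodes and the coordinator itself, which is exactly why two live nodes are required but only one receives the batchlog.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of batchlog endpoint selection.
public class BatchlogEndpoints {
    // Drops down nodes and the local (coordinator) node from the candidate
    // list, mirroring the behavior the ticket questions: with few candidates
    // this can require two live nodes yet write to only one.
    static List<String> filter(List<String> candidates, Set<String> down, String local) {
        List<String> result = new ArrayList<>();
        for (String endpoint : candidates)
            if (!down.contains(endpoint) && !endpoint.equals(local))
                result.add(endpoint);
        return result;
    }
}
```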
[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645359#comment-14645359 ] Jonathan Ellis commented on CASSANDRA-7066: --- /cc [~nickmbailey] Simplify (and unify) cleanup of compaction leftovers Key: CASSANDRA-7066 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Stefania Priority: Minor Labels: benedict-to-commit, compaction Fix For: 3.0 alpha 1 Attachments: 7066.txt Currently we manage a list of in-progress compactions in a system table, which we use to cleanup incomplete compactions when we're done. The problem with this is that 1) it's a bit clunky (and leaves us in positions where we can unnecessarily cleanup completed files, or conversely not cleanup files that have been superseded); and 2) it's only used for a regular compaction - no other compaction types are guarded in the same way, so can result in duplication if we fail before deleting the replacements. I'd like to see each sstable store in its metadata its direct ancestors, and on startup we simply delete any sstables that occur in the union of all ancestor sets. This way as soon as we finish writing we're capable of cleaning up any leftovers, so we never get duplication. It's also much easier to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9790) CommitLogUpgradeTest.test{20,21} failure
[ https://issues.apache.org/jira/browse/CASSANDRA-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9790: -- Assignee: Ariel Weisberg (was: Sylvain Lebresne) CommitLogUpgradeTest.test{20,21} failure Key: CASSANDRA-9790 URL: https://issues.apache.org/jira/browse/CASSANDRA-9790 Project: Cassandra Issue Type: Sub-task Reporter: Michael Shuler Assignee: Ariel Weisberg Priority: Blocker Labels: test-failure Fix For: 3.0 beta 1 These test failures started with the 8099 commit. {noformat}
Stacktrace
java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:275)
    at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:583)
    at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:592)
    at org.apache.cassandra.db.marshal.CompositeType.splitName(CompositeType.java:197)
    at org.apache.cassandra.db.LegacyLayout.decodeClustering(LegacyLayout.java:235)
    at org.apache.cassandra.db.LegacyLayout.decodeCellName(LegacyLayout.java:127)
    at org.apache.cassandra.db.LegacyLayout.readLegacyCellBody(LegacyLayout.java:672)
    at org.apache.cassandra.db.LegacyLayout.readLegacyCell(LegacyLayout.java:643)
    at org.apache.cassandra.db.LegacyLayout$8.computeNext(LegacyLayout.java:713)
    at org.apache.cassandra.db.LegacyLayout$8.computeNext(LegacyLayout.java:702)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at com.google.common.collect.Iterators$PeekingImpl.hasNext(Iterators.java:1149)
    at org.apache.cassandra.db.LegacyLayout.toUnfilteredRowIterator(LegacyLayout.java:310)
    at org.apache.cassandra.db.LegacyLayout.onWireCellstoUnfilteredRowIterator(LegacyLayout.java:298)
    at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:670)
    at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:276)
    at org.apache.cassandra.db.commitlog.CommitLogTestReplayer.replayMutation(CommitLogTestReplayer.java:66)
    at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:464)
    at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:370)
    at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:145)
    at org.apache.cassandra.db.commitlog.CommitLogUpgradeTest.testRestore(CommitLogUpgradeTest.java:105)
    at org.apache.cassandra.db.commitlog.CommitLogUpgradeTest.test21(CommitLogUpgradeTest.java:66)
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9533) Make batch commitlog mode easier to tune
[ https://issues.apache.org/jira/browse/CASSANDRA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644344#comment-14644344 ] Jonathan Ellis commented on CASSANDRA-9533: --- I think this goes without saying, but to be explicit, I'm not willing to spend time optimizing for single-threaded performance. (Even if it's a regression from 2.0.) Make batch commitlog mode easier to tune Key: CASSANDRA-9533 URL: https://issues.apache.org/jira/browse/CASSANDRA-9533 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Benedict Fix For: 3.x As discussed in CASSANDRA-9504, 2.1 changed commitlog_sync_batch_window_in_ms from a maximum time to wait between fsync to the minimum time, so one must be very careful to keep it small enough that most writers aren't kept waiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9853) loadConfig() called twice on startup
[ https://issues.apache.org/jira/browse/CASSANDRA-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644350#comment-14644350 ] Jonathan Ellis commented on CASSANDRA-9853: --- +1 loadConfig() called twice on startup Key: CASSANDRA-9853 URL: https://issues.apache.org/jira/browse/CASSANDRA-9853 Project: Cassandra Issue Type: Improvement Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.x Attachments: 9853.txt {{YamlConfigurationLoader.loadConfig()}} is called twice on startup from {{org.apache.cassandra.locator.SimpleSeedProvider#getSeeds}} and {{org.apache.cassandra.config.DatabaseDescriptor#forceStaticInitialization}}. It's not nice, but not fatal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9853) loadConfig() called twice on startup
[ https://issues.apache.org/jira/browse/CASSANDRA-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9853: -- Reviewer: Jonathan Ellis (was: Aleksey Yeschenko) loadConfig() called twice on startup Key: CASSANDRA-9853 URL: https://issues.apache.org/jira/browse/CASSANDRA-9853 Project: Cassandra Issue Type: Improvement Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.x Attachments: 9853.txt {{YamlConfigurationLoader.loadConfig()}} is called twice on startup from {{org.apache.cassandra.locator.SimpleSeedProvider#getSeeds}} and {{org.apache.cassandra.config.DatabaseDescriptor#forceStaticInitialization}}. It's not nice, but not fatal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9880) ScrubTest.testScrubOutOfOrder should generate test file on the fly
[ https://issues.apache.org/jira/browse/CASSANDRA-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9880: -- Reviewer: Stefania [~Stefania] to review ScrubTest.testScrubOutOfOrder should generate test file on the fly -- Key: CASSANDRA-9880 URL: https://issues.apache.org/jira/browse/CASSANDRA-9880 Project: Cassandra Issue Type: Bug Reporter: Yuki Morishita Assignee: Yuki Morishita Priority: Blocker Labels: test-failure Fix For: 3.0 beta 1 ScrubTest#testScrubOutOfOrder is failing on trunk due to the serialization format change from pre-generated out-of-order SSTable. We should change that to generate out-of-order SSTable on the fly so that we don't need to bother generating SSTable by hand again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9775) some paging dtests fail/flap on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9775: -- Reviewer: Benjamin Lerer some paging dtests fail/flap on trunk - Key: CASSANDRA-9775 URL: https://issues.apache.org/jira/browse/CASSANDRA-9775 Project: Cassandra Issue Type: Sub-task Reporter: Jim Witschey Assignee: Sylvain Lebresne Priority: Blocker Fix For: 3.0 beta 1 Several paging dtests fail on trunk: [static_columns_paging_test|http://cassci.datastax.com/view/trunk/job/trunk_dtest/lastSuccessfulBuild/testReport/junit/paging_test/TestPagingData/static_columns_paging_test/history/] [test_undefined_page_size_default|http://cassci.datastax.com/view/trunk/job/trunk_dtest/lastSuccessfulBuild/testReport/junit/paging_test/TestPagingSize/test_undefined_page_size_default/history/] [test_failure_threshold_deletions|http://cassci.datastax.com/view/trunk/job/trunk_dtest/lastSuccessfulBuild/testReport/junit/paging_test/TestPagingWithDeletions/test_failure_threshold_deletions/history/] I'm not sure if these are all rooted in the same underlying problem, so I defer to whoever takes this ticket on. [~thobbs] I'm assigning you because this is about paging, but reassign as you see fit. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9895) Batchlog RF1 writes to a single node but not itself.
[ https://issues.apache.org/jira/browse/CASSANDRA-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644552#comment-14644552 ] Jonathan Ellis commented on CASSANDRA-9895: --- The idea was that the batchlog should give you the guarantee that you won't lose atomicity unless you lose 3 machines during the request (coordinator plus two others). Allowing the coordinator to be one of the replicas weakens this to 2. Batchlog RF1 writes to a single node but not itself. - Key: CASSANDRA-9895 URL: https://issues.apache.org/jira/browse/CASSANDRA-9895 Project: Cassandra Issue Type: Bug Reporter: T Jake Luciani Assignee: Aleksey Yeschenko Fix For: 2.1.x, 3.0 beta 1 In the BatchlogManager, when selecting the endpoints to write the batchlog to, for RF1, we filter out any down nodes and the local node. This means we require two nodes up but only write to one. Why? This affects availability since we need two nodes to write at CL.ONE. If we *require* two copies of the batchlog then we should include ourselves in the calculation. If we allow a batchlog write with only a single node up then we should write to the local batchlog. The code is here: https://github.com/apache/cassandra/blob/1c80b04be1d47d03bbde888cea960f5ff8a95d58/src/java/org/apache/cassandra/db/BatchlogManager.java#L530 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644630#comment-14644630 ] Jonathan Ellis commented on CASSANDRA-7066: --- bq. if we lose the new file, say, then we will delete the old files on startup none the wiser. The result being a partial replacement of the sstables (perhaps with nothing at all). Isn't that worse than what I proposed a while back? bq. log that the new [sstables] are in progress, then when they're done, we clear the in progress log file and delete the old files. If the process dies in between those two steps (very rare, deletes are fast) [or if the log file is corrupted] we have some extra redundant data left but correctness is preserved. Simplify (and unify) cleanup of compaction leftovers Key: CASSANDRA-7066 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Stefania Priority: Minor Labels: benedict-to-commit, compaction Fix For: 3.0 alpha 1 Attachments: 7066.txt Currently we manage a list of in-progress compactions in a system table, which we use to cleanup incomplete compactions when we're done. The problem with this is that 1) it's a bit clunky (and leaves us in positions where we can unnecessarily cleanup completed files, or conversely not cleanup files that have been superseded); and 2) it's only used for a regular compaction - no other compaction types are guarded in the same way, so can result in duplication if we fail before deleting the replacements. I'd like to see each sstable store in its metadata its direct ancestors, and on startup we simply delete any sstables that occur in the union of all ancestor sets. This way as soon as we finish writing we're capable of cleaning up any leftovers, so we never get duplication. It's also much easier to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644659#comment-14644659 ] Jonathan Ellis commented on CASSANDRA-7066: --- Granted, but a bug in the implementation could lead to similar results. I'd be a lot more comfortable with a design whose failure scenario is that we do extra compaction than one where we silently lose data. Simplify (and unify) cleanup of compaction leftovers Key: CASSANDRA-7066 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Stefania Priority: Minor Labels: benedict-to-commit, compaction Fix For: 3.0 alpha 1 Attachments: 7066.txt Currently we manage a list of in-progress compactions in a system table, which we use to cleanup incomplete compactions when we're done. The problem with this is that 1) it's a bit clunky (and leaves us in positions where we can unnecessarily cleanup completed files, or conversely not cleanup files that have been superseded); and 2) it's only used for a regular compaction - no other compaction types are guarded in the same way, so can result in duplication if we fail before deleting the replacements. I'd like to see each sstable store in its metadata its direct ancestors, and on startup we simply delete any sstables that occur in the union of all ancestor sets. This way as soon as we finish writing we're capable of cleaning up any leftovers, so we never get duplication. It's also much easier to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
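The log-file scheme quoted in this thread might look roughly like the following on the startup-cleanup side. This is a hedged sketch with invented names, not a patch: while a compaction runs, a log records the replacement sstables being written; on startup, any sstable still named in a surviving log is incomplete and gets discarded, leaving the old inputs in place. The failure mode is then redundant data and a re-run compaction, never data loss.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of startup cleanup driven by an in-progress log.
public class CompactionCleanup {
    // Given the sstables found on disk and the set of replacement sstables
    // recorded in a still-present in-progress log, returns the sstables to
    // delete: the partial replacements. The old inputs are kept, so the
    // worst case is redoing the compaction.
    static Set<String> sstablesToDelete(List<String> onDisk, Set<String> inProgressLog) {
        Set<String> doomed = new HashSet<>();
        for (String sstable : onDisk)
            if (inProgressLog.contains(sstable))
                doomed.add(sstable); // incomplete replacement; inputs still cover its data
        return doomed;
    }
}
```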
[jira] [Reopened] (CASSANDRA-9801) Use vints where it makes sense
[ https://issues.apache.org/jira/browse/CASSANDRA-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis reopened CASSANDRA-9801: --- Assignee: Ariel Weisberg (was: Sylvain Lebresne) We need to make progress towards fewer broken tests, not break more, even when we're convinced it's not the new code's fault. I've reverted and am assigning to Ariel to finish up, if necessary, since he's already working on CASSANDRA-9865. Use vints where it makes sense -- Key: CASSANDRA-9801 URL: https://issues.apache.org/jira/browse/CASSANDRA-9801 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Sylvain Lebresne Assignee: Ariel Weisberg Fix For: 3.0 alpha 1 CASSANDRA-9705 has switched to vints for a number of things, but there are some I've missed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9801) Use vints where it makes sense
[ https://issues.apache.org/jira/browse/CASSANDRA-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9801: -- Fix Version/s: (was: 3.0 alpha 1) 3.0 beta 1 Use vints where it makes sense -- Key: CASSANDRA-9801 URL: https://issues.apache.org/jira/browse/CASSANDRA-9801 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Sylvain Lebresne Assignee: Ariel Weisberg Fix For: 3.0 beta 1 CASSANDRA-9705 has switched to vints for a number of things, but there are some I've missed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644619#comment-14644619 ] Jonathan Ellis commented on CASSANDRA-7066: --- bq. If you lose one of these files you are SOL Is that SOL as in "now we can't tell which is new, so we have to keep both and do redundant compaction," or as in "now the node can't start up"? Simplify (and unify) cleanup of compaction leftovers Key: CASSANDRA-7066 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Stefania Priority: Minor Labels: benedict-to-commit, compaction Fix For: 3.0 alpha 1 Attachments: 7066.txt Currently we manage a list of in-progress compactions in a system table, which we use to cleanup incomplete compactions when we're done. The problem with this is that 1) it's a bit clunky (and leaves us in positions where we can unnecessarily cleanup completed files, or conversely not cleanup files that have been superseded); and 2) it's only used for a regular compaction - no other compaction types are guarded in the same way, so can result in duplication if we fail before deleting the replacements. I'd like to see each sstable store in its metadata its direct ancestors, and on startup we simply delete any sstables that occur in the union of all ancestor sets. This way as soon as we finish writing we're capable of cleaning up any leftovers, so we never get duplication. It's also much easier to reason about. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9664) Allow MV's select statements to be more complex
[ https://issues.apache.org/jira/browse/CASSANDRA-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644760#comment-14644760 ] Jonathan Ellis commented on CASSANDRA-9664: --- Materializing aggregates is out of scope here. See CASSANDRA-9778 for that. Allow MV's select statements to be more complex --- Key: CASSANDRA-9664 URL: https://issues.apache.org/jira/browse/CASSANDRA-9664 Project: Cassandra Issue Type: New Feature Reporter: Carl Yeksigian [Materialized Views|https://issues.apache.org/jira/browse/CASSANDRA-6477] add support for a syntax which includes a {{SELECT}} statement, but only allows selection of direct columns, and does not allow any filtering to take place. We should add support to the MV {{SELECT}} statement to bring better parity with the normal CQL {{SELECT}} statement, specifically simple functions in the selected columns, as well as specifying a {{WHERE}} clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9265) Add checksum to saved cache files
[ https://issues.apache.org/jira/browse/CASSANDRA-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642737#comment-14642737 ] Jonathan Ellis commented on CASSANDRA-9265: --- Throwing out the cache isn't a showstopper since we can always rebuild it, but it will hurt performance until it's done. Upgrading an entire cluster will be slower since you need to wait longer between machines. So if possible it's nice to preserve compatibility. Add checksum to saved cache files - Key: CASSANDRA-9265 URL: https://issues.apache.org/jira/browse/CASSANDRA-9265 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Fix For: 3.x Saved caches are not covered by a checksum. We should at least emit a checksum. My suggestion is a large checksum of the whole file (convenient offline validation), and then smaller per record checksums after each record is written (possibly a subset of the incrementally maintained larger checksum). I wouldn't go for anything fancy to try to recover from corruption since it is just a saved cache. If corruption is detected while reading I would just have it bail out. I would rather have less code to review and test in this instance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
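The per-record-plus-whole-file checksum idea proposed in the description could be sketched with java.util.zip.CRC32 along these lines. All names here are illustrative; this is not the actual saved-cache format, just one way to frame each record with its own CRC while maintaining a running file-level checksum for convenient offline validation.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch of a checksummed saved-cache writer.
public class CheckedCacheWriter {
    // Running checksum covering every record payload written so far;
    // a reader can recompute and compare it to validate the whole file.
    private final CRC32 fileCrc = new CRC32();

    // Frames one record as [length][payload][record crc32] and folds the
    // payload into the running file-level checksum. A reader that hits a
    // mismatched record CRC can simply bail out and rebuild the cache.
    byte[] frameRecord(byte[] record) {
        CRC32 recordCrc = new CRC32();
        recordCrc.update(record, 0, record.length);
        fileCrc.update(record, 0, record.length);
        ByteBuffer buf = ByteBuffer.allocate(4 + record.length + 8);
        buf.putInt(record.length);
        buf.put(record);
        buf.putLong(recordCrc.getValue());
        return buf.array();
    }

    long fileChecksum() { return fileCrc.getValue(); }
}
```

As the description suggests, nothing fancier than bail-on-mismatch is needed, since a saved cache can always be rebuilt from scratch.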
[jira] [Commented] (CASSANDRA-9889) Disable scripted UDFs by default
[ https://issues.apache.org/jira/browse/CASSANDRA-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641771#comment-14641771 ] Jonathan Ellis commented on CASSANDRA-9889: --- bq. Requiring that permission for script-UDFs would effectively always disable the sandbox for them. Well, it's acknowledging reality, which is that if you allow users to create scripted UDFs then you need to trust them not to do something dumb. Disable scripted UDFs by default Key: CASSANDRA-9889 URL: https://issues.apache.org/jira/browse/CASSANDRA-9889 Project: Cassandra Issue Type: Improvement Reporter: Robert Stupp Assignee: Robert Stupp Priority: Minor Fix For: 3.0.0 rc1 (Follow-up to CASSANDRA-9402) TL;DR this ticket is about adding another config option to enable scripted UDFs. Securing Java-UDFs is much easier than scripted UDFs. The secure execution of scripted UDFs heavily relies on how secure a particular script provider implementation is. Nashorn is probably pretty good at this - but (as discussed offline with [~iamaleksey]) we are not certain. This becomes worse with other JSR-223 providers (which need to be installed by the user anyway). E.g.: {noformat}
# Enables use of scripted UDFs.
# Java UDFs are always enabled, if enable_user_defined_functions is true.
# Enable this option to be able to use UDFs with language javascript or any custom JSR-223 provider.
enable_scripted_user_defined_functions: false
{noformat} TBH: I would feel more comfortable having this one. But we should review this along with enable_user_defined_functions for 4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-9892) Add support for unsandboxed UDF
Jonathan Ellis created CASSANDRA-9892: - Summary: Add support for unsandboxed UDF Key: CASSANDRA-9892 URL: https://issues.apache.org/jira/browse/CASSANDRA-9892 Project: Cassandra Issue Type: New Feature Reporter: Jonathan Ellis Assignee: Robert Stupp Priority: Minor From discussion on CASSANDRA-9402: the approach PostgreSQL takes is to distinguish between trusted (sandboxed) and untrusted (anything goes) UDF languages. Creating an untrusted language always requires superuser mode. Once that is done, creating functions in it requires nothing special. Personally I would be fine with this approach, but I think it would be more useful to have the extra permission on creating the function, which also wouldn't require adding an explicit CREATE LANGUAGE. So I'd suggest just providing different CQL permissions for trusted and untrusted, i.e. if you have CREATE FUNCTION permission that allows you to create sandboxed UDFs, but you can only create unsandboxed ones if you have CREATE UNTRUSTED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-9481) FENCED UDFs
[ https://issues.apache.org/jira/browse/CASSANDRA-9481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-9481. --- Resolution: Won't Fix Largely unnecessary with the successful resolution of 9402. FENCED UDFs --- Key: CASSANDRA-9481 URL: https://issues.apache.org/jira/browse/CASSANDRA-9481 Project: Cassandra Issue Type: New Feature Reporter: Brian Hess Related to security/sandboxing of UDFs (CASSANDRA-9042) Essentially, the UDF will run in a separate process when it is registered as FENCED, and run in-process when it is registered as UNFENCED. This doesn't necessarily remove all the issues, but it does help mitigate some of them - especially since it would (optionally) run as another user. This could look like the following with Cassandra:
- FENCED is a GRANTable privilege
- In cassandra.yaml you can specify the user to use when launching the separate process (so that it is not the same user that is running the database - or optionally is) - This is good so that the UDF can't stop the database, delete database files, etc.
- For FENCED UDFs, IPC would be used to transfer rows to the UDF and to return results. We could use CQL rows for the data. This could be shared memory or sockets (Unix or TCP - slight preference for sockets for some follow-on ideas).
- Ideally, switching from FENCED to UNFENCED would be just a DDL change. That is, the API would work such that a simple ALTER FUNCTION myFunction(DOUBLE, DOUBLE) UNFENCED would change it.
- If you wanted, because this is a separate process you could use a separate class loader.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9889) Disable scripted UDFs by default
[ https://issues.apache.org/jira/browse/CASSANDRA-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640421#comment-14640421 ] Jonathan Ellis commented on CASSANDRA-9889: --- What if we made scripted always untrusted? CASSANDRA-9892 Disable scripted UDFs by default Key: CASSANDRA-9889 URL: https://issues.apache.org/jira/browse/CASSANDRA-9889 Project: Cassandra Issue Type: Improvement Reporter: Robert Stupp Assignee: Robert Stupp Priority: Minor Fix For: 3.0.0 rc1 (Follow-up to CASSANDRA-9402) TL;DR this ticket is about adding another config option to enable scripted UDFs. Securing Java-UDFs is much easier than scripted UDFs. The secure execution of scripted UDFs heavily relies on how secure a particular script provider implementation is. Nashorn is probably pretty good at this - but (as discussed offline with [~iamaleksey]) we are not certain. This becomes worse with other JSR-223 providers (which need to be installed by the user anyway). E.g.: {noformat}
# Enables use of scripted UDFs.
# Java UDFs are always enabled, if enable_user_defined_functions is true.
# Enable this option to be able to use UDFs with language javascript or any custom JSR-223 provider.
enable_scripted_user_defined_functions: false
{noformat} TBH: I would feel more comfortable having this one. But we should review this along with enable_user_defined_functions for 4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9889) Disable scripted UDFs by default
[ https://issues.apache.org/jira/browse/CASSANDRA-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640519#comment-14640519 ] Jonathan Ellis commented on CASSANDRA-9889: --- I think it's the same issue. If Groovy is always untrusted, then you require the user to have CREATE UNTRUSTED permission and the problem is solved. Disable scripted UDFs by default Key: CASSANDRA-9889 URL: https://issues.apache.org/jira/browse/CASSANDRA-9889 Project: Cassandra Issue Type: Improvement Reporter: Robert Stupp Assignee: Robert Stupp Priority: Minor Fix For: 3.0.0 rc1 (Follow-up to CASSANDRA-9402) TL;DR this ticket is about adding another config option to enable scripted UDFs. Securing Java-UDFs is much easier than securing scripted UDFs. The secure execution of scripted UDFs heavily relies on how secure a particular script provider implementation is. Nashorn is probably pretty good at this - but (as discussed offline with [~iamaleksey]) we are not certain. This becomes worse with other JSR-223 providers (which need to be installed by the user anyway). E.g.:
{noformat}
# Enables use of scripted UDFs.
# Java UDFs are always enabled, if enable_user_defined_functions is true.
# Enable this option to be able to use UDFs with language javascript or any custom JSR-223 provider.
enable_scripted_user_defined_functions: false
{noformat}
TBH: I would feel more comfortable having this one. But we should review this along with enable_user_defined_functions for 4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9868) Archive commitlogs tests failing
[ https://issues.apache.org/jira/browse/CASSANDRA-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9868: -- Assignee: Stefania Archive commitlogs tests failing Key: CASSANDRA-9868 URL: https://issues.apache.org/jira/browse/CASSANDRA-9868 Project: Cassandra Issue Type: Sub-task Reporter: Shawn Kumar Assignee: Stefania Priority: Blocker Fix For: 3.0 alpha 1 Attachments: commitlog_archiving.properties A number of archive commitlog dtests (snapshot_tests.py) are failing on trunk at the point in the tests where the node is asked to restore data from archived commitlogs. It appears that the snapshot functionality works, but the [assertion|https://github.com/riptano/cassandra-dtest/blob/master/snapshot_test.py#L312] regarding data that should have been restored from archived commitlogs fails. I also tested this manually on trunk and could not restore the data either, so it does not appear to be just a test issue. Note that archiving the commitlogs seems to work (they are actually copied); restoring them is the issue. Attached is the commitlog properties file (to show the commands used). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9865) Broken vint encoding, at least when interacting with OHCProvider
[ https://issues.apache.org/jira/browse/CASSANDRA-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641022#comment-14641022 ] Jonathan Ellis commented on CASSANDRA-9865: --- Is this still a problem now that 9863 is resolved? Broken vint encoding, at least when interacting with OHCProvider Key: CASSANDRA-9865 URL: https://issues.apache.org/jira/browse/CASSANDRA-9865 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Fix For: 3.0 alpha 1 Attachments: 9865-hacky-test.txt I haven't investigated this very closely, so I only have a slightly hacky way to show the problem, but if you apply the patch attached, you'll see that the vints serialized and the ones deserialized are not the same. If you remove the use of vints (as is currently on trunk, but only due to this issue because we do want to use vints), everything works correctly. I'm honestly not sure where the problem is, but it sounds like it could be either in {{NIODataInputStream}} or in {{OHCProvider}}, since it's used in that test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
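For reference, the general vint round-trip that the ticket says is broken (zigzag mapping plus base-128 variable-length bytes) looks like this. This is a sketch of the technique, not Cassandra's exact wire format:

```python
def encode_vint(value):
    # Zigzag-map so small magnitudes (positive or negative) get small
    # codes, then emit base-128 groups, low bits first, with a
    # continuation bit on every byte except the last.
    zz = (value << 1) ^ (value >> 63)
    out = bytearray()
    while True:
        b = zz & 0x7F
        zz >>= 7
        if zz:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def decode_vint(buf):
    zz = 0
    shift = 0
    for b in buf:
        zz |= (b & 0x7F) << shift
        shift += 7
    return (zz >> 1) ^ -(zz & 1)  # undo the zigzag mapping
```

The bug report amounts to `decode(encode(v)) != v` somewhere in the serialization path; a property test over the round trip is exactly the kind of check the attached hacky patch performs.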
[jira] [Commented] (CASSANDRA-9764) dtest for many UPDATE batches, low contention fails on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641023#comment-14641023 ] Jonathan Ellis commented on CASSANDRA-9764: --- [~enigmacurry] is this still a problem with 9863 committed? dtest for many UPDATE batches, low contention fails on trunk Key: CASSANDRA-9764 URL: https://issues.apache.org/jira/browse/CASSANDRA-9764 Project: Cassandra Issue Type: Sub-task Reporter: Jim Witschey Assignee: Sylvain Lebresne Priority: Blocker Fix For: 3.0 alpha 1 {{paxos_tests.py:TestPaxos.contention_test_multi_iterations}} fails consistently on trunk ([cassci history|http://cassci.datastax.com/view/trunk/job/trunk_dtest/lastSuccessfulBuild/testReport/paxos_tests/TestPaxos/contention_test_multi_iterations/history/]). The test works by creating 8 workers, each of which increments an integer 100 times using UPDATE. Based on the test failures, it looks like 2 or 3 of the updates are consistently dropped. Other dtests that don't run as many iterations but have more contention succeed. I'm assigning you, [~slebresne], because you wrote the test and were the last person to modify it, but feel free to reassign. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9764) dtest for many UPDATE batches, low contention fails on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641023#comment-14641023 ] Jonathan Ellis edited comment on CASSANDRA-9764 at 7/24/15 8:21 PM: [~mambocab] is this still a problem with 9863 committed? was (Author: jbellis): [~enigmacurry] is this still a problem with 9863 committed? dtest for many UPDATE batches, low contention fails on trunk Key: CASSANDRA-9764 URL: https://issues.apache.org/jira/browse/CASSANDRA-9764 Project: Cassandra Issue Type: Sub-task Reporter: Jim Witschey Assignee: Sylvain Lebresne Priority: Blocker Fix For: 3.0 alpha 1 {{paxos_tests.py:TestPaxos.contention_test_multi_iterations}} fails consistently on trunk ([cassci history|http://cassci.datastax.com/view/trunk/job/trunk_dtest/lastSuccessfulBuild/testReport/paxos_tests/TestPaxos/contention_test_multi_iterations/history/]). The test works by creating 8 workers, each of which increments an integer 100 times using UPDATE. Based on the test failures, it looks like 2 or 3 of the updates are consistently dropped. Other dtests that don't run as many iterations but have more contention succeed. I'm assigning you, [~slebresne], because you wrote the test and were the last person to modify it, but feel free to reassign. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9868) Archive commitlogs tests failing
[ https://issues.apache.org/jira/browse/CASSANDRA-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9868: -- Assignee: Ariel Weisberg (was: Stefania) Archive commitlogs tests failing Key: CASSANDRA-9868 URL: https://issues.apache.org/jira/browse/CASSANDRA-9868 Project: Cassandra Issue Type: Sub-task Reporter: Shawn Kumar Assignee: Ariel Weisberg Priority: Blocker Fix For: 3.0 alpha 1 Attachments: commitlog_archiving.properties A number of archive commitlog dtests (snapshot_tests.py) are failing on trunk at the point in the tests where the node is asked to restore data from archived commitlogs. It appears that the snapshot functionality works, but the [assertion|https://github.com/riptano/cassandra-dtest/blob/master/snapshot_test.py#L312] regarding data that should have been restored from archived commitlogs fails. I also tested this manually on trunk and could not restore the data either, so it does not appear to be just a test issue. Note that archiving the commitlogs seems to work (they are actually copied); restoring them is the issue. Attached is the commitlog properties file (to show the commands used). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9865) Broken vint encoding, at least when interacting with OHCProvider
[ https://issues.apache.org/jira/browse/CASSANDRA-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9865: -- Assignee: Ariel Weisberg Broken vint encoding, at least when interacting with OHCProvider Key: CASSANDRA-9865 URL: https://issues.apache.org/jira/browse/CASSANDRA-9865 Project: Cassandra Issue Type: Bug Reporter: Sylvain Lebresne Assignee: Ariel Weisberg Fix For: 3.0 alpha 1 Attachments: 9865-hacky-test.txt I haven't investigated this very closely, so I only have a slightly hacky way to show the problem, but if you apply the patch attached, you'll see that the vints serialized and the ones deserialized are not the same. If you remove the use of vints (as is currently on trunk, but only due to this issue because we do want to use vints), everything works correctly. I'm honestly not sure where the problem is, but it sounds like it could be either in {{NIODataInputStream}} or in {{OHCProvider}}, since it's used in that test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9741) cfhistograms dtest flaps on trunk and 2.2
[ https://issues.apache.org/jira/browse/CASSANDRA-9741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9741: -- Assignee: Ariel Weisberg (was: Tyler Hobbs) cfhistograms dtest flaps on trunk and 2.2 - Key: CASSANDRA-9741 URL: https://issues.apache.org/jira/browse/CASSANDRA-9741 Project: Cassandra Issue Type: Bug Reporter: Jim Witschey Assignee: Ariel Weisberg Fix For: 2.2.x, 3.0.x {{jmx_test.py:TestJMX.cfhistograms_test}} flaps on CassCI under trunk and 2.2. On 2.2, it fails one of its assertions when {{'Unable to compute when histogram overflowed'}} is found in the output of {{nodetool cfhistograms}}. Here's the failure history for 2.2: http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest/lastCompletedBuild/testReport/junit/jmx_test/TestJMX/cfhistograms_test/history/ On trunk, it fails when an error about a {{WriteFailureException}} during hinted handoff is found in the C* logs after the tests run ([example cassci output|http://cassci.datastax.com/view/trunk/job/trunk_dtest/315/testReport/junit/jmx_test/TestJMX/cfhistograms_test/]). Here's the failure history for trunk: http://cassci.datastax.com/view/trunk/job/trunk_dtest/lastCompletedBuild/testReport/junit/jmx_test/TestJMX/cfhistograms_test/history/ I haven't seen it fail locally yet, but haven't run the test more than a couple times because it takes a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7392) Abort in-progress queries that time out
[ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-7392: -- Reviewer: Ariel Weisberg (was: Benjamin Lerer) (Benjamin is out next week, so re-handing review to Ariel.) Abort in-progress queries that time out --- Key: CASSANDRA-7392 URL: https://issues.apache.org/jira/browse/CASSANDRA-7392 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Stefania Fix For: 3.x Currently we drop queries that time out before we get to them (because node is overloaded) but not queries that time out while being processed. (Particularly common for index queries on data that shouldn't be indexed.) Adding the latter and logging when we have to interrupt one gets us a poor man's slow query log for free. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8906) Experiment with optimizing partition merging when we can prove that some sources don't overlap
[ https://issues.apache.org/jira/browse/CASSANDRA-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8906: -- Fix Version/s: 3.x Experiment with optimizing partition merging when we can prove that some sources don't overlap -- Key: CASSANDRA-8906 URL: https://issues.apache.org/jira/browse/CASSANDRA-8906 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Ariel Weisberg Labels: compaction, performance Fix For: 3.x When we merge a partition from two sources and it turns out that those 2 sources don't overlap for that partition, we still end up doing one comparison by row in the first source. However, if we can prove that the 2 sources don't overlap, for example by using the sstable min/max clustering values that we store, we could speed this up. Note that in practice it's a little bit more hairy because we need to deal with N sources, but that's probably not too hard either. I'll note that using the sstable min/max clustering values is not terribly precise. We could do better if we were to push the same reasoning inside the merge iterator, by for instance using the sstable per-partition index, which can in theory tell us things like don't bother comparing rows until the end of this row block. This is quite a bit more involved though, so maybe not worth the complexity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
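The proposed optimization can be sketched like this, with hypothetical (min, max) bounds standing in for the sstable min/max clustering values the ticket mentions:

```python
import heapq

def merge_sources(sources):
    # `sources` is a list of (min_key, max_key, rows) triples, with rows
    # sorted by key; min/max stand in for sstable min/max clustering values.
    ordered = sorted(sources, key=lambda s: s[0])
    disjoint = all(ordered[i][1] < ordered[i + 1][0]
                   for i in range(len(ordered) - 1))
    if disjoint:
        # Proven non-overlapping: concatenate in order, with zero
        # per-row comparisons.
        out = []
        for _, _, rows in ordered:
            out.extend(rows)
        return out
    # Fallback: classic k-way merge, one comparison per row.
    return list(heapq.merge(*(rows for _, _, rows in sources)))
```

Handling N sources that partially overlap (merge only the overlapping subset, concatenate the rest) is the hairier part the ticket alludes to; this sketch only shows the all-or-nothing case.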
[jira] [Updated] (CASSANDRA-8906) Experiment with optimizing partition merging when we can prove that some sources don't overlap
[ https://issues.apache.org/jira/browse/CASSANDRA-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-8906: -- Assignee: Ariel Weisberg Experiment with optimizing partition merging when we can prove that some sources don't overlap -- Key: CASSANDRA-8906 URL: https://issues.apache.org/jira/browse/CASSANDRA-8906 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Ariel Weisberg Labels: compaction, performance Fix For: 3.x When we merge a partition from two sources and it turns out that those 2 sources don't overlap for that partition, we still end up doing one comparison by row in the first source. However, if we can prove that the 2 sources don't overlap, for example by using the sstable min/max clustering values that we store, we could speed this up. Note that in practice it's a little bit more hairy because we need to deal with N sources, but that's probably not too hard either. I'll note that using the sstable min/max clustering values is not terribly precise. We could do better if we were to push the same reasoning inside the merge iterator, by for instance using the sstable per-partition index, which can in theory tell us things like don't bother comparing rows until the end of this row block. This is quite a bit more involved though, so maybe not worth the complexity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9258) Range movement causes CPU performance impact
[ https://issues.apache.org/jira/browse/CASSANDRA-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641218#comment-14641218 ] Jonathan Ellis commented on CASSANDRA-9258: --- No. Range movement causes CPU performance impact -- Key: CASSANDRA-9258 URL: https://issues.apache.org/jira/browse/CASSANDRA-9258 Project: Cassandra Issue Type: Bug Environment: Cassandra 2.1.4 Reporter: Rick Branson Fix For: 2.1.x Observing big CPU latency regressions when doing range movements on clusters with many tens of thousands of vnodes. See CPU usage increase by ~80% when a single node is being replaced. Top methods are: 1) Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/dht/ComparableObjectToken;.compareTo 2) Lcom/google/common/collect/AbstractMapBasedMultimap;.wrapCollection in Lcom/google/common/collect/AbstractMapBasedMultimap$AsMap$AsMapIterator;.next 3) Lorg/apache/cassandra/db/DecoratedKey;.compareTo in Lorg/apache/cassandra/dht/Range;.contains Here's a sample stack from a thread dump:
{code}
Thrift:50673 daemon prio=10 tid=0x7f2f20164800 nid=0x3a04af runnable [0x7f2d878d]
   java.lang.Thread.State: RUNNABLE
    at org.apache.cassandra.dht.Range.isWrapAround(Range.java:260)
    at org.apache.cassandra.dht.Range.contains(Range.java:51)
    at org.apache.cassandra.dht.Range.contains(Range.java:110)
    at org.apache.cassandra.locator.TokenMetadata.pendingEndpointsFor(TokenMetadata.java:916)
    at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:775)
    at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:541)
    at org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:616)
    at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:1101)
    at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:1083)
    at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:976)
    at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3996)
    at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3980)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
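The hot frames in the dump above are the token-range membership test. A minimal sketch of that check, including the wrap-around case that `Range.isWrapAround` distinguishes (illustrative, not the actual Cassandra implementation):

```python
def range_contains(left, right, token):
    # Membership test for a (left, right] token range on the ring.
    # When left >= right the range wraps around the ring's min/max
    # boundary; left == right denotes the full ring.
    if left < right:
        return left < token <= right
    return token > left or token <= right
```

With tens of thousands of vnodes, pendingEndpointsFor runs this check against a pending-range map per write, which is why it dominates the profile during range movements.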
[jira] [Updated] (CASSANDRA-9259) Bulk Reading from Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9259: -- Assignee: Ariel Weisberg Bulk Reading from Cassandra --- Key: CASSANDRA-9259 URL: https://issues.apache.org/jira/browse/CASSANDRA-9259 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Brian Hess Assignee: Ariel Weisberg This ticket is following on from the 2015 NGCC. This ticket is designed to be a place for discussing and designing an approach to bulk reading. The goal is to have a bulk reading path for Cassandra. That is, a path optimized to grab a large portion of the data for a table (potentially all of it). This is a core element in the Spark integration with Cassandra, and the speed at which Cassandra can deliver bulk data to Spark is limiting the performance of Spark-plus-Cassandra operations. This is especially of importance as Cassandra will (likely) leverage Spark for internal operations (for example CASSANDRA-8234). The core CQL to consider is the following: SELECT a, b, c FROM myKs.myTable WHERE Token(partitionKey) > X AND Token(partitionKey) <= Y Here, we choose X and Y to be contained within one token range (perhaps considering the primary range of a node without vnodes, for example). This query pushes 50K-100K rows/sec, which is not very fast if we are doing bulk operations via Spark (or other processing frameworks - ETL, etc). There are a few causes (e.g., inefficient paging). There are a few approaches that could be considered. First, we consider a new Streaming Compaction approach. The key observation here is that a bulk read from Cassandra is a lot like a major compaction, though instead of outputting a new SSTable we would output CQL rows to a stream/socket/etc. This would be similar to a CompactionTask, but would strip out some unnecessary things in there (e.g., some of the indexing, etc). 
Predicates and projections could also be encapsulated in this new StreamingCompactionTask, for example. Another approach would be an alternate storage format. For example, we might employ Parquet (just as an example) to store the same data as in the primary Cassandra storage (aka SSTables). This is akin to Global Indexes (an alternate storage of the same data optimized for a particular query). Then, Cassandra can choose to leverage this alternate storage for particular CQL queries (e.g., range scans). These are just 2 suggestions to get the conversation going. One thing to note is that it will be useful to have this storage segregated by token range so that when you extract via these mechanisms you do not get replication-factor copies of the data. That will certainly be an issue for some Spark operations (e.g., counting). Thus, we will want per-token-range storage (even for single disks), so this will likely leverage CASSANDRA-6696 (though, we'll want to also consider the single disk case). It is also worth discussing what the success criteria are here. It is unlikely to be as fast as EDW or HDFS performance (though, that is still a good goal), but being within some percentage of that performance should be set as success. For example, 2x as long as doing bulk operations on HDFS with similar node count/size/etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
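The per-token-range extraction pattern described above can be sketched as follows. `run_cql` is a hypothetical stand-in for whatever driver call executes the query, and the slice count is arbitrary; the point is that contiguous, non-overlapping token slices cover the ring exactly once, so the reader never pulls replication-factor copies of the data:

```python
# Murmur3 partitioner token bounds.
MIN_TOKEN, MAX_TOKEN = -(2 ** 63), 2 ** 63 - 1

def token_slices(n):
    # Split the full token ring into n contiguous (x, y] slices.
    width = (MAX_TOKEN - MIN_TOKEN) // n
    bounds = [MIN_TOKEN + i * width for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))

def bulk_read(run_cql, n_slices=8):
    # One token-range query per slice, as in the ticket's core CQL.
    for x, y in token_slices(n_slices):
        yield from run_cql(
            "SELECT a, b, c FROM myKs.myTable "
            f"WHERE Token(partitionKey) > {x} AND Token(partitionKey) <= {y}")
```

This is essentially what Spark connectors do today; the ticket is about making the server side of each such query much faster.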
[jira] [Updated] (CASSANDRA-9830) Option to disable bloom filter in highest level of LCS sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9830: -- Reviewer: Joshua McKenzie Option to disable bloom filter in highest level of LCS sstables --- Key: CASSANDRA-9830 URL: https://issues.apache.org/jira/browse/CASSANDRA-9830 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Paulo Motta Priority: Minor Labels: performance Fix For: 3.x We expect about 90% of data to be in the highest level of LCS in a fully populated series. (See also CASSANDRA-9829.) Thus if the user is primarily asking for data (partitions) that has actually been inserted, the bloom filter on the highest level only helps reject sstables about 10% of the time. We should add an option that suppresses bloom filter creation on top-level sstables. This will dramatically reduce memory usage for LCS and may even improve performance as we no longer check a low-value filter. (This is also an idea from RocksDB.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
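Back-of-the-envelope for the claim above, as a trivial sketch: if a fraction `p_top` of the data lives in the top LCS level and reads ask for partitions that do exist, the top-level bloom filter only produces a useful rejection for the remaining fraction of reads.

```python
def useful_rejection_rate(p_top=0.9):
    # With ~90% of data in the top level, a bloom filter there helps
    # reject the sstable only ~10% of the time for reads of existing
    # partitions; the other ~90% of checks are pure overhead.
    return 1.0 - p_top
```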
[jira] [Updated] (CASSANDRA-9890) Bytecode inspection for Java-UDFs
[ https://issues.apache.org/jira/browse/CASSANDRA-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9890: -- Reviewer: T Jake Luciani Bytecode inspection for Java-UDFs - Key: CASSANDRA-9890 URL: https://issues.apache.org/jira/browse/CASSANDRA-9890 Project: Cassandra Issue Type: Improvement Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0.0 rc1 (Follow-up to CASSANDRA-9402) For Java-UDFs we could inspect the compiled Java byte code to find usages of the Java language that are forbidden to UDFs. These include usages of: * {{synchronized}} keyword * call to {{j.l.Object.wait}} * call to {{j.l.Object.notify}} * call to {{j.l.Object.notifyAll}} * call to {{j.l.Object.getClass}} * calls to specific methods of currently allowed classes in the driver (but would need some investigation) By inspecting the byte code _before_ the class is actually used, even dirty constructs like the following would be impossible: {noformat} CREATE OR REPLACE FUNCTION ... AS $$ return Math.sin(val); } { // anonymous initializer code } static { // static initializer code $$; {noformat} (inspired by [this blog post|http://jordan-wright.com/blog/2015/03/08/elasticsearch-rce-vulnerability-cve-2015-1427/]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
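The same inspect-before-execute idea, transposed to Python's `dis` module purely for illustration (Cassandra itself would inspect JVM bytecode, e.g. with a library such as ASM): disassemble the compiled function and reject it if it references any forbidden name, before the class is ever loaded or run.

```python
import dis

# Names the ticket proposes to reject (`synchronized` has no direct
# Python analogue, so it is omitted here).
FORBIDDEN = {"wait", "notify", "notifyAll", "getClass"}

def is_safe(fn):
    # Scan the compiled bytecode *before* the function is ever executed,
    # so even initializer tricks never get a chance to run.
    for ins in dis.get_instructions(fn):
        if ins.opname in ("LOAD_METHOD", "LOAD_ATTR", "LOAD_GLOBAL") \
                and ins.argval in FORBIDDEN:
            return False
    return True
```

The key property, as the ticket notes, is that the check happens on the compiled artifact rather than the source text, so obfuscated or initializer-based constructs are caught too.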
[jira] [Commented] (CASSANDRA-9830) Option to disable bloom filter in highest level of LCS sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640988#comment-14640988 ] Jonathan Ellis commented on CASSANDRA-9830: --- Good point, but IMO we should still make disabled the default [in 3.x] and let users enable it if necessary. Option to disable bloom filter in highest level of LCS sstables --- Key: CASSANDRA-9830 URL: https://issues.apache.org/jira/browse/CASSANDRA-9830 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Paulo Motta Priority: Minor Labels: performance Fix For: 3.x We expect about 90% of data to be in the highest level of LCS in a fully populated series. (See also CASSANDRA-9829.) Thus if the user is primarily asking for data (partitions) that has actually been inserted, the bloom filter on the highest level only helps reject sstables about 10% of the time. We should add an option that suppresses bloom filter creation on top-level sstables. This will dramatically reduce memory usage for LCS and may even improve performance as we no longer check a low-value filter. (This is also an idea from RocksDB.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9892) Add support for unsandboxed UDF
[ https://issues.apache.org/jira/browse/CASSANDRA-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640442#comment-14640442 ] Jonathan Ellis commented on CASSANDRA-9892: --- I get what you mean, but from a user's perspective it would mean we trust the server to guarantee that the function can't do bad things. We could use a different term if that's confusing though. Add support for unsandboxed UDF --- Key: CASSANDRA-9892 URL: https://issues.apache.org/jira/browse/CASSANDRA-9892 Project: Cassandra Issue Type: New Feature Reporter: Jonathan Ellis Assignee: Robert Stupp Priority: Minor From discussion on CASSANDRA-9402, The approach postgresql takes is to distinguish between trusted (sandboxed) and untrusted (anything goes) UDF languages. Creating an untrusted language always requires superuser mode. Once that is done, creating functions in it requires nothing special. Personally I would be fine with this approach, but I think it would be more useful to have the extra permission on creating the function, and also wouldn't require adding explicit CREATE LANGUAGE. So I'd suggest just providing different CQL permissions for trusted and untrusted, i.e. if you have CREATE FUNCTION permission that allows you to create sandboxed UDF, but you can only create unsandboxed if you have CREATE UNTRUSTED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-9498) If more than 65K columns, sparse layout will break
[ https://issues.apache.org/jira/browse/CASSANDRA-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-9498. --- Resolution: Duplicate Assignee: (was: Benedict) Fix Version/s: (was: 3.0 beta 1) If more than 65K columns, sparse layout will break -- Key: CASSANDRA-9498 URL: https://issues.apache.org/jira/browse/CASSANDRA-9498 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Priority: Minor Follow up to CASSANDRA-8099. It is a relatively small bug, since the exposed population of users is likely to be very low, but fixing it in a good way is a bit tricky. I'm filing a separate JIRA, because I would like us to address this by introducing a writeVInt method to DataOutputStreamPlus, that we can also exploit to improve the encoding of timestamps and deletion times, and this JIRA will help to track the dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9881) Rows with negative-sized keys can't be skipped by sstablescrub
[ https://issues.apache.org/jira/browse/CASSANDRA-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9881: -- Assignee: Stefania Rows with negative-sized keys can't be skipped by sstablescrub -- Key: CASSANDRA-9881 URL: https://issues.apache.org/jira/browse/CASSANDRA-9881 Project: Cassandra Issue Type: Bug Components: Core Reporter: Brandon Williams Assignee: Stefania Priority: Minor Fix For: 2.1.x It is possible to have corruption in such a way that scrub (on or offline) can't skip the row, so you end up in a loop where this just keeps repeating: {noformat} WARNING: Row starting at position 2087453 is unreadable; skipping to next Reading row at 2087453 row (unreadable key) is -1 bytes {noformat} The workaround is to just delete the problem sstable since you were going to have to repair anyway, but it would still be nice to salvage the rest of the sstable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9498) If more than 65K columns, sparse layout will break
[ https://issues.apache.org/jira/browse/CASSANDRA-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9498: -- Assignee: Benedict Let's limit this to not imposing any new backwards compatibility challenges for b1. We can do more in 3.x. If more than 65K columns, sparse layout will break -- Key: CASSANDRA-9498 URL: https://issues.apache.org/jira/browse/CASSANDRA-9498 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Assignee: Benedict Priority: Minor Fix For: 3.0 beta 1 Follow up to CASSANDRA-8099. It is a relatively small bug, since the exposed population of users is likely to be very low, but fixing it in a good way is a bit tricky. I'm filing a separate JIRA, because I would like us to address this by introducing a writeVInt method to DataOutputStreamPlus, that we can also exploit to improve the encoding of timestamps and deletion times, and this JIRA will help to track the dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7937) Apply backpressure gently when overloaded with writes
[ https://issues.apache.org/jira/browse/CASSANDRA-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638908#comment-14638908 ] Jonathan Ellis commented on CASSANDRA-7937: --- I think we can do as well with a simpler approach by using MessagingService queues as a proxy for target's load. (If the target is overwhelmed it will read slower from the socket and our queue will not drain; if it is not more-than-usually-overwhelmed but clients are sending us so many requests for that target that we still can't drain it fast enough, then we should also pause accepting extra requests.) See CASSANDRA-9318 and in particular my summary [here|https://issues.apache.org/jira/browse/CASSANDRA-9318?focusedCommentId=14604649page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14604649]. (NB feel free to reassign to Jacek if he has free cycles.) Apply backpressure gently when overloaded with writes - Key: CASSANDRA-7937 URL: https://issues.apache.org/jira/browse/CASSANDRA-7937 Project: Cassandra Issue Type: Improvement Components: Core Environment: Cassandra 2.0 Reporter: Piotr Kołaczkowski Assignee: Jacek Lewandowski Labels: performance When writing huge amounts of data into C* cluster from analytic tools like Hadoop or Apache Spark, we can see that often C* can't keep up with the load. This is because analytic tools typically write data as fast as they can in parallel, from many nodes and they are not artificially rate-limited, so C* is the bottleneck here. Also, increasing the number of nodes doesn't really help, because in a collocated setup this also increases number of Hadoop/Spark nodes (writers) and although possible write performance is higher, the problem still remains. We observe the following behavior: 1. data is ingested at an extreme fast pace into memtables and flush queue fills up 2. the available memory limit for memtables is reached and writes are no longer accepted 3. 
the application gets hit by write timeout, and retries repeatedly, in vain 4. after several failed attempts to write, the job gets aborted Desired behaviour: 1. data is ingested at an extreme fast pace into memtables and flush queue fills up 2. after exceeding some memtable fill threshold, C* applies adaptive rate limiting to writes - the more the buffers are filled-up, the less writes/s are accepted, however writes still occur within the write timeout. 3. thanks to slowed down data ingestion, now flush can finish before all the memory gets used Of course the details how rate limiting could be done are up for a discussion. It may be also worth considering putting such logic into the driver, not C* core, but then C* needs to expose at least the following information to the driver, so we could calculate the desired maximum data rate: 1. current amount of memory available for writes before they would completely block 2. total amount of data queued to be flushed and flush progress (amount of data to flush remaining for the memtable currently being flushed) 3. average flush write speed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
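The desired adaptive behaviour can be sketched as a simple function from buffer fill fraction to accepted write rate. The threshold and rates here are illustrative, not from Cassandra: the point is only that throttling ramps up gradually as memtable/flush-queue memory fills, instead of accepting everything and then rejecting outright.

```python
def accept_rate(fill_fraction, max_rate=100_000, floor_rate=1_000):
    # fill_fraction: how full the memtable/flush-queue memory is (0.0-1.0).
    # Below the threshold, accept writes at full speed; above it, scale
    # the accepted rate down linearly toward a floor, so ingestion slows
    # but writes still complete within the write timeout.
    threshold = 0.5  # start throttling at 50% full (illustrative)
    if fill_fraction <= threshold:
        return max_rate
    overload = (fill_fraction - threshold) / (1.0 - threshold)
    return max(floor_rate, int(max_rate * (1.0 - overload)))
```

The same curve could equally be computed client-side in the driver, given the three pieces of information the description asks Cassandra to expose (available write memory, queued flush data, and average flush speed).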
[jira] [Commented] (CASSANDRA-9402) Implement proper sandboxing for UDFs
[ https://issues.apache.org/jira/browse/CASSANDRA-9402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639147#comment-14639147 ] Jonathan Ellis commented on CASSANDRA-9402: --- nio is whitelisted, but my understanding is that's only checked *if* the SecurityManager approves. All i/o (file, socket) is prohibited there. Implement proper sandboxing for UDFs Key: CASSANDRA-9402 URL: https://issues.apache.org/jira/browse/CASSANDRA-9402 Project: Cassandra Issue Type: Task Reporter: T Jake Luciani Assignee: Robert Stupp Priority: Critical Labels: docs-impacting, security Fix For: 3.0 beta 1 Attachments: 9402-warning.txt We want to avoid a security exploit for our users. We need to make sure we ship 2.2 UDFs with good defaults so someone accidentally exposing it to the internet doesn't open themselves up to having arbitrary code run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-7937) Apply backpressure gently when overloaded with writes
[ https://issues.apache.org/jira/browse/CASSANDRA-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-7937. --- Resolution: Later Assignee: (was: Jacek Lewandowski) Marking Later; we can reopen if 9318 proves insufficient. Apply backpressure gently when overloaded with writes - Key: CASSANDRA-7937 URL: https://issues.apache.org/jira/browse/CASSANDRA-7937 Project: Cassandra Issue Type: Improvement Components: Core Environment: Cassandra 2.0 Reporter: Piotr Kołaczkowski Labels: performance When writing huge amounts of data into a C* cluster from analytic tools like Hadoop or Apache Spark, we can see that C* often can't keep up with the load. This is because analytic tools typically write data as fast as they can, in parallel from many nodes, and they are not artificially rate-limited, so C* is the bottleneck here. Also, increasing the number of nodes doesn't really help, because in a co-located setup this also increases the number of Hadoop/Spark nodes (writers), and although the possible write performance is higher, the problem still remains. We observe the following behavior: 1. data is ingested at an extremely fast pace into memtables and the flush queue fills up 2. the available memory limit for memtables is reached and writes are no longer accepted 3. the application gets hit by write timeouts and retries repeatedly, in vain 4. after several failed attempts to write, the job gets aborted Desired behaviour: 1. data is ingested at an extremely fast pace into memtables and the flush queue fills up 2. after exceeding some memtable fill threshold, C* applies adaptive rate limiting to writes - the more the buffers fill up, the fewer writes/s are accepted; however, writes still complete within the write timeout 3. thanks to the slowed-down data ingestion, flush can now finish before all the memory gets used Of course, the details of how rate limiting could be done are up for discussion. It may also be worth considering putting such logic into the driver rather than the C* core, but then C* needs to expose at least the following information to the driver, so we could calculate the desired maximum data rate: 1. the current amount of memory available for writes before they would completely block 2. the total amount of data queued to be flushed, plus flush progress (the amount of data remaining to flush for the memtable currently being flushed) 3. the average flush write speed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9318: -- Assignee: Jacek Lewandowski (was: Ariel Weisberg) Bound the number of in-flight requests at the coordinator - Key: CASSANDRA-9318 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318 Project: Cassandra Issue Type: Improvement Reporter: Ariel Weisberg Assignee: Jacek Lewandowski Fix For: 2.1.x, 2.2.x It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes. An implementation might do something like track the number of outstanding bytes and requests, and if it reaches a high watermark, disable read on client connections until it goes back below some low watermark. We need to make sure that disabling read on the client connection won't introduce other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
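A minimal sketch of the watermark scheme described in the ticket, with invented names and sizes (this is only the general idea, not the eventual Cassandra implementation):

```python
# Hypothetical sketch: track outstanding request bytes; stop reading from
# client connections at a high watermark and resume once drained below a
# low watermark. Class and field names are illustrative assumptions.

class InFlightLimiter:
    def __init__(self, high=64 * 1024 * 1024, low=48 * 1024 * 1024):
        self.high, self.low = high, low
        self.outstanding = 0          # bytes accepted but not yet answered
        self.reads_enabled = True     # whether we keep reading client sockets

    def on_request(self, size):
        self.outstanding += size
        if self.outstanding >= self.high:
            self.reads_enabled = False   # apply backpressure to clients

    def on_response(self, size):
        self.outstanding -= size
        if self.outstanding <= self.low:
            self.reads_enabled = True    # drained enough; resume reading
```

The gap between the two watermarks provides hysteresis, so reads are not toggled on every request near the limit.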
[jira] [Updated] (CASSANDRA-9483) Document incompatibilities with -XX:+PerfDisableSharedMem
[ https://issues.apache.org/jira/browse/CASSANDRA-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9483: -- Assignee: T Jake Luciani (was: Tyler Hobbs) Document incompatibilities with -XX:+PerfDisableSharedMem - Key: CASSANDRA-9483 URL: https://issues.apache.org/jira/browse/CASSANDRA-9483 Project: Cassandra Issue Type: Task Components: Config, Documentation & website Reporter: Tyler Hobbs Assignee: T Jake Luciani Priority: Minor Fix For: 3.0 beta 1 We recently discovered that [the Jolokia agent is incompatible with the -XX:+PerfDisableSharedMem JVM option|https://github.com/rhuss/jolokia/issues/198]. I assume that this may affect other monitoring tools as well. If we are going to leave this enabled by default, we should document the potential problems with it. A combination of a comment in {{cassandra-env.sh}} (and the Windows equivalent) and a comment in NEWS.txt should suffice, I think. If possible, it would be good to figure out what other tools are affected and also mention them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9416) 3.x should refuse to start on JVM_VERSION 1.8
[ https://issues.apache.org/jira/browse/CASSANDRA-9416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9416: -- Assignee: Philip Thompson 3.x should refuse to start on JVM_VERSION 1.8 --- Key: CASSANDRA-9416 URL: https://issues.apache.org/jira/browse/CASSANDRA-9416 Project: Cassandra Issue Type: Task Reporter: Michael Shuler Assignee: Philip Thompson Priority: Minor Labels: lhf Fix For: 3.0 beta 1 Attachments: trunk-9416.patch When I was looking at CASSANDRA-9408, I noticed that {{conf/cassandra-env.sh}} and {{conf/cassandra-env.ps1}} do JVM version checking and should get updated for 3.x to refuse to start with JVM_VERSION 1.8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9498) If more than 65K columns, sparse layout will break
[ https://issues.apache.org/jira/browse/CASSANDRA-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639216#comment-14639216 ] Jonathan Ellis commented on CASSANDRA-9498: --- With 9499 finished, what is left here? If more than 65K columns, sparse layout will break -- Key: CASSANDRA-9498 URL: https://issues.apache.org/jira/browse/CASSANDRA-9498 Project: Cassandra Issue Type: Bug Components: Core Reporter: Benedict Priority: Minor Fix For: 3.0 beta 1 Follow up to CASSANDRA-8099. It is a relatively small bug, since the exposed population of users is likely to be very low, but fixing it in a good way is a bit tricky. I'm filing a separate JIRA because I would like us to address this by introducing a writeVInt method to DataOutputStreamPlus, which we can also exploit to improve the encoding of timestamps and deletion times, and this JIRA will help to track the dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
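For illustration, a generic 7-bit-per-byte varint in the LEB128 style shows why a writeVInt-type method pays off: small values cost one byte instead of a fixed four or eight. Cassandra's actual vint wire format differs in its details; this sketch only conveys the general idea.

```python
# Generic varint sketch (LEB128-style), not Cassandra's exact encoding:
# 7 payload bits per byte, high bit set means "more bytes follow".

def write_vint(value: int) -> bytes:
    """Encode a non-negative int into a variable number of bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # continuation bit
        else:
            out.append(byte)          # final byte
            return bytes(out)

def read_vint(buf: bytes) -> int:
    """Decode a varint produced by write_vint."""
    result = shift = 0
    for b in buf:
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return result
```

Values under 128 fit in one byte, under 16384 in two, and so on, which is why small counts (like typical column counts) encode cheaply.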
[jira] [Updated] (CASSANDRA-9717) TestCommitLog segment size dtests fail on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-9717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9717: -- Assignee: Jim Witschey (was: Branimir Lambov) Reviewer: Ariel Weisberg TestCommitLog segment size dtests fail on trunk --- Key: CASSANDRA-9717 URL: https://issues.apache.org/jira/browse/CASSANDRA-9717 Project: Cassandra Issue Type: Sub-task Reporter: Jim Witschey Assignee: Jim Witschey Priority: Blocker Fix For: 3.0 beta 1 The test for the commit log segment size when the specified size is 32MB fails for me locally and on cassci. ([cassci link|http://cassci.datastax.com/view/trunk/job/trunk_dtest/305/testReport/commitlog_test/TestCommitLog/default_segment_size_test/]) The command to run the test by itself is {{CASSANDRA_VERSION=git:trunk nosetests commitlog_test.py:TestCommitLog.default_segment_size_test}}. EDIT: a similar test, {{commitlog_test.py:TestCommitLog.small_segment_size_test}}, also fails with a similar error. The solution here may just be to change the expected size or the acceptable error -- the result isn't far off. I'm happy to make the dtest change if that's the solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9799) RangeTombstoneListTest sometimes fails on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-9799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9799: -- Reviewer: Joshua McKenzie RangeTombstoneListTest sometimes fails on trunk -- Key: CASSANDRA-9799 URL: https://issues.apache.org/jira/browse/CASSANDRA-9799 Project: Cassandra Issue Type: Test Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Labels: test Fix For: 3.0 beta 1 I've seen random failures with {{RangeTombstoneList.addAllRandomTest}}. The problem is 2 inequalities in {{RangeTombstoneList.insertFrom}} that should be inclusive rather than strict when we deal with boundaries between ranges. In practice, that makes us consider ranges like {{[3, 3)}} during addition, which is non-sensical. Attaching a patch as well as a test that reproduces the failure (extracted from {{addAllRandomTest}} with a failing seed). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
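As a toy model of the bug class described above (not the actual {{insertFrom}} code), a strict comparison on half-open boundaries treats a degenerate range like [3, 3) as non-empty, while the inclusive comparison correctly rejects it:

```python
# Toy illustration only: half-open ranges [start, end) and two emptiness
# checks, one with the strict comparison (the bug class) and one inclusive.

def is_empty_strict(start, end):
    return end < start        # buggy: treats the degenerate [3, 3) as non-empty

def is_empty_inclusive(start, end):
    return end <= start       # a half-open range [s, e) is empty whenever e <= s
```

With the strict check, [3, 3) slips through and downstream code ends up reasoning about a range that contains nothing.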
[jira] [Updated] (CASSANDRA-9847) Don't serialize CFMetaData in read responses
[ https://issues.apache.org/jira/browse/CASSANDRA-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9847: -- Reviewer: Joshua McKenzie Don't serialize CFMetaData in read responses Key: CASSANDRA-9847 URL: https://issues.apache.org/jira/browse/CASSANDRA-9847 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Fix For: 3.0 beta 1 Our CFMetaData ids are 16 bytes long, which for small messages is a non-trivial part of the size (further, we currently serialize it unnecessarily with every partition). At least for read responses, we don't really need to serialize it at all, since we always know which query this is a response to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9828) Minor improvements to RowStats
[ https://issues.apache.org/jira/browse/CASSANDRA-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9828: -- Reviewer: Joshua McKenzie Minor improvements to RowStats -- Key: CASSANDRA-9828 URL: https://issues.apache.org/jira/browse/CASSANDRA-9828 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 3.0 beta 1 There are some small improvements/refactorings I'd like to do for {{RowStats}}. More specifically, I'm attaching 3 commits: # the first one merely renames {{RowStats}} to {{EncodingStats}}. {{RowStats}} was not a terribly helpful name, while {{EncodingStats}} at least gives a sense of why the thing exists. # the 2nd one improves the serialization of those {{EncodingStats}}. {{EncodingStats}} holds both a {{minTimestamp}} and a {{minLocalDeletionTime}}, both of which are unix timestamps (or at least should be almost all the time for the timestamp, by convention) and so are fairly big numbers that don't get much love (if any) from vint encoding. So the patch introduces hard-coded epoch numbers for both that roughly correspond to now, and subtracts them from the actual {{EncodingStats}} numbers to make them more ripe for vint encoding. It does mean the exact encoding size will deteriorate over time, but it'll take a while before it becomes useless, and we'll probably have made more changes to the encodings by then anyway (and/or we can change the epoch number regularly with new versions of the messaging protocol if we so wish). # the last patch is just a small, simple cleanup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
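The epoch trick in the 2nd commit can be sketched as follows; the epoch constant and the varint size function below are illustrative assumptions, not the actual Cassandra encoding:

```python
# Sketch: subtracting a hard-coded recent epoch from a timestamp before
# varint-encoding makes typical "near now" values much smaller, so they
# need fewer bytes. The constant and size function here are illustrative.

EPOCH_MICROS = 1_451_606_400_000_000   # assumed hard-coded epoch (2016-01-01 in us)

def vint_size(value: int) -> int:
    """Bytes needed for a simple 7-bit-per-byte varint (illustrative format)."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size

# A raw microsecond timestamp needs many varint bytes; the delta from a
# recent epoch needs noticeably fewer for timestamps close to "now".
ts = 1_453_000_000_000_000
assert vint_size(ts) > vint_size(ts - EPOCH_MICROS)
```

As the comment notes, the savings shrink as real timestamps drift away from the hard-coded epoch, which is the deterioration-over-time trade-off mentioned above.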
[jira] [Commented] (CASSANDRA-9302) Optimize cqlsh COPY FROM, part 3
[ https://issues.apache.org/jira/browse/CASSANDRA-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637211#comment-14637211 ] Jonathan Ellis commented on CASSANDRA-9302: --- ([~aholmber] will provide us a pure-python murmur hash, so we can start in on the cqlsh side of TAR while that's happening.) Optimize cqlsh COPY FROM, part 3 Key: CASSANDRA-9302 URL: https://issues.apache.org/jira/browse/CASSANDRA-9302 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Jonathan Ellis Assignee: David Kua Fix For: 2.1.x We've had some discussion moving to Spark CSV import for bulk load in 3.x, but people need a good bulk load tool now. One option is to add a separate Java bulk load tool (CASSANDRA-9048), but if we can match that performance from cqlsh I would prefer to leave COPY FROM as the preferred option to which we point people, rather than adding more tools that need to be supported indefinitely. Previous work on COPY FROM optimization was done in CASSANDRA-7405 and CASSANDRA-8225. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9304) COPY TO improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9304: -- Reviewer: Stefania Alborghetti [~stefania_alborghetti] to review COPY TO improvements Key: CASSANDRA-9304 URL: https://issues.apache.org/jira/browse/CASSANDRA-9304 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jonathan Ellis Assignee: David Kua Priority: Minor Labels: cqlsh Fix For: 2.1.x COPY FROM has gotten a lot of love. COPY TO not so much. One obvious improvement could be to parallelize reading and writing (write one page of data while fetching the next). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9448) Metrics should use up to date nomenclature
[ https://issues.apache.org/jira/browse/CASSANDRA-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637659#comment-14637659 ] Jonathan Ellis commented on CASSANDRA-9448: --- Now that we don't cache entire partitions I actually think rowCache makes more sense. Metrics should use up to date nomenclature -- Key: CASSANDRA-9448 URL: https://issues.apache.org/jira/browse/CASSANDRA-9448 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Sam Tunnicliffe Assignee: Stefania Labels: docs-impacting, jmx Fix For: 3.0 beta 1 There are a number of exposed metrics that currently are named using the old nomenclature of columnfamily and rows (meaning partitions). It would be good to audit all metrics and update any names to match what they actually represent; we should probably do that in a single sweep to avoid a confusing mixture of old and new terminology. As we'd need to do this in a major release, I've initially set the fixver for 3.0 beta1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9871) Cannot replace token does not exist - DN node removed as Fat Client
[ https://issues.apache.org/jira/browse/CASSANDRA-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9871: -- Assignee: Stefania Cannot replace token does not exist - DN node removed as Fat Client --- Key: CASSANDRA-9871 URL: https://issues.apache.org/jira/browse/CASSANDRA-9871 Project: Cassandra Issue Type: Bug Reporter: Sebastian Estevez Assignee: Stefania Fix For: 2.1.x We lost a node due to disk failure, we tried to replace it via -Dcassandra.replace_address per -- http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html The node would not come up with these errors in the system.log: {code} INFO [main] 2015-07-22 03:20:06,722 StorageService.java:500 - Gathering node replacement information for /10.171.115.233 ... INFO [SharedPool-Worker-1] 2015-07-22 03:22:34,281 Gossiper.java:954 - InetAddress /10.111.183.101 is now UP INFO [GossipTasks:1] 2015-07-22 03:22:59,300 Gossiper.java:735 - FatClient /10.171.115.233 has been silent for 3ms, removing from gossip ERROR [main] 2015-07-22 03:23:28,485 CassandraDaemon.java:541 - Exception encountered during startup java.lang.UnsupportedOperationException: Cannot replace token -1013652079972151677 which does not exist! {code} It is not clear why Gossiper removed the node as a FatClient, given that it was a full node before it died and it had tokens assigned to it (including -1013652079972151677) in system.peers and nodetool ring. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9644) DTCS configuration proposals for handling consequences of repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9644: -- Labels: compaction dtcs (was: dtcs) DTCS configuration proposals for handling consequences of repairs - Key: CASSANDRA-9644 URL: https://issues.apache.org/jira/browse/CASSANDRA-9644 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Antti Nissinen Labels: compaction, dtcs Fix For: 3.x, 2.1.x Attachments: node0_20150621_1646_time_graph.txt, node0_20150621_2320_time_graph.txt, node0_20150623_1526_time_graph.txt, node1_20150621_1646_time_graph.txt, node1_20150621_2320_time_graph.txt, node1_20150623_1526_time_graph.txt, node2_20150621_1646_time_graph.txt, node2_20150621_2320_time_graph.txt, node2_20150623_1526_time_graph.txt, nodetool status infos.txt, sstable_compaction_trace.txt, sstable_compaction_trace_snipped.txt, sstable_counts.jpg This is a document bringing up some issues that arise when DTCS is used to compact time series data in a three-node cluster. DTCS is currently configured with a few parameters that keep the configuration fairly simple, but that might cause problems in certain special cases, like recovering from the flood of small SSTables produced by a repair operation. We are suggesting some ideas that might be a starting point for further discussion. The following sections contain: - Description of the Cassandra setup - Feeding process of the data - Failure testing - Issues caused by the repair operations for the DTCS - Proposal for the DTCS configuration parameters Attachments are included to support the discussion, and there is a separate section explaining them. Cassandra setup and data model - The cluster is composed of three nodes running Cassandra 2.1.2. The replication factor is two, and read and write consistency levels are ONE. - Data is time series data. Data is saved so that one row contains a certain time span of data for a given metric (20 days in this case). The row key contains information about the start time of the time span and the metric name. The column name gives the offset from the beginning of the time span. The column timestamp is set by adding together the timestamp from the row key and the offset (the actual timestamp of the data point). The data model is analogous to the KairosDB implementation. - The average sampling rate is 10 seconds, varying significantly from metric to metric. - 100 000 metrics are fed to Cassandra. - max_sstable_age_days is set to 5 days (the objective is to keep SSTable files at a manageable size, around 50 GB). - TTL is not in use in the test. Procedure for the failure test: - Data is first dumped to Cassandra for 11 days, and the dumping is then stopped so that DTCS has a chance to finish all compactions. Data is dumped with fake timestamps, so that the column timestamp is set when data is written to Cassandra. - One of the nodes is taken down and new data is dumped on top of the earlier data, covering a couple of hours' worth of data (faked timestamps). - Dumping is stopped and the node is kept down for a few hours. - The node is brought back up and nodetool repair is applied on the node that was down. Consequences - The repair operation leads to a massive amount of new SSTables far back in the history. The new SSTables cover similar time spans as the files that were created by DTCS before the shutdown of one of the nodes. - To be able to compact the small files, max_sstable_age_days would have to be increased to allow compaction to handle them. However, in a practical case the time window will grow so large that the generated files will be huge, which is not desirable. The compaction also combines one very large file with a bunch of small files in several phases, which is not efficient. Generating really large files may also lead to out-of-disk-space problems. - See the list of time graphs later in the document. Improvement proposals for the DTCS configuration Below is a list of desired properties for the configuration. Current parameters are mentioned if available. - Initial window size (currently: base_time_seconds) - The number of similar-size windows for the bucketing (currently: min_threshold) - The multiplier for the window size when it is increased (currently: min_threshold). We would like this to be independent from the min_threshold parameter, so that you could actually control the rate at which the window size is increased. - Maximum length of the time window inside which the files are assigned to a certain bucket (not currently defined). This means that expansion of the time window length is restricted. When the limit is
[jira] [Updated] (CASSANDRA-9644) DTCS configuration proposals for handling consequences of repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9644: -- Assignee: Marcus Eriksson DTCS configuration proposals for handling consequences of repairs - Key: CASSANDRA-9644 URL: https://issues.apache.org/jira/browse/CASSANDRA-9644 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Antti Nissinen Assignee: Marcus Eriksson Labels: compaction, dtcs Fix For: 3.x, 2.1.x Attachments: node0_20150621_1646_time_graph.txt, node0_20150621_2320_time_graph.txt, node0_20150623_1526_time_graph.txt, node1_20150621_1646_time_graph.txt, node1_20150621_2320_time_graph.txt, node1_20150623_1526_time_graph.txt, node2_20150621_1646_time_graph.txt, node2_20150621_2320_time_graph.txt, node2_20150623_1526_time_graph.txt, nodetool status infos.txt, sstable_compaction_trace.txt, sstable_compaction_trace_snipped.txt, sstable_counts.jpg This is a document bringing up some issues that arise when DTCS is used to compact time series data in a three-node cluster. DTCS is currently configured with a few parameters that keep the configuration fairly simple, but that might cause problems in certain special cases, like recovering from the flood of small SSTables produced by a repair operation. We are suggesting some ideas that might be a starting point for further discussion. The following sections contain: - Description of the Cassandra setup - Feeding process of the data - Failure testing - Issues caused by the repair operations for the DTCS - Proposal for the DTCS configuration parameters Attachments are included to support the discussion, and there is a separate section explaining them. Cassandra setup and data model - The cluster is composed of three nodes running Cassandra 2.1.2. The replication factor is two, and read and write consistency levels are ONE. - Data is time series data. Data is saved so that one row contains a certain time span of data for a given metric (20 days in this case). The row key contains information about the start time of the time span and the metric name. The column name gives the offset from the beginning of the time span. The column timestamp is set by adding together the timestamp from the row key and the offset (the actual timestamp of the data point). The data model is analogous to the KairosDB implementation. - The average sampling rate is 10 seconds, varying significantly from metric to metric. - 100 000 metrics are fed to Cassandra. - max_sstable_age_days is set to 5 days (the objective is to keep SSTable files at a manageable size, around 50 GB). - TTL is not in use in the test. Procedure for the failure test: - Data is first dumped to Cassandra for 11 days, and the dumping is then stopped so that DTCS has a chance to finish all compactions. Data is dumped with fake timestamps, so that the column timestamp is set when data is written to Cassandra. - One of the nodes is taken down and new data is dumped on top of the earlier data, covering a couple of hours' worth of data (faked timestamps). - Dumping is stopped and the node is kept down for a few hours. - The node is brought back up and nodetool repair is applied on the node that was down. Consequences - The repair operation leads to a massive amount of new SSTables far back in the history. The new SSTables cover similar time spans as the files that were created by DTCS before the shutdown of one of the nodes. - To be able to compact the small files, max_sstable_age_days would have to be increased to allow compaction to handle them. However, in a practical case the time window will grow so large that the generated files will be huge, which is not desirable. The compaction also combines one very large file with a bunch of small files in several phases, which is not efficient. Generating really large files may also lead to out-of-disk-space problems. - See the list of time graphs later in the document. Improvement proposals for the DTCS configuration Below is a list of desired properties for the configuration. Current parameters are mentioned if available. - Initial window size (currently: base_time_seconds) - The number of similar-size windows for the bucketing (currently: min_threshold) - The multiplier for the window size when it is increased (currently: min_threshold). We would like this to be independent from the min_threshold parameter, so that you could actually control the rate at which the window size is increased. - Maximum length of the time window inside which the files are assigned to a certain bucket (not currently defined). This means that expansion of the time window length is
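A simplified model of the DTCS window growth under discussion (illustrative only; real DTCS bucketing is more involved): windows start at an initial size and are multiplied by min_threshold once min_threshold windows of that size have accumulated, which is exactly the coupling the proposal wants to decouple into a separate multiplier parameter.

```python
# Simplified DTCS-style window model. In this toy version the same
# min_threshold value controls both how many same-size windows exist and
# the growth multiplier -- the coupling criticized above.

def window_sizes(base_seconds, min_threshold, n):
    """Return the sizes of the first n time windows, newest first."""
    sizes, size, count = [], base_seconds, 0
    for _ in range(n):
        sizes.append(size)
        count += 1
        if count == min_threshold:
            size *= min_threshold   # window grows by the same factor
            count = 0
    return sizes
```

With base_time_seconds = 3600 and min_threshold = 4, the first six windows come out as four 1-hour windows followed by 4-hour windows; an independent multiplier would let operators slow that growth without changing how many windows share a size.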
[jira] [Commented] (CASSANDRA-9851) Write Durability Failures Even During Batch Commit Mode
[ https://issues.apache.org/jira/browse/CASSANDRA-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634149#comment-14634149 ] Jonathan Ellis commented on CASSANDRA-9851: --- Can you bisect? Write Durability Failures Even During Batch Commit Mode Key: CASSANDRA-9851 URL: https://issues.apache.org/jira/browse/CASSANDRA-9851 Project: Cassandra Issue Type: Bug Components: Core Environment: Debian, x86_64, Kernel 3.16.7 Reporter: Joel Knighton Attachments: n1.log, n2.log, n3.log, n4.log, n5.log Reproducible as of a66863861136a29dc04d7bc3b319f9f8fae0f49f on trunk, as well as in other recent commits. Durability of writes seems to be violated, even under batch commitlog mode. This issue was discovered by a test that adds a range of values to a CQL Set, with no deletes issued. The test is available here https://github.com/riptano/jepsen/blob/cassandra/cassandra/src/cassandra/collections/set.clj#L56. During this write pattern, random nodes in the 5 node cluster are kill -9ed. Once all nodes have been brought back up, another read at CL.ALL is issued. This read fails to return values that have previously been successfully read from the cluster. This problem is not reproducible on 2.1.* or 2.2. Log files from each node are attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632541#comment-14632541 ] Jonathan Ellis commented on CASSANDRA-6477: --- Why not just apply MV maintenance to streamed rows the way we do 2i maintenance? Materialized Views (was: Global Indexes) Key: CASSANDRA-6477 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 Project: Cassandra Issue Type: New Feature Components: API, Core Reporter: Jonathan Ellis Assignee: Carl Yeksigian Labels: cql Fix For: 3.0 beta 1 Attachments: test-view-data.sh, users.yaml Local indexes are suitable for low-cardinality data, where spreading the index across the cluster is a Good Thing. However, for high-cardinality data, local indexes require querying most nodes in the cluster even if only a handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632542#comment-14632542 ] Jonathan Ellis commented on CASSANDRA-6477: --- Where do we rely on a single node? Materialized Views (was: Global Indexes) Key: CASSANDRA-6477 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 Project: Cassandra Issue Type: New Feature Components: API, Core Reporter: Jonathan Ellis Assignee: Carl Yeksigian Labels: cql Fix For: 3.0 beta 1 Attachments: test-view-data.sh, users.yaml Local indexes are suitable for low-cardinality data, where spreading the index across the cluster is a Good Thing. However, for high-cardinality data, local indexes require querying most nodes in the cluster even if only a handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632545#comment-14632545 ] Jonathan Ellis commented on CASSANDRA-6477: --- The majority of use cases are going to be denormalizing what are today query tables, i.e., I want to give the client what it needs by scanning a single partition. Doing extra queries to save disk space may occasionally be necessary but it is not the norm. Materialized Views (was: Global Indexes) Key: CASSANDRA-6477 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 Project: Cassandra Issue Type: New Feature Components: API, Core Reporter: Jonathan Ellis Assignee: Carl Yeksigian Labels: cql Fix For: 3.0 beta 1 Attachments: test-view-data.sh, users.yaml Local indexes are suitable for low-cardinality data, where spreading the index across the cluster is a Good Thing. However, for high-cardinality data, local indexes require querying most nodes in the cluster even if only a handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631410#comment-14631410 ] Jonathan Ellis commented on CASSANDRA-6477: --- bq. From a user's perspective, I agree with Sylvain that the MV should respect the CL. I wouldn't expect to do a write at ALL, then do a read and get an old record back. But the other side of that coin is, we're effectively promoting all operations to at least QUORUM regardless of what the user asked for... Materialized Views (was: Global Indexes) Key: CASSANDRA-6477 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 Project: Cassandra Issue Type: New Feature Components: API, Core Reporter: Jonathan Ellis Assignee: Carl Yeksigian Labels: cql Fix For: 3.0 beta 1 Attachments: test-view-data.sh, users.yaml Local indexes are suitable for low-cardinality data, where spreading the index across the cluster is a Good Thing. However, for high-cardinality data, local indexes require querying most nodes in the cluster even if only a handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631422#comment-14631422 ]

Jonathan Ellis commented on CASSANDRA-6477:
---

1. Paired replica? What?
2. Under what conditions does the replica batchlog (BL) save you from replaying the coordinator batchlog?
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631426#comment-14631426 ]

Jonathan Ellis commented on CASSANDRA-6477:
---

Pedantically, you are correct, which is why I said "effectively" and not "literally". :)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631466#comment-14631466 ]

Jonathan Ellis commented on CASSANDRA-6477:
---

No, you're right. Synchronous MV updates are a terrible idea, which is more obvious when considering the case of more than one MV. In the extreme case you could touch every node in the cluster.
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631471#comment-14631471 ]

Jonathan Ellis commented on CASSANDRA-6477:
---

If there are multiple MVs being updated, do they get merged into a single set of batchlogs? (I.e., just one on the coordinator and one on each base replica, instead of one per MV.)
[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631734#comment-14631734 ]

Jonathan Ellis commented on CASSANDRA-6477:
---

I disagree about making synchronous the default. As Jake points out, that can kill your availability even on a single MV if you are unlucky with replica placement, and it's virtually guaranteed to kill it with many MVs. I would go so far as to say that synchronous MV updates are not useful and we should not bother adding them.
[jira] [Resolved] (CASSANDRA-9842) Creation of partition and update of static columns in the same LWT fails
[ https://issues.apache.org/jira/browse/CASSANDRA-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-9842.
---
Resolution: Not A Problem

Cassandra's behavior is correct; the partition not existing is not the same as it existing with a null value.

Creation of partition and update of static columns in the same LWT fails

Key: CASSANDRA-9842
URL: https://issues.apache.org/jira/browse/CASSANDRA-9842
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: cassandra-2.1.8 on Ubuntu 15.04
Reporter: Chandra Sekar

Both inserting a row (in a non-existent partition) and updating a static column in the same LWT fail. Creating the partition before performing the LWT works.

h3. Table Definition
{code}
create table txtable(pcol bigint, ccol bigint, scol bigint static, ncol text, primary key((pcol), ccol));
{code}

h3. Inserting a row in a non-existent partition and updating a static column in one LWT
{code}
begin batch
insert into txtable (pcol, ccol, ncol) values (1, 1, 'A');
update txtable set scol = 1 where pcol = 1 if scol = null;
apply batch;

[applied]
---
False
{code}

h3. Creating the partition before the LWT
{code}
insert into txtable (pcol, scol) values (1, null) if not exists;
begin batch
insert into txtable (pcol, ccol, ncol) values (1, 1, 'A');
update txtable set scol = 1 where pcol = 1 if scol = null;
apply batch;

[applied]
---
True
{code}
[jira] [Created] (CASSANDRA-9843) Augment or replace partition index with adaptive range filters
Jonathan Ellis created CASSANDRA-9843:
-

Summary: Augment or replace partition index with adaptive range filters
Key: CASSANDRA-9843
URL: https://issues.apache.org/jira/browse/CASSANDRA-9843
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Jonathan Ellis
Assignee: T Jake Luciani

Adaptive range filters are, in principle, bloom filters for range queries. They provide a space-efficient way to avoid scanning a partition when we can tell that it contains no data for the range requested. Like a BF, they can return false positives but not false negatives. The implementation is of course totally different from a BF: an ARF is a tree where each leaf covers a range of data and holds a bit, either on or off, denoting whether we have *some* data in that range. ARF are described here: http://www.vldb.org/pvldb/vol6/p1714-kossmann.pdf
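A minimal sketch of the structure described above, assuming nothing about Cassandra internals: a binary tree over an integer key domain whose leaves each cover a range and carry one occupancy bit. All names here are illustrative; false positives come from coarse leaves, false negatives never occur because every insert marks its whole path.

```python
class ARFNode:
    """One node of a toy adaptive range filter over [lo, hi)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi       # half-open key range covered
        self.left = self.right = None   # children, created lazily on split
        self.occupied = False           # "some data may exist in this range"

    def insert(self, key, min_width=4):
        # Mark every node on the path so queries never get a false negative.
        self.occupied = True
        if self.hi - self.lo <= min_width:
            return                      # leaf granularity reached
        mid = (self.lo + self.hi) // 2
        if self.left is None:
            self.left = ARFNode(self.lo, mid)
            self.right = ARFNode(mid, self.hi)
        (self.left if key < mid else self.right).insert(key, min_width)

    def may_contain(self, lo, hi):
        """True = maybe data in [lo, hi); False = definitely none."""
        if hi <= self.lo or lo >= self.hi:   # no overlap with this subtree
            return False
        if not self.occupied:                # empty subtree: definite no
            return False
        if self.left is None:                # occupied leaf: maybe (FP possible)
            return True
        return self.left.may_contain(lo, hi) or self.right.may_contain(lo, hi)


# Build a filter over keys 0..63 containing three keys, then probe ranges.
arf = ARFNode(0, 64)
for k in (3, 40, 41):
    arf.insert(k)
```

Probing `[16, 32)` returns a definite no (that subtree was never marked), `[38, 45)` returns maybe (it overlaps occupied leaves), and `[0, 1)` is a false positive: the coarse leaf `[0, 4)` is occupied by key 3 even though 0 itself is absent.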
[jira] [Commented] (CASSANDRA-9842) Creation of partition and update of static columns in the same LWT fails
[ https://issues.apache.org/jira/browse/CASSANDRA-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632248#comment-14632248 ]

Jonathan Ellis commented on CASSANDRA-9842:
---

The LWT clause is evaluated before the rest of the batch. Statement order doesn't matter.
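A toy model (not Cassandra internals; all names are illustrative) of the two rules behind the `[applied] = False` result in CASSANDRA-9842: every IF condition is checked against the partition state before any statement in the batch runs, and a non-existent partition is not the same as one where `scol` is present but null.

```python
def apply_conditional_batch(partition, statements, conditions):
    """partition: dict of column -> value, or None if the partition does
    not exist. Conditions see the *original* state; the statements apply
    only if every condition passes (mirroring conditional-batch semantics)."""
    for cond in conditions:
        if not cond(partition):
            return partition, False          # [applied] = False
    new = dict(partition or {})
    for stmt in statements:
        stmt(new)
    return new, True                         # [applied] = True


def scol_is_null(p):
    # 'IF scol = null' matches only an *existing* partition whose scol
    # is unset -- it does not match an absent partition.
    return p is not None and p.get("scol") is None


# Batch against a non-existent partition: the condition is evaluated
# first, fails, and the insert in the same batch never runs.
_, applied_without_partition = apply_conditional_batch(
    None, [lambda p: p.update(ncol="A")], [scol_is_null])

# Pre-creating the partition (scol unset) makes the condition match.
_, applied_with_partition = apply_conditional_batch(
    {"scol": None}, [lambda p: p.update(scol=1, ncol="A")], [scol_is_null])
```

This reproduces the reporter's observation: the batch alone is rejected, while the same batch after `insert ... if not exists` applies.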
[jira] [Updated] (CASSANDRA-9843) Augment or replace partition index with adaptive range filters
[ https://issues.apache.org/jira/browse/CASSANDRA-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-9843:
--
Labels: performance (was: )
[jira] [Commented] (CASSANDRA-9843) Augment or replace partition index with adaptive range filters
[ https://issues.apache.org/jira/browse/CASSANDRA-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632257#comment-14632257 ]

Jonathan Ellis commented on CASSANDRA-9843:
---

Rather than just adding an ARF per partition (the way we used to have a BF -- the difference is that a BF is not useful for scans, but this would be), we may be able to adapt this further by moving our index into the ARF. Instead of just a bit indicating yes or no, we could store the offset for the start of each range [that we do have data for] in the leaf.

(The "adaptive" in ARF means you can tune it to index hot parts of the data range in greater detail, without increasing the total memory used, at the cost of less detail for the cold ranges. We could do this in Cassandra as well, writing the updated ARF to a new file. This could reduce the memory problems of pulling the indexes for very large partitions into memory. However, the paper describes very good results even without adaptation, so this is not required for a proof of concept.)