[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897121#comment-15897121 ] Christian Esken commented on CASSANDRA-13265: - I was already looking into writing a unit test, but it requires access to the queue, which means giving it package-level access and using {{@VisibleForTesting}}. I will do that tomorrow, unless there are arguments against it. > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate with the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node fixes the issue. > Before going into details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A thread dump in this situation showed 324 threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. > - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. Each of these > threads fully locks the queue for reading and writing by calling > iterator.next(), making the situation worse and worse. > - Writing: Only after 262508 locking operations can it progress with actually > writing to the queue. 
> - Reading: Is also blocked, as 324 threads try to do iterator.next(), and > fully lock the queue. > This means: Writing blocks the queue for reading, and readers might even be > starved, which makes the situation even worse. > - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DCs > - high write throughput (10 INSERT statements per second and more during > peak times). > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
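The contention pattern described above, and one possible fix direction, can be sketched in a few lines. This is a hypothetical illustration only: the class, field, and method names are invented (the real OutboundTcpConnection uses a different queue type), and the point is merely that a compare-and-set flag lets a single thread perform the expiration scan while all others skip it, instead of hundreds of threads serializing on the queue's lock.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: only one thread at a time walks the backlog to expire
// messages; concurrent callers skip the scan instead of piling up.
class Backlog {
    private final ConcurrentLinkedQueue<QueuedMessage> backlog = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean expirationInProgress = new AtomicBoolean(false);

    void add(QueuedMessage m) {
        backlog.add(m);
    }

    int size() {
        return backlog.size();
    }

    // Returns the number of expired messages removed, or -1 if another thread
    // already held the expiration "gate" and this caller skipped the scan.
    int expireMessages(long nowNanos) {
        if (!expirationInProgress.compareAndSet(false, true))
            return -1; // someone else is already expiring
        try {
            int removed = 0;
            Iterator<QueuedMessage> it = backlog.iterator();
            while (it.hasNext()) {
                if (it.next().isTimedOut(nowNanos)) {
                    it.remove();
                    removed++;
                }
            }
            return removed;
        } finally {
            expirationInProgress.set(false);
        }
    }

    static final class QueuedMessage {
        final long expiresAtNanos;
        QueuedMessage(long expiresAtNanos) { this.expiresAtNanos = expiresAtNanos; }
        boolean isTimedOut(long nowNanos) { return nowNanos >= expiresAtNanos; }
    }
}
```

With this gate, a thread that loses the CAS race returns immediately and can proceed to write, so writers no longer block behind hundreds of redundant expiration scans.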
[jira] [Commented] (CASSANDRA-13300) Upgrade the jna version to 4.3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897218#comment-15897218 ] Jason Brown commented on CASSANDRA-13300: - [~amitkumar_ghatwal] to upgrade the jar, you've already performed the necessary steps for your own build :). As for jna itself, eyeballing the [CHANGES.md|https://github.com/java-native-access/jna/blob/master/CHANGES.md], the vast majority of changes are Windows-related - and, of course, the [PPCLE|https://github.com/java-native-access/jna/pull/425] change (as [~yukim] pointed out). As there's nothing really on the critical-bug/security front, I think we should only update trunk and not any earlier branches. > Upgrade the jna version to 4.3.0 > > > Key: CASSANDRA-13300 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13300 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Amitkumar Ghatwal > > Could you please upgrade the jna version present in the github cassandra > location : https://github.com/apache/cassandra/blob/trunk/lib/jna-4.0.0.jar > to below latest version - 4.3.0 - > http://repo1.maven.org/maven2/net/java/dev/jna/jna/4.3.0/jna-4.3.0-javadoc.jar -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (CASSANDRA-13294) Possible data loss on upgrade 2.1 - 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson reassigned CASSANDRA-13294: --- Assignee: Stefania (was: Marcus Eriksson) Reviewer: Marcus Eriksson (was: Stefania) > Possible data loss on upgrade 2.1 - 3.0 > --- > > Key: CASSANDRA-13294 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13294 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Stefania >Priority: Blocker > Fix For: 3.0.x, 3.11.x > > > After finishing a compaction we delete the compacted away files. This is done > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogFile.java#L328-L337] > which uses > [this|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L265-L271] > to get the files - we get all files starting with {{absoluteFilePath}}. > Absolute file path is generated > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/sstable/Descriptor.java#L142-L153]. > For 3.0 version files the filename looks like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/mc-1332-big}} > but for 2.1 version files, they look like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/keyspace1-standard1-ka-2}}. > The problem is then that if we were to finish a compaction including the > legacy file, we would actually delete all legacy files having a generation > starting with '2' -- This message was sent by Atlassian JIRA (v6.3.15#6346)
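The over-matching described in the issue can be reproduced with a few lines. The file names below are hypothetical stand-ins for the 2.1 legacy format, and appending the component separator at the end is shown only to illustrate the failure mode, not as the committed patch: because the legacy name ends with the generation, a {{startsWith}} check on "...ka-2" also matches generations 20, 21, and so on.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustration (hypothetical file names): selecting sstable files by
// String.startsWith over-matches for legacy-format names, because the
// generation is the last dash-separated token.
class PrefixMatchDemo {
    static List<String> filesStartingWith(List<String> files, String prefix) {
        return files.stream().filter(f -> f.startsWith(prefix)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = List.of(
            "keyspace1-standard1-ka-2-Data.db",
            "keyspace1-standard1-ka-20-Data.db",   // different generation!
            "keyspace1-standard1-ka-21-Data.db");  // different generation!
        // Intending to select generation 2 only, but all three match:
        System.out.println(filesStartingWith(files, "keyspace1-standard1-ka-2"));
        // Including the separator after the generation matches only the first:
        System.out.println(filesStartingWith(files, "keyspace1-standard1-ka-2-"));
    }
}
```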
[jira] [Commented] (CASSANDRA-13294) Possible data loss on upgrade 2.1 - 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897287#comment-15897287 ] Marcus Eriksson commented on CASSANDRA-13294: - +1 on the patch and [here|https://github.com/riptano/cassandra-dtest/pull/1449] is an upgrade dtest which reproduces this > Possible data loss on upgrade 2.1 - 3.0 > --- > > Key: CASSANDRA-13294 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13294 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Blocker > Fix For: 3.0.x, 3.11.x > > > After finishing a compaction we delete the compacted away files. This is done > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogFile.java#L328-L337] > which uses > [this|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L265-L271] > to get the files - we get all files starting with {{absoluteFilePath}}. > Absolute file path is generated > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/sstable/Descriptor.java#L142-L153]. > For 3.0 version files the filename looks like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/mc-1332-big}} > but for 2.1 version files, they look like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/keyspace1-standard1-ka-2}}. > The problem is then that if we were to finish a compaction including the > legacy file, we would actually delete all legacy files having a generation > starting with '2' -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13294) Possible data loss on upgrade 2.1 - 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-13294: Status: Ready to Commit (was: Patch Available) > Possible data loss on upgrade 2.1 - 3.0 > --- > > Key: CASSANDRA-13294 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13294 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Blocker > Fix For: 3.0.x, 3.11.x > > > After finishing a compaction we delete the compacted away files. This is done > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogFile.java#L328-L337] > which uses > [this|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L265-L271] > to get the files - we get all files starting with {{absoluteFilePath}}. > Absolute file path is generated > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/sstable/Descriptor.java#L142-L153]. > For 3.0 version files the filename looks like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/mc-1332-big}} > but for 2.1 version files, they look like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/keyspace1-standard1-ka-2}}. > The problem is then that if we were to finish a compaction including the > legacy file, we would actually delete all legacy files having a generation > starting with '2' -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (CASSANDRA-13300) Upgrade the jna version to 4.3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-13300: --- Assignee: Jason Brown > Upgrade the jna version to 4.3.0 > > > Key: CASSANDRA-13300 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13300 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Amitkumar Ghatwal >Assignee: Jason Brown > > Could you please upgrade the jna version present in the github cassandra > location : https://github.com/apache/cassandra/blob/trunk/lib/jna-4.0.0.jar > to below latest version - 4.3.0 - > http://repo1.maven.org/maven2/net/java/dev/jna/jna/4.3.0/jna-4.3.0-javadoc.jar -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13300) Upgrade the jna version to 4.3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897230#comment-15897230 ] Jason Brown commented on CASSANDRA-13300: - Updated the jar and running the tests now: ||trunk|| |[branch|https://github.com/jasobrown/cassandra/tree/13300-trunk]| |[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13300-trunk-dtest/]| |[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-13300-trunk-testall/]| > Upgrade the jna version to 4.3.0 > > > Key: CASSANDRA-13300 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13300 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Amitkumar Ghatwal >Assignee: Jason Brown > > Could you please upgrade the jna version present in the github cassandra > location : https://github.com/apache/cassandra/blob/trunk/lib/jna-4.0.0.jar > to below latest version - 4.3.0 - > http://repo1.maven.org/maven2/net/java/dev/jna/jna/4.3.0/jna-4.3.0-javadoc.jar -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897121#comment-15897121 ] Christian Esken edited comment on CASSANDRA-13265 at 3/6/17 11:23 AM: -- I was already looking into writing a unit test, but it requires access to the queue, which means giving it package-level access and using {{@VisibleForTesting}}. I will do that tomorrow, unless there are arguments against it. I will also check alternatives. was (Author: cesken): I was already looking into writing a unit test, but it requires access to the queue, which means giving it package-level access and using {{@VisibleForTesting}}. I will do that tomorrow, unless there are arguments against it. > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate with the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node fixes the issue. > Before going into details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A thread dump in this situation showed 324 threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. > - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? 
As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. Each of these > threads fully locks the queue for reading and writing by calling > iterator.next(), making the situation worse and worse. > - Writing: Only after 262508 locking operations can it progress with actually > writing to the queue. > - Reading: Is also blocked, as 324 threads try to do iterator.next(), and > fully lock the queue. > This means: Writing blocks the queue for reading, and readers might even be > starved, which makes the situation even worse. > - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DCs > - high write throughput (10 INSERT statements per second and more during > peak times). > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13300) Upgrade the jna version to 4.3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897241#comment-15897241 ] Jason Brown commented on CASSANDRA-13300: - Rebased the patch on actual apache/trunk, not whatever branch I was working on before that. Rerunning tests. > Upgrade the jna version to 4.3.0 > > > Key: CASSANDRA-13300 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13300 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Amitkumar Ghatwal >Assignee: Jason Brown > > Could you please upgrade the jna version present in the github cassandra > location : https://github.com/apache/cassandra/blob/trunk/lib/jna-4.0.0.jar > to below latest version - 4.3.0 - > http://repo1.maven.org/maven2/net/java/dev/jna/jna/4.3.0/jna-4.3.0-javadoc.jar -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13300) Upgrade the jna version to 4.3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897267#comment-15897267 ] Robert Stupp commented on CASSANDRA-13300: -- I don't want to appear to be the bad guy here, but I want to note that the PPC architecture is not supported. Supporting another CPU architecture is more than just updating a jar file and starting C* or running unit tests or dtests. We do a lot of stuff in our code base, which is thoroughly tested on x64 CPUs (looking at you, memory fences, volatiles, unsafe). Additionally, we pull in a couple of 3rd party libraries, which are probably only tested on x64 CPUs and are not under our control. Further, there are probably non-negligible hardware differences between x64 and PPC affecting I/O (disk and network). Anyway, as Jason already mentioned, since this change is not a fix for any bug or security issue, it would have to go into trunk. > Upgrade the jna version to 4.3.0 > > > Key: CASSANDRA-13300 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13300 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Amitkumar Ghatwal >Assignee: Jason Brown > > Could you please upgrade the jna version present in the github cassandra > location : https://github.com/apache/cassandra/blob/trunk/lib/jna-4.0.0.jar > to below latest version - 4.3.0 - > http://repo1.maven.org/maven2/net/java/dev/jna/jna/4.3.0/jna-4.3.0-javadoc.jar -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient
[ https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897393#comment-15897393 ] Alex Petrov commented on CASSANDRA-12915: - Thank you very much for the patch. I have a couple of questions: * [here|https://github.com/iksaif/cassandra/commit/4c8981a9b900a38dd67b3bb1560aaac7cc7ccfac#diff-88eaa3c77aa17a84006ad76f3151ed31L162], as far as I understand the code (and purpose), adding an empty range to the current range doesn't change the range, so why do you think we should remove {{ranges.isEmpty()}}? * when comparing range sizes, there's a check for emptiness which looks redundant, for example [here|https://github.com/iksaif/cassandra/commit/4c8981a9b900a38dd67b3bb1560aaac7cc7ccfac#diff-88eaa3c77aa17a84006ad76f3151ed31R294], [here|https://github.com/iksaif/cassandra/commit/4c8981a9b900a38dd67b3bb1560aaac7cc7ccfac#diff-88eaa3c77aa17a84006ad76f3151ed31R301], and [here|https://github.com/iksaif/cassandra/commit/4c8981a9b900a38dd67b3bb1560aaac7cc7ccfac#diff-88eaa3c77aa17a84006ad76f3151ed31R255]. In other words, there's nothing special about empty ranges. We can just return an empty range instead of null and that's pretty much it. > SASI: Index intersection with an empty range really inefficient > --- > > Key: CASSANDRA-12915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12915 > Project: Cassandra > Issue Type: Improvement > Components: sasi >Reporter: Corentin Chary >Assignee: Corentin Chary > Fix For: 3.11.x, 4.x > > > It looks like RangeIntersectionIterator.java can be pretty inefficient in > some cases. Let's take the following query: > SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar'; > In this case: > * index1 = 'foo' will match 2 items > * index2 = 'bar' will match ~300k items > On my setup, the query will take ~1 sec, most of the time being spent in > disk.TokenTree.getTokenAt(). 
> if I patch RangeIntersectionIterator so that it doesn't try to do the > intersection (and effectively only uses 'index1'), the query will run in a few > tenths of a millisecond. > I see multiple solutions for that: > * Add a static threshold to avoid the use of the index for the intersection > when we know it will be slow. Probably when the range size factor is very > small and the range size is big. > * CASSANDRA-10765 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
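The "static threshold" idea from the description could be sketched as follows. Everything here is hypothetical: SASI actually intersects on-disk token trees rather than in-memory collections, and the threshold value and names are invented. The point is only the shape of the heuristic: when one posting list is vastly smaller than the other, iterate the small side and probe the large side instead of running a merge-style intersection over ~300k entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a size-ratio heuristic for index intersection.
class IntersectionStrategy {
    static final int SIZE_RATIO_THRESHOLD = 100; // assumed tuning knob, not from SASI

    // Skip the merge-style intersection when the large side dwarfs the small side.
    static boolean shouldSkipMergeIntersection(long smallSize, long largeSize) {
        return smallSize == 0 || largeSize / Math.max(1, smallSize) >= SIZE_RATIO_THRESHOLD;
    }

    // The cheap alternative: walk the 2-item side, probe the ~300k-item side.
    static List<Long> intersect(List<Long> smallSide, Set<Long> largeLookup) {
        List<Long> out = new ArrayList<>();
        for (long token : smallSide)
            if (largeLookup.contains(token))
                out.add(token);
        return out;
    }
}
```

For the query in the description (2 items vs ~300k items) the ratio check fires, and the intersection cost becomes proportional to the small index's result size.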
[jira] [Commented] (CASSANDRA-12888) Incremental repairs broken for MVs and CDC
[ https://issues.apache.org/jira/browse/CASSANDRA-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897523#comment-15897523 ] T Jake Luciani commented on CASSANDRA-12888: Proposing another (maybe simpler) solution to this problem: We currently replay the base table mutations through the write path and drop the streamed sstable. Instead of this we could keep the streamed base table with its repairedAt flag and drop it in. Then, just as we do now, replay the mutations through the write path, but with a mutation flag so that only the MVs, and not the base table, are updated. > Incremental repairs broken for MVs and CDC > -- > > Key: CASSANDRA-12888 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12888 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Stefan Podkowinski >Assignee: Benjamin Roth >Priority: Critical > Fix For: 3.0.x, 3.11.x > > > SSTables streamed during the repair process will first be written locally and > afterwards either simply added to the pool of existing sstables or, in case > of existing MVs or active CDC, replayed on a per-mutation basis: > As described in {{StreamReceiveTask.OnCompletionRunnable}}: > {quote} > We have a special path for views and for CDC. > For views, since the view requires cleaning up any pre-existing state, we > must put all partitions through the same write path as normal mutations. This > also ensures any 2is are also updated. > For CDC-enabled tables, we want to ensure that the mutations are run through > the CommitLog so they can be archived by the CDC process on discard. > {quote} > Using the regular write path turns out to be an issue for incremental > repairs, as we lose the {{repaired_at}} state in the process. Eventually the > streamed rows will end up in the unrepaired set, in contrast to the rows on > the sender side moved to the repaired set. 
The next repair run will stream > the same data back again, causing rows to bounce on and on between nodes on > each repair. > See linked dtest on steps to reproduce. An example for reproducing this > manually using ccm can be found > [here|https://gist.github.com/spodkowinski/2d8e0408516609c7ae701f2bf1e515e8] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13301) Cannot enter non-US characters at cqlsh interactive prompt on Mac
Henrik Ståhl created CASSANDRA-13301: Summary: Cannot enter non-US characters at cqlsh interactive prompt on Mac Key: CASSANDRA-13301 URL: https://issues.apache.org/jira/browse/CASSANDRA-13301 Project: Cassandra Issue Type: Bug Components: CQL Environment: OS X 10.12.3, MacBook Pro / Swedish Pro keyboard Reporter: Henrik Ståhl Priority: Minor The cqlsh interactive prompt does not accept key entries of non-US characters (such as å, ä, ö, §) from my keyboard. When I hit a non-US key, I get a "boink" sound and nothing appears at the prompt. I don't have this issue at the system command prompt, nor when parsing a cql file using "cqlsh -f myfile.cql". Seems to be related to the system-provided Python shell (2.7.10) on OS X. Verified workaround: "easy_install readline" from command line as per https://discussions.apple.com/message/11569875#11569875, then restart cqlsh. Possible (not verified) workaround: Install newer Python version. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13197) +=/-= shortcut syntax bugs/inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-13197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897438#comment-15897438 ] Benjamin Lerer commented on CASSANDRA-13197: I do not think it really makes sense to change that behavior for the moment. It is the kind of nasty breaking change that is really easy to miss and can easily impact users. In my opinion we should just improve the error messages for (2) and (3). > +=/-= shortcut syntax bugs/inconsistencies > -- > > Key: CASSANDRA-13197 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13197 > Project: Cassandra > Issue Type: Bug >Reporter: Kishan Karunaratne >Assignee: Alex Petrov > > CASSANDRA-12232 introduced (+=/-=) shortcuts for counters and collection > types. I ran into some bugs/inconsistencies. > Given the schema: > {noformat} > CREATE TABLE simplex.collection_table (k int PRIMARY KEY, d_l List<int>, d_s > Set<int>, d_m Map<int, int>, d_t Tuple<int>); > {noformat} > 1) Using -= on a list column removes all elements that match the value, > instead of the first or last occurrence of it. Is this expected? > {noformat} > Given d_l = [0, 1, 2, 1, 1] > UPDATE collection_table SET d_l -= [1] WHERE k=0; > yields > [0, 2] > {noformat} > 2) I can't seem to remove a map key/value pair: > {noformat} > Given d_m = {0: 0, 1: 1} > UPDATE collection_table SET d_m -= {1:1} WHERE k=0; > yields > Invalid map literal for d_m of type frozen<map<int, int>> > {noformat} > However {noformat}UPDATE collection_table SET d_m -= {1} WHERE k=0;{noformat} > does work. > 3) Tuples are immutable so it makes sense that +=/-= doesn't apply. However > the error message could be better, now that other collection types are > allowed: > {noformat} > UPDATE collection_table SET d_t += (1) WHERE k=0; > yields > Invalid operation (d_t = d_t + (1)) for non counter column d_t > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13197) +=/-= shortcut syntax bugs/inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-13197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897362#comment-15897362 ] Alex Petrov commented on CASSANDRA-13197: - Thank you for the report, [~kishkaru]. This is a nice find! [CASSANDRA-12232] didn't really add this functionality, it only made it available via the shortcut. So the behaviour described in (1) would require some additional discussion. It'd be good to hear the opinion of [~blerer] on that. For example: {code} execute("UPDATE %s SET l = l + ? WHERE k = 0", list("v1", "v2", "v1", "v2", "v1", "v2")); execute("UPDATE %s SET l = l - ? WHERE k=0", list("v1", "v2")); assertRows(execute("SELECT l FROM %s WHERE k = 0"), row((Object) null)); {code} In my opinion this kind of makes sense. That said, changing the behaviour isn't a big problem (although it would be a breaking change). re: (2), the interface for removing items from maps is {{-= set(key)}}. So in that case I think even the error message kind of makes sense (at least from the inner-workings perspective). We can make it more user-friendly though. re: (3), I will make sure we give a better message. > +=/-= shortcut syntax bugs/inconsistencies > -- > > Key: CASSANDRA-13197 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13197 > Project: Cassandra > Issue Type: Bug >Reporter: Kishan Karunaratne >Assignee: Alex Petrov > > CASSANDRA-12232 introduced (+=/-=) shortcuts for counters and collection > types. I ran into some bugs/inconsistencies. > Given the schema: > {noformat} > CREATE TABLE simplex.collection_table (k int PRIMARY KEY, d_l List<int>, d_s > Set<int>, d_m Map<int, int>, d_t Tuple<int>); > {noformat} > 1) Using -= on a list column removes all elements that match the value, > instead of the first or last occurrence of it. Is this expected? 
> {noformat} > Given d_l = [0, 1, 2, 1, 1] > UPDATE collection_table SET d_l -= [1] WHERE k=0; > yields > [0, 2] > {noformat} > 2) I can't seem to remove a map key/value pair: > {noformat} > Given d_m = {0: 0, 1: 1} > UPDATE collection_table SET d_m -= {1:1} WHERE k=0; > yields > Invalid map literal for d_m of type frozen<map<int, int>> > {noformat} > However {noformat}UPDATE collection_table SET d_m -= {1} WHERE k=0;{noformat} > does work. > 3) Tuples are immutable so it makes sense that +=/-= doesn't apply. However > the error message could be better, now that other collection types are > allowed: > {noformat} > UPDATE collection_table SET d_t += (1) WHERE k=0; > yields > Invalid operation (d_t = d_t + (1)) for non counter column d_t > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
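For context on behaviour (1): the observed semantics match what {{java.util.List.removeAll}} does, namely dropping every occurrence of each value rather than just the first or last. This is an analogy only, not how Cassandra implements CQL list subtraction; the class and method below are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Analogy for CQL "d_l -= [1]": removeAll drops ALL occurrences of each value.
class ListSubtractDemo {
    static List<Integer> subtractAll(List<Integer> source, List<Integer> toRemove) {
        List<Integer> result = new ArrayList<>(source);
        result.removeAll(toRemove); // removes every matching element
        return result;
    }
}
```

Applied to the report's example, subtracting [1] from [0, 1, 2, 1, 1] yields [0, 2], exactly as observed.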
[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient
[ https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897430#comment-15897430 ] Corentin Chary commented on CASSANDRA-12915: * Removing ranges.isEmpty() happens in another function. Removing it doesn't change anything, as forEach() will iterate on an empty list. * True for min() and max(). It's this way for the switch() because computing min / max keys with an empty range doesn't make much sense. Anything else? If not, I'll remove the duplicated code in min() and max(). > SASI: Index intersection with an empty range really inefficient > --- > > Key: CASSANDRA-12915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12915 > Project: Cassandra > Issue Type: Improvement > Components: sasi >Reporter: Corentin Chary >Assignee: Corentin Chary > Fix For: 3.11.x, 4.x > > > It looks like RangeIntersectionIterator.java can be pretty inefficient in > some cases. Let's take the following query: > SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar'; > In this case: > * index1 = 'foo' will match 2 items > * index2 = 'bar' will match ~300k items > On my setup, the query will take ~1 sec, most of the time being spent in > disk.TokenTree.getTokenAt(). > if I patch RangeIntersectionIterator so that it doesn't try to do the > intersection (and effectively only uses 'index1'), the query will run in a few > tenths of a millisecond. > I see multiple solutions for that: > * Add a static threshold to avoid the use of the index for the intersection > when we know it will be slow. Probably when the range size factor is very > small and the range size is big. > * CASSANDRA-10765 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (CASSANDRA-12888) Incremental repairs broken for MVs and CDC
[ https://issues.apache.org/jira/browse/CASSANDRA-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897523#comment-15897523 ] T Jake Luciani edited comment on CASSANDRA-12888 at 3/6/17 3:44 PM: Proposing another (maybe simpler) solution to this problem: We currently replay the base table mutations through the write path and drop the streamed sstable. Instead of this we could use the streamed base table with its repairedAt flag like any other repair. Then, just as we do now, replay the mutations through the write path, but with a mutation flag so that only the MVs, and not the base table, are updated. This means the MV wouldn't be incrementally repairable, but really you shouldn't need to repair the MVs unless there is data loss. was (Author: tjake): Proposing another (maybe simpler) solution to this problem: We currently replay the base table mutations through the write path and drop the streamed sstable. Instead of this we could keep the streamed base table with its repairedAt flag and drop it in. Then, just as we do now, replay the mutations through the write path, but with a mutation flag so that only the MVs, and not the base table, are updated. > Incremental repairs broken for MVs and CDC > -- > > Key: CASSANDRA-12888 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12888 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Stefan Podkowinski >Assignee: Benjamin Roth >Priority: Critical > Fix For: 3.0.x, 3.11.x > > > SSTables streamed during the repair process will first be written locally and > afterwards either simply added to the pool of existing sstables or, in case > of existing MVs or active CDC, replayed on a per-mutation basis: > As described in {{StreamReceiveTask.OnCompletionRunnable}}: > {quote} > We have a special path for views and for CDC. > For views, since the view requires cleaning up any pre-existing state, we > must put all partitions through the same write path as normal mutations. 
This > also ensures any 2is are also updated. > For CDC-enabled tables, we want to ensure that the mutations are run through > the CommitLog so they can be archived by the CDC process on discard. > {quote} > Using the regular write path turns out to be an issue for incremental > repairs, as we lose the {{repaired_at}} state in the process. Eventually the > streamed rows will end up in the unrepaired set, in contrast to the rows on > the sender side moved to the repaired set. The next repair run will stream > the same data back again, causing rows to bounce on and on between nodes on > each repair. > See linked dtest on steps to reproduce. An example for reproducing this > manually using ccm can be found > [here|https://gist.github.com/spodkowinski/2d8e0408516609c7ae701f2bf1e515e8] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-11471) Add SASL mechanism negotiation to the native protocol
[ https://issues.apache.org/jira/browse/CASSANDRA-11471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897645#comment-15897645 ] Ben Bromhead commented on CASSANDRA-11471: -- Thanks, I will look to resolve the comments this week. > Add SASL mechanism negotiation to the native protocol > - > > Key: CASSANDRA-11471 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11471 > Project: Cassandra > Issue Type: Sub-task > Components: CQL >Reporter: Sam Tunnicliffe >Assignee: Ben Bromhead > Labels: client-impacting > Attachments: CASSANDRA-11471 > > > Introducing an additional message exchange into the authentication sequence > would allow us to support multiple authentication schemes and [negotiation of > SASL mechanisms|https://tools.ietf.org/html/rfc4422#section-3.2]. > The current {{AUTHENTICATE}} message sent from Client to Server includes the > java classname of the configured {{IAuthenticator}}. This could be superseded > by a new message which lists the SASL mechanisms supported by the server. The > client would then respond with a new message which indicates its choice of > mechanism. This would allow the server to support multiple mechanisms, for > example enabling both {{PLAIN}} for username/password authentication and > {{EXTERNAL}} for extracting credentials from SSL > certificates\* (see the example in > [RFC-4422|https://tools.ietf.org/html/rfc4422#appendix-A]). Furthermore, the > server could tailor the list of supported mechanisms on a per-connection > basis, e.g. only offering certificate based auth to encrypted clients. > The client's response should include the selected mechanism and any initial > response data. This is mechanism-specific; the {{PLAIN}} mechanism consists > of a single round in which the client sends encoded credentials as the > initial response data and the server response indicates either success or > failure with no further challenges required. 
> From a protocol perspective, after the mechanism negotiation the exchange > would continue as in protocol v4, with one or more rounds of > {{AUTH_CHALLENGE}} and {{AUTH_RESPONSE}} messages, terminated by an > {{AUTH_SUCCESS}} sent from Server to Client upon successful authentication or > an {{ERROR}} on auth failure. > XMPP performs mechanism negotiation in this way, > [RFC-3920|http://tools.ietf.org/html/rfc3920#section-6] includes a good > overview. > \* Note: this would require some a priori agreement between client and server > over the implementation of the {{EXTERNAL}} mechanism. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
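The negotiation described above maps closely onto the standard Java SASL API. Below is a rough client-side sketch of the proposed flow; the {{negotiate}} helper, the advertised mechanism list, and the credentials are hypothetical illustrations, not part of any ticket patch:

```java
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;
import java.util.Set;

public class SaslNegotiationSketch {
    // Hypothetical helper: pick the first mechanism, in server preference
    // order, that the client also supports; null means no common mechanism.
    public static String negotiate(String[] serverMechs, Set<String> clientMechs) {
        for (String mech : serverMechs)
            if (clientMechs.contains(mech))
                return mech;
        return null;
    }

    public static void main(String[] args) throws SaslException {
        // The server would advertise these in the proposed new message,
        // possibly only offering EXTERNAL on encrypted connections.
        String[] serverMechs = { "EXTERNAL", "PLAIN" };
        String chosen = negotiate(serverMechs, Set.of("PLAIN"));
        System.out.println("Selected mechanism: " + chosen);

        // Placeholder credentials; PLAIN carries them in the initial response.
        CallbackHandler handler = callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback)
                    ((NameCallback) cb).setName("cassandra");
                else if (cb instanceof PasswordCallback)
                    ((PasswordCallback) cb).setPassword("cassandra".toCharArray());
            }
        };
        SaslClient client = Sasl.createSaslClient(
                new String[]{ chosen }, null, "cassandra", "localhost", null, handler);
        byte[] initial = client.hasInitialResponse()
                ? client.evaluateChallenge(new byte[0]) : new byte[0];
        System.out.println("Initial response length: " + initial.length);
    }
}
```

For {{PLAIN}} the single initial response completes the exchange; a multi-round mechanism would instead loop on {{evaluateChallenge}} until {{isComplete()}}, matching the {{AUTH_CHALLENGE}}/{{AUTH_RESPONSE}} rounds of protocol v4.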
[jira] [Commented] (CASSANDRA-12888) Incremental repairs broken for MVs and CDC
[ https://issues.apache.org/jira/browse/CASSANDRA-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897549#comment-15897549 ]

T Jake Luciani commented on CASSANDRA-12888:

I think that idea won't fly actually. The problem is that if you add the sstable first, the MV updates won't reflect the before state. If you add it after the MV updates, there will be a time when the MV has data the base table does not. Maybe the latter isn't a deal breaker, but it leaves more chance for problems.

> Incremental repairs broken for MVs and CDC
> --
>
> Key: CASSANDRA-12888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12888
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Stefan Podkowinski
> Assignee: Benjamin Roth
> Priority: Critical
> Fix For: 3.0.x, 3.11.x
>
> SSTables streamed during the repair process will first be written locally and afterwards either simply added to the pool of existing sstables or, in case of existing MVs or active CDC, replayed on a per-mutation basis.
> As described in {{StreamReceiveTask.OnCompletionRunnable}}:
> {quote}
> We have a special path for views and for CDC.
> For views, since the view requires cleaning up any pre-existing state, we must put all partitions through the same write path as normal mutations. This also ensures any 2is are also updated.
> For CDC-enabled tables, we want to ensure that the mutations are run through the CommitLog so they can be archived by the CDC process on discard.
> {quote}
> Using the regular write path turns out to be an issue for incremental repairs, as we lose the {{repaired_at}} state in the process. Eventually the streamed rows will end up in the unrepaired set, in contrast to the rows on the sender side moved to the repaired set. The next repair run will stream the same data back again, causing rows to bounce on and on between nodes on each repair.
> See linked dtest on steps to reproduce.
An example for reproducing this > manually using ccm can be found > [here|https://gist.github.com/spodkowinski/2d8e0408516609c7ae701f2bf1e515e8] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CASSANDRA-13302) last row of previous page == first row of next page while querying data using SASI index
Andy Tolbert created CASSANDRA-13302:

Summary: last row of previous page == first row of next page while querying data using SASI index
Key: CASSANDRA-13302
URL: https://issues.apache.org/jira/browse/CASSANDRA-13302
Project: Cassandra
Issue Type: Bug
Environment: Tested with C* 3.9 and 3.10.
Reporter: Andy Tolbert

Apologies if this is a duplicate (couldn't track down an existing bug). Similarly to [CASSANDRA-11208], it appears it is possible to retrieve duplicate rows when paging using a SASI index, as documented in [JAVA-1362|https://datastax-oss.atlassian.net/browse/JAVA-1362]. The following test demonstrates that data is repeated while querying using a SASI index:

{code:java}
public class TestPagingBug {
    public static void main(String[] args) {
        Cluster.Builder builder = Cluster.builder();
        Cluster c = builder.addContactPoints("192.168.98.190").build();
        Session s = c.connect();

        s.execute("CREATE KEYSPACE IF NOT EXISTS test WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }");
        s.execute("CREATE TABLE IF NOT EXISTS test.test_table_sec(sec BIGINT PRIMARY KEY, id INT)");

        // create secondary index on ID column, used for select statement
        String index = "CREATE CUSTOM INDEX test_table_sec_idx ON test.test_table_sec (id) USING 'org.apache.cassandra.index.sasi.SASIIndex' "
                + "WITH OPTIONS = { 'mode': 'PREFIX' }";
        s.execute(index);

        PreparedStatement insert = s.prepare("INSERT INTO test.test_table_sec (id, sec) VALUES (1, ?)");
        for (int i = 0; i < 1000; i++)
            s.execute(insert.bind((long) i));

        PreparedStatement select = s.prepare("SELECT sec FROM test.test_table_sec WHERE id = 1");

        long lastSec = -1;
        for (Row row : s.execute(select.bind().setFetchSize(300))) {
            long sec = row.getLong("sec");
            if (sec == lastSec)
                System.out.println(String.format("Duplicated id %d", sec));
            lastSec = sec;
        }
        System.exit(0);
    }
}
{code}
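Until the underlying paging bug is fixed server-side, a client can defend against the symptom the report describes — the last row of a page reappearing as the first row of the next — by dropping adjacent duplicates while iterating. A minimal sketch with a hypothetical helper (not part of the DataStax driver):

```java
import java.util.ArrayList;
import java.util.List;

public class PageDedup {
    // Hypothetical workaround: drop a row when it equals the immediately
    // preceding one, which is exactly the duplication pattern at page
    // boundaries (last row of page N == first row of page N+1).
    public static List<Long> dedupAdjacent(List<Long> rows) {
        List<Long> out = new ArrayList<>();
        Long last = null;
        for (Long r : rows) {
            if (!r.equals(last))
                out.add(r);
            last = r;
        }
        return out;
    }
}
```

This only masks the symptom for result sets where adjacent rows are never legitimately equal (true here, since {{sec}} is the primary key); the server-side fix remains necessary.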
[jira] [Commented] (CASSANDRA-12888) Incremental repairs broken for MVs and CDC
[ https://issues.apache.org/jira/browse/CASSANDRA-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897565#comment-15897565 ]

Benjamin Roth commented on CASSANDRA-12888:
---

I also had this idea but it won't work. It will totally break base <> MV consistency. Except: you lock all involved partitions for the whole process. But that would create insanely long locks and an extremely high contention.

> Incremental repairs broken for MVs and CDC
> --
>
> Key: CASSANDRA-12888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12888
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Stefan Podkowinski
> Assignee: Benjamin Roth
> Priority: Critical
> Fix For: 3.0.x, 3.11.x
>
> SSTables streamed during the repair process will first be written locally and afterwards either simply added to the pool of existing sstables or, in case of existing MVs or active CDC, replayed on a per-mutation basis.
> As described in {{StreamReceiveTask.OnCompletionRunnable}}:
> {quote}
> We have a special path for views and for CDC.
> For views, since the view requires cleaning up any pre-existing state, we must put all partitions through the same write path as normal mutations. This also ensures any 2is are also updated.
> For CDC-enabled tables, we want to ensure that the mutations are run through the CommitLog so they can be archived by the CDC process on discard.
> {quote}
> Using the regular write path turns out to be an issue for incremental repairs, as we lose the {{repaired_at}} state in the process. Eventually the streamed rows will end up in the unrepaired set, in contrast to the rows on the sender side moved to the repaired set. The next repair run will stream the same data back again, causing rows to bounce on and on between nodes on each repair.
> See linked dtest on steps to reproduce.
An example for reproducing this > manually using ccm can be found > [here|https://gist.github.com/spodkowinski/2d8e0408516609c7ae701f2bf1e515e8] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13302) last row of previous page == first row of next page while querying data using SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Tolbert updated CASSANDRA-13302: - Description: Apologies if this is a duplicate (couldn't track down an existing bug). Similarly to [CASSANDRA-11208], it appears it is possible to retrieve duplicate rows when paging using a SASI index as documented in [JAVA-1413|https://datastax-oss.atlassian.net/browse/JAVA-1413], the following test demonstrates that data is repeated while querying using a SASI index: {code:java} public class TestPagingBug { public static void main(String[] args) { Cluster.Builder builder = Cluster.builder(); Cluster c = builder.addContactPoints("192.168.98.190").build(); Session s = c.connect(); s.execute("CREATE KEYSPACE IF NOT EXISTS test WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }"); s.execute("CREATE TABLE IF NOT EXISTS test.test_table_sec(sec BIGINT PRIMARY KEY, id INT)"); //create secondary index on ID column, used for select statement String index = "CREATE CUSTOM INDEX test_table_sec_idx ON test.test_table_sec (id) USING 'org.apache.cassandra.index.sasi.SASIIndex' " + "WITH OPTIONS = { 'mode': 'PREFIX' }"; s.execute(index); PreparedStatement insert = s.prepare("INSERT INTO test.test_table_sec (id, sec) VALUES (1, ?)"); for (int i = 0; i < 1000; i++) s.execute(insert.bind((long) i)); PreparedStatement select = s.prepare("SELECT sec FROM test.test_table_sec WHERE id = 1"); long lastSec = -1; for (Row row : s.execute(select.bind().setFetchSize(300))) { long sec = row.getLong("sec"); if (sec == lastSec) System.out.println(String.format("Duplicated id %d", sec)); lastSec = sec; } System.exit(0); } } {code} The program outputs the following: {noformat} Duplicated id 23 Duplicated id 192 Duplicated id 684 {noformat} Note that the simple primary key is required to reproduce this. was: Apologies if this is a duplicate (couldn't track down an existing bug). 
Similarly to [CASSANDRA-11208], it appears it is possible to retrieve duplicate rows when paging using a SASI index as documented in [JAVA-1413|https://datastax-oss.atlassian.net/browse/JAVA-1413], the following test demonstrates that data is repeated while querying using a SASI index: {code:java} public class TestPagingBug { public static void main(String[] args) { Cluster.Builder builder = Cluster.builder(); Cluster c = builder.addContactPoints("192.168.98.190").build(); Session s = c.connect(); s.execute("CREATE KEYSPACE IF NOT EXISTS test WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }"); s.execute("CREATE TABLE IF NOT EXISTS test.test_table_sec(sec BIGINT PRIMARY KEY, id INT)"); //create secondary index on ID column, used for select statement String index = "CREATE CUSTOM INDEX test_table_sec_idx ON test.test_table_sec (id) USING 'org.apache.cassandra.index.sasi.SASIIndex' " + "WITH OPTIONS = { 'mode': 'PREFIX' }"; s.execute(index); PreparedStatement insert = s.prepare("INSERT INTO test.test_table_sec (id, sec) VALUES (1, ?)"); for (int i = 0; i < 1000; i++) s.execute(insert.bind((long) i)); PreparedStatement select = s.prepare("SELECT sec FROM test.test_table_sec WHERE id = 1"); long lastSec = -1; for (Row row : s.execute(select.bind().setFetchSize(300))) { long sec = row.getLong("sec"); if (sec == lastSec) System.out.println(String.format("Duplicated id %d", sec)); lastSec = sec; } System.exit(0); } } {code} > last row of previous page == first row of next page while querying data using > SASI index > > > Key: CASSANDRA-13302 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13302 > Project: Cassandra > Issue Type: Bug >
[jira] [Commented] (CASSANDRA-12888) Incremental repairs broken for MVs and CDC
[ https://issues.apache.org/jira/browse/CASSANDRA-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897589#comment-15897589 ]

Benjamin Roth commented on CASSANDRA-12888:
---

Btw.: My concept seems to work, but there is one question left: why does a StreamSession create unrepaired SSTables?

IncomingFileMessage => creates RangeAwareSSTableWriter:97 => cfs.createSSTableMultiWriter ... => CompactionStrategyManager.createSSTableMultiWriter:185

Will it be marked as repaired later? If so, where/when? Why I ask: the received SSTable has the repairedFlag in RangeAwareSSTableWriter and its header, but it is lost when the SSTable is finished and returned as SSTableReader.

> Incremental repairs broken for MVs and CDC
> --
>
> Key: CASSANDRA-12888
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12888
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Stefan Podkowinski
> Assignee: Benjamin Roth
> Priority: Critical
> Fix For: 3.0.x, 3.11.x
>
> SSTables streamed during the repair process will first be written locally and afterwards either simply added to the pool of existing sstables or, in case of existing MVs or active CDC, replayed on a per-mutation basis.
> As described in {{StreamReceiveTask.OnCompletionRunnable}}:
> {quote}
> We have a special path for views and for CDC.
> For views, since the view requires cleaning up any pre-existing state, we must put all partitions through the same write path as normal mutations. This also ensures any 2is are also updated.
> For CDC-enabled tables, we want to ensure that the mutations are run through the CommitLog so they can be archived by the CDC process on discard.
> {quote}
> Using the regular write path turns out to be an issue for incremental repairs, as we lose the {{repaired_at}} state in the process. Eventually the streamed rows will end up in the unrepaired set, in contrast to the rows on the sender side moved to the repaired set.
The next repair run will stream > the same data back again, causing rows to bounce on and on between nodes on > each repair. > See linked dtest on steps to reproduce. An example for reproducing this > manually using ccm can be found > [here|https://gist.github.com/spodkowinski/2d8e0408516609c7ae701f2bf1e515e8] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient
[ https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897545#comment-15897545 ]

Corentin Chary commented on CASSANDRA-12915:

The fact that you didn't change the following line makes me think that your patch doesn't really do what we need:

Assert.assertEquals(1L, builder.add(new LongIterator(new long[] {})).rangeCount());

Empty ranges really should not get ignored, and the changes made in https://github.com/ifesdjeen/cassandra/commit/78b1ff630536b0f48787ced74a66d702d13637ba#diff-22e58be2cfd42af959cb63c97de7eb3cR246 show that the code does not behave like we would like it to.

> SASI: Index intersection with an empty range really inefficient
> ---
>
> Key: CASSANDRA-12915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
> Project: Cassandra
> Issue Type: Improvement
> Components: sasi
> Reporter: Corentin Chary
> Assignee: Corentin Chary
> Fix For: 3.11.x, 4.x
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the intersection (and effectively only uses 'index1') the query will run in a few tenths of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection when we know it will be slow. Probably when the range size factor is very small and the range size is big.
> * CASSANDRA-10765
[jira] [Updated] (CASSANDRA-13302) last row of previous page == first row of next page while querying data using SASI index
[ https://issues.apache.org/jira/browse/CASSANDRA-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Tolbert updated CASSANDRA-13302: - Description: Apologies if this is a duplicate (couldn't track down an existing bug). Similarly to [CASSANDRA-11208], it appears it is possible to retrieve duplicate rows when paging using a SASI index as documented in [JAVA-1413|https://datastax-oss.atlassian.net/browse/JAVA-1413], the following test demonstrates that data is repeated while querying using a SASI index: {code:java} public class TestPagingBug { public static void main(String[] args) { Cluster.Builder builder = Cluster.builder(); Cluster c = builder.addContactPoints("192.168.98.190").build(); Session s = c.connect(); s.execute("CREATE KEYSPACE IF NOT EXISTS test WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }"); s.execute("CREATE TABLE IF NOT EXISTS test.test_table_sec(sec BIGINT PRIMARY KEY, id INT)"); //create secondary index on ID column, used for select statement String index = "CREATE CUSTOM INDEX test_table_sec_idx ON test.test_table_sec (id) USING 'org.apache.cassandra.index.sasi.SASIIndex' " + "WITH OPTIONS = { 'mode': 'PREFIX' }"; s.execute(index); PreparedStatement insert = s.prepare("INSERT INTO test.test_table_sec (id, sec) VALUES (1, ?)"); for (int i = 0; i < 1000; i++) s.execute(insert.bind((long) i)); PreparedStatement select = s.prepare("SELECT sec FROM test.test_table_sec WHERE id = 1"); long lastSec = -1; for (Row row : s.execute(select.bind().setFetchSize(300))) { long sec = row.getLong("sec"); if (sec == lastSec) System.out.println(String.format("Duplicated id %d", sec)); lastSec = sec; } System.exit(0); } } {code} was: Apologies if this is a duplicate (couldn't track down an existing bug). 
Similarly to [CASSANDRA-11208], it appears it is possible to retrieve duplicate rows when paging using a SASI index as documented in [JAVA-1362|https://datastax-oss.atlassian.net/browse/JAVA-1362], the following test demonstrates that data is repeated while querying using a SASI index: {code:java} public class TestPagingBug { public static void main(String[] args) { Cluster.Builder builder = Cluster.builder(); Cluster c = builder.addContactPoints("192.168.98.190").build(); Session s = c.connect(); s.execute("CREATE KEYSPACE IF NOT EXISTS test WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }"); s.execute("CREATE TABLE IF NOT EXISTS test.test_table_sec(sec BIGINT PRIMARY KEY, id INT)"); //create secondary index on ID column, used for select statement String index = "CREATE CUSTOM INDEX test_table_sec_idx ON test.test_table_sec (id) USING 'org.apache.cassandra.index.sasi.SASIIndex' " + "WITH OPTIONS = { 'mode': 'PREFIX' }"; s.execute(index); PreparedStatement insert = s.prepare("INSERT INTO test.test_table_sec (id, sec) VALUES (1, ?)"); for (int i = 0; i < 1000; i++) s.execute(insert.bind((long) i)); PreparedStatement select = s.prepare("SELECT sec FROM test.test_table_sec WHERE id = 1"); long lastSec = -1; for (Row row : s.execute(select.bind().setFetchSize(300))) { long sec = row.getLong("sec"); if (sec == lastSec) System.out.println(String.format("Duplicated id %d", sec)); lastSec = sec; } System.exit(0); } } {code} > last row of previous page == first row of next page while querying data using > SASI index > > > Key: CASSANDRA-13302 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13302 > Project: Cassandra > Issue Type: Bug > Environment: Tested with C* 3.9 and 3.10. >Reporter: Andy Tolbert > > Apologies if this is a duplicate (couldn't track down an existing bug). > Similarly to
[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient
[ https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897529#comment-15897529 ] Alex Petrov commented on CASSANDRA-12915: - I've poked around a bit and could not see any behaviour change if we only introduce an empty iterator, like [here|https://github.com/ifesdjeen/cassandra/commit/78b1ff630536b0f48787ced74a66d702d13637ba] (this is by no means meant as a final version of the patch, only bringing it here as an example and a base for discussion). I'm just trying to understand the purpose of the rest of changes, it'd be good to hear your opinion. > SASI: Index intersection with an empty range really inefficient > --- > > Key: CASSANDRA-12915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12915 > Project: Cassandra > Issue Type: Improvement > Components: sasi >Reporter: Corentin Chary >Assignee: Corentin Chary > Fix For: 3.11.x, 4.x > > > It looks like RangeIntersectionIterator.java and be pretty inefficient in > some cases. Let's take the following query: > SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar'; > In this case: > * index1 = 'foo' will match 2 items > * index2 = 'bar' will match ~300k items > On my setup, the query will take ~1 sec, most of the time being spent in > disk.TokenTree.getTokenAt(). > if I patch RangeIntersectionIterator so that it doesn't try to do the > intersection (and effectively only use 'index1') the query will run in a few > tenth of milliseconds. > I see multiple solutions for that: > * Add a static thresold to avoid the use of the index for the intersection > when we know it will be slow. Probably when the range size factor is very > small and the range size is big. > * CASSANDRA-10765 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
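As context for the discussion, the short-circuit both patches are after can be illustrated with a minimal, self-contained sorted-token intersection — an illustrative sketch, not SASI's actual RangeIterator/RangeIntersectionIterator code:

```java
import java.util.ArrayList;
import java.util.List;

public class IntersectionSketch {
    // Intersect several sorted token lists. If any operand is empty the
    // result is necessarily empty, so we return immediately instead of
    // walking the (possibly huge) other operands.
    public static long[] intersect(List<long[]> operands) {
        for (long[] op : operands)
            if (op.length == 0)
                return new long[0]; // short-circuit on an empty range

        long[] acc = operands.get(0);
        for (int i = 1; i < operands.size(); i++) {
            long[] next = operands.get(i);
            List<Long> merged = new ArrayList<>();
            int a = 0, b = 0;
            // classic two-pointer merge over sorted inputs
            while (a < acc.length && b < next.length) {
                if (acc[a] == next[b]) { merged.add(acc[a]); a++; b++; }
                else if (acc[a] < next[b]) a++;
                else b++;
            }
            acc = merged.stream().mapToLong(Long::longValue).toArray();
        }
        return acc;
    }
}
```

The ticket's other concern — a 2-item range intersected with a ~300k-item range — is a separate optimisation: iterating the small operand and probing the large one avoids most of the token-by-token stepping that the two-pointer merge above performs.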
cassandra-builds git commit: Reorder virtualenv steps
Repository: cassandra-builds Updated Branches: refs/heads/master 08f76054c -> 9e62fe8a6 Reorder virtualenv steps Project: http://git-wip-us.apache.org/repos/asf/cassandra-builds/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra-builds/commit/9e62fe8a Tree: http://git-wip-us.apache.org/repos/asf/cassandra-builds/tree/9e62fe8a Diff: http://git-wip-us.apache.org/repos/asf/cassandra-builds/diff/9e62fe8a Branch: refs/heads/master Commit: 9e62fe8a6af3578cee5b6337bd259ce5eaa5631d Parents: 08f7605 Author: Michael ShulerAuthored: Mon Mar 6 12:00:21 2017 -0600 Committer: Michael Shuler Committed: Mon Mar 6 12:00:21 2017 -0600 -- build-scripts/cassandra-cqlsh-tests.sh | 52 - 1 file changed, 28 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra-builds/blob/9e62fe8a/build-scripts/cassandra-cqlsh-tests.sh -- diff --git a/build-scripts/cassandra-cqlsh-tests.sh b/build-scripts/cassandra-cqlsh-tests.sh index 8cd867d..f7218eb 100755 --- a/build-scripts/cassandra-cqlsh-tests.sh +++ b/build-scripts/cassandra-cqlsh-tests.sh @@ -8,6 +8,8 @@ export PYTHONIOENCODING="utf-8" export PYTHONUNBUFFERED=true +export CASS_DRIVER_NO_EXTENSIONS=true +export CASS_DRIVER_NO_CYTHON=true export CCM_MAX_HEAP_SIZE="2048M" export CCM_HEAP_NEWSIZE="200M" export NUM_TOKENS="32" @@ -27,21 +29,25 @@ if [ "${RETURN}" -ne "0" ]; then exit ${RETURN} fi - -# -# Main -# - +# Set up venv with dtest dependencies +set -e # enable immediate exit if venv setup fails +virtualenv --python=python2 --no-site-packages venv +source venv/bin/activate +pip install -r cassandra-dtest/requirements.txt +pip freeze if [ "$cython" = "yes" ]; then -virtualenv --python=python2 --no-site-packages venv -source venv/bin/activate pip install "Cython>=0.20,<0.25" -pip freeze cd pylib/; python setup.py build_ext --inplace cd ${WORKSPACE} fi + +# +# Main +# + + ccm create test -n 1 ccm updateconf "enable_user_defined_functions: true" @@ -72,28 +78,26 @@ detailed-errors=1 with-xunit=1 EOF +set +e 
# disable immediate exit from this point nosetests ccm remove mv nosetests.xml ${WORKSPACE}/cqlshlib.xml -if [ "$cython" = "yes" ]; then -deactivate # venv -fi -cd ${WORKSPACE} - # run dtest cqlsh suite -cd cassandra-dtest/ -if [ "$cython" = "no" ]; then -export CASS_DRIVER_NO_EXTENSIONS=true -export CASS_DRIVER_NO_CYTHON=true -fi -virtualenv --python=python2 --no-site-packages venv -source venv/bin/activate -pip install -r requirements.txt -pip freeze - +cd ${WORKSPACE}/cassandra-dtest/ nosetests --verbosity=3 --with-xunit --nocapture cqlsh_tests/ mv nosetests.xml ${WORKSPACE}/ -deactivate # venv + + +# +# Clean +# + + +# /virtualenv +deactivate + +# Exit cleanly for usable "Unstable" status +exit 0
[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient
[ https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897907#comment-15897907 ] Alex Petrov commented on CASSANDRA-12915: - This was a piece of test that remained from your code, the approach I suggest is slightly different, so I'm not surprised the empty range still counts as a range. My point is that we could have even taken out the null check from [here|https://github.com/ifesdjeen/cassandra/commit/78b1ff630536b0f48787ced74a66d702d13637ba#diff-25d7f486e2818c56d6b01aa952d459f3L146] and replaced it with an empty iterator and would still achieve the same result and get the issue fixed. If you disagree with the approach I'm proposing, you could answer my initial question and explain. I'm open, I just want to understand before we get it committed. > SASI: Index intersection with an empty range really inefficient > --- > > Key: CASSANDRA-12915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12915 > Project: Cassandra > Issue Type: Improvement > Components: sasi >Reporter: Corentin Chary >Assignee: Corentin Chary > Fix For: 3.11.x, 4.x > > > It looks like RangeIntersectionIterator.java and be pretty inefficient in > some cases. Let's take the following query: > SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar'; > In this case: > * index1 = 'foo' will match 2 items > * index2 = 'bar' will match ~300k items > On my setup, the query will take ~1 sec, most of the time being spent in > disk.TokenTree.getTokenAt(). > if I patch RangeIntersectionIterator so that it doesn't try to do the > intersection (and effectively only use 'index1') the query will run in a few > tenth of milliseconds. > I see multiple solutions for that: > * Add a static thresold to avoid the use of the index for the intersection > when we know it will be slow. Probably when the range size factor is very > small and the range size is big. 
> * CASSANDRA-10765 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-12888) Incremental repairs broken for MVs and CDC
[ https://issues.apache.org/jira/browse/CASSANDRA-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Roth updated CASSANDRA-12888: -- Status: Patch Available (was: Awaiting Feedback) https://github.com/apache/cassandra/compare/trunk...Jaumo:CASSANDRA-12888 Some dtest assertions: https://github.com/riptano/cassandra-dtest/compare/master...Jaumo:CASSANDRA-12888?expand=1 > Incremental repairs broken for MVs and CDC > -- > > Key: CASSANDRA-12888 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12888 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Stefan Podkowinski >Assignee: Benjamin Roth >Priority: Critical > Fix For: 3.0.x, 3.11.x > > > SSTables streamed during the repair process will first be written locally and > afterwards either simply added to the pool of existing sstables or, in case > of existing MVs or active CDC, replayed on mutation basis: > As described in {{StreamReceiveTask.OnCompletionRunnable}}: > {quote} > We have a special path for views and for CDC. > For views, since the view requires cleaning up any pre-existing state, we > must put all partitions through the same write path as normal mutations. This > also ensures any 2is are also updated. > For CDC-enabled tables, we want to ensure that the mutations are run through > the CommitLog so they can be archived by the CDC process on discard. > {quote} > Using the regular write path turns out to be an issue for incremental > repairs, as we loose the {{repaired_at}} state in the process. Eventually the > streamed rows will end up in the unrepaired set, in contrast to the rows on > the sender site moved to the repaired set. The next repair run will stream > the same data back again, causing rows to bounce on and on between nodes on > each repair. > See linked dtest on steps to reproduce. 
An example for reproducing this > manually using ccm can be found > [here|https://gist.github.com/spodkowinski/2d8e0408516609c7ae701f2bf1e515e8] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (CASSANDRA-13303) CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897942#comment-15897942 ]

Benjamin Roth edited comment on CASSANDRA-13303 at 3/6/17 7:58 PM:
---

1. Happens in trunk
2. Maybe not clear enough: the table is simply not compacted, as AbstractCompactionStrategy.worthDroppingTombstones returns false because droppableRatio is 0.0
{code}
double droppableRatio = sstable.getEstimatedDroppableTombstoneRatio(gcBefore);
{code}
Message: "should be less than x but was y"

was (Author: brstgt):
1. Happens in trunk
2. Maybe not clear enough: the table is simply not compacted, as AbstractCompactionStrategy.worthDroppingTombstones returns false because droppableRatio is 0.0
{code}
double droppableRatio = sstable.getEstimatedDroppableTombstoneRatio(gcBefore);
{code}

> CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky
> ---
>
> Key: CASSANDRA-13303
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13303
> Project: Cassandra
> Issue Type: Bug
> Reporter: Benjamin Roth
>
> On my machine, this test succeeds maybe 1 out of 10 times.
> Cause seems to be that the sstable is not elected for compaction in worthDroppingTombstones as droppableRatio is 0.0
> I don't know the primary intention of this test, so I didn't touch it, but the conditions are not safe.
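The condition being discussed reduces to a single threshold predicate. A sketch of that gate — names are illustrative, not Cassandra's exact signatures — makes clear why a droppable ratio of 0.0 can never trigger a tombstone compaction:

```java
public class TombstoneCompactionCheck {
    // Illustrative version of the gate: an sstable is only worth compacting
    // for tombstone purging when its estimated droppable-tombstone ratio
    // exceeds the configured tombstone_threshold.
    public static boolean worthDroppingTombstones(double droppableRatio, double tombstoneThreshold) {
        return droppableRatio > tombstoneThreshold; // 0.0 always fails here
    }
}
```

A test relying on this path therefore has to ensure the data is actually expired or deleted (and past gc_grace) before asserting that a single-sstable compaction happens.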
[jira] [Commented] (CASSANDRA-13303) CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897966#comment-15897966 ]

Benjamin Roth commented on CASSANDRA-13303:
---

I read the comments in 13038, I guess we are talking of the same thing. Also MetadataSerializerTest fails on my machine.

> CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky
> ---
>
> Key: CASSANDRA-13303
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13303
> Project: Cassandra
> Issue Type: Bug
> Reporter: Benjamin Roth
>
> On my machine, this test succeeds maybe 1 out of 10 times.
> Cause seems to be that the sstable is not elected for compaction in worthDroppingTombstones as droppableRatio is 0.0
> I don't know the primary intention of this test, so I didn't touch it, but the conditions are not safe.
[jira] [Commented] (CASSANDRA-12489) consecutive repairs of same range always finds 'out of sync' in sane cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898028#comment-15898028 ]

Benjamin Roth commented on CASSANDRA-12489:
---

May I ask what's the reason that incremental + subrange repair doesn't do anticompaction? Is it because anticompaction is too expensive in this case, or to say it in different words: is a subrange full repair cheaper than a subrange incremental repair with anticompaction?

> consecutive repairs of same range always finds 'out of sync' in sane cluster
> 
>
> Key: CASSANDRA-12489
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12489
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Benjamin Roth
> Assignee: Benjamin Roth
> Labels: lhf
> Attachments: trace_3_10.1.log.gz, trace_3_10.2.log.gz, trace_3_10.3.log.gz, trace_3_10.4.log.gz, trace_3_9.1.log.gz, trace_3_9.2.log.gz
>
> No matter how often or when I run the same subrange repair, it ALWAYS tells me that some ranges are out of sync. Tested in 3.9 + 3.10 (git trunk of 2016-08-17). The cluster is sane. All nodes are up, the cluster is not overloaded.
> I guess this is not a desired behaviour. I'd expect that a repair does what it says and a consecutive repair shouldn't report "out of syncs" any more if the cluster is sane.
> Especially for tables with MVs that puts a lot of pressure during repair, as ranges are repaired over and over again.
> See traces of different runs attached.
[jira] [Commented] (CASSANDRA-12489) consecutive repairs of same range always finds 'out of sync' in sane cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898060#comment-15898060 ] Marcus Eriksson commented on CASSANDRA-12489: - The idea is that with incremental repairs we don't need to use the tools that split the ranges; instead, the amount of data to repair is small, since we only include unrepaired data.
[jira] [Created] (CASSANDRA-13303) CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky
Benjamin Roth created CASSANDRA-13303: - Summary: CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky Key: CASSANDRA-13303 URL: https://issues.apache.org/jira/browse/CASSANDRA-13303 Project: Cassandra Issue Type: Bug Reporter: Benjamin Roth On my machine, this test succeeds maybe 1 out of 10 times. The cause seems to be that the sstable is not selected for compaction in worthDroppingTombstones, as droppableRatio is 0.0. I don't know the primary intention of this test, so I didn't touch it, but the conditions are not safe. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13303) CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897963#comment-15897963 ] Joel Knighton commented on CASSANDRA-13303: --- [CASSANDRA-13038] introduced the regression in {{a5ce963117acf5e4cf0a31057551f2f42385c398}}. The regression was fixed in {{adbe2cc4df0134955a2c83ae4ebd0086ea5e9164}}.
[jira] [Commented] (CASSANDRA-12489) consecutive repairs of same range always finds 'out of sync' in sane cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898051#comment-15898051 ] Benjamin Roth commented on CASSANDRA-12489: --- Thanks for the answer. That's what I thought. But what reason do incremental repairs then have to exist in the real world, if (most, many, whatever) people use a tool that makes repairs manageable, which eliminates this case? The use case and real benefit are quite limited then, aren't they? Probably that's a philosophical question, but I'm curious what others think about it and whether I am maybe missing a valuable use case.
[jira] [Comment Edited] (CASSANDRA-12888) Incremental repairs broken for MVs and CDC
[ https://issues.apache.org/jira/browse/CASSANDRA-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897917#comment-15897917 ] Benjamin Roth edited comment on CASSANDRA-12888 at 3/6/17 7:48 PM: --- Review would be much appreciated. Don't know if [~pauloricardomg] still wants to do the review. Please give me some feedback, thanks! was (Author: brstgt): Review would be much appreciated. Don't know if @pauloricardomg still wants to do the review. Please give me some feedback, thanks! > Incremental repairs broken for MVs and CDC > -- > > Key: CASSANDRA-12888 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12888 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Stefan Podkowinski >Assignee: Benjamin Roth >Priority: Critical > Fix For: 3.0.x, 3.11.x > > > SSTables streamed during the repair process will first be written locally and > afterwards either simply added to the pool of existing sstables or, in case > of existing MVs or active CDC, replayed on a mutation basis: > As described in {{StreamReceiveTask.OnCompletionRunnable}}: > {quote} > We have a special path for views and for CDC. > For views, since the view requires cleaning up any pre-existing state, we > must put all partitions through the same write path as normal mutations. This > also ensures any 2is are also updated. > For CDC-enabled tables, we want to ensure that the mutations are run through > the CommitLog so they can be archived by the CDC process on discard. > {quote} > Using the regular write path turns out to be an issue for incremental > repairs, as we lose the {{repaired_at}} state in the process. Eventually the > streamed rows will end up in the unrepaired set, in contrast to the rows on > the sender side, which are moved to the repaired set. The next repair run will stream > the same data back again, causing rows to bounce back and forth between nodes on > each repair. > See the linked dtest for steps to reproduce. An example for reproducing this > manually using ccm can be found > [here|https://gist.github.com/spodkowinski/2d8e0408516609c7ae701f2bf1e515e8] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-12888) Incremental repairs broken for MVs and CDC
[ https://issues.apache.org/jira/browse/CASSANDRA-12888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897917#comment-15897917 ] Benjamin Roth commented on CASSANDRA-12888: --- Review would be much appreciated. Don't know if @pauloricardomg still wants to do the review. Please give me some feedback, thanks!
[jira] [Commented] (CASSANDRA-13303) CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897934#comment-15897934 ] Joel Knighton commented on CASSANDRA-13303: --- Thanks for the report, but there isn't a lot that's actionable here. Could you provide the branch(es) it is failing on for you? In addition, the specific failure (as shown by test output/stacktrace) you see would help someone identify the problem, particularly in cases like this when the test isn't failing on CI. This test recently had a regression introduced and fixed in [CASSANDRA-13038], but I don't know if it's the same failure you're seeing.
[jira] [Commented] (CASSANDRA-13303) CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897942#comment-15897942 ] Benjamin Roth commented on CASSANDRA-13303: --- 1. Happens in trunk. 2. Maybe not clear enough: the table is simply not compacted, as AbstractCompactionStrategy.worthDroppingTombstones returns false because droppableRatio is 0.0:
{code}
double droppableRatio = sstable.getEstimatedDroppableTombstoneRatio(gcBefore);
{code}
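For context, the gate being discussed can be sketched roughly as follows. This is a simplified illustration, not Cassandra's actual code: the real `worthDroppingTombstones` weighs several more factors, and the `TOMBSTONE_THRESHOLD` default here merely mirrors the `tombstone_threshold` table option.

```java
// Simplified sketch of the tombstone-compaction gate discussed above: an
// sstable is only considered worth compacting for tombstone purposes when
// its estimated droppable-tombstone ratio exceeds a threshold. With
// droppableRatio == 0.0, the sstable is never selected.
public class TombstoneGate {
    // Assumed default, mirroring the tombstone_threshold table option.
    static final double TOMBSTONE_THRESHOLD = 0.2;

    static boolean worthDroppingTombstones(double droppableRatio) {
        return droppableRatio > TOMBSTONE_THRESHOLD;
    }

    public static void main(String[] args) {
        // The flaky-test case above: a ratio of 0.0 never qualifies.
        System.out.println(worthDroppingTombstones(0.0));
        System.out.println(worthDroppingTombstones(0.5));
    }
}
```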
[jira] [Commented] (CASSANDRA-13303) CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897953#comment-15897953 ] Joel Knighton commented on CASSANDRA-13303: --- The exact test output would help diagnose this, but it sounds like the failure introduced/fixed in [CASSANDRA-13038], as seen in CI [here|http://cassci.datastax.com/job/trunk_testall/1436/testReport/junit/org.apache.cassandra.db.compaction/CompactionsTest/testSingleSSTableCompactionWithSizeTieredCompaction/]. Can you make sure this failure still occurs after fetching latest trunk? If so, what's your trunk commit hash?
[jira] [Commented] (CASSANDRA-13303) CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897956#comment-15897956 ] Benjamin Roth commented on CASSANDRA-13303: --- CASSANDRA-13038 is fixed in commit a5ce963117acf5e4cf0a31057551f2f42385c398, which I have in my trunk, and I don't see a newer commit for 13038.
[jira] [Resolved] (CASSANDRA-13303) CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Roth resolved CASSANDRA-13303. --- Resolution: Duplicate Duplicate of the CASSANDRA-13038 regression fixes.
[jira] [Commented] (CASSANDRA-13303) CompactionsTest.testSingleSSTableCompactionWithSizeTieredCompaction super flaky
[ https://issues.apache.org/jira/browse/CASSANDRA-13303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897969#comment-15897969 ] Benjamin Roth commented on CASSANDRA-13303: --- My trunk is older, so I'm closing the ticket.
[jira] [Commented] (CASSANDRA-13300) Upgrade the jna version to 4.3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898011#comment-15898011 ] Michael Kjellman commented on CASSANDRA-13300: -- [~snazy] i agree -- looking at this closer, this is sort of related fallout from the change I made with CASSANDRA-13233. It would have always failed -- but it's a bit clearer now because the getPid() call in this case is correctly hitting the JNA path, as the report shows this was run on Ubuntu. The only PPC handling we've had (that I can see) is the following:
{code}
if (System.getProperty("os.arch").toLowerCase().contains("ppc"))
{
    if (OS_LINUX)
    {
        MCL_CURRENT = 0x2000;
        MCL_FUTURE = 0x4000;
    }
    else if (OS_AIX)
    {
        MCL_CURRENT = 0x100;
        MCL_FUTURE = 0x200;
    }
    else
    {
        MCL_CURRENT = 1;
        MCL_FUTURE = 2;
    }
}
else
{
    MCL_CURRENT = 1;
    MCL_FUTURE = 2;
}
{code}
[~jasobrown] I think we should also add a log line at level WARN if it's an OS that we don't know about, or if it's a platform we can't really test/don't support (PPC to start with). > Upgrade the jna version to 4.3.0 > > > Key: CASSANDRA-13300 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13300 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Amitkumar Ghatwal >Assignee: Jason Brown > > Could you please upgrade the jna version present in the github cassandra > location (https://github.com/apache/cassandra/blob/trunk/lib/jna-4.0.0.jar) > to the latest version, 4.3.0: > http://repo1.maven.org/maven2/net/java/dev/jna/jna/4.3.0/jna-4.3.0-javadoc.jar -- This message was sent by Atlassian JIRA (v6.3.15#6346)
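The per-architecture flag selection above, combined with the suggested WARN for untested platforms, could look roughly like this. The MCL_* values come from the snippet above; the method name and warning text are illustrative, not Cassandra's actual code:

```java
// Sketch: choose mlockall() MCL_* flag values per platform and warn on
// architectures we cannot test (as suggested in the comment above).
public class NativePlatformCheck {
    // Returns {MCL_CURRENT, MCL_FUTURE} for the given platform.
    static int[] mclFlags(String osArch, boolean osLinux, boolean osAix) {
        if (osArch.toLowerCase().contains("ppc")) {
            // Hypothetical warning; a real patch would use the project's logger.
            System.err.println("WARN: untested/unsupported architecture: " + osArch);
            if (osLinux) return new int[] { 0x2000, 0x4000 };
            if (osAix)   return new int[] { 0x100, 0x200 };
        }
        return new int[] { 1, 2 };
    }

    public static void main(String[] args) {
        String name = System.getProperty("os.name").toLowerCase();
        int[] flags = mclFlags(System.getProperty("os.arch"),
                               name.contains("linux"), name.contains("aix"));
        System.out.println("MCL_CURRENT=" + flags[0] + " MCL_FUTURE=" + flags[1]);
    }
}
```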
[jira] [Commented] (CASSANDRA-10671) Consider removing Config.index_interval
[ https://issues.apache.org/jira/browse/CASSANDRA-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898022#comment-15898022 ] Michael Kjellman commented on CASSANDRA-10671: -- Yup! Looks really good and dead... Just looked in a few places... Sorry for the delay [~snazy] > Consider removing Config.index_interval > --- > > Key: CASSANDRA-10671 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10671 > Project: Cassandra > Issue Type: Task >Reporter: Robert Stupp >Priority: Minor > Fix For: 4.0 > > > {{Config.index_interval}} is deprecated since 2.0.? and unused in 3.0. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-12489) consecutive repairs of same range always finds 'out of sync' in sane cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898043#comment-15898043 ] Marcus Eriksson commented on CASSANDRA-12489: - Yeah, typically people use tools like Spotify's Reaper, which splits the range of the node into n (1000?) parts. If we have an sstable that covers the full range of the node, we would rewrite it n times - we write each repaired range into a new sstable, and the unrepaired parts get written to another sstable (and that sstable gets rewritten on the next repair, etc.).
cassandra-builds git commit: Use cut to build release series instead of counting characters
Repository: cassandra-builds
Updated Branches: refs/heads/master 9e62fe8a6 -> a1f0d3309

Use cut to build release series instead of counting characters

Project: http://git-wip-us.apache.org/repos/asf/cassandra-builds/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra-builds/commit/a1f0d330
Tree: http://git-wip-us.apache.org/repos/asf/cassandra-builds/tree/a1f0d330
Diff: http://git-wip-us.apache.org/repos/asf/cassandra-builds/diff/a1f0d330
Branch: refs/heads/master
Commit: a1f0d330904d559aae07dd0a0ff67f65fe00c818
Parents: 9e62fe8
Author: Michael Shuler
Authored: Mon Mar 6 14:56:30 2017 -0600
Committer: Michael Shuler
Committed: Mon Mar 6 14:56:30 2017 -0600
--
 cassandra-release/finish_release.sh | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/cassandra-builds/blob/a1f0d330/cassandra-release/finish_release.sh
--
diff --git a/cassandra-release/finish_release.sh b/cassandra-release/finish_release.sh
index 99655c1..bcc9898 100755
--- a/cassandra-release/finish_release.sh
+++ b/cassandra-release/finish_release.sh
@@ -134,6 +134,8 @@
 then
 else
 release_short=${release:0:$((idx-1))}
 fi
+release_major=$(echo ${release_short} | cut -d '.' -f 1)
+release_minor=$(echo ${release_short} | cut -d '.' -f 2)

 echo "Deploying artifacts ..." 1>&3 2>&4
 start_dir=$PWD
@@ -164,7 +166,7 @@
 echo "Deploying debian packages ..." 1>&3 2>&4
 current_dir=`pwd`
-debian_series="${release_short:0:1}${release_short:2:2}x"
+debian_series="${release_major}${release_minor}x"

 execute "cd $reprepro_dir"
 execute "reprepro --ignore=wrongdistribution include $debian_series $debian_package_dir/cassandra_${release}_debian/cassandra_${deb_release}_*.changes"
@@ -192,7 +194,7 @@
 echo "Downloads of source and binary distributions are listed in our download se
 echo "" >> $mail_file
 echo " http://cassandra.apache.org/download/" >> $mail_file
 echo "" >> $mail_file
-series="${release_short:0:1}.${release_short:2:1}"
+series="${release_major}.${release_minor}"
 echo "This version is a bug fix release[1] on the $series series. As always, please pay attention to the release notes[2] and let us know[3] if you were to encounter any problem." >> $mail_file
 echo "" >> $mail_file
 echo "Enjoy!" >> $mail_file
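The point of the patch above is that fixed-offset substring extraction (`${release_short:0:1}`) silently breaks once a version component has more than one digit, while delimiter-based splitting (`cut -d '.'`) handles any width. The same logic sketched in Java for illustration (class and method names are hypothetical):

```java
// Illustration of why the release script switched from fixed character
// offsets to splitting on '.': for "3.11", ${release_short:0:1}.${release_short:2:1}
// yields "3.1" instead of "3.11", and a two-digit major like "10.0" breaks
// the debian series too. Field splitting is width-agnostic.
public class ReleaseSeries {
    // e.g. "3.0" -> "30x" (the debian distribution series name)
    static String debianSeries(String releaseShort) {
        String[] parts = releaseShort.split("\\.");
        return parts[0] + parts[1] + "x";
    }

    // e.g. "3.11" -> "3.11" (major.minor series used in the release mail)
    static String series(String releaseShort) {
        String[] parts = releaseShort.split("\\.");
        return parts[0] + "." + parts[1];
    }

    public static void main(String[] args) {
        System.out.println(debianSeries("3.0"));  // 30x
        System.out.println(series("3.11"));       // 3.11; fixed offsets would yield "3.1"
    }
}
```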
[jira] [Updated] (CASSANDRA-10671) Consider removing Config.index_interval
[ https://issues.apache.org/jira/browse/CASSANDRA-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-10671: Reviewer: Jason Brown
[jira] [Commented] (CASSANDRA-10671) Consider removing Config.index_interval
[ https://issues.apache.org/jira/browse/CASSANDRA-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898101#comment-15898101 ] Jason Brown commented on CASSANDRA-10671: - commit at will, [~snazy]
[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient
[ https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898148#comment-15898148 ] Corentin Chary commented on CASSANDRA-12915: Could you re-phrase the question? I thought I answered everything from [this comment|https://issues.apache.org/jira/browse/CASSANDRA-12915?focusedCommentId=15897393=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15897393] but it looks like I didn't. The idea of my approach is that I'm looking for this behavior:
{code}
builder = RangeIntersectionIterator.builder(strategy);
builder.add(new LongIterator(new long[] {}));
builder.add(new LongIterator(new long[] {1}));
range = builder.build();
Assert.assertEquals(0, range.getCount());
Assert.assertFalse(range.hasNext()); // (optimized through isOverlapping() returning false)
{code}
In other words, adding an empty iterator to a RangeIntersectionIterator should make it empty, and there is a strong difference between an empty and a null iterator. I believe in your case the empty iterator will just get ignored, because you need to remove this check: https://github.com/ifesdjeen/cassandra/blob/78b1ff630536b0f48787ced74a66d702d13637ba/src/java/org/apache/cassandra/index/sasi/utils/RangeIterator.java#L151 > SASI: Index intersection with an empty range really inefficient > --- > > Key: CASSANDRA-12915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12915 > Project: Cassandra > Issue Type: Improvement > Components: sasi >Reporter: Corentin Chary >Assignee: Corentin Chary > Fix For: 3.11.x, 4.x > > > It looks like RangeIntersectionIterator.java can be pretty inefficient in > some cases. Let's take the following query: > SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar'; > In this case: > * index1 = 'foo' will match 2 items > * index2 = 'bar' will match ~300k items > On my setup, the query will take ~1 sec, most of the time being spent in > disk.TokenTree.getTokenAt(). > If I patch RangeIntersectionIterator so that it doesn't try to do the > intersection (and effectively only uses 'index1'), the query will run in a few > tenths of a millisecond. > I see multiple solutions for that: > * Add a static threshold to avoid the use of the index for the intersection > when we know it will be slow. Probably when the range size factor is very > small and the range size is big. > * CASSANDRA-10765 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
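The semantics requested in the comment above can be illustrated with a minimal stand-alone intersection over sorted long streams, where an empty input empties the whole result and lets the iteration short-circuit. This is a sketch of the behavior, not SASI's actual RangeIntersectionIterator:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch: intersect several token arrays; any empty input makes the
// result empty without scanning the remaining (possibly huge) inputs.
public class TokenIntersection {
    static List<Long> intersect(List<long[]> inputs) {
        Set<Long> acc = new TreeSet<>();
        for (long v : inputs.get(0))
            acc.add(v);
        for (int i = 1; i < inputs.size(); i++) {
            Set<Long> next = new HashSet<>();
            for (long v : inputs.get(i))
                next.add(v);
            acc.retainAll(next);       // an empty input empties the accumulator,
            if (acc.isEmpty())         // so we can stop here instead of walking
                return new ArrayList<>(); // the remaining inputs
        }
        return new ArrayList<>(acc);
    }

    public static void main(String[] args) {
        // {} intersected with {1} must be empty, without walking the large side.
        System.out.println(intersect(Arrays.asList(new long[] {}, new long[] {1})));
        System.out.println(intersect(Arrays.asList(new long[] {1, 2, 3}, new long[] {2, 3, 4})));
    }
}
```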
[jira] [Commented] (CASSANDRA-13300) Upgrade the jna version to 4.3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898204#comment-15898204 ] Jason Brown commented on CASSANDRA-13300: - OK, I (force) pushed another commit which logs when the OS is not linux/mac/windows, and another which logs when {{os.arch}} includes {{ppc}}. As many different architectures can be the value of {{os.arch}} (such as amd64/x86_64/x86), I figured it would be best to just complain about the architectures we do know about. I'm open to all suggestions on this one.
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897641#comment-15897641 ] Christian Esken commented on CASSANDRA-13265: - I have one question about a code fragment. When the socket is not available, the backlog is cleared, but no drops are counted. Looks like an omission to me, or is it intentional? {{dropped.addAndGet(backlog.size());}} would be an approximation. We likely cannot get closer, as {{backlog.clear();}} does not tell how many elements were removed.
{code}
if (qm.isTimedOut())
    dropped.incrementAndGet();
else if (socket != null || connect())
    writeConnected(qm, count == 1 && backlog.isEmpty());
else
{
    // clear out the queue, else gossip messages back up.
    drainedMessages.clear();
    // dropped.addAndGet(backlog.size()); // TODO Should dropped statistics be counted in this case?
    backlog.clear();
    break inner;
}
{code}
> Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate with the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node fixes the issue. > Before going into details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A thread dump in this situation showed 324 threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. > - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. Each of the > threads fully locks the queue for reading and writing by calling > iterator.next(), making the situation worse and worse. > - Writing: only after 262508 locking operations can it progress with actually > writing to the queue. > - Reading: is also blocked, as 324 threads try to do iterator.next() and > fully lock the queue. > This means: writing blocks the queue for reading, and readers might even be > starved, which makes the situation even worse. > - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DCs > - high write throughput (10 INSERT statements per second and more during > peak times). > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
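One way to address the approximation concern raised in the comment is to drain the backlog element by element instead of calling `size()` followed by `clear()`, which can race with concurrent writers. A minimal sketch with illustrative names, not the actual OutboundTcpConnection code:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: count every message removed while clearing the backlog, so the
// dropped-message statistics stay exact even with concurrent producers.
public class BacklogDropCounter {
    final Queue<Object> backlog = new ConcurrentLinkedQueue<>();
    final AtomicLong dropped = new AtomicLong();

    // Drains the queue one element at a time; unlike size() + clear(),
    // every removed element is observed and counted exactly once.
    long clearAndCountDropped() {
        long cleared = 0;
        while (backlog.poll() != null)
            cleared++;
        dropped.addAndGet(cleared);
        return cleared;
    }

    public static void main(String[] args) {
        BacklogDropCounter c = new BacklogDropCounter();
        c.backlog.add("gossip-1");
        c.backlog.add("mutation-2");
        System.out.println(c.clearAndCountDropped()); // 2
    }
}
```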
[jira] [Commented] (CASSANDRA-13289) Make it possible to monitor an ideal consistency level separate from actual consistency level
[ https://issues.apache.org/jira/browse/CASSANDRA-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898263#comment-15898263 ] Ariel Weisberg commented on CASSANDRA-13289: It doesn't add much overhead, because we already track all the responses until all of them arrive or the timeout fires. That said, I was going to have this off by default. Yes, you can set it via JMX. > Make it possible to monitor an ideal consistency level separate from actual > consistency level > - > > Key: CASSANDRA-13289 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13289 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > > As an operator, there are several issues related to multi-datacenter > replication and consistency you may want to have more information on from > your production database. > For instance: if your application writes at LOCAL_QUORUM, how often are those > writes failing to achieve EACH_QUORUM at other data centers? If you failed > your application over to one of those data centers, roughly how inconsistent > might it be, given the number of writes that didn't propagate since the last > incremental repair? > You might also want to know roughly what the latency of writes would be if > you switched to a different consistency level. For instance, you are writing > at LOCAL_QUORUM and want to know what would happen if you switched to > EACH_QUORUM. > The proposed change is to allow an ideal_consistency_level to be specified in > cassandra.yaml as well as get/set via JMX. If no ideal consistency level is > specified, no additional tracking is done. > If an ideal consistency level is specified, then the > {{AbstractWriteResponseHandler}} will contain a delegate WriteResponseHandler > that tracks whether the ideal consistency level is met before a write times > out. It also tracks the latency for achieving the ideal CL of successful > writes. 
> These two metrics would be reported on a per keyspace basis. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
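The delegate tracking described above can be sketched roughly as follows. This is an illustrative sketch, not the actual patch: the class name, fields, and callbacks are hypothetical stand-ins for what {{AbstractWriteResponseHandler}} and its delegate would do, counting replica acks toward the ideal CL and recording the two proposed metrics (writes failing the ideal CL, and latency to achieve it).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a delegate that tracks an "ideal" consistency level
// alongside the real one. Names are illustrative, not Cassandra's actual API.
class IdealClTracker
{
    private final int idealResponsesNeeded;
    private final long startNanos = System.nanoTime();
    private final AtomicInteger responses = new AtomicInteger();

    // metrics a real implementation would report per keyspace
    final AtomicLong writesFailingIdealCl = new AtomicLong();
    final AtomicLong idealClLatencyNanos = new AtomicLong();

    IdealClTracker(int idealResponsesNeeded)
    {
        this.idealResponsesNeeded = idealResponsesNeeded;
    }

    // Called for every replica ack, in addition to the normal CL accounting.
    void onResponse()
    {
        if (responses.incrementAndGet() == idealResponsesNeeded)
            idealClLatencyNanos.set(System.nanoTime() - startNanos);
    }

    // Called when the write timeout fires; returns true if the ideal CL was missed.
    boolean onTimeout()
    {
        boolean missed = responses.get() < idealResponsesNeeded;
        if (missed)
            writesFailingIdealCl.incrementAndGet();
        return missed;
    }

    public static void main(String[] args)
    {
        IdealClTracker tracker = new IdealClTracker(3); // e.g. an EACH_QUORUM needing 3 acks
        tracker.onResponse();
        tracker.onResponse();
        System.out.println("missed ideal CL: " + tracker.onTimeout());
    }
}
```

Because the responses are already tracked until they all arrive or the timeout fires, the only added cost is the counter increment and one timestamp, which matches the "doesn't add much overhead" point above.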
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898266#comment-15898266 ] Jason Brown commented on CASSANDRA-13265: - bq. we were supposed to mark those as dropped We probably should count them as dropped as we are dropping them, huh ;) fwiw, looks like it's always been implemented this way, since CASSANDRA-3005 > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate to the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node from the cluster fixes the issue. > Before going in to details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A Threaddump in this situation showed 324 Threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. > - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. Each of the > Thread fully locks the Queue for reading and writing by calling > iterator.next(), making the situation worse and worse. > - Writing: Only after 262508 locking operation it can progress with actually > writing to the Queue. 
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and > fully lock the Queue > This means: Writing blocks the Queue for reading, and readers might even be > starved which makes the situation even worse. > - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DC's > - high write throughput (10 INSERT statements per second and more during > peak times). > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
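The failure mode in the analysis above — every thread walking the whole backlog with {{iterator.next()}} to expire messages, serializing readers and writers on the queue — can be contrasted with a cheaper scheme in a small sketch. This is not Cassandra's actual OutboundTcpConnection code; it is an assumed simplification showing expiration that only inspects the head of the queue (O(1) per drop instead of O(queue size) per walk) and counts what it drops, as discussed later in this thread.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: drop expired messages from the head only, rather than
// having every thread traverse (and lock) the entire backlog.
class BacklogExpiration
{
    static final class QueuedMessage
    {
        final long timestampNanos;
        QueuedMessage(long timestampNanos) { this.timestampNanos = timestampNanos; }
        boolean isTimedOut(long nowNanos, long timeoutNanos)
        {
            return nowNanos - timestampNanos > timeoutNanos;
        }
    }

    final ConcurrentLinkedQueue<QueuedMessage> backlog = new ConcurrentLinkedQueue<>();
    final AtomicLong droppedMessages = new AtomicLong(); // count drops instead of silently discarding

    // Poll messages, dropping expired ones as they surface at the head:
    // no full traversal, so threads are not serialized on an iterator.
    QueuedMessage pollNonExpired(long nowNanos, long timeoutNanos)
    {
        QueuedMessage m;
        while ((m = backlog.poll()) != null)
        {
            if (!m.isTimedOut(nowNanos, timeoutNanos))
                return m;
            droppedMessages.incrementAndGet();
        }
        return null;
    }

    public static void main(String[] args)
    {
        BacklogExpiration q = new BacklogExpiration();
        q.backlog.add(new QueuedMessage(0));                  // long expired
        q.backlog.add(new QueuedMessage(System.nanoTime()));  // fresh
        QueuedMessage next = q.pollNonExpired(System.nanoTime(), 1_000_000L);
        System.out.println("dropped=" + q.droppedMessages.get() + ", gotMessage=" + (next != null));
    }
}
```

With 324 threads and 262508 queued messages, the difference between per-walk O(n) and per-drop O(1) is exactly the thrashing described above.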
[jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898266#comment-15898266 ] Jason Brown edited comment on CASSANDRA-13265 at 3/6/17 10:23 PM: -- bq. we were supposed to mark those as dropped We probably should count them as dropped as we are dropping them, huh ;) fwiw, looks like it's always been implemented this way, since CASSANDRA-3005. So, yes, please update the counter. was (Author: jasobrown): bq. we were supposed to mark those as dropped We probably should count them as dropped as we are dropping them, huh ;) fwiw, looks like it's always been implemented this way, since CASSANDRA-3005 > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate to the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node from the cluster fixes the issue. > Before going in to details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A Threaddump in this situation showed 324 Threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. > - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. 
Each of the > Thread fully locks the Queue for reading and writing by calling > iterator.next(), making the situation worse and worse. > - Writing: Only after 262508 locking operation it can progress with actually > writing to the Queue. > - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and > fully lock the Queue > This means: Writing blocks the Queue for reading, and readers might even be > starved which makes the situation even worse. > - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DC's > - high write throughput (10 INSERT statements per second and more during > peak times). > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread
[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898249#comment-15898249 ] Ariel Weisberg commented on CASSANDRA-13265: I think you are correct that we were supposed to mark those as dropped. [~jasobrown] do you agree? I think the approximate nanoTime/currenTimeMillis approach where a single thread periodically updates the time is reasonable. If you added nanoTime to ApproximateTime with it's own configuration I think it would be fine to use it in this context. > Expiration in OutboundTcpConnection can block the reader Thread > --- > > Key: CASSANDRA-13265 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13265 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 3.0.9 > Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version > 1.8.0_112-b15) > Linux 3.16 >Reporter: Christian Esken >Assignee: Christian Esken > Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, > cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz > > > I observed that sometimes a single node in a Cassandra cluster fails to > communicate to the other nodes. This can happen at any time, during peak load > or low load. Restarting that single node from the cluster fixes the issue. > Before going in to details, I want to state that I have analyzed the > situation and am already developing a possible fix. Here is the analysis so > far: > - A Threaddump in this situation showed 324 Threads in the > OutboundTcpConnection class that want to lock the backlog queue for doing > expiration. > - A class histogram shows 262508 instances of > OutboundTcpConnection$QueuedMessage. > What is the effect of it? As soon as the Cassandra node has reached a certain > amount of queued messages, it starts thrashing itself to death. Each of the > Thread fully locks the Queue for reading and writing by calling > iterator.next(), making the situation worse and worse. 
> - Writing: Only after 262508 locking operation it can progress with actually > writing to the Queue. > - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and > fully lock the Queue > This means: Writing blocks the Queue for reading, and readers might even be > starved which makes the situation even worse. > - > The setup is: > - 3-node cluster > - replication factor 2 > - Consistency LOCAL_ONE > - No remote DC's > - high write throughput (10 INSERT statements per second and more during > peak times). > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
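The "approximate nanoTime where a single thread periodically updates the time" idea from the comment above can be sketched as below. This is a minimal sketch assuming a configurable precision, loosely modeled on the ApproximateTime approach mentioned (which caches the current time); it is not the actual Cassandra class. Hot paths read a volatile instead of calling {{System.nanoTime()}} themselves, at the cost of staleness bounded by the refresh interval.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: one scheduled daemon thread refreshes a cached nanoTime so that
// expiration checks on hot paths only pay for a volatile read.
class ApproximateNanoTime
{
    private static volatile long now = System.nanoTime();
    private static final long PRECISION_MS = 10; // assumed to be configurable

    private static final ScheduledExecutorService updater =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "approximate-nanotime");
            t.setDaemon(true);
            return t;
        });

    static
    {
        updater.scheduleAtFixedRate(() -> now = System.nanoTime(),
                                    PRECISION_MS, PRECISION_MS, TimeUnit.MILLISECONDS);
    }

    // Cheap read; stale by at most roughly PRECISION_MS plus scheduling jitter.
    static long nanoTime()
    {
        return now;
    }

    public static void main(String[] args) throws InterruptedException
    {
        long t1 = nanoTime();
        Thread.sleep(50);
        System.out.println("cached time advanced: " + (nanoTime() > t1));
    }
}
```

For expiring queued messages, precision on the order of milliseconds is plenty, which is why this trade-off is reasonable in this context.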
[jira] [Created] (CASSANDRA-13304) Add checksumming to the native protocol
Michael Kjellman created CASSANDRA-13304: Summary: Add checksumming to the native protocol Key: CASSANDRA-13304 URL: https://issues.apache.org/jira/browse/CASSANDRA-13304 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Michael Kjellman Assignee: Michael Kjellman The native binary transport implementation doesn't include checksums. This makes it highly susceptible to silently inserting corrupted data either due to hardware issues causing bit flips on the sender/client side, C*/receiver side, or network in between. Attaching an implementation that makes checksum'ing mandatory (assuming both client and server know about a protocol version that supports checksums) -- and also adds checksumming to clients that request compression. The serialized format looks something like this: {noformat} * 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Number of Compressed Chunks | Compressed Length (e1)/ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * / Compressed Length cont. (e1) |Uncompressed Length (e1) / * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Uncompressed Length cont. (e1)| CRC32 Checksum of Lengths (e1)| * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Checksum of Lengths cont. 
(e1)|Compressed Bytes (e1)+// * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | CRC32 Checksum (e1) || * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * |Compressed Length (e2) | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Uncompressed Length (e2)| * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * |CRC32 Checksum of Lengths (e2) | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Compressed Bytes (e2) +// * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | CRC32 Checksum (e2) || * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * |Compressed Length (en) | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Uncompressed Length (en)| * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * |CRC32 Checksum of Lengths (en) | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | Compressed Bytes (en) +// * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | CRC32 Checksum (en) || * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ {noformat} The first pass here adds checksums only to the actual contents of the frame body itself (and doesn't actually checksum lengths and headers). While it would be great to fully add checksuming across the entire protocol, the proposed implementation will ensure we at least catch corrupted data and likely protect ourselves pretty well anyways. I didn't go to the trouble of implementing a Snappy Checksum'ed Compressor implementation as it's been deprecated for a while -- is really slow and crappy compared to LZ4 -- and we should do everything in our power to make sure no one in the community is still using it. I left it in (for obvious backwards compatibility aspects) old for clients that don't know about the new protocol. 
The current protocol has a 256MB (max) frame body -- where the serialized contents are simply written into the frame body. If the client sends a compression option in the startup, we will install a FrameCompressor inline. Unfortunately, we decided to treat the frame body separately from the header bits etc in a given message. So, instead we put a compressor implementation in the options and then, if it's not null, we push the serialized bytes for the frame body *only* through the given FrameCompressor implementation. The existing implementations simply provide all the bytes for the frame body in one go to the compressor implementation and then serialize it with the length of the compressed bytes up front. Unfortunately, this won't work for checksum'ing for obvious reasons: we can't naively checksum the entire (potentially) 256MB frame body and slap it at the end... so, the best place to start with the changes is in {{ChecksumedCompressor}}. I
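The per-chunk layout in the diagram above — two length fields, a CRC32 over those lengths, the chunk bytes, then a CRC32 over the bytes — can be sketched as follows. This mirrors the described wire format only; it is not the attached patch, and the class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch of one (e_n) entry of the checksummed chunk format described above:
// [compressed length][uncompressed length][CRC32 of lengths][bytes][CRC32 of bytes]
class ChecksummedChunk
{
    static ByteBuffer write(byte[] compressed, int uncompressedLength)
    {
        CRC32 crc = new CRC32();
        byte[] lengths = ByteBuffer.allocate(8)
                                   .putInt(compressed.length)
                                   .putInt(uncompressedLength)
                                   .array();
        crc.update(lengths, 0, 8);
        int lengthsCrc = (int) crc.getValue();

        crc.reset();
        crc.update(compressed, 0, compressed.length);
        int payloadCrc = (int) crc.getValue();

        return (ByteBuffer) ByteBuffer.allocate(8 + 4 + compressed.length + 4)
                                      .putInt(compressed.length)
                                      .putInt(uncompressedLength)
                                      .putInt(lengthsCrc)
                                      .put(compressed)
                                      .putInt(payloadCrc)
                                      .flip();
    }

    // Returns the chunk bytes, or throws if either checksum fails.
    static byte[] read(ByteBuffer in)
    {
        int compressedLength = in.getInt();
        int uncompressedLength = in.getInt();
        int expectedLengthsCrc = in.getInt();

        CRC32 crc = new CRC32();
        byte[] lengths = ByteBuffer.allocate(8).putInt(compressedLength).putInt(uncompressedLength).array();
        crc.update(lengths, 0, 8);
        if ((int) crc.getValue() != expectedLengthsCrc)
            throw new IllegalStateException("corrupt lengths");

        byte[] payload = new byte[compressedLength];
        in.get(payload);
        crc.reset();
        crc.update(payload, 0, payload.length);
        if ((int) crc.getValue() != in.getInt())
            throw new IllegalStateException("corrupt payload");
        return payload;
    }

    public static void main(String[] args)
    {
        byte[] payload = "example-chunk".getBytes();
        byte[] back = read(write(payload, payload.length));
        System.out.println("roundtrip ok: " + java.util.Arrays.equals(payload, back));
    }
}
```

Checksumming the lengths separately is what lets a reader reject a corrupted length field before trusting it to size a buffer for a potentially huge chunk.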
[jira] [Updated] (CASSANDRA-13304) Add checksumming to the native protocol
[ https://issues.apache.org/jira/browse/CASSANDRA-13304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Kjellman updated CASSANDRA-13304: - Attachment: 13304_v1.diff > Add checksumming to the native protocol > --- > > Key: CASSANDRA-13304 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13304 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Michael Kjellman >Assignee: Michael Kjellman > Attachments: 13304_v1.diff > > > The native binary transport implementation doesn't include checksums. This > makes it highly susceptible to silently inserting corrupted data either due > to hardware issues causing bit flips on the sender/client side, C*/receiver > side, or network in between. > Attaching an implementation that makes checksum'ing mandatory (assuming both > client and server know about a protocol version that supports checksums) -- > and also adds checksumming to clients that request compression. > The serialized format looks something like this: > {noformat} > * 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 > * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Number of Compressed Chunks | Compressed Length (e1)/ > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * / Compressed Length cont. (e1) |Uncompressed Length (e1) / > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Uncompressed Length cont. (e1)| CRC32 Checksum of Lengths (e1)| > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Checksum of Lengths cont. 
(e1)|Compressed Bytes (e1)+// > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | CRC32 Checksum (e1) || > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * |Compressed Length (e2) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Uncompressed Length (e2)| > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * |CRC32 Checksum of Lengths (e2) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Compressed Bytes (e2) +// > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | CRC32 Checksum (e2) || > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * |Compressed Length (en) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Uncompressed Length (en)| > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * |CRC32 Checksum of Lengths (en) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Compressed Bytes (en) +// > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | CRC32 Checksum (en) || > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > {noformat} > The first pass here adds checksums only to the actual contents of the frame > body itself (and doesn't actually checksum lengths and headers). While it > would be great to fully add checksuming across the entire protocol, the > proposed implementation will ensure we at least catch corrupted data and > likely protect ourselves pretty well anyways. > I didn't go to the trouble of implementing a Snappy Checksum'ed Compressor > implementation as it's been deprecated for a while -- is really slow and > crappy compared to LZ4 -- and we should do everything in our power to make > sure no one in the community is still using it. I left it in (for obvious > backwards compatibility aspects) old for clients that don't know about the > new protocol. 
> The current protocol has a 256MB (max) frame body -- where the serialized > contents are simply written in to the frame body. > If the client sends a compression option in the startup, we will install a > FrameCompressor inline. Unfortunately, we went with a decision to treat the > frame body separately from the header bits etc in a given message. So, > instead we put a compressor implementation in the options and then if it's > not null, we push the serialized bytes for the frame body *only* thru the > given FrameCompressor implementation. The existing implementations simply > provide all the bytes for the frame body in one go to the compressor
[jira] [Updated] (CASSANDRA-13304) Add checksumming to the native protocol
[ https://issues.apache.org/jira/browse/CASSANDRA-13304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Kjellman updated CASSANDRA-13304: - Status: Patch Available (was: Open) > Add checksumming to the native protocol > --- > > Key: CASSANDRA-13304 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13304 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Michael Kjellman >Assignee: Michael Kjellman > Attachments: 13304_v1.diff > > > The native binary transport implementation doesn't include checksums. This > makes it highly susceptible to silently inserting corrupted data either due > to hardware issues causing bit flips on the sender/client side, C*/receiver > side, or network in between. > Attaching an implementation that makes checksum'ing mandatory (assuming both > client and server know about a protocol version that supports checksums) -- > and also adds checksumming to clients that request compression. > The serialized format looks something like this: > {noformat} > * 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 > * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Number of Compressed Chunks | Compressed Length (e1)/ > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * / Compressed Length cont. (e1) |Uncompressed Length (e1) / > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Uncompressed Length cont. (e1)| CRC32 Checksum of Lengths (e1)| > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Checksum of Lengths cont. 
(e1)|Compressed Bytes (e1)+// > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | CRC32 Checksum (e1) || > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * |Compressed Length (e2) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Uncompressed Length (e2)| > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * |CRC32 Checksum of Lengths (e2) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Compressed Bytes (e2) +// > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | CRC32 Checksum (e2) || > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * |Compressed Length (en) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Uncompressed Length (en)| > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * |CRC32 Checksum of Lengths (en) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Compressed Bytes (en) +// > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | CRC32 Checksum (en) || > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > {noformat} > The first pass here adds checksums only to the actual contents of the frame > body itself (and doesn't actually checksum lengths and headers). While it > would be great to fully add checksuming across the entire protocol, the > proposed implementation will ensure we at least catch corrupted data and > likely protect ourselves pretty well anyways. > I didn't go to the trouble of implementing a Snappy Checksum'ed Compressor > implementation as it's been deprecated for a while -- is really slow and > crappy compared to LZ4 -- and we should do everything in our power to make > sure no one in the community is still using it. I left it in (for obvious > backwards compatibility aspects) old for clients that don't know about the > new protocol. 
> The current protocol has a 256MB (max) frame body -- where the serialized > contents are simply written in to the frame body. > If the client sends a compression option in the startup, we will install a > FrameCompressor inline. Unfortunately, we went with a decision to treat the > frame body separately from the header bits etc in a given message. So, > instead we put a compressor implementation in the options and then if it's > not null, we push the serialized bytes for the frame body *only* thru the > given FrameCompressor implementation. The existing implementations simply > provide all the bytes for the frame body in one go to the
[jira] [Commented] (CASSANDRA-13041) Do not allow removal of a DC from system_auth replication settings if the DC has active Cassandra instances
[ https://issues.apache.org/jira/browse/CASSANDRA-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898527#comment-15898527 ] Nachiket Patil commented on CASSANDRA-13041: [~jjirsa] Yes. In a way the change I am proposing is opposite of CASSANDRA-12510. CASSANDRA-12510 still has a way to force decommission an instance if it is violating the RF. `system.auth` is a special case. This patch prevents dropping a DC from replication factor when there are enough hosts to satisfy the RF or less than RF in that DC. > Do not allow removal of a DC from system_auth replication settings if the DC > has active Cassandra instances > --- > > Key: CASSANDRA-13041 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13041 > Project: Cassandra > Issue Type: Improvement > Components: Distributed Metadata >Reporter: Nachiket Patil >Assignee: Nachiket Patil >Priority: Minor > Fix For: 4.x > > Attachments: trunk.diff > > > I don’t believe it is ever correct to remove a DC from the system_auth > replication settings while there are nodes up in that DC. Cassandra should > not allow this change if there are hosts which are currently members of the > cluster in that DC, as any request which is routed to these hosts will meet > an unavailable. Also dropping the keyspace system_auth should not be allowed. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
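The check being proposed can be sketched as below. This is a hypothetical simplification — the real patch hooks into Cassandra's schema alteration path, not a free-standing validator — but it captures the rule: reject an ALTER of system_auth replication that drops a DC which still has live members.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the validation: a DC may only be removed from
// system_auth replication settings once it has no live member nodes.
class AuthReplicationValidator
{
    static void validate(Set<String> currentDcs, Set<String> proposedDcs, Map<String, Integer> liveNodesPerDc)
    {
        for (String dc : currentDcs)
        {
            if (!proposedDcs.contains(dc) && liveNodesPerDc.getOrDefault(dc, 0) > 0)
                throw new IllegalArgumentException(
                    "Cannot remove DC " + dc + " from system_auth replication: it still has live nodes");
        }
    }

    public static void main(String[] args)
    {
        // dc2 has no live nodes, so dropping it from replication is allowed
        validate(Set.of("dc1", "dc2"), Set.of("dc1"), Map.of("dc1", 3, "dc2", 0));
        System.out.println("dropping an empty DC is allowed");
    }
}
```

Requests routed to auth data in a DC that was removed while still populated would meet an unavailable, which is exactly what this guard prevents.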
[jira] [Commented] (CASSANDRA-13294) Possible data loss on upgrade 2.1 - 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898736#comment-15898736 ] Michael Shuler commented on CASSANDRA-13294: Thanks all! I started all the cassandra-3.0 branch tests on commit {{1ba68a1}} and will check on them in the morning for a look at a release. > Possible data loss on upgrade 2.1 - 3.0 > --- > > Key: CASSANDRA-13294 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13294 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Marcus Eriksson >Assignee: Stefania >Priority: Blocker > Fix For: 3.0.12, 3.11.0 > > > After finishing a compaction we delete the compacted away files. This is done > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogFile.java#L328-L337] > which uses > [this|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L265-L271] > to get the files - we get all files starting with {{absoluteFilePath}}. > Absolute file path is generated > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/sstable/Descriptor.java#L142-L153]. > For 3.0 version files the filename looks like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/mc-1332-big}} > but for 2.1 version files, they look like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/keyspace1-standard1-ka-2}}. > The problem is then that if we were to finish a compaction including the > legacy file, we would actually delete all legacy files having a generation > starting with '2' -- This message was sent by Atlassian JIRA (v6.3.15#6346)
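The bug in the description above comes down to bare string-prefix matching of sstable filenames, and the fix title ("adding component separator to LogRecord absolute path") suggests the remedy: append the separator before matching. A small sketch (assumed simplification of the real LogRecord code) makes the legacy-filename hazard concrete.

```java
// Sketch of the prefix-matching bug: with legacy 2.1 filenames the generation
// is the last dash-separated token, so a bare prefix for generation 2 also
// matches generations 20-29, 200-299, etc. Appending the component separator
// ("-") before matching disambiguates.
class PrefixMatch
{
    static boolean unsafeMatch(String absolutePath, String file)
    {
        return file.startsWith(absolutePath); // deletes too much
    }

    static boolean safeMatch(String absolutePath, String file)
    {
        return file.startsWith(absolutePath + "-"); // generation must end here
    }

    public static void main(String[] args)
    {
        String prefix = "/blabla/keyspace1/standard1-bdb0/keyspace1-standard1-ka-2";
        String otherGeneration = "/blabla/keyspace1/standard1-bdb0/keyspace1-standard1-ka-25-Data.db";
        System.out.println("unsafe matches generation 25: " + unsafeMatch(prefix, otherGeneration));
        System.out.println("safe matches generation 25: " + safeMatch(prefix, otherGeneration));
    }
}
```

3.0-style names ({{mc-1332-big}}) never collided this way in practice during the report, but any generation sharing a leading digit with the compacted one was at risk with the legacy {{ka}} format — hence the data-loss severity.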
[jira] [Updated] (CASSANDRA-13289) Make it possible to monitor an ideal consistency level separate from actual consistency level
[ https://issues.apache.org/jira/browse/CASSANDRA-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-13289: --- Fix Version/s: 4.0 Status: Patch Available (was: Open) ||Code|utests|dtests|| |[trunk|https://github.com/apache/cassandra/compare/trunk...aweisberg:cassandra-13289?expand=1]|[utests|https://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-cassandra-13289-testall/2/]|[dtests|https://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-cassandra-13289-dtest/2/]| > Make it possible to monitor an ideal consistency level separate from actual > consistency level > - > > Key: CASSANDRA-13289 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13289 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 4.0 > > > As an operator there are several issues related to multi-datacenter > replication and consistency you may want to have more information on from > your production database. > For instance. If your application writes at LOCAL_QUORUM how often are those > writes failing to achieve EACH_QUORUM at other data centers. If you failed > your application over to one of those data centers roughly how inconsistent > might it be given the number of writes that didn't propagate since the last > incremental repair? > You might also want to know roughly what the latency of writes would be if > you switched to a different consistency level. For instance you are writing > at LOCAL_QUORUM and want to know what would happen if you switched to > EACH_QUORUM. > The proposed change is to allow an ideal_consistency_level to be specified in > cassandra.yaml as well as get/set via JMX. If no ideal consistency level is > specified no additional tracking is done. 
> If an ideal consistency level is specified then the > {{AbstractWriteResponseHandler}} will contain a delegate WriteResponseHandler > that tracks whether the ideal consistency level is met before a write times > out. It also tracks the latency for achieving the ideal CL of successful > writes. > These two metrics would be reported on a per keyspace basis. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13294) Possible data loss on upgrade 2.1 - 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania updated CASSANDRA-13294: - Resolution: Fixed Fix Version/s: (was: 3.11.x) (was: 3.0.x) 3.11.0 3.0.12 Status: Resolved (was: Ready to Commit) > Possible data loss on upgrade 2.1 - 3.0 > --- > > Key: CASSANDRA-13294 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13294 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Marcus Eriksson >Assignee: Stefania >Priority: Blocker > Fix For: 3.0.12, 3.11.0 > > > After finishing a compaction we delete the compacted away files. This is done > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogFile.java#L328-L337] > which uses > [this|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L265-L271] > to get the files - we get all files starting with {{absoluteFilePath}}. > Absolute file path is generated > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/sstable/Descriptor.java#L142-L153]. > For 3.0 version files the filename looks like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/mc-1332-big}} > but for 2.1 version files, they look like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/keyspace1-standard1-ka-2}}. > The problem is then that if we were to finish a compaction including the > legacy file, we would actually delete all legacy files having a generation > starting with '2' -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13294) Possible data loss on upgrade 2.1 - 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania updated CASSANDRA-13294: - Component/s: Local Write-Read Paths > Possible data loss on upgrade 2.1 - 3.0 > --- > > Key: CASSANDRA-13294 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13294 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Marcus Eriksson >Assignee: Stefania >Priority: Blocker > Fix For: 3.0.12, 3.11.0 > > > After finishing a compaction we delete the compacted away files. This is done > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogFile.java#L328-L337] > which uses > [this|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L265-L271] > to get the files - we get all files starting with {{absoluteFilePath}}. > Absolute file path is generated > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/sstable/Descriptor.java#L142-L153]. > For 3.0 version files the filename looks like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/mc-1332-big}} > but for 2.1 version files, they look like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/keyspace1-standard1-ka-2}}. > The problem is then that if we were to finish a compaction including the > legacy file, we would actually delete all legacy files having a generation > starting with '2' -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13294) Possible data loss on upgrade 2.1 - 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898532#comment-15898532 ] Stefania commented on CASSANDRA-13294: -- Thank you for the review and the test. Committed to 3.0 as {{1ba68a1e5d681c091e2c53e7720029f10591e7ef}} and merged into 3.11. Then merged into trunk with {{-s ours}}. > Possible data loss on upgrade 2.1 - 3.0 > --- > > Key: CASSANDRA-13294 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13294 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Stefania >Priority: Blocker > Fix For: 3.0.x, 3.11.x > > > After finishing a compaction we delete the compacted away files. This is done > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogFile.java#L328-L337] > which uses > [this|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java#L265-L271] > to get the files - we get all files starting with {{absoluteFilePath}}. > Absolute file path is generated > [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/sstable/Descriptor.java#L142-L153]. > For 3.0 version files the filename looks like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/mc-1332-big}} > but for 2.1 version files, they look like this: > {{/blabla/keyspace1/standard1-bdb031c0ff7b11e6940fdd0479dd8912/keyspace1-standard1-ka-2}}. > The problem is then that if we were to finish a compaction including the > legacy file, we would actually delete all legacy files having a generation > starting with '2' -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13001) pluggable slow query logging / handling
[ https://issues.apache.org/jira/browse/CASSANDRA-13001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898600#comment-15898600 ] Murukesh Mohanan commented on CASSANDRA-13001: -- {{MonitoringTask}} also logs timed-out queries using code nearly identical to that used for slow queries. Does it make sense to move both of them to the same (pluggable) logging method? If so, do I pass something like a {{Type: "slow"}} or {{Type: "time-out"}} to the method doing the logging? > pluggable slow query logging / handling > --- > > Key: CASSANDRA-13001 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13001 > Project: Cassandra > Issue Type: New Feature >Reporter: Jon Haddad >Assignee: Murukesh Mohanan > Fix For: 4.0 > > Attachments: > 0001-Add-multiple-logging-methods-for-slow-queries-CASSAN.patch > > > Currently CASSANDRA-12403 logs slow queries as DEBUG to a file. It would be > better to have this as an interface so we can log to alternative > locations, such as to a table on the cluster or to a remote location (statsd, > graphite, etc).
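One possible shape for the shared, pluggable logging hook discussed in the comment — with an operation kind passed in, as suggested — might look like this (a minimal Python sketch with hypothetical names; not an actual Cassandra API):

```python
from enum import Enum

class OperationKind(Enum):
    """Hypothetical discriminator passed to the shared logging method."""
    SLOW = "slow"
    TIMEOUT = "time-out"

class QueryLogger:
    """Pluggable sink; implementations could write to a file, a table, or statsd."""
    def log(self, kind, query, elapsed_ms):
        raise NotImplementedError

class InMemoryQueryLogger(QueryLogger):
    """Trivial implementation used here only to demonstrate the interface."""
    def __init__(self):
        self.records = []
    def log(self, kind, query, elapsed_ms):
        self.records.append((kind, query, elapsed_ms))

logger = InMemoryQueryLogger()
logger.log(OperationKind.SLOW, "SELECT * FROM t", 612)       # from the slow-query path
logger.log(OperationKind.TIMEOUT, "SELECT * FROM u", 10000)  # from MonitoringTask
assert [k for k, _, _ in logger.records] == [OperationKind.SLOW, OperationKind.TIMEOUT]
```

With a single interface like this, both {{MonitoringTask}} and the slow-query path could share one pluggable sink and diverge only in the kind they pass.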
[2/6] cassandra git commit: Prevent data loss on upgrade 2.1 - 3.0 by adding component separator to LogRecord absolute path
Prevent data loss on upgrade 2.1 - 3.0 by adding component separator to LogRecord absolute path patch by Stefania Alborghetti; reviewed by Marcus Eriksson for CASSANDRA-13294 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/1ba68a1e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/1ba68a1e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/1ba68a1e Branch: refs/heads/cassandra-3.11 Commit: 1ba68a1e5d681c091e2c53e7720029f10591e7ef Parents: 77d45ea Author: Stefania AlborghettiAuthored: Mon Mar 6 09:45:56 2017 +0800 Committer: Stefania Alborghetti Committed: Mon Mar 6 10:32:32 2017 +0800 -- CHANGES.txt | 1 + .../cassandra/db/lifecycle/LogRecord.java | 31 .../db/lifecycle/LogTransactionTest.java| 4 +-- 3 files changed, 29 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/1ba68a1e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 076b337..36f058b 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.0.12 + * Prevent data loss on upgrade 2.1 - 3.0 by adding component separator to LogRecord absolute path (CASSANDRA-13294) * Improve testing on macOS by eliminating sigar logging (CASSANDRA-13233) * Cqlsh copy-from should error out when csv contains invalid data for collections (CASSANDRA-13071) * Update c.yaml doc for offheap memtables (CASSANDRA-13179) http://git-wip-us.apache.org/repos/asf/cassandra/blob/1ba68a1e/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java -- diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java b/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java index c981b02..ac6d6d0 100644 --- a/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java +++ b/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java @@ -29,6 +29,7 @@ import java.util.regex.Pattern; import java.util.stream.Collectors; import java.util.zip.CRC32; +import org.apache.cassandra.io.sstable.Component; 
import org.apache.cassandra.io.sstable.SSTable; import org.apache.cassandra.io.util.FileUtils; import org.apache.cassandra.utils.FBUtilities; @@ -123,7 +124,7 @@ final class LogRecord Type type = Type.fromPrefix(matcher.group(1)); return new LogRecord(type, - matcher.group(2), + matcher.group(2) + Component.separator, // see comment on CASSANDRA-13294 below Long.valueOf(matcher.group(3)), Integer.valueOf(matcher.group(4)), Long.valueOf(matcher.group(5)), line); @@ -146,7 +147,11 @@ final class LogRecord public static LogRecord make(Type type, SSTable table) { -String absoluteTablePath = FileUtils.getCanonicalPath(table.descriptor.baseFilename()); +// CASSANDRA-13294: add the sstable component separator because for legacy (2.1) files +// there is no separator after the generation number, and this would cause files of sstables with +// a higher generation number that starts with the same number, to be incorrectly classified as files +// of this record sstable +String absoluteTablePath = FileUtils.getCanonicalPath(table.descriptor.baseFilename() + Component.separator); return make(type, getExistingFiles(absoluteTablePath), table.getAllFilePaths().size(), absoluteTablePath); } @@ -188,7 +193,7 @@ final class LogRecord assert !type.hasFile() || absolutePath != null : "Expected file path for file records"; this.type = type; -this.absolutePath = type.hasFile() ? Optional.of(absolutePath) : Optional.empty(); +this.absolutePath = type.hasFile() ? Optional.of(absolutePath) : Optional.empty(); this.updateTime = type == Type.REMOVE ? updateTime : 0; this.numFiles = type.hasFile() ? numFiles : 0; this.status = new Status(); @@ -287,9 +292,25 @@ final class LogRecord : false; } -String absolutePath() +/** + * Return the absolute path, if present, except for the last character (the descriptor separator), or + * the empty string if the record has no path. This method is only to be used internally for writing + * the record to file or computing the checksum. 
+ * + * CASSANDRA-13294: the last character of the absolute path is the descriptor separator, it is removed + * from the absolute path for backward compatibility, to make sure
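The backward-compatibility trick described in the {{absolutePath()}} javadoc above can be sketched as follows (Python, hypothetical path; an illustration, not the patch itself): the separator is kept on the in-memory path used for prefix matching, but stripped before the record is serialized or checksummed, so records written by patched nodes stay identical to those written by unpatched ones.

```python
SEPARATOR = "-"  # stands in for Component.separator

def stored_path(absolute_path_with_separator):
    """Strip the trailing component separator before writing/checksumming the record."""
    assert absolute_path_with_separator.endswith(SEPARATOR)
    return absolute_path_with_separator[:-1]

# In memory the path carries the separator so file matching is unambiguous...
in_memory = "/data/keyspace1/standard1-abc/mc-1332-big" + SEPARATOR
# ...but what hits the transaction log matches the pre-patch format:
assert stored_path(in_memory) == "/data/keyspace1/standard1-abc/mc-1332-big"
```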
[4/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f070f1e2 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f070f1e2 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f070f1e2 Branch: refs/heads/trunk Commit: f070f1e206e306a4b3ae5cca0cb780cb780357a1 Parents: 7f5dc69 1ba68a1 Author: Stefania AlborghettiAuthored: Tue Mar 7 09:17:16 2017 +0800 Committer: Stefania Alborghetti Committed: Tue Mar 7 09:17:16 2017 +0800 -- CHANGES.txt | 1 + .../cassandra/db/lifecycle/LogRecord.java | 31 .../db/lifecycle/LogTransactionTest.java| 4 +-- 3 files changed, 29 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/f070f1e2/CHANGES.txt -- diff --cc CHANGES.txt index b91edb4,36f058b..3852a64 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,18 -1,8 +1,19 @@@ -3.0.12 +3.11.0 + * Fix equality comparisons of columns using the duration type (CASSANDRA-13174) + * Obfuscate password in stress-graphs (CASSANDRA-12233) + * Move to FastThreadLocalThread and FastThreadLocal (CASSANDRA-13034) + * nodetool stopdaemon errors out (CASSANDRA-13030) + * Tables in system_distributed should not use gcgs of 0 (CASSANDRA-12954) + * Fix primary index calculation for SASI (CASSANDRA-12910) + * More fixes to the TokenAllocator (CASSANDRA-12990) + * NoReplicationTokenAllocator should work with zero replication factor (CASSANDRA-12983) + * Address message coalescing regression (CASSANDRA-12676) +Merged from 3.0: + * Prevent data loss on upgrade 2.1 - 3.0 by adding component separator to LogRecord absolute path (CASSANDRA-13294) * Improve testing on macOS by eliminating sigar logging (CASSANDRA-13233) * Cqlsh copy-from should error out when csv contains invalid data for collections (CASSANDRA-13071) - * Update c.yaml doc for offheap memtables (CASSANDRA-13179) + * Fix "multiple versions of ant detected..." 
when running ant test (CASSANDRA-13232) + * Coalescing strategy sleeps too much (CASSANDRA-13090) * Faster StreamingHistogram (CASSANDRA-13038) * Legacy deserializer can create unexpected boundary range tombstones (CASSANDRA-13237) * Remove unnecessary assertion from AntiCompactionTest (CASSANDRA-13070) http://git-wip-us.apache.org/repos/asf/cassandra/blob/f070f1e2/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java -- diff --cc src/java/org/apache/cassandra/db/lifecycle/LogRecord.java index 9c1ba31,ac6d6d0..eb8400d --- a/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java +++ b/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java @@@ -123,16 -124,14 +124,16 @@@ final class LogRecor Type type = Type.fromPrefix(matcher.group(1)); return new LogRecord(type, - matcher.group(2), + matcher.group(2) + Component.separator, // see comment on CASSANDRA-13294 below - Long.valueOf(matcher.group(3)), - Integer.valueOf(matcher.group(4)), - Long.valueOf(matcher.group(5)), line); + Long.parseLong(matcher.group(3)), + Integer.parseInt(matcher.group(4)), + Long.parseLong(matcher.group(5)), + line); } -catch (Throwable t) +catch (IllegalArgumentException e) { -return new LogRecord(Type.UNKNOWN, null, 0, 0, 0, line).setError(t); +return new LogRecord(Type.UNKNOWN, null, 0, 0, 0, line) + .setError(String.format("Failed to parse line: %s", e.getMessage())); } } http://git-wip-us.apache.org/repos/asf/cassandra/blob/f070f1e2/test/unit/org/apache/cassandra/db/lifecycle/LogTransactionTest.java --
[6/6] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/76542b66 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/76542b66 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/76542b66 Branch: refs/heads/trunk Commit: 76542b66673f151c73402ee037a58b5e54d3ae4a Parents: 1b55265 f070f1e Author: Stefania AlborghettiAuthored: Tue Mar 7 09:19:13 2017 +0800 Committer: Stefania Alborghetti Committed: Tue Mar 7 09:19:13 2017 +0800 -- --
[3/6] cassandra git commit: Prevent data loss on upgrade 2.1 - 3.0 by adding component separator to LogRecord absolute path
Prevent data loss on upgrade 2.1 - 3.0 by adding component separator to LogRecord absolute path patch by Stefania Alborghetti; reviewed by Marcus Eriksson for CASSANDRA-13294 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/1ba68a1e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/1ba68a1e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/1ba68a1e Branch: refs/heads/trunk Commit: 1ba68a1e5d681c091e2c53e7720029f10591e7ef Parents: 77d45ea Author: Stefania AlborghettiAuthored: Mon Mar 6 09:45:56 2017 +0800 Committer: Stefania Alborghetti Committed: Mon Mar 6 10:32:32 2017 +0800 -- CHANGES.txt | 1 + .../cassandra/db/lifecycle/LogRecord.java | 31 .../db/lifecycle/LogTransactionTest.java| 4 +-- 3 files changed, 29 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/1ba68a1e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 076b337..36f058b 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.0.12 + * Prevent data loss on upgrade 2.1 - 3.0 by adding component separator to LogRecord absolute path (CASSANDRA-13294) * Improve testing on macOS by eliminating sigar logging (CASSANDRA-13233) * Cqlsh copy-from should error out when csv contains invalid data for collections (CASSANDRA-13071) * Update c.yaml doc for offheap memtables (CASSANDRA-13179) http://git-wip-us.apache.org/repos/asf/cassandra/blob/1ba68a1e/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java -- diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java b/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java index c981b02..ac6d6d0 100644 --- a/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java +++ b/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java @@ -29,6 +29,7 @@ import java.util.regex.Pattern; import java.util.stream.Collectors; import java.util.zip.CRC32; +import org.apache.cassandra.io.sstable.Component; import 
org.apache.cassandra.io.sstable.SSTable; import org.apache.cassandra.io.util.FileUtils; import org.apache.cassandra.utils.FBUtilities; @@ -123,7 +124,7 @@ final class LogRecord Type type = Type.fromPrefix(matcher.group(1)); return new LogRecord(type, - matcher.group(2), + matcher.group(2) + Component.separator, // see comment on CASSANDRA-13294 below Long.valueOf(matcher.group(3)), Integer.valueOf(matcher.group(4)), Long.valueOf(matcher.group(5)), line); @@ -146,7 +147,11 @@ final class LogRecord public static LogRecord make(Type type, SSTable table) { -String absoluteTablePath = FileUtils.getCanonicalPath(table.descriptor.baseFilename()); +// CASSANDRA-13294: add the sstable component separator because for legacy (2.1) files +// there is no separator after the generation number, and this would cause files of sstables with +// a higher generation number that starts with the same number, to be incorrectly classified as files +// of this record sstable +String absoluteTablePath = FileUtils.getCanonicalPath(table.descriptor.baseFilename() + Component.separator); return make(type, getExistingFiles(absoluteTablePath), table.getAllFilePaths().size(), absoluteTablePath); } @@ -188,7 +193,7 @@ final class LogRecord assert !type.hasFile() || absolutePath != null : "Expected file path for file records"; this.type = type; -this.absolutePath = type.hasFile() ? Optional.of(absolutePath) : Optional.empty(); +this.absolutePath = type.hasFile() ? Optional.of(absolutePath) : Optional.empty(); this.updateTime = type == Type.REMOVE ? updateTime : 0; this.numFiles = type.hasFile() ? numFiles : 0; this.status = new Status(); @@ -287,9 +292,25 @@ final class LogRecord : false; } -String absolutePath() +/** + * Return the absolute path, if present, except for the last character (the descriptor separator), or + * the empty string if the record has no path. This method is only to be used internally for writing + * the record to file or computing the checksum. 
+ * + * CASSANDRA-13294: the last character of the absolute path is the descriptor separator, it is removed + * from the absolute path for backward compatibility, to make sure that on
[5/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f070f1e2 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f070f1e2 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f070f1e2 Branch: refs/heads/cassandra-3.11 Commit: f070f1e206e306a4b3ae5cca0cb780cb780357a1 Parents: 7f5dc69 1ba68a1 Author: Stefania AlborghettiAuthored: Tue Mar 7 09:17:16 2017 +0800 Committer: Stefania Alborghetti Committed: Tue Mar 7 09:17:16 2017 +0800 -- CHANGES.txt | 1 + .../cassandra/db/lifecycle/LogRecord.java | 31 .../db/lifecycle/LogTransactionTest.java| 4 +-- 3 files changed, 29 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/f070f1e2/CHANGES.txt -- diff --cc CHANGES.txt index b91edb4,36f058b..3852a64 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,18 -1,8 +1,19 @@@ -3.0.12 +3.11.0 + * Fix equality comparisons of columns using the duration type (CASSANDRA-13174) + * Obfuscate password in stress-graphs (CASSANDRA-12233) + * Move to FastThreadLocalThread and FastThreadLocal (CASSANDRA-13034) + * nodetool stopdaemon errors out (CASSANDRA-13030) + * Tables in system_distributed should not use gcgs of 0 (CASSANDRA-12954) + * Fix primary index calculation for SASI (CASSANDRA-12910) + * More fixes to the TokenAllocator (CASSANDRA-12990) + * NoReplicationTokenAllocator should work with zero replication factor (CASSANDRA-12983) + * Address message coalescing regression (CASSANDRA-12676) +Merged from 3.0: + * Prevent data loss on upgrade 2.1 - 3.0 by adding component separator to LogRecord absolute path (CASSANDRA-13294) * Improve testing on macOS by eliminating sigar logging (CASSANDRA-13233) * Cqlsh copy-from should error out when csv contains invalid data for collections (CASSANDRA-13071) - * Update c.yaml doc for offheap memtables (CASSANDRA-13179) + * Fix "multiple versions of ant detected..." 
when running ant test (CASSANDRA-13232) + * Coalescing strategy sleeps too much (CASSANDRA-13090) * Faster StreamingHistogram (CASSANDRA-13038) * Legacy deserializer can create unexpected boundary range tombstones (CASSANDRA-13237) * Remove unnecessary assertion from AntiCompactionTest (CASSANDRA-13070) http://git-wip-us.apache.org/repos/asf/cassandra/blob/f070f1e2/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java -- diff --cc src/java/org/apache/cassandra/db/lifecycle/LogRecord.java index 9c1ba31,ac6d6d0..eb8400d --- a/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java +++ b/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java @@@ -123,16 -124,14 +124,16 @@@ final class LogRecor Type type = Type.fromPrefix(matcher.group(1)); return new LogRecord(type, - matcher.group(2), + matcher.group(2) + Component.separator, // see comment on CASSANDRA-13294 below - Long.valueOf(matcher.group(3)), - Integer.valueOf(matcher.group(4)), - Long.valueOf(matcher.group(5)), line); + Long.parseLong(matcher.group(3)), + Integer.parseInt(matcher.group(4)), + Long.parseLong(matcher.group(5)), + line); } -catch (Throwable t) +catch (IllegalArgumentException e) { -return new LogRecord(Type.UNKNOWN, null, 0, 0, 0, line).setError(t); +return new LogRecord(Type.UNKNOWN, null, 0, 0, 0, line) + .setError(String.format("Failed to parse line: %s", e.getMessage())); } } http://git-wip-us.apache.org/repos/asf/cassandra/blob/f070f1e2/test/unit/org/apache/cassandra/db/lifecycle/LogTransactionTest.java --
[1/6] cassandra git commit: Prevent data loss on upgrade 2.1 - 3.0 by adding component separator to LogRecord absolute path
Repository: cassandra Updated Branches: refs/heads/cassandra-3.0 77d45ea53 -> 1ba68a1e5 refs/heads/cassandra-3.11 7f5dc696f -> f070f1e20 refs/heads/trunk 1b552658b -> 76542b666 Prevent data loss on upgrade 2.1 - 3.0 by adding component separator to LogRecord absolute path patch by Stefania Alborghetti; reviewed by Marcus Eriksson for CASSANDRA-13294 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/1ba68a1e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/1ba68a1e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/1ba68a1e Branch: refs/heads/cassandra-3.0 Commit: 1ba68a1e5d681c091e2c53e7720029f10591e7ef Parents: 77d45ea Author: Stefania AlborghettiAuthored: Mon Mar 6 09:45:56 2017 +0800 Committer: Stefania Alborghetti Committed: Mon Mar 6 10:32:32 2017 +0800 -- CHANGES.txt | 1 + .../cassandra/db/lifecycle/LogRecord.java | 31 .../db/lifecycle/LogTransactionTest.java| 4 +-- 3 files changed, 29 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/1ba68a1e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 076b337..36f058b 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.0.12 + * Prevent data loss on upgrade 2.1 - 3.0 by adding component separator to LogRecord absolute path (CASSANDRA-13294) * Improve testing on macOS by eliminating sigar logging (CASSANDRA-13233) * Cqlsh copy-from should error out when csv contains invalid data for collections (CASSANDRA-13071) * Update c.yaml doc for offheap memtables (CASSANDRA-13179) http://git-wip-us.apache.org/repos/asf/cassandra/blob/1ba68a1e/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java -- diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java b/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java index c981b02..ac6d6d0 100644 --- a/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java +++ 
b/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java @@ -29,6 +29,7 @@ import java.util.regex.Pattern; import java.util.stream.Collectors; import java.util.zip.CRC32; +import org.apache.cassandra.io.sstable.Component; import org.apache.cassandra.io.sstable.SSTable; import org.apache.cassandra.io.util.FileUtils; import org.apache.cassandra.utils.FBUtilities; @@ -123,7 +124,7 @@ final class LogRecord Type type = Type.fromPrefix(matcher.group(1)); return new LogRecord(type, - matcher.group(2), + matcher.group(2) + Component.separator, // see comment on CASSANDRA-13294 below Long.valueOf(matcher.group(3)), Integer.valueOf(matcher.group(4)), Long.valueOf(matcher.group(5)), line); @@ -146,7 +147,11 @@ final class LogRecord public static LogRecord make(Type type, SSTable table) { -String absoluteTablePath = FileUtils.getCanonicalPath(table.descriptor.baseFilename()); +// CASSANDRA-13294: add the sstable component separator because for legacy (2.1) files +// there is no separator after the generation number, and this would cause files of sstables with +// a higher generation number that starts with the same number, to be incorrectly classified as files +// of this record sstable +String absoluteTablePath = FileUtils.getCanonicalPath(table.descriptor.baseFilename() + Component.separator); return make(type, getExistingFiles(absoluteTablePath), table.getAllFilePaths().size(), absoluteTablePath); } @@ -188,7 +193,7 @@ final class LogRecord assert !type.hasFile() || absolutePath != null : "Expected file path for file records"; this.type = type; -this.absolutePath = type.hasFile() ? Optional.of(absolutePath) : Optional.empty(); +this.absolutePath = type.hasFile() ? Optional.of(absolutePath) : Optional.empty(); this.updateTime = type == Type.REMOVE ? updateTime : 0; this.numFiles = type.hasFile() ? 
numFiles : 0; this.status = new Status(); @@ -287,9 +292,25 @@ final class LogRecord : false; } -String absolutePath() +/** + * Return the absolute path, if present, except for the last character (the descriptor separator), or + * the empty string if the record has no path. This method is only to be used internally for writing + * the record to file or computing the checksum. +
[jira] [Comment Edited] (CASSANDRA-13041) Do not allow removal of a DC from system_auth replication settings if the DC has active Cassandra instances
[ https://issues.apache.org/jira/browse/CASSANDRA-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898527#comment-15898527 ] Nachiket Patil edited comment on CASSANDRA-13041 at 3/7/17 1:19 AM: [~jjirsa] Yes. In a way the change I am proposing is opposite of CASSANDRA-12510. CASSANDRA-12510 still has a way to force decommission an instance if it is violating the RF. `system.auth` is a special case. This patch prevents dropping a DC from replication factor when there are enough hosts to satisfy the RF or less than RF in that DC. was (Author: nachiket_patil): [~jjirsa] Yes. In a way the change I am proposing is opposite of CASSANDRA-12510. CASSANDRA-12510 still has a way to force decommission an instance if it is violating the RF. `system.auth` is a special case. This patch prevents dropping a DC from replication factor when there are enough hosts to satisfy the RF or less than RF in that DC. > Do not allow removal of a DC from system_auth replication settings if the DC > has active Cassandra instances > --- > > Key: CASSANDRA-13041 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13041 > Project: Cassandra > Issue Type: Improvement > Components: Distributed Metadata >Reporter: Nachiket Patil >Assignee: Nachiket Patil >Priority: Minor > Fix For: 4.x > > Attachments: trunk.diff > > > I don’t believe it is ever correct to remove a DC from the system_auth > replication settings while there are nodes up in that DC. Cassandra should > not allow this change if there are hosts which are currently members of the > cluster in that DC, as any request which is routed to these hosts will meet > an unavailable. Also dropping the keyspace system_auth should not be allowed. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-10671) Consider removing Config.index_interval
[ https://issues.apache.org/jira/browse/CASSANDRA-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-10671: - Resolution: Fixed Assignee: Robert Stupp Status: Resolved (was: Patch Available) Thanks! Committed as [46283cdc5b76ea053387a00254d27315027cd5b8|https://github.com/apache/cassandra/commit/46283cdc5b76ea053387a00254d27315027cd5b8] to [trunk|https://github.com/apache/cassandra/tree/trunk] > Consider removing Config.index_interval > --- > > Key: CASSANDRA-10671 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10671 > Project: Cassandra > Issue Type: Task >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Minor > Fix For: 4.0 > > > {{Config.index_interval}} is deprecated since 2.0.? and unused in 3.0.
cassandra git commit: Consider removing Config.index_interval
Repository: cassandra Updated Branches: refs/heads/trunk 76542b666 -> 46283cdc5 Consider removing Config.index_interval patch by Robert Stupp; reviewed by Jason Brown for CASSANDRA-10671 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/46283cdc Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/46283cdc Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/46283cdc Branch: refs/heads/trunk Commit: 46283cdc5b76ea053387a00254d27315027cd5b8 Parents: 76542b6 Author: Robert StuppAuthored: Tue Mar 7 06:32:28 2017 +0100 Committer: Robert Stupp Committed: Tue Mar 7 06:32:28 2017 +0100 -- CHANGES.txt | 1 + NEWS.txt | 1 + src/java/org/apache/cassandra/config/Config.java | 3 --- src/java/org/apache/cassandra/config/DatabaseDescriptor.java | 3 --- 4 files changed, 2 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/46283cdc/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index b00a141..6cf571a 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0 + * Remove config option index_interval (CASSANDRA-10671) * Reduce lock contention for collection types and serializers (CASSANDRA-13271) * Make it possible to override MessagingService.Verb ids (CASSANDRA-13283) * Avoid synchronized on prepareForRepair in ActiveRepairService (CASSANDRA-9292) http://git-wip-us.apache.org/repos/asf/cassandra/blob/46283cdc/NEWS.txt -- diff --git a/NEWS.txt b/NEWS.txt index d53d069..027786d7 100644 --- a/NEWS.txt +++ b/NEWS.txt @@ -43,6 +43,7 @@ Upgrading full and incremental repairs. For full repairs, data is no longer marked repaired. For incremental repairs, anticompaction is run at the beginning of the repair, instead of at the end. 
+- Config option index_interval has been removed (it was deprecated since 2.0) 3.11.0 == http://git-wip-us.apache.org/repos/asf/cassandra/blob/46283cdc/src/java/org/apache/cassandra/config/Config.java -- diff --git a/src/java/org/apache/cassandra/config/Config.java b/src/java/org/apache/cassandra/config/Config.java index dbad65e..36ce576 100644 --- a/src/java/org/apache/cassandra/config/Config.java +++ b/src/java/org/apache/cassandra/config/Config.java @@ -217,9 +217,6 @@ public class Config public InternodeCompression internode_compression = InternodeCompression.none; -@Deprecated -public Integer index_interval = null; - public int hinted_handoff_throttle_in_kb = 1024; public int batchlog_replay_throttle_in_kb = 1024; public int max_hints_delivery_threads = 2; http://git-wip-us.apache.org/repos/asf/cassandra/blob/46283cdc/src/java/org/apache/cassandra/config/DatabaseDescriptor.java -- diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java index e5d59fc..4fb742c 100644 --- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java +++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java @@ -622,9 +622,6 @@ public class DatabaseDescriptor throw new ConfigurationException("index_summary_capacity_in_mb option was set incorrectly to '" + conf.index_summary_capacity_in_mb + "', it should be a non-negative integer.", false); -if (conf.index_interval != null) -logger.warn("index_interval has been deprecated and should be removed from cassandra.yaml"); - if(conf.encryption_options != null) { logger.warn("Please rename encryption_options as server_encryption_options in the yaml");
[jira] [Commented] (CASSANDRA-13289) Make it possible to monitor an ideal consistency level separate from actual consistency level
[ https://issues.apache.org/jira/browse/CASSANDRA-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898858#comment-15898858 ] Romain Hardouin commented on CASSANDRA-13289: - > Yes you can set it via JMX. Great, thanks! Typo in cassandra.yaml: {{requested by each each write}} > Make it possible to monitor an ideal consistency level separate from actual > consistency level > - > > Key: CASSANDRA-13289 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13289 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 4.0 > > > As an operator there are several issues related to multi-datacenter > replication and consistency you may want to have more information on from > your production database. > For instance: if your application writes at LOCAL_QUORUM, how often are those > writes failing to achieve EACH_QUORUM at other data centers? If you failed > your application over to one of those data centers, roughly how inconsistent > might it be given the number of writes that didn't propagate since the last > incremental repair? > You might also want to know roughly what the latency of writes would be if > you switched to a different consistency level. For instance, you are writing > at LOCAL_QUORUM and want to know what would happen if you switched to > EACH_QUORUM. > The proposed change is to allow an ideal_consistency_level to be specified in > cassandra.yaml as well as get/set via JMX. If no ideal consistency level is > specified, no additional tracking is done. > If an ideal consistency level is specified, then the > {{AbstractWriteResponseHandler}} will contain a delegate WriteResponseHandler > that tracks whether the ideal consistency level is met before a write times > out. It also tracks the latency for achieving the ideal CL of successful > writes. > These two metrics would be reported on a per keyspace basis.
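The delegate tracking described in the ticket can be sketched minimally (Python, hypothetical names; not the actual patch): a side-channel counter tallies replica acks against the ideal consistency level in parallel with the real response handler, recording whether the ideal level was reached.

```python
class IdealCLTracker:
    """Counts replica acks against an ideal consistency level (sketch only)."""
    def __init__(self, ideal_required):
        self.ideal_required = ideal_required  # e.g. acks needed for EACH_QUORUM
        self.acks = 0
        self.ideal_met = False

    def on_response(self):
        """Called once per replica ack, alongside the real handler's bookkeeping."""
        self.acks += 1
        if self.acks >= self.ideal_required:
            self.ideal_met = True

# A LOCAL_QUORUM write succeeds with 2 acks, but the ideal EACH_QUORUM needs 4:
t = IdealCLTracker(ideal_required=4)
for _ in range(2):
    t.on_response()
assert not t.ideal_met  # would feed the "ideal CL not achieved" per-keyspace metric
```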
[jira] [Comment Edited] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient
[ https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898877#comment-15898877 ] Alex Petrov edited comment on CASSANDRA-12915 at 3/7/17 7:40 AM: - Sure, what I'm saying is that behaviour is unnecessary to fix the problem that we have. SASI Range iterators are taking the shortest range. For example, if we have records like || a || b || c || | 1 | 1 | 1 | | 2 | 2 | 1 | | 3 | 3 | 1 | And {{b}} and {{c}} are SASI-indexed, and we want to run the query {{WHERE b = 2 AND c = 1}}, we will get 2 iterators, one with {{2}} (for the column {{b}}) and the second one with {{1,2,3}} (for the column {{c}}). Now, SASI will take the shortest range ({{b}}) and start iterating by "rewinding" the {{c}} iterator to the token {{2}}. If there's no item for the token, intersection will be empty. Now, moving closer to the problem. If we had just imbalanced iterators (one returning 100 results and the other just 1), SASI would be able to efficiently intersect ranges and hit the storage to retrieve just a single row. In the case with __empty__ results from one of the iterators, the one you're fixing, we were simply using a single iterator, and had to hit the storage a 100 times (assuming the other iterator returns 100 results). Now, with what I propose, we will hit is 0 times. Same with your approach, although with a lot of complex logic that I can not justify, so I have asked you to clarify. The trick was simply to "replace" that [null check|https://github.com/ifesdjeen/cassandra/commit/78b1ff630536b0f48787ced74a66d702d13637ba#diff-25d7f486e2818c56d6b01aa952d459f3L146] with an actual iterator, to let the intersection know there's a second part, and it's empty. I thought it was already discussed in [this comment|https://issues.apache.org/jira/browse/CASSANDRA-12915?focusedCommentId=15728412=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15728412] but I'm happy to re-iterate: bq. 
The way RangeIterator works and how it's optimised, we're [picking|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L234-L237] the "shortest" token tree and skipping through the second token tree based on the results retrieved from the first one. So one of the indexes will be iterated anyway. The second index (since it's larger) might have to fetch/load roughly 10 blocks (since they might be located far from one another on disk), but it never has to fetch all 2M items. It'll iterate only as many items as the smallest index has. For example, two index queries would skip through (left is which index is used, right is the index of the item within the token tree): and bq. Moreover, RangeIterator already has [different optimisation strategies|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L76-L78] based on differences in cardinality. I'd say the current benchmark shows that the query is slightly slower (since we have to go through around twice as much data on disk). But given the numbers at hand, the difference is small (sub-millisecond), so this optimisation does not seem to pay for the complexity it brings. That's one of the ideas behind SASI, pretty much: being able to efficiently merge, iterate, and skip iterators. Looks like we're mostly on the same page. I've run (relatively) large-scale tests with a variant of the patch I've posted (160+ GB of data), and it works exactly as expected. Now I want to simplify the patch and make sure we do not add code we do not need. So there's no need to add additional logic to hide empty ranges; it's already handled. 
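The skip-based intersection described in the comment above can be sketched roughly as follows. This is an illustrative toy, not the actual SASI {{RangeIterator}} API: it iterates the smaller sorted token set and "skips" the larger one to each candidate token, and an empty side short-circuits immediately, so storage is never touched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Toy model of intersecting two sorted token sets the way the comment
// describes: walk the smaller set, skip the larger one to each token.
public class IntersectSketch
{
    public static List<Long> intersect(NavigableSet<Long> left, NavigableSet<Long> right)
    {
        NavigableSet<Long> smaller = left.size() <= right.size() ? left : right;
        NavigableSet<Long> larger = (smaller == left) ? right : left;

        List<Long> result = new ArrayList<>();
        if (smaller.isEmpty())
            return result; // empty range: intersection is trivially empty, 0 storage hits

        for (Long token : smaller)
        {
            // "skipTo": jump the larger side forward to the candidate token
            Long candidate = larger.ceiling(token);
            if (candidate != null && candidate.equals(token))
                result.add(token);
        }
        return result;
    }
}
```

With the {{b = 2, c = 1}} example from the comment, the {{b}} iterator contributes one token and the {{c}} iterator is skipped to it, so only one match is ever materialised.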
> SASI: Index intersection with an empty range really inefficient > --- > > Key: CASSANDRA-12915 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12915 > Project: Cassandra > Issue Type: Improvement > Components: sasi >Reporter: Corentin Chary >Assignee: Corentin Chary > Fix For: 3.11.x, 4.x > > > It looks like RangeIntersectionIterator.java can be pretty inefficient in > some cases. Let's take the following query: > SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar'; > In this case: > * index1 = 'foo' will match 2 items > * index2 = 'bar' will match ~300k items > On my setup, the query will take ~1 sec, most of the time being spent in > disk.TokenTree.getTokenAt(). > If I patch RangeIntersectionIterator so that it doesn't try to do the > intersection (and effectively only uses 'index1'), the query will run in a few > tenths of a millisecond. > I see multiple solutions for that: > * Add a static threshold to avoid the use of the index for the intersection > when we know it will be slow. Probably when the range size
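The "static threshold" option from the ticket description could look something like the sketch below. This is purely hypothetical planning logic: the constant name, its value, and the cardinality parameters are invented for illustration, not taken from the Cassandra codebase.

```java
// Hypothetical sketch of the ticket's static-threshold idea: skip the
// intersection when one index matches vastly more tokens than the other,
// and just walk the small index, post-filtering rows instead.
public class IntersectionPlanner
{
    static final long CARDINALITY_RATIO_THRESHOLD = 100; // assumed value

    /** Decide whether intersecting both indexes is likely to pay off. */
    public static boolean shouldIntersect(long smallCardinality, long largeCardinality)
    {
        if (smallCardinality == 0)
            return false; // empty range: result is empty, touch no index at all
        // With index1 matching 2 items and index2 matching ~300k (as in the
        // description), the ratio is huge, so intersection would be skipped.
        return largeCardinality / smallCardinality < CARDINALITY_RATIO_THRESHOLD;
    }
}
```

As the later comments note, RangeIntersectionIterator already selects a strategy based on cardinality differences, which is why extra logic along these lines may not be needed.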
[jira] [Commented] (CASSANDRA-13300) Upgrade the jna version to 4.3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896870#comment-15896870 ] Yuki Morishita commented on CASSANDRA-13300: I don't have a ppc64le machine so I cannot reproduce it, but from their change log it looks like jna added support for ppc64le in v4.2, and I guess that is the reason for this update request. > Upgrade the jna version to 4.3.0 > > > Key: CASSANDRA-13300 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13300 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Amitkumar Ghatwal > > Could you please upgrade the jna version present in the github cassandra > location : https://github.com/apache/cassandra/blob/trunk/lib/jna-4.0.0.jar > to below latest version - 4.3.0 - > http://repo1.maven.org/maven2/net/java/dev/jna/jna/4.3.0/jna-4.3.0-javadoc.jar -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13300) Upgrade the jna version to 4.3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896921#comment-15896921 ] Amitkumar Ghatwal commented on CASSANDRA-13300: --- Hi Yuki, Thanks for the quick response. Could you please help me understand what is necessary to upgrade the jna version to 4.3.0? I can thereafter follow your steps so that we can have the upgraded version of jna in cassandra-trunk. Regards, Amit > Upgrade the jna version to 4.3.0 > > > Key: CASSANDRA-13300 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13300 > Project: Cassandra > Issue Type: Improvement > Components: Configuration >Reporter: Amitkumar Ghatwal > > Could you please upgrade the jna version present in the github cassandra > location : https://github.com/apache/cassandra/blob/trunk/lib/jna-4.0.0.jar > to below latest version - 4.3.0 - > http://repo1.maven.org/maven2/net/java/dev/jna/jna/4.3.0/jna-4.3.0-javadoc.jar -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-10671) Consider removing Config.index_interval
[ https://issues.apache.org/jira/browse/CASSANDRA-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897041#comment-15897041 ] Robert Stupp commented on CASSANDRA-10671: -- Ping [~mkjellman] > Consider removing Config.index_interval > --- > > Key: CASSANDRA-10671 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10671 > Project: Cassandra > Issue Type: Task >Reporter: Robert Stupp >Priority: Minor > Fix For: 4.0 > > > {{Config.index_interval}} is deprecated since 2.0.? and unused in 3.0. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CASSANDRA-13132) Add currentTimestamp and currentDate functions
[ https://issues.apache.org/jira/browse/CASSANDRA-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov updated CASSANDRA-13132: Status: Ready to Commit (was: Patch Available) > Add currentTimestamp and currentDate functions > -- > > Key: CASSANDRA-13132 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13132 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Fix For: 4.x > > > Today, the only way to get the current {{timestamp}} or {{date}} is to > convert using the {{toTimestamp}} and {{toDate}} functions the output of > {{now()}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)