[jira] [Commented] (CASSANDRA-13530) GroupCommitLogService
[ https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250863#comment-16250863 ] Yuji Ito commented on CASSANDRA-13530:
--
Thank you [~jasobrown]. Sorry for the late reply; I'm glad I've been able to contribute to this.
>
> GroupCommitLogService
> -
>
> Key: CASSANDRA-13530
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13530
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Yuji Ito
> Assignee: Yuji Ito
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
> Attachments: GuavaRequestThread.java, MicroRequestThread.java, groupAndBatch.png, groupCommit22.patch, groupCommit30.patch, groupCommit3x.patch, groupCommitLog_noSerial_result.xlsx, groupCommitLog_result.xlsx
>
>
> I propose a new CommitLogService, GroupCommitLogService, to improve throughput when many requests are received.
> It improved throughput by up to 94%.
> I'd like to discuss this CommitLogService.
> Currently, we can select one of two CommitLog services: Periodic and Batch.
> In Periodic, we might lose commit log entries which haven't been written to disk.
> In Batch, we can write the commit log to disk on every request. The size of each commit log write is small (< 4KB). Under high concurrency, these writes are gathered and persisted to disk at once. But under insufficient concurrency, many small writes are issued and performance decreases due to disk latency. Even on an SSD, processing many IO commands decreases performance.
> GroupCommitLogService writes several commit log entries to disk at once.
> The patch adds GroupCommitLogService (it is enabled by setting `commitlog_sync` and `commitlog_sync_group_window_in_ms` in cassandra.yaml).
> The only difference from Batch is waiting for the semaphore.
> By waiting for the semaphore, several commit log writes are executed at the same time.
> In GroupCommitLogService, latency becomes worse if there is no concurrency.
> I measured the performance with my microbenchmark (MicroRequestThread.java) by increasing the number of threads. The cluster has 3 nodes (Replication factor: 3). Each node is an AWS EC2 m4.large instance + a 200 IOPS io1 volume.
> The result is as below. The GroupCommitLogService with a 10ms window improved update with Paxos by 94% and select with Paxos by 76%.
> h6. SELECT / sec
> ||\# of threads||Batch 2ms||Group 10ms||
> |1|192|103|
> |2|163|212|
> |4|264|416|
> |8|454|800|
> |16|744|1311|
> |32|1151|1481|
> |64|1767|1844|
> |128|2949|3011|
> |256|4723|5000|
> h6. UPDATE / sec
> ||\# of threads||Batch 2ms||Group 10ms||
> |1|45|26|
> |2|39|51|
> |4|58|102|
> |8|102|198|
> |16|167|213|
> |32|289|295|
> |64|544|548|
> |128|1046|1058|
> |256|2020|2061|

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
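The grouping idea described above (writers block until a shared window-based flush, so one disk sync covers many small commit log writes) can be sketched as a toy model. This is not the actual patch; the class and method names (`GroupSync`, `awaitSync`, `flushOnce`) are invented for illustration, and the real service syncs an mmap'ed segment rather than a no-op:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Toy model of group commit: writers append their mutation, then wait on a
// shared latch; a single sync thread flushes once per window and releases
// all waiters together, so one fsync covers many small writes.
class GroupSync {
    private final AtomicReference<CountDownLatch> current =
            new AtomicReference<>(new CountDownLatch(1));
    private final long windowMillis; // the sync thread would sleep this long between flushes

    GroupSync(long windowMillis) { this.windowMillis = windowMillis; }

    // Called by a writer after appending its mutation to the log buffer.
    void awaitSync() throws InterruptedException {
        current.get().await(); // block until the next group flush completes
    }

    // Called by the dedicated sync thread, once per window.
    void flushOnce() {
        // swap in a fresh latch so new writers wait for the *next* flush
        CountDownLatch finished = current.getAndSet(new CountDownLatch(1));
        // ... fsync the commit log segment would happen here ...
        finished.countDown(); // release everyone in this group at once
    }
}
```

This also shows why single-threaded latency gets worse (as the benchmark's 1-thread rows indicate): a lone writer still waits out the group window even though nothing else joins its flush.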
[jira] [Commented] (CASSANDRA-13992) Don't send new_metadata_id for conditional updates
[ https://issues.apache.org/jira/browse/CASSANDRA-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250786#comment-16250786 ] Kurt Greaves commented on CASSANDRA-13992:
--
My understanding is that, at the moment, {{METADATA_CHANGED}} will _always_ be set for a conditional update, regardless of whether it's necessary or not, where "necessary" means the schema has actually changed and the prepared statements need to be updated client side to reflect those schema changes. [~omichallat] is this true? What exactly is "metadata" referring to on the driver side, and why is the answer "always no" for conditional updates? If there is a change to one of the columns in the update, is that going to cause problems if we don't tell the driver that it has changed?

I'm with Olivier that that's a hacky addition to the driver, but if it's not even necessary as per the above, then simply passing an empty digest will be sufficient. I've updated my [branch|https://github.com/apache/cassandra/compare/trunk...kgreav:13992] to reflect this. Note I've changed to using {{MD5Digest#compute}} to calculate an "empty" digest. Although it's thread local, it will always be the same digest, and this will also solve the initial preparation problem, as it also uses the {{EMPTY}} resultset + metadata.
>
> Don't send new_metadata_id for conditional updates
> --
>
> Key: CASSANDRA-13992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13992
> Project: Cassandra
> Issue Type: Bug
> Reporter: Olivier Michallat
> Assignee: Kurt Greaves
> Priority: Minor
>
> This is a follow-up to CASSANDRA-10786.
> Given the table
> {code}
> CREATE TABLE foo (k int PRIMARY KEY)
> {code}
> And the prepared statement
> {code}
> INSERT INTO foo (k) VALUES (?)
> IF NOT EXISTS
> {code}
> The result set metadata changes depending on the outcome of the update:
> * if the row didn't exist, there is only a single column \[applied] = true
> * if it did, the result contains \[applied] = false, plus the current value of column k.
> The way this was handled so far is that the PREPARED response contains no result set metadata, and therefore all EXECUTE messages have SKIP_METADATA = false, and the responses always include the full (and correct) metadata.
> CASSANDRA-10786 still sends the PREPARED response with no metadata, *but the response to EXECUTE now contains a {{new_metadata_id}}*. The driver thinks it is because of a schema change, and updates its local copy of the prepared statement's result metadata.
> The next EXECUTE is sent with SKIP_METADATA = true, but the server appears to ignore that, and still sends the metadata in the response. So each response includes the correct metadata, the driver uses it, and there is no visible issue for client code.
> The only drawback is that the driver updates its local copy of the metadata unnecessarily, every time. We can work around that by only updating if we had metadata before, at the cost of an extra volatile read. But I think the best thing to do would be to never send a {{new_metadata_id}} for a conditional update.
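The comment's point that an "empty" digest is always the same value holds because MD5 over zero input bytes is a fixed constant, so every node derives an identical metadata id for empty result-set metadata. A minimal sketch with the JDK's {{MessageDigest}} (the class and helper names here are invented; Cassandra's {{MD5Digest#compute}} is its own wrapper):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration of why an "empty" digest is stable: MD5 of zero bytes is
// always the same 16-byte value, regardless of thread or node.
class EmptyDigest {
    static byte[] emptyMd5() {
        try {
            // digest() with no prior update() hashes the empty input
            return MessageDigest.getInstance("MD5").digest(new byte[0]);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is available on every JVM", e);
        }
    }

    // hex rendering, for comparing digests by eye
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```

The well-known empty-input MD5 is `d41d8cd98f00b204e9800998ecf8427e`, which is why using a computed empty digest (rather than a per-thread object identity) gives a deterministic id across prepares and executions.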
[jira] [Comment Edited] (CASSANDRA-13530) GroupCommitLogService
[ https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250740#comment-16250740 ] Jason Brown edited comment on CASSANDRA-13530 at 11/14/17 3:03 AM:
---
[~yuji] We would like this functionality rather soon, so I'd like to take it over. You've done a nice job up to now, and let's drive it home.

[~aweisberg] I've taken [~yuji]'s patch and added the comments and tests. wrt utests, the functionality I wanted to test is largely all in {{CommitLogTest}}, but the choice of commitlog mode is driven by the {{test/conf/cassandra.yaml}}. Adding to this [~JoshuaMcKenzie]'s attempts to make commitlog more amenable to unit testing (read: they are still not very friendly for unit testing; see [this comment|https://issues.apache.org/jira/browse/CASSANDRA-13123?focusedCommentId=16189523=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16189523]), I've subclassed {{CommitLogTest}} for each of the three modes (periodic, batch, and now group). This way we can test group mode, and get periodic as a bonus. This is the cowardly way of testing the different modes and their replayability rather than reworking the commit log as a whole (as [~JoshuaMcKenzie] alludes to), but it seems like that's a larger issue to tackle (on a different ticket).

||13530||
|[branch|https://github.com/jasobrown/cassandra/tree/13530]|
|[utests|https://circleci.com/gh/jasobrown/cassandra/tree/13530]|

Note: I know there's a problem with {{PeriodicCommitLogTest}}, and I'll look in the morning. It should not hold up reviewing the small amount that I've added if you start reviewing before I fix the test.

was (Author: jasobrown):
[~yuji] We would like this functionality rather soon, so I'd like to take it over. You've done a nice job up to now, and let's drive it home.

@Ariel, I've taken [~yuji]'s patch and added the comments and tests.
wrt utests, the functionality I wanted to test is largely all in {{CommitLogTest}}, but the choice of commitlog mode is driven by the {{test/conf/cassandra.yaml}}. Adding to this [~JoshuaMcKenzie]'s attempts to make commitlog more amenable to unit testing (read: they are still not very friendly for unit testing; see [this comment|https://issues.apache.org/jira/browse/CASSANDRA-13123?focusedCommentId=16189523=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16189523]), I've subclassed {{CommitLogTest}} for each of the three modes (periodic, batch, and now group). This way we can test group mode, and get periodic as a bonus. This is the cowardly way of testing the different modes and their replayability rather than reworking the commit log as a whole (as [~JoshuaMcKenzie] alludes to), but it seems like that's a larger issue to tackle (on a different ticket).

||13530||
|[branch|https://github.com/jasobrown/cassandra/tree/13530]|
|[utests|https://circleci.com/gh/jasobrown/cassandra/tree/13530]|

Note: I know there's a problem with {{PeriodicCommitLogTest}}, and I'll look in the morning. It should not hold up reviewing the small amount that I've added if you start reviewing before I fix the test.
[jira] [Commented] (CASSANDRA-13530) GroupCommitLogService
[ https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250740#comment-16250740 ] Jason Brown commented on CASSANDRA-13530:
-
[~yuji] We would like this functionality rather soon, so I'd like to take it over. You've done a nice job up to now, and let's drive it home.

@Ariel, I've taken [~yuji]'s patch and added the comments and tests. wrt utests, the functionality I wanted to test is largely all in {{CommitLogTest}}, but the choice of commitlog mode is driven by the {{test/conf/cassandra.yaml}}. Adding to this [~JoshuaMcKenzie]'s attempts to make commitlog more amenable to unit testing (read: they are still not very friendly for unit testing; see [this comment|https://issues.apache.org/jira/browse/CASSANDRA-13123?focusedCommentId=16189523=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16189523]), I've subclassed {{CommitLogTest}} for each of the three modes (periodic, batch, and now group). This way we can test group mode, and get periodic as a bonus. This is the cowardly way of testing the different modes and their replayability rather than reworking the commit log as a whole (as [~JoshuaMcKenzie] alludes to), but it seems like that's a larger issue to tackle (on a different ticket).

||13530||
|[branch|https://github.com/jasobrown/cassandra/tree/13530]|
|[utests|https://circleci.com/gh/jasobrown/cassandra/tree/13530]|

Note: I know there's a problem with {{PeriodicCommitLogTest}}, and I'll look in the morning. It should not hold up reviewing the small amount that I've added if you start reviewing before I fix the test.
[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250453#comment-16250453 ] Jason Brown commented on CASSANDRA-14013:
-
OK, I walked through [~kongo2002]'s example script above on the 3.11 branch, and indeed I am able to reproduce. I tried on 3.0, and I think it did not repro (would need to do it again, tbqh). I don't have time to dig in for the next few days, but I suspect it's because you named the keyspace "{{snapshots}}", and cassandra might be getting confused by trying to clean up any data it thinks is "snapshot" data. Especially since you have other keyspaces with other names and are not seeing this problem there, I'm guessing we have a bug in the handling of subdirectories named "snapshots".
>
> Data loss in snapshots keyspace after service restart
> -
>
> Key: CASSANDRA-14013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14013
> Project: Cassandra
> Issue Type: Bug
> Reporter: Gregor Uhlenheuer
>
> I am posting this bug in the hope of discovering the stupid mistake I am making, because I can't imagine a reasonable answer for the behavior I see right now :-)
> In short, I observe data loss in a keyspace called *snapshots* after restarting the Cassandra service. Say I have 1000 records in a table called *snapshots.test_idx*; then after a restart the table has fewer entries or is even empty.
> My kind of "mysterious" observation is that it happens only in a keyspace called *snapshots*...
> h3. Steps to reproduce
> These steps to reproduce show the described behavior in "most" attempts (not every single time though).
> {code}
> # create keyspace
> CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
> # create table
> CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key));
> # insert some test data
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1);
> ...
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000);
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 1000
> # restart service
> kill
> cassandra -f
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 0
> {code}
> I hope someone can point me to the obvious mistake I am making :-)
> This happened to me using both Cassandra 3.9 and 3.11.0
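Jason's hypothesis above is that cleanup code keys off the directory name "snapshots". This is a purely hypothetical sketch of the suspected confusion (the class and method are invented, not Cassandra code): a check that matches any path component named "snapshots" cannot distinguish a keyspace literally named {{snapshots}} from the per-table snapshot subdirectories Cassandra creates under {{data/<keyspace>/<table>/snapshots/<tag>}}.

```java
import java.nio.file.Path;

// Hypothetical illustration of the suspected bug: a name-based test for
// "is this snapshot data?" fires on a keyspace directory that happens to
// be called "snapshots", not just on real snapshot subdirectories.
class SnapshotPathCheck {
    static boolean looksLikeSnapshotDir(Path p) {
        for (Path part : p) { // iterate each path component
            if (part.toString().equals("snapshots")) return true; // too coarse
        }
        return false;
    }
}
```

Under this (assumed) logic, `data/snapshots/test_idx-.../` would be treated as snapshot data and become eligible for cleanup, which would match the reporter's observation that only the *snapshots* keyspace loses data.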
[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250428#comment-16250428 ] Jason Brown commented on CASSANDRA-13987:
-
I don't believe we've had a policy or guarantee in the past about the availability of commit log data that was unflushed (not {{msync}}'ed), thus I'm not sure how much of a 'regression' this changed behavior is. It's unfortunate that some previous assumptions that both developers and operators may have had were altered, and the end result may be data loss. So I'm kind of on the fence with how far to go back, but I think 3.0 and up is reasonable.
>
> Multithreaded commitlog subtly changed durability
> -
>
> Key: CASSANDRA-13987
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13987
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jason Brown
> Assignee: Jason Brown
> Fix For: 4.x
>
>
> When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly changed the way that commitlog durability worked. Everything still gets written to an mmap file. However, not everything is replayable from the mmapped file after a process crash, in periodic mode.
> In brief, the reason this changed is due to the chained markers that are required for the multithreaded commit log. At each msync, we wait for outstanding mutations to serialize into the commitlog, and update a marker before and after the commits that have accumulated since the last sync. With those markers, we can safely replay that section of the commitlog. Without the markers, we have no guarantee that the commits in that section were successfully written, thus we abandon those commits on replay.
> If you have correlated process failures of multiple nodes at "nearly" the same time (see ["There Is No Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have data loss if none of the nodes msync the commitlog. For example, with RF=3, if a quorum write succeeds on two nodes (and we acknowledge the write back to the client), and then the process on both nodes OOMs (say, due to reading the index for a 100GB partition), the write will be lost if neither process msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. The reason why this data is silently lost is due to the chained markers that were introduced with CASSANDRA-3578.
> The problem we are addressing with this ticket is incrementally improving 'durability' due to process crash, not host crash. (Note: operators should use batch mode to ensure greater durability, but batch mode in its current implementation is a) borked, and b) will burn through, *very* rapidly, SSDs that don't have a non-volatile write cache sitting in front.)
> The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which means that a node could lose up to ten seconds of data due to process crash. The unfortunate thing is that the data is still available, in the mmap file, but we can't replay it due to incomplete chained markers.
> ftr, I don't believe we've ever had a stated policy about commitlog durability wrt process crash. Pre-2.0 we naturally piggy-backed off the memory mapped file and the fact that every mutation acquired a lock and wrote into the mmap buffer, and the ability to replay everything out of it came for free. With CASSANDRA-3578, that was subtly changed.
> Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust the durability guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit] of each commit in innodb via {{innodb_flush_log_at_trx_commit}}. I'm using that idea as a loose springboard for what to do here.
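The chained-marker replay rule described in the ticket (only sections bracketed by valid markers are replayed; everything after the last valid marker is abandoned) can be modeled in miniature. This is a toy, not Cassandra's actual on-disk format; the names and the marker layout (next-offset plus CRC over the two offsets) are invented for illustration:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Toy model of chained sync markers: each msync writes a marker holding
// the offset of the next marker plus a CRC. On replay, a section is
// trusted only if its marker validates; later mutations are abandoned.
class SyncMarkers {
    static void writeMarker(ByteBuffer seg, int at, int next) {
        CRC32 crc = new CRC32();
        crc.update(at);   // CRC over the marker's own offset...
        crc.update(next); // ...and the offset it chains to
        seg.putInt(at, next);
        seg.putInt(at + 4, (int) crc.getValue());
    }

    // Returns the end offset of the last section that is safe to replay.
    static int replayableEnd(ByteBuffer seg, int first) {
        int at = first;
        while (true) {
            int next = seg.getInt(at);
            CRC32 crc = new CRC32();
            crc.update(at);
            crc.update(next);
            // marker absent, torn, or corrupt: stop replay here
            if (next <= at || (int) crc.getValue() != seg.getInt(at + 4))
                return at;
            at = next;
        }
    }
}
```

This is exactly the behavior the ticket calls out: if the process dies before the final msync writes its marker, the bytes after the last valid marker may be sitting in the mmap file, but replay cannot trust them and drops them.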
[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250381#comment-16250381 ] Gregor Uhlenheuer commented on CASSANDRA-14013:
---
[~jasobrown] Thanks for the pointer to CASSANDRA-13987 - although I don't think this is the same problem, as I do wait for more than 10 seconds. It actually appears that I can pretty much restart the service a couple of times until the table in the *snapshots* keyspace is completely empty. I just tried again on a different machine with the same behavior.
[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250373#comment-16250373 ] Jeff Jirsa commented on CASSANDRA-13987:
-
I get that the ship has sailed on 2.1/2.2, and I accept that. I'd like it in 3.0/3.11 because I think it's a guarantee people expect, but I'm open to arguments that it's too dangerous (I haven't touched that code in months, you have, so I'll defer to you if you think it's straightforward enough).
[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250370#comment-16250370 ] Benedict commented on CASSANDRA-13987:
--
The discussed behavioural change was introduced in 2.1, so if it's considered a regression it should probably go all the way back, at least to 2.2 (I think we're still servicing that, right?). However, if we consider it a regression, this doesn't fundamentally fix the problem, and we should probably file a follow-up ticket if we want to restore 2.0 behaviour.

For the record, it's quite likely that for unencrypted segments we can get very nearly identical behaviour to before with only changes to replay, by just skipping corrupted sync markers and continuing to replay records while we are able to. Some changes to the file format and/or the time at which we serialize the size/checksum could make this more reliable, but here we're talking about race conditions, which arguably isn't a regression given these could equally have simply been held up in the queue for the commit log thread before.

For encrypted segments, I don't know if we need to "restore" behaviour since it was never available before, but it would make sense to do so (least surprise and all that). In that case we'd probably want to modify our segment writing to happen concurrently (but serially writing the bytes, of course). This probably isn't actually such a dramatic change, though it's been a while since I've looked at the code. This way we could "just" do the same as above, but also abort when we hit a corrupted/abrupt end of an encrypted/compressed stream.
> Multithreaded commitlog subtly changed durability > - > > Key: CASSANDRA-13987 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13987 > Project: Cassandra > Issue Type: Improvement >Reporter: Jason Brown >Assignee: Jason Brown > Fix For: 4.x > > > When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly > changed the way that commitlog durability worked. Everything still gets > written to an mmap file. However, not everything is replayable from the > mmapped file after a process crash, in periodic mode. > In brief, the reason this changed is due to the chained markers that are > required for the multithreaded commit log. At each msync, we wait for > outstanding mutations to serialize into the commitlog, and update a marker > before and after the commits that have accumulated since the last sync. With > those markers, we can safely replay that section of the commitlog. Without > the markers, we have no guarantee that the commits in that section were > successfully written, thus we abandon those commits on replay. > If you have correlated process failures of multiple nodes at "nearly" the > same time (see ["There Is No > Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have > data loss if none of the nodes msync the commitlog. For example, with RF=3, > if a quorum write succeeds on two nodes (and we acknowledge the write back to > the client), and then the process on both nodes OOMs (say, due to reading the > index for a 100GB partition), the write will be lost if neither process > msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. > The reason this data is silently lost is the chained markers that > were introduced with CASSANDRA-3578. > The problem we are addressing with this ticket is incrementally improving > 'durability' due to process crash, not host crash. (Note: operators should > use batch mode to ensure greater durability, but batch mode in its current > implementation is a) borked, and b) will burn through, *very* rapidly, SSDs > that don't have a non-volatile write cache sitting in front.) > The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which > means that a node could lose up to ten seconds of data due to process crash. > The unfortunate thing is that the data is still available, in the mmap file, > but we can't replay it due to incomplete chained markers. > ftr, I don't believe we've ever had a stated policy about commitlog > durability wrt process crash. Pre-2.0 we naturally piggy-backed off the > memory mapped file and the fact that every mutation acquired a lock and > wrote into the mmap buffer, and the ability to replay everything out of it > came for free. With CASSANDRA-3578, that was subtly changed. > Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust > the durability > guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit] > of each commit in innodb via {{innodb_flush_log_at_trx_commit}}. I'm > using that idea as a loose springboard for what to do here.
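The periodic-mode gap the ticket describes can be modelled in a few lines. This is an illustrative toy, not Cassandra's commitlog code: mutations are acknowledged as soon as they land in the (mmapped) buffer, but only data behind the last chained marker, advanced at each periodic sync, is replayable after a process crash.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of periodic-mode durability. Appends reach the buffer (and the
// client is acked) immediately; the marker that bounds the replayable region
// only advances at sync time, so a crash between syncs loses everything
// written after the last marker even though the bytes are in the mmap file.
class PeriodicCommitLogModel {
    private final List<String> buffer = new ArrayList<>(); // "mmapped" contents
    private int lastMarker = 0;                            // replayable up to here

    void append(String mutation) { buffer.add(mutation); } // acked to client here
    void sync()                  { lastMarker = buffer.size(); } // msync + marker

    /** What replay recovers after a process crash (no sync on shutdown). */
    List<String> replayAfterCrash() { return buffer.subList(0, lastMarker); }
}
```

With the default {{commitlog_sync_period_in_ms}} of 10000, everything appended in the last up-to-ten seconds sits above {{lastMarker}} in this model, which is exactly the window the ticket says can be lost.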
[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250368#comment-16250368 ] Jason Brown commented on CASSANDRA-13987: - [~jjirsa] This is a change in behavior from when multithreaded commitlog (CASSANDRA-3578) was introduced, in 2.1. I'm pretty sure we don't want to update 2.1, and 2.2 is highly doubtful as well, but I'm fine with 3.0 and higher if folks think it's worth it.
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250357#comment-16250357 ] Jason Brown commented on CASSANDRA-14013: - [~kongo2002] After you perform the inserts, how long do you wait before bouncing cassandra? If you wait for >= 10 seconds (or whatever {{commitlog_sync_period_in_ms}} is set to in the {{cassandra.yaml}}), do you still have the same problem? I believe CASSANDRA-13987 addresses the same issue that you are raising here. You can read that ticket for all the gory details. > Data loss in snapshots keyspace after service restart > - > > Key: CASSANDRA-14013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14013 > Project: Cassandra > Issue Type: Bug >Reporter: Gregor Uhlenheuer > > I am posting this bug in hope to discover the stupid mistake I am doing > because I can't imagine a reasonable answer for the behavior I see right now > :-) > In short words, I do observe data loss in a keyspace called *snapshots* after > restarting the Cassandra service. Say I do have 1000 records in a table > called *snapshots.test_idx* then after restart the table has less entries or > is even empty. > My kind of "mysterious" observation is that it happens only in a keyspace > called *snapshots*... > h3. Steps to reproduce > These steps to reproduce show the described behavior in "most" attempts (not > every single time though). > {code} > # create keyspace > CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > # create table > CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key)); > # insert some test data > INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1); > ... 
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000); > # count entries > SELECT count(*) FROM snapshots.test_idx; > 1000 > # restart service > kill > cassandra -f > # count entries > SELECT count(*) FROM snapshots.test_idx; > 0 > {code} > I hope someone can point me to the obvious mistake I am doing :-) > This happened to me using both Cassandra 3.9 and 3.11.0
[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250338#comment-16250338 ] sankalp kohli commented on CASSANDRA-13987: --- +1 for doing this in 3.0+
[jira] [Comment Edited] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250338#comment-16250338 ] sankalp kohli edited comment on CASSANDRA-13987 at 11/13/17 10:07 PM: -- +1 for doing this in 3.0+ was (Author: kohlisankalp): +1 for doing this in 3.0+
[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250333#comment-16250333 ] Jeff Jirsa commented on CASSANDRA-13987: Am I the only one who thinks this belongs in 3.0+ instead of 4.0? It's a regression (though not from 2.1/2.2, I guess), and it impacts data safety.
[jira] [Commented] (CASSANDRA-13983) Support a means of logging all queries as they were invoked
[ https://issues.apache.org/jira/browse/CASSANDRA-13983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250240#comment-16250240 ] Blake Eggleston commented on CASSANDRA-13983: - Well I meant more than one :). Anyway, if it's there intentionally, I don't have a problem with it. I was just calling it out because it seemed like it could have been something left over from an earlier iteration. > Support a means of logging all queries as they were invoked > --- > > Key: CASSANDRA-13983 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13983 > Project: Cassandra > Issue Type: New Feature > Components: CQL, Observability, Testing, Tools >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 4.0 > > > For correctness testing it's useful to be able to capture production traffic > so that it can be replayed against both the old and new versions of Cassandra > while comparing the results. > Implementing this functionality once inside the database is high performance > and presents less operational complexity. > In [this patch|https://github.com/apache/cassandra/pull/169] there is an > implementation of a full query log that logs uses chronicle-queue (apache > licensed, the maven artifacts are labeled incorrectly in some cases, > dependencies are also apache licensed) to implement a rotating log of queries. > * Single thread asynchronously writes log entries to disk to reduce impact on > query latency > * Heap memory usage bounded by a weighted queue with configurable maximum > weight sitting in front of logging thread > * If the weighted queue is full producers can be blocked or samples can be > dropped > * Disk utilization is bounded by deleting old log segments once a > configurable size is reached > * The on disk serialization uses a flexible schema binary format > (chronicle-wire) making it easy to skip unrecognized fields, add new ones, > and omit old ones. 
> * Can be enabled and configured via JMX, disabled, and reset (delete on disk > data), logging path is configurable via both JMX and YAML > * Introduce new {{fqltool}} in /bin that currently implements {{Dump}}, which > can dump full query logs in a human-readable format as well as follow active > full query logs > Follow-up work: > * Introduce new {{fqltool}} command Replay, which can replay N full query logs > to two different clusters, compare the results, and check for > inconsistencies. <- Actively working on getting this done > * Log not just queries but their results, to facilitate a comparison between > the original query result and the replayed result. <- Really just don't have a > specific use case at the moment > * "Consistent" query logging allowing replay to fully replicate the original > order of execution and completion even in the face of races (including CAS). > <- This is more speculative
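The bounded weighted queue described above can be sketched as follows. Names and shapes are illustrative, not the actual patch's classes: producers enqueue log entries whose aggregate "weight" (serialized size) is capped, and a full queue drops the entry (or, per the ticket, optionally blocks the producer) rather than growing the heap, while a single consumer thread drains entries to disk.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy weighted queue sitting in front of a single logging thread. Heap usage
// is bounded by maxWeight: offers that would exceed it fail instead of
// accumulating entries faster than the consumer can write them out.
class WeightedQueue {
    private final Deque<byte[]> entries = new ArrayDeque<>();
    private long weight = 0;
    private final long maxWeight;

    WeightedQueue(long maxWeight) { this.maxWeight = maxWeight; }

    synchronized boolean offer(byte[] entry) {
        if (weight + entry.length > maxWeight)
            return false;              // drop (or block) instead of growing the heap
        entries.addLast(entry);
        weight += entry.length;
        return true;
    }

    synchronized byte[] poll() {       // called by the single logging thread
        byte[] e = entries.pollFirst();
        if (e != null) weight -= e.length;
        return e;
    }
}
```

The design choice this illustrates: weighting by serialized size rather than entry count is what makes the memory bound meaningful when query sizes vary widely.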
[jira] [Updated] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-13987: Status: In Progress (was: Patch Available)
[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250195#comment-16250195 ] Sam Tunnicliffe commented on CASSANDRA-13987: - Previously, {{writeCDCIndexFile}} was only ever called after a flush, which is consistent with its comment stating: {code}We persist the offset of the last data synced to disk so clients can parse only durable data if they choose{code} So currently this definition of durable includes durability in the face of host failures, whereas with this patch the index file may contain offsets for segments that are durable under process crash, but which have not yet been msynced/fsynced and so may not survive a host failure. Should we move the call to {{writeCDCIndexFile}} into the {{if (flush || close)}} block, to after the flush has completed? That question aside, the code seems solid and I've manually tested both as-is and with some added hacks to inject failures etc, but I feel like it could still benefit from some automated testing to cover the new behaviour. I know that writing tests for this area is non-trivial and usually involves byteman, but do you think it's worth adding a unit test or two for this? Nits: * Typo in cassandra.yaml #380 s/mmaped/mmapped * The comment atop {{AbstractCommitLogSegmentManager::sync}} could use updating. The fact that it says it flushes, but also takes a boolean flush arg, is a bit confusing. * {{CompressedSegment}} and {{EncryptedSegment}} no longer need to import {{SyncUtil}}
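Sam's ordering concern about {{writeCDCIndexFile}} amounts to a simple invariant, sketched here with hypothetical names (this is not Cassandra's CDC code): if the index file is meant to advertise only host-durable data, it must publish an offset captured after the fsync completes, not the offset of bytes that have merely reached the mmapped buffer.

```java
// Toy model of the three offsets in play. Publishing bufferedOffset instead
// of syncedOffset is exactly the behaviour change the review comment flags:
// the index would then describe data that may not survive a host failure.
class IndexAfterSync {
    long bufferedOffset = 0;   // bytes written into the mmapped segment
    long syncedOffset = 0;     // bytes known to be on disk
    long publishedOffset = 0;  // what the index file advertises to consumers

    void write(int n)  { bufferedOffset += n; }          // process-crash durable
    void fsync()       { syncedOffset = bufferedOffset; } // host-crash durable
    void writeIndex()  { publishedOffset = syncedOffset; } // only after fsync
}
```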
[jira] [Commented] (CASSANDRA-13992) Don't send new_metadata_id for conditional updates
[ https://issues.apache.org/jira/browse/CASSANDRA-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250154#comment-16250154 ] Olivier Michallat commented on CASSANDRA-13992: --- {{METADATA_CHANGED}} tells the client if it needs to update its local copy of the metadata. For conditional updates, the answer is always no (since the client should never store that information in the first place); that is why I think it's more intuitive to set the flag to false. To put it another way: if the flag is forced to true, I have to add a condition in the client code ({{newMetadataId.bytes.length > 0}}). My worry is that a client implementation could forget to check that the id is empty, and end up with a sub-optimal behavior (that updates the local metadata unnecessarily each time). If the flag is absent, conditional updates can be handled like any other statement. > Don't send new_metadata_id for conditional updates > -- > > Key: CASSANDRA-13992 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13992 > Project: Cassandra > Issue Type: Bug >Reporter: Olivier Michallat >Assignee: Kurt Greaves >Priority: Minor > > This is a follow-up to CASSANDRA-10786. > Given the table > {code} > CREATE TABLE foo (k int PRIMARY KEY) > {code} > And the prepared statement > {code} > INSERT INTO foo (k) VALUES (?) IF NOT EXISTS > {code} > The result set metadata changes depending on the outcome of the update: > * if the row didn't exist, there is only a single column \[applied] = true > * if it did, the result contains \[applied] = false, plus the current value > of column k. > The way this was handled so far is that the PREPARED response contains no > result set metadata, and therefore all EXECUTE messages have SKIP_METADATA = > false, and the responses always include the full (and correct) metadata. > CASSANDRA-10786 still sends the PREPARED response with no metadata, *but the > response to EXECUTE now contains a {{new_metadata_id}}*. 
The driver thinks it > is because of a schema change, and updates its local copy of the prepared > statement's result metadata. > The next EXECUTE is sent with SKIP_METADATA = true, but the server appears to > ignore that, and still sends the metadata in the response. So each response > includes the correct metadata, the driver uses it, and there is no visible > issue for client code. > The only drawback is that the driver updates its local copy of the metadata > unnecessarily, every time. We can work around that by only updating if we had > metadata before, at the cost of an extra volatile read. But I think the best > thing to do would be to never send a {{new_metadata_id}} for a conditional > update. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
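A minimal driver-side sketch of the behavior discussed above, with hypothetical names (this is not the actual java-driver code): update the cached prepared-statement result metadata only when the server sends a non-empty {{new_metadata_id}}, so conditional updates fall through to "keep the local copy".

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not the real java-driver API: decide whether to
// replace the cached prepared-statement metadata id based on what the
// EXECUTE response carried.
final class PreparedMetadataCache {
    private volatile ByteBuffer resultMetadataId; // last id seen from the server

    /** Returns true if the local copy was updated. */
    boolean onExecuteResponse(ByteBuffer newMetadataId) {
        // An absent flag (null here) or an empty id both mean "nothing changed".
        if (newMetadataId == null || newMetadataId.remaining() == 0)
            return false;
        resultMetadataId = newMetadataId;
        return true;
    }
}
```

With this shape, forcing METADATA_CHANGED to false for conditional updates means the driver never reaches the update branch, which is the simpler contract Olivier argues for.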
[jira] [Comment Edited] (CASSANDRA-13992) Don't send new_metadata_id for conditional updates
[ https://issues.apache.org/jira/browse/CASSANDRA-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250071#comment-16250071 ] Alex Petrov edited comment on CASSANDRA-13992 at 11/13/17 7:43 PM: --- [~omichallat] not sure, since {{METADATA_CHANGED}} is just a flag: if it's set it's {{true}}, otherwise it's {{false}}. Moreover, I think the default behaviour for LWTs has to be that we _always_ update metadata: there's no way for the server to know what the last metadata on the client was (since it depends on the result), and the server can't distinguish between a metadata hash mismatch caused by {{ALTER}} and one caused by an applied vs. non-applied LWT result. Unless I'm missing something, my patch achieves exactly that (also without any driver changes): it forces the server to _always_ send the metadata. This, combined with a metadata id consisting of zeroes, can tell the client that caching metadata is possible but won't gain anything: the new result metadata will just be re-delivered on every call, since it may change on every request. I haven't updated the spec yet; I will, if/when we agree on the behaviour.
[jira] [Commented] (CASSANDRA-13983) Support a means of logging all queries as they were invoked
[ https://issues.apache.org/jira/browse/CASSANDRA-13983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250045#comment-16250045 ] Ariel Weisberg commented on CASSANDRA-13983: bq. It doesn't look like we use special Weigher implementations anywhere. I think this could be slightly simplified if we made the type param ? There is one. It's the natural weigher :-) I think we don't do enough to bound resources by weight in general. Having a piece of library code ready to go lowers the barrier to using it. The idiom of allowing a pluggable weigher for legacy items or classes you can't modify is pretty common in this kind of library code (Comparable and sorting and navigable maps, Guava Cache's weigher). I get that if you want to cut to the bone then yes, technically this could be done without it; but it's unit tested and ready to go. I'd like to keep it, but you are right that it's not core to what this ticket is trying to do. > Support a means of logging all queries as they were invoked > --- > > Key: CASSANDRA-13983 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13983 > Project: Cassandra > Issue Type: New Feature > Components: CQL, Observability, Testing, Tools >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg > Fix For: 4.0 > > > For correctness testing it's useful to be able to capture production traffic > so that it can be replayed against both the old and new versions of Cassandra > while comparing the results. > Implementing this functionality once inside the database is high performance > and presents less operational complexity. > In [this patch|https://github.com/apache/cassandra/pull/169] there is an > implementation of a full query log that uses chronicle-queue (apache > licensed; the maven artifacts are labeled incorrectly in some cases, > dependencies are also apache licensed) to implement a rotating log of queries.
> * Single thread asynchronously writes log entries to disk to reduce impact on > query latency > * Heap memory usage bounded by a weighted queue with configurable maximum > weight sitting in front of logging thread > * If the weighted queue is full producers can be blocked or samples can be > dropped > * Disk utilization is bounded by deleting old log segments once a > configurable size is reached > * The on disk serialization uses a flexible schema binary format > (chronicle-wire) making it easy to skip unrecognized fields, add new ones, > and omit old ones. > * Can be enabled and configured via JMX, disabled, and reset (delete on disk > data), logging path is configurable via both JMX and YAML > * Introduce new {{fqltool}} in /bin that currently implements {{Dump}} which > can dump in a human readable format full query logs as well as follow active > full query logs > Follow up work: > * Introduce new {{fqltool}} command Replay which can replay N full query logs > to two different clusters and compare the result and check for > inconsistencies. <- Actively working on getting this done > * Log not just queries but their results to facilitate a comparison between > the original query result and the replayed result. <- Really just don't have > specific use case at the moment > * "Consistent" query logging allowing replay to fully replicate the original > order of execution and completion even in the face of races (including CAS). > <- This is more speculative -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
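The weighted-queue idiom described above (heap usage bounded by total weight rather than element count, with a pluggable weigher for classes you can't modify) could be sketched as follows. Class and method names are illustrative, not Cassandra's or Guava's:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.ToLongFunction;

// Illustrative sketch: a queue bounded by total weight, with a pluggable
// weigher so callers can bound legacy types they cannot modify.
final class WeightedQueue<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final ToLongFunction<T> weigher;
    private final long maxWeight;
    private long currentWeight;

    WeightedQueue(long maxWeight, ToLongFunction<T> weigher) {
        this.maxWeight = maxWeight;
        this.weigher = weigher;
    }

    /** Returns false when full; the caller may then block or drop the sample. */
    synchronized boolean offer(T item) {
        long w = weigher.applyAsLong(item);
        if (currentWeight + w > maxWeight)
            return false;
        currentWeight += w;
        queue.add(item);
        return true;
    }

    synchronized T poll() {
        T item = queue.poll();
        if (item != null)
            currentWeight -= weigher.applyAsLong(item);
        return item;
    }
}
```

The "natural weigher" mentioned in the comment would simply be a weigher that returns 1 for every element, degrading the bound to a plain element count.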
[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249947#comment-16249947 ] Gregor Uhlenheuer commented on CASSANDRA-14013: --- What additionally throws me off is that similar {{INSERTs}} into another keyspace with the exact same schema and settings do survive every restart without any issues. > Data loss in snapshots keyspace after service restart > - > > Key: CASSANDRA-14013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14013 > Project: Cassandra > Issue Type: Bug >Reporter: Gregor Uhlenheuer > > I am posting this bug in hope to discover the stupid mistake I am doing > because I can't imagine a reasonable answer for the behavior I see right now > :-) > In short words, I do observe data loss in a keyspace called *snapshots* after > restarting the Cassandra service. Say I do have 1000 records in a table > called *snapshots.test_idx* then after restart the table has less entries or > is even empty. > My kind of "mysterious" observation is that it happens only in a keyspace > called *snapshots*... > h3. Steps to reproduce > These steps to reproduce show the described behavior in "most" attempts (not > every single time though). > {code} > # create keyspace > CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', > 'replication_factor': 1}; > # create table > CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key)); > # insert some test data > INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1); > ... 
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000); > # count entries > SELECT count(*) FROM snapshots.test_idx; > 1000 > # restart service > kill > cassandra -f > # count entries > SELECT count(*) FROM snapshots.test_idx; > 0 > {code} > I hope someone can point me to the obvious mistake I am doing :-) > This happened to me using both Cassandra 3.9 and 3.11.0 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249896#comment-16249896 ] Gregor Uhlenheuer commented on CASSANDRA-14013: --- It's the default (which is {{periodic}} if I recall correctly) since I tested with the vanilla configuration.
[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
[ https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249851#comment-16249851 ] sankalp kohli commented on CASSANDRA-14013: --- Which commit log mode are you using?
[jira] [Comment Edited] (CASSANDRA-13992) Don't send new_metadata_id for conditional updates
[ https://issues.apache.org/jira/browse/CASSANDRA-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249838#comment-16249838 ] Olivier Michallat edited comment on CASSANDRA-13992 at 11/13/17 5:18 PM: - [~ifesdjeen] that would work, the driver can treat an empty {{new_metadata_id}} as "don't update my local copy". Namely, changing [this line|https://github.com/datastax/java-driver/blob/6eeb8b2193ab5b50b73b0d9a533e775265f11007/driver-core/src/main/java/com/datastax/driver/core/ArrayBackedResultSet.java#L83] to: {code} if (newMetadataId != null && newMetadataId.bytes.length > 0) { {code} However, that feels kind of hacky. Consider how we would have to explain it in the protocol spec: {quote} - is \[short bytes] representing the new, changed resultset metadata. The new metadata ID must also be used in subsequent executions of the corresponding prepared statement, if any, *except if it is empty*. {quote} It would make much more sense to force {{METADATA_CHANGED}} to *false* for conditional updates; isn't there any way we can do that?
[jira] [Created] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart
Gregor Uhlenheuer created CASSANDRA-14013: - Summary: Data loss in snapshots keyspace after service restart Key: CASSANDRA-14013 URL: https://issues.apache.org/jira/browse/CASSANDRA-14013 Project: Cassandra Issue Type: Bug Reporter: Gregor Uhlenheuer I am posting this bug in the hope of discovering the stupid mistake I am making, because I can't imagine a reasonable explanation for the behavior I see right now :-) In short, I observe data loss in a keyspace called *snapshots* after restarting the Cassandra service. Say I have 1000 records in a table called *snapshots.test_idx*; after a restart the table has fewer entries or is even empty. The "mysterious" part is that it happens only in a keyspace called *snapshots*... h3. Steps to reproduce These steps reproduce the described behavior in "most" attempts (not every single time though). {code} # create keyspace CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; # create table CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key)); # insert some test data INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1); ... INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000); # count entries SELECT count(*) FROM snapshots.test_idx; 1000 # restart service kill cassandra -f # count entries SELECT count(*) FROM snapshots.test_idx; 0 {code} I hope someone can point me to the obvious mistake I am making :-) This happened to me using both Cassandra 3.9 and 3.11.0 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14007) cqlshlib tests fail due to compact table
[ https://issues.apache.org/jira/browse/CASSANDRA-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249778#comment-16249778 ] Joel Knighton commented on CASSANDRA-14007: --- Yeah, the cqlshlib tests have their own script to run and don't run as part of dtests. See [https://github.com/apache/cassandra-builds/blob/f0e63d66269f9086c3a0393a24a55577d21b4454/build-scripts/cassandra-cqlsh-tests.sh] for an example of how to run them. > cqlshlib tests fail due to compact table > > > Key: CASSANDRA-14007 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14007 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Joel Knighton >Assignee: Alex Petrov > > The pylib/cqlshlib tests fail on initialization with the error > {{SyntaxException: message="Compact tables are not allowed in Cassandra starting with > 4.0 version."}}. > The table {{dynamic_columns}} is created {{WITH COMPACT STORAGE}}. Since > [CASSANDRA-10857], this is no longer supported. It looks like dropping the > COMPACT STORAGE modifier is enough for the tests to run, but I haven't looked > at whether we should instead remove the table and all related tests entirely, or if > there's an interesting code path covered by this that we should test in a > different way now. [~ifesdjeen] might know at a glance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14012) Document gossip protocol
Jörn Heissler created CASSANDRA-14012: - Summary: Document gossip protocol Key: CASSANDRA-14012 URL: https://issues.apache.org/jira/browse/CASSANDRA-14012 Project: Cassandra Issue Type: Improvement Reporter: Jörn Heissler Priority: Minor I had an issue today with two nodes communicating with each other; there was a flaw in my configuration (a wrong broadcast address). I saw a little bit of traffic on port 7000, but I couldn't understand it for lack of documentation. With documentation I would have understood my issue very quickly (7f 00 01 01 is a bad broadcast address!), but I didn't recognize those 4 bytes as the broadcast address. Could you please document the gossip protocol? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
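As an aside (a hedged sketch, not anything in Cassandra itself): the four bytes mentioned in the report are simply a raw IPv4 address, and decoding them makes the misconfigured broadcast address obvious — 7f 00 01 01 is 127.0.1.1, a loopback address.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative helper: turn 4 raw on-the-wire bytes back into a dotted-quad
// string. Class and method names are hypothetical.
final class GossipBytes {
    static String decodeIPv4(byte[] raw) {
        try {
            return InetAddress.getByAddress(raw).getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("expected 4 (or 16) bytes", e);
        }
    }
}
```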
[jira] [Created] (CASSANDRA-14011) Multi threaded L0 -> L1 compaction
Marcus Eriksson created CASSANDRA-14011: --- Summary: Multi threaded L0 -> L1 compaction Key: CASSANDRA-14011 URL: https://issues.apache.org/jira/browse/CASSANDRA-14011 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson Fix For: 4.x Currently L0 -> L1 compactions are almost always single threaded because every L0 sstable will overlap with all L1 sstables. To improve this, we should range-split the input sstables in a configurable amount of parts and then use multiple threads to write out the results. This is similar to the {{max_subcompactions}} option in RocksDB: https://github.com/facebook/rocksdb/wiki/Leveled-Compaction -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
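A hedged sketch of the range-splitting idea above (illustrative only, not the proposed patch): divide the compaction's token span into a configurable number of contiguous slices so that multiple threads can each write out a disjoint part of the result.

```java
// Illustrative sketch: split [start, end) into `parts` contiguous sub-ranges
// and return the boundary tokens. Real token math would need to handle the
// full Murmur3 range and overflow; this toy version assumes small spans.
final class RangeSplitter {
    static long[] boundaries(long start, long end, int parts) {
        long[] bounds = new long[parts + 1];
        long span = end - start;
        for (int i = 0; i <= parts; i++)
            bounds[i] = start + span / parts * i;
        bounds[parts] = end; // ensure integer rounding doesn't drop the tail
        return bounds;
    }
}
```

Each thread would then compact only the sstable data falling between bounds[i] and bounds[i+1], analogous to RocksDB's {{max_subcompactions}}.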
[jira] [Updated] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-13987: Reviewer: Sam Tunnicliffe (was: Blake Eggleston) > Multithreaded commitlog subtly changed durability > - > > Key: CASSANDRA-13987 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13987 > Project: Cassandra > Issue Type: Improvement >Reporter: Jason Brown >Assignee: Jason Brown > Fix For: 4.x > > > When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly > changed the way that commitlog durability worked. Everything still gets > written to an mmap file. However, not everything is replayable from the > mmaped file after a process crash, in periodic mode. > In brief, the reason this changed is due to the chained markers that are > required for the multithreaded commit log. At each msync, we wait for > outstanding mutations to serialize into the commitlog, and update a marker > before and after the commits that have accumulated since the last sync. With > those markers, we can safely replay that section of the commitlog. Without > the markers, we have no guarantee that the commits in that section were > successfully written, thus we abandon those commits on replay. > If you have correlated process failures of multiple nodes at "nearly" the > same time (see ["There Is No > Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have > data loss if none of the nodes msync the commitlog. For example, with RF=3, > if a quorum write succeeds on two nodes (and we acknowledge the write back to > the client), and then the process on both nodes OOMs (say, due to reading the > index for a 100GB partition), the write will be lost if neither process > msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. > The reason why this data is silently lost is due to the chained markers that > were introduced with CASSANDRA-3578.
> The problem we are addressing with this ticket is incrementally improving > 'durability' due to process crash, not host crash. (Note: operators should > use batch mode to ensure greater durability, but batch mode in its current > implementation is a) borked, and b) will burn through, *very* rapidly, SSDs > that don't have a non-volatile write cache sitting in front.) > The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which > means that a node could lose up to ten seconds of data due to process crash. > The unfortunate thing is that the data is still available, in the mmap file, > but we can't replay it due to incomplete chained markers. > ftr, I don't believe we've ever had a stated policy about commitlog > durability wrt process crash. Pre-2.0 we naturally piggy-backed off the > memory mapped file and the fact that every mutation acquired a lock and > wrote into the mmap buffer, and the ability to replay everything out of it > came for free. With CASSANDRA-3578, that was subtly changed. > Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust > the durability > guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit] > of each commit in innodb via {{innodb_flush_log_at_trx_commit}}. I'm > using that idea as a loose springboard for what to do here. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
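The marker mechanics described in the ticket can be sketched in a few lines. The following is a hypothetical model, not Cassandra's actual commitlog code: each "msync" closes the section of commits accumulated since the last sync with a marker, and replay trusts only mutations that precede the last marker, abandoning the unsynced tail exactly as described for periodic mode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the chained-marker idea from CASSANDRA-3578.
public class ChainedMarkerSketch
{
    static final String MARKER = "SYNC";

    // Append a mutation to the (in-memory stand-in for the mmap'ed) log.
    static void append(List<String> log, String mutation)
    {
        log.add(mutation);
    }

    // "msync": close the section accumulated since the last sync with a marker.
    static void sync(List<String> log)
    {
        log.add(MARKER);
    }

    // Replay only mutations before the last marker; anything after it may be
    // partially written, so it is abandoned -- the process-crash data-loss
    // window the ticket describes.
    static List<String> replay(List<String> log)
    {
        int lastMarker = log.lastIndexOf(MARKER);
        List<String> replayed = new ArrayList<>();
        for (int i = 0; i < lastMarker; i++)
            if (!MARKER.equals(log.get(i)))
                replayed.add(log.get(i));
        return replayed;
    }

    public static void main(String[] args)
    {
        List<String> log = new ArrayList<>();
        append(log, "m1");
        append(log, "m2");
        sync(log);         // m1 and m2 are now bounded by a marker
        append(log, "m3"); // process crashes before the next sync
        List<String> recovered = replay(log);
        if (!recovered.equals(List.of("m1", "m2")))
            throw new AssertionError("unexpected replay: " + recovered);
        System.out.println("replayed " + recovered); // m3 is lost on replay
    }
}
```

In this model, lowering the sync period shrinks the abandoned tail but costs more msyncs, which is the durability/throughput dial the {{innodb_flush_log_at_trx_commit}} analogy points at.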
[jira] [Created] (CASSANDRA-14010) NullPointerException when creating keyspace
Jonathan Pellby created CASSANDRA-14010: --- Summary: NullPointerException when creating keyspace Key: CASSANDRA-14010 URL: https://issues.apache.org/jira/browse/CASSANDRA-14010 Project: Cassandra Issue Type: Bug Reporter: Jonathan Pellby We have a test environment where we drop and create keyspaces and tables several times within a short time frame. Since upgrading from 3.11.0 to 3.11.1, we are seeing a lot of create statements failing. See the logs below: {code:java} 2017-11-13T14:29:20.037986449Z WARN Directory /tmp/ramdisk/commitlog doesn't exist 2017-11-13T14:29:20.038009590Z WARN Directory /tmp/ramdisk/saved_caches doesn't exist 2017-11-13T14:29:20.094337265Z INFO Initialized prepared statement caches with 10 MB (native) and 10 MB (Thrift) 2017-11-13T14:29:20.805946340Z INFO Initializing system.IndexInfo 2017-11-13T14:29:21.934686905Z INFO Initializing system.batches 2017-11-13T14:29:21.973914733Z INFO Initializing system.paxos 2017-11-13T14:29:21.994550268Z INFO Initializing system.local 2017-11-13T14:29:22.014097194Z INFO Initializing system.peers 2017-11-13T14:29:22.124211254Z INFO Initializing system.peer_events 2017-11-13T14:29:22.153966833Z INFO Initializing system.range_xfers 2017-11-13T14:29:22.174097334Z INFO Initializing system.compaction_history 2017-11-13T14:29:22.194259920Z INFO Initializing system.sstable_activity 2017-11-13T14:29:22.210178271Z INFO Initializing system.size_estimates 2017-11-13T14:29:22.223836992Z INFO Initializing system.available_ranges 2017-11-13T14:29:22.237854207Z INFO Initializing system.transferred_ranges 2017-11-13T14:29:22.253995621Z INFO Initializing system.views_builds_in_progress 2017-11-13T14:29:22.264052481Z INFO Initializing system.built_views 2017-11-13T14:29:22.283334779Z INFO Initializing system.hints 2017-11-13T14:29:22.304110311Z INFO Initializing system.batchlog 2017-11-13T14:29:22.318031950Z INFO Initializing system.prepared_statements 2017-11-13T14:29:22.326547917Z INFO Initializing system.schema_keyspaces
2017-11-13T14:29:22.337097407Z INFO Initializing system.schema_columnfamilies 2017-11-13T14:29:22.354082675Z INFO Initializing system.schema_columns 2017-11-13T14:29:22.384179063Z INFO Initializing system.schema_triggers 2017-11-13T14:29:22.394222027Z INFO Initializing system.schema_usertypes 2017-11-13T14:29:22.414199833Z INFO Initializing system.schema_functions 2017-11-13T14:29:22.427205182Z INFO Initializing system.schema_aggregates 2017-11-13T14:29:22.427228345Z INFO Not submitting build tasks for views in keyspace system as storage service is not initialized 2017-11-13T14:29:22.652838866Z INFO Scheduling approximate time-check task with a precision of 10 milliseconds 2017-11-13T14:29:22.732862906Z INFO Initializing system_schema.keyspaces 2017-11-13T14:29:22.746598744Z INFO Initializing system_schema.tables 2017-11-13T14:29:22.759649011Z INFO Initializing system_schema.columns 2017-11-13T14:29:22.766245435Z INFO Initializing system_schema.triggers 2017-11-13T14:29:22.778716809Z INFO Initializing system_schema.dropped_columns 2017-11-13T14:29:22.791369819Z INFO Initializing system_schema.views 2017-11-13T14:29:22.839141724Z INFO Initializing system_schema.types 2017-11-13T14:29:22.852911976Z INFO Initializing system_schema.functions 2017-11-13T14:29:22.852938112Z INFO Initializing system_schema.aggregates 2017-11-13T14:29:22.869348526Z INFO Initializing system_schema.indexes 2017-11-13T14:29:22.874178682Z INFO Not submitting build tasks for views in keyspace system_schema as storage service is not initialized 2017-11-13T14:29:23.700250435Z INFO Initializing key cache with capacity of 25 MBs. 2017-11-13T14:29:23.724357053Z INFO Initializing row cache with capacity of 0 MBs 2017-11-13T14:29:23.724383599Z INFO Initializing counter cache with capacity of 12 MBs 2017-11-13T14:29:23.724386906Z INFO Scheduling counter cache save to every 7200 seconds (going to save all keys). 
2017-11-13T14:29:23.984408710Z INFO Populating token metadata from system tables 2017-11-13T14:29:24.032687075Z INFO Global buffer pool is enabled, when pool is exhausted (max is 125.000MiB) it will allocate on heap 2017-11-13T14:29:24.214123695Z INFO Token metadata: 2017-11-13T14:29:24.304218769Z INFO Completed loading (14 ms; 8 keys) KeyCache cache 2017-11-13T14:29:24.363978406Z INFO No commitlog files found; skipping replay 2017-11-13T14:29:24.364005238Z INFO Populating token metadata from system tables 2017-11-13T14:29:24.394408476Z INFO Token metadata: 2017-11-13T14:29:24.709411652Z INFO Preloaded 0 prepared statements 2017-11-13T14:29:24.719332880Z INFO Cassandra version: 3.11.1 2017-11-13T14:29:24.719355969Z INFO Thrift API version: 20.1.0 2017-11-13T14:29:24.719359443Z INFO CQL supported versions: 3.4.4 (default: 3.4.4) 2017-11-13T14:29:24.719362103Z INFO Native protocol supported versions: 3/v3, 4/v4, 5/v5-beta (default: 4/v4)
[jira] [Deleted] (CASSANDRA-14009) _to_be_deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-14009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa deleted CASSANDRA-14009: --- > _to_be_deleted > -- > > Key: CASSANDRA-14009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14009 > Project: Cassandra > Issue Type: Test >Reporter: Andrzej Bober >Priority: Trivial > > __deleted__
[jira] [Updated] (CASSANDRA-14009) _to_be_deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-14009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bober updated CASSANDRA-14009: -- Priority: Trivial (was: Major) Component/s: (was: Auth) Issue Type: Test (was: Bug) > _to_be_deleted > -- > > Key: CASSANDRA-14009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14009 > Project: Cassandra > Issue Type: Test >Reporter: Andrzej Bober >Priority: Trivial > > __deleted__
[jira] [Updated] (CASSANDRA-14009) _to_be_deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-14009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bober updated CASSANDRA-14009: -- Labels: (was: security) > _to_be_deleted > -- > > Key: CASSANDRA-14009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14009 > Project: Cassandra > Issue Type: Test >Reporter: Andrzej Bober >Priority: Trivial > > __deleted__
[jira] [Updated] (CASSANDRA-14009) _to_be_deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-14009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bober updated CASSANDRA-14009: -- Summary: _to_be_deleted (was: Any user can overwrite any table with sstableloader) > _to_be_deleted > -- > > Key: CASSANDRA-14009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14009 > Project: Cassandra > Issue Type: Bug > Components: Auth >Reporter: Andrzej Bober > Labels: security > > __deleted__
[jira] [Resolved] (CASSANDRA-14009) Any user can overwrite any table with sstableloader
[ https://issues.apache.org/jira/browse/CASSANDRA-14009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bober resolved CASSANDRA-14009. --- Resolution: Incomplete Fix Version/s: (was: 3.11.x) (was: 3.0.x) (was: 2.2.x) (was: 2.1.x) > Any user can overwrite any table with sstableloader > --- > > Key: CASSANDRA-14009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14009 > Project: Cassandra > Issue Type: Bug > Components: Auth >Reporter: Andrzej Bober > Labels: security > > __deleted__
[jira] [Updated] (CASSANDRA-14009) Any user can overwrite any table with sstableloader
[ https://issues.apache.org/jira/browse/CASSANDRA-14009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bober updated CASSANDRA-14009: -- Description: __deleted__ (was: Hi there, Looks like any user can overwrite any table with sstableloader. Tested ubuntu 16.04.3, Java 1.8.0_151_b12, and Cassandra 2.1.19 / 2.2.11 / 3.0.15 / 3.11.1. {code:sql} cassandra@cqlsh> CREATE USER alice WITH PASSWORD 'Alice'; cassandra@cqlsh> CREATE USER bob WITH PASSWORD 'Bob'; cassandra@cqlsh> CREATE KEYSPACE db4alice WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cassandra@cqlsh> GRANT ALL PERMISSIONS ON KEYSPACE db4alice TO alice; alice@cqlsh> CREATE TABLE users (userid text PRIMARY KEY, password text); alice@cqlsh> INSERT INTO users (userid, password) VALUES ('user1', 'pass1'); alice@cqlsh> INSERT INTO users (userid, password) VALUES ('user2', 'pass2'); alice@cqlsh> INSERT INTO users (userid, password) VALUES ('user3', 'pass3'); alice@cqlsh> truncate users; alice@cqlsh> select * from db4alice.users ; userid | password +-- (0 rows) sstableloader -d 127.0.0.1 -u bob -pw Bob ./db4alice/users alice@cqlsh> select * from db4alice.users ; userid | password +-- user2 |pass2 user1 |pass1 user3 |pass3 (3 rows) {code} Looks like a pretty serious bug to me.) > Any user can overwrite any table with sstableloader > --- > > Key: CASSANDRA-14009 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14009 > Project: Cassandra > Issue Type: Bug > Components: Auth >Reporter: Andrzej Bober > Labels: security > Fix For: 2.1.x, 2.2.x, 3.0.x, 3.11.x > > > __deleted__
[jira] [Updated] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-9988: --- Resolution: Fixed Status: Resolved (was: Patch Available) committed as sha {{0eab80bf389114be8d6f7627f72249bbc3c02e64}} Thanks! > Introduce leaf-only iterator > > > Key: CASSANDRA-9988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9988 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Labels: patch > Fix For: 4.0 > > Attachments: 9988-3tests.png, 9988-data.png, 9988-result.png, > 9988-result2.png, 9988-result3.png, 9988-test-result-expsearch.xlsx, > 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-test-result3.png, > 9988-trunk-new-update.txt, 9988-trunk-new.txt, trunk-9988.txt > > > In many cases we have small btrees, small enough to fit in a single leaf > page. In this case it _may_ be more efficient to specialise our iterator.
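The idea behind the specialisation can be illustrated with a stand-in sketch. The real classes are {{BTreeSearchIterator}}, {{LeafBTreeSearchIterator}} and {{FullBTreeSearchIterator}}; everything below, including the {{isLeaf}} threshold, is simplified and hypothetical. The point is only the dispatch: when the whole tree is a single leaf, iteration degenerates to walking a sorted array, with none of the stack bookkeeping a full tree traversal needs.

```java
import java.util.Arrays;
import java.util.Iterator;

// Simplified stand-in for the leaf-only iterator dispatch (CASSANDRA-9988).
public class LeafIteratorSketch
{
    // Stand-in for BTree.isLeaf(): here, a "leaf" is any small array.
    static boolean isLeaf(Object[] tree)
    {
        return tree.length <= 32;
    }

    // Mirrors the shape of the committed change: leaf trees get the cheap
    // array-walking iterator, everything else falls back to the general one.
    static Iterator<Object> iterator(Object[] tree)
    {
        return isLeaf(tree) ? Arrays.asList(tree).iterator()
                            : fullTreeIterator(tree);
    }

    // Placeholder for the full-tree traversal, out of scope for this sketch.
    static Iterator<Object> fullTreeIterator(Object[] tree)
    {
        return Arrays.asList(tree).iterator();
    }

    public static void main(String[] args)
    {
        Object[] leaf = { 1, 2, 3 };
        Iterator<Object> it = iterator(leaf);
        StringBuilder sb = new StringBuilder();
        while (it.hasNext())
            sb.append(it.next());
        if (!sb.toString().equals("123"))
            throw new AssertionError(sb.toString());
        System.out.println("leaf iteration: " + sb);
    }
}
```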
cassandra git commit: Introduce Leaf-only BTree Iterator
Repository: cassandra Updated Branches: refs/heads/trunk 07258a96b -> 0eab80bf3 Introduce Leaf-only BTree Iterator patch by Piotr Jastrzebski, Jay Zhuang; reviewed by jasobrown for CASSANDRA-9988 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0eab80bf Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0eab80bf Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0eab80bf Branch: refs/heads/trunk Commit: 0eab80bf389114be8d6f7627f72249bbc3c02e64 Parents: 07258a9 Author: Jay ZhuangAuthored: Sun Jan 15 16:52:58 2017 -0800 Committer: Jason Brown Committed: Mon Nov 13 05:43:22 2017 -0800 -- CHANGES.txt | 1 + .../db/partitions/AbstractBTreePartition.java | 3 +- .../org/apache/cassandra/utils/btree/BTree.java | 26 +- .../utils/btree/BTreeSearchIterator.java| 137 +-- .../apache/cassandra/utils/btree/BTreeSet.java | 3 +- .../utils/btree/FullBTreeSearchIterator.java| 159 .../utils/btree/LeafBTreeSearchIterator.java| 113 + .../microbench/BTreeSearchIteratorBench.java| 143 +++ .../utils/btree/BTreeSearchIteratorTest.java| 241 +++ 9 files changed, 683 insertions(+), 143 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/0eab80bf/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index f5951d6..494901c 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0 + * Introduce leaf-only iterator (CASSANDRA-9988) * Upgrade Guava to 23.3 and Airline to 0.8 (CASSANDRA-13997) * Allow only one concurrent call to StatusLogger (CASSANDRA-12182) * Refactoring to specialised functional interfaces (CASSANDRA-13982) http://git-wip-us.apache.org/repos/asf/cassandra/blob/0eab80bf/src/java/org/apache/cassandra/db/partitions/AbstractBTreePartition.java -- diff --git a/src/java/org/apache/cassandra/db/partitions/AbstractBTreePartition.java b/src/java/org/apache/cassandra/db/partitions/AbstractBTreePartition.java index d913cb3..6dbaff5 100644 --- 
a/src/java/org/apache/cassandra/db/partitions/AbstractBTreePartition.java +++ b/src/java/org/apache/cassandra/db/partitions/AbstractBTreePartition.java @@ -26,7 +26,6 @@ import org.apache.cassandra.db.filter.ColumnFilter; import org.apache.cassandra.db.rows.*; import org.apache.cassandra.utils.SearchIterator; import org.apache.cassandra.utils.btree.BTree; -import org.apache.cassandra.utils.btree.BTreeSearchIterator; import static org.apache.cassandra.utils.btree.BTree.Dir.desc; @@ -131,7 +130,7 @@ public abstract class AbstractBTreePartition implements Partition, Iterable final Holder current = holder(); return new SearchIterator () { -private final SearchIterator rawIter = new BTreeSearchIterator<>(current.tree, metadata().comparator, desc(reversed)); +private final SearchIterator rawIter = BTree.slice(current.tree, metadata().comparator, desc(reversed)); private final DeletionTime partitionDeletion = current.deletionInfo.getPartitionDeletion(); public Row next(Clustering clustering) http://git-wip-us.apache.org/repos/asf/cassandra/blob/0eab80bf/src/java/org/apache/cassandra/utils/btree/BTree.java -- diff --git a/src/java/org/apache/cassandra/utils/btree/BTree.java b/src/java/org/apache/cassandra/utils/btree/BTree.java index a4519b9..9ed7534 100644 --- a/src/java/org/apache/cassandra/utils/btree/BTree.java +++ b/src/java/org/apache/cassandra/utils/btree/BTree.java @@ -201,12 +201,14 @@ public class BTree public static Iterator iterator(Object[] btree, Dir dir) { -return new BTreeSearchIterator<>(btree, null, dir); +return isLeaf(btree) ? new LeafBTreeSearchIterator<>(btree, null, dir) + : new FullBTreeSearchIterator<>(btree, null, dir); } public static Iterator iterator(Object[] btree, int lb, int ub, Dir dir) { -return new BTreeSearchIterator<>(btree, null, dir, lb, ub); +return isLeaf(btree) ? 
new LeafBTreeSearchIterator<>(btree, null, dir, lb, ub) + : new FullBTreeSearchIterator<>(btree, null, dir, lb, ub); } public static Iterable iterable(Object[] btree) @@ -234,7 +236,8 @@ public class BTree */ public static BTreeSearchIterator slice(Object[] btree, Comparator comparator, Dir dir) { -return new BTreeSearchIterator<>(btree,
[jira] [Updated] (CASSANDRA-13975) Add a workaround for overly large read repair mutations
[ https://issues.apache.org/jira/browse/CASSANDRA-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-13975: -- Resolution: Fixed Fix Version/s: (was: 3.11.x) (was: 3.0.x) 3.11.2 3.0.16 Status: Resolved (was: Ready to Commit) > Add a workaround for overly large read repair mutations > --- > > Key: CASSANDRA-13975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13975 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko > Fix For: 3.0.16, 3.11.2 > > > It's currently possible for {{DataResolver}} to accumulate more changes to > read repair than would fit in a single serialized mutation. If that happens, > the node receiving the mutation would fail, the read would time out, and > won't be able to proceed until the operator runs repair or manually drops the > affected partitions. > Ideally we should either read repair iteratively, or at least split the > resulting mutation into smaller chunks in the end. In the meantime, for > 3.0.x, I suggest we add logging to catch this, and a -D flag to allow > proceeding with the requests as is when the mutation is too large, without > read repair.
[jira] [Commented] (CASSANDRA-13975) Add a workaround for overly large read repair mutations
[ https://issues.apache.org/jira/browse/CASSANDRA-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249540#comment-16249540 ] Aleksey Yeschenko commented on CASSANDRA-13975: --- Thanks, committed as [f1e850a492126572efc636a6838cff90333806b9|https://github.com/apache/cassandra/commit/f1e850a492126572efc636a6838cff90333806b9] to 3.0 and merged up with 3.11 and trunk. > Add a workaround for overly large read repair mutations > --- > > Key: CASSANDRA-13975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13975 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko > Fix For: 3.0.16, 3.11.2 > > > It's currently possible for {{DataResolver}} to accumulate more changes to > read repair than would fit in a single serialized mutation. If that happens, > the node receiving the mutation would fail, the read would time out, and > won't be able to proceed until the operator runs repair or manually drops the > affected partitions. > Ideally we should either read repair iteratively, or at least split the > resulting mutation into smaller chunks in the end. In the meantime, for > 3.0.x, I suggest we add logging to catch this, and a -D flag to allow > proceeding with the requests as is when the mutation is too large, without > read repair.
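The shape of the committed workaround can be sketched as follows. The system property name {{cassandra.drop_oversized_readrepair_mutations}} comes from the patch itself; the method and size check around it are simplified, hypothetical stand-ins, not the actual {{DataResolver}} code.

```java
// Hedged sketch of CASSANDRA-13975's flag-gated behaviour: when the -D flag
// is set, an oversized read-repair mutation is dropped (counted/logged in the
// real patch) instead of being sent to a replica that would reject it.
public class OversizedRepairSketch
{
    // Property name matches the committed patch; read once at class load.
    static final boolean DROP_OVERSIZED =
        Boolean.getBoolean("cassandra.drop_oversized_readrepair_mutations");

    // Returns true if the repair mutation (represented only by its serialized
    // size here) would be sent to the destination replica.
    static boolean maybeSend(int serializedSize, int maxMutationSize)
    {
        if (serializedSize <= maxMutationSize)
            return true; // fits: send the read-repair mutation as usual
        if (DROP_OVERSIZED)
            return false; // drop it and let the read proceed without repair
        // Flag unset (the default): the oversized send goes ahead, and the
        // receiving node will fail it -- the failure mode being worked around.
        return true;
    }

    public static void main(String[] args)
    {
        if (!maybeSend(10, 100))
            throw new AssertionError("small mutation should be sent");
        System.out.println("ok");
    }
}
```

Leaving the flag off preserves the old behaviour, so the workaround is strictly opt-in for operators who prefer an unrepaired read over a timed-out one.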
[1/6] cassandra git commit: Add flag to allow dropping oversized read repair mutations
Repository: cassandra Updated Branches: refs/heads/cassandra-3.0 f767d35ae -> f1e850a49 refs/heads/cassandra-3.11 387d3a4eb -> 9ee44db49 refs/heads/trunk 7707b736c -> 07258a96b Add flag to allow dropping oversized read repair mutations patch by Aleksey Yeschenko; reviewed by Sam Tunnicliffe for CASSANDRA-13975 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f1e850a4 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f1e850a4 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f1e850a4 Branch: refs/heads/cassandra-3.0 Commit: f1e850a492126572efc636a6838cff90333806b9 Parents: f767d35 Author: Aleksey YeschenkoAuthored: Wed Oct 25 20:15:39 2017 +0100 Committer: Aleksey Yeschenko Committed: Mon Nov 13 13:10:28 2017 + -- CHANGES.txt | 2 + .../apache/cassandra/metrics/TableMetrics.java | 2 + .../apache/cassandra/service/DataResolver.java | 53 +--- 3 files changed, 49 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/f1e850a4/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index e3026aa..a3c43fd 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.0.16 + * Add flag to allow dropping oversized read repair mutations (CASSANDRA-13975) * Fix SSTableLoader logger message (CASSANDRA-14003) * Fix repair race that caused gossip to block (CASSANDRA-13849) * Tracing interferes with digest requests when using RandomPartitioner (CASSANDRA-13964) @@ -8,6 +9,7 @@ * Mishandling of cells for removed/dropped columns when reading legacy files (CASSANDRA-13939) * Deserialise sstable metadata in nodetool verify (CASSANDRA-13922) + 3.0.15 * Improve TRUNCATE performance (CASSANDRA-13909) * Implement short read protection on partition boundaries (CASSANDRA-13595) http://git-wip-us.apache.org/repos/asf/cassandra/blob/f1e850a4/src/java/org/apache/cassandra/metrics/TableMetrics.java -- diff --git a/src/java/org/apache/cassandra/metrics/TableMetrics.java 
b/src/java/org/apache/cassandra/metrics/TableMetrics.java index fe88a63..eb56ed9 100644 --- a/src/java/org/apache/cassandra/metrics/TableMetrics.java +++ b/src/java/org/apache/cassandra/metrics/TableMetrics.java @@ -151,6 +151,7 @@ public class TableMetrics public final static LatencyMetrics globalWriteLatency = new LatencyMetrics(globalFactory, globalAliasFactory, "Write"); public final static LatencyMetrics globalRangeLatency = new LatencyMetrics(globalFactory, globalAliasFactory, "Range"); +public final Meter readRepairRequests; public final Meter shortReadProtectionRequests; public final Map samplers; @@ -648,6 +649,7 @@ public class TableMetrics casPropose = new LatencyMetrics(factory, "CasPropose", cfs.keyspace.metric.casPropose); casCommit = new LatencyMetrics(factory, "CasCommit", cfs.keyspace.metric.casCommit); +readRepairRequests = Metrics.meter(factory.createMetricName("ReadRepairRequests")); shortReadProtectionRequests = Metrics.meter(factory.createMetricName("ShortReadProtectionRequests")); } http://git-wip-us.apache.org/repos/asf/cassandra/blob/f1e850a4/src/java/org/apache/cassandra/service/DataResolver.java -- diff --git a/src/java/org/apache/cassandra/service/DataResolver.java b/src/java/org/apache/cassandra/service/DataResolver.java index 5fb34c6..f02b565 100644 --- a/src/java/org/apache/cassandra/service/DataResolver.java +++ b/src/java/org/apache/cassandra/service/DataResolver.java @@ -44,6 +44,9 @@ import org.apache.cassandra.utils.FBUtilities; public class DataResolver extends ResponseResolver { +private static final boolean DROP_OVERSIZED_READ_REPAIR_MUTATIONS = +Boolean.getBoolean("cassandra.drop_oversized_readrepair_mutations"); + @VisibleForTesting final List repairResults = Collections.synchronizedList(new ArrayList<>()); @@ -452,15 +455,49 @@ public class DataResolver extends ResponseResolver public void close() { for (int i = 0; i < repairs.length; i++) +if (null != repairs[i]) +sendRepairMutation(repairs[i], sources[i]); +} + +private 
void sendRepairMutation(PartitionUpdate partition, InetAddress destination) +{ +Mutation mutation = new Mutation(partition); +int messagingVersion = MessagingService.instance().getVersion(destination); + +
[4/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9ee44db4 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9ee44db4 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9ee44db4 Branch: refs/heads/trunk Commit: 9ee44db49b13d4b4c91c9d6332ce06a6e2abf944 Parents: 387d3a4 f1e850a Author: Aleksey YeschenkoAuthored: Mon Nov 13 13:13:06 2017 + Committer: Aleksey Yeschenko Committed: Mon Nov 13 13:13:06 2017 + -- CHANGES.txt | 1 + .../apache/cassandra/metrics/TableMetrics.java | 2 + .../apache/cassandra/service/DataResolver.java | 53 +--- 3 files changed, 48 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ee44db4/CHANGES.txt -- diff --cc CHANGES.txt index 6a78b60,a3c43fd..a1a1a37 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,9 -1,5 +1,10 @@@ -3.0.16 +3.11.2 + * Add asm jar to build.xml for maven builds (CASSANDRA-11193) + * Round buffer size to powers of 2 for the chunk cache (CASSANDRA-13897) + * Update jackson JSON jars (CASSANDRA-13949) + * Avoid locks when checking LCS fanout and if we should defrag (CASSANDRA-13930) +Merged from 3.0: + * Add flag to allow dropping oversized read repair mutations (CASSANDRA-13975) * Fix SSTableLoader logger message (CASSANDRA-14003) * Fix repair race that caused gossip to block (CASSANDRA-13849) * Tracing interferes with digest requests when using RandomPartitioner (CASSANDRA-13964) http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ee44db4/src/java/org/apache/cassandra/metrics/TableMetrics.java -- diff --cc src/java/org/apache/cassandra/metrics/TableMetrics.java index b0f667c,eb56ed9..e78bb66 --- a/src/java/org/apache/cassandra/metrics/TableMetrics.java +++ b/src/java/org/apache/cassandra/metrics/TableMetrics.java @@@ -167,40 -151,7 +167,41 @@@ public class TableMetric public final static LatencyMetrics globalWriteLatency = new 
LatencyMetrics(globalFactory, globalAliasFactory, "Write"); public final static LatencyMetrics globalRangeLatency = new LatencyMetrics(globalFactory, globalAliasFactory, "Range"); +public final static Gauge globalPercentRepaired = Metrics.register(globalFactory.createMetricName("PercentRepaired"), +new Gauge() +{ +public Double getValue() +{ +double repaired = 0; +double total = 0; +for (String keyspace : Schema.instance.getNonSystemKeyspaces()) +{ +Keyspace k = Schema.instance.getKeyspaceInstance(keyspace); +if (SchemaConstants.DISTRIBUTED_KEYSPACE_NAME.equals(k.getName())) +continue; +if (k.getReplicationStrategy().getReplicationFactor() < 2) +continue; + +for (ColumnFamilyStore cf : k.getColumnFamilyStores()) +{ +if (!SecondaryIndexManager.isIndexColumnFamily(cf.name)) +{ +for (SSTableReader sstable : cf.getSSTables(SSTableSet.CANONICAL)) +{ +if (sstable.isRepaired()) +{ +repaired += sstable.uncompressedLength(); +} +total += sstable.uncompressedLength(); +} +} +} +} +return total > 0 ? (repaired / total) * 100 : 100.0; +} +}); + + public final Meter readRepairRequests; public final Meter shortReadProtectionRequests; public final Map samplers; http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ee44db4/src/java/org/apache/cassandra/service/DataResolver.java -- diff --cc src/java/org/apache/cassandra/service/DataResolver.java index 111d561,f02b565..f63f4f5 --- a/src/java/org/apache/cassandra/service/DataResolver.java +++ b/src/java/org/apache/cassandra/service/DataResolver.java @@@ -44,15 -44,17 +44,18 @@@ import org.apache.cassandra.utils.FBUti public class DataResolver extends ResponseResolver { + private static final boolean DROP_OVERSIZED_READ_REPAIR_MUTATIONS = + Boolean.getBoolean("cassandra.drop_oversized_readrepair_mutations"); + @VisibleForTesting final List repairResults = Collections.synchronizedList(new ArrayList<>()); - +
[6/6] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/07258a96 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/07258a96 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/07258a96 Branch: refs/heads/trunk Commit: 07258a96bfde3a6df839b4cc2c79e500d95163f0 Parents: 7707b73 9ee44db Author: Aleksey YeschenkoAuthored: Mon Nov 13 13:15:15 2017 + Committer: Aleksey Yeschenko Committed: Mon Nov 13 13:18:03 2017 + -- CHANGES.txt | 1 + .../apache/cassandra/metrics/TableMetrics.java | 2 + .../apache/cassandra/service/DataResolver.java | 51 +--- 3 files changed, 46 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/07258a96/CHANGES.txt -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/07258a96/src/java/org/apache/cassandra/metrics/TableMetrics.java -- diff --cc src/java/org/apache/cassandra/metrics/TableMetrics.java index 04fbf46,e78bb66..5c4a849 --- a/src/java/org/apache/cassandra/metrics/TableMetrics.java +++ b/src/java/org/apache/cassandra/metrics/TableMetrics.java @@@ -248,33 -201,7 +248,34 @@@ public class TableMetric } }); +public static final Gauge globalBytesRepaired = Metrics.register(globalFactory.createMetricName("BytesRepaired"), + new Gauge() +{ +public Long getValue() +{ +return totalNonSystemTablesSize(SSTableReader::isRepaired).left; +} +}); + +public static final Gauge globalBytesUnrepaired = Metrics.register(globalFactory.createMetricName("BytesUnrepaired"), + new Gauge() +{ +public Long getValue() +{ +return totalNonSystemTablesSize(s -> !s.isRepaired() && !s.isPendingRepair()).left; +} +}); + +public static final Gauge globalBytesPendingRepair = Metrics.register(globalFactory.createMetricName("BytesPendingRepair"), + new Gauge() +{ +public Long getValue() +{ +return totalNonSystemTablesSize(SSTableReader::isPendingRepair).left; +} +}); + + public final Meter readRepairRequests; 
public final Meter shortReadProtectionRequests; public final Map samplers; @@@ -825,26 -698,7 +826,27 @@@ casPropose = new LatencyMetrics(factory, "CasPropose", cfs.keyspace.metric.casPropose); casCommit = new LatencyMetrics(factory, "CasCommit", cfs.keyspace.metric.casCommit); +repairsStarted = createTableCounter("RepairJobsStarted"); +repairsCompleted = createTableCounter("RepairJobsCompleted"); + +anticompactionTime = createTableTimer("AnticompactionTime", cfs.keyspace.metric.anticompactionTime); +validationTime = createTableTimer("ValidationTime", cfs.keyspace.metric.validationTime); +syncTime = createTableTimer("SyncTime", cfs.keyspace.metric.repairSyncTime); + +bytesValidated = createTableHistogram("BytesValidated", cfs.keyspace.metric.bytesValidated, false); +partitionsValidated = createTableHistogram("PartitionsValidated", cfs.keyspace.metric.partitionsValidated, false); +bytesAnticompacted = createTableCounter("BytesAnticompacted"); +bytesMutatedAnticompaction = createTableCounter("BytesMutatedAnticompaction"); +mutatedAnticompactionGauge = createTableGauge("MutatedAnticompactionGauge", () -> +{ +double bytesMutated = bytesMutatedAnticompaction.getCount(); +double bytesAnticomp = bytesAnticompacted.getCount(); +if (bytesAnticomp + bytesMutated > 0) +return bytesMutated / (bytesAnticomp + bytesMutated); +return 0.0; +}); + + readRepairRequests = Metrics.meter(factory.createMetricName("ReadRepairRequests")); shortReadProtectionRequests = Metrics.meter(factory.createMetricName("ShortReadProtectionRequests")); } http://git-wip-us.apache.org/repos/asf/cassandra/blob/07258a96/src/java/org/apache/cassandra/service/DataResolver.java -- diff --cc src/java/org/apache/cassandra/service/DataResolver.java index d4c77d1,f63f4f5..933014f --- a/src/java/org/apache/cassandra/service/DataResolver.java +++
[3/6] cassandra git commit: Add flag to allow dropping oversized read repair mutations
Add flag to allow dropping oversized read repair mutations patch by Aleksey Yeschenko; reviewed by Sam Tunnicliffe for CASSANDRA-13975 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f1e850a4 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f1e850a4 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f1e850a4 Branch: refs/heads/trunk Commit: f1e850a492126572efc636a6838cff90333806b9 Parents: f767d35 Author: Aleksey YeschenkoAuthored: Wed Oct 25 20:15:39 2017 +0100 Committer: Aleksey Yeschenko Committed: Mon Nov 13 13:10:28 2017 + -- CHANGES.txt | 2 + .../apache/cassandra/metrics/TableMetrics.java | 2 + .../apache/cassandra/service/DataResolver.java | 53 +--- 3 files changed, 49 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/f1e850a4/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index e3026aa..a3c43fd 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.0.16 + * Add flag to allow dropping oversized read repair mutations (CASSANDRA-13975) * Fix SSTableLoader logger message (CASSANDRA-14003) * Fix repair race that caused gossip to block (CASSANDRA-13849) * Tracing interferes with digest requests when using RandomPartitioner (CASSANDRA-13964) @@ -8,6 +9,7 @@ * Mishandling of cells for removed/dropped columns when reading legacy files (CASSANDRA-13939) * Deserialise sstable metadata in nodetool verify (CASSANDRA-13922) + 3.0.15 * Improve TRUNCATE performance (CASSANDRA-13909) * Implement short read protection on partition boundaries (CASSANDRA-13595) http://git-wip-us.apache.org/repos/asf/cassandra/blob/f1e850a4/src/java/org/apache/cassandra/metrics/TableMetrics.java -- diff --git a/src/java/org/apache/cassandra/metrics/TableMetrics.java b/src/java/org/apache/cassandra/metrics/TableMetrics.java index fe88a63..eb56ed9 100644 --- a/src/java/org/apache/cassandra/metrics/TableMetrics.java +++ 
b/src/java/org/apache/cassandra/metrics/TableMetrics.java @@ -151,6 +151,7 @@ public class TableMetrics public final static LatencyMetrics globalWriteLatency = new LatencyMetrics(globalFactory, globalAliasFactory, "Write"); public final static LatencyMetrics globalRangeLatency = new LatencyMetrics(globalFactory, globalAliasFactory, "Range"); +public final Meter readRepairRequests; public final Meter shortReadProtectionRequests; public final Map samplers; @@ -648,6 +649,7 @@ public class TableMetrics casPropose = new LatencyMetrics(factory, "CasPropose", cfs.keyspace.metric.casPropose); casCommit = new LatencyMetrics(factory, "CasCommit", cfs.keyspace.metric.casCommit); +readRepairRequests = Metrics.meter(factory.createMetricName("ReadRepairRequests")); shortReadProtectionRequests = Metrics.meter(factory.createMetricName("ShortReadProtectionRequests")); } http://git-wip-us.apache.org/repos/asf/cassandra/blob/f1e850a4/src/java/org/apache/cassandra/service/DataResolver.java -- diff --git a/src/java/org/apache/cassandra/service/DataResolver.java b/src/java/org/apache/cassandra/service/DataResolver.java index 5fb34c6..f02b565 100644 --- a/src/java/org/apache/cassandra/service/DataResolver.java +++ b/src/java/org/apache/cassandra/service/DataResolver.java @@ -44,6 +44,9 @@ import org.apache.cassandra.utils.FBUtilities; public class DataResolver extends ResponseResolver { +private static final boolean DROP_OVERSIZED_READ_REPAIR_MUTATIONS = +Boolean.getBoolean("cassandra.drop_oversized_readrepair_mutations"); + @VisibleForTesting final List repairResults = Collections.synchronizedList(new ArrayList<>()); @@ -452,15 +455,49 @@ public class DataResolver extends ResponseResolver public void close() { for (int i = 0; i < repairs.length; i++) +if (null != repairs[i]) +sendRepairMutation(repairs[i], sources[i]); +} + +private void sendRepairMutation(PartitionUpdate partition, InetAddress destination) +{ +Mutation mutation = new Mutation(partition); +int messagingVersion = 
MessagingService.instance().getVersion(destination); + +int mutationSize = (int) Mutation.serializer.serializedSize(mutation, messagingVersion); +int maxMutationSize = DatabaseDescriptor.getMaxMutationSize(); + +if
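The (truncated) diff above compares a read repair mutation's serialized size against the configured maximum before sending it. A minimal, self-contained sketch of that guard logic, keeping the system-property name from the patch but using hypothetical method names (this is a sketch, not Cassandra's actual code):

```java
// Sketch of the oversized read-repair-mutation guard from CASSANDRA-13975.
public class ReadRepairGuardSketch {
    // Same system property name as in the patch above.
    static final boolean DROP_OVERSIZED =
            Boolean.getBoolean("cassandra.drop_oversized_readrepair_mutations");

    // Hypothetical helper: the real patch inlines this check in sendRepairMutation.
    // Returns true when the serialized mutation may be sent to the replica.
    static boolean shouldSend(int mutationSize, int maxMutationSize, boolean dropOversized)
    {
        if (mutationSize <= maxMutationSize)
            return true;           // within the commitlog-derived limit: always send
        return !dropOversized;     // oversized: send only when dropping is disabled
    }

    public static void main(String[] args)
    {
        System.out.println(shouldSend(1024, 4096, true));   // true
        System.out.println(shouldSend(8192, 4096, true));   // false: dropped
        System.out.println(shouldSend(8192, 4096, false));  // true: legacy behaviour
    }
}
```

The point of the flag is that an oversized mutation would otherwise fail on the replica's commitlog, so dropping it and leaving repair to anti-entropy can be the safer choice.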
[2/6] cassandra git commit: Add flag to allow dropping oversized read repair mutations
Add flag to allow dropping oversized read repair mutations patch by Aleksey Yeschenko; reviewed by Sam Tunnicliffe for CASSANDRA-13975 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f1e850a4 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f1e850a4 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f1e850a4 Branch: refs/heads/cassandra-3.11 Commit: f1e850a492126572efc636a6838cff90333806b9 Parents: f767d35 Author: Aleksey YeschenkoAuthored: Wed Oct 25 20:15:39 2017 +0100 Committer: Aleksey Yeschenko Committed: Mon Nov 13 13:10:28 2017 + -- CHANGES.txt | 2 + .../apache/cassandra/metrics/TableMetrics.java | 2 + .../apache/cassandra/service/DataResolver.java | 53 +--- 3 files changed, 49 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/f1e850a4/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index e3026aa..a3c43fd 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.0.16 + * Add flag to allow dropping oversized read repair mutations (CASSANDRA-13975) * Fix SSTableLoader logger message (CASSANDRA-14003) * Fix repair race that caused gossip to block (CASSANDRA-13849) * Tracing interferes with digest requests when using RandomPartitioner (CASSANDRA-13964) @@ -8,6 +9,7 @@ * Mishandling of cells for removed/dropped columns when reading legacy files (CASSANDRA-13939) * Deserialise sstable metadata in nodetool verify (CASSANDRA-13922) + 3.0.15 * Improve TRUNCATE performance (CASSANDRA-13909) * Implement short read protection on partition boundaries (CASSANDRA-13595) http://git-wip-us.apache.org/repos/asf/cassandra/blob/f1e850a4/src/java/org/apache/cassandra/metrics/TableMetrics.java -- diff --git a/src/java/org/apache/cassandra/metrics/TableMetrics.java b/src/java/org/apache/cassandra/metrics/TableMetrics.java index fe88a63..eb56ed9 100644 --- a/src/java/org/apache/cassandra/metrics/TableMetrics.java +++ 
b/src/java/org/apache/cassandra/metrics/TableMetrics.java @@ -151,6 +151,7 @@ public class TableMetrics public final static LatencyMetrics globalWriteLatency = new LatencyMetrics(globalFactory, globalAliasFactory, "Write"); public final static LatencyMetrics globalRangeLatency = new LatencyMetrics(globalFactory, globalAliasFactory, "Range"); +public final Meter readRepairRequests; public final Meter shortReadProtectionRequests; public final Map samplers; @@ -648,6 +649,7 @@ public class TableMetrics casPropose = new LatencyMetrics(factory, "CasPropose", cfs.keyspace.metric.casPropose); casCommit = new LatencyMetrics(factory, "CasCommit", cfs.keyspace.metric.casCommit); +readRepairRequests = Metrics.meter(factory.createMetricName("ReadRepairRequests")); shortReadProtectionRequests = Metrics.meter(factory.createMetricName("ShortReadProtectionRequests")); } http://git-wip-us.apache.org/repos/asf/cassandra/blob/f1e850a4/src/java/org/apache/cassandra/service/DataResolver.java -- diff --git a/src/java/org/apache/cassandra/service/DataResolver.java b/src/java/org/apache/cassandra/service/DataResolver.java index 5fb34c6..f02b565 100644 --- a/src/java/org/apache/cassandra/service/DataResolver.java +++ b/src/java/org/apache/cassandra/service/DataResolver.java @@ -44,6 +44,9 @@ import org.apache.cassandra.utils.FBUtilities; public class DataResolver extends ResponseResolver { +private static final boolean DROP_OVERSIZED_READ_REPAIR_MUTATIONS = +Boolean.getBoolean("cassandra.drop_oversized_readrepair_mutations"); + @VisibleForTesting final List repairResults = Collections.synchronizedList(new ArrayList<>()); @@ -452,15 +455,49 @@ public class DataResolver extends ResponseResolver public void close() { for (int i = 0; i < repairs.length; i++) +if (null != repairs[i]) +sendRepairMutation(repairs[i], sources[i]); +} + +private void sendRepairMutation(PartitionUpdate partition, InetAddress destination) +{ +Mutation mutation = new Mutation(partition); +int messagingVersion = 
MessagingService.instance().getVersion(destination); + +int mutationSize = (int) Mutation.serializer.serializedSize(mutation, messagingVersion); +int maxMutationSize = DatabaseDescriptor.getMaxMutationSize(); + +
[5/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11
Merge branch 'cassandra-3.0' into cassandra-3.11 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9ee44db4 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9ee44db4 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9ee44db4 Branch: refs/heads/cassandra-3.11 Commit: 9ee44db49b13d4b4c91c9d6332ce06a6e2abf944 Parents: 387d3a4 f1e850a Author: Aleksey YeschenkoAuthored: Mon Nov 13 13:13:06 2017 + Committer: Aleksey Yeschenko Committed: Mon Nov 13 13:13:06 2017 + -- CHANGES.txt | 1 + .../apache/cassandra/metrics/TableMetrics.java | 2 + .../apache/cassandra/service/DataResolver.java | 53 +--- 3 files changed, 48 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ee44db4/CHANGES.txt -- diff --cc CHANGES.txt index 6a78b60,a3c43fd..a1a1a37 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,9 -1,5 +1,10 @@@ -3.0.16 +3.11.2 + * Add asm jar to build.xml for maven builds (CASSANDRA-11193) + * Round buffer size to powers of 2 for the chunk cache (CASSANDRA-13897) + * Update jackson JSON jars (CASSANDRA-13949) + * Avoid locks when checking LCS fanout and if we should defrag (CASSANDRA-13930) +Merged from 3.0: + * Add flag to allow dropping oversized read repair mutations (CASSANDRA-13975) * Fix SSTableLoader logger message (CASSANDRA-14003) * Fix repair race that caused gossip to block (CASSANDRA-13849) * Tracing interferes with digest requests when using RandomPartitioner (CASSANDRA-13964) http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ee44db4/src/java/org/apache/cassandra/metrics/TableMetrics.java -- diff --cc src/java/org/apache/cassandra/metrics/TableMetrics.java index b0f667c,eb56ed9..e78bb66 --- a/src/java/org/apache/cassandra/metrics/TableMetrics.java +++ b/src/java/org/apache/cassandra/metrics/TableMetrics.java @@@ -167,40 -151,7 +167,41 @@@ public class TableMetric public final static LatencyMetrics globalWriteLatency = new 
LatencyMetrics(globalFactory, globalAliasFactory, "Write"); public final static LatencyMetrics globalRangeLatency = new LatencyMetrics(globalFactory, globalAliasFactory, "Range"); +public final static Gauge globalPercentRepaired = Metrics.register(globalFactory.createMetricName("PercentRepaired"), +new Gauge() +{ +public Double getValue() +{ +double repaired = 0; +double total = 0; +for (String keyspace : Schema.instance.getNonSystemKeyspaces()) +{ +Keyspace k = Schema.instance.getKeyspaceInstance(keyspace); +if (SchemaConstants.DISTRIBUTED_KEYSPACE_NAME.equals(k.getName())) +continue; +if (k.getReplicationStrategy().getReplicationFactor() < 2) +continue; + +for (ColumnFamilyStore cf : k.getColumnFamilyStores()) +{ +if (!SecondaryIndexManager.isIndexColumnFamily(cf.name)) +{ +for (SSTableReader sstable : cf.getSSTables(SSTableSet.CANONICAL)) +{ +if (sstable.isRepaired()) +{ +repaired += sstable.uncompressedLength(); +} +total += sstable.uncompressedLength(); +} +} +} +} +return total > 0 ? (repaired / total) * 100 : 100.0; +} +}); + + public final Meter readRepairRequests; public final Meter shortReadProtectionRequests; public final Map samplers; http://git-wip-us.apache.org/repos/asf/cassandra/blob/9ee44db4/src/java/org/apache/cassandra/service/DataResolver.java -- diff --cc src/java/org/apache/cassandra/service/DataResolver.java index 111d561,f02b565..f63f4f5 --- a/src/java/org/apache/cassandra/service/DataResolver.java +++ b/src/java/org/apache/cassandra/service/DataResolver.java @@@ -44,15 -44,17 +44,18 @@@ import org.apache.cassandra.utils.FBUti public class DataResolver extends ResponseResolver { + private static final boolean DROP_OVERSIZED_READ_REPAIR_MUTATIONS = + Boolean.getBoolean("cassandra.drop_oversized_readrepair_mutations"); + @VisibleForTesting final List repairResults = Collections.synchronizedList(new ArrayList<>());
[jira] [Commented] (CASSANDRA-14008) RTs at index boundaries in 2.x sstables can create unexpected CQL row in 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249432#comment-16249432 ] Aleksey Yeschenko commented on CASSANDRA-14008: --- We can probably generate an sstable that triggers this bug relatively easily for a regression test (nice to have, but won't block the patch on lack of it). And, as Jeff mentions, it would be nice to find a way to un-break 3.0 sstables where the damage's been done already, in a follow-up JIRA. > RTs at index boundaries in 2.x sstables can create unexpected CQL row in 3.x > > > Key: CASSANDRA-14008 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14008 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Jeff Jirsa >Assignee: Jeff Jirsa > Labels: correctness > Fix For: 3.0.x, 3.11.x > > > In 2.1/2.2, it is possible for a range tombstone that isn't a row deletion > and isn't a complex deletion to appear between two cells with the same > clustering. The 8099 legacy code incorrectly treats the two (non-RT) cells as > two distinct CQL rows, despite having the same clustering prefix. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
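The bug description above turns on one invariant: cells that share the same clustering prefix belong to a single CQL row, even when a range tombstone from a 2.x index boundary sits between them on disk. A toy illustration of that grouping rule, with hypothetical types standing in for Cassandra's storage engine:

```java
import java.util.*;

public class RowGroupingSketch {
    // Toy model: a cell is {clustering, column, value}. Cells sharing a
    // clustering key must merge into one CQL row; the 8099 legacy code
    // described above mistakenly produced two rows when an index-boundary
    // range tombstone separated such cells.
    static Map<String, Map<String, String>> groupIntoRows(List<String[]> cells)
    {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (String[] c : cells) // c = {clustering, column, value}
            rows.computeIfAbsent(c[0], k -> new LinkedHashMap<>()).put(c[1], c[2]);
        return rows;
    }

    public static void main(String[] args)
    {
        List<String[]> cells = Arrays.asList(
                new String[]{ "ck1", "a", "1" },
                // an interleaved RT must not split "ck1" into two rows
                new String[]{ "ck1", "b", "2" });
        System.out.println(groupIntoRows(cells).size()); // 1
    }
}
```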
[jira] [Updated] (CASSANDRA-14008) RTs at index boundaries in 2.x sstables can create unexpected CQL row in 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-14008: -- Component/s: (was: Core) Local Write-Read Paths > RTs at index boundaries in 2.x sstables can create unexpected CQL row in 3.x > > > Key: CASSANDRA-14008 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14008 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths >Reporter: Jeff Jirsa >Assignee: Jeff Jirsa > Labels: correctness > Fix For: 3.0.x, 3.11.x > > > In 2.1/2.2, it is possible for a range tombstone that isn't a row deletion > and isn't a complex deletion to appear between two cells with the same > clustering. The 8099 legacy code incorrectly treats the two (non-RT) cells as > two distinct CQL rows, despite having the same clustering prefix. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14007) cqlshlib tests fail due to compact table
[ https://issues.apache.org/jira/browse/CASSANDRA-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249370#comment-16249370 ] Alex Petrov commented on CASSANDRA-14007: - I've just re-run all the dtests and they seem to be clean. Or do we run the cqlshlib tests in some other way? > cqlshlib tests fail due to compact table > > > Key: CASSANDRA-14007 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14007 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Joel Knighton >Assignee: Alex Petrov > > The pylib/cqlshlib tests fail on initialization with the error > {{SyntaxException: [Syntax error in CQL query] message="Compact tables are not allowed in Cassandra starting with > 4.0 version."}}. > The table {{dynamic_columns}} is created {{WITH COMPACT STORAGE}}. Since > [CASSANDRA-10857], this is no longer supported. It looks like dropping the > COMPACT STORAGE modifier is enough for the tests to run, but I haven't looked > if we should instead remove the table and all related tests entirely, or if > there's an interesting code path covered by this that we should test in a > different way now. [~ifesdjeen] might know at a glance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14009) Any user can overwrite any table with sstableloader
Andrzej Bober created CASSANDRA-14009: - Summary: Any user can overwrite any table with sstableloader Key: CASSANDRA-14009 URL: https://issues.apache.org/jira/browse/CASSANDRA-14009 Project: Cassandra Issue Type: Bug Components: Auth Reporter: Andrzej Bober Fix For: 2.1.x, 2.2.x, 3.0.x, 3.11.x Hi there, Looks like any user can overwrite any table with sstableloader. Tested ubuntu 16.04.3, Java 1.8.0_151_b12, and Cassandra 2.1.19 / 2.2.11 / 3.0.15 / 3.11.1. {code:sql} cassandra@cqlsh> CREATE USER alice WITH PASSWORD 'Alice'; cassandra@cqlsh> CREATE USER bob WITH PASSWORD 'Bob'; cassandra@cqlsh> CREATE KEYSPACE db4alice WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cassandra@cqlsh> GRANT ALL PERMISSIONS ON KEYSPACE db4alice TO alice; alice@cqlsh> CREATE TABLE users (userid text PRIMARY KEY, password text); alice@cqlsh> INSERT INTO users (userid, password) VALUES ('user1', 'pass1'); alice@cqlsh> INSERT INTO users (userid, password) VALUES ('user2', 'pass2'); alice@cqlsh> INSERT INTO users (userid, password) VALUES ('user3', 'pass3'); alice@cqlsh> truncate users; alice@cqlsh> select * from db4alice.users ; userid | password +-- (0 rows) sstableloader -d 127.0.0.1 -u bob -pw Bob ./db4alice/users alice@cqlsh> select * from db4alice.users ; userid | password +-- user2 |pass2 user1 |pass1 user3 |pass3 (3 rows) {code} Looks like a pretty serious bug to me. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13997) Upgrade Guava to 23.3 and Airline to 0.8
[ https://issues.apache.org/jira/browse/CASSANDRA-13997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249290#comment-16249290 ] Stefan Podkowinski commented on CASSANDRA-13997: If we do that, then we should probably include the guava artifact directly instead of j2objc. This should override the ancient guava-16 version that is pulled by ohc. > Upgrade Guava to 23.3 and Airline to 0.8 > > > Key: CASSANDRA-13997 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13997 > Project: Cassandra > Issue Type: Improvement > Components: Libraries >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson > Fix For: 4.0 > > Attachments: airline-0.8.jar.asc, guava-23.3-jre.jar.asc > > > For 4.0 we should upgrade guava to the latest version > patch here: https://github.com/krummas/cassandra/commits/marcuse/guava23 > A bunch of quite commonly used methods have been deprecated since guava 18 > which we use now ({{Throwables.propagate}} for example), this patch mostly > updates uses where compilation fails. {{Futures.transform(ListenableFuture > ..., AsyncFunction ...}} was deprecated in Guava 19 and removed in 20 for > example, we should probably open new tickets to remove calls to all > deprecated guava methods. > Also had to add a dependency on {{com.google.j2objc.j2objc-annotations}}, to > avoid some build-time warnings (maybe due to > https://github.com/google/guava/commit/fffd2b1f67d158c7b4052123c5032b0ba54a910d > ?) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14007) cqlshlib tests fail due to compact table
[ https://issues.apache.org/jira/browse/CASSANDRA-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov reassigned CASSANDRA-14007: --- Assignee: Alex Petrov > cqlshlib tests fail due to compact table > > > Key: CASSANDRA-14007 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14007 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Joel Knighton >Assignee: Alex Petrov > > The pylib/cqlshlib tests fail on initialization with the error > {{SyntaxException: [Syntax error in CQL query] message="Compact tables are not allowed in Cassandra starting with > 4.0 version."}}. > The table {{dynamic_columns}} is created {{WITH COMPACT STORAGE}}. Since > [CASSANDRA-10857], this is no longer supported. It looks like dropping the > COMPACT STORAGE modifier is enough for the tests to run, but I haven't looked > if we should instead remove the table and all related tests entirely, or if > there's an interesting code path covered by this that we should test in a > different way now. [~ifesdjeen] might know at a glance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249228#comment-16249228 ] Paulo Motta commented on CASSANDRA-13948: - Testall passed with no failures, and [dtest failures|https://issues.apache.org/jira/secure/attachment/12897298/dtest13948.png] look unrelated. > Reload compaction strategies when JBOD disk boundary changes > > > Key: CASSANDRA-13948 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13948 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 3.11.x, 4.x > > Attachments: debug.log, dtest13948.png, threaddump-cleanup.txt, > threaddump.txt, trace.log > > > The thread dump below shows a race between an sstable replacement by the > {{IndexSummaryRedistribution}} and > {{AbstractCompactionTask.getNextBackgroundTask}}: > {noformat} > Thread 94580: (state = BLOCKED) > - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information > may be imprecise) > - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, > line=175 (Compiled frame) > - > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() > @bci=1, line=836 (Compiled frame) > - > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, > int) @bci=67, line=870 (Compiled frame) > - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) > @bci=17, line=1199 (Compiled frame) > - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, > line=943 (Compiled frame) > - > org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable, > java.lang.Iterable) @bci=359, line=483 (Interpreted frame) > - > org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification, > java.lang.Object) @bci=53, line=555 
(Interpreted frame) > - > org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection, > java.util.Collection, org.apache.cassandra.db.compaction.OperationType, > java.lang.Throwable) @bci=50, line=409 (Interpreted frame) > - > org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable) > @bci=157, line=227 (Interpreted frame) > - > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable) > @bci=61, line=116 (Compiled frame) > - > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit() > @bci=2, line=200 (Interpreted frame) > - > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish() > @bci=5, line=185 (Interpreted frame) > - > org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries() > @bci=559, line=130 (Interpreted frame) > - > org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) > @bci=9, line=1420 (Interpreted frame) > - > org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) > @bci=4, line=250 (Interpreted frame) > - > org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() > @bci=30, line=228 (Interpreted frame) > - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() > @bci=4, line=125 (Interpreted frame) > - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 > (Interpreted frame) > - > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run() > @bci=4, line=118 (Compiled frame) > - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 > (Compiled frame) > - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled > frame) > - > 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) > @bci=1, line=180 (Compiled frame) > - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() > @bci=37, line=294 (Compiled frame) > - > java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) > @bci=95, line=1149 (Compiled frame) > - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 > (Interpreted frame) > - > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) > @bci=1, line=81 (Interpreted frame) > - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 >
[jira] [Updated] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-13948: Attachment: dtest13948.png > Reload compaction strategies when JBOD disk boundary changes > > > Key: CASSANDRA-13948 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13948 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Paulo Motta >Assignee: Paulo Motta > Fix For: 3.11.x, 4.x > > Attachments: debug.log, dtest13948.png, threaddump-cleanup.txt, > threaddump.txt, trace.log > > > The thread dump below shows a race between an sstable replacement by the > {{IndexSummaryRedistribution}} and > {{AbstractCompactionTask.getNextBackgroundTask}}: > {noformat} > Thread 94580: (state = BLOCKED) > - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information > may be imprecise) > - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, > line=175 (Compiled frame) > - > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() > @bci=1, line=836 (Compiled frame) > - > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, > int) @bci=67, line=870 (Compiled frame) > - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) > @bci=17, line=1199 (Compiled frame) > - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, > line=943 (Compiled frame) > - > org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable, > java.lang.Iterable) @bci=359, line=483 (Interpreted frame) > - > org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification, > java.lang.Object) @bci=53, line=555 (Interpreted frame) > - > org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection, > java.util.Collection, 
org.apache.cassandra.db.compaction.OperationType, > java.lang.Throwable) @bci=50, line=409 (Interpreted frame) > - > org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable) > @bci=157, line=227 (Interpreted frame) > - > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable) > @bci=61, line=116 (Compiled frame) > - > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit() > @bci=2, line=200 (Interpreted frame) > - > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish() > @bci=5, line=185 (Interpreted frame) > - > org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries() > @bci=559, line=130 (Interpreted frame) > - > org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) > @bci=9, line=1420 (Interpreted frame) > - > org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) > @bci=4, line=250 (Interpreted frame) > - > org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() > @bci=30, line=228 (Interpreted frame) > - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() > @bci=4, line=125 (Interpreted frame) > - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 > (Interpreted frame) > - > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run() > @bci=4, line=118 (Compiled frame) > - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 > (Compiled frame) > - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled > frame) > - > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) > @bci=1, line=180 (Compiled frame) > - 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() > @bci=37, line=294 (Compiled frame) > - > java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) > @bci=95, line=1149 (Compiled frame) > - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 > (Interpreted frame) > - > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) > @bci=1, line=81 (Interpreted frame) > - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 > (Interpreted frame) > - java.lang.Thread.run() @bci=11, line=748 (Compiled frame) > {noformat} > {noformat} > Thread 94573: (state = IN_JAVA) > -
[jira] [Commented] (CASSANDRA-13992) Don't send new_metadata_id for conditional updates
[ https://issues.apache.org/jira/browse/CASSANDRA-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249215#comment-16249215 ] Alex Petrov commented on CASSANDRA-13992: - I've composed a version of the patch to demonstrate my thinking [here|https://github.com/apache/cassandra/compare/trunk...ifesdjeen:CASSANDRA-13992]. It seems that we can solve this problem without patching the driver. In fact, it might be even better if the inner workings of the metadata hash are transparent to the driver. In short, we can always force {{METADATA_CHANGED}} for conditional statements and avoid computing their metadata to make sure it's empty. It's a rough equivalent of making the metadata hash random, just simpler to reason about. What do you think, [~KurtG] [~omichallat]? > Don't send new_metadata_id for conditional updates > -- > > Key: CASSANDRA-13992 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13992 > Project: Cassandra > Issue Type: Bug >Reporter: Olivier Michallat >Assignee: Kurt Greaves >Priority: Minor > > This is a follow-up to CASSANDRA-10786. > Given the table > {code} > CREATE TABLE foo (k int PRIMARY KEY) > {code} > And the prepared statement > {code} > INSERT INTO foo (k) VALUES (?) IF NOT EXISTS > {code} > The result set metadata changes depending on the outcome of the update: > * if the row didn't exist, there is only a single column \[applied] = true > * if it did, the result contains \[applied] = false, plus the current value > of column k. > The way this was handled so far is that the PREPARED response contains no > result set metadata, and therefore all EXECUTE messages have SKIP_METADATA = > false, and the responses always include the full (and correct) metadata. > CASSANDRA-10786 still sends the PREPARED response with no metadata, *but the > response to EXECUTE now contains a {{new_metadata_id}}*. The driver thinks it > is because of a schema change, and updates its local copy of the prepared > statement's result metadata.
> The next EXECUTE is sent with SKIP_METADATA = true, but the server appears to > ignore that, and still sends the metadata in the response. So each response > includes the correct metadata, the driver uses it, and there is no visible > issue for client code. > The only drawback is that the driver updates its local copy of the metadata > unnecessarily, every time. We can work around that by only updating if we had > metadata before, at the cost of an extra volatile read. But I think the best > thing to do would be to never send a {{new_metadata_id}} for a conditional > update. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
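The driver-side workaround mentioned above ("only updating if we had metadata before, at the cost of an extra volatile read") can be sketched as a small guard; the class and method names here are hypothetical, not the actual driver API:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical driver-side cache of a prepared statement's result metadata id.
public class PreparedMetadataCacheSketch {
    private final AtomicReference<String> metadataId = new AtomicReference<>();

    // Called when a PREPARED response carried result metadata.
    void prime(String id) { metadataId.set(id); }

    // Called when an EXECUTE response carries new_metadata_id: only replace the
    // cached copy if one already exists. Conditional updates never prime the
    // cache (their PREPARED response has no metadata), so this avoids the
    // needless per-request churn described above.
    boolean maybeUpdate(String newId)
    {
        String current = metadataId.get(); // the "extra volatile read"
        if (current == null)
            return false;                  // never had metadata: skip the update
        metadataId.compareAndSet(current, newId);
        return true;
    }

    public static void main(String[] args)
    {
        PreparedMetadataCacheSketch cache = new PreparedMetadataCacheSketch();
        System.out.println(cache.maybeUpdate("v2")); // false: nothing cached
        cache.prime("v1");
        System.out.println(cache.maybeUpdate("v2")); // true: replaced
    }
}
```

This only mitigates the churn; the preferred fix discussed in the ticket is server-side, so the driver never sees a spurious {{new_metadata_id}} for conditional updates.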