[jira] [Comment Edited] (CASSANDRA-15830) Invalid version value: 4.0~alpha4 during startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118111#comment-17118111 ]

Michael Semb Wever edited comment on CASSANDRA-15830 at 5/27/20, 9:32 PM:
--------------------------------------------------------------------------

Reproduced with
{code}
cd /tmp
wget https://downloads.apache.org/cassandra/redhat/40x/cassandra-4.0~alpha4-1.noarch.rpm
rpm2cpio cassandra-4.0\~alpha4-1.noarch.rpm | cpio -idmv
jar xvf ./usr/share/cassandra/apache-cassandra-4.0~alpha4.jar org/apache/cassandra/config/version.properties
cat org/apache/cassandra/config/version.properties
{code}

The wrong version is used when building the project artifacts inside the [build-rpms.sh|https://github.com/apache/cassandra-builds/blob/master/docker/build-rpms.sh#L75] script: either ${deb_release} or $CASSANDRA_VERSION.

> Invalid version value: 4.0~alpha4 during startup
> ------------------------------------------------
>
> Key: CASSANDRA-15830
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15830
> Project: Cassandra
> Issue Type: Bug
> Components: Build, Packaging
> Reporter: Eric Wong
> Assignee: Michael Semb Wever
> Priority: Normal
> Fix For: 4.0-alpha
>
> Hi:
> We are testing the latest cassandra-4.0 on CentOS 7 using a clean database.
> When we started Cassandra the first time, everything was fine. However, when
> we stopped and restarted Cassandra, we got the following error and the db
> refused to start up:
> {code}
> ERROR [main] 2020-05-22 05:58:18,698 CassandraDaemon.java:789 - Exception encountered during startup
> java.lang.IllegalArgumentException: Invalid version value: 4.0~alpha4
>     at org.apache.cassandra.utils.CassandraVersion.<init>(CassandraVersion.java:64)
>     at org.apache.cassandra.io.sstable.SSTableHeaderFix.fixNonFrozenUDTIfUpgradeFrom30(SSTableHeaderFix.java:84)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:250)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:650)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:767)
> {code}
> The only way to get the node up and running again is by deleting all data
> under /var/lib/cassandra.
>
> Is that a known issue?
> Thanks, Eric

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
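The stack trace in the message above shows startup failing because the packaged version string uses the RPM-style tilde form (4.0~alpha4), which the version parser rejects. The following is only a minimal sketch of that failure mode: the pattern is a simplified assumption and not the actual regex in CassandraVersion, and the names VersionCheck, SIMPLE_VERSION, isValid and requireValid are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; NOT Cassandra's real CassandraVersion class.
class VersionCheck {
    // Assumed, simplified grammar: major.minor[.patch][-prerelease].
    // A '~' separator is not part of this grammar, so "4.0~alpha4" cannot match.
    static final Pattern SIMPLE_VERSION =
        Pattern.compile("^(\\d+)\\.(\\d+)(?:\\.(\\d+))?(?:-([\\w.]+))?$");

    static boolean isValid(String version) {
        return SIMPLE_VERSION.matcher(version).matches();
    }

    // Throws in the same style as the exception seen in the stack trace.
    static String requireValid(String version) {
        if (!isValid(version))
            throw new IllegalArgumentException("Invalid version value: " + version);
        return version;
    }

    public static void main(String[] args) {
        System.out.println("4.0-alpha4 valid: " + isValid("4.0-alpha4"));
        System.out.println("4.0~alpha4 valid: " + isValid("4.0~alpha4"));
    }
}
```

Under this assumed grammar, a version.properties file written with the package version rather than the project version would be enough to make any later parse of the stored version fail.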
[jira] [Updated] (CASSANDRA-15830) Invalid version value: 4.0~alpha4 during startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-15830:
-------------------------------------------
Platform: Linux (was: All)
[jira] [Updated] (CASSANDRA-15830) Invalid version value: 4.0~alpha4 during startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-15830:
-------------------------------------------
Bug Category: Parent values: Correctness(12982)Level 1 values: API / Semantic Implementation(12988)
Complexity: Low Hanging Fruit
Component/s: Packaging, Build
Discovered By: User Report
Fix Version/s: 4.0-alpha
Severity: Critical
Status: Open (was: Triage Needed)
[jira] [Commented] (CASSANDRA-15830) Invalid version value: 4.0~alpha4 during startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118111#comment-17118111 ]

Michael Semb Wever commented on CASSANDRA-15830:
------------------------------------------------

Reproduced with
{code}
cd /tmp
wget https://downloads.apache.org/cassandra/redhat/40x/cassandra-4.0~alpha4-1.noarch.rpm
rpm2cpio cassandra-4.0\~alpha4-1.noarch.rpm| cpio -idmv
jar xvf ./usr/share/cassandra/apache-cassandra-4.0~alpha4.jar org/apache/cassandra/config/version.properties
cat org/apache/cassandra/config/version.properties
{code}

The version used is wrong when building the project artifacts inside the [build-rpms.sh|https://github.com/apache/cassandra-builds/blob/master/docker/build-rpms.sh#L75] script, either the ${deb_release} or $CASSANDRA_VERSION.
[jira] [Commented] (CASSANDRA-15837) Enhance fqltool to be able to export the fql log into a format which doesn't depend on Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-15837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118024#comment-17118024 ]

David Capwell commented on CASSANDRA-15837:
-------------------------------------------

Spoke to Marcus about this, dumping context here.

* "why not read the raw fql logs?"

Main reasons:
1) Cassandra brings in its dependencies, and these can be very old, causing conflicts.
2) Cassandra doesn't have a stable API, so using different versions requires a rewrite [1].
3) thrift is a lot faster [2], so we spend less time reading the files and more time trying to put load on the cluster. Right now my bottleneck isn't reading the files; it's that Cassandra can't keep up, so I need to throttle the operations.

[1] - It is less of an issue when reading the files, but there are no guarantees that QueryOptions won't change the Java API at will. The main issue I had in upgrading from 3.0 to 4.0 was the query parsing side used to classify the query. I added logic to annotate what the query does and touches, so tools could filter for specific tables or only replay specific types of queries (such as selects or updates, etc.); having this logic is very useful in tooling (actively using it right now), but this part is not compatible across releases.

[2] - Below are two consumers of the fql logs: one reading the raw logs and ignoring the output, the other reading the thrift version and collecting stats. Both tools read 100% of the data and do so sequentially.

$ time ./bin/fqltool thrift-stats ../query_logs/*/fql/fql.thrift.gz 1>/dev/null
13.67 real 14.88 user 1.59 sys
$ time ./bin/fqltool dump-thrift -- ../query_logs/*/fql/
56.97 real 76.70 user 2.47 sys

* "we just removed [thrift]"

I started in 3.0, so I used thrift because it was there; in doing so I grew to hate it... I find protobuf to be better, so I will likely switch to that. But, to avoid adding a dependency to Cassandra, I can take on the work to allow FQL to have a different set of dependencies.
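The comment above mentions annotating each logged query with what it does so that tools can filter or replay only certain kinds of queries. As a rough illustration of that idea only, here is a naive sketch that classifies a CQL string by its leading keyword. It is not the author's implementation and does not use any Cassandra parser; QueryKind, Kind and classify are hypothetical names, and a real tool would need proper CQL parsing to also find the tables touched.

```java
import java.util.Locale;

// Hypothetical, dependency-free classifier; a stand-in for real CQL parsing.
class QueryKind {
    enum Kind { SELECT, INSERT, UPDATE, DELETE, BATCH, OTHER }

    // Classifies by the first keyword only; comments and nested statements are ignored.
    static Kind classify(String cql) {
        String trimmed = cql.trim();
        int end = 0;
        while (end < trimmed.length() && Character.isLetter(trimmed.charAt(end)))
            end++;
        String keyword = trimmed.substring(0, end).toUpperCase(Locale.ROOT);
        switch (keyword) {
            case "SELECT": return Kind.SELECT;
            case "INSERT": return Kind.INSERT;
            case "UPDATE": return Kind.UPDATE;
            case "DELETE": return Kind.DELETE;
            case "BEGIN":  return Kind.BATCH;   // BEGIN [UNLOGGED] BATCH ...
            default:       return Kind.OTHER;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("select * from ks.t where k = 0"));
        System.out.println(classify("BEGIN BATCH INSERT INTO t (k) VALUES (0); APPLY BATCH"));
    }
}
```

A replay tool could then keep only records whose kind is, say, SELECT, without pulling in any Cassandra classes.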
> Enhance fqltool to be able to export the fql log into a format which doesn't depend on Cassandra
> -------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15837
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15837
> Project: Cassandra
> Issue Type: Improvement
> Components: Tool/fql
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently the fql log format uses Cassandra serialization within the message,
> which means that reading the file also requires Cassandra classes. To make it
> easier for outside tools to read the fql logs we should enhance the fqltool
> to be able to dump the logs to a file format using thrift or protobuf.
> Additionally we should support exporting the original query with a
> deterministic version allowing tools to have reproducible and comparable
> results.
[jira] [Commented] (CASSANDRA-15821) Metrics Documentation Enhancements
[ https://issues.apache.org/jira/browse/CASSANDRA-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117939#comment-17117939 ]

Stephen Mallette commented on CASSANDRA-15821:
----------------------------------------------

I've pushed another batch of changes to my branch which now cover all the documented items in my spreadsheet (i.e. all the known metrics that I could identify after a simple Cassandra initialization are now in {{metrics.rst}}). A few odds and ends I noticed:

1. It seems a bit odd that {{DroppedMessageMetrics}} and {{MessagingMetrics}} aren't handled in a consistent fashion. The former uses the {{Verb}} as the scope (which is nice) but the latter appends the {{Verb}} to the metric name itself (which seems less nice). I'm not sure what decisions led to this situation, but I'd be curious to hear if anyone thinks this is a concern at all.
2. ReadRepair.RepairedAsync does not appear to be in use? I could be missing something, but it does not seem to be referenced in the code beyond its declaration. Could this be deleted?
3. {{DroppedMessageMetrics}} had PAGED_SLICE and RANGED_SLICE documented, but they don't appear to be available. I couldn't quite isolate exactly when they were removed, but I assume it's safe to say that their removal was intentional?
4. A minor point, but I wonder what tolerance there is for making casing consistent throughout the metrics. For example, Client metrics have a mixture of casing: connectedNativeClients and AuthFailure. It would be nicer to my eyes to change the former to "ConnectedNativeClients".
> Metrics Documentation Enhancements
> ----------------------------------
>
> Key: CASSANDRA-15821
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15821
> Project: Cassandra
> Issue Type: Improvement
> Components: Documentation/Website
> Reporter: Stephen Mallette
> Assignee: Stephen Mallette
> Priority: Normal
>
> CASSANDRA-15582 involves quality around metrics and it was mentioned that
> reviewing and [improving
> documentation|https://github.com/apache/cassandra/blob/trunk/doc/source/operating/metrics.rst]
> around metrics would fall into that scope. Please consider some of this
> analysis in determining what improvements to make here:
> Please see [this
> spreadsheet|https://docs.google.com/spreadsheets/d/1iPWfCMIG75CI6LbYuDtCTjEOvZw-5dyH-e08bc63QnI/edit?usp=sharing]
> that itemizes almost all of Cassandra's metrics and whether they are
> documented or not (and other notes). That spreadsheet is "almost all"
> because there are some metrics that don't seem to initialize as part of
> Cassandra startup (I was able to trigger some to initialize, but not all were
> immediately obvious). The missing metrics seem to be related to the following:
> * ThreadPool metrics - only some initialize at startup, the list of which
> follows below
> * Streaming Metrics
> * HintedHandoff Metrics
> * HintsService Metrics
> Here are the ThreadPool scopes that get listed:
> {code}
> AntiEntropyStage
> CacheCleanupExecutor
> CompactionExecutor
> GossipStage
> HintsDispatcher
> MemtableFlushWriter
> MemtablePostFlush
> MemtableReclaimMemory
> MigrationStage
> MutationStage
> Native-Transport-Requests
> PendingRangeCalculator
> PerDiskMemtableFlushWriter_0
> ReadStage
> Repair-Task
> RequestResponseStage
> Sampler
> SecondaryIndexManagement
> ValidationExecutor
> ViewBuildExecutor
> {code}
> I noticed that Keyspace Metrics have this note: "Most of these metrics are
> the same as the Table Metrics above, only they are aggregated at the Keyspace
> level." I think I've isolated the table metrics that are not on keyspace
> to specifically be:
> {code}
> BloomFilterFalsePositives
> BloomFilterFalseRatio
> BytesAnticompacted
> BytesFlushed
> BytesMutatedAnticompaction
> BytesPendingRepair
> BytesRepaired
> BytesUnrepaired
> CompactionBytesWritten
> CompressionRatio
> CoordinatorReadLatency
> CoordinatorScanLatency
> CoordinatorWriteLatency
> EstimatedColumnCountHistogram
> EstimatedPartitionCount
> EstimatedPartitionSizeHistogram
> KeyCacheHitRate
> LiveSSTableCount
> MaxPartitionSize
> MeanPartitionSize
> MinPartitionSize
> MutatedAnticompactionGauge
> PercentRepaired
> RowCacheHitOutOfRange
> RowCacheHit
> RowCacheMiss
> SpeculativeSampleLatencyNanos
> SyncTime
> WaitingOnFreeMemtableSpace
> DroppedMutations
> {code}
> Someone with greater knowledge of this area might consider it worth the
> effort to see if any of these metrics should be aggregated to the keyspace
> level in case they were inadvertently missed. In any case, perhaps the
> documentation could easily now reflect which metric names could be expected
> on Keyspace.
> The DroppedMessage metrics have a much larger body of scopes than just what
> was documented:
> {code}
> ASYMMETRIC_SYNC_REQ
> BATCH_REMOVE_REQ
> BATCH_REMOVE_RSP
> BATCH_STORE_REQ
>
[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117855#comment-17117855 ]

Benedict Elliott Smith edited comment on CASSANDRA-12126 at 5/27/20, 3:39 PM:
------------------------------------------------------------------------------

The test cases I provided demonstrate several consistency violations during range movements. I've just thought of another one, and am writing a test case for it. Perhaps we could claim that range movements are always (potentially) consistency violations, but they are particularly keenly felt when you claim a linearisable history.

There are also (more debatably) issues with TTL on {{system.paxos}}, particularly when mixed with non-global commit; perhaps we could claim this is the user's problem, but it's not clear why we support global consensus that can be lost through local commit, and I don't think we communicate the consistency implications clearly enough to not call this a bug.

Also, mixing LOCAL_SERIAL and SERIAL is entirely unsafe, and even supporting them both is arguably a consistency violation without mechanisms to safely transition from one level to another.
> CAS Reads Inconsistencies
> -------------------------
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Lightweight Transactions, Legacy/Coordination
> Reporter: Sankalp Kohli
> Assignee: Sylvain Lebresne
> Priority: Normal
> Labels: LWT, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies
> true to a propose and saves the commit in the accepted field. The other two
> machines, B and C, do not get to the accept phase.
> Current state is that machine A has this commit in the paxos table as accepted
> but not committed, and B and C do not.
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the
> value written in step 1. This step is as if nothing is in flight.
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that
> there is something in flight from A and will propose and commit it with the
> current ballot. Now we can read the value written in step 1 as part of this
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value
> written in step 1.
> 4) Issue a CAS Write and it involves only B and C. This will succeed and
> commit a different value than step 1. The step 1 value will never be seen again
> and was never seen before.
> Section 2.3 of Lamport's “Paxos Made Simple” paper talks about this issue,
> which is how learners can find out if a majority of the acceptors have
> accepted the proposal.
> In step 3, it is correct that we propose the value again since we don't know
> if it was accepted by a majority of acceptors. When we ask a majority of
> acceptors, and one or more acceptors but not a majority have something in
> flight, we have no way of knowing if it was accepted by a majority of acceptors.
> So this behavior is correct.
> However, we need to fix step 2, since it causes reads to not be linearizable
> with respect to writes and other reads. In this case, we know that a majority
> of acceptors have no in-flight commit, which means a majority confirms that
> nothing was accepted by a majority. I think we should run a propose step here
> with an empty commit, and that will cause the write from step 1 to never be
> visible afterwards.
> With this fix, we will either see the data written in step 1 on the next serial
> read or will never see it, which is what we want.
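The scenario in the description above comes down to which majority a serial read happens to contact. The following is a toy model of that observation only, not Cassandra's Paxos code: the replica names A, B and C come from the description, while InFlightVisibility, ACCEPTED and seesInFlight are hypothetical names introduced for illustration.

```java
import java.util.Map;
import java.util.Set;

// Toy model of the RF=3 scenario; it models visibility only, not ballots or commits.
class InFlightVisibility {
    // true = the replica holds an accepted-but-uncommitted proposal (state after step 1).
    static final Map<String, Boolean> ACCEPTED = Map.of("A", true, "B", false, "C", false);

    // A read that contacts the given majority learns of the in-flight proposal
    // only if at least one contacted replica has accepted it.
    static boolean seesInFlight(Set<String> quorum) {
        if (quorum.size() < 2)
            throw new IllegalArgumentException("a majority of 3 replicas needs at least 2");
        return quorum.stream().anyMatch(r -> ACCEPTED.getOrDefault(r, false));
    }

    public static void main(String[] args) {
        // Step 2: the read contacts B and C and behaves as if nothing is in flight.
        System.out.println("B,C sees in-flight: " + seesInFlight(Set.of("B", "C")));
        // Step 3: the read contacts A and B, finds the proposal and would complete it.
        System.out.println("A,B sees in-flight: " + seesInFlight(Set.of("A", "B")));
    }
}
```

The two reads return different answers about the same state, which is why the description proposes that the step 2 read should itself run a propose round so that the outcome is settled one way or the other.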
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117855#comment-17117855 ]

Benedict Elliott Smith commented on CASSANDRA-12126:
----------------------------------------------------

The test cases I provided demonstrate several consistency violations during range movements. I've just thought of another one, and am writing a test case for it. Perhaps we could claim that range movements are always consistency violations, but they are particularly keenly felt when you claim a linearisable history.

There are also (more debatably) issues with TTL on {{system.paxos}}, particularly when mixed with non-global commit; perhaps we could claim this is the user's problem, but it's not clear why we support global consensus that can be lost through local commit, and I don't think we communicate clearly the consistency implications to not call this a bug.

Also, mixing LOCAL_SERIAL and SERIAL is entirely unsafe, and even supporting them both is arguably a consistency violation without mechanisms to safely transition from one level to another.
[jira] [Updated] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-15805:
-----------------------------------------
Fix Version/s: (was: 3.11.x)
               (was: 3.0.x)
               3.11.7
               3.0.21
Since Version: 3.0 alpha 1
Source Control Link: [8358e19840d352475a5831d130ff3c43a11f2f4e|https://github.com/apache/cassandra/commit/8358e19840d352475a5831d130ff3c43a11f2f4e], [c8a2834606d683ba9945e9cc11bdb4207ce269d1|https://github.com/apache/cassandra/commit/c8a2834606d683ba9945e9cc11bdb4207ce269d1]
Resolution: Fixed
Status: Resolved (was: Ready to Commit)

> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones
> -------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination, Local/SSTable
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Normal
> Fix For: 3.0.21, 3.11.7
>
> The legacy reading code ({{LegacyLayout}} and
> {{UnfilteredDeserializer.OldFormatDeserializer}}) does not correctly handle
> the case where a range tombstone covering multiple rows interacts with a
> collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the above is run on 2.X (with everything either flushed in the same
> table or compacted together), then this will result in the inserted row being
> duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789, and this
> reproduces even with the fix to {{LegacyLayout}} from that ticket. That said,
> the additional code added in CASSANDRA-15789 to force merging duplicated rows
> if they are produced _will_ end up fixing this as a consequence (assuming
> there is no variation of this problem that leads to visible issues other than
> duplicated rows). Even so, I "think" we'd still rather fix the source of
> the issue.
[cassandra] 01/01: Merge branch 'cassandra-3.11' into trunk
This is an automated email from the ASF dual-hosted git repository.

slebresne pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 44c4e1f7874c962309a853fa2434a9dd4ae16aeb
Merge: e690e29 ebfd052
Author: Sylvain Lebresne
AuthorDate: Wed May 27 17:18:43 2020 +0200

    Merge branch 'cassandra-3.11' into trunk
[cassandra] branch trunk updated (e690e29 -> 44c4e1f)
This is an automated email from the ASF dual-hosted git repository.

slebresne pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.

    from e690e29  Add docs section on configuring the Jenkins master to create the "Cassandra" category throttle.
     new 8358e19  Fix legacy handling of RangeTombstone with collection ones
     new c8a2834  Fix LegacyLayout handling of non-selected collection tombstones
     new ebfd052  Merge commit 'c8a2834606d683ba9945e9cc11bdb4207ce269d1' into cassandra-3.11
     new 44c4e1f  Merge branch 'cassandra-3.11' into trunk

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
[cassandra] 01/01: Merge commit 'c8a2834606d683ba9945e9cc11bdb4207ce269d1' into cassandra-3.11
This is an automated email from the ASF dual-hosted git repository.

slebresne pushed a commit to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit ebfd05254f84000f71fa018650632d24d3761f07
Merge: 3cda9d7 c8a2834
Author: Sylvain Lebresne
AuthorDate: Wed May 27 17:12:44 2020 +0200

    Merge commit 'c8a2834606d683ba9945e9cc11bdb4207ce269d1' into cassandra-3.11

 CHANGES.txt                                        |   1 +
 src/java/org/apache/cassandra/db/LegacyLayout.java | 105 +
 .../cassandra/db/UnfilteredDeserializer.java       | 129 -
 .../upgrade/MixedModeRangeTombstoneTest.java       |  73
 .../org/apache/cassandra/db/LegacyLayoutTest.java  | 102 +---
 5 files changed, 340 insertions(+), 70 deletions(-)

diff --cc CHANGES.txt
index 11515c4,46b3f56..a809016
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,7 -1,5 +1,8 @@@
 -3.0.21
 +3.11.7
 + * Fix CQL formatting of read command restrictions for slow query log (CASSANDRA-15503)
 + * Allow sstableloader to use SSL on the native port (CASSANDRA-14904)
 +Merged from 3.0:
+  * Fix duplicated row on 2.x upgrades when multi-rows range tombstones interact with collection ones (CASSANDRA-15805)
   * Rely on snapshotted session infos on StreamResultFuture.maybeComplete to avoid race conditions (CASSANDRA-15667)
   * EmptyType doesn't override writeValue so could attempt to write bytes when expected not to (CASSANDRA-15790)
   * Fix index queries on partition key columns when some partitions contains only static data (CASSANDRA-13666)
diff --cc src/java/org/apache/cassandra/db/LegacyLayout.java
index 4ec0c30,8492de5..b28c72a
--- a/src/java/org/apache/cassandra/db/LegacyLayout.java
+++ b/src/java/org/apache/cassandra/db/LegacyLayout.java
@@@ -1891,9 -1934,9 +1936,9 @@@ public abstract class LegacyLayou
              if ((start.collectionName == null) != (stop.collectionName == null))
              {
                  if (start.collectionName == null)
-                     stop = new LegacyBound(stop.bound, stop.isStatic, null);
 -                    stop = new LegacyBound(Slice.Bound.inclusiveEndOf(stop.bound.values), stop.isStatic, null);
++                    stop = new LegacyBound(ClusteringBound.inclusiveEndOf(stop.bound.values), stop.isStatic, null);
                  else
-                     start = new LegacyBound(start.bound, start.isStatic, null);
 -                    start = new LegacyBound(Slice.Bound.inclusiveStartOf(start.bound.values), start.isStatic, null);
++                    start = new LegacyBound(ClusteringBound.inclusiveStartOf(start.bound.values), start.isStatic, null);
              }
              else if (!Objects.equals(start.collectionName, stop.collectionName))
              {
@@@ -1920,11 -1963,21 +1965,21 @@@
              return new LegacyRangeTombstone(newStart, stop, deletionTime);
          }
  
 -        public LegacyRangeTombstone withNewStart(Slice.Bound newStart)
++        public LegacyRangeTombstone withNewStart(ClusteringBound newStart)
+         {
+             return withNewStart(new LegacyBound(newStart, start.isStatic, null));
+         }
+ 
          public LegacyRangeTombstone withNewEnd(LegacyBound newStop)
          {
              return new LegacyRangeTombstone(start, newStop, deletionTime);
          }
  
 -        public LegacyRangeTombstone withNewEnd(Slice.Bound newEnd)
++        public LegacyRangeTombstone withNewEnd(ClusteringBound newEnd)
+         {
+             return withNewEnd(new LegacyBound(newEnd, stop.isStatic, null));
+         }
+ 
          public boolean isCell()
          {
              return false;
diff --cc src/java/org/apache/cassandra/db/UnfilteredDeserializer.java
index cdcde2e,2d270bc..262b333
--- a/src/java/org/apache/cassandra/db/UnfilteredDeserializer.java
+++ b/src/java/org/apache/cassandra/db/UnfilteredDeserializer.java
@@@ -480,19 -480,9 +480,10 @@@ public abstract class UnfilteredDeseria
                  this.helper = helper;
                  this.grouper = new LegacyLayout.CellGrouper(metadata, helper);
                  this.tombstoneTracker = new TombstoneTracker(partitionDeletion);
-                 this.atoms = new AtomIterator(atomReader);
-             }
- 
-             private boolean isRow(LegacyLayout.LegacyAtom atom)
-             {
-                 if (atom.isCell())
-                     return true;
- 
-                 LegacyLayout.LegacyRangeTombstone tombstone = atom.asRangeTombstone();
-                 return tombstone.isCollectionTombstone() || tombstone.isRowDeletion(metadata);
+                 this.atoms = new AtomIterator(atomReader, metadata);
              }
 +
              public boolean hasNext()
              {
                  // Note that we loop on next == null because
TombstoneTracker.openNew() could return null below or the atom might be shadowed. @@@ -540,13 -530,57 +531,57 @@@
[cassandra] branch cassandra-3.11 updated (3cda9d7 -> ebfd052)
This is an automated email from the ASF dual-hosted git repository. slebresne pushed a change to branch cassandra-3.11 in repository https://gitbox.apache.org/repos/asf/cassandra.git. from 3cda9d7 Merge branch cassandra-3.0 into cassandra-3.11 new 8358e19 Fix legacy handling of RangeTombstone with collection ones new c8a2834 Fix LegacyLayout handling of non-selected collection tombstones new ebfd052 Merge commit 'c8a2834606d683ba9945e9cc11bdb4207ce269d1' into cassandra-3.11 The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: CHANGES.txt| 1 + src/java/org/apache/cassandra/db/LegacyLayout.java | 105 + .../cassandra/db/UnfilteredDeserializer.java | 129 - .../upgrade/MixedModeRangeTombstoneTest.java | 73 .../org/apache/cassandra/db/LegacyLayoutTest.java | 102 +--- 5 files changed, 340 insertions(+), 70 deletions(-) create mode 100644 test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeRangeTombstoneTest.java
[cassandra] 01/02: Fix legacy handling of RangeTombstone with collection ones
This is an automated email from the ASF dual-hosted git repository. slebresne pushed a commit to branch cassandra-3.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git commit 8358e19840d352475a5831d130ff3c43a11f2f4e Author: Sylvain Lebresne AuthorDate: Fri May 8 18:12:55 2020 +0200 Fix legacy handling of RangeTombstone with collection ones When a multi-row range tombstone interacts with a collection tombstone within one of the covered rows, the resulting range tombstone in the legacy format will start in the middle of the row and extend past said row, so it needs special handling. Before this commit, the code deserializing that RT was making it artificially start at the end of the row (in which the collection tombstone is), but that means that when `LegacyLayout.CellGrouper` encountered it, it decided the row was finished, even if it was not, leading to potential row duplication. The patch solves this by: 1. making that problematic tombstone start at the beginning of the row instead of its end (to avoid code deciding the row is over). 2. modifying `UnfilteredDeserializer` to 'split' that range tombstone into a row tombstone for the row it covers, which is handled as a normal row tombstone, and pushing the rest of the range tombstone (that starts after the row and extends to the original end of the RT) to be handled after that row is fully "grouped". The patch also removes the possibility of getting an empty row from `LegacyLayout#getNextRow` to avoid theoretical problems with that. 
Patch by Sylvain Lebresne; reviewed by Marcus Eriksson & Aleksey Yeschenko for CASSANDRA-15805 --- CHANGES.txt| 1 + src/java/org/apache/cassandra/db/LegacyLayout.java | 99 .../cassandra/db/UnfilteredDeserializer.java | 129 - .../upgrade/MixedModeRangeTombstoneTest.java | 73 4 files changed, 252 insertions(+), 50 deletions(-) diff --git a/CHANGES.txt b/CHANGES.txt index cdb9ad0..46b3f56 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.0.21 + * Fix duplicated row on 2.x upgrades when multi-rows range tombstones interact with collection ones (CASSANDRA-15805) * Rely on snapshotted session infos on StreamResultFuture.maybeComplete to avoid race conditions (CASSANDRA-15667) * EmptyType doesn't override writeValue so could attempt to write bytes when expected not to (CASSANDRA-15790) * Fix index queries on partition key columns when some partitions contains only static data (CASSANDRA-13666) diff --git a/src/java/org/apache/cassandra/db/LegacyLayout.java b/src/java/org/apache/cassandra/db/LegacyLayout.java index 37cc935..39dd54a 100644 --- a/src/java/org/apache/cassandra/db/LegacyLayout.java +++ b/src/java/org/apache/cassandra/db/LegacyLayout.java @@ -1115,7 +1115,7 @@ public abstract class LegacyLayout return true; } -private static Comparator legacyAtomComparator(CFMetaData metadata) +static Comparator legacyAtomComparator(CFMetaData metadata) { return (o1, o2) -> { @@ -1373,8 +1373,24 @@ public abstract class LegacyLayout this.hasValidCells = false; } +/** + * Try adding the provided atom to the currently grouped row. + * + * @param atom the new atom to try to add. This must be a "row" atom, that is either a cell or a legacy + * range tombstone that covers only one row (row deletion) or a subset of it (collection + * deletion). Meaning that legacy range tombstone covering multiple rows (that should be handled as + * legit range tombstone in the new storage engine) should be handled separately. Atoms should also + * be provided in proper clustering order. 
+ * @return {@code true} if the provided atom has been "consumed" by this grouper (this does _not_ mean the + * atom has been "used" by the grouper as the grouper will skip some shadowed atoms for instance, just + * that {@link #getRow()} shouldn't be called just yet if there is more atom in the atom iterator we're + * grouping). {@code false} otherwise, that is if the row currently built by this grouper is done + * _without_ the provided atom being "consumed" (and so {@link #getRow()} should be called and the + * grouper resetted, after which the provided atom should be provided again). + */ public boolean addAtom(LegacyAtom atom) { +assert atom.isRowAtom(metadata) : "Unexpected non in-row legacy range tombstone " + atom; return atom.isCell() ? addCell(atom.asCell()) : addRangeTombstone(atom.asRangeTombstone()); @@ -1472,11 +1488,16 @@ public abstract class
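The 'split' described in step 2 of the commit message above can be pictured with a deliberately simplified model. The sketch below assumes clusterings are plain integers and that the problematic tombstone has already been moved to start at the beginning of its row (step 1). `RangeTombstoneSplitSketch` and all of its types are hypothetical names for illustration only; they are not the classes touched by the patch.

```java
import java.util.Optional;

/** Toy illustration only: integer clusterings, hypothetical types, not Cassandra code. */
class RangeTombstoneSplitSketch {
    /** Deletion of exactly one row. */
    static final class RowDeletion {
        final int row;
        final long deletionTime;
        RowDeletion(int row, long deletionTime) { this.row = row; this.deletionTime = deletionTime; }
    }

    /** Deletion of every row in (startExclusive, endInclusive]. */
    static final class RangeDeletion {
        final int startExclusive;
        final int endInclusive;
        final long deletionTime;
        RangeDeletion(int startExclusive, int endInclusive, long deletionTime) {
            this.startExclusive = startExclusive;
            this.endInclusive = endInclusive;
            this.deletionTime = deletionTime;
        }
    }

    static final class Split {
        final RowDeletion rowPart;          // handled like a normal row tombstone
        final Optional<RangeDeletion> rest; // handled once the row is fully grouped
        Split(RowDeletion rowPart, Optional<RangeDeletion> rest) { this.rowPart = rowPart; this.rest = rest; }
    }

    /** Splits a tombstone that starts at the beginning of {@code row} and ends at {@code endInclusive}. */
    static Split split(int row, int endInclusive, long deletionTime) {
        if (endInclusive < row)
            throw new IllegalArgumentException("tombstone must cover its starting row");
        RowDeletion rowPart = new RowDeletion(row, deletionTime);
        Optional<RangeDeletion> rest = endInclusive > row
                ? Optional.of(new RangeDeletion(row, endInclusive, deletionTime))
                : Optional.empty();
        return new Split(rowPart, rest);
    }
}
```

Both parts keep the original deletion time; only the bounds change, so nothing that the original tombstone deleted becomes visible again in this model.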
[cassandra] 02/02: Fix LegacyLayout handling of non-selected collection tombstones
This is an automated email from the ASF dual-hosted git repository. slebresne pushed a commit to branch cassandra-3.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git commit c8a2834606d683ba9945e9cc11bdb4207ce269d1 Author: Sylvain Lebresne AuthorDate: Wed May 13 11:44:08 2020 +0200 Fix LegacyLayout handling of non-selected collection tombstones If a collection tombstone is not included by a query, it can be ignored, but it currently made `LegacyLayout.CellGrouper#addCollectionTombstone` return `false`, which made it stop the current row, which is incorrect (this can potentially lead to a duplicate row). This patch changes it to return `true`. Patch by Sylvain Lebresne; reviewed by Marcus Eriksson & Aleksey Yeschenko for CASSANDRA-15805 --- src/java/org/apache/cassandra/db/LegacyLayout.java | 6 +- .../org/apache/cassandra/db/LegacyLayoutTest.java | 102 + 2 files changed, 88 insertions(+), 20 deletions(-) diff --git a/src/java/org/apache/cassandra/db/LegacyLayout.java b/src/java/org/apache/cassandra/db/LegacyLayout.java index 39dd54a..8492de5 100644 --- a/src/java/org/apache/cassandra/db/LegacyLayout.java +++ b/src/java/org/apache/cassandra/db/LegacyLayout.java @@ -1537,8 +1537,12 @@ public abstract class LegacyLayout private boolean addCollectionTombstone(LegacyRangeTombstone tombstone) { +// If the collection tombstone is not included in the query (which technically would only apply to thrift +// queries since CQL one "fetch" everything), we can skip it (so return), but we're problably still within +// the current row so we return `true`. Technically, it is possible that tombstone belongs to another row +// that the row currently grouped, but as we ignore it, returning `true` is ok in that case too. 
if (!helper.includes(tombstone.start.collectionName)) -return false; // see CASSANDRA-13109 +return true; // see CASSANDRA-13109 // The helper needs to be informed about the current complex column identifier before // it can perform the comparison between the recorded drop time and the RT deletion time. diff --git a/test/unit/org/apache/cassandra/db/LegacyLayoutTest.java b/test/unit/org/apache/cassandra/db/LegacyLayoutTest.java index 0bb2459..f0d2a02 100644 --- a/test/unit/org/apache/cassandra/db/LegacyLayoutTest.java +++ b/test/unit/org/apache/cassandra/db/LegacyLayoutTest.java @@ -24,18 +24,19 @@ import java.nio.file.Files; import java.nio.file.Path; import java.nio.file.Paths; +import org.apache.cassandra.db.LegacyLayout.CellGrouper; +import org.apache.cassandra.db.LegacyLayout.LegacyBound; +import org.apache.cassandra.db.LegacyLayout.LegacyCell; +import org.apache.cassandra.db.LegacyLayout.LegacyRangeTombstone; import org.apache.cassandra.db.filter.ColumnFilter; import org.apache.cassandra.db.marshal.MapType; import org.apache.cassandra.db.marshal.UTF8Type; -import org.apache.cassandra.db.partitions.ImmutableBTreePartition; import org.apache.cassandra.db.rows.BufferCell; import org.apache.cassandra.db.rows.Cell; import org.apache.cassandra.db.rows.RowIterator; -import org.apache.cassandra.db.rows.Rows; import org.apache.cassandra.db.rows.SerializationHelper; import org.apache.cassandra.db.rows.UnfilteredRowIterator; import org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer; -import org.apache.cassandra.db.rows.UnfilteredRowIterators; import org.apache.cassandra.db.transform.FilteredRows; import org.apache.cassandra.exceptions.ConfigurationException; import org.apache.cassandra.io.util.DataInputBuffer; @@ -62,10 +63,10 @@ import org.apache.cassandra.db.rows.BTreeRow; import org.apache.cassandra.db.rows.Row; import org.apache.cassandra.dht.Murmur3Partitioner; import org.apache.cassandra.schema.KeyspaceParams; -import 
org.apache.cassandra.utils.ByteBufferUtil; import org.apache.cassandra.utils.Hex; import static org.apache.cassandra.net.MessagingService.VERSION_21; +import static org.apache.cassandra.utils.ByteBufferUtil.bytes; import static org.junit.Assert.*; public class LegacyLayoutTest @@ -98,7 +99,7 @@ public class LegacyLayoutTest builder.addComplexDeletion(b, new DeletionTime(1L, 1)); Row row = builder.build(); -ByteBuffer key = ByteBufferUtil.bytes(1); +ByteBuffer key = bytes(1); PartitionUpdate upd = PartitionUpdate.singleRowUpdate(table, key, row); LegacyLayout.LegacyUnfilteredPartition p = LegacyLayout.fromUnfilteredRowIterator(null, upd.unfilteredIterator()); @@ -216,7 +217,7 @@ public class LegacyLayoutTest builder.addCell(new BufferCell(v, 1L, Cell.NO_TTL, Cell.NO_DELETION_TIME, Int32Serializer.instance.serialize(1), null)); Row row = builder.build(); -DecoratedKey pk = table.decorateKey(ByteBufferUtil.bytes(1)); +DecoratedKey pk =
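Why a `false` return for an ignored collection tombstone can duplicate a row is easier to see in a toy model of the grouping contract (an atom that is not "consumed" ends the current row and is offered again). The sketch below is an assumption-laden simplification: `CellGrouperSketch`, `Atom`, `groupAll` and the `ignoredAtomEndsRow` switch are hypothetical and only mimic the shape of `CellGrouper#addAtom`, not its real logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the grouping contract; hypothetical names, not Cassandra code. */
class CellGrouperSketch {
    static final class Atom {
        final int row;          // clustering of the row the atom belongs to
        final String column;    // column (or collection) name
        final boolean selected; // false = not included by the query, so it is skipped
        Atom(int row, String column, boolean selected) {
            this.row = row;
            this.column = column;
            this.selected = selected;
        }
    }

    static final class Grouper {
        private final boolean ignoredAtomEndsRow; // true mimics the behavior before the fix
        private Integer row;                      // row being built, null if none yet
        private final List<String> columns = new ArrayList<>();

        Grouper(boolean ignoredAtomEndsRow) { this.ignoredAtomEndsRow = ignoredAtomEndsRow; }

        /** Returns true if the atom was consumed, false if the current row is done without it. */
        boolean addAtom(Atom atom) {
            if (!atom.selected)
                return !ignoredAtomEndsRow; // skipped atom: the fix reports it as consumed
            if (row != null && row != atom.row)
                return false;               // atom belongs to the next row
            row = atom.row;
            columns.add(atom.column);
            return true;
        }

        boolean isEmpty() { return row == null; }
        String getRow() { return row + ":" + columns; }
        void reset() { row = null; columns.clear(); }
    }

    /** Drives the grouper the way the contract describes: emit, reset, then offer the atom again. */
    static List<String> groupAll(List<Atom> atoms, boolean ignoredAtomEndsRow) {
        Grouper grouper = new Grouper(ignoredAtomEndsRow);
        List<String> rows = new ArrayList<>();
        for (Atom atom : atoms) {
            if (!grouper.addAtom(atom)) {
                if (!grouper.isEmpty())
                    rows.add(grouper.getRow());
                grouper.reset();
                grouper.addAtom(atom); // a skipped atom is simply dropped here
            }
        }
        if (!grouper.isEmpty())
            rows.add(grouper.getRow());
        return rows;
    }
}
```

With atoms `(1, a)`, an unselected `(1, m)` and `(1, z)`, the pre-fix switch emits row 1 twice (once with `a`, once with `z`), while the post-fix switch keeps both cells in a single row.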
[cassandra] branch cassandra-3.0 updated (a4b6deb -> c8a2834)
This is an automated email from the ASF dual-hosted git repository. slebresne pushed a change to branch cassandra-3.0 in repository https://gitbox.apache.org/repos/asf/cassandra.git. from a4b6deb Rely on snapshotted session infos on StreamResultFuture.maybeComplete to avoid races new 8358e19 Fix legacy handling of RangeTombstone with collection ones new c8a2834 Fix LegacyLayout handling of non-selected collection tombstones The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: CHANGES.txt| 1 + src/java/org/apache/cassandra/db/LegacyLayout.java | 105 + .../cassandra/db/UnfilteredDeserializer.java | 129 - .../upgrade/MixedModeRangeTombstoneTest.java | 73 .../org/apache/cassandra/db/LegacyLayoutTest.java | 102 +--- 5 files changed, 340 insertions(+), 70 deletions(-) create mode 100644 test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeRangeTombstoneTest.java
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117833#comment-17117833 ] Sylvain Lebresne commented on CASSANDRA-12126: -- bq. But we do have other serious consistency violations that should also be fixed. Could you expand on that? > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Feature/Lightweight Transactions, Legacy/Coordination >Reporter: Sankalp Kohli >Assignee: Sylvain Lebresne >Priority: Normal > Labels: LWT, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > While looking at the CAS code in Cassandra, I found a potential issue with > CAS Reads. Here is how it can happen with RF=3 > 1) You issue a CAS Write and it fails in the propose phase. A machine replies > true to a propose and saves the commit in accepted filed. The other two > machines B and C does not get to the accept phase. > Current state is that machine A has this commit in paxos table as accepted > but not committed and B and C does not. > 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the > value written in step 1. This step is as if nothing is inflight. > 3) Issue another CAS Read and it goes to A and B. Now we will discover that > there is something inflight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. > If we skip step 3 and instead run step 4, we will never learn about value > written in step 1. > 4. Issue a CAS Write and it involves only B and C. This will succeed and > commit a different value than step 1. Step 1 value will never be seen again > and was never seen before. > If you read the Lamport “paxos made simple” paper and read section 2.3. 
It > talks about this issue which is how learners can find out if majority of the > acceptors have accepted the proposal. > In step 3, it is correct that we propose the value again since we dont know > if it was accepted by majority of acceptors. When we ask majority of > acceptors, and more than one acceptors but not majority has something in > flight, we have no way of knowing if it is accepted by majority of acceptors. > So this behavior is correct. > However we need to fix step 2, since it caused reads to not be linearizable > with respect to writes and other reads. In this case, we know that majority > of acceptors have no inflight commit which means we have majority that > nothing was accepted by majority. I think we should run a propose step here > with empty commit and that will cause write written in step 1 to not be > visible ever after. > With this fix, we will either see data written in step 1 on next serial read > or will never see it which is what we want.
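Steps 1 to 3 of the ticket description, and the proposed "empty commit" fix for step 2, can be replayed with a toy three-replica model. Everything in the sketch is an assumption made for illustration: `CasReadSketch` and its fields are hypothetical, ballots are plain integers, and the rule that an in-flight proposal is only completed when its ballot is newer than the most recent commit seen by the quorum is a modelling choice, not a description of Cassandra's Paxos code.

```java
import java.util.Arrays;
import java.util.List;

/** Toy replay of the ticket's scenario; hypothetical model, not Cassandra's Paxos code. */
class CasReadSketch {
    static final class Replica {
        int acceptedBallot;   // ballot of an accepted-but-uncommitted proposal (0 = none)
        String acceptedValue; // value of that proposal, null if none
        int mostRecentCommit; // highest ballot this replica has seen committed
        int dataBallot;       // ballot that wrote the stored value
        String data = "old";  // stored (committed) value
    }

    private int nextBallot = 2; // ballot 1 is used by the failed write of step 1

    /** Serial read on a quorum; optionally commits an empty update when nothing is in flight. */
    String serialRead(List<Replica> quorum, boolean commitEmptyWhenIdle) {
        Replica inFlight = null;
        int mostRecentCommit = 0;
        for (Replica r : quorum) {
            mostRecentCommit = Math.max(mostRecentCommit, r.mostRecentCommit);
            if (r.acceptedValue != null && (inFlight == null || r.acceptedBallot > inFlight.acceptedBallot))
                inFlight = r;
        }
        int ballot = nextBallot++;
        if (inFlight != null && inFlight.acceptedBallot > mostRecentCommit) {
            // Step 3 of the ticket: finish the in-flight proposal with the current ballot.
            String value = inFlight.acceptedValue;
            for (Replica r : quorum) {
                r.data = value;
                r.dataBallot = ballot;
                r.mostRecentCommit = ballot;
                r.acceptedBallot = 0;
                r.acceptedValue = null;
            }
        } else if (commitEmptyWhenIdle) {
            // Proposed fix for step 2: an empty commit that makes older proposals stale.
            for (Replica r : quorum)
                r.mostRecentCommit = ballot;
        }
        // Return the value written with the highest ballot among the contacted replicas.
        Replica newest = quorum.get(0);
        for (Replica r : quorum)
            if (r.dataBallot > newest.dataBallot)
                newest = r;
        return newest.data;
    }

    /** Returns the results of the read on {B, C} (step 2) and then on {A, B} (step 3). */
    static String[] scenario(boolean commitEmptyWhenIdle) {
        Replica a = new Replica(), b = new Replica(), c = new Replica();
        a.acceptedBallot = 1; // step 1: only A accepted the proposal
        a.acceptedValue = "v1";
        CasReadSketch paxos = new CasReadSketch();
        String step2 = paxos.serialRead(Arrays.asList(b, c), commitEmptyWhenIdle);
        String step3 = paxos.serialRead(Arrays.asList(a, b), commitEmptyWhenIdle);
        return new String[] { step2, step3 };
    }
}
```

In this model, `scenario(false)` yields `old` then `v1`: the step 1 write surfaces only because the second read happened to contact A. `scenario(true)` yields `old` twice, since the empty commit from the first read leaves A's proposal with an older ballot than the most recent commit.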
[jira] [Commented] (CASSANDRA-15778) CorruptSSTableException after a 2.1 SSTable is upgraded to 3.0, failing reads
[ https://issues.apache.org/jira/browse/CASSANDRA-15778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117827#comment-17117827 ] Sylvain Lebresne commented on CASSANDRA-15778: -- That patch looks like a reasonable solution to me, at least from my understanding of the issue. Small comments on the code itself: * I'd put a comment in {{AlterTableStatement}} to point to this ticket (may feel like a peculiar special case to future readers without context). * In {{AbstractType}}, the changes to {{writeValue}}/{{writtenLength}} feel confusing to me, and if the new code is ever triggered, this would mean we silently drop a value on the floor (we get a non-empty value, but the type says the value should be empty, so we'd write nothing), and that doesn't feel like a good idea. Instead of specializing the 0 size case, I'd just add an {{assert valueLengthIfFixed < 0 || value.remaining() == valueLengthIfFixed}} to basically ensure we're not going to write something we don't know how to read (and effectively forbid the call of those methods for {{EmptyType}} in conjunction with the existing assert). * Assuming we agree on the previous point, I'd prefer not overriding the methods in {{EmptyType}}. For the write ones, it wouldn't add anything, and overriding {{readValue}} feels confusing when the rest of the code ensures we can never write an empty value through those methods. * Nit: LegacySchemaMigrator has unused leftover imports ({{java.io.InvalidClassException}} and {{net.bytebuddy.implementation.bytecode.Throw}}). > CorruptSSTableException after a 2.1 SSTable is upgraded to 3.0, failing reads > - > > Key: CASSANDRA-15778 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15778 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/SSTable >Reporter: Sumanth Pasupuleti >Assignee: Alex Petrov >Priority: Normal > Fix For: 3.0.x > > > Below is the exception with stack trace. This issue is consistently > reproduce-able. 
> {code:java} > ERROR [SharedPool-Worker-1] 2020-05-01 14:57:57,661 > AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread > Thread[SharedPool-Worker-1,5,main]ERROR [SharedPool-Worker-1] 2020-05-01 > 14:57:57,661 AbstractLocalAwareExecutorService.java:169 - Uncaught exception > on thread > Thread[SharedPool-Worker-1,5,main]org.apache.cassandra.io.sstable.CorruptSSTableException: > Corrupted: > /mnt/data/cassandra/data// at > org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:349) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.db.columniterator.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:220) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.db.columniterator.SSTableIterator.hasNext(SSTableIterator.java:33) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > 
org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:131) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) > ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at >
[jira] [Updated] (CASSANDRA-15665) StreamManager should clearly differentiate between "initiator" and "receiver" sessions
[ https://issues.apache.org/jira/browse/CASSANDRA-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15665: --- Reviewers: Benjamin Lerer, Sergio Bossa, Benjamin Lerer (was: Benjamin Lerer, Sergio Bossa) Status: Review In Progress (was: Patch Available) > StreamManager should clearly differentiate between "initiator" and "receiver" > sessions > -- > > Key: CASSANDRA-15665 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15665 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging >Reporter: Sergio Bossa >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0 > > > {{StreamManager}} currently does a suboptimal job of differentiating between > stream sessions (in form of {{StreamResultFuture}}) which have been either > initiated or "received", for the following reasons: > 1) Naming is IMO confusing: a "receiver" session could actually both send and > receive files, so technically an initiator is also a receiver. > 2) {{StreamManager#findSession()}} assumes we should first look into > "initiator" sessions, then into "receiver" ones: this is a dangerous > assumption, in particular for test environments where the same process could > work as both an initiator and a receiver. > I would recommend the following changes: > 1) Rename "receiver" to "follower" everywhere the former is used. > 2) Introduce a new flag into {{StreamMessageHeader}} to signal if the message > comes from an initiator or follower session, in order to correctly > differentiate and look for sessions in {{StreamManager}}. > While my arguments above might seem trivial, I believe they will improve > clarity and save from potential bugs/headaches at testing time, and doing > such changes now that we're revamping streaming for 4.0 seems the right time. 
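The lookup problem in point 2 and the flag-based remedy can be sketched in a few lines. The class below is a hypothetical stand-in, not the real {{StreamManager}}: sessions are plain strings keyed by plan id, and `sentByInitiator` plays the role of the proposed header flag. Reading the flag as "a message sent by the initiator targets the local follower session, and vice versa" is an interpretation of the ticket, not something it spells out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical stand-in for the session registry; not the real StreamManager. */
class StreamSessionLookupSketch {
    final Map<UUID, String> initiatorSessions = new HashMap<>();
    final Map<UUID, String> followerSessions = new HashMap<>();

    /** Order-based lookup as criticised in the ticket: initiator sessions always win. */
    String findSessionLegacy(UUID planId) {
        String session = initiatorSessions.get(planId);
        return session != null ? session : followerSessions.get(planId);
    }

    /** Flag-based lookup: the sender's role decides which local map is consulted. */
    String findSession(UUID planId, boolean sentByInitiator) {
        return sentByInitiator ? followerSessions.get(planId) : initiatorSessions.get(planId);
    }
}
```

When one process holds both an initiator and a follower session for the same plan id, as can happen in a test, the order-based lookup can never return the follower session, whereas the flag-based lookup resolves either side.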
[jira] [Commented] (CASSANDRA-15812) Submitting Validation requests can block ANTI_ENTROPY stage
[ https://issues.apache.org/jira/browse/CASSANDRA-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117787#comment-17117787 ] Benjamin Lerer commented on CASSANDRA-15812: The patch looks good to me. I just wonder if it makes sense to allow users to use the blocking behavior for the {{ValidationExecutor}} as we know that this approach might lead to blocking the ANTI_ENTROPY stage. Is there a scenario in which using that behavior would be better? > Submitting Validation requests can block ANTI_ENTROPY stage > > > Key: CASSANDRA-15812 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15812 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 4.0-alpha > > > RepairMessages are handled on Stage.ANTI_ENTROPY, which has a thread pool > with core/max capacity of one, ie. we can only process one message at a time. > > Scheduling validation compactions may however block the stage completely, by > blocking on CompactionManager's ValidationExecutor while submitting a new > validation compaction, in cases where there are already more validations > running than can be executed in parallel.
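The saturation condition behind this ticket can be reproduced with plain `java.util.concurrent`, independently of Cassandra. The sketch below is only an analogy under stated assumptions: `StageBlockingSketch` is a hypothetical name, the pool of one thread with a `SynchronousQueue` stands in for a fully busy validation executor, and the fail-fast `trySubmit` is one possible non-blocking hand-off, not necessarily what the patch does. A hand-off that waits for capacity instead would park the calling thread, which is the failure mode described when the caller is the single ANTI_ENTROPY thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Generic java.util.concurrent analogy; hypothetical names, not Cassandra code. */
class StageBlockingSketch {
    /** Returns true if the task was handed off without waiting, false if the pool was saturated. */
    static boolean trySubmit(ThreadPoolExecutor pool, Runnable task) {
        try {
            pool.execute(task); // the default AbortPolicy rejects instead of waiting
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    /** Saturates a one-thread pool, then reports the outcome of the first and second hand-off. */
    static boolean[] demo() {
        ThreadPoolExecutor validation =
                new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS, new SynchronousQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        boolean first = trySubmit(validation, () -> {
            try {
                release.await(); // simulates a long-running validation
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        boolean second = trySubmit(validation, () -> { }); // pool is busy: no capacity left
        release.countDown();
        validation.shutdown();
        try {
            validation.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { first, second };
    }
}
```

Here the second hand-off returns `false` immediately, so the caller stays free to process its next message and can decide what to do with the rejected work.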
[jira] [Updated] (CASSANDRA-15825) Fix flaky test incrementalSSTableSelection - org.apache.cassandra.db.streaming.CassandraStreamManagerTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Berenguer Blasi updated CASSANDRA-15825: Test and Documentation Plan: See PR Status: Patch Available (was: In Progress) > Fix flaky test incrementalSSTableSelection - > org.apache.cassandra.db.streaming.CassandraStreamManagerTest > - > > Key: CASSANDRA-15825 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15825 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-alpha > > Time Spent: 20m > Remaining Estimate: 0h > > Build link: > https://app.circleci.com/pipelines/github/dcapwell/cassandra/287/workflows/06baf3db-7094-431f-920d-e8fcd1da9cce/jobs/1398 > > {code} > java.lang.RuntimeException: java.nio.file.NoSuchFileException: > /tmp/cassandra/build/test/cassandra/data:2/ks_1589913975959/tbl-051c0a709a0111eab5fb6f52366536f8/na-4-big-Statistics.db > at > org.apache.cassandra.io.util.ChannelProxy.openChannel(ChannelProxy.java:55) > at > org.apache.cassandra.io.util.ChannelProxy.(ChannelProxy.java:66) > at > org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:315) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:126) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:136) > at > org.apache.cassandra.io.sstable.format.SSTableReader.reloadSSTableMetadata(SSTableReader.java:2047) > at > org.apache.cassandra.db.streaming.CassandraStreamManagerTest.mutateRepaired(CassandraStreamManagerTest.java:128) > at > org.apache.cassandra.db.streaming.CassandraStreamManagerTest.incrementalSSTableSelection(CassandraStreamManagerTest.java:175) > Caused by: java.nio.file.NoSuchFileException: > /tmp/cassandra/build/test/cassandra/data:2/ks_1589913975959/tbl-051c0a709a0111eab5fb6f52366536f8/na-4-big-Statistics.db > at > 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) > at java.nio.channels.FileChannel.open(FileChannel.java:287) > at java.nio.channels.FileChannel.open(FileChannel.java:335) > at > org.apache.cassandra.io.util.ChannelProxy.openChannel(ChannelProxy.java:51) > {code}
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117652#comment-17117652 ] Benedict Elliott Smith commented on CASSANDRA-12126: bq. I'm rather cold on that because, tbh. I think non-strict serializability is a theoretical notion that is useless in practice and that it is something we should not offer. And I'd rather avoid one more "feature" for which we spend our time saying "don't use it". Yeah, I'm very sympathetic to this view, and have always assumed linearizability with partitions as the object. I'm just really trying to morally justify providing some time to fix this without any negative repercussions. Either way, we should definitely clarify what we mean by SERIAL in some official project documentation somewhere though. We probably need to do so in terms of strict serializability as opposed to linearizability, so that it can be consistent with a future in which we support multi-partition transactions (which as a project we really need to deliver in the not-too-distant future). bq. non-applying CAS for that FWIW, I think this particular case is a no-brainer; there's no real cost to strengthening the semantics of non-applying CAS IMO, since users should anticipate their CAS operations will ordinarily take this long. Whatever the conclusion of our discussion, I think we should apply a fix at least for the non-applying case immediately, and I do not believe any flag to disable this part of the fix is necessary. Reads are trickier, because the user will see a significant performance penalty on patch version upgrade. I'm sympathetic to the view we should just fix the read part immediately, performance regressions be damned. But we do have other serious consistency violations that should also be fixed. 
I think it is worth _considering_ if we should instead aggressively try to remedy all of the known issues, have a strong verification push, and then roll out all of the changes at-once - including a fix for this that does not regress performance. It might seem a lot for a patch version, but I'm not sure risk is a concern when we know there are several serious problems today, and have been for years. I'm not going to advocate super strongly for either approach, as I don't think there's a clear answer, I just want to raise the alternative as an option to expressly consider. bq. Awesome, thanks. I'll look at integrating those in the branch if you don't mind. Absolutely, that was my intention. > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Feature/Lightweight Transactions, Legacy/Coordination >Reporter: Sankalp Kohli >Assignee: Sylvain Lebresne >Priority: Normal > Labels: LWT, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > While looking at the CAS code in Cassandra, I found a potential issue with > CAS Reads. Here is how it can happen with RF=3 > 1) You issue a CAS Write and it fails in the propose phase. A machine replies > true to a propose and saves the commit in accepted filed. The other two > machines B and C does not get to the accept phase. > Current state is that machine A has this commit in paxos table as accepted > but not committed and B and C does not. > 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the > value written in step 1. This step is as if nothing is inflight. > 3) Issue another CAS Read and it goes to A and B. Now we will discover that > there is something inflight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. 
> If we skip step 3 and instead run step 4, we will never learn about value > written in step 1. > 4. Issue a CAS Write and it involves only B and C. This will succeed and > commit a different value than step 1. Step 1 value will never be seen again > and was never seen before. > If you read the Lamport “paxos made simple” paper and read section 2.3. It > talks about this issue which is how learners can find out if majority of the > acceptors have accepted the proposal. > In step 3, it is correct that we propose the value again since we dont know > if it was accepted by majority of acceptors. When we ask majority of > acceptors, and more than one acceptors but not majority has something in > flight, we have no way of knowing if it is accepted by majority of acceptors. > So this behavior is correct. > However we need to fix step 2, since it caused reads to not be linearizable > with respect to writes and other reads. In this case, we know that majority > of acceptors
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117628#comment-17117628 ] Sylvain Lebresne commented on CASSANDRA-12126: -- bq. I'm amenable to such a flag Actually, let me rephrase that a bit. I'd *really* prefer not adding such a flag. If someone is ok with serializability without linearizability, then they can use QUORUM reads, which, given how things are implemented, provide (non-strict) serializability. Granted, for someone who uses SERIAL today, is ok with the lack of linearizability and can't afford the performance penalty, it'll require a client-side change, which this flag would avoid, so there is not zero value to such a flag. But I suspect the number of users fitting that category (knowingly ok with lack of linearizability) is really, really small, and we always have to make trade-offs. So in that case I feel adding one more flag, one I consider dangerous, is not worth it. So to clarify, if a consensus appears for such a flag, so be it, I'll add it, but I'm personally not neutral either. > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Feature/Lightweight Transactions, Legacy/Coordination >Reporter: Sankalp Kohli >Assignee: Sylvain Lebresne >Priority: Normal > Labels: LWT, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > While looking at the CAS code in Cassandra, I found a potential issue with > CAS Reads. Here is how it can happen with RF=3 > 1) You issue a CAS Write and it fails in the propose phase. A machine replies > true to a propose and saves the commit in accepted field. The other two > machines B and C do not get to the accept phase. > Current state is that machine A has this commit in paxos table as accepted > but not committed and B and C do not. > 2) Issue a CAS Read and it goes to only B and C. 
You won't be able to read the > value written in step 1. This step is as if nothing is inflight. > 3) Issue another CAS Read and it goes to A and B. Now we will discover that > there is something inflight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. > If we skip step 3 and instead run step 4, we will never learn about the value > written in step 1. > 4) Issue a CAS Write and it involves only B and C. This will succeed and > commit a different value than step 1. The step 1 value will never be seen again > and was never seen before. > Section 2.3 of Lamport's “Paxos Made Simple” paper > talks about this issue, which is how learners can find out if a majority of the > acceptors have accepted the proposal. > In step 3, it is correct that we propose the value again since we don't know > if it was accepted by a majority of acceptors. When we ask a majority of > acceptors, and more than one acceptor but not a majority has something in > flight, we have no way of knowing if it is accepted by a majority of acceptors. > So this behavior is correct. > However we need to fix step 2, since it causes reads to not be linearizable > with respect to writes and other reads. In this case, we know that a majority > of acceptors have no inflight commit, which means a majority reports that > nothing was accepted by a majority. I think we should run a propose step here > with an empty commit and that will cause the write from step 1 to never be > visible afterwards. > With this fix, we will either see data written in step 1 on the next serial read > or will never see it, which is what we want. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
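The scenario in the issue description above can be walked through with a toy model. This is only a sketch under stated assumptions: the `Replica` class, the `serial_read` function, and the rule that an accepted proposal is ignored once a newer ballot has been committed on a contacted replica are hypothetical simplifications for illustration, not Cassandra's Paxos implementation. The `propose_empty` switch stands in for the fix proposed in the description (running a propose step with an empty commit when a quorum reports nothing in flight).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    """Hypothetical, heavily simplified per-partition Paxos state."""
    name: str
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) accepted but not committed
    committed_ballot: int = 0                   # ballot of the newest commit seen here
    value: Optional[str] = None                 # last committed value


def step1_state() -> List[Replica]:
    # Step 1: the CAS write only reached A, which accepted it at ballot 1.
    return [Replica("A", accepted=(1, "v1")), Replica("B"), Replica("C")]


def serial_read(quorum: List[Replica], ballot: int, propose_empty: bool) -> Optional[str]:
    latest_commit = max(r.committed_ballot for r in quorum)
    # Only proposals newer than the newest commit count as "in flight".
    in_flight = [r.accepted for r in quorum
                 if r.accepted is not None and r.accepted[0] > latest_commit]
    if in_flight:
        # Finish the newest in-flight proposal under the current ballot.
        _, value = max(in_flight, key=lambda p: p[0])
        for r in quorum:
            r.value, r.committed_ballot, r.accepted = value, ballot, None
    elif propose_empty:
        # Assumed fix: commit an empty update so older proposals become stale.
        for r in quorum:
            r.committed_ballot = ballot
    newest = max(quorum, key=lambda r: r.committed_ballot)
    return newest.value


# Current behaviour: step 2 sees nothing, step 3 later revives the write.
a, b, c = step1_state()
print(serial_read([b, c], ballot=2, propose_empty=False))
print(serial_read([a, b], ballot=3, propose_empty=False))

# With the empty proposal in step 2, the step 1 write can no longer surface.
a, b, c = step1_state()
print(serial_read([b, c], ballot=2, propose_empty=True))
print(serial_read([a, b], ballot=3, propose_empty=True))
```

In this model the first pair of reads disagrees about the step 1 write, while the second pair agrees that it never happened, which matches the "either see it on the next serial read or never see it" outcome the description asks for.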
[jira] [Commented] (CASSANDRA-15825) Fix flaky test incrementalSSTableSelection - org.apache.cassandra.db.streaming.CassandraStreamManagerTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117624#comment-17117624 ] Berenguer Blasi commented on CASSANDRA-15825: - I noticed commenting out the other test in that class made this one fail consistently. So there seems to be cross talk between test cases. It seems that on the 4th sstable a compaction is triggered and that can remove the sstables under our feet depending on who is faster. The fix I applied is to prevent the compaction from running. I hope it makes sense. Waiting on CI now. > Fix flaky test incrementalSSTableSelection - > org.apache.cassandra.db.streaming.CassandraStreamManagerTest > - > > Key: CASSANDRA-15825 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15825 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Berenguer Blasi >Priority: Normal > Fix For: 4.0-alpha > > Time Spent: 20m > Remaining Estimate: 0h > > Build link: > https://app.circleci.com/pipelines/github/dcapwell/cassandra/287/workflows/06baf3db-7094-431f-920d-e8fcd1da9cce/jobs/1398 > > {code} > java.lang.RuntimeException: java.nio.file.NoSuchFileException: > /tmp/cassandra/build/test/cassandra/data:2/ks_1589913975959/tbl-051c0a709a0111eab5fb6f52366536f8/na-4-big-Statistics.db > at > org.apache.cassandra.io.util.ChannelProxy.openChannel(ChannelProxy.java:55) > at > org.apache.cassandra.io.util.ChannelProxy.<init>(ChannelProxy.java:66) > at > org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:315) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:126) > at > org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:136) > at > org.apache.cassandra.io.sstable.format.SSTableReader.reloadSSTableMetadata(SSTableReader.java:2047) > at > 
org.apache.cassandra.db.streaming.CassandraStreamManagerTest.mutateRepaired(CassandraStreamManagerTest.java:128) > at > org.apache.cassandra.db.streaming.CassandraStreamManagerTest.incrementalSSTableSelection(CassandraStreamManagerTest.java:175) > Caused by: java.nio.file.NoSuchFileException: > /tmp/cassandra/build/test/cassandra/data:2/ks_1589913975959/tbl-051c0a709a0111eab5fb6f52366536f8/na-4-big-Statistics.db > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) > at java.nio.channels.FileChannel.open(FileChannel.java:287) > at java.nio.channels.FileChannel.open(FileChannel.java:335) > at > org.apache.cassandra.io.util.ChannelProxy.openChannel(ChannelProxy.java:51) > {code}
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117615#comment-17117615 ] Sylvain Lebresne commented on CASSANDRA-12126: -- {quote}I think we should include a flag to disable the fix {quote} The option of having a flag occurred to me, but I rejected it initially because I continue to believe the current behavior is wrong (a moral judgment, I guess) and in principle, having a "please, make my database broken" flag does not feel like a good idea. But I reckon that there _may_ exist advanced users who did notice the lack of linearizability for reads and effectively built around it knowingly, for whom the performance impact may be considered a regression with no upside (but if you sense skepticism on my part when reading that sentence, your radar is not completely off). And as we're talking minor upgrade here, I'm amenable to such a flag, though I'd prefer making it clear somehow that it is unsafe/risky and something we may remove in the future with no particular warning. {quote}It would be good to have a test for that as well. {quote} Certainly, good point, I can add the 2 missing interleavings. {quote}do we actually claim our consistency properties are for SERIAL? {quote} While our official doc on the matter is certainly lacking (not spelling out much of a guarantee at all afaict, and I'm happy to piggy-back on this ticket to correct that), we've always implied linearizability. I have, at least, and I'm sure I can dig up others doing it as well on the mailing list if necessary. We did this both by throwing the linearizable word out from time to time, but also by repeatedly recommending that when a write times out, one needs to issue a SERIAL read to 'observe' if that write went through or not (and as an aside, if you can't rely on either reads or non-applying CAS for that, I'm not even sure how to use LWTs, except maybe for excessively specific cases). 
{quote}perhaps we should instead introduce a new STRICT_SERIAL consistency level {quote} I'm rather cold on that because, tbh, I think non-strict serializability is a theoretical notion that is useless in practice and that it is something we should not offer. And I'd rather avoid one more "feature" for which we spend our time saying "don't use it". {quote}I've pushed various test cases {quote} Awesome, thanks. I'll look at integrating those in the branch if you don't mind. > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Feature/Lightweight Transactions, Legacy/Coordination >Reporter: Sankalp Kohli >Assignee: Sylvain Lebresne >Priority: Normal > Labels: LWT, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > While looking at the CAS code in Cassandra, I found a potential issue with > CAS Reads. Here is how it can happen with RF=3 > 1) You issue a CAS Write and it fails in the propose phase. A machine replies > true to a propose and saves the commit in accepted field. The other two > machines B and C do not get to the accept phase. > Current state is that machine A has this commit in paxos table as accepted > but not committed and B and C do not. > 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the > value written in step 1. This step is as if nothing is inflight. > 3) Issue another CAS Read and it goes to A and B. Now we will discover that > there is something inflight from A and will propose and commit it with the > current ballot. Now we can read the value written in step 1 as part of this > CAS read. > If we skip step 3 and instead run step 4, we will never learn about the value > written in step 1. > 4) Issue a CAS Write and it involves only B and C. This will succeed and > commit a different value than step 1. The step 1 value will never be seen again > and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It > talks about this issue which is how learners can find out if majority of the > acceptors have accepted the proposal. > In step 3, it is correct that we propose the value again since we dont know > if it was accepted by majority of acceptors. When we ask majority of > acceptors, and more than one acceptors but not majority has something in > flight, we have no way of knowing if it is accepted by majority of acceptors. > So this behavior is correct. > However we need to fix step 2, since it caused reads to not be linearizable > with respect to writes and other reads. In this case, we know that majority > of acceptors have no inflight commit which means we have majority that > nothing was accepted by majority. I think we should
[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
[ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117597#comment-17117597 ] Benedict Elliott Smith commented on CASSANDRA-12126: I've pushed various test cases [here|https://github.com/belliottsmith/cassandra/tree/12126-tests-3.0] - most of them marked {{@Ignore}} because they are known to fail, and won't be resolved immediately. > CAS Reads Inconsistencies > -- > > Key: CASSANDRA-12126 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12126 > Project: Cassandra > Issue Type: Bug > Components: Feature/Lightweight Transactions, Legacy/Coordination >Reporter: Sankalp Kohli >Assignee: Sylvain Lebresne >Priority: Normal > Labels: LWT, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Assigned] (CASSANDRA-15830) Invalid version value: 4.0~alpha4 during startup
[ https://issues.apache.org/jira/browse/CASSANDRA-15830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever reassigned CASSANDRA-15830: -- Assignee: Michael Semb Wever > Invalid version value: 4.0~alpha4 during startup > > > Key: CASSANDRA-15830 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15830 > Project: Cassandra > Issue Type: Bug >Reporter: Eric Wong >Assignee: Michael Semb Wever >Priority: Normal > > Hi: > We are testing the latest cassandra-4.0 on CentOS 7 using a clean database. > When we started cassandra the first time, everything was fine. However, when > we stopped and restarted cassandra, we got the following error and the db refuses > to start up: > {code} > ERROR [main] 2020-05-22 05:58:18,698 CassandraDaemon.java:789 - Exception > encountered during startup > java.lang.IllegalArgumentException: Invalid version value: 4.0~alpha4 > at > org.apache.cassandra.utils.CassandraVersion.<init>(CassandraVersion.java:64) > at > org.apache.cassandra.io.sstable.SSTableHeaderFix.fixNonFrozenUDTIfUpgradeFrom30(SSTableHeaderFix.java:84) > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:250) > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:650) > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:767) > {code} > The only way to get the node up and running again is by deleting all data > under /var/lib/cassandra. > > Is that a known issue? > Thanks, Eric
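The exception in the report above is what one would expect from a version parser that accepts a dash before the pre-release tag but not the RPM-style tilde. The sketch below illustrates that idea only; `VERSION_RE` and `parse_version` are hypothetical names, and the pattern is an assumed semver-like shape, not the regular expression used by `CassandraVersion`.

```python
import re

# Hypothetical semver-like pattern (an assumption, not copied from Cassandra):
# major.minor, optional .patch, optional -prerelease.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-([0-9A-Za-z.-]+))?$")


def parse_version(text):
    """Return (major, minor, patch, prerelease) or raise ValueError."""
    match = VERSION_RE.match(text)
    if match is None:
        # Mirrors the wording of the message in the stack trace above.
        raise ValueError(f"Invalid version value: {text}")
    major, minor, patch, prerelease = match.groups()
    return int(major), int(minor), int(patch or 0), prerelease


for candidate in ("4.0-alpha4", "4.0~alpha4"):
    try:
        print(candidate, "->", parse_version(candidate))
    except ValueError as err:
        print(candidate, "->", err)
```

Under these assumptions the dash form parses and the tilde form is rejected, so a package-naming version string that ends up stored as the release version would fail whenever it is parsed again.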