[jira] [Commented] (CASSANDRA-18986) SHA1 keys prevent installation on RHEL 9

2023-10-31 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781395#comment-17781395
 ] 

Sylvain Lebresne commented on CASSANDRA-18986:
--

I might need to be reminded how I would go about removing my key, but otherwise 
happy to do it.

> SHA1 keys prevent installation on RHEL 9
> 
>
> Key: CASSANDRA-18986
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18986
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> Due to the presence of SHA1 keys, SHA1 has to be explicitly allowed before C* 
> can be installed on RHEL 9-based systems: 
> {quote}
> Importing GPG key 0xF2833C93:
>  Userid : "Eric Evans "
>  Fingerprint: CEC8 6BB4 A0BA 9D0F 9039 7CAE F835 8FA2 F283 3C93
>  From   : https://downloads.apache.org/cassandra/KEYS
> Is this ok [y/N]: y
> Key imported successfully
> Importing GPG key 0x8D77295D:
>  Userid : "Eric Evans "
>  Fingerprint: C496 5EE9 E301 5D19 2CCC F2B6 F758 CE31 8D77 295D
>  From   : https://downloads.apache.org/cassandra/KEYS
> Is this ok [y/N]: y
> Key imported successfully
> Importing GPG key 0x2B5C1B00:
>  Userid : "Sylvain Lebresne (pcmanus) "
>  Fingerprint: 5AED 1BF3 78E9 A19D ADE1 BCB3 4BD7 36A8 2B5C 1B00
>  From   : https://downloads.apache.org/cassandra/KEYS
> Is this ok [y/N]: y
> warning: Signature not supported. Hash algorithm SHA1 not available.
> Key import failed (code 2). Failing package is: cassandra-4.0.11-1.noarch
>  GPG Keys are configured as: https://downloads.apache.org/cassandra/KEYS
> The downloaded packages were saved in cache until the next successful 
> transaction.
> You can remove cached packages by executing 'yum clean packages'.
> Error: GPG check FAILED
> {quote}
> This can be worked around by allowing SHA1:
> {quote}
> update-crypto-policies --set DEFAULT:SHA1
> {quote}
> https://www.redhat.com/en/blog/rhel-security-sha-1-package-signatures-distrusted-rhel-9
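
For reference, a minimal sketch of the workaround flow on a RHEL 9 host, run as 
root (assuming the {{cassandra}} package name from the error above, and that the 
default crypto policy should be restored once the install is done):
{quote}
update-crypto-policies --set DEFAULT:SHA1
dnf install cassandra
update-crypto-policies --set DEFAULT
{quote}
The proper fix, of course, is to remove the SHA1 keys from the KEYS file rather 
than weakening the system-wide crypto policy.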



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16355) Fix flaky test incompletePropose - org.apache.cassandra.distributed.test.CASTest

2021-02-16 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17285276#comment-17285276
 ] 

Sylvain Lebresne commented on CASSANDRA-16355:
--

It sounds very plausible that 200ms is on the low end for CI, and [~blerer]'s 
explanation of why timeouts would lead to the failure seen on this ticket makes 
sense.

So +1 for the patch raising the timeout. Maybe just a nit: I'd use the 
opportunity to move the timeout value into a constant.

That said, it's a bit unfortunate that the test failures don't surface more 
clearly that this is due to a timeout. The reason this happens, at least for 
`incompletePropose` and if I understand correctly, is that while the initial 
insert times out _before_ it was supposed to, the `catch` doesn't know that. So 
what about modifying `IMessageFilters.Filter` so that it counts the number of 
messages it drops? With that, we could check, after the first insert times out, 
that the filter was triggered. And if it wasn't, that would imply we timed out 
before we were supposed to (and we could have a message saying "Hey, CI is slow 
again today").

But I do understand this implies a change and subsequent release of the in-jvm 
dtest API, so I'd be fine committing the timeout bump now for the sake of 
cleaning up CI and having that "improvement" in a followup (or not at all, it's 
just a suggestion).
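
To illustrate the idea (a rough, hypothetical sketch only; the actual 
`IMessageFilters.Filter` interface in the in-jvm dtest API looks different, and 
all names below are made up):
{code}
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

// Sketch of a message filter that remembers how many messages it actually dropped,
// so a test can later verify that an observed timeout was really caused by the filter.
final class CountingDropFilter<M>
{
    private final Predicate<M> shouldDrop;               // decides which messages to drop
    private final AtomicLong dropped = new AtomicLong(); // how many messages were dropped

    CountingDropFilter(Predicate<M> shouldDrop)
    {
        this.shouldDrop = shouldDrop;
    }

    /** Returns true if the message must be dropped, counting it along the way. */
    boolean drop(M message)
    {
        if (!shouldDrop.test(message))
            return false;
        dropped.incrementAndGet();
        return true;
    }

    long droppedCount()
    {
        return dropped.get();
    }
}
{code}
After the first insert is expected to time out, the test could then assert that 
`droppedCount() > 0`; a count of zero would mean the timeout fired before any 
message was actually dropped, i.e. CI was simply slow.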


> Fix flaky test incompletePropose - 
> org.apache.cassandra.distributed.test.CASTest
> 
>
> Key: CASSANDRA-16355
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16355
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Test/dtest/java
>Reporter: David Capwell
>Assignee: Benjamin Lerer
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/853/workflows/0766c0de-956e-4831-aa40-9303748a2708/jobs/5030
> {code}
> junit.framework.AssertionFailedError: Expected: [[1, 1, 2]]
> Actual: []
>   at 
> org.apache.cassandra.distributed.shared.AssertUtils.fail(AssertUtils.java:193)
>   at 
> org.apache.cassandra.distributed.shared.AssertUtils.assertEquals(AssertUtils.java:163)
>   at 
> org.apache.cassandra.distributed.shared.AssertUtils.assertRows(AssertUtils.java:63)
>   at 
> org.apache.cassandra.distributed.test.CASTest.incompletePropose(CASTest.java:124)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16355) Fix flaky test incompletePropose - org.apache.cassandra.distributed.test.CASTest

2021-02-16 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-16355:
-
Reviewers: Sylvain Lebresne

> Fix flaky test incompletePropose - 
> org.apache.cassandra.distributed.test.CASTest
> 
>
> Key: CASSANDRA-16355
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16355
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Test/dtest/java
>Reporter: David Capwell
>Assignee: Benjamin Lerer
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/853/workflows/0766c0de-956e-4831-aa40-9303748a2708/jobs/5030
> {code}
> junit.framework.AssertionFailedError: Expected: [[1, 1, 2]]
> Actual: []
>   at 
> org.apache.cassandra.distributed.shared.AssertUtils.fail(AssertUtils.java:193)
>   at 
> org.apache.cassandra.distributed.shared.AssertUtils.assertEquals(AssertUtils.java:163)
>   at 
> org.apache.cassandra.distributed.shared.AssertUtils.assertRows(AssertUtils.java:63)
>   at 
> org.apache.cassandra.distributed.test.CASTest.incompletePropose(CASTest.java:124)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-11-27 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-12126:
-
Source Control Link: 
[3.0|https://github.com/apache/cassandra/commit/2d0b16804785660e8515aca9944784fb3733c619],
 
[3.11|https://github.com/apache/cassandra/commit/080280dc0177da6176dd4ba970e5a35aa7e2a729],
 [4.0|https://github.com/apache/cassandra/commit/9a3ca008bad2a7bfa887a]  (was: 
[3.0|https://github.com/apache/cassandra/commit/2d0b16804785660e8515aca9944784fb3733c619],[3.11|https://github.com/apache/cassandra/commit/080280dc0177da6176dd4ba970e5a35aa7e2a729],[trunk|https://github.com/apache/cassandra/commit/9a3ca008bad2a7bfa887a])

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
> Fix For: 4.0-beta4, 3.0.24, 3.11.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, while B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again, and was never seen before. 
> If you read section 2.3 of Lamport's "Paxos Made Simple" paper, it discusses 
> this exact issue: how learners can find out whether a majority of the 
> acceptors have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and one or more acceptors (but not a majority) have something in 
> flight, we have no way of knowing whether it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means we have a majority saying 
> that nothing was accepted by a majority. I think we should run a propose step 
> here with an empty commit, and that will cause the write from step 1 to never 
> be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or we will never see it, which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-11-27 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-12126:
-
Source Control Link: 
[3.0|https://github.com/apache/cassandra/commit/2d0b16804785660e8515aca9944784fb3733c619],[3.11|https://github.com/apache/cassandra/commit/080280dc0177da6176dd4ba970e5a35aa7e2a729],[trunk|https://github.com/apache/cassandra/commit/9a3ca008bad2a7bfa887a]
  (was: 
[3.0|https://github.com/apache/cassandra/commit/2d0b16804785660e8515aca9944784fb3733c619],
 
[3.11|https://github.com/apache/cassandra/commit/080280dc0177da6176dd4ba970e5a35aa7e2a729],
 [trunk|https://github.com/apache/cassandra/commit/9a3ca008bad2a7bfa887a])

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
> Fix For: 4.0-beta4, 3.0.24, 3.11.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, while B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again, and was never seen before. 
> If you read section 2.3 of Lamport's "Paxos Made Simple" paper, it discusses 
> this exact issue: how learners can find out whether a majority of the 
> acceptors have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and one or more acceptors (but not a majority) have something in 
> flight, we have no way of knowing whether it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means we have a majority saying 
> that nothing was accepted by a majority. I think we should run a propose step 
> here with an empty commit, and that will cause the write from step 1 to never 
> be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or we will never see it, which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-11-27 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-12126:
-
  Fix Version/s: (was: 4.0-beta)
 (was: 3.11.x)
 (was: 3.0.x)
 3.11.10
 3.0.24
 4.0-beta4
  Since Version: 2.0.0
Source Control Link: 
[3.0|https://github.com/apache/cassandra/commit/2d0b16804785660e8515aca9944784fb3733c619],
 
[3.11|https://github.com/apache/cassandra/commit/080280dc0177da6176dd4ba970e5a35aa7e2a729],
 [trunk|https://github.com/apache/cassandra/commit/9a3ca008bad2a7bfa887a]
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed following the dev mailing list discussion. Thanks.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
> Fix For: 4.0-beta4, 3.0.24, 3.11.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, while B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again, and was never seen before. 
> If you read section 2.3 of Lamport's "Paxos Made Simple" paper, it discusses 
> this exact issue: how learners can find out whether a majority of the 
> acceptors have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and one or more acceptors (but not a majority) have something in 
> flight, we have no way of knowing whether it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means we have a majority saying 
> that nothing was accepted by a majority. I think we should run a propose step 
> here with an empty commit, and that will cause the write from step 1 to never 
> be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or we will never see it, which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-11-05 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226643#comment-17226643
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

Thanks for the review. I've rebased the branches, but since the last runs were 
a while ago, I restarted CI runs. I'll commit if those look clean.

||branch||CI||
|[3.0|https://github.com/pcmanus/cassandra/tree/C-12126-3.0]|[Run 
#171|https://ci-cassandra.apache.org/job/Cassandra-devbranch/171/]|
|[3.11|https://github.com/pcmanus/cassandra/tree/C-12126-3.11]|[Run 
#172|https://ci-cassandra.apache.org/job/Cassandra-devbranch/172/]|
|[4.0|https://github.com/pcmanus/cassandra/tree/C-12126-4.0]|[Run 
#173|https://ci-cassandra.apache.org/job/Cassandra-devbranch/173/]|


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, while B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again, and was never seen before. 
> If you read section 2.3 of Lamport's "Paxos Made Simple" paper, it discusses 
> this exact issue: how learners can find out whether a majority of the 
> acceptors have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and one or more acceptors (but not a majority) have something in 
> flight, we have no way of knowing whether it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means we have a majority saying 
> that nothing was accepted by a majority. I think we should run a propose step 
> here with an empty commit, and that will cause the write from step 1 to never 
> be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or we will never see it, which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16223) Reading dense table yields invalid results in case of row scan queries

2020-11-02 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-16223:
-
  Fix Version/s: (was: 3.11.x)
 3.11.9
 3.0.23
  Since Version: 3.0.0
Source Control Link: 
https://github.com/apache/cassandra/commit/f106ef0697e172492b0343462c593edb703f2ac8,https://github.com/apache/cassandra/commit/833ba83c155871247092d6783d026c27582cde7b
  (was: https://github.com/apache/cassandra/pull/787)
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

The CI run appears good to me, so committed.

> Reading dense table yields invalid results in case of row scan queries
> --
>
> Key: CASSANDRA-16223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16223
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.23, 3.11.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ThriftIntegrationTest}} is broken in that it does not actually test 
> reads before and after flushing, because it does not flush at all (see 
> https://github.com/apache/cassandra/blob/cassandra-3.11/test/unit/org/apache/cassandra/cql3/validation/ThriftIntegrationTest.java#L939).
>  After fixing that method so that it really flushes memtables to disk, we can 
> see an inconsistency in reads from a dense table - the results returned from 
> the memtable differ from the results returned from the sstable (the latter are 
> wrong, cell values are skipped unexpectedly).
> {noformat}
> java.lang.AssertionError: Invalid value for row 0 column 0 (value of type 
> ascii), expected  but got <>
> {noformat}
> In principle, this problem is about skipping column values when doing row 
> scan queries with explicitly selected columns (not a wildcard), when the 
> columns belong to a super column. This happens only when reading from 
> sstables; it does not happen when reading from memtables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16223) Reading dense table yields invalid results in case of row scan queries

2020-11-02 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-16223:
-
Source Control Link: 
https://github.com/apache/cassandra/commit/f106ef0697e172492b0343462c593edb703f2ac8,
 
https://github.com/apache/cassandra/commit/833ba83c155871247092d6783d026c27582cde7b
  (was: 
https://github.com/apache/cassandra/commit/f106ef0697e172492b0343462c593edb703f2ac8,https://github.com/apache/cassandra/commit/833ba83c155871247092d6783d026c27582cde7b)

> Reading dense table yields invalid results in case of row scan queries
> --
>
> Key: CASSANDRA-16223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16223
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.0.23, 3.11.9
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ThriftIntegrationTest}} is broken in that it does not actually test 
> reads before and after flushing, because it does not flush at all (see 
> https://github.com/apache/cassandra/blob/cassandra-3.11/test/unit/org/apache/cassandra/cql3/validation/ThriftIntegrationTest.java#L939).
>  After fixing that method so that it really flushes memtables to disk, we can 
> see an inconsistency in reads from a dense table - the results returned from 
> the memtable differ from the results returned from the sstable (the latter are 
> wrong, cell values are skipped unexpectedly).
> {noformat}
> java.lang.AssertionError: Invalid value for row 0 column 0 (value of type 
> ascii), expected  but got <>
> {noformat}
> In principle, this problem is about skipping column values when doing row 
> scan queries with explicitly selected columns (not a wildcard), when the 
> columns belong to a super column. This happens only when reading from 
> sstables; it does not happen when reading from memtables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16223) Reading dense table yields invalid results in case of row scan queries

2020-11-02 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-16223:
-
Status: Ready to Commit  (was: Review In Progress)

> Reading dense table yields invalid results in case of row scan queries
> --
>
> Key: CASSANDRA-16223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16223
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.11.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ThriftIntegrationTest}} is broken in that it does not actually test 
> reads before and after flushing, because it does not flush at all (see 
> https://github.com/apache/cassandra/blob/cassandra-3.11/test/unit/org/apache/cassandra/cql3/validation/ThriftIntegrationTest.java#L939).
>  After fixing that method so that it really flushes memtables to disk, we can 
> see an inconsistency in reads from a dense table - the results returned from 
> the memtable differ from the results returned from the sstable (the latter are 
> wrong, cell values are skipped unexpectedly).
> {noformat}
> java.lang.AssertionError: Invalid value for row 0 column 0 (value of type 
> ascii), expected  but got <>
> {noformat}
> In principle, this problem is about skipping column values when doing row 
> scan queries with explicitly selected columns (not a wildcard), when the 
> columns belong to a super column. This happens only when reading from 
> sstables; it does not happen when reading from memtables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16223) Reading dense table yields invalid results in case of row scan queries

2020-10-26 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220657#comment-17220657
 ] 

Sylvain Lebresne commented on CASSANDRA-16223:
--

bq. It does not impact 3.0.x because AFAIU there is no value skipping 
optimization in 3.0.x.

It's true that the bug here does not manifest on 3.0. That said, the code is 
still kind of wrong in 3.0 as well (range queries just don't get the proper 
column filter for super columns), and while this may not manifest as a bug in 
practice, it still feels a bit dodgy to leave as is. Plus, fixing 
{{ThriftIntegrationTest}} in 3.0 doesn't hurt either.

Anyway, I've pulled the patch against 3.0 as well (it applies without any 
changes whatsoever) and started CI on both branches. Planning to commit both 
when I get clean results from CI (unless someone objects quickly on the 3.0 part).

|| patch || CI run ||
| [3.0|https://github.com/pcmanus/cassandra/commits/C-16223-3.0] | 
[#133|https://ci-cassandra.apache.org/job/Cassandra-devbranch/133/] |
| [3.11|https://github.com/pcmanus/cassandra/commits/C-16223-3.11] | 
[#134|https://ci-cassandra.apache.org/job/Cassandra-devbranch/134/] |


> Reading dense table yields invalid results in case of row scan queries
> --
>
> Key: CASSANDRA-16223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16223
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.11.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ThriftIntegrationTest}} is broken in that it does not actually test 
> reads before and after flushing, because it does not flush at all (see 
> https://github.com/apache/cassandra/blob/cassandra-3.11/test/unit/org/apache/cassandra/cql3/validation/ThriftIntegrationTest.java#L939).
>  After fixing that method so that it really flushes memtables to disk, we can 
> see an inconsistency in reads from a dense table - the results returned from 
> the memtable differ from the results returned from the sstable (the latter are 
> wrong, cell values are skipped unexpectedly).
> {noformat}
> java.lang.AssertionError: Invalid value for row 0 column 0 (value of type 
> ascii), expected  but got <>
> {noformat}
> In principle, this problem is about skipping column values when doing row 
> scan queries with explicitly selected columns (not a wildcard), when the 
> columns belong to a super column. This happens only when reading from 
> sstables; it does not happen when reading from memtables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16223) Reading dense table yields invalid results in case of row scan queries

2020-10-26 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-16223:
-
Reviewers: Sylvain Lebresne, Sylvain Lebresne  (was: Sylvain Lebresne)
   Status: Review In Progress  (was: Patch Available)

> Reading dense table yields invalid results in case of row scan queries
> --
>
> Key: CASSANDRA-16223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16223
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.11.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ThriftIntegrationTest}} is broken in that it does not actually test 
> reads before and after flushing, because it does not flush at all (see 
> https://github.com/apache/cassandra/blob/cassandra-3.11/test/unit/org/apache/cassandra/cql3/validation/ThriftIntegrationTest.java#L939).
>  After fixing that method so that it really flushes memtables to disk, we can 
> see an inconsistency in reads from a dense table - the results returned from 
> the memtable differ from the results returned from the sstable (the latter are 
> wrong, cell values are skipped unexpectedly).
> {noformat}
> java.lang.AssertionError: Invalid value for row 0 column 0 (value of type 
> ascii), expected  but got <>
> {noformat}
> In principle, this problem is about skipping column values when doing row 
> scan queries with explicitly selected columns (not a wildcard), when the 
> columns belong to a super column. This happens only when reading from 
> sstables; it does not happen when reading from memtables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16223) Reading dense table yields invalid results in case of row scan queries

2020-10-26 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220652#comment-17220652
 ] 

Sylvain Lebresne commented on CASSANDRA-16223:
--

Good catch, and the fix lgtm, +1.

> Reading dense table yields invalid results in case of row scan queries
> --
>
> Key: CASSANDRA-16223
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16223
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 3.11.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ThriftIntegrationTest}} is broken in that it does not actually test 
> reads before and after flushing, because it does not flush at all (see 
> https://github.com/apache/cassandra/blob/cassandra-3.11/test/unit/org/apache/cassandra/cql3/validation/ThriftIntegrationTest.java#L939).
>  After fixing that method so that it really flushes memtables to disk, we can 
> see an inconsistency in reads from a dense table - the results returned from 
> the memtable differ from the results returned from the sstable (the latter are 
> wrong, cell values are skipped unexpectedly).
> {noformat}
> java.lang.AssertionError: Invalid value for row 0 column 0 (value of type 
> ascii), expected  but got <>
> {noformat}
> In principle, this problem is about skipping column values when doing row 
> scan queries with explicitly selected columns (not a wildcard), when the 
> columns belong to a super column. This happens only when reading from 
> sstables; it does not happen when reading from memtables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15897) Dropping compact storage with 2.1-sstables on disk make them unreadable

2020-10-08 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210092#comment-17210092
 ] 

Sylvain Lebresne commented on CASSANDRA-15897:
--

I wrote a patch for the Gossip option discussed a while back. It's 
[here|https://github.com/pcmanus/cassandra/commits/C-15897-3.0] and should be 
pretty complete as far as refusing `DROP COMPACT STORAGE` until no nodes have 
2.x sstables anymore goes.

I didn't get to updating {{upgradesstables}} to allow migrating 2.x sstables for 
potential users that have already cornered themselves. And I'm afraid I'm 
unlikely to have time for that in the short term.

So if we're ok with pushing the {{upgradesstables}} (or {{scrub}}) update to a 
followup (I could argue that no user has complained about this yet; maybe 
no-one has run into this yet), and are still ok with the Gossip approach, then 
the patch is more or less ready for review.

If we prefer having the {{upgradesstables}} change in this ticket however, I 
cannot make promises on how quickly I can get to it (but I'm happy to hand it 
over).
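
For what it's worth, the cluster-wide check itself is conceptually simple. A 
purely hypothetical sketch of the idea (the names below and the way the per-node 
information is carried are made up; this is not the actual patch):
{code}
import java.util.Map;

// Each node advertises whether it still has any pre-3.0 ("2.x") sstables, and
// DROP COMPACT STORAGE is refused while any node still reports some.
final class DropCompactStorageGuard
{
    // Last known state per node, as learned via Gossip; for a down node this is
    // simply the last value it advertised before going down.
    private final Map<String, Boolean> hasLegacySSTablesByNode;

    DropCompactStorageGuard(Map<String, Boolean> hasLegacySSTablesByNode)
    {
        this.hasLegacySSTablesByNode = hasLegacySSTablesByNode;
    }

    /** Throws if any node in the cluster still reports 2.x sstables. */
    void checkCanDropCompactStorage()
    {
        for (Map.Entry<String, Boolean> e : hasLegacySSTablesByNode.entrySet())
        {
            if (e.getValue())
                throw new IllegalStateException("Cannot DROP COMPACT STORAGE: node " + e.getKey()
                                                + " still has 2.x sstables; upgrade them first");
        }
    }
}
{code}
The bulk of the actual patch is, of course, the Gossip plumbing that keeps such 
a per-node view up to date.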


> Dropping compact storage with 2.1-sstables on disk make them unreadable
> ---
>
> Key: CASSANDRA-15897
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15897
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Marcus Eriksson
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 4.0-beta3
>
>
> Test reproducing: 
> https://github.com/krummas/cassandra/commits/marcuse/dropcompactstorage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16063) Fix user experience when upgrading to 4.0 with compact tables

2020-10-07 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209591#comment-17209591
 ] 

Sylvain Lebresne commented on CASSANDRA-16063:
--

bq. Before I start any implementation, I decided first to update the ticket and 
confirm the approach.

CASSANDRA-15897 is open to address _exactly_ that problem. I suggest we 
commit the fix on this ticket as is and leave the issue of cluster-wide 
detection to CASSANDRA-15897. We did discuss options there some time ago, and 
kind of settled on a Gossip-based approach at the time, so [~brandon.williams] 
is not going to be happy. I do have an almost ready branch for that Gossip 
approach btw (which, I won't deny, is a bit involved), and while I don't have 
time to get this to the finish line right now, I can share my branch (tomorrow 
most likely) and you can decide whether to use that or not.

bq. how do we handle nodes that are down?

Fwiw, my existing branch for CASSANDRA-15897 makes nodes share the sstable 
versions they have in use. If a node is down, other nodes simply rely on the 
last information they got from that node, which should work pretty well in 
practice.

> Fix user experience when upgrading to 4.0 with compact tables
> -
>
> Key: CASSANDRA-16063
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16063
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Sylvain Lebresne
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Compact_storage_upgrade_tests.txt
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The code to handle compact tables has been removed from 4.0, and the intended 
> upgrade path to 4.0 for users having compact tables on 3.x is that they must 
> execute {{ALTER ... DROP COMPACT STORAGE}} on all of their compact tables 
> *before* attempting the upgrade.
> Obviously, some users won't read the upgrade instructions (or miss a table) 
> and may try upgrading despite still having compact tables. If they do so, the 
> intent is that the node will _not_ start, with a message clearly indicating 
> the pre-upgrade step the user has missed. The user will then downgrade back 
> the node(s) to 3.x, run the proper {{ALTER ... DROP COMPACT STORAGE}}, and 
> then upgrade again.
> But while 4.0 does currently fail startup when finding any compact tables 
> with a decent message, I believe the check is done too late during startup.
> Namely, that check is done as we read the tables schema, so within 
> [{{Schema.instance.loadFromDisk()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CassandraDaemon.java#L241].
>   But by then, we've _at least_ called 
> {{SystemKeyspace.persistLocalMetadata()}} and 
> {{SystemKeyspaceMigrator40.migrate()}}, which will get into the commit log, 
> and even possibly flush new {{na}} format sstables. As a result, a user 
> might not be able to seamlessly restart the node on 3.x (to drop compact 
> storage on the appropriate tables).
> Basically, we should make sure the check for compact tables done at 4.0 
> startup is done as a {{StartupCheck}}, before the node does anything.
> We should also add a test for this (checking that if you try upgrading to 4.0 
> with compact storage, you can downgrade back with no intervention whatsoever).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15538) 4.0 quality testing: Local Read/Write Path: Other Areas

2020-09-29 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204048#comment-17204048
 ] 

Sylvain Lebresne commented on CASSANDRA-15538:
--

No, I haven't really started anything on this issue, and I don't plan to in the 
near term, so I unassigned myself. I should have done it sooner, my bad.

I did spend a few cycles some time ago thinking about what could be done 
concretely here and I'll share my "reflections" in case that's useful. That 
said, in general, the scope here was a bit fuzzy to me.

First, if you look at (true) unit testing for the classes that constitute the 
read/write path, there isn't much. So I suppose one could try to cover that 
somewhat, but the work to make a dent there is huge, and I'm not sure the value 
is that great since those paths are mostly covered, just by 
"integration/functional" tests. But this doesn't make it super clear to me 
whether specific areas are more in need of additional testing than others.

Then the description mentions "numerous bugs and issues with the 3.0 storage 
engine rewrite", so I looked at the list of "serious bugs" that was shared on 
the mailing list (by [~kohlisankalp] I believe; too lazy to dig up the link 
right now). From looking at that, the biggest bucket I saw for "storage engine 
rewrite" related bugs was 'legacy layout conversions/handling'. And that was 
clearly under-tested, but it's also gone in 4.0. From memory, there were also 
2-3 read-repair related bugs, but we have CASSANDRA-15977. Nothing else struck 
me as pointing to a specific area to focus on.

Those aside, and fwiw, I have a feeling that things like reverse queries and 
range tombstones may be two features that aren't as well tested as they could 
be, but it's more an impression of mine than hard data.

Short of focusing on some specific area, the "read/write path" is a big place 
and the space to explore is kinda big. So I feel the biggest value would be to 
start exploring more of that space through randomized testing, specifically 
randomizing queries and/or schemas. Presumably, that's what 
[Harry|https://issues.apache.org/jira/browse/CASSANDRA-15348] is for (though I 
haven't really checked it yet, so I don't know how capable it is for this). So 
if it were me, I'd look in this direction. But again, I don't have plans to at 
the moment due to other priorities.
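
To make the "randomizing queries" part concrete, here is a toy, hypothetical 
sketch (the keyspace, table and schema are assumptions; the point is only the 
shape of the generated forward/reverse query pairs that a harness would then 
execute and compare):
{code}
import java.util.Random;

// Generates random clustering slices and emits the forward and reverse (ORDER BY DESC)
// variants of the same SELECT over an assumed table ks.tbl (pk int, ck int, v int).
public final class ReverseQueryFuzz
{
    public static void main(String[] args)
    {
        Random rnd = new Random(42); // fixed seed so failures are reproducible
        for (int i = 0; i < 10; i++)
        {
            int pk = rnd.nextInt(100);
            int lo = rnd.nextInt(1000);
            int hi = lo + 1 + rnd.nextInt(1000);
            String where = "WHERE pk = " + pk + " AND ck > " + lo + " AND ck <= " + hi;
            String forward = "SELECT ck, v FROM ks.tbl " + where;
            String reverse = forward + " ORDER BY ck DESC";
            // A harness would execute both and assert:
            //   rows(reverse) == reverse(rows(forward))
            System.out.println(forward);
            System.out.println(reverse);
        }
    }
}
{code}
A harness would run both statements against a node and assert that the result of 
the reverse query is exactly the reversed result of the forward one, over many 
random slices and schemas.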


> 4.0 quality testing: Local Read/Write Path: Other Areas
> ---
>
> Key: CASSANDRA-15538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15538
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest/java, Test/dtest/python
>Reporter: Josh McKenzie
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Aleksey Yeschenko*
> Testing in this area refers to the local read/write path (StorageProxy, 
> ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still 
> finding numerous bugs and issues with the 3.0 storage engine rewrite 
> (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the 
> local read/write path with techniques such as property-based testing, fuzzing 
> ([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]),
>  and a source audit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15538) 4.0 quality testing: Local Read/Write Path: Other Areas

2020-09-29 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne reassigned CASSANDRA-15538:


Assignee: (was: Sylvain Lebresne)

> 4.0 quality testing: Local Read/Write Path: Other Areas
> ---
>
> Key: CASSANDRA-15538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15538
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest/java, Test/dtest/python
>Reporter: Josh McKenzie
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Aleksey Yeschenko*
> Testing in this area refers to the local read/write path (StorageProxy, 
> ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still 
> finding numerous bugs and issues with the 3.0 storage engine rewrite 
> (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the 
> local read/write path with techniques such as property-based testing, fuzzing 
> ([example|http://cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html]),
>  and a source audit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16063) Fix user experience when upgrading to 4.0 with compact tables

2020-09-29 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203976#comment-17203976
 ] 

Sylvain Lebresne commented on CASSANDRA-16063:
--

I don't have the time to test the patch thoroughly right now, but from a code 
review point of view, this lgtm.

> Fix user experience when upgrading to 4.0 with compact tables
> -
>
> Key: CASSANDRA-16063
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16063
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Sylvain Lebresne
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Compact_storage_upgrade_tests.txt
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The code to handle compact tables has been removed from 4.0, and the intended 
> upgrade path to 4.0 for users having compact tables on 3.x is that they must 
> execute {{ALTER ... DROP COMPACT STORAGE}} on all of their compact tables 
> *before* attempting the upgrade.
> Obviously, some users won't read the upgrade instructions (or miss a table) 
> and may try upgrading despite still having compact tables. If they do so, the 
> intent is that the node will _not_ start, with a message clearly indicating 
> the pre-upgrade step the user has missed. The user will then downgrade back 
> the node(s) to 3.x, run the proper {{ALTER ... DROP COMPACT STORAGE}}, and 
> then upgrade again.
> But while 4.0 does currently fail startup when finding any compact tables 
> with a decent message, I believe the check is done too late during startup.
> Namely, that check is done as we read the tables schema, so within 
> [{{Schema.instance.loadFromDisk()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CassandraDaemon.java#L241].
>   But by then, we've _at least_ called 
> {{SystemKeyspace.persistLocalMetadata()}} and 
> {{SystemKeyspaceMigrator40.migrate()}}, which will get into the commit log, 
> and even possibly flush new {{na}} format sstables. As a result, a user 
> might not be able to seamlessly restart the node on 3.x (to drop compact 
> storage on the appropriate tables).
> Basically, we should make sure the check for compact tables done at 4.0 
> startup is done as a {{StartupCheck}}, before the node does anything.
> We should also add a test for this (checking that if you try upgrading to 4.0 
> with compact storage, you can downgrade back with no intervention whatsoever).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16063) Fix user experience when upgrading to 4.0 with compact tables

2020-08-31 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187616#comment-17187616
 ] 

Sylvain Lebresne commented on CASSANDRA-16063:
--

Ok, I hadn't understood the strategy you described, and in particular why 
skipping errors during commit replay was part of this.

To rephrase, in case that helps other people as dense as me, the strategy 
implemented by the current patches is that if a user starts 4.0 with some 
compact tables, the server will not start but may (will?) write some CL 
segments. The user is then asked to restart on 3.x with a special startup flag 
that makes it so that if replaying those 4.0 CL segments fails, the server 
ignores those errors.

But as mentioned in my previous comment, this is not an ideal experience (it 
also makes it harder to convince oneself that it is safe). Ideally, users 
should not have to pass special flags when restarting 3.x. So back to my 
question above, what is technically preventing us from doing that startup check 
before any CL segment is written?

> Fix user experience when upgrading to 4.0 with compact tables
> -
>
> Key: CASSANDRA-16063
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16063
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Sylvain Lebresne
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> The code to handle compact tables has been removed from 4.0, and the intended 
> upgrade path to 4.0 for users having compact tables on 3.x is that they must 
> execute {{ALTER ... DROP COMPACT STORAGE}} on all of their compact tables 
> *before* attempting the upgrade.
> Obviously, some users won't read the upgrade instructions (or miss a table) 
> and may try upgrading despite still having compact tables. If they do so, the 
> intent is that the node will _not_ start, with a message clearly indicating 
> the pre-upgrade step the user has missed. The user will then downgrade back 
> the node(s) to 3.x, run the proper {{ALTER ... DROP COMPACT STORAGE}}, and 
> then upgrade again.
> But while 4.0 does currently fail startup when finding any compact tables 
> with a decent message, I believe the check is done too late during startup.
> Namely, that check is done as we read the tables schema, so within 
> [{{Schema.instance.loadFromDisk()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CassandraDaemon.java#L241].
>   But by then, we've _at least_ called 
> {{SystemKeyspace.persistLocalMetadata()}} and 
> {{SystemKeyspaceMigrator40.migrate()}}, which will get into the commit log, 
> and even possibly flush new {{na}} format sstables. As a result, a user 
> might not be able to seamlessly restart the node on 3.x (to drop compact 
> storage on the appropriate tables).
> Basically, we should make sure the check for compact tables done at 4.0 
> startup is done as a {{StartupCheck}}, before the node does anything.
> We should also add a test for this (checking that if you try upgrading to 4.0 
> with compact storage, you can downgrade back with no intervention whatsoever).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16063) Fix user experience when upgrading to 4.0 with compact tables

2020-08-31 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187616#comment-17187616
 ] 

Sylvain Lebresne edited comment on CASSANDRA-16063 at 8/31/20, 10:04 AM:
-

Ok, I hadn't understood the strategy you described, and in particular why 
skipping errors during commit replay was part of this.

To rephrase, in case that helps other people as dense as me, the strategy 
implemented by the current patches is that if a user starts 4.0 with some 
compact tables, the server will not start but may (will?) write some CL 
segments. The user is then asked to restart on 3.x with a special startup flag 
that makes it so that if replaying those 4.0 CL segments fails, the server 
ignores those errors.

But as mentioned in my previous comment, this is not an ideal experience (it 
also makes it harder to convince oneself that it is safe). Ideally, users 
should not have to pass special flags when restarting 3.x. So back to my 
question above, what is technically preventing us from doing that startup check 
before any CL segment is written?


was (Author: slebresne):
Ok, I hadn't understood the strategy you described, and in particular was 
skipping errors during commit replay was part of this.

 To rephrase, in case that helps other people as dense as me, the strategy 
implemented by the current patches is that if a users starts 4.0 with some 
compact tables, the server will not start but may (will?) write some CL 
segments. The user is then asked to restart on 3.x with a special startup flag 
that makes it so that if replaying those 4.0 CL segments fails, the server 
ignores those errors.

But as mentioned in my previous comment, this is not an ideal experience (it 
also makes it harder to convince oneself that it is safe). Ideally, users 
should not have to pass special flags when restarting 3.x. So back to my 
question above, what is technically preventing us to do that startup check 
before any CL segment is written?

> Fix user experience when upgrading to 4.0 with compact tables
> -
>
> Key: CASSANDRA-16063
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16063
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Sylvain Lebresne
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> The code to handle compact tables has been removed from 4.0, and the intended 
> upgrade path to 4.0 for users having compact tables on 3.x is that they must 
> execute {{ALTER ... DROP COMPACT STORAGE}} on all of their compact tables 
> *before* attempting the upgrade.
> Obviously, some users won't read the upgrade instructions (or miss a table) 
> and may try upgrading despite still having compact tables. If they do so, the 
> intent is that the node will _not_ start, with a message clearly indicating 
> the pre-upgrade step the user has missed. The user will then downgrade back 
> the node(s) to 3.x, run the proper {{ALTER ... DROP COMPACT STORAGE}}, and 
> then upgrade again.
> But while 4.0 does currently fail startup when finding any compact tables 
> with a decent message, I believe the check is done too late during startup.
> Namely, that check is done as we read the tables schema, so within 
> [{{Schema.instance.loadFromDisk()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CassandraDaemon.java#L241].
>   But by then, we've _at least_ called 
> {{SystemKeyspace.persistLocalMetadata()}} and 
> {{SystemKeyspaceMigrator40.migrate()}}, which will get into the commit log, 
> and even possibly flush new {{na}} format sstables. As a result, a user 
> might not be able to seamlessly restart the node on 3.x (to drop compact 
> storage on the appropriate tables).
> Basically, we should make sure the check for compact tables done at 4.0 
> startup is done as a {{StartupCheck}}, before the node does anything.
> We should also add a test for this (checking that if you try upgrading to 4.0 
> with compact storage, you can downgrade back with no intervention whatsoever).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16063) Fix user experience when upgrading to 4.0 with compact tables

2020-08-28 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-16063:
-
Reviewers: Sylvain Lebresne

> Fix user experience when upgrading to 4.0 with compact tables
> -
>
> Key: CASSANDRA-16063
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16063
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Sylvain Lebresne
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> The code to handle compact tables has been removed from 4.0, and the intended 
> upgrade path to 4.0 for users having compact tables on 3.x is that they must 
> execute {{ALTER ... DROP COMPACT STORAGE}} on all of their compact tables 
> *before* attempting the upgrade.
> Obviously, some users won't read the upgrade instructions (or miss a table) 
> and may try upgrading despite still having compact tables. If they do so, the 
> intent is that the node will _not_ start, with a message clearly indicating 
> the pre-upgrade step the user has missed. The user will then downgrade back 
> the node(s) to 3.x, run the proper {{ALTER ... DROP COMPACT STORAGE}}, and 
> then upgrade again.
> But while 4.0 does currently fail startup when finding any compact tables 
> with a decent message, I believe the check is done too late during startup.
> Namely, that check is done as we read the tables schema, so within 
> [{{Schema.instance.loadFromDisk()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CassandraDaemon.java#L241].
>   But by then, we've _at least_ called 
> {{SystemKeyspace.persistLocalMetadata()}} and 
> {{SystemKeyspaceMigrator40.migrate()}}, which will get into the commit log, 
> and even possibly flush new {{na}} format sstables. As a result, a user 
> might not be able to seamlessly restart the node on 3.x (to drop compact 
> storage on the appropriate tables).
> Basically, we should make sure the check for compact tables done at 4.0 
> startup is done as a {{StartupCheck}}, before the node does anything.
> We should also add a test for this (checking that if you try upgrading to 4.0 
> with compact storage, you can downgrade back with no intervention whatsoever).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16063) Fix user experience when upgrading to 4.0 with compact tables

2020-08-28 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186530#comment-17186530
 ] 

Sylvain Lebresne commented on CASSANDRA-16063:
--

Happy to review (early next week most likely).

One question before checking the code though: you mention that with this 
change, we still write a CL segment. Why is that, and have you looked at how we 
could avoid it? Because if we still write the CL on 4.0 before erroring out, I 
assume this CL gets replayed once the user restarts the node on 3.0 (to run DROP 
COMPACT STORAGE)? If so, how are we 100% confident that what is replayed will not 
create problems?

> Fix user experience when upgrading to 4.0 with compact tables
> -
>
> Key: CASSANDRA-16063
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16063
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Sylvain Lebresne
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
>
> The code to handle compact tables has been removed from 4.0, and the intended 
> upgrade path to 4.0 for users having compact tables on 3.x is that they must 
> execute {{ALTER ... DROP COMPACT STORAGE}} on all of their compact tables 
> *before* attempting the upgrade.
> Obviously, some users won't read the upgrade instructions (or miss a table) 
> and may try upgrading despite still having compact tables. If they do so, the 
> intent is that the node will _not_ start, with a message clearly indicating 
> the pre-upgrade step the user has missed. The user will then downgrade back 
> the node(s) to 3.x, run the proper {{ALTER ... DROP COMPACT STORAGE}}, and 
> then upgrade again.
> But while 4.0 does currently fail startup when finding any compact tables 
> with a decent message, I believe the check is done too late during startup.
> Namely, that check is done as we read the tables schema, so within 
> [{{Schema.instance.loadFromDisk()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CassandraDaemon.java#L241].
>   But by then, we've _at least_ called 
> {{SystemKeyspace.persistLocalMetadata()}} and 
> {{SystemKeyspaceMigrator40.migrate()}}, which will get into the commit log, 
> and even possibly flush new {{na}} format sstables. As a result, a user 
> might not be able to seamlessly restart the node on 3.x (to drop compact 
> storage on the appropriate tables).
> Basically, we should make sure the check for compact tables done at 4.0 
> startup is done as a {{StartupCheck}}, before the node does anything.
> We should also add a test for this (checking that if you try upgrading to 4.0 
> with compact storage, you can downgrade back with no intervention whatsoever).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16069) Loss of functionality around null clustering when dropping compact storage

2020-08-21 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-16069:
-
 Bug Category: Parent values: Correctness(12982)Level 1 values: API / 
Semantic Definition(13162)
   Complexity: Normal
Discovered By: Code Inspection
 Severity: Low
   Status: Open  (was: Triage Needed)

> Loss of functionality around null clustering when dropping compact storage
> --
>
> Key: CASSANDRA-16069
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16069
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Sylvain Lebresne
>Priority: Normal
>
> For backward compatibility reasons[1], it is allowed to insert rows where 
> some of the clustering columns are {{null}} for compact tables. That support 
> is a tad limited/inconsistent[2] but essentially you can do:
> {noformat}
> cqlsh:ks> CREATE TABLE t (k int, c1 int, c2 int, v int, PRIMARY KEY (k, c1, 
> c2)) WITH COMPACT STORAGE;
> cqlsh:ks> INSERT INTO t(k, c1, v) VALUES (1, 1, 1);
> cqlsh:ks> SELECT * FROM t;
>  k | c1 | c2   | v
> ---++--+---
>  1 |  1 | null | 1
> (1 rows)
> cqlsh:ks> UPDATE t SET v = 2 WHERE k = 1 AND c1 = 1;
> cqlsh:ks> SELECT * FROM t;
>  k | c1 | c2   | v
> ---++--+---
>  1 |  1 | null | 2
> (1 rows)
> {noformat}
> This is not allowed on non-compact tables however:
> {noformat}
> cqlsh:ks> CREATE TABLE t2 (k int, c1 int, c2 int, v int, PRIMARY KEY (k, c1, 
> c2));
> cqlsh:ks> INSERT INTO t2(k, c1, v) VALUES (1, 1, 1);
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Some 
> clustering keys are missing: c2"
> cqlsh:ks> UPDATE t2 SET v = 2 WHERE k = 1 AND c1 = 1;
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Some 
> clustering keys are missing: c2"
> {noformat}
> Which means that a user with a compact table that relies on this will not be 
> able to use {{DROP COMPACT STORAGE}}.
> Which is a problem for the 4.0 upgrade story. A problem to which we need an 
> answer.
>  
> 
> [1]: the underlying {{CompositeType}} used by such tables allows to provide 
> only a prefix of components, so thrift users could have used such 
> functionality. We thus had to support it in CQL, or those users wouldn't have 
> been able to upgrade to CQL easily.
> [2]: building on the example above, the value for {{c2}} is essentially 
> {{null}}, yet none of the following is currently allowed:
> {noformat}
> cqlsh:ks> INSERT INTO t(k, c1, c2, v) VALUES (1, 1, null, 1);
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid 
> null value in condition for column c2"
> cqlsh:ks> UPDATE t SET v = 2 WHERE k = 1 AND c1 = 1 AND c2 = null;
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid 
> null value in condition for column c2"
> cqlsh:ks> SELECT * FROM t WHERE k = 1 AND c1 = 1 AND c2 = null;
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid 
> null value in condition for column c2"
> {noformat}
> Not only is that unintuitive/inconsistent, but the {{SELECT}} one means there 
> is no way to select only that row. You can skip specifying {{c2}} in the 
> {{SELECT}}, but this essentially becomes a slice selection, as shown below:
> {noformat}
> cqlsh:ks> INSERT INTO ct(k, c1, c2, v) VALUES (1, 1, 1, 1);
> cqlsh:ks> SELECT * FROM ct WHERE k = 1 AND c1 = 1;
>  k | c1 | c2   | v
> ---++--+---
>  1 |  1 | null | 1
>  1 |  1 |1 | 1
> (2 rows)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16069) Loss of functionality around null clustering when dropping compact storage

2020-08-21 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181991#comment-17181991
 ] 

Sylvain Lebresne commented on CASSANDRA-16069:
--

I'll note that it's not obvious to me what the best course of action is here. 
Here are the options I see, none of which is great imo:
# we extend support for {{null}} in clustering columns to all tables. But the 
reason we didn't support this in the first place is that we felt this might be 
more confusing than helpful. After all, this isn't a thing in SQL. Of course, 
we can revisit that opinion, but I think we should be very careful with that 
kind of additive semantic change (once it's there, it's there forever). And 
for this ticket, we'd have to make the change in 3.0/3.11, which, well, feels 
scary to me.
# we make a special case for tables "that used to be compact" and support 
{{null}} clustering only for those. But technically, we have no way to detect 
those tables as of now: {{DROP COMPACT STORAGE}} does not leave any trace. Even 
if we added such a trace, which was already suggested as one of the options for 
CASSANDRA-15897, that trace would (mostly) not be user visible, so that would 
probably become a pretty confusing rule (see the toy sketch after this list).
# we do nothing (outside of documentation). Which sounds preposterous at face 
value, but to play devil's advocate for a minute: this behavior is pretty 
specific in the first place, and I don't think we document it anywhere. So it's 
not improbable that only a very tiny fraction of users rely on this. There has 
to be a point where, _if_ we believe the other options are bad for C* in general, 
it becomes better to say to a handful of users "Sorry, you will have to 
either find a way to migrate out of this behavior or stay on 3.0/3.11".
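
For the record, a purely hypothetical toy sketch of what the trace-based rule of 
option 2 could look like (none of these names exist in Cassandra; it only 
illustrates the shape of the rule):

{code}
// Purely hypothetical sketch of option 2: a "wasCompact" trace kept on the table
// that the write path consults to decide whether null clustering values are allowed.
import java.util.Arrays;
import java.util.List;

final class NullClusteringRuleSketch
{
    static final class TableMetadata
    {
        final String name;
        final boolean wasCompact; // the hypothetical trace left by DROP COMPACT STORAGE
        TableMetadata(String name, boolean wasCompact) { this.name = name; this.wasCompact = wasCompact; }
    }

    // values.get(i) == null models a clustering column left unset by the query.
    static void validateClusterings(TableMetadata table, List<String> values)
    {
        boolean hasNull = values.stream().anyMatch(v -> v == null);
        if (hasNull && !table.wasCompact)
            throw new IllegalArgumentException("Some clustering keys are missing for " + table.name);
    }

    public static void main(String[] args)
    {
        validateClusterings(new TableMetadata("ks.t", true), Arrays.asList("1", null));   // allowed
        validateClusterings(new TableMetadata("ks.t2", false), Arrays.asList("1", null)); // throws
    }
}
{code}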


> Loss of functionality around null clustering when dropping compact storage
> --
>
> Key: CASSANDRA-16069
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16069
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Sylvain Lebresne
>Priority: Normal
>
> For backward compatibility reasons[1], it is allowed to insert rows where 
> some of the clustering columns are {{null}} for compact tables. That support 
> is a tad limited/inconsistent[2] but essentially you can do:
> {noformat}
> cqlsh:ks> CREATE TABLE t (k int, c1 int, c2 int, v int, PRIMARY KEY (k, c1, 
> c2)) WITH COMPACT STORAGE;
> cqlsh:ks> INSERT INTO t(k, c1, v) VALUES (1, 1, 1);
> cqlsh:ks> SELECT * FROM t;
>  k | c1 | c2   | v
> ---++--+---
>  1 |  1 | null | 1
> (1 rows)
> cqlsh:ks> UPDATE t SET v = 2 WHERE k = 1 AND c1 = 1;
> cqlsh:ks> SELECT * FROM t;
>  k | c1 | c2   | v
> ---++--+---
>  1 |  1 | null | 2
> (1 rows)
> {noformat}
> This is not allowed on non-compact tables however:
> {noformat}
> cqlsh:ks> CREATE TABLE t2 (k int, c1 int, c2 int, v int, PRIMARY KEY (k, c1, 
> c2));
> cqlsh:ks> INSERT INTO t2(k, c1, v) VALUES (1, 1, 1);
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Some 
> clustering keys are missing: c2"
> cqlsh:ks> UPDATE t2 SET v = 2 WHERE k = 1 AND c1 = 1;
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Some 
> clustering keys are missing: c2"
> {noformat}
> Which means that a user with a compact table that relies on this will not be 
> able to use {{DROP COMPACT STORAGE}}.
> Which is a problem for the 4.0 upgrade story. A problem to which we need an 
> answer.
>  
> 
> [1]: the underlying {{CompositeType}} used by such tables allows to provide 
> only a prefix of components, so thrift users could have used such 
> functionality. We thus had to support it in CQL, or those users wouldn't have 
> been able to upgrade to CQL easily.
> [2]: building on the example above, the value for {{c2}} is essentially 
> {{null}}, yet none of the following is currently allowed:
> {noformat}
> cqlsh:ks> INSERT INTO t(k, c1, c2, v) VALUES (1, 1, null, 1);
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid 
> null value in condition for column c2"
> cqlsh:ks> UPDATE t SET v = 2 WHERE k = 1 AND c1 = 1 AND c2 = null;
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid 
> null value in condition for column c2"
> cqlsh:ks> SELECT * FROM t WHERE k = 1 AND c1 = 1 AND c2 = null;
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid 
> null value in condition for column c2"
> {noformat}
> Not only is that unintuitive/inconsistent, but the {{SELECT}} one means there 
> is no way to select only that row. You can skip specifying {{c2}} in the 
> {{SELECT}}, but this essentially becomes a slice selection, as shown below:
> {noformat}
> cqlsh:ks> INSERT INTO ct(k, c1, c2, v) VALUES (1, 1, 1, 1);
> cqlsh:ks> 

[jira] [Created] (CASSANDRA-16069) Loss of functionality around null clustering when dropping compact storage

2020-08-21 Thread Sylvain Lebresne (Jira)
Sylvain Lebresne created CASSANDRA-16069:


 Summary: Loss of functionality around null clustering when 
dropping compact storage
 Key: CASSANDRA-16069
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16069
 Project: Cassandra
  Issue Type: Bug
  Components: Legacy/CQL
Reporter: Sylvain Lebresne


For backward compatibility reasons[1], it is allowed to insert rows where some 
of the clustering columns are {{null}} for compact tables. That support is a 
tad limited/inconsistent[2] but essentially you can do:
{noformat}
cqlsh:ks> CREATE TABLE t (k int, c1 int, c2 int, v int, PRIMARY KEY (k, c1, 
c2)) WITH COMPACT STORAGE;
cqlsh:ks> INSERT INTO t(k, c1, v) VALUES (1, 1, 1);
cqlsh:ks> SELECT * FROM t;

 k | c1 | c2   | v
---++--+---
 1 |  1 | null | 1

(1 rows)
cqlsh:ks> UPDATE t SET v = 2 WHERE k = 1 AND c1 = 1;
cqlsh:ks> SELECT * FROM t;

 k | c1 | c2   | v
---++--+---
 1 |  1 | null | 2

(1 rows)
{noformat}
This is not allowed on non-compact tables however:
{noformat}
cqlsh:ks> CREATE TABLE t2 (k int, c1 int, c2 int, v int, PRIMARY KEY (k, c1, 
c2));
cqlsh:ks> INSERT INTO t2(k, c1, v) VALUES (1, 1, 1);
InvalidRequest: Error from server: code=2200 [Invalid query] message="Some 
clustering keys are missing: c2"
cqlsh:ks> UPDATE t2 SET v = 2 WHERE k = 1 AND c1 = 1;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Some 
clustering keys are missing: c2"
{noformat}
Which means that a user with a compact table that relies on this will not be able 
to use {{DROP COMPACT STORAGE}}.

Which is a problem for the 4.0 upgrade story. A problem to which we need an 
answer.

 

[1]: the underlying {{CompositeType}} used by such tables allows to provide 
only a prefix of components, so thrift users could have used such 
functionality. We thus had to support it in CQL, or those users wouldn't have 
been able to upgrade to CQL easily.

[2]: building on the example above, the value for {{c2}} is essentially 
{{null}}, yet none of the following is currently allowed:
{noformat}
cqlsh:ks> INSERT INTO t(k, c1, c2, v) VALUES (1, 1, null, 1);
InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid 
null value in condition for column c2"
cqlsh:ks> UPDATE t SET v = 2 WHERE k = 1 AND c1 = 1 AND c2 = null;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid 
null value in condition for column c2"
cqlsh:ks> SELECT * FROM t WHERE k = 1 AND c1 = 1 AND c2 = null;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid 
null value in condition for column c2"
{noformat}
Not only is that unintuitive/inconsistent, but the {{SELECT}} one means there 
is no way to select only that row. You can skip specifying {{c2}} in the 
{{SELECT}}, but this essentially becomes a slice selection, as shown below:
{noformat}
cqlsh:ks> INSERT INTO ct(k, c1, c2, v) VALUES (1, 1, 1, 1);
cqlsh:ks> SELECT * FROM ct WHERE k = 1 AND c1 = 1;

 k | c1 | c2   | v
---++--+---
 1 |  1 | null | 1
 1 |  1 |1 | 1

(2 rows)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16048) Safely Ignore Compact Storage Tables Where Users Have Defined Clustering and Value Columns

2020-08-20 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181308#comment-17181308
 ] 

Sylvain Lebresne commented on CASSANDRA-16048:
--

bq. Regarding the DENSE flag, this is the same as if "DROP COMPACT STORAGE" was 
run, and the behavior prior to this patch is that the database would fail to 
start, so I'm not sure there will be much surprise to the user or that its of 
much concern.

This patch silently drops compact storage on (some) user tables. And as 
mentioned, this _has_ user visible consequences (no more "WITH COMPACT 
STORAGE" mention when using {{DESCRIBE}}, and a removed {{DENSE}} flag), even if 
they can arguably be considered minor.

The user surprise I'm talking about is a user noticing one of those 
consequences some time post 4.0 upgrade, and having no clue why this happened 
(since again, the drop is automatic and basically silent\[1\]). I mean, in 
general, silent changes tend to be surprising.

Without this patch, users are obligated to _manually_ run {{DROP COMPACT 
STORAGE}} on all tables, so no risk of surprise exists. Which is kind of a 
feature in that case imo.

I suppose the opinion I'm pushing here is that it might be better to ask a few 
power users to run more manual {{DROP COMPACT STORAGE}} (it's not like it can't 
be scripted externally easily if you know what you're doing), than to risk 
confusing some users, even slightly, around something already kind of confusing 
like compact tables. Obviously, I don't know how many users might get confused 
by this, or how much. And probably not a lot. Is the risk worth the upside for 
users in general? I lean "no" as of now, but it's definitely a very weakly held 
position.

bq. As an aside, I am also concerned about the current behavior that we halt 
the database from starting when we detect compact storage tables

This may or may not be what you meant here, but fwiw, I created CASSANDRA-16063 
which is relevant (long story short, I think the general behavior is sensible 
but I don't think it's implemented right; if you're concerned by the general 
behavior and have an alternative to propose, feel free to hijack that ticket for 
that purpose).


\[1\]: on that "silent" subject: I think that if we do end up doing this, we 
should at least clearly log when we automatically migrate a table, which the 
current patch doesn't do, so that it's not completely silent. Tbc, doing so wouldn't 
entirely assuage my concerns because I don't think everyone reads their logs 
carefully.


> Safely Ignore Compact Storage Tables Where Users Have Defined Clustering and 
> Value Columns
> --
>
> Key: CASSANDRA-16048
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16048
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Some compact storage tables, specifically those where the user has defined 
> both at least one clustering and the value column, can be safely handled in 
> 4.0 because besides the DENSE flag they are not materially different post 3.0 
> and there is no visible change to the user facing schema after dropping 
> compact storage. We can detect this case and allow these tables to silently 
> drop the DENSE flag while still throwing a start-up error for COMPACT STORAGE 
> tables that don’t meet the criteria. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-16063) Fix user experience when upgrading to 4.0 with compact tables

2020-08-20 Thread Sylvain Lebresne (Jira)
Sylvain Lebresne created CASSANDRA-16063:


 Summary: Fix user experience when upgrading to 4.0 with compact 
tables
 Key: CASSANDRA-16063
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16063
 Project: Cassandra
  Issue Type: Bug
  Components: Legacy/CQL
Reporter: Sylvain Lebresne


The code to handle compact tables has been removed from 4.0, and the intended 
upgrade path to 4.0 for users having compact tables on 3.x is that they must 
execute {{ALTER ... DROP COMPACT STORAGE}} on all of their compact tables 
*before* attempting the upgrade.

Obviously, some users won't read the upgrade instructions (or miss a table) and 
may try upgrading despite still having compact tables. If they do so, the 
intent is that the node will _not_ start, with a message clearly indicating the 
pre-upgrade step the user has missed. The user will then downgrade back the 
node(s) to 3.x, run the proper {{ALTER ... DROP COMPACT STORAGE}}, and then 
upgrade again.

But while 4.0 does currently fail startup when finding any compact tables with 
a decent message, I believe the check is done too late during startup.

Namely, that check is done as we read the tables schema, so within 
[{{Schema.instance.loadFromDisk()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CassandraDaemon.java#L241].
  But by then, we've _at least_ called 
{{SystemKeyspace.persistLocalMetadata()}} and 
{{SystemKeyspaceMigrator40.migrate()}}, which will get into the commit log, and 
even possibly flush new {{na}} format sstables. As a result, a user might not 
be able to seamlessly restart the node on 3.x (to drop compact storage on the 
appropriate tables).

Basically, we should make sure the check for compact tables done at 4.0 startup 
is done as a {{StartupCheck}}, before the node does anything.

We should also add a test for this (checking that if you try upgrading to 4.0 
with compact storage, you can downgrade back with no intervention whatsoever).




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15962) Digest for some queries is different depending whether the data are retrieved from sstable or memtable

2020-08-20 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181221#comment-17181221
 ] 

Sylvain Lebresne commented on CASSANDRA-15962:
--

The approach lgtm. Just a few remarks:
* Could you include the {{DigestTest}} you wrote into the PR?
* Changes to {{SelectStatement}} (special casing of the filter for super 
columns on range queries, as is done for slice ones) look legit, but are unrelated 
and fix a problem that also exists on 3.0. We should either split this to another 
ticket (probably ideal), or have a 3.0 branch with that fix. It would also be good 
to have a test (one that shows the problem that exists and is fixed by this change).
* Similarly, the change to {{AbstractReadExecutor}} also looks legit but is 
unrelated and the issue exists in 3.0. That one is pretty minor since I don't think 
it has visible consequences, so it's not worth a ticket of its own, but it would be 
nice to include it with whatever we do for my previous point so it gets into 3.0.
* BTreeRow#filter: we should reuse/extract the 
{{!queriedByUserTester.test(column)}} test from the {{isSkippable}} initializer 
as the value for {{shouldSkipValue}} instead of redoing the work by calling 
{{!filter.fetchedColumnIsQueried(column)}}.
* ComplexColumnData#filter:
** I'd store the result of {{filter.fetchedColumnIsQueried(column)}} in a 
variable at the beginning of the function, to avoid potentially repeating the 
call multiple times. Mostly because it'll be imo more readable, but it also 
avoids bad cases where we repeat it for very many cells.
** Nit: the {{path != null}} in the {{shouldSkipValue}} initializer is 
unnecessary: cells of complex columns are guaranteed to have a non-null path 
(see the assertion in the {{BufferCell}} ctor).
** It would be nice to avoid the repetition of 
{{cellTester.fetchedCellIsQueried(path)}} as well, readability-wise. I'd suggest 
something like:
   {noformat}
   CellPath path = cell.path();
   boolean isForDropped = ...;
   boolean isShadowed = ...;
   boolean isFetchedCell = cellTester == null || cellTester.fetches(path);
   boolean isQueriedCell = isQueriedColumn && isFetchedCell && (cellTester == null || cellTester.fetchedCellIsQueried(path));
   boolean isSkippableCell = !isFetchedCell || (!isQueriedCell && cell.timestamp() < rowLiveness.timestamp());
   if (isForDropped || isShadowed || isSkippableCell)
       return null;

   // If the cell is only fetched but not queried, we need the cell but never the value. So, when reading from
   // sstables, we "skip" the value of such cells as an optimization (see Cell#deserialize). We _must_ thus do the
   // same here to avoid discrepancies between data coming from memtables or sstables, which would lead to digest
   // mismatches.
   return isQueriedCell ? cell : cell.withSkippedValue();
   {noformat}


With those addressed, if you could set up both a 3.11 branch and a trunk one 
and run CI, this would be ideal.
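
As a side note, purely for illustration (plain JDK code, nothing Cassandra-specific): 
the reason both read paths must apply exactly the same "skip the value" rule is that 
the digest is computed over the serialized cells, so two replicas holding the same 
logical data but serializing the value on only one path end up with different digests 
and a needless mismatch:

{code}
// Illustration only (plain JDK, no Cassandra APIs): the same logical cell digested
// with and without its value yields different digests, which is why the memtable
// and sstable read paths must skip values under exactly the same rule.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

final class DigestSkewSketch
{
    static String md5(String serializedCell) throws Exception
    {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        return Base64.getEncoder().encodeToString(digest.digest(serializedCell.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception
    {
        // One replica read the cell from an sstable (value skipped), the other from
        // its memtable (value kept): same data, different digests, spurious mismatch.
        System.out.println(md5("c|ts=10|<value skipped>"));
        System.out.println(md5("c|ts=10|value=42"));
    }
}
{code}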


> Digest for some queries is different depending whether the data are retrieved 
> from sstable or memtable
> --
>
> Key: CASSANDRA-15962
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15962
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0, 3.11.x
>
> Attachments: DigestTest.java
>
>
> Not sure which category I should assign this ticket to.
>  
> Basically, when reading using certain column filters, the digest is different 
> depending on whether we read from an sstable or a memtable. This happens on 
> the {{trunk}} and {{cassandra-3.11}} branches. However, it works properly on 
> the {{cassandra-3.0}} branch.
>  
> I'm attaching a simple test for trunk to demonstrate what I mean. 
>  
> Please verify my test and my conclusions
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16048) Safely Ignore Compact Storage Tables Where Users Have Defined Clustering and Value Columns

2020-08-18 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179691#comment-17179691
 ] 

Sylvain Lebresne commented on CASSANDRA-16048:
--

Not to reject this upfront completely, but wanted to note that a part of me 
wonders if this is a good idea.

First, I feel this muddies the waters a bit around the upgrade path to 4.0. "You 
have to run {{DROP COMPACT STORAGE}} on all compact tables before upgrading" is 
a simple rule. Simple is good, and I feel adding special cases is adding 
confusion. And the upside here is pretty minor.

Of course, we could leave this undocumented, and keep only the simple rule in 
all documentation. But if we don't document this, I think this creates a risk of 
surprising users, which is my other point.

Because it's not 100% true that "there is no visible change to the user facing 
schema after dropping compact storage". The schema of those tables loses the 
{{WITH COMPACT STORAGE}} (and the {{system_schema}} table will lose the 
{{DENSE}} flag, which is also technically visible). Which may sound trivial, 
but it's a bit hard to ensure that there is no user tool/code out there that relies 
on it. And in general, users may simply be confused that their table appears to 
lose a property silently, or wonder why the schema dump of that table done on 
3.x does not replay properly on 4.0 anymore.

Tbc, I get how for "us", devs of C* that understand the internals, it sounds 
annoying to have to run a command when we know it could be done automatically 
and we're not gonna be confused by it. Just not sure this is a good place to 
optimize for "us".


> Safely Ignore Compact Storage Tables Where Users Have Defined Clustering and 
> Value Columns
> --
>
> Key: CASSANDRA-16048
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16048
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Some compact storage tables, specifically those where the user has defined 
> both at least one clustering and the value column, can be safely handled in 
> 4.0 because besides the DENSE flag they are not materially different post 3.0 
> and there is no visible change to the user facing schema after dropping 
> compact storage. We can detect this case and allow these tables to silently 
> drop the DENSE flag while still throwing a start-up error for COMPACT STORAGE 
> tables that don’t meet the criteria. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release

2020-08-17 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-13994:
-
  Fix Version/s: (was: 4.0-beta)
 (was: 4.0)
 4.0-beta2
Impacts: None
   Platform: All
Source Control Link: 
https://github.com/apache/cassandra/commit/cba0c27ce9f135ded45beaa27a913a0be03b2afb
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Great work [~e.dimitrova], thanks. Committed.

> Remove dead compact storage code before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0-beta2
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added a {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now, ask users to migrate off those, and 
> completely remove it from future releases. It's just a couple of classes 
> though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release

2020-08-17 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-13994:
-
Status: Ready to Commit  (was: Review In Progress)

> Remove dead compact storage code before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-beta
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added a {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now, ask users to migrate off those, and 
> completely remove it from future releases. It's just a couple of classes 
> though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15432) The "read defragmentation" optimization does not work

2020-08-17 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15432:
-
  Fix Version/s: (was: 3.11.x)
 (was: 4.x)
 (was: 3.0.x)
 4.0-beta2
 3.11.8
 3.0.22
  Since Version: 1.1.0
Source Control Link: 
3.0:https://github.com/apache/cassandra/commit/e2ecdf268a82fa3ac0f4c9fe77ab35bca33cc72a,
 
3.11:https://github.com/apache/cassandra/commit/ecd23f1da5894511cccac6c8445f962f3b73f733,
 trunk:https://github.com/apache/cassandra/commit/efce6b39fb557314fad0cb56b0
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Thanks for the review. CI doesn't seem to show anything new broken so committed.

> The "read defragmentation" optimization does not work
> -
>
> Key: CASSANDRA-15432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15432
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 3.0.22, 3.11.8, 4.0-beta2
>
>
> The so-called "read defragmentation" that has been added way back with 
> CASSANDRA-2503 actually does not work, and never has. That is, the 
> defragmentation writes do happen, but they only add load on the nodes 
> without helping anything, and are thus a clear negative.
> The "read defragmentation" (which only impacts so-called "names queries") 
> kicks in when a read hits "too many" sstables (> 4 by default), and when it 
> does, it writes down the result of that read. The assumption being that the 
> next read for that data would only read the newly written data, which, if not 
> still in the memtable, would at least be in a single sstable, thus speeding up 
> that next read.
> Unfortunately, this is not how this works. When we defrag and write the result 
> of our original read, we do so with the timestamp of the data read (as we 
> should, changing the timestamp would be plain wrong). And as a result, 
> following reads will read that data first, but will have no way to tell that 
> no more sstables should be read. Technically, the 
> [{{reduceFilter}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L830]
>  call will not return {{null}} because the {{currentMaxTs}} will be higher 
> than at least some of the data in the result, and this holds until we've read 
> from as many sstables as in the original read.
> I see no easy way to fix this. It might be possible to make it work with 
> additional per-sstable metadata, but nothing sufficiently simple and cheap to 
> be worth it comes to mind. And I thus suggest simply removing that code.
> For the record, I'll note that there is actually a 2nd problem with that 
> code: currently, we "defrag" a read even if we didn't get data for everything 
> that the query requests. This also is "wrong" even if we ignore the first 
> issue: a following read that would read the defragmented data would also have 
> no way to know not to read more sstables to try to get the missing parts. 
> This problem would be fixable, but is obviously overshadowed by the previous 
> one anyway.
> Anyway, as mentioned, I suggest to just remove the "optimization" (which 
> again, never optimized anything) altogether, and happy to provide the simple 
> patch.
> The only question might be which versions to target? This impacts all versions, but 
> this isn't a correctness bug either, "just" a performance one. So do we want 
> 4.0 only or is there appetite for earlier?
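
To illustrate the argument in the description with a toy model (not Cassandra code; 
the method and values below are made up for the example): because the defragmented 
copy reuses the original timestamps, a later read still has to walk down to the 
sstable holding the oldest requested cell before it can stop, exactly as before the 
defrag write.

{code}
// Toy model only, no Cassandra APIs: shows why reusing the original timestamps
// in the defragmented copy prevents later reads from stopping early.
import java.util.List;

final class DefragTimestampSketch
{
    // Keep consulting sstables (newest max-timestamp first) until the next one
    // cannot contain anything newer than the oldest cell we already hold.
    static int sstablesConsulted(List<Long> maxTimestampsNewestFirst, long oldestTimestampInResult)
    {
        int consulted = 0;
        for (long maxTimestamp : maxTimestampsNewestFirst)
        {
            consulted++;
            if (maxTimestamp <= oldestTimestampInResult)
                break;
        }
        return consulted;
    }

    public static void main(String[] args)
    {
        // Original read hit 5 sstables with max timestamps 50..10; its oldest cell
        // has timestamp 10. The defragmented sstable reuses those timestamps (max 50),
        // so the next read consults it and still walks down to the ts=10 sstable.
        List<Long> afterDefrag = List.of(50L, 50L, 40L, 30L, 20L, 10L);
        System.out.println(sstablesConsulted(afterDefrag, 10L)); // prints 6: nothing saved
    }
}
{code}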



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13994) Remove dead compact storage code before 4.0 release

2020-08-13 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177109#comment-17177109
 ] 

Sylvain Lebresne commented on CASSANDRA-13994:
--

Available again, so I did a review pass on the last version.

And that last version looks good to me. I just gathered a few last nits/minor 
suggestions in [this 
commit|https://github.com/pcmanus/cassandra/commits/C-13994-review], which is 
mostly cleaning a few minor things and updating/adding comments. The only 2 
real 'code changes' there are:
* in {{CassandraIndex#indexCfsMetadata}}, the leftover special case for keys 
index felt more cleanly handled now by overriding the 
{{CassandraIndexFunctions#addIndexClusteringColumns}} method, which this commit 
does.
* we were checking/throwing for unsupported flags both in {{TableMetadata}} 
ctor and in {{SchemaKeyspace#fetchTable}}, and this with different messages, 
but the latter calls the former in all cases, so I simplified a bit keeping 
only the {{TableMetadata}} ctor case.

But those are all minor suggestions so whether you keep them or not, +1 from me 
(assuming CI is still clean on this obviously).


> Remove dead compact storage code before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-beta
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added a {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now, ask users to migrate off those, and 
> completely remove it from future releases. It's just a couple of classes 
> though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15432) The "read defragmentation" optimization does not work

2020-08-13 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15432:
-
Fix Version/s: 4.x
   3.11.x
   3.0.x

> The "read defragmentation" optimization does not work
> -
>
> Key: CASSANDRA-15432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15432
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> The so-called "read defragmentation" that has been added way back with 
> CASSANDRA-2503 actually does not work, and never has. That is, the 
> defragmentation writes do happen, but they only add load on the nodes 
> without helping anything, and are thus a clear negative.
> The "read defragmentation" (which only impacts so-called "names queries") 
> kicks in when a read hits "too many" sstables (> 4 by default), and when it 
> does, it writes down the result of that read. The assumption being that the 
> next read for that data would only read the newly written data, which, if not 
> still in the memtable, would at least be in a single sstable, thus speeding up 
> that next read.
> Unfortunately, this is not how this works. When we defrag and write the result 
> of our original read, we do so with the timestamp of the data read (as we 
> should, changing the timestamp would be plain wrong). And as a result, 
> following reads will read that data first, but will have no way to tell that 
> no more sstables should be read. Technically, the 
> [{{reduceFilter}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L830]
>  call will not return {{null}} because the {{currentMaxTs}} will be higher 
> than at least some of the data in the result, and this holds until we've read 
> from as many sstables as in the original read.
> I see no easy way to fix this. It might be possible to make it work with 
> additional per-sstable metadata, but nothing sufficiently simple and cheap to 
> be worth it comes to mind. And I thus suggest simply removing that code.
> For the record, I'll note that there is actually a 2nd problem with that 
> code: currently, we "defrag" a read even if we didn't get data for everything 
> that the query requests. This also is "wrong" even if we ignore the first 
> issue: a following read that would read the defragmented data would also have 
> no way to know not to read more sstables to try to get the missing parts. 
> This problem would be fixable, but is obviously overshadowed by the previous 
> one anyway.
> Anyway, as mentioned, I suggest to just remove the "optimization" (which 
> again, never optimized anything) altogether, and happy to provide the simple 
> patch.
> The only question might be which versions to target? This impacts all versions, but 
> this isn't a correctness bug either, "just" a performance one. So do we want 
> 4.0 only or is there appetite for earlier?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15432) The "read defragmentation" optimization does not work

2020-08-13 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15432:
-
Test and Documentation Plan: No impact on testing as this is removing code 
and no test existed for the removed optimization. Afaict, the optimization was 
not documented, so no impact on documentation.
 Status: Patch Available  (was: Open)

> The "read defragmentation" optimization does not work
> -
>
> Key: CASSANDRA-15432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15432
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> The so-called "read defragmentation" that has been added way back with 
> CASSANDRA-2503 actually does not work, and never has. That is, the 
> defragmentation writes do happen, but they only add load on the nodes 
> without helping anything, and are thus a clear negative.
> The "read defragmentation" (which only impacts so-called "names queries") 
> kicks in when a read hits "too many" sstables (> 4 by default), and when it 
> does, it writes down the result of that read. The assumption being that the 
> next read for that data would only read the newly written data, which, if not 
> still in the memtable, would at least be in a single sstable, thus speeding up 
> that next read.
> Unfortunately, this is not how this works. When we defrag and write the result 
> of our original read, we do so with the timestamp of the data read (as we 
> should, changing the timestamp would be plain wrong). And as a result, 
> following reads will read that data first, but will have no way to tell that 
> no more sstables should be read. Technically, the 
> [{{reduceFilter}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L830]
>  call will not return {{null}} because the {{currentMaxTs}} will be higher 
> than at least some of the data in the result, and this holds until we've read 
> from as many sstables as in the original read.
> I see no easy way to fix this. It might be possible to make it work with 
> additional per-sstable metadata, but nothing sufficiently simple and cheap to 
> be worth it comes to mind. And I thus suggest simply removing that code.
> For the record, I'll note that there is actually a 2nd problem with that 
> code: currently, we "defrag" a read even if we didn't get data for everything 
> that the query requests. This also is "wrong" even if we ignore the first 
> issue: a following read that would read the defragmented data would also have 
> no way to know not to read more sstables to try to get the missing parts. 
> This problem would be fixable, but is obviously overshadowed by the previous 
> one anyway.
> Anyway, as mentioned, I suggest to just remove the "optimization" (which 
> again, never optimized anything) altogether, and happy to provide the simple 
> patch.
> The only question might be which versions to target? This impacts all versions, but 
> this isn't a correctness bug either, "just" a performance one. So do we want 
> 4.0 only or is there appetite for earlier?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15432) The "read defragmentation" optimization does not work

2020-08-13 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176982#comment-17176982
 ] 

Sylvain Lebresne commented on CASSANDRA-15432:
--

Back on this later than I meant, but attaching fairly trivial patches to remove 
said optimization on 3.0, 3.11 and trunk/4.0.
||patch||CI||
|[3.0|https://github.com/pcmanus/cassandra/commits/C-15432-3.0]|[#239|https://ci-cassandra.apache.org/job/Cassandra-devbranch/239/]|
|[3.11|https://github.com/pcmanus/cassandra/commits/C-15432-3.11]|[#240|https://ci-cassandra.apache.org/job/Cassandra-devbranch/240/]|
|[trunk|https://github.com/pcmanus/cassandra/commits/C-15432-trunk]|[#241|https://ci-cassandra.apache.org/job/Cassandra-devbranch/241/]|

[~aleksey] or [~benedict]: would one of you have cycles to review by any chance 
(pretty simple diff, removing the {{if}} triggering the defrag as well as tiny 
bits of incidental code that is now dead).



> The "read defragmentation" optimization does not work
> -
>
> Key: CASSANDRA-15432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15432
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> The so-called "read defragmentation" that has been added way back with 
> CASSANDRA-2503 actually does not work, and never has. That is, the 
> defragmentation writes do happen, but they only add load on the nodes 
> without helping anything, and are thus a clear negative.
> The "read defragmentation" (which only impacts so-called "names queries") 
> kicks in when a read hits "too many" sstables (> 4 by default), and when it 
> does, it writes down the result of that read. The assumption being that the 
> next read for that data would only read the newly written data, which, if not 
> still in the memtable, would at least be in a single sstable, thus speeding up 
> that next read.
> Unfortunately, this is not how this works. When we defrag and write the result 
> of our original read, we do so with the timestamp of the data read (as we 
> should, changing the timestamp would be plain wrong). And as a result, 
> following reads will read that data first, but will have no way to tell that 
> no more sstables should be read. Technically, the 
> [{{reduceFilter}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L830]
>  call will not return {{null}} because the {{currentMaxTs}} will be higher 
> than at least some of the data in the result, and this holds until we've read 
> from as many sstables as in the original read.
> I see no easy way to fix this. It might be possible to make it work with 
> additional per-sstable metadata, but nothing sufficiently simple and cheap to 
> be worth it comes to mind. And I thus suggest simply removing that code.
> For the record, I'll note that there is actually a 2nd problem with that 
> code: currently, we "defrag" a read even if we didn't get data for everything 
> that the query requests. This also is "wrong" even if we ignore the first 
> issue: a following read that would read the defragmented data would also have 
> no way to know not to read more sstables to try to get the missing parts. 
> This problem would be fixable, but is obviously overshadowed by the previous 
> one anyway.
> Anyway, as mentioned, I suggest to just remove the "optimization" (which 
> again, never optimized anything) altogether, and happy to provide the simple 
> patch.
> The only question might be which versions to target? This impacts all versions, but 
> this isn't a correctness bug either, "just" a performance one. So do we want 
> 4.0 only or is there appetite for earlier?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15432) The "read defragmentation" optimization does not work

2020-08-11 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne reassigned CASSANDRA-15432:


Assignee: Sylvain Lebresne

> The "read defragmentation" optimization does not work
> -
>
> Key: CASSANDRA-15432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15432
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> The so-called "read defragmentation" that has been added way back with 
> CASSANDRA-2503 actually does not work, and never has. That is, the 
> defragmentation writes do happen, but they only add load on the nodes 
> without helping anything, and are thus a clear negative.
> The "read defragmentation" (which only impacts so-called "names queries") 
> kicks in when a read hits "too many" sstables (> 4 by default), and when it 
> does, it writes down the result of that read. The assumption being that the 
> next read for that data would only read the newly written data, which, if not 
> still in the memtable, would at least be in a single sstable, thus speeding up 
> that next read.
> Unfortunately, this is not how this works. When we defrag and write the result 
> of our original read, we do so with the timestamp of the data read (as we 
> should, changing the timestamp would be plain wrong). And as a result, 
> following reads will read that data first, but will have no way to tell that 
> no more sstables should be read. Technically, the 
> [{{reduceFilter}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L830]
>  call will not return {{null}} because the {{currentMaxTs}} will be higher 
> than at least some of the data in the result, and this holds until we've read 
> from as many sstables as in the original read.
> I see no easy way to fix this. It might be possible to make it work with 
> additional per-sstable metadata, but nothing sufficiently simple and cheap to 
> be worth it comes to mind. And I thus suggest simply removing that code.
> For the record, I'll note that there is actually a 2nd problem with that 
> code: currently, we "defrag" a read even if we didn't get data for everything 
> that the query requests. This also is "wrong" even if we ignore the first 
> issue: a following read that would read the defragmented data would also have 
> no way to know not to read more sstables to try to get the missing parts. 
> This problem would be fixable, but is obviously overshadowed by the previous 
> one anyway.
> Anyway, as mentioned, I suggest to just remove the "optimization" (which 
> again, never optimized anything) altogether, and happy to provide the simple 
> patch.
> The only question might be which versions to target? This impacts all versions, but 
> this isn't a correctness bug either, "just" a performance one. So do we want 
> 4.0 only or is there appetite for earlier?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15924) Avoid emitting empty range tombstones from RangeTombstoneList

2020-07-07 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152563#comment-17152563
 ] 

Sylvain Lebresne commented on CASSANDRA-15924:
--

Left a minor comment on github for making operator priority explicit in the 
2nd commit, but lgtm otherwise, +1. 

> Avoid emitting empty range tombstones from RangeTombstoneList
> -
>
> Key: CASSANDRA-15924
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15924
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> In {{RangeTombstoneList#iterator}} there is a chance we emit empty range 
> tombstones depending on the slice passed in. This can happen during read 
> repair with either an empty slice or with paging and the final page being 
> empty.
> This creates problems in RTL if we try to insert a new range tombstone which 
> covers the empty ones;
> {code}
> Caused by: java.lang.AssertionError
>   at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:541)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.addAll(RangeTombstoneList.java:217)
>   at 
> org.apache.cassandra.db.MutableDeletionInfo.add(MutableDeletionInfo.java:141)
>   at 
> org.apache.cassandra.db.partitions.AtomicBTreePartition.addAllWithSizeDelta(AtomicBTreePartition.java:137)
>   at org.apache.cassandra.db.Memtable.put(Memtable.java:254)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1210)
>   at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:573)
>   at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:421)
>   at org.apache.cassandra.db.Mutation.apply(Mutation.java:210)
>   at org.apache.cassandra.db.Mutation.apply(Mutation.java:215)
>   at org.apache.cassandra.db.Mutation.apply(Mutation.java:224)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement.executeInternalWithoutCondition(ModificationStatement.java:582)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement.executeInternal(ModificationStatement.java:572)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15897) Dropping compact storage with 2.1-sstables on disk make them unreadable

2020-07-02 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150347#comment-17150347
 ] 

Sylvain Lebresne commented on CASSANDRA-15897:
--

I'll give the gossip route a shot since that seems to win this popularity 
contest, even if not by a great margin.

bq. And we might need a one-off tool/support in upgradesstables to convert 
those 2.x sstables.

Yeah. I'll look at adding a flag to {{upgradesstables}}. 

That said, I suspect there may be corner cases that we can't entirely handle. 
For instance, if the user started adding/removing columns just after that 
{{DROP COMPACT STORAGE}} and before figuring out that some tables couldn't be 
read, then we might be in the dark. That does not mean we shouldn't try 
handling those sstables in the simple case, but it is a probable limitation 
worth listing.
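
For reference, the existing command that flag would be added to is the standard one below 
(shown without any new flag, since none exists yet):

{noformat}
# rewrite a table's sstables into the current on-disk format (run on each node)
nodetool upgradesstables <keyspace> <table>
{noformat}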

> Dropping compact storage with 2.1-sstables on disk make them unreadable
> ---
>
> Key: CASSANDRA-15897
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15897
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Marcus Eriksson
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> Test reproducing: 
> https://github.com/krummas/cassandra/commits/marcuse/dropcompactstorage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15897) Dropping compact storage with 2.1-sstables on disk make them unreadable

2020-07-01 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149466#comment-17149466
 ] 

Sylvain Lebresne commented on CASSANDRA-15897:
--

bq. Maybe, depending on how CASSANDRA-15811 is implemented, that can even be a 
node-local decision?

Haven't thought too long about that other ticket, so I'm not sure if this would 
help. I do agree we should do something there though, and I'm happy to start 
the conversation on what exactly.

> Dropping compact storage with 2.1-sstables on disk make them unreadable
> ---
>
> Key: CASSANDRA-15897
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15897
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Marcus Eriksson
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> Test reproducing: 
> https://github.com/krummas/cassandra/commits/marcuse/dropcompactstorage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15897) Dropping compact storage with 2.1-sstables on disk make them unreadable

2020-07-01 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15897:
-
 Bug Category: Parent values: Degradation(12984)Level 1 values: Other 
Exception(12998)
   Complexity: Normal
  Component/s: Legacy/Local Write-Read Paths
Discovered By: Unit Test
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Dropping compact storage with 2.1-sstables on disk make them unreadable
> ---
>
> Key: CASSANDRA-15897
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15897
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Marcus Eriksson
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> Test reproducing: 
> https://github.com/krummas/cassandra/commits/marcuse/dropcompactstorage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15897) Dropping compact storage with 2.1-sstables on disk make them unreadable

2020-07-01 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149210#comment-17149210
 ] 

Sylvain Lebresne commented on CASSANDRA-15897:
--

The problem here is that the 2.x sstable layout is that of the "compact" 
table, and those sstables can't be read properly post-DROP COMPACT STORAGE 
unless we still remember the table "was" compact.

Overall, I can see only 2 main options:
# we remember somehow that a table "was" compact, even after a {{DROP COMPACT 
STORAGE}}. The code reading legacy sstables could then use the old compact 
version of the table metadata, and I think this would work. One way to preserve 
this information could be that, instead of dropping the COMPOUND/DENSE flags 
when {{DROP COMPACT STORAGE}} is used, we'd preserve those but add a new 
{{COMPACT_STORAGE_DROPPED}} flag (see the sketch at the end of this comment). 
I think doing so would be relatively simple code-wise, and preserving the 
information that a table _was_ compact internally, at least until 4.0 (which 
can clean all that up), almost feels like a good idea (I doubt many people have 
tried {{DROP COMPACT STORAGE}} yet, but some will have to in order to upgrade 
to 4.0, and keeping this info may help diagnose issues along the way). The 
downsides I can see however are:
#* adding a flag impacts drivers, at least when they read the schema (not that 
it would completely break driver versions that don't know about the flag...).
#* DROP COMPACT STORAGE was released a while ago already and this wouldn't be 
retroactive. Not sure it's a big deal, but I may not have thought it through.
# we don't "allow" DROP COMPACT STORAGE until all 2.x sstables have been 
upgraded.  Now, if we could easily reject DROP COMPACT STORAGE requests until 
all sstables are in the 3.x format, I probably wouldn't even suggest the first 
option above. But it's actually not trivial, because when a coordinator 
receives such a request, it has no way to know what sstable formats the other 
nodes have. 
So I guess there is 2 sub-options:
## we clearly document that one should upgrade sstables on all nodes before 
trying DROP COMPACT STORAGE, but don't do more than that. Not amazing, but 
certainly the simplest option.
## we start having each node gossip, say, the lowest sstable format version it 
has locally, so we can properly reject DROP COMPACT STORAGE until it's safe. My 
main personal caveat here is that I'm always a tad nervous about adding things 
to Gossip in a minor release. But I _think_ it's pretty safe to do so.

I'm happy to implement whichever of those solutions we prefer (and of course, 
there may be better suggestions), but we need to pick. Personally, I'd prefer 
avoiding 2.1 unless the other options prove more complex than I think, but I'm 
not 100% sure between 1 and 2.2. Maybe 2.2, because it has no externally 
visible impact on drivers.

Other opinions (going to arbitrarily ping [~marcuse], [~ifesdjeen] and 
[~aleksey] as people knowledgeable about COMPACT STORAGE, but every opinion is 
welcome)?
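
To make option 1 a bit more concrete, here is a rough sketch of the shape of the idea 
(hypothetical only: {{COMPACT_STORAGE_DROPPED}} does not exist, and this is not the actual 
schema flag handling code):

{code:java}
// Hypothetical sketch: the real flags live in the table metadata; only the idea is shown.
import java.util.EnumSet;

enum TableFlag
{
    SUPER, COUNTER, DENSE, COMPOUND,
    COMPACT_STORAGE_DROPPED;   // proposed flag: the table *was* compact, but compact storage was dropped

    // Instead of clearing DENSE/COMPOUND on DROP COMPACT STORAGE, keep them and record the drop,
    // so legacy 2.x sstables can still be interpreted with the old compact layout in mind.
    static EnumSet<TableFlag> afterDropCompactStorage(EnumSet<TableFlag> current)
    {
        EnumSet<TableFlag> updated = EnumSet.copyOf(current);
        updated.add(COMPACT_STORAGE_DROPPED);
        return updated;
    }
}
{code}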


> Dropping compact storage with 2.1-sstables on disk make them unreadable
> ---
>
> Key: CASSANDRA-15897
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15897
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> Test reproducing: 
> https://github.com/krummas/cassandra/commits/marcuse/dropcompactstorage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15897) Dropping compact storage with 2.1-sstables on disk make them unreadable

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne reassigned CASSANDRA-15897:


Assignee: Sylvain Lebresne

> Dropping compact storage with 2.1-sstables on disk make them unreadable
> ---
>
> Key: CASSANDRA-15897
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15897
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> Test reproducing: 
> https://github.com/krummas/cassandra/commits/marcuse/dropcompactstorage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15891) provide a configuration option such as endpoint_verification_method

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15891:
-
Change Category: Operability
 Complexity: Normal
Component/s: Messaging/Internode
 Status: Open  (was: Triage Needed)

> provide a configuration option such as endpoint_verification_method
> ---
>
> Key: CASSANDRA-15891
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15891
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Thanh
>Priority: Normal
>
> With cassandra-9220, it's possible to configure endpoint/hostname 
> verification when enabling internode encryption.  However, you don't have any 
> control over what endpoint is used for the endpoint verification; instead, 
> Cassandra will automatically try to use the node IP (not the node hostname) for 
> endpoint verification, so if your node certificates don't include the IP in 
> the ssl certificate's SAN list, then you'll get an error like:
> {code:java}
> ERROR [MessagingService-Outgoing-/10.10.88.194-Gossip] 2018-11-13 
> 10:20:26,903 OutboundTcpConnection.java:606 - SSL handshake error for 
> outbound connection to 50cc97c1[SSL_NULL_WITH_NULL_NULL: 
> Socket[addr=/,port=7001,localport=47684]] 
> javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: 
> No subject alternative names matching IP address  found 
> at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) {code}
> From what I've seen, most orgs will not have node IPs in their certs.
> So, it would be best if Cassandra provided another configuration option 
> such as *{{endpoint_verification_method}}* which you could set to "ip" or 
> "fqdn" or something else (eg "hostname_alias" if for whatever reason the org 
> doesn't want to use fqdn for endpoint verification).
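
Purely to illustrate the request (hypothetical: {{endpoint_verification_method}} is the 
proposed option and does not exist today; the surrounding settings do), the ask boils down 
to something like:

{noformat}
server_encryption_options:
    internode_encryption: all
    require_endpoint_verification: true
    # proposed, not an existing option:
    endpoint_verification_method: fqdn   # e.g. ip | fqdn | hostname_alias
{noformat}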



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15850:
-
Change Category: Performance
 Complexity: Normal
Component/s: Local/Startup and Shutdown
 Status: Open  (was: Triage Needed)

> Delay between Gossip settle and CQL port opening during the startup
> ---
>
> Key: CASSANDRA-15850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15850
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Startup and Shutdown
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Normal
>
> Hello,
> When I am bootstrapping/restarting a Cassandra Node, there is a delay between 
> gossip settle and CQL port opening. Can someone please explain to me where 
> this delay is configured, and can this be changed? I don't see any 
> information in the logs.
> In my case there is a ~3 minute delay, and it increases as I increase the # 
> of tables, # of nodes and DCs.
> {code:java}
> INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip 
> to settle...
> INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; 
> proceeding
> INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty 
> using native Epoll event loop
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: 
> [netty-buffer=netty-buffer-4.0.44.Final.452812a, 
> netty-codec=netty-codec-4.0.44.Final.452812a, 
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
> netty-common=netty-common-4.0.44.Final.452812a, 
> netty-handler=netty-handler-4.0.44.Final.452812a, 
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
> netty-transport=netty-transport-4.0.44.Final.452812a, 
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
>  netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for 
> CQL clients on /x.x.x.x:9042 (encrypted)...
> {code}
> Also during this 3-10 minutes delay, I see 
> {noformat}
> nodetool compactionstats
> {noformat}
>  command hangs and never responds until the CQL port is up and running. 
> Can someone please help me understand the delay here?
> Cassandra Version: 3.11.3
> The issue can be easily reproducible with around 300 Tables and 100 nodes in 
> a cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15850:
-
  Workflow: Cassandra Default Workflow  (was: Cassandra Bug Workflow)
Issue Type: Improvement  (was: Bug)

> Delay between Gossip settle and CQL port opening during the startup
> ---
>
> Key: CASSANDRA-15850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15850
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Normal
>
> Hello,
> When I am bootstrapping/restarting a Cassandra Node, there is a delay between 
> gossip settle and CQL port opening. Can someone please explain to me where 
> this delay is configured, and can this be changed? I don't see any 
> information in the logs.
> In my case there is a ~3 minute delay, and it increases as I increase the # 
> of tables, # of nodes and DCs.
> {code:java}
> INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip 
> to settle...
> INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; 
> proceeding
> INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty 
> using native Epoll event loop
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: 
> [netty-buffer=netty-buffer-4.0.44.Final.452812a, 
> netty-codec=netty-codec-4.0.44.Final.452812a, 
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
> netty-common=netty-common-4.0.44.Final.452812a, 
> netty-handler=netty-handler-4.0.44.Final.452812a, 
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
> netty-transport=netty-transport-4.0.44.Final.452812a, 
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
>  netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for 
> CQL clients on /x.x.x.x:9042 (encrypted)...
> {code}
> Also during this 3-10 minutes delay, I see 
> {noformat}
> nodetool compactionstats
> {noformat}
>  command hangs and never responds until the CQL port is up and running. 
> Can someone please help me understand the delay here?
> Cassandra Version: 3.11.3
> The issue can be easily reproducible with around 300 Tables and 100 nodes in 
> a cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15850) Delay between Gossip settle and CQL port opening during the startup

2020-06-30 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148717#comment-17148717
 ] 

Sylvain Lebresne commented on CASSANDRA-15850:
--

From a look at the code, between gossip settling and starting the CQL server, 
the only thing that happens is that all the tables are "reloaded" (which 
involves a number of steps) to account for changes that could have happened 
once Gossip settles, and compactions are started.

None of that should be super long for a given table, but it's not the most 
optimized thing ever either, and we do reload all tables sequentially, so this 
may well be the culprit for the delay you are seeing.

Assuming I'm correct (I'm only going from a quick read of the code here), I 
don't think any configuration option will help reduce that delay (but it does 
make sense the # of tables is a main factor).

It's not a bug, the server is doing work, albeit maybe inefficiently.

I'm sure this could be improved though. At a minimum, it would be more user 
friendly to add a log message to explain what is being done so users are not 
left wondering what is going on.

I'm sure we can also make that faster. 2 things come to mind in particular:
 - it seems the only reason to do this reloading is for the compaction 
strategy(ies) to take any disk boundaries change into account, but reloading 
does other things, and a bit of benchmarking could probably tell us if we could 
save meaningful time by doing a more targeted reloading.
 - parallelizing the work might yield benefits (see the sketch below).
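
On that last point, and purely as an illustrative sketch (the {{Table}}/{{reload()}} names 
below are placeholders, not Cassandra APIs), the change would be of this general shape:

{code:java}
// Generic sketch of parallelizing a sequential per-table reload; not actual Cassandra code.
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelReloadSketch
{
    interface Table { void reload(); }   // placeholder for the per-table reload work

    static void reloadAll(List<Table> tables) throws InterruptedException
    {
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (Table table : tables)
            pool.submit(table::reload);          // instead of reloading each table one after the other
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
{code}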

> Delay between Gossip settle and CQL port opening during the startup
> ---
>
> Key: CASSANDRA-15850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15850
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Normal
>
> Hello,
> When I am bootstrapping/restarting a Cassandra Node, there is a delay between 
> gossip settle and CQL port opening. Can someone please explain to me where 
> this delay is configured, and can this be changed? I don't see any 
> information in the logs.
> In my case there is a ~3 minute delay, and it increases as I increase the # 
> of tables, # of nodes and DCs.
> {code:java}
> INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip 
> to settle...
> INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; 
> proceeding
> INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty 
> using native Epoll event loop
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: 
> [netty-buffer=netty-buffer-4.0.44.Final.452812a, 
> netty-codec=netty-codec-4.0.44.Final.452812a, 
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
> netty-common=netty-common-4.0.44.Final.452812a, 
> netty-handler=netty-handler-4.0.44.Final.452812a, 
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
> netty-transport=netty-transport-4.0.44.Final.452812a, 
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
>  netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for 
> CQL clients on /x.x.x.x:9042 (encrypted)...
> {code}
> Also during this 3-10 minutes delay, I see 
> {noformat}
> nodetool compactionstats
> {noformat}
>  command hangs and never responds until the CQL port is up and running. 
> Can someone please help me understand the delay here?
> Cassandra Version: 3.11.3
> The issue can be easily reproducible with around 300 Tables and 100 nodes in 
> a cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15908) Improve messaging on indexing frozen collections

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15908:
-
Change Category: Operability
 Complexity: Low Hanging Fruit
 Status: Open  (was: Triage Needed)

> Improve messaging on indexing frozen collections
> 
>
> Key: CASSANDRA-15908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15908
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL/Semantics
>Reporter: Rocco Varela
>Assignee: Rocco Varela
>Priority: Low
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When attempting to create an index on a frozen collection the error message 
> produced can be improved to provide more detail about the problem and 
> possible workarounds. Currently, a user will receive a message indicating 
> "...Frozen collections only support full() indexes" which is not immediately 
> clear for users new to Cassandra indexing and datatype compatibility.
> Here is an example:
> {code:java}
> cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> cqlsh> CREATE TABLE test.mytable ( id int primary key, addresses 
> frozen> );
> cqlsh> CREATE INDEX mytable_addresses_idx on test.mytable (addresses);
>  InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot 
> create values() index on frozen column addresses. Frozen collections only 
> support full() indexes"{code}
>  
> I'm proposing possibly enhancing the messaging to something like this.
> {quote}Cannot create values() index on frozen column addresses. Frozen 
> collections only support indexes on the entire data structure due to 
> immutability constraints of being frozen, wrap your frozen column with the 
> full() target type to index properly.
> {quote}
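
For reference, the workaround the message points at is the already supported {{full()}} 
index, which targets the entire frozen value rather than individual elements. The element 
type of {{addresses}} got mangled in the description above, so a {{frozen<list<text>>}} is 
assumed in the example below:

{code:java}
cqlsh> CREATE INDEX mytable_addresses_idx ON test.mytable (full(addresses));
cqlsh> -- such an index can only be used for whole-value equality, e.g.:
cqlsh> SELECT * FROM test.mytable WHERE addresses = ['1 Main St', '2 Side St'];
{code}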



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15847) High Local read latency for few tables

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15847:
-
Resolution: Invalid
Status: Resolved  (was: Triage Needed)

The user mailing list (u...@cassandra.apache.org) is the appropriate venue for 
getting such help. JIRA is for reporting bugs and documenting ideas for new 
improvements and features.


> High Local read latency for few tables
> --
>
> Key: CASSANDRA-15847
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15847
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/sstable
>Reporter: Ananda Babu Velupala
>Priority: Normal
>
> Hi Team,
> I am seeing high local read latency for 3 tables on a node (it's a 5-node 
> cluster). The keyspace has 16 sstables in total and reads from that table hit 
> 10 sstables. Can you please suggest a path forward to fix the read latency? 
> Appreciate your help. Thanks
> Cassandra version : 3.11.3
> SSTable Hitratio:
> ==
> k2view_usp/service_network_element_relation histograms
> {noformat}
> Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
>                       (micros)       (micros)      (bytes)
> 50%          3.00     0.00           219.34        179             10
> 75%          7.00     0.00           315.85        179             10
> 95%         10.00     0.00           454.83        179             10
> 98%         10.00     0.00           545.79        215             10
> 99%         10.00     0.00           545.79        310             20
> Min          0.00     0.00            51.01        43              0
> Max         10.00     0.00           545.79        89970660        8409007
> {noformat}
>  
> TABLE STATS:
> ==
> Table: service_network_element_relation_mir
> SSTable count: 3
> Space used (live): 283698097
> Space used (total): 283698097
> Space used by snapshots (total): 0
> Off heap memory used (total): 5335824
> SSTable Compression Ratio: 0.39563345719027554
> Number of partitions (estimate): 2194136
> Memtable cell count: 0
> Memtable data size: 0
> Memtable off heap memory used: 0
> Memtable switch count: 0
> Local read count: 0
> Local read latency: NaN ms
> Local write count: 0
> Local write latency: NaN ms
> Pending flushes: 0
> Percent repaired: 100.0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 4567016
> Bloom filter off heap memory used: 4566992
> Index summary off heap memory used: 705208
> Compression metadata off heap memory used: 63624
> Compacted partition minimum bytes: 104
> Compacted partition maximum bytes: 310
> Compacted partition mean bytes: 154
> Average live cells per slice (last five minutes): NaN
> Maximum live cells per slice (last five minutes): 0
> Average tombstones per slice (last five minutes): NaN
> Maximum tombstones per slice (last five minutes): 0
> Dropped Mutations: 0
> 
> Table: service_network_element_relation
> SSTable count: 11
> Space used (live): 8067239427
> Space used (total): 8067239427
> Space used by snapshots (total): 0
> Off heap memory used (total): 143032693
> SSTable Compression Ratio: 0.21558247949161227
> Number of partitions (estimate): 29357598
> Memtable cell count: 2714
> Memtable data size: 691617
> Memtable off heap memory used: 0
> Memtable switch count: 9
> Local read count: 6369399
> Local read latency: 0.311 ms
> Local write count: 161229
> Local write latency: NaN ms
> Pending flushes: 0
> Percent repaired: 99.91
> Bloom filter false positives: 1508
> Bloom filter false ratio: 0.00012
> Bloom filter space used: 113071680
> Bloom filter off heap memory used: 113071592
> Index summary off heap memory used: 27244541
> Compression metadata off heap memory used: 2716560
> Compacted partition minimum bytes: 43
> Compacted partition maximum bytes: 89970660
> Compacted partition mean bytes: 265
> Average live cells per slice (last five minutes): 1.1779891304347827
> Maximum live cells per slice (last five minutes): 103
> Average tombstones per slice (last five minutes): 1.0
> Maximum tombstones per slice (last five minutes): 1
> Dropped Mutations: 0
> 
> Table: service_relation
> SSTable count: 7
> Space used (live): 281354042
> Space used (total): 281354042
> Space used by snapshots (total): 35695068
> Off heap memory used (total): 6423276
> SSTable Compression Ratio: 0.17685515178431085
> Number of partitions (estimate): 1719400
> Memtable cell count: 1150
> Memtable data size: 67482
> Memtable off heap memory used: 0
> Memtable switch count: 3
> Local read count: 5506327
> Local read latency: 0.182 ms
> Local write count: 5237
> Local write latency: 0.084 ms
> Pending flushes: 0
> Percent repaired: 55.48
> Bloom filter false positives: 17
> Bloom filter false ratio: 0.0
> Bloom filter space used: 5549664
> Bloom filter off heap memory used: 5549608
> Index summary off heap memory used: 737348
> Compression metadata off heap memory used: 136320
> Compacted partition minimum bytes: 87
> Compacted partition maximum bytes: 4055269
> Compacted partition mean 

[jira] [Updated] (CASSANDRA-15846) BusyPoolException with OperationTimedOutException

2020-06-30 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15846:
-
Resolution: Not A Problem
Status: Resolved  (was: Triage Needed)

I'll close because as said above, I really don't think this is any indication 
of a server bug or issue. But if you have more information that suggests 
otherwise, please feel free to re-open with that additional information.

> BusyPoolException with OperationTimedOutException
> -
>
> Key: CASSANDRA-15846
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15846
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Yogesh Bidari
>Priority: Normal
>
> I am facing an issue with my Cassandra cluster; we are observing connection 
> timeout errors with BusyPoolException.
> Logs:
> java.util.concurrent.ExecutionException: 
> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) 
> tried for query failed (tried: /x.x.x.x:9042 
> (com.datastax.driver.core.exceptions.OperationTimedOutException: 
> [/x.x.x.x:9042] Timed out waiting for server response), 
> cassandra/x.x.x.x:9042 
> (com.datastax.driver.core.exceptions.BusyPoolException: 
> [cassandra/x.x.x.x:9042] Pool is busy (no available connection and the queue 
> has reached its max size 0)))java.util.concurrent.ExecutionException: 
> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) 
> tried for query failed (tried: /x.x.x.x:9042 
> (com.datastax.driver.core.exceptions.OperationTimedOutException: 
> [/x.x.x.x:9042] Timed out waiting for server response), 
> cassandra/x.x.x.x:9042 
> (com.datastax.driver.core.exceptions.BusyPoolException: 
> [cassandra/x.x.x.x:9042] Pool is busy (no available connection and the queue 
> has reached its max size 0))) at 
> com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513) 
> at 
> akka.persistence.cassandra.package$ListenableFutureConverter$$anon$2.$anonfun$run$2(package.scala:50)
>  at scala.util.Try$.apply(Try.scala:213) at 
> akka.persistence.cassandra.package$ListenableFutureConverter$$anon$2.run(package.scala:50)
>  at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:47) at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:47)
>  at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source) at 
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown 
> Source) at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source) 
> at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) at 
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)Caused 
> by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) 
> tried for query failed (tried: /x.x.x.x:9042 
> (com.datastax.driver.core.exceptions.OperationTimedOutException: 
> [/x.x.x.x:9042] Timed out waiting for server response), 
> cassandra/x.x.x.x:9042 
> (com.datastax.driver.core.exceptions.BusyPoolException: 
> [cassandra/x.x.x.x:9042] Pool is busy (no available connection and the queue 
> has reached its max size 0))) at 
> com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:283)
>  at 
> com.datastax.driver.core.RequestHandler.access$1200(RequestHandler.java:61) 
> at 
> com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:375)
>  at 
> com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.onFailure(RequestHandler.java:444)
>  at 
> com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1015)
>  at 
> com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
>  at 
> com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1137)
>  at 
> com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:707)
>  at 
> com.google.common.util.concurrent.AbstractFuture$TrustedFuture.addListener(AbstractFuture.java:112)
>  at com.google.common.util.concurrent.Futures.addCallback(Futures.java:996) 
> at 
> com.datastax.driver.core.GuavaCompatibility.addCallback(GuavaCompatibility.java:112)
>  at 
> com.datastax.driver.core.GuavaCompatibility.addCallback(GuavaCompatibility.java:100)
>  at 
> com.datastax.driver.core.RequestHandler$SpeculativeExecution.query(RequestHandler.java:400)
>  at 
> com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:359)
>  at 
> com.datastax.driver.core.RequestHandler$SpeculativeExecution.retry(RequestHandler.java:557)
>  at 
> 

[jira] [Commented] (CASSANDRA-15846) BusyPoolException with OperationTimedOutException

2020-06-30 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148476#comment-17148476
 ] 

Sylvain Lebresne commented on CASSANDRA-15846:
--

{{BusyPoolException}} is a driver exception, and the drivers are (currently) 
their own separate projects, so you'd want to reach out to that specific 
project for more precise help. But I'd venture that such an exception suggests 
you are sending too many requests asynchronously at once, exhausting the 
connections and queues (or you have configured those too low), and that you 
would have to throttle this somewhat (see the sketch below).

But from the information here, there is nothing suggesting a bug server side 
(nor even really in the driver). Please reach out to the C* user mailing list 
(u...@cassandra.apache.org) or, possibly even better, the Java driver mailing 
list 
(https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user) 
if you need more help figuring out what might be wrong.
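
To illustrate the kind of client-side throttling meant here (a sketch only, in the style 
of the 3.x Java driver; the limit of 1024 concurrent requests is an arbitrary example 
value, not a recommendation):

{code:java}
// Sketch: cap the number of in-flight async requests instead of queueing unboundedly.
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.google.common.util.concurrent.MoreExecutors;
import java.util.concurrent.Semaphore;

class ThrottledSession
{
    private final Session session;
    private final Semaphore inFlight = new Semaphore(1024);   // example limit

    ThrottledSession(Session session) { this.session = session; }

    ResultSetFuture executeThrottled(Statement statement) throws InterruptedException
    {
        inFlight.acquire();                                    // block the caller instead of overloading the pools
        ResultSetFuture future = session.executeAsync(statement);
        future.addListener(inFlight::release, MoreExecutors.directExecutor());
        return future;
    }
}
{code}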

> BusyPoolException with OperationTimedOutException
> -
>
> Key: CASSANDRA-15846
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15846
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Yogesh Bidari
>Priority: Normal
>
> I am facing an issue with my Cassandra cluster; we are observing connection 
> timeout errors with BusyPoolException.
> Logs:
> java.util.concurrent.ExecutionException: 
> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) 
> tried for query failed (tried: /x.x.x.x:9042 
> (com.datastax.driver.core.exceptions.OperationTimedOutException: 
> [/x.x.x.x:9042] Timed out waiting for server response), 
> cassandra/x.x.x.x:9042 
> (com.datastax.driver.core.exceptions.BusyPoolException: 
> [cassandra/x.x.x.x:9042] Pool is busy (no available connection and the queue 
> has reached its max size 0)))java.util.concurrent.ExecutionException: 
> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) 
> tried for query failed (tried: /x.x.x.x:9042 
> (com.datastax.driver.core.exceptions.OperationTimedOutException: 
> [/x.x.x.x:9042] Timed out waiting for server response), 
> cassandra/x.x.x.x:9042 
> (com.datastax.driver.core.exceptions.BusyPoolException: 
> [cassandra/x.x.x.x:9042] Pool is busy (no available connection and the queue 
> has reached its max size 0))) at 
> com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513) 
> at 
> akka.persistence.cassandra.package$ListenableFutureConverter$$anon$2.$anonfun$run$2(package.scala:50)
>  at scala.util.Try$.apply(Try.scala:213) at 
> akka.persistence.cassandra.package$ListenableFutureConverter$$anon$2.run(package.scala:50)
>  at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:47) at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:47)
>  at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source) at 
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown 
> Source) at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source) 
> at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) at 
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)Caused 
> by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) 
> tried for query failed (tried: /x.x.x.x:9042 
> (com.datastax.driver.core.exceptions.OperationTimedOutException: 
> [/x.x.x.x:9042] Timed out waiting for server response), 
> cassandra/x.x.x.x:9042 
> (com.datastax.driver.core.exceptions.BusyPoolException: 
> [cassandra/x.x.x.x:9042] Pool is busy (no available connection and the queue 
> has reached its max size 0))) at 
> com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:283)
>  at 
> com.datastax.driver.core.RequestHandler.access$1200(RequestHandler.java:61) 
> at 
> com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:375)
>  at 
> com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.onFailure(RequestHandler.java:444)
>  at 
> com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1015)
>  at 
> com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
>  at 
> com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1137)
>  at 
> com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:707)
>  at 
> com.google.common.util.concurrent.AbstractFuture$TrustedFuture.addListener(AbstractFuture.java:112)
>  at com.google.common.util.concurrent.Futures.addCallback(Futures.java:996) 
> at 
> com.datastax.driver.core.GuavaCompatibility.addCallback(GuavaCompatibility.java:112)
>  at 
> 

[jira] [Updated] (CASSANDRA-15906) Queries on KEYS 2i are broken by DROP COMPACT STORAGE on 3.0

2020-06-29 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15906:
-
Test and Documentation Plan: upgrade dtest included
 Status: Patch Available  (was: Open)

The fix is pretty simple: we can just skip the validation if the table has a 
KEYS index. More importantly, I've added a dtest covering the upgrade path of 
KEYS 2i to 4.0, which in particular demonstrates the problem described (as DROP 
COMPACT STORAGE is part of that upgrade path).
||C* patch||dtest patch||CI||
|[3.0|https://github.com/pcmanus/cassandra/commits/C-15906-3.0]|[dtest|https://github.com/pcmanus/cassandra-dtest/commits/test_keys_2i_upgrade]|[#169|https://ci-cassandra.apache.org/job/Cassandra-devbranch/169/]|

Note that there is only a 3.0 patch since as said in the description, 3.11+ is 
not affected.


> Queries on KEYS 2i are broken by DROP COMPACT STORAGE on 3.0
> 
>
> Key: CASSANDRA-15906
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15906
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> From 3.0 onwards, the declared columns of a thrift table are internally 
> static columns. While the table is compact, this 
> After DROP COMPACT STORAGE is used on a table that has a KEYS 2i, queries 
> that use that index will start failing with:
> {noformat}
> Queries using 2ndary indexes don't support selecting only static columns
> {noformat}
> In 3.0, we don't support index on static columns and have that validation 
> that rejects 2i queries on static columns. But the declared columns of 
> compact table are static under the hood, and while this specific validation 
> is skipped while the table is compact, it isn't anymore after the DROP 
> COMPACT STORAGE.
> Note that internally, nothing changes with the DROP COMPACT STORAGE, and the 
> 2i queries would still work as well as before, it is just that they are 
> rejected.
> Also note that this is only a problem in 3.0. In 3.11, static column indexes 
> were added (CASSANDRA-8103) and thus this validation has been removed, and 
> everything works as it should.
> However, since DROP COMPACT STORAGE is a mandatory step for compact tables 
> before upgrading to 4.0, fixing this annoyance in 3.0 would avoid forcing 
> users with KEYS 2i on 3.0 to upgrade to 3.11 before going to 4.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15906) Queries on KEYS 2i are broken by DROP COMPACT STORAGE on 3.0

2020-06-29 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15906:
-
Fix Version/s: 3.0.x

> Queries on KEYS 2i are broken by DROP COMPACT STORAGE on 3.0
> 
>
> Key: CASSANDRA-15906
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15906
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 3.0.x
>
>
> From 3.0 onwards, the declared columns of a thrift table are internally 
> static columns. While the table is compact, this 
> After DROP COMPACT STORAGE is used on a table that has a KEYS 2i, queries 
> that use that index will start failing with:
> {noformat}
> Queries using 2ndary indexes don't support selecting only static columns
> {noformat}
> In 3.0, we don't support index on static columns and have that validation 
> that rejects 2i queries on static columns. But the declared columns of 
> compact table are static under the hood, and while this specific validation 
> is skipped while the table is compact, it isn't anymore after the DROP 
> COMPACT STORAGE.
> Note that internally, nothing changes with the DROP COMPACT STORAGE, and the 
> 2i queries would still work as well as before, it is just that they are 
> rejected.
> Also note that this is only a problem in 3.0. In 3.11, static column indexes 
> were added (CASSANDRA-8103) and thus this validation has been removed, and 
> everything works as it should.
> However, since DROP COMPACT STORAGE is a mandatory step for compact tables 
> before upgrading to 4.0, fixing this annoyance in 3.0 would avoid forcing 
> users with KEYS 2i on 3.0 to upgrade to 3.11 before going to 4.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15906) Queries on KEYS 2i are broken by DROP COMPACT STORAGE on 3.0

2020-06-29 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15906:
-
 Bug Category: Parent values: Degradation(12984)Level 1 values: Other 
Exception(12998)
   Complexity: Low Hanging Fruit
Discovered By: Unit Test
 Severity: Low
   Status: Open  (was: Triage Needed)

> Queries on KEYS 2i are broken by DROP COMPACT STORAGE on 3.0
> 
>
> Key: CASSANDRA-15906
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15906
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> From 3.0 onwards, the declared columns of a thrift table are internally 
> static columns. While the table is compact, this 
> After DROP COMPACT STORAGE is used on a table that has a KEYS 2i, queries 
> that use that index will start failing with:
> {noformat}
> Queries using 2ndary indexes don't support selecting only static columns
> {noformat}
> In 3.0, we don't support index on static columns and have that validation 
> that rejects 2i queries on static columns. But the declared columns of 
> compact table are static under the hood, and while this specific validation 
> is skipped while the table is compact, it isn't anymore after the DROP 
> COMPACT STORAGE.
> Note that internally, nothing changes with the DROP COMPACT STORAGE, and the 
> 2i queries would still work as well as before, it is just that they are 
> rejected.
> Also note that this is only a problem in 3.0. In 3.11, static column indexes 
> were added (CASSANDRA-8103) and thus this validation has been removed, and 
> everything works as it should.
> However, since DROP COMPACT STORAGE is a mandatory step for compact tables 
> before upgrading to 4.0, fixing this annoyance in 3.0 would avoid forcing 
> users with KEYS 2i on 3.0 to upgrade to 3.11 before going to 4.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15906) Queries on KEYS 2i are broken by DROP COMPACT STORAGE on 3.0

2020-06-29 Thread Sylvain Lebresne (Jira)
Sylvain Lebresne created CASSANDRA-15906:


 Summary: Queries on KEYS 2i are broken by DROP COMPACT STORAGE on 
3.0
 Key: CASSANDRA-15906
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15906
 Project: Cassandra
  Issue Type: Bug
  Components: CQL/Interpreter
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne


From 3.0 onwards, the declared columns of a thrift table are internally static 
columns. While the table is compact, this 

After DROP COMPACT STORAGE is used on a table that has a KEYS 2i, queries that 
use that index will start failing with:
{noformat}
Queries using 2ndary indexes don't support selecting only static columns
{noformat}

In 3.0, we don't support index on static columns and have that validation that 
rejects 2i queries on static columns. But the declared columns of compact table 
are static under the hood, and while this specific validation is skipped while 
the table is compact, it isn't anymore after the DROP COMPACT STORAGE.

Note that internally, nothing changes with the DROP COMPACT STORAGE, and the 2i 
queries would still work as well as before, it is just that they are rejected.

Also note that this is only a problem in 3.0. In 3.11, static column indexes 
were added (CASSANDRA-8103) and thus this validation has been removed, and 
everything works as it should.

However, since DROP COMPACT STORAGE is a mandatory step for compact tables 
before upgrading to 4.0, fixing this annoyance in 3.0 would avoid forcing users 
with KEYS 2i on 3.0 to upgrade to 3.11 before going to 4.0.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-25 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17145056#comment-17145056
 ] 

Sylvain Lebresne commented on CASSANDRA-13994:
--

Getting back to the KEYS index question.

bq. I don't think user can upgrade to 4.0 at all if they still have KEYS index.

I was wrong.

I tested it now (see [this upgrade 
test|https://github.com/pcmanus/cassandra-dtest/commit/09a6a9888a73eb14613eaecb4dc7e5cba9a46765#diff-59d5ceaa3ba81b0a9a360dedcf3bed16R378]):
 if you create a KEYS index through thrift (in 2.x/3.x), {{DROP COMPACT 
STORAGE}} on the base table and then upgrade from 3.11 to 4.0, this 'just 
works'™ (a rolling upgrade can be done while continuing to use the KEYS index 
before, during and after the upgrade).

I thought this would have broken because 4.0 crashes if it finds tables with 
compact storage "flags" when reading the schema system tables, but 2i metadata 
are not written there so this passes. We still hit [this 
warning|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/TableMetadata.java#L140]
 (and we should remove the calls to {{isDense}} and {{isCompound}} in 
{{CassandraIndex}} to avoid that), but that is the only consequence.

So my opinion here is that we should keep KEYS indexes for now (so revert 
their removal by this ticket), as we currently imo don't have a good upgrade 
story for them otherwise. We can look at this more closely later, but for me, 
it's not urgent, as their code is not that complex and is fairly isolated.

I'll create a followup ticket soonish to add that upgrade test mentioned above 
and discuss a few minor related points, but as far as this ticket goes, let's 
keep KEYS indexes.


> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-alpha
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now and ask users to migrate off those for now and 
> completely remove it from future releases. It's just a couple of classes 
> though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-25 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17144894#comment-17144894
 ] 

Sylvain Lebresne commented on CASSANDRA-13994:
--

bq. can someone please clarify if we want to move forward with this?

I vote yes (see additional context on the mailing list).

bq. This ticket was deemed too risky and would 'invalidate testing'

I don't know where those qualifications come from, nor what they were based 
on, but they simply don't match what this ticket does or the attached patch. 
This ticket just removes dead code, not _that_ much of it, and doing so hardly 
amounts to delicate surgery. There is no testing invalidation nor any invasive 
change involved. This is fake news.

I think people just got scared because this is compact storage and that has 
historically been messy. But the truth is that nearly all the complex and 
invasive removal of legacy code was committed _years_ ago, mainly by 
CASSANDRA-5, CASSANDRA-12716 and CASSANDRA-10857. This ticket is just 
cleaning up 2 small left-overs, that's all.


> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-alpha
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now and ask users to migrate off those for now and 
> completely remove it from future releases. It's just a couple of classes 
> though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-19 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140578#comment-17140578
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

Alright, my "tomorrow" is off by 1, but pushed an additional commit to 
implement the optimization suggested by Benedict above. Restarted CI for good 
measure.

||branch||CI||
|[3.0|https://github.com/pcmanus/cassandra/tree/C-12126-3.0]|[Run 
#155|https://ci-cassandra.apache.org/job/Cassandra-devbranch/155/]|
|[3.11|https://github.com/pcmanus/cassandra/tree/C-12126-3.11]|[Run 
#156|https://ci-cassandra.apache.org/job/Cassandra-devbranch/156/]|
|[4.0|https://github.com/pcmanus/cassandra/tree/C-12126-4.0]|[Run 
#157|https://ci-cassandra.apache.org/job/Cassandra-devbranch/157/]|


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to a propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the 
> value written in step 1. This step is as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read Lamport's “Paxos Made Simple” paper, section 2.3 talks about this 
> issue: how learners can find out whether a majority of the acceptors have 
> accepted the proposal. 
> In step 3, it is correct that we propose the value again since we don't know 
> if it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and more than one acceptor but not a majority has something in 
> flight, we have no way of knowing if it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means we have a majority 
> confirming that nothing was accepted by a majority. I think we should run a 
> propose step here with an empty commit, and that will cause the write from 
> step 1 to never be visible afterwards. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138578#comment-17138578
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

Ok, I understand what you are suggesting now and I agree this should work as 
well. And it is indeed more optimal.

I like to think of our algorithm as "pure Paxos instances" separated by the MRC 
to tell us when we can forget the previous instance and start a new one. 
Committing empty updates like any other update still fits that mental model, 
while your suggestion adds a bit of a special case in that it bends the Paxos 
rules slightly, allowing us to sometimes ignore a previously accepted value in a 
promise (when it's empty). Which is not a criticism, just thinking out loud. 
It's more performant, and this is likely worth the slight special-casing since 
it's not too hard to reason about its correctness.

I'll sleep on it and switch to your suggestion tomorrow (the change is trivial; 
I just need to massage an appropriate comment to explain it).


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138473#comment-17138473
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

I'll have to apologize, but I don't understand what you are suggesting.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138455#comment-17138455
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

bq. I'm reasonably sure it cannot be necessary for us to commit an empty 
proposal, because we do not ever need to witness it.

We may have to be precise. We do not need to "apply" an empty commit, since 
it's a no-op, and the patch actually ensures we don't bother. But a "commit" 
does something else as well: it updates the "mrc" value, and _that_ needs to be 
done. Otherwise, if we _accept_ an empty proposal yet do not update the "mrc" 
value, we will not make progress anymore (well, not without additional 
modifications to the algorithm).

But I could be misunderstanding what you are suggesting here. I'll note though, 
just in case that helps, that the logic I'm calling faulty is not the _commit_ 
of empty updates (though, as said above, I think it's necessary for the sake of 
the mrc value), it's the fact that we don't replay the _proposal_ of empty updates.
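
To make the above concrete, here is a minimal, hypothetical acceptor-side 
sketch (illustrative only, not the actual Cassandra paxos state code; the class 
and method names are made up): applying an empty commit is a no-op for the 
data, but the commit still has to advance the "mrc", otherwise the same 
in-progress instance keeps getting replayed and no progress is made.

{code:java}
// Hypothetical sketch; all names are illustrative.
final class AcceptorSketch
{
    private long mostRecentCommit = -1; // the "mrc" discussed above

    void commit(long ballot, boolean emptyUpdate)
    {
        if (!emptyUpdate)
            applyToBaseTable(ballot); // only non-empty updates touch the data

        // Every commit, empty or not, advances the mrc so that the corresponding
        // Paxos instance can be forgotten and later rounds can make progress.
        mostRecentCommit = Math.max(mostRecentCommit, ballot);
    }

    private void applyToBaseTable(long ballot) { /* elided */ }

    long mostRecentCommit() { return mostRecentCommit; }
}
{code}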

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-17 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138370#comment-17138370
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

I noticed that the previous version of the patches wasn't working in all cases 
due to an existing quirk of the CAS implementation.

Namely, accepted updates that were empty were not replayed by 
{{beginAndRepairPaxos}}. Which is a problem for the new empty commits made 
during serial reads/non-applying CAS. I added tests to show that if the commit 
messages for those empty commits were lost/delayed, we could still have 
linearizability violations.

Now, the logic of not replaying empty updates looks wrong to me. There 
shouldn't be anything special about an empty update, and if one is explicitly 
accepted by a quorum of nodes, we shouldn't ignore it, or we break the Paxos 
algorithm (as the tests I added kind of demonstrate).

To be clear, that logic was added *by me* in CASSANDRA-6012 and that was the 
sole purpose of that ticket. Except that I can't make sense of my reasoning 
back then, and since I didn't include a test to demonstrate the problem I was 
solving (which was wrong, mea culpa), I have to assume that I was just confused 
(maybe I mixed up promised ballots and accepted ones in my head?). Anyway, I 
think the fix here is simply to remove that bad logic, which fixes the issue, 
and I included an additional commit for that.
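
For illustration, the shape of the change on the replay side is roughly the 
following. This is only a hedged sketch: the real logic lives in 
{{beginAndRepairPaxos}}, and the names below are made up, not the actual code.

{code:java}
// Hedged sketch of the replay decision; names are illustrative only.
import java.util.Optional;

final class RepairReplaySketch
{
    record InProgress(long ballot, boolean isEmpty) {}

    // Pre-fix behaviour: an accepted-but-uncommitted proposal was never replayed when empty.
    static boolean shouldReplayBefore(Optional<InProgress> inProgress, long mostRecentCommit)
    {
        return inProgress.filter(p -> !p.isEmpty() && p.ballot() > mostRecentCommit).isPresent();
    }

    // Post-fix behaviour: anything accepted more recently than the mrc is replayed,
    // empty or not, as plain Paxos requires.
    static boolean shouldReplayAfter(Optional<InProgress> inProgress, long mostRecentCommit)
    {
        return inProgress.filter(p -> p.ballot() > mostRecentCommit).isPresent();
    }
}
{code}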


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-17 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17138278#comment-17138278
 ] 

Sylvain Lebresne commented on CASSANDRA-13994:
--

I have made a first pass of review and offered a few remarks above.

But I think this ticket is hung up on us deciding whether removing the KEYS 
2ndary index code is ok or not. And this yields, to me, the question of what 
the upgrade path to 4.0 is for users that still have KEYS indexes (which, as a 
reminder, could only be created with Thrift, but could be _used_ with CQL and 
thus may still be around).

Because, while I haven't tested this myself, I suspect we have a hole here.

Namely, KEYS indexes were compact tables, and 4.0 does not *start* if there are 
still compact tables. And while for user tables users are asked to use {{DROP 
COMPACT STORAGE}} before upgrading, this cannot be done on a KEYS index (there 
is just no syntax for it), so unless there is code I'm not aware of (and please, 
someone correct me if I'm wrong), I don't think users can upgrade to 4.0 at all 
if they still have a KEYS index. They'd have to drop those indexes first.

So if I'm right here, this technically means removing the KEYS index code in 4.0 
is fine, since you cannot upgrade in the first place if you have a KEYS index. 
But the more important question for 4.0, imo, is what the upgrade path is for 
users that have a KEYS index in 3.X.

Currently (without code changes), the only available option I can think of is 
that, before upgrading to 4.0, users would have to 1) drop their KEYS index and 
then 2) re-create a "normal" (non-KEYS) equivalent index (a concrete sketch of 
this is at the end of this comment).

Are we comfortable with that being the upgrade path for KEYS indexes?

Personally, I'm not sure I am, because this is not a seamless upgrade: between 
the 1) and 2) above there is a window where there is no accessible index, so if 
the user application relies on it, performing the upgrade means a period of 
downtime for the application. However, if we want a more seamless upgrade, we 
need to figure something out, and that probably involves non-trivial amounts of 
code and testing. And, playing devil's advocate, KEYS indexes being so old, 
maybe nobody that plans to upgrade to 4.0 has them anymore, and maybe it's not 
worth bothering?

So I could use others' opinions here.

Tl;dr, this ticket raises the point that "Oops, I'm not sure we have thought 
through the question of upgrading to 4.0 for KEYS indexes". And tbc, it's only 
indirectly related to this ticket, but it is still something we need to figure 
out, and I'd say before 4.0-alpha. But I'm happy to create a separate ticket 
specific to that question if that helps.
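
For concreteness, the drop-and-recreate path above could look something like 
the following from a client. This is only a hypothetical sketch: the keyspace, 
table, index and column names are made up, and the DataStax Java driver (3.x 
API) is assumed.

{code:java}
// Hypothetical pre-upgrade step; all identifiers below are illustrative.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class KeysIndexPreUpgrade
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ks"))
        {
            // 1) Drop the legacy (Thrift-created) KEYS index before upgrading to 4.0...
            session.execute("DROP INDEX IF EXISTS ks.legacy_keys_idx");

            // 2) ...and re-create an equivalent "normal" CQL index. Anything relying
            //    on the index is effectively without it between these two statements.
            session.execute("CREATE INDEX IF NOT EXISTS new_idx ON ks.t (col)");
        }
    }
}
{code}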

> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-alpha
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now and ask users to migrate off those for now and 
> completely remove it from future releases. It's just a couple of classes 
> though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-15 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-12126:
-
Test and Documentation Plan: Included in-jvm dtests
 Status: Patch Available  (was: Open)

I'm only semi-sure how to parse Jenkins CI results these days but from what I 
can tell, all failures are unrelated so marking ready for review.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-12 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134239#comment-17134239
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

Ok, I've rebased the patch against 4.0 and started CI on it all:
||branch||CI||
|[3.0|https://github.com/pcmanus/cassandra/tree/C-12126-3.0]|[Run 
#146|https://ci-cassandra.apache.org/job/Cassandra-devbranch/146/]|
|[3.11|https://github.com/pcmanus/cassandra/tree/C-12126-3.11]|[Run 
#147|https://ci-cassandra.apache.org/job/Cassandra-devbranch/147/]|
|[4.0|https://github.com/pcmanus/cassandra/tree/C-12126-4.0]|[Run 
#148|https://ci-cassandra.apache.org/job/Cassandra-devbranch/148/]|

I included a commit to add the flag that disables the new empty commit for 
SERIAL reads, as suggested by [~bdeggleston] earlier. I'm still slightly on the 
fence about the need for such a flag, but I call it "unsafe" 
({{-Dcassandra.unsafe.disable-serial-reads-linearizability}} to be specific) 
and log a warning when it is used, so I'm at peace with that.
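
Wiring for such a flag boils down to something like the sketch below. This is a 
hedged illustration only: apart from the system property name above, the class, 
method, and message are made up, and slf4j is assumed on the classpath.

{code:java}
// Hypothetical sketch; only the system property name comes from the patch discussion.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class SerialReadLinearizabilitySketch
{
    private static final Logger logger = LoggerFactory.getLogger(SerialReadLinearizabilitySketch.class);

    static final boolean DISABLED =
        Boolean.getBoolean("cassandra.unsafe.disable-serial-reads-linearizability");

    static void maybeWarn()
    {
        if (DISABLED)
            logger.warn("SERIAL/LOCAL_SERIAL reads will NOT be linearizable because "
                        + "cassandra.unsafe.disable-serial-reads-linearizability is set; "
                        + "this is unsafe and may be removed without warning in a future release.");
    }
}
{code}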

I'll note for future reviewers that while the 3.11 branch is almost a 
straightforward merge up from 3.0, there are minor differences on the 4.0 
branch, namely:
 * the added in-jvm dtests needed a few changes to reflect 4.0 changes. To make 
that easier, I squashed 2 of the commits from the 3.0/3.11 branches, which is 
why that branch has one less commit.
 * there are a few changes related to the translation of 
{{WriteTimeoutException}} into {{CasWriteTimeoutException}} (I pushed it down 
in some cases). I believe this fixes a minor "bug" where the "contentions" 
number we returned with {{CasWriteTimeoutException}} was potentially inaccurate 
(namely, if we timed out in {{beginAndRepairPaxos}}, contentions leading to that 
exception would be ignored).

I'll wait on getting usable CI results before officially marking it 'ready to 
review', but consider it that in spirit if anyone is burning to look at this.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-10 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130721#comment-17130721
 ] 

Sylvain Lebresne commented on CASSANDRA-13994:
--

{quote}please don't remove it this far into the alphas.
{quote}
{quote}removing protocol V3 at this point would delay the ability of many to 
upgrade to Cassandra 4.0
{quote}
Fair enough, I don't mind keeping it more than that.

Though, fwiw, I am a bit surprised by your points. Protocol v4 was added 
back in 2.2.0, and was a small enough iteration over V3 that the main drivers 
supported it right away. And C* 4.0 does not support upgrading from 2.X 
directly at all (in fact, a reminder that you need to upgrade from at least 
3.0.13 or 3.11.0 according to the NEWS file). Requiring that users have 
upgraded their driver version in the last 5 years didn't feel, a priori, like a 
big constraining ask to me. But I trust that your "lots of people" and "of many" 
are backed by data, so again, happy to keep V3, I just admit surprise. TIL.

> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-rc
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make sure it's backwards-compatible, but we 
> might want to make sure it's just a "normal" table and doesn't have redundant 
> columns.
> 2. Compact Tables were building indexes in {{KEYS}} mode. Removing it is 
> trivial, but this would mean that all built indexes will be defunct. We could 
> log a warning for now and ask users to migrate off those for now and 
> completely remove it from future releases. It's just a couple of classes 
> though.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13994) Remove COMPACT STORAGE internals before 4.0 release

2020-06-09 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129271#comment-17129271
 ] 

Sylvain Lebresne commented on CASSANDRA-13994:
--

Looked at the patch. I left a few nitpicks [on this 
commit|https://github.com/pcmanus/cassandra/commit/0ab608cbb6f78657ae6b3e99cea0f47d84aa3dad].
 More general remarks/questions:
 * I'd be in favor of removing support for the native protocol V3 while at it. 
V4 has been out since pre-C* 3.0, and V3 still uses the cell layout in the 
paging state, which forces us to keep some legacy code around like 
{{CellInLegacyOrderIterator}} in {{BTreeRow.java}} (plus, some complexity in 
{{PagingState}} can be nixed if we remove it). Overall, I find it doubtful that 
anyone would still be on V3 on 3.X+, but even if someone is, I think forcing 
them to upgrade to V4 before upgrading to 4.0 is a good idea even outside of the 
benefit of allowing us to remove some code.
 * In {{TableMetadata:}}
 ** I think we should remove the {{#isCompound}} method, as only compact tables 
could ever be non-compound, so "compound" does not make sense anymore. Also, we 
shouldn't remove the {{&& flags.contains(Flag.COMPOUND)}} part from 
{{Flag#isSupported}}, as a non-compound table is _not_ supported anymore and is 
indicative of someone forgetting to use {{DROP COMPACT STORAGE}} (see the 
sketch right after this list of remarks).
 ** We should, however, keep the {{CompactTable#isSuperColumnMapColumn}} method 
and its call (though it's probably not worth keeping the {{CompactTables}} 
class for that; a static method in {{TableMetadata}} is probably fine; I pushed 
a commit doing just that 
[here|https://github.com/pcmanus/cassandra/commit/0ab608cbb6f78657ae6b3e99cea0f47d84aa3dad]
 if you'd like).
 * In {{SchemaEvent.java}}, in {{repr(TableMetadata)}}, we should keep the 
"isCounter" case (we still support counter tables). However, we should remove 
the "isCompound" entry (for the reason mentioned above).
 * In {{ColumnFamilyStore}}, at the end of the ctor, the code to detect 
unsupported indexes has been commented out. Is that intentional?
 * Maybe worth removing the references to {{COMPACT STORAGE}} in the doc, 
namely those in {{doc/source/cql/ddl.rst}}.
 * I'm a bit unsure what our story about upgrading KEYS indexes is, and in 
particular, I'm unsure we can remove them like that. That is, while we could 
ask users to drop their KEYS index and re-create them afterwards, this cannot 
be done without downtime (for anything touching the index), and are we OK with 
that?
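
To illustrate the {{Flag#isSupported}} point above, here is a simplified, 
hypothetical sketch; the enum and method below are stand-ins, not the actual 
{{TableMetadata}} code. The idea is that the absence of COMPOUND can only mean 
a legacy compact table, which should be rejected rather than silently treated 
as a normal table.

{code:java}
// Simplified stand-in for the flag check discussed above; illustrative only.
import java.util.EnumSet;
import java.util.Set;

final class FlagsSketch
{
    enum Flag { SUPER, COUNTER, DENSE, COMPOUND }

    static boolean isSupported(Set<Flag> flags)
    {
        // SUPER and DENSE, as well as the *absence* of COMPOUND, all indicate a
        // compact table that should have gone through DROP COMPACT STORAGE
        // before upgrading; counter tables remain supported.
        return !flags.contains(Flag.SUPER)
            && !flags.contains(Flag.DENSE)
            && flags.contains(Flag.COMPOUND);
    }

    public static void main(String[] args)
    {
        System.out.println(isSupported(EnumSet.of(Flag.COMPOUND)));  // true: regular table
        System.out.println(isSupported(EnumSet.noneOf(Flag.class))); // false: legacy compact table
    }
}
{code}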

{quote}About the system table Built_indexes, I saw there is some 
SystemKeyspaceMigrator40 class, I guess on start we can check and migrate this 
table there (if it was not already migrated) ?
{quote}
That's an option, and I'm ok if that's done. But fwiw, I'm equally ok with 
letting it be. Basically, that table simply has a {{value}} column that is 
unused, but as that table is meant for internal consumption in the first place, 
that feels like a pretty minor detail. So either way is fine with me.
{quote}We definitely need to be very careful when removing usage of compact 
storage methods, since most of the usage is in around the storage engine.
{quote}
I wanted to point out that all the hard, deep, storage-engine-level changes 
were done when removing Thrift, some time ago. The patch here is actually 
pretty simple and almost exclusively touches CQL. It's also pretty easy to 
convince oneself that the majority of the change is just removing dead code.

The one exception imo is the 2ndary index question, and that's worth being 
careful about, but the rest is pretty straightforward.
{quote}Also, it might be useful to know the impact of the removal and whether 
or not we got anything from it performance-wise
{quote}
Related to my previous point, I'm cool if we do performance testing, that never 
hurts, but I think this is *very* low on the list of tickets that justify the 
effort. Again, we're merely removing a bunch of 'if's in CQL that are never 
taken anymore, so the performance impact is almost surely not measurable, one 
way or another.

> Remove COMPACT STORAGE internals before 4.0 release
> ---
>
> Key: CASSANDRA-13994
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13994
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Local Write-Read Paths
>Reporter: Alex Petrov
>Assignee: Ekaterina Dimitrova
>Priority: Low
> Fix For: 4.0, 4.0-rc
>
>
> 4.0 comes without thrift (after [CASSANDRA-5]) and COMPACT STORAGE (after 
> [CASSANDRA-10857]), and since Compact Storage flags are now disabled, all of 
> the related functionality is useless.
> There are still some things to consider:
> 1. One of the system tables (built indexes) was compact. For now, we just 
> added {{value}} column to it to make 

[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-06-05 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126937#comment-17126937
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

bq. The test cases I provided demonstrate several consistency violations during 
range movements.

Yes, sorry I hadn't read them before commenting. And I certainly agree those 
are problematic (I was about to open a ticket so it's tracked, but I'd say 
CASSANDRA-15745 kind of covers those).

bq. There are also (more debatably) issues with TTL on system.paxos

Agreed, this has always been a weak point. It does feel somewhat separate from 
the other consistency points though, and maybe short term we can just offer a 
way to override the TTL (with documentation on the tradeoffs involved)?

bq. Also, mixing LOCAL_SERIAL and SERIAL is entirely unsafe

Yeah. I'm not sure how to fix that one without a breaking API change though 
(namely, limiting their unrestricted use together). It's not "that" different 
from the fact that we allow unrestricted mixing of serial and non-serial 
operations. Which is something I don't like and am happy to discuss moving 
forward, but it is imo post-3.X material in the best of cases.

bq. I think it is worth considering if we should instead aggressively try to 
remedy all of the known issues, have a strong verification push, and then roll 
out all of the changes at-once - including a fix for this that does not regress 
performance.

It is certainly an option worth bringing up, and thank you for that. I'm not 
sure how to really know which option is best though, so I can only offer my 
current opinion.

Which is that I feel this is a very serious issue. And I don't mean that 
in a way that diminishes the seriousness of the other problems you mentioned, I 
mean that in absolute terms (the range movement issues are also fairly bad imo, 
for instance). But leaving fewer of our known serious issues unaddressed feels 
better than not, so I'd personally prefer fixing this issue ASAP. Basically, I'm 
worried that waiting for a more all-encompassing fix might take us quite some 
time, with no absolute guarantee that we'll be collectively at ease with 
pushing that to 3.X.

Anyway, I'd like to move this forward personally. How do we decide if we do?


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data 

[jira] [Updated] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

2020-05-27 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15805:
-
  Fix Version/s: (was: 3.11.x)
 (was: 3.0.x)
 3.11.7
 3.0.21
  Since Version: 3.0 alpha 1
Source Control Link: 
[8358e19840d352475a5831d130ff3c43a11f2f4e|https://github.com/apache/cassandra/commit/8358e19840d352475a5831d130ff3c43a11f2f4e],
 
[c8a2834606d683ba9945e9cc11bdb4207ce269d1|https://github.com/apache/cassandra/commit/c8a2834606d683ba9945e9cc11bdb4207ce269d1]
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones 
> interacts with collection tombstones
> --
>
> Key: CASSANDRA-15805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/SSTable
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 3.0.21, 3.11.7
>
>
> The legacy reading code ({{LegacyLayout}} and 
> {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly 
> the case where a range tombstone covering multiple rows interacts with a 
> collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
> b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 
> 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the following is ran on 2.X (with everything either flushed in the same 
> table or compacted together), then this will result in the inserted row being 
> duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789 and this 
> reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, 
> the additional code added to CASSANDRA-15789 to force merging duplicated rows 
> if they are produced _will_ end up fixing this as a consequence (assuming 
> there is no variation of this problem that leads to other visible issues than 
> duplicated rows). That said, I "think" we'd still rather fix the source of 
> the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-05-27 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117833#comment-17117833
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

bq. But we do have other serious consistency violations that should also be 
fixed.

Could you expand on that?


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15778) CorruptSSTableException after a 2.1 SSTable is upgraded to 3.0, failing reads

2020-05-27 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117827#comment-17117827
 ] 

Sylvain Lebresne commented on CASSANDRA-15778:
--

That patch looks like a reasonable solution to me, at least from my 
understanding of the issue.

Small comments on the code itself:
* I'd put a comment in {{AlterTableStatement}} pointing to this ticket (it may 
feel like a peculiar special case to future readers without context).
* In {{AbstractType}}, the changes to {{writeValue}}/{{writtenLength}} feel 
confusing to me, and if the new code is ever triggered, it would mean we 
silently drop a value on the floor (we get a non-empty value, but the type says 
the value should be empty, so we'd write nothing), and that doesn't feel like a 
good idea. Instead of specializing the 0-size case, I'd just add an {{assert 
valueLengthIfFixed < 0 || value.remaining() == valueLengthIfFixed}} to 
basically ensure we're not going to write something we don't know how to read 
(and effectively forbid the call of those methods for {{EmptyType}}, in 
conjunction with the existing assert); see the sketch after this list.
* Assuming we agree on the previous point, I'd prefer not overriding the 
methods in {{EmptyType}}. For the write ones, it wouldn't add anything, and 
overriding {{readValue}} feels confusing when the rest of the code ensures we 
can never write an empty value through those methods.
* Nit: LegacySchemaMigrator has unused leftover imports 
({{java.io.InvalidClassException}} and 
{{net.bytebuddy.implementation.bytecode.Throw}}).
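
For clarity, the suggested guard would sit roughly like this. This is a hedged 
sketch only: the simplified method shape and the {{DataOutput}} parameter are 
stand-ins for the real signatures, and only the assert expression itself comes 
from the remark above.

{code:java}
// Sketch of the suggested assert; everything but the assert expression is illustrative.
import java.io.DataOutput;
import java.io.IOException;
import java.nio.ByteBuffer;

abstract class AbstractTypeSketch
{
    // < 0 means values are variable-length; >= 0 means every value is exactly that long
    // (for an EmptyType-like type, this would be 0).
    protected abstract int valueLengthIfFixed();

    public void writeValue(ByteBuffer value, DataOutput out) throws IOException
    {
        int valueLengthIfFixed = valueLengthIfFixed();
        // Refuse to write something we would not know how to read back.
        assert valueLengthIfFixed < 0 || value.remaining() == valueLengthIfFixed
            : "expected " + valueLengthIfFixed + " bytes but got " + value.remaining();

        byte[] bytes = new byte[value.remaining()];
        value.duplicate().get(bytes);
        out.write(bytes);
    }
}
{code}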


> CorruptSSTableException after a 2.1 SSTable is upgraded to 3.0, failing reads
> -
>
> Key: CASSANDRA-15778
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15778
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Local/SSTable
>Reporter: Sumanth Pasupuleti
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 3.0.x
>
>
> Below is the exception with stack trace. This issue is consistently 
> reproduce-able.
> {code:java}
> ERROR [SharedPool-Worker-1] 2020-05-01 14:57:57,661 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[SharedPool-Worker-1,5,main]ERROR [SharedPool-Worker-1] 2020-05-01 
> 14:57:57,661 AbstractLocalAwareExecutorService.java:169 - Uncaught exception 
> on thread 
> Thread[SharedPool-Worker-1,5,main]org.apache.cassandra.io.sstable.CorruptSSTableException:
>  Corrupted: 
> /mnt/data/cassandra/data//  at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator$Reader.hasNext(AbstractSSTableIterator.java:349)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:220)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.columniterator.SSTableIterator.hasNext(SSTableIterator.java:33)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) 
> ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:131)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77)
>  ~[nf-cassandra-3.0.19.8.jar:3.0.19.8] at 
> 

[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-05-27 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117628#comment-17117628
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

bq. I'm amenable to such flag

Actually, let me rephrase that a bit. I'd *really* prefer not adding such a 
flag. If someone is ok with serializability without linearizability, then they 
can use QUORUM reads, and given how things are implemented, that provides 
(non-strict) serializability. Granted, for someone that uses SERIAL today, is 
ok with the lack of linearizability and can't afford the performance penalty, 
it'll require a client-side change, which this flag would avoid, so there is 
not zero value to such a flag. But I suspect the set of users fitting that 
category (knowingly ok with the lack of linearizability) is really, really 
small, and we always have to make trade-offs. So in that case I feel adding one 
more flag, one I consider dangerous, is not worth it. So to clarify, if a 
consensus appears for such a flag, so be it, I'll add it, but I'm personally 
not neutral either.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-05-27 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117615#comment-17117615
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

{quote}I think we should include a flag to disable the fix
{quote}
The option of having a flag occurred to me, but I rejected it initially because 
I continue to believe the current behavior is wrong (a moral judgment, I guess) 
and in principle, having a "please, make my database broken" flag does not feel 
like a good idea.

But I reckon that there _may_ exist advanced users that did notice the lack of 
linearizability for reads and effectively built around it knowingly, for whom 
the performance impact may be considered a regression with no upside (but if 
you sense skepticism on my part when reading that sentence, your radar is not 
completely off).

And as we're talking about a minor upgrade here, I'm amenable to such a flag, 
though I'd prefer making it clear somehow that it is unsafe/risky and something 
we may remove in the future with no particular warning.
{quote}It would be good to have a test for that as well.
{quote}
Certainly, good point, I can add the 2 missing interleavings.
{quote}do we actually claim our consistency properties are for SERIAL?
{quote}
While our official doc on the matter is certainly lacking (not spelling out many 
guarantees at all afaict, and I'm happy to piggy-back on this ticket to correct 
that), we've always implied linearizability. I have, at least, and I'm sure I 
can dig up others doing it as well on the mailing list if necessary. We did this 
both by throwing the linearizable word out from time to time, but also by 
repeatedly recommending that when a write times out, one needs to issue a 
SERIAL read to 'observe' whether that write went through or not (and as an aside, if 
you can't rely on either reads or non-applying CAS for that, I'm not even sure 
how to use LWTs, except maybe for excessively specific cases).
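To make that recommendation concrete, the client-side pattern looks roughly like the sketch below (written from memory against the 3.x-era DataStax Java driver; the keyspace, table and values are made up for the example):
{code}
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public final class LwtTimeoutExample
{
    // On an LWT write timeout the write may or may not have been committed;
    // a SERIAL read is the recommended way to observe which of the two happened.
    static boolean updateBalance(Session session)
    {
        try
        {
            ResultSet rs = session.execute(new SimpleStatement(
                "UPDATE ks.accounts SET balance = 90 WHERE id = 42 IF balance = 100"));
            return rs.wasApplied();
        }
        catch (WriteTimeoutException e)
        {
            // Outcome unknown: read back at SERIAL to see whether the update went through.
            Row row = session.execute(new SimpleStatement(
                    "SELECT balance FROM ks.accounts WHERE id = 42")
                .setConsistencyLevel(ConsistencyLevel.SERIAL)).one();
            return row != null && row.getInt("balance") == 90;
        }
    }
}
{code}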
{quote}perhaps we should instead introduce a new STRICT_SERIAL consistency level
{quote}
I'm rather cold on that because, tbh, I think non-strict serializability is a 
theoretical notion that is useless in practice and something we 
should not offer. And I'd rather avoid one more "feature" for which we spend 
our time saying "don't use it".
{quote}I've pushed various test cases
{quote}
Awesome, thanks. I'll look at integrating those in the branch if you don't mind.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should 

[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-05-26 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116578#comment-17116578
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

It definitely doesn't look good that this message comes so late, but I feel 
this is a serious issue with the {{SERIAL}}/{{LOCAL_SERIAL}} consistency levels 
since it breaks the basic guarantee they exist to provide, and as such it should 
be fixed all the way down to 3.0, and the sooner the better.

In an attempt to sum this up quickly, the problem we have here affects both 
serial reads _and_ LWT updates that do not apply (whose condition evaluates to 
{{false}}). In both cases, while the current code replays "effectively 
committed" proposals (those accepted by a majority of replicas) with 
{{beginAndRepairPaxos}}, neither makes a proposal of its own, so 
nothing prevents a proposal accepted by only a minority of replicas (say, just 
one) from being replayed (and thus committed) later.

I've pushed [2 in-jvm 
dtests|https://github.com/pcmanus/cassandra/commit/3442277905362b38e0d6a2b8170916fcfd18d469]
 that demonstrate the issue for both cases (again, serial reads and 
non-applying updates). They use "filters" to selectively drop messages so the 
failure reproduces consistently, but aren't otherwise very involved.

As [~kohlisankalp] mentioned initially, the "simplest"\[1\] way to fix this 
that I see is to commit an empty update in both cases. Actually committing, 
which sets the {{mostRecentCommit}} value in the Paxos state, ensures that no 
prior proposal can ever be replayed. I've pushed a patch to do so on 3.0/3.11 
below (will merge up on 4.0, but wanted to make sure we're ok on the approach 
first):

||version||
| [3.0|https://github.com/pcmanus/cassandra/commits/C-12126-3.0] |
| [3.11|https://github.com/pcmanus/cassandra/commits/C-12126-3.11] |

The big downside of this patch, however, is the performance impact. Currently, a 
{{SERIAL}} read (that finds nothing in progress it needs to replay) is 2 
round-trips (a prepare phase, followed by the actual read). With this patch, it 
is 3 round-trips, as we have to propose our empty commit and get acceptance (we 
don't really have to wait for responses on the commit though), which will be 
noticeable for performance-sensitive use cases. Similarly, the performance of 
LWTs that don't apply will be impacted.
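For readers less familiar with that code path, the shape of the change is roughly the schematic below. It is a sketch only: the {{Paxos}} interface is a simplified placeholder standing in for the real prepare/propose/commit plumbing, and contention/retry handling is omitted.
{code}
import java.util.UUID;

// Schematic of the coordinator-side serial read with the proposed fix;
// the Paxos interface below is a placeholder, not Cassandra's actual API.
final class SerialReadSketch
{
    interface Paxos
    {
        UUID prepareAndRepair(String key);                        // prepare phase; replays "effectively committed" proposals
        boolean propose(String key, UUID ballot, String update);  // true once a quorum has accepted
        void commit(String key, UUID ballot, String update);      // fire-and-forget, no need to wait for acks
        String quorumRead(String key);
    }

    private static final String EMPTY_UPDATE = "";

    // Before the patch: prepareAndRepair + quorumRead (2 round-trips).
    // With the patch: one extra round-trip to propose (and commit) an empty update,
    // so a proposal accepted by only a minority of replicas can never be replayed later.
    static String serialRead(Paxos paxos, String key)
    {
        UUID ballot = paxos.prepareAndRepair(key);
        if (paxos.propose(key, ballot, EMPTY_UPDATE))
            paxos.commit(key, ballot, EMPTY_UPDATE);
        return paxos.quorumRead(key);
    }
}
{code}
The non-applying LWT path gains the same extra propose/commit of an empty update instead of simply returning.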

That said, I don't see another approach to fixing this that would be as 
acceptable for 3.0/3.11 in terms of risk, and imo 'slower but correct' beats 
'faster but broken' any day, so I'm in favor of moving forward with this fix.

Opinions?



\[1\]: I mean by that both the simplicity of the change itself, and the simplicity of 
validating that it fixes the problem at hand without creating new correctness problems.


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is 

[jira] [Comment Edited] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

2020-05-26 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111019#comment-17111019
 ] 

Sylvain Lebresne edited comment on CASSANDRA-15805 at 5/26/20, 9:42 AM:


Thanks for the review. I addressed the comments, squash-cleaned, 'merged' into 
3.11 and started CI (first try at https://ci-cassandra.apache.org, not sure how 
that will go).

||branch||CI||
| [3.0|https://github.com/pcmanus/cassandra/commits/C-15805-3.0] | 
[ci-cassandra 
#134|https://ci-cassandra.apache.org/job/Cassandra-devbranch/134/] |
| [3.11|https://github.com/pcmanus/cassandra/commits/C-15805-3.11] | 
[ci-cassandra 
#135|https://ci-cassandra.apache.org/job/Cassandra-devbranch/135/] |



was (Author: slebresne):
Thanks for the review. I addressed the comments, squash-cleaned, 'merged' into 
3.11 and started CI (first try at https://ci-cassandra.apache.org, not sure how 
that will go).

||branch||CI||
| [3.0|https://github.com/pcmanus/cassandra/commits/C-15805-3.0] | 
[ci-cassandra 
#131|https://ci-cassandra.apache.org/job/Cassandra-devbranch/131/] |
| [3.11|https://github.com/pcmanus/cassandra/commits/C-15805-3.11] | 
[ci-cassandra 
#132|https://ci-cassandra.apache.org/job/Cassandra-devbranch/132/] |


> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones 
> interacts with collection tombstones
> --
>
> Key: CASSANDRA-15805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/SSTable
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> The legacy reading code ({{LegacyLayout}} and 
> {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly 
> the case where a range tombstone covering multiple rows interacts with a 
> collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 
> 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the following is ran on 2.X (with everything either flushed in the same 
> table or compacted together), then this will result in the inserted row being 
> duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789 and this 
> reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, 
> the additional code added to CASSANDRA-15789 to force merging duplicated rows 
> if they are produced _will_ end up fixing this as a consequence (assuming 
> there is no variation of this problem that leads to other visible issues than 
> duplicated rows). That said, I "think" we'd still rather fix the source of 
> the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-05-20 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne reassigned CASSANDRA-12126:


Assignee: Sylvain Lebresne

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

2020-05-20 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111019#comment-17111019
 ] 

Sylvain Lebresne edited comment on CASSANDRA-15805 at 5/20/20, 8:59 AM:


Thanks for the review. I addressed the comments, squash-cleaned, 'merged' into 
3.11 and started CI (first try at https://ci-cassandra.apache.org, not sure how 
that will go).

||branch||CI||
| [3.0|https://github.com/pcmanus/cassandra/commits/C-15805-3.0] | 
[ci-cassandra 
#131|https://ci-cassandra.apache.org/job/Cassandra-devbranch/131/] |
| [3.11|https://github.com/pcmanus/cassandra/commits/C-15805-3.11] | 
[ci-cassandra 
#132|https://ci-cassandra.apache.org/job/Cassandra-devbranch/132/] |



was (Author: slebresne):
Thanks for the review. I addressed the comments, squash-cleaned, 'merged' into 
3.11 and started CI (first try at https://ci-cassandra.apache.org, not sure how 
that will go).

||branch||CI||
| [3.0|https://github.com/pcmanus/cassandra/commits/C-15805-3.0] | 
[ci-cassandra 
#122|https://ci-cassandra.apache.org/job/Cassandra-devbranch/131/] |
| [3.11|https://github.com/pcmanus/cassandra/commits/C-15805-3.11] | 
[ci-cassandra 
#123|https://ci-cassandra.apache.org/job/Cassandra-devbranch/132/] |


> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones 
> interacts with collection tombstones
> --
>
> Key: CASSANDRA-15805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/SSTable
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> The legacy reading code ({{LegacyLayout}} and 
> {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly 
> the case where a range tombstone covering multiple rows interacts with a 
> collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 
> 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the following is ran on 2.X (with everything either flushed in the same 
> table or compacted together), then this will result in the inserted row being 
> duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789 and this 
> reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, 
> the additional code added to CASSANDRA-15789 to force merging duplicated rows 
> if they are produced _will_ end up fixing this as a consequence (assuming 
> there is no variation of this problem that leads to other visible issues than 
> duplicated rows). That said, I "think" we'd still rather fix the source of 
> the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

2020-05-20 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111019#comment-17111019
 ] 

Sylvain Lebresne edited comment on CASSANDRA-15805 at 5/20/20, 8:59 AM:


Thanks for the review. I addressed the comments, squash-cleaned, 'merged' into 
3.11 and started CI (first try at https://ci-cassandra.apache.org, not sure how 
that will go).

||branch||CI||
| [3.0|https://github.com/pcmanus/cassandra/commits/C-15805-3.0] | 
[ci-cassandra 
#122|https://ci-cassandra.apache.org/job/Cassandra-devbranch/131/] |
| [3.11|https://github.com/pcmanus/cassandra/commits/C-15805-3.11] | 
[ci-cassandra 
#123|https://ci-cassandra.apache.org/job/Cassandra-devbranch/132/] |



was (Author: slebresne):
Thanks for the review. I addressed the comments, squash-cleaned, 'merged' into 
3.11 and started CI (first try at https://ci-cassandra.apache.org, not sure how 
that will go).

||branch||CI||
| [3.0|https://github.com/pcmanus/cassandra/commits/C-15805-3.0] | 
[ci-cassandra 
#122|https://ci-cassandra.apache.org/job/Cassandra-devbranch/122/] |
| [3.11|https://github.com/pcmanus/cassandra/commits/C-15805-3.11] | 
[ci-cassandra 
#123|https://ci-cassandra.apache.org/job/Cassandra-devbranch/123/] |


> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones 
> interacts with collection tombstones
> --
>
> Key: CASSANDRA-15805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/SSTable
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> The legacy reading code ({{LegacyLayout}} and 
> {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly 
> the case where a range tombstone covering multiple rows interacts with a 
> collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 
> 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the following is ran on 2.X (with everything either flushed in the same 
> table or compacted together), then this will result in the inserted row being 
> duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789 and this 
> reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, 
> the additional code added to CASSANDRA-15789 to force merging duplicated rows 
> if they are produced _will_ end up fixing this as a consequence (assuming 
> there is no variation of this problem that leads to other visible issues than 
> duplicated rows). That said, I "think" we'd still rather fix the source of 
> the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

2020-05-19 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111019#comment-17111019
 ] 

Sylvain Lebresne commented on CASSANDRA-15805:
--

Thanks for the review. I addressed the comments, squash-cleaned, 'merged' into 
3.11 and started CI (first try at https://ci-cassandra.apache.org, not sure how 
that will go).

||branch||CI||
| [3.0|https://github.com/pcmanus/cassandra/commits/C-15805-3.0] | 
[ci-cassandra 
#122|https://ci-cassandra.apache.org/job/Cassandra-devbranch/122/] |
| [3.11|https://github.com/pcmanus/cassandra/commits/C-15805-3.11] | 
[ci-cassandra 
#123|https://ci-cassandra.apache.org/job/Cassandra-devbranch/123/] |


> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones 
> interacts with collection tombstones
> --
>
> Key: CASSANDRA-15805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/SSTable
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> The legacy reading code ({{LegacyLayout}} and 
> {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly 
> the case where a range tombstone covering multiple rows interacts with a 
> collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 
> 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the following is ran on 2.X (with everything either flushed in the same 
> table or compacted together), then this will result in the inserted row being 
> duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789 and this 
> reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, 
> the additional code added to CASSANDRA-15789 to force merging duplicated rows 
> if they are produced _will_ end up fixing this as a consequence (assuming 
> there is no variation of this problem that leads to other visible issues than 
> duplicated rows). That said, I "think" we'd still rather fix the source of 
> the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15789) Rows can get duplicated in mixed major-version clusters and after full upgrade

2020-05-18 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110270#comment-17110270
 ] 

Sylvain Lebresne commented on CASSANDRA-15789:
--

+1 from me.

> Rows can get duplicated in mixed major-version clusters and after full upgrade
> --
>
> Key: CASSANDRA-15789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15789
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>Reporter: Aleksey Yeschenko
>Assignee: Marcus Eriksson
>Priority: Normal
>
> In a mixed 2.X/3.X major version cluster a sequence of row deletes, 
> collection overwrites, paging, and read repair can cause 3.X nodes to split 
> individual rows into several rows with identical clustering. This happens due 
> to 2.X paging and RT semantics, and a 3.X {{LegacyLayout}} deficiency.
> To reproduce, set up a 2-node mixed major version cluster with the following 
> table:
> {code}
> CREATE TABLE distributed_test_keyspace.tlb (
> pk int,
> ck int,
> v map<text, text>,
> PRIMARY KEY (pk, ck)
> );
> {code}
> 1. Using either node as the coordinator, delete the row with ck=2 using 
> timestamp 1
> {code}
> DELETE FROM tbl USING TIMESTAMP 1 WHERE pk = 1 AND ck = 2;
> {code}
> 2. Using either node as the coordinator, insert the following 3 rows:
> {code}
> INSERT INTO tbl (pk, ck, v) VALUES (1, 1, {'e':'f'}) USING TIMESTAMP 3;
> INSERT INTO tbl (pk, ck, v) VALUES (1, 2, {'g':'h'}) USING TIMESTAMP 3;
> INSERT INTO tbl (pk, ck, v) VALUES (1, 3, {'i':'j'}) USING TIMESTAMP 3;
> {code}
> 3. Flush the table on both nodes
> 4. Using the 2.2 node as the coordinator, force read repar by querying the 
> table with page size = 2:
>  
> {code}
> SELECT * FROM tbl;
> {code}
> 5. Overwrite the row with ck=2 using timestamp 5:
> {code}
> INSERT INTO tbl (pk, ck, v) VALUES (1, 2, {'g':'h'}) USING TIMESTAMP 5;}}
> {code}
> 6. Query the 3.0 node and observe the split row:
> {code}
> cqlsh> select * from distributed_test_keyspace.tlb ;
>  pk | ck | v
> ++
>   1 |  1 | {'e': 'f'}
>   1 |  2 | {'g': 'h'}
>   1 |  2 | {'k': 'l'}
>   1 |  3 | {'i': 'j'}
> {code}
> This happens because the read to query the second page ends up generating the 
> following mutation for the 3.0 node:
> {code}
> ColumnFamily(tbl -{deletedAt=-9223372036854775808, localDeletion=2147483647,
>  ranges=[2:v:_-2:v:!, deletedAt=2, localDeletion=1588588821]
> [2:v:!-2:!,   deletedAt=1, localDeletion=1588588821]
> [3:v:_-3:v:!, deletedAt=2, localDeletion=1588588821]}-
>  [2:v:63:false:1@3,])
> {code}
> Which on 3.0 side gets incorrectly deserialized as
> {code}
> Mutation(keyspace='distributed_test_keyspace', key='0001', modifications=[
>   [distributed_test_keyspace.tbl] key=1 
> partition_deletion=deletedAt=-9223372036854775808, localDeletion=2147483647 
> columns=[[] | [v]]
> Row[info=[ts=-9223372036854775808] ]: ck=2 | del(v)=deletedAt=2, 
> localDeletion=1588588821, [v[c]=d ts=3]
> Row[info=[ts=-9223372036854775808] del=deletedAt=1, 
> localDeletion=1588588821 ]: ck=2 |
> Row[info=[ts=-9223372036854775808] ]: ck=3 | del(v)=deletedAt=2, 
> localDeletion=1588588821
> ])
> {code}
> {{LegacyLayout}} correctly interprets a range tombstone whose start and 
> finish {{collectionName}} values don't match as a wrapping fragment of a 
> legacy row deletion that's being interrupted by a collection deletion 
> (correctly) - see 
> [code|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1874-L1889].
>  Quoting the comment inline:
> {code}
> // Because of the way RangeTombstoneList work, we can have a tombstone where 
> only one of
> // the bound has a collectionName. That happens if we have a big tombstone A 
> (spanning one
> // or multiple rows) and a collection tombstone B. In that case, 
> RangeTombstoneList will
> // split this into 3 RTs: the first one from the beginning of A to the 
> beginning of B,
> // then B, then a third one from the end of B to the end of A. To make this 
> simpler, if
>  // we detect that case we transform the 1st and 3rd tombstone so they don't 
> end in the middle
>  // of a row (which is still correct).
> {code}
> {{LegacyLayout#addRowTombstone()}} method then chokes when it encounters such 
> a tombstone in the middle of an existing row - having seen {{v[c]=d}} first, 
> and mistakenly starts a new row, while in the middle of an existing one: (see 
> [code|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1500-L1501]).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

2020-05-13 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106271#comment-17106271
 ] 

Sylvain Lebresne commented on CASSANDRA-15805:
--

There are probably a few variations on how to fix this, but what feels the most 
intuitive to me is to:
# slightly modify the ctor of {{LegacyRangeTombstone}} so that the {{atom5}} of my 
previous comment uses an inclusive start. Mostly because that makes the rest of 
the logic a bit simpler imo (we can still assume that when we get an atom whose 
clustering strictly sorts after the currently grouped row, we're done with that 
row).
# modify {{UnfilteredDeserializer.OldFormatDeserializer}} so that when, while 
grouping a row, it encounters a RT that covers it, it "splits" that into a row 
tombstone covering the row, and pushes back the handling of the rest of the 
tombstone to when the row is truly finished (a reduced sketch of this is just below).
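To make the second point a bit more concrete, here is a very reduced sketch of the intended behaviour. It is a toy with simplified types (integer clusterings), not the actual {{OldFormatDeserializer}}/{{CellGrouper}} code: when a RT covers the row currently being grouped, take a row deletion out of it for that row now, and push the remainder back until the row is finished.
{code}
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model with integer clusterings; not Cassandra's real types.
final class RangeTombstoneSplitSketch
{
    record Range(int startCk, int endCk, long deletionTime) {}  // simplified multi-row tombstone
    record RowDeletion(int ck, long deletionTime) {}

    private final Deque<Range> deferred = new ArrayDeque<>();   // remainders, handled once the row is finished

    RowDeletion onRangeTombstoneWhileGrouping(int currentRowCk, Range rt)
    {
        if (rt.startCk() <= currentRowCk && currentRowCk <= rt.endCk())
        {
            // The tombstone covers the row currently being grouped: split it.
            if (currentRowCk < rt.endCk())
                deferred.push(new Range(currentRowCk + 1, rt.endCk(), rt.deletionTime()));
            return new RowDeletion(currentRowCk, rt.deletionTime());
        }
        // Starts strictly after the current row: nothing to do until the row is emitted.
        deferred.push(rt);
        return null;
    }

    Range nextDeferred()
    {
        return deferred.poll(); // consumed by the caller once the current row has been emitted
    }
}
{code}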

I've pushed a patch doing so for 3.0 below (thanks to [~markus] for triggering 
CI on this):
||branch||unit tests||dtests||jvm dtests||jvm upgrade dtest||
| https://github.com/pcmanus/cassandra/commits/C-15805-3.0 | 
[utests|https://circleci.com/gh/krummas/cassandra/3289] | 
[vnodes|https://circleci.com/gh/krummas/cassandra/3292] 
[no-vnodes|https://circleci.com/gh/krummas/cassandra/3293] | [jvm 
dtests|https://circleci.com/gh/krummas/cassandra/3290] | [upgrade 
dtests|https://circleci.com/gh/krummas/cassandra/3294] |

I'll note that the branch contains another small fix that is a lot less 
important. Namely, [the return at the beginning of 
{{CellGrouper#addCollectionTombstone}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1516]
 should be {{true}}, not {{false}} as it currently is. The test in question 
checks whether a collection tombstone happens to not be selected by the query we're 
decoding data for. If it isn't included, we can ignore the tombstone, so we 
can/should return, but not with {{false}}, as that implies the row is finished, 
which it probably isn't.

Now, the reason I say that last problem is less important is that in practice, 
only thrift queries should run into this (since CQL queries effectively query all 
columns), so even if we duplicate the row here, it won't matter when the 
result is converted back to thrift (besides, having a collection tombstone 
implies that this is a thrift query on a CQL table, which is dodgy in the first 
place). Anyway, the code is still obviously wrong and the fix trivial, so I 
included it nonetheless (in a separate commit).


> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones 
> interacts with collection tombstones
> --
>
> Key: CASSANDRA-15805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/SSTable
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> The legacy reading code ({{LegacyLayout}} and 
> {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly 
> the case where a range tombstone covering multiple rows interacts with a 
> collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 
> 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the following is ran on 2.X (with everything either flushed in the same 
> table or compacted together), then this will result in the inserted row being 
> duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789 and this 
> reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, 
> the additional code added to CASSANDRA-15789 to force merging duplicated rows 
> if they are produced _will_ end up fixing this as a consequence (assuming 
> there is no variation of this problem that leads to other visible issues than 
> duplicated rows). That said, I "think" we'd still rather fix the source of 
> the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: 

[jira] [Comment Edited] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

2020-05-13 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106271#comment-17106271
 ] 

Sylvain Lebresne edited comment on CASSANDRA-15805 at 5/13/20, 12:51 PM:
-

There are probably a few variations on how to fix this, but what feels the most 
intuitive to me is to:
 # slightly modify the ctor of {{LegacyRangeTombstone}} so that the {{atom5}} of my 
previous comment uses an inclusive start. Mostly because that makes the rest of 
the logic a bit simpler imo (we can still assume that when we get an atom whose 
clustering strictly sorts after the currently grouped row, we're done with that 
row).
 # modify {{UnfilteredDeserializer.OldFormatDeserializer}} so that when, while 
grouping a row, it encounters a RT that covers it, it "splits" that into a row 
tombstone covering the row, and pushes back the handling of the rest of the 
tombstone to when the row is truly finished.

I've pushed a patch doing so for 3.0 below (thanks to [~marcuse] for triggering 
CI on this):
||branch||unit tests||dtests||jvm dtests||jvm upgrade dtest||
|[https://github.com/pcmanus/cassandra/commits/C-15805-3.0]|[utests|https://circleci.com/gh/krummas/cassandra/3289]|[vnodes|https://circleci.com/gh/krummas/cassandra/3292]
 [no-vnodes|https://circleci.com/gh/krummas/cassandra/3293]|[jvm 
dtests|https://circleci.com/gh/krummas/cassandra/3290]|[upgrade 
dtests|https://circleci.com/gh/krummas/cassandra/3294]|

I'll note that the branch contains another small fix that is a lot less 
important. Namely, [the return at the beginning of 
{{CellGrouper#addCollectionTombstone}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1516]
 should be {{true}}, not {{false}} as it currently is. The test in question 
checks whether a collection tombstone happens to not be selected by the query we're 
decoding data for. If it isn't included, we can ignore the tombstone, so we 
can/should return, but not with {{false}}, as that implies the row is finished, 
which it probably isn't.

Now, the reason I say that last problem is less important is that in practice, 
only thrift queries should run into this (since CQL queries effectively query all 
columns), so even if we duplicate the row here, it won't matter when the 
result is converted back to thrift (besides, having a collection tombstone 
implies that this is a thrift query on a CQL table, which is dodgy in the first 
place). Anyway, the code is still obviously wrong and the fix trivial, so I 
included it nonetheless (in a separate commit).


was (Author: slebresne):
There are probably a few variations on how to fix this, but what feels the most 
intuitive to me is to:
# slightly modify the ctor of {{LegacyRangeTombstone}} so that the {{atom5}} of my 
previous comment uses an inclusive start. Mostly because that makes the rest of 
the logic a bit simpler imo (we can still assume that when we get an atom whose 
clustering strictly sorts after the currently grouped row, we're done with that 
row).
# modify {{UnfilteredDeserializer.OldFormatDeserializer}} so that when, while 
grouping a row, it encounters a RT that covers it, it "splits" that into a row 
tombstone covering the row, and pushes back the handling of the rest of the 
tombstone to when the row is truly finished.

I've pushed a patch doing so for 3.0 below (thanks to [~markus] for triggering 
CI on this):
||branch||unit tests||dtests||jvm dtests||jvm upgrade dtest||
| https://github.com/pcmanus/cassandra/commits/C-15805-3.0 | 
[utests|https://circleci.com/gh/krummas/cassandra/3289] | 
[vnodes|https://circleci.com/gh/krummas/cassandra/3292] 
[no-vnodes|https://circleci.com/gh/krummas/cassandra/3293] | [jvm 
dtests|https://circleci.com/gh/krummas/cassandra/3290] | [upgrade 
dtests|https://circleci.com/gh/krummas/cassandra/3294] |

I'll note that the branch contains another small fix that is a lot less 
important. Namely, [the return at the beginning of 
{{CellGrouper#addCollectionTombstone}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1516]
 should be {{true}}, not {{false}} as it currently is. The test in question 
checks whether a collection tombstone happens to not be selected by the query we're 
decoding data for. If it isn't included, we can ignore the tombstone, so we 
can/should return, but not with {{false}}, as that implies the row is finished, 
which it probably isn't.

Now, the reason I say that last problem is less important is that in practice, 
only thrift queries should run into this (since CQL queries effectively query all 
columns), so even if we duplicate the row here, it won't matter when the 
result is converted back to thrift (besides, having a collection tombstone 
implies that this is a thrift query on a CQL table, which is dodgy in the first 
place). Anyway, the code is still obviously wrong and the fix 

[jira] [Updated] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

2020-05-12 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-15805:
-
 Bug Category: Parent values: Correctness(12982)Level 1 values: Recoverable 
Corruption / Loss(12986)
   Complexity: Normal
Discovered By: User Report
 Severity: Critical
 Assignee: Sylvain Lebresne
   Status: Open  (was: Triage Needed)

> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones 
> interacts with collection tombstones
> --
>
> Key: CASSANDRA-15805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/SSTable
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
>
> The legacy reading code ({{LegacyLayout}} and 
> {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly 
> the case where a range tombstone covering multiple rows interacts with a 
> collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 
> 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the following is ran on 2.X (with everything either flushed in the same 
> table or compacted together), then this will result in the inserted row being 
> duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789 and this 
> reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, 
> the additional code added to CASSANDRA-15789 to force merging duplicated rows 
> if they are produced _will_ end up fixing this as a consequence (assuming 
> there is no variation of this problem that leads to other visible issues than 
> duplicated rows). That said, I "think" we'd still rather fix the source of 
> the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

2020-05-12 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105536#comment-17105536
 ] 

Sylvain Lebresne commented on CASSANDRA-15805:
--

To understand why this happens, let me write down the atoms that the example of 
the description generates on 2.X (using a simplified representation that I hope 
is clear enough):
{noformat}
atom1: RT([A:_, A:X:b:_])@1, // beginning of all 'A' rows to beginning of 
A:X's b column
atom2: Cell(A:X:)@2, // row marker for A:X
atom3: Cell(A:X:a=foo)@2,// value of a in A:X
atom4: RT([A:X:b:_, A:X:b:!])@3, // collection tombstone for b in A:X's
atom5: RT([A:X:b:!, A:!])@1, // remainder of covering RT, from end of b in 
A:X to end of all 'A' rows
atom6: Cell(A:X:c=bar)@2 // value of c in A:X
{noformat}
Those atoms are deserialized into {{LegacyCell}} and {{LegacyRangeTombstone}} 
on 3.X as:
{noformat}
atom1: RT(Bound(INCL_START_BOUND(A), 
collection=null)-Bound(EXCL_END_BOUND(A:B), collection=null), deletedAt=1, 
localDeletion=1589204864)
atom2: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=null, 
collElt=null), v=, ts=2, ldt=2147483647, ttl=0)
atom3: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=a, 
collElt=null), v=foo, ts=2, ldt=2147483647, ttl=0)
atom4: RT(Bound(INCL_START_BOUND(A:X), collection=b)-Bound(INCL_END_BOUND(A:X), 
collection=b), deletedAt=3, localDeletion=1589204864)
atom5: RT(Bound(EXCL_START_BOUND(A:X), 
collection=null)-Bound(INCL_END_BOUND(A), collection=null), deletedAt=1, 
localDeletion=1589204864)
atom6: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=c, 
collElt=null), v=bar, ts=2, ldt=2147483647, ttl=0)
{noformat}

I'll point out that those are a direct translation of the 2.X atoms, except for 
{{atom1}} and {{atom5}}, which are slightly different:
* instead of {{atom1}} stopping at the beginning of the row's {{b}} column, it 
extends to the end of the row.
* and instead of {{atom5}} starting after that {{b}} column, it starts after the 
row. Do note however that the order of atoms is still the one above, so that 
atom is effectively out of order.

The reason for those differences is the logic [at the beginning of 
{{LegacyLayout.RangeTombstone}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1883],
 which its comment tries to explain, and which is basically due to the legacy layer 
having to map every 2.X RT into either a 3.X range tombstone (so one spanning 
multiple rows), a row tombstone or a collection one.

Anyway, as mentioned above, the problem is that {{atom5}} is out of order.  
What currently happens is that when {{atom5}} is encountered by 
{{UnfilteredDeserializer.OldFormatDeserializer}}, it is passed to the 
{{CellGrouper}} currently grouping the row, and ends up in the 
[{{CellGrouper#addGenericTombstone}} 
method|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1544].
  But, because that atom starts strictly after the row being grouped, the 
method returns {{false}} and the row is generated a first time. Later, we get 
{{atom6}}, which restarts the row with the value of column {{c}}, after which it 
is generated a second time.
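Put differently, the grouping loop effectively behaves like the toy below (again a simplification, not the real {{CellGrouper}}): anything seen as sorting strictly after the row being built closes that row, so the later {{atom6}} cell re-opens a second row with the same clustering.
{code}
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the duplication; not Cassandra's real types.
final class RowGroupingSketch
{
    record Atom(String clustering, boolean sortsAfterCurrentRow, String payload) {}
    record Row(String clustering, List<String> payloads) {}

    static List<Row> group(List<Atom> atoms)
    {
        List<Row> rows = new ArrayList<>();
        Row current = null;
        for (Atom atom : atoms)
        {
            if (current != null && atom.sortsAfterCurrentRow())
            {
                // An out-of-order atom (like atom5) makes us close the row too early...
                rows.add(current);
                current = null;
                continue; // (the tombstone itself is handled separately in the real code)
            }
            if (current == null)
                current = new Row(atom.clustering(), new ArrayList<>());
            // ...so a later cell for the same clustering (atom6) starts a second row.
            current.payloads().add(atom.payload());
        }
        if (current != null)
            rows.add(current);
        return rows;
    }
}
{code}
Feeding it the atoms of the example above (cells for {{A:X}}, then the out-of-order {{atom5}}, then the {{atom6}} cell) yields two rows with the same {{A:X}} clustering, which is exactly the duplication observed.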


> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones 
> interacts with collection tombstones
> --
>
> Key: CASSANDRA-15805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/SSTable
>Reporter: Sylvain Lebresne
>Priority: Normal
>
> The legacy reading code ({{LegacyLayout}} and 
> {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly 
> the case where a range tombstone covering multiple rows interacts with a 
> collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
>   b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 
> 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the following is ran on 2.X (with everything either flushed in the same 
> table or compacted together), then this will result in the inserted row being 
> duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is 

[jira] [Created] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

2020-05-12 Thread Sylvain Lebresne (Jira)
Sylvain Lebresne created CASSANDRA-15805:


 Summary: Potential duplicate rows on 2.X->3.X upgrade when 
multi-rows range tombstones interacts with collection tombstones
 Key: CASSANDRA-15805
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
 Project: Cassandra
  Issue Type: Bug
  Components: Consistency/Coordination, Local/SSTable
Reporter: Sylvain Lebresne


The legacy reading code ({{LegacyLayout}} and 
{{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly the 
case where a range tombstone covering multiple rows interacts with a collection 
tombstone.

A simple example of this problem is if one runs on 2.X:
{noformat}
CREATE TABLE t (
  k int,
  c1 text,
  c2 text,
  a text,
  b set<text>,
  c text,
  PRIMARY KEY((k), c1, c2)
);

// Delete all rows where c1 is 'A'
DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
// Inserts a row covered by that previous range tombstone
INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 
'bar') USING TIMESTAMP 2;
// Delete the collection of that previously inserted row
DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
{noformat}

If the following is ran on 2.X (with everything either flushed in the same 
table or compacted together), then this will result in the inserted row being 
duplicated (one part containing the {{a}} column, the other the {{c}} one).

I will note that this is _not_ a duplicate of CASSANDRA-15789 and this 
reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, the 
additional code added to CASSANDRA-15789 to force merging duplicated rows if 
they are produced _will_ end up fixing this as a consequence (assuming there is 
no variation of this problem that leads to other visible issues than duplicated 
rows). That said, I "think" we'd still rather fix the source of the issue.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15789) Rows can get duplicated in mixed major-version clusters and after full upgrade

2020-05-08 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102676#comment-17102676
 ] 

Sylvain Lebresne commented on CASSANDRA-15789:
--

I had a quick look at those commits, and I agree about the fix in `LegacyLayout`.

And I have no strong objections to the 2 other parts, but wanted to make 2 
remarks:
- regarding the elimination of duplicates on iterators coming from 
`LegacyLayout`: the patch currently merges the duplicates rather silently.  What 
if we have another bug in `LegacyLayout` for which row duplication is only one 
symptom, but that also loses data? Are we sure we won't regret not failing on what 
would be an unknown bug?
- Regarding the duplicate check on all reads, I "think" this could have a 
measurable impact on performance for some workloads. Which isn't a reason not 
to add it, but as this impacts all reads and will go into "stable" versions, do we 
want to run a few benchmarks to quantify this? Or have a way to disable the 
check?


> Rows can get duplicated in mixed major-version clusters and after full upgrade
> --
>
> Key: CASSANDRA-15789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15789
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>Reporter: Aleksey Yeschenko
>Assignee: Marcus Eriksson
>Priority: Normal
>
> In a mixed 2.X/3.X major version cluster a sequence of row deletes, 
> collection overwrites, paging, and read repair can cause 3.X nodes to split 
> individual rows into several rows with identical clustering. This happens due 
> to 2.X paging and RT semantics, and a 3.X {{LegacyLayout}} deficiency.
> To reproduce, set up a 2-node mixed major version cluster with the following 
> table:
> {code}
> CREATE TABLE distributed_test_keyspace.tlb (
> pk int,
> ck int,
> v map<text, text>,
> PRIMARY KEY (pk, ck)
> );
> {code}
> 1. Using either node as the coordinator, delete the row with ck=2 using 
> timestamp 1
> {code}
> DELETE FROM tbl USING TIMESTAMP 1 WHERE pk = 1 AND ck = 2;
> {code}
> 2. Using either node as the coordinator, insert the following 3 rows:
> {code}
> INSERT INTO tbl (pk, ck, v) VALUES (1, 1, {'e':'f'}) USING TIMESTAMP 3;
> INSERT INTO tbl (pk, ck, v) VALUES (1, 2, {'g':'h'}) USING TIMESTAMP 3;
> INSERT INTO tbl (pk, ck, v) VALUES (1, 3, {'i':'j'}) USING TIMESTAMP 3;
> {code}
> 3. Flush the table on both nodes
> 4. Using the 2.2 node as the coordinator, force read repar by querying the 
> table with page size = 2:
>  
> {code}
> SELECT * FROM tbl;
> {code}
> 5. Overwrite the row with ck=2 using timestamp 5:
> {code}
> INSERT INTO tbl (pk, ck, v) VALUES (1, 2, {'g':'h'}) USING TIMESTAMP 5;}}
> {code}
> 6. Query the 3.0 node and observe the split row:
> {code}
> cqlsh> select * from distributed_test_keyspace.tlb ;
>  pk | ck | v
> ++
>   1 |  1 | {'e': 'f'}
>   1 |  2 | {'g': 'h'}
>   1 |  2 | {'k': 'l'}
>   1 |  3 | {'i': 'j'}
> {code}
> This happens because the read to query the second page ends up generating the 
> following mutation for the 3.0 node:
> {code}
> ColumnFamily(tbl -{deletedAt=-9223372036854775808, localDeletion=2147483647,
>  ranges=[2:v:_-2:v:!, deletedAt=2, localDeletion=1588588821]
> [2:v:!-2:!,   deletedAt=1, localDeletion=1588588821]
> [3:v:_-3:v:!, deletedAt=2, localDeletion=1588588821]}-
>  [2:v:63:false:1@3,])
> {code}
> Which on 3.0 side gets incorrectly deserialized as
> {code}
> Mutation(keyspace='distributed_test_keyspace', key='0001', modifications=[
>   [distributed_test_keyspace.tbl] key=1 
> partition_deletion=deletedAt=-9223372036854775808, localDeletion=2147483647 
> columns=[[] | [v]]
> Row[info=[ts=-9223372036854775808] ]: ck=2 | del(v)=deletedAt=2, 
> localDeletion=1588588821, [v[c]=d ts=3]
> Row[info=[ts=-9223372036854775808] del=deletedAt=1, 
> localDeletion=1588588821 ]: ck=2 |
> Row[info=[ts=-9223372036854775808] ]: ck=3 | del(v)=deletedAt=2, 
> localDeletion=1588588821
> ])
> {code}
> {{LegacyLayout}} correctly interprets a range tombstone whose start and 
> finish {{collectionName}} values don't match as a wrapping fragment of a 
> legacy row deletion that's being interrupted by a collection deletion 
> (correctly) - see 
> [code|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1874-L1889].
>  Quoting the comment inline:
> {code}
> // Because of the way RangeTombstoneList work, we can have a tombstone where 
> only one of
> // the bound has a collectionName. That happens if we have a big tombstone A 
> (spanning one
> // or multiple rows) and a collection tombstone B. In that case, 
> RangeTombstoneList will

[jira] [Updated] (CASSANDRA-13917) COMPACT STORAGE queries on dense static tables accept hidden column1 and value columns

2020-01-10 Thread Sylvain Lebresne (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-13917:
-
Status: Open  (was: Resolved)

Re-opening because this patch breaks things badly. Did you run the tests after 
the move of {{isHiddenColumn}} to {{getColumnDefinition}}? Because if so, our 
tests aren't very good.

The committed version, which moves {{isHiddenColumn}} into {{getColumnDefinition}}, 
means the compact column is invisible internally, which is wrong: it must be 
accessible internally.

Concretely, a simple test that creates a 2ndary index, waits for it to be built, 
and then restarts the node will fail on restart with
{noformat}
java.lang.RuntimeException: Unknown column value during deserialization
at 
org.apache.cassandra.db.SerializationHeader$Component.toHeader(SerializationHeader.java:353)
 ~[main/:na]
at 
org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:496)
 ~[main/:na]
at 
org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:365)
 ~[main/:na]
at 
org.apache.cassandra.io.sstable.format.SSTableReader$2.run(SSTableReader.java:544)
 ~[main/:na]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_152]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[na:1.8.0_152]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[na:1.8.0_152]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_152]
at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
 [main/:na]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_152]
{noformat}
because the {{SystemKeyspace.BUILT_INDEXES}} table is compact: when the node tries 
to open the header of the sstables for this table, {{getColumnDefinition}} 
returns {{null}} for the {{value}} column, even though that column obviously exists 
and should be returned.
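
To illustrate the failure mode, a minimal self-contained sketch (the names and 
column set are invented for the example; this is not the actual Cassandra code):
{code:java}
import java.util.HashMap;
import java.util.Map;

// Sketch of the failure: the sstable serialization header records the compact
// 'value' column, but once getColumnDefinition() hides it, the lookup returns
// null and header reconstruction throws, preventing the node from restarting.
public class HiddenColumnSketch
{
    // Columns still "visible" after the change (the compact 'value' column is not).
    static final Map<String, String> visibleColumns = new HashMap<>();
    static
    {
        visibleColumns.put("index_name", "UTF8Type");
    }

    static String getColumnDefinition(String name)
    {
        return visibleColumns.get(name); // returns null for the hidden 'value' column
    }

    public static void main(String[] args)
    {
        for (String recorded : new String[]{ "index_name", "value" })
        {
            if (getColumnDefinition(recorded) == null)
                throw new RuntimeException("Unknown column " + recorded + " during deserialization");
        }
    }
}
{code}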


> COMPACT STORAGE queries on dense static tables accept hidden column1 and 
> value columns
> --
>
> Key: CASSANDRA-13917
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13917
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Alex Petrov
>Assignee: Aleksandr Sorokoumov
>Priority: Low
>  Labels: lhf
> Fix For: 3.0.x, 3.11.x
>
> Attachments: 13917-3.0-testall-13.12.2019, 13917-3.0-testall-2.png, 
> 13917-3.0-testall-20.11.2019.png, 13917-3.0.png, 
> 13917-3.11-testall-13.12.2019, 13917-3.11-testall-2.png, 
> 13917-3.11-testall-20.11.2019.png, 13917-3.11.png
>
>
> Test for the issue:
> {code}
> @Test
> public void testCompactStorage() throws Throwable
> {
> createTable("CREATE TABLE %s (a int PRIMARY KEY, b int, c int) WITH 
> COMPACT STORAGE");
> assertInvalid("INSERT INTO %s (a, b, c, column1) VALUES (?, ?, ?, 
> ?)", 1, 1, 1, ByteBufferUtil.bytes('a'));
> // This one fails with Some clustering keys are missing: column1, 
> which is still wrong
> assertInvalid("INSERT INTO %s (a, b, c, value) VALUES (?, ?, ?, ?)", 
> 1, 1, 1, ByteBufferUtil.bytes('a'));   
> assertInvalid("INSERT INTO %s (a, b, c, column1, value) VALUES (?, ?, 
> ?, ?, ?)", 1, 1, 1, ByteBufferUtil.bytes('a'), ByteBufferUtil.bytes('b'));
> assertEmpty(execute("SELECT * FROM %s"));
> }
> {code}
> Thankfully, these writes are no-ops, even though they succeed.
> {{value}} and {{column1}} should be completely hidden. Fixing this one should 
> be as easy as just adding validations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15432) The "read defragmentation" optimization does not work

2019-11-18 Thread Sylvain Lebresne (Jira)
Sylvain Lebresne created CASSANDRA-15432:


 Summary: The "read defragmentation" optimization does not work
 Key: CASSANDRA-15432
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15432
 Project: Cassandra
  Issue Type: Bug
Reporter: Sylvain Lebresne


The so-called "read defragmentation" that has been added way back with 
CASSANDRA-2503 actually does not work, and never has. That is, the 
defragmentation writes do happen, but they only additional load on the nodes 
without helping anything, and are thus a clear negative.

The "read defragmentation" (which only impact so-called "names queries") kicks 
in when a read hits "too many" sstables (> 4 by default), and when it does, it 
writes down the result of that read. The assumption being that the next read 
for that data would only read the newly written data, which if not still in 
memtable would at least be in a single sstable, thus speeding that next read.

Unfortunately, this is not how it works. When we defrag and write the result 
of our original read, we do so with the timestamps of the data read (as we 
should; changing the timestamps would be plain wrong). As a result, 
following reads will read that data first, but will have no way to tell that no 
more sstables should be read. Technically, the 
[{{reduceFilter}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L830]
 call will not return {{null}} because {{currentMaxTs}} will be higher than 
at least some of the data in the result, and this until we've read from as many 
sstables as in the original read.
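
To make the mechanics concrete, here is a self-contained sketch (the names, values 
and structure are invented; this is not the actual {{SinglePartitionReadCommand}} 
code) of why the defragmented copy cannot short-circuit the scan: we can only stop 
once every remaining sstable has a max timestamp lower than the oldest timestamp 
already collected, and defragmentation keeps those original (old) timestamps.
{code:java}
public class DefragShortCircuitSketch
{
    public static void main(String[] args)
    {
        // Timestamps of the columns found in the defragmented copy: unchanged
        // from the original data, so the oldest one is still old.
        long[] collectedTimestamps = { 10L, 42L };
        // Max timestamps of the other sstables, visited in decreasing order.
        long[] sstableMaxTimestamps = { 50L, 42L, 10L, 5L };

        long oldestCollected = Long.MAX_VALUE;
        for (long ts : collectedTimestamps)
            oldestCollected = Math.min(oldestCollected, ts);

        int sstablesRead = 0;
        for (long maxTs : sstableMaxTimestamps)
        {
            if (maxTs < oldestCollected)
                break;          // only now can the remaining sstables be skipped
            sstablesRead++;     // otherwise this sstable must still be read
        }
        // Prints 3: the defragmented copy did not reduce the number of sstables read.
        System.out.println("sstables still read: " + sstablesRead);
    }
}
{code}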

I see no easy way to fix this. It might be possible to make it work with 
additional per-sstable metadata, but nothing sufficiently simple and cheap to 
be worth it comes to mind. And I thus suggest simply removing that code.

For the record, I'll note that there is actually a 2nd problem with that code: 
currently, we "defrag" a read even if we didn't get data for everything that 
the query requests. This is also "wrong" even if we ignore the first issue: a 
following read that would read the defragmented data would also have no way to 
know not to read more sstables to try to get the missing parts. This problem 
would be fixable, but is obviously overshadowed by the previous one anyway.

Anyway, as mentioned, I suggest just removing the "optimization" (which, again, 
never optimized anything) altogether, and I'm happy to provide the simple patch.

The only question might be which versions? This impacts all versions, but 
it isn't a correctness bug either, "just" a performance one. So do we want 4.0 
only, or is there appetite for earlier?




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14825) Expose table schema for drivers

2019-09-06 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924234#comment-16924234
 ] 

Sylvain Lebresne commented on CASSANDRA-14825:
--

{quote}it could support versioning of CQL grammar
{quote}
Good point. And again, that falls into the category of {{DESCRIBE}} making it 
easy to have schema-specific behavior, more so than a virtual table approach.
{quote}{{cqlsh}} having it already does not seem particularly important, and 
should not bind our future decisions. It is a single tool, even if it is 
bundled.
{quote}
The {{cqlsh}} {{DESCRIBE}} has been _the_ main way users have gotten the schema 
for the past 7-ish years, it's the one documented everywhere, and afaict 
there is no proposal to remove it, so users will continue to use it. I fail to 
see how building on something familiar, and not creating multiple ways to do 
essentially the same thing, is not at least a bit relevant, and advantageous.

Don't get me wrong, if something is bad, then sure, we shouldn't build on a bad 
idea just because it's there, but are we making the argument that {{DESCRIBE}} 
is _bad_?
{quote}Virtual tables are also very capable of surfacing the necessary 
information to produce dependent types, for instance as a collection column of 
the names of those type.
{quote}
I'm genuinely a bit unsure what you are trying to argue here in the context of 
this ticket.

This ticket is explicitly about exposing the schema in CQL form, so as strings 
at the end of the day, so I'm not sure how virtual tables bring structure to 
that. Btw, the tables exposed by the current patch have no collections 
whatsoever and no more structure than what {{DESCRIBE}} would give.

It's very possible you have something in mind that is not the current patch, 
but I think you'll need to describe it at least a bit so we can discuss it.
{quote}I can say that I hate features like {{DESCRIBE}} because I have to go 
and google the manual. With a virtual table interface, I just {{SELECT}}
{quote}
:)

How do you know which table to {{SELECT}}? Believe it or not, the table in the 
current patch to get a given table schema is called 
{{system_views.describe_table}} ...
{quote}There is a legitimate case to be made to support both approaches, in my 
opinion.
{quote}
To clarify, my *main* position here is that doing both approaches would be a 
mistake. Not that the virtual table approach is terrible (it's not), nor that 
{{DESCRIBE}} is orders of magnitude better.

In fact, my argument is that both approaches are not different _enough_ to 
justify adding user confusion by having 2 ways to do essentially the same thing 
(exposing the schema in CQL form without having drivers rebuild it manually 
from our existing schema tables).

That's why the pre-existence (and reasonable adequacy so far) of {{DESCRIBE}} 
is very relevant to me: since we're not removing {{DESCRIBE}} from cqlsh, pushing 
it server-side is really the only option that does not create 2 ways to 
"describe the schema" (and the fact that it's better at handling things like 
versions or "internal" schema details finishes convincing me it's _at least_ 
not substantially worse than the virtual table approach, if not a bit better).

I genuinely believe that this, having different ways to do essentially the same 
thing, is one of the things we've been historically bad at and is a contributor 
to the (deserved) reputation of C* being hard to use/learn (obviously not 
the only factor, but one nonetheless). I wish for us to learn from our mistakes, 
not repeat them. I feel we can easily avoid that here.

> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that its only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14825) Expose table schema for drivers

2019-09-06 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924060#comment-16924060
 ] 

Sylvain Lebresne commented on CASSANDRA-14825:
--

I was trying to have a reasoned and as-objective-as-possible discussion on 
the pros and cons of each approach. I honestly don't feel there has been much 
of that: a lot of the early conversation dismissed the {{DESCRIBE}} 
approach based on a clear misunderstanding of the proposal (even the voting email 
still talks of protocol changes, which is simply not part of the suggestion at 
all), touting advantages of the virtual table approach that just aren't there; 
the more recent arguments for why the virtual table approach would be superior 
seem a tad subjective ("IMO the right UX", "more natural").

And in particular, no-one even acknowledged the points about "how do we 
properly expose the internal parts of the schema". This ticket only feels stuck 
because you guys have made up your mind and don't want to discuss it anymore, 
not because all relevant points have been fully considered.

But with that said, I've done my job of raising the arguments I saw, and if 
there is no interest in them, feel free to decide with a vote.
{quote}Given that and that the virtual table implementation is rather 
concise(300 lines), I felt it pragmatic to consider doing both.
{quote}
My point against having 2 solutions for the same thing wasn't a line-of-code 
consideration (at least not primarily; having 2 times 300 lines of code is 
still worse than having it only once, for maintenance). It's a user-centric 
concern. Having multiple ways to do the same thing is exactly the type of 
thing that furthers C*'s reputation of being hard to approach.

> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that its only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14825) Expose table schema for drivers

2019-09-05 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923709#comment-16923709
 ] 

Sylvain Lebresne commented on CASSANDRA-14825:
--

bq. but is there any reason why we can't also offer a virtual table solution?

That would mean supporting 2 different ways to do the exact same thing. That's 
usually not considered ideal.

bq. as it allows users to access schema via any driver, and doesn't depend on 
the drivers to build schema

I think you misunderstood [~snazy]'s comment above. He's not saying we should 
stick to the cqlsh-based {{DESCRIBE}}; he's mentioning the alternative to the 
virtual table approach of having {{DESCRIBE}} be a genuine CQL (server-side) 
command, one that would return result sets (and that I describe in a number of 
comments above).

This would still give users access to the schema via any driver (without any 
driver change) and would not depend on the drivers to build schema.

And fwiw, I so far continue to believe that this server-side {{DESCRIBE}} 
approach is, as objectively as I can put it, better than a virtual-table one. 
As I've mentioned before, users already know {{DESCRIBE}}, and as {{DESCRIBE}} 
is not going away cqlsh-side, the virtual table approach kind of creates 2 
separate ways to "describe" schema for users, which again doesn't feel ideal to 
me.

Additionally, and that's the point [~snazy] mentioned, using a specific command 
(instead of basically reusing {{SELECT}}) gives us flexibility for schema-specific 
needs much more easily. As Robert says, there are subtleties when it 
comes to schema, in particular some things that are kind of "internal" (dropped-column 
records, table ID, ...) but that can be necessary when needing to recreate 
the schema identically. So there is a (genuine, afaict) need for getting 
the schema both with and without that internal info (something we currently support 
badly, but that's not a reason to continue doing so).

With the {{DESCRIBE}} approach, this is simple: just support some {{WITH 
INTERNALS}} option to decide whether those "internals" are returned or not. With the 
virtual table approach, not so much. Adding syntax to {{SELECT}} that is only 
ever useful when querying a handful of system views is ugly.
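
To make that concrete, a sketch of what it could look like (hypothetical syntax; 
none of this is implemented):
{noformat}
DESCRIBE TABLE ks.t;                 -- user-facing schema only
DESCRIBE TABLE ks.t WITH INTERNALS;  -- also includes dropped columns, table ID, ...
{noformat}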

As for the argument that virtual tables give you the "full power of SELECT", I 
think it's more theoretical than anything when you look into the details. It's 
not like {{SELECT}} is _that_ flexible in the first place; it's somewhat limited by 
what schema we pick for the system view. And the {{DESCRIBE}} syntax already 
provides 1) the full schema, 2) the schema of one keyspace and 3) the schema of one 
"object", for all our schema objects (tables, types, ...). What more do we need 
in practice?


> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that its only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14670) Table Metrics Virtual Table

2019-07-03 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877729#comment-16877729
 ] 

Sylvain Lebresne commented on CASSANDRA-14670:
--

Fwiw, I rather strongly agree with [~iamaleksey] here. Breaking the core 
foundation of data modeling for a virtual table 'because it looks a bit better 
by default' is a really bad idea imo, and I even disagree that it's a better 
UX, because it might actually confuse people that are not C* developers, while 
using {{ORDER BY}} will be familiar to every developer on earth.

Lifting the {{ORDER BY}} and {{ALLOW FILTERING}} restrictions on virtual tables 
would also be generally useful across all virtual tables, so that's 
an additional motivation.
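
For illustration, with those restrictions lifted (hypothetical; such a query is 
not allowed today under the proposed ((keyspace_name), table_name) key), the 
"top tables" need would be covered by something like:
{code}
SELECT keyspace_name, table_name, max_partition_size
FROM system_views.max_partition_size
ORDER BY max_partition_size DESC
LIMIT 5
ALLOW FILTERING;
{code}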

bq. I am fine with changing partition key to ((keyspace_name), table_name) once 
the functionality is at least possible because finding the top tables is an 
operational need thats not possible otherwise.

That bugs me, because you somewhat suggest we cannot afford to delay this to do 
it right, on account that it's not _possible otherwise_, but that's pretty 
disingenuous when you yourself said in the description:
bq. This can kinda be figured out with cfstats sorting and some clever bash-foo

Personally, I'd vote for reverting this until it's done right, or blocking 4.0 on a 
follow-up ticket to fix it, but saying "there is still time before 4.0 GA" is 
the surest way to have it slip.

> Table Metrics Virtual Table
> ---
>
> Key: CASSANDRA-14670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14670
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/CQL, Legacy/Observability
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Low
>  Labels: pull-request-available, virtual-tables
> Fix For: 4.0.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Different than CASSANDRA-14572 whose goal is to expose all metrics. This is 
> to expose a few hand tailored tables that are particularly useful in 
> debugging slow Cassandra instances (in my experience). These are useful in 
> finding out which table it is that is having issues if you see a node 
> performing poorly in general. This can kinda be figured out with cfstats 
> sorting and some clever bash-foo but its been a bit of a operational UX pain 
> for me personally for awhile.
> examples:
> {code}
> cqlsh> select * from system_views.max_partition_size limit 5;
>  max_partition_size | keyspace_name | table_name
> +---+
>  126934 |system | size_estimates
>9887 | system_schema |columns
>9887 | system_schema | tables
>6866 |system |  local
> 258 | keyspace1 |  standard1
> (5 rows)
> cqlsh> select * from system_views.local_reads limit 5 ;
>  count | keyspace_name | table_name  | 99th  | max   | median  | 
> per_second
> ---+---+-+---+---+-+
> 23 |system |   local | 186563160 | 186563160 | 1629722 |  
>   3.56101
> 22 | system_schema |  tables |   4055269 |   4055269 |  454826 |  
>   3.72452
> 14 | system_schema | columns |   1131752 |   1131752 |  545791 |  
>   2.37015
> 14 | system_schema | dropped_columns |126934 |126934 |   88148 |  
>   2.37015
> 14 | system_schema | indexes |219342 |219342 |  152321 |  
>   2.37015
> (5 rows)
> cqlsh> select * from system_views.coordinator_reads limit 5;
>  count | keyspace_name | table_name | 99th | max | median | per_second
> ---+---++--+-++
>  2 |system |  local |0 |   0 |  0 |   0.005324
>  1 |   system_auth |  roles |0 |   0 |  0 |   0.002662
>  0 | basic |   wide |0 |   0 |  0 |  0
>  0 | basic |  wide3 |0 |   0 |  0 |  0
>  0 | keyspace1 |   counter1 |0 |   0 |  0 |  0
> (5 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15038) Provide an option to Disable Truststore CA check for internode_encryption

2019-02-28 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780765#comment-16780765
 ] 

Sylvain Lebresne commented on CASSANDRA-15038:
--

bq. even when client auth is disabled, we need the trust store to verify SSL 
certificate of peers when we make outbound connections.

Doesn't this make the {{require_client_auth}} option on 
{{server_encryption_options}} kind of pointless though, since we make 
bi-directional connections between any 2 nodes anyway? As in, it doesn't feel 
like setting this option or not (for {{server_encryption_options}}) allows or 
disallows any concrete use case. I mean, you get the theoretical knowledge that 
on inbound connections the remote certificate is not checked, but since you're 
going to check it on outbound connections anyway in practice ...

My point being, we should imo make one of 2 changes:
# make {{require_client_auth == false}} allow leaving the 
truststore unset, which is what Jai wants (see the sketch below). The security of such a 
setting is obviously debatable, and we could have clear warnings, but at least it provides 
some kind of concretely usable option (get "some" security without having to set a trust 
store).
# deprecate/remove {{require_client_auth}} from 
{{server_encryption_options}} altogether, since it's imo more confusing than 
anything in its current state (though I'm no SSL expert, so maybe I'm just 
misunderstanding this).
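
As a rough sketch of option 1 (hypothetical behavior, not what current versions 
accept; the values are copied from the description):
{code}
server_encryption_options:
 internode_encryption: all
 keystore: /etc/cassandra/keystore.jks
 keystore_password: mykeypass
 require_client_auth: false
 # truststore / truststore_password intentionally left unset:
 # outbound certificates would not be verified in this mode
{code}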

> Provide an option to Disable Truststore CA check for internode_encryption
> -
>
> Key: CASSANDRA-15038
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15038
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Encryption
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Major
>
> Hello,
> The current internode encryption between cassandra nodes uses a keystore and 
> truststore. However there are some use-case where users are okay to allow any 
> one to trust as long as they have a keystore. This is requirement is only for 
> encryption but not trusting the identity.
> It would be good to have an option to disable the Truststore CA check for the 
> internode_encryption.
>  
> In the current cassandra.yaml, there is no way to comment/disable the 
> truststore and truststore password and allow anyone to connect with a 
> certificate. 
>  
> though the require_client_auth: is set to false, cassandra fails to startup 
> if we disable truststore and truststore_password as it look for default 
> truststore under `conf/.truststore`
>  
> {code:java}
> server_encryption_options:
>  internode_encryption: all
>  keystore: /etc/cassandra/keystore.jks
>  keystore_password: mykeypass
>  truststore: /etc/cassandra/truststore.jks
>  truststore_password: truststorepass
>  # More advanced defaults below:
>  # protocol: TLS
>  # algorithm: SunX509
>  # store_type: JKS
>  # cipher_suites: 
> [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
>  # require_client_auth: false
>  # require_endpoint_verification: false{code}
> {noformat}
> Caused by: java.io.IOException: Error creating the initializing the SSL 
> Context
>  at 
> org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:201)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:61) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
>  at 
> org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:708)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  ... 8 common frames omitted
> Caused by: java.io.FileNotFoundException: conf/.truststore (Permission denied)
>  at java.io.FileInputStream.open0(Native Method) ~[na:1.8.0_151]
>  at java.io.FileInputStream.open(FileInputStream.java:195) ~[na:1.8.0_151]
>  at java.io.FileInputStream.(FileInputStream.java:138) ~[na:1.8.0_151]
>  at java.io.FileInputStream.(FileInputStream.java:93) ~[na:1.8.0_151]
>  at 
> org.apache.cassandra.security.SSLFactory.createSSLContext(SSLFactory.java:168)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
>  ... 10 common frames omitted{noformat}
>  
>  Cassandra Version: 3.11.3
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14925) DecimalSerializer.toString() can be used as OOM attack

2018-12-11 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716681#comment-16716681
 ] 

Sylvain Lebresne commented on CASSANDRA-14925:
--

The patch looks technically ok, but there is of course the question 
of backward compatibility. I believe most of the usage of 
{{AbstractType#getString}} is for log messages, but there are a few other 
usages. {{sstabledump}} is one, though I'm not too worried about that here. 
There are also a bunch of cases where it's used internally, but I _think_ 
those should be cases where {{decimal}} is not used. That said, I'm far from 
having made a careful analysis of all the places where it is used, so I think 
we're fine, but I'm not 100% sure.

Overall, I'm not sure what to do about that previous comment. I do think we should 
fix this, and I don't think the risk of someone running into backward 
compatibility trouble is very high here, but I wonder if we shouldn't stick to 
trunk as a compromise. I would welcome other opinions here for sure. Maybe it's worth 
a quick email on the mailing list to gather opinions?



> DecimalSerializer.toString() can be used as OOM attack 
> ---
>
> Key: CASSANDRA-14925
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14925
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
>
> Currently, in {{DecimalSerializer.toString(value)}}, it uses 
> {{BigDecimal.toPlainString()}} which generates huge string for large scale 
> values.
>  
> {code:java}
> BigDecimal d = new BigDecimal("1e-" + (Integer.MAX_VALUE - 6));
> d.toPlainString(); // oom{code}
>  
> Propose to use {{BigDecimal.toString()}} when scale is larger than 100 which 
> is configurable via {{-Dcassandra.decimal.maxscaleforstring}}
>  
> | patch | circle-ci |
> | [3.0|https://github.com/jasonstack/cassandra/commits/decimal-tostring-3.0] 
> | 
> [unit|https://circleci.com/gh/jasonstack/cassandra/747?utm_campaign=vcs-integration-link_medium=referral_source=github-build-link]
>  |
> The code should apply cleanly to 3.0+.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14925) DecimalSerializer.toString() can be used as OOM attack

2018-12-10 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714525#comment-16714525
 ] 

Sylvain Lebresne commented on CASSANDRA-14925:
--

Can't we just use {{BigDecimal.toString()}} all the time and save ourselves the 
trouble of adding yet one more runtime parameter that no user will probably 
ever modify?
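
For reference, a small standalone illustration of the difference (values chosen 
for the example): {{toString()}} keeps large negative exponents in scientific 
notation, while {{toPlainString()}} expands every digit and can allocate a 
gigantic string.
{code:java}
import java.math.BigDecimal;

public class DecimalToStringDemo
{
    public static void main(String[] args)
    {
        BigDecimal small = new BigDecimal("1e-10");
        System.out.println(small.toString());      // 1E-10
        System.out.println(small.toPlainString()); // 0.0000000001

        BigDecimal huge = new BigDecimal("1e-" + (Integer.MAX_VALUE - 6));
        System.out.println(huge.toString());       // 1E-2147483641, still short
        // huge.toPlainString() would try to build a string of roughly
        // Integer.MAX_VALUE characters and will likely throw an OutOfMemoryError.
    }
}
{code}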

> DecimalSerializer.toString() can be used as OOM attack 
> ---
>
> Key: CASSANDRA-14925
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14925
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Minor
>
> Currently, in {{DecimalSerializer.toString(value)}}, it uses 
> {{BigDecimal.toPlainString()}} which generates huge string for large scale 
> values.
>  
> {code:java}
> BigDecimal d = new BigDecimal("1e-" + (Integer.MAX_VALUE - 6));
> d.toPlainString(); // oom{code}
>  
> Propose to use {{BigDecimal.toString()}} when scale is larger than 100 which 
> is configurable via {{-Dcassandra.decimal.maxscaleforstring}}
>  
> | patch | circle-ci |
> | [3.0|https://github.com/jasonstack/cassandra/commits/decimal-tostring-3.0] 
> | 
> [unit|https://circleci.com/gh/jasonstack/cassandra/747?utm_campaign=vcs-integration-link_medium=referral_source=github-build-link]
>  |
> The code should apply cleanly to 3.0+.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Resolved] (CASSANDRA-14584) insert if not exists, with replication factor of 2 doesn't work

2018-11-19 Thread Sylvain Lebresne (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-14584.
--
Resolution: Not A Problem

bq. Or any limitation on the insert if not exists command?

Yes, "insert if not exists" is a serial ({{CL.SERIAL}}) /lightweight 
transaction (LWT) query, which means it always require a quorum of nodes up. 
And a quorum of RF=2 is 2 node, so you won't be able to do any {{CL.SERIAL}} 
queries on a single node cluster if RF=2.
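
For illustration, the quorum math and a possible way out for a single-node cluster 
(the keyspace name {{my_ks}} is an assumption for the example; adjust it to the 
actual keyspace):
{noformat}
quorum = floor(RF / 2) + 1 = floor(2 / 2) + 1 = 2
{noformat}
{code}
-- Lower the replication factor to 1 so that a quorum (1 node) is reachable:
ALTER KEYSPACE my_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
{code}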

> insert if not exists, with replication factor of 2 doesn't work
> ---
>
> Key: CASSANDRA-14584
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14584
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: arik
>Priority: Major
>
> Running with a single node cluster.
> My keyspace has a replication factor of 2.
> Insert if not exists doesn't work on that setup.
> Produce the following error:
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:720)
>  Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All 
> host(s) tried for query failed (tried: cassandra-service/10.23.251.29:9042 
> (com.datastax.driver.core.exceptions.UnavailableException: Not enough 
> replicas available for query at consistency QUORUM (2 required but only 1 
> alive))) at 
> com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:223)
>  at 
> com.datastax.driver.core.RequestHandler.access$1200(RequestHandler.java:41) 
> at 
> com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:309)
>  at 
> com.datastax.driver.core.RequestHandler$SpeculativeExecution.retry(RequestHandler.java:477)
>  at 
> com.datastax.driver.core.RequestHandler$SpeculativeExecution.processRetryDecision(RequestHandler.java:455)
>  at 
> com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:686)
>  at 
> com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1091)
>  at 
> com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1008)
>  at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>  at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>  at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>  at 
> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
>  at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>  at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  at 
> com.datastax.driver.core.InboundTrafficMeter.channelRead(InboundTrafficMeter.java:29)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>  at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1273) at 
> io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1084) at 
> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
>  at 
> 

[jira] [Commented] (CASSANDRA-14825) Expose table schema for drivers

2018-11-12 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683564#comment-16683564
 ] 

Sylvain Lebresne commented on CASSANDRA-14825:
--

bq. just to be clear if you query describe_keyspace table you can iterate 
through the result set to get the entire schema (...)

And in case there was doubt, I didn't say it wasn't the case. I'm not saying 
you can't get schema information through virtual tables.

What I'm asking is, why use virtual tables when we could just promote to CQL 
the `DESCRIBE` statements every user is already familiar with, which is, I think, 
a more flexible/direct approach?

By which I mean that you can get the granular pieces if you want, but also get a 
full schema dump directly. With virtual tables, you get the granular pieces, but a 
full schema dump requires a small amount of post-processing (_not_ saying it's 
hard, but it is harder than no post-processing at all). Additionally, it's very 
easy to add new options to statements, while once you settle on some virtual 
table schema, it can be harder to evolve.

What are the pros in favor of virtual tables that outweigh those 2 pros of 
promoting `DESCRIBE` (existing familiarity and at least some form of better 
flexibility; to which I could add not having 2 ways to do the same thing, since 
afaik, we're not going to remove `DESCRIBE` from cqlsh)? I get that virtual 
tables are everyone's new shiny hammer, but it's not an objective argument.

> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that its only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14825) Expose table schema for drivers

2018-11-09 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681885#comment-16681885
 ] 

Sylvain Lebresne commented on CASSANDRA-14825:
--

bq. At the least we should provide selecting by keyspace, ideally keyspace and 
table.

Ok, sure. As my comments hopefully make it clear, I agree with you on those 
options needing to be provided. And I'm sure that's not controversial.

> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that its only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14825) Expose table schema for drivers

2018-11-09 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681723#comment-16681723
 ] 

Sylvain Lebresne commented on CASSANDRA-14825:
--

bq. I'm -1 on this. Good, bad or ugly, there are a lot of clusters out there 
that do multi-tennancy or otherwise have > 1k tables for which this would be 
untenable. 

I'm confused here. How is it fundamentally different from the exact same 
query in cqlsh today? I assume those clusters simply don't do it today, and _nobody_ 
would force them to do it either. I assume you've read enough to see that 
nothing I propose would prevent those users from getting their schema in a more 
incremental fashion.

> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that its only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14825) Expose table schema for drivers

2018-11-09 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681667#comment-16681667
 ] 

Sylvain Lebresne commented on CASSANDRA-14825:
--

To maybe clarify/be more precise, when I say "promote the DESCRIBE syntax cqlsh 
supports to proper CQL", what I have in mind is that each `DESCRIBE` statement 
would return a result set with only one result, but there would be many variants, 
so:
{noformat}
DESCRIBE SCHEMA; // returns a single string with the whole schema
DESCRIBE KEYSPACE ks; // returns a single string, with all of keyspace 'ks'
DESCRIBE TABLE ks.t   // returns a single string with just that one table.
...
{noformat}
Again, I think it's going to be much messier to get something approaching that 
flexibility with tables (doable, sure, but messier).

> Expose table schema for drivers
> ---
>
> Key: CASSANDRA-14825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14825
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the drivers recreate the CQL for the tables by putting together the 
> system table values. This is very difficult to keep up to date and buggy 
> enough that its only even supported in Java and Python drivers. Cassandra 
> already has some limited output available for snapshots that we could provide 
> in a virtual table or new query that the drivers can fetch. This can greatly 
> reduce the complexity of drivers while also reducing bugs like 
> CASSANDRA-14822 as the underlying schema and properties change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org


