[jira] [Commented] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread C. Scott Andreas (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631234#comment-16631234
 ] 

C. Scott Andreas commented on CASSANDRA-12704:
--

Thanks, Jay and Mick!

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.0
>
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries 
> to the release Artifactory.
> But for a daily snapshot build, if "release" is set, it won't be a snapshot 
> build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest just removing the "if" check for the "publish" target.
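For illustration, a minimal Ant sketch of the pattern being described (the 
target bodies and names here are assumptions, not the actual build.xml 
contents):

{code:xml}
<!-- Sketch only: not the real build.xml. -->
<!-- Before: with if="release", Ant skips the target entirely unless the
     "release" property is set, so a daily snapshot build can never publish. -->
<target name="publish" if="release">
    <echo message="deploying artifacts to the release repository"/>
</target>

<!-- After (the suggested fix): drop the if-check so the target always runs.
     Whether artifacts land in the release or the snapshot repository can then
     be driven by the version string (e.g. a -SNAPSHOT suffix), as is usual
     for Maven repositories. -->
<target name="publish">
    <echo message="deploying artifacts"/>
</target>
{code}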



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631226#comment-16631226
 ] 

Benedict commented on CASSANDRA-12126:
--

bq. We read nothing in node Y, yet node Z read something in the next request.

I think the problem here is that, at the API level, there isn't enough 
information to say that X didn't simply 'occur' *after* both Y and Z.  That is, 
unless the rejection of Y occurs after X's timeout.  In this case, it would 
seem to be an API-visible error, as at the point of timeout the indeterminacy 
should be fixed.  Timeouts should not ‘live forever’ as the bogeyman, ready to 
mess with history.

I think, though, that the suggested mechanism could result in this.

Take three nodes (RF=3) A, B and C; and any three CAS operations X, Y and Z 
such that:
* X and Y can always succeed
* Z can only succeed if X has succeeded

Setup:
# Prepare _and_ Propose X with ballot 1; proposal accepted only by A 
#* this will be the last, and only, proposal acceptance by any node
# Prepare Y with ballot 2; reach B and C before ballot 1, so they do not accept
# Now, lock X and Y in battle, always failing to proceed to the propose step 
before the other reaches the prepare step again
# X and Y both time out, having failed to cleanly apply

Part 2:
# Z is now attempted; it prepares to only B and C, seeing no in-progress 
proposal
# As a result, it does not see X; it is rejected, so there is no new 
proposal/commit 
# Read at SERIAL is performed; this time, A is consulted
# Suddenly, a wild X appears.  From nowhere.

It does seem, in essence, to be an instance of the bug (or a very similar 
one) described in the ticket.
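To make the interleaving concrete, here is a self-contained toy sketch of the 
scenario (a single-decree acceptor with assumed names and ballot handling, not 
Cassandra's implementation):

{code:java}
import java.util.*;

// Toy single-decree Paxos acceptor, for illustration only.
class Acceptor {
    final String name;
    int promised = 0;            // highest ballot promised so far
    String acceptedValue = null; // value of the accepted proposal, if any

    Acceptor(String name) { this.name = name; }

    // Phase 1: promise iff the ballot is higher than any promise so far.
    boolean prepare(int ballot) {
        if (ballot <= promised) return false;
        promised = ballot;
        return true;
    }

    // Phase 2: accept iff no higher ballot has been promised meanwhile.
    boolean propose(int ballot, String value) {
        if (ballot < promised) return false;
        promised = ballot;
        acceptedValue = value;
        return true;
    }
}

public class ContendedCas {
    public static void main(String[] args) {
        Acceptor a = new Acceptor("A"), b = new Acceptor("B"), c = new Acceptor("C");

        // Setup: X prepares at ballot 1 on all three nodes...
        a.prepare(1); b.prepare(1); c.prepare(1);
        // ...but Y's prepare at ballot 2 reaches B and C before X's proposal,
        b.prepare(2); c.prepare(2);
        // ...so X's ballot-1 proposal is accepted only by A.
        System.out.println("A accepts X@1? " + a.propose(1, "X")); // true
        System.out.println("B accepts X@1? " + b.propose(1, "X")); // false
        System.out.println("C accepts X@1? " + c.propose(1, "X")); // false
        // X and Y keep preempting each other like this until both time out.

        // Part 2: Z prepares at ballot 3 against the quorum {B, C} only and
        // sees no in-progress proposal, so it never learns about X.
        b.prepare(3); c.prepare(3);
        System.out.println("B sees: " + b.acceptedValue + ", C sees: " + c.acceptedValue);

        // A later SERIAL read whose quorum includes A suddenly surfaces X.
        System.out.println("A still holds: " + a.acceptedValue); // "X"
    }
}
{code}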


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the 
> value written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 talks about 
> this issue: how learners can find out whether a majority of the acceptors 
> have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't 
> know if it was accepted by a majority of acceptors. When we ask a majority 
> of acceptors, and more than one acceptor (but not a majority) has something 
> in flight, we have no way of knowing whether it was accepted by a majority 
> of acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a 
> majority of acceptors have no in-flight commit, which means a majority agree 
> that nothing was accepted by a majority. I think we should run a propose 
> step here with an empty commit, and that will cause the write from step 1 to 
> never be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 
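As a sketch of the coordinator-side rule the description argues for (all names 
here are hypothetical, not Cassandra's actual classes):

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the rule proposed above, not Cassandra's code.
public class SerialReadRule {
    record Proposal(long ballot, String value) {}
    record PrepareResponse(Optional<Proposal> inProgress) {}

    enum Action { COMPLETE_IN_PROGRESS, PROPOSE_EMPTY_COMMIT }

    // Called once a quorum of prepare responses has arrived.
    static Action decide(List<PrepareResponse> quorum) {
        Optional<Proposal> newest = quorum.stream()
                .flatMap(r -> r.inProgress().stream())
                .max(Comparator.comparingLong(Proposal::ballot));

        // Step 3 of the description: some replica in the quorum holds an
        // uncommitted proposal, so finish it (propose + commit) before reading.
        if (newest.isPresent())
            return Action.COMPLETE_IN_PROGRESS;

        // The proposed fix for step 2: the quorum saw nothing in flight, so
        // propose an empty commit at the current ballot; a value accepted by
        // only a minority (step 1) can then never resurface later.
        return Action.PROPOSE_EMPTY_COMMIT;
    }

    public static void main(String[] args) {
        // Step 2's situation: B and C answer, neither has anything in flight.
        List<PrepareResponse> bAndC = List.of(
                new PrepareResponse(Optional.empty()),
                new PrepareResponse(Optional.empty()));
        System.out.println(decide(bAndC)); // PROPOSE_EMPTY_COMMIT
    }
}
{code}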



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-12704:
---
   Resolution: Fixed
Fix Version/s: 4.0
   Status: Resolved  (was: Ready to Commit)

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.0
>
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries 
> to the release Artifactory.
> But for a daily snapshot build, if "release" is set, it won't be a snapshot 
> build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest just removing the "if" check for the "publish" target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631220#comment-16631220
 ] 

Jay Zhuang commented on CASSANDRA-12704:


Thanks [~michaelsembwever]. Committed to trunk as 
[{{87a}}|https://github.com/apache/cassandra/commit/87abe7249f7ad8b11235d61e048735bd6d62].

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries 
> to the release Artifactory.
> But for a daily snapshot build, if "release" is set, it won't be a snapshot 
> build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest just removing the "if" check for the "publish" target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



cassandra git commit: Enable snapshot artifacts publish

2018-09-27 Thread jzhuang
Repository: cassandra
Updated Branches:
  refs/heads/trunk 29f83b888 -> 87abe


Enable snapshot artifacts publish

So "ant publish" is able to upload snapshot artifacts to artifactory.

patch by Jay Zhuang; reviewed by Mick Semb Wever for CASSANDRA-12704


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/87ab
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/87ab
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/87ab

Branch: refs/heads/trunk
Commit: 87abe7249f7ad8b11235d61e048735bd6d62
Parents: 29f83b8
Author: Jay Zhuang 
Authored: Tue Oct 4 14:26:18 2016 -0700
Committer: Jay Zhuang 
Committed: Thu Sep 27 17:39:22 2018 -0700

--
 CHANGES.txt | 1 +
 build.xml   | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/87ab/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index e227c40..0d3571d 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0
+ * Enable snapshot artifacts publish (CASSANDRA-12704)
  * Introduce RangesAtEndpoint.unwrap to simplify 
StreamSession.addTransferRanges (CASSANDRA-14770)
  * LOCAL_QUORUM may speculate to non-local nodes, resulting in Timeout instead 
of Unavailable (CASSANDRA-14735)
  * Avoid creating empty compaction tasks after truncate (CASSANDRA-14780)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/87ab/build.xml
--
diff --git a/build.xml b/build.xml
index 86462f7..28c33bf 100644
--- a/build.xml
+++ b/build.xml
@@ -2038,7 +2038,6 @@


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-12704:
---
Status: Ready to Commit  (was: Patch Available)

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries 
> to the release Artifactory.
> But for a daily snapshot build, if "release" is set, it won't be a snapshot 
> build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest just removing the "if" check for the "publish" target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14788) Add test coverage workflows to CircleCI config

2018-09-27 Thread Jordan West (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West reassigned CASSANDRA-14788:
---

Assignee: Jon Meredith

> Add test coverage workflows to CircleCI config
> --
>
> Key: CASSANDRA-14788
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14788
> Project: Cassandra
>  Issue Type: Improvement
>  Components: 4.0, Build
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> To support 4.0 testing efforts it's helpful to know how much of the code is 
> being exercised by unit tests and dtests.
> Add support for running the unit tests and dtests instrumented for test 
> coverage on CircleCI and then combine the results of all tests (unit, dtest 
> with vnodes, dtest without vnodes) into a single coverage report.
> All of the hard work of getting JaCoCo to work with unit tests and dtests has 
> already been done, it just needs wiring up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread mck (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631185#comment-16631185
 ] 

mck commented on CASSANDRA-12704:
-

{quote}Do you think the change should go to trunk only or other branches 
too?{quote}
I'm thinking only trunk to begin with, as that's where the current need is. We 
also don't know how this will work, and it might be better to iron out wrinkles 
against just trunk first, and back-port it later if a definite need arises. 
That's my opinion, but otherwise I'm easy.

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries 
> to the release Artifactory.
> But for a daily snapshot build, if "release" is set, it won't be a snapshot 
> build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest just removing the "if" check for the "publish" target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Jeffrey F. Lukman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630850#comment-16630850
 ] 

Jeffrey F. Lukman commented on CASSANDRA-12126:
---

Thank you for your responses, [~jjordan] and [~kohlisankalp].
I think you have cleared up some misunderstandings for me (and our team): a 
timeout is a "gray area" in which the client cannot determine whether a 
request has been successfully processed.

One thing we would like to point out, based on the earlier discussion in this 
bug description:
{quote}However, we need to fix step 2, since it causes reads to not be 
linearizable with respect to writes and other reads. In this case, we know 
that a majority of acceptors have no in-flight commit, which means a majority 
agree that nothing was accepted by a majority. I think we should run a propose 
step here with an empty commit, and that will cause the write from step 1 to 
never be visible afterwards.
{quote}
What we actually tried to mimic with our model checker in the beginning was 
this scenario, where node Y sees that the majority of nodes have no inProgress 
value, but then node Z suddenly sees an inProgress value from node X and tries 
to repair and commit it.
So, we confirm that we can also see this behavior:
{quote}2: Read -> Nothing
3: Read -> Something
{quote}
We read nothing in node Y, yet node Z read something in the next request.



To sum up: our scenario explains this behavior. Node Y does not try to repair 
the Paxos round because node X's prepare response arrives last; node Y 
therefore ignores node X's prepare response and bases its decision not to 
repair on the other responses.
But for node Z's client request, node Z decides to repair the Paxos round 
based on node X's existing inProgress value_1="A", because node X's prepare 
response arrives early (1st or 2nd). This causes an inconsistent reaction 
between node Y and node Z (although each is correct under the original Paxos 
algorithm).


One possible solution to avoid these inconsistent reactions is for each node 
to decide whether to repair a Paxos round based on the complete view of the 
alive nodes; then, even if node X's response arrives last with an inProgress 
value, node Y will still repair the Paxos round.
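A rough sketch of that idea (hypothetical names; a real implementation would 
also need a latency story, since waiting for every alive replica makes the 
decision as slow as the slowest alive node):

{code:java}
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the suggestion above: decide on Paxos repair only
// from the complete view of the alive replicas, not the first quorum.
public class RepairDecision {
    record Proposal(long ballot, String value) {}

    static boolean shouldRepair(List<Optional<Proposal>> responses, int aliveCount) {
        // Wait for the full view, so a slow replica (node X) that still holds
        // an in-progress value cannot be missed the way node Y missed it.
        if (responses.size() < aliveCount)
            throw new IllegalStateException("complete view not yet available");
        return responses.stream().anyMatch(Optional::isPresent);
    }
}
{code}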

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the 
> value written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 talks about 
> this issue: how learners can find out whether a majority of the acceptors 
> have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't 
> know if it was accepted by a majority of acceptors. When we ask a majority 
> of acceptors, and more than one acceptor (but not a majority) has something 
> in flight, we have no way of knowing whether it was accepted by a majority 
> of acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a 
> majority of acceptors have no in-flight commit, which means a majority agree 
> that nothing was accepted by a majority. I think we should run a propose 
> step here with an empty commit, and that will cause the write from step 1 to 
> never be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread sankalp kohli (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630772#comment-16630772
 ] 

sankalp kohli commented on CASSANDRA-12126:
---

I agree with [~jjordan] that this is a correct response. 

Also, in the future, please open a new Jira if it is a different issue. 

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the 
> value written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 talks about 
> this issue: how learners can find out whether a majority of the acceptors 
> have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't 
> know if it was accepted by a majority of acceptors. When we ask a majority 
> of acceptors, and more than one acceptor (but not a majority) has something 
> in flight, we have no way of knowing whether it was accepted by a majority 
> of acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a 
> majority of acceptors have no in-flight commit, which means a majority agree 
> that nothing was accepted by a majority. I think we should run a propose 
> step here with an empty commit, and that will cause the write from step 1 to 
> never be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14794) Avoid calling iter.next() in a loop when notifying indexers about range tombstones

2018-09-27 Thread Alex Petrov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630728#comment-16630728
 ] 

Alex Petrov commented on CASSANDRA-14794:
-

+1, thank you for the patch!

> Avoid calling iter.next() in a loop when notifying indexers about range 
> tombstones
> --
>
> Key: CASSANDRA-14794
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14794
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> In 
> [SecondaryIndexManager|https://github.com/apache/cassandra/blob/914c66685c5bebe1624d827a9b4562b73a08c297/src/java/org/apache/cassandra/index/SecondaryIndexManager.java#L901-L902]
>  - avoid calling {{.next()}} in the {{.forEach(..)}}
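The pattern being fixed is the classic one below (a generic sketch, not the 
actual SecondaryIndexManager code):

{code:java}
import java.util.Iterator;
import java.util.List;

// Generic sketch of the bug pattern, not the actual SecondaryIndexManager code.
public class ForEachNextBug {
    public static void main(String[] args) {
        List<String> indexers = List.of("idx1", "idx2", "idx3");
        Iterator<String> tombstones = List.of("rt1", "rt2", "rt3").iterator();

        // Broken: iter.next() inside forEach advances the iterator once per
        // indexer, so each indexer is notified of a *different* tombstone:
        //   indexers.forEach(idx -> notify(idx, tombstones.next()));

        // Fixed: advance the iterator once, then notify every indexer of the
        // same element.
        while (tombstones.hasNext()) {
            String rt = tombstones.next();
            indexers.forEach(idx -> System.out.println(idx + " notified of " + rt));
        }
    }
}
{code}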



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14791) [utest] tests unable to write system tmp directory

2018-09-27 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630718#comment-16630718
 ] 

Jay Zhuang commented on CASSANDRA-14791:


[~mshuler] talked about the docker option in the last NGCC: 
https://github.com/ngcc/ngcc2017/blob/master/Help_Test_Apache_Cassandra-NGCC_2017.pdf
 . Any idea how we can move forward with this?

> [utest] tests unable to write system tmp directory
> --
>
> Key: CASSANDRA-14791
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14791
> Project: Cassandra
>  Issue Type: Task
>  Components: Testing
>Reporter: Jay Zhuang
>Priority: Minor
>
> Some tests fail from time to time because they cannot write to the 
> directory {{/tmp/}}:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/
> {noformat}
> java.lang.RuntimeException: java.nio.file.AccessDeniedException: 
> /tmp/na-1-big-Data.db
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:152)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:141)
>   at 
> org.apache.cassandra.io.compress.CompressedSequentialWriter.<init>(CompressedSequentialWriter.java:82)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>   at java.nio.channels.FileChannel.open(FileChannel.java:287)
>   at java.nio.channels.FileChannel.open(FileChannel.java:335)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100)
> {noformat}
>  I guess it's because some Jenkins slaves don't have the proper permissions 
> set. For slave {{cassandra16}}, the tests are fine:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/
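One common way to make such tests independent of {{/tmp}} permissions is a 
per-test temporary directory (a JUnit 4 sketch; whether it fits these 
particular tests is an assumption):

{code:java}
import java.io.File;
import org.junit.Assert;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

// Sketch: let JUnit create a writable per-test directory instead of writing
// fixed file names such as /tmp/na-1-big-Data.db directly.
public class TempDirTest {
    @Rule
    public TemporaryFolder tmp = new TemporaryFolder();

    @Test
    public void writesGoToAPrivateDirectory() throws Exception {
        File data = tmp.newFile("na-1-big-Data.db");
        // ... hand 'data' to the writer under test instead of "/tmp/..." ...
        Assert.assertTrue(data.canWrite());
    }
}
{code}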



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Jeremiah Jordan (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630700#comment-16630700
 ] 

Jeremiah Jordan edited comment on CASSANDRA-12126 at 9/27/18 4:31 PM:
--

bq. client request-1: Timed out, client request-2: Rejected, client request-3: 
Timed out

Given those responses to the queries, the client side does not know the state 
of the system without issuing a READ at SERIAL (or doing another INSERT that 
gets a success, from which the state can be inferred).

bq. There we get an inconsistency between the client side and the server side, 
where all requests actually failed, but when we read the end result again from 
all nodes, we get value_1='A', value_2=null, value_3=null.

Given the responses you got, there is no inconsistency.  The client received 
"timed out" exceptions.  A timed-out exception means "your query may or may 
not have been applied, the server doesn't know, you should retry it if you 
want to ensure it goes through".  In this case request-1 was successful, and 
request-3 failed.  So {value_1='A', value_2=null, value_3=null} is a valid 
state and not inconsistent.


was (Author: jjordan):
bq. client request-1: Timed out, client request-2: Rejected, client request-3: 
Timed out

Given those responses to the queries, the client side does not know the state 
of the system without issuing a READ at SERIAL (or doing another INSERT that 
gets a success).

bq. There we get an inconsistency between the client side and the server side, 
where all requests actually failed, but when we read the end result again from 
all nodes, we get value_1='A', value_2=null, value_3=null.

Given the responses you got, there is no inconsistency.  The client received 
"timed out" exceptions.  A timed-out exception means "your query may or may 
not have been applied, the server doesn't know, you should retry it if you 
want to ensure it goes through".  In this case request-1 was successful, and 
request-3 failed.  So {value_1='A', value_2=null, value_3=null} is a valid 
state and not inconsistent.
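For example, client-side handling along these lines (a sketch assuming the 
DataStax Java driver 3.x and the table from the scenario above; not project 
code) resolves the ambiguity of a timeout:

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

// Sketch: a timeout leaves the outcome unknown, so the client reads at SERIAL
// to learn the real state instead of assuming the write failed.
public class CasTimeoutHandling {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("test")) {
            try {
                ResultSet rs = session.execute(new SimpleStatement(
                        "UPDATE tests SET value_1 = 'A' WHERE name = 'testing' IF owner = 'user_1'"));
                System.out.println(rs.wasApplied() ? "applied" : "rejected");
            } catch (WriteTimeoutException e) {
                // "May or may not have been applied": do NOT assume failure.
                SimpleStatement read = new SimpleStatement(
                        "SELECT value_1 FROM tests WHERE name = 'testing'");
                read.setConsistencyLevel(ConsistencyLevel.SERIAL);
                Row row = session.execute(read).one();
                System.out.println("actual state: value_1 = " + row.getString("value_1"));
            }
        }
    }
}
{code}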

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the 
> value written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 talks about 
> this issue: how learners can find out whether a majority of the acceptors 
> have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't 
> know if it was accepted by a majority of acceptors. When we ask a majority 
> of acceptors, and more than one acceptor (but not a majority) has something 
> in flight, we have no way of knowing whether it was accepted by a majority 
> of acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a 
> majority of acceptors have no in-flight commit, which means a majority agree 
> that nothing was accepted by a majority. I think we should run a propose 
> step here with an empty commit, and that will cause the write from step 1 to 
> never be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Updated] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-12704:
---
Reviewer: mck

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries 
> to the release Artifactory.
> But for a daily snapshot build, if "release" is set, it won't be a snapshot 
> build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest just removing the "if" check for the "publish" target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Jeremiah Jordan (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630700#comment-16630700
 ] 

Jeremiah Jordan edited comment on CASSANDRA-12126 at 9/27/18 4:31 PM:
--

bq. client request-1: Timed out, client request-2: Rejected, client request-3: 
Timed out

Given those responses to the queries, the client side does not know the state 
of the system without issuing a READ at SERIAL (or doing another INSERT that 
gets a success, from which the state can be inferred).

bq. There we get an inconsistency between the client side and the server side, 
where all requests actually failed, but when we read the end result again from 
all nodes, we get value_1='A', value_2=null, value_3=null.

Given the responses you got, there is no inconsistency.  The client received 
"timed out" exceptions.  A timed-out exception means "your query may or may 
not have been applied, the server doesn't know, you should retry it if you 
want to ensure it goes through".  In this case request-1 was successful, and 
request-3 failed.  So {{value_1='A', value_2=null, value_3=null}} is a valid 
state and not inconsistent.


was (Author: jjordan):
bq. client request-1: Timed out, client request-2: Rejected, client request-3: 
Timed out

Given those responses to the queries, the client side does not know the state 
of the system without issuing a READ at SERIAL (or doing another INSERT that 
gets a success, from which the state can be inferred).

bq. There we get an inconsistency between the client side and the server side, 
where all requests actually failed, but when we read the end result again from 
all nodes, we get value_1='A', value_2=null, value_3=null.

Given the responses you got, there is no inconsistency.  The client received 
"timed out" exceptions.  A timed-out exception means "your query may or may 
not have been applied, the server doesn't know, you should retry it if you 
want to ensure it goes through".  In this case request-1 was successful, and 
request-3 failed.  So {value_1='A', value_2=null, value_3=null} is a valid 
state and not inconsistent.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the 
> value written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 talks about 
> this issue: how learners can find out whether a majority of the acceptors 
> have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't 
> know if it was accepted by a majority of acceptors. When we ask a majority 
> of acceptors, and more than one acceptor (but not a majority) has something 
> in flight, we have no way of knowing whether it was accepted by a majority 
> of acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a 
> majority of acceptors have no in-flight commit, which means a majority agree 
> that nothing was accepted by a majority. I think we should run a propose 
> step here with an empty commit, and that will cause the write from step 1 to 
> never be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Jeremiah Jordan (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630700#comment-16630700
 ] 

Jeremiah Jordan commented on CASSANDRA-12126:
-

bq. client request-1: Timed out, client request-2: Rejected, client request-3: 
Timed out

Given those responses to the queries, the client side does not know the state 
of the system without issuing a READ at SERIAL (or doing another INSERT that 
gets a success).

bq. There we get an inconsistency between the client side and the server side, 
where all requests actually failed, but when we read the end result again from 
all nodes, we get value_1='A', value_2=null, value_3=null.

Given the responses you got, there is no inconsistency.  The client received 
"timed out" exceptions.  A timed-out exception means "your query may or may 
not have been applied, the server doesn't know, you should retry it if you 
want to ensure it goes through".  In this case request-1 was successful, and 
request-3 failed.  So {value_1='A', value_2=null, value_3=null} is a valid 
state and not inconsistent.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the 
> value written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 talks about 
> this issue: how learners can find out whether a majority of the acceptors 
> have accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't 
> know if it was accepted by a majority of acceptors. When we ask a majority 
> of acceptors, and more than one acceptor (but not a majority) has something 
> in flight, we have no way of knowing whether it was accepted by a majority 
> of acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a 
> majority of acceptors have no in-flight commit, which means a majority agree 
> that nothing was accepted by a majority. I think we should run a propose 
> step here with an empty commit, and that will cause the write from step 1 to 
> never be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630699#comment-16630699
 ] 

Jay Zhuang commented on CASSANDRA-12704:


Do you think the change should go to trunk only or other branches too?
I would prefer all branches from 2.2 onward, as we might want snapshot 
artifacts for all active branches.

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries 
> to the release Artifactory.
> But for a daily snapshot build, if "release" is set, it won't be a snapshot 
> build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest just removing the "if" check for the "publish" target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-09-27 Thread Kevin Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630690#comment-16630690
 ] 

Kevin Zhang commented on CASSANDRA-14358:
-

[~jolynch]: thanks for the answer. I did the kernel workaround and it worked. 

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. The smallest cluster I've seen the problem 
> on is 12 nodes; it reproduces more reliably on clusters of 40+ nodes, and 
> 300-node clusters consistently reproduce it on at least one node.
> So all the connections are SSL and we're connecting on the internal ip 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested
> Fix For: 4.0, 2.1.x, 2.2.x, 3.0.x, 3.11.x
>
> Attachments: 10 Minute Partition.pdf
>
>
> edit summary: This primarily impacts networks with stateful firewalls such as 
> AWS. I'm working on a proper patch for trunk but unfortunately it relies on 
> the Netty refactor in 4.0 so it will be hard to backport to previous 
> versions. A workaround for earlier versions is to set the 
> {{net.ipv4.tcp_retries2}} sysctl to ~5. This can be done with the following:
> {code:java}
> $ cat /etc/sysctl.d/20-cassandra-tuning.conf
> net.ipv4.tcp_retries2=5
> $ # Reload all sysctls
> $ sysctl --system{code}
> Original Bug Report:
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHO's. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
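For reference, the guards being described are the standard {{java.net}} ones 
(a sketch with assumed addresses and timeout values, not the actual 
OutboundTcpConnection code):

{code:java}
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Sketch of the two timeouts the report says were not being set on outbound
// SSL connections; the host, port, and values here are illustrative.
public class BoundedConnect {
    public static void main(String[] args) throws Exception {
        Socket raw = new Socket();
        // Bound the connect() syscall instead of blocking indefinitely.
        raw.connect(new InetSocketAddress("10.0.0.2", 7001), 2_000);
        // Bound each read()/recv(); this also covers the SSL handshake reads.
        raw.setSoTimeout(10_000);

        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket ssl = (SSLSocket) factory.createSocket(raw, "10.0.0.2", 7001, true)) {
            ssl.startHandshake(); // fails with SocketTimeoutException rather than hanging
        }
    }
}
{code}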
> Current theory:
>  # "New" node restarts,  "Old" node calls 
> [newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
>  # Old node starts [creating a 
> new|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java#L141]
>  SSL socket 
>  # SSLSocket calls 
> 

[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Jeffrey F. Lukman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630688#comment-16630688
 ] 

Jeffrey F. Lukman commented on CASSANDRA-12126:
---

During our testing with our model checker, we limit the number of Paxos rounds 
for each query, because otherwise it is possible to get stuck in a very long 
sequence of message exchanges among the nodes without progressing anywhere. So 
what we do is execute only one round of Paxos for each query.

To illustrate our test and tie the whole story together, here is what happened 
in detail:
 * We first prepared the 3-node cluster with the test.tests table as the 
initial table structure, and yes, the initial table began with:
{name:'testing', owner:'user_1', value1:null, value2:null, value3:null}
 * Next, we ran the model checker, which starts the 3-node cluster.
 * We injected the 3 client requests in order: query 1, then query 2, then 
query 3. This causes query 1's ballot number < query 2's ballot number < 
query 3's ballot number.
 * This means that, at the beginning, the model checker already sees the 9 
prepare messages in its queue that will be reordered in some way.
 * When the bug manifested, we ended up with the following:
 ** Node X's prepare messages proceed, and all nodes respond with true back to 
node X.
 ** Node X sends its propose message with value_1='A' to itself first and gets 
a true response as well.
 ** At this moment, node X's inProgress value is updated to the proposed 
value, value_1='A'.
 ** But then node Y's prepare messages proceed, and all nodes respond with 
true back to node Y, because node Y's prepare messages carry a higher ballot 
number.
 ** But when node Y is about to send its propose messages, it realizes that 
the current data does not satisfy the IF condition, so it does not proceed to 
the propose messages. --> Client request 2 to node Y is therefore rejected.
 ** Node X's propose messages then continue on to nodes Y and Z; both are 
answered with false back to node X.
 ** At this point node X would normally retry the Paxos round with a higher 
ballot number, but since we limit each query to one round of Paxos, client 
request 1 to node X is timed out.
 ** Lastly, node Z sends its prepare messages to all nodes and gets true 
responses from all of them, because its ballot number is higher still.
 ** At this point, if node X's response message is returned to node Z first, 
node Z will realize that node X still has an inProgress value (value_1='A'). 
This causes node Z to send propose messages and commit messages, but for 
client request 1, using the current highest ballot number.
Here we have our first data update saved: value_1='A', value_2=null, 
value_3=null.
 ** Back to our constraint of one Paxos round per query: we ended up not 
retrying client request-3 because we reached the timeout.
 * To sum up:
 ** client request-1: Timed out
 ** client request-2: Rejected
 ** client request-3: Timed out

There we get an inconsistency between the client side and the server side, 
where all requests actually failed, but when we read the end result again from 
all nodes, we get value_1='A', value_2=null, value_3=null.

 

I made a wrong statement at the end of my first comment:
{quote}9. Therefore, we ended up having client request 1 stored to the server, 
although client request-3 was the one that is said successful.
{quote}
It should have been: failed due to timeout.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read 
> the value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the 
> value written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will 

[jira] [Updated] (CASSANDRA-14794) Avoid calling iter.next() in a loop when notifying indexers about range tombstones

2018-09-27 Thread Marcus Eriksson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-14794:

Status: Patch Available  (was: Open)

https://github.com/krummas/cassandra/commits/marcuse/14794
tests: https://circleci.com/workflow-run/8413c7d8-bd59-4c78-8ae8-7d529c55ab1f

> Avoid calling iter.next() in a loop when notifying indexers about range 
> tombstones
> --
>
> Key: CASSANDRA-14794
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14794
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> In 
> [SecondaryIndexManager|https://github.com/apache/cassandra/blob/914c66685c5bebe1624d827a9b4562b73a08c297/src/java/org/apache/cassandra/index/SecondaryIndexManager.java#L901-L902]
>  - avoid calling {{.next()}} in the {{.forEach(..)}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14794) Avoid calling iter.next() in a loop when notifying indexers about range tombstones

2018-09-27 Thread Marcus Eriksson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-14794:

Reviewers: Alex Petrov, Sam Tunnicliffe

> Avoid calling iter.next() in a loop when notifying indexers about range 
> tombstones
> --
>
> Key: CASSANDRA-14794
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14794
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> In 
> [SecondaryIndexManager|https://github.com/apache/cassandra/blob/914c66685c5bebe1624d827a9b4562b73a08c297/src/java/org/apache/cassandra/index/SecondaryIndexManager.java#L901-L902]
>  - avoid calling {{.next()}} in the {{.forEach(..)}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14742) Race Condition in batchlog replica collection

2018-09-27 Thread Alex Petrov (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-14742:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Race Condition in batchlog replica collection
> -
>
> Key: CASSANDRA-14742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14742
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When we collect nodes for it in {{StorageProxy#getBatchlogReplicas}}, we 
> already filter out down replicas; subsequently they get picked up and taken 
> for liveAndDown.
> There's a possible race condition due to picking tokens from token metadata 
> twice (once in {{StorageProxy#getBatchlogReplicas}} and second one in 
> {{ReplicaPlan#forBatchlogWrite}})
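A generic sketch of the race class being described (hypothetical names; the 
actual fix consolidates the two reads inside Cassandra's replica-plan code):

{code:java}
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Generic sketch: reading mutable shared state twice can observe two
// different versions; snapshotting it once cannot.
public class SnapshotOnce {
    static final AtomicReference<List<String>> tokenMetadata =
            new AtomicReference<>(List.of("A", "B", "C"));

    static void racy() {
        List<String> first = tokenMetadata.get();  // e.g. in getBatchlogReplicas
        // ... ring topology may change here ...
        List<String> second = tokenMetadata.get(); // e.g. in forBatchlogWrite
        // 'first' and 'second' may disagree: the race described above.
    }

    static void consolidated() {
        List<String> snapshot = tokenMetadata.get(); // read once, pass it down
        // every decision below works from the same consistent view
    }
}
{code}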



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14742) Race Condition in batchlog replica collection

2018-09-27 Thread Alex Petrov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630666#comment-16630666
 ] 

Alex Petrov commented on CASSANDRA-14742:
-

Thank you,

Committed to trunk as 
[29f83b88821c4792087df19d829ac87b5c06e9e6|https://github.com/apache/cassandra/commit/29f83b88821c4792087df19d829ac87b5c06e9e6]

> Race Condition in batchlog replica collection
> -
>
> Key: CASSANDRA-14742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14742
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When we collect nodes for it in {{StorageProxy#getBatchlogReplicas}}, we 
> already filter out down replicas; subsequently they get picked up and taken 
> for liveAndDown.
> There's a possible race condition due to picking tokens from token metadata 
> twice (once in {{StorageProxy#getBatchlogReplicas}} and second one in 
> {{ReplicaPlan#forBatchlogWrite}})



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



cassandra git commit: Consolidate batch write code

2018-09-27 Thread ifesdjeen
Repository: cassandra
Updated Branches:
  refs/heads/trunk 914c66685 -> 29f83b888


Consolidate batch write code

Patch by Alex Petrov; reviewed by Benedict Elliott Smith for CASSANDRA-14742

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/29f83b88
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/29f83b88
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/29f83b88

Branch: refs/heads/trunk
Commit: 29f83b88821c4792087df19d829ac87b5c06e9e6
Parents: 914c666
Author: Alex Petrov 
Authored: Mon Sep 17 15:13:05 2018 +0200
Committer: Alex Petrov 
Committed: Thu Sep 27 18:03:48 2018 +0200

--
 .../cassandra/batchlog/BatchlogManager.java |  89 -
 .../cassandra/config/DatabaseDescriptor.java|   2 +-
 .../db/CounterMutationVerbHandler.java  |   2 +-
 .../org/apache/cassandra/db/SystemKeyspace.java |   4 +-
 .../org/apache/cassandra/db/view/ViewUtils.java |   6 +-
 .../org/apache/cassandra/dht/Datacenters.java   |   2 +-
 .../cassandra/dht/RangeFetchMapCalculator.java  |   2 +-
 .../org/apache/cassandra/dht/RangeStreamer.java |   8 +-
 .../cassandra/locator/EndpointSnitchInfo.java   |   4 +-
 .../cassandra/locator/IEndpointSnitch.java  |  22 ++-
 .../apache/cassandra/locator/InOurDcTester.java |   2 +-
 .../org/apache/cassandra/locator/Replica.java   |   2 +-
 .../apache/cassandra/locator/ReplicaPlans.java  | 133 ++-
 .../cassandra/locator/SystemReplicas.java   |   5 +-
 .../apache/cassandra/net/MessagingService.java  |   6 +-
 .../cassandra/service/RangeRelocator.java   |   3 +-
 .../apache/cassandra/service/StartupChecks.java |   4 +-
 .../apache/cassandra/service/StorageProxy.java  |  85 +---
 .../cassandra/service/StorageService.java   |   8 +-
 .../service/reads/AbstractReadExecutor.java |   2 +-
 .../reads/ShortReadPartitionsProtection.java|   2 +-
 .../apache/cassandra/streaming/StreamPlan.java  |   4 +-
 .../cassandra/streaming/StreamSession.java  |   4 +-
 .../batchlog/BatchlogEndpointFilterTest.java|  65 -
 24 files changed, 240 insertions(+), 226 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/29f83b88/src/java/org/apache/cassandra/batchlog/BatchlogManager.java
--
diff --git a/src/java/org/apache/cassandra/batchlog/BatchlogManager.java b/src/java/org/apache/cassandra/batchlog/BatchlogManager.java
index 77f725c..91129ed 100644
--- a/src/java/org/apache/cassandra/batchlog/BatchlogManager.java
+++ b/src/java/org/apache/cassandra/batchlog/BatchlogManager.java
@@ -527,93 +527,4 @@ public class BatchlogManager implements BatchlogManagerMBean
            }
        }
    }
-
-    public static class EndpointFilter
-    {
-        private final String localRack;
-        private final Multimap<String, InetAddressAndPort> endpoints;
-
-        public EndpointFilter(String localRack, Multimap<String, InetAddressAndPort> endpoints)
-        {
-            this.localRack = localRack;
-            this.endpoints = endpoints;
-        }
-
-        /**
-         * @return list of candidates for batchlog hosting. If possible these will be two nodes from different racks.
-         */
-        public Collection<InetAddressAndPort> filter()
-        {
-            // special case for single-node data centers
-            if (endpoints.values().size() == 1)
-                return endpoints.values();
-
-            // strip out dead endpoints and localhost
-            ListMultimap<String, InetAddressAndPort> validated = ArrayListMultimap.create();
-            for (Map.Entry<String, InetAddressAndPort> entry : endpoints.entries())
-                if (isValid(entry.getValue()))
-                    validated.put(entry.getKey(), entry.getValue());
-
-            if (validated.size() <= 2)
-                return validated.values();
-
-            if (validated.size() - validated.get(localRack).size() >= 2)
-            {
-                // we have enough endpoints in other racks
-                validated.removeAll(localRack);
-            }
-
-            if (validated.keySet().size() == 1)
-            {
-                /*
-                 * we have only 1 `other` rack to select replicas from (whether it be the local rack or a single non-local rack)
-                 * pick two random nodes from there; we are guaranteed to have at least two nodes in the single remaining rack
-                 * because of the preceding if block.
-                 */
-                List<InetAddressAndPort> otherRack = Lists.newArrayList(validated.values());
-                shuffle(otherRack);
-                return otherRack.subList(0, 2);
-            }
-
-            // randomize which racks we pick from if more than 2 remaining
-            Collection<String> racks;
-            if (validated.keySet().size() == 2)
-            {
-                racks =
[jira] [Commented] (CASSANDRA-14795) Expose information about stored hints via JMX

2018-09-27 Thread Aleksey Yeschenko (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630617#comment-16630617
 ] 

Aleksey Yeschenko commented on CASSANDRA-14795:
---

This might have to wait until 5.0, unfortunately. 3.0 and 3.11 should only be 
accepting bug fixes now, and 4.0 is currently feature-frozen.

> Expose information about stored hints via JMX
> -
>
> Key: CASSANDRA-14795
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14795
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksandr Sorokoumov
>Assignee: Aleksandr Sorokoumov
>Priority: Minor
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> Currently there is no way to determine what kind of hints a node has, apart 
> from looking at the filenames (thus host-ids) on disk. Having a way to access 
> this information would help with debugging hint creation/replay scenarios.
> In addition to the JMX method, there is a new nodetool command:
> {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints
> Host ID Address Rack DC Status Total files Newest Oldest
> 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 
> 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Jeffrey F. Lukman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629685#comment-16629685
 ] 

Jeffrey F. Lukman edited comment on CASSANDRA-12126 at 9/27/18 3:19 PM:


To complete our scenario, here is the setup for our Cassandra:
 We run the scenario with Cassandra-v2.0.15.
 Here is the schema that we use:
 * CREATE KEYSPACE test WITH REPLICATION = \{'class': 'SimpleStrategy', 
'replication_factor': 3};
 * CREATE TABLE tests ( name text PRIMARY KEY, owner text, value_1 text, 
value_2 text, value_3 text);

Here are the queries that we submit:
 * client request to node X (1st): UPDATE test.tests SET value_1 = 'A' WHERE 
name = 'testing' IF owner = 'user_1';
 * client request to node Y (2nd): UPDATE test.tests SET value_2 = 'B' WHERE 
name = 'testing' IF value_1 = 'A';
 * client request to node Z (3rd): UPDATE test.tests SET value_3 = 'C' WHERE 
name = 'testing' IF value_1 = 'A';

To confirm, when the bug is manifested, the end result will be: value_1='A', 
value_2=null, value_3=null

[~jjirsa], regarding our tool, at this point, it is not open to the public. 


was (Author: jeffreyflukman):
To complete our scenario, here is the setup for our Cassandra:
We run the scenario with Cassandra-v2.0.15.
Here is the schema that we use:
 * 
CREATE KEYSPACE test WITH REPLICATION = \{'class': 'SimpleStrategy', 
'replication_factor': 3};
 * 
CREATE TABLE tests ( name text PRIMARY KEY, owner text, value_1 text, value_2 
text, value_3 text);

Here are the queries that we submit:
 * client request to node X (1st): UPDATE test.tests SET value_1 = 'A' WHERE 
name = 'testing' IF owner = 'user_1';
 * client request to node Y (2nd): UPDATE test.tests SET value_2 = 'B' WHERE 
name = 'testing' IF value_1 = 'A';
 * client request to node Z (3rd): UPDATE test.tests SET value_3 = 'C' WHERE 
name = 'testing' IF value_1 = 'A';

To confirm, when the bug is manifested, the end result will be: value_1='A', 
value_2=null, value_3=null



[~jjirsa], regarding our tool, at this point, it is not open to the public. 

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to a propose and saves the commit in the accepted field. The other two 
> machines B and C do not get to the accept phase. 
> The current state is that machine A has this commit in its paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the 
> value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 talks about this 
> issue: how learners can find out whether a majority of the acceptors have 
> accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and more than one acceptor (but not a majority) has something in 
> flight, we have no way of knowing if it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority agree that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, and that will cause the write from step 1 to never be 
> visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read, or we will never see it, which is what we want. 
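To make the proposed step-2 fix concrete, here is a hedged pseudo-Java sketch 
of a serial read; every name in it is hypothetical rather than Cassandra's 
actual Paxos code:

{code:java}
PrepareSummary summary = prepare(key, ballot);  // promises from a quorum of acceptors
if (summary.inProgressProposal != null)
{
    // existing behaviour (step 3): finish what an earlier proposer may have started
    proposeAndCommit(summary.inProgressProposal, ballot);
}
else
{
    // proposed fix (step 2): a quorum saw nothing in flight, so seal the ballot
    // with an empty commit; a value accepted by only a minority (step 1) can
    // then never resurface in a later read or write.
    proposeAndCommit(emptyProposal(key, ballot), ballot);
}
return readFromQuorum(key);
{code}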



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org


[jira] [Commented] (CASSANDRA-12438) Data inconsistencies with lightweight transactions, serial reads, and rejoining node

2018-09-27 Thread Jeffrey F. Lukman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630602#comment-16630602
 ] 

Jeffrey F. Lukman commented on CASSANDRA-12438:
---

Yes, we also performed some reads after all messages related to the client 
requests had been executed, to verify consistency among the nodes.

We ran this query:

SELECT * FROM test.tests WHERE name = 'cass-12438';

We executed this query against each node using cqlsh.
If the bug manifests, nodes X and Y return the expected result, while node Z 
returns the buggy result.
The data are therefore inconsistent among the nodes.
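For readers who want to reproduce this kind of per-node verification 
programmatically, here is a hedged sketch using the DataStax Java driver 3.x; 
the keyspace, table, and addresses come from this thread, while everything else 
is illustrative:

{code:java}
import java.net.InetSocketAddress;
import java.util.Collections;

import com.datastax.driver.core.*;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.WhiteListPolicy;

public class VerifyPerNode
{
    public static void main(String[] args)
    {
        for (String host : new String[]{ "127.0.0.1", "127.0.0.2", "127.0.0.3" })
        {
            // whitelist a single coordinator so the query is served via that node;
            // swap SERIAL for ONE to compare each node's local view instead
            try (Cluster cluster = Cluster.builder()
                                          .addContactPoint(host)
                                          .withLoadBalancingPolicy(new WhiteListPolicy(
                                              new RoundRobinPolicy(),
                                              Collections.singletonList(new InetSocketAddress(host, 9042))))
                                          .build();
                 Session session = cluster.connect())
            {
                Statement read = new SimpleStatement("SELECT * FROM test.tests WHERE name = 'cass-12438'")
                                 .setConsistencyLevel(ConsistencyLevel.SERIAL);
                System.out.println(host + " -> " + session.execute(read).all());
            }
        }
    }
}
{code}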

> Data inconsistencies with lightweight transactions, serial reads, and 
> rejoining node
> 
>
> Key: CASSANDRA-12438
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12438
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Steven Schaefer
>Priority: Major
>
> I've run into some issues with data inconsistency in a situation where a 
> single node is rejoining a 3-node cluster with RF=3. I'm running 3.7.
> I have a client system which inserts data into a table with around 7 columns, 
> named let's say A-F, id, and version. LWTs are used to make the inserts and 
> updates.
> Typically what happens is there's an insert of values <id, V_a1, V_b1, ..., 
> version=1>, then another process will pick up rows with, for example, A=V_a1 
> and subsequently update A to V_a2 and version=2. Yet another process will 
> watch for A=V_a2 and then make a second update to the same column, setting 
> version to 3, with the end result being <id, V_a3, ..., version=3>. There's a 
> secondary index on this A column (there are only a few possible values for A, 
> so I'm not worried about the cardinality issue), though I've reproduced with 
> the new SASI index too.
> If one of the nodes is down, there are still 2 alive for quorum, so inserts 
> can still happen. When I bring up the downed node, sometimes I get really 
> weird state back, which ultimately crashes the client system that's talking 
> to Cassandra. 
> When reading I always select all the columns, but there is a conditional 
> where clause that A=V_a2 (e.g. SELECT * FROM table WHERE A=V_a2). This read 
> is for processing any rows with V_a2, and ultimately updating them to V_a3 
> when complete. While periodically polling for A=V_a2 it is of course possible 
> for the poller to observe the old V_a2 value while the other parts of the 
> client system process and make the update to V_a3, and that's generally ok 
> because of the LWTs used for updates; an occasionally wasted reprocessing run 
> isn't a big deal. But when reading at serial I always expect to get the 
> original values for the columns that were never updated too. If a paxos 
> update is in progress then I expect it to complete before its value(s) are 
> returned. But instead, the read seems to be seeing the partial commit of the 
> LWT, returning the old V_a2 value for the changed column, but no values 
> whatsoever for the other columns. From the example above, instead of getting 
> <id, V_a3, ..., version=3>, or even the older <id, V_a2, ..., version=2> 
> (either of which I expect and are ok), I get only <id, V_a2>, so the rest of 
> the columns end up null, which I never expect. However this isn't persistent; 
> Cassandra does end up consistent, which I see via sstabledump and cqlsh after 
> the fact.
> In my client system logs I record the inserts / updates, and this 
> inconsistency happens around the same time as the update from V_a2 to V_a3, 
> hence my comment about Cassandra seeing a partial commit. So that leads me to 
> suspect that, perhaps due to the where clause in my read query for A=V_a2, 
> one of the original good nodes already has the new V_a3 value, so it doesn't 
> return this row for the select query, but the other good node and the one 
> that was down still have the old value V_a2, so those 2 nodes return what 
> they have. The one that was down doesn't yet have the original insert, just 
> the update from V_a1 -> V_a2 (again I suspect; it's not been easy to verify), 
> which would explain where <id, V_a2> comes from: that's all it knows about. 
> However since it's a serial quorum read, I'd expect some sort of exception, 
> as neither of the remaining 2 nodes with A=V_a2 would be able to come to a 
> quorum on the values for all the columns, since I'd expect the other good 
> node to return <id, V_a2, ..., version=2>.
> I know at some point nodetool repair should be run on this node, but I'm 
> concerned about the window of time between when the node comes back up and 
> repair starts/completes. It almost seems like if a node goes down the safest 
> bet is to remove it from the cluster and rebuild, instead of simply 
> restarting the node? However I haven't tested that to see if it runs into a 
> similar situation.
> It is of course possible to work 

[jira] [Commented] (CASSANDRA-14795) Expose information about stored hints via JMX

2018-09-27 Thread Chris Lohfink (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630514#comment-16630514
 ] 

Chris Lohfink commented on CASSANDRA-14795:
---

This might be nice as a virtual table instead of a new nodetool command (or 
both).

> Expose information about stored hints via JMX
> -
>
> Key: CASSANDRA-14795
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14795
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksandr Sorokoumov
>Assignee: Aleksandr Sorokoumov
>Priority: Minor
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> Currently there is no way to determine what kind of hints a node has, apart 
> from looking at the filenames (thus host-ids) on disk. Having a way to access 
> this information would help with debugging hint creation/replay scenarios.
> In addition to the JMX method, there is a new nodetool command:
> {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints
> Host ID Address Rack DC Status Total files Newest Oldest
> 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 
> 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14795) Expose information about stored hints via JMX

2018-09-27 Thread Aleksandr Sorokoumov (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Sorokoumov updated CASSANDRA-14795:
-
Fix Version/s: 4.0.x
   3.11.x
   3.0.x

> Expose information about stored hints via JMX
> -
>
> Key: CASSANDRA-14795
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14795
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksandr Sorokoumov
>Assignee: Aleksandr Sorokoumov
>Priority: Minor
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> Currently there is no way to determine what kind of hints a node has, apart 
> from looking at the filenames (thus host-ids) on disk. Having a way to access 
> this information would help with debugging hint creation/replay scenarios.
> In addition to the JMX method, there is a new nodetool command:
> {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints
> Host ID Address Rack DC Status Total files Newest Oldest
> 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 
> 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14795) Expose information about stored hints via JMX

2018-09-27 Thread Aleksandr Sorokoumov (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Sorokoumov updated CASSANDRA-14795:
-
Status: Patch Available  (was: Open)

Patches:

* [3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...Ge:14795-3.0]
* [3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...Ge:14795-3.11]
* [4.0|https://github.com/apache/cassandra/compare/trunk...Ge:14795-4.0]
* [dtest|https://github.com/Ge/cassandra-dtest/commit/54639ab84bd9a404ce9fbefb83a40c626f0d57a8]

I have also created patches for 3.0 and 3.11, as the change does not modify 
existing behavior but only exposes information via JMX.

> Expose information about stored hints via JMX
> -
>
> Key: CASSANDRA-14795
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14795
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Aleksandr Sorokoumov
>Assignee: Aleksandr Sorokoumov
>Priority: Minor
>
> Currently there is no way to determine what kind of hints a node has, apart 
> from looking at the filenames (thus host-ids) on disk. Having a way to access 
> this information would help with debugging hint creation/replay scenarios.
> In addition to the JMX method, there is a new nodetool command:
> {noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints
> Host ID Address Rack DC Status Total files Newest Oldest
> 5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 
> 2018-09-18 14:05:18,835 2018-09-18 14:05:08,811
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14793) Improve system table handling when losing a disk when using JBOD

2018-09-27 Thread Eric Evans (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630407#comment-16630407
 ] 

Eric Evans commented on CASSANDRA-14793:


Awesome; I'd been meaning to open this very ticket.

I had planned to suggest what [~krummas] did, that it be possible to put 
{{system}} in a different data directory. At least if this were possible, 
{{system}} could be put on a RAID.  And, at least in our environments, if the 
expectation is that you can survive a single device failure, then the OS is 
likely already on RAID-1 or similar.

Of course, if the tables in {{system}} could be regenerated, that would be 
better still, but I'm not sure what that looks like complexity-wise versus 
pinning it.

> Improve system table handling when losing a disk when using JBOD
> 
>
> Key: CASSANDRA-14793
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14793
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Priority: Major
> Fix For: 4.0
>
>
> We should improve the way we handle disk failures when losing a disk in a 
> JBOD setup
>  One way could be to pin the system tables to a special data directory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14795) Expose information about stored hints via JMX

2018-09-27 Thread Aleksandr Sorokoumov (JIRA)
Aleksandr Sorokoumov created CASSANDRA-14795:


 Summary: Expose information about stored hints via JMX
 Key: CASSANDRA-14795
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14795
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksandr Sorokoumov
Assignee: Aleksandr Sorokoumov


Currently there is no way to determine what kind of hints a node has, apart 
from looking at the filenames (thus host-ids) on disk. Having a way to access 
this information would help with debugging hint creation/replay scenarios.
In addition to the JMX method, there is a new nodetool command:


{noformat}$ bin/nodetool -h 127.0.0.1 -p 7100 listendpointspendinghints
Host ID Address Rack DC Status Total files Newest Oldest
5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN 2 
2018-09-18 14:05:18,835 2018-09-18 14:05:08,811
{noformat}
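Until the MBean lands, here is a hedged sketch of how such information could be 
read over JMX; the object name and attribute below are assumptions for 
illustration, not the patch's actual API (only the 7199 JMX port is standard):

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PendingHintsPeek
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url))
        {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // assumed MBean name and attribute, purely for illustration
            ObjectName name = new ObjectName("org.apache.cassandra.hints:type=HintsService");
            System.out.println(mbs.getAttribute(name, "PendingHintsInfo"));
        }
    }
}
{code}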



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14791) [utest] tests unable to write system tmp directory

2018-09-27 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630269#comment-16630269
 ] 

Marcus Eriksson commented on CASSANDRA-14791:
-

Should we move the unit tests to run in Docker to get more control?
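Until the slave permissions are sorted out, one hedged workaround sketch is to 
point {{java.io.tmpdir}} at a build-local directory before any test resolves 
it; the path and the init hook are assumptions:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class TestTmpDir
{
    // call once, before any test opens files under java.io.tmpdir
    public static void init() throws IOException
    {
        Path localTmp = Paths.get("build", "test", "tmp");
        Files.createDirectories(localTmp);
        System.setProperty("java.io.tmpdir", localTmp.toAbsolutePath().toString());
    }
}
{code}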

> [utest] tests unable to write system tmp directory
> --
>
> Key: CASSANDRA-14791
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14791
> Project: Cassandra
>  Issue Type: Task
>  Components: Testing
>Reporter: Jay Zhuang
>Priority: Minor
>
> Some tests are failing from time to time because they cannot write to the 
> {{/tmp/}} directory:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/
> {noformat}
> java.lang.RuntimeException: java.nio.file.AccessDeniedException: 
> /tmp/na-1-big-Data.db
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141)
>   at 
> org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>   at java.nio.channels.FileChannel.open(FileChannel.java:287)
>   at java.nio.channels.FileChannel.open(FileChannel.java:335)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100)
> {noformat}
>  I guess it's because some Jenkins slaves don't have the proper permissions 
> set. For slave {{cassandra16}}, the tests are fine:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14373) Allow using custom script for chronicle queue BinLog archival

2018-09-27 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630259#comment-16630259
 ] 

Marcus Eriksson commented on CASSANDRA-14373:
-

Hey [~psivaraju], thanks for the patch, I rewrote it a bit here: 
[https://github.com/krummas/cassandra/commits/marcuse/14373]
 * make sure nodetool can take multi-word arguments (i.e. {{nodetool 
enablefullquerylog --archive-command "/path/to/executable.sh %path"}} should 
work) - my solution required me to change the nodetool interpreter from 
{{/bin/sh}} to {{/bin/bash}} to get arrays; I'll try to figure out a way to do 
it in POSIX sh though.
 * remove the option to set the --archive-command from enableauditlog - it 
wasn't working and it looks like it might need some more refactoring of that 
code to get it working
 * break out the archiving logic into two separate classes 
({{DeletingArchiver}} and {{ExternalArchiver}}) for easier testing
 * rework the configuration to add binlog-specific configuration to its own 
class

Also wrote a bunch of dtests for audit/full query logging here: 
[https://github.com/krummas/cassandra-dtest/commits/marcuse/14373] (these also 
required a CCM change which has been 
[merged|https://github.com/riptano/ccm/commit/a76e45c1fbe7ebb4ef43de686644c90181e80b29])

I also filed CASSANDRA-14772 to further clean up the audit/full query logging 
code.

Note that 4.0 is currently in feature freeze and one could argue that this is a 
new feature, but as it simplifies collecting full query logs for replay 
testing, I'd like to get it in 4.0.
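For reference, the archive invocation can be sketched roughly as below; this is 
a simplified illustration with assumed names, not the {{ExternalArchiver}} 
implementation:

{code:java}
import java.io.File;
import java.io.IOException;

public final class ArchiveCommand
{
    // e.g. template = "/path/to/executable.sh %path"
    public static void run(String template, File releasedLog) throws IOException, InterruptedException
    {
        String command = template.replace("%path", releasedLog.getAbsolutePath());
        // naive whitespace split; a real implementation needs proper tokenizing
        Process process = new ProcessBuilder(command.split(" ")).inheritIO().start();
        if (process.waitFor() != 0)
            throw new IOException("Archive command failed: " + command);
        // deleting the released file is delegated to the script itself
    }
}
{code}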

> Allow using custom script for chronicle queue BinLog archival
> -
>
> Key: CASSANDRA-14373
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14373
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Stefan Podkowinski
>Assignee: Pramod K Sivaraju
>Priority: Major
>  Labels: lhf, pull-request-available
> Fix For: 4.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It would be nice to allow the user to configure an archival script that will 
> be executed in {{BinLog.onReleased(cycle, file)}} for every deleted bin log, 
> just as we do in {{CommitLogArchiver}}. The script should be able to copy the 
> released file to an external location or do whatever the author hand in mind. 
> Deleting the log file should be delegated to the script as well.
> See CASSANDRA-13983, CASSANDRA-12151 for use cases.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14787) nodetool status "Load" columns has wrong width

2018-09-27 Thread Lapo Luchini (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630260#comment-16630260
 ] 

Lapo Luchini commented on CASSANDRA-14787:
--

Today I even got an 11-character-wide value: {{1010.18 KiB}}.

> nodetool status "Load" columns has wrong width
> --
>
> Key: CASSANDRA-14787
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14787
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Lapo Luchini
>Priority: Trivial
>
> Using Cassandra 3.11.2 on FreeBSD, I get:
> {code:java}
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens   Owns (effective)  Host ID ...
> UN  server1.andxor.it  11.11 MiB  256  39.6% ...
> UN  server2.andxor.it  32.04 MiB  256  41.8% ...
> UN  server3.andxor.it  519.33 KiB  256  40.0% ...
> UN  server4.andxor.it  10.95 MiB  256  40.3% ...
> UN  server5.andxor.it  11.03 MiB  256  38.4% ...
> {code}
> AFAICT this is caused by {{"%-9s"}} in 
> [Status.java:292|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/tools/nodetool/Status.java#L292]
>  which should probably be a 10 instead of a 9.
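The width problem is easy to reproduce with a standalone format-string demo:

{code:java}
public class LoadColumnWidth
{
    public static void main(String[] args)
    {
        // "%-9s" left-justifies in a 9-character field but never truncates, so a
        // 10-character value like "519.33 KiB" pushes the next column out of line.
        System.out.printf("%-9s %d%n",  "11.11 MiB",  256); // fits
        System.out.printf("%-9s %d%n",  "519.33 KiB", 256); // overflows the column
        System.out.printf("%-12s %d%n", "519.33 KiB", 256); // wider field realigns
    }
}
{code}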



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14781) Log message when mutation passed to CommitLog#add(Mutation) is too large is not descriptive enough

2018-09-27 Thread Kurt Greaves (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves reassigned CASSANDRA-14781:


Assignee: Tom Petracca

> Log message when mutation passed to CommitLog#add(Mutation) is too large is 
> not descriptive enough
> --
>
> Key: CASSANDRA-14781
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14781
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jordan West
>Assignee: Tom Petracca
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
> Attachments: CASSANDRA-14781.patch, CASSANDRA-14781_3.0.patch
>
>
> When hitting 
> [https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/commitlog/CommitLog.java#L256-L257],
>  the log message produced does not help the operator track down what data is 
> being written. At a minimum the keyspace and cfIds involved would be useful 
> (and are available) – more detail might not be reasonable to include. 
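A hedged sketch of what a more descriptive message could look like; 
{{getKeyspaceName()}} and {{getColumnFamilyIds()}} exist on {{Mutation}}, while 
the surrounding shape and wording are illustrative:

{code:java}
// Illustrative only: surface the keyspace and table ids the mutation carries,
// so the operator can tell which writes are oversized.
if (size > MAX_MUTATION_SIZE)
    throw new IllegalArgumentException(String.format(
        "Mutation of %d bytes is too large for the maximum size of %d (keyspace: %s, tables: %s)",
        size,
        MAX_MUTATION_SIZE,
        mutation.getKeyspaceName(),
        mutation.getColumnFamilyIds()));
{code}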



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14781) Log message when mutation passed to CommitLog#add(Mutation) is too large is not descriptive enough

2018-09-27 Thread Kurt Greaves (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves updated CASSANDRA-14781:
-
Status: Patch Available  (was: Open)

> Log message when mutation passed to CommitLog#add(Mutation) is too large is 
> not descriptive enough
> --
>
> Key: CASSANDRA-14781
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14781
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Jordan West
>Assignee: Tom Petracca
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
> Attachments: CASSANDRA-14781.patch, CASSANDRA-14781_3.0.patch
>
>
> When hitting 
> [https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/commitlog/CommitLog.java#L256-L257],
>  the log message produced does not help the operator track down what data is 
> being written. At a minimum the keyspace and cfIds involved would be useful 
> (and are available) – more detail might not be reasonable to include. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Resolved] (CASSANDRA-14784) Evaluate defaults in cassandra.yaml for 4.0

2018-09-27 Thread Kurt Greaves (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves resolved CASSANDRA-14784.
--
   Resolution: Duplicate
Fix Version/s: (was: 4.0)

> Evaluate defaults in cassandra.yaml for 4.0
> ---
>
> Key: CASSANDRA-14784
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14784
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Dinesh Joshi
>Priority: Major
>  Labels: 4.0
>
> Some settings default to values that are known to cause issues; for example, 
> num_tokens has a default value of 256. Anybody who has operated Cassandra 
> knows to lower it. Keeping this as the default will cause users issues and 
> has proven problematic.
> Let’s evaluate the defaults in cassandra.yaml and update them for Cassandra 
> 4.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14784) Evaluate defaults in cassandra.yaml for 4.0

2018-09-27 Thread Kurt Greaves (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630190#comment-16630190
 ] 

Kurt Greaves commented on CASSANDRA-14784:
--

Resolving as duplicate of CASSANDRA-13701

> Evaluate defaults in cassandra.yaml for 4.0
> ---
>
> Key: CASSANDRA-14784
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14784
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Dinesh Joshi
>Priority: Major
>  Labels: 4.0
>
> Some settings default to values that are known to cause issues; for example, 
> num_tokens has a default value of 256. Anybody who has operated Cassandra 
> knows to lower it. Keeping this as the default will cause users issues and 
> has proven problematic.
> Let’s evaluate the defaults in cassandra.yaml and update them for Cassandra 
> 4.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14726) ReplicaCollection follow-up

2018-09-27 Thread Alex Petrov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630187#comment-16630187
 ] 

Alex Petrov commented on CASSANDRA-14726:
-

+1,

I have found only one tiny problem in tests 
[here|https://github.com/apache/cassandra/pull/271#discussion-diff-220871785R173].
 All other comments are nits.

Regarding tests themselves, I'm usually trying to make non-generic tests (e.g. 
write hardcoded asserts rather than comparing outputs of functions), since 
they're easier to read for reviewers. Not saying we should get rid of the ones 
that are already there, but maybe we could at least add a comment or open a 
JIRA to write some more "hardcoded" tests.
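By way of illustration, a "hardcoded" test might look like the sketch below; 
the scenario and fixtures are made up, with {{filter}} and {{Replica#isFull}} 
drawn from the ReplicaCollection work:

{code:java}
// Made-up example of the hardcoded-assert style: assert literal expectations
// rather than comparing against the output of another function.
@Test
public void filterKeepsOnlyFullReplicas()
{
    EndpointsForToken replicas = EndpointsForToken.of(token, full1, trans1, full2); // assumed fixtures
    EndpointsForToken filtered = replicas.filter(Replica::isFull);
    assertEquals(2, filtered.size());      // literal expected size
    assertTrue(filtered.contains(full1));
    assertTrue(filtered.contains(full2));
}
{code}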

> ReplicaCollection follow-up
> ---
>
> Key: CASSANDRA-14726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14726
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We introduced {{ReplicaCollection}} as part of CASSANDRA-14404, but while it 
> improves readability, we could do more to ensure it minimises extra garbage, 
> and does not otherwise unnecessarily waste cycles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14791) [utest] tests unable to write system tmp directory

2018-09-27 Thread Kurt Greaves (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630175#comment-16630175
 ] 

Kurt Greaves commented on CASSANDRA-14791:
--

All the hosts are meant to be set up in the same way... However INFRA uses 
Puppet, which I have absolutely no faith in. The slaves are all completely 
managed by INFRA, so it's probably a case for an INFRA ticket :/

> [utest] tests unable to write system tmp directory
> --
>
> Key: CASSANDRA-14791
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14791
> Project: Cassandra
>  Issue Type: Task
>  Components: Testing
>Reporter: Jay Zhuang
>Priority: Minor
>
> Some tests are failing from time to time because they cannot write to the 
> {{/tmp/}} directory:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/
> {noformat}
> java.lang.RuntimeException: java.nio.file.AccessDeniedException: 
> /tmp/na-1-big-Data.db
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141)
>   at 
> org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>   at java.nio.channels.FileChannel.open(FileChannel.java:287)
>   at java.nio.channels.FileChannel.open(FileChannel.java:335)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100)
> {noformat}
>  I guess it's because some Jenkins slaves don't have the proper permissions 
> set. For slave {{cassandra16}}, the tests are fine:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14793) Improve system table handling when losing a disk when using JBOD

2018-09-27 Thread Kurt Greaves (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630172#comment-16630172
 ] 

Kurt Greaves commented on CASSANDRA-14793:
--

IDK that pinning the system tables really fixes the issue, as we'd still have 
the problem that losing one disk means a whole node replacement.
Theoretically it should be possible to rebuild the (important) system tables if 
they get corrupted. Most things should currently be retrievable from 
gossip/other nodes. If there's something we currently can't retrieve, how about 
we just store it in the peers table on all the nodes? I can't imagine the 
amount of info we'd have to store would be very big. Then we can just provide a 
startup flag to indicate that a node should try to rebuild its system tables.


> Improve system table handling when losing a disk when using JBOD
> 
>
> Key: CASSANDRA-14793
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14793
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Priority: Major
> Fix For: 4.0
>
>
> We should improve the way we handle disk failures when losing a disk in a 
> JBOD setup
>  One way could be to pin the system tables to a special data directory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629950#comment-16629950
 ] 

Benedict edited comment on CASSANDRA-12126 at 9/27/18 10:12 AM:


[~jeffreyflukman] it would help if you could explicitly state the client 
responses returned for each of your operations.  The options are: time out, 
rejected (condition not met), or success (condition met, and mutation applied).

For completeness, as with CASSANDRA-12438, it would be helpful to know which 
read queries you are performing, against which nodes, at what point, and with 
what consistency levels.  Are you verifying the state with a SERIAL read after 
the last query, most specifically?  Also, can we assume that the state of the 
table began with {name:'testing', owner:'user_1', value1:null, value2:null, 
value3:null}?


was (Author: benedict):
I'm not sure if I'm following, but it seems the bug report is suggesting that 
operation #3 is returned to the client as successful, but #1's state is the 
only state visible.  However, if #1 was successful and its state was the state 
of the cluster prior to #3 succeeding, then #3 should also have modified the 
cluster state, since its IF condition should have evaluated to true.

As with CASSANDRA-12438, it would be helpful to know which read queries you are 
performing, against which nodes, at what point, and with what consistency 
levels.  Also, can we assume that the state of the table began with 
{name:'testing', owner:'user_1', value1:null, value2:null, value3:null}?

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to a propose and saves the commit in the accepted field. The other two 
> machines B and C do not get to the accept phase. 
> The current state is that machine A has this commit in its paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the 
> value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 talks about this 
> issue: how learners can find out whether a majority of the acceptors have 
> accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and more than one acceptor (but not a majority) has something in 
> flight, we have no way of knowing if it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority agree that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, and that will cause the write from step 1 to never be 
> visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read, or we will never see it, which is what we want. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14794) Avoid calling iter.next() in a loop when notifying indexers about range tombstones

2018-09-27 Thread Marcus Eriksson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson reassigned CASSANDRA-14794:
---

Assignee: Marcus Eriksson

> Avoid calling iter.next() in a loop when notifying indexers about range 
> tombstones
> --
>
> Key: CASSANDRA-14794
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14794
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> In 
> [SecondaryIndexManager|https://github.com/apache/cassandra/blob/914c66685c5bebe1624d827a9b4562b73a08c297/src/java/org/apache/cassandra/index/SecondaryIndexManager.java#L901-L902]
>  - avoid calling {{.next()}} in the {{.forEach(..)}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14794) Avoid calling iter.next() in a loop when notifying indexers about range tombstones

2018-09-27 Thread Marcus Eriksson (JIRA)
Marcus Eriksson created CASSANDRA-14794:
---

 Summary: Avoid calling iter.next() in a loop when notifying 
indexers about range tombstones
 Key: CASSANDRA-14794
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14794
 Project: Cassandra
  Issue Type: Bug
Reporter: Marcus Eriksson
 Fix For: 3.0.x, 3.11.x, 4.x


In 
[SecondaryIndexManager|https://github.com/apache/cassandra/blob/914c66685c5bebe1624d827a9b4562b73a08c297/src/java/org/apache/cassandra/index/SecondaryIndexManager.java#L901-L902]
 - avoid calling {{.next()}} in the {{.forEach(..)}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14742) Race Condition in batchlog replica collection

2018-09-27 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629970#comment-16629970
 ] 

Benedict commented on CASSANDRA-14742:
--

Thanks.

I'm a little concerned about the inconsistency of our 
{{DatabaseDescriptor.getLocalDataCenter}} and this 
{{IEndpointSnitch.getLocalDataCenter}}.  I had intended to simply refer to the 
former (although, debatably, only the latter should exist - since the former is 
not consistently updated, although it should for correctness never change).

I'm honestly not clear what the best clean up of this mess is, particularly 
with the distinction between the per-replication strategy snitch and the global 
snitch (the former of which seems to simply cache the latter), and perhaps it 
can be deferred to a later dedicated cleanup ticket anyway.

> Race Condition in batchlog replica collection
> -
>
> Key: CASSANDRA-14742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14742
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When we collect nodes for it in {{StorageProxy#getBatchlogReplicas}}, we 
> already filter out down replicas; subsequently they get picked up and taken 
> for liveAndDown.
> There's a possible race condition due to picking tokens from token metadata 
> twice (once in {{StorageProxy#getBatchlogReplicas}} and second one in 
> {{ReplicaPlan#forBatchlogWrite}})



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14742) Race Condition in batchlog replica collection

2018-09-27 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629970#comment-16629970
 ] 

Benedict edited comment on CASSANDRA-14742 at 9/27/18 8:59 AM:
---

Thanks.

I'm a little concerned about the inconsistency of our 
{{DatabaseDescriptor.getLocalDataCenter}} and this 
{{IEndpointSnitch.getLocalDataCenter}}.  I had intended to simply refer to the 
former (although, debatably, only the latter should exist - since the former is 
not consistently updated, although it should for correctness never change).

I'm honestly not clear what the best clean up of this mess is, particularly 
with the distinction between the per-replication strategy snitch and the global 
snitch (the former of which seems to simply cache the latter), and perhaps it 
can be deferred to a later dedicated cleanup ticket anyway.

Otherwise, +1


was (Author: benedict):
Thanks.

I'm a little concerned about the inconsistency of our 
{{DatabaseDescriptor.getLocalDataCenter}} and this 
{{IEndpointSnitch.getLocalDataCenter}}.  I had intended to simply refer to the 
former (although, debatably, only the latter should exist - since the former is 
not consistently updated, although it should for correctness never change).

I'm honestly not clear what the best clean up of this mess is, particularly 
with the distinction between the per-replication strategy snitch and the global 
snitch (the former of which seems to simply cache the latter), and perhaps it 
can be deferred to a later dedicated cleanup ticket anyway.

> Race Condition in batchlog replica collection
> -
>
> Key: CASSANDRA-14742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14742
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When we collect nodes for it in {{StorageProxy#getBatchlogReplicas}}, we 
> already filter out down replicas; subsequently they get picked up and taken 
> for liveAndDown.
> There's a possible race condition due to picking tokens from token metadata 
> twice (once in {{StorageProxy#getBatchlogReplicas}} and second one in 
> {{ReplicaPlan#forBatchlogWrite}})



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14742) Race Condition in batchlog replica collection

2018-09-27 Thread Alex Petrov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629957#comment-16629957
 ] 

Alex Petrov edited comment on CASSANDRA-14742 at 9/27/18 8:47 AM:
--

Thank you, I have addressed all three comments and took the liberty of 
introducing {{localDatacenter}} and {{localRack}} shortcuts; waiting for one 
more CI run.


was (Author: ifesdjeen):
Thank you, I have addressed all three comments and took the liberty of 
introducing {{localDatacenter}} and {{localRack}} shortcuts.

> Race Condition in batchlog replica collection
> -
>
> Key: CASSANDRA-14742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14742
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When we collect nodes for it in {{StorageProxy#getBatchlogReplicas}}, we 
> already filter out down replicas; subsequently they get picked up and taken 
> for liveAndDown.
> There's a possible race condition due to picking tokens from token metadata 
> twice (once in {{StorageProxy#getBatchlogReplicas}} and second one in 
> {{ReplicaPlan#forBatchlogWrite}})



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14742) Race Condition in batchlog replica collection

2018-09-27 Thread Alex Petrov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629957#comment-16629957
 ] 

Alex Petrov commented on CASSANDRA-14742:
-

Thank you, I have addressed all three comments and took the liberty of 
introducing {{localDatacenter}} and {{localRack}} shortcuts.

> Race Condition in batchlog replica collection
> -
>
> Key: CASSANDRA-14742
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14742
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When we collect nodes for it in {{StorageProxy#getBatchlogReplicas}}, we 
> already filter out down replicas; subsequently they get picked up and taken 
> for liveAndDown.
> There's a possible race condition due to picking tokens from token metadata 
> twice (once in {{StorageProxy#getBatchlogReplicas}} and second one in 
> {{ReplicaPlan#forBatchlogWrite}})



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2018-09-27 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629950#comment-16629950
 ] 

Benedict commented on CASSANDRA-12126:
--

I'm not sure I'm following, but the bug report seems to suggest that 
operation #3 is returned to the client as successful, yet #1's state is the 
only state visible.  However, if #1 was successful and its result was the 
state of the cluster prior to #3 succeeding, then #3 should also have modified 
the cluster state, since its IF condition should have evaluated to true.

As with CASSANDRA-12438, it would be helpful to know which read queries you 
are performing, to which nodes, at what point, and with what consistency 
levels.  Also, can we assume that the state of the table began as 
{name:'testing', owner:'user_1', value1:null, value2:null, value3:null}?
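
For reference, a serial read with the Python driver would look something like 
the sketch below; the keyspace, table, and contact point are assumptions.

{code}
# Sketch only; keyspace, table, and contact point are made up.
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

session = Cluster(['127.0.0.1']).connect('ks')

# A read at SERIAL should complete any in-progress Paxos round it
# observes on a quorum before returning a result.
stmt = SimpleStatement("SELECT * FROM items WHERE name = 'testing'",
                       consistency_level=ConsistencyLevel.SERIAL)
row = session.execute(stmt).one()
{code}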

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Priority: Major
>  Labels: LWT
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to the propose and saves the commit in its accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the 
> value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1's value will never be seen again 
> and was never seen before. 
> If you read Lamport's “Paxos Made Simple” paper, section 2.3 discusses this 
> issue: how learners can find out whether a majority of the acceptors have 
> accepted a proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors, and more than one acceptor but not a majority has something in 
> flight, we have no way of knowing whether it was accepted by a majority of 
> acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know from a majority 
> of acceptors that they have no in-flight commit, which means we have a 
> majority saying nothing was accepted by a majority. I think we should run a 
> propose step here with an empty commit, and that will cause the write from 
> step 1 to never be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 
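
Below is a minimal, runnable simulation of the suggested step-2 fix, as an 
illustration of the idea only, not Cassandra's implementation: when a majority 
of acceptors report nothing in flight, the reader seals the round by proposing 
an empty commit, so a value accepted by only a minority can never resurface. 
All names here are invented for the sketch.

{code}
class Acceptor(object):
    def __init__(self):
        self.promised = 0
        self.accepted = None                 # (ballot, value) or None

    def prepare(self, ballot):
        assert ballot > self.promised
        self.promised = ballot
        return self.accepted                 # report any in-flight proposal

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)

EMPTY = 'empty'

def serial_read(majority, ballot):
    replies = [acc.prepare(ballot) for acc in majority]
    inflight = [r for r in replies if r is not None]
    if inflight:
        # Finish the most recent in-flight proposal before reading.
        value = max(inflight, key=lambda r: r[0])[1]
    else:
        # Majority saw nothing in flight: propose an empty commit so a
        # minority-accepted value (step 1) can never become visible later.
        value = EMPTY
    for acc in majority:
        acc.accept(ballot, value)
    return value

# Scenario from the description: A accepted step 1's value; B and C did not.
a, b, c = Acceptor(), Acceptor(), Acceptor()
a.accept(1, 'step-1 value')
print(serial_read([b, c], ballot=2))   # 'empty' -- step 1's value is dead
print(serial_read([a, b], ballot=3))   # still 'empty': ballot 2 beats ballot 1
{code}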



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



cassandra-dtest git commit: Adjust error messages after CASSANDRA-14756

2018-09-27 Thread ifesdjeen
Repository: cassandra-dtest
Updated Branches:
  refs/heads/master f4888c897 -> 99c695e85


Adjust error messages after CASSANDRA-14756


Project: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/commit/99c695e8
Tree: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/tree/99c695e8
Diff: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/diff/99c695e8

Branch: refs/heads/master
Commit: 99c695e850057a16bddd0e314aef31dde1d76a2f
Parents: f4888c8
Author: Alex Petrov 
Authored: Mon Sep 17 14:16:56 2018 +0200
Committer: Alex Petrov 
Committed: Thu Sep 27 10:43:19 2018 +0200

--
 bootstrap_test.py | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra-dtest/blob/99c695e8/bootstrap_test.py
--
diff --git a/bootstrap_test.py b/bootstrap_test.py
index 661aef9..6e7682f 100644
--- a/bootstrap_test.py
+++ b/bootstrap_test.py
@@ -284,7 +284,10 @@ class TestBootstrap(Tester):
             assert_bootstrap_state(self, node3, 'COMPLETED')
         else:
             if consistent_range_movement:
-                node3.watch_log_for("A node required to move the data consistently is down")
+                if cluster.version() < '4.0':
+                    node3.watch_log_for("A node required to move the data consistently is down")
+                else:
+                    node3.watch_log_for("Necessary replicas for strict consistency were removed by source filters")
             else:
                 node3.watch_log_for("Unable to find sufficient sources for streaming range")
             assert_not_running(node3)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12438) Data inconsistencies with lightweight transactions, serial reads, and rejoining node

2018-09-27 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629887#comment-16629887
 ] 

Benedict edited comment on CASSANDRA-12438 at 9/27/18 8:10 AM:
---

You must also be performing some reads?  What reads are you performing to 
verify the cluster state, at what consistency level, routed to which 
coordinator, and at which times?
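
For anyone reproducing, the update and the verification read might look like 
the sketch below with the Python driver; the keyspace, table, key, and values 
are assumptions based on the reporter's description.

{code}
# Sketch only; names and values are assumptions from the description.
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

session = Cluster(['127.0.0.1']).connect('ks')

# LWT update as described: advance A from V_a2 to V_a3 via a Paxos round.
result = session.execute(
    "UPDATE t SET a = 'V_a3', version = 3 WHERE id = 'some-id' IF a = 'V_a2'")
print(result.was_applied)

# Poller's read; at SERIAL it should complete any in-flight Paxos round,
# so a partially committed LWT must never be observable.
stmt = SimpleStatement("SELECT * FROM t WHERE a = 'V_a2'",
                       consistency_level=ConsistencyLevel.SERIAL)
rows = list(session.execute(stmt))
{code}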


was (Author: benedict):
You must also be performing some reads?  What reads are you performing to 
verify the cluster state, at what consistency level, and routed to which 
coordinator?

> Data inconsistencies with lightweight transactions, serial reads, and 
> rejoining node
> 
>
> Key: CASSANDRA-12438
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12438
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Steven Schaefer
>Priority: Major
>
> I've run into some issues with data inconsistency in a situation where a 
> single node is rejoining a 3-node cluster with RF=3. I'm running 3.7.
> I have a client system which inserts data into a table with around 7 columns, 
> named let's say A-F, id, and version. LWTs are used to make the inserts and 
> updates.
> Typically what happens is there's an insert of values id, V_a1, V_b1, ..., 
> version=1, then another process will pick up rows with, for example, A=V_a1 and 
> subsequently update A to V_a2 and version=2. Yet another process will watch 
> for A=V_a2 to then make a second update to the same column, setting version 
> to 3, with the end result being <id, V_a3, V_b1, ..., version=3>. There's a 
> secondary index on this A column (there are only a few possible values for A, 
> so I'm not worried about the cardinality issue), though I've reproed with the 
> new SASI index too.
> If one of the nodes is down, there are still 2 alive for quorum, so inserts can 
> still happen. When I bring up the downed node, sometimes I get really weird 
> state back which ultimately crashes the client system that's talking to 
> Cassandra. 
> When reading I always select all the columns, but there is a conditional 
> WHERE clause on A=V_a2 (e.g. SELECT * FROM table WHERE A=V_a2). This read 
> is for processing any rows with V_a2, and ultimately updating to V_a3 when 
> complete. While periodically polling for A=V_a2 it is of course possible for 
> the poller to observe the old V_a2 value while the other parts of the 
> client system process and make the update to V_a3, and that's generally ok 
> because of the LWTs used for updates; an occasionally wasted reprocessing run 
> isn't a big deal, but when reading at serial I always expect to get the 
> original values for columns that were never updated too. If a paxos update is 
> in progress then I expect it to be completed before its value(s) are returned. 
> But instead, the read seems to be seeing the partial commit of the LWT, 
> returning the old V_a2 value for the changed column, but no values whatsoever 
> for the other columns. From the example above, instead of getting <id, V_a3, 
> V_b1, ..., version=3>, or even the older <id, V_a2, V_b1, ..., version=2> 
> (either of which I expect and is ok), I get only <id, V_a2>, so the rest of 
> the columns end up null, which I never expect. However this isn't persistent; 
> Cassandra does end up consistent, which I see via sstabledump and cqlsh after 
> the fact.
> In my client system logs I record the inserts / updates, and this 
> inconsistency happens around the same time as the update from V_a2 to V_a3, 
> hence my comment about Cassandra seeing a partial commit. That leads me to 
> suspect that, due to the WHERE clause in my read query for A=V_a2, 
> perhaps one of the original good nodes already has the new V_a3 value, so it 
> doesn't return this row for the select query, but the other good node and the 
> one that was down still have the old value V_a2, so those 2 nodes return what 
> they have. The one that was down doesn't yet have the original insert, just 
> the update from V_a1 -> V_a2 (again I suspect; it's not been easy to verify), 
> which would explain where <id, V_a2> comes from; that's all it 
> knows about. However, since it's a serial quorum read, I'd expect some sort of 
> exception, as neither of the remaining 2 nodes with A=V_a2 would be able to 
> come to a quorum on the values for all the columns, since I'd expect the other 
> good node to return <id, V_a2, V_b1, ..., version=2>.
> I know at some point nodetool repair should be run on this node, but I'm 
> concerned about a window of time between when the node comes back up and 
> repair starts/completes. It almost seems like if a node goes down the safest 
> bet is to remove it from the cluster and rebuild, instead of simply 
> restarting the node? However I haven't tested that to see if it runs into a 
> similar situation.
> It is of course possible to work around the inconsistency for now by 
> detecting and ignoring it in the client system, but if there is indeed a bug 
> I hope we can identify it and ultimately resolve it.

[jira] [Commented] (CASSANDRA-12438) Data inconsistencies with lightweight transactions, serial reads, and rejoining node

2018-09-27 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629887#comment-16629887
 ] 

Benedict commented on CASSANDRA-12438:
--

You must also be performing some reads?  What reads are you performing to 
verify the cluster state, at what consistency level, and routed to which 
coordinator?

> Data inconsistencies with lightweight transactions, serial reads, and 
> rejoining node
> 
>
> Key: CASSANDRA-12438
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12438
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Steven Schaefer
>Priority: Major
>
> I've run into some issues with data inconsistency in a situation where a 
> single node is rejoining a 3-node cluster with RF=3. I'm running 3.7.
> I have a client system which inserts data into a table with around 7 columns, 
> named let's say A-F, id, and version. LWTs are used to make the inserts and 
> updates.
> Typically what happens is there's an insert of values id, V_a1, V_b1, ..., 
> version=1, then another process will pick up rows with, for example, A=V_a1 and 
> subsequently update A to V_a2 and version=2. Yet another process will watch 
> for A=V_a2 to then make a second update to the same column, setting version 
> to 3, with the end result being <id, V_a3, V_b1, ..., version=3>. There's a 
> secondary index on this A column (there are only a few possible values for A, 
> so I'm not worried about the cardinality issue), though I've reproed with the 
> new SASI index too.
> If one of the nodes is down, there are still 2 alive for quorum, so inserts can 
> still happen. When I bring up the downed node, sometimes I get really weird 
> state back which ultimately crashes the client system that's talking to 
> Cassandra. 
> When reading I always select all the columns, but there is a conditional 
> WHERE clause on A=V_a2 (e.g. SELECT * FROM table WHERE A=V_a2). This read 
> is for processing any rows with V_a2, and ultimately updating to V_a3 when 
> complete. While periodically polling for A=V_a2 it is of course possible for 
> the poller to observe the old V_a2 value while the other parts of the 
> client system process and make the update to V_a3, and that's generally ok 
> because of the LWTs used for updates; an occasionally wasted reprocessing run 
> isn't a big deal, but when reading at serial I always expect to get the 
> original values for columns that were never updated too. If a paxos update is 
> in progress then I expect it to be completed before its value(s) are returned. 
> But instead, the read seems to be seeing the partial commit of the LWT, 
> returning the old V_a2 value for the changed column, but no values whatsoever 
> for the other columns. From the example above, instead of getting <id, V_a3, 
> V_b1, ..., version=3>, or even the older <id, V_a2, V_b1, ..., version=2> 
> (either of which I expect and is ok), I get only <id, V_a2>, so the rest of 
> the columns end up null, which I never expect. However this isn't persistent; 
> Cassandra does end up consistent, which I see via sstabledump and cqlsh after 
> the fact.
> In my client system logs I record the inserts / updates, and this 
> inconsistency happens around the same time as the update from V_a2 to V_a3, 
> hence my comment about Cassandra seeing a partial commit. That leads me to 
> suspect that, due to the WHERE clause in my read query for A=V_a2, 
> perhaps one of the original good nodes already has the new V_a3 value, so it 
> doesn't return this row for the select query, but the other good node and the 
> one that was down still have the old value V_a2, so those 2 nodes return what 
> they have. The one that was down doesn't yet have the original insert, just 
> the update from V_a1 -> V_a2 (again I suspect; it's not been easy to verify), 
> which would explain where <id, V_a2> comes from; that's all it 
> knows about. However, since it's a serial quorum read, I'd expect some sort of 
> exception, as neither of the remaining 2 nodes with A=V_a2 would be able to 
> come to a quorum on the values for all the columns, since I'd expect the other 
> good node to return <id, V_a2, V_b1, ..., version=2>.
> I know at some point nodetool repair should be run on this node, but I'm 
> concerned about a window of time between when the node comes back up and 
> repair starts/completes. It almost seems like if a node goes down the safest 
> bet is to remove it from the cluster and rebuild, instead of simply 
> restarting the node? However I haven't tested that to see if it runs into a 
> similar situation.
> It is of course possible to work around the inconsistency for now by 
> detecting and ignoring it in the client system, but if there is indeed a bug 
> I hope we can identify it and ultimately resolve it.
> I'm also curious if this relates to CASSANDRA-12126, and also CASSANDRA-11219 
> may be relevant.
> I've been reproducing with a combination of manually 

[jira] [Commented] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread Stephen Connolly (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629803#comment-16629803
 ] 

Stephen Connolly commented on CASSANDRA-12704:
--

IIRC it was hard enough getting interest to publish releases... but that was 
back in the mists of time.

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries to 
> the release Artifactory.
> But for the daily snapshot build, if "release" is set, it won't be a snapshot 
> build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest just removing the "if check" for the target "publish".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org