[jira] [Commented] (CASSANDRA-13455) lose check of null strings in decoding client token

2017-04-18 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974071#comment-15974071
 ] 

Robert Stupp commented on CASSANDRA-13455:
--

Well, there's technically nothing wrong with empty passwords, and people in the 
wild actually use empty passwords. Therefore I doubt that the 
empty-password check is feasible.
The empty-username check, I guess, is ok and does no harm. However, 
since it's impossible to have a user/role with an empty name, the check seems 
superfluous in {{PasswordAuthenticator}} - i.e. it would not do anything.
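
For context, a minimal, self-contained sketch of the decoding and checks under discussion (illustrative only; this is not the actual {{PasswordAuthenticator.decodeCredentials()}} code, and the names and structure are assumptions): per RFC 4616 the SASL PLAIN token is authzid NUL authcid NUL passwd, and the attached patch additionally rejects empty usernames/passwords, which is the check questioned above.

{code:java}
// Illustrative sketch only, not the actual Cassandra implementation.
import java.nio.charset.StandardCharsets;

public class PlainTokenSketch
{
    // token layout per RFC 4616: [authzid] NUL authcid NUL passwd
    static String[] decode(byte[] token)
    {
        String user = null, pass = null;
        int end = token.length;
        for (int i = token.length - 1; i >= 0; i--)
        {
            if (token[i] == 0)
            {
                if (pass == null)
                    pass = new String(token, i + 1, end - (i + 1), StandardCharsets.UTF_8);
                else if (user == null)
                    user = new String(token, i + 1, end - (i + 1), StandardCharsets.UTF_8);
                end = i;
            }
        }
        // the checks proposed in the patch: reject null *and* empty values
        if (pass == null || pass.isEmpty())
            throw new IllegalArgumentException("Password must not be null or empty");
        if (user == null || user.isEmpty())
            throw new IllegalArgumentException("Authentication ID must not be null or empty");
        return new String[]{ user, pass };
    }

    public static void main(String[] args)
    {
        byte[] token = "\0user\0pass".getBytes(StandardCharsets.UTF_8);
        String[] credentials = decode(token);
        System.out.println(credentials[0] + " / " + credentials[1]);
    }
}
{code}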

> lose check of null strings in decoding client token
> ---
>
> Key: CASSANDRA-13455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13455
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS7.2
> Java 1.8
>Reporter: Amos Jianjun Kong
>Assignee: Amos Jianjun Kong
> Fix For: 3.10
>
> Attachments: 0001-auth-check-both-null-points-and-null-strings.patch, 
> 0001-auth-strictly-delimit-in-decoding-client-token.patch
>
>
> RFC 4616 requires that the authzid, username, and password be delimited by a single '\000'.
> The current code actually splits on runs of '\000', which derails the decoding when
> the username or password is null.
> The problem was found in code review.
> 
> Update: the description above is wrong. The actual problem is that when the client
> responds with empty strings for the username or password, the current
> decodeCredentials() cannot detect it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-18 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974024#comment-15974024
 ] 

Corentin Chary commented on CASSANDRA-13418:


[~rgerard]: No, it should certainly not be the default. If you look at the 
description of our use case, it's only necessary when you have short-lived data 
with a lot of cells, which makes running periodic repairs impossible or very 
impractical, and when you also need/want read repairs because you can't afford 
QUORUM reads (datacenters on separate continents and low-latency requirements). 
So there is a need for it, but it should not be the default.

[~jjirsa], any opinion ?

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be 
> enough to greatly reduce the entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (needs >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
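
To make the proposal above concrete, here is a minimal, self-contained sketch of the idea (illustrative only; the types and method names are stand-ins, not the actual compaction-strategy API): by default an sstable whose data is all past gcBefore can still be blocked from dropping by an overlapping sstable that holds live data, and the proposed option simply skips that overlap check.

{code:java}
// Illustrative sketch only, not the actual TWCS/compaction-strategy code.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExpiredSSTableSketch
{
    // stand-in for an sstable: newest local deletion time and the key range it covers
    static class Table
    {
        final String name; final long maxLocalDeletionTime; final long minKey; final long maxKey;
        Table(String name, long maxLocalDeletionTime, long minKey, long maxKey)
        { this.name = name; this.maxLocalDeletionTime = maxLocalDeletionTime; this.minKey = minKey; this.maxKey = maxKey; }
        public String toString() { return name; }
    }

    static List<Table> fullyExpired(List<Table> candidates, long gcBefore, boolean ignoreOverlaps)
    {
        List<Table> expired = new ArrayList<>();
        for (Table t : candidates)
        {
            if (t.maxLocalDeletionTime >= gcBefore)
                continue; // still holds live data
            boolean blocked = false;
            if (!ignoreOverlaps)
            {
                // default behaviour: any overlapping sstable that still holds live data blocks the drop
                for (Table o : candidates)
                    if (o != t && o.maxLocalDeletionTime >= gcBefore
                          && o.minKey <= t.maxKey && o.maxKey >= t.minKey)
                        blocked = true;
            }
            if (!blocked)
                expired.add(t);
        }
        return expired;
    }

    public static void main(String[] args)
    {
        List<Table> tables = Arrays.asList(
            new Table("old-window", 100L, 0, 50),        // everything past its TTL
            new Table("read-repaired", 9999L, 40, 60));  // overlaps, still live
        System.out.println(fullyExpired(tables, 1000L, false)); // []            (blocked by overlap)
        System.out.println(fullyExpired(tables, 1000L, true));  // [old-window]  (the proposal)
    }
}
{code}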



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13307) The specification of protocol version in cqlsh means the python driver doesn't automatically downgrade protocol version.

2017-04-18 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973974#comment-15973974
 ] 

T Jake Luciani commented on CASSANDRA-13307:


yes please

> The specification of protocol version in cqlsh means the python driver 
> doesn't automatically downgrade protocol version.
> 
>
> Key: CASSANDRA-13307
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13307
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Matt Byrd
>Assignee: Matt Byrd
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 3.11.x
>
>
> Hi,
> Looks like we've regressed on the issue described in:
> https://issues.apache.org/jira/browse/CASSANDRA-9467
> In that we're no longer able to connect from newer cqlsh versions
> (e.g. trunk) to older versions of Cassandra that speak a lower version of the 
> protocol (e.g. 2.1 with protocol version 3).
> The problem seems to be that we're relying on the client's ability to 
> automatically downgrade the protocol version, implemented in Cassandra here:
> https://issues.apache.org/jira/browse/CASSANDRA-12838
> and utilised in the python client here:
> https://datastax-oss.atlassian.net/browse/PYTHON-240
> The problem, however, comes from:
> https://datastax-oss.atlassian.net/browse/PYTHON-537
> "Don't downgrade protocol version if explicitly set" 
> (included when we bumped from 3.5.0 to 3.7.0 of the python driver as part of 
> fixing: https://issues.apache.org/jira/browse/CASSANDRA-11534),
> since we do explicitly specify the protocol version in bin/cqlsh.py.
> I've got a patch which adds an option to explicitly specify the protocol 
> version (for those who want to do that) and otherwise defaults to not 
> setting the protocol version, i.e. using the protocol version of the client 
> we ship, which should by default be the same protocol as the server.
> Then it should downgrade gracefully as was intended. 
> Let me know if that seems reasonable.
> Thanks,
> Matt
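
To illustrate the driver behaviour the fix relies on, here is a minimal sketch using the DataStax Java driver 3.x rather than the bundled Python driver (the negotiation behaviour is analogous; treat the exact calls as an assumption, not as the cqlsh code): leaving the protocol version unset lets the driver negotiate and downgrade, while pinning a version disables the downgrade.

{code:java}
// Illustrative sketch with the DataStax Java driver 3.x; the ticket itself concerns
// the Python driver shipped with cqlsh. Assumes a reachable node at 127.0.0.1.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ProtocolVersion;
import com.datastax.driver.core.Session;

public class ProtocolVersionSketch
{
    public static void main(String[] args)
    {
        // No explicit version: the driver starts with its newest supported protocol
        // and automatically downgrades if the server only speaks an older one.
        try (Cluster negotiated = Cluster.builder()
                                         .addContactPoint("127.0.0.1")
                                         .build();
             Session session = negotiated.connect())
        {
            System.out.println("negotiated: "
                + negotiated.getConfiguration().getProtocolOptions().getProtocolVersion());
        }

        // Explicit version: no downgrade is attempted; connecting to e.g. a 2.1 node
        // with a protocol it does not speak fails outright, which is what cqlsh hit.
        try (Cluster pinned = Cluster.builder()
                                     .addContactPoint("127.0.0.1")
                                     .withProtocolVersion(ProtocolVersion.V4)
                                     .build();
             Session session = pinned.connect())
        {
            System.out.println("pinned to V4");
        }
    }
}
{code}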



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-11381) Node running with join_ring=false and authentication can not serve requests

2017-04-18 Thread mck (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mck updated CASSANDRA-11381:

Status: In Progress  (was: Awaiting Feedback)

> Node running with join_ring=false and authentication can not serve requests
> ---
>
> Key: CASSANDRA-11381
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11381
> Project: Cassandra
>  Issue Type: Bug
>Reporter: mck
>Assignee: mck
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.11.x
>
> Attachments: 11381-2.1.txt, 11381-2.2.txt, 11381-3.0.txt, 
> 11381-3.X.txt, 11381-trunk.txt, dtest-11381-trunk.txt
>
>
> A node started with {{-Dcassandra.join_ring=false}} in a cluster that has 
> authentication configured, e.g. PasswordAuthenticator, won't be able to serve 
> requests. This is because {{Auth.setup()}} never gets called during 
> startup.
> Without {{Auth.setup()}} having been called in {{StorageService}}, clients 
> connecting to the node fail, with the node throwing
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:119)
> at 
> org.apache.cassandra.thrift.CassandraServer.login(CassandraServer.java:1471)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$login.getResult(Cassandra.java:3505)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$login.getResult(Cassandra.java:3489)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at com.thinkaurelius.thrift.Message.invoke(Message.java:314)
> at 
> com.thinkaurelius.thrift.Message$Invocation.execute(Message.java:90)
> at 
> com.thinkaurelius.thrift.TDisruptorServer$InvocationHandler.onEvent(TDisruptorServer.java:695)
> at 
> com.thinkaurelius.thrift.TDisruptorServer$InvocationHandler.onEvent(TDisruptorServer.java:689)
> at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:112)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The exception is thrown from this 
> [code|https://github.com/apache/cassandra/blob/cassandra-2.0.16/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java#L119]:
> {code}
> ResultMessage.Rows rows = authenticateStatement.execute(QueryState.forInternalCalls(),
>                                                          new QueryOptions(consistencyForUser(username),
>                                                                           Lists.newArrayList(ByteBufferUtil.bytes(username))));
> {code}
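
A minimal, self-contained sketch of the startup-ordering problem (simplified and illustrative, not the real {{StorageService}} code; the method names merely mirror the real ones): auth setup only happens on the join-the-ring path, so a {{join_ring=false}} node never initialises the authenticator and a later login hits the NPE shown above.

{code:java}
// Illustrative sketch only, simplified from the real StorageService startup flow.
public class JoinRingAuthSketch
{
    static boolean authInitialised = false;

    static void doAuthSetup()
    {
        authInitialised = true; // in Cassandra this prepares the auth statements/tables
    }

    static void initServer(boolean joinRing)
    {
        if (joinRing)
        {
            // joinTokenRing(): bootstrap/token handling elided
            doAuthSetup();
        }
        // With -Dcassandra.join_ring=false this branch is skipped entirely, so nothing
        // ever calls doAuthSetup(): the bug in this ticket. The fix direction is to make
        // sure auth setup also happens on this path.
    }

    static void authenticate(String user)
    {
        if (!authInitialised)
            throw new NullPointerException("auth statements never prepared"); // what clients observed
        System.out.println("authenticated " + user);
    }

    public static void main(String[] args)
    {
        initServer(false);         // join_ring=false
        authenticate("cassandra"); // throws, mirroring the stack trace above
    }
}
{code}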



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session

2017-04-18 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973938#comment-15973938
 ] 

mck edited comment on CASSANDRA-12835 at 4/19/17 2:39 AM:
--

{quote}One nit: I think you've included an unused import in the TracingTest: 
org.apache.commons.lang3.StringUtils{quote}
StringUtils is used on line 194.

{quote}+1 assuming the tests look good.{quote}
[~tjake], is the protocol still that you, as the reviewer, push my commit? Or, given 
a commented "+1" by the reviewer, am I free to push (updating the commit message to 
mark you as reviewer)?


was (Author: michaelsembwever):
{quote}One nit: I think you've included an unused import in the TracingTest: 
org.apache.commons.lang3.StringUtils{quote}
StringUtils is used on line 194.

{quote}+1 assuming the tests look good.{quote}
[~tjake], Is protocol for you as the review still to push my commit? Or given a 
commented "+1" by the reviewer am I free to push (updating the commit msg to 
mark you as reviewer)?

> Tracing payload not passed from QueryMessage to tracing session
> ---
>
> Key: CASSANDRA-12835
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12835
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Hannu Kröger
>Assignee: mck
>Priority: Critical
>  Labels: tracing
> Fix For: 3.11.x, 4.x
>
>
> Caused by CASSANDRA-10392.
> Related to CASSANDRA-11706.
> When querying using CQL statements (not prepared) the message type is 
> QueryMessage and the code in 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L101
>  is as follows:
> {code:java}
> if (state.traceNextQuery())
> {
> state.createTracingSession();
> ImmutableMap.Builder builder = 
> ImmutableMap.builder();
> {code}
> {{state.createTracingSession();}} should probably be 
> {{state.createTracingSession(getCustomPayload());}}. At least that fixes the 
> problem for me.
> This also raises the question whether some other parts of the code should 
> pass the custom payload as well (I'm not the right person to analyze this):
> {code}
> $ ag createTracingSession
> src/java/org/apache/cassandra/service/QueryState.java
> 80:public void createTracingSession()
> 82:createTracingSession(Collections.EMPTY_MAP);
> 85:public void createTracingSession(Map customPayload)
> src/java/org/apache/cassandra/thrift/CassandraServer.java
> 2528:state().getQueryState().createTracingSession();
> src/java/org/apache/cassandra/transport/messages/BatchMessage.java
> 163:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java
> 114:state.createTracingSession(getCustomPayload());
> src/java/org/apache/cassandra/transport/messages/QueryMessage.java
> 101:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/PrepareMessage.java
> 74:state.createTracingSession();
> {code}
> This is not marked as `minor`, as CASSANDRA-11706 was, because this cannot 
> be fixed by the tracing plugin.
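
A minimal, self-contained sketch of the delegation involved (the classes below are stand-ins, not the real Cassandra types): the no-arg {{createTracingSession()}} starts the session with an empty payload, so a call site that has a custom payload, such as {{QueryMessage}}, has to pass {{getCustomPayload()}} through explicitly, which is the one-line fix proposed here.

{code:java}
// Illustrative sketch only; stand-ins for QueryState/QueryMessage, not the real classes.
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.Map;

public class TracingPayloadSketch
{
    static class QueryState
    {
        // no-arg overload delegates with an empty payload (mirrors the QueryState.java lines quoted above)
        void createTracingSession()
        {
            createTracingSession(Collections.<String, ByteBuffer>emptyMap());
        }

        void createTracingSession(Map<String, ByteBuffer> customPayload)
        {
            System.out.println("tracing session started, payload keys: " + customPayload.keySet());
        }
    }

    static class QueryMessage
    {
        private final Map<String, ByteBuffer> customPayload;

        QueryMessage(Map<String, ByteBuffer> customPayload) { this.customPayload = customPayload; }

        Map<String, ByteBuffer> getCustomPayload() { return customPayload; }

        void execute(QueryState state)
        {
            // buggy form: the payload is silently dropped
            // state.createTracingSession();
            // fixed form proposed in this ticket:
            state.createTracingSession(getCustomPayload());
        }
    }

    public static void main(String[] args)
    {
        Map<String, ByteBuffer> payload =
            Collections.singletonMap("trace-id", ByteBuffer.wrap(new byte[]{ 1, 2, 3 }));
        new QueryMessage(payload).execute(new QueryState());
    }
}
{code}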



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session

2017-04-18 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972551#comment-15972551
 ] 

mck edited comment on CASSANDRA-12835 at 4/19/17 2:39 AM:
--

[~tjake], patches are updated here:
|| Branch   || Testall  || Dtest ||
| 
[3.11|https://github.com/michaelsembwever/cassandra/commit/4105fc71c652794d3ae1fba475f01ebf00199a07]
  | [testall|https://circleci.com/gh/michaelsembwever/cassandra/16]   | 
[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/20/]
 |
| 
[trunk|https://github.com/michaelsembwever/cassandra/commit/c4de4f0dd0e70d7d67ade1e315ee3053494cf51c]
 | [testall|https://circleci.com/gh/michaelsembwever/cassandra/20]  
 | 
[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/19/]
 |

(dtests are queued and will likely take some time to complete)


was (Author: michaelsembwever):
[~tjake], patches are updated here:
|| Branch   || Testall  || Dtest ||
| 
[3.11|https://github.com/michaelsembwever/cassandra/commit/4105fc71c652794d3ae1fba475f01ebf00199a07]
  | [testall|https://circleci.com/gh/michaelsembwever/cassandra/16]   | 
[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/15/]
 |
| 
[trunk|https://github.com/michaelsembwever/cassandra/commit/c4de4f0dd0e70d7d67ade1e315ee3053494cf51c]
 | [testall|https://circleci.com/gh/michaelsembwever/cassandra/20]  
 | 
[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/19/]
 |

(dtests are queued and will likely take some time to complete)

> Tracing payload not passed from QueryMessage to tracing session
> ---
>
> Key: CASSANDRA-12835
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12835
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Hannu Kröger
>Assignee: mck
>Priority: Critical
>  Labels: tracing
> Fix For: 3.11.x, 4.x
>
>
> Caused by CASSANDRA-10392.
> Related to CASSANDRA-11706.
> When querying using CQL statements (not prepared) the message type is 
> QueryMessage and the code in 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L101
>  is as follows:
> {code:java}
> if (state.traceNextQuery())
> {
> state.createTracingSession();
> ImmutableMap.Builder builder = 
> ImmutableMap.builder();
> {code}
> {{state.createTracingSession();}} should probably be 
> {{state.createTracingSession(getCustomPayload());}}. At least that fixes the 
> problem for me.
> This also raises the question whether some other parts of the code should 
> pass the custom payload as well (I'm not the right person to analyze this):
> {code}
> $ ag createTracingSession
> src/java/org/apache/cassandra/service/QueryState.java
> 80:public void createTracingSession()
> 82:createTracingSession(Collections.EMPTY_MAP);
> 85:public void createTracingSession(Map customPayload)
> src/java/org/apache/cassandra/thrift/CassandraServer.java
> 2528:state().getQueryState().createTracingSession();
> src/java/org/apache/cassandra/transport/messages/BatchMessage.java
> 163:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java
> 114:state.createTracingSession(getCustomPayload());
> src/java/org/apache/cassandra/transport/messages/QueryMessage.java
> 101:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/PrepareMessage.java
> 74:state.createTracingSession();
> {code}
> This is not marked as `minor`, as CASSANDRA-11706 was, because this cannot 
> be fixed by the tracing plugin.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session

2017-04-18 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973938#comment-15973938
 ] 

mck commented on CASSANDRA-12835:
-

{quote}One nit: I think you've included an unused import in the TracingTest: 
org.apache.commons.lang3.StringUtils{quote}
StringUtils is used on line 194.

{quote}+1 assuming the tests look good.{quote}
Is the protocol still that you, as the reviewer, push my commit? Or, given a commented 
"+1" by the reviewer, am I free to push (updating the commit message to mark you as 
reviewer)?

> Tracing payload not passed from QueryMessage to tracing session
> ---
>
> Key: CASSANDRA-12835
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12835
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Hannu Kröger
>Assignee: mck
>Priority: Critical
>  Labels: tracing
> Fix For: 3.11.x, 4.x
>
>
> Caused by CASSANDRA-10392.
> Related to CASSANDRA-11706.
> When querying using CQL statements (not prepared) the message type is 
> QueryMessage and the code in 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L101
>  is as follows:
> {code:java}
> if (state.traceNextQuery())
> {
> state.createTracingSession();
> ImmutableMap.Builder builder = 
> ImmutableMap.builder();
> {code}
> {{state.createTracingSession();}} should probably be 
> {{state.createTracingSession(getCustomPayload());}}. At least that fixes the 
> problem for me.
> This also raises the question whether some other parts of the code should 
> pass the custom payload as well (I'm not the right person to analyze this):
> {code}
> $ ag createTracingSession
> src/java/org/apache/cassandra/service/QueryState.java
> 80:public void createTracingSession()
> 82:createTracingSession(Collections.EMPTY_MAP);
> 85:public void createTracingSession(Map customPayload)
> src/java/org/apache/cassandra/thrift/CassandraServer.java
> 2528:state().getQueryState().createTracingSession();
> src/java/org/apache/cassandra/transport/messages/BatchMessage.java
> 163:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java
> 114:state.createTracingSession(getCustomPayload());
> src/java/org/apache/cassandra/transport/messages/QueryMessage.java
> 101:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/PrepareMessage.java
> 74:state.createTracingSession();
> {code}
> This is not marked as `minor`, as CASSANDRA-11706 was, because this cannot 
> be fixed by the tracing plugin.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session

2017-04-18 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973938#comment-15973938
 ] 

mck edited comment on CASSANDRA-12835 at 4/19/17 2:37 AM:
--

{quote}One nit: I think you've included an unused import in the TracingTest: 
org.apache.commons.lang3.StringUtils{quote}
StringUtils is used on line 194.

{quote}+1 assuming the tests look good.{quote}
[~tjake], is the protocol still that you, as the reviewer, push my commit? Or, given a 
commented "+1" by the reviewer, am I free to push (updating the commit message to 
mark you as reviewer)?


was (Author: michaelsembwever):
{quote}One nit: I think you've included an unused import in the TracingTest: 
org.apache.commons.lang3.StringUtils{quote}
StringUtils is used on line 194.

{quote}+1 assuming the tests look good.{quote}
Is protocol for you as the review still to push my commit? Or given a commented 
"+1" by the reviewer am I free to push (updating the commit msg to mark you as 
reviewer)?

> Tracing payload not passed from QueryMessage to tracing session
> ---
>
> Key: CASSANDRA-12835
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12835
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Hannu Kröger
>Assignee: mck
>Priority: Critical
>  Labels: tracing
> Fix For: 3.11.x, 4.x
>
>
> Caused by CASSANDRA-10392.
> Related to CASSANDRA-11706.
> When querying using CQL statements (not prepared) the message type is 
> QueryMessage and the code in 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L101
>  is as follows:
> {code:java}
> if (state.traceNextQuery())
> {
> state.createTracingSession();
> ImmutableMap.Builder builder = 
> ImmutableMap.builder();
> {code}
> {{state.createTracingSession();}} should probably be 
> {{state.createTracingSession(getCustomPayload());}}. At least that fixes the 
> problem for me.
> This also raises the question whether some other parts of the code should 
> pass the custom payload as well (I'm not the right person to analyze this):
> {code}
> $ ag createTracingSession
> src/java/org/apache/cassandra/service/QueryState.java
> 80:public void createTracingSession()
> 82:createTracingSession(Collections.EMPTY_MAP);
> 85:public void createTracingSession(Map customPayload)
> src/java/org/apache/cassandra/thrift/CassandraServer.java
> 2528:state().getQueryState().createTracingSession();
> src/java/org/apache/cassandra/transport/messages/BatchMessage.java
> 163:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java
> 114:state.createTracingSession(getCustomPayload());
> src/java/org/apache/cassandra/transport/messages/QueryMessage.java
> 101:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/PrepareMessage.java
> 74:state.createTracingSession();
> {code}
> This is not marked as `minor`, as CASSANDRA-11706 was, because this cannot 
> be fixed by the tracing plugin.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session

2017-04-18 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972551#comment-15972551
 ] 

mck edited comment on CASSANDRA-12835 at 4/19/17 2:32 AM:
--

[~tjake], patches are updated here:
|| Branch   || Testall  || Dtest ||
| 
[3.11|https://github.com/michaelsembwever/cassandra/commit/4105fc71c652794d3ae1fba475f01ebf00199a07]
  | [testall|https://circleci.com/gh/michaelsembwever/cassandra/16]   | 
[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/15/]
 |
| 
[trunk|https://github.com/michaelsembwever/cassandra/commit/c4de4f0dd0e70d7d67ade1e315ee3053494cf51c]
 | [testall|https://circleci.com/gh/michaelsembwever/cassandra/20]  
 | 
[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/19/]
 |

(dtests are queued and will likely take some time to complete)


was (Author: michaelsembwever):
[~tjake], patches are updated here:
|| Branch   || Testall  || Dtest ||
| 
[3.11|https://github.com/michaelsembwever/cassandra/commit/4105fc71c652794d3ae1fba475f01ebf00199a07]
  | [testall|https://circleci.com/gh/michaelsembwever/cassandra/16]   | 
[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/15/]
 |
| 
[trunk|https://github.com/michaelsembwever/cassandra/commit/c4de4f0dd0e70d7d67ade1e315ee3053494cf51c]
 | [testall|https://circleci.com/gh/michaelsembwever/cassandra/17]  
 | 
[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/19/]
 |

(dtests are queued and will likely take some time to complete)

> Tracing payload not passed from QueryMessage to tracing session
> ---
>
> Key: CASSANDRA-12835
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12835
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Hannu Kröger
>Assignee: mck
>Priority: Critical
>  Labels: tracing
> Fix For: 3.11.x, 4.x
>
>
> Caused by CASSANDRA-10392.
> Related to CASSANDRA-11706.
> When querying using CQL statements (not prepared) the message type is 
> QueryMessage and the code in 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L101
>  is as follows:
> {code:java}
> if (state.traceNextQuery())
> {
> state.createTracingSession();
> ImmutableMap.Builder builder = 
> ImmutableMap.builder();
> {code}
> {{state.createTracingSession();}} should probably be 
> {{state.createTracingSession(getCustomPayload());}}. At least that fixes the 
> problem for me.
> This also raises the question whether some other parts of the code should 
> pass the custom payload as well (I'm not the right person to analyze this):
> {code}
> $ ag createTracingSession
> src/java/org/apache/cassandra/service/QueryState.java
> 80:public void createTracingSession()
> 82:createTracingSession(Collections.EMPTY_MAP);
> 85:public void createTracingSession(Map customPayload)
> src/java/org/apache/cassandra/thrift/CassandraServer.java
> 2528:state().getQueryState().createTracingSession();
> src/java/org/apache/cassandra/transport/messages/BatchMessage.java
> 163:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java
> 114:state.createTracingSession(getCustomPayload());
> src/java/org/apache/cassandra/transport/messages/QueryMessage.java
> 101:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/PrepareMessage.java
> 74:state.createTracingSession();
> {code}
> This is not marked as `minor`, as CASSANDRA-11706 was, because this cannot 
> be fixed by the tracing plugin.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13455) lose check of null strings in decoding client token

2017-04-18 Thread Amos Jianjun Kong (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amos Jianjun Kong updated CASSANDRA-13455:
--
Description: 
RFC 4616 requires that the authzid, username, and password be delimited by a single '\000'.
The current code actually splits on runs of '\000', which derails the decoding when
the username or password is null.

The problem was found in code review.


Update: the description above is wrong. The actual problem is that when the client
responds with empty strings for the username or password, the current
decodeCredentials() cannot detect it.


  was:
RFC4616 requests AuthZID, USERNAME, PASSWORD are delimited by single '\000'.
Current code actually delimits by serial '\000', when username or password
is null, it caused decoding derangement.

The problem was found in code review.


> lose check of null strings in decoding client token
> ---
>
> Key: CASSANDRA-13455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13455
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS7.2
> Java 1.8
>Reporter: Amos Jianjun Kong
>Assignee: Amos Jianjun Kong
> Fix For: 3.10
>
> Attachments: 0001-auth-check-both-null-points-and-null-strings.patch, 
> 0001-auth-strictly-delimit-in-decoding-client-token.patch
>
>
> RFC 4616 requires that the authzid, username, and password be delimited by a single '\000'.
> The current code actually splits on runs of '\000', which derails the decoding when
> the username or password is null.
> The problem was found in code review.
> 
> Update: the description above is wrong. The actual problem is that when the client
> responds with empty strings for the username or password, the current
> decodeCredentials() cannot detect it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13455) lose check of null strings in decoding client token

2017-04-18 Thread Amos Jianjun Kong (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amos Jianjun Kong updated CASSANDRA-13455:
--
Summary: lose check of null strings in decoding client token  (was: 
derangement in decoding client token)

> lose check of null strings in decoding client token
> ---
>
> Key: CASSANDRA-13455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13455
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS7.2
> Java 1.8
>Reporter: Amos Jianjun Kong
>Assignee: Amos Jianjun Kong
> Fix For: 3.10
>
> Attachments: 0001-auth-check-both-null-points-and-null-strings.patch, 
> 0001-auth-strictly-delimit-in-decoding-client-token.patch
>
>
> RFC 4616 requires that the authzid, username, and password be delimited by a single '\000'.
> The current code actually splits on runs of '\000', which derails the decoding when
> the username or password is null.
> The problem was found in code review.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13455) derangement in decoding client token

2017-04-18 Thread Amos Jianjun Kong (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amos Jianjun Kong updated CASSANDRA-13455:
--
Attachment: 0001-auth-check-both-null-points-and-null-strings.patch

Posted a v2 that adds a check for empty strings.

> derangement in decoding client token
> 
>
> Key: CASSANDRA-13455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13455
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS7.2
> Java 1.8
>Reporter: Amos Jianjun Kong
>Assignee: Amos Jianjun Kong
> Fix For: 3.10
>
> Attachments: 0001-auth-check-both-null-points-and-null-strings.patch, 
> 0001-auth-strictly-delimit-in-decoding-client-token.patch
>
>
> RFC 4616 requires that the authzid, username, and password be delimited by a single '\000'.
> The current code actually splits on runs of '\000', which derails the decoding when
> the username or password is null.
> The problem was found in code review.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-13455) derangement in decoding client token

2017-04-18 Thread Amos Jianjun Kong (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973866#comment-15973866
 ] 

Amos Jianjun Kong edited comment on CASSANDRA-13455 at 4/19/17 1:40 AM:


[~snazy] You are right, the current code can split the client response bytes 
correctly on a single '\000'.

The only problem is that it only checks for null pointers; it misses the check for 
empty strings.

{code}
-        if (pass == null)
+        if (pass == null || pass.length == 0)
             throw new AuthenticationException("Password must not be null");
-        if (user == null)
+        if (user == null || user.length == 0)
             throw new AuthenticationException("Authentication ID must not be null");
{code}



was (Author: amoskong):
[~snazy] You are right, current code can split the client response bytes 
rightly with single '\000'.

The only problem is that it only checked null points, but lost checking of none 
strings.

```
-if (pass == null)
+if (pass == null || pass.length == 0)
 throw new AuthenticationException("Password must not be null");
-if (user == null)
+if (user == null || user.length == 0)
 throw new AuthenticationException("Authentication ID must not 
be null");
```




> derangement in decoding client token
> 
>
> Key: CASSANDRA-13455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13455
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS7.2
> Java 1.8
>Reporter: Amos Jianjun Kong
>Assignee: Amos Jianjun Kong
> Fix For: 3.10
>
> Attachments: 0001-auth-strictly-delimit-in-decoding-client-token.patch
>
>
> RFC4616 requests AuthZID, USERNAME, PASSWORD are delimited by single '\000'.
> Current code actually delimits by serial '\000', when username or password
> is null, it caused decoding derangement.
> The problem was found in code review.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13455) derangement in decoding client token

2017-04-18 Thread Amos Jianjun Kong (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973866#comment-15973866
 ] 

Amos Jianjun Kong commented on CASSANDRA-13455:
---

[~snazy] You are right, the current code can split the client response bytes 
correctly on a single '\000'.

The only problem is that it only checks for null pointers; it misses the check for 
empty strings.

```
-        if (pass == null)
+        if (pass == null || pass.length == 0)
             throw new AuthenticationException("Password must not be null");
-        if (user == null)
+        if (user == null || user.length == 0)
             throw new AuthenticationException("Authentication ID must not be null");
```




> derangement in decoding client token
> 
>
> Key: CASSANDRA-13455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13455
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS7.2
> Java 1.8
>Reporter: Amos Jianjun Kong
>Assignee: Amos Jianjun Kong
> Fix For: 3.10
>
> Attachments: 0001-auth-strictly-delimit-in-decoding-client-token.patch
>
>
> RFC4616 requests AuthZID, USERNAME, PASSWORD are delimited by single '\000'.
> Current code actually delimits by serial '\000', when username or password
> is null, it caused decoding derangement.
> The problem was found in code review.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-18 Thread Romain GERARD (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973651#comment-15973651
 ] 

Romain GERARD commented on CASSANDRA-13418:
---

I haven't had time to check my own questions yet (I will this week), so I am 
holding my judgment for now.
My main concern is "yet another option?". I first need to convince myself that we 
can't make it the default for TWCS, and, since the need is tightly coupled to TWCS, 
whether we can't bundle the option more closely to it.
I will definitely spend some time reading the Cassandra code this week.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be 
> enough to greatly reduce the entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (needs >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-12728) Handling partially written hint files

2017-04-18 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-12728:
---
   Resolution: Fixed
Fix Version/s: 4.0
   3.11.0
   3.0.14
   Status: Resolved  (was: Ready to Commit)

Thanks [~garvitjuniwal] and [~iamaleksey] - committed to 3.0 as 
{{3110d27dde2d518297e118c2f1f6b6bccfed7899}} and merged up through trunk.


> Handling partially written hint files
> -
>
> Key: CASSANDRA-12728
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12728
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sharvanath Pathak
>Assignee: Garvit Juniwal
>  Labels: lhf
> Fix For: 3.0.14, 3.11.0, 4.0
>
> Attachments: CASSANDRA-12728.patch
>
>
> {noformat}
> ERROR [HintsDispatcher:1] 2016-09-28 17:44:43,397 
> HintsDispatchExecutor.java:225 - Failed to dispatch hints file 
> d5d7257c-9f81-49b2-8633-6f9bda6e3dea-1474892654160-1.hints: file is corrupted 
> ({})
> org.apache.cassandra.io.FSReadError: java.io.EOFException
> at 
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:282)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:252)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:156)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:137)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:119) 
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:91) 
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:259)
>  [apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:242)
>  [apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:220)
>  [apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:199)
>  [apache-cassandra-3.0.6.jar:3.0.6]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_77]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [na:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_77]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_77]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> Caused by: java.io.EOFException: null
> at 
> org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:68)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.ChecksummedDataInput.readFully(ChecksummedDataInput.java:126)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402) 
> ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.HintsReader$BuffersIterator.readBuffer(HintsReader.java:310)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:301)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
> at 
> org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:278)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
> ... 15 common frames omitted
> {noformat}
> We've found out that the hint file was truncated because there was a hard 
> reboot around the time of the last write to the file. I think we basically need 
> to handle partially written hint files. Also, the CRC file does not exist in 
> this case (probably because the node crashed while writing the hints file). Maybe 
> ignoring and cleaning up such partially written hint files can be a way to 
> fix this?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[1/6] cassandra git commit: Handling partially written hint files

2017-04-18 Thread jjirsa
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 71d4f66d7 -> 3110d27dd
  refs/heads/cassandra-3.11 be49029c1 -> 65c1fddbc
  refs/heads/trunk 8f5f54fb1 -> e52420624


Handling partially written hint files

Patch by  Garvit Juniwal; Reviewed by Aleksey Yeschenko for CASSANDRA-12728


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3110d27d
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3110d27d
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3110d27d

Branch: refs/heads/cassandra-3.0
Commit: 3110d27dde2d518297e118c2f1f6b6bccfed7899
Parents: 71d4f66
Author: Jeff Jirsa 
Authored: Tue Apr 4 11:31:29 2017 -0700
Committer: Jeff Jirsa 
Committed: Tue Apr 18 13:25:54 2017 -0700

--
 CHANGES.txt  |  2 +-
 src/java/org/apache/cassandra/hints/HintsReader.java | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/3110d27d/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index e46abcd..918c46b 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,5 +1,5 @@
 3.0.14
- * 
+ * Handling partially written hint files (CASSANDRA-12728) 
 
 3.0.13
  * Make reading of range tombstones more reliable (CASSANDRA-12811)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3110d27d/src/java/org/apache/cassandra/hints/HintsReader.java
--
diff --git a/src/java/org/apache/cassandra/hints/HintsReader.java 
b/src/java/org/apache/cassandra/hints/HintsReader.java
index d88c4f5..8104051 100644
--- a/src/java/org/apache/cassandra/hints/HintsReader.java
+++ b/src/java/org/apache/cassandra/hints/HintsReader.java
@@ -17,6 +17,7 @@
  */
 package org.apache.cassandra.hints;
 
+import java.io.EOFException;
 import java.io.File;
 import java.io.IOException;
 import java.nio.ByteBuffer;
@@ -188,6 +189,11 @@ class HintsReader implements AutoCloseable, 
Iterable
 {
 hint = computeNextInternal();
 }
+catch (EOFException e)
+{
+logger.warn("Unexpected EOF replaying hints ({}), likely 
due to unflushed hint file on shutdown; continuing", descriptor.fileName(), e);
+return endOfData();
+}
 catch (IOException e)
 {
 throw new FSReadError(e, file);
@@ -280,6 +286,11 @@ class HintsReader implements AutoCloseable, 
Iterable
 {
 buffer = computeNextInternal();
 }
+catch (EOFException e)
+{
+logger.warn("Unexpected EOF replaying hints ({}), likely 
due to unflushed hint file on shutdown; continuing", descriptor.fileName(), e);
+return endOfData();
+}
 catch (IOException e)
 {
 throw new FSReadError(e, file);



[3/6] cassandra git commit: Handling partially written hint files

2017-04-18 Thread jjirsa
Handling partially written hint files

Patch by  Garvit Juniwal; Reviewed by Aleksey Yeschenko for CASSANDRA-12728


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3110d27d
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3110d27d
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3110d27d

Branch: refs/heads/trunk
Commit: 3110d27dde2d518297e118c2f1f6b6bccfed7899
Parents: 71d4f66
Author: Jeff Jirsa 
Authored: Tue Apr 4 11:31:29 2017 -0700
Committer: Jeff Jirsa 
Committed: Tue Apr 18 13:25:54 2017 -0700

--
 CHANGES.txt  |  2 +-
 src/java/org/apache/cassandra/hints/HintsReader.java | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/3110d27d/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index e46abcd..918c46b 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,5 +1,5 @@
 3.0.14
- * 
+ * Handling partially written hint files (CASSANDRA-12728) 
 
 3.0.13
  * Make reading of range tombstones more reliable (CASSANDRA-12811)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3110d27d/src/java/org/apache/cassandra/hints/HintsReader.java
--
diff --git a/src/java/org/apache/cassandra/hints/HintsReader.java 
b/src/java/org/apache/cassandra/hints/HintsReader.java
index d88c4f5..8104051 100644
--- a/src/java/org/apache/cassandra/hints/HintsReader.java
+++ b/src/java/org/apache/cassandra/hints/HintsReader.java
@@ -17,6 +17,7 @@
  */
 package org.apache.cassandra.hints;
 
+import java.io.EOFException;
 import java.io.File;
 import java.io.IOException;
 import java.nio.ByteBuffer;
@@ -188,6 +189,11 @@ class HintsReader implements AutoCloseable, 
Iterable
 {
 hint = computeNextInternal();
 }
+catch (EOFException e)
+{
+logger.warn("Unexpected EOF replaying hints ({}), likely 
due to unflushed hint file on shutdown; continuing", descriptor.fileName(), e);
+return endOfData();
+}
 catch (IOException e)
 {
 throw new FSReadError(e, file);
@@ -280,6 +286,11 @@ class HintsReader implements AutoCloseable, 
Iterable
 {
 buffer = computeNextInternal();
 }
+catch (EOFException e)
+{
+logger.warn("Unexpected EOF replaying hints ({}), likely 
due to unflushed hint file on shutdown; continuing", descriptor.fileName(), e);
+return endOfData();
+}
 catch (IOException e)
 {
 throw new FSReadError(e, file);



[4/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2017-04-18 Thread jjirsa
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/65c1fddb
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/65c1fddb
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/65c1fddb

Branch: refs/heads/trunk
Commit: 65c1fddbcc2d7fff6f069c996306e04b6697f737
Parents: be49029 3110d27
Author: Jeff Jirsa 
Authored: Tue Apr 18 13:26:07 2017 -0700
Committer: Jeff Jirsa 
Committed: Tue Apr 18 13:26:28 2017 -0700

--
 CHANGES.txt  |  1 +
 src/java/org/apache/cassandra/hints/HintsReader.java | 11 +++
 2 files changed, 12 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/65c1fddb/CHANGES.txt
--
diff --cc CHANGES.txt
index 5516bbd,918c46b..19d8162
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,31 -1,7 +1,32 @@@
 -3.0.14
 - * Handling partially written hint files (CASSANDRA-12728) 
 -
 -3.0.13
 +3.11.0
 + * V5 protocol flags decoding broken (CASSANDRA-13443)
 + * Use write lock not read lock for removing sstables from compaction 
strategies. (CASSANDRA-13422)
 + * Use corePoolSize equal to maxPoolSize in JMXEnabledThreadPoolExecutors 
(CASSANDRA-13329)
 + * Avoid rebuilding SASI indexes containing no values (CASSANDRA-12962)
 + * Add charset to Analyser input stream (CASSANDRA-13151)
 + * Fix testLimitSSTables flake caused by concurrent flush (CASSANDRA-12820)
 + * cdc column addition strikes again (CASSANDRA-13382)
 + * Fix static column indexes (CASSANDRA-13277)
 + * DataOutputBuffer.asNewBuffer broken (CASSANDRA-13298)
 + * unittest CipherFactoryTest failed on MacOS (CASSANDRA-13370)
 + * Forbid SELECT restrictions and CREATE INDEX over non-frozen UDT columns 
(CASSANDRA-13247)
 + * Default logging we ship will incorrectly print "?:?" for "%F:%L" pattern 
(CASSANDRA-13317)
 + * Possible AssertionError in UnfilteredRowIteratorWithLowerBound 
(CASSANDRA-13366)
 + * Support unaligned memory access for AArch64 (CASSANDRA-13326)
 + * Improve SASI range iterator efficiency on intersection with an empty range 
(CASSANDRA-12915).
 + * Fix equality comparisons of columns using the duration type 
(CASSANDRA-13174)
 + * Obfuscate password in stress-graphs (CASSANDRA-12233)
 + * Move to FastThreadLocalThread and FastThreadLocal (CASSANDRA-13034)
 + * nodetool stopdaemon errors out (CASSANDRA-13030)
 + * Tables in system_distributed should not use gcgs of 0 (CASSANDRA-12954)
 + * Fix primary index calculation for SASI (CASSANDRA-12910)
 + * More fixes to the TokenAllocator (CASSANDRA-12990)
 + * NoReplicationTokenAllocator should work with zero replication factor 
(CASSANDRA-12983)
 + * Address message coalescing regression (CASSANDRA-12676)
 + * Delete illegal character from StandardTokenizerImpl.jflex (CASSANDRA-13417)
 +Merged from 3.0:
++ * Handling partially written hint files (CASSANDRA-12728)
 + * Fix NPE issue in StorageService (CASSANDRA-13060)
   * Make reading of range tombstones more reliable (CASSANDRA-12811)
   * Fix startup problems due to schema tables not completely flushed 
(CASSANDRA-12213)
   * Fix view builder bug that can filter out data on restart (CASSANDRA-13405)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/65c1fddb/src/java/org/apache/cassandra/hints/HintsReader.java
--



[5/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2017-04-18 Thread jjirsa
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/65c1fddb
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/65c1fddb
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/65c1fddb

Branch: refs/heads/cassandra-3.11
Commit: 65c1fddbcc2d7fff6f069c996306e04b6697f737
Parents: be49029 3110d27
Author: Jeff Jirsa 
Authored: Tue Apr 18 13:26:07 2017 -0700
Committer: Jeff Jirsa 
Committed: Tue Apr 18 13:26:28 2017 -0700

--
 CHANGES.txt  |  1 +
 src/java/org/apache/cassandra/hints/HintsReader.java | 11 +++
 2 files changed, 12 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/65c1fddb/CHANGES.txt
--
diff --cc CHANGES.txt
index 5516bbd,918c46b..19d8162
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,31 -1,7 +1,32 @@@
 -3.0.14
 - * Handling partially written hint files (CASSANDRA-12728) 
 -
 -3.0.13
 +3.11.0
 + * V5 protocol flags decoding broken (CASSANDRA-13443)
 + * Use write lock not read lock for removing sstables from compaction 
strategies. (CASSANDRA-13422)
 + * Use corePoolSize equal to maxPoolSize in JMXEnabledThreadPoolExecutors 
(CASSANDRA-13329)
 + * Avoid rebuilding SASI indexes containing no values (CASSANDRA-12962)
 + * Add charset to Analyser input stream (CASSANDRA-13151)
 + * Fix testLimitSSTables flake caused by concurrent flush (CASSANDRA-12820)
 + * cdc column addition strikes again (CASSANDRA-13382)
 + * Fix static column indexes (CASSANDRA-13277)
 + * DataOutputBuffer.asNewBuffer broken (CASSANDRA-13298)
 + * unittest CipherFactoryTest failed on MacOS (CASSANDRA-13370)
 + * Forbid SELECT restrictions and CREATE INDEX over non-frozen UDT columns 
(CASSANDRA-13247)
 + * Default logging we ship will incorrectly print "?:?" for "%F:%L" pattern 
(CASSANDRA-13317)
 + * Possible AssertionError in UnfilteredRowIteratorWithLowerBound 
(CASSANDRA-13366)
 + * Support unaligned memory access for AArch64 (CASSANDRA-13326)
 + * Improve SASI range iterator efficiency on intersection with an empty range 
(CASSANDRA-12915).
 + * Fix equality comparisons of columns using the duration type 
(CASSANDRA-13174)
 + * Obfuscate password in stress-graphs (CASSANDRA-12233)
 + * Move to FastThreadLocalThread and FastThreadLocal (CASSANDRA-13034)
 + * nodetool stopdaemon errors out (CASSANDRA-13030)
 + * Tables in system_distributed should not use gcgs of 0 (CASSANDRA-12954)
 + * Fix primary index calculation for SASI (CASSANDRA-12910)
 + * More fixes to the TokenAllocator (CASSANDRA-12990)
 + * NoReplicationTokenAllocator should work with zero replication factor 
(CASSANDRA-12983)
 + * Address message coalescing regression (CASSANDRA-12676)
 + * Delete illegal character from StandardTokenizerImpl.jflex (CASSANDRA-13417)
 +Merged from 3.0:
++ * Handling partially written hint files (CASSANDRA-12728)
 + * Fix NPE issue in StorageService (CASSANDRA-13060)
   * Make reading of range tombstones more reliable (CASSANDRA-12811)
   * Fix startup problems due to schema tables not completely flushed 
(CASSANDRA-12213)
   * Fix view builder bug that can filter out data on restart (CASSANDRA-13405)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/65c1fddb/src/java/org/apache/cassandra/hints/HintsReader.java
--



[2/6] cassandra git commit: Handling partially written hint files

2017-04-18 Thread jjirsa
Handling partially written hint files

Patch by  Garvit Juniwal; Reviewed by Aleksey Yeschenko for CASSANDRA-12728


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3110d27d
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3110d27d
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3110d27d

Branch: refs/heads/cassandra-3.11
Commit: 3110d27dde2d518297e118c2f1f6b6bccfed7899
Parents: 71d4f66
Author: Jeff Jirsa 
Authored: Tue Apr 4 11:31:29 2017 -0700
Committer: Jeff Jirsa 
Committed: Tue Apr 18 13:25:54 2017 -0700

--
 CHANGES.txt  |  2 +-
 src/java/org/apache/cassandra/hints/HintsReader.java | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/3110d27d/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index e46abcd..918c46b 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,5 +1,5 @@
 3.0.14
- * 
+ * Handling partially written hint files (CASSANDRA-12728) 
 
 3.0.13
  * Make reading of range tombstones more reliable (CASSANDRA-12811)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3110d27d/src/java/org/apache/cassandra/hints/HintsReader.java
--
diff --git a/src/java/org/apache/cassandra/hints/HintsReader.java 
b/src/java/org/apache/cassandra/hints/HintsReader.java
index d88c4f5..8104051 100644
--- a/src/java/org/apache/cassandra/hints/HintsReader.java
+++ b/src/java/org/apache/cassandra/hints/HintsReader.java
@@ -17,6 +17,7 @@
  */
 package org.apache.cassandra.hints;
 
+import java.io.EOFException;
 import java.io.File;
 import java.io.IOException;
 import java.nio.ByteBuffer;
@@ -188,6 +189,11 @@ class HintsReader implements AutoCloseable, 
Iterable
 {
 hint = computeNextInternal();
 }
+catch (EOFException e)
+{
+logger.warn("Unexpected EOF replaying hints ({}), likely 
due to unflushed hint file on shutdown; continuing", descriptor.fileName(), e);
+return endOfData();
+}
 catch (IOException e)
 {
 throw new FSReadError(e, file);
@@ -280,6 +286,11 @@ class HintsReader implements AutoCloseable, 
Iterable
 {
 buffer = computeNextInternal();
 }
+catch (EOFException e)
+{
+logger.warn("Unexpected EOF replaying hints ({}), likely 
due to unflushed hint file on shutdown; continuing", descriptor.fileName(), e);
+return endOfData();
+}
 catch (IOException e)
 {
 throw new FSReadError(e, file);




[6/6] cassandra git commit: Merge branch 'cassandra-3.11' into trunk

2017-04-18 Thread jjirsa
Merge branch 'cassandra-3.11' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e5242062
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e5242062
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e5242062

Branch: refs/heads/trunk
Commit: e524206246bca48fe94064c8dd8f804fd81d77c0
Parents: 8f5f54f 65c1fdd
Author: Jeff Jirsa 
Authored: Tue Apr 18 13:26:39 2017 -0700
Committer: Jeff Jirsa 
Committed: Tue Apr 18 13:27:15 2017 -0700

--
 CHANGES.txt  |  1 +
 src/java/org/apache/cassandra/hints/HintsReader.java | 11 +++
 2 files changed, 12 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/e5242062/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e5242062/src/java/org/apache/cassandra/hints/HintsReader.java
--



[jira] [Commented] (CASSANDRA-13461) Update circle.yml to run dtests and utests in parallel across containers

2017-04-18 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973357#comment-15973357
 ] 

Ariel Weisberg commented on CASSANDRA-13461:


Yeah, it looks like we will have to hold off until we talk to Circle and see 
what is possible.

> Update circle.yml to run dtests and utests in parallel across containers
> 
>
> Key: CASSANDRA-13461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13461
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>
> I have a circle.yml that parallelizes the dtests and utests over the 4 free 
> available containers. It can be tweaked to support however many containers 
> are available.
> The unit tests pass normally. The dtests run mostly normally. There are 10 or 
> so tests that fail on trunk, but 30 that fail when run in CircleCI. It's 
> still better than not running the dtests IMO. I am currently working on 
> figuring out why the test failures don't match.
> {noformat}
> version: 2
> jobs:
>   build:
> resource_class: xlarge
> working_directory: ~/
> parallelism: 4
> docker:
>   - image: ubuntu:xenial-20170410
> steps:
>   - run:
>   name: apt-get install packages
>   command: |
> echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu xenial 
> main" | tee /etc/apt/sources.list.d/webupd8team-java.list
> echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu 
> xenial main" | tee -a /etc/apt/sources.list.d/webupd8team-java.list
> apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 
> EEA14886
> echo oracle-java8-installer shared/accepted-oracle-license-v1-1 
> select true | /usr/bin/debconf-set-selections
> apt-get update
> apt-get install -y git-core npm python python-pip python-dev ant 
> ant-optional oracle-java8-installer net-tools
> ln -s /usr/bin/nodejs /usr/bin/node || true
>   - run:
>   name: Log environment information
>   command: |
>   echo '*** id ***'
>   id
>   echo '*** cat /proc/cpuinfo ***'
>   cat /proc/cpuinfo
>   echo '*** free -m ***'
>   free -m
>   echo '*** df -m ***'
>   df -m
>   echo '*** ifconfig -a ***'
>   ifconfig -a
>   echo '*** uname -a ***'
>   uname -a
>   echo '*** mount ***'
>   mount
>   echo '*** env ***'
>   env
>   - run:
>   name: Clone git repos
>   command: |
> git clone --single-branch --depth 1 
> https://github.com/riptano/cassandra-dtest ~/cassandra-dtest
> git clone --single-branch --depth 1 --branch $CIRCLE_BRANCH 
> git://github.com/$CIRCLE_PROJECT_USERNAME/$CIRCLE_PROJECT_REPONAME.git 
> ~/cassandra
>   - run:
>   name: Install junit-merge
>   command: npm install -g junit-merge
>   - run:
>   name: Install virtualenv
>   command: pip install virtualenv 
>   - run:
>   name: Configure virtualenv and python dependencies
>   command: |
> virtualenv --python=python2 --no-site-packages venv
> source venv/bin/activate
> export CASS_DRIVER_NO_EXTENSIONS=true
> export CASS_DRIVER_NO_CYTHON=true
> pip install -r ~/cassandra-dtest/requirements.txt
> pip freeze
>   - run:
>   name: Build Cassandra
>   command: |
> cd ~/cassandra
> # Loop to prevent failure due to maven-ant-tasks not downloading 
> a jar..
> for x in $(seq 1 3); do
> ant clean jar
> RETURN="$?"
> if [ "${RETURN}" -eq "0" ]; then
> break
> fi
> done
> # Exit, if we didn't build successfully
> if [ "${RETURN}" -ne "0" ]; then
> echo "Build failed with exit code: ${RETURN}"
> exit ${RETURN}
> fi
>   no_output_timeout: 20m
>   - run:
>   name: Determine tests to run
>   no_output_timeout: 10m
>   command: |
> #"$HOME/cassandra-dtest/**/*.py"
> echo `circleci tests glob "$HOME/cassandra/test/unit/**" 
> "$HOME/cassandra-dtest/**/*.py" | grep -v upgrade_tests | grep -v 
> "cassandra-thrift" | grep -v "thrift_bindings" | grep -v "tools" | grep -v 
> "dtest.py" | grep -v "$HOME/cassandra-dtest/bin" | grep -v "plugins" | 
> circleci tests split --split-by=timings --timings-type=filename` > 
> /tmp/tests.txt
> echo "***processed tests***"
> 

[jira] [Commented] (CASSANDRA-13461) Update circle.yml to run dtests and utests in parallel across containers

2017-04-18 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973340#comment-15973340
 ] 

Jeff Jirsa commented on CASSANDRA-13461:


Tried this on one of my branches (with free resource settings), and it's not so 
pretty - https://circleci.com/gh/jeffjirsa/cassandra/134

> Update circle.yml to run dtests and utests in parallel across containers
> 
>
> Key: CASSANDRA-13461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13461
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>
> I have a circle.yml that parallelizes the dtests and utests over the 4 free 
> available containers. It can be tweaked to support however many containers 
> are available.
> The unit tests pass normally. The dtests run mostly normally. There are 10 or 
> so tests that fail on trunk, but 30 that fail when run in CircleCI. It's 
> still better than not running the dtests IMO. I am currently working on 
> figuring out why the test failures don't match.
> {noformat}
> version: 2
> jobs:
>   build:
> resource_class: xlarge
> working_directory: ~/
> parallelism: 4
> docker:
>   - image: ubuntu:xenial-20170410
> steps:
>   - run:
>   name: apt-get install packages
>   command: |
> echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu xenial 
> main" | tee /etc/apt/sources.list.d/webupd8team-java.list
> echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu 
> xenial main" | tee -a /etc/apt/sources.list.d/webupd8team-java.list
> apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 
> EEA14886
> echo oracle-java8-installer shared/accepted-oracle-license-v1-1 
> select true | /usr/bin/debconf-set-selections
> apt-get update
> apt-get install -y git-core npm python python-pip python-dev ant 
> ant-optional oracle-java8-installer net-tools
> ln -s /usr/bin/nodejs /usr/bin/node || true
>   - run:
>   name: Log environment information
>   command: |
>   echo '*** id ***'
>   id
>   echo '*** cat /proc/cpuinfo ***'
>   cat /proc/cpuinfo
>   echo '*** free -m ***'
>   free -m
>   echo '*** df -m ***'
>   df -m
>   echo '*** ifconfig -a ***'
>   ifconfig -a
>   echo '*** uname -a ***'
>   uname -a
>   echo '*** mount ***'
>   mount
>   echo '*** env ***'
>   env
>   - run:
>   name: Clone git repos
>   command: |
> git clone --single-branch --depth 1 
> https://github.com/riptano/cassandra-dtest ~/cassandra-dtest
> git clone --single-branch --depth 1 --branch $CIRCLE_BRANCH 
> git://github.com/$CIRCLE_PROJECT_USERNAME/$CIRCLE_PROJECT_REPONAME.git 
> ~/cassandra
>   - run:
>   name: Install junit-merge
>   command: npm install -g junit-merge
>   - run:
>   name: Install virtualenv
>   command: pip install virtualenv 
>   - run:
>   name: Configure virtualenv and python dependencies
>   command: |
> virtualenv --python=python2 --no-site-packages venv
> source venv/bin/activate
> export CASS_DRIVER_NO_EXTENSIONS=true
> export CASS_DRIVER_NO_CYTHON=true
> pip install -r ~/cassandra-dtest/requirements.txt
> pip freeze
>   - run:
>   name: Build Cassandra
>   command: |
> cd ~/cassandra
> # Loop to prevent failure due to maven-ant-tasks not downloading 
> a jar..
> for x in $(seq 1 3); do
> ant clean jar
> RETURN="$?"
> if [ "${RETURN}" -eq "0" ]; then
> break
> fi
> done
> # Exit, if we didn't build successfully
> if [ "${RETURN}" -ne "0" ]; then
> echo "Build failed with exit code: ${RETURN}"
> exit ${RETURN}
> fi
>   no_output_timeout: 20m
>   - run:
>   name: Determine tests to run
>   no_output_timeout: 10m
>   command: |
> #"$HOME/cassandra-dtest/**/*.py"
> echo `circleci tests glob "$HOME/cassandra/test/unit/**" 
> "$HOME/cassandra-dtest/**/*.py" | grep -v upgrade_tests | grep -v 
> "cassandra-thrift" | grep -v "thrift_bindings" | grep -v "tools" | grep -v 
> "dtest.py" | grep -v "$HOME/cassandra-dtest/bin" | grep -v "plugins" | 
> circleci tests split --split-by=timings --timings-type=filename` > 
> /tmp/tests.txt
> 

[jira] [Created] (CASSANDRA-13461) Update circle.yml to run dtests and utests in parallel across containers

2017-04-18 Thread Ariel Weisberg (JIRA)
Ariel Weisberg created CASSANDRA-13461:
--

 Summary: Update circle.yml to run dtests and utests in parallel 
across containers
 Key: CASSANDRA-13461
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13461
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg


I have a circle.yml that parallelizes the dtests and utests over the 4 free 
available containers. It can be tweaked to support however many containers are 
available.

The unit tests pass normally. The dtests run mostly normally. There are 10 or 
so tests that fail on trunk, but 30 that fail when run in CircleCI. It's still 
better than not running the dtests IMO. I am currently working on figuring out 
why the test failures don't match.

{noformat}
version: 2
jobs:
  build:
resource_class: xlarge
working_directory: ~/
parallelism: 4
docker:
  - image: ubuntu:xenial-20170410
steps:
  - run:
  name: apt-get install packages
  command: |
echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu xenial 
main" | tee /etc/apt/sources.list.d/webupd8team-java.list
echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu 
xenial main" | tee -a /etc/apt/sources.list.d/webupd8team-java.list
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 
EEA14886
echo oracle-java8-installer shared/accepted-oracle-license-v1-1 
select true | /usr/bin/debconf-set-selections
apt-get update
apt-get install -y git-core npm python python-pip python-dev ant 
ant-optional oracle-java8-installer net-tools
ln -s /usr/bin/nodejs /usr/bin/node || true
  - run:
  name: Log environment information
  command: |
  echo '*** id ***'
  id
  echo '*** cat /proc/cpuinfo ***'
  cat /proc/cpuinfo
  echo '*** free -m ***'
  free -m
  echo '*** df -m ***'
  df -m
  echo '*** ifconfig -a ***'
  ifconfig -a
  echo '*** uname -a ***'
  uname -a
  echo '*** mount ***'
  mount
  echo '*** env ***'
  env
  - run:
  name: Clone git repos
  command: |
git clone --single-branch --depth 1 
https://github.com/riptano/cassandra-dtest ~/cassandra-dtest
git clone --single-branch --depth 1 --branch $CIRCLE_BRANCH 
git://github.com/$CIRCLE_PROJECT_USERNAME/$CIRCLE_PROJECT_REPONAME.git 
~/cassandra
  - run:
  name: Install junit-merge
  command: npm install -g junit-merge
  - run:
  name: Install virtualenv
  command: pip install virtualenv 
  - run:
  name: Configure virtualenv and python dependencies
  command: |
virtualenv --python=python2 --no-site-packages venv
source venv/bin/activate
export CASS_DRIVER_NO_EXTENSIONS=true
export CASS_DRIVER_NO_CYTHON=true
pip install -r ~/cassandra-dtest/requirements.txt
pip freeze
  - run:
  name: Build Cassandra
  command: |
cd ~/cassandra
# Loop to prevent failure due to maven-ant-tasks not downloading a 
jar..
for x in $(seq 1 3); do
ant clean jar
RETURN="$?"
if [ "${RETURN}" -eq "0" ]; then
break
fi
done
# Exit, if we didn't build successfully
if [ "${RETURN}" -ne "0" ]; then
echo "Build failed with exit code: ${RETURN}"
exit ${RETURN}
fi
  no_output_timeout: 20m
  - run:
  name: Determine tests to run
  no_output_timeout: 10m
  command: |
#"$HOME/cassandra-dtest/**/*.py"
echo `circleci tests glob "$HOME/cassandra/test/unit/**" 
"$HOME/cassandra-dtest/**/*.py" | grep -v upgrade_tests | grep -v 
"cassandra-thrift" | grep -v "thrift_bindings" | grep -v "tools" | grep -v 
"dtest.py" | grep -v "$HOME/cassandra-dtest/bin" | grep -v "plugins" | circleci 
tests split --split-by=timings --timings-type=filename` > /tmp/tests.txt
echo "***processed tests***"
cat /tmp/tests.txt | sed -e 's/\s\+/\n/g' > /tmp/processed_tests.txt
cat /tmp/processed_tests.txt
echo "***java tests***"
cat /tmp/processed_tests.txt | grep "\.java$" > /tmp/java_tests.txt
cat /tmp/java_tests.txt
cat /tmp/java_tests.txt | cut -c 27-100 | grep "Test\.java$" > 
/tmp/java_tests_final.txt
echo "***final java tests***"
cat /tmp/java_tests_final.txt
echo "***python tests***"
cat /tmp/processed_tests.txt 

[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2017-04-18 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973117#comment-15973117
 ] 

Jonathan Ellis commented on CASSANDRA-12126:


I'm confused, because it sounds like you're saying "all operations should be 
visible once finished." Of course that's not actually what you mean, since that 
would require participation from all replicas to finish in-flight requests, not 
just a majority. What is the distinction you are proposing?

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Assignee: Stefan Podkowinski
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13275) Cassandra throws an exception during CQL select query filtering on map key

2017-04-18 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973113#comment-15973113
 ] 

Alex Petrov commented on CASSANDRA-13275:
-

Partition key filtering was introduced in [CASSANDRA-11031]; although 
{{CONTAINS}} didn't trigger filtering, the read path was still trying to convert 
the {{CONTAINS}} restriction to bounds.

|[3.11|https://github.com/apache/cassandra/compare/3.11...ifesdjeen:13275-3.11]|[testall|http://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13275-3.11-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13275-3.11-dtest/]|
|[trunk|https://github.com/apache/cassandra/compare/trunk...ifesdjeen:13275-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13275-trunk-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13275-trunk-dtest/]|

> Cassandra throws an exception during CQL select query filtering on map key 
> ---
>
> Key: CASSANDRA-13275
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13275
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Abderrahmane CHRAIBI
>Assignee: Alex Petrov
>
> Env: cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4
> Using this table structure:
> {code}CREATE TABLE mytable (
> mymap frozen>> PRIMARY KEY
> )
> {code}
> Executing:
> {code} select * from mytable where mymap contains key UUID;
> {code}
> Within cqlsh shows this message:
> {code}
> ServerError: java.lang.UnsupportedOperationException
> system.log:
> java.lang.UnsupportedOperationException: null
> at 
> org.apache.cassandra.cql3.restrictions.SingleColumnRestriction$ContainsRestriction.appendTo(SingleColumnRestriction.java:456)
>  ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.cql3.restrictions.PartitionKeySingleRestrictionSet.values(PartitionKeySingleRestrictionSet.java:86)
>  ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.cql3.restrictions.StatementRestrictions.getPartitionKeys(StatementRestrictions.java:585)
>  ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.getSliceCommands(SelectStatement.java:474)
>  ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.getQuery(SelectStatement.java:262)
>  ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:227)
>  ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:76)
>  ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:188)
>  ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:219) 
> ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:204) 
> ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115)
>  ~[apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513)
>  [apache-cassandra-3.9.jar:3.9]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407)
>  [apache-cassandra-3.9.jar:3.9]
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.39.Final.jar:4.0.39.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>  [netty-all-4.0.39.Final.jar:4.0.39.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35)
>  [netty-all-4.0.39.Final.jar:4.0.39.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:357)
>  [netty-all-4.0.39.Final.jar:4.0.39.Final]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_121]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  [apache-cassandra-3.9.jar:3.9]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) 
> [apache-cassandra-3.9.jar:3.9]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with storage requirements approximating RF=2

2017-04-18 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972997#comment-15972997
 ] 

sankalp kohli commented on CASSANDRA-13442:
---

Regarding cost, it will help you save roughly 30% if you go from RF=6 to 4 (2 
in each DC), and more if you go from RF=10 to 6 (3 in each DC). 
It will be quite complex and we can spec it out to see if it is worth the time. 
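
(As a rough sanity check on those numbers, assuming storage is proportional to 
the number of copies kept: going from 6 copies to 4 is a (6 - 4) / 6 ≈ 33% 
reduction, and going from 10 copies to 6 is a 40% reduction.)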

> Support a means of strongly consistent highly available replication with 
> storage requirements approximating RF=2
> 
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2017-04-18 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972899#comment-15972899
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

bq.  I am saying that we actually also satisfy (1) because the write is not 
complete until Sankalp's step 3.

Your definition of "completion" doesn't work logically. You're suggesting (1) 
only completes when it is visible (in (3)), but linearizability is about when 
operations that complete are visible, so if you define completion as "when it's 
visible", you're in for trouble.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Assignee: Stefan Podkowinski
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-18 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972898#comment-15972898
 ] 

Corentin Chary commented on CASSANDRA-13418:


Here is an attempt at a patch: 
https://github.com/iksaif/cassandra/tree/CASSANDRA-13005-trunk

Works with:
{code}
ALTER TABLE test.test WITH compaction = {'class': 
'TimeWindowCompactionStrategy', 'provide_overlapping_tombstones': 
'ignore_overlaps'};
{code}

This outputs:
{code}
WARN  [CompactionExecutor:4] 2017-04-18 17:17:00,538 
CompactionController.java:96 - You are running with overlapping sstable sanity 
checks for tombstones disabled on test:test,this can lead to inconsistencies 
when running explicit deletions.
{code}

I'm still not sure about reusing the existing option; I could be convinced 
otherwise (but it should not be hard to change).

Once we agree on that, I can add documentation and unit tests.
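
To make the behaviour change concrete, here is a sketch of the drop decision 
being discussed (illustrative names only, not the patch itself): a fully expired 
sstable is normally only droppable once every overlapping sstable holds strictly 
newer data, and the proposed option skips that overlap check.

{code}
import java.util.Collection;

// Sketch only: when can a candidate sstable be dropped as fully expired?
final class ExpiredSSTableCheck
{
    interface SSTable
    {
        int maxLocalDeletionTime();
        long minTimestamp();
        long maxTimestamp();
    }

    static boolean fullyExpired(SSTable candidate, Collection<SSTable> overlapping,
                                int gcBefore, boolean ignoreOverlaps)
    {
        if (candidate.maxLocalDeletionTime() >= gcBefore)
            return false;            // still contains data that has not expired yet
        if (ignoreOverlaps)
            return true;             // proposed TWCS behaviour: drop regardless of overlaps
        // Default behaviour: only drop if every overlapping sstable holds strictly newer
        // data, so removing the candidate (and its tombstones) cannot resurrect anything.
        return overlapping.stream().allMatch(s -> s.minTimestamp() > candidate.maxTimestamp());
    }
}
{code}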


> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2017-04-18 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972890#comment-15972890
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

bq. But we stipulated that 1 times out and did not complete.

It completed in the sense that the operation is finished and returned to the 
client (albeit with a timeout). Don't get me wrong, theory is blind, so if you 
want to define that an operation completes only if it "finished and did not 
time out" and define linearizability only in terms of completed operations 
(with that definition of completion), then sure, I agree this particular 
definition of linearizability is not broken by the description of this ticket.

But how useful is a definition of linearizability that says nothing about 
operations that time out (especially keeping in mind that our implementation is 
particularly prone to timeouts)?

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Assignee: Stefan Podkowinski
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2017-04-18 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972886#comment-15972886
 ] 

Jonathan Ellis commented on CASSANDRA-12126:


Bailis:

# once a write completes, all later reads (where “later” is defined by 
wall-clock start time) should return the value of that write or the value of a 
later write. 
# Once a read returns a particular value, all later reads should return that 
value or the value of a later write.

I think we all agree that our current behavior satisfies (2).  I am saying that 
we actually also satisfy (1) because the write is not complete until Sankalp's 
step 3.


> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Assignee: Stefan Podkowinski
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2017-04-18 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972883#comment-15972883
 ] 

Jonathan Ellis commented on CASSANDRA-12126:


But we stipulated that 1 times out and did not complete.  (If it did complete 
it would be guaranteed to be visible by any majority of course.)

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Assignee: Stefan Podkowinski
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2017-04-18 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972861#comment-15972861
 ] 

Sylvain Lebresne commented on CASSANDRA-12126:
--

bq. 1 -> 2 -> 3. The value V from 1 is not seen in 2, but once it is seen in 3 
it is always seen.

As 1 "completes" before 2, it's result should be visible by 2 (or not ever) for 
linearizability (taken in the sense discussed here for instance: 
http://www.bailis.org/blog/linearizability-versus-serializability/). More 
pragmatically, outside of any theoretical definition, if serial read can't 
guarantee they will see any previous operation (even failed one, as long as 
they returned to the client), then they are not very useful in the first place.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Assignee: Stefan Podkowinski
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-04-18 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972825#comment-15972825
 ] 

Corentin Chary commented on CASSANDRA-13432:


Latest patch: https://github.com/iksaif/cassandra/commits/CASSANDRA-13432-2.x

> MemtableReclaimMemory can get stuck because of lack of timeout in 
> getTopLevelColumns()
> --
>
> Key: CASSANDRA-13432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13432
> Project: Cassandra
>  Issue Type: Bug
> Environment: cassandra 2.1.15
>Reporter: Corentin Chary
> Fix For: 2.1.x
>
> Attachments: CASSANDRA-13432.patch
>
>
> This might affect 3.x too, I'm not sure.
> {code}
> $ nodetool tpstats
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 0   32135875 0
>  0
> ReadStage   114 0   29492940 0
>  0
> RequestResponseStage  0 0   86090931 0
>  0
> ReadRepairStage   0 0 166645 0
>  0
> CounterMutationStage  0 0  0 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 0 0 47 0
>  0
> GossipStage   0 0 188769 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor0 0  86835 0
>  0
> ValidationExecutor0 0  0 0
>  0
> MigrationStage0 0  0 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> PendingRangeCalculator0 0 92 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   0 0563 0
>  0
> MemtablePostFlush 0 0   1500 0
>  0
> MemtableReclaimMemory 129534 0
>  0
> Native-Transport-Requests41 0   54819182 0
>   1896
> {code}
> {code}
> "MemtableReclaimMemory:195" - Thread t@6268
>java.lang.Thread.State: WAITING
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>   at 
> org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUninterruptibly(WaitQueue.java:283)
>   at 
> org.apache.cassandra.utils.concurrent.OpOrder$Barrier.await(OpOrder.java:417)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush$1.runMayThrow(ColumnFamilyStore.java:1151)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - locked <6e7b1160> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "SharedPool-Worker-195" - Thread t@989
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:690)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:650)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:171)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:143)
>   at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:240)
>   at 
> org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:483)
>   at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
>   at 
> 

[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2017-04-18 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972815#comment-15972815
 ] 

Jonathan Ellis commented on CASSANDRA-12126:


I see you outlining two series of steps:

1 -> 2 -> 3.  The value V from 1 is not seen in 2, but once it is seen in 3 it 
is always seen.

1 -> 2 -> 4.  V is never seen.

It seems to me that both of these maintain linearizability.  What am I missing?
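
For concreteness, a toy single-threaded model of those two traces (an 
illustration only, not Cassandra's Paxos code; replica and value names are made 
up):

{code}
import java.util.*;

public class CasReadScenario
{
    // Per-replica state: an accepted-but-uncommitted value, and the committed value.
    static Map<String, String> accepted = new HashMap<>();
    static Map<String, String> committed = new HashMap<>();

    // A serial read against a quorum: if any contacted replica has an in-flight accepted
    // value, re-propose and commit it (step 3); otherwise return the committed value (step 2).
    static String serialRead(List<String> quorum)
    {
        Optional<String> inflight = quorum.stream()
                                          .map(accepted::get)
                                          .filter(Objects::nonNull)
                                          .findFirst();
        if (inflight.isPresent())
        {
            for (String replica : quorum)
            {
                committed.put(replica, inflight.get());
                accepted.remove(replica);
            }
        }
        return committed.get(quorum.get(0));
    }

    public static void main(String[] args)
    {
        // Step 1: the CAS write times out; only replica A accepted value V.
        accepted.put("A", "V");

        // Trace 1 -> 2 -> 3:
        System.out.println(serialRead(Arrays.asList("B", "C"))); // step 2: null, V not visible
        System.out.println(serialRead(Arrays.asList("A", "B"))); // step 3: V becomes visible and stays visible

        // Trace 1 -> 2 -> 4 would instead commit a new value on {B, C}, and V would never be seen.
    }
}
{code}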

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: sankalp kohli
>Assignee: Stefan Podkowinski
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3
> 1) You issue a CAS Write and it fails in the propose phase. A machine replies 
> true to a propose and saves the commit in accepted filed. The other two 
> machines B and C does not get to the accept phase. 
> Current state is that machine A has this commit in paxos table as accepted 
> but not committed and B and C does not. 
> 2) Issue a CAS Read and it goes to only B and C. You wont be able to read the 
> value written in step 1. This step is as if nothing is inflight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something inflight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about value 
> written in step 1. 
> 4. Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. Step 1 value will never be seen again 
> and was never seen before. 
> If you read the Lamport “paxos made simple” paper and read section 2.3. It 
> talks about this issue which is how learners can find out if majority of the 
> acceptors have accepted the proposal. 
> In step 3, it is correct that we propose the value again since we dont know 
> if it was accepted by majority of acceptors. When we ask majority of 
> acceptors, and more than one acceptors but not majority has something in 
> flight, we have no way of knowing if it is accepted by majority of acceptors. 
> So this behavior is correct. 
> However we need to fix step 2, since it caused reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that majority 
> of acceptors have no inflight commit which means we have majority that 
> nothing was accepted by majority. I think we should run a propose step here 
> with empty commit and that will cause write written in step 1 to not be 
> visible ever after. 
> With this fix, we will either see data written in step 1 on next serial read 
> or will never see it which is what we want. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13441) Schema version changes for each upgraded node in a rolling upgrade, causing migration storms

2017-04-18 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-13441:
--
Fix Version/s: 3.11.x
   3.0.x

> Schema version changes for each upgraded node in a rolling upgrade, causing 
> migration storms
> 
>
> Key: CASSANDRA-13441
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13441
> Project: Cassandra
>  Issue Type: Bug
>  Components: Schema
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> In versions < 3.0, during a rolling upgrade (say 2.0 -> 2.1), the first node 
> to upgrade to 2.1 would add the new tables, setting the new 2.1 version ID, 
> and subsequently upgraded hosts would settle on that version.
> When a 3.0 node upgrades and writes its own new-in-3.0 system tables, it'll 
> write the same tables that exist in the schema with brand new timestamps. As 
> written, this will cause all nodes in the cluster to change schema (to the 
> version with the newest timestamp). On a sufficiently large cluster with a 
> non-trivial schema, this could cause (literally) millions of migration tasks 
> to needlessly bounce across the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13441) Schema version changes for each upgraded node in a rolling upgrade, causing migration storms

2017-04-18 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-13441:
--
Status: Ready to Commit  (was: Patch Available)

> Schema version changes for each upgraded node in a rolling upgrade, causing 
> migration storms
> 
>
> Key: CASSANDRA-13441
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13441
> Project: Cassandra
>  Issue Type: Bug
>  Components: Schema
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 4.x
>
>
> In versions < 3.0, during a rolling upgrade (say 2.0 -> 2.1), the first node 
> to upgrade to 2.1 would add the new tables, setting the new 2.1 version ID, 
> and subsequently upgraded hosts would settle on that version.
> When a 3.0 node upgrades and writes its own new-in-3.0 system tables, it'll 
> write the same tables that exist in the schema with brand new timestamps. As 
> written, this will cause all nodes in the cluster to change schema (to the 
> version with the newest timestamp). On a sufficiently large cluster with a 
> non-trivial schema, this could cause (literally) millions of migration tasks 
> to needlessly bounce across the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13441) Schema version changes for each upgraded node in a rolling upgrade, causing migration storms

2017-04-18 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972792#comment-15972792
 ] 

Aleksey Yeschenko commented on CASSANDRA-13441:
---

The patches look good to me. +1

P.S. I initially thought that you overlooked specifying a 0 timestamp for 
keyspace metadata itself (replication params and durable_writes), but apparently 
there, in {{MigrationManager.maybeAddKeyspace()}}, we already set the 0 
timestamp correctly. Makes me wonder how we forgot to do the same for tables 
back then.
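
For context on why the timestamps matter at all: the schema version is 
essentially a digest over the serialized schema rows, so writing identical 
definitions with brand-new timestamps produces a different digest on every 
upgraded node. A minimal sketch of that effect (plain MessageDigest over a 
definition plus its write timestamp; not the actual schema-digest code):

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.UUID;

public class SchemaDigestSketch
{
    // Hash a table definition together with the timestamp it was written at,
    // as a stand-in for digesting the schema mutations.
    static UUID version(String tableDefinition, long writeTimestamp) throws Exception
    {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        md5.update(tableDefinition.getBytes(StandardCharsets.UTF_8));
        md5.update(Long.toString(writeTimestamp).getBytes(StandardCharsets.UTF_8));
        return UUID.nameUUIDFromBytes(md5.digest());
    }

    public static void main(String[] args) throws Exception
    {
        String def = "same system table definition on every node";
        // Fixed timestamp 0: all nodes agree on the schema version.
        System.out.println(version(def, 0L).equals(version(def, 0L)));                          // true
        // "Now" as the timestamp: each upgraded node computes a different version.
        System.out.println(version(def, 1L).equals(version(def, System.currentTimeMillis()))); // false
    }
}
{code}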

> Schema version changes for each upgraded node in a rolling upgrade, causing 
> migration storms
> 
>
> Key: CASSANDRA-13441
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13441
> Project: Cassandra
>  Issue Type: Bug
>  Components: Schema
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 4.x
>
>
> In versions < 3.0, during a rolling upgrade (say 2.0 -> 2.1), the first node 
> to upgrade to 2.1 would add the new tables, setting the new 2.1 version ID, 
> and subsequently upgraded hosts would settle on that version.
> When a 3.0 node upgrades and writes its own new-in-3.0 system tables, it'll 
> write the same tables that exist in the schema with brand new timestamps. As 
> written, this will cause all nodes in the cluster to change schema (to the 
> version with the newest timestamp). On a sufficiently large cluster with a 
> non-trivial schema, this could cause (literally) millions of migration tasks 
> to needlessly bounce across the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13459) Diag. Events: Native transport integration

2017-04-18 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-13459:
---
Labels: client-impacting  (was: )

> Diag. Events: Native transport integration
> --
>
> Key: CASSANDRA-13459
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13459
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: CQL
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>  Labels: client-impacting
> Attachments: diag.log
>
>
> Events should be consumable by clients that would receive subscribed events 
> from the connected node. This functionality is designed to work on top of 
> native transport with minor modifications to the protocol standard (see 
> [original 
> proposal|https://docs.google.com/document/d/1uEk7KYgxjNA0ybC9fOuegHTcK3Yi0hCQN5nTp5cNFyQ/edit?usp=sharing]
>  for further considered options). First we have to add another value for 
> existing event types. Also, we have to extend the protocol a bit to be able 
> to specify a sub-class and sub-type value. E.g. 
> {{DIAGNOSTIC_EVENT(GossiperEvent, MAJOR_STATE_CHANGE_HANDLED)}}. This still 
> has to be worked out and I'd appreciate any feedback.
> There will also be a CLI tool shipped with Cassandra that enables users to 
> dump events as JSON to stdout. This very simple tool will make use of a 
> patched Python client driver that will work with the new {{DIAGNOSTIC_EVENT}} 
> native transport event type.
> Invocation will simply look something like this {{./diagview event_class 
> ..}}. See attached diag.log for example json output produced by starting and 
> stopping another node.
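
To make the proposed shape a bit more tangible, here is a client-side sketch of 
what one pushed event might carry (field and class names are hypothetical, not 
the final protocol):

{code}
import java.util.Map;

// Sketch only: one diagnostic event as a subscribed client might see it.
final class DiagnosticEventMessage
{
    final String eventClass;           // e.g. "GossiperEvent"
    final String eventType;            // e.g. "MAJOR_STATE_CHANGE_HANDLED"
    final long timestampMillis;        // when the node emitted the event
    final Map<String, String> payload; // event-specific key/value details

    DiagnosticEventMessage(String eventClass, String eventType, long timestampMillis,
                           Map<String, String> payload)
    {
        this.eventClass = eventClass;
        this.eventType = eventType;
        this.timestampMillis = timestampMillis;
        this.payload = payload;
    }
}
{code}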



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13459) Diag. Events: Native transport integration

2017-04-18 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-13459:
---
Attachment: diag.log

> Diag. Events: Native transport integration
> --
>
> Key: CASSANDRA-13459
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13459
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: CQL
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Attachments: diag.log
>
>
> Events should be consumable by clients that would receive subscribed events 
> from the connected node. This functionality is designed to work on top of 
> native transport with minor modifications to the protocol standard (see 
> [original 
> proposal|https://docs.google.com/document/d/1uEk7KYgxjNA0ybC9fOuegHTcK3Yi0hCQN5nTp5cNFyQ/edit?usp=sharing]
>  for further considered options). First we have to add another value for 
> existing event types. Also, we have to extend the protocol a bit to be able 
> to specify a sub-class and sub-type value. E.g. 
> {{DIAGNOSTIC_EVENT(GossiperEvent, MAJOR_STATE_CHANGE_HANDLED)}}. This still 
> has to be worked out and I'd appreciate any feedback.
> There will also be a CLI tool shipped with Cassandra that enables users to 
> dump events as JSON to stdout. This very simple tool will make use of a 
> patched Python client driver that will work with the new {{DIAGNOSTIC_EVENT}} 
> native transport event type.
> Invocation will simply look something like this {{./diagview event_class 
> ..}}. See attached diag.log for example json output produced by starting and 
> stopping another node.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13458) Diag. Events: Add unit testing support

2017-04-18 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-13458:
---
Description: 
Diagnostic events will improve unit testing by
* providing test execution control instances based on CompletableFutures (see 
[PendingRangeCalculatorServiceTest.java|https://github.com/spodkowinski/cassandra/blob/WIP-13458/test/unit/org/apache/cassandra/gms/PendingRangeCalculatorServiceTest.java])
 
* validating state and behavior by allowing you to inspect generated events (see 
[HintsServiceEventsTest.java|https://github.com/spodkowinski/cassandra/blob/WIP-13458/test/unit/org/apache/cassandra/hints/HintsServiceEventsTest.java])

See included 
[testing.rst|https://github.com/spodkowinski/cassandra/blob/WIP-13458/doc/source/development/testing.rst#diagnostic-events-40]
 draft for more details. Let me know if this would be useful for you as a 
developer.

  was:
Diagnostic events will improve unit testing by
* providing test execution control instances based on CompletableFutures (see 
PendingRangeCalculatorServiceTest.java) 
* validating state and behavior by allowing you to inspect generated events (see 
HintsServiceEventsExampleTest.java)

See included testing.rst draft for more details


> Diag. Events: Add unit testing support
> --
>
> Key: CASSANDRA-13458
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13458
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>
> Diagnostic events will improve unit testing by
> * providing test execution control instances based on CompletableFutures (see 
> [PendingRangeCalculatorServiceTest.java|https://github.com/spodkowinski/cassandra/blob/WIP-13458/test/unit/org/apache/cassandra/gms/PendingRangeCalculatorServiceTest.java])
>  
> * validating state and behavior by allowing you to inspect generated events 
> (see 
> [HintsServiceEventsTest.java|https://github.com/spodkowinski/cassandra/blob/WIP-13458/test/unit/org/apache/cassandra/hints/HintsServiceEventsTest.java])
> See included 
> [testing.rst|https://github.com/spodkowinski/cassandra/blob/WIP-13458/doc/source/development/testing.rst#diagnostic-events-40]
>  draft for more details. Let me know if this would be useful for you as a 
> developer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13460) Diag. Events: Add local persistence

2017-04-18 Thread Stefan Podkowinski (JIRA)
Stefan Podkowinski created CASSANDRA-13460:
--

 Summary: Diag. Events: Add local persistence
 Key: CASSANDRA-13460
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13460
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Stefan Podkowinski
Assignee: Stefan Podkowinski


Some generated events will be rather infrequent but very useful for 
retroactive troubleshooting. E.g. all events related to bootstrapping and gossip 
would probably be worth saving, as they might provide valuable insights and 
will consume very few resources in low quantities. Imagine if, e.g. in case of 
CASSANDRA-13348, we could just ask the user to run a tool like 
{{./bin/diagdump BootstrapEvent}} on each host to get a detailed log of all 
relevant events.

This could be done by saving events white-listed in cassandra.yaml to a local 
table, possibly using a TTL.
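As a rough illustration of that idea, such a table could look like the sketch below. All names are assumptions made up for the example and are not part of this ticket; {{default_time_to_live}} is the standard CQL table option that would implement the suggested TTL.

{code}
-- Illustrative sketch only; table and column names are assumptions.
CREATE TABLE diagnostic_events (
    event_class text,      -- e.g. 'BootstrapEvent'
    ts timeuuid,
    event_type text,       -- event sub-type
    payload text,          -- serialized event details, e.g. JSON
    PRIMARY KEY (event_class, ts)
) WITH CLUSTERING ORDER BY (ts DESC)
   AND default_time_to_live = 604800;  -- keep one week of events
{code}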




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13459) Diag. Events: Native transport integration

2017-04-18 Thread Stefan Podkowinski (JIRA)
Stefan Podkowinski created CASSANDRA-13459:
--

 Summary: Diag. Events: Native transport integration
 Key: CASSANDRA-13459
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13459
 Project: Cassandra
  Issue Type: Sub-task
  Components: CQL
Reporter: Stefan Podkowinski
Assignee: Stefan Podkowinski


Events should be consumable by clients that would receive subscribed events 
from the connected node. This functionality is designed to work on top of 
native transport with minor modifications to the protocol standard (see the 
[original 
proposal|https://docs.google.com/document/d/1uEk7KYgxjNA0ybC9fOuegHTcK3Yi0hCQN5nTp5cNFyQ/edit?usp=sharing]
 for further considered options). First, we have to add another value to the 
existing event types. Also, we have to extend the protocol a bit to be able to 
specify a sub-class and sub-type value, e.g. {{DIAGNOSTIC_EVENT(GossiperEvent, 
MAJOR_STATE_CHANGE_HANDLED)}}. This still has to be worked out and I'd 
appreciate any feedback.


There will also be a CLI tool shipped with Cassandra that enables users to dump 
events as JSON to stdout. This very simple tool will make use of a patched 
Python client driver that will work with the new {{DIAGNOSTIC_EVENT}} native 
transport event type.
Invocation will simply look something like this: {{./diagview event_class ..}}. 
See the attached diag.log for example JSON output produced by starting and 
stopping another node.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13458) Diag. Events: Add unit testing support

2017-04-18 Thread Stefan Podkowinski (JIRA)
Stefan Podkowinski created CASSANDRA-13458:
--

 Summary: Diag. Events: Add unit testing support
 Key: CASSANDRA-13458
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13458
 Project: Cassandra
  Issue Type: Sub-task
  Components: Testing
Reporter: Stefan Podkowinski
Assignee: Stefan Podkowinski


Diagnostic events will improve unit testing by
* providing test execution control instances based on CompletableFutures (see 
PendingRangeCalculatorServiceTest.java) 
* validating state and behavior by allowing you to inspect generated events (see 
HintsServiceEventsExampleTest.java)

See the included testing.rst draft for more details.
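To make the CompletableFuture idea above concrete, here is a minimal, self-contained sketch of the pattern: the test subscribes with a future and blocks on it instead of sleeping or polling. The {{EventBus}} and {{HintDeliveredEvent}} names are assumptions made up for illustration, not the actual diagnostic event API.

{code:java}
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class DiagnosticEventTestSketch
{
    // Stand-in event; a real diagnostic event would carry the relevant state.
    static class HintDeliveredEvent
    {
        final String host;
        HintDeliveredEvent(String host) { this.host = host; }
    }

    // Stand-in for a service that publishes events to subscribers.
    static class EventBus
    {
        private final List<Consumer<HintDeliveredEvent>> subscribers = new CopyOnWriteArrayList<>();
        void subscribe(Consumer<HintDeliveredEvent> c) { subscribers.add(c); }
        void publish(HintDeliveredEvent e) { subscribers.forEach(s -> s.accept(e)); }
    }

    public static void main(String[] args) throws Exception
    {
        EventBus bus = new EventBus();
        CompletableFuture<HintDeliveredEvent> delivered = new CompletableFuture<>();
        bus.subscribe(delivered::complete);

        // In a real test this is where the code path under test would run and
        // eventually publish the event.
        bus.publish(new HintDeliveredEvent("127.0.0.2"));

        // The test waits for the event instead of sleeping, then inspects its state.
        HintDeliveredEvent e = delivered.get(10, TimeUnit.SECONDS);
        assert "127.0.0.2".equals(e.host);
    }
}
{code}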



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13457) Diag. Events: Add base classes

2017-04-18 Thread Stefan Podkowinski (JIRA)
Stefan Podkowinski created CASSANDRA-13457:
--

 Summary: Diag. Events: Add base classes
 Key: CASSANDRA-13457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13457
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core, Observability
Reporter: Stefan Podkowinski
Assignee: Stefan Podkowinski


Base ticket for adding classes that will allow you to implement and subscribe 
to events.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13455) derangement in decoding client token

2017-04-18 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972743#comment-15972743
 ] 

Robert Stupp commented on CASSANDRA-13455:
--

bq. if decodeCredentials() processes the bytes '\000cassandra\000', it will 
wrongly parse cassandra as the password

I doubt this is true. Can you copy the decoding part into a standalone Java app 
and verify?

> derangement in decoding client token
> 
>
> Key: CASSANDRA-13455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13455
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS7.2
> Java 1.8
>Reporter: Amos Jianjun Kong
>Assignee: Amos Jianjun Kong
> Fix For: 3.10
>
> Attachments: 0001-auth-strictly-delimit-in-decoding-client-token.patch
>
>
> RFC 4616 requires that AuthZID, USERNAME and PASSWORD be delimited by a single '\000'.
> The current code actually delimits by runs of '\000', so when the username or password
> is null, decoding becomes deranged.
> The problem was found in code review.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13454) Start compaction when incremental repair finishes

2017-04-18 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-13454:

Status: Ready to Commit  (was: Patch Available)

> Start compaction when incremental repair finishes
> -
>
> Key: CASSANDRA-13454
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13454
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
> Fix For: 4.0
>
>
> When an incremental repair finishes or fails, its sstables are promoted / 
> demoted on the next compaction. We should submit a compaction as soon as an 
> incremental repair finishes so sstables we're finished with don't spend any 
> more time in their own silo than they need to.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13454) Start compaction when incremental repair finishes

2017-04-18 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972706#comment-15972706
 ] 

Marcus Eriksson commented on CASSANDRA-13454:
-

+1

> Start compaction when incremental repair finishes
> -
>
> Key: CASSANDRA-13454
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13454
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
> Fix For: 4.0
>
>
> When an incremental repair finishes or fails, its sstables are promoted / 
> demoted on the next compaction. We should submit a compaction as soon as an 
> incremental repair finishes so sstables we're finished with don't spend any 
> more time in their own silo than they need to.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with storage requirements approximating RF=2

2017-04-18 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972697#comment-15972697
 ] 

Sylvain Lebresne commented on CASSANDRA-13442:
--

bq. Man, this makes me really nervous.
bq. Optimizing away redundant queries a la 7168? Sign me up. But I think 
removing that "redundant" data and making RF not actually mean RF is going too 
far.

I very much second those sentiments.

bq. It will be opt in at the keyspace level and won't affect anyone who dont 
want to use it.

Maybe that's meant as a response to Jonathan's concern, which I'm quoting (and 
agreeing with) above. In that case I want to point out that, as also said 
above, this will imo add quite a bit of complexity to existing code, so I very 
strongly disagree with the argument that having it opt-in from the user's point 
of view means it affects no one who doesn't want to use it.

bq. I think it's worth characterizing as much as possible what it's going to 
cost before dismissing it

Sure, and I, for one, certainly won't claim to have put tons of thought into 
this yet, and I'm happy to see a much more precise analysis of the costs.

But I'll admit that my "gut feeling" (my 6+-years-working-on-C*-gut feeling) is 
that this will get pretty complex, especially when you throw node movements 
and potential data loss into the mix, and that it's not worth that complexity.


> Support a means of strongly consistent highly available replication with 
> storage requirements approximating RF=2
> 
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13455) derangement in decoding client token

2017-04-18 Thread Amos Jianjun Kong (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972679#comment-15972679
 ] 

Amos Jianjun Kong commented on CASSANDRA-13455:
---

[~snazy], thanks for your quick reply.

If we set the password to '', then the client will respond with '\000cassandra\000'.
The good thing is that Cassandra has some checking of the request keys, so it 
raises an error early (the problematic code isn't reached).

cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', 
{'127.0.0.1': AuthenticationFailed('Failed to authenticate to 127.0.0.1: Error 
from server: code= [Server error] message="java.lang.RuntimeException: 
com.google.common.util.concurrent.UncheckedExecutionException: 
org.apache.cassandra.exceptions.InvalidRequestException: Key may not be 
empty"',)})

{code}
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

auth = PlainTextAuthProvider(username='cassandra', password='')
cluster = Cluster(['127.0.0.1'], auth_provider=auth)

session = cluster.connect(keyspace='system_auth')
session.execute("select * from role_members;")
session.shutdown()
{code}
-
The decoding code is wrong: if decodeCredentials() processes the bytes 
'\000cassandra\000', it will wrongly parse "cassandra" as the password and 
derive the wrong username.

Sorry, I'm not familiar with the Cassandra unit tests, so I don't know whether 
the problematic code is covered by them.
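For reference, a minimal standalone sketch (not the project's actual {{decodeCredentials()}}) of why this matters: per RFC 4616 the token consists of authzid, authcid and password delimited by single NUL bytes, so '\000cassandra\000' means an empty authzid, the authcid "cassandra" and an empty password. A decoder that collapses runs of NULs sees only two fields and can end up taking "cassandra" for the password.

{code:java}
public class PlainTokenSplitDemo
{
    public static void main(String[] args)
    {
        // Token for user "cassandra" with an empty password:
        // <empty authzid> NUL "cassandra" NUL <empty password>
        String token = "\000cassandra\000";

        // Strict split on every single NUL, as RFC 4616 prescribes: three fields.
        String[] strict = token.split("\000", -1);
        System.out.println("strict fields  = " + strict.length);    // 3
        System.out.println("authcid (user) = '" + strict[1] + "'"); // 'cassandra'
        System.out.println("passwd         = '" + strict[2] + "'"); // '' (empty)

        // A lenient split that collapses adjacent NULs yields only two fields,
        // so a decoder built this way can mistake "cassandra" for the password.
        String[] lenient = token.split("\000+");
        System.out.println("lenient fields = " + lenient.length);   // 2
    }
}
{code}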


> derangement in decoding client token
> 
>
> Key: CASSANDRA-13455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13455
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS7.2
> Java 1.8
>Reporter: Amos Jianjun Kong
>Assignee: Amos Jianjun Kong
> Fix For: 3.10
>
> Attachments: 0001-auth-strictly-delimit-in-decoding-client-token.patch
>
>
> RFC 4616 requires that AuthZID, USERNAME and PASSWORD be delimited by a single '\000'.
> The current code actually delimits by runs of '\000', so when the username or password
> is null, decoding becomes deranged.
> The problem was found in code review.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with storage requirements approximating RF=2

2017-04-18 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972641#comment-15972641
 ] 

T Jake Luciani commented on CASSANDRA-13442:


Rather than forcing auto-deletion of the data on repair, would you be ok with 
requiring an explicit cleanup to see the disk savings? That seems like the 
safest approach.

> Support a means of strongly consistent highly available replication with 
> storage requirements approximating RF=2
> 
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13442) Support a means of strongly consistent highly available replication with storage requirements approximating RF=2

2017-04-18 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972633#comment-15972633
 ] 

T Jake Luciani commented on CASSANDRA-13442:


I like the idea of making it part of the replication strategy.
You could have an unrepaired RF and a repaired RF.

In your example it would be: unrepaired: { dc1=3, dc2=3}, repaired: { dc1=2, 
dc2=2 }.


> Support a means of strongly consistent highly available replication with 
> storage requirements approximating RF=2
> 
>
> Key: CASSANDRA-13442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have N replicas, say the last N replicas 
> (where N varies with RF) in the preference list, delete all repaired data 
> after a repair completes. Subsequent quorum reads will be able to retrieve 
> the repaired data from any of the two full replicas and the unrepaired data 
> from a quorum read of any replica including the "transient" replicas.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session

2017-04-18 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972616#comment-15972616
 ] 

T Jake Luciani commented on CASSANDRA-12835:


Changes make sense.

One nit: I think you've included an unused import in the TracingTest: 
org.apache.commons.lang3.StringUtils

+1 assuming the tests look good.

> Tracing payload not passed from QueryMessage to tracing session
> ---
>
> Key: CASSANDRA-12835
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12835
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Hannu Kröger
>Assignee: mck
>Priority: Critical
>  Labels: tracing
> Fix For: 3.11.x, 4.x
>
>
> Caused by CASSANDRA-10392.
> Related to CASSANDRA-11706.
> When querying using CQL statements (not prepared) the message type is 
> QueryMessage and the code in 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L101
>  is as follows:
> {code:java}
> if (state.traceNextQuery())
> {
> state.createTracingSession();
> ImmutableMap.Builder builder = 
> ImmutableMap.builder();
> {code}
> {{state.createTracingSession();}} should probably be 
> {{state.createTracingSession(getCustomPayload());}}. At least that fixes the 
> problem for me.
> This also raises the question whether some other parts of the code should 
> pass the custom payload as well (I'm not the right person to analyze this):
> {code}
> $ ag createTracingSession
> src/java/org/apache/cassandra/service/QueryState.java
> 80:public void createTracingSession()
> 82:createTracingSession(Collections.EMPTY_MAP);
> 85:public void createTracingSession(Map customPayload)
> src/java/org/apache/cassandra/thrift/CassandraServer.java
> 2528:state().getQueryState().createTracingSession();
> src/java/org/apache/cassandra/transport/messages/BatchMessage.java
> 163:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java
> 114:state.createTracingSession(getCustomPayload());
> src/java/org/apache/cassandra/transport/messages/QueryMessage.java
> 101:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/PrepareMessage.java
> 74:state.createTracingSession();
> {code}
> This is not marked as `minor` as the CASSANDRA-11706 was because this cannot 
> be fixed by the tracing plugin.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session

2017-04-18 Thread mck (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mck updated CASSANDRA-12835:

Attachment: (was: 12835-trunk.txt)

> Tracing payload not passed from QueryMessage to tracing session
> ---
>
> Key: CASSANDRA-12835
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12835
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Hannu Kröger
>Assignee: mck
>Priority: Critical
>  Labels: tracing
> Fix For: 3.11.x, 4.x
>
>
> Caused by CASSANDRA-10392.
> Related to CASSANDRA-11706.
> When querying using CQL statements (not prepared) the message type is 
> QueryMessage and the code in 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L101
>  is as follows:
> {code:java}
> if (state.traceNextQuery())
> {
> state.createTracingSession();
> ImmutableMap.Builder builder = 
> ImmutableMap.builder();
> {code}
> {{state.createTracingSession();}} should probably be 
> {{state.createTracingSession(getCustomPayload());}}. At least that fixes the 
> problem for me.
> This also raises the question whether some other parts of the code should 
> pass the custom payload as well (I'm not the right person to analyze this):
> {code}
> $ ag createTracingSession
> src/java/org/apache/cassandra/service/QueryState.java
> 80:public void createTracingSession()
> 82:createTracingSession(Collections.EMPTY_MAP);
> 85:public void createTracingSession(Map customPayload)
> src/java/org/apache/cassandra/thrift/CassandraServer.java
> 2528:state().getQueryState().createTracingSession();
> src/java/org/apache/cassandra/transport/messages/BatchMessage.java
> 163:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java
> 114:state.createTracingSession(getCustomPayload());
> src/java/org/apache/cassandra/transport/messages/QueryMessage.java
> 101:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/PrepareMessage.java
> 74:state.createTracingSession();
> {code}
> This is not marked as `minor` as the CASSANDRA-11706 was because this cannot 
> be fixed by the tracing plugin.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session

2017-04-18 Thread mck (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mck updated CASSANDRA-12835:

Attachment: (was: 12835-3.X.txt)

> Tracing payload not passed from QueryMessage to tracing session
> ---
>
> Key: CASSANDRA-12835
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12835
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Hannu Kröger
>Assignee: mck
>Priority: Critical
>  Labels: tracing
> Fix For: 3.11.x, 4.x
>
>
> Caused by CASSANDRA-10392.
> Related to CASSANDRA-11706.
> When querying using CQL statements (not prepared) the message type is 
> QueryMessage and the code in 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L101
>  is as follows:
> {code:java}
> if (state.traceNextQuery())
> {
> state.createTracingSession();
> ImmutableMap.Builder builder = 
> ImmutableMap.builder();
> {code}
> {{state.createTracingSession();}} should probably be 
> {{state.createTracingSession(getCustomPayload());}}. At least that fixes the 
> problem for me.
> This also raises the question whether some other parts of the code should 
> pass the custom payload as well (I'm not the right person to analyze this):
> {code}
> $ ag createTracingSession
> src/java/org/apache/cassandra/service/QueryState.java
> 80:public void createTracingSession()
> 82:createTracingSession(Collections.EMPTY_MAP);
> 85:public void createTracingSession(Map customPayload)
> src/java/org/apache/cassandra/thrift/CassandraServer.java
> 2528:state().getQueryState().createTracingSession();
> src/java/org/apache/cassandra/transport/messages/BatchMessage.java
> 163:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java
> 114:state.createTracingSession(getCustomPayload());
> src/java/org/apache/cassandra/transport/messages/QueryMessage.java
> 101:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/PrepareMessage.java
> 74:state.createTracingSession();
> {code}
> This is not marked as `minor` as the CASSANDRA-11706 was because this cannot 
> be fixed by the tracing plugin.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-12835) Tracing payload not passed from QueryMessage to tracing session

2017-04-18 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972551#comment-15972551
 ] 

mck commented on CASSANDRA-12835:
-

[~tjake], patches are updated here:
|| Branch || Testall || Dtest ||
| [3.11|https://github.com/michaelsembwever/cassandra/commit/4105fc71c652794d3ae1fba475f01ebf00199a07] | [testall|https://circleci.com/gh/michaelsembwever/cassandra/16] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/15/] |
| [trunk|https://github.com/michaelsembwever/cassandra/commit/c4de4f0dd0e70d7d67ade1e315ee3053494cf51c] | [testall|https://circleci.com/gh/michaelsembwever/cassandra/17] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/19/] |

(dtests are queued and will likely take some time to complete)

> Tracing payload not passed from QueryMessage to tracing session
> ---
>
> Key: CASSANDRA-12835
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12835
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Hannu Kröger
>Assignee: mck
>Priority: Critical
>  Labels: tracing
> Fix For: 3.11.x, 4.x
>
> Attachments: 12835-3.X.txt, 12835-trunk.txt
>
>
> Caused by CASSANDRA-10392.
> Related to CASSANDRA-11706.
> When querying using CQL statements (not prepared) the message type is 
> QueryMessage and the code in 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/QueryMessage.java#L101
>  is as follows:
> {code:java}
> if (state.traceNextQuery())
> {
> state.createTracingSession();
> ImmutableMap.Builder builder = 
> ImmutableMap.builder();
> {code}
> {{state.createTracingSession();}} should probably be 
> {{state.createTracingSession(getCustomPayload());}}. At least that fixes the 
> problem for me.
> This also raises the question whether some other parts of the code should 
> pass the custom payload as well (I'm not the right person to analyze this):
> {code}
> $ ag createTracingSession
> src/java/org/apache/cassandra/service/QueryState.java
> 80:public void createTracingSession()
> 82:createTracingSession(Collections.EMPTY_MAP);
> 85:public void createTracingSession(Map customPayload)
> src/java/org/apache/cassandra/thrift/CassandraServer.java
> 2528:state().getQueryState().createTracingSession();
> src/java/org/apache/cassandra/transport/messages/BatchMessage.java
> 163:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java
> 114:state.createTracingSession(getCustomPayload());
> src/java/org/apache/cassandra/transport/messages/QueryMessage.java
> 101:state.createTracingSession();
> src/java/org/apache/cassandra/transport/messages/PrepareMessage.java
> 74:state.createTracingSession();
> {code}
> This is not marked as `minor` as the CASSANDRA-11706 was because this cannot 
> be fixed by the tracing plugin.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13440) Sign RPM artifacts

2017-04-18 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972550#comment-15972550
 ] 

Stefan Podkowinski commented on CASSANDRA-13440:



Signatures can be used for both repository transport integrity protection and 
end-to-end content verification. 

Providing a signature for {{repomd.xml}} allows clients to verify the 
repository's metadata. But you'll have to enable this by adding 
{{repo_gpgcheck=1}} to the yum config. 

Individual package files can also contain a signature in the RPM header. This 
can be done either during the build process ({{rpmbuild --sign}}) or afterwards 
on the final artifact. As the RPMs should be built using Docker, producing the 
final artifacts at the end without manual intervention, we probably have to go 
with the latter option here. I'd suggest using the rpmsign wrapper 
({{yum install rpm-sign}}) on the package, e.g.:
{{rpmsign -D '%_gpg_name MyAlias' --addsign cassandra-3.0.13-1.noarch.rpm}}

Verifying package signatures requires importing the public keys first:
{{rpm --import https://www.apache.org/dist/cassandra/KEYS}}

Afterwards, the following command should report "OK" for the included hashes 
and GPG signatures:
{{rpm -K cassandra-3.0.13-1.noarch.rpm}}

Once the RPM is signed, we can enable {{gpgcheck=1}} again for the repo config. 
If enabled, both the key import and verification steps should take place 
automatically when installing from the yum repo.
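For illustration, a yum repo definition with both the metadata check and the package signature check enabled could look like the following sketch; the {{baseurl}} shown is a placeholder, not the final repository location.

{code}
# Sketch only; baseurl is a placeholder.
[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS
{code}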

> Sign RPM artifacts
> --
>
> Key: CASSANDRA-13440
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13440
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Packaging
>Reporter: Stefan Podkowinski
>
> RPMs should be gpg signed just as the deb packages. Also add documentation 
> how to verify to download page.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13456) Needs better logging for timeout/failures

2017-04-18 Thread Fuud (JIRA)
Fuud created CASSANDRA-13456:


 Summary: Needs better logging for timeout/failures
 Key: CASSANDRA-13456
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13456
 Project: Cassandra
  Issue Type: Bug
Reporter: Fuud


When a read fails due to a timeout, Cassandra reports in the logs "Timeout; 
received 1 of 3 responses". The same information is passed to clients.

But this information is not enough to get a list of the slow nodes. 

It would be better to have a detailed message in the debug log:
"Timeout; received 1 of 4 responses. Requested but not responded nodes: [<host>, 
<host>], Failed nodes: [<host>]"

I implemented such behavior by patching ReadCallback, 
AbstractWriteResponseHandler, DatacenterSyncWriteResponseHandler and 
WriteResponseHandler. It handles all cases except Paxos. 

But I want to implement a solid solution that handles all cases in the same way.

Before I start, I want to know: are there any objections to such logging?
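For illustration only (this is not the actual ReadCallback patch; all names are assumptions), a self-contained sketch of how such a message could be composed from the sets of contacted, responded and failed replicas:

{code:java}
import java.net.InetAddress;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TimeoutMessageSketch
{
    static String timeoutMessage(int blockFor, Set<InetAddress> contacted,
                                 Set<InetAddress> responded, Set<InetAddress> failed)
    {
        // Nodes we asked that neither answered nor reported a failure.
        Set<InetAddress> pending = new HashSet<>(contacted);
        pending.removeAll(responded);
        pending.removeAll(failed);
        return String.format("Timeout; received %d of %d responses. " +
                             "Requested but not responded nodes: %s. Failed nodes: %s",
                             responded.size(), blockFor, pending, failed);
    }

    public static void main(String[] args) throws Exception
    {
        InetAddress a = InetAddress.getByName("10.0.0.1");
        InetAddress b = InetAddress.getByName("10.0.0.2");
        InetAddress c = InetAddress.getByName("10.0.0.3");
        InetAddress d = InetAddress.getByName("10.0.0.4");
        System.out.println(timeoutMessage(4,
                                          new HashSet<>(Arrays.asList(a, b, c, d)),
                                          new HashSet<>(Arrays.asList(a)),
                                          new HashSet<>(Arrays.asList(d))));
    }
}
{code}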



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13455) derangement in decoding client token

2017-04-18 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972518#comment-15972518
 ] 

Robert Stupp commented on CASSANDRA-13455:
--

Can you elaborate why you think that the current code is broken? I do not see 
anything that's broken there.
(Ideally with a unit test.)

> derangement in decoding client token
> 
>
> Key: CASSANDRA-13455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13455
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS7.2
> Java 1.8
>Reporter: Amos Jianjun Kong
>Assignee: Amos Jianjun Kong
> Fix For: 3.10
>
> Attachments: 0001-auth-strictly-delimit-in-decoding-client-token.patch
>
>
> RFC 4616 requires that AuthZID, USERNAME and PASSWORD be delimited by a single '\000'.
> The current code actually delimits by runs of '\000', so when the username or password
> is null, decoding becomes deranged.
> The problem was found in code review.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13455) derangement in decoding client token

2017-04-18 Thread Amos Jianjun Kong (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amos Jianjun Kong updated CASSANDRA-13455:
--
Attachment: 0001-auth-strictly-delimit-in-decoding-client-token.patch

> derangement in decoding client token
> 
>
> Key: CASSANDRA-13455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13455
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS7.2
> Java 1.8
>Reporter: Amos Jianjun Kong
>Assignee: Amos Jianjun Kong
> Fix For: 3.10
>
> Attachments: 0001-auth-strictly-delimit-in-decoding-client-token.patch
>
>
> RFC 4616 requires that AuthZID, USERNAME and PASSWORD be delimited by a single '\000'.
> The current code actually delimits by runs of '\000', so when the username or password
> is null, decoding becomes deranged.
> The problem was found in code review.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13455) derangement in decoding client token

2017-04-18 Thread Amos Jianjun Kong (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amos Jianjun Kong updated CASSANDRA-13455:
--
Summary: derangement in decoding client token  (was: derangement in )

> derangement in decoding client token
> 
>
> Key: CASSANDRA-13455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13455
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS7.2
> Java 1.8
>Reporter: Amos Jianjun Kong
> Fix For: 3.10
>
>
> RFC 4616 requires that AuthZID, USERNAME and PASSWORD be delimited by a single '\000'.
> The current code actually delimits by runs of '\000', so when the username or password
> is null, decoding becomes deranged.
> The problem was found in code review.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13455) derangement in

2017-04-18 Thread Amos Jianjun Kong (JIRA)
Amos Jianjun Kong created CASSANDRA-13455:
-

 Summary: derangement in 
 Key: CASSANDRA-13455
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13455
 Project: Cassandra
  Issue Type: Bug
 Environment: CentOS7.2
Java 1.8
Reporter: Amos Jianjun Kong
 Fix For: 3.10


RFC 4616 requires that AuthZID, USERNAME and PASSWORD be delimited by a single '\000'.
The current code actually delimits by runs of '\000', so when the username or password
is null, decoding becomes deranged.

The problem was found in code review.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-10404) Node to Node encryption transitional mode

2017-04-18 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972394#comment-15972394
 ] 

Stefan Podkowinski commented on CASSANDRA-10404:



bq. I'm imagining that for 4.0 we still need both storage_port and 
ssl_storage_port in place to support cluster upgrades. Upgraded nodes will be 
smart enough to connect on the storage_port (which will be intelligent to 
figure out if the connection is TLS or not). Unupgraded nodes can still connect 
on the legacy port (as we'll need to listen on it, as well).


Looks like {{MessagingService.portFor(secure)}} would have to check the peer's 
version in that case (or this will probably be solved differently after 
CASSANDRA-7544). 

Maybe we should also allow setting ssl_storage_port to the same value as 
storage_port, to prevent opening the obsolete SSL socket in the first place on 
already-upgraded clusters.

Needs to be covered in NEWS.txt in any case.
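For illustration, the suggestion above would amount to something like the following in cassandra.yaml; {{storage_port}} and {{ssl_storage_port}} are the existing settings, while pointing both at the same port is the proposed behaviour, not something currently supported.

{code}
# Sketch of the proposed configuration, not currently supported:
storage_port: 7000
ssl_storage_port: 7000   # same as storage_port, so no separate legacy SSL socket is opened
{code}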



> Node to Node encryption transitional mode
> -
>
> Key: CASSANDRA-10404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10404
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Tom Lewis
>Assignee: Jason Brown
>
> Create a transitional mode for encryption that allows encrypted and 
> unencrypted traffic node-to-node during a change over to encryption from 
> unencrypted. This alleviates downtime during the switch.
>  This is similar to CASSANDRA-10559 which is intended for client-to-node



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13307) The specification of protocol version in cqlsh means the python driver doesn't automatically downgrade protocol version.

2017-04-18 Thread mck (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972282#comment-15972282
 ] 

mck commented on CASSANDRA-13307:
-

Committed to trunk.

[~tjake], ([~jjirsa]), I missed that you marked this for 3.11.x. Would you like 
me to commit it to the cassandra-3.11 branch as well? The patch applies cleanly 
there.

> The specification of protocol version in cqlsh means the python driver 
> doesn't automatically downgrade protocol version.
> 
>
> Key: CASSANDRA-13307
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13307
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Matt Byrd
>Assignee: Matt Byrd
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 3.11.x
>
>
> Hi,
> Looks like we've regressed on the issue described in:
> https://issues.apache.org/jira/browse/CASSANDRA-9467
> In that we're no longer able to connect from newer cqlsh versions
> (e.g. trunk) to older versions of Cassandra with a lower version of the 
> protocol (e.g. 2.1 with protocol version 3).
> The problem seems to be that we're relying on the ability for the client to 
> automatically downgrade protocol version implemented in Cassandra here:
> https://issues.apache.org/jira/browse/CASSANDRA-12838
> and utilised in the python client here:
> https://datastax-oss.atlassian.net/browse/PYTHON-240
> The problem however comes when we implemented:
> https://datastax-oss.atlassian.net/browse/PYTHON-537
> "Don't downgrade protocol version if explicitly set" 
> (included when we bumped from 3.5.0 to 3.7.0 of the python driver as part of 
> fixing: https://issues.apache.org/jira/browse/CASSANDRA-11534)
> Since we do explicitly specify the protocol version in the bin/cqlsh.py.
> I've got a patch which just adds an option to explicitly specify the protocol 
> version (for those who want to do that) and then otherwise defaults to not 
> setting the protocol version, i.e. using the protocol version from the client 
> which we ship, which should by default be the same protocol as the server.
> Then it should downgrade gracefully as was intended. 
> Let me know if that seems reasonable.
> Thanks,
> Matt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


cassandra git commit: Fix cqlsh automatic protocol downgrade regression Patch by Matt Byrd; reviewed by Mick Semb Wever for CASSANDRA-13307

2017-04-18 Thread mck
Repository: cassandra
Updated Branches:
  refs/heads/trunk 74bdf633e -> 8f5f54fb1


Fix cqlsh automatic protocol downgrade regression
Patch by Matt Byrd; reviewed by Mick Semb Wever for CASSANDRA-13307


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/8f5f54fb
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/8f5f54fb
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/8f5f54fb

Branch: refs/heads/trunk
Commit: 8f5f54fb1c6b32e1dceacb4e694c414d3583a419
Parents: 74bdf63
Author: Matt Byrd 
Authored: Wed Mar 8 13:55:01 2017 -0800
Committer: mck 
Committed: Tue Apr 18 17:12:50 2017 +1000

--
 CHANGES.txt  |  1 +
 bin/cqlsh.py | 19 +--
 2 files changed, 14 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/8f5f54fb/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index dae275f..adbaf84 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -52,6 +52,7 @@
  * cqlsh auto completion: refactor definition of compaction strategy options 
(CASSANDRA-12946)
  * Add support for arithmetic operators (CASSANDRA-11935)
  * Add histogram for delay to deliver hints (CASSANDRA-13234)
+ * Fix cqlsh automatic protocol downgrade regression (CASSANDRA-13307)
 
 
 3.11.0

http://git-wip-us.apache.org/repos/asf/cassandra/blob/8f5f54fb/bin/cqlsh.py
--
diff --git a/bin/cqlsh.py b/bin/cqlsh.py
index bd62072..28e8043 100644
--- a/bin/cqlsh.py
+++ b/bin/cqlsh.py
@@ -178,7 +178,6 @@ from cqlshlib.util import get_file_encoding_bomsize, 
trim_if_present
 DEFAULT_HOST = '127.0.0.1'
 DEFAULT_PORT = 9042
 DEFAULT_SSL = False
-DEFAULT_PROTOCOL_VERSION = 4
 DEFAULT_CONNECT_TIMEOUT_SECONDS = 5
 DEFAULT_REQUEST_TIMEOUT_SECONDS = 10
 
@@ -223,6 +222,9 @@ parser.add_option('--cqlversion', default=None,
   help='Specify a particular CQL version, '
'by default the highest version supported by the server 
will be used.'
' Examples: "3.0.3", "3.1.0"')
+parser.add_option("--protocol-version", type="int", default=None,
+  help='Specify a specific protcol version otherwise the 
client will default and downgrade as necessary')
+
 parser.add_option("-e", "--execute", help='Execute the statement and quit.')
 parser.add_option("--connect-timeout", 
default=DEFAULT_CONNECT_TIMEOUT_SECONDS, dest='connect_timeout',
   help='Specify the connection timeout in seconds (default: 
%default seconds).')
@@ -449,7 +451,7 @@ class Shell(cmd.Cmd):
  ssl=False,
  single_statement=None,
  request_timeout=DEFAULT_REQUEST_TIMEOUT_SECONDS,
- protocol_version=DEFAULT_PROTOCOL_VERSION,
+ protocol_version=None,
  connect_timeout=DEFAULT_CONNECT_TIMEOUT_SECONDS):
 cmd.Cmd.__init__(self, completekey=completekey)
 self.hostname = hostname
@@ -468,13 +470,16 @@ class Shell(cmd.Cmd):
 if use_conn:
 self.conn = use_conn
 else:
+kwargs = {}
+if protocol_version is not None:
+kwargs['protocol_version'] = protocol_version
 self.conn = Cluster(contact_points=(self.hostname,), 
port=self.port, cql_version=cqlver,
-protocol_version=protocol_version,
 auth_provider=self.auth_provider,
 ssl_options=sslhandling.ssl_settings(hostname, 
CONFIG_FILE) if ssl else None,
 
load_balancing_policy=WhiteListRoundRobinPolicy([self.hostname]),
 control_connection_timeout=connect_timeout,
-connect_timeout=connect_timeout)
+connect_timeout=connect_timeout,
+**kwargs)
 self.owns_connection = not use_conn
 
 if keyspace:
@@ -1673,9 +1678,9 @@ class Shell(cmd.Cmd):
 
 direction = parsed.get_binding('dir').upper()
 if direction == 'FROM':
-task = ImportTask(self, ks, table, columns, fname, opts, 
DEFAULT_PROTOCOL_VERSION, CONFIG_FILE)
+task = ImportTask(self, ks, table, columns, fname, opts, 
self.conn.protocol_version, CONFIG_FILE)
 elif direction == 'TO':
-task = ExportTask(self, ks, table, columns, fname, opts, 
DEFAULT_PROTOCOL_VERSION, CONFIG_FILE)
+task = ExportTask(self, ks, table, columns, fname, opts, 
self.conn.protocol_version, CONFIG_FILE)
 else:
 raise SyntaxError("Unknown direction %s" % direction)
 

[jira] [Commented] (CASSANDRA-13336) Upgrade the snappy-java version to 1.1.2.x

2017-04-18 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972187#comment-15972187
 ] 

Jeff Jirsa commented on CASSANDRA-13336:


[~jasobrown] There's a repair dtest that only fails on this branch, and not 
trunk. Rebasing to see if it's been fixed elsewhere, otherwise not committing 
until I understand that regression.

> Upgrade the snappy-java version to 1.1.2.x
> --
>
> Key: CASSANDRA-13336
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13336
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Libraries
>Reporter: yuqi
>Assignee: Jeff Jirsa
> Fix For: 4.x
>
>
>  Snappy-java has supported AArch64 since version 1.1.2.x 
> (https://github.com/xerial/snappy-java/pull/135).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)