[jira] [Commented] (CASSANDRA-13997) Upgrade guava to 23.3

2017-11-07 Thread Thibault Kruse (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243295#comment-16243295
 ] 

Thibault Kruse commented on CASSANDRA-13997:


For us it would be nice if Cassandra 3.x could be made API-compatible with 
Guava > 19.0, for example by replacing 

Iterators.emptyIterator()

with

Collections.emptyIterator()

as done in

https://github.com/krummas/cassandra/commits/marcuse/guava23

This would not require changing the Guava version, just removing certain 
usages of Guava APIs that have been deprecated.
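
To make the kind of substitution concrete, here is a minimal sketch (class and 
method names are purely illustrative, not code from the linked branch):

{code}
import java.util.Collections;
import java.util.Iterator;

public class EmptyIteratorExample
{
    // Guava's Iterators.emptyIterator() is deprecated in newer Guava releases;
    // the JDK's Collections.emptyIterator() (available since Java 7) is a
    // drop-in replacement.
    public static <T> Iterator<T> noResults()
    {
        return Collections.emptyIterator();
    }
}
{code}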

> Upgrade guava to 23.3
> -
>
> Key: CASSANDRA-13997
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13997
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 4.x
>
>
> For 4.0 we should upgrade guava to the latest version
> patch here: https://github.com/krummas/cassandra/commits/marcuse/guava23
> A bunch of quite commonly used methods have been deprecated since guava 18, 
> which we use now ({{Throwables.propagate}}, for example); this patch mostly 
> updates uses where compilation fails. {{Futures.transform(ListenableFuture 
> ..., AsyncFunction ...)}}, for example, was deprecated in Guava 19 and 
> removed in 20; we should probably open new tickets to remove calls to all 
> deprecated guava methods.
> Also had to add a dependency on {{com.google.j2objc.j2objc-annotations}}, to 
> avoid some build-time warnings (maybe due to 
> https://github.com/google/guava/commit/fffd2b1f67d158c7b4052123c5032b0ba54a910d
>  ?)
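
For reference, a hedged sketch of what such replacements typically look like; 
this is not necessarily what the linked patch does, and it assumes the upgraded 
Guava (the executor-taking {{transformAsync}} overload exists since Guava 19, 
and {{Throwables.throwIfUnchecked}} since Guava 20):

{code}
import com.google.common.base.Throwables;
import com.google.common.util.concurrent.AsyncFunction;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

public class GuavaMigrationSketch
{
    // Futures.transform(future, AsyncFunction) was deprecated in Guava 19 and
    // removed in 20; transformAsync with an explicit Executor is the replacement.
    static <I, O> ListenableFuture<O> chain(ListenableFuture<I> input, AsyncFunction<I, O> fn)
    {
        return Futures.transformAsync(input, fn, MoreExecutors.directExecutor());
    }

    // Throwables.propagate(t) is deprecated; rethrowing explicitly is the
    // recommended pattern in newer Guava.
    static RuntimeException rethrow(Throwable t)
    {
        Throwables.throwIfUnchecked(t);
        throw new RuntimeException(t);
    }
}
{code}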






[jira] [Commented] (CASSANDRA-13985) Support restricting reads and writes to specific datacenters on a per user basis

2017-11-07 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243207#comment-16243207
 ] 

Blake Eggleston commented on CASSANDRA-13985:
-

Here’s an initial implementation to optionally add specific datacenters when 
granting permissions: https://github.com/bdeggleston/cassandra/tree/13985

> Support restricting reads and writes to specific datacenters on a per user 
> basis
> 
>
> Key: CASSANDRA-13985
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13985
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Minor
>
> There are a few use cases where it makes sense to restrict the operations a 
> given user can perform in specific data centers. The obvious use case is the 
> production/analytics datacenter configuration. You don’t want the production 
> user to be reading from or writing to the analytics datacenter, and you don’t 
> want the analytics user to be reading from the production datacenter.
> Although we expect users to get this right at the application level, we 
> should also be able to enforce this at the database level. The first approach 
> that comes to mind would be to support an optional DC parameter when granting 
> select and modify permissions to roles, something like {{GRANT SELECT ON 
> some_keyspace TO that_user IN DC dc1}}; statements that omit the DC would 
> implicitly grant permission to all DCs. However, I’m not married to this 
> approach.






[jira] [Updated] (CASSANDRA-13991) NullPointerException when querying a table with a previous state

2017-11-07 Thread ZhaoYang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-13991:
-
Reproduced In: 3.11.1, 3.0.15  (was: 3.0.15, 3.11.1)
   Labels: lhf  (was: )

bq. https://github.com/gocql/gocql/issues/1017

Update: bug is fixed in gocql

> NullPointerException when querying a table with a previous state
> 
>
> Key: CASSANDRA-13991
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13991
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Chris mildebrandt
>  Labels: lhf
> Attachments: CASSANDRA-13991.log
>
>
> Performing the following steps (using the gocql library) results in an NPE:
> * With a table of 12 entries, read all rows.
> * Set the page size to 1 and read the first row. Save the query state.
> * Read all the rows again.
> * Set the page size to 5 and the page state to the previous state. (This is 
> where the NPE occurs).
> This can be reproduced with the following project:
> https://github.com/eyeofthefrog/CASSANDRA-13991
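
For anyone reproducing without Go, a rough translation of those steps to the 
DataStax Java driver 3.x (keyspace and table names are placeholders, and the 
"read all rows again" step is condensed):

{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PagingState;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class PagingStateRepro
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ks"))          // placeholder keyspace
        {
            // Page size 1: read the first row and save the paging state
            Statement first = new SimpleStatement("SELECT * FROM t").setFetchSize(1);
            ResultSet rs = session.execute(first);
            rs.one();
            PagingState saved = rs.getExecutionInfo().getPagingState();

            // Page size 5, resuming from the previously saved state; the
            // server-side NPE described above is reported at this point
            Statement resumed = new SimpleStatement("SELECT * FROM t")
                                .setFetchSize(5)
                                .setPagingState(saved);
            session.execute(resumed);
        }
    }
}
{code}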






[jira] [Created] (CASSANDRA-14001) Gossip after node restart can take a long time to converge about "down" nodes in large clusters

2017-11-07 Thread Joseph Lynch (JIRA)
Joseph Lynch created CASSANDRA-14001:


 Summary: Gossip after node restart can take a long time to 
converge about "down" nodes in large clusters
 Key: CASSANDRA-14001
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14001
 Project: Cassandra
  Issue Type: Improvement
  Components: Lifecycle
Reporter: Joseph Lynch
Priority: Minor


When nodes restart in a large cluster, they mark all nodes as "alive", which 
first calls {{markDead}}, then creates an {{EchoMessage}} and marks the node as 
alive in the callback to that message. This works great, except when that 
initial echo fails for whatever reason and the node is marked as dead, in which 
case it will remain dead for a long while.

We mostly see this on 100+ node clusters, and almost always when nodes are in 
different datacenters that have unreliable network connections (e.g., 
cross-region in AWS), and I think that it comes down to a combination of:
1. Only a node itself can mark another node as "UP"
2. Nodes only gossip with dead nodes with probability {{#dead / (#live +1)}}

In particular, the algorithm in #2 leads to long convergence times because the 
number of dead nodes is typically very small compared to the cluster size. My 
back-of-the-envelope model of this algorithm indicates that for a 100 node 
cluster this would take an average of ~50 seconds with a stdev of ~50 seconds, 
which means we might be waiting _minutes_ for the nodes to gossip with each 
other. I'm modeling this as the minimum of two [geometric 
distributions|https://en.wikipedia.org/wiki/Geometric_distribution] with 
parameter {{p=1/#nodes}}, yielding a geometric distribution with parameter 
{{p=1-(1-1/#nodes)^2}}. So for a 100 node cluster:

{noformat}
100 node cluster =>
X = Pr(node1 gossips with node2) = geom(0.01)
Y = Pr(node2 gossips with node1) = geom(0.01)
Z = min(X, Y) = geom(1 - (1 - 0.01)^2) = geom(0.02)
E[Z] = 1/0.02 = 50
V[Z] = (1-0.02)/(0.02)^2 = 2450

1000 node cluster =>
Z = geom(1 - (1 - 0.001)^2) = geom(0.002)
E[Z] = 500
V[Z] = 24500
{noformat}

Since we gossip every second, that means that in expectation these nodes would 
see each other after about a minute in a 100 node cluster, and after ~8 minutes 
in a thousand node cluster. For 100 node clusters the variance is astounding, 
and means that in particular edge cases we might be waiting hours before these 
nodes gossip with each other.
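
To make the back-of-the-envelope model above easy to poke at, here is a small 
Monte Carlo sketch of it (this models only the "pick one peer uniformly at 
random per round" assumption, not the actual Gossiper):

{code}
import java.util.Random;

public class GossipConvergenceModel
{
    public static void main(String[] args)
    {
        int n = 100;                 // cluster size; change to 1000 for the second case
        double p = 1.0 / n;          // chance a given node picks one specific peer per round
        Random rng = new Random(42);
        int trials = 100_000;
        long totalRounds = 0;
        for (int t = 0; t < trials; t++)
        {
            int rounds = 1;
            // stop once either node picks the other in the same round
            while (rng.nextDouble() >= p && rng.nextDouble() >= p)
                rounds++;
            totalRounds += rounds;
        }
        // should land near 1 / (1 - (1 - p)^2), i.e. ~50 rounds (seconds) for n = 100
        System.out.printf("average rounds until the two nodes gossip: %.1f%n",
                          (double) totalRounds / trials);
    }
}
{code}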

I'm thinking of writing a patch which either:
# Makes gossip order a shuffled list that includes dead nodes, a la [swim 
gossip|https://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf] (a rough 
sketch of the idea follows after this list). This would mean we waste some 
rounds on dead nodes, but it guarantees a linear bound on gossip.
# Adds an endpoint that re-triggers gossip with all nodes. Operators could call 
this after a restart a few times if they detect a gossip inconsistency.
# Bounds the probability we gossip with a dead node at some reasonable number 
like 1/10 or something. This might cause a lot of gossip load when a node is 
actually down in large clusters, but it would also act to bound the variance.
# Something else?
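
A rough sketch of what option #1 could look like in isolation (just the 
shuffled visiting order, not the actual Gossiper wiring; peer identifiers are 
placeholders):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShuffledGossipOrder
{
    private final List<String> peers;   // every known peer, live or dead
    private int index = 0;

    public ShuffledGossipOrder(List<String> allPeers)
    {
        this.peers = new ArrayList<>(allPeers);
        Collections.shuffle(this.peers);
    }

    // One target per gossip round; a dead peer is retried within at most
    // peers.size() rounds, which is the linear bound mentioned above.
    public String nextTarget()
    {
        if (index == peers.size())
        {
            Collections.shuffle(peers);  // reshuffle on each full pass
            index = 0;
        }
        return peers.get(index++);
    }
}
{code}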

I've got a WIP 
[branch|https://github.com/apache/cassandra/compare/cassandra-3.11...jolynch:force_gossip]
 on 3.11 which implements options #1 and #2, but I can reduce/change/modify as 
needed if people think there is a better way. The patch doesn't pass tests yet 
but I'm not going to change/add the tests unless we think moving to time 
bounded gossip for down nodes is a good idea.








[jira] [Comment Edited] (CASSANDRA-13985) Support restricting reads and writes to specific datacenters on a per user basis

2017-11-07 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243207#comment-16243207
 ] 

Blake Eggleston edited comment on CASSANDRA-13985 at 11/8/17 1:09 AM:
--

Here’s an initial implementation to optionally add specific datacenters when 
granting permissions: https://github.com/bdeggleston/cassandra/tree/13985


was (Author: bdeggleston):
Here’s an initial implementation optionally add specific datacenters when 
granting permissions: https://github.com/bdeggleston/cassandra/tree/13985

> Support restricting reads and writes to specific datacenters on a per user 
> basis
> 
>
> Key: CASSANDRA-13985
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13985
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Minor
>
> There are a few use cases where it makes sense to restrict the operations a 
> given user can perform in specific data centers. The obvious use case is the 
> production/analytics datacenter configuration. You don’t want the production 
> user to be reading from or writing to the analytics datacenter, and you don’t 
> want the analytics user to be reading from the production datacenter.
> Although we expect users to get this right at the application level, we 
> should also be able to enforce this at the database level. The first approach 
> that comes to mind would be to support an optional DC parameter when granting 
> select and modify permissions to roles, something like {{GRANT SELECT ON 
> some_keyspace TO that_user IN DC dc1}}; statements that omit the DC would 
> implicitly grant permission to all DCs. However, I’m not married to this 
> approach.






[jira] [Comment Edited] (CASSANDRA-14001) Gossip after node restart can take a long time to converge about "down" nodes in large clusters

2017-11-07 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243223#comment-16243223
 ] 

Joseph Lynch edited comment on CASSANDRA-14001 at 11/8/17 1:26 AM:
---

I think CASSANDRA-13993 might help with this, but I _think_  it's solving a 
slightly different problem.


was (Author: jolynch):
I think CASSANDRA-13993 might help with this, but I _thin_  it's solving a 
slightly different problem.

> Gossip after node restart can take a long time to converge about "down" nodes 
> in large clusters
> ---
>
> Key: CASSANDRA-14001
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14001
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Lifecycle
>Reporter: Joseph Lynch
>Priority: Minor
>
> When nodes restart in a large cluster, they mark all nodes as "alive", which 
> first calls {{markDead}}, then creates an {{EchoMessage}} and marks the node 
> as alive in the callback to that message. This works great, except when that 
> initial echo fails for whatever reason and the node is marked as dead, in 
> which case it will remain dead for a long while.
> We mostly see this on 100+ node clusters, and almost always when nodes are in 
> different datacenters that have unreliable network connections (e.g., 
> cross-region in AWS), and I think that it comes down to a combination of:
> 1. Only a node itself can mark another node as "UP"
> 2. Nodes only gossip with dead nodes with probability {{#dead / (#live +1)}}
> In particular, the algorithm in #2 leads to long convergence times because 
> the number of dead nodes is typically very small compared to the cluster 
> size. My back-of-the-envelope model of this algorithm indicates that for a 
> 100 node cluster this would take an average of ~50 seconds with a stdev of 
> ~50 seconds, which means we might be waiting _minutes_ for the nodes to 
> gossip with each other. I'm modeling this as the minimum of two [geometric 
> distributions|https://en.wikipedia.org/wiki/Geometric_distribution] with 
> parameter {{p=1/#nodes}}, yielding a geometric distribution with parameter 
> {{p=1-(1-1/#nodes)^2}}. So for a 100 node cluster:
> {noformat}
> 100 node cluster =>
> X = Pr(node1 gossips with node2) = geom(0.01)
> Y = Pr(node2 gossips with node1) = geom(0.01)
> Z = min(X, Y) = geom(1 - (1 - 0.01)^2) = geom(0.02)
> E[Z] = 1/0.02 = 50
> V[Z] = (1-0.02)/(0.02)^2 = 2450
> 1000 node cluster =>
> Z = geom(1 - (1 - 0.001)^2) = geom(0.002)
> E[Z] = 500
> V[Z] = 24500
> {noformat}
> Since we gossip every second, that means that in expectation these nodes 
> would see each other after about a minute in a 100 node cluster, and after ~8 
> minutes in a thousand node cluster. For 100 node clusters the variance is 
> astounding, and means that in particular edge cases we might be waiting hours 
> before these nodes gossip with each other.
> I'm thinking of writing a patch which either:
> # Makes gossip order a shuffled list that includes dead nodes, a la [swim 
> gossip|https://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf]. This would 
> mean we waste some rounds on dead nodes, but it guarantees a linear bound on 
> gossip.
> # Adds an endpoint that re-triggers gossip with all nodes. Operators could 
> call this after a restart a few times if they detect a gossip inconsistency.
> # Bounds the probability we gossip with a dead node at some reasonable number 
> like 1/10 or something. This might cause a lot of gossip load when a node is 
> actually down in large clusters, but it would also act to bound the variance.
> # Something else?
> I've got a WIP 
> [branch|https://github.com/apache/cassandra/compare/cassandra-3.11...jolynch:force_gossip]
>  on 3.11 which implements options #1 and #2, but I can reduce/change/modify 
> as needed if people think there is a better way. The patch doesn't pass tests 
> yet but I'm not going to change/add the tests unless we think moving to time 
> bounded gossip for down nodes is a good idea.






[jira] [Commented] (CASSANDRA-14001) Gossip after node restart can take a long time to converge about "down" nodes in large clusters

2017-11-07 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243223#comment-16243223
 ] 

Joseph Lynch commented on CASSANDRA-14001:
--

I think CASSANDRA-13993 might help with this, but I _think_ it's solving a 
slightly different problem.

> Gossip after node restart can take a long time to converge about "down" nodes 
> in large clusters
> ---
>
> Key: CASSANDRA-14001
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14001
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Lifecycle
>Reporter: Joseph Lynch
>Priority: Minor
>
> When nodes restart in a large cluster, they mark all nodes as "alive", which 
> first calls {{markDead}}, then creates an {{EchoMessage}} and marks the node 
> as alive in the callback to that message. This works great, except when that 
> initial echo fails for whatever reason and the node is marked as dead, in 
> which case it will remain dead for a long while.
> We mostly see this on 100+ node clusters, and almost always when nodes are in 
> different datacenters that have unreliable network connections (e.g., 
> cross-region in AWS), and I think that it comes down to a combination of:
> 1. Only a node itself can mark another node as "UP"
> 2. Nodes only gossip with dead nodes with probability {{#dead / (#live +1)}}
> In particular, the algorithm in #2 leads to long convergence times because 
> the number of dead nodes is typically very small compared to the cluster 
> size. My back-of-the-envelope model of this algorithm indicates that for a 
> 100 node cluster this would take an average of ~50 seconds with a stdev of 
> ~50 seconds, which means we might be waiting _minutes_ for the nodes to 
> gossip with each other. I'm modeling this as the minimum of two [geometric 
> distributions|https://en.wikipedia.org/wiki/Geometric_distribution] with 
> parameter {{p=1/#nodes}}, yielding a geometric distribution with parameter 
> {{p=1-(1-1/#nodes)^2}}. So for a 100 node cluster:
> {noformat}
> 100 node cluster =>
> X = Pr(node1 gossips with node2) = geom(0.01)
> Y = Pr(node2 gossips with node1) = geom(0.01)
> Z = min(X, Y) = geom(1 - (1 - 0.01)^2) = geom(0.02)
> E[Z] = 1/0.02 = 50
> V[Z] = (1-0.02)/(0.02)^2 = 2450
> 1000 node cluster =>
> Z = geom(1 - (1 - 0.001)^2) = geom(0.002)
> E[Z] = 500
> V[Z] = 24500
> {noformat}
> Since we gossip every second, that means that in expectation these nodes 
> would see each other after about a minute in a 100 node cluster, and after ~8 
> minutes in a thousand node cluster. For 100 node clusters the variance is 
> astounding, and means that in particular edge cases we might be waiting hours 
> before these nodes gossip with each other.
> I'm thinking of writing a patch which either:
> # Makes gossip order a shuffled list that includes dead nodes, a la [swim 
> gossip|https://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf]. This would 
> mean we waste some rounds on dead nodes, but it guarantees a linear bound on 
> gossip.
> # Adds an endpoint that re-triggers gossip with all nodes. Operators could 
> call this after a restart a few times if they detect a gossip inconsistency.
> # Bounds the probability we gossip with a dead node at some reasonable number 
> like 1/10 or something. This might cause a lot of gossip load when a node is 
> actually down in large clusters, but it would also act to bound the variance.
> # Something else?
> I've got a WIP 
> [branch|https://github.com/apache/cassandra/compare/cassandra-3.11...jolynch:force_gossip]
>  on 3.11 which implements options #1 and #2, but I can reduce/change/modify 
> as needed if people think there is a better way. The patch doesn't pass tests 
> yet but I'm not going to change/add the tests unless we think moving to time 
> bounded gossip for down nodes is a good idea.






[jira] [Commented] (CASSANDRA-13964) Tracing interferes with digest requests when using RandomPartitioner

2017-11-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243074#comment-16243074
 ] 

ASF GitHub Bot commented on CASSANDRA-13964:


Github user beobal closed the pull request at:

https://github.com/apache/cassandra-dtest/pull/10


> Tracing interferes with digest requests when using RandomPartitioner
> 
>
> Key: CASSANDRA-13964
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13964
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths, Observability
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
> Fix For: 3.0.16, 3.11.2, 4.0
>
>
> A {{ThreadLocal<MessageDigest>}} is used to generate the MD5 digest when a 
> replica serves a read command and the {{isDigestQuery}} flag is set. The same 
> threadlocal is also used by {{RandomPartitioner}} to decorate partition keys. 
> So in a cluster with RP, if tracing is enabled the data digest is corrupted 
> by the partitioner making tokens for the tracing mutations. This causes a 
> digest mismatch on the coordinator, triggering a full data read on every read 
> where CL > 1 (or speculative execution/read repair kick in).
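
The committed fix (see the RandomPartitioner diff later in this digest) gives 
token hashing its own threadlocal digest; a condensed sketch of that idea, 
outside of Cassandra's actual classes:

{code}
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TokenDigestSketch
{
    // A digest reserved for token generation, so tracing mutations created while
    // serving a read cannot pollute the stateful digest used for digest responses.
    private static final ThreadLocal<MessageDigest> TOKEN_MD5 = ThreadLocal.withInitial(() ->
    {
        try
        {
            return MessageDigest.getInstance("MD5");
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new AssertionError(e);
        }
    });

    static byte[] hashForToken(byte[] key)
    {
        MessageDigest digest = TOKEN_MD5.get();
        digest.reset();              // always start from a clean state
        return digest.digest(key);
    }
}
{code}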






[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability

2017-11-07 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242915#comment-16242915
 ] 

Jason Brown commented on CASSANDRA-13987:
-

[~benedict] thanks for the comments, and for (indirectly) confirming that my 
understanding of the multithreaded commit log is more-or-less correct.

> Multithreaded commitlog subtly changed durability
> -
>
> Key: CASSANDRA-13987
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13987
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jason Brown
>Assignee: Jason Brown
> Fix For: 4.x
>
>
> When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly 
> changed the way that commitlog durability worked. Everything still gets 
> written to an mmap file. However, in periodic mode, not everything is 
> replayable from the mmapped file after a process crash.
> In brief, the reason this changed is the chained markers that are required 
> for the multithreaded commit log. At each msync, we wait for outstanding 
> mutations to serialize into the commitlog, and update a marker before and 
> after the commits that have accumulated since the last sync. With those 
> markers, we can safely replay that section of the commitlog. Without the 
> markers, we have no guarantee that the commits in that section were 
> successfully written, thus we abandon those commits on replay.
> If you have correlated process failures of multiple nodes at "nearly" the 
> same time (see ["There Is No 
> Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have 
> data loss if none of the nodes msync the commitlog. For example, with RF=3, 
> if a quorum write succeeds on two nodes (and we acknowledge the write back to 
> the client), and then the process on both nodes OOMs (say, due to reading the 
> index for a 100GB partition), the write will be lost if neither process 
> msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. 
> The reason why this data is silently lost is the chained markers that were 
> introduced with CASSANDRA-3578.
> The problem we are addressing with this ticket is incrementally improving 
> 'durability' in the face of process crash, not host crash. (Note: operators 
> should use batch mode to ensure greater durability, but batch mode in its 
> current implementation is a) borked, and b) will burn through SSDs that don't 
> have a non-volatile write cache sitting in front *very* rapidly.)
> The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which 
> means that a node could lose up to ten seconds of data due to a process 
> crash. The unfortunate thing is that the data is still available, in the mmap 
> file, but we can't replay it due to incomplete chained markers.
> ftr, I don't believe we've ever had a stated policy about commitlog 
> durability wrt process crash. Pre-2.0 we naturally piggy-backed off the 
> memory mapped file and the fact that every mutation acquired a lock and wrote 
> into the mmap buffer, so the ability to replay everything out of it came for 
> free. With CASSANDRA-3578, that was subtly changed. 
> Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust 
> the durability 
> guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit]
>  of each commit in innodb via {{innodb_flush_log_at_trx_commit}}. I'm using 
> that idea as a loose springboard for what to do here.
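
A very loose sketch of the chained-marker idea described above (this is not 
Cassandra's CommitLogSegment code, just the concept): each sync writes, at the 
start of the section just synced, the position of the next marker plus a 
checksum, and replay only trusts sections whose marker chain is intact.

{code}
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ChainedMarkerSketch
{
    // Write a sync marker at markerPosition pointing at the next marker.
    static void writeSyncMarker(ByteBuffer buffer, int markerPosition, int nextMarkerPosition)
    {
        ByteBuffer scratch = ByteBuffer.allocate(4);
        scratch.putInt(0, nextMarkerPosition);
        CRC32 crc = new CRC32();
        crc.update(scratch.array(), 0, 4);

        buffer.putInt(markerPosition, nextMarkerPosition);
        buffer.putInt(markerPosition + 4, (int) crc.getValue());
    }

    // On replay, a section is only trusted if its marker checksum matches;
    // everything after the last valid marker is discarded.
    static boolean markerIsValid(ByteBuffer buffer, int markerPosition)
    {
        int next = buffer.getInt(markerPosition);
        int storedCrc = buffer.getInt(markerPosition + 4);

        ByteBuffer scratch = ByteBuffer.allocate(4);
        scratch.putInt(0, next);
        CRC32 crc = new CRC32();
        crc.update(scratch.array(), 0, 4);
        return next != 0 && (int) crc.getValue() == storedCrc;
    }
}
{code}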






[jira] [Updated] (CASSANDRA-13872) document speculative_retry on DDL page

2017-11-07 Thread Jon Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-13872:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Awesome, thanks for the update.  Merged to trunk as {{976f48fb06}}.  Closing 
this out.

> document speculative_retry on DDL page
> --
>
> Key: CASSANDRA-13872
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13872
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Jon Haddad
>Assignee: Jordan Vaughan
>  Labels: docuentation, lhf
> Fix For: 4.0
>
>
> There's no mention of speculative_retry or how it works on 
> https://cassandra.apache.org/doc/latest/cql/ddl.html






[jira] [Assigned] (CASSANDRA-13874) nodetool setcachecapacity behaves oddly when cache size = 0

2017-11-07 Thread Jon Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad reassigned CASSANDRA-13874:
--

Assignee: Michal Szczepanski

> nodetool setcachecapacity behaves oddly when cache size = 0
> ---
>
> Key: CASSANDRA-13874
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13874
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jon Haddad
>Assignee: Michal Szczepanski
>  Labels: lhf, user-experience
> Attachments: 13874-trunk.txt
>
>
> If a node has row cache disabled, trying to turn it on via setcachecapacity 
> doesn't issue an error and doesn't turn it on; it just silently doesn't work.






cassandra git commit: Document speculative_retry case-insensitivity and new "P" suffix on DDL page

2017-11-07 Thread rustyrazorblade
Repository: cassandra
Updated Branches:
  refs/heads/trunk 9e7a401b9 -> 976f48fb0


Document speculative_retry case-insensitivity and new "P" suffix on DDL page

Patch by Jordan Vaughan for CASSANDRA-13872; Reviewed by Jon Haddad


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/976f48fb
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/976f48fb
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/976f48fb

Branch: refs/heads/trunk
Commit: 976f48fb0664b15e1741a435447e593ad80edc4a
Parents: 9e7a401
Author: Jordan Vaughan 
Authored: Thu Nov 2 00:12:09 2017 -0700
Committer: Jon Haddad 
Committed: Tue Nov 7 12:19:17 2017 -0800

--
 doc/source/cql/ddl.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/976f48fb/doc/source/cql/ddl.rst
--
diff --git a/doc/source/cql/ddl.rst b/doc/source/cql/ddl.rst
index a09265b..780a412 100644
--- a/doc/source/cql/ddl.rst
+++ b/doc/source/cql/ddl.rst
@@ -493,7 +493,7 @@ Speculative retry options
 By default, Cassandra read coordinators only query as many replicas as 
necessary to satisfy
 consistency levels: one for consistency level ``ONE``, a quorum for 
``QUORUM``, and so on.
 ``speculative_retry`` determines when coordinators may query additional 
replicas, which is useful
-when replicas are slow or unresponsive.  The following are legal values:
+when replicas are slow or unresponsive.  The following are legal values 
(case-insensitive):
 
 =  
=
  FormatExample  Description
@@ -502,6 +502,7 @@ when replicas are slow or unresponsive.  The following are 
legal values:
 If a replica takes longer than 
``X`` percent of this table's average
 response time, the coordinator 
queries an additional replica.
 ``X`` must be between 0 and 100.
+ ``XP``90.5PSynonym for ``XPERCENTILE``
  ``Yms``   25ms If a replica takes more than ``Y`` 
milliseconds to respond,
 the coordinator queries an 
additional replica.
  ``ALWAYS`` Coordinators always query all 
replicas.





cassandra-dtest git commit: Add test for digest requests with RandomPartitioner and tracing enabled

2017-11-07 Thread samt
Repository: cassandra-dtest
Updated Branches:
  refs/heads/master 7cc06a086 -> 01df7c498


Add test for digest requests with RandomPartitioner and tracing enabled

Patch by Sam Tunnicliffe; reviewed by Jason Brown and Philip Thompson
for CASSANDRA-13964


Project: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/commit/01df7c49
Tree: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/tree/01df7c49
Diff: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/diff/01df7c49

Branch: refs/heads/master
Commit: 01df7c49864ed5fa66db2181599a463a33b1f877
Parents: 7cc06a0
Author: Sam Tunnicliffe 
Authored: Tue Oct 17 14:50:25 2017 +0100
Committer: Sam Tunnicliffe 
Committed: Tue Nov 7 16:20:55 2017 +

--
 cql_tracing_test.py | 39 +--
 tools/jmxutils.py   | 12 
 2 files changed, 49 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra-dtest/blob/01df7c49/cql_tracing_test.py
--
diff --git a/cql_tracing_test.py b/cql_tracing_test.py
index aaf55aa..549e4d0 100644
--- a/cql_tracing_test.py
+++ b/cql_tracing_test.py
@@ -3,6 +3,7 @@ from distutils.version import LooseVersion
 
 from dtest import Tester, debug, create_ks
 from tools.decorators import since
+from tools.jmxutils import make_mbean, JolokiaAgent, 
remove_perf_disable_shared_mem
 
 
 class TestCqlTracing(Tester):
@@ -15,16 +16,23 @@ class TestCqlTracing(Tester):
 #  instantiated when specified as a custom tracing implementation.
 """
 
-def prepare(self, create_keyspace=True, nodes=3, rf=3, protocol_version=3, 
jvm_args=None, **kwargs):
+def prepare(self, create_keyspace=True, nodes=3, rf=3, protocol_version=3, 
jvm_args=None, random_partitioner=False, **kwargs):
 if jvm_args is None:
 jvm_args = []
 
 jvm_args.append('-Dcassandra.wait_for_tracing_events_timeout_secs=15')
 
 cluster = self.cluster
-cluster.populate(nodes).start(wait_for_binary_proto=True, 
jvm_args=jvm_args)
 
+if random_partitioner:
+
cluster.set_partitioner("org.apache.cassandra.dht.RandomPartitioner")
+else:
+
cluster.set_partitioner("org.apache.cassandra.dht.Murmur3Partitioner")
+
+cluster.populate(nodes)
 node1 = cluster.nodelist()[0]
+remove_perf_disable_shared_mem(node1)  # necessary for jmx
+cluster.start(wait_for_binary_proto=True, jvm_args=jvm_args)
 
 session = self.patient_cql_connection(node1, 
protocol_version=protocol_version)
 if create_keyspace:
@@ -176,3 +184,30 @@ class TestCqlTracing(Tester):
 self.assertIn("Default constructor for Tracing class "
   "'org.apache.cassandra.tracing.TracingImpl' is 
inaccessible.",
   check_for_errs_in)
+
+@since('3.0')
+def test_tracing_does_not_interfere_with_digest_calculation(self):
+"""
+Test that enabling tracing doesn't interfere with digest responses 
when using RandomPartitioner.
+The use of a threadlocal MessageDigest for generating both 
DigestResponse messages and for
+calculating tokens meant that the DigestResponse was always incorrect 
when both RP and tracing
+were enabled, leading to unnecessary data reads.
+
+@jira_ticket CASSANDRA-13964
+"""
+
+session = self.prepare(random_partitioner=True)
+self.trace(session)
+
+node1 = self.cluster.nodelist()[0]
+
+rr_count = make_mbean('metrics', type='ReadRepair', 
name='RepairedBlocking')
+with JolokiaAgent(node1) as jmx:
+# the MBean may not have been initialized, in which case Jolokia 
agent will return
+# a HTTP 404 response. If we receive such, we know that no digest 
mismatch was reported
+# If we are able to read the MBean attribute, assert that the 
count is 0
+if jmx.has_mbean(rr_count):
+# expect 0 digest mismatches
+self.assertEqual(0, jmx.read_attribute(rr_count, 'Count'))
+else:
+pass

http://git-wip-us.apache.org/repos/asf/cassandra-dtest/blob/01df7c49/tools/jmxutils.py
--
diff --git a/tools/jmxutils.py b/tools/jmxutils.py
index 8c20eb8..7468226 100644
--- a/tools/jmxutils.py
+++ b/tools/jmxutils.py
@@ -243,6 +243,18 @@ class JolokiaAgent(object):
 raise Exception("Jolokia agent returned non-200 status: %s" % 
(response,))
 return response
 
+def has_mbean(self, mbean, verbose=True):
+"""
+Check for the existence of an MBean
+
+`mbean` should be the full name of 

[jira] [Commented] (CASSANDRA-13983) Support a means of logging all queries as they were invoked

2017-11-07 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242533#comment-16242533
 ] 

Blake Eggleston commented on CASSANDRA-13983:
-

First round of comments:

bin/fqltool
* looks like you unnecessarily copied a nodetool compatibility check at the top 
of the file

FullQueryLogger
* configure
** first preconditions check uses {{|}}, not {{||}}
** Personally, I try to avoid combining assignments/evaluations like this: 
{{RollCycles.valueOf(rollCycle = rollCycle.toUpperCase())}}. They're harder to 
follow than they need to be. Could we just parse the value at the top of the 
method and check that it's not null here? (A sketch follows after this list.)

WeightedQueue
* It doesn't look like we use special Weigher implementations anywhere. I think 
this could be slightly simplified if we made the type param {{}}?

BinLog
* onReleased
** {{bytesInStoreFiles}} is incremented, but doesn't seem to be decremented 
after a file is deleted, so once we've recorded more than maxLogSize, we'll 
always delete all files
** I think we should be a bit safer with how we access {{bytesInStoreFiles}}. 
The calling method in chronicle is synchronized, so this should be safe as is, 
but it doesn't look like there are any documented guarantees that this won't 
silently change at some point in the future. Maybe we could synchronize the 
method, or use an atomic?

Misc: method comment style is a bit inconsistent. Can you use the {{/** */}} 
style, not the {{//}} style?
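
One way the {{RollCycles.valueOf}} point above could look, with a hypothetical 
enum standing in for the chronicle type (names are illustrative, not the 
patch's code):

{code}
import com.google.common.base.Preconditions;

public class ConfigureSketch
{
    // Stand-in for the chronicle-queue roll cycle enum
    enum RollCycle { MINUTELY, HOURLY, DAILY }

    static RollCycle parseRollCycle(String rollCycle)
    {
        Preconditions.checkArgument(rollCycle != null && !rollCycle.isEmpty(),
                                    "roll cycle must be set");
        // Enum.valueOf throws IllegalArgumentException for unknown values, so the
        // parse itself is the validation; no assignment buried inside the check.
        return RollCycle.valueOf(rollCycle.toUpperCase());
    }
}
{code}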

> Support a means of logging all queries as they were invoked
> ---
>
> Key: CASSANDRA-13983
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13983
> Project: Cassandra
>  Issue Type: New Feature
>  Components: CQL, Observability, Testing, Tools
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
> Fix For: 4.0
>
>
> For correctness testing it's useful to be able to capture production traffic 
> so that it can be replayed against both the old and new versions of Cassandra 
> while comparing the results.
> Implementing this functionality once inside the database is high performance 
> and presents less operational complexity.
> In [this patch|https://github.com/apache/cassandra/pull/169] there is an 
> implementation of a full query log that uses chronicle-queue (apache 
> licensed; the maven artifacts are labeled incorrectly in some cases, and the 
> dependencies are also apache licensed) to implement a rotating log of queries.
> * Single thread asynchronously writes log entries to disk to reduce impact on 
> query latency
> * Heap memory usage bounded by a weighted queue with configurable maximum 
> weight sitting in front of logging thread
> * If the weighted queue is full producers can be blocked or samples can be 
> dropped
> * Disk utilization is bounded by deleting old log segments once a 
> configurable size is reached
> * The on disk serialization uses a flexible schema binary format 
> (chronicle-wire) making it easy to skip unrecognized fields, add new ones, 
> and omit old ones.
> * Can be enabled and configured via JMX, disabled, and reset (delete on disk 
> data), logging path is configurable via both JMX and YAML
> * Introduce new {{fqltool}} in /bin that currently implements {{Dump}} which 
> can dump in a human readable format full query logs as well as follow active 
> full query logs
> Follow up work:
> * Introduce new {{fqltool}} command Replay which can replay N full query logs 
> to two different clusters and compare the result and check for 
> inconsistencies. <- Actively working on getting this done
> * Log not just queries but their results to facilitate a comparison between 
> the original query result and the replayed result. <- Really just don't have 
> a specific use case at the moment
> * "Consistent" query logging allowing replay to fully replicate the original 
> order of execution and completion even in the face of races (including CAS). 
> <- This is more speculative






[jira] [Commented] (CASSANDRA-13958) [CQL] Inconsistent handling double dollar sign for strings

2017-11-07 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242424#comment-16242424
 ] 

Robert Stupp commented on CASSANDRA-13958:
--

Not a complete review, and I haven't tried the patch at all, but some 
thoughts on the patch:
* The unit test method names should continue to start with {{test}} (not 
{{should}}). Nice work on the separation of the test methods though.
* I see that the test checks for three {{$}} signs. Would love to see checks 
with four or more dollar signs (leading, middle and trailing) in various 
combinations and spanning multiple lines, both for the unit and cqlsh tests. An 
algorithmic approach to testing those combinations might be beneficial over 
coding all combinations manually (see the sketch after this list).
* The last point (many combinations) should also work for multiple parameters 
to a single statement - especially to verify that statements like {{INSERT INTO 
tab (x,y,z) VALUES (foo$$, $$poiewf$ewfi$, ewfpioj$$)}} work.
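
One possible shape for that algorithmic approach, sketched outside of the 
actual test classes (class names and the wrapped statement are illustrative 
only):

{code}
import java.util.ArrayList;
import java.util.List;

public class DollarQuoteCases
{
    private static String dollars(int n)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++)
            sb.append('$');
        return sb.toString();
    }

    // Payloads with 0..max dollar signs at the start, middle and end, to be
    // wrapped in $$...$$ by the test; trailing dollars are the interesting case.
    static List<String> payloads(int max)
    {
        List<String> cases = new ArrayList<>();
        for (int lead = 0; lead <= max; lead++)
            for (int mid = 0; mid <= max; mid++)
                for (int trail = 0; trail <= max; trail++)
                    cases.add(dollars(lead) + "jo" + dollars(mid) + "hn" + dollars(trail));
        return cases;
    }

    public static void main(String[] args)
    {
        for (String payload : payloads(3))
            System.out.println("insert into users(id, name) values(1, $$" + payload + "$$)");
    }
}
{code}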


> [CQL] Inconsistent handling double dollar sign for strings
> --
>
> Key: CASSANDRA-13958
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13958
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Hugo Picado
>Assignee: Michał Szczygieł
>Priority: Minor
>
> Double dollar signs are a [built-in method for escaping column values that 
> may contain single quotes in their 
> content](https://docs.datastax.com/en/cql/3.3/cql/cql_reference/escape_char_r.html).
>  The way this is handled, however, is not consistent, in the sense that it 
> allows $ to appear in the middle of the string but not as the last char.
> *Examples:*
> Valid: insert into users(id, name) values(1, $$john$$)
> Inserts the string *john*
> Valid: insert into users(id, name) values(1, $$jo$hn$$)
> Inserts the string *jo$hn*
> Valid: insert into users(id, name) values(1, $$$john$$)
> Inserts the string *$john*
> Invalid: insert into users(id, name) values(1, $$john$$$)
> Fails with:
> {code}
> Invalid syntax at line 1, char 48
>   insert into users(id, name) values(1, $$john$$$);
> {code}






[jira] [Updated] (CASSANDRA-13403) nodetool repair breaks SASI index

2017-11-07 Thread Ludovic Boutros (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ludovic Boutros updated CASSANDRA-13403:

Attachment: testSASIRepair.patch

> nodetool repair breaks SASI index
> -
>
> Key: CASSANDRA-13403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13403
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: 3.10
>Reporter: Igor Novgorodov
>Assignee: Alex Petrov
> Attachments: 3_nodes_compaction.log, 4_nodes_compaction.log, 
> testSASIRepair.patch
>
>
> I've got a table:
> {code}
> CREATE TABLE cservice.bulks_recipients (
> recipient text,
> bulk_id uuid,
> datetime_final timestamp,
> datetime_sent timestamp,
> request_id uuid,
> status int,
> PRIMARY KEY (recipient, bulk_id)
> ) WITH CLUSTERING ORDER BY (bulk_id ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> CREATE CUSTOM INDEX bulk_recipients_bulk_id ON cservice.bulks_recipients 
> (bulk_id) USING 'org.apache.cassandra.index.sasi.SASIIndex';
> {code}
> There are 11 rows in it:
> {code}
> > select * from bulks_recipients;
> ...
> (11 rows)
> {code}
> Let's query by index (all rows have the same *bulk_id*):
> {code}
> > select * from bulks_recipients where bulk_id = 
> > baa94815-e276-4ca4-adda-5b9734e6c4a5;   
> >   
> ...
> (11 rows)
> {code}
> Ok, everything is fine.
> Now I'm doing *nodetool repair --partitioner-range --job-threads 4 --full* on 
> each node in the cluster sequentially.
> After it finished:
> {code}
> > select * from bulks_recipients where bulk_id = 
> > baa94815-e276-4ca4-adda-5b9734e6c4a5;
> ...
> (2 rows)
> {code}
> Only two rows.
> While the rows are actually there:
> {code}
> > select * from bulks_recipients;
> ...
> (11 rows)
> {code}
> If I issue an incremental repair on a random node, I can get like 7 rows 
> after the index query.
> Dropping the index and recreating it fixes the issue. Is it a bug, or am I 
> doing the repair the wrong way?






[jira] [Commented] (CASSANDRA-3858) expose "propagation delay" metric in JMX

2017-11-07 Thread George (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242307#comment-16242307
 ] 

George commented on CASSANDRA-3858:
---

I'm surprised there's not greater interest in such a feature. Propagation 
delays must be a valid concern. What am I missing?

> expose "propagation delay" metric in JMX
> 
>
> Key: CASSANDRA-3858
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3858
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Peter Schuller
>Priority: Minor
>
> My idea is to augment the gossip protocol to contain timestamps. We wouldn't 
> use the timestamps for anything "important", but we could use them to let 
> each node expose a number: how many milliseconds (or seconds) old its 
> information is about the "oldest" node that is still alive.
> When nodes go down you'd see spikes, but in most cases where nodes are alive, 
> this information should give you a pretty good idea of how fast gossip 
> information is propagating through the cluster, assuming you keep your clocks 
> in sync.
> It should be a good thing to have graphed, and to have alerts on.
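
If it helps the discussion, a minimal sketch of the metric side (assuming, as 
the description does, that gossiped state carries a timestamp and clocks are 
kept in sync; the peer bookkeeping here is hypothetical, not Gossiper code):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PropagationDelaySketch
{
    // peer -> the timestamp (millis) carried in that peer's last gossiped state
    private final Map<String, Long> lastGossipedTimestamp = new ConcurrentHashMap<>();

    void onGossipReceived(String peer, long remoteTimestampMillis)
    {
        lastGossipedTimestamp.put(peer, remoteTimestampMillis);
    }

    // The value to expose via JMX: how stale our information is about the
    // live peer we know least about. Spikes are expected when a node is down.
    long worstCasePropagationDelayMillis()
    {
        long now = System.currentTimeMillis();
        long worst = 0;
        for (long ts : lastGossipedTimestamp.values())
            worst = Math.max(worst, now - ts);
        return worst;
    }
}
{code}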






[jira] [Commented] (CASSANDRA-13403) nodetool repair breaks SASI index

2017-11-07 Thread Ludovic Boutros (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242309#comment-16242309
 ] 

Ludovic Boutros commented on CASSANDRA-13403:
-

[~ifesdjeen],

I think the issue is here in the 
[CompactionManager|https://github.com/apache/cassandra/blob/6d429cd0315d3509c904d0e83f91f7d12ba12085/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1570].

The two SSTableWriters share the same LifecycleTransaction instance. Therefore, 
the second commit call is not applied and the SASI indexes are not committed.

I've made a small unit test to reproduce the issue. I will attach it as a 
small patch for reference.

> nodetool repair breaks SASI index
> -
>
> Key: CASSANDRA-13403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13403
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: 3.10
>Reporter: Igor Novgorodov
>Assignee: Alex Petrov
> Attachments: 3_nodes_compaction.log, 4_nodes_compaction.log
>
>
> I've got a table:
> {code}
> CREATE TABLE cservice.bulks_recipients (
> recipient text,
> bulk_id uuid,
> datetime_final timestamp,
> datetime_sent timestamp,
> request_id uuid,
> status int,
> PRIMARY KEY (recipient, bulk_id)
> ) WITH CLUSTERING ORDER BY (bulk_id ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> CREATE CUSTOM INDEX bulk_recipients_bulk_id ON cservice.bulks_recipients 
> (bulk_id) USING 'org.apache.cassandra.index.sasi.SASIIndex';
> {code}
> There are 11 rows in it:
> {code}
> > select * from bulks_recipients;
> ...
> (11 rows)
> {code}
> Let's query by index (all rows have the same *bulk_id*):
> {code}
> > select * from bulks_recipients where bulk_id = 
> > baa94815-e276-4ca4-adda-5b9734e6c4a5;   
> >   
> ...
> (11 rows)
> {code}
> Ok, everything is fine.
> Now I'm doing *nodetool repair --partitioner-range --job-threads 4 --full* on 
> each node in the cluster sequentially.
> After it finished:
> {code}
> > select * from bulks_recipients where bulk_id = 
> > baa94815-e276-4ca4-adda-5b9734e6c4a5;
> ...
> (2 rows)
> {code}
> Only two rows.
> While the rows are actually there:
> {code}
> > select * from bulks_recipients;
> ...
> (11 rows)
> {code}
> If I issue an incremental repair on a random node, I can get like 7 rows 
> after the index query.
> Dropping the index and recreating it fixes the issue. Is it a bug, or am I 
> doing the repair the wrong way?






[jira] [Updated] (CASSANDRA-13964) Tracing interferes with digest requests when using RandomPartitioner

2017-11-07 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-13964:

   Resolution: Fixed
Fix Version/s: 4.0
   3.11.2
   3.0.16
Reproduced In: 3.11.1, 3.0.15, 4.0  (was: 3.0.15, 3.11.1, 4.0)
   Status: Resolved  (was: Patch Available)

The CI was generally good barring a couple of flaky-ish tests which I've 
checked are passing locally, so I've committed to 3.0 in 
{{58daf1376456289f97f0ef0b0daf9e0d03ba6b81}} and merged to 3.11 and trunk.

> Tracing interferes with digest requests when using RandomPartitioner
> 
>
> Key: CASSANDRA-13964
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13964
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths, Observability
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
> Fix For: 3.0.16, 3.11.2, 4.0
>
>
> A {{ThreadLocal<MessageDigest>}} is used to generate the MD5 digest when a 
> replica serves a read command and the {{isDigestQuery}} flag is set. The same 
> threadlocal is also used by {{RandomPartitioner}} to decorate partition keys. 
> So in a cluster with RP, if tracing is enabled the data digest is corrupted 
> by the partitioner making tokens for the tracing mutations. This causes a 
> digest mismatch on the coordinator, triggering a full data read on every read 
> where CL > 1 (or speculative execution/read repair kick in).






[4/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2017-11-07 Thread samt
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ab6201c6
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ab6201c6
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ab6201c6

Branch: refs/heads/trunk
Commit: ab6201c65b193c2df4c2f25f779a591c917b1df8
Parents: 6d429cd 58daf13
Author: Sam Tunnicliffe 
Authored: Tue Nov 7 13:59:20 2017 +
Committer: Sam Tunnicliffe 
Committed: Tue Nov 7 13:59:20 2017 +

--
 CHANGES.txt |  1 +
 .../apache/cassandra/dht/RandomPartitioner.java | 43 ++--
 .../org/apache/cassandra/utils/FBUtilities.java | 19 -
 3 files changed, 40 insertions(+), 23 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/ab6201c6/CHANGES.txt
--
diff --cc CHANGES.txt
index 275294f,3f4f3f2..1269dcf
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,8 -1,5 +1,9 @@@
 -3.0.16
 +3.11.2
 + * Round buffer size to powers of 2 for the chunk cache (CASSANDRA-13897)
 + * Update jackson JSON jars (CASSANDRA-13949)
 + * Avoid locks when checking LCS fanout and if we should defrag 
(CASSANDRA-13930)
 +Merged from 3.0:
+  * Tracing interferes with digest requests when using RandomPartitioner 
(CASSANDRA-13964)
   * Add flag to disable materialized views, and warnings on creation 
(CASSANDRA-13959)
   * Don't let user drop or generally break tables in system_distributed 
(CASSANDRA-13813)
   * Provide a JMX call to sync schema with local storage (CASSANDRA-13954)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/ab6201c6/src/java/org/apache/cassandra/dht/RandomPartitioner.java
--
diff --cc src/java/org/apache/cassandra/dht/RandomPartitioner.java
index 82c2493,c7837c9..bdf8b85
--- a/src/java/org/apache/cassandra/dht/RandomPartitioner.java
+++ b/src/java/org/apache/cassandra/dht/RandomPartitioner.java
@@@ -117,20 -103,7 +141,20 @@@ public class RandomPartitioner implemen
  return new BigIntegerToken(token);
  }
  
 -private final Token.TokenFactory tokenFactory = new Token.TokenFactory() {
 +public BigIntegerToken getRandomToken(Random random)
 +{
- BigInteger token = 
FBUtilities.hashToBigInteger(GuidGenerator.guidAsBytes(random, 
"host/127.0.0.1", 0));
++BigInteger token = hashToBigInteger(GuidGenerator.guidAsBytes(random, 
"host/127.0.0.1", 0));
 +if ( token.signum() == -1 )
 +token = token.multiply(BigInteger.valueOf(-1L));
 +return new BigIntegerToken(token);
 +}
 +
 +private boolean isValidToken(BigInteger token) {
 +return token.compareTo(ZERO) >= 0 && token.compareTo(MAXIMUM) <= 0;
 +}
 +
 +private final Token.TokenFactory tokenFactory = new Token.TokenFactory()
 +{
  public ByteBuffer toByteArray(Token token)
  {
  BigIntegerToken bigIntegerToken = (BigIntegerToken) token;
@@@ -275,9 -230,14 +300,19 @@@
  return partitionOrdering;
  }
  
 +public Optional splitter()
 +{
 +return Optional.of(splitter);
 +}
 +
+ private static BigInteger hashToBigInteger(ByteBuffer data)
+ {
+ MessageDigest messageDigest = localMD5Digest.get();
+ if (data.hasArray())
+ messageDigest.update(data.array(), data.arrayOffset() + 
data.position(), data.remaining());
+ else
+ messageDigest.update(data.duplicate());
+ 
+ return new BigInteger(messageDigest.digest()).abs();
+ }
  }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/ab6201c6/src/java/org/apache/cassandra/utils/FBUtilities.java
--





[6/6] cassandra git commit: Merge branch 'cassandra-3.11' into trunk

2017-11-07 Thread samt
Merge branch 'cassandra-3.11' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/9e7a401b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/9e7a401b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/9e7a401b

Branch: refs/heads/trunk
Commit: 9e7a401b9c1e9244ebb7654f5e4bafa1d633d2ca
Parents: 07fbd8e ab6201c
Author: Sam Tunnicliffe 
Authored: Tue Nov 7 14:03:45 2017 +
Committer: Sam Tunnicliffe 
Committed: Tue Nov 7 14:03:45 2017 +

--
 CHANGES.txt |  1 +
 .../apache/cassandra/dht/RandomPartitioner.java | 22 ++--
 2 files changed, 12 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/9e7a401b/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/9e7a401b/src/java/org/apache/cassandra/dht/RandomPartitioner.java
--





[3/6] cassandra git commit: RandomPartitioner has separate MessageDigest for token generation

2017-11-07 Thread samt
RandomPartitioner has separate MessageDigest for token generation

Patch by Sam Tunnicliffe; reviewed by Jason Brown for CASSANDRA-13964


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/58daf137
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/58daf137
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/58daf137

Branch: refs/heads/trunk
Commit: 58daf1376456289f97f0ef0b0daf9e0d03ba6b81
Parents: 6c29ee8
Author: Sam Tunnicliffe 
Authored: Tue Oct 17 14:51:43 2017 +0100
Committer: Sam Tunnicliffe 
Committed: Tue Nov 7 13:38:25 2017 +

--
 CHANGES.txt |  1 +
 .../apache/cassandra/dht/RandomPartitioner.java | 43 ++--
 .../org/apache/cassandra/utils/FBUtilities.java | 19 -
 3 files changed, 41 insertions(+), 22 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/58daf137/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 935931c..3f4f3f2 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.16
+ * Tracing interferes with digest requests when using RandomPartitioner 
(CASSANDRA-13964)
  * Add flag to disable materialized views, and warnings on creation 
(CASSANDRA-13959)
  * Don't let user drop or generally break tables in system_distributed 
(CASSANDRA-13813)
  * Provide a JMX call to sync schema with local storage (CASSANDRA-13954)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/58daf137/src/java/org/apache/cassandra/dht/RandomPartitioner.java
--
diff --git a/src/java/org/apache/cassandra/dht/RandomPartitioner.java 
b/src/java/org/apache/cassandra/dht/RandomPartitioner.java
index b0dea01..c7837c9 100644
--- a/src/java/org/apache/cassandra/dht/RandomPartitioner.java
+++ b/src/java/org/apache/cassandra/dht/RandomPartitioner.java
@@ -20,6 +20,7 @@ package org.apache.cassandra.dht;
 import java.math.BigDecimal;
 import java.math.BigInteger;
 import java.nio.ByteBuffer;
+import java.security.MessageDigest;
 import java.util.*;
 
 import com.google.common.annotations.VisibleForTesting;
@@ -45,11 +46,35 @@ public class RandomPartitioner implements IPartitioner
 public static final BigIntegerToken MINIMUM = new BigIntegerToken("-1");
 public static final BigInteger MAXIMUM = new BigInteger("2").pow(127);
 
-private static final int HEAP_SIZE = (int) ObjectSizes.measureDeep(new BigIntegerToken(FBUtilities.hashToBigInteger(ByteBuffer.allocate(1))));
+/**
+ * Maintain a separate threadlocal message digest, exclusively for token hashing. This is necessary because
+ * when Tracing is enabled and using the default tracing implementation, creating the mutations for the trace
+ * events involves tokenizing the partition keys. This happens multiple times whilst servicing a ReadCommand,
+ * and so can interfere with the stateful digest calculation if the node is a replica producing a digest response.
+ */
+private static final ThreadLocal<MessageDigest> localMD5Digest = new ThreadLocal<MessageDigest>()
+{
+@Override
+protected MessageDigest initialValue()
+{
+return FBUtilities.newMessageDigest("MD5");
+}
+
+@Override
+public MessageDigest get()
+{
+MessageDigest digest = super.get();
+digest.reset();
+return digest;
+}
+};
+
+private static final int HEAP_SIZE = (int) ObjectSizes.measureDeep(new BigIntegerToken(hashToBigInteger(ByteBuffer.allocate(1))));
 
 public static final RandomPartitioner instance = new RandomPartitioner();
 public static final AbstractType<?> partitionOrdering = new PartitionerDefinedOrder(instance);
 
+
 public DecoratedKey decorateKey(ByteBuffer key)
 {
 return new CachedHashDecoratedKey(getToken(key), key);
@@ -72,7 +97,7 @@ public class RandomPartitioner implements IPartitioner
 
 public BigIntegerToken getRandomToken()
 {
-BigInteger token = FBUtilities.hashToBigInteger(GuidGenerator.guidAsBytes());
+BigInteger token = hashToBigInteger(GuidGenerator.guidAsBytes());
 if ( token.signum() == -1 )
 token = token.multiply(BigInteger.valueOf(-1L));
 return new BigIntegerToken(token);
@@ -160,7 +185,8 @@ public class RandomPartitioner implements IPartitioner
 {
 if (key.remaining() == 0)
 return MINIMUM;
-return new BigIntegerToken(FBUtilities.hashToBigInteger(key));
+
+return new BigIntegerToken(hashToBigInteger(key));
 }
 
 public Map<Token, Float> describeOwnership(List<Token> sortedTokens)
@@ -203,4 +229,15 @@ public class RandomPartitioner 

[1/6] cassandra git commit: RandomPartitioner has separate MessageDigest for token generation

2017-11-07 Thread samt
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 6c29ee84a -> 58daf1376
  refs/heads/cassandra-3.11 6d429cd03 -> ab6201c65
  refs/heads/trunk 07fbd8ee6 -> 9e7a401b9


RandomPartitioner has separate MessageDigest for token generation

Patch by Sam Tunnicliffe; reviewed by Jason Brown for CASSANDRA-13964


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/58daf137
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/58daf137
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/58daf137

Branch: refs/heads/cassandra-3.0
Commit: 58daf1376456289f97f0ef0b0daf9e0d03ba6b81
Parents: 6c29ee8
Author: Sam Tunnicliffe 
Authored: Tue Oct 17 14:51:43 2017 +0100
Committer: Sam Tunnicliffe 
Committed: Tue Nov 7 13:38:25 2017 +

--
 CHANGES.txt |  1 +
 .../apache/cassandra/dht/RandomPartitioner.java | 43 ++--
 .../org/apache/cassandra/utils/FBUtilities.java | 19 -
 3 files changed, 41 insertions(+), 22 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/58daf137/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 935931c..3f4f3f2 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.16
+ * Tracing interferes with digest requests when using RandomPartitioner 
(CASSANDRA-13964)
  * Add flag to disable materialized views, and warnings on creation 
(CASSANDRA-13959)
  * Don't let user drop or generally break tables in system_distributed 
(CASSANDRA-13813)
  * Provide a JMX call to sync schema with local storage (CASSANDRA-13954)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/58daf137/src/java/org/apache/cassandra/dht/RandomPartitioner.java
--
diff --git a/src/java/org/apache/cassandra/dht/RandomPartitioner.java 
b/src/java/org/apache/cassandra/dht/RandomPartitioner.java
index b0dea01..c7837c9 100644
--- a/src/java/org/apache/cassandra/dht/RandomPartitioner.java
+++ b/src/java/org/apache/cassandra/dht/RandomPartitioner.java
@@ -20,6 +20,7 @@ package org.apache.cassandra.dht;
 import java.math.BigDecimal;
 import java.math.BigInteger;
 import java.nio.ByteBuffer;
+import java.security.MessageDigest;
 import java.util.*;
 
 import com.google.common.annotations.VisibleForTesting;
@@ -45,11 +46,35 @@ public class RandomPartitioner implements IPartitioner
 public static final BigIntegerToken MINIMUM = new BigIntegerToken("-1");
 public static final BigInteger MAXIMUM = new BigInteger("2").pow(127);
 
-private static final int HEAP_SIZE = (int) ObjectSizes.measureDeep(new BigIntegerToken(FBUtilities.hashToBigInteger(ByteBuffer.allocate(1))));
+/**
+ * Maintain a separate threadlocal message digest, exclusively for token hashing. This is necessary because
+ * when Tracing is enabled and using the default tracing implementation, creating the mutations for the trace
+ * events involves tokenizing the partition keys. This happens multiple times whilst servicing a ReadCommand,
+ * and so can interfere with the stateful digest calculation if the node is a replica producing a digest response.
+ */
+private static final ThreadLocal<MessageDigest> localMD5Digest = new ThreadLocal<MessageDigest>()
+{
+@Override
+protected MessageDigest initialValue()
+{
+return FBUtilities.newMessageDigest("MD5");
+}
+
+@Override
+public MessageDigest get()
+{
+MessageDigest digest = super.get();
+digest.reset();
+return digest;
+}
+};
+
+private static final int HEAP_SIZE = (int) ObjectSizes.measureDeep(new BigIntegerToken(hashToBigInteger(ByteBuffer.allocate(1))));
 
 public static final RandomPartitioner instance = new RandomPartitioner();
 public static final AbstractType<?> partitionOrdering = new PartitionerDefinedOrder(instance);
 
+
 public DecoratedKey decorateKey(ByteBuffer key)
 {
 return new CachedHashDecoratedKey(getToken(key), key);
@@ -72,7 +97,7 @@ public class RandomPartitioner implements IPartitioner
 
 public BigIntegerToken getRandomToken()
 {
-BigInteger token = FBUtilities.hashToBigInteger(GuidGenerator.guidAsBytes());
+BigInteger token = hashToBigInteger(GuidGenerator.guidAsBytes());
 if ( token.signum() == -1 )
 token = token.multiply(BigInteger.valueOf(-1L));
 return new BigIntegerToken(token);
@@ -160,7 +185,8 @@ public class RandomPartitioner implements IPartitioner
 {
 if (key.remaining() == 0)
 return MINIMUM;
-return new BigIntegerToken(FBUtilities.hashToBigInteger(key));

[5/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2017-11-07 Thread samt
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ab6201c6
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ab6201c6
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ab6201c6

Branch: refs/heads/cassandra-3.11
Commit: ab6201c65b193c2df4c2f25f779a591c917b1df8
Parents: 6d429cd 58daf13
Author: Sam Tunnicliffe 
Authored: Tue Nov 7 13:59:20 2017 +
Committer: Sam Tunnicliffe 
Committed: Tue Nov 7 13:59:20 2017 +

--
 CHANGES.txt |  1 +
 .../apache/cassandra/dht/RandomPartitioner.java | 43 ++--
 .../org/apache/cassandra/utils/FBUtilities.java | 19 -
 3 files changed, 40 insertions(+), 23 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/ab6201c6/CHANGES.txt
--
diff --cc CHANGES.txt
index 275294f,3f4f3f2..1269dcf
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,8 -1,5 +1,9 @@@
 -3.0.16
 +3.11.2
 + * Round buffer size to powers of 2 for the chunk cache (CASSANDRA-13897)
 + * Update jackson JSON jars (CASSANDRA-13949)
 + * Avoid locks when checking LCS fanout and if we should defrag 
(CASSANDRA-13930)
 +Merged from 3.0:
+  * Tracing interferes with digest requests when using RandomPartitioner 
(CASSANDRA-13964)
   * Add flag to disable materialized views, and warnings on creation 
(CASSANDRA-13959)
   * Don't let user drop or generally break tables in system_distributed 
(CASSANDRA-13813)
   * Provide a JMX call to sync schema with local storage (CASSANDRA-13954)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/ab6201c6/src/java/org/apache/cassandra/dht/RandomPartitioner.java
--
diff --cc src/java/org/apache/cassandra/dht/RandomPartitioner.java
index 82c2493,c7837c9..bdf8b85
--- a/src/java/org/apache/cassandra/dht/RandomPartitioner.java
+++ b/src/java/org/apache/cassandra/dht/RandomPartitioner.java
@@@ -117,20 -103,7 +141,20 @@@ public class RandomPartitioner implemen
  return new BigIntegerToken(token);
  }
  
 -private final Token.TokenFactory tokenFactory = new Token.TokenFactory() {
 +public BigIntegerToken getRandomToken(Random random)
 +{
- BigInteger token = FBUtilities.hashToBigInteger(GuidGenerator.guidAsBytes(random, "host/127.0.0.1", 0));
++BigInteger token = hashToBigInteger(GuidGenerator.guidAsBytes(random, "host/127.0.0.1", 0));
 +if ( token.signum() == -1 )
 +token = token.multiply(BigInteger.valueOf(-1L));
 +return new BigIntegerToken(token);
 +}
 +
 +private boolean isValidToken(BigInteger token) {
 +return token.compareTo(ZERO) >= 0 && token.compareTo(MAXIMUM) <= 0;
 +}
 +
 +private final Token.TokenFactory tokenFactory = new Token.TokenFactory()
 +{
  public ByteBuffer toByteArray(Token token)
  {
  BigIntegerToken bigIntegerToken = (BigIntegerToken) token;
@@@ -275,9 -230,14 +300,19 @@@
  return partitionOrdering;
  }
  
 +public Optional<Splitter> splitter()
 +{
 +return Optional.of(splitter);
 +}
 +
+ private static BigInteger hashToBigInteger(ByteBuffer data)
+ {
+ MessageDigest messageDigest = localMD5Digest.get();
+ if (data.hasArray())
+ messageDigest.update(data.array(), data.arrayOffset() + data.position(), data.remaining());
+ else
+ messageDigest.update(data.duplicate());
+ 
+ return new BigInteger(messageDigest.digest()).abs();
+ }
  }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/ab6201c6/src/java/org/apache/cassandra/utils/FBUtilities.java
--


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[2/6] cassandra git commit: RandomPartitioner has separate MessageDigest for token generation

2017-11-07 Thread samt
RandomPartitioner has separate MessageDigest for token generation

Patch by Sam Tunnicliffe; reviewed by Jason Brown for CASSANDRA-13964


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/58daf137
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/58daf137
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/58daf137

Branch: refs/heads/cassandra-3.11
Commit: 58daf1376456289f97f0ef0b0daf9e0d03ba6b81
Parents: 6c29ee8
Author: Sam Tunnicliffe 
Authored: Tue Oct 17 14:51:43 2017 +0100
Committer: Sam Tunnicliffe 
Committed: Tue Nov 7 13:38:25 2017 +

--
 CHANGES.txt |  1 +
 .../apache/cassandra/dht/RandomPartitioner.java | 43 ++--
 .../org/apache/cassandra/utils/FBUtilities.java | 19 -
 3 files changed, 41 insertions(+), 22 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/58daf137/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 935931c..3f4f3f2 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.16
+ * Tracing interferes with digest requests when using RandomPartitioner 
(CASSANDRA-13964)
  * Add flag to disable materialized views, and warnings on creation 
(CASSANDRA-13959)
  * Don't let user drop or generally break tables in system_distributed 
(CASSANDRA-13813)
  * Provide a JMX call to sync schema with local storage (CASSANDRA-13954)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/58daf137/src/java/org/apache/cassandra/dht/RandomPartitioner.java
--
diff --git a/src/java/org/apache/cassandra/dht/RandomPartitioner.java 
b/src/java/org/apache/cassandra/dht/RandomPartitioner.java
index b0dea01..c7837c9 100644
--- a/src/java/org/apache/cassandra/dht/RandomPartitioner.java
+++ b/src/java/org/apache/cassandra/dht/RandomPartitioner.java
@@ -20,6 +20,7 @@ package org.apache.cassandra.dht;
 import java.math.BigDecimal;
 import java.math.BigInteger;
 import java.nio.ByteBuffer;
+import java.security.MessageDigest;
 import java.util.*;
 
 import com.google.common.annotations.VisibleForTesting;
@@ -45,11 +46,35 @@ public class RandomPartitioner implements IPartitioner
 public static final BigIntegerToken MINIMUM = new BigIntegerToken("-1");
 public static final BigInteger MAXIMUM = new BigInteger("2").pow(127);
 
-private static final int HEAP_SIZE = (int) ObjectSizes.measureDeep(new BigIntegerToken(FBUtilities.hashToBigInteger(ByteBuffer.allocate(1))));
+/**
+ * Maintain a separate threadlocal message digest, exclusively for token hashing. This is necessary because
+ * when Tracing is enabled and using the default tracing implementation, creating the mutations for the trace
+ * events involves tokenizing the partition keys. This happens multiple times whilst servicing a ReadCommand,
+ * and so can interfere with the stateful digest calculation if the node is a replica producing a digest response.
+ */
+private static final ThreadLocal<MessageDigest> localMD5Digest = new ThreadLocal<MessageDigest>()
+{
+@Override
+protected MessageDigest initialValue()
+{
+return FBUtilities.newMessageDigest("MD5");
+}
+
+@Override
+public MessageDigest get()
+{
+MessageDigest digest = super.get();
+digest.reset();
+return digest;
+}
+};
+
+private static final int HEAP_SIZE = (int) ObjectSizes.measureDeep(new BigIntegerToken(hashToBigInteger(ByteBuffer.allocate(1))));
 
 public static final RandomPartitioner instance = new RandomPartitioner();
 public static final AbstractType<?> partitionOrdering = new PartitionerDefinedOrder(instance);
 
+
 public DecoratedKey decorateKey(ByteBuffer key)
 {
 return new CachedHashDecoratedKey(getToken(key), key);
@@ -72,7 +97,7 @@ public class RandomPartitioner implements IPartitioner
 
 public BigIntegerToken getRandomToken()
 {
-BigInteger token = FBUtilities.hashToBigInteger(GuidGenerator.guidAsBytes());
+BigInteger token = hashToBigInteger(GuidGenerator.guidAsBytes());
 if ( token.signum() == -1 )
 token = token.multiply(BigInteger.valueOf(-1L));
 return new BigIntegerToken(token);
@@ -160,7 +185,8 @@ public class RandomPartitioner implements IPartitioner
 {
 if (key.remaining() == 0)
 return MINIMUM;
-return new BigIntegerToken(FBUtilities.hashToBigInteger(key));
+
+return new BigIntegerToken(hashToBigInteger(key));
 }
 
 public Map<Token, Float> describeOwnership(List<Token> sortedTokens)
@@ -203,4 +229,15 @@ public class 

[jira] [Commented] (CASSANDRA-13403) nodetool repair breaks SASI index

2017-11-07 Thread Ludovic Boutros (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242120#comment-16242120
 ] 

Ludovic Boutros commented on CASSANDRA-13403:
-

And then, if we rebuild the index:

{code}
INFO  [RMI TCP Connection(7)-10.53.0.15] 2017-11-07 15:44:34,456 
ColumnFamilyStore.java:806 - User Requested secondary index re-build for 
lubo_test/t_doc indexes: i_doc
DEBUG [RMI TCP Connection(7)-10.53.0.15] 2017-11-07 15:44:34,458 
ColumnFamilyStore.java:899 - Enqueuing flush of IndexInfo: 0,385KiB (0%) 
on-heap, 0,000KiB (0%) off-heap
DEBUG [PerDiskMemtableFlushWriter_0:5] 2017-11-07 15:44:34,514 
Memtable.java:461 - Writing Memtable-IndexInfo@1020363412(0,049KiB serialized 
bytes, 1 ops, 0%/0% of on/off-heap limit), flushed range = 
(min(-9223372036854775808), max(9223372036854775807)]
DEBUG [PerDiskMemtableFlushWriter_0:5] 2017-11-07 15:44:34,515 
Memtable.java:490 - Completed flushing 
/data/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/mc-18-big-Data.db
 (0,036KiB) for commitlog position CommitLogPosition(segmentId=1510062526702, 
position=2214781)
DEBUG [MemtableFlushWriter:5] 2017-11-07 15:44:34,644 
ColumnFamilyStore.java:1197 - Flushed to 
[BigTableReader(path='/data/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/mc-18-big-Data.db')]
 (1 sstables, 4,854KiB), biggest 4,854KiB, smallest 4,854KiB
INFO  [RMI TCP Connection(7)-10.53.0.15] 2017-11-07 15:44:34,644 
SecondaryIndexManager.java:365 - Submitting index build of i_doc for data in 
BigTableReader(path='/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-23-big-Data.db'),BigTableReader(path='/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-22-big-Data.db')
INFO  [CompactionExecutor:10] 2017-11-07 15:44:34,646 
PerSSTableIndexWriter.java:279 - Scheduling index flush to 
/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-22-big-SI_i_doc.db
INFO  [SASI-General:3] 2017-11-07 15:44:34,675 PerSSTableIndexWriter.java:330 - 
Index flush to 
/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-22-big-SI_i_doc.db
 took 28 ms.
{code}
{code}
INFO  [CompactionExecutor:10] 2017-11-07 15:44:34,676 DataTracker.java:152 - 
SSTableIndex.open(column: r, minTerm: 0, maxTerm: 0, minKey: 1, maxKey: 7, 
sstable: 
BigTableReader(path='/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-22-big-Data.db'))
{code}
{code}
INFO  [CompactionExecutor:10] 2017-11-07 15:44:34,677 
PerSSTableIndexWriter.java:279 - Scheduling index flush to 
/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-23-big-SI_i_doc.db
INFO  [SASI-General:3] 2017-11-07 15:44:34,683 PerSSTableIndexWriter.java:330 - 
Index flush to 
/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-23-big-SI_i_doc.db
 took 5 ms.
{code}
{code}
INFO  [CompactionExecutor:10] 2017-11-07 15:44:34,683 DataTracker.java:152 - 
SSTableIndex.open(column: r, minTerm: 0, maxTerm: 0, minKey: 11, maxKey: 9, 
sstable: 
BigTableReader(path='/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-23-big-Data.db'))
{code}
{code}
INFO  [RMI TCP Connection(7)-10.53.0.15] 2017-11-07 15:44:34,683 
SecondaryIndexManager.java:385 - Index build of i_doc complete
{code}

We can see the two DataTracker log lines.



> nodetool repair breaks SASI index
> -
>
> Key: CASSANDRA-13403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13403
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: 3.10
>Reporter: Igor Novgorodov
>Assignee: Alex Petrov
> Attachments: 3_nodes_compaction.log, 4_nodes_compaction.log
>
>
> I've got table:
> {code}
> CREATE TABLE cservice.bulks_recipients (
> recipient text,
> bulk_id uuid,
> datetime_final timestamp,
> datetime_sent timestamp,
> request_id uuid,
> status int,
> PRIMARY KEY (recipient, bulk_id)
> ) WITH CLUSTERING ORDER BY (bulk_id ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> CREATE CUSTOM INDEX bulk_recipients_bulk_id ON 

[jira] [Commented] (CASSANDRA-13997) Upgrade guava to 23.3

2017-11-07 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242118#comment-16242118
 ] 

Marcus Eriksson commented on CASSANDRA-13997:
-

pushed a commit that upgrades airline to 0.8
https://circleci.com/gh/krummas/cassandra/tree/marcuse%2Fguava23
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/421/

> Upgrade guava to 23.3
> -
>
> Key: CASSANDRA-13997
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13997
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 4.x
>
>
> For 4.0 we should upgrade guava to the latest version
> patch here: https://github.com/krummas/cassandra/commits/marcuse/guava23
> A bunch of quite commonly used methods have been deprecated since guava 18 
> which we use now ({{Throwables.propagate}} for example), this patch mostly 
> updates uses where compilation fails. {{Futures.transform(ListenableFuture 
> ..., AsyncFunction ...}} was deprecated in Guava 19 and removed in 20 for 
> example, we should probably open new tickets to remove calls to all 
> deprecated guava methods.
> Also had to add a dependency on {{com.google.j2objc.j2objc-annotations}}, to 
> avoid some build-time warnings (maybe due to 
> https://github.com/google/guava/commit/fffd2b1f67d158c7b4052123c5032b0ba54a910d
>  ?)
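
For reference, a minimal sketch of the kind of call-site changes this implies when 
moving past Guava 19/20 (illustrative class and method names, not the actual edits 
on the linked branch):

{code:java}
import com.google.common.base.Throwables;
import com.google.common.util.concurrent.AsyncFunction;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

class GuavaMigrationSketch
{
    // Replacement for the deprecated Throwables.propagate(t): rethrow unchecked
    // throwables as-is and wrap checked ones explicitly.
    static RuntimeException rethrow(Throwable t)
    {
        Throwables.throwIfUnchecked(t);
        throw new RuntimeException(t);
    }

    // Futures.transform(ListenableFuture, AsyncFunction) is gone; the async
    // overload now requires an explicit executor.
    static <A, B> ListenableFuture<B> transform(ListenableFuture<A> input, AsyncFunction<A, B> fn)
    {
        return Futures.transformAsync(input, fn, MoreExecutors.directExecutor());
    }
}
{code}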



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13997) Upgrade guava to 23.3

2017-11-07 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242108#comment-16242108
 ] 

Marcus Eriksson commented on CASSANDRA-13997:
-

dtest run yesterday timed out: 
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/420/
circle shows that we probably need to upgrade airline as well: 
https://circleci.com/gh/krummas/cassandra/171

> Upgrade guava to 23.3
> -
>
> Key: CASSANDRA-13997
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13997
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 4.x
>
>
> For 4.0 we should upgrade guava to the latest version
> patch here: https://github.com/krummas/cassandra/commits/marcuse/guava23
> A bunch of quite commonly used methods have been deprecated since guava 18 
> which we use now ({{Throwables.propagate}} for example), this patch mostly 
> updates uses where compilation fails. {{Futures.transform(ListenableFuture 
> ..., AsyncFunction ...}} was deprecated in Guava 19 and removed in 20 for 
> example, we should probably open new tickets to remove calls to all 
> deprecated guava methods.
> Also had to add a dependency on {{com.google.j2objc.j2objc-annotations}}, to 
> avoid some build-time warnings (maybe due to 
> https://github.com/google/guava/commit/fffd2b1f67d158c7b4052123c5032b0ba54a910d
>  ?)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14000) Remove v5 as a beta version from 3.11

2017-11-07 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-14000:

Description: 
Currently, V5 has only two features (if anyone knows other ones, please correct 
me): 

  * [CASSANDRA-10786]
  * [CASSANDRA-12838]

V5 "beta" mode was suggested in [CASSANDRA-12142], hoping that we can release 
more features quicker. However, we did not. 

I suggest we remove v5 protocol support from 3.11, as all the new features go 
into 4.0 anyway and the protocol is at an early stage, so most likely there 
will be a couple more changes.

UPDATE: [CASSANDRA-12838] adds a {{DURATION}} type, which can not be done any 
way other than bumping a protocol version. The problem is 

  was:
Currently, V5 has only two features (if anyone knows other ones, please correct 
me): 

  * [CASSANDRA-10786]
  * [CASSANDRA-12838]

V5 "beta" mode was suggested in [CASSANDRA-12142], hoping that we can release 
more features quicker. However, we did not. 

I suggest we remove v5 protocol support from 3.11, as all the new features go 
into 4.0 anyway and the protocol is at an early stage, so most likely there 
will be a couple more changes.

UPDATE: [CASSANDRA-12838] adds a {{DURATION}} type, which can not be done any 
way other than bumping a protocol version. 


> Remove v5 as a beta version from 3.11 
> --
>
> Key: CASSANDRA-14000
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14000
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Blocker
>
> Currently, V5 has only two features (if anyone knows other ones, please 
> correct me): 
>   * [CASSANDRA-10786]
>   * [CASSANDRA-12838]
> V5 "beta" mode was suggested in [CASSANDRA-12142], hoping that we can release 
> more features quicker. However, we did not. 
> I suggest we remove v5 protocol support from 3.11, as all the new features go 
> into 4.0 anyway and the protocol is at an early stage, so most likely there 
> will be a couple more changes.
> UPDATE: [CASSANDRA-12838] adds a {{DURATION}} type, which can not be done any 
> way other than bumping a protocol version. The problem is 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14000) Remove v5 as a beta version from 3.11

2017-11-07 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-14000:

Description: 
Currently, V5 has only two features (if anyone knows other ones, please correct 
me): 

  * [CASSANDRA-10786]
  * [CASSANDRA-12838]

V5 "beta" mode was suggested in [CASSANDRA-12142], hoping that we can release 
more features quicker. However, we did not. 

I suggest we remove v5 protocol support from 3.11, as all the new features go 
into 4.0 anyway and the protocol is at an early stage, so most likely there 
will be a couple more changes.

UPDATE: [CASSANDRA-12838] adds a {{DURATION}} type, which can not be done any 
way other than bumping a protocol version. 

  was:
Currently, V5 has only two features (if anyone knows other ones, please correct 
me): 

  * https://issues.apache.org/jira/browse/CASSANDRA-10145
  * https://issues.apache.org/jira/browse/CASSANDRA-10786
  * https://issues.apache.org/jira/browse/CASSANDRA-12838

V5 "beta" mode was suggested in [CASSANDRA-12142], hoping that we can release 
more features quicker. However, we did not. 

I suggest we remove v5 protocol support from 3.11, as all the new features go 
into 4.0 anyway and the protocol is at an early stage, so most likely there 
will be a couple more changes.


> Remove v5 as a beta version from 3.11 
> --
>
> Key: CASSANDRA-14000
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14000
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Blocker
>
> Currently, V5 has only two features (if anyone knows other ones, please 
> correct me): 
>   * [CASSANDRA-10786]
>   * [CASSANDRA-12838]
> V5 "beta" mode was suggested in [CASSANDRA-12142], hoping that we can release 
> more features quicker. However, we did not. 
> I suggest we remove v5 protocol support from 3.11, as all the new features go 
> into 4.0 anyway and the protocol is at an early stage, so most likely there 
> will be a couple more changes.
> UPDATE: [CASSANDRA-12838] adds a {{DURATION}} type, which can not be done any 
> way other than bumping a protocol version. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14000) Remove v5 as a beta version from 3.11

2017-11-07 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-14000:
-
Description: 
Currently, V5 has only two features (if anyone knows other ones, please correct 
me): 

  * https://issues.apache.org/jira/browse/CASSANDRA-10145
  * https://issues.apache.org/jira/browse/CASSANDRA-10786
  * https://issues.apache.org/jira/browse/CASSANDRA-12838

V5 "beta" mode was suggested in [CASSANDRA-12142], hoping that we can release 
more features quicker. However, we did not. 

I suggest we remove v5 protocol support from 3.11, as all the new features go 
into 4.0 anyway and the protocol is at an early stage, so most likely there 
will be a couple more changes.

  was:
Currently, V5 has only two features (if anyone knows other ones, please correct 
me): 

  * https://issues.apache.org/jira/browse/CASSANDRA-10786
  * https://issues.apache.org/jira/browse/CASSANDRA-12838

V5 "beta" mode was suggested in [CASSANDRA-12142], hoping that we can release 
more features quicker. However, we did not. 

I suggest we remove v5 protocol support from 3.11, as all the new features go 
into 4.0 anyway and the protocol is at an early stage, so most likely there 
will be a couple more changes.


> Remove v5 as a beta version from 3.11 
> --
>
> Key: CASSANDRA-14000
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14000
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Blocker
>
> Currently, V5 has only two features (if anyone knows other ones, please 
> correct me): 
>   * https://issues.apache.org/jira/browse/CASSANDRA-10145
>   * https://issues.apache.org/jira/browse/CASSANDRA-10786
>   * https://issues.apache.org/jira/browse/CASSANDRA-12838
> V5 "beta" mode was suggested in [CASSANDRA-12142], hoping that we can release 
> more features quicker. However, we did not. 
> I suggest we remove v5 protocol support from 3.11, as all the new features go 
> into 4.0 anyway and the protocol is at an early stage, so most likely there 
> will be a couple more changes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14000) Remove v5 as a beta version from 3.11

2017-11-07 Thread Alex Petrov (JIRA)
Alex Petrov created CASSANDRA-14000:
---

 Summary: Remove v5 as a beta version from 3.11 
 Key: CASSANDRA-14000
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14000
 Project: Cassandra
  Issue Type: Bug
Reporter: Alex Petrov
Assignee: Alex Petrov
Priority: Blocker


Currently, V5 has only two features (if anyone knows other ones, please correct 
me): 

  * https://issues.apache.org/jira/browse/CASSANDRA-10786
  * https://issues.apache.org/jira/browse/CASSANDRA-12838

V5 "beta" mode was suggested in [CASSANDRA-12142], hoping that we can release 
more features quicker. However, we did not. 

I suggest we remove v5 protocol support from 3.11, as all the new features go 
into 4.0 anyway and the protocol is at an early stage, so most likely there 
will be a couple more changes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13997) Upgrade guava to 23.3

2017-11-07 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-13997:
---
Reviewer: Stefan Podkowinski

> Upgrade guava to 23.3
> -
>
> Key: CASSANDRA-13997
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13997
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 4.x
>
>
> For 4.0 we should upgrade guava to the latest version
> patch here: https://github.com/krummas/cassandra/commits/marcuse/guava23
> A bunch of quite commonly used methods have been deprecated since guava 18 
> which we use now ({{Throwables.propagate}} for example), this patch mostly 
> updates uses where compilation fails. {{Futures.transform(ListenableFuture 
> ..., AsyncFunction ...}} was deprecated in Guava 19 and removed in 20 for 
> example, we should probably open new tickets to remove calls to all 
> deprecated guava methods.
> Also had to add a dependency on {{com.google.j2objc.j2objc-annotations}}, to 
> avoid some build-time warnings (maybe due to 
> https://github.com/google/guava/commit/fffd2b1f67d158c7b4052123c5032b0ba54a910d
>  ?)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13403) nodetool repair breaks SASI index

2017-11-07 Thread Ludovic Boutros (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242078#comment-16242078
 ] 

Ludovic Boutros commented on CASSANDRA-13403:
-

Another thing in the logs:

{code}
INFO  [CompactionExecutor:5] 2017-11-07 14:52:50,945 
CompactionManager.java:1472 - Performing anticompaction on 2 sstables
INFO  [CompactionExecutor:5] 2017-11-07 14:52:50,956 
CompactionManager.java:1509 - Anticompacting 
[BigTableReader(path='/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-21-big-Data.db'),
 
BigTableReader(path='/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-20-big-Data.db')]
INFO  [CompactionExecutor:5] 2017-11-07 14:52:51,308 
PerSSTableIndexWriter.java:279 - Scheduling index flush to 
/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-22-big-SI_i_doc.db
INFO  [SASI-General:2] 2017-11-07 14:52:51,343 PerSSTableIndexWriter.java:330 - 
Index flush to 
/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-22-big-SI_i_doc.db
 took 34 ms.
{code}

{code}
INFO  [CompactionExecutor:5] 2017-11-07 14:52:51,380 DataTracker.java:152 - 
SSTableIndex.open(column: r, minTerm: 0, maxTerm: 0, minKey: 1, maxKey: 7, 
sstable: 
BigTableReader(path='/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-22-big-Data.db'))
{code}


{code}
INFO  [CompactionExecutor:5] 2017-11-07 14:52:51,381 
PerSSTableIndexWriter.java:279 - Scheduling index flush to 
/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-23-big-SI_i_doc.db
INFO  [SASI-General:2] 2017-11-07 14:52:51,412 PerSSTableIndexWriter.java:330 - 
Index flush to 
/data/cassandra/data/lubo_test/t_doc-64343790c31611e7a46403e2ed27ae86/mc-23-big-SI_i_doc.db
 took 31 ms.
INFO  [CompactionExecutor:5] 2017-11-07 14:52:51,413 
CompactionManager.java:1488 - Anticompaction completed successfully, 
anticompacted from 0 to 2 sstable(s).
INFO  [CompactionExecutor:5] 2017-11-07 14:52:51,413 CompactionManager.java:694 
- [repair #f1539d30-c3c2-11e7-8fe4-090a7aa7154d] Completed anticompaction 
successfully
INFO  [InternalResponseStage:14] 2017-11-07 14:52:51,782 
RepairRunnable.java:340 - Repair command #1 finished in 1 second
{code}

The second index on the second SSTable does not seem to be opened/finished. 

And the only known keys are between [1 and 7], which matches the query result: 

{code:SQL}
cassandra@cqlsh> SELECT * from lubo_test.t_doc where r = 0;

 id | r | cid
+---+--
  6 | 0 | 66f68be0-c316-11e7-a464-03e2ed27ae86
  7 | 0 | 66f74f30-c316-11e7-a464-03e2ed27ae86
 10 | 0 | 66faaa90-c316-11e7-a464-03e2ed27ae86
  4 | 0 | 66f46900-c316-11e7-a464-03e2ed27ae86
  3 | 0 | 66f37ea0-c316-11e7-a464-03e2ed27ae86
  5 | 0 | 66f5a180-c316-11e7-a464-03e2ed27ae86
  2 | 0 | 66f29440-c316-11e7-a464-03e2ed27ae86
  1 | 0 | 66ea56e0-c316-11e7-a464-03e2ed27ae86

(8 rows)
{code}


> nodetool repair breaks SASI index
> -
>
> Key: CASSANDRA-13403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13403
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: 3.10
>Reporter: Igor Novgorodov
>Assignee: Alex Petrov
> Attachments: 3_nodes_compaction.log, 4_nodes_compaction.log
>
>
> I've got table:
> {code}
> CREATE TABLE cservice.bulks_recipients (
> recipient text,
> bulk_id uuid,
> datetime_final timestamp,
> datetime_sent timestamp,
> request_id uuid,
> status int,
> PRIMARY KEY (recipient, bulk_id)
> ) WITH CLUSTERING ORDER BY (bulk_id ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> CREATE CUSTOM INDEX bulk_recipients_bulk_id ON cservice.bulks_recipients 
> (bulk_id) USING 'org.apache.cassandra.index.sasi.SASIIndex';
> {code}
> There are 11 rows in it:
> {code}
> > select * from bulks_recipients;
> ...
> (11 rows)
> {code}
> Let's query by index (all rows have the same *bulk_id*):
> {code}
> > select * from bulks_recipients where bulk_id = 
> > baa94815-e276-4ca4-adda-5b9734e6c4a5;   
> >   
> ...
> (11 rows)
> {code}
> 

[jira] [Updated] (CASSANDRA-12838) Extend native protocol flags and add supported versions to the SUPPORTED response

2017-11-07 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-12838:

Labels: client-impacting protocolv5  (was: client-impacting)

> Extend native protocol flags and add supported versions to the SUPPORTED 
> response
> -
>
> Key: CASSANDRA-12838
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12838
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: CQL
>Reporter: Stefania
>Assignee: Stefania
>  Labels: client-impacting, protocolv5
> Fix For: 3.10
>
>
> We already use 7 bits for the flags of the QUERY message, and since they are 
> encoded with a fixed size byte, we may be forced to change the structure of 
> the message soon, and I'd like to do this in version 5 but without wasting 
> bytes on the wire. Therefore, I propose to convert fixed flag's bytes to 
> unsigned vints, as defined in CASSANDRA-9499. The only exception would be the 
> flags in the frame, which should stay as fixed size.
> Up to 7 bits, vints are encoded the same as bytes are, so no immediate change 
> would be required in the drivers, although they should plan to support vint 
> flags if supporting version 5. Moving forward, when a new flag is required 
> for the QUERY message, and eventually when other flags reach 8 bits in other 
> messages too, the flag's bitmaps would be automatically encoded with a size 
> that is big enough to accommodate all flags, but no bigger than required. We 
> can currently support up to 8 bytes with unsigned vints.
> The downside is that drivers need to implement unsigned vint encoding for 
> version 5, but this is already required by CASSANDRA-11873, and will most 
> likely be required by CASSANDRA-11622 as well.
> I would also like to add the list of versions to the SUPPORTED message, in 
> order to simplify the handshake for drivers that prefer to send an OPTION 
> message, rather than rely on receiving an error for an unsupported version in 
> the STARTUP message. Said error should also contain the full list of 
> supported versions, not just the min and max, for clarity, and because the 
> latest version is now a beta version.
> Finally, we currently store versions as integer constants in {{Server.java}}, 
> and we still have a fair bit of hard-coded numbers in the code, especially in 
> tests. I plan to clean this up by introducing a {{ProtocolVersion}} enum.
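
For anyone unfamiliar with the encoding referred to above, here is a minimal 
sketch of the general unsigned-vint idea (7 payload bits per byte, high bit set 
when more bytes follow); it is only an illustration, not Cassandra's actual 
VIntCoding layout from CASSANDRA-9499:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class UnsignedVIntSketch
{
    // Values below 128 occupy a single byte identical to the raw byte,
    // which is why existing one-byte flag fields stay wire-compatible.
    static void write(long value, ByteArrayOutputStream out)
    {
        while ((value & ~0x7fL) != 0)
        {
            out.write((int) ((value & 0x7f) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static long read(ByteArrayInputStream in)
    {
        long value = 0;
        int shift = 0;
        int b;
        do
        {
            b = in.read();
            value |= (long) (b & 0x7f) << shift;
            shift += 7;
        }
        while ((b & 0x80) != 0);
        return value;
    }
}
{code}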



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13971) Automatic certificate management using Vault

2017-11-07 Thread Jeff Mitchell (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242008#comment-16242008
 ] 

Jeff Mitchell commented on CASSANDRA-13971:
---

Woah, I totally forgot I had an Apache JIRA account.

You're welcome!


> Automatic certificate management using Vault
> 
>
> Key: CASSANDRA-13971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13971
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Streaming and Messaging
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Fix For: 4.x
>
>
> We've been adding security features during the last years to enable users to 
> secure their clusters, if they are willing to use them and do so correctly. 
> Some features are powerful and easy to work with, such as role based 
> authorization. Other features that require to manage a local keystore are 
> rather painful to deal with. Think about setting up SSL..
> To be fair, keystore related issues and certificate handling hasn't been 
> invented by us. We're just following Java standards there. But that doesn't 
> mean that we absolutely have to, if there are better options. I'd like to 
> give it a shot and find out if we can automate certificate/key handling 
> (PKI) by using external APIs. In this case, the implementation will be based 
> on [Vault|https://vaultproject.io]. But certificate management services 
> offered by cloud providers may also be able to handle the use-case and I 
> intend to create a generic, pluggable API for that.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13996) Close DataInputBuffer in MetadataSerializer

2017-11-07 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242007#comment-16242007
 ] 

Aleksey Yeschenko commented on CASSANDRA-13996:
---

Yeah. I actually saw that and made an explicit call to leave it be, as that 
close is an obvious no-op, and not having it made the code cleaner.

Go ahead if you feel like it, but on principle I'm no fan of making code a bit 
worse to please tooling or unit tests. Is there an annotation to suppress the 
warnings instead, maybe?
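
For reference, the two usual options are a try-with-resources block (so the 
resource check sees an explicit close, even if that close is a no-op) or 
{{@SuppressWarnings("resource")}} on the offending method. A minimal illustration 
with hypothetical stand-in types and methods, not the actual MetadataSerializer 
code:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class ResourceWarningSketch
{
    // Option 1: scope the stream in try-with-resources; closing it costs
    // nothing, and the resource check is satisfied.
    static int readVersionClosed(byte[] bytes) throws IOException
    {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes)))
        {
            return in.readInt();
        }
    }

    // Option 2: keep the cleaner code and silence the check on the method.
    @SuppressWarnings("resource")
    static int readVersionUnclosed(byte[] bytes) throws IOException
    {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        return in.readInt();
    }
}
{code}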

> Close DataInputBuffer in MetadataSerializer
> ---
>
> Key: CASSANDRA-13996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13996
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 4.x
>
>
> eclipse-warnings complains about this, either introduced by CASSANDRA-13321 
> or CASSANDRA-13953
> Patch here: https://github.com/krummas/cassandra/commits/marcuse/closeDIB
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/416/
> https://circleci.com/gh/krummas/cassandra/170



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13971) Automatic certificate management using Vault

2017-11-07 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242001#comment-16242001
 ] 

Stefan Podkowinski commented on CASSANDRA-13971:


Work here is coming along nicely. I'm now done with a first implementation 
that allows me to authenticate against Vault and retrieve certificates. There's 
also a dtest that would download vault (a static go executable), spin up an 
instance and bootstrap a Cassandra cluster with SSL Vault support enabled.

There are still some aspects that need some more test coverage, such as 
certificate renewal for running Cassandra instances. But I don't see any major 
blockers on the way so far.

As for Vault, I've found that Java/JCA is a bit limited when it comes to 
supported RSA private key encodings, and Vault's PKCS#1 encoded keys could not 
be read using the Java standard classes. But a 
[PR|https://github.com/hashicorp/vault/pull/3518] has been merged recently that 
will enable PKCS#8 support in one of the upcoming Vault releases, which is 
going to solve this issue (thanks  [~jeffm]!).
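
For context, reading an unencrypted PKCS#8 key with plain JCA is straightforward, 
which is what the upcoming Vault change enables; a minimal sketch (assuming a PEM 
string with a BEGIN PRIVATE KEY header, error handling omitted), whereas PKCS#1 
(BEGIN RSA PRIVATE KEY) blocks need extra parsing:

{code:java}
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.Base64;

class Pkcs8Sketch
{
    // Decode an unencrypted PKCS#8 PEM block and hand it to the standard JCA classes.
    static PrivateKey readPkcs8(String pem) throws Exception
    {
        String base64 = pem.replace("-----BEGIN PRIVATE KEY-----", "")
                           .replace("-----END PRIVATE KEY-----", "")
                           .replaceAll("\\s", "");
        byte[] der = Base64.getDecoder().decode(base64);
        return KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(der));
    }
}
{code}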

> Automatic certificate management using Vault
> 
>
> Key: CASSANDRA-13971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13971
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Streaming and Messaging
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Fix For: 4.x
>
>
> We've been adding security features during the last years to enable users to 
> secure their clusters, if they are willing to use them and do so correctly. 
> Some features are powerful and easy to work with, such as role based 
> authorization. Other features that require to manage a local keystore are 
> rather painful to deal with. Think about setting up SSL..
> To be fair, keystore related issues and certificate handling hasn't been 
> invented by us. We're just following Java standards there. But that doesn't 
> mean that we absolutely have to, if there are better options. I'd like to 
> give it a shot and find out if we can automate certificate/key handling 
> (PKI) by using external APIs. In this case, the implementation will be based 
> on [Vault|https://vaultproject.io]. But certificate management services 
> offered by cloud providers may also be able to handle the use-case and I 
> intend to create a generic, pluggable API for that.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13964) Tracing interferes with digest requests when using RandomPartitioner

2017-11-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241998#comment-16241998
 ] 

ASF GitHub Bot commented on CASSANDRA-13964:


GitHub user beobal opened a pull request:

https://github.com/apache/cassandra-dtest/pull/10

Add test for digest requests with RandomPartitioner and tracing enabled

Patch by Sam Tunnicliffe; reviewed by Jason Brown for CASSANDRA-13964

@ptnapoleon: Jason already gave this the once over, but if you have chance 
I'd appreciate your +1 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/beobal/cassandra-dtest 13964

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra-dtest/pull/10.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10


commit edc48bc965e842628413cfd50a7a21071d7b098a
Author: Sam Tunnicliffe 
Date:   2017-10-17T13:50:25Z

Add test for digest requests with RandomPartitioner and tracing enabled

Patch by Sam Tunnicliffe; reviewed by Jason Brown for CASSANDRA-13964




> Tracing interferes with digest requests when using RandomPartitioner
> 
>
> Key: CASSANDRA-13964
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13964
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths, Observability
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>
> A {{ThreadLocal}} is used to generate the MD5 digest when a 
> replica serves a read command and the {{isDigestQuery}} flag is set. The same 
> threadlocal is also used by {{RandomPartitioner}} to decorate partition keys. 
> So in a cluster with RP, if tracing is enabled the data digest is corrupted 
> by the partitioner making tokens for the tracing mutations. This causes a 
> digest mismatch on the coordinator, triggering a full data read on every read 
> where CL > 1 (or speculative execution/read repair kick in).
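
For reference, a minimal, self-contained sketch of the fix pattern described 
above and used in the committed patch, i.e. a dedicated, self-resetting 
per-thread digest used only for token hashing (class and helper names here are 
illustrative, not the actual RandomPartitioner code):

{code:java}
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class TokenHashDigest
{
    private static final ThreadLocal<MessageDigest> LOCAL_MD5 = new ThreadLocal<MessageDigest>()
    {
        @Override
        protected MessageDigest initialValue()
        {
            try
            {
                return MessageDigest.getInstance("MD5");
            }
            catch (NoSuchAlgorithmException e)
            {
                throw new AssertionError(e);
            }
        }

        @Override
        public MessageDigest get()
        {
            // Always hand out a clean digest, so tokenizing keys for tracing
            // mutations can no longer corrupt an in-progress data digest.
            MessageDigest digest = super.get();
            digest.reset();
            return digest;
        }
    };

    static byte[] hash(byte[] key)
    {
        return LOCAL_MD5.get().digest(key);
    }
}
{code}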



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13999) Segfault during memtable flush

2017-11-07 Thread Ricardo Bartolome (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ricardo Bartolome updated CASSANDRA-13999:
--
Description: 
We are getting segfaults on a production Cassandra cluster, apparently caused 
by Memtable flushes to disk.
{code}
Current thread (0x0cd77920):  JavaThread 
"PerDiskMemtableFlushWriter_0:140" daemon [_thread_in_Java, id=28952, 
stack(0x7f8b7aa53000,0x7f8b7aa94000)]
{code}

Stack
{code}
Stack: [0x7f8b7aa53000,0x7f8b7aa94000],  sp=0x7f8b7aa924a0,  free 
space=253k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 21889 C2 
org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(Lorg/apache/cassandra/db/rows/UnfilteredRowIterator;)Lorg/apache/cassandra/db/RowIndexEntry;
 (361 bytes) @ 0x7f8e9fcf75ac [0x7f8e9fcf42c0+0x32ec]
J 22464 C2 
org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents()V (383 
bytes) @ 0x7f8e9f17b988 [0x7f8e9f17b5c0+0x3c8]
j  
org.apache.cassandra.db.Memtable$FlushRunnable.call()Lorg/apache/cassandra/io/sstable/SSTableMultiWriter;+1
j  org.apache.cassandra.db.Memtable$FlushRunnable.call()Ljava/lang/Object;+1
J 18865 C2 java.util.concurrent.FutureTask.run()V (126 bytes) @ 
0x7f8e9d3c9540 [0x7f8e9d3c93a0+0x1a0]
J 21832 C2 
java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
 (225 bytes) @ 0x7f8e9f16856c [0x7f8e9f168400+0x16c]
J 6720 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
0x7f8e9def73c4 [0x7f8e9def72c0+0x104]
J 22079 C2 java.lang.Thread.run()V (17 bytes) @ 0x7f8e9e67c4ac 
[0x7f8e9e67c460+0x4c]
v  ~StubRoutines::call_stub
V  [libjvm.so+0x691d16]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
JavaCallArguments*, Thread*)+0x1056
V  [libjvm.so+0x692221]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
V  [libjvm.so+0x6926c7]  JavaCalls::call_virtual(JavaValue*, Handle, 
KlassHandle, Symbol*, Symbol*, Thread*)+0x47
V  [libjvm.so+0x72da50]  thread_entry(JavaThread*, Thread*)+0xa0
V  [libjvm.so+0xa76833]  JavaThread::thread_main_inner()+0x103
V  [libjvm.so+0xa7697c]  JavaThread::run()+0x11c
V  [libjvm.so+0x927568]  java_start(Thread*)+0x108
C  [libpthread.so.0+0x7de5]  start_thread+0xc5
{code}

For further details, we attached:
* JVM error file with all details
* cassandra config file (we are using offheap_buffers as 
memtable_allocation_method)
* some lines printed in debug.log when the JVM error file was created and 
process died

h5. Reproducing the issue
So far we have been unable to reproduce it. It happens once or twice a week on 
single nodes, during both high-load and low-load periods. We have seen that when 
we replace EC2 instances and bootstrap new ones, due to compactions happening on 
source nodes before the stream starts, sometimes more than a single node was 
affected by this, leaving us with 2 out of 3 replicas down and 
UnavailableExceptions in the cluster.

This issue might be related to CASSANDRA-12590 (Segfault reading secondary 
index), even though this is the write path. Can someone confirm whether both 
issues could be related? 

h5. Specifics of our scenario:
* Cassandra 3.9 on Amazon Linux (before this we were running Cassandra 2.0.9 
with no record of this happening, though I was not working on Cassandra at the 
time)
* 12 x i3.2xlarge EC2 instances (8 core, 64GB RAM)
* a total of 176 keyspaces (there is a per-customer pattern)
** Some keyspaces have a single table, while others have 2 or 5 tables
** There is a table that uses standard Secondary Indexes ("emailindex" on 
"user_info" table)
* It happens on both Oracle JDK 1.8.0_112 and 1.8.0_131
* It happens in both kernel 4.9.43-17.38.amzn1.x86_64 and 
3.14.35-28.38.amzn1.x86_64


h5. Possible workarounds/solutions that we have in mind (still to be validated)
* switching to heap_buffers (in case offheap_buffers triggers the bug), though 
we still need to measure the performance degradation under that scenario.
* removing secondary indexes in favour of Materialized Views for this specific 
case, though we are also concerned that using MVs introduces new issues that 
may be present in our current Cassandra 3.9
* Upgrading to 3.11.1 is an option, but we are trying to keep it as a last 
resort, given that the cost of migrating is big and we don't have any guarantee 
that new bugs affecting node availability are not introduced.

  was:
We are getting segfaults on a production Cassandra cluster, apparently caused 
by Memtable flushes to disk.
{code}
Current thread (0x0cd77920):  JavaThread 
"PerDiskMemtableFlushWriter_0:140" daemon [_thread_in_Java, id=28952, 
stack(0x7f8b7aa53000,0x7f8b7aa94000)]
{code}

Stack
{code}
Stack: [0x7f8b7aa53000,0x7f8b7aa94000],  sp=0x7f8b7aa924a0,  free 

[jira] [Updated] (CASSANDRA-13999) Segfault during memtable flush

2017-11-07 Thread Ricardo Bartolome (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ricardo Bartolome updated CASSANDRA-13999:
--
Environment: 
* Cassandra 3.9
* Oracle JDK 1.8.0_112 and 1.8.0_131
* Kernel 4.9.43-17.38.amzn1.x86_64 and 3.14.35-28.38.amzn1.x86_64

> Segfault during memtable flush
> --
>
> Key: CASSANDRA-13999
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13999
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
> Environment: * Cassandra 3.9
> * Oracle JDK 1.8.0_112 and 1.8.0_131
> * Kernel 4.9.43-17.38.amzn1.x86_64 and 3.14.35-28.38.amzn1.x86_64
>Reporter: Ricardo Bartolome
>Priority: Critical
> Attachments: 
> cassandra-jvm-file-error-1509698372-pid16151.log.obfuscated, 
> cassandra_config.yaml, node_crashing_debug.log
>
>
> We are getting segfaults on a production Cassandra cluster, apparently caused 
> by Memtable flushes to disk.
> {code}
> Current thread (0x0cd77920):  JavaThread 
> "PerDiskMemtableFlushWriter_0:140" daemon [_thread_in_Java, id=28952, 
> stack(0x7f8b7aa53000,0x7f8b7aa94000)]
> {code}
> Stack
> {code}
> Stack: [0x7f8b7aa53000,0x7f8b7aa94000],  sp=0x7f8b7aa924a0,  free 
> space=253k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> J 21889 C2 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(Lorg/apache/cassandra/db/rows/UnfilteredRowIterator;)Lorg/apache/cassandra/db/RowIndexEntry;
>  (361 bytes) @ 0x7f8e9fcf75ac [0x7f8e9fcf42c0+0x32ec]
> J 22464 C2 
> org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents()V (383 
> bytes) @ 0x7f8e9f17b988 [0x7f8e9f17b5c0+0x3c8]
> j  
> org.apache.cassandra.db.Memtable$FlushRunnable.call()Lorg/apache/cassandra/io/sstable/SSTableMultiWriter;+1
> j  org.apache.cassandra.db.Memtable$FlushRunnable.call()Ljava/lang/Object;+1
> J 18865 C2 java.util.concurrent.FutureTask.run()V (126 bytes) @ 
> 0x7f8e9d3c9540 [0x7f8e9d3c93a0+0x1a0]
> J 21832 C2 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
>  (225 bytes) @ 0x7f8e9f16856c [0x7f8e9f168400+0x16c]
> J 6720 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
> 0x7f8e9def73c4 [0x7f8e9def72c0+0x104]
> J 22079 C2 java.lang.Thread.run()V (17 bytes) @ 0x7f8e9e67c4ac 
> [0x7f8e9e67c460+0x4c]
> v  ~StubRoutines::call_stub
> V  [libjvm.so+0x691d16]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
> JavaCallArguments*, Thread*)+0x1056
> V  [libjvm.so+0x692221]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
> Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
> V  [libjvm.so+0x6926c7]  JavaCalls::call_virtual(JavaValue*, Handle, 
> KlassHandle, Symbol*, Symbol*, Thread*)+0x47
> V  [libjvm.so+0x72da50]  thread_entry(JavaThread*, Thread*)+0xa0
> V  [libjvm.so+0xa76833]  JavaThread::thread_main_inner()+0x103
> V  [libjvm.so+0xa7697c]  JavaThread::run()+0x11c
> V  [libjvm.so+0x927568]  java_start(Thread*)+0x108
> C  [libpthread.so.0+0x7de5]  start_thread+0xc5
> {code}
> For further details, we attached:
> * JVM error file with all details
> * cassandra config file (we are using offheap_buffers as 
> memtable_allocation_method)
> * some lines printed in debug.log when the JVM error file was created and 
> process died
> h5. Reproducing the issue
> So far we have been unable to reproduce it. It happens once/twice a week on 
> single nodes. It happens either during high load or low load times. We have 
> seen that when we replace EC2 instances and bootstrap new ones, due to 
> compactions happening on source nodes before streaming starts, sometimes more 
> than a single node was affected by this, leaving us with 2 out of 3 replicas 
> down and UnavailableExceptions in the cluster.
> This issue might be related to CASSANDRA-12590 (Segfault reading secondary 
> index), even though this is the write path. Can someone confirm whether both 
> issues could be related? 
> h5. Specifics of our scenario:
> * Cassandra 3.9 on Amazon Linux (before this we were running Cassandra 2.0.9, 
> and there are no records of this happening then, although I was not working 
> on Cassandra at the time)
> * 12 x i3.2xlarge EC2 instances (8 core, 64GB RAM)
> * a total of 176 keyspaces (there is a per-customer pattern)
> ** Some keyspaces have a single table, while others have 2 or 5 tables
> ** There is a table that uses standard Secondary Indexes ("emailindex" on 
> "user_info" table)
> * It happens on both Oracle JDK 1.8.0_112 and 1.8.0_131
> * It happens in both kernel 4.9.43-17.38.amzn1.x86_64 and 
> 3.14.35-28.38.amzn1.x86_64
> h5. Possible workarounds/solutions (yet to be validated)
> * switching to heap_buffers (in case offheap_buffers triggers the bug), 
> although we are still 

[jira] [Commented] (CASSANDRA-13999) Segfault during memtable flush

2017-11-07 Thread Ricardo Bartolome (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241970#comment-16241970
 ] 

Ricardo Bartolome commented on CASSANDRA-13999:
---

The schema of the table that contains the index used to query by user email is 
the following:
{code}
CREATE TABLE customer_user.user_info (
user_id text PRIMARY KEY,
user_accept_email boolean,
user_email text,
user_last_modified timestamp,
user_locale text,
user_metadata text,
user_name text,
user_profile_picture text,
user_site text,
user_timezone text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX emailindex ON customer_user.user_info (user_email);
{code}

> Segfault during memtable flush
> --
>
> Key: CASSANDRA-13999
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13999
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Ricardo Bartolome
>Priority: Critical
> Attachments: 
> cassandra-jvm-file-error-1509698372-pid16151.log.obfuscated, 
> cassandra_config.yaml, node_crashing_debug.log
>
>
> We are getting segfaults on a production Cassandra cluster, apparently caused 
> by Memtable flushes to disk.
> {code}
> Current thread (0x0cd77920):  JavaThread 
> "PerDiskMemtableFlushWriter_0:140" daemon [_thread_in_Java, id=28952, 
> stack(0x7f8b7aa53000,0x7f8b7aa94000)]
> {code}
> Stack
> {code}
> Stack: [0x7f8b7aa53000,0x7f8b7aa94000],  sp=0x7f8b7aa924a0,  free 
> space=253k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> J 21889 C2 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(Lorg/apache/cassandra/db/rows/UnfilteredRowIterator;)Lorg/apache/cassandra/db/RowIndexEntry;
>  (361 bytes) @ 0x7f8e9fcf75ac [0x7f8e9fcf42c0+0x32ec]
> J 22464 C2 
> org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents()V (383 
> bytes) @ 0x7f8e9f17b988 [0x7f8e9f17b5c0+0x3c8]
> j  
> org.apache.cassandra.db.Memtable$FlushRunnable.call()Lorg/apache/cassandra/io/sstable/SSTableMultiWriter;+1
> j  org.apache.cassandra.db.Memtable$FlushRunnable.call()Ljava/lang/Object;+1
> J 18865 C2 java.util.concurrent.FutureTask.run()V (126 bytes) @ 
> 0x7f8e9d3c9540 [0x7f8e9d3c93a0+0x1a0]
> J 21832 C2 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
>  (225 bytes) @ 0x7f8e9f16856c [0x7f8e9f168400+0x16c]
> J 6720 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
> 0x7f8e9def73c4 [0x7f8e9def72c0+0x104]
> J 22079 C2 java.lang.Thread.run()V (17 bytes) @ 0x7f8e9e67c4ac 
> [0x7f8e9e67c460+0x4c]
> v  ~StubRoutines::call_stub
> V  [libjvm.so+0x691d16]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
> JavaCallArguments*, Thread*)+0x1056
> V  [libjvm.so+0x692221]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
> Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
> V  [libjvm.so+0x6926c7]  JavaCalls::call_virtual(JavaValue*, Handle, 
> KlassHandle, Symbol*, Symbol*, Thread*)+0x47
> V  [libjvm.so+0x72da50]  thread_entry(JavaThread*, Thread*)+0xa0
> V  [libjvm.so+0xa76833]  JavaThread::thread_main_inner()+0x103
> V  [libjvm.so+0xa7697c]  JavaThread::run()+0x11c
> V  [libjvm.so+0x927568]  java_start(Thread*)+0x108
> C  [libpthread.so.0+0x7de5]  start_thread+0xc5
> {code}
> For further details, we attached:
> * JVM error file with all details
> * cassandra config file (we are using offheap_buffers as 
> memtable_allocation_method)
> * some lines printed in debug.log when the JVM error file was created and 
> process died
> h5. Reproducing the issue
> So far we have been unable to reproduce it. It happens once/twice a week on 
> single nodes. It happens either during high load or low load times. We have 
> seen that when we replace EC2 instances and bootstrap new ones, due to 
> compactions happening on source nodes before streaming starts, sometimes more 
> than a single node was affected by this, leaving us with 2 out of 3 replicas 
> down and UnavailableExceptions in the cluster.
> This issue might be related to CASSANDRA-12590 

[jira] [Issue Comment Deleted] (CASSANDRA-10857) Allow dropping COMPACT STORAGE flag from tables in 3.X

2017-11-07 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-10857:

Comment: was deleted

(was: Github user ptnapoleon commented on a diff in the pull request:

https://github.com/apache/cassandra-dtest/pull/9#discussion_r149296143
  
--- Diff: cql_tests.py ---
@@ -698,6 +719,54 @@ def many_columns_test(self):
",".join(map(lambda i: "c_{}".format(i), range(width))) 
+
" FROM very_wide_table", [[i for i in range(width)]])
 
+@since("3.11", max_version="3.X")
+def drop_compact_storage_flag_test(self):
+"""
+Test for CASSANDRA-10857, verifying the schema change
+distribution across the other nodes.
+
+"""
+
+cluster = self.cluster
+
+cluster.populate(3).start()
+node1 = cluster.nodelist()[0]
+node2 = cluster.nodelist()[1]
+node3 = cluster.nodelist()[2]
+time.sleep(0.2)
--- End diff --

There's no need for this sleep.
)

> Allow dropping COMPACT STORAGE flag from tables in 3.X
> --
>
> Key: CASSANDRA-10857
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10857
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Distributed Metadata
>Reporter: Aleksey Yeschenko
>Assignee: Alex Petrov
>Priority: Blocker
>  Labels: client-impacting
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> Thrift allows users to define flexible mixed column families - where certain 
> columns would have explicitly pre-defined names, potentially non-default 
> validation types, and be indexed.
> Example:
> {code}
> create column family foo
> and default_validation_class = UTF8Type
> and column_metadata = [
> {column_name: bar, validation_class: Int32Type, index_type: KEYS},
> {column_name: baz, validation_class: UUIDType, index_type: KEYS}
> ];
> {code}
> Columns named {{bar}} and {{baz}} will be validated as {{Int32Type}} and 
> {{UUIDType}}, respectively, and be indexed. Columns with any other name will 
> be validated by {{UTF8Type}} and will not be indexed.
> With CASSANDRA-8099, {{bar}} and {{baz}} would be mapped to static columns 
> internally. However, being {{WITH COMPACT STORAGE}}, the table will only 
> expose {{bar}} and {{baz}} columns. Accessing any dynamic columns (any column 
> not named {{bar}} and {{baz}}) right now requires going through Thrift.
> This is blocking Thrift -> CQL migration for users who have mixed 
> dynamic/static column families. That said, it *shouldn't* be hard to allow 
> users to drop the {{compact}} flag to expose the table as it is internally 
> now, and be able to access all columns.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Issue Comment Deleted] (CASSANDRA-10857) Allow dropping COMPACT STORAGE flag from tables in 3.X

2017-11-07 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-10857:

Comment: was deleted

(was: Github user ptnapoleon commented on a diff in the pull request:

https://github.com/apache/cassandra-dtest/pull/9#discussion_r149296195
  
--- Diff: cql_tests.py ---
@@ -698,6 +719,54 @@ def many_columns_test(self):
",".join(map(lambda i: "c_{}".format(i), range(width))) 
+
" FROM very_wide_table", [[i for i in range(width)]])
 
+@since("3.11", max_version="3.X")
+def drop_compact_storage_flag_test(self):
+"""
+Test for CASSANDRA-10857, verifying the schema change
+distribution across the other nodes.
+
+"""
+
+cluster = self.cluster
+
+cluster.populate(3).start()
+node1 = cluster.nodelist()[0]
--- End diff --

It's much more concise to just write `node1, node2, node3 = 
cluster.nodelist()`
)

> Allow dropping COMPACT STORAGE flag from tables in 3.X
> --
>
> Key: CASSANDRA-10857
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10857
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Distributed Metadata
>Reporter: Aleksey Yeschenko
>Assignee: Alex Petrov
>Priority: Blocker
>  Labels: client-impacting
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> Thrift allows users to define flexible mixed column families - where certain 
> columns would have explicitly pre-defined names, potentially non-default 
> validation types, and be indexed.
> Example:
> {code}
> create column family foo
> and default_validation_class = UTF8Type
> and column_metadata = [
> {column_name: bar, validation_class: Int32Type, index_type: KEYS},
> {column_name: baz, validation_class: UUIDType, index_type: KEYS}
> ];
> {code}
> Columns named {{bar}} and {{baz}} will be validated as {{Int32Type}} and 
> {{UUIDType}}, respectively, and be indexed. Columns with any other name will 
> be validated by {{UTF8Type}} and will not be indexed.
> With CASSANDRA-8099, {{bar}} and {{baz}} would be mapped to static columns 
> internally. However, being {{WITH COMPACT STORAGE}}, the table will only 
> expose {{bar}} and {{baz}} columns. Accessing any dynamic columns (any column 
> not named {{bar}} and {{baz}}) right now requires going through Thrift.
> This is blocking Thrift -> CQL migration for users who have mixed 
> dynamic/static column families. That said, it *shouldn't* be hard to allow 
> users to drop the {{compact}} flag to expose the table as it is internally 
> now, and be able to access all columns.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Issue Comment Deleted] (CASSANDRA-10857) Allow dropping COMPACT STORAGE flag from tables in 3.X

2017-11-07 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-10857:

Comment: was deleted

(was: Github user ptnapoleon commented on a diff in the pull request:

https://github.com/apache/cassandra-dtest/pull/9#discussion_r149296242
  
--- Diff: cql_tests.py ---
@@ -698,6 +719,54 @@ def many_columns_test(self):
",".join(map(lambda i: "c_{}".format(i), range(width))) 
+
" FROM very_wide_table", [[i for i in range(width)]])
 
+@since("3.11", max_version="3.X")
+def drop_compact_storage_flag_test(self):
+"""
+Test for CASSANDRA-10857, verifying the schema change
+distribution across the other nodes.
+
+"""
+
+cluster = self.cluster
+
+cluster.populate(3).start()
+node1 = cluster.nodelist()[0]
+node2 = cluster.nodelist()[1]
+node3 = cluster.nodelist()[2]
+time.sleep(0.2)
+
+session1 = self.patient_cql_connection(node1)
+session2 = self.patient_cql_connection(node2)
+session3 = self.patient_cql_connection(node3)
+self.create_ks(session1, 'ks', 3)
+sessions = [session1, session2, session3]
+
+for session in sessions:
+session.set_keyspace('ks')
+
+session1.execute("""
+CREATE TABLE test_drop_compact_storage (k int PRIMARY KEY, s1 
int) WITH COMPACT STORAGE;
+""")
+time.sleep(1)
--- End diff --

No need for this sleep.
)

> Allow dropping COMPACT STORAGE flag from tables in 3.X
> --
>
> Key: CASSANDRA-10857
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10857
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Distributed Metadata
>Reporter: Aleksey Yeschenko
>Assignee: Alex Petrov
>Priority: Blocker
>  Labels: client-impacting
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> Thrift allows users to define flexible mixed column families - where certain 
> columns would have explicitly pre-defined names, potentially non-default 
> validation types, and be indexed.
> Example:
> {code}
> create column family foo
> and default_validation_class = UTF8Type
> and column_metadata = [
> {column_name: bar, validation_class: Int32Type, index_type: KEYS},
> {column_name: baz, validation_class: UUIDType, index_type: KEYS}
> ];
> {code}
> Columns named {{bar}} and {{baz}} will be validated as {{Int32Type}} and 
> {{UUIDType}}, respectively, and be indexed. Columns with any other name will 
> be validated by {{UTF8Type}} and will not be indexed.
> With CASSANDRA-8099, {{bar}} and {{baz}} would be mapped to static columns 
> internally. However, being {{WITH COMPACT STORAGE}}, the table will only 
> expose {{bar}} and {{baz}} columns. Accessing any dynamic columns (any column 
> not named {{bar}} and {{baz}}) right now requires going through Thrift.
> This is blocking Thrift -> CQL migration for users who have mixed 
> dynamic/static column families. That said, it *shouldn't* be hard to allow 
> users to drop the {{compact}} flag to expose the table as it is internally 
> now, and be able to access all columns.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13999) Segfault during memtable flush

2017-11-07 Thread Ricardo Bartolome (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ricardo Bartolome updated CASSANDRA-13999:
--
Description: 
We are getting segfaults on a production Cassandra cluster, apparently caused 
by Memtable flushes to disk.
{code}
Current thread (0x0cd77920):  JavaThread 
"PerDiskMemtableFlushWriter_0:140" daemon [_thread_in_Java, id=28952, 
stack(0x7f8b7aa53000,0x7f8b7aa94000)]
{code}

Stack
{code}
Stack: [0x7f8b7aa53000,0x7f8b7aa94000],  sp=0x7f8b7aa924a0,  free 
space=253k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 21889 C2 
org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(Lorg/apache/cassandra/db/rows/UnfilteredRowIterator;)Lorg/apache/cassandra/db/RowIndexEntry;
 (361 bytes) @ 0x7f8e9fcf75ac [0x7f8e9fcf42c0+0x32ec]
J 22464 C2 
org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents()V (383 
bytes) @ 0x7f8e9f17b988 [0x7f8e9f17b5c0+0x3c8]
j  
org.apache.cassandra.db.Memtable$FlushRunnable.call()Lorg/apache/cassandra/io/sstable/SSTableMultiWriter;+1
j  org.apache.cassandra.db.Memtable$FlushRunnable.call()Ljava/lang/Object;+1
J 18865 C2 java.util.concurrent.FutureTask.run()V (126 bytes) @ 
0x7f8e9d3c9540 [0x7f8e9d3c93a0+0x1a0]
J 21832 C2 
java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
 (225 bytes) @ 0x7f8e9f16856c [0x7f8e9f168400+0x16c]
J 6720 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
0x7f8e9def73c4 [0x7f8e9def72c0+0x104]
J 22079 C2 java.lang.Thread.run()V (17 bytes) @ 0x7f8e9e67c4ac 
[0x7f8e9e67c460+0x4c]
v  ~StubRoutines::call_stub
V  [libjvm.so+0x691d16]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
JavaCallArguments*, Thread*)+0x1056
V  [libjvm.so+0x692221]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
V  [libjvm.so+0x6926c7]  JavaCalls::call_virtual(JavaValue*, Handle, 
KlassHandle, Symbol*, Symbol*, Thread*)+0x47
V  [libjvm.so+0x72da50]  thread_entry(JavaThread*, Thread*)+0xa0
V  [libjvm.so+0xa76833]  JavaThread::thread_main_inner()+0x103
V  [libjvm.so+0xa7697c]  JavaThread::run()+0x11c
V  [libjvm.so+0x927568]  java_start(Thread*)+0x108
C  [libpthread.so.0+0x7de5]  start_thread+0xc5
{code}

For further details, we attached:
* JVM error file with all details
* cassandra config file (we are using offheap_buffers as 
memtable_allocation_method)
* some lines printed in debug.log when the JVM error file was created and 
process died

h5. Reproducing the issue
So far we have been unable to reproduce it. It happens once/twice a week on 
single nodes. It happens either during high load or low load times. We have 
seen that when we replace EC2 instances and bootstrap new ones, due to 
compactions happening on source nodes before streaming starts, sometimes more than 
a single node was affected by this, leaving us with 2 out of 3 replicas down and 
UnavailableExceptions in the cluster.

This issue might be related to CASSANDRA-12590 (Segfault reading secondary 
index), even though this is the write path. Can someone confirm whether both 
issues could be related? 

h5. Specifics of our scenario:
* Cassandra 3.9 on Amazon Linux (before this we were running Cassandra 2.0.9, 
and there are no records of this happening then, although I was not working on 
Cassandra at the time)
* 12 x i3.2xlarge EC2 instances (8 core, 64GB RAM)
* a total of 176 keyspaces (there is a per-customer pattern)
** Some keyspaces have a single table, while others have 2 or 5 tables
** There is a table that uses standard Secondary Indexes ("emailindex" on 
"user_info" table)
* It happens on both Oracle JDK 1.8.0_112 and 1.8.0_131
* It happens in both kernel 4.9.43-17.38.amzn1.x86_64 and 
3.14.35-28.38.amzn1.x86_64


h5. Possible workarounds/solutions (yet to be validated)
* switching to heap_buffers (in case offheap_buffers triggers the bug), although 
we still need to measure the performance degradation in that scenario.
* removing secondary indexes in favour of Materialized Views for this specific 
case, although we are also concerned that MVs may come with their own issues in 
our current Cassandra 3.9
* Upgrading to 3.11.1 is an option, but we are trying to keep it as a last resort 
given that the cost of migrating is high and we have no guarantee that new bugs 
affecting node availability will not be introduced.

  was:
We are getting segfaults on a production Cassandra cluster, apparently caused 
by Memtable flushes to disk.
{code}
Current thread (0x0cd77920):  JavaThread 
"PerDiskMemtableFlushWriter_0:140" daemon [_thread_in_Java, id=28952, 
stack(0x7f8b7aa53000,0x7f8b7aa94000)]
{code}

Stack
{code}
Stack: [0x7f8b7aa53000,0x7f8b7aa94000],  sp=0x7f8b7aa924a0,  free 
space=253k
Native frames: 

[jira] [Updated] (CASSANDRA-13999) Segfault during memtable flush

2017-11-07 Thread Ricardo Bartolome (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ricardo Bartolome updated CASSANDRA-13999:
--
Description: 
We are getting segfaults on a production Cassandra cluster, apparently caused 
by Memtable flushes to disk.
{code}
Current thread (0x0cd77920):  JavaThread 
"PerDiskMemtableFlushWriter_0:140" daemon [_thread_in_Java, id=28952, 
stack(0x7f8b7aa53000,0x7f8b7aa94000)]
{code}

Stack
{code}
Stack: [0x7f8b7aa53000,0x7f8b7aa94000],  sp=0x7f8b7aa924a0,  free 
space=253k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 21889 C2 
org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(Lorg/apache/cassandra/db/rows/UnfilteredRowIterator;)Lorg/apache/cassandra/db/RowIndexEntry;
 (361 bytes) @ 0x7f8e9fcf75ac [0x7f8e9fcf42c0+0x32ec]
J 22464 C2 
org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents()V (383 
bytes) @ 0x7f8e9f17b988 [0x7f8e9f17b5c0+0x3c8]
j  
org.apache.cassandra.db.Memtable$FlushRunnable.call()Lorg/apache/cassandra/io/sstable/SSTableMultiWriter;+1
j  org.apache.cassandra.db.Memtable$FlushRunnable.call()Ljava/lang/Object;+1
J 18865 C2 java.util.concurrent.FutureTask.run()V (126 bytes) @ 
0x7f8e9d3c9540 [0x7f8e9d3c93a0+0x1a0]
J 21832 C2 
java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
 (225 bytes) @ 0x7f8e9f16856c [0x7f8e9f168400+0x16c]
J 6720 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
0x7f8e9def73c4 [0x7f8e9def72c0+0x104]
J 22079 C2 java.lang.Thread.run()V (17 bytes) @ 0x7f8e9e67c4ac 
[0x7f8e9e67c460+0x4c]
v  ~StubRoutines::call_stub
V  [libjvm.so+0x691d16]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
JavaCallArguments*, Thread*)+0x1056
V  [libjvm.so+0x692221]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
V  [libjvm.so+0x6926c7]  JavaCalls::call_virtual(JavaValue*, Handle, 
KlassHandle, Symbol*, Symbol*, Thread*)+0x47
V  [libjvm.so+0x72da50]  thread_entry(JavaThread*, Thread*)+0xa0
V  [libjvm.so+0xa76833]  JavaThread::thread_main_inner()+0x103
V  [libjvm.so+0xa7697c]  JavaThread::run()+0x11c
V  [libjvm.so+0x927568]  java_start(Thread*)+0x108
C  [libpthread.so.0+0x7de5]  start_thread+0xc5
{code}

For further details, we attached:
* JVM error file with all details
* cassandra config file (we are using offheap_buffers as 
memtable_allocation_method)
* some lines printed in debug.log when the JVM error file was created and 
process died

h5. Reproducing the issue
So far we have been unable to reproduce it. It happens once/twice a week on 
single nodes. It happens either during high load or low load times. We have 
seen that when we replace EC2 instances and bootstrap new ones, due to 
compactions happening on source nodes before streaming starts, sometimes more than 
a single node was affected by this, leaving us with 2 out of 3 replicas down and 
UnavailableExceptions in the cluster.

This issue might be related to CASSANDRA-12590 (Segfault reading secondary 
index), even though this is the write path. Can someone confirm whether both 
issues could be related? 

h5. Specifics of our scenario:
* Cassandra 3.9 on Amazon Linux (before this we were running Cassandra 2.0.9, 
and there are no records of this happening then, although I was not working on 
Cassandra at the time)
* 12 x i3.2xlarge EC2 instances (8 core, 64GB RAM)
* a total of 176 keyspaces (there is a per-customer pattern)
** Some keyspaces have a single table, while others have 2 or 5 tables
** There is a table that uses standard Secondary Indexes ("emailindex" on 
"user_info" table)
* It happens on both Oracle JDK 1.8.0_112 and 1.8.0_131
* It happens in both kernel 4.9.43-17.38.amzn1.x86_64 and 
3.14.35-28.38.amzn1.x86_64


h5. Possible workarounds/solutions (yet to be validated)
* switching to heap_buffers (in case offheap_buffers triggers the bug), although 
we still need to measure the performance degradation in that scenario.
* removing secondary indexes in favour of Materialized Views for this specific 
case, although we are also concerned that MVs may come with their own issues in 
our current Cassandra 3.9
* Upgrading to 3.11.1 is an option, but we are trying to keep it as a last resort 
given that the cost of migrating is high and we have no guarantee that new bugs 
affecting node availability will not be introduced.

  was:
We are getting segfaults on a production Cassandra cluster, apparently caused 
by Memtable flushes to disk.
{code}
Current thread (0x0cd77920):  JavaThread 
"PerDiskMemtableFlushWriter_0:140" daemon [_thread_in_Java, id=28952, 
stack(0x7f8b7aa53000,0x7f8b7aa94000)]
{code}

Stack
{code}
Stack: [0x7f8b7aa53000,0x7f8b7aa94000],  sp=0x7f8b7aa924a0,  free 
space=253k
Native frames: 

[jira] [Updated] (CASSANDRA-13999) Segfault during memtable flush

2017-11-07 Thread Ricardo Bartolome (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ricardo Bartolome updated CASSANDRA-13999:
--
Description: 
We are getting segfaults on a production Cassandra cluster, apparently caused 
by Memtable flushes to disk.
{code}
Current thread (0x0cd77920):  JavaThread 
"PerDiskMemtableFlushWriter_0:140" daemon [_thread_in_Java, id=28952, 
stack(0x7f8b7aa53000,0x7f8b7aa94000)]
{code}

Stack
{code}
Stack: [0x7f8b7aa53000,0x7f8b7aa94000],  sp=0x7f8b7aa924a0,  free 
space=253k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 21889 C2 
org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(Lorg/apache/cassandra/db/rows/UnfilteredRowIterator;)Lorg/apache/cassandra/db/RowIndexEntry;
 (361 bytes) @ 0x7f8e9fcf75ac [0x7f8e9fcf42c0+0x32ec]
J 22464 C2 
org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents()V (383 
bytes) @ 0x7f8e9f17b988 [0x7f8e9f17b5c0+0x3c8]
j  
org.apache.cassandra.db.Memtable$FlushRunnable.call()Lorg/apache/cassandra/io/sstable/SSTableMultiWriter;+1
j  org.apache.cassandra.db.Memtable$FlushRunnable.call()Ljava/lang/Object;+1
J 18865 C2 java.util.concurrent.FutureTask.run()V (126 bytes) @ 
0x7f8e9d3c9540 [0x7f8e9d3c93a0+0x1a0]
J 21832 C2 
java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
 (225 bytes) @ 0x7f8e9f16856c [0x7f8e9f168400+0x16c]
J 6720 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
0x7f8e9def73c4 [0x7f8e9def72c0+0x104]
J 22079 C2 java.lang.Thread.run()V (17 bytes) @ 0x7f8e9e67c4ac 
[0x7f8e9e67c460+0x4c]
v  ~StubRoutines::call_stub
V  [libjvm.so+0x691d16]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
JavaCallArguments*, Thread*)+0x1056
V  [libjvm.so+0x692221]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
V  [libjvm.so+0x6926c7]  JavaCalls::call_virtual(JavaValue*, Handle, 
KlassHandle, Symbol*, Symbol*, Thread*)+0x47
V  [libjvm.so+0x72da50]  thread_entry(JavaThread*, Thread*)+0xa0
V  [libjvm.so+0xa76833]  JavaThread::thread_main_inner()+0x103
V  [libjvm.so+0xa7697c]  JavaThread::run()+0x11c
V  [libjvm.so+0x927568]  java_start(Thread*)+0x108
C  [libpthread.so.0+0x7de5]  start_thread+0xc5
{code}

h5. Reproducing the issue
So far we have been unable to reproduce it. It happens once/twice a week on 
single nodes. It happens either during high load or low load times. We have 
seen that when we replace EC2 instances and bootstrap new ones, due to 
compactions happening on source nodes before streaming starts, sometimes more than 
a single node was affected by this, leaving us with 2 out of 3 replicas down and 
UnavailableExceptions in the cluster. 

For further details, we attached:
* JVM error file with all details
* cassandra config file (we are using offheap_buffers as 
memtable_allocation_method)
* some lines printed in debug.log when the JVM error file was created and 
process died

Specifics of our scenario:
* Cassandra 3.9 on Amazon Linux (before this we were running Cassandra 2.0.9, 
and there are no records of this happening then, although I was not working on 
Cassandra at the time)
* 12 x i3.2xlarge EC2 instances (8 core, 64GB RAM)
* a total of 176 keyspaces (there is a per-customer pattern)
** Some keyspaces have a single table, while others have 2 or 5 tables
** There is a table that uses standard Secondary Indexes ("emailindex" on 
"user_info" table)
* It happens on both Oracle JDK 1.8.0_112 and 1.8.0_131
* It happens in both kernel 4.9.43-17.38.amzn1.x86_64 and 
3.14.35-28.38.amzn1.x86_64



This issue might be related to CASSANDRA-12590 (Segfault reading secondary 
index), even though this is the write path. Can someone confirm whether both 
issues could be related?

On our side, we are thinking about:
* switching to heap_buffers (in case offheap_buffers triggers the bug), although 
we still need to measure the performance degradation in that scenario.
* removing secondary indexes in favour of Materialized Views for this specific 
case, although we are also concerned that MVs may come with their own issues in 
our current Cassandra 3.9
* Upgrading to 3.11.1 is an option, but we are trying to keep it as a last resort 
given that the cost of migrating is high and we have no guarantee that new bugs 
affecting node availability will not be introduced.

  was:
We are getting segfaults on a production Cassandra cluster, apparently caused 
by Memtable flushes to disk.
{code}
Current thread (0x0cd77920):  JavaThread 
"PerDiskMemtableFlushWriter_0:140" daemon [_thread_in_Java, id=28952, 
stack(0x7f8b7aa53000,0x7f8b7aa94000)]
{code}

Stack
{code}
Stack: [0x7f8b7aa53000,0x7f8b7aa94000],  sp=0x7f8b7aa924a0,  free 
space=253k
Native frames: (J=compiled Java code, 

[jira] [Created] (CASSANDRA-13999) Segfault during memtable flush

2017-11-07 Thread Ricardo Bartolome (JIRA)
Ricardo Bartolome created CASSANDRA-13999:
-

 Summary: Segfault during memtable flush
 Key: CASSANDRA-13999
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13999
 Project: Cassandra
  Issue Type: Bug
  Components: Local Write-Read Paths
Reporter: Ricardo Bartolome
Priority: Critical
 Attachments: 
cassandra-jvm-file-error-1509698372-pid16151.log.obfuscated, 
cassandra_config.yaml, node_crashing_debug.log

We are getting segfaults on a production Cassandra cluster, apparently caused 
by Memtable flushes to disk.
{code}
Current thread (0x0cd77920):  JavaThread 
"PerDiskMemtableFlushWriter_0:140" daemon [_thread_in_Java, id=28952, 
stack(0x7f8b7aa53000,0x7f8b7aa94000)]
{code}

Stack
{code}
Stack: [0x7f8b7aa53000,0x7f8b7aa94000],  sp=0x7f8b7aa924a0,  free 
space=253k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 21889 C2 
org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(Lorg/apache/cassandra/db/rows/UnfilteredRowIterator;)Lorg/apache/cassandra/db/RowIndexEntry;
 (361 bytes) @ 0x7f8e9fcf75ac [0x7f8e9fcf42c0+0x32ec]
J 22464 C2 
org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents()V (383 
bytes) @ 0x7f8e9f17b988 [0x7f8e9f17b5c0+0x3c8]
j  
org.apache.cassandra.db.Memtable$FlushRunnable.call()Lorg/apache/cassandra/io/sstable/SSTableMultiWriter;+1
j  org.apache.cassandra.db.Memtable$FlushRunnable.call()Ljava/lang/Object;+1
J 18865 C2 java.util.concurrent.FutureTask.run()V (126 bytes) @ 
0x7f8e9d3c9540 [0x7f8e9d3c93a0+0x1a0]
J 21832 C2 
java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
 (225 bytes) @ 0x7f8e9f16856c [0x7f8e9f168400+0x16c]
J 6720 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
0x7f8e9def73c4 [0x7f8e9def72c0+0x104]
J 22079 C2 java.lang.Thread.run()V (17 bytes) @ 0x7f8e9e67c4ac 
[0x7f8e9e67c460+0x4c]
v  ~StubRoutines::call_stub
V  [libjvm.so+0x691d16]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
JavaCallArguments*, Thread*)+0x1056
V  [libjvm.so+0x692221]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
V  [libjvm.so+0x6926c7]  JavaCalls::call_virtual(JavaValue*, Handle, 
KlassHandle, Symbol*, Symbol*, Thread*)+0x47
V  [libjvm.so+0x72da50]  thread_entry(JavaThread*, Thread*)+0xa0
V  [libjvm.so+0xa76833]  JavaThread::thread_main_inner()+0x103
V  [libjvm.so+0xa7697c]  JavaThread::run()+0x11c
V  [libjvm.so+0x927568]  java_start(Thread*)+0x108
C  [libpthread.so.0+0x7de5]  start_thread+0xc5
{code}

For further details, we attached:
* JVM error file with all details
* cassandra config file (we are using offheap_buffers as 
memtable_allocation_method)
* some lines printed in debug.log when the JVM error file was created and 
process died

Specifics of our scenario:
* Cassandra 3.9 on Amazon Linux (before this we were running Cassandra 2.0.9, 
and there are no records of this happening then, although I was not working on 
Cassandra at the time)
* 12 x i3.2xlarge EC2 instances (8 core, 64GB RAM)
* a total of 176 keyspaces (there is a per-customer pattern)
** Some keyspaces have a single table, while others have 2 or 5 tables
** There is a table that uses standard Secondary Indexes ("emailindex" on 
"user_info" table)
* It happens on both Oracle JDK 1.8.0_112 and 1.8.0_131
* It happens in both kernel 4.9.43-17.38.amzn1.x86_64 and 
3.14.35-28.38.amzn1.x86_64
* It happens once/twice a week on single nodes. It happens either during high 
load or low load times. We have seen that when we replace EC2 instances and 
bootstrap new ones, due to compactions happening on source nodes before streaming 
starts, sometimes more than a single node was affected by this, leaving us with 
2 out of 3 replicas down and UnavailableExceptions in the cluster. 


This issue might be related to CASSANDRA-12590 (Segfault reading secondary 
index), even though this is the write path. Can someone confirm whether both 
issues could be related?

On our side, we are thinking about:
* switching to heap_buffers (in case offheap_buffers triggers the bug), although 
we still need to measure the performance degradation in that scenario.
* removing secondary indexes in favour of Materialized Views for this specific 
case, although we are also concerned that MVs may come with their own issues in 
our current Cassandra 3.9
* Upgrading to 3.11.1 is an option, but we are trying to keep it as a last resort 
given that the cost of migrating is high and we have no guarantee that new bugs 
affecting node availability will not be introduced.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: 

[jira] [Updated] (CASSANDRA-13992) Don't send new_metadata_id for conditional updates

2017-11-07 Thread Kurt Greaves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves updated CASSANDRA-13992:
-
Status: Awaiting Feedback  (was: Open)

> Don't send new_metadata_id for conditional updates
> --
>
> Key: CASSANDRA-13992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13992
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Olivier Michallat
>Priority: Minor
>
> This is a follow-up to CASSANDRA-10786.
> Given the table
> {code}
> CREATE TABLE foo (k int PRIMARY KEY)
> {code}
> And the prepared statement
> {code}
> INSERT INTO foo (k) VALUES (?) IF NOT EXISTS
> {code}
> The result set metadata changes depending on the outcome of the update:
> * if the row didn't exist, there is only a single column \[applied] = true
> * if it did, the result contains \[applied] = false, plus the current value 
> of column k.
> The way this was handled so far is that the PREPARED response contains no 
> result set metadata, and therefore all EXECUTE messages have SKIP_METADATA = 
> false, and the responses always include the full (and correct) metadata.
> CASSANDRA-10786 still sends the PREPARED response with no metadata, *but the 
> response to EXECUTE now contains a {{new_metadata_id}}*. The driver thinks it 
> is because of a schema change, and updates its local copy of the prepared 
> statement's result metadata.
> The next EXECUTE is sent with SKIP_METADATA = true, but the server appears to 
> ignore that, and still sends the metadata in the response. So each response 
> includes the correct metadata, the driver uses it, and there is no visible 
> issue for client code.
> The only drawback is that the driver updates its local copy of the metadata 
> unnecessarily, every time. We can work around that by only updating if we had 
> metadata before, at the cost of an extra volatile read. But I think the best 
> thing to do would be to never send a {{new_metadata_id}} for a conditional 
> update.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13992) Don't send new_metadata_id for conditional updates

2017-11-07 Thread Kurt Greaves (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241847#comment-16241847
 ] 

Kurt Greaves commented on CASSANDRA-13992:
--

bq. The next EXECUTE is sent with SKIP_METADATA = true, but the server appears 
to ignore that
I believe this is because METADATA_CHANGED will take precedence. If C* thinks 
the metadata changed it will set the METADATA_CHANGED flag and the driver will 
need to update its metadata. TBH this isn't super clear from the spec, 
but it appears to be what the code achieves 
[here|https://github.com/apache/cassandra/blob/922dbdb658b1693973926026b213153d05b4077c/src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java#L174].

I may have no idea what I'm talking about, but I think the simplest solution to 
bq. never send a new_metadata_id for a conditional update.
would be to simply always use the same digest for any LWT.
I think the following patch achieves this without breaking anything, but I 
haven't confirmed whether it actually fixes the driver issue yet. If someone with 
more understanding of the protocol could have a glance and let me know whether 
this makes sense, or point me in the right direction:
[trunk|https://github.com/apache/cassandra/compare/trunk...kgreav:13992-trunk]
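
To make the idea concrete, here is a minimal, self-contained sketch (not the 
actual patch or Cassandra's real classes; {{computeMetadataId}} and the constant 
are invented for illustration) of what "always use the same digest for any LWT" 
means: conditional updates reuse one fixed result-metadata id instead of hashing 
their actual result columns, so the server never has a reason to report a 
{{new_metadata_id}} for them.
{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

public class ResultMetadataIdSketch
{
    // Arbitrary fixed id shared by every conditional update (illustration only).
    private static final byte[] LWT_METADATA_ID =
            md5("CAS_RESULT_METADATA".getBytes(StandardCharsets.UTF_8));

    public static byte[] computeMetadataId(List<String> columnNames, boolean isConditionalUpdate)
    {
        if (isConditionalUpdate)
            return LWT_METADATA_ID; // same digest whether the update applied or not

        // Non-conditional statements keep hashing their column metadata as usual.
        StringBuilder sb = new StringBuilder();
        for (String name : columnNames)
            sb.append(name).append('\0');
        return md5(sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    private static byte[] md5(byte[] input)
    {
        try
        {
            return MessageDigest.getInstance("MD5").digest(input);
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args)
    {
        byte[] applied = computeMetadataId(Arrays.asList("[applied]"), true);
        byte[] notApplied = computeMetadataId(Arrays.asList("[applied]", "k"), true);
        // Both outcomes share the id, so the driver never sees a metadata change.
        System.out.println(Arrays.equals(applied, notApplied)); // true
    }
}
{code}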

> Don't send new_metadata_id for conditional updates
> --
>
> Key: CASSANDRA-13992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13992
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Olivier Michallat
>Priority: Minor
>
> This is a follow-up to CASSANDRA-10786.
> Given the table
> {code}
> CREATE TABLE foo (k int PRIMARY KEY)
> {code}
> And the prepared statement
> {code}
> INSERT INTO foo (k) VALUES (?) IF NOT EXISTS
> {code}
> The result set metadata changes depending on the outcome of the update:
> * if the row didn't exist, there is only a single column \[applied] = true
> * if it did, the result contains \[applied] = false, plus the current value 
> of column k.
> The way this was handled so far is that the PREPARED response contains no 
> result set metadata, and therefore all EXECUTE messages have SKIP_METADATA = 
> false, and the responses always include the full (and correct) metadata.
> CASSANDRA-10786 still sends the PREPARED response with no metadata, *but the 
> response to EXECUTE now contains a {{new_metadata_id}}*. The driver thinks it 
> is because of a schema change, and updates its local copy of the prepared 
> statement's result metadata.
> The next EXECUTE is sent with SKIP_METADATA = true, but the server appears to 
> ignore that, and still sends the metadata in the response. So each response 
> includes the correct metadata, the driver uses it, and there is no visible 
> issue for client code.
> The only drawback is that the driver updates its local copy of the metadata 
> unnecessarily, every time. We can work around that by only updating if we had 
> metadata before, at the cost of an extra volatile read. But I think the best 
> thing to do would be to never send a {{new_metadata_id}} for a conditional 
> update.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-13992) Don't send new_metadata_id for conditional updates

2017-11-07 Thread Kurt Greaves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves reassigned CASSANDRA-13992:


Assignee: Kurt Greaves

> Don't send new_metadata_id for conditional updates
> --
>
> Key: CASSANDRA-13992
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13992
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Olivier Michallat
>Assignee: Kurt Greaves
>Priority: Minor
>
> This is a follow-up to CASSANDRA-10786.
> Given the table
> {code}
> CREATE TABLE foo (k int PRIMARY KEY)
> {code}
> And the prepared statement
> {code}
> INSERT INTO foo (k) VALUES (?) IF NOT EXISTS
> {code}
> The result set metadata changes depending on the outcome of the update:
> * if the row didn't exist, there is only a single column \[applied] = true
> * if it did, the result contains \[applied] = false, plus the current value 
> of column k.
> The way this was handled so far is that the PREPARED response contains no 
> result set metadata, and therefore all EXECUTE messages have SKIP_METADATA = 
> false, and the responses always include the full (and correct) metadata.
> CASSANDRA-10786 still sends the PREPARED response with no metadata, *but the 
> response to EXECUTE now contains a {{new_metadata_id}}*. The driver thinks it 
> is because of a schema change, and updates its local copy of the prepared 
> statement's result metadata.
> The next EXECUTE is sent with SKIP_METADATA = true, but the server appears to 
> ignore that, and still sends the metadata in the response. So each response 
> includes the correct metadata, the driver uses it, and there is no visible 
> issue for client code.
> The only drawback is that the driver updates its local copy of the metadata 
> unnecessarily, every time. We can work around that by only updating if we had 
> metadata before, at the cost of an extra volatile read. But I think the best 
> thing to do would be to never send a {{new_metadata_id}} for a conditional 
> update.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13403) nodetool repair breaks SASI index

2017-11-07 Thread Ludovic Boutros (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ludovic Boutros updated CASSANDRA-13403:

Attachment: 3_nodes_compaction.log
4_nodes_compaction.log

> nodetool repair breaks SASI index
> -
>
> Key: CASSANDRA-13403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13403
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: 3.10
>Reporter: Igor Novgorodov
>Assignee: Alex Petrov
> Attachments: 3_nodes_compaction.log, 4_nodes_compaction.log
>
>
> I've got table:
> {code}
> CREATE TABLE cservice.bulks_recipients (
> recipient text,
> bulk_id uuid,
> datetime_final timestamp,
> datetime_sent timestamp,
> request_id uuid,
> status int,
> PRIMARY KEY (recipient, bulk_id)
> ) WITH CLUSTERING ORDER BY (bulk_id ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> CREATE CUSTOM INDEX bulk_recipients_bulk_id ON cservice.bulks_recipients 
> (bulk_id) USING 'org.apache.cassandra.index.sasi.SASIIndex';
> {code}
> There are 11 rows in it:
> {code}
> > select * from bulks_recipients;
> ...
> (11 rows)
> {code}
> Let's query by index (all rows have the same *bulk_id*):
> {code}
> > select * from bulks_recipients where bulk_id = 
> > baa94815-e276-4ca4-adda-5b9734e6c4a5;   
> >   
> ...
> (11 rows)
> {code}
> OK, everything is fine.
> Now I'm doing *nodetool repair --partitioner-range --job-threads 4 --full* on 
> each node in the cluster sequentially.
> After it finished:
> {code}
> > select * from bulks_recipients where bulk_id = 
> > baa94815-e276-4ca4-adda-5b9734e6c4a5;
> ...
> (2 rows)
> {code}
> Only two rows.
> While the rows are actually there:
> {code}
> > select * from bulks_recipients;
> ...
> (11 rows)
> {code}
> If I issue an incremental repair on a random node, I can get something like 7 
> rows after the index query.
> Dropping the index and recreating it fixes the issue. Is it a bug, or am I 
> doing the repair the wrong way?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13403) nodetool repair breaks SASI index

2017-11-07 Thread Ludovic Boutros (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241829#comment-16241829
 ] 

Ludovic Boutros commented on CASSANDRA-13403:
-

[~ifesdjeen],

In order to reproduce, I'm using a small C* 3.10 cluster with at least 4 nodes 
and 256 tokens.

Here is my keyspace declaration:

{code:sql}
CREATE KEYSPACE lubo_test WITH replication = {'class': 
'NetworkTopologyStrategy', 'dc1': '3'}  AND durable_writes = true;

CREATE TABLE lubo_test.t_doc (
id text,
r int,
cid timeuuid,
PRIMARY KEY (id, r, cid)
) WITH CLUSTERING ORDER BY (r DESC, cid DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE CUSTOM INDEX i_doc ON lubo_test.t_doc (r) USING 
'org.apache.cassandra.index.sasi.SASIIndex';
{code}

I have added some docs:

{code:sql}
INSERT INTO lubo_test.t_doc ( id , r , cid ) VALUES ( '1', 0, now());
INSERT INTO lubo_test.t_doc ( id , r , cid ) VALUES ( '2', 0, now());
INSERT INTO lubo_test.t_doc ( id , r , cid ) VALUES ( '3', 0, now());
INSERT INTO lubo_test.t_doc ( id , r , cid ) VALUES ( '4', 0, now());
INSERT INTO lubo_test.t_doc ( id , r , cid ) VALUES ( '5', 0, now());
INSERT INTO lubo_test.t_doc ( id , r , cid ) VALUES ( '6', 0, now());
INSERT INTO lubo_test.t_doc ( id , r , cid ) VALUES ( '7', 0, now());
INSERT INTO lubo_test.t_doc ( id , r , cid ) VALUES ( '8', 0, now());
INSERT INTO lubo_test.t_doc ( id , r , cid ) VALUES ( '9', 0, now());
INSERT INTO lubo_test.t_doc ( id , r , cid ) VALUES ( '10', 0, now());
INSERT INTO lubo_test.t_doc ( id , r , cid ) VALUES ( '11', 0, now());
{code}

Then this query, without repair:

{code:sql}
cassandra@cqlsh> SELECT * from lubo_test.t_doc where r = 0;

 id | r | cid
+---+--
  6 | 0 | 66f68be0-c316-11e7-a464-03e2ed27ae86
  7 | 0 | 66f74f30-c316-11e7-a464-03e2ed27ae86
  9 | 0 | 66f97210-c316-11e7-a464-03e2ed27ae86
 10 | 0 | 66faaa90-c316-11e7-a464-03e2ed27ae86
  4 | 0 | 66f46900-c316-11e7-a464-03e2ed27ae86
  3 | 0 | 66f37ea0-c316-11e7-a464-03e2ed27ae86
  5 | 0 | 66f5a180-c316-11e7-a464-03e2ed27ae86
  8 | 0 | 66f83990-c316-11e7-a464-03e2ed27ae86
  2 | 0 | 66f29440-c316-11e7-a464-03e2ed27ae86
 11 | 0 | 66fb6de0-c316-11e7-a464-03e2ed27ae86
  1 | 0 | 66ea56e0-c316-11e7-a464-03e2ed27ae86

(11 rows)
{code}

If I fire a "nodetool repair --full", then I have:

{code:sql}
cassandra@cqlsh> SELECT * from lubo_test.t_doc where r = 0;

 id | r | cid
+---+--
  6 | 0 | 66f68be0-c316-11e7-a464-03e2ed27ae86
  7 | 0 | 66f74f30-c316-11e7-a464-03e2ed27ae86
 10 | 0 | 66faaa90-c316-11e7-a464-03e2ed27ae86
  4 | 0 | 66f46900-c316-11e7-a464-03e2ed27ae86
  3 | 0 | 66f37ea0-c316-11e7-a464-03e2ed27ae86
  5 | 0 | 66f5a180-c316-11e7-a464-03e2ed27ae86
  2 | 0 | 66f29440-c316-11e7-a464-03e2ed27ae86
  1 | 0 | 66ea56e0-c316-11e7-a464-03e2ed27ae86

(8 rows)
{code}

I can fire a "rebuild_index" on each node to fix the problem.

I've checked the debug log differences between 3 and 4 nodes.

It seems that with 3 nodes, the anticompaction process is not performed. You can 
see "mutating repairedAt instead of anticompacting" in the log.
Currently, I would bet that when anticompacting, the SASI index is not rebuilt 
correctly, but that's just a bet.

I've attached the log extracts; if you need more, just ask.

> nodetool repair breaks SASI index
> -
>
> Key: CASSANDRA-13403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13403
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: 3.10
>Reporter: Igor Novgorodov
>Assignee: Alex Petrov
>
> I've got table:
> {code}
> CREATE TABLE cservice.bulks_recipients (
> recipient text,
> bulk_id uuid,
> datetime_final timestamp,
> datetime_sent timestamp,
> request_id uuid,
> status int,
> PRIMARY KEY (recipient, bulk_id)
> ) WITH CLUSTERING ORDER BY (bulk_id ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': 

[jira] [Assigned] (CASSANDRA-13986) Fix native protocol v5 spec for new_metadata_id position in Rows response

2017-11-07 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov reassigned CASSANDRA-13986:
---

Assignee: Alex Petrov

> Fix native protocol v5 spec for new_metadata_id position in Rows response
> -
>
> Key: CASSANDRA-13986
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13986
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Olivier Michallat
>Assignee: Alex Petrov
>Priority: Trivial
>
> There's a mistake in the protocol specification for CASSANDRA-10786. In 
> `native_protocol_v5.spec`, section 4.2.5.2:
> {code}
> 4.2.5.2. Rows
>   Indicates a set of rows. The rest of the body of a Rows result is:
> <metadata><rows_count><rows_content>
>   where:
> - <metadata> is composed of:
> <flags><columns_count>[<new_metadata_id>][<paging_state>][<global_table_spec>?<col_spec_1>...<col_spec_n>]
> {code}
> The last line should be:
> {code}
> <flags><columns_count>[<paging_state>][<new_metadata_id>][<global_table_spec>?<col_spec_1>...<col_spec_n>]
> {code}
> That is, if there is both a paging state and a new metadata id, the paging 
> state comes *first*, not second.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-07 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241711#comment-16241711
 ] 

Paulo Motta commented on CASSANDRA-13948:
-

bq. Ok I was able to reproduce it on one node. I've attached the trace log. 
It's unfiltered since I didn't manage to filter only to 
org.apache.cassandra.db.compaction

I wasn't able to track down the root cause of this condition from the logs, but 
a similar issue was reported on CASSANDRA-12743, so I think this is some kind 
of race condition showing up due to the number of concurrent compactions 
happening and is not a consequence of this fix; I prefer to investigate it 
separately. If you still see this issue, please feel free to reopen 
CASSANDRA-12743 with the details.

bq. However I'm still facing issues with compactions. These are big nodes with 
a big CF, holding many SSTables and pending compactions. According to the 
thread dump it seems to be stuck around getNextBackgroundTask. Compactions are 
still being processed for the other keyspace. Besides that, the node is running 
normally. Some nodetool commands, like compactionstats, take time to complete. 
The debug log doesn't show any error.

After having a look at the thread dump, it turns out that my previous patch 
generated lock contention between compaction and cleanup: each SSTable removed 
by cleanup generated an {{SSTableDeletingNotification}}, and my previous patch 
submitted a new compaction task after each received notification, which competed 
with the next {{SSTableDeletingNotification}} for the {{writeLock}}, making 
things slow overall. I updated the patch to only submit a new compaction after 
receiving a flush notification, as it was before, so this should be fixed now. 
[~llambiel] would you mind trying the latest version now?

[~krummas] this should be ready for review now. The latest version already got 
a clean CI run, but I resubmitted a new internal CI run after the minor fix 
above and will update here when it is ready.

Summary of changes:
1) Reload compaction strategies when JBOD disk boundary changes 
([commit|https://github.com/pauloricardomg/cassandra/commit/6cab7e0a31a638cc4a957c4ecfa592035d874058])
2) Ensure compaction strategies do not loop indefinitely when not able to 
acquire Tracker lock 
([commit|https://github.com/pauloricardomg/cassandra/commit/3ef833d1e56c25f67bc8a3b49acf97b2efdf401d])
3) Only enable compaction strategies after gossip settles to prevent 
unnecessary relocation work 
([commit|https://github.com/pauloricardomg/cassandra/commit/eaf63dc3d52566ce0c4f91bbfec478305597f014])
4) Do not reload compaction strategies when receiving notifications and log 
warning when an SSTable is added multiple times to LCS 
([commit|https://github.com/pauloricardomg/cassandra/commit/3e61df70025e704ee0c9d6ee8754ccdd38f5ab6d])

Patches
* [3.11|https://github.com/pauloricardomg/cassandra/tree/3.11-13948]
* [trunk|https://github.com/pauloricardomg/cassandra/tree/trunk-13948]

I wonder if, now that CSM caches the disk boundaries, we can make the handling of 
notifications use the readLock instead of the writeLock, to reduce contention 
when there is a high number of concurrent compactors. Do you see any potential 
problems with this? Even if the notification handling races with 
getNextBackgroundTask, as long as the individual compaction strategies are 
synchronized, getNextBackgroundTask should get a consistent view of the strategy 
sstables when there is a concurrent notification from the tracker.
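
As a rough illustration of the locking split I have in mind (a toy sketch with 
invented names, not the actual CompactionStrategyManager code): structural 
changes such as reloading strategies on a disk boundary change would keep the 
exclusive write lock, while tracker notifications and the compaction path would 
only take the shared read lock and rely on per-strategy synchronization for a 
consistent view of the sstables.
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CompactionStrategyHolderSketch
{
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Stand-in for a single strategy's sstables; synchronized so concurrent
    // updates under the shared read lock still keep the set consistent.
    private final Set<String> sstables = Collections.synchronizedSet(new HashSet<>());

    // Notification path (e.g. an SSTableDeletingNotification from the Tracker).
    public void onSSTableDeleting(String sstable)
    {
        lock.readLock().lock();      // shared: many notifications can proceed concurrently
        try
        {
            sstables.remove(sstable);
        }
        finally
        {
            lock.readLock().unlock();
        }
    }

    // Compaction path: reads a consistent snapshot of the strategy's sstables.
    public int pendingSSTableCount()
    {
        lock.readLock().lock();
        try
        {
            return sstables.size();
        }
        finally
        {
            lock.readLock().unlock();
        }
    }

    // Structural change path: rebuilding strategies when JBOD boundaries change.
    public void reloadOnDiskBoundaryChange(Set<String> newSSTables)
    {
        lock.writeLock().lock();     // exclusive: blocks notifications and reads while rebuilding
        try
        {
            sstables.clear();
            sstables.addAll(newSSTables);
        }
        finally
        {
            lock.writeLock().unlock();
        }
    }
}
{code}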

> Reload compaction strategies when JBOD disk boundary changes
> 
>
> Key: CASSANDRA-13948
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Paulo Motta
>Assignee: Paulo Motta
> Fix For: 3.11.x, 4.x
>
> Attachments: debug.log, threaddump.txt, trace.log
>
>
> The thread dump below shows a race between an sstable replacement by the 
> {{IndexSummaryRedistribution}} and 
> {{AbstractCompactionTask.getNextBackgroundTask}}:
> {noformat}
> Thread 94580: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information 
> may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
> line=175 (Compiled frame)
>  - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() 
> @bci=1, line=836 (Compiled frame)
>  - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node,
>  int) @bci=67, line=870 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) 
> @bci=17, line=1199 (Compiled frame)
>  - 

[jira] [Commented] (CASSANDRA-10857) Allow dropping COMPACT STORAGE flag from tables in 3.X

2017-11-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241663#comment-16241663
 ] 

ASF GitHub Bot commented on CASSANDRA-10857:


Github user ptnapoleon commented on a diff in the pull request:

https://github.com/apache/cassandra-dtest/pull/9#discussion_r149296242
  
--- Diff: cql_tests.py ---
@@ -698,6 +719,54 @@ def many_columns_test(self):
",".join(map(lambda i: "c_{}".format(i), range(width))) 
+
" FROM very_wide_table", [[i for i in range(width)]])
 
+@since("3.11", max_version="3.X")
+def drop_compact_storage_flag_test(self):
+"""
+Test for CASSANDRA-10857, verifying the schema change
+distribution across the other nodes.
+
+"""
+
+cluster = self.cluster
+
+cluster.populate(3).start()
+node1 = cluster.nodelist()[0]
+node2 = cluster.nodelist()[1]
+node3 = cluster.nodelist()[2]
+time.sleep(0.2)
+
+session1 = self.patient_cql_connection(node1)
+session2 = self.patient_cql_connection(node2)
+session3 = self.patient_cql_connection(node3)
+self.create_ks(session1, 'ks', 3)
+sessions = [session1, session2, session3]
+
+for session in sessions:
+session.set_keyspace('ks')
+
+session1.execute("""
+CREATE TABLE test_drop_compact_storage (k int PRIMARY KEY, s1 
int) WITH COMPACT STORAGE;
+""")
+time.sleep(1)
--- End diff --

No need for this sleep.


> Allow dropping COMPACT STORAGE flag from tables in 3.X
> --
>
> Key: CASSANDRA-10857
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10857
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Distributed Metadata
>Reporter: Aleksey Yeschenko
>Assignee: Alex Petrov
>Priority: Blocker
>  Labels: client-impacting
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> Thrift allows users to define flexible mixed column families - where certain 
> columns would have explicitly pre-defined names, potentially non-default 
> validation types, and be indexed.
> Example:
> {code}
> create column family foo
> and default_validation_class = UTF8Type
> and column_metadata = [
> {column_name: bar, validation_class: Int32Type, index_type: KEYS},
> {column_name: baz, validation_class: UUIDType, index_type: KEYS}
> ];
> {code}
> Columns named {{bar}} and {{baz}} will be validated as {{Int32Type}} and 
> {{UUIDType}}, respectively, and be indexed. Columns with any other name will 
> be validated by {{UTF8Type}} and will not be indexed.
> With CASSANDRA-8099, {{bar}} and {{baz}} would be mapped to static columns 
> internally. However, being {{WITH COMPACT STORAGE}}, the table will only 
> expose {{bar}} and {{baz}} columns. Accessing any dynamic columns (any column 
> not named {{bar}} and {{baz}}) right now requires going through Thrift.
> This is blocking Thrift -> CQL migration for users who have mixed 
> dynamic/static column families. That said, it *shouldn't* be hard to allow 
> users to drop the {{compact}} flag to expose the table as it is internally 
> now, and be able to access all columns.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10857) Allow dropping COMPACT STORAGE flag from tables in 3.X

2017-11-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241661#comment-16241661
 ] 

ASF GitHub Bot commented on CASSANDRA-10857:


Github user ptnapoleon commented on a diff in the pull request:

https://github.com/apache/cassandra-dtest/pull/9#discussion_r149296195
  
--- Diff: cql_tests.py ---
@@ -698,6 +719,54 @@ def many_columns_test(self):
",".join(map(lambda i: "c_{}".format(i), range(width))) 
+
" FROM very_wide_table", [[i for i in range(width)]])
 
+@since("3.11", max_version="3.X")
+def drop_compact_storage_flag_test(self):
+"""
+Test for CASSANDRA-10857, verifying the schema change
+distribution across the other nodes.
+
+"""
+
+cluster = self.cluster
+
+cluster.populate(3).start()
+node1 = cluster.nodelist()[0]
--- End diff --

It's much more concise to just write `node1, node2, node3 = cluster.nodelist()`


> Allow dropping COMPACT STORAGE flag from tables in 3.X
> --
>
> Key: CASSANDRA-10857
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10857
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Distributed Metadata
>Reporter: Aleksey Yeschenko
>Assignee: Alex Petrov
>Priority: Blocker
>  Labels: client-impacting
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> Thrift allows users to define flexible mixed column families - where certain 
> columns would have explicitly pre-defined names, potentially non-default 
> validation types, and be indexed.
> Example:
> {code}
> create column family foo
> and default_validation_class = UTF8Type
> and column_metadata = [
> {column_name: bar, validation_class: Int32Type, index_type: KEYS},
> {column_name: baz, validation_class: UUIDType, index_type: KEYS}
> ];
> {code}
> Columns named {{bar}} and {{baz}} will be validated as {{Int32Type}} and 
> {{UUIDType}}, respectively, and be indexed. Columns with any other name will 
> be validated by {{UTF8Type}} and will not be indexed.
> With CASSANDRA-8099, {{bar}} and {{baz}} would be mapped to static columns 
> internally. However, being {{WITH COMPACT STORAGE}}, the table will only 
> expose {{bar}} and {{baz}} columns. Accessing any dynamic columns (any column 
> not named {{bar}} and {{baz}}) right now requires going through Thrift.
> This is blocking Thrift -> CQL migration for users who have mixed 
> dynamic/static column families. That said, it *shouldn't* be hard to allow 
> users to drop the {{compact}} flag to expose the table as it is internally 
> now, and be able to access all columns.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10857) Allow dropping COMPACT STORAGE flag from tables in 3.X

2017-11-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241662#comment-16241662
 ] 

ASF GitHub Bot commented on CASSANDRA-10857:


Github user ptnapoleon commented on a diff in the pull request:

https://github.com/apache/cassandra-dtest/pull/9#discussion_r149296143
  
--- Diff: cql_tests.py ---
@@ -698,6 +719,54 @@ def many_columns_test(self):
",".join(map(lambda i: "c_{}".format(i), range(width))) 
+
" FROM very_wide_table", [[i for i in range(width)]])
 
+@since("3.11", max_version="3.X")
+def drop_compact_storage_flag_test(self):
+"""
+Test for CASSANDRA-10857, verifying the schema change
+distribution across the other nodes.
+
+"""
+
+cluster = self.cluster
+
+cluster.populate(3).start()
+node1 = cluster.nodelist()[0]
+node2 = cluster.nodelist()[1]
+node3 = cluster.nodelist()[2]
+time.sleep(0.2)
--- End diff --

There's no need for this sleep.


> Allow dropping COMPACT STORAGE flag from tables in 3.X
> --
>
> Key: CASSANDRA-10857
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10857
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CQL, Distributed Metadata
>Reporter: Aleksey Yeschenko
>Assignee: Alex Petrov
>Priority: Blocker
>  Labels: client-impacting
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> Thrift allows users to define flexible mixed column families - where certain 
> columns would have explicitly pre-defined names, potentially non-default 
> validation types, and be indexed.
> Example:
> {code}
> create column family foo
> and default_validation_class = UTF8Type
> and column_metadata = [
> {column_name: bar, validation_class: Int32Type, index_type: KEYS},
> {column_name: baz, validation_class: UUIDType, index_type: KEYS}
> ];
> {code}
> Columns named {{bar}} and {{baz}} will be validated as {{Int32Type}} and 
> {{UUIDType}}, respectively, and be indexed. Columns with any other name will 
> be validated by {{UTF8Type}} and will not be indexed.
> With CASSANDRA-8099, {{bar}} and {{baz}} would be mapped to static columns 
> internally. However, being {{WITH COMPACT STORAGE}}, the table will only 
> expose {{bar}} and {{baz}} columns. Accessing any dynamic columns (any column 
> not named {{bar}} and {{baz}}) right now requires going through Thrift.
> This is blocking Thrift -> CQL migration for users who have mixed 
> dynamic/static column families. That said, it *shouldn't* be hard to allow 
> users to drop the {{compact}} flag to expose the table as it is internally 
> now, and be able to access all columns.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org