[jira] [Commented] (CASSANDRA-11748) Schema version mismatch may lead to Cassandra OOM at bootstrap during a rolling upgrade process

2018-10-23 Thread Jeff Jirsa (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661749#comment-16661749
 ] 

Jeff Jirsa commented on CASSANDRA-11748:


[~iamaleksey] / [~mbyrd] / [~spo...@gmail.com] / [~mcfongtw] - revisiting 
because I just duped CASSANDRA-14840 to this; since 11748 and 13569 are 
similar, is there any desire to dupe one of these to the other and concentrate 
on a single fix in the 4.0 timeframe? 

> Schema version mismatch may lead to Cassandra OOM at bootstrap during a 
> rolling upgrade process
> ---
>
> Key: CASSANDRA-11748
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11748
> Project: Cassandra
>  Issue Type: Bug
> Environment: Rolling upgrade process from 1.2.19 to 2.0.17. 
> CentOS 6.6
> Occurred in different C* node of different scale of deployment (2G ~ 5G)
>Reporter: Michael Fong
>Assignee: Matt Byrd
>Priority: Critical
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We have observed, multiple times, a multi-node C* (v2.0.17) cluster running 
> into OOM at bootstrap during a rolling upgrade from 1.2.19 to 2.0.17. 
> Here is a simple outline of our rolling upgrade process:
> 1. Update the schema on a node, and wait until all nodes are in schema version 
> agreement - via nodetool describecluster
> 2. Restart a Cassandra node
> 3. After the restart, there is a chance that the restarted node has a different 
> schema version.
> 4. All nodes in the cluster start to rapidly exchange schema information, and 
> any of them could run into OOM. 
> The following is the system.log output from one of our 2-node cluster test 
> beds:
> --
> Before rebooting node 2:
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 
> MigrationManager.java (line 328) Gossiping my schema version 
> 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 
> MigrationManager.java (line 328) Gossiping my schema version 
> 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> After rebooting node 2:
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) 
> Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> Node 2 keeps submitting the migration task 100+ times to the other 
> node.
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node 
> /192.168.88.33 has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) 
> Updating topology for /192.168.88.33
> ...
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 
> 102) Submitting migration task for /192.168.88.33
> ... (over 100+ times)
> --
> On the other hand, Node 1 keeps updating its gossip information, and then 
> repeatedly receives and submits migration tasks: 
> INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line 
> 978) InetAddress /192.168.88.34 is now UP
> ...
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 
> MigrationRequestVerbHandler.java (line 41) Received migration request from 
> /192.168.88.34.
> ... (over 100+ times)
> DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 
> 127) submitting migration task for /192.168.88.34
> ... (over 50+ times)
> On a side note, we have 200+ column families defined in the Cassandra 
> database, which may be related to this amount of RPC traffic.
> P.S.2 The over-requested schema migration tasks eventually have 
> InternalResponseStage perform a schema merge operation. Since each merge 
> requires a compaction, the merges are consumed much more slowly than the 
> migration responses arrive. Thus, the back-pressure of incoming schema 
> migration content objects consumes all of the heap space and ultimately 
> ends in OOM!
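To make the failure mode concrete, here is a minimal sketch (illustrative names, not Cassandra's actual MigrationManager code) of the kind of per-endpoint deduplication that would keep repeated gossip notifications from fanning out into hundreds of concurrent migration tasks:

{code:java}
import java.net.InetAddress;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: names and structure are assumptions. The idea is to
// collapse repeated gossip notifications for one endpoint into a single
// outstanding schema pull.
public final class MigrationTaskGuard
{
    private final Set<InetAddress> inFlight = ConcurrentHashMap.newKeySet();

    /** Returns false if a schema pull for this endpoint is already outstanding. */
    public boolean trySubmit(InetAddress endpoint, Runnable submitPull)
    {
        if (!inFlight.add(endpoint))
            return false;
        submitPull.run(); // the real code would submit asynchronously
        return true;
    }

    /** Called once the endpoint's schema response has been merged. */
    public void complete(InetAddress endpoint)
    {
        inFlight.remove(endpoint);
    }
}
{code}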






[jira] [Resolved] (CASSANDRA-14444) Got NPE when querying Cassandra 3.11.2

2018-10-23 Thread Jeff Jirsa (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa resolved CASSANDRA-14444.

Resolution: Duplicate

> Got NPE when querying Cassandra 3.11.2
> --
>
> Key: CASSANDRA-14444
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14444
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Ubuntu 14.04, JDK 1.8.0_171. 
> Cassandra 3.11.2
>Reporter: Xiaodong Xie
>Assignee: Xiaodong Xie
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.11.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We just upgraded our Cassandra cluster from 2.2.6 to 3.11.2.
> After upgrading, we immediately got exceptions in Cassandra like this one: 
>  
> {code}
> ERROR [Native-Transport-Requests-1] 2018-05-11 17:10:21,994 
> QueryMessage.java:129 - Unexpected error during query
> java.lang.NullPointerException: null
> at 
> org.apache.cassandra.dht.RandomPartitioner.getToken(RandomPartitioner.java:248)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.dht.RandomPartitioner.decorateKey(RandomPartitioner.java:92)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at org.apache.cassandra.config.CFMetaData.decorateKey(CFMetaData.java:666) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.service.pager.PartitionRangeQueryPager.<init>(PartitionRangeQueryPager.java:44)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.db.PartitionRangeReadCommand.getPager(PartitionRangeReadCommand.java:268)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.getPager(SelectStatement.java:475)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:288)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:118)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:224)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:255) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:240) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:517)
>  [apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410)
>  [apache-cassandra-3.11.2.jar:3.11.2]
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.44.Final.jar:4.0.44.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
>  [netty-all-4.0.44.Final.jar:4.0.44.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35)
>  [netty-all-4.0.44.Final.jar:4.0.44.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:348)
>  [netty-all-4.0.44.Final.jar:4.0.44.Final]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_171]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>  [apache-cassandra-3.11.2.jar:3.11.2]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) 
> [apache-cassandra-3.11.2.jar:3.11.2]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
> {code}
>  
> The table schema is like:
> {code}
> CREATE TABLE example.example_table (
>  id bigint,
>  hash text,
>  json text,
>  PRIMARY KEY (id, hash)
> ) WITH COMPACT STORAGE
> {code}
>  
> The query is something like:
> {code}
> "select * from example.example_table;" // (We do know this is bad practise, 
> and we are trying to fix that right now)
> {code}
> with a fetch size of 200, using the DataStax Java driver. 
> This table contains about 20k rows. 
>  
> Actually, the fix is quite simple: 
>  
> {code}
> --- a/src/java/org/apache/cassandra/service/pager/PagingState.java
> +++ b/src/java/org/apache/cassandra/service/pager/PagingState.java
> @@ -46,7 +46,7 @@ public class PagingState
>  public PagingState(ByteBuffer partitionKey, RowMark rowMark, int remaining, int remainingInPartition)
>  {
> -this.partitionKey = partitionKey;
> +this.partitionKey = partitionKey == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER : partitionKey;
>  this.rowMark = rowMark;
> {code}
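Read in context, the guarded constructor would look roughly like this (a sketch against Cassandra's own types; the trailing field assignments are inferred from the parameter list rather than taken from the diff):

{code:java}
public class PagingState
{
    private final ByteBuffer partitionKey;
    private final RowMark rowMark;
    private final int remaining;
    private final int remainingInPartition;

    public PagingState(ByteBuffer partitionKey, RowMark rowMark, int remaining, int remainingInPartition)
    {
        // An empty buffer stands in for a missing key, so downstream calls such
        // as RandomPartitioner.getToken() are never handed a null.
        this.partitionKey = partitionKey == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER : partitionKey;
        this.rowMark = rowMark;
        this.remaining = remaining;                       // inferred
        this.remainingInPartition = remainingInPartition; // inferred
    }
}
{code}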

[jira] [Resolved] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster

2018-10-23 Thread Jeff Jirsa (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa resolved CASSANDRA-14840.

Resolution: Duplicate

> Bootstrap of new node fails with OOM in a large cluster
> ---
>
> Key: CASSANDRA-14840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14840
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Critical
>
> We are seeing new node addition fail with OOM during bootstrap in a cluster 
> of more than 80 nodes and 3000 CFs, without any data in those CFs.
>  
> Steps to reproduce:
>  # Launch a 3 node cluster
>  # Create 3000 CF in the cluster
>  # Start adding nodes to the cluster one by one
>  # After adding 75-80 nodes, the new node bootstrap fails with OOM.
> {code:java}
> ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2018-10-24 03:26:47,870 
> JVMStabilityInspector.java:78 - Exiting due to error while processing commit 
> log during initialization.
> java.lang.OutOfMemoryError: Java heap space
>  at java.util.regex.Pattern.matcher(Pattern.java:1093) ~[na:1.8.0_151]
>  at java.util.Formatter.parse(Formatter.java:2547) ~[na:1.8.0_151]
>  at java.util.Formatter.format(Formatter.java:2501) ~[na:1.8.0_151]
>  at java.util.Formatter.format(Formatter.java:2455) ~[na:1.8.0_151]
>  at java.lang.String.format(String.java:2940) ~[na:1.8.0_151]
>  at 
> org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:105)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]{code}
> Cassandra Version: 2.1.16
> OS: CentOS7
> num_tokens: 256 on each node.
>  
> This behavior is blocking us from adding extra capacity when needed.






[jira] [Commented] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster

2018-10-23 Thread Jeff Jirsa (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661748#comment-16661748
 ] 

Jeff Jirsa commented on CASSANDRA-14840:


This is a duplicate of CASSANDRA-11748 and/or CASSANDRA-13569 - what's 
happening is that when the new instance comes online, it pulls the schema from 
all of the other instances in the cluster simultaneously, receiving 80+ copies 
of what's probably a very large schema. 

If you really have no data in any of those tables, the easiest solution may be 
to start removing them to decrease schema size and make the thundering herd of 
schema mutations less painful (this may be a viable option if the tables are 
old and unused - if you expect to use them again, keep reading).

Beyond that, you have two options:
1) Try to make it so you can better handle all of the incoming mutations - this 
may mean a bigger heap, tuning the memtables, or similar. It's hard to give 
concrete suggestions without a heap dump and your current settings; offheap 
memtables may be a starting point given you're on 2.1.
2) Try to limit the number of concurrent migrations - this is going to sound 
awful, for obvious reasons, but one thing that may work is to artificially 
restrict the instance's view of the ring using firewall rules, so that it can 
only communicate with a handful of hosts (maybe just the seeds) for the first 
5-15 seconds after it starts; then, once it's got the schema, remove the rules, 
allowing it to talk to the rest of the cluster so it can properly bootstrap.

One of the other two JIRAs will eventually get addressed; I'm going to dupe 
this to CASSANDRA-11748 since it's a lower number (earlier reporting). 

> Bootstrap of new node fails with OOM in a large cluster
> ---
>
> Key: CASSANDRA-14840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14840
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Jai Bheemsen Rao Dhanwada
>Priority: Critical
>
> We are seeing new node addition fail with OOM during bootstrap in a cluster 
> of more than 80 nodes and 3000 CFs, without any data in those CFs.
>  
> Steps to reproduce:
>  # Launch a 3 node cluster
>  # Create 3000 CF in the cluster
>  # Start adding nodes to the cluster one by one
>  # After adding 75-80 nodes, the new node bootstrap fails with OOM.
> {code:java}
> ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2018-10-24 03:26:47,870 
> JVMStabilityInspector.java:78 - Exiting due to error while processing commit 
> log during initialization.
> java.lang.OutOfMemoryError: Java heap space
>  at java.util.regex.Pattern.matcher(Pattern.java:1093) ~[na:1.8.0_151]
>  at java.util.Formatter.parse(Formatter.java:2547) ~[na:1.8.0_151]
>  at java.util.Formatter.format(Formatter.java:2501) ~[na:1.8.0_151]
>  at java.util.Formatter.format(Formatter.java:2455) ~[na:1.8.0_151]
>  at java.lang.String.format(String.java:2940) ~[na:1.8.0_151]
>  at 
> org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:105)
>  ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]{code}
> Cassandra Version: 2.1.16
> OS: CentOS7
> num_tokens: 256 on each node.
>  
> This behavior is blocking us from adding extra capacity when needed.






[jira] [Created] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster

2018-10-23 Thread Jai Bheemsen Rao Dhanwada (JIRA)
Jai Bheemsen Rao Dhanwada created CASSANDRA-14840:
-

 Summary: Bootstrap of new node fails with OOM in a large cluster
 Key: CASSANDRA-14840
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14840
 Project: Cassandra
  Issue Type: Bug
  Components: Streaming and Messaging
Reporter: Jai Bheemsen Rao Dhanwada


We are seeing new node addition fail with OOM during bootstrap in a cluster of 
more than 80 nodes and 3000 CFs, without any data in those CFs.

 

Steps to reproduce:
 # Launch a 3 node cluster
 # Create 3000 CF in the cluster
 # Start adding nodes to the cluster one by one
 # After adding 75-80 nodes, the new node bootstrap fails with OOM.

{code:java}
ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2018-10-24 03:26:47,870 
JVMStabilityInspector.java:78 - Exiting due to error while processing commit 
log during initialization.
java.lang.OutOfMemoryError: Java heap space
 at java.util.regex.Pattern.matcher(Pattern.java:1093) ~[na:1.8.0_151]
 at java.util.Formatter.parse(Formatter.java:2547) ~[na:1.8.0_151]
 at java.util.Formatter.format(Formatter.java:2501) ~[na:1.8.0_151]
 at java.util.Formatter.format(Formatter.java:2455) ~[na:1.8.0_151]
 at java.lang.String.format(String.java:2940) ~[na:1.8.0_151]
 at 
org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:105)
 ~[apache-cassandra-2.1.16.jar:2.1.16]
 at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]{code}
Cassandra Version: 2.1.16

OS: CentOS7

num_tokens: 256 on each node.

 

This behavior is blocking us from adding extra capacity when needed.






[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade

2018-10-23 Thread Chris Lohfink (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661664#comment-16661664
 ] 

Chris Lohfink commented on CASSANDRA-14495:
---

High heap usage is expected and is not likely an issue. If it were an issue, you 
would notice other problems (like horrible latencies and many timeouts).

> Memory Leak /High Memory usage post 3.11.2 upgrade
> --
>
> Key: CASSANDRA-14495
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14495
> Project: Cassandra
>  Issue Type: Bug
>  Components: Metrics
>Reporter: Abdul Patel
>Priority: Major
> Attachments: cas_heap.txt
>
>
> Hi All,
>  
> I recently upgraded my non-prod Cassandra cluster (4 nodes, single DC) from 
> 3.10 to 3.11.2.
> No issues were reported apart from nodetool info showing 80% heap usage.
> I initially had 16GB memory on each node; later I bumped it up to 20GB and 
> rebooted all nodes.
> After waiting a week, I have again seen memory usage of more than 80% (16GB+).
> This suggests some memory leak is happening over time.
> Has anyone faced such an issue, or do we have a workaround? My 3.11.2 
> upgrade rollout has been halted because of this bug.
> ===
> ID : 65b64f5a-7fe6-4036-94c8-8da9c57718cc
> Gossip active  : true
> Thrift active  : true
> Native Transport active: true
> Load   : 985.24 MiB
> Generation No  : 1526923117
> Uptime (seconds)   : 1097684
> Heap Memory (MB)   : 16875.64 / 20480.00
> Off Heap Memory (MB)   : 20.42
> Data Center    : DC7
> Rack   : rac1
> Exceptions : 0
> Key Cache  : entries 3569, size 421.44 KiB, capacity 100 MiB, 
> 7931933 hits, 8098632 requests, 0.979 recent hit rate, 14400 save period in 
> seconds
> Row Cache  : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 
> requests, NaN recent hit rate, 0 save period in seconds
> Counter Cache  : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 
> requests, NaN recent hit rate, 7200 save period in seconds
> Chunk Cache    : entries 2361, size 147.56 MiB, capacity 3.97 GiB, 
> 2412803 misses, 72594047 requests, 0.967 recent hit rate, NaN microseconds 
> miss latency
> Percent Repaired   : 99.88086234106282%
> Token  : (invoke with -T/--tokens to see all 256 tokens)






[jira] [Commented] (CASSANDRA-14459) DynamicEndpointSnitch should never prefer latent nodes

2018-10-23 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661557#comment-16661557
 ] 

Joseph Lynch commented on CASSANDRA-14459:
--

[~aweisberg] Ok, I've pushed changes to the branch which I believe address all 
your comments (including recording only the maximum latencies). I also finished 
the pluggability work so that we can swap out snitches live, as well as 
implementing the ~20 lines needed to provide a backwards-compatible option. To 
make the snitch more or less equivalent to the old snitch behavior, you can 
either do live reconfiguration:

{noformat}
$ jmxterm
Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.db:type=StorageService
#bean is set to org.apache.cassandra.db:type=StorageService
$>run updateSnitch -t 
string,string,java.lang.Integer,java.lang.Integer,java.lang.Double null 
"DynamicEndpointSnitchLegacyHistogram" null 5000 null
#calling operation updateSnitch of mbean 
org.apache.cassandra.db:type=StorageService with params [null, 
DynamicEndpointSnitchLegacyHistogram, null, 1000, null]
#operation returns: 
null
{noformat}

Or this can be updated in {{cassandra.yaml}} with the (intentionally) 
undocumented {{dynamic_snitch_class_name}} option. If a user sets that to 
anything other than the default probing histogram, we log a warning that it is 
not supported. I imagine that this pluggability will allow us to rapidly 
experiment with different metrics and load-balancing implementations in 
CASSANDRA-14817.

Summary of the patch:
# Makes the {{DynamicEndpointSnitch}} pluggable and live-swappable (the 
underlying {{IEndpointSnitch}} already was).
# Refactors {{DynamicEndpointSnitch}} to send latency probes to interesting 
nodes instead of resetting the samples and losing all latency information.
# Refactors {{DynamicEndpointSnitch}} to be a lot more testable so we can test 
our future changes.
# Provides three implementations: the new default as 
{{DynamicEndpointSnitchHistogram}}, the old legacy behavior as 
{{DynamicEndpointSnitchLegacyHistogram}}, and an experimental EMA-based snitch 
as {{DynamicEndpointSnitchEMA}} (a minimal sketch of the EMA idea follows this 
list). All three are run through the dynamic snitch ranking test.
# Adds a bunch of unit tests for all the implementations.
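For reference, that EMA sketch (an assumed shape only - the real {{DynamicEndpointSnitchEMA}} lives in the branch above):

{code:java}
// Sketch: each host keeps an exponentially weighted moving average of its
// observed latencies; newer samples dominate, but history is never fully
// discarded the way a periodic reset discards it.
public final class EmaLatencyTracker
{
    private final double alpha; // smoothing factor in (0, 1], e.g. 0.75
    private double ema = Double.NaN;

    public EmaLatencyTracker(double alpha)
    {
        this.alpha = alpha;
    }

    public synchronized void record(double latencyMicros)
    {
        ema = Double.isNaN(ema)
            ? latencyMicros                               // first sample seeds the average
            : alpha * latencyMicros + (1 - alpha) * ema;  // standard EMA update
    }

    public synchronized double score()
    {
        return ema; // lower is better when ranking replicas
    }
}
{code}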


> DynamicEndpointSnitch should never prefer latent nodes
> --
>
> Key: CASSANDRA-14459
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14459
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Minor
>  Labels: 4.0-feature-freeze-review-requested, 
> pull-request-available
> Fix For: 4.x
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> The DynamicEndpointSnitch has two unfortunate behaviors that allow it to 
> provide latent hosts as replicas:
>  # Loses all latency information when Cassandra restarts
>  # Clears latency information entirely every ten minutes (by default), 
> allowing global queries to be routed to _other datacenters_ (and local 
> queries across racks/AZs)
> This means that the first few queries after restart/reset could be quite slow 
> compared to average latencies. I propose we solve this by resetting to the 
> minimum observed latency instead of completely clearing the samples and 
> extending the {{isLatencyForSnitch}} idea to a three state variable instead 
> of two, in particular {{YES}}, {{NO}}, {{MAYBE}}. This extension allows 
> {{EchoMessages}} and {{PingMessages}} to send {{MAYBE}} indicating that the 
> DS should use those measurements if it only has one or fewer samples for a 
> host. This fixes both problems because on process restart we send out 
> {{PingMessages}} / {{EchoMessages}} as part of startup, and we would reset to 
> effectively the RTT of the hosts (also at that point normal gossip 
> {{EchoMessages}} have an opportunity to add an additional latency 
> measurement).
> This strategy also nicely deals with the "a host got slow but now it's fine" 
> problem that the DS resets were (afaik) designed to stop because the 
> {{EchoMessage}} ping latency will count only after the reset for that host. 
> Ping latency is a more reasonable lower bound on host latency (as opposed to 
> status quo of zero).
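
A minimal sketch of the proposed three-state idea (names are assumed, illustrative rather than an actual patch): {{MAYBE}} measurements, such as ping/echo round trips, only count while a host has one or fewer real samples.

{code:java}
// Sketch of the YES/NO/MAYBE extension to isLatencyForSnitch described above.
enum LatencyUsage { YES, NO, MAYBE }

interface HostLatencyStats
{
    int sampleCount();
    void record(double latencyMicros);
}

final class SnitchSampleFilter
{
    static void offer(LatencyUsage usage, HostLatencyStats stats, double latencyMicros)
    {
        // MAYBE samples (EchoMessage/PingMessage RTTs) seed an empty or
        // nearly-empty history, e.g. right after a restart or reset, but are
        // ignored once real request latencies are flowing.
        boolean accept = usage == LatencyUsage.YES
                      || (usage == LatencyUsage.MAYBE && stats.sampleCount() <= 1);
        if (accept)
            stats.record(latencyMicros);
    }
}
{code}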






[jira] [Assigned] (CASSANDRA-12823) dtest failure in topology_test.TestTopology.crash_during_decommission_test

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-12823:
---

Assignee: (was: Jason Brown)

> dtest failure in topology_test.TestTopology.crash_during_decommission_test
> --
>
> Key: CASSANDRA-12823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12823
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sean McCarthy
>Priority: Major
>  Labels: dtest, test-failure
> Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, 
> node2_debug.log, node2_gc.log, node3.log, node3_debug.log, node3_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_novnode_dtest/489/testReport/topology_test/TestTopology/crash_during_decommission_test
> {code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 358, in run
> self.tearDown()
>   File "/home/automaton/cassandra-dtest/dtest.py", line 581, in tearDown
> raise AssertionError('Unexpected error in log, see stdout')
> "Unexpected error in log, see stdout
> {code}
> {code}
> Standard Output
> Unexpected error in node2 log, error: 
> ERROR [GossipStage:1] 2016-10-19 15:44:14,820 CassandraDaemon.java:229 - 
> Exception in thread Thread[GossipStage:1,5,main]
> java.lang.NullPointerException: null
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) 
> ~[na:1.8.0_45]
>   at org.apache.cassandra.hints.HintsCatalog.get(HintsCatalog.java:89) 
> ~[main/:na]
>   at 
> org.apache.cassandra.hints.HintsService.excise(HintsService.java:313) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2458) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:2471) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2375)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1905)
>  ~[main/:na]
>   at 
> org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1222) 
> ~[main/:na]
>   at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1205) 
> ~[main/:na]
>   at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1168) 
> ~[main/:na]
>   at 
> org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:58)
>  ~[main/:na]
>   at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[main/:na]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_45]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_45]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_45]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_45]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {code}






[jira] [Resolved] (CASSANDRA-13517) dtest failure in paxos_tests.TestPaxos.contention_test_many_threads

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown resolved CASSANDRA-13517.
-
Resolution: Cannot Reproduce

Closing for now as it doesn't seem to be a problem of late.

> dtest failure in paxos_tests.TestPaxos.contention_test_many_threads
> ---
>
> Key: CASSANDRA-13517
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13517
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Ariel Weisberg
>Assignee: Jason Brown
>Priority: Major
>  Labels: dtest, test-failure, test-failure-fresh
> Attachments: test_failure.txt
>
>
> See attachment for details






[jira] [Assigned] (CASSANDRA-11809) IV misuse in commit log encryption

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-11809:
---

Assignee: (was: Jason Brown)

> IV misuse in commit log encryption
> --
>
> Key: CASSANDRA-11809
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11809
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Priority: Major
> Fix For: 3.11.x
>
>
> Commit log segments share IV values between encrypted chunks. The cipher 
> should be reinitialized with a new IV for each discrete piece of data it 
> encrypts; otherwise it gives attackers something to compare between chunks of 
> data. Also, some cipher configurations don't support initialization vectors 
> ('AES/ECB/NoPadding'), so some logic should be added to determine whether the 
> cipher should be initialized with an IV.
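
For illustration, a sketch of the per-chunk IV pattern the ticket asks for (an assumed shape, not the eventual patch):

{code:java}
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Reinitialize the cipher with a fresh random IV for every chunk instead of
// reusing one IV across a segment. The IV is not secret and is stored with
// the chunk so it can be read back at decryption time; modes that take no IV
// (e.g. AES/ECB) would skip the IvParameterSpec entirely.
final class PerChunkEncryptor
{
    private final SecureRandom random = new SecureRandom();

    byte[] encryptChunk(Cipher cipher, SecretKey key, byte[] chunk) throws Exception
    {
        byte[] iv = new byte[cipher.getBlockSize()];
        random.nextBytes(iv);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return cipher.doFinal(chunk);
    }
}
{code}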






[jira] [Assigned] (CASSANDRA-13856) Optimize ByteBuf reallocations in the native protocol pipeline

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-13856:
---

Assignee: (was: Jason Brown)

> Optimize ByteBuf reallocations in the native protocol pipeline
> --
>
> Key: CASSANDRA-13856
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13856
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jason Brown
>Priority: Minor
>
> This is a follow-up to CASSANDRA-13789. I discovered we reallocate the 
> {{ByteBuf}} when writing data to it, and it would be nice to size the buffer 
> correctly up-front to avoid reallocating it. I'm not sure how easy that is, 
> nor whether the cost of the realloc is cheaper than calculating the size needed 
> for the buffer. Adding this ticket, nonetheless, to explore that optimization.
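
To make the trade-off concrete, a toy sketch (not the actual pipeline code): a {{ByteBuf}} that outgrows its initial capacity is reallocated and copied, which up-front sizing avoids at the cost of computing the encoded size.

{code:java}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;

final class ByteBufSizing
{
    // Undersized buffer: Netty grows it as we write, i.e. allocate + copy.
    static ByteBuf encodeNaive(byte[] payload)
    {
        ByteBuf buf = ByteBufAllocator.DEFAULT.buffer(16);
        buf.writeInt(payload.length);
        buf.writeBytes(payload);
        return buf;
    }

    // Exact size computed up-front: no growth, no copy.
    static ByteBuf encodeSized(byte[] payload)
    {
        ByteBuf buf = ByteBufAllocator.DEFAULT.buffer(Integer.BYTES + payload.length);
        buf.writeInt(payload.length);
        buf.writeBytes(payload);
        return buf;
    }
}
{code}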






[jira] [Assigned] (CASSANDRA-11810) IV misuse in hints encryption

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-11810:
---

Assignee: (was: Jason Brown)

> IV misuse in hints encryption
> -
>
> Key: CASSANDRA-11810
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11810
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Priority: Major
> Fix For: 3.11.x
>
>
> Encrypted hint files share IV values between encrypted chunks. The cipher 
> should be reinitialized with a new IV for each discrete piece of data it 
> encrypts; otherwise it gives attackers something to compare between chunks of 
> data. Also, some cipher configurations don't support initialization vectors 
> ('AES/ECB/NoPadding'), so some logic should be added to determine whether the 
> cipher should be initialized with an IV.






[jira] [Assigned] (CASSANDRA-7922) Add file-level encryption

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-7922:
--

Assignee: (was: Jason Brown)

> Add file-level encryption
> -
>
> Key: CASSANDRA-7922
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7922
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jason Brown
>Priority: Major
>  Labels: encryption, security
> Fix For: 4.x
>
>
> Umbrella ticket for file-level encryption
> Some use cases require encrypting files at rest for certain compliance needs: 
> the healthcare industry (HIPAA regulations), the card payment industry (PCI 
> DSS regulations) or the US government (FISMA regulations). File system 
> encryption can be used in some situations, but does not solve all problems. 
> I can foresee the following components needing at-rest encryption:
> - sstables (data, index, and summary files) (CASSANDRA-9633)
> - commit log (CASSANDRA-6018)
> - hints (CASSANDRA-11040)
> - some systems tables (batches, not sure if any others)
> - index/row cache
> - secondary indexes
> The work for those items would be separate tickets, of course. I have a 
> working version of most of the above components built on 2.0, which I need 
> to ship in production now, but it's too late for the 2.0 branch and unclear 
> for 2.1.
> Other products, such as Oracle/SqlServer/Datastax Enterprise commonly refer 
> to at-rest encryption as Transparent Data Encryption (TDE), and I'm happy to 
> stick with that convention, here, as well.






[jira] [Assigned] (CASSANDRA-9633) Add ability to encrypt sstables

2018-10-23 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown reassigned CASSANDRA-9633:
--

Assignee: (was: Jason Brown)

> Add ability to encrypt sstables
> ---
>
> Key: CASSANDRA-9633
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9633
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jason Brown
>Priority: Major
>  Labels: encryption, security, sstable
> Fix For: 4.x
>
>
> Add option to allow encrypting of sstables.
> I have a version of this functionality built on cassandra 2.0 that 
> piggy-backs on the existing sstable compression functionality and ICompressor 
> interface (similar in nature to what DataStax Enterprise does). However, if 
> we're adding the feature to the main OSS product, I'm not sure if we want to 
> use the pluggable compression framework or if it's worth investigating a 
> different path. I think there's a lot of upside in reusing the sstable 
> compression scheme, but perhaps add a new component in cqlsh for table 
> encryption and a corresponding field in CFMD.
> Encryption configuration in the yaml can use the same mechanism as 
> CASSANDRA-6018 (which is currently pending internal review).






[jira] [Comment Edited] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 16kb

2018-10-23 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661492#comment-16661492
 ] 

Joseph Lynch edited comment on CASSANDRA-13241 at 10/24/18 12:05 AM:
-

I don't mean to add another perspective to this, but I am not sure we're 
considering the compression ratio loss on real-world data enough here. We've 
been talking a lot about the memory requirements, but I think the bigger issues 
are:
 * Ratio loss leading to less of the dataset being hot in OS page cache
 * OS Read-ahead is usually 16 or 32kb, so if you're reading less than that 
from disk you're still going to read 16 or 32kb...

I think for Cassandra, which relies heavily on the OS page cache for 
performance, 16kb is the absolute minimum I would default to. For example, from 
IRC today, I ran Ariel's ratio 
[script|https://gist.github.com/jolynch/411e62ac592bfb55cfdd5db87c77ef6f] on a 
(somewhat arbitrary) 3.0.17 production cluster dataset and saw the following 
ratios:
{noformat}
Chunk size 4096, ratio 0.541505
Chunk size 8192, ratio 0.467537
Chunk size 16384, ratio 0.425122
Chunk size 32768, ratio 0.387040
Chunk size 65536, ratio 0.352454
{noformat}
The reduction in ratio at 4-8kb would destroy the OS page cache, imo. 16kb isn't 
too bad, and 32kb is downright fine.

In my experience, 32kb is often an easy win, and 16kb is often a good idea for 
less compressible datasets. Last I checked, Scylla uses direct IO and bypasses 
the OS cache, so I don't think we should use their default unless we implement 
direct IO as well (and the buffer cache on top of it)...

If the hot dataset is much less than RAM, then yea 4kb all the way ...


was (Author: jolynch):
I don't mean to add another perspective to this, but I am not sure we're 
considering the compression ratio loss on real world data enough here. We've 
been talking a lot about the memory requirements but I think the bigger issues 
are:
 * Ratio loss leading to less of the dataset being hot in OS page cache
 * OS Read-ahead is usually 16 or 32kb, so if you're reading less than that 
from disk you're still going to read 16 or 32kb...

I think for Cassandra which relies on the OS page cache heavily for 
performance, 16kb is the absolute minimum I would default to. For example from 
IRC today I ran Ariel's ratio 
[script|https://gist.github.com/jolynch/411e62ac592bfb55cfdd5db87c77ef6f] on a 
(somewhat arbitrary) 3.0.17 production cluster dataset and saw the following 
ratios :
{noformat}
Chunk size 4096, ratio 0.541505
Chunk size 8192, ratio 0.467537
Chunk size 16384, ratio 0.425122
Chunk size 32768, ratio 0.387040
Chunk size 65536, ratio 0.352454
{noformat}
The reduction in ratio at 4-8kb would destroy the OS page cache imo. 16KB isn't 
too bad, and 32kb is downright fine.

In my experience, 32kb is often an easy win, and 16kb is often a good idea for 
less compressible datasets. Last I checked Scylla uses direct io and bypasses 
the OS cache so I don't think we should use their default unless we implement 
direct io as well (and the buffer cache on top of it)...

If the dataset is less than RAM, then yea 4kb all the way ...

> Lower default chunk_length_in_kb from 64kb to 16kb
> --
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
>  Issue Type: Wish
>  Components: Core
>Reporter: Benjamin Roth
>Assignee: Ariel Weisberg
>Priority: Major
> Attachments: CompactIntegerSequence.java, 
> CompactIntegerSequenceBench.java, CompactSummingIntegerSequence.java
>
>
> Too low a chunk size may result in some wasted disk space. Too high a 
> chunk size may lead to massive overreads and may have a critical impact on 
> overall system performance.
> In my case, the default chunk size led to peak read IOs of up to 1GB/s and 
> avg reads of 200MB/s. After lowering the chunk size (of course aligned with read 
> ahead), the avg read IO went below 20MB/s, more like 10-15MB/s.
> The risk of (physical) overreads increases with a lower (page cache size) / 
> (total data size) ratio.
> High chunk sizes are mostly appropriate for bigger payloads per request, but 
> if the model consists mostly of small rows or small result sets, the read 
> overhead with a 64kb chunk size is insanely high. This applies, for example, to 
> (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insight into what a difference it can make (460GB data, 128GB 
> RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so no "dynamic 
> snitch magic": https://cl.ly/3E0t1T1z2c0J




[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 16kb

2018-10-23 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661492#comment-16661492
 ] 

Joseph Lynch commented on CASSANDRA-13241:
--

I don't mean to add another perspective to this, but I am not sure we're 
considering the compression ratio loss on real-world data enough here. We've 
been talking a lot about the memory requirements, but I think the bigger issues 
are:
 * Ratio loss leading to less of the dataset being hot in OS page cache
 * OS Read-ahead is usually 16 or 32kb, so if you're reading less than that 
from disk you're still going to read 16 or 32kb...

I think for Cassandra, which relies heavily on the OS page cache for 
performance, 16kb is the absolute minimum I would default to. For example, from 
IRC today, I ran Ariel's ratio 
[script|https://gist.github.com/jolynch/411e62ac592bfb55cfdd5db87c77ef6f] on a 
(somewhat arbitrary) 3.0.17 production cluster dataset and saw the following 
ratios:
{noformat}
Chunk size 4096, ratio 0.541505
Chunk size 8192, ratio 0.467537
Chunk size 16384, ratio 0.425122
Chunk size 32768, ratio 0.387040
Chunk size 65536, ratio 0.352454
{noformat}
The reduction in ratio at 4-8kb would destroy the OS page cache, imo. 16kb isn't 
too bad, and 32kb is downright fine.

In my experience, 32kb is often an easy win, and 16kb is often a good idea for 
less compressible datasets. Last I checked, Scylla uses direct IO and bypasses 
the OS cache, so I don't think we should use their default unless we implement 
direct IO as well (and the buffer cache on top of it)...

If the dataset is less than RAM, then yea 4kb all the way ...
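
A rough Java equivalent of that ratio experiment (an assumed reimplementation - the linked gist is the authoritative script, and Deflater won't compress exactly like Cassandra's LZ4, though the chunk-size trend holds):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Deflater;

// Compress a data file chunk-by-chunk at several chunk sizes and report the
// resulting compressed/uncompressed ratio for each (smaller is better).
public final class ChunkRatio
{
    public static void main(String[] args) throws IOException
    {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (int chunk : new int[] { 4096, 8192, 16384, 32768, 65536 })
        {
            long compressed = 0;
            byte[] out = new byte[chunk * 2]; // headroom for incompressible chunks
            for (int off = 0; off < data.length; off += chunk)
            {
                Deflater d = new Deflater();
                d.setInput(data, off, Math.min(chunk, data.length - off));
                d.finish();
                while (!d.finished())
                    compressed += d.deflate(out);
                d.end();
            }
            System.out.printf("Chunk size %d, ratio %f%n", chunk, (double) compressed / data.length);
        }
    }
}
{code}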

> Lower default chunk_length_in_kb from 64kb to 16kb
> --
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
>  Issue Type: Wish
>  Components: Core
>Reporter: Benjamin Roth
>Assignee: Ariel Weisberg
>Priority: Major
> Attachments: CompactIntegerSequence.java, 
> CompactIntegerSequenceBench.java, CompactSummingIntegerSequence.java
>
>
> Too low a chunk size may result in some wasted disk space. Too high a 
> chunk size may lead to massive overreads and may have a critical impact on 
> overall system performance.
> In my case, the default chunk size led to peak read IOs of up to 1GB/s and 
> avg reads of 200MB/s. After lowering the chunk size (of course aligned with read 
> ahead), the avg read IO went below 20MB/s, more like 10-15MB/s.
> The risk of (physical) overreads increases with a lower (page cache size) / 
> (total data size) ratio.
> High chunk sizes are mostly appropriate for bigger payloads per request, but 
> if the model consists mostly of small rows or small result sets, the read 
> overhead with a 64kb chunk size is insanely high. This applies, for example, to 
> (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insight into what a difference it can make (460GB data, 128GB 
> RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so no "dynamic 
> snitch magic": https://cl.ly/3E0t1T1z2c0J






[jira] [Created] (CASSANDRA-14839) Cannot bootstrap new node IOException: CF was dropped during streaming - Cassandra 3.0.16

2018-10-23 Thread Drew O'Connor (JIRA)
Drew O'Connor created CASSANDRA-14839:
-

 Summary: Cannot bootstrap new node IOException: CF  was 
dropped during streaming - Cassandra 3.0.16 
 Key: CASSANDRA-14839
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14839
 Project: Cassandra
  Issue Type: Bug
 Environment: Cassandra 3.0.16
Reporter: Drew O'Connor


I have a Cassandra 3.0.16 cluster with 6 nodes. 

Repairs work well on all nodes. 

Trying to join a 7th node fails with:
{code:java}
ERROR [STREAM-IN-/172.30.2.191] 2018-10-23 20:57:06,497 StreamSession.java:534 
- [Stream #08d98990-d706-11e8-bf10-9b02b0f15309] Streaming error occurred
java.io.IOException: CF 9428b300-99ac-11e8-aa56-b1d7e983933e was dropped during 
streaming
at 
org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:77)
 ~[apache-cassandra-3.0.16.jar:3.0.16]
at 
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:54)
 ~[apache-cassandra-3.0.16.jar:3.0.16]
at 
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:43)
 ~[apache-cassandra-3.0.16.jar:3.0.16]
at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:59)
 ~[apache-cassandra-3.0.16.jar:3.0.16]
at 
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:294)
 ~[apache-cassandra-3.0.16.jar:3.0.16]
at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
 [apache-cassandra-3.0.16.jar:3.0.16]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
{code}
On disk I have:
{code:java}
ubuntu@prd-cassandra-2d:/data$ find -type d -name 
'*9428b30099ac11e8aa56b1d7e983933e*'
./cassandra/data/sandboxprdvault/vault-9428b30099ac11e8aa56b1d7e983933e
{code}
However in system_schema.tables I have:
{code:java}
keyspace_name   table_name  id
sandboxprdvault vault   942c8390-99ac-11e8-be55-69bc3438b9fa
{code}
What is unusual to me is that the system_schema.tables id for vault is different 
from the id on disk for the same table. 

This seems to cause a stream error when bootstrapping a new node. 






[jira] [Commented] (CASSANDRA-14790) LongBufferPoolTest burn test fails assertion

2018-10-23 Thread Aleksey Yeschenko (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661405#comment-16661405
 ] 

Aleksey Yeschenko commented on CASSANDRA-14790:
---

Committed to 3.0 as 
[e07d53aaec94a498028d988f7d2c7ae7e6b620d0|https://github.com/apache/cassandra/commit/e07d53aaec94a498028d988f7d2c7ae7e6b620d0]
 and merged upwards, thanks.

> LongBufferPoolTest burn test fails assertion
> 
>
> Key: CASSANDRA-14790
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14790
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
> Environment: Run under macOS 10.13.6, with patch (attached, but also 
> https://github.com/jonmeredith/cassandra/tree/failing-burn-test)
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.18, 3.11.4, 4.0
>
> Attachments: 0001-Add-burn-testsome-target-to-build.xml.patch, 
> 0002-Initialize-before-running-LongBufferPoolTest.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The LongBufferPoolTest from the burn tests fails with an assertion error. I 
> added a build target to run individual burn tests, and \{jasobrown} gave a 
> fix for the uninitialized test setup (attached); however, the test now fails 
> on an assertion about recycling buffers.
> To reproduce (with patch applied)
> {{ant burn-testsome 
> -Dtest.name=org.apache.cassandra.utils.memory.LongBufferPoolTest 
> -Dtest.methods=testAllocate}}
> Output
> {{    [junit] Testcase: 
> testAllocate(org.apache.cassandra.utils.memory.LongBufferPoolTest): FAILED}}
> {{    [junit] null}}
> {{    [junit] junit.framework.AssertionFailedError}}
> {{    [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$Debug.check(BufferPool.java:204)}}
> {{    [junit] at 
> org.apache.cassandra.utils.memory.BufferPool.assertAllRecycled(BufferPool.java:181)}}
> {{    [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:350)}}
> {{    [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:54)}}
> All major branches from 3.0 and later have issues, however the trunk branch 
> also warns about references not being released before the reference is 
> garbage collected.
> {{[junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:224 - 
> LEAK DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a) to @623704362 was 
> not released before the reference was garbage collected}}
> {{ [junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:255 - 
> Allocate trace org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a:}}
> {{ [junit] Thread[pool-2-thread-24,5,main]}}
> {{ [junit] at java.lang.Thread.getStackTrace(Thread.java:1559)}}
> {{ [junit] at 
> org.apache.cassandra.utils.concurrent.Ref$Debug.<init>(Ref.java:245)}}
> {{ [junit] at 
> org.apache.cassandra.utils.concurrent.Ref$State.<init>(Ref.java:175)}}
> {{ [junit] at org.apache.cassandra.utils.concurrent.Ref.<init>(Ref.java:97)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$Chunk.setAttachment(BufferPool.java:663)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:803)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:793)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$LocalPool.get(BufferPool.java:388)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool.maybeTakeFromPool(BufferPool.java:143)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool.takeFromPool(BufferPool.java:115)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool.get(BufferPool.java:85)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest$3.allocate(LongBufferPoolTest.java:296)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest$3.testOne(LongBufferPoolTest.java:246)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:399)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:379)}}
> {{ [junit] at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{ [junit] at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{ [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{ [junit] at java.lang.Thread.run(Thread.java:748)}}
>  
> Perhaps the environment is not being set up correctly for the tests.
>   




[jira] [Updated] (CASSANDRA-14790) LongBufferPoolTest burn test fails assertion

2018-10-23 Thread Aleksey Yeschenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-14790:
--
   Resolution: Fixed
Fix Version/s: 4.0
   3.11.4
   3.0.18
   Status: Resolved  (was: Patch Available)

> LongBufferPoolTest burn test fails assertion
> 
>
> Key: CASSANDRA-14790
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14790
> Project: Cassandra
>  Issue Type: Test
>  Components: Testing
> Environment: Run under macOS 10.13.6, with patch (attached, but also 
> https://github.com/jonmeredith/cassandra/tree/failing-burn-test)
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.0.18, 3.11.4, 4.0
>
> Attachments: 0001-Add-burn-testsome-target-to-build.xml.patch, 
> 0002-Initialize-before-running-LongBufferPoolTest.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The LongBufferPoolTest from the burn tests fails with an assertion error. I 
> added a build target to run individual burn tests, and \{jasobrown} gave a 
> fix for the uninitialized test setup (attached); however, the test now fails 
> on an assertion about recycling buffers.
> To reproduce (with patch applied)
> {{ant burn-testsome 
> -Dtest.name=org.apache.cassandra.utils.memory.LongBufferPoolTest 
> -Dtest.methods=testAllocate}}
> Output
> {{    [junit] Testcase: 
> testAllocate(org.apache.cassandra.utils.memory.LongBufferPoolTest): FAILED}}
> {{    [junit] null}}
> {{    [junit] junit.framework.AssertionFailedError}}
> {{    [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$Debug.check(BufferPool.java:204)}}
> {{    [junit] at 
> org.apache.cassandra.utils.memory.BufferPool.assertAllRecycled(BufferPool.java:181)}}
> {{    [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:350)}}
> {{    [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:54)}}
> All major branches from 3.0 and later have issues, however the trunk branch 
> also warns about references not being released before the reference is 
> garbage collected.
> {{[junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:224 - 
> LEAK DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a) to @623704362 was 
> not released before the reference was garbage collected}}
> {{ [junit] ERROR [Reference-Reaper:1] 2018-09-25 13:59:54,089 Ref.java:255 - 
> Allocate trace org.apache.cassandra.utils.concurrent.Ref$State@7f58d19a:}}
> {{ [junit] Thread[pool-2-thread-24,5,main]}}
> {{ [junit] at java.lang.Thread.getStackTrace(Thread.java:1559)}}
> {{ [junit] at 
> org.apache.cassandra.utils.concurrent.Ref$Debug.<init>(Ref.java:245)}}
> {{ [junit] at 
> org.apache.cassandra.utils.concurrent.Ref$State.<init>(Ref.java:175)}}
> {{ [junit] at org.apache.cassandra.utils.concurrent.Ref.<init>(Ref.java:97)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$Chunk.setAttachment(BufferPool.java:663)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:803)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:793)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool$LocalPool.get(BufferPool.java:388)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool.maybeTakeFromPool(BufferPool.java:143)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool.takeFromPool(BufferPool.java:115)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.BufferPool.get(BufferPool.java:85)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest$3.allocate(LongBufferPoolTest.java:296)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest$3.testOne(LongBufferPoolTest.java:246)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:399)}}
> {{ [junit] at 
> org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:379)}}
> {{ [junit] at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{ [junit] at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{ [junit] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{ [junit] at java.lang.Thread.run(Thread.java:748)}}
>  
> Perhaps the environment is not being set up correctly for the tests.
>   




[2/6] cassandra git commit: Fix flaky LongBufferPoolTest

2018-10-23 Thread aleksey
Fix flaky LongBufferPoolTest

patch by Jon Meredith; reviewed by Dinesh Joshi for CASSANDRA-14790


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e07d53aa
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e07d53aa
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e07d53aa

Branch: refs/heads/cassandra-3.11
Commit: e07d53aaec94a498028d988f7d2c7ae7e6b620d0
Parents: 285153f
Author: Jon Meredith 
Authored: Thu Oct 4 17:08:52 2018 -0600
Committer: Aleksey Yeshchenko 
Committed: Tue Oct 23 23:22:13 2018 +0100

--
 build.xml   |  33 +
 .../utils/memory/LongBufferPoolTest.java| 614 ---
 2 files changed, 411 insertions(+), 236 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/e07d53aa/build.xml
--
diff --git a/build.xml b/build.xml
index d7e5444..d7e6c4b 100644
--- a/build.xml
+++ b/build.xml
@@ -1345,6 +1345,14 @@
[XML hunk content stripped by the archive]
@@ -1742,6 +1750,31 @@
[XML hunk content stripped by the archive; together these hunks add the burn-testsome target]

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e07d53aa/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
--
diff --git 
a/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java 
b/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
index 17ac569..66abe5a 100644
--- a/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
+++ b/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
@@ -36,10 +36,36 @@ import org.apache.cassandra.utils.DynamicList;
 
 import static org.junit.Assert.*;
 
+/**
+ * Long BufferPool test - make sure that the BufferPool allocates and recycles
+ * ByteBuffers under heavy concurrent usage.
+ *
+ * The test creates two groups of threads:
+ *
+ * - the burn producer/consumer pair that allocates 1/10 poolSize and then returns
+ *   all the memory to the pool. 50% is freed by the producer, 50% passed to the
+ *   consumer thread.
+ *
+ * - a ring of worker threads that allocate buffers and either immediately free
+ *   them, or pass them to the next worker thread to be freed on its behalf.
+ *   Periodically all memory is freed by the thread.
+ *
+ * While the burn/worker threads run, the original main thread checks every 10s
+ * that all of the threads are still making progress (no locking issues, or exits
+ * from assertion failures), and that every chunk has been freed at least once
+ * during the previous cycle (if that was possible).
+ *
+ * The test does not expect to survive out-of-memory errors, so it needs
+ * sufficient heap memory for non-direct buffers and the debug tracking objects
+ * that check the allocated buffers. (The timing is very interesting when Xmx is
+ * lowered to increase garbage collection pauses, but do not set it too low.)
+ */
 public class LongBufferPoolTest
 {
 private static final Logger logger = 
LoggerFactory.getLogger(LongBufferPoolTest.class);
 
+private static final int AVG_BUFFER_SIZE = 16 << 10;
+private static final int STDEV_BUFFER_SIZE = 10 << 10; // picked to ensure 
exceeding buffer size is rare, but occurs
+private static final DateFormat DATE_FORMAT = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
+
 @Test
 public void testAllocate() throws InterruptedException, ExecutionException
 {
@@ -73,299 +99,393 @@ public class LongBufferPoolTest
 }
 }
 
-public void testAllocate(int threadCount, long duration, int poolSize) 
throws InterruptedException, ExecutionException
+private static final class TestEnvironment
 {
-final int avgBufferSize = 16 << 10;
-final int stdevBufferSize = 10 << 10; // picked to ensure exceeding 
buffer size is rare, but occurs
-final DateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
+final int threadCount;
+final long duration;
+final int poolSize;
+final long until;
+final CountDownLatch latch;
+final SPSCQueue[] sharedRecycle;
+final AtomicBoolean[] makingProgress;
+final AtomicBoolean burnFreed;
+final AtomicBoolean[] freedAllMemory;
+final ExecutorService executorService;
+final List<Future<Boolean>> threadResultFuture;
+final int targetSizeQuanta;
+
+TestEnvironment(int threadCount, long duration, int poolSize)
+{
+this.threadCount = threadCount;
+   
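The test structure described in the javadoc above - workers flagging progress 
while the main thread polls on a fixed interval - is a reusable burn-test 
pattern. A stripped-down sketch with hypothetical names (not the actual test 
code):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ProgressWatchdog
{
    public static void main(String[] args) throws InterruptedException
    {
        final int threads = Runtime.getRuntime().availableProcessors();
        final long until = System.nanoTime() + TimeUnit.MINUTES.toNanos(1);

        final AtomicBoolean[] makingProgress = new AtomicBoolean[threads];
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++)
        {
            final AtomicBoolean progress = makingProgress[i] = new AtomicBoolean();
            exec.execute(() -> {
                while (System.nanoTime() < until)
                {
                    doOneUnitOfWork();
                    progress.set(true); // signal liveness to the watchdog
                }
            });
        }

        // Watchdog: every 10s, every worker must have reported progress since
        // the previous check, otherwise the test fails fast instead of hanging.
        while (System.nanoTime() < until)
        {
            TimeUnit.SECONDS.sleep(10);
            for (int i = 0; i < threads; i++)
                if (!makingProgress[i].getAndSet(false))
                    throw new AssertionError("worker " + i + " made no progress in the last 10s");
        }
        exec.shutdown();
    }

    private static void doOneUnitOfWork()
    {
        // stand-in for allocating, checking and freeing pooled buffers
    }
}
{code}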

[4/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2018-10-23 Thread aleksey
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6308fb21
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6308fb21
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6308fb21

Branch: refs/heads/trunk
Commit: 6308fb21da9ef512e8620779efb68b49f277484d
Parents: 075f458 e07d53a
Author: Aleksey Yeshchenko 
Authored: Tue Oct 23 23:27:09 2018 +0100
Committer: Aleksey Yeshchenko 
Committed: Tue Oct 23 23:27:09 2018 +0100

--
 build.xml   |  33 +
 .../utils/memory/LongBufferPoolTest.java| 614 ---
 2 files changed, 411 insertions(+), 236 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/6308fb21/build.xml
--
diff --cc build.xml
index ff1695b,d7e6c4b..c9565f8
--- a/build.xml
+++ b/build.xml
@@@ -1399,7 -1345,15 +1399,15 @@@
  

  
+   
+   
+ 
+   
+ 
+   
 -  
 +  
  
  






[1/6] cassandra git commit: Fix flaky LongBufferPoolTest

2018-10-23 Thread aleksey
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.0 285153f62 -> e07d53aae
  refs/heads/cassandra-3.11 075f45862 -> 6308fb21d
  refs/heads/trunk c3ef43d45 -> bf6ddb3bc


Fix flaky LongBufferPoolTest

patch by Jon Meredith; reviewed by Dinesh Joshi for CASSANDRA-14790


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e07d53aa
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e07d53aa
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e07d53aa

Branch: refs/heads/cassandra-3.0
Commit: e07d53aaec94a498028d988f7d2c7ae7e6b620d0
Parents: 285153f
Author: Jon Meredith 
Authored: Thu Oct 4 17:08:52 2018 -0600
Committer: Aleksey Yeshchenko 
Committed: Tue Oct 23 23:22:13 2018 +0100

--
 build.xml   |  33 +
 .../utils/memory/LongBufferPoolTest.java| 614 ---
 2 files changed, 411 insertions(+), 236 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/e07d53aa/build.xml
--
diff --git a/build.xml b/build.xml
index d7e5444..d7e6c4b 100644
--- a/build.xml
+++ b/build.xml
@@ -1345,6 +1345,14 @@
 
   
 
+  
+  
+
+  
+
+  
   
 
 
@@ -1742,6 +1750,31 @@
   
   
 
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+
   
   
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e07d53aa/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
--
diff --git 
a/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java 
b/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
index 17ac569..66abe5a 100644
--- a/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
+++ b/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
@@ -36,10 +36,36 @@ import org.apache.cassandra.utils.DynamicList;
 
 import static org.junit.Assert.*;
 
+/**
+ * Long BufferPool test - make sure that the BufferPool allocates and recycles
+ * ByteBuffers under heavy concurrent usage.
+ *
+ * The test creates two groups of threads:
+ *
+ * - the burn producer/consumer pair that allocates 1/10 poolSize and then returns
+ *   all the memory to the pool. 50% is freed by the producer, 50% passed to the
+ *   consumer thread.
+ *
+ * - a ring of worker threads that allocate buffers and either immediately free
+ *   them, or pass them to the next worker thread to be freed on its behalf.
+ *   Periodically all memory is freed by the thread.
+ *
+ * While the burn/worker threads run, the original main thread checks every 10s
+ * that all of the threads are still making progress (no locking issues, or exits
+ * from assertion failures), and that every chunk has been freed at least once
+ * during the previous cycle (if that was possible).
+ *
+ * The test does not expect to survive out-of-memory errors, so it needs
+ * sufficient heap memory for non-direct buffers and the debug tracking objects
+ * that check the allocated buffers. (The timing is very interesting when Xmx is
+ * lowered to increase garbage collection pauses, but do not set it too low.)
+ */
 public class LongBufferPoolTest
 {
 private static final Logger logger = 
LoggerFactory.getLogger(LongBufferPoolTest.class);
 
+private static final int AVG_BUFFER_SIZE = 16 << 10;
+private static final int STDEV_BUFFER_SIZE = 10 << 10; // picked to ensure 
exceeding buffer size is rare, but occurs
+private static final DateFormat DATE_FORMAT = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
+
 @Test
 public void testAllocate() throws InterruptedException, ExecutionException
 {
@@ -73,299 +99,393 @@ public class LongBufferPoolTest
 }
 }
 
-public void testAllocate(int threadCount, long duration, int poolSize) 
throws InterruptedException, ExecutionException
+private static final class TestEnvironment
 {
-final int avgBufferSize = 16 << 10;
-final int stdevBufferSize = 10 << 10; // picked to ensure exceeding 
buffer size is rare, but occurs
-final DateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
+final int threadCount;
+final long duration;
+final int poolSize;
+final long until;
+final CountDownLatch latch;
+final SPSCQueue[] sharedRecycle;
+final AtomicBoolean[] makingProgress;
+final AtomicBoolean burnFreed;
+final AtomicBoolean[] freedAllMemory;
+final ExecutorService executorService;
+final List<Future<Boolean>> 

[3/6] cassandra git commit: Fix flaky LongBufferPoolTest

2018-10-23 Thread aleksey
Fix flaky LongBufferPoolTest

patch by Jon Meredith; reviewed by Dinesh Joshi for CASSANDRA-14790


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e07d53aa
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e07d53aa
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e07d53aa

Branch: refs/heads/trunk
Commit: e07d53aaec94a498028d988f7d2c7ae7e6b620d0
Parents: 285153f
Author: Jon Meredith 
Authored: Thu Oct 4 17:08:52 2018 -0600
Committer: Aleksey Yeshchenko 
Committed: Tue Oct 23 23:22:13 2018 +0100

--
 build.xml   |  33 +
 .../utils/memory/LongBufferPoolTest.java| 614 ---
 2 files changed, 411 insertions(+), 236 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/e07d53aa/build.xml
--
diff --git a/build.xml b/build.xml
index d7e5444..d7e6c4b 100644
--- a/build.xml
+++ b/build.xml
@@ -1345,6 +1345,14 @@
 
   
 
+  
+  
+
+  
+
+  
   
 
 
@@ -1742,6 +1750,31 @@
   
   
 
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+
   
   
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e07d53aa/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
--
diff --git 
a/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java 
b/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
index 17ac569..66abe5a 100644
--- a/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
+++ b/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
@@ -36,10 +36,36 @@ import org.apache.cassandra.utils.DynamicList;
 
 import static org.junit.Assert.*;
 
+/**
+ * Long BufferPool test - make sure that the BufferPool allocates and recycles
+ * ByteBuffers under heavy concurrent usage.
+ *
+ * The test creates two groups of threads:
+ *
+ * - the burn producer/consumer pair that allocates 1/10 poolSize and then returns
+ *   all the memory to the pool. 50% is freed by the producer, 50% passed to the
+ *   consumer thread.
+ *
+ * - a ring of worker threads that allocate buffers and either immediately free
+ *   them, or pass them to the next worker thread to be freed on its behalf.
+ *   Periodically all memory is freed by the thread.
+ *
+ * While the burn/worker threads run, the original main thread checks every 10s
+ * that all of the threads are still making progress (no locking issues, or exits
+ * from assertion failures), and that every chunk has been freed at least once
+ * during the previous cycle (if that was possible).
+ *
+ * The test does not expect to survive out-of-memory errors, so it needs
+ * sufficient heap memory for non-direct buffers and the debug tracking objects
+ * that check the allocated buffers. (The timing is very interesting when Xmx is
+ * lowered to increase garbage collection pauses, but do not set it too low.)
+ */
 public class LongBufferPoolTest
 {
 private static final Logger logger = 
LoggerFactory.getLogger(LongBufferPoolTest.class);
 
+private static final int AVG_BUFFER_SIZE = 16 << 10;
+private static final int STDEV_BUFFER_SIZE = 10 << 10; // picked to ensure 
exceeding buffer size is rare, but occurs
+private static final DateFormat DATE_FORMAT = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
+
 @Test
 public void testAllocate() throws InterruptedException, ExecutionException
 {
@@ -73,299 +99,393 @@ public class LongBufferPoolTest
 }
 }
 
-public void testAllocate(int threadCount, long duration, int poolSize) 
throws InterruptedException, ExecutionException
+private static final class TestEnvironment
 {
-final int avgBufferSize = 16 << 10;
-final int stdevBufferSize = 10 << 10; // picked to ensure exceeding 
buffer size is rare, but occurs
-final DateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
+final int threadCount;
+final long duration;
+final int poolSize;
+final long until;
+final CountDownLatch latch;
+final SPSCQueue[] sharedRecycle;
+final AtomicBoolean[] makingProgress;
+final AtomicBoolean burnFreed;
+final AtomicBoolean[] freedAllMemory;
+final ExecutorService executorService;
+final List<Future<Boolean>> threadResultFuture;
+final int targetSizeQuanta;
+
+TestEnvironment(int threadCount, long duration, int poolSize)
+{
+this.threadCount = threadCount;
+

[5/6] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2018-10-23 Thread aleksey
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6308fb21
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6308fb21
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6308fb21

Branch: refs/heads/cassandra-3.11
Commit: 6308fb21da9ef512e8620779efb68b49f277484d
Parents: 075f458 e07d53a
Author: Aleksey Yeshchenko 
Authored: Tue Oct 23 23:27:09 2018 +0100
Committer: Aleksey Yeshchenko 
Committed: Tue Oct 23 23:27:09 2018 +0100

--
 build.xml   |  33 +
 .../utils/memory/LongBufferPoolTest.java| 614 ---
 2 files changed, 411 insertions(+), 236 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/6308fb21/build.xml
--
diff --cc build.xml
index ff1695b,d7e6c4b..c9565f8
--- a/build.xml
+++ b/build.xml
@@@ -1399,7 -1345,15 +1399,15 @@@
  

  
+   
+   
+ 
+   
+ 
+   
 -  
 +  
  
  






[6/6] cassandra git commit: Merge branch 'cassandra-3.11' into trunk

2018-10-23 Thread aleksey
Merge branch 'cassandra-3.11' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bf6ddb3b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bf6ddb3b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bf6ddb3b

Branch: refs/heads/trunk
Commit: bf6ddb3bc8af19fd5b7c300c32e0d0071bcef192
Parents: c3ef43d 6308fb2
Author: Aleksey Yeshchenko 
Authored: Tue Oct 23 23:32:49 2018 +0100
Committer: Aleksey Yeshchenko 
Committed: Tue Oct 23 23:35:49 2018 +0100

--
 build.xml   |  34 +
 .../utils/memory/LongBufferPoolTest.java| 623 ---
 2 files changed, 421 insertions(+), 236 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/bf6ddb3b/build.xml
--
diff --cc build.xml
index 28c33bf,c9565f8..3d3014c
--- a/build.xml
+++ b/build.xml
@@@ -1441,17 -1400,14 +1441,26 @@@

  

 +  
 +
 +  
 +  
 +  
 +
 +  
 +
++  
+   
+ 
+   
+ 
+   
 -  
++
 +  
  
  


http://git-wip-us.apache.org/repos/asf/cassandra/blob/bf6ddb3b/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
--
diff --cc test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
index 17ac569,66abe5a..57aa940
--- a/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
+++ b/test/burn/org/apache/cassandra/utils/memory/LongBufferPoolTest.java
@@@ -27,11 -27,11 +27,13 @@@ import java.util.concurrent.*
  import java.util.concurrent.atomic.AtomicBoolean;
  
  import com.google.common.util.concurrent.Uninterruptibles;
++import org.junit.BeforeClass;
  import org.junit.Test;
  
  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;
  
++import org.apache.cassandra.config.DatabaseDescriptor;
  import org.apache.cassandra.utils.DynamicList;
  
  import static org.junit.Assert.*;
@@@ -40,6 -62,10 +64,16 @@@ public class LongBufferPoolTes
  {
  private static final Logger logger = 
LoggerFactory.getLogger(LongBufferPoolTest.class);
  
+ private static final int AVG_BUFFER_SIZE = 16 << 10;
+ private static final int STDEV_BUFFER_SIZE = 10 << 10; // picked to 
ensure exceeding buffer size is rare, but occurs
+ private static final DateFormat DATE_FORMAT = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
+ 
++@BeforeClass
++public static void setup() throws Exception
++{
++DatabaseDescriptor.daemonInitialization();
++}
++
  @Test
  public void testAllocate() throws InterruptedException, ExecutionException
  {
@@@ -407,9 -535,19 +543,20 @@@
  }
  }
  
- public static void main(String[] args) throws InterruptedException, 
ExecutionException
+ public static void main(String[] args)
  {
- new 
LongBufferPoolTest().testAllocate(Runtime.getRuntime().availableProcessors(), 
TimeUnit.HOURS.toNanos(2L), 16 << 20);
+ try
+ {
++LongBufferPoolTest.setup();
+ new 
LongBufferPoolTest().testAllocate(Runtime.getRuntime().availableProcessors(),
+   TimeUnit.HOURS.toNanos(2L), 
16 << 20);
+ System.exit(0);
+ }
+ catch (Throwable tr)
+ {
+ System.out.println(String.format("Test failed - %s", 
tr.getMessage()));
+ System.exit(1); // Force exit so that non-daemon threads like 
REQUEST-SCHEDULER do not hang the process on failure
+ }
  }
  
  /**
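One detail of the main() rework above is worth spelling out: the JVM only exits 
on its own once every non-daemon thread has finished, so a standalone runner 
that leaves a non-daemon thread alive (such as the REQUEST-SCHEDULER thread 
mentioned in the comment) will hang after main returns. A tiny demonstration of 
the underlying behavior:

{code}
public class HangDemo
{
    public static void main(String[] args)
    {
        Thread lingering = new Thread(() -> {
            while (true) // simulates a service thread that is never shut down
            {
                try { Thread.sleep(1_000); }
                catch (InterruptedException e) { return; }
            }
        }, "request-scheduler-like");
        // lingering.setDaemon(true); // a daemon thread would NOT keep the JVM alive
        lingering.start();

        System.out.println("main() is done");
        System.exit(0); // without this, the process never terminates
    }
}
{code}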





[jira] [Updated] (CASSANDRA-14836) Incorrect log entry during startup in 4.0

2018-10-23 Thread Ariel Weisberg (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-14836:
---
Resolution: Fixed
Status: Resolved  (was: Ready to Commit)

Committed as 
[c3ef43d45e5c9c44f22dbeb8a58232aa6f0cfd15|https://github.com/apache/cassandra/commit/c3ef43d45e5c9c44f22dbeb8a58232aa6f0cfd15]
 thanks!

> Incorrect log entry during startup in 4.0
> -
>
> Key: CASSANDRA-14836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14836
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Trivial
>
> When doing some testing on 4.0 I found this in the log:
> {noformat}
> 2018-10-12T14:06:14.507+0200  INFO [main] 
> StartupClusterConnectivityChecker.java:113 After waiting/processing for 10005 
> milliseconds, 1 out of 3 peers (0.0%) have been marked alive and had 
> connections established{noformat}
> 1 out of 3 is not 0%:)






cassandra git commit: Incorrect log entry during startup in 4.0

2018-10-23 Thread aweisberg
Repository: cassandra
Updated Branches:
  refs/heads/trunk 4ae229f5c -> c3ef43d45


Incorrect log entry during startup in 4.0

Patch by Tommy Stendahl; Reviewed by Ariel Weisberg for CASSANDRA-14836


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c3ef43d4
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c3ef43d4
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c3ef43d4

Branch: refs/heads/trunk
Commit: c3ef43d45e5c9c44f22dbeb8a58232aa6f0cfd15
Parents: 4ae229f
Author: tommy stendahl 
Authored: Tue Oct 23 16:09:04 2018 -0400
Committer: Ariel Weisberg 
Committed: Tue Oct 23 16:13:44 2018 -0400

--
 .../apache/cassandra/net/StartupClusterConnectivityChecker.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/c3ef43d4/src/java/org/apache/cassandra/net/StartupClusterConnectivityChecker.java
--
diff --git 
a/src/java/org/apache/cassandra/net/StartupClusterConnectivityChecker.java 
b/src/java/org/apache/cassandra/net/StartupClusterConnectivityChecker.java
index db04ca3..bab3283 100644
--- a/src/java/org/apache/cassandra/net/StartupClusterConnectivityChecker.java
+++ b/src/java/org/apache/cassandra/net/StartupClusterConnectivityChecker.java
@@ -114,7 +114,7 @@ public class StartupClusterConnectivityChecker
 TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - 
startNanos),
 connected,
 peers.size(),
-connected / (peers.size()) * 100.0);
+String.format("%.2f", (connected / (float)peers.size()) * 
100));
 return succeeded;
 }
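The root cause is a one-character trap worth spelling out: in the removed line, 
{{connected / peers.size()}} is integer division, so for any connected count 
below the peer count it evaluates to 0 before the multiplication by 100.0 can 
promote anything to double. A minimal reproduction of both versions:

{code}
public class PercentBug
{
    public static void main(String[] args)
    {
        int connected = 1;
        int peers = 3;

        // Broken: 1 / 3 is integer division and truncates to 0 *before*
        // the multiplication by 100.0 promotes anything to double.
        double broken = connected / peers * 100.0;
        System.out.println(broken); // 0.0

        // Fixed: widen one operand first, then format to two decimals.
        String fixed = String.format("%.2f", connected / (float) peers * 100);
        System.out.println(fixed);  // 33.33
    }
}
{code}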
 





[jira] [Commented] (CASSANDRA-14836) Incorrect log entry during startup in 4.0

2018-10-23 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661218#comment-16661218
 ] 

Ariel Weisberg commented on CASSANDRA-14836:


+1, I made some small tweaks. I removed the extra float cast and I added string 
formatting so we only print two digits after the decimal point.

Running the tests now. 

https://github.com/apache/cassandra/compare/trunk...aweisberg:14836-trunk?expand=1
https://circleci.com/gh/aweisberg/cassandra/tree/14836-trunk


> Incorrect log entry during startup in 4.0
> -
>
> Key: CASSANDRA-14836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14836
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Trivial
>
> When doing some testing on 4.0 I found this in the log:
> {noformat}
> 2018-10-12T14:06:14.507+0200  INFO [main] 
> StartupClusterConnectivityChecker.java:113 After waiting/processing for 10005 
> milliseconds, 1 out of 3 peers (0.0%) have been marked alive and had 
> connections established{noformat}
> 1 out of 3 is not 0%:)






[jira] [Updated] (CASSANDRA-14836) Incorrect log entry during startup in 4.0

2018-10-23 Thread Ariel Weisberg (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-14836:
---
Status: Ready to Commit  (was: Patch Available)

> Incorrect log entry during startup in 4.0
> -
>
> Key: CASSANDRA-14836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14836
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Trivial
>
> When doing some testing on 4.0 I found this in the log:
> {noformat}
> 2018-10-12T14:06:14.507+0200  INFO [main] 
> StartupClusterConnectivityChecker.java:113 After waiting/processing for 10005 
> milliseconds, 1 out of 3 peers (0.0%) have been marked alive and had 
> connections established{noformat}
> 1 out of 3 is not 0%:)






[jira] [Updated] (CASSANDRA-14838) Dropped columns can cause reverse sstable iteration to return prematurely

2018-10-23 Thread Blake Eggleston (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14838:

Reviewer: Sam Tunnicliffe
  Status: Patch Available  (was: Open)

|[3.0|https://github.com/bdeggleston/cassandra/tree/14838-3.0]|[circle|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/cci%2F14838-3.0]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/14838-3.11]|[circle|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/cci%2F14838-3.11]|
|[trunk|https://github.com/bdeggleston/cassandra/tree/14838-trunk]|[circle|https://circleci.com/gh/bdeggleston/workflows/cassandra/tree/cci%2F14838-trunk]|

> Dropped columns can cause reverse sstable iteration to return prematurely
> -
>
> Key: CASSANDRA-14838
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14838
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 3.0.18, 3.11.4, 4.0
>
>
> CASSANDRA-14803 fixed an issue where reading legacy sstables in reverse could 
> return early in certain cases. It's also possible to get into this state with 
> current-version sstables if there are 2 or more consecutive indexed blocks 
> that only contain data for a dropped column. Post-14803, this will throw an 
> exception instead of returning an incomplete response, but it should just 
> continue reading, like it does for legacy sstables.






[jira] [Created] (CASSANDRA-14838) Dropped columns can cause reverse sstable iteration to return prematurely

2018-10-23 Thread Blake Eggleston (JIRA)
Blake Eggleston created CASSANDRA-14838:
---

 Summary: Dropped columns can cause reverse sstable iteration to 
return prematurely
 Key: CASSANDRA-14838
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14838
 Project: Cassandra
  Issue Type: Bug
Reporter: Blake Eggleston
Assignee: Blake Eggleston
 Fix For: 3.0.18, 3.11.4, 4.0


CASSANDRA-14803 fixed an issue where reading legacy sstables in reverse could 
return early in certain cases. It's also possible to get into this state with 
current-version sstables if there are 2 or more consecutive indexed blocks that 
only contain data for a dropped column. Post-14803, this will throw an 
exception instead of returning an incomplete response, but it should just 
continue reading, like it does for legacy sstables.






[jira] [Updated] (CASSANDRA-14836) Incorrect log entry during startup in 4.0

2018-10-23 Thread Ariel Weisberg (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-14836:
---
Reviewer: Ariel Weisberg

> Incorrect log entry during startup in 4.0
> -
>
> Key: CASSANDRA-14836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14836
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Trivial
>
> When doing some testing on 4.0 I found this in the log:
> {noformat}
> 2018-10-12T14:06:14.507+0200  INFO [main] 
> StartupClusterConnectivityChecker.java:113 After waiting/processing for 10005 
> milliseconds, 1 out of 3 peers (0.0%) have been marked alive and had 
> connections established{noformat}
> 1 out of 3 is not 0%:)






[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade

2018-10-23 Thread Abdul Patel (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661010#comment-16661010
 ] 

Abdul Patel commented on CASSANDRA-14495:
-

I only see high heap memory usage, nothing else; when I searched for 
GCInspector, I found the messages listed in my other comment.

> Memory Leak /High Memory usage post 3.11.2 upgrade
> --
>
> Key: CASSANDRA-14495
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14495
> Project: Cassandra
>  Issue Type: Bug
>  Components: Metrics
>Reporter: Abdul Patel
>Priority: Major
> Attachments: cas_heap.txt
>
>
> Hi All,
>  
> I recently upgraded my non-prod Cassandra cluster (4 nodes, single DC) from 
> 3.10 to 3.11.2.
> No issues reported, apart from nodetool info reporting 80% usage.
> I initially had 16GB memory on each node; later I bumped it up to 20GB and 
> rebooted all nodes.
> I waited a week, and now I have again seen memory usage of more than 80%, 
> 16GB+.
> This means some memory leak is happening over time.
> Has anyone faced such an issue, or do we have any workaround? My 3.11.2 
> upgrade rollout has been halted because of this bug.
> ===
> ID : 65b64f5a-7fe6-4036-94c8-8da9c57718cc
> Gossip active  : true
> Thrift active  : true
> Native Transport active: true
> Load   : 985.24 MiB
> Generation No  : 1526923117
> Uptime (seconds)   : 1097684
> Heap Memory (MB)   : 16875.64 / 20480.00
> Off Heap Memory (MB)   : 20.42
> Data Center    : DC7
> Rack   : rac1
> Exceptions : 0
> Key Cache  : entries 3569, size 421.44 KiB, capacity 100 MiB, 
> 7931933 hits, 8098632 requests, 0.979 recent hit rate, 14400 save period in 
> seconds
> Row Cache  : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 
> requests, NaN recent hit rate, 0 save period in seconds
> Counter Cache  : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 
> requests, NaN recent hit rate, 7200 save period in seconds
> Chunk Cache    : entries 2361, size 147.56 MiB, capacity 3.97 GiB, 
> 2412803 misses, 72594047 requests, 0.967 recent hit rate, NaN microseconds 
> miss latency
> Percent Repaired   : 99.88086234106282%
> Token  : (invoke with -T/--tokens to see all 256 tokens)






[jira] [Updated] (CASSANDRA-8798) don't throw TombstoneOverwhelmingException during bootstrap

2018-10-23 Thread Jeff Jirsa (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-8798:
--
Attachment: (was: 8798.txt)

> don't throw TombstoneOverwhelmingException during bootstrap
> ---
>
> Key: CASSANDRA-8798
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8798
> Project: Cassandra
>  Issue Type: Bug
>Reporter: mck
>Priority: Major
>
> During bootstrap, honouring tombstone_failure_threshold seems 
> counter-productive: the node is not serving requests, so the threshold is not 
> protecting anything.
> Instead what happens is that bootstrap fails, and a cluster that obviously needs 
> an extra node isn't getting it...
> **History**
> When adding a new node, the bootstrap process looks complete: streaming is 
> finished, compactions are finished, and all disk and CPU activity is calm.
> But the node is still stuck in "joining" status. 
> The last stage in the bootstrapping process is the rebuilding of secondary 
> indexes. Grepping the logs confirmed it failed during this stage.
> {code}grep SecondaryIndexManager cassandra/logs/*{code}
> To see what secondary index rebuilding was initiated:
> {code}
> grep "index build of " cassandra/logs/* | awk -F" for data in " '{print $1}'
> INFO 13:18:11,252 Submitting index build of addresses.unobfuscatedIndex
> INFO 13:18:11,352 Submitting index build of Inbox.FINNBOXID_INDEX
> INFO 23:03:54,758 Submitting index build of [events.collected_tbIndex, 
> events.real_tbIndex]
> {code}
> To get an idea of successful secondary index rebuilding: 
> {code}grep "Index build of " cassandra/logs/*
> INFO 13:18:11,263 Index build of addresses.unobfuscatedIndex complete
> INFO 13:18:11,355 Index build of Inbox.FINNBOXID_INDEX complete
> {code}
> Looking closer at {{[events.collected_tbIndex, events.real_tbIndex]}} showed 
> the following stack trace:
> {code}
> ERROR [StreamReceiveTask:121] 2015-02-12 05:54:47,768 CassandraDaemon.java 
> (line 199) Exception in thread Thread[StreamReceiveTask:121,5,main]
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.cassandra.db.filter.TombstoneOverwhelmingException
> at 
> org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:413)
> at 
> org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:142)
> at 
> org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:130)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: 
> org.apache.cassandra.db.filter.TombstoneOverwhelmingException
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:188)
> at 
> org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:409)
> ... 7 more
> Caused by: java.lang.RuntimeException: 
> org.apache.cassandra.db.filter.TombstoneOverwhelmingException
> at 
> org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:160)
> at 
> org.apache.cassandra.service.pager.QueryPagers$1.next(QueryPagers.java:143)
> at org.apache.cassandra.db.Keyspace.indexRow(Keyspace.java:406)
> at 
> org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:834)
> ... 5 more
> Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
> at 
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:202)
> at 
> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
> at 
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
> at 
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
> at 
> org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
> at 
> org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376)

[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade

2018-10-23 Thread Chris Lohfink (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660862#comment-16660862
 ] 

Chris Lohfink commented on CASSANDRA-14495:
---

If your GC pause target is set to 500ms, as the default G1 settings have it, this 
doesn't mean much: the JVM is doing what it's supposed to do - it fills up 
enough eden regions and tries to size the number of regions so that, at the 
current allocation rate, a collection takes about the targeted pause time. Take a 
look at https://www.oracle.com/technetwork/tutorials/tutorials-1876574.html and the 
GC logs; there are many YouTube presentations and blogs that can help walk through 
the phases and how to read the logs.

Is there an actual problem you're experiencing? Bad latencies? Timeouts? If so, 
that's different, and nodetool tablestats output and your schema would be helpful 
if it's a data model issue - but try to describe the problem you're having, and 
perhaps move this to the user list or Stack Overflow, as this JIRA is for bug 
reports, new features, and changes to C* source. Your GCs are fairly frequent, 
though, so if this is impacting your system, people can help identify a bad data 
model and suggest some mitigation approaches - but there are better forums to 
reach out to for that kind of help.
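To make one of those GCInspector lines concrete - taking the first entry from 
the log excerpt quoted elsewhere in this thread, and treating the Old Gen delta 
as the bytes promoted by that young collection (back-of-the-envelope arithmetic 
only, not a diagnosis of this cluster):

{code}
public class GcLineMath
{
    public static void main(String[] args)
    {
        // From: "G1 Young Generation GC in 251ms. G1 Eden Space: 729808896 -> 0;
        //        G1 Old Gen: 7029653504 -> 7138390512; ..."
        long oldGenBefore = 7_029_653_504L;
        long oldGenAfter  = 7_138_390_512L;
        long promoted = oldGenAfter - oldGenBefore;
        System.out.printf("~%.1f MiB promoted to Old Gen by one young GC%n",
                          promoted / (1024.0 * 1024.0));
        // ~103.7 MiB per young collection; frequent collections with steady
        // promotion are what make the heap usage graph trend upward between
        // mixed/full collections.
    }
}
{code}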

> Memory Leak /High Memory usage post 3.11.2 upgrade
> --
>
> Key: CASSANDRA-14495
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14495
> Project: Cassandra
>  Issue Type: Bug
>  Components: Metrics
>Reporter: Abdul Patel
>Priority: Major
> Attachments: cas_heap.txt
>
>
> Hi All,
>  
> I recently upgraded my non-prod Cassandra cluster (4 nodes, single DC) from 
> 3.10 to 3.11.2.
> No issues reported, apart from nodetool info reporting 80% usage.
> I initially had 16GB memory on each node; later I bumped it up to 20GB and 
> rebooted all nodes.
> I waited a week, and now I have again seen memory usage of more than 80%, 
> 16GB+.
> This means some memory leak is happening over time.
> Has anyone faced such an issue, or do we have any workaround? My 3.11.2 
> upgrade rollout has been halted because of this bug.
> ===
> ID : 65b64f5a-7fe6-4036-94c8-8da9c57718cc
> Gossip active  : true
> Thrift active  : true
> Native Transport active: true
> Load   : 985.24 MiB
> Generation No  : 1526923117
> Uptime (seconds)   : 1097684
> Heap Memory (MB)   : 16875.64 / 20480.00
> Off Heap Memory (MB)   : 20.42
> Data Center    : DC7
> Rack   : rac1
> Exceptions : 0
> Key Cache  : entries 3569, size 421.44 KiB, capacity 100 MiB, 
> 7931933 hits, 8098632 requests, 0.979 recent hit rate, 14400 save period in 
> seconds
> Row Cache  : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 
> requests, NaN recent hit rate, 0 save period in seconds
> Counter Cache  : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 
> requests, NaN recent hit rate, 7200 save period in seconds
> Chunk Cache    : entries 2361, size 147.56 MiB, capacity 3.97 GiB, 
> 2412803 misses, 72594047 requests, 0.967 recent hit rate, NaN microseconds 
> miss latency
> Percent Repaired   : 99.88086234106282%
> Token  : (invoke with -T/--tokens to see all 256 tokens)






[jira] [Updated] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 16kb

2018-10-23 Thread Ariel Weisberg (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-13241:
---
Status: Patch Available  (was: Open)

> Lower default chunk_length_in_kb from 64kb to 16kb
> --
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
>  Issue Type: Wish
>  Components: Core
>Reporter: Benjamin Roth
>Assignee: Ariel Weisberg
>Priority: Major
> Attachments: CompactIntegerSequence.java, 
> CompactIntegerSequenceBench.java, CompactSummingIntegerSequence.java
>
>
> Too low a chunk size may result in some wasted disk space. Too high a chunk 
> size may lead to massive overreads and can have a critical impact on 
> overall system performance.
> In my case, the default chunk size led to peak read IO of up to 1GB/s and 
> average reads of 200MB/s. After lowering the chunk size (aligned with read 
> ahead, of course), the average read IO went below 20MB/s, more like 10-15MB/s.
> The risk of (physical) overreads increases as the (page cache size) / 
> (total data size) ratio gets lower.
> High chunk sizes are mostly appropriate for bigger payloads per request, but 
> if the model consists mostly of small rows or small result sets, the read 
> overhead with a 64kb chunk size is insanely high. This applies, for example, to 
> (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insights what a difference it can make (460GB data, 128GB 
> RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows, that the request distribution remained the same, so no "dynamic 
> snitch magic": https://cl.ly/3E0t1T1z2c0J
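To put rough numbers on the overread claim - using a simplified model that 
ignores compression ratio and page-cache hits, and assumes every uncached point 
read decompresses at least one whole chunk:

{code}
public class ChunkOverread
{
    public static void main(String[] args)
    {
        int rowBytes = 200; // a small skinny row
        for (int chunkKb : new int[] { 64, 16, 4 })
        {
            int chunkBytes = chunkKb * 1024;
            System.out.printf("chunk_length_in_kb=%-2d -> ~%3dx bytes read per row%n",
                              chunkKb, chunkBytes / rowBytes);
        }
        // 64 KiB chunks decompress ~327x the requested bytes for a 200-byte row;
        // 16 KiB cuts that to ~81x, consistent with the order-of-magnitude drop
        // in disk throughput described above.
    }
}
{code}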






[jira] [Commented] (CASSANDRA-14806) CircleCI workflow improvements and Java 11 support

2018-10-23 Thread Jon Meredith (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660687#comment-16660687
 ] 

Jon Meredith commented on CASSANDRA-14806:
--

Looks like we had the same ideas on the CircleCI config - I was trying to add 
targets that generate coverage information with JaCoCo in CASSANDRA-14788. 
Obviously the two changes conflict; I can look at reworking the coverage 
changes on top of this one once it has merged, or you're welcome to incorporate 
the changes into this patch if you would prefer.

Also, I'm not sure if you've had problems running the burn tests on the 
high-resource configuration. I hit problems locally on 8/12 core machines 
that are not configured to use very large heaps. CASSANDRA-14790 makes the 
long buffer test pass for me; however, the long btree test has been a little 
flaky just running locally.

 

> CircleCI workflow improvements and Java 11 support
> --
>
> Key: CASSANDRA-14806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14806
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build, Testing
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Major
>
> The current CircleCI config could use some cleanup and improvements. First of 
> all, the config has been made more modular by using the new CircleCI 2.1 
> executors and command elements. Based on CASSANDRA-14713, there's now also a 
> Java 11 executor that will allow running tests under Java 11. The {{build}} 
> step will be done using Java 11 in all cases, so we can catch any regressions 
> there, and also test the Java 11 multi-jar artifact during dtests, which we'd 
> also create during the release process.
> The job workflow has now also been changed to make use of the [manual job 
> approval|https://circleci.com/docs/2.0/workflows/#holding-a-workflow-for-a-manual-approval]
>  feature, which allows running dtest jobs only on request rather than 
> automatically with every commit. The Java 8 unit tests still run on every 
> commit, but that could also be easily changed if needed. See [example 
> workflow|https://circleci.com/workflow-run/be25579d-3cbb-4258-9e19-b1f571873850]
>  with start_ jobs being triggers that need manual approval before the actual 
> jobs start.





[jira] [Commented] (CASSANDRA-14495) Memory Leak /High Memory usage post 3.11.2 upgrade

2018-10-23 Thread Abdul Patel (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660666#comment-16660666
 ] 

Abdul Patel commented on CASSANDRA-14495:
-

I see the GC inspector messages below; can someone help me understand these?

INFO [Service Thread] 2018-10-22 11:10:22,178 GCInspector.java:284 - G1 Young 
Generation GC in 251ms. G1 Eden Space: 729808896 -> 0; G1 Old Gen: 7029653504 
-> 7138390512; G1 Survivor Space: 109051904 -> 75497472;
INFO [Service Thread] 2018-10-22 11:16:05,708 GCInspector.java:284 - G1 Young 
Generation GC in 209ms. G1 Eden Space: 729808896 -> 0; G1 Old Gen: 4353687552 
-> 4483710984;
INFO [Service Thread] 2018-10-22 11:16:08,759 GCInspector.java:284 - G1 Young 
Generation GC in 253ms. G1 Eden Space: 729808896 -> 0; G1 Old Gen: 4483710984 
-> 4601151480;
INFO [Service Thread] 2018-10-22 11:20:26,201 GCInspector.java:284 - G1 Young 
Generation GC in 217ms. G1 Eden Space: 729808896 -> 0; G1 Old Gen: 3602907144 
-> 2015954944;
INFO [Service Thread] 2018-10-22 11:20:34,467 GCInspector.java:284 - G1 Young 
Generation GC in 206ms. G1 Eden Space: 729808896 -> 0; G1 Old Gen: 1874853896 
-> 2052537840;
INFO [Service Thread] 2018-10-22 11:48:32,697 GCInspector.java:284 - G1 Young 
Generation GC in 573ms. G1 Eden Space: 10200547328 -> 0; G1 Old Gen: 35824 -> 
0; G1 Survivor Space: 100663296 -> 939524096;
INFO [Service Thread] 2018-10-22 11:48:38,038 GCInspector.java:284 - G1 Young 
Generation GC in 793ms. G1 Eden Space: 3078619136 -> 0; G1 Old Gen: 0 -> 
907214328; G1 Survivor Space: 939524096 -> 394264576;
INFO [Service Thread] 2018-10-22 11:48:39,133 GCInspector.java:284 - G1 Young 
Generation GC in 294ms. G1 Eden Space: 461373440 -> 0; G1 Old Gen: 907214328 -> 
1291845632; G1 Survivor Space: 394264576 -> 75497472;
INFO [Service Thread] 2018-10-22 11:49:12,405 GCInspector.java:284 - G1 Young 
Generation GC in 222ms. G1 Eden Space: 2181038080 -> 0; G1 Old Gen: 3481272304 
-> 3677068784; G1 Survivor Space: 251658240 -> 243269632;
INFO [Service Thread] 2018-10-22 11:49:34,485 GCInspector.java:284 - G1 Young 
Generation GC in 210ms. G1 Eden Space: 4085252096 -> 0; G1 Survivor Space: 
67108864 -> 234881024;
INFO [Service Thread] 2018-10-22 11:49:41,027 GCInspector.java:284 - G1 Young 
Generation GC in 208ms. G1 Eden Space: 2290089984 -> 0; G1 Old Gen: 4903141368 
-> 5096079352; G1 Survivor Space: 192937984 -> 100663296;
INFO [Service Thread] 2018-10-22 11:49:47,059 GCInspector.java:284 - G1 Young 
Generation GC in 229ms. G1 Eden Space: 2113929216 -> 0; G1 Old Gen: 5096079352 
-> 5179965448; G1 Survivor Space: 100663296 -> 260046848;
INFO [Service Thread] 2018-10-22 11:49:47,864 GCInspector.java:284 - G1 Young 
Generation GC in 240ms. G1 Eden Space: 595591168 -> 0; G1 Old Gen: 5179965448 
-> 5456591864; G1 Survivor Space: 260046848 -> 41943040;
INFO [Service Thread] 2018-10-22 11:51:55,126 GCInspector.java:284 - G1 Young 
Generation GC in 682ms. G1 Eden Space: 10208935936 -> 0; G1 Old Gen: 2657805472 
-> 2663677936; G1 Survivor Space: 92274688 -> 830472192;
INFO [Service Thread] 2018-10-22 11:52:02,632 GCInspector.java:284 - G1 Young 
Generation GC in 614ms. G1 Eden Space: 2558525440 -> 0; G1 Old Gen: 2663677936 
-> 3467692024; G1 Survivor Space: 830472192 -> 318767104;
INFO [Service Thread] 2018-10-22 11:52:04,595 GCInspector.java:284 - G1 Young 
Generation GC in 213ms. G1 Eden Space: 536870912 -> 0; G1 Old Gen: 3467692024 
-> 3783262192; G1 Survivor Space: 318767104 -> 83886080;
INFO [Service Thread] 2018-10-22 11:53:41,556 GCInspector.java:284 - G1 Young 
Generation GC in 279ms. G1 Eden Space: 10150215680 -> 0; G1 Old Gen: 3793081848 
-> 3797276144; G1 Survivor Space: 150994944 -> 662700032;
INFO [Service Thread] 2018-10-22 11:53:51,744 GCInspector.java:284 - G1 Young 
Generation GC in 521ms. G1 Eden Space: 7440695296 -> 0; G1 Old Gen: 3797276144 
-> 3918572016; G1 Survivor Space: 662700032 -> 998244352;
INFO [Service Thread] 2018-10-22 11:53:52,370 GCInspector.java:284 - G1 Young 
Generation GC in 589ms. G1 Eden Space: 8388608 -> 0; G1 Old Gen: 3918572016 -> 
4907335664; G1 Survivor Space: 998244352 -> 33554432;
INFO [Service Thread] 2018-10-22 11:54:31,296 GCInspector.java:284 - G1 Young 
Generation GC in 253ms. G1 Eden Space: 1937768448 -> 0; G1 Old Gen: 3246391280 
-> 3661627384; G1 Survivor Space: 394264576 -> 100663296;
INFO [Service Thread] 2018-10-22 11:54:42,711 GCInspector.java:284 - G1 Young 
Generation GC in 210ms. G1 Eden Space: 872415232 -> 0; G1 Old Gen: 3955228664 
-> 4215275512; G1 Survivor Space: 260046848 -> 67108864;
INFO [Service Thread] 2018-10-22 11:54:47,487 GCInspector.java:284 - G1 Young 
Generation GC in 207ms. G1 Eden Space: 3816816640 -> 0; G1 Survivor Space: 
67108864 -> 260046848;
INFO [Service Thread] 2018-10-22 11:54:48,615 GCInspector.java:284 - G1 Young 
Generation GC in 212ms. G1 Eden Space: 763363328 -> 0; G1 Old Gen: 4215275512 
-> 4466933752; G1 Survivor Space: 

[jira] [Assigned] (CASSANDRA-14837) Refactor/clean up QueryProcessor

2018-10-23 Thread Marcus Eriksson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson reassigned CASSANDRA-14837:
---

  Assignee: Marcus Eriksson
Issue Type: Improvement  (was: Bug)

Parking this on me for now, but if anyone wants to start working on it, please 
do.

> Refactor/clean up QueryProcessor
> 
>
> Key: CASSANDRA-14837
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14837
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 4.x
>
>
> {{QueryProcessor}} has grown over the years, adding special methods for 
> different use cases, etc.; we should clean it up and perhaps break out all the 
> "internal" methods into a separate class.






[jira] [Created] (CASSANDRA-14837) Refactor/clean up QueryProcessor

2018-10-23 Thread Marcus Eriksson (JIRA)
Marcus Eriksson created CASSANDRA-14837:
---

 Summary: Refactor/clean up QueryProcessor
 Key: CASSANDRA-14837
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14837
 Project: Cassandra
  Issue Type: Bug
Reporter: Marcus Eriksson
 Fix For: 4.x


{{QueryProcessor}} has grown over the years, adding special methods for 
different use cases, etc.; we should clean it up and perhaps break out all the 
"internal" methods into a separate class.






[jira] [Updated] (CASSANDRA-13010) nodetool compactionstats should say which disk a compaction is writing to

2018-10-23 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous updated CASSANDRA-13010:
--
Status: Open  (was: Ready to Commit)

> nodetool compactionstats should say which disk a compaction is writing to
> -
>
> Key: CASSANDRA-13010
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13010
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Compaction, Tools
>Reporter: Jon Haddad
>Assignee: Alex Lourie
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, lhf
> Attachments: cleanup.png, multiple operations.png
>
>







[jira] [Updated] (CASSANDRA-13010) nodetool compactionstats should say which disk a compaction is writing to

2018-10-23 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous updated CASSANDRA-13010:
--
Status: Ready to Commit  (was: Patch Available)

> nodetool compactionstats should say which disk a compaction is writing to
> -
>
> Key: CASSANDRA-13010
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13010
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Compaction, Tools
>Reporter: Jon Haddad
>Assignee: Alex Lourie
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested, lhf
> Attachments: cleanup.png, multiple operations.png
>
>







[jira] [Commented] (CASSANDRA-14806) CircleCI workflow improvements and Java 11 support

2018-10-23 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660450#comment-16660450
 ] 

Marcus Eriksson commented on CASSANDRA-14806:
-

This looks good to me; a few comments (mostly CircleCI-config-feature-related):
* It would be nice to get rid of the {{<<:}} stuff at some point, as it is not 
really covered in the CircleCI docs; this probably needs additional CircleCI 
features though.
* We can probably remove {{with_dtest_jobs_only}} now? Or do you have a good 
use case for it? 
https://github.com/krummas/cassandra/commit/8e6732694648cdf69a64bcbf416226ddae1f95b7
* We should probably allow {{test-compression}} to be run in parallel; this 
requires build.xml changes though, so we can do it in a later ticket.
* Maybe we could ship a {{.circleci/high_resource.patch}} so that enabling the 
high-resource config would simply be {{git am .circleci/high_resource.patch}}.

As discussed offline, we should probably give the mailing list a heads-up 
(especially about the fact that you now need to enable 'build processing') to get 
this to work. Waiting until the 2.1 config is out of 'preview' mode might also be 
good?


> CircleCI workflow improvements and Java 11 support
> --
>
> Key: CASSANDRA-14806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14806
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Build, Testing
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Major
>
> The current CircleCI config could use some cleanup and improvements. First of 
> all, the config has been made more modular by using the new CircleCI 2.1 
> executors and command elements. Based on CASSANDRA-14713, there's now also a 
> Java 11 executor that will allow running tests under Java 11. The {{build}} 
> step will be done using Java 11 in all cases, so we can catch any regressions 
> there, and also test the Java 11 multi-jar artifact during dtests, which we'd 
> also create during the release process.
> The job workflow has now also been changed to make use of the [manual job 
> approval|https://circleci.com/docs/2.0/workflows/#holding-a-workflow-for-a-manual-approval]
>  feature, which allows running dtest jobs only on request rather than 
> automatically with every commit. The Java 8 unit tests still run on every 
> commit, but that could also be easily changed if needed. See [example 
> workflow|https://circleci.com/workflow-run/be25579d-3cbb-4258-9e19-b1f571873850]
>  with start_ jobs being triggers that need manual approval before the actual 
> jobs start.






[jira] [Commented] (CASSANDRA-14836) Incorrect log entry during startup in 4.0

2018-10-23 Thread Tommy Stendahl (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660448#comment-16660448
 ] 

Tommy Stendahl commented on CASSANDRA-14836:


Patch available here: 
[cassandra-14836|https://github.com/tommystendahl/cassandra/commit/bcaa224dec3939dcc883ded250fb9849bfbfa992]

> Incorrect log entry during startup in 4.0
> -
>
> Key: CASSANDRA-14836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14836
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Trivial
>
> When doing some testing on 4.0 I found this in the log:
> {noformat}
> 2018-10-12T14:06:14.507+0200  INFO [main] 
> StartupClusterConnectivityChecker.java:113 After waiting/processing for 10005 
> milliseconds, 1 out of 3 peers (0.0%) have been marked alive and had 
> connections established{noformat}
> 1 out of 3 is not 0%:)






[jira] [Updated] (CASSANDRA-14836) Incorrect log entry during startup in 4.0

2018-10-23 Thread Tommy Stendahl (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommy Stendahl updated CASSANDRA-14836:
---
Status: Patch Available  (was: Open)

> Incorrect log entry during startup in 4.0
> -
>
> Key: CASSANDRA-14836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14836
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Trivial
>
> When doing some testing on 4.0 I found this in the log:
> {noformat}
> 2018-10-12T14:06:14.507+0200  INFO [main] 
> StartupClusterConnectivityChecker.java:113 After waiting/processing for 10005 
> milliseconds, 1 out of 3 peers (0.0%) have been marked alive and had 
> connections established{noformat}
> 1 out of 3 is not 0%:)






[jira] [Created] (CASSANDRA-14836) Incorrect log entry during startup in 4.0

2018-10-23 Thread Tommy Stendahl (JIRA)
Tommy Stendahl created CASSANDRA-14836:
--

 Summary: Incorrect log entry during startup in 4.0
 Key: CASSANDRA-14836
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14836
 Project: Cassandra
  Issue Type: Bug
Reporter: Tommy Stendahl


When doing some testing on 4.0 I found this in the log:
{noformat}
2018-10-12T14:06:14.507+0200  INFO [main] 
StartupClusterConnectivityChecker.java:113 After waiting/processing for 10005 
milliseconds, 1 out of 3 peers (0.0%) have been marked alive and had 
connections established{noformat}
1 out of 3 is not 0% :)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14836) Incorrect log entry during startup in 4.0

2018-10-23 Thread Tommy Stendahl (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommy Stendahl reassigned CASSANDRA-14836:
--

Assignee: Tommy Stendahl

> Incorrect log entry during startup in 4.0
> -
>
> Key: CASSANDRA-14836
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14836
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
>Priority: Trivial
>
> When doing some testing on 4.0 I found this in the log:
> {noformat}
> 2018-10-12T14:06:14.507+0200  INFO [main] 
> StartupClusterConnectivityChecker.java:113 After waiting/processing for 10005 
> milliseconds, 1 out of 3 peers (0.0%) have been marked alive and had 
> connections established{noformat}
> 1 out of 3 is not 0% :)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14812) Multiget Thrift query processor skips records in case of digest mismatch

2018-10-23 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660334#comment-16660334
 ] 

Benedict edited comment on CASSANDRA-14812 at 10/23/18 9:32 AM:


[~sivukhin.nikita], sorry for the delay - unfortunately other things cropped up 
that required my attention.

I'm awaiting the results of a clean CI run, but 
[here|https://github.com/belliottsmith/cassandra/commit/111392281afa41ca2b22e8f90369f9c29a18bfd2]
 is a commit that should fix the problem for you.

The only necessary changes are those to {{PartitionIterators}} and 
{{BasePartitions}}.

The main problem was introduced right back at the beginning with 
{{PartitionIterators.concat}}, which used the {{MoreContents.extend()}} feature 
- a feature that retains the transformations already applied to the iterator at 
the point of extension. Logically, a {{concat}} should not do this, as each 
element is unrelated to the others.
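
To make that concrete, here is a small self-contained sketch - my own 
illustration, not Cassandra's actual transform classes - of an extendable 
iterator whose stop condition, installed before extension, wrongly gates the 
appended contents as well:

{code}
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for the transform framework: a limit (playing the
// role of the ThriftCounter) is attached to the first partition's iterator,
// and extend() then appends an unrelated partition under the same limit.
public class ExtendableIterator<T> implements Iterator<T>
{
    private Iterator<T> current;
    private Iterator<T> appended;
    private int remaining; // stop condition applied before extension

    ExtendableIterator(Iterator<T> current, int limit)
    {
        this.current = current;
        this.remaining = limit;
    }

    void extend(Iterator<T> more)
    {
        this.appended = more;
    }

    public boolean hasNext()
    {
        if (remaining == 0)   // BUG: the limit belonged to the first iterator,
            return false;     // but it stops the appended contents too
        if (current.hasNext())
            return true;
        if (appended != null)
        {
            current = appended;
            appended = null;
        }
        return current.hasNext();
    }

    public T next()
    {
        remaining--;
        return current.next();
    }

    public static void main(String[] args)
    {
        ExtendableIterator<Integer> it =
            new ExtendableIterator<>(Arrays.asList(1, 2).iterator(), 2);
        it.extend(Arrays.asList(3, 4, 5).iterator());
        while (it.hasNext())
            System.out.println(it.next()); // prints only 1 and 2; 3..5 are lost
    }
}
{code}

Keeping the limit scoped to the source iterator it was created for, rather 
than to the extended whole, is what restores the remaining elements.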

The change to {{BasePartitions}} is necessary because of CASSANDRA-13482, which 
seems to have erroneously applied the child stop criteria to the whole parent 
iterator. I haven't fully explored the original intent of that change, but the 
modification made here seems the most likely match for what was intended, going 
by the ticket description.

The branch I have uploaded also has a backport of CASSANDRA-14821, to support 
a dtest that exhibits this bug as it actually occurs, though your test cases 
were greatly appreciated.


was (Author: benedict):
[~sivukhin.nikita], sorry for the delay - unfortunately other things cropped up 
that required my attention.

I'm awaiting the results of a clean CI run, but 
[here|https://github.com/belliottsmith/cassandra/commit/111392281afa41ca2b22e8f90369f9c29a18bfd2]
 is a commit that should fix the problem for you.

The only necessary changes are those to {{PartitionIterators}} and 
{{BasePartitions}}.

> Multiget Thrift query processor skips records in case of digest mismatch
> 
>
> Key: CASSANDRA-14812
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14812
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sivukhin Nikita
>Priority: Critical
>  Labels: bug
> Attachments: repro_script.py, requirements.txt, small_repro_script.py
>
>
> It seems that in Cassandra 3.0.0 a nasty bug was introduced in the {{multiget}} 
> Thrift query processing logic. When one tries to read data from several 
> partitions with a single {{multiget}} query and a {{DigestMismatch}} exception 
> is raised during query processing, the request coordinator prematurely 
> terminates the response stream right at the point where the first 
> {{DigestMismatch}} error occurs. This leads to a situation where clients 
> "do not see" some data contained in the database.
> We managed to reproduce this bug in all versions of Cassandra starting with 
> v3.0.0. The pre-release version 3.0.0-rc2 works correctly. It looks like 
> [refactoring of iterator transformation 
> hierarchy|https://github.com/apache/cassandra/commit/609497471441273367013c09a1e0e1c990726ec7]
>  related to CASSANDRA-9975 triggers incorrect behaviour.
> When the concatenated iterator is returned from 
> [StorageProxy.fetchRows(...)|https://github.com/apache/cassandra/blob/a05785d82c621c9cd04d8a064c38fd2012ef981c/src/java/org/apache/cassandra/service/StorageProxy.java#L1770],
>  Cassandra starts to consume this combined iterator. Because of the 
> {{DigestMismatch}} exception, some elements of this combined iterator contain 
> an additional {{ThriftCounter}} that was added during 
> [DataResolver.resolve(...)|https://github.com/apache/cassandra/blob/ee9e06b5a75c0be954694b191ea4170456015b98/src/java/org/apache/cassandra/service/reads/DataResolver.java#L120]
>  execution. While consuming the iterator for many partitions, Cassandra calls the 
> [BaseIterator.tryGetMoreContents(...)|https://github.com/apache/cassandra/blob/a05785d82c621c9cd04d8a064c38fd2012ef981c/src/java/org/apache/cassandra/db/transform/BaseIterator.java#L115]
>  method, which must switch from one partition iterator to another when the 
> former is exhausted. In this case all Transformations contained in the 
> next iterator are applied to the combined BaseIterator that enumerates the 
> partition sequence, which is wrong. This behaviour causes the BaseIterator to 
> stop enumeration after it fully consumes the partition with the {{DigestMismatch}} 
> error, because that partition's iterator carries an additional {{ThriftCounter}} data 
> limit.
> The attachment contains the python2 script [^small_repro_script.py] that 
> reproduces this bug within a 3-node ccmlib-controlled cluster. Also, there is 
> an extended version of this script - [^repro_script.py] - that contains more 
> logging information and provides the ability to test behavior for many 
> Cassandra versions.

[jira] [Commented] (CASSANDRA-14812) Multiget Thrift query processor skips records in case of digest mismatch

2018-10-23 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660334#comment-16660334
 ] 

Benedict commented on CASSANDRA-14812:
--

[~sivukhin.nikita], sorry for the delay - unfortunately other things cropped up 
that required my attention.

I'm awaiting the results of a clean CI run, but 
[here|https://github.com/belliottsmith/cassandra/commit/111392281afa41ca2b22e8f90369f9c29a18bfd2]
 is a commit that should fix the problem for you.

The only necessary changes are those to {{PartitionIterators}} and 
{{BasePartitions}}.

> Multiget Thrift query processor skips records in case of digest mismatch
> 
>
> Key: CASSANDRA-14812
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14812
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sivukhin Nikita
>Priority: Critical
>  Labels: bug
> Attachments: repro_script.py, requirements.txt, small_repro_script.py
>
>
> It seems that in Cassandra 3.0.0 a nasty bug was introduced in the {{multiget}} 
> Thrift query processing logic. When one tries to read data from several 
> partitions with a single {{multiget}} query and a {{DigestMismatch}} exception 
> is raised during query processing, the request coordinator prematurely 
> terminates the response stream right at the point where the first 
> {{DigestMismatch}} error occurs. This leads to a situation where clients 
> "do not see" some data contained in the database.
> We managed to reproduce this bug in all versions of Cassandra starting with 
> v3.0.0. The pre-release version 3.0.0-rc2 works correctly. It looks like 
> [refactoring of iterator transformation 
> hierarchy|https://github.com/apache/cassandra/commit/609497471441273367013c09a1e0e1c990726ec7]
>  related to CASSANDRA-9975 triggers incorrect behaviour.
> When the concatenated iterator is returned from 
> [StorageProxy.fetchRows(...)|https://github.com/apache/cassandra/blob/a05785d82c621c9cd04d8a064c38fd2012ef981c/src/java/org/apache/cassandra/service/StorageProxy.java#L1770],
>  Cassandra starts to consume this combined iterator. Because of the 
> {{DigestMismatch}} exception, some elements of this combined iterator contain 
> an additional {{ThriftCounter}} that was added during 
> [DataResolver.resolve(...)|https://github.com/apache/cassandra/blob/ee9e06b5a75c0be954694b191ea4170456015b98/src/java/org/apache/cassandra/service/reads/DataResolver.java#L120]
>  execution. While consuming the iterator for many partitions, Cassandra calls the 
> [BaseIterator.tryGetMoreContents(...)|https://github.com/apache/cassandra/blob/a05785d82c621c9cd04d8a064c38fd2012ef981c/src/java/org/apache/cassandra/db/transform/BaseIterator.java#L115]
>  method, which must switch from one partition iterator to another when the 
> former is exhausted. In this case all Transformations contained in the 
> next iterator are applied to the combined BaseIterator that enumerates the 
> partition sequence, which is wrong. This behaviour causes the BaseIterator to 
> stop enumeration after it fully consumes the partition with the {{DigestMismatch}} 
> error, because that partition's iterator carries an additional {{ThriftCounter}} data 
> limit.
> The attachment contains the python2 script [^small_repro_script.py] that 
> reproduces this bug within a 3-node ccmlib-controlled cluster. Also, there is 
> an extended version of this script - [^repro_script.py] - that contains more 
> logging information and provides the ability to test behavior for many 
> Cassandra versions (to run all test cases from repro_script.py you can call 
> {{python -m unittest2 -v repro_script.ThriftMultigetTestCase}}). All the 
> necessary dependencies are contained in [^requirements.txt].
>  
> This bug is critical in our production environment because we can't permit 
> any skipped data.
> Any ideas about a patch for this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14444) Got NPE when querying Cassandra 3.11.2

2018-10-23 Thread Xiaodong Xie (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660202#comment-16660202
 ] 

Xiaodong Xie commented on CASSANDRA-14444:
--

Yes, I agree that this is a duplicate of 10880. Please close this one. Thanks. 
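
For anyone re-checking this against the CASSANDRA-10880 fix, a rough sketch of 
the read pattern described below, assuming the DataStax Java driver 3.x API 
(the contact point is a placeholder; the table comes from the description):

{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

// Rough repro sketch with the DataStax Java driver 3.x: a full-table scan
// with a small fetch size forces server-side paging, which is where the
// paging-state NPE described below was triggered.
public class ScanRepro
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            Statement stmt = new SimpleStatement("select * from example.example_table;")
                    .setFetchSize(200);
            for (Row row : session.execute(stmt))
                System.out.println(row); // iterating past page one fetches the next page
        }
    }
}
{code}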

> Got NPE when querying Cassandra 3.11.2
> --
>
> Key: CASSANDRA-14444
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14444
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Ubuntu 14.04, JDK 1.8.0_171. 
> Cassandra 3.11.2
>Reporter: Xiaodong Xie
>Assignee: Xiaodong Xie
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.11.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We just upgraded our Cassandra cluster from 2.2.6 to 3.11.2
> After upgrading, we immediately got exceptions in Cassandra like this one: 
>  
> {code}
> ERROR [Native-Transport-Requests-1] 2018-05-11 17:10:21,994 
> QueryMessage.java:129 - Unexpected error during query
> java.lang.NullPointerException: null
> at 
> org.apache.cassandra.dht.RandomPartitioner.getToken(RandomPartitioner.java:248)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.dht.RandomPartitioner.decorateKey(RandomPartitioner.java:92)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at org.apache.cassandra.config.CFMetaData.decorateKey(CFMetaData.java:666) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.service.pager.PartitionRangeQueryPager.(PartitionRangeQueryPager.java:44)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.db.PartitionRangeReadCommand.getPager(PartitionRangeReadCommand.java:268)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.getPager(SelectStatement.java:475)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:288)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:118)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:224)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:255) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:240) 
> ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:116)
>  ~[apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:517)
>  [apache-cassandra-3.11.2.jar:3.11.2]
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410)
>  [apache-cassandra-3.11.2.jar:3.11.2]
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.44.Final.jar:4.0.44.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
>  [netty-all-4.0.44.Final.jar:4.0.44.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35)
>  [netty-all-4.0.44.Final.jar:4.0.44.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:348)
>  [netty-all-4.0.44.Final.jar:4.0.44.Final]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_171]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>  [apache-cassandra-3.11.2.jar:3.11.2]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) 
> [apache-cassandra-3.11.2.jar:3.11.2]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
> {code}
>  
> The table schema is like:
> {code}
> CREATE TABLE example.example_table (
>  id bigint,
>  hash text,
>  json text,
>  PRIMARY KEY (id, hash)
> ) WITH COMPACT STORAGE
> {code}
>  
> The query is something like:
> {code}
> "select * from example.example_table;" // (We do know this is bad practise, 
> and we are trying to fix that right now)
> {code}
> with a fetch size of 200, using the DataStax Java driver. 
> This table contains about 20k rows. 
>  
> Actually, the fix is quite simple: 
>  
> {code}
> --- a/src/java/org/apache/cassandra/service/pager/PagingState.java
> +++ b/src/java/org/apache/cassandra/service/pager/PagingState.java
> @@ -46,7 +46,7 @@ public class PagingState
> public PagingState(ByteBuffer partitionKey, RowMark rowMark, int remaining, 
> int remainingInPartition)
>  {
> - this.partitionKey = partitionKey;
> + this.partitionKey =