[jira] [Updated] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-03-21 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-2698:


Attachment: patch.diff

 Instrument repair to be able to assess it's efficiency (precision)
 --

 Key: CASSANDRA-2698
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Priority: Minor
  Labels: lhf
 Attachments: nodetool_repair_and_cfhistogram.tar.gz, 
 patch_2698_v1.txt, patch.diff


 Some reports indicate that repair sometimes transfers huge amounts of data. One 
 hypothesis is that the merkle tree precision may deteriorate too much at some 
 data size. To check this hypothesis, it would be reasonable to gather 
 statistics during merkle tree building on how many rows each merkle tree 
 range accounts for (and the size that this represents). It is probably an 
 interesting statistic to have anyway.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-03-21 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609625#comment-13609625
 ] 

Benedict commented on CASSANDRA-2698:
-

Hi,

I've uploaded a patch for this issue (patch.diff - apologies for the 
potentially future-clashing name). Logging is performed in two places:

1) On the requesting node, in AntiEntropyService.Difference.run(), after the 
MerkleTree difference is calculated and before the StreamingRepairTask is 
created
2) On the source node, on which StreamingRepairTask is run, in 
StreamOut.createPendingFiles()

In both cases we log, at debug level, a sample of the largest ranges followed 
by a histogram of the range size distribution.  The first is achieved by 
inserting each range directly into an EstimatedHistogram, on which we call the 
new logSummary() method; the second by calling the new groupByFrequency() 
method on that same histogram, to yield a histogram based on the frequency of 
sizes present in the original (on which we simply call log()).
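
The patch's actual EstimatedHistogram.groupByFrequency() may well be implemented differently; as an illustrative sketch only, "grouping by frequency" can be read as mapping "how many items a bucket held" to "how many buckets held that many", i.e. a histogram of the original histogram's counts:

```java
import java.util.Map;
import java.util.TreeMap;

public class FrequencyHistogram {
    // Group the original histogram's buckets by their frequency: the result
    // maps "items per bucket" to "number of buckets with that many items".
    // Empty buckets are skipped, as they carry no information.
    static Map<Long, Integer> groupByFrequency(long[] bucketCounts) {
        Map<Long, Integer> grouped = new TreeMap<>();
        for (long count : bucketCounts)
            if (count > 0)
                grouped.merge(count, 1, Integer::sum);
        return grouped;
    }

    public static void main(String[] args) {
        // Five buckets: one empty, two holding 3 items, one holding 1, one holding 5.
        System.out.println(groupByFrequency(new long[]{0, 3, 3, 1, 5}));
        // prints {1=1, 3=2, 5=1}
    }
}
```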

In case 1, we construct the MerkleTree to include a size taken from the 
AbstractCompactedRow we compute the hash from, and use this in 
MerkleTree.difference to estimate the size of mismatching ranges. This tends to 
underestimate, versus that reported by StreamOut, by around 15%. One design 
decision of note here: instead of modifying AbstractCompactedRow to return a 
size (which would be invasive and in some cases incur an unnecessary penalty) 
we use a custom implementation of MessageDigest that counts the number of bytes 
provided to it.
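
A minimal sketch of that byte-counting digest idea, written here as a delegating wrapper (the real patch describes a custom MessageDigest implementation, whose internals may differ; all names below are illustrative):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Wraps a MessageDigest and records how many bytes have been fed to it,
// so a row's serialized size can be recovered as a side effect of hashing,
// without modifying AbstractCompactedRow to return a size.
public class CountingDigest {
    private final MessageDigest delegate;
    private long byteCount;

    public CountingDigest(String algorithm) throws NoSuchAlgorithmException {
        this.delegate = MessageDigest.getInstance(algorithm);
    }

    public void update(byte[] input, int offset, int len) {
        byteCount += len; // record the size before delegating
        delegate.update(input, offset, len);
    }

    public byte[] digest() { return delegate.digest(); }

    public long getByteCount() { return byteCount; }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        CountingDigest d = new CountingDigest("MD5");
        byte[] row = "some-row-bytes".getBytes();
        d.update(row, 0, row.length);
        System.out.println(d.getByteCount()); // prints 14
    }
}
```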

Case 2 is much simpler, as we already have the ranges and their sizes available 
to us.

There are some other changes, particularly in MerkleTree, with some 
refactoring/renames/new subclasses as part of updating MerkleTree.difference(). 
In particular, TreeDifference is returned instead of TreeRange (to accommodate 
the extra size information), and is used in place of TreeRange throughout the 
call tree of this method where applicable; hash() and hashHelper() have also been 
renamed to find() and findHelper(), with a new hash() implementation depending 
on find(). I'm sure there are other minutiae, but hopefully nothing too opaque. 
If you need any clarification, feel free to ask.



[jira] [Updated] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-03-28 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-2698:


Attachment: patch-rebased.diff



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-03-28 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616569#comment-13616569
 ] 

Benedict commented on CASSANDRA-2698:
-

Hi Yuki,

The patch was created some time ago, and there were some minor renames/changes 
to MerkleTree and AntiEntropyService in the meantime. I've pulled the latest 
changes, merged, and regenerated the patch. This is against the main trunk.



[jira] [Comment Edited] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-03-28 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616569#comment-13616569
 ] 

Benedict edited comment on CASSANDRA-2698 at 3/28/13 8:26 PM:
--

Hi Yuki,

The patch was created some time ago, and there were some minor renames/changes 
to MerkleTree and AntiEntropyService in the meantime. I've pulled the latest 
changes, merged, and regenerated the patch. This is against the main trunk / 
HEAD branch.

  was (Author: benedict):
Hi Yuki,

The patch was created some time ago, and there were some minor renames/changes 
to MerkleTree and AntiEntropyService in the meantime. I've pulled the latest 
changes, merged, and regenerated the patch. This is against the main trunk.
  


[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-04-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626520#comment-13626520
 ] 

Benedict commented on CASSANDRA-2698:
-

Hi Yuki,

Without in some way collecting (or at least sampling) the size of the 
differences, I don't know what bucket sizes to use. Since I need to reinsert 
all the records once I've decided this anyway, I need to retain them all, which 
I chose to do in EstimatedHistogram as they do, in effect, constitute a 
histogram. I also sample the largest records which I figure could be useful for 
debugging purposes (though that was just a guess). I don't see why thousands of 
items would be a major issue.

I agree that logging is suboptimal for this data. Presumably similar data for 
other tasks may be optionally logged in future, and so I would guess this 
should form part of a discussion about metric logging?

{quote}
fix coding style (especially whitespace) to match other code.
{quote}
Do you have an Eclipse formatter profile I could use for your coding 
convention? I did my best to keep it correct manually, but it is difficult to 
spot differences in an unfamiliar convention. Whitespace should be 
comparatively easy though.

{quote}
EstimatedHistogram#testGroupBy is failing.
{quote}
Noted - I will fix and resubmit.

{quote}
comparator in Arrays#sort in EstimatedHistogram#logSummary has the same 
conditions in both if and else if.
{quote}
Thanks, good spot. I'm surprised Eclipse didn't warn me.
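
(As a hypothetical illustration of the kind of bug flagged here, not the actual patch code: when both branches of a comparator test the same condition, the second branch is dead and the sort silently treats distinct values as equal.)

```java
import java.util.Arrays;

public class ComparatorBug {
    public static void main(String[] args) {
        // Each entry is {size, id}; we want the largest sizes first.
        long[][] ranges = { {5, 0}, {2, 1}, {9, 2} };

        // Buggy pattern (dead second branch, never returns 1):
        //   if (a[0] > b[0]) return -1; else if (a[0] > b[0]) return 1; return 0;

        // Fixed: a proper descending comparison on the size field.
        Arrays.sort(ranges, (a, b) -> Long.compare(b[0], a[0]));
        System.out.println(ranges[0][0]); // prints 9
    }
}
```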






[jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables

2014-07-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080759#comment-14080759
 ] 

Benedict commented on CASSANDRA-7631:
-

Feel free to leave this one for me, as I'll be looking at stress soon anyway.

 Allow Stress to write directly to SSTables
 --

 Key: CASSANDRA-7631
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Russell Alexander Spitzer
Assignee: Russell Alexander Spitzer

 One common difficulty with benchmarking machines is the amount of time it 
 takes to initially load data. For machines with a large amount of RAM this 
 becomes especially onerous because a very large amount of data needs to be 
 placed on the machine before the page cache can be circumvented. 
 To remedy this I suggest we add a top-level flag to cassandra-stress which 
 would cause the tool to write directly to sstables rather than actually 
 performing CQL inserts. Internally this would use CQLSSTableWriter to write 
 directly to sstables while skipping any keys which are not owned by the node 
 stress is running on. The same stress command run on each node in the cluster 
 would then write unique sstables only containing data which that node is 
 responsible for. Following this no further network IO would be required to 
 distribute data as it would all already be correctly in place.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7593) Errors when upgrading through several versions to 2.1

2014-07-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080825#comment-14080825
 ] 

Benedict commented on CASSANDRA-7593:
-

Might be better to just expose this in CSCNT, since we have access to it when 
doing this.

 Errors when upgrading through several versions to 2.1
 -

 Key: CASSANDRA-7593
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7593
 Project: Cassandra
  Issue Type: Bug
 Environment: java 1.7
Reporter: Russ Hatch
Assignee: Tyler Hobbs
Priority: Critical
 Fix For: 2.1.0

 Attachments: 0001-keep-clusteringSize-in-CompoundComposite.patch, 
 7593.txt


 I'm seeing two different errors cropping up in the dtest which upgrades a 
 cluster through several versions.
 This is the more common error:
 {noformat}
 ERROR [GossipStage:10] 2014-07-22 13:14:30,028 CassandraDaemon.java:168 - 
 Exception in thread Thread[GossipStage:10,5,main]
 java.lang.AssertionError: null
 at 
 org.apache.cassandra.db.filter.SliceQueryFilter.shouldInclude(SliceQueryFilter.java:347)
  ~[main/:na]
 at 
 org.apache.cassandra.db.filter.QueryFilter.shouldInclude(QueryFilter.java:249)
  ~[main/:na]
 at 
 org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:249)
  ~[main/:na]
 at 
 org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:60)
  ~[main/:na]
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1873)
  ~[main/:na]
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1681)
  ~[main/:na]
 at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:345) 
 ~[main/:na]
 at 
 org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.statements.SelectStatement.readLocally(SelectStatement.java:293)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.statements.SelectStatement.executeInternal(SelectStatement.java:302)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.statements.SelectStatement.executeInternal(SelectStatement.java:60)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.QueryProcessor.executeInternal(QueryProcessor.java:263)
  ~[main/:na]
 at 
 org.apache.cassandra.db.SystemKeyspace.getPreferredIP(SystemKeyspace.java:514)
  ~[main/:na]
 at 
 org.apache.cassandra.net.OutboundTcpConnectionPool.init(OutboundTcpConnectionPool.java:51)
  ~[main/:na]
 at 
 org.apache.cassandra.net.MessagingService.getConnectionPool(MessagingService.java:522)
  ~[main/:na]
 at 
 org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:536)
  ~[main/:na]
 at 
 org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:689)
  ~[main/:na]
 at 
 org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:663)
  ~[main/:na]
 at 
 org.apache.cassandra.service.EchoVerbHandler.doVerb(EchoVerbHandler.java:40) 
 ~[main/:na]
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
 ~[main/:na]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_60]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  ~[na:1.7.0_60]
 at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_60]
 {noformat}
 The same test sometimes fails with this exception instead:
 {noformat}
 ERROR [CompactionExecutor:4] 2014-07-22 16:18:21,008 CassandraDaemon.java:168 
 - Exception in thread Thread[CompactionExecutor:4,1,RMI Runtime]
 java.util.concurrent.RejectedExecutionException: Task 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7059d3e9 
 rejected from 
 org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@108f1504[Terminated,
  pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 95]
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
  ~[na:1.7.0_60]
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) 
 ~[na:1.7.0_60]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
  ~[na:1.7.0_60]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
  ~[na:1.7.0_60]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:619)
  ~[na:1.7.0_60]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.scheduleTidy(SSTableReader.java:628)
  

[jira] [Commented] (CASSANDRA-7567) when the commit_log disk for a single node is overwhelmed the entire cluster slows down

2014-07-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081045#comment-14081045
 ] 

Benedict commented on CASSANDRA-7567:
-

Which mode of connectivity? Smart thrift and cql native3 both use token-aware 
routing via the Java driver (smart thrift does its own fairly dumb round-robin 
for a given token range), so they will go directly to a random node in the 
cluster. With the Java driver I don't think we have any easy API control over 
which nodes we connect to, and I'm not sure there's much point making smart 
thrift too smart, since it's only there to compare fairly against cql native3's 
token-aware routing. Regular thrift mode won't do this.

 when the commit_log disk for a single node is overwhelmed the entire cluster 
 slows down
 ---

 Key: CASSANDRA-7567
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7567
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: debian 7.5, bare metal, 14 nodes, 64CPUs, 64GB RAM, 
 commit_log disk sata, data disk SSD, vnodes, leveled compaction strategy
Reporter: David O'Dell
Assignee: Brandon Williams
 Fix For: 2.1.0

 Attachments: 7567.logs.bz2, write_request_latency.png


 We've run into a situation where a single node out of 14 is experiencing high 
 disk io. This can happen when a node is being decommissioned or after it 
 joins the ring and runs into the bug CASSANDRA-6621.
 When this occurs the write latency for the entire cluster spikes.
 From 0.3ms to 170ms.
 To simulate this simply run dd on the commit_log disk (dd if=/dev/zero 
 of=/tmp/foo bs=1024) and you will see that instantly all nodes in the cluster 
 have slowed down.
 BTW overwhelming the data disk does not have this same effect.
 Also I've tried this where the overwhelmed node isn't being connected 
 directly from the client and it still has the same effect.





[jira] [Commented] (CASSANDRA-7567) when the commit_log disk for a single node is overwhelmed the entire cluster slows down

2014-07-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081060#comment-14081060
 ] 

Benedict commented on CASSANDRA-7567:
-

So the Java driver, then (cql3 native prepared is the default). There isn't 
anything we can really do about this without Java Driver support.



[jira] [Updated] (CASSANDRA-7658) stress connects to all nodes when it shouldn't

2014-07-31 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7658:


Issue Type: Improvement  (was: Bug)

 stress connects to all nodes when it shouldn't
 --

 Key: CASSANDRA-7658
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7658
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Brandon Williams
Assignee: Benedict
 Fix For: 2.1.1


 If you tell stress -node 1,2 in a cluster with more nodes, stress appears to do 
 ring discovery and connect to them all anyway (checked via netstat). This 
 led to the confusion on CASSANDRA-7567.





[jira] [Updated] (CASSANDRA-7658) stress connects to all nodes when it shouldn't

2014-07-31 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7658:


Priority: Minor  (was: Major)



[jira] [Commented] (CASSANDRA-7658) stress connects to all nodes when it shouldn't

2014-07-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081063#comment-14081063
 ] 

Benedict commented on CASSANDRA-7658:
-

You're inferring a property of the 'node' option it doesn't promise; that's the 
list of nodes it connects to initially to get started. You're looking for a 
whitelist, which is a different thing, and not currently supported. For dumb 
routing it necessarily behaves as both, but this is a feature request, not a bug.

Either way, we need Java Driver support that I don't think currently exists via 
the API.



[jira] [Commented] (CASSANDRA-7658) stress connects to all nodes when it shouldn't

2014-07-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081071#comment-14081071
 ] 

Benedict commented on CASSANDRA-7658:
-

old-stress has no distinction between a whitelist and an initial list.



[jira] [Assigned] (CASSANDRA-7631) Allow Stress to write directly to SSTables

2014-07-31 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-7631:
---

Assignee: Benedict  (was: Russell Alexander Spitzer)



[jira] [Updated] (CASSANDRA-7631) Allow Stress to write directly to SSTables

2014-07-31 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7631:


Assignee: Russell Alexander Spitzer  (was: Benedict)



[jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables

2014-07-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081116#comment-14081116
 ] 

Benedict commented on CASSANDRA-7631:
-

Ok, in that case, some assorted thoughts on this:

1) I suspect you're not blocking on ABQ, but on the single thread you have 
consuming from it (and having this separate thread is bad anyway). It's likely 
you're getting some misattribution in your profiler due to rapid thread 
sleeping/waking there.
2) We should for now complain if the whole partition isn't being inserted for 
this mode
3) We should create the CF on each individual thread, and we should append them 
unsorted onto a ConcurrentLinkedQueue, track the total memory used in the 
buffer, and have a separate thread that sorts the partition keys and flushes 
out to disk once we exceed our threshold for doing so (much like memtable 
flushing)
4) We should modify the PartitionGenerator to support sorting the clustering 
components it generates; this way we can reduce the sorting cost fairly 
dramatically, as sorting individual components is much cheaper than sorting all 
components at once
5) Ideally we would visit the partition keys in approximately sorted order, so 
that we can flush a single file, as this will be most efficient for loading. 
This will require a minor portion of the changes I'll be introducing soon for 
more realistic workload generation, and then a custom SeedGenerator that 
(externally) pre-sorts the seeds based on the partitions they generate.
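
Point 3 could be sketched roughly as below; this is only an illustration of the buffering idea under the stated assumptions (all names are hypothetical, not the actual stress internals), with the drain/sort/flush step shown inline rather than on its own thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

public class BufferedFlush {
    static class Partition {
        final long key; final int sizeInBytes;
        Partition(long key, int sizeInBytes) { this.key = key; this.sizeInBytes = sizeInBytes; }
    }

    // Writer threads append unsorted; an AtomicLong tracks buffered memory.
    private final ConcurrentLinkedQueue<Partition> buffer = new ConcurrentLinkedQueue<>();
    private final AtomicLong bufferedBytes = new AtomicLong();
    private final long flushThreshold;

    BufferedFlush(long flushThreshold) { this.flushThreshold = flushThreshold; }

    // Called by any writer thread; returns a key-sorted batch once the
    // memory threshold is crossed (much like memtable flushing), else null.
    List<Partition> append(Partition p) {
        buffer.add(p);
        if (bufferedBytes.addAndGet(p.sizeInBytes) < flushThreshold)
            return null;
        bufferedBytes.set(0);
        List<Partition> batch = new ArrayList<>();
        for (Partition q; (q = buffer.poll()) != null; )
            batch.add(q);
        batch.sort((a, b) -> Long.compare(a.key, b.key)); // sort keys before "flushing"
        return batch;
    }

    public static void main(String[] args) {
        BufferedFlush bf = new BufferedFlush(100);
        bf.append(new Partition(7, 60));                       // below threshold: buffered
        List<Partition> batch = bf.append(new Partition(3, 60)); // crosses threshold
        System.out.println(batch.get(0).key); // prints 3 (sorted ahead of 7)
    }
}
```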


 Allow Stress to write directly to SSTables
 --

 Key: CASSANDRA-7631
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Russell Alexander Spitzer
Assignee: Russell Alexander Spitzer

 One common difficulty with benchmarking machines is the amount of time it 
 takes to initially load data. For machines with a large amount of RAM this 
 becomes especially onerous, because a very large amount of data needs to be 
 placed on the machine before the page cache can be circumvented. 
 To remedy this I suggest we add a top-level flag to Cassandra-Stress which 
 would cause the tool to write directly to sstables rather than actually 
 performing CQL inserts. Internally this would use CQLSSTableWriter to write 
 directly to sstables while skipping any keys which are not owned by the node 
 stress is running on. The same stress command run on each node in the cluster 
 would then write unique sstables containing only data which that node is 
 responsible for. Following this, no further network IO would be required to 
 distribute data as it would all already be correctly in place.
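A minimal sketch of the ownership filter described above (the `Range` class and hash function here are stand-ins; the real code would use Cassandra's Murmur3Partitioner and the cluster's token metadata):

```java
import java.util.List;

// Hypothetical sketch: hash each key to a token and skip it unless the token
// falls in one of the local node's ranges. Real code would use
// Murmur3Partitioner and TokenMetadata rather than this stand-in hash.
class OwnershipFilter {
    static class Range {
        final long left, right; // (left, right]; wrapping ranges not handled here
        Range(long left, long right) { this.left = left; this.right = right; }
        boolean contains(long token) { return token > left && token <= right; }
    }

    final List<Range> localRanges;
    OwnershipFilter(List<Range> localRanges) { this.localRanges = localRanges; }

    static long token(byte[] key) {
        long h = 1125899906842597L; // stand-in hash, not Murmur3
        for (byte b : key)
            h = 31 * h + b;
        return h;
    }

    // true if this node owns the key, i.e. stress should write it locally
    boolean isLocal(byte[] key) {
        long t = token(key);
        for (Range r : localRanges)
            if (r.contains(t)) return true;
        return false;
    }
}
```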



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7644) tracing does not log commitlog/memtable ops when the coordinator is a replica

2014-07-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081129#comment-14081129
 ] 

Benedict commented on CASSANDRA-7644:
-

LGTM

 tracing does not log commitlog/memtable ops when the coordinator is a replica
 -

 Key: CASSANDRA-7644
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7644
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Brandon Williams
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 2.1.0

 Attachments: 7644.txt


 For instance:
 {noformat}
  session_id   | event_id 
 | activity  | source  
   | source_elapsed | thread
 --+--+---+---++-
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1c4bc1-176f-11e4-8893-4b4842ed69b9 
 | Parsing insert into Standard1 (key, C0) VALUES ( 0xff, 0xff); |  
 10.208.8.123 | 86 | SharedPool-Worker-5
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1c72d0-176f-11e4-8893-4b4842ed69b9 
 |   Preparing statement |  
 10.208.8.123 |434 | SharedPool-Worker-5
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1c72d1-176f-11e4-8893-4b4842ed69b9 
 | Determining replicas for mutation |  
 10.208.8.123 |534 | SharedPool-Worker-5
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1c72d2-176f-11e4-8893-4b4842ed69b9 
 |   Sending message to /10.208.8.63 |  
 10.208.8.123 |   1157 |  WRITE-/10.208.8.63
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1c99e0-176f-11e4-8893-4b4842ed69b9 
 | Sending message to /10.208.35.225 |  
 10.208.8.123 |   1975 |WRITE-/10.208.35.225
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d0f10-176f-11e4-8893-4b4842ed69b9 
 |Message received from /10.208.8.63 |  
 10.208.8.123 |   4732 |Thread-5
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d0f11-176f-11e4-8893-4b4842ed69b9 
 |  Message received from /10.208.35.225 |  
 10.208.8.123 |   5086 |Thread-4
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d3620-176f-11e4-8893-4b4842ed69b9 
 | Processing response from /10.208.8.63 |  
 10.208.8.123 |   5288 | SharedPool-Worker-7
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d3620-176f-11e4-93e6-517bcdb23258 
 |   Message received from /10.208.8.123 | 
 10.208.35.225 | 76 |Thread-4
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d3620-176f-11e4-9b20-3b546d897db7 
 |   Message received from /10.208.8.123 |   
 10.208.8.63 |317 |Thread-4
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d3621-176f-11e4-8893-4b4842ed69b9 
 |   Processing response from /10.208.35.225 |  
 10.208.8.123 |   5332 | SharedPool-Worker-7
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d3621-176f-11e4-93e6-517bcdb23258 
 |Appending to commitlog | 
 10.208.35.225 |322 | SharedPool-Worker-4
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d3622-176f-11e4-93e6-517bcdb23258 
 |  Adding to Standard1 memtable | 
 10.208.35.225 |386 | SharedPool-Worker-4
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d3623-176f-11e4-93e6-517bcdb23258 
 |   Enqueuing response to /10.208.8.123 | 
 10.208.35.225 |451 | SharedPool-Worker-4
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d5d30-176f-11e4-93e6-517bcdb23258 
 |  Sending message to bw-1/10.208.8.123 | 
 10.208.35.225 |   1538 | WRITE-bw-1/10.208.8.123
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d5d30-176f-11e4-9b20-3b546d897db7 
 |Appending to commitlog |   
 10.208.8.63 |   1191 | SharedPool-Worker-7
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d5d31-176f-11e4-9b20-3b546d897db7 
 |  Adding to Standard1 memtable |   
 10.208.8.63 |   1226 | SharedPool-Worker-7
  bb1c4bc0-176f-11e4-8893-4b4842ed69b9 | bb1d5d32-176f-11e4-9b20-3b546d897db7 
 |   Enqueuing response to /10.208.8.123 |   
 10.208.8.63 |   1277 | SharedPool-Worker-7
  

[jira] [Commented] (CASSANDRA-7593) Errors when upgrading through several versions to 2.1

2014-07-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081135#comment-14081135
 ] 

Benedict commented on CASSANDRA-7593:
-

Yes, this is what I meant when I said expose it in CSCNT; didn't spot it was 
already exposed 

 Errors when upgrading through several versions to 2.1
 -

 Key: CASSANDRA-7593
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7593
 Project: Cassandra
  Issue Type: Bug
 Environment: java 1.7
Reporter: Russ Hatch
Assignee: Tyler Hobbs
Priority: Critical
 Fix For: 2.1.0

 Attachments: 0001-keep-clusteringSize-in-CompoundComposite.patch, 
 7593.txt


 I'm seeing two different errors cropping up in the dtest which upgrades a 
 cluster through several versions.
 This is the more common error:
 {noformat}
 ERROR [GossipStage:10] 2014-07-22 13:14:30,028 CassandraDaemon.java:168 - 
 Exception in thread Thread[GossipStage:10,5,main]
 java.lang.AssertionError: null
 at 
 org.apache.cassandra.db.filter.SliceQueryFilter.shouldInclude(SliceQueryFilter.java:347)
  ~[main/:na]
 at 
 org.apache.cassandra.db.filter.QueryFilter.shouldInclude(QueryFilter.java:249)
  ~[main/:na]
 at 
 org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:249)
  ~[main/:na]
 at 
 org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:60)
  ~[main/:na]
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1873)
  ~[main/:na]
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1681)
  ~[main/:na]
 at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:345) 
 ~[main/:na]
 at 
 org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.statements.SelectStatement.readLocally(SelectStatement.java:293)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.statements.SelectStatement.executeInternal(SelectStatement.java:302)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.statements.SelectStatement.executeInternal(SelectStatement.java:60)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.QueryProcessor.executeInternal(QueryProcessor.java:263)
  ~[main/:na]
 at 
 org.apache.cassandra.db.SystemKeyspace.getPreferredIP(SystemKeyspace.java:514)
  ~[main/:na]
 at 
 org.apache.cassandra.net.OutboundTcpConnectionPool.init(OutboundTcpConnectionPool.java:51)
  ~[main/:na]
 at 
 org.apache.cassandra.net.MessagingService.getConnectionPool(MessagingService.java:522)
  ~[main/:na]
 at 
 org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:536)
  ~[main/:na]
 at 
 org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:689)
  ~[main/:na]
 at 
 org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:663)
  ~[main/:na]
 at 
 org.apache.cassandra.service.EchoVerbHandler.doVerb(EchoVerbHandler.java:40) 
 ~[main/:na]
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
 ~[main/:na]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_60]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  ~[na:1.7.0_60]
 at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_60]
 {noformat}
 The same test sometimes fails with this exception instead:
 {noformat}
 ERROR [CompactionExecutor:4] 2014-07-22 16:18:21,008 CassandraDaemon.java:168 
 - Exception in thread Thread[CompactionExecutor:4,1,RMI Runtime]
 java.util.concurrent.RejectedExecutionException: Task 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7059d3e9 
 rejected from 
 org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@108f1504[Terminated,
  pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 95]
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
  ~[na:1.7.0_60]
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) 
 ~[na:1.7.0_60]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
  ~[na:1.7.0_60]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
  ~[na:1.7.0_60]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:619)
  ~[na:1.7.0_60]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.scheduleTidy(SSTableReader.java:628)
  

[jira] [Commented] (CASSANDRA-7511) Always flush on TRUNCATE

2014-08-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14082295#comment-14082295
 ] 

Benedict commented on CASSANDRA-7511:
-

Looking at 2.1, it is actually still affected by this bug. I don't mind which 
solution we go for in 2.1; always flush, or grab the last replay position from 
the memtable (either are pretty trivial)

 Always flush on TRUNCATE
 

 Key: CASSANDRA-7511
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7511
 Project: Cassandra
  Issue Type: Bug
 Environment: CentOS 6.5, Oracle Java 7u60, C* 2.0.6, 2.0.9, including 
 earlier 1.0.* versions.
Reporter: Viktor Jevdokimov
Assignee: Jeremiah Jordan
Priority: Minor
  Labels: commitlog
 Fix For: 2.0.10

 Attachments: 7511-2.0-v2.txt, 7511-v3-remove-renewMemtable.txt, 
 7511-v3-test.txt, 7511-v3.txt, 7511.txt


 Commit log grows infinitely after a CF truncate operation via cassandra-cli, 
 regardless of whether the CF receives writes thereafter.
 CFs can be non-CQL Standard and Super column type. Creation of snapshots 
 after truncate is turned off.
 The commit log may start growing promptly or later, on a few nodes only or on 
 all nodes at once.
 Nothing special in the system log. No idea how to reproduce.
 After a rolling restart the commit logs are cleared and back to normal. It's 
 just annoying to do a rolling restart after each truncate.





[jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables

2014-08-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14082854#comment-14082854
 ] 

Benedict commented on CASSANDRA-7631:
-

I'd be inclined to keep this feature for user commands only, to keep 
maintenance complexity down. The writing is on the wall for the legacy mode 
anyway, for anything other than a very quick benchmark of general server 
performance. I don't see a reason to build the sstables up in advance for that 
kind of use case.

 Allow Stress to write directly to SSTables
 --

 Key: CASSANDRA-7631
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Russell Alexander Spitzer
Assignee: Russell Alexander Spitzer






[jira] [Commented] (CASSANDRA-7631) Allow Stress to write directly to SSTables

2014-08-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14082880#comment-14082880
 ] 

Benedict commented on CASSANDRA-7631:
-

I think it's a bit early for that. Let's pencil that in for 3.0.

 Allow Stress to write directly to SSTables
 --

 Key: CASSANDRA-7631
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Russell Alexander Spitzer
Assignee: Russell Alexander Spitzer






[jira] [Commented] (CASSANDRA-6276) CQL: Map can not be created with the same name as a previously dropped list

2014-08-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14084921#comment-14084921
 ] 

Benedict commented on CASSANDRA-6276:
-

3.0 storage engine won't be universal for a while (maybe never for thrift), but 
will index directly into columns (i.e. won't touch any that aren't requested), 
so it could trivially avoid retrieving data for dropped columns. The only 
problem is we'd need to track the range of sstables in which they were 
previously dropped (and may contain stale data), and to which we now apply the 
new comparator, which would be a bit ugly/annoying.

 CQL: Map can not be created with the same name as a previously dropped list
 ---

 Key: CASSANDRA-6276
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6276
 Project: Cassandra
  Issue Type: Bug
 Environment:  Cassandra 2.0.2 | CQL spec 3.1.0
 centos 64 bit
 Java(TM) SE Runtime Environment (build 1.7.0-b147)
Reporter: Oli Schacher
Assignee: Benjamin Lerer
Priority: Minor
  Labels: cql
 Fix For: 2.1.1

 Attachments: CASSANDRA-6276.txt


 If I create a list, drop it, and create a map with the same name, I get Bad 
 Request: comparators do not match or are not compatible.
 {quote}
 cqlsh:os_test1 create table thetable(id timeuuid primary key, somevalue 
 text);
 cqlsh:os_test1 alter table thetable add mycollection list<text>;  
 cqlsh:os_test1 alter table thetable drop mycollection;
 cqlsh:os_test1 alter table thetable add mycollection map<text,text>;  
 Bad Request: comparators do not match or are not compatible.
 {quote}
 {quote}





[jira] [Updated] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory

2014-08-06 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7546:


Attachment: 7546.20_6.txt

I've attached a slightly tweaked version, making things a little clearer (IMO) 
and removing some of the unnecessary comments, as well as fixing a couple of 
bugs and removing the AtomicReferenceHolder to recoup the extra space we're 
now using in the Holder.

I must admit I'm still not madly keen on the nested synchronized() calls - I 
think they're a little ugly, and they also increase call depth, which is not 
ideal. I also cannot find any evidence that invoking 
unsafe.monitorenter/monitorexit would result in negative optimisations (this 
discussion on the relevant mailing list makes no such assertion whilst 
discussing its potential removal 
[http://openjdk.5641.n7.nabble.com/Unsafe-removing-the-monitorEnter-monitorExit-tryMonitorEnter-methods-td179462.html],
 but suggests exposing them more safely); however, mostly I think that usage is 
clearer than nested calls passing the state of the method (isSynchronized is 
especially ugly to me). I am not dead set against it though. Perhaps 
[~iamaleksey] can offer a third opinion?

Otherwise, WDYT [~graham sanderson]? Could you give this patch a test and see 
how it behaves?
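For context, the general shape of the contended-update strategy under review - a sketch only, with hypothetical names, not the attached patch - is to attempt the lock-free CAS a bounded number of times and then serialize contended writers behind a monitor, so heavy contention stops burning allocations on failed clone/CAS cycles:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Sketch: try the optimistic clone-and-CAS a bounded number of times; under
// contention, fall back to a synchronized section so writers that keep losing
// the race are serialized instead of re-allocating clones indefinitely.
class BoundedSpin<T> {
    private static final int SPIN_LIMIT = 2;
    private final AtomicReference<T> ref;

    BoundedSpin(T initial) { ref = new AtomicReference<>(initial); }

    T update(UnaryOperator<T> op) {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            T cur = ref.get();
            T next = op.apply(cur); // the clone/update allocation we want to bound
            if (ref.compareAndSet(cur, next))
                return next;
        }
        synchronized (this) { // contended path: lock-fallers are serialized
            while (true) {
                T cur = ref.get();
                T next = op.apply(cur);
                if (ref.compareAndSet(cur, next)) // spinners may still CAS, so keep CASing
                    return next;
            }
        }
    }
}
```

The actual patch tracks wasted allocation rather than a fixed spin count, but the trade-off is the same: bound the optimistic work before taking the lock.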

 AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
 -

 Key: CASSANDRA-7546
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: graham sanderson
Assignee: graham sanderson
 Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_3.txt, 
 7546.20_4.txt, 7546.20_5.txt, 7546.20_6.txt, 7546.20_alt.txt, 
 suggestion1.txt, suggestion1_21.txt


 In order to preserve atomicity, this code attempts to read, clone/update, 
 then CAS the state of the partition.
 Under heavy contention for updating a single partition this can cause some 
 fairly staggering memory growth (the more cores on your machine, the worse it 
 gets).
 Whilst many usage patterns don't do highly concurrent updates to the same 
 partition, hinting today, does, and in this case wild (order(s) of magnitude 
 more than expected) memory allocation rates can be seen (especially when the 
 updates being hinted are small updates to different partitions which can 
 happen very fast on their own) - see CASSANDRA-7545
 It would be best to eliminate/reduce/limit the spinning memory allocation 
 whilst not slowing down the very common un-contended case.





[jira] [Updated] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-06 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7704:


Attachment: 7704.txt

Attaching a patch that I think addresses this. There are a number of 
concurrency bugs here, and whilst we could fix them with more advanced 
lock-freedom, there is no compelling reason for this class not to use 
synchronized everywhere, which would probably have avoided this problem in the 
first place. There is only one place where execution is not guaranteed to be 
prompt, and I have left that out of the synchronization. I have at the same 
time simplified the logic, fixed the handling of timeout cancellation, and made 
the scheduled executor for timeouts globally shared (there's no good reason to 
spin up a new executor for each set of transfers)

In this particular instance the issue seems to have been a lack of atomicity 
between abort() and complete(); an ACK arrived at the same time as abort() was 
cancelling all transfers, causing a reference to be released twice. This could 
also occur with the timeouts, but since they occur only every 12hrs, the risk 
is low.
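A sketch of the atomicity fix described here (hypothetical class, not the streaming code itself): complete() and abort() both synchronize on the task, so a racing ACK and abort cannot release the underlying reference twice:

```java
// Sketch: 'done' is guarded by the task's monitor, so whichever of complete()
// or abort() wins the race releases the reference exactly once; the loser
// observes 'done' and does nothing.
class TransferTask {
    private boolean done;     // guarded by 'this'
    private int refCount = 1; // stands in for the sstable file reference

    synchronized boolean complete() {
        if (done) return false; // abort already won the race
        done = true;
        release();
        return true;
    }

    synchronized boolean abort() {
        if (done) return false; // the ACK already completed the transfer
        done = true;
        release();
        return true;
    }

    private void release() {
        if (--refCount < 0)
            throw new IllegalStateException("reference released twice");
    }

    synchronized int refCount() { return refCount; }
}
```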

 FileNotFoundException during STREAM-OUT triggers 100% CPU usage
 ---

 Key: CASSANDRA-7704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
 Attachments: 7704.txt, backtrace.txt


 See attached backtrace which was what triggered this. This stream failed and 
 then ~12 seconds later it emitted that exception. At that point, all CPUs 
 went to 100%. A thread dump shows all the ReadStage threads stuck inside 
 IntervalTree.searchInternal inside of CFS.markReferenced().





[jira] [Assigned] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-06 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-7704:
---

Assignee: Benedict

 FileNotFoundException during STREAM-OUT triggers 100% CPU usage
 ---

 Key: CASSANDRA-7704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
Assignee: Benedict
 Attachments: 7704.txt, backtrace.txt







[jira] [Commented] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087577#comment-14087577
 ] 

Benedict commented on CASSANDRA-7704:
-

[~rbranson], ftr, could we get the earlier stack traces you saw and any other 
related info? I suspect the earlier failing transfer caused a file to be 
deleted prematurely, which then caused this failure - both the same bug.

 FileNotFoundException during STREAM-OUT triggers 100% CPU usage
 ---

 Key: CASSANDRA-7704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
Assignee: Benedict
 Attachments: 7704.txt, backtrace.txt


 See attached backtrace which was what triggered this. This stream failed and 
 then ~12 seconds later it emitted that exception. At that point, all CPUs 
 went to 100%. A thread dump shows all the ReadStage threads stuck inside 
 IntervalTree.searchInternal inside of CFS.markReferenced().





[jira] [Created] (CASSANDRA-7705) Safer Resource Management

2014-08-06 Thread Benedict (JIRA)
Benedict created CASSANDRA-7705:
---

 Summary: Safer Resource Management
 Key: CASSANDRA-7705
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7705
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
 Fix For: 3.0


We've had a spate of bugs recently with bad reference counting. These can have 
potentially dire consequences, generally either randomly deleting data or 
giving us infinite loops. 

Since in 2.1 we only reference count resources that are relatively expensive 
and infrequently managed, we could without any negative consequences (and only 
slight code complexity) introduce a safer resource management scheme.

Basically, I propose when we want to acquire a resource we allocate an object 
that manages the reference. This can only be released once; if it is released 
twice, we fail immediately at the second release, reporting where the bug is 
(rather than letting it continue fine until the next correct release corrupts 
the count). The reference counter remains the same, but we obtain guarantees 
that the reference count itself is never badly maintained, although code using 
it could mistakenly release its own handle early (typically this is only an 
issue when cleaning up after a failure, in which case under the new scheme this 
would be an innocuous error)
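A minimal sketch of the proposed scheme (hypothetical names): each acquirer gets its own handle that can be released exactly once, so a double release fails loudly at the buggy call site instead of silently corrupting the shared count:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the shared count is unchanged, but each acquisition hands back a
// one-shot Ref. Releasing the same Ref twice throws at the second release,
// pinpointing the buggy caller rather than corrupting the count.
class Counted {
    private final AtomicInteger count = new AtomicInteger(1);

    Ref ref() {
        count.incrementAndGet();
        return new Ref();
    }

    int count() { return count.get(); }

    class Ref {
        private final AtomicBoolean released = new AtomicBoolean(false);

        void release() {
            if (!released.compareAndSet(false, true))
                throw new IllegalStateException("ref released twice");
            count.decrementAndGet();
        }
    }
}
```

An early release of a handle is still possible, but as the description notes it becomes an innocuous error rather than a count corruption.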







[jira] [Updated] (CASSANDRA-7705) Safer Resource Management

2014-08-06 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7705:


Description: 
We've had a spate of bugs recently with bad reference counting. These can have 
potentially dire consequences, generally either randomly deleting data or 
giving us infinite loops. 

Since in 2.1 we only reference count resources that are relatively expensive 
and infrequently managed (or in places where this safety is probably not as 
necessary, e.g. SerializingCache), we could without any negative consequences 
(and only slight code complexity) introduce a safer resource management scheme 
for these more expensive/infrequent actions.

Basically, I propose when we want to acquire a resource we allocate an object 
that manages the reference. This can only be released once; if it is released 
twice, we fail immediately at the second release, reporting where the bug is 
(rather than letting it continue fine until the next correct release corrupts 
the count). The reference counter remains the same, but we obtain guarantees 
that the reference count itself is never badly maintained, although code using 
it could mistakenly release its own handle early (typically this is only an 
issue when cleaning up after a failure, in which case under the new scheme this 
would be an innocuous error)



  was:
We've had a spate of bugs recently with bad reference counting. these can have 
potentially dire consequences, generally either randomly deleting data or 
giving us infinite loops. 

Since in 2.1 we only reference count resources that are relatively expensive 
and infrequently managed, we could without any negative consequences (and only 
slight code complexity) introduce a safer resource management scheme.

Basically, I propose when we want to acquire a resource we allocate an object 
that manages the reference. This can only be released once; if it is released 
twice, we fail immediately at the second release, reporting where the bug is 
(rather than letting it continue fine until the next correct release corrupts 
the count). The reference counter remains the same, but we obtain guarantees 
that the reference count itself is never badly maintained, although code using 
it could mistakenly release its own handle early (typically this is only an 
issue when cleaning up after a failure, in which case under the new scheme this 
would be an innocuous error)




 Safer Resource Management
 -

 Key: CASSANDRA-7705
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7705
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
 Fix For: 3.0







[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory

2014-08-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087814#comment-14087814
 ] 

Benedict commented on CASSANDRA-7546:
-

Well, technically we never ever call addColumn() directly, but in 2.0 we 
haven't removed / UnsupportedOperationException'd that path, so I'm not totally 
comfortable leaving it as a regular int, as an external call to addColumn would 
break it (but then, this probably isn't the end of the world). 

However, I actually introduced a double counting bug in changing that :/   ... 
and since we don't want to incur the incAndGet on every change, and we don't 
want to dup code, let's settle for the possible race in maintaining size if 
somebody uses the API in a way it isn't used in the codebase right now.

However I think I would prefer to make size final in this case.

 AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
 -

 Key: CASSANDRA-7546
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: graham sanderson
Assignee: graham sanderson
 Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_3.txt, 
 7546.20_4.txt, 7546.20_5.txt, 7546.20_6.txt, 7546.20_alt.txt, 
 suggestion1.txt, suggestion1_21.txt







[jira] [Comment Edited] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory

2014-08-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087814#comment-14087814
 ] 

Benedict edited comment on CASSANDRA-7546 at 8/6/14 3:56 PM:
-

Well, technically we never ever call addColumn() directly, but in 2.0 we 
haven't removed / UnsupportedOperationException'd that path, so I'm not totally 
comfortable leaving it as a regular int, as an external call to addColumn would 
break it (but then, this probably isn't the end of the world). 

However, I actually introduced a double counting bug in changing that :/   ... 
and since we don't want to incur the incAndGet on every change, and we don't 
want to dup code, let's settle for the possible race in maintaining size if 
somebody uses the API in a way it isn't used in the codebase right now.

-However I think I would prefer to make size final in this case.-

Looking again, it's too ugly to make it final, so let's settle for the ugliness 
of it being non-final, and revert to your behaviour here. This bit is soon to 
be superseded by 2.1 anyway, so let's not agonise over the beauty of it.


was (Author: benedict):
Well, technically we never ever call addColumn() directly, but in 2.0 we 
haven't removed / UnsupportedOperationException'd that path, so I'm not totally 
comfortable leaving it as a regular int, as an external call to addColumn would 
break it (but then, this probably isn't the end of the world). 

However, I actually introduced a double counting bug in changing that :/   ... 
and since we don't want to incur the incAndGet every change, and we don't want 
to dup code, let's settle for the possible race for maintaining size if 
somebody uses the API in a way it isn't in the codebase right now.

However I think I would prefer to make size final in this case.

 AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
 -

 Key: CASSANDRA-7546
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: graham sanderson
Assignee: graham sanderson
 Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_3.txt, 
 7546.20_4.txt, 7546.20_5.txt, 7546.20_6.txt, 7546.20_alt.txt, 
 suggestion1.txt, suggestion1_21.txt


 In order to preserve atomicity, this code attempts to read, clone/update, 
 then CAS the state of the partition.
 Under heavy contention for updating a single partition this can cause some 
 fairly staggering memory growth (the more cores on your machine the worse it 
 gets).
 Whilst many usage patterns don't do highly concurrent updates to the same 
 partition, hinting today, does, and in this case wild (order(s) of magnitude 
 more than expected) memory allocation rates can be seen (especially when the 
 updates being hinted are small updates to different partitions which can 
 happen very fast on their own) - see CASSANDRA-7545
 It would be best to eliminate/reduce/limit the spinning memory allocation 
 whilst not slowing down the very common un-contended case.
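The read / clone-update / CAS cycle the issue describes can be illustrated with a minimal sketch. This is a hypothetical single class invented for illustration (it is not Cassandra's actual AtomicSortedColumns); every failed CAS discards a freshly allocated copy, which is the source of the memory growth under contention:

```java
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the read / clone-update / CAS pattern described in
// the issue. Every failed compareAndSet throws away a freshly allocated copy,
// so under heavy contention on one partition the allocation rate grows with
// the number of competing threads.
class CasSpinLoopSketch
{
    private final AtomicReference<SortedMap<String, String>> holder =
            new AtomicReference<SortedMap<String, String>>(new TreeMap<>());

    void addColumn(String name, String value)
    {
        while (true)
        {
            SortedMap<String, String> current = holder.get();
            // clone + update: allocated on *every* attempt, successful or not
            SortedMap<String, String> modified = new TreeMap<>(current);
            modified.put(name, value);
            if (holder.compareAndSet(current, modified))
                return; // on failure, 'modified' becomes garbage and we spin
        }
    }

    int size()
    {
        return holder.get().size();
    }
}
```

The fix discussed in the comments amounts to detecting sustained CAS failure and falling back to a less allocation-hungry path, without slowing the common uncontended case.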





[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory

2014-08-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087850#comment-14087850
 ] 

Benedict commented on CASSANDRA-7546:
-

bq. We probably mean to the left of... before or after are a bit 
confusing here!

Yep, good catch!

bq. Volatile read of the wasteTracker in the fast path.

We mostly optimise for x86 at the moment, and it's essentially free here, as 
you say. Even on platforms where it isn't, it's unlikely to be a significant 
part of the overall cost, so better to keep it cleaner

bq. Adjacent in memory CASed vars in the AtomicSortedColumns - Again not 
majorly worried here... I don't think the (CASed) variables themselves are 
highly contended, it is more that we are doing lots of slow concurrent work, 
and then failing the CAS.

Absolutely not worried about this. Like you say, most of the cost is elsewhere. 
Would be much worse to pollute the cache with padding to avoid it.



[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory

2014-08-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087885#comment-14087885
 ] 

Benedict commented on CASSANDRA-7546:
-

Sounds good, thanks!



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089019#comment-14089019
 ] 

Benedict commented on CASSANDRA-7282:
-

Just pushed a minor update removing an extraneous comment and making the 
resize-threshold triggering uniform and clearer.

Agree it would be good to get some performance numbers on this, but I'm not 
sure which magical facilities you're referring to? New stress isn't likely to 
stress this bit out any more interestingly than old stress, and we don't yet 
have a working magical performance service I don't think...

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipListMap of DecoratedKey -> Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.
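The hybrid structure described above can be caricatured in a short sketch. This is a hypothetical, single-threaded illustration (not the patch): a HashMap index gives O(1) lookups, while a doubly-linked list maintains token order for iteration. For brevity the sketch finds the insert position by linear scan and assumes distinct tokens, whereas the ticket exploits the near-uniform hash distribution to locate it cheaply:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-threaded sketch, not the actual patch: a hash index
// into a sorted doubly-linked list. Lookups are O(1) via the map; ordered
// iteration follows the list. Assumes distinct tokens.
class HashIndexedOrderedMap<V>
{
    private static final class Node<V>
    {
        final long token;
        final V value;
        Node<V> prev, next;
        Node(long token, V value) { this.token = token; this.value = value; }
    }

    private final Map<Long, Node<V>> index = new HashMap<>();
    private Node<V> head;

    V get(long token)
    {
        Node<V> n = index.get(token); // O(1): no skip-list or tree descent
        return n == null ? null : n.value;
    }

    void put(long token, V value)
    {
        Node<V> node = new Node<>(token, value);
        index.put(token, node);
        if (head == null || token < head.token)
        {
            node.next = head;
            if (head != null) head.prev = node;
            head = node;
            return;
        }
        // linear scan to keep token order -- the simplification; the real
        // structure would locate the position in ~O(1) via the hash index
        Node<V> cur = head;
        while (cur.next != null && cur.next.token < token)
            cur = cur.next;
        node.next = cur.next;
        node.prev = cur;
        if (cur.next != null) cur.next.prev = node;
        cur.next = node;
    }

    List<Long> tokensInOrder()
    {
        List<Long> out = new ArrayList<>();
        for (Node<V> n = head; n != null; n = n.next)
            out.add(n.token);
        return out;
    }
}
```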





[jira] [Commented] (CASSANDRA-7695) Inserting the same row in parallel causes bad data to be returned to the client

2014-08-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089282#comment-14089282
 ] 

Benedict commented on CASSANDRA-7695:
-

LGTM.

Would prefer we call the test NativeTransportBufferRecycleTest, and comment it 
to explain. Also remove the LOCAL_QUORUM, since it's meaningless here and we 
don't want to confuse future readers. It's also not clear why we're bothering to 
'dump keys', but this is a test so I'm not going to vet it too hard.

 Inserting the same row in parallel causes bad data to be returned to the 
 client
 ---

 Key: CASSANDRA-7695
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7695
 Project: Cassandra
  Issue Type: Bug
 Environment: Linux 3.12.21, JVM 1.7u60
 Cassandra server 2.1.0 RC 5
 Cassandra datastax client version 2.1.0RC1
Reporter: Johan Bjork
Assignee: T Jake Luciani
Priority: Blocker
 Fix For: 2.1.0

 Attachments: 7695-workaround.txt, PutFailureRepro.java, 
 bad-data-tid43-get, bad-data-tid43-put


 Running the attached test program against a cassandra 2.1 server results in 
 scrambled data returned by the SELECT statement. Running it against latest 
 stable works fine.
 Attached:
 * Program that reproduces the failure
 * Example output files from mentioned test-program with the scrambled output.
 Failure mode:
 The value returned by 'get' is scrambled, the size is correct but some bytes 
 have shifted locations in the returned buffer.
 Cluster info:
 For the test we set up a single cassandra node using the stock configuration 
 file.





[jira] [Commented] (CASSANDRA-7628) Tools java driver needs to be updated

2014-08-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089959#comment-14089959
 ] 

Benedict commented on CASSANDRA-7628:
-

FTR, 2.1.0-rc1 works fine dropped in as well - only thing that doesn't compile 
is CqlPagingRecordReader, which needs a couple of delegate methods 
auto-generating.

 Tools java driver needs to be updated
 -

 Key: CASSANDRA-7628
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7628
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Brandon Williams
Assignee: Benedict
Priority: Minor
 Fix For: 2.1.1


 When you run stress currently you get a bunch of harmless stacktraces like:
 {noformat}
 ERROR 21:11:51 Error parsing schema options for table system_traces.sessions: 
 Cluster.getMetadata().getKeyspace(system_traces).getTable(sessions).getOptions()
  will return null
 java.lang.IllegalArgumentException: populate_io_cache_on_flush is not a 
 column defined in this metadata
 at 
 com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
  ~[cassandra-driver-core-2.0.1.jar:na]
 at 
 com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
  ~[cassandra-driver-core-2.0.1.jar:na]
 at 
 com.datastax.driver.core.ArrayBackedRow.isNull(ArrayBackedRow.java:56) 
 ~[cassandra-driver-core-2.0.1.jar:na]
 at 
 com.datastax.driver.core.TableMetadata$Options.init(TableMetadata.java:529) 
 ~[cassandra-driver-core-2.0.1.jar:na]
 at 
 com.datastax.driver.core.TableMetadata.build(TableMetadata.java:119) 
 ~[cassandra-driver-core-2.0.1.jar:na]
 at 
 com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:131) 
 [cassandra-driver-core-2.0.1.jar:na]
 at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:92) 
 [cassandra-driver-core-2.0.1.jar:na]
 at 
 com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:293)
  [cassandra-driver-core-2.0.1.jar:na]
 at 
 com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:230)
  [cassandra-driver-core-2.0.1.jar:na]
 at 
 com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:170)
  [cassandra-driver-core-2.0.1.jar:na]
 at 
 com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:78) 
 [cassandra-driver-core-2.0.1.jar:na]
 at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1029) 
 [cassandra-driver-core-2.0.1.jar:na]
 at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:270) 
 [cassandra-driver-core-2.0.1.jar:na]
 at 
 org.apache.cassandra.stress.util.JavaDriverClient.connect(JavaDriverClient.java:90)
  [stress/:na]
 at 
 org.apache.cassandra.stress.settings.StressSettings.getJavaDriverClient(StressSettings.java:177)
  [stress/:na]
 at 
 org.apache.cassandra.stress.settings.StressSettings.getJavaDriverClient(StressSettings.java:159)
  [stress/:na]
 at 
 org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:264) 
 [stress/:na]
 {noformat}





[jira] [Commented] (CASSANDRA-7628) Tools java driver needs to be updated

2014-08-08 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090426#comment-14090426
 ] 

Benedict commented on CASSANDRA-7628:
-

Since 2.1.0 is still rc, I'd say that's a good idea



[jira] [Commented] (CASSANDRA-7628) Tools java driver needs to be updated

2014-08-08 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090557#comment-14090557
 ] 

Benedict commented on CASSANDRA-7628:
-

Looks like you didn't update the tools/lib directory



[jira] [Commented] (CASSANDRA-7628) Tools java driver needs to be updated

2014-08-08 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090569#comment-14090569
 ] 

Benedict commented on CASSANDRA-7628:
-

I think we had to upgrade it independently at some point...? Can't remember, I 
think there was discussion about it a while back.



[jira] [Commented] (CASSANDRA-7447) New sstable format with support for columnar layout

2014-08-08 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090655#comment-14090655
 ] 

Benedict commented on CASSANDRA-7447:
-

bq. CASSANDRA-7443 is good to clean up the code base and should be done first

This, and further related patches, are a necessary prerequisite to this ticket, 
yes.

bq. Using 32bit instead of 64bit pointers could also save some space.

I would prefer not to go down this route just yet, as it is error prone to be 
optimising this in the first version. Any optimisations that can be made 
universally (i.e. guaranteed to be safe for all file sizes) I'm onboard with, 
but obfuscating code dependent on file size I'm not. Especially as this 
introduces an extra condition to execute on every single field access, 
potentially stalling the processor pipeline more readily.

bq. Trie + byte ordered types: would this mean to do some special 
serialization e.g. for timeuuid to make them binary comparable?

Yes
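
A minimal sketch of such byte-ordered serialization, using a signed 64-bit timestamp rather than a full timeuuid: flipping the sign bit makes an unsigned lexicographic byte comparison agree with numeric order. The class and method names here are invented for illustration, not Cassandra's API:

```java
import java.nio.ByteBuffer;

// Illustrative sketch of "byte-ordered" serialization: XOR-ing the sign bit
// of a signed long maps it to an unsigned value with the same relative order,
// so comparing the big-endian bytes as unsigned values matches numeric order.
final class ByteOrdered
{
    static byte[] encode(long v)
    {
        // flip sign bit: Long.MIN_VALUE -> 0x00.., -1 -> 0x7F.., 0 -> 0x80..
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    static int compareUnsigned(byte[] a, byte[] b)
    {
        for (int i = 0; i < 8; i++)
        {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0)
                return d;
        }
        return 0;
    }
}
```

A timeuuid would need the same treatment applied to its rearranged timestamp fields, since the standard UUID layout does not store the most significant time bits first.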

bq. If one partition only contains one row, plain row-oriented storage seems 
to be more efficient. Is this what small partition layout is meant for?

No, it is because it requires fewer disk accesses to have it all packed into 
the same block (or we can have smaller blocks, increasing IOPS esp. on SSDs). 
In fact it is quite reasonable to assume that even with single row partitions 
the column oriented storage will be more efficient, as the columns do not care 
about partitions; they extend across all partitions, and so the serialization 
costs are reduced even if there are no clustering columns. 

I should note that the presentation at ngcc is only for historical reference 
and to get familiar with the general discussion. As mentioned in the 
description of this ticket, I now favour a row-oriented approach backed by the 
new index structures for many of the non-optimal column-oriented use cases, 
which *may* reduce the necessity of a compact column-oriented form, although it 
would still be useful as just described.

bq. Column names (CQL): I'd prefer to extend the table definition schema with 
an integer column id and use that. Could save lots of String.hashCode/equals() 
calls - even if the column-id is also used in native protocol. (Think this was 
discussed elsewhere)

There is a separate ticket for this, and I consider it to be an orthogonal 
effort. We can more easily deliver it here than we can cross-cluster 
(personally I favour cross-cluster names to be supported by a general enum type 
(CASSANDRA-6917))

bq. Bikeshed: Is the term sstable still correct?

The original sstable was only imposing a sort-order on the partition keys. This 
will still be imposed, so yes, but I don't have any strong attachment to it.

bq. I didn't catch the point why only maps and sets don't naturally fit into 
columnar format but lists, strings and blobs do. Or is it just because of their 
mean serialized size?

They don't logically fit because they are an extra dimension, much as static 
columns are one _fewer_ dimension. Columnar layouts really need fixed 
dimensionality. You can flatten maps, sets and lists (my list was not 
exhaustive), but this incurs significant cost and complexity on reading these 
across multiple sstables, as opposed to relying on the standard machinery. 
Strings and blobs can more trivially be split out into an extra file if they 
are too large (for simplicity of first delivery we can just append all values 
larger than some limit to a file, and replace them with their location in the 
file), but storing large strings in a columnar layout is generally not 
sensible/beneficial anyway.
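
The "append oversized values to a side file" idea above can be sketched roughly as follows; this is a hypothetical illustration under assumed names and threshold, not a proposed implementation:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of the overflow-file idea: values larger than a
// threshold are appended to a side buffer (standing in for the overflow
// file) and replaced inline by their (offset, length); smaller values are
// stored inline as-is.
final class BlobOverflowWriter
{
    private static final int THRESHOLD = 64; // bytes; illustrative only

    private final ByteArrayOutputStream overflow = new ByteArrayOutputStream();

    void write(DataOutputStream out, byte[] value) throws IOException
    {
        if (value.length <= THRESHOLD)
        {
            out.writeBoolean(true);         // inline marker
            out.writeInt(value.length);
            out.write(value);
        }
        else
        {
            out.writeBoolean(false);        // pointer marker
            out.writeLong(overflow.size()); // offset within the overflow file
            out.writeInt(value.length);
            overflow.write(value);          // append to the side file
        }
    }

    int overflowSize()
    {
        return overflow.size();
    }
}
```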

In all likelihood I think the best approach may be to permit collections and 
statics on column oriented tables by splitting them into a separate 
row-oriented sstable, at least in the near-term. The heap-blocks outlined in 
the ngcc talk could be delivered later, although I might be inclined to tell 
users that column oriented storage is not for them if they want to store these 
things in the table.


 New sstable format with support for columnar layout
 ---

 Key: CASSANDRA-7447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7447
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance, storage
 Fix For: 3.0

 Attachments: ngcc-storage.odp


 h2. Storage Format Proposal
 C* has come a long way over the past few years, and unfortunately our storage 
 format hasn't kept pace with the data models we are now encouraging people to 
 utilise. This ticket proposes a collection of storage primitives that can be 
 combined to serve these data models more optimally.
 It would probably help to first state the data model at the most abstract 

[jira] [Commented] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-08 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091053#comment-14091053
 ] 

Benedict commented on CASSANDRA-7704:
-

My mistake. I thought on IRC you said there were errors preceding it that might 
be related. Not necessary at all, just thought they might be explicable.

 FileNotFoundException during STREAM-OUT triggers 100% CPU usage
 ---

 Key: CASSANDRA-7704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
Assignee: Benedict
 Attachments: 7704.txt, backtrace.txt, other-errors.txt


 See attached backtrace which was what triggered this. This stream failed and 
 then ~12 seconds later it emitted that exception. At that point, all CPUs 
 went to 100%. A thread dump shows all the ReadStage threads stuck inside 
 IntervalTree.searchInternal inside of CFS.markReferenced().





[jira] [Commented] (CASSANDRA-7447) New sstable format with support for columnar layout

2014-08-08 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091084#comment-14091084
 ] 

Benedict commented on CASSANDRA-7447:
-

bq. Is there any reason why you want to put the row index block next to the 
data? 

If we are going out of cache, we may as well read the index + data, rather than 
the index _then_ data. With HDDs this should avoid any penalty. Bear in mind 
the index also _is_ the data in this brave new world. My goal with the new 
format is to, as far as possible, guarantee as many or fewer seeks than the old 
format (even if SSDs are becoming more prevalent), whilst reducing the total 
amount of space necessary (so reduce requisite disk bandwidth and improve cache 
occupancy).

bq. Is there any reason why you want to put the row index block next to the 
data? This actually makes it tricky to make sstables pluggable since right now 
we would put this index in the index.db file. It could be in both places I 
suppose since it would help with recovery to have multiple copies.

Why does this make pluggability hard? The index is an artefact of the sstable 
type (or it should be, before we roll this out), so it shouldn't matter?

bq. Also if you plan of putting the index at the front of the row you would 
need to do some kind of two pass to write the partition. 

Maybe. I'd prefer not to get down to this level of specifics just yet, I'm 
pretty sure it's solvable either way. It would be preferable to focus mostly on 
the overall design, featureset, etc. for the moment. The format is likely to be 
agnostic to where the two records live with respect to each other, but there 
are some optimisations possible on read if they're adjacent, assuming the 
records are all smaller than a page. If they are much larger than that, no 
optimisation is likely to help so it doesn't matter too much, and if they are 
smaller we only have to buffer two pages.

 New sstable format with support for columnar layout
 ---

 Key: CASSANDRA-7447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7447
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance, storage
 Fix For: 3.0

 Attachments: ngcc-storage.odp


 h2. Storage Format Proposal
 C* has come a long way over the past few years, and unfortunately our storage 
 format hasn't kept pace with the data models we are now encouraging people to 
 utilise. This ticket proposes a collection of storage primitives that can be 
 combined to serve these data models more optimally.
 It would probably help to first state the data model at the most abstract 
 level. We have a fixed three-tier structure: We have the partition key, the 
 clustering columns, and the data columns. Each have their own characteristics 
 and so require their own specialised treatment.
 I should note that these changes will necessarily be delivered in stages, and 
 that we will be making some assumptions about what the most useful features 
 to support initially will be. Any features not supported will require 
 sticking with the old format until we extend support to all C* functionality.
 h3. Partition Key
 * This really has two components: the partition, and the value. Although the 
 partition is primarily used to distribute across nodes, it can also be used 
 to optimise lookups for a given key within a node
 * Generally partitioning is by hash, and for the moment I want to focus this 
 ticket on the assumption that this is the case
 * Given this, it makes sense to optimise our storage format to permit O(1) 
 searching of a given partition. It may be possible to achieve this with 
 little overhead based on the fact we store the hashes in order and know they 
 are approximately randomly distributed, as this effectively forms an 
 immutable contiguous split-ordered list (see Shalev/Shavit, or 
 CASSANDRA-7282), so we only need to store an amount of data based on how 
 imperfectly distributed the hashes are, or at worst a single value per block.
 * This should completely obviate the need for a separate key-cache, which 
 will be relegated to supporting the old storage format only
 h3. Primary Key / Clustering Columns
 * Given we have a hierarchical data model, I propose the use of a 
 cache-oblivious trie
 * The main advantage of the trie is that it is extremely compact and 
 _supports optimally efficient merges with other tries_ so that we can support 
 more efficient reads when multiple sstables are touched
 * The trie will be preceded by a small amount of related data; the full 
 partition key, a timestamp epoch (for offset-encoding timestamps) and any 
 other partition level optimisation data, such as (potentially) a min/max 
 timestamp to abort 

[jira] [Commented] (CASSANDRA-7447) New sstable format with support for columnar layout

2014-08-08 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091220#comment-14091220
 ] 

Benedict commented on CASSANDRA-7447:
-

bq. you already have all the code in place to put the index next to the 
partition location

I'm not sure I follow, but one of the goals is to permit faster _in-memory_ 
(cached) performance. That means being more targeted about where we hit data 
inside our pages, so that we can cache at finer granularity (and so achieve a 
higher cache hit rate); we don't want to scan entire pages if we can avoid it.

We currently have one index and one data file, and we persist clustering keys 
in both. The new scheme has one partition index, one data file, and one hybrid 
dataset, which can live by itself or in the data file but behaves as both a 
clustering index and the data itself, so talk of "an index" can get confusing. 
Either way, we want to support (and improve upon) the current ability to seek 
directly within partitions, and to do so efficiently, without extra disk 
seeks. Ideally, then, these clustering-key records should be cached 
independently of the rest of the data, since they will (or may) be referenced 
more frequently.

 New sstable format with support for columnar layout
 ---

 Key: CASSANDRA-7447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7447
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance, storage
 Fix For: 3.0

 Attachments: ngcc-storage.odp


 h2. Storage Format Proposal
 C* has come a long way over the past few years, and unfortunately our storage 
 format hasn't kept pace with the data models we are now encouraging people to 
 utilise. This ticket proposes a collection of storage primitives that can be 
 combined to serve these data models more optimally.
 It would probably help to first state the data model at the most abstract 
 level. We have a fixed three-tier structure: We have the partition key, the 
 clustering columns, and the data columns. Each have their own characteristics 
 and so require their own specialised treatment.
 I should note that these changes will necessarily be delivered in stages, and 
 that we will be making some assumptions about what the most useful features 
 to support initially will be. Any features not supported will require 
 sticking with the old format until we extend support to all C* functionality.
 h3. Partition Key
 * This really has two components: the partition, and the value. Although the 
 partition is primarily used to distribute across nodes, it can also be used 
 to optimise lookups for a given key within a node
 * Generally partitioning is by hash, and for the moment I want to focus this 
 ticket on the assumption that this is the case
 * Given this, it makes sense to optimise our storage format to permit O(1) 
 searching of a given partition. It may be possible to achieve this with 
 little overhead based on the fact we store the hashes in order and know they 
 are approximately randomly distributed, as this effectively forms an 
 immutable contiguous split-ordered list (see Shalev/Shavit, or 
 CASSANDRA-7282), so we only need to store an amount of data based on how 
 imperfectly distributed the hashes are, or at worst a single value per block.
 * This should completely obviate the need for a separate key-cache, which 
 will be relegated to supporting the old storage format only
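To make the O(1)-lookup claim above concrete, here is a minimal sketch (purely illustrative, not Cassandra code): if tokens are stored in sorted order and are approximately uniformly distributed, we can jump straight to a token's estimated position and only scan the small neighbourhood where the estimate may be off.

```java
// Sketch only: assumes non-negative 64-bit tokens stored in sorted order and
// drawn approximately uniformly from [0, Long.MAX_VALUE].
public class HashIndexSketch {
    // Jump straight to the position the token "should" occupy if hashes were
    // perfectly uniform.
    static int estimate(long[] sorted, long token) {
        double fraction = (double) token / (double) Long.MAX_VALUE;
        int guess = (int) (fraction * sorted.length);
        return Math.min(Math.max(guess, 0), sorted.length - 1);
    }

    // Probe the estimate, then scan outward; with well-distributed hashes the
    // expected scan distance is a small constant, giving O(1) average lookups.
    static int lookup(long[] sorted, long token) {
        int i = estimate(sorted, token);
        while (i > 0 && sorted[i] > token) i--;
        while (i < sorted.length - 1 && sorted[i] < token) i++;
        return sorted[i] == token ? i : -1;
    }
}
```

The storage cost is then only whatever correction data is needed where the real distribution deviates from uniform, which is the point made above.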
 h3. Primary Key / Clustering Columns
 * Given we have a hierarchical data model, I propose the use of a 
 cache-oblivious trie
 * The main advantage of the trie is that it is extremely compact and 
 _supports optimally efficient merges with other tries_ so that we can support 
 more efficient reads when multiple sstables are touched
 * The trie will be preceded by a small amount of related data; the full 
 partition key, a timestamp epoch (for offset-encoding timestamps) and any 
 other partition level optimisation data, such as (potentially) a min/max 
 timestamp to abort merges earlier
 * Initially I propose to limit the trie to byte-order-comparable data types 
 only (we can expand their number through translations of the important types 
 that are not currently byte-order comparable)
 * Crucially the trie will also encapsulate any range tombstones, so that 
 these are merged early in the process, avoiding re-iterating the same data
 * This results in true bidirectional streaming without having to read the 
 entire range into memory
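The efficient-merge property claimed above can be illustrated with a minimal sketch: a single pass over two sorted clustering-key sequences, which is the operation a trie merge performs without materialising either input. This is illustrative only, not the proposed trie implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: single-pass merge of two sorted key sequences. Duplicate keys
// collapse to one entry, mirroring how rows from multiple sstables merge
// during a read.
public class SortedMerge {
    static List<String> merge(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            if (j == b.size() || (i < a.size() && a.get(i).compareTo(b.get(j)) < 0)) {
                out.add(a.get(i++));            // only a, or a's key is smaller
            } else if (i == a.size() || a.get(i).compareTo(b.get(j)) > 0) {
                out.add(b.get(j++));            // only b, or b's key is smaller
            } else {                            // same key in both inputs
                out.add(a.get(i++));
                j++;
            }
        }
        return out;
    }
}
```

Each input is consumed strictly in order, which is also what makes the bidirectional-streaming point possible: neither side needs to be held in memory.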
 h3. Values
 There are generally two approaches to storing rows of data: columnar, or 
 row-oriented. The above two data structures can be combined with a value 
 storage scheme that is based on either. 

[jira] [Created] (CASSANDRA-7735) Remove ref-counting of netty buffers

2014-08-10 Thread Benedict (JIRA)
Benedict created CASSANDRA-7735:
---

 Summary: Remove ref-counting of netty buffers
 Key: CASSANDRA-7735
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7735
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Benedict
Assignee: T Jake Luciani
Priority: Critical
 Fix For: 2.1 rc5


This has turned out to be more bug-prone than we'd hoped, and it no longer 
seems a justified risk, since the performance gains were generally quite 
modest. When there's some time we can reengineer the API to make correct usage 
easier to produce and verify, but in the meantime I propose rolling back this 
change before general availability of 2.1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-7736) Clean-up, justify (and reduce) each use of @Inline

2014-08-10 Thread Benedict (JIRA)
Benedict created CASSANDRA-7736:
---

 Summary: Clean-up, justify (and reduce) each use of @Inline
 Key: CASSANDRA-7736
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7736
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: T Jake Luciani
Priority: Minor
 Fix For: 2.1.0


\@Inline is a delicate tool. Every use, existing or future, should be 
accompanied by a comment justifying it in the given context, both 
theoretically and, preferably, with a brief description of (or link to) the 
steps taken to demonstrate its benefit. We should aim not to use it unless we 
are very confident we can beat the JIT's normal behaviour: poor use pollutes 
the instruction cache, which can yield better results in tight benchmarks but 
worse results in general use.

It looks to me that we have too many uses already. I'll look over each one as 
well, and we can compare notes. If there's disagreement on any use we can 
discuss it, and if any dissent remains we should always err in favour of *not* 
using \@Inline.





[jira] [Commented] (CASSANDRA-7736) Clean-up, justify (and reduce) each use of @Inline

2014-08-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092484#comment-14092484
 ] 

Benedict commented on CASSANDRA-7736:
-

Thanks. 

In general, inlining is unlikely to make a material difference if it impacts 
only a handful of calls per database operation. We should restrict its use to 
methods invoked disproportionately often and, especially, in tight loops, 
where we know the instruction cache pollution will pay off (i.e. where the 
JIT's heuristics fall down).
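The documentation convention being proposed might look like the sketch below. The annotation here is a locally defined placeholder standing in for the real inlining hint, so the example compiles on its own; the justification comment is illustrative, not a real measurement.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Placeholder for the real inlining hint, so this sketch is self-contained.
@Retention(RetentionPolicy.SOURCE)
@Target(ElementType.METHOD)
@interface Inline {}

public class InlineJustification {
    // @Inline justified: invoked once per byte in the cell-comparison loop of
    // a read, i.e. disproportionately often and in a tight loop; benchmarked
    // with and without the hint on a read-heavy workload before adoption.
    // (Hypothetical justification, shown only to illustrate the convention.)
    @Inline
    static int compareUnsigned(byte a, byte b) {
        return (a & 0xff) - (b & 0xff);
    }
}
```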

 Clean-up, justify (and reduce) each use of @Inline
 --

 Key: CASSANDRA-7736
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7736
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: T Jake Luciani
Priority: Minor
 Fix For: 2.1.0







[jira] [Created] (CASSANDRA-7738) Permit CL overuse to be explicitly bounded

2014-08-11 Thread Benedict (JIRA)
Benedict created CASSANDRA-7738:
---

 Summary: Permit CL overuse to be explicitly bounded
 Key: CASSANDRA-7738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7738
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
Priority: Minor
 Fix For: 2.1.1


As mentioned in CASSANDRA-7554, we do not currently offer any way to explicitly 
bound CL growth, which can be problematic in some scenarios (e.g. EC2 where the 
system drive is quite small). We should offer a configurable amount of 
headroom, beyond which we stop accepting writes until the backlog clears.
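The proposed behaviour can be sketched as follows (hypothetical API, not Cassandra's actual commit log code): track commit log bytes on disk and refuse new segment allocations once a configured ceiling is reached, until segment recycling clears the backlog.

```java
// Sketch of an explicit commit log bound. Names and interface are invented
// for illustration.
public class CommitLogBound {
    private final long maxBytes;
    private long usedBytes;

    CommitLogBound(long maxBytes) { this.maxBytes = maxBytes; }

    // Returns true if the segment was admitted; false means the writer must
    // wait for the backlog to clear before accepting more writes.
    boolean tryAllocate(long segmentBytes) {
        if (usedBytes + segmentBytes > maxBytes)
            return false;
        usedBytes += segmentBytes;
        return true;
    }

    // Called when a segment's contents have been flushed and it is recycled.
    void onSegmentRecycled(long segmentBytes) { usedBytes -= segmentBytes; }
}
```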





[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093206#comment-14093206
 ] 

Benedict commented on CASSANDRA-7743:
-

Are you running with memtable_allocation_type: offheap_buffers? If so, switch 
to offheap_objects.

If not, it's surprising to be hitting that limit with netty buffers, as we 
don't allocate direct buffers anywhere else. Either way, failing inside netty 
is unexpected: this build predates the fix for CASSANDRA-7695, so in principle 
we shouldn't be allocating direct buffers with netty at all.
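For reference, the suggested switch as a cassandra.yaml fragment (option value as available in the 2.1 line):

```yaml
# cassandra.yaml: allocate memtable cell data off-heap as native objects
# rather than off-heap buffers
memtable_allocation_type: offheap_objects
```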

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte

 During a long running test, we ended up with a lot of 
 java.lang.OutOfMemoryError: Direct buffer memory errors on the Cassandra 
 instances.
 Here is an example of stacktrace from system.log :
 {code}
 ERROR [SharedPool-Worker-1] 2014-08-11 11:09:34,610 ErrorMessage.java:218 - 
 Unexpected exception during request
 java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_25]
 at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) 
 ~[na:1.7.0_25]
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
 ~[na:1.7.0_25]
 at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:251)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:112)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:507) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:464)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:378) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:350) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
 {code}
 The test consisted of a 3-nodes cluster of n1-standard-1 GCE instances (1 
 vCPU, 3.75 GB RAM) running cassandra-2.1.0-rc5, and a n1-standard-2 instance 
 running the test.
 After ~2.5 days, several requests start to fail and we see the previous 
 stacktraces in the system.log file.
 The output from linux ‘free’ and ‘meminfo’ suggest that there is still memory 
 available.
 {code}
 $ free -m
               total       used       free     shared    buffers     cached
  Mem:          3702       3532        169          0        161        854
  -/+ buffers/cache:       2516       1185
  Swap:            0          0          0
 $ head -n 4 /proc/meminfo
 MemTotal:3791292 kB
 MemFree:  173568 kB
 Buffers:  165608 kB
 Cached:   874752 kB
 {code}
 These errors do not affect all the queries we run. The cluster is still 
 responsive but is unable to display tracing information using cqlsh :
 {code}
 $ ./bin/nodetool --host 10.240.137.253 status duration_test
 Datacenter: DC1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address Load   Tokens  Owns (effective)  Host ID  
  Rack
 UN  10.240.98.27

[jira] [Resolved] (CASSANDRA-7732) Counter replication mutation can have corrupt cell name values (via pooled Netty buffers)

2014-08-11 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict resolved CASSANDRA-7732.
-

Resolution: Not a Problem

Closing as we're removing ref-counting for 2.1 until we can come up with a 
safer strategy

 Counter replication mutation can have corrupt cell name values (via pooled 
 Netty buffers)
 -

 Key: CASSANDRA-7732
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7732
 Project: Cassandra
  Issue Type: Bug
Reporter: Andrew Montalenti
Assignee: Aleksey Yeschenko
Priority: Critical
 Fix For: 2.1.0

 Attachments: 7732.txt


 Counter replication mutation can have corrupt cell name values (via pooled 
 Netty buffers): the replication mutation created by CM.apply() doesn't copy 
 the cell names/partition key from the original mutation, and ultimately 
 doesn't share the same source frame, so ref counting does not protect it.





[jira] [Commented] (CASSANDRA-7735) Remove ref-counting of netty buffers

2014-08-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093323#comment-14093323
 ] 

Benedict commented on CASSANDRA-7735:
-

Yes. Linked/closed CASSANDRA-7732.

 Remove ref-counting of netty buffers
 

 Key: CASSANDRA-7735
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7735
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Benedict
Assignee: T Jake Luciani
Priority: Critical
  Labels: correctness, performance
 Fix For: 2.1.0

 Attachments: 7735.txt







[jira] [Commented] (CASSANDRA-7732) Counter replication mutation can have corrupt cell name values (via pooled Netty buffers)

2014-08-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093348#comment-14093348
 ] 

Benedict commented on CASSANDRA-7732:
-

Yes, see the linked (superseded-by) issue, CASSANDRA-7735

 Counter replication mutation can have corrupt cell name values (via pooled 
 Netty buffers)
 -

 Key: CASSANDRA-7732
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7732
 Project: Cassandra
  Issue Type: Bug
Reporter: Andrew Montalenti
Assignee: Aleksey Yeschenko
Priority: Critical
 Fix For: 2.1.0

 Attachments: 7732.txt







[jira] [Commented] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093450#comment-14093450
 ] 

Benedict commented on CASSANDRA-7704:
-

[~yukim] was that comment a +1 on the 2.0 patch, and asking for a corresponding 
2.1 patch?

 FileNotFoundException during STREAM-OUT triggers 100% CPU usage
 ---

 Key: CASSANDRA-7704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
Assignee: Benedict
 Attachments: 7704.txt, backtrace.txt, other-errors.txt


 See the attached backtrace, which is what triggered this. The stream failed 
 and ~12 seconds later it emitted that exception. At that point, all CPUs 
 went to 100%. A thread dump shows all the ReadStage threads stuck in 
 IntervalTree.searchInternal, called from CFS.markReferenced().





[jira] [Updated] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-11 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7704:


Attachment: 7704.20.v2.txt

FTR, there was a (probably innocuous) mistake in that patch; fixed version 
attached.

 FileNotFoundException during STREAM-OUT triggers 100% CPU usage
 ---

 Key: CASSANDRA-7704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
Assignee: Benedict
 Attachments: 7704.20.v2.txt, 7704.txt, backtrace.txt, other-errors.txt







[jira] [Commented] (CASSANDRA-7735) Remove ref-counting of netty buffers

2014-08-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093476#comment-14093476
 ] 

Benedict commented on CASSANDRA-7735:
-

LGTM.

nit: BatchStatement, ModificationStatement, Mutation, QueryState each have an 
unused Frame import, ResponseVerbHandler has an unused IMutation import

 Remove ref-counting of netty buffers
 

 Key: CASSANDRA-7735
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7735
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Benedict
Assignee: T Jake Luciani
Priority: Critical
  Labels: correctness, performance
 Fix For: 2.1.0

 Attachments: 7735.txt







[jira] [Commented] (CASSANDRA-7728) ConcurrentModificationException after upgrade to trunk

2014-08-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093499#comment-14093499
 ] 

Benedict commented on CASSANDRA-7728:
-

Looks related to CASSANDRA-7116; 2.0 may also be affected

 ConcurrentModificationException after upgrade to trunk
 --

 Key: CASSANDRA-7728
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7728
 Project: Cassandra
  Issue Type: Bug
Reporter: Russ Hatch

 Trying to repro another issue, I ran across this exception. It occurred 
 during a rolling upgrade to trunk. It happening during or right after the 
 test script checks counters to see if they are correct.
 {noformat}
 ERROR [Thrift:2] 2014-08-11 13:47:09,668 CustomTThreadPoolServer.java:219 - 
 Error occurred during processing of message.
 java.util.ConcurrentModificationException: null
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859) 
 ~[na:1.7.0_65]
   at java.util.ArrayList$Itr.next(ArrayList.java:831) ~[na:1.7.0_65]
   at 
 org.apache.cassandra.service.RowDigestResolver.getData(RowDigestResolver.java:40)
  ~[main/:na]
   at 
 org.apache.cassandra.service.RowDigestResolver.getData(RowDigestResolver.java:28)
  ~[main/:na]
   at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:110) 
 ~[main/:na]
   at 
 org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:144)
  ~[main/:na]
   at 
 org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1262) 
 ~[main/:na]
   at 
 org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1188) 
 ~[main/:na]
   at 
 org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:256)
  ~[main/:na]
   at 
 org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:212)
  ~[main/:na]
   at 
 org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:61)
  ~[main/:na]
   at 
 org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:186)
  ~[main/:na]
   at 
 org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:205) 
 ~[main/:na]
   at 
 org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1916)
  ~[main/:na]
   at 
 org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4588)
  ~[thrift/:na]
   at 
 org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4572)
  ~[thrift/:na]
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
 ~[libthrift-0.9.1.jar:0.9.1]
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
 ~[libthrift-0.9.1.jar:0.9.1]
   at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:201)
  ~[main/:na]
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_65]
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_65]
   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
 {noformat}
 It's not happening 100% of the time, but may be triggered by running this 
 dtest:
 {noformat}
 nosetests -vs 
 upgrade_through_versions_test.py:TestUpgradeThroughVersions.upgrade_test_mixed
 {noformat}





[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093976#comment-14093976
 ] 

Benedict commented on CASSANDRA-7743:
-

Could we get some heap dumps? Sounds to me like it's possibly a netty bug, or a 
ref counting bug coupled with a leaked/held reference somewhere. We need to see 
where these ByteBuffer references are being retained and why.

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte


[jira] [Commented] (CASSANDRA-7750) Do not flush on truncate if durable_writes is false.

2014-08-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094071#comment-14094071
 ] 

Benedict commented on CASSANDRA-7750:
-

I'd rather we did not reintroduce the 'renew memtable' method, as it is 
inherently dangerous. If we are to do so, it should have clear danger warnings 
around it, OR it should explicitly clear the CL of any records it contains.

 Do not flush on truncate if durable_writes is false.
 --

 Key: CASSANDRA-7750
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7750
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jeremiah Jordan
Assignee: Jeremiah Jordan
Priority: Minor
 Fix For: 2.0.10, 2.1.1

 Attachments: 7750-2.0.txt, 7750-2.1.txt


 CASSANDRA-7511 changed truncate so it will always flush, to fix commit log 
 issues. If durable_writes is false, there will not be any data in the 
 commit log for the table, so we can safely just drop the memtables and not 
 flush.
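The decision described above can be sketched as follows (hypothetical interfaces, not the actual patch): on truncate, a flush is only required when the commit log may hold data for the table, i.e. when durable_writes is enabled.

```java
// Sketch of the truncate decision. Interface and names are invented for
// illustration only.
public class TruncateSketch {
    interface Table {
        boolean durableWrites();
        void flush();          // persist memtables before truncating sstables
        void dropMemtables();  // discard in-memory data without flushing
    }

    static String truncate(Table t) {
        if (t.durableWrites()) {
            t.flush();         // the commit log may reference this data
            return "flushed";
        } else {
            t.dropMemtables(); // nothing in the commit log to replay
            return "dropped";
        }
    }
}
```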





[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095411#comment-14095411
 ] 

Benedict commented on CASSANDRA-7743:
-

No, but I don't think it's likely to be related: they would still be collected 
when unreferenced, so we'd likely see LEAK DETECTOR warnings from netty, at 
which point the associated resources would also be freed, making us somewhat 
unlikely to see the bug.

No harm in trying, of course, but it sounds like it takes a few days to 
reproduce.

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
 Fix For: 2.1.0


 During a long running test, we ended up with a lot of 
 java.lang.OutOfMemoryError: Direct buffer memory errors on the Cassandra 
 instances.
 Here is an example of stacktrace from system.log :
 {code}
 ERROR [SharedPool-Worker-1] 2014-08-11 11:09:34,610 ErrorMessage.java:218 - 
 Unexpected exception during request
 java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_25]
 at java.nio.DirectByteBuffer.&lt;init&gt;(DirectByteBuffer.java:123) 
 ~[na:1.7.0_25]
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
 ~[na:1.7.0_25]
 at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:251)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:112)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:507) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:464)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:378) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:350) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
 {code}
 The test consisted of a 3-nodes cluster of n1-standard-1 GCE instances (1 
 vCPU, 3.75 GB RAM) running cassandra-2.1.0-rc5, and a n1-standard-2 instance 
 running the test.
 After ~2.5 days, several requests start to fail and we see the previous 
 stacktraces in the system.log file.
 The output from Linux ‘free’ and ‘meminfo’ suggests that there is still memory 
 available.
 {code}
 $ free -m
              total       used       free     shared    buffers     cached
 Mem:          3702       3532        169          0        161        854
 -/+ buffers/cache:       2516       1185
 Swap:            0          0          0
 $ head -n 4 /proc/meminfo
 MemTotal:3791292 kB
 MemFree:  173568 kB
 Buffers:  165608 kB
 Cached:   874752 kB
 {code}
 These errors do not affect all the queries we run. The cluster is still 
 responsive but is unable to display tracing information using cqlsh :
 {code}
 $ ./bin/nodetool --host 10.240.137.253 status duration_test
 Datacenter: DC1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  10.240.98.27    925.17 KB  256     100.0%  

[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095780#comment-14095780
 ] 

Benedict commented on CASSANDRA-7743:
-

It looks like the problem is caused by a number of changes in 2.1 combining to 
yield especially bad behaviour. We use pooled buffers in netty, but we also 
introduced an SEPWorker pool that has many threads (more than the number that 
actually service any single pool), and all of those threads may eventually 
service work on the netty executor side. This gives us ~130 threads 
periodically performing this work, and each of them apparently allocates a 
buffer at some point. These buffers are unfortunately allocated from a 
thread-local pool, which starts at 16MB, so each thread retains at least 16MB 
of largely useless memory.
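
A quick back-of-the-envelope check of the retention described above, using the thread count and per-thread pool size from this comment:

```java
// Rough estimate of memory pinned by the behaviour described above: ~130
// threads, each retaining a thread-local netty cache starting at 16 MB,
// pins roughly 2 GB of direct memory even though most of it is never reused.
public class RetainedMemoryEstimate {
    static long retainedMb(int threads, int perThreadPoolMb) {
        return (long) threads * perThreadPoolMb;
    }

    public static void main(String[] args) {
        System.out.println(retainedMb(130, 16) + " MB"); // 2080 MB, ~2 GB
    }
}
```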

The best fix will be to stop the SEPWorker tasks from allocating any buffers, 
but [~tjake] has pointed out we can also tweak some settings to mitigate the 
negative impact of this kind of problem as well.

I'll look into a patch tomorrow.

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
 Fix For: 2.1.0



[jira] [Assigned] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-13 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-7743:
---

Assignee: Benedict

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
Assignee: Benedict
 Fix For: 2.1.0


 During a long running test, we ended up with a lot of 
 java.lang.OutOfMemoryError: Direct buffer memory errors on the Cassandra 
 instances.
 Here is an example of stacktrace from system.log :
 {code}
 ERROR [SharedPool-Worker-1] 2014-08-11 11:09:34,610 ErrorMessage.java:218 - 
 Unexpected exception during request
 java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_25]
 at java.nio.DirectByteBuffer.&lt;init&gt;(DirectByteBuffer.java:123) 
 ~[na:1.7.0_25]
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
 ~[na:1.7.0_25]
 at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:251)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:112)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:507) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:464)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:378) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:350) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
 {code}
 The test consisted of a 3-nodes cluster of n1-standard-1 GCE instances (1 
 vCPU, 3.75 GB RAM) running cassandra-2.1.0-rc5, and a n1-standard-2 instance 
 running the test.
 After ~2.5 days, several requests start to fail and we see the previous 
 stacktraces in the system.log file.
 The output from Linux ‘free’ and ‘meminfo’ suggests that there is still memory 
 available.
 {code}
 $ free -m
              total       used       free     shared    buffers     cached
 Mem:          3702       3532        169          0        161        854
 -/+ buffers/cache:       2516       1185
 Swap:            0          0          0
 $ head -n 4 /proc/meminfo
 MemTotal:3791292 kB
 MemFree:  173568 kB
 Buffers:  165608 kB
 Cached:   874752 kB
 {code}
 These errors do not affect all the queries we run. The cluster is still 
 responsive but is unable to display tracing information using cqlsh :
 {code}
 $ ./bin/nodetool --host 10.240.137.253 status duration_test
 Datacenter: DC1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  10.240.98.27    925.17 KB  256     100.0%            41314169-eff5-465f-85ea-d501fd8f9c5e  RAC1
 UN  10.240.137.253  1.1 MB     256     100.0%            c706f5f9-c5f3-4d5e-95e9-a8903823827e  RAC1
 UN  10.240.72.183   896.57 KB  256     100.0%            15735c4d-98d4-4ea4-a305-7ab2d92f65fc  RAC1
 $ echo 'tracing on; select count(*) from duration_test.ints;' | ./bin/cqlsh 
 

[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096619#comment-14096619
 ] 

Benedict commented on CASSANDRA-7743:
-

Hmm. So, looking at this a little more closely, I think this may effectively be 
a netty bug after all. It looks like no matter what pool/thread a pooled 
bytebuf is allocated on, it gets returned to the pool of the thread that 
_releases_ it. This means it simply accumulates indefinitely (up to the pool 
limit, which defaults to 32MB) in the SEPWorkers, since they never themselves 
_allocate_, only release.

[~norman] is that analysis correct? If so, it looks like this behaviour is 
somewhat unexpected and not ideal. However we can work around it for now.
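
A toy model of that release path (illustrative only, not netty's actual implementation): a buffer freed on thread B lands in B's thread-local cache, even though only thread A ever allocates, so B's cache only ever grows:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the behaviour described above: a freed buffer is cached by
// whichever thread releases it, so a thread that only ever releases
// accumulates buffers it will never hand back to the allocating thread.
public class CrossThreadCacheModel {
    // One cache per thread, as with netty's PoolThreadCache.
    static final ThreadLocal<Deque<byte[]>> CACHE =
            ThreadLocal.withInitial(ArrayDeque::new);

    static byte[] allocate(int size) {
        byte[] cached = CACHE.get().poll();   // reuse only from our own cache
        return cached != null ? cached : new byte[size];
    }

    static void release(byte[] buf) {
        CACHE.get().add(buf);                 // returned to the RELEASER's cache
    }

    public static void main(String[] args) throws Exception {
        final byte[][] bufs = new byte[100][];
        Thread allocator = new Thread(() -> {
            for (int i = 0; i < bufs.length; i++) bufs[i] = allocate(16 * 1024);
        });
        allocator.start();
        allocator.join();

        Thread releaser = new Thread(() -> {
            for (byte[] b : bufs) release(b);
            // All 100 buffers now sit in this thread's cache, invisible to the
            // allocator thread, which will keep allocating fresh memory.
            System.out.println("releaser cache size: " + CACHE.get().size());
        });
        releaser.start();
        releaser.join();
    }
}
```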

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
Assignee: Benedict
 Fix For: 2.1.0



[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096631#comment-14096631
 ] 

Benedict commented on CASSANDRA-7743:
-

I haven't got to that stage yet; I'm just analysing the code right now. That's 
why I asked for your input: I was hoping you could disabuse me if I'm 
completely wrong. I don't 100% understand the control flow, as it doesn't make 
much sense (to me) to be adding it to a different cache. However, if you look 
in PooledByteBuf.deallocate(), it calls PoolArena.free() to release the memory, 
which in turn calls parent.threadCache.get().add() to cache its memory; 
obviously the threadCache.get() is grabbing the thread-local cache for the 
thread doing the releasing, not the source PoolThreadCache.

It's also worth noting that even if I'm correct, I'm not convinced this fully 
explains the behaviour. We should only release on a different thread if an 
exception occurs during processing anyway, so I'm still digging for a more 
satisfactory full explanation.

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
Assignee: Benedict
 Fix For: 2.1.0



[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096638#comment-14096638
 ] 

Benedict commented on CASSANDRA-7743:
-

We're conflating two pools maybe :)

I mean the pool of memory the thread can allocate from. So, to confirm I have 
this right, if you have two threads A and B, A only allocating and B only 
releasing, you would get memory accumulating up to max pool size in B, and A 
always allocating new memory?

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
Assignee: Benedict
 Fix For: 2.1.0



[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096651#comment-14096651
 ] 

Benedict commented on CASSANDRA-7743:
-

bq. well it will be released after a while if not used.

How long? It shouldn't ever be used, and it looks like it accumulates 
gigabytes in total over the course of a few days (around 16-32MB per thread).

bq.  just pass in 0 for int tinyCacheSize, int smallCacheSize, int 
normalCacheSize.

Won't that obviate most of the benefit of the pooled buffers? 

I plan to simply prevent our deallocating on the other threads.
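
One hedged sketch of that workaround (the names here are illustrative, not Cassandra's actual code): route the release back to an executor owned by the allocating side, so the buffer is returned to the allocator's own thread-local cache instead of the releasing worker's:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the workaround described above: rather than letting a worker
// thread release a pooled buffer (and thereby capture it in its own
// thread-local cache), schedule the release on the executor that owns the
// allocating thread.
public class ReleaseOnOwner {
    interface Releasable { void release(); }

    static void safeRelease(Releasable buf, ExecutorService ownerLoop) {
        // The release runs on the owning loop, so the buffer goes back to the
        // allocator's cache and can be reused by subsequent allocations.
        ownerLoop.execute(buf::release);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ownerLoop = Executors.newSingleThreadExecutor();
        final boolean[] released = {false};
        safeRelease(() -> released[0] = true, ownerLoop);
        ownerLoop.shutdown();
        while (!ownerLoop.isTerminated()) Thread.sleep(10);
        System.out.println("released: " + released[0]); // released: true
    }
}
```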

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
Assignee: Benedict
 Fix For: 2.1.0



[jira] [Commented] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096779#comment-14096779
 ] 

Benedict commented on CASSANDRA-7704:
-

Not cleaning up resources is really not ideal in my book; however, there is 
absolutely no reason we need to cancel with interruption. Note that this does 
*not* always result in a cancelled state if the task ran: only if it was in 
the middle of running at the time (but still completed), and that can be fixed 
by not permitting it to be interrupted. In any case this is not the problem: 
in the test it will often be the case that the task was _genuinely_ 
successfully cancelled. In my opinion the test is broken. Previously there was 
_no_ guarantee that all cancellations would run (although the cancellation in 
the test case will); after the last task completed successfully, the scheduled 
tasks were all removed from the queue (but *not* cancelled), so the future in 
that case would simply never return, which is much more surprising and 
inconsistent in my book.

In any case, I'm not entirely sure what the offending line is intended to test.

 FileNotFoundException during STREAM-OUT triggers 100% CPU usage
 ---

 Key: CASSANDRA-7704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
Assignee: Benedict
 Attachments: 7704.20.v2.txt, 7704.txt, backtrace.txt, other-errors.txt


 See attached backtrace which was what triggered this. This stream failed and 
 then ~12 seconds later it emitted that exception. At that point, all CPUs 
 went to 100%. A thread dump shows all the ReadStage threads stuck inside 
 IntervalTree.searchInternal inside of CFS.markReferenced().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6726) Recycle CompressedRandomAccessReader/RandomAccessReader buffers independently of their owners, and move them off-heap when possible

2014-08-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-6726:


Assignee: Branimir Lambov

 Recycle CompressedRandomAccessReader/RandomAccessReader buffers independently 
 of their owners, and move them off-heap when possible
 ---

 Key: CASSANDRA-6726
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6726
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Assignee: Branimir Lambov
Priority: Minor
  Labels: performance
 Fix For: 3.0


 Whilst CRAR and RAR are pooled, we could and probably should pool the buffers 
 independently, so that they are not tied to a specific sstable. It may be 
 possible to move the RAR buffer off-heap, and the CRAR buffer sometimes (e.g. 
 Snappy may support off-heap buffers).
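A minimal sketch of the decoupled recycling idea, assuming a simple shared free-list; all names here are illustrative, not the actual Cassandra classes:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: readers borrow a direct (off-heap) buffer from a shared
// pool on open and return it on close, instead of each reader owning a buffer
// tied to one sstable. Buffer size is illustrative.
final class BufferPool {
    private static final int BUFFER_SIZE = 64 * 1024; // illustrative

    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();

    ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        // Allocate off-heap only when the pool is empty; otherwise recycle
        return buf != null ? buf : ByteBuffer.allocateDirect(BUFFER_SIZE);
    }

    void release(ByteBuffer buf) {
        buf.clear();
        free.offer(buf);
    }
}
```

Because the pool, rather than the reader, owns the buffers, closing or replacing an sstable no longer discards its buffers.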





[jira] [Updated] (CASSANDRA-5902) Dealing with hints after a topology change

2014-08-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-5902:


Assignee: Branimir Lambov

 Dealing with hints after a topology change
 --

 Key: CASSANDRA-5902
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5902
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Branimir Lambov
Priority: Minor

 Hints are stored and delivered by destination node id.  This allows them to 
 survive IP changes in the target, while making scanning all the hints for a 
 given destination an efficient operation.  However, we do not detect and 
 handle the case of a new node assuming responsibility for the hinted row via 
 bootstrap before it can be delivered.
 I think we have to take a performance hit in this case -- we need to deliver 
 such a hint to *all* replicas, since we don't know which is the new one.  
 This happens infrequently enough, however -- requiring first the target node 
 to be down to create the hint, then the hint owner to be down long enough for 
 the target to both recover and stream to a new node -- that this should be 
 okay.





[jira] [Updated] (CASSANDRA-7039) DirectByteBuffer compatible LZ4 methods

2014-08-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7039:


Assignee: Branimir Lambov  (was: Lyuben Todorov)

 DirectByteBuffer compatible LZ4 methods
 ---

 Key: CASSANDRA-7039
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7039
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Branimir Lambov
Priority: Minor
  Labels: performance
 Fix For: 3.0


 As we move more things off-heap, it's becoming more and more essential to be 
 able to use DirectByteBuffer (or native pointers) in various places. 
 Unfortunately LZ4 doesn't currently support this operation, despite being JNI 
 based. This means we not only have to perform unnecessary copies to 
 de/compress data from a DBB, but we can also stall GC, as any JNI method 
 operating over a java array using GetPrimitiveArrayCritical enters a critical 
 section that prevents GC for its duration. This means STWs will be at least as 
 long as any running compression/decompression (and no GC will happen until 
 they complete, so it's additive).
 We should temporarily fork (and then resubmit upstream) jpountz-lz4 to 
 support operating over a native pointer, so that we can pass a DBB or a raw 
 pointer we have allocated ourselves. This will help improve performance when 
 flushing the new offheap memtables, as well as enable us to implement 
 CASSANDRA-6726 and finish CASSANDRA-4338.





[jira] [Updated] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory

2014-08-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7546:


Fix Version/s: 2.1.1
   2.0.11

 AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
 -

 Key: CASSANDRA-7546
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: graham sanderson
Assignee: graham sanderson
 Fix For: 2.0.11, 2.1.1

 Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_3.txt, 
 7546.20_4.txt, 7546.20_5.txt, 7546.20_6.txt, 7546.20_7.txt, 7546.20_7b.txt, 
 7546.20_alt.txt, suggestion1.txt, suggestion1_21.txt


 In order to preserve atomicity, this code attempts to read, clone/update, 
 then CAS the state of the partition.
 Under heavy contention for updating a single partition this can cause some 
 fairly staggering memory growth (the more cores on your machine, the worse it 
 gets).
 Whilst many usage patterns don't do highly concurrent updates to the same 
 partition, hinting today does, and in this case wild (order(s) of magnitude 
 more than expected) memory allocation rates can be seen (especially when the 
 updates being hinted are small updates to different partitions, which can 
 happen very fast on their own) - see CASSANDRA-7545.
 It would be best to eliminate/reduce/limit the spinning memory allocation 
 whilst not slowing down the very common un-contended case.
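The allocate-in-a-spin-loop pattern described above can be illustrated with a toy sketch; the names are hypothetical stand-ins, not the actual AtomicSortedColumns code:

```java
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the read/clone/CAS pattern: under contention, every
// failed compareAndSet discards a full clone of the partition's column map,
// so wasted allocation grows with the number of contending cores.
final class CasUpdateSketch {
    private final AtomicReference<SortedMap<String, String>> ref =
            new AtomicReference<SortedMap<String, String>>(new TreeMap<>());

    // Returns the number of clones allocated to apply this one update.
    int put(String name, String value) {
        int attempts = 0;
        while (true) {
            attempts++;
            SortedMap<String, String> current = ref.get();
            // Clone-and-update: this allocation is thrown away on CAS failure
            SortedMap<String, String> updated = new TreeMap<>(current);
            updated.put(name, value);
            if (ref.compareAndSet(current, updated))
                return attempts;
        }
    }
}
```

In the uncontended case a single clone suffices; the ticket is about bounding the clones wasted when many threads race on one partition.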





[jira] [Updated] (CASSANDRA-7561) On DROP we should invalidate CounterKeyCache as well as Key/Row cache

2014-08-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7561:


Fix Version/s: 2.1.0

 On DROP we should invalidate CounterKeyCache as well as Key/Row cache
 -

 Key: CASSANDRA-7561
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7561
 Project: Cassandra
  Issue Type: Bug
Reporter: Benedict
Assignee: Aleksey Yeschenko
Priority: Minor
 Fix For: 2.1.0


 We should also probably ensure we don't attempt to auto save _any_ of the 
 caches while they are in an inconsistent state (i.e. there are keys present 
 to be saved that should not be restored, or that would throw exceptions when 
 we save (e.g. CounterCacheKey))





[jira] [Updated] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7704:


Fix Version/s: 2.1.0
   2.0.10

 FileNotFoundException during STREAM-OUT triggers 100% CPU usage
 ---

 Key: CASSANDRA-7704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
Assignee: Benedict
 Fix For: 2.0.10, 2.1.0

 Attachments: 7704.20.v2.txt, 7704.txt, backtrace.txt, other-errors.txt


 See attached backtrace which was what triggered this. This stream failed and 
 then ~12 seconds later it emitted that exception. At that point, all CPUs 
 went to 100%. A thread dump shows all the ReadStage threads stuck inside 
 IntervalTree.searchInternal inside of CFS.markReferenced().





[jira] [Updated] (CASSANDRA-3852) use LIFO queueing policy when queue size exceeds thresholds

2014-08-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-3852:


Fix Version/s: 3.0

 use LIFO queueing policy when queue size exceeds thresholds
 ---

 Key: CASSANDRA-3852
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3852
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Schuller
Assignee: Peter Schuller
  Labels: performance
 Fix For: 3.0


 A strict FIFO policy for queueing (between stages) is detrimental to latency 
 and forward progress. Whenever a node is saturated beyond incoming request 
 rate, *all* requests become slow. If it is consistently saturated, you start 
 effectively timing out on *all* requests.
 A much better strategy from the point of view of latency is to serve a subset 
 of requests quickly, and let some time out, rather than letting all either 
 time out or be slow.
 Care must be taken such that:
 * We still guarantee that requests are processed reasonably timely (we 
 couldn't go strict LIFO for example as that would result in requests getting 
 stuck potentially forever on a loaded node).
 * Maybe, depending on the previous point's solution, ensure that some 
 requests bypass the policy and get prioritized (e.g., schema migrations, or 
 anything internal to a node).
 A possible implementation is to go LIFO whenever there are requests in the 
 queue that are older than N milliseconds (or a certain queue size, etc).
 Benefits:
 * In all cases where the client is directly, or indirectly through other 
 layers, affecting a system which has limited concurrency (e.g., a thread pool 
 size of X to serve some incoming request rate), it is *much* better for a few 
 requests to time out while most are serviced quickly, than for all requests 
 to become slow, as it doesn't explode concurrency. Think any random 
 non-super-advanced PHP app, Ruby web app, Java servlet based app, etc. 
 Essentially, it optimizes very heavily for improved average latencies.
 * Systems with strict p95/p99/p999 requirements on latencies should greatly 
 benefit from such a policy. For example, suppose you have a system at 85% of 
 capacity, and it takes a write spike (or has a hiccup like GC pause, blocking 
 on a commit log write, etc). Suppose the hiccup racks up 500 ms worth of 
 requests. At 15% margin at steady state, that takes 500ms * 100/15 = 3.2 
 seconds to recover. Instead of *all* requests for an entire 3.2 second window 
 being slow, we'd serve requests quickly for 2.7 of those seconds, with the 
 incoming requests during that 500 ms interval being the ones primarily 
 affected. The flip side though is that once you're at the point where more 
 than N percent of requests end up having to wait for others to take LIFO 
 priority, the p(100-N) latencies will actually be *worse* than without this 
 change (but at this point you have to consider what the root reason for those 
 pXX requirements are).
 * In the case of complete saturation, it allows forward progress. Suppose 
 you're taking 25% more traffic than you are able to handle. Instead of 
 getting backed up and ending up essentially timing out *every single 
 request*, you will succeed in processing up to 75% of them (I say up to 
 because it depends; for example on a {{QUORUM}} request you need at least two 
 of the requests from the co-ordinator to succeed so the percentage is brought 
 down) and allowing clients to make forward progress and get work done, rather 
 than being stuck.
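The proposed policy (FIFO normally, switching to LIFO once the head of the queue is older than N milliseconds) could be sketched as follows; the class name and threshold are illustrative, not an actual Cassandra implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of an adaptive queue: serve oldest-first normally, but
// once the oldest queued request has waited longer than a threshold, serve
// newest-first so that at least some requests complete within their deadline.
final class AdaptiveQueue<T> {
    private static final long MAX_WAIT_NANOS = 10_000_000L; // 10 ms, illustrative

    private static final class Entry<T> {
        final T task;
        final long enqueuedAt;
        Entry(T task, long enqueuedAt) { this.task = task; this.enqueuedAt = enqueuedAt; }
    }

    private final Deque<Entry<T>> deque = new ArrayDeque<>();

    synchronized void offer(T task, long nowNanos) {
        deque.addLast(new Entry<>(task, nowNanos));
    }

    synchronized T poll(long nowNanos) {
        if (deque.isEmpty())
            return null;
        // Head of the deque is always the oldest entry
        boolean overloaded = nowNanos - deque.peekFirst().enqueuedAt > MAX_WAIT_NANOS;
        // LIFO under overload: the newest requests are the ones still worth serving
        Entry<T> e = overloaded ? deque.pollLast() : deque.pollFirst();
        return e.task;
    }
}
```

The old entries left behind under overload would still need a sweep that fails them with a timeout, so callers are not left waiting forever.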





[jira] [Commented] (CASSANDRA-7542) Reduce CAS contention

2014-08-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096805#comment-14096805
 ] 

Benedict commented on CASSANDRA-7542:
-

[~kohlisankalp] any news?

 Reduce CAS contention
 -

 Key: CASSANDRA-7542
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7542
 Project: Cassandra
  Issue Type: Improvement
Reporter: sankalp kohli
Assignee: Benedict
 Fix For: 2.0.10


 CAS updates on the same CQL partition can lead to heavy contention inside C*. 
 I am looking for simple ways (no algorithmic changes) to reduce contention, as 
 its penalty is high in terms of latency, especially for reads. 
 We can put some sort of synchronization on the CQL partition at the 
 StorageProxy level. This will reduce contention at least for all requests 
 landing on one box for the same partition. 
 Here is an example of why it will help:
 1) Say 1 write and 2 read CAS requests for the same partition key are sent to 
 C* in parallel. 
 2) Since the client is token-aware, it sends these 3 requests to the same C* 
 instance A. (Let's assume that all 3 requests go to the same instance A.) 
 3) In this C* instance A, all 3 CAS requests will contend with each other in 
 Paxos. (This is bad.)
 To reduce the contention in 3), what I am proposing is to add a lock on the 
 partition key, similar to what we do in PaxosState.java, to serialize these 3 
 requests. This will remove the contention and improve performance, as these 3 
 requests will not collide with each other.
 Another improvement we can make in the client is to pick a deterministic live 
 replica for a given partition doing CAS.  
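A minimal sketch of the proposed coordinator-side serialization: one lock per partition key, so concurrent CAS requests for the same partition queue up locally instead of contending in Paxos. The class and method names are hypothetical, not the actual StorageProxy code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch: lazily create one lock per partition key and run the
// CAS operation under it, so same-partition requests on this coordinator are
// serialized before ever reaching Paxos.
final class PartitionCasSerializer {
    private final ConcurrentHashMap<String, Lock> locks = new ConcurrentHashMap<>();

    <T> T withPartitionLock(String partitionKey, Supplier<T> casOperation) {
        Lock lock = locks.computeIfAbsent(partitionKey, k -> new ReentrantLock());
        lock.lock();
        try {
            return casOperation.get();
        } finally {
            lock.unlock();
        }
    }
}
```

A production version would need to bound the map (e.g. striped locks) so it does not grow with the number of distinct keys ever seen.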





[jira] [Resolved] (CASSANDRA-6780) Memtable OffHeap GC Statistics

2014-08-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict resolved CASSANDRA-6780.
-

Resolution: Later

Since offheap GC has been postponed indefinitely, this ticket should also be 
closed, to be revisited later.

 Memtable OffHeap GC Statistics
 --

 Key: CASSANDRA-6780
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6780
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Priority: Minor

 As mentioned in CASSANDRA-6689, it would be nice to expose via JMX some 
 statistics on GC behaviour, instead of just optionally debug logging it (and 
 maybe expand to cover some more things):
 - Time spent in GC
 - Amount of memory reclaimed
 - Number of collections (per CFS?), and average reclaimed per collection





[jira] [Resolved] (CASSANDRA-6709) Changes to KeyCache

2014-08-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict resolved CASSANDRA-6709.
-

Resolution: Later

Closing as the new sstable format most likely makes this unnecessary by 
eliminating the need for a separate key cache. We *may* still want to revisit 
this at some point afterwards, since a separate cache could still be beneficial 
by improving memory occupancy rate, so closing as Later instead of Duplicate.

 Changes to KeyCache
 ---

 Key: CASSANDRA-6709
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6709
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Priority: Minor

 It seems to me that KeyCache can be improved in a number of ways, but first 
 let's state the basic goal of KeyCache: to reduce the average query response 
 time by providing an exact seek position in a file for a given key.
 As it stands, KeyCache is both 100% accurate, but requires a lot of overhead 
 per entry.
 I propose to make KeyCache *mostly* accurate (say 99.%), by which I mean 
 it will always fail accurately, but may rarely return an incorrect address, 
 and to code the end users of it to retry to confirm they have seeked to 
 the correct position in the file, and to retry without the cache if they did 
 not.
 The advantage of this is that we can both take the cache off-heap easily, and 
 pack a lot more items into the cache. If we permit collisions across files 
 and simply use the (full 128-bit) murmur hash of the key + cfId + file 
 generation, we should get enough uniqueness to rarely have an erroneous 
 collision, however we will be using only 20 bytes per key, instead of the 
 current 100 + key length bytes. This should allow us to answer far more 
 queries from the key cache than before, so the positive improvement to 
 performance should be greater than the negative drain.
 For the structure I propose an associative cache, where a single contiguous 
 address space is broken up into regions of, say, 8 entries, plus one counter. 
 The counter tracks the recency of access of each of the entries, so that on 
 write the least recently accessed/written can be replaced. A linear probe 
 within the region is used to determine if the entry we're looking for is 
 present. This should be very quick, as the entire region should fit into one 
 or two lines of L1.
 Advantage: we may see 5x bump in cache hit-rate, or even more for large keys.
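The associative layout described above might look roughly like this on-heap sketch (off-heap storage, the full 128-bit hashes, and the 20-byte entry packing are omitted; all names and sizes are illustrative):

```java
// Hypothetical sketch of the associative cache: the address space is split
// into regions of 8 entries; lookups linearly probe one region, hits bump a
// per-slot recency tick, and writes evict the least recently touched slot.
final class AssociativeCacheSketch {
    private static final int REGION_SIZE = 8;

    private final int regions;
    private final long[] hashes;    // per-slot key hash (0 = empty slot)
    private final long[] positions; // per-slot cached file position
    private final long[] recency;   // per-slot last-access tick
    private long tick = 1;

    AssociativeCacheSketch(int regions) {
        this.regions = regions;
        hashes = new long[regions * REGION_SIZE];
        positions = new long[regions * REGION_SIZE];
        recency = new long[regions * REGION_SIZE];
    }

    private int regionStart(long hash) {
        int region = (int) Math.floorMod(hash, (long) regions);
        return region * REGION_SIZE;
    }

    // Returns the cached position, or -1 on a miss. A hit may (rarely) be a
    // hash collision, so callers must verify the seek and retry without the
    // cache on a mismatch, as described above.
    long get(long hash) {
        int start = regionStart(hash);
        for (int i = start; i < start + REGION_SIZE; i++) {
            if (hashes[i] == hash) {
                recency[i] = tick++;
                return positions[i];
            }
        }
        return -1;
    }

    void put(long hash, long position) {
        int start = regionStart(hash);
        int victim = start;
        for (int i = start; i < start + REGION_SIZE; i++) {
            if (hashes[i] == hash || hashes[i] == 0) { victim = i; break; }
            if (recency[i] < recency[victim]) victim = i; // least recently used
        }
        hashes[victim] = hash;
        positions[victim] = position;
        recency[victim] = tick++;
    }
}
```

Since a whole region is a few cache lines of contiguous longs, the linear probe is cheap, which is the point of the layout.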





[jira] [Resolved] (CASSANDRA-6802) Row cache improvements

2014-08-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict resolved CASSANDRA-6802.
-

Resolution: Later

Since offheap GC has been postponed indefinitely, this ticket should also be 
closed, to be revisited later.

 Row cache improvements
 --

 Key: CASSANDRA-6802
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6802
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
  Labels: performance
 Fix For: 3.0


 There are a few things we could do;
 * Start using the native memory constructs from CASSANDRA-6694 to avoid 
 serialization/deserialization costs and to minimize the on-heap overhead
 * Stop invalidating cached rows on writes (update on write instead).





[jira] [Resolved] (CASSANDRA-5019) Still too much object allocation on reads

2014-08-14 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict resolved CASSANDRA-5019.
-

Resolution: Duplicate

Since the read path will be rewritten as part of the effort to introduce 
CASSANDRA-7447 (both as regards internal APIs and the implementation details 
for the new format), this ticket should be addressed by doing things right 
there. This may mean the legacy format continues to be somewhat inefficient, 
but it may or may not eventually be retired entirely, so there is probably not 
much point spending a lot of time optimising it, especially when the impact is 
unknown and probably not dramatic in relation to the other costs associated 
with this format.

 Still too much object allocation on reads
 -

 Key: CASSANDRA-5019
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5019
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
  Labels: performance
 Fix For: 3.0


 ArrayBackedSortedColumns was a step in the right direction but it's still 
 relatively heavyweight thanks to allocating individual Columns.





[jira] [Commented] (CASSANDRA-7029) Investigate alternative transport protocols for both client and inter-server communications

2014-08-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096817#comment-14096817
 ] 

Benedict commented on CASSANDRA-7029:
-

mTCP is not stable enough, nor universal enough, to be useful to us. It 
requires very specific linux kernel versions, and very specific network 
interfaces, in order to work. If it matures it will be worth revisiting.

 Investigate alternative transport protocols for both client and inter-server 
 communications
 ---

 Key: CASSANDRA-7029
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7029
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 There are a number of reasons to think we can do better than TCP for our 
 communications:
 1) We can actually tolerate sporadic small message losses, so guaranteed 
 delivery isn't essential (although for larger messages it probably is)
 2) As shown in \[1\] and \[2\], Linux can behave quite suboptimally with 
 regard to TCP message delivery when the system is under load. Judging from 
 the theoretical description, this is likely to apply even when the 
 system-load is not high, but the number of processes to schedule is high. 
 Cassandra generally has a lot of threads to schedule, so this is quite 
 pertinent for us. UDP performs substantially better here.
 3) Even when the system is not under load, UDP has a lower CPU burden, and 
 that burden is constant regardless of the number of connections it processes. 
 4) On a simple benchmark on my local PC, using non-blocking IO for UDP and 
 busy spinning on IO I can actually push 20-40% more throughput through 
 loopback (where TCP should be optimal, as no latency), even for very small 
 messages. Since we can see networking taking multiple CPUs' worth of time 
 during a stress test, using a busy-spin for ~100micros after last message 
 receipt is almost certainly acceptable, especially as we can (ultimately) 
 process inter-server and client communications on the same thread/socket in 
 this model.
 5) We can optimise the threading model heavily: since we generally process 
 very small messages (200 bytes not at all implausible), the thread signalling 
 costs on the processing thread can actually dramatically impede throughput. 
 In general it costs ~10micros to signal (and passing the message to another 
 thread for processing in the current model requires signalling). For 200-byte 
 messages this caps our throughput at 20MB/s.
 I propose to knock up a highly naive UDP-based connection protocol with 
 super-trivial congestion control over the course of a few days, with the only 
 initial goal being maximum possible performance (not fairness, reliability, 
 or anything else), and trial it in Netty (possibly making some changes to 
 Netty to mitigate thread signalling costs). The reason for knocking up our 
 own here is to get a ceiling on what the absolute limit of potential for this 
 approach is. Assuming this pans out with performance gains in C* proper, we 
 then look to contributing to/forking the udt-java project and see how easy it 
 is to bring performance in line with what we can get with our naive approach 
 (I don't suggest starting here, as the project is using blocking old-IO, and 
 modifying it with latency in mind may be challenging, and we won't know for 
 sure what the best case scenario is).
 \[1\] 
 http://test-docdb.fnal.gov/0016/001648/002/Potential%20Performance%20Bottleneck%20in%20Linux%20TCP.PDF
 \[2\] 
 http://cd-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=1968;filename=Performance%20Analysis%20of%20Linux%20Networking%20-%20Packet%20Receiving%20(Official).pdf;version=2
 Further related reading:
 http://public.dhe.ibm.com/software/commerce/doc/mft/cdunix/41/UDTWhitepaper.pdf
 https://mospace.umsystem.edu/xmlui/bitstream/handle/10355/14482/ChoiUndPerTcp.pdf?sequence=1
 https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html
 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.153.3762rep=rep1type=pdf





[jira] [Commented] (CASSANDRA-7542) Reduce CAS contention

2014-08-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098326#comment-14098326
 ] 

Benedict commented on CASSANDRA-7542:
-

OK. Not sure if it is worth our pursuing this right now then, at least as far 
as a 2.0 delivery is concerned. When I get some more free time I'll create some 
benchmarks to test how much of an improvement these (or future) changes have.

 Reduce CAS contention
 -

 Key: CASSANDRA-7542
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7542
 Project: Cassandra
  Issue Type: Improvement
Reporter: sankalp kohli
Assignee: Benedict
 Fix For: 2.0.10


 CAS updates on the same CQL partition can lead to heavy contention inside C*. 
 I am looking for simple ways (no algorithmic changes) to reduce contention, as 
 its penalty is high in terms of latency, especially for reads. 
 We can put some sort of synchronization on the CQL partition at the 
 StorageProxy level. This will reduce contention at least for all requests 
 landing on one box for the same partition. 
 Here is an example of why it will help:
 1) Say 1 write and 2 read CAS requests for the same partition key are sent to 
 C* in parallel. 
 2) Since the client is token-aware, it sends these 3 requests to the same C* 
 instance A. (Let's assume that all 3 requests go to the same instance A.) 
 3) In this C* instance A, all 3 CAS requests will contend with each other in 
 Paxos. (This is bad.)
 To reduce the contention in 3), what I am proposing is to add a lock on the 
 partition key, similar to what we do in PaxosState.java, to serialize these 3 
 requests. This will remove the contention and improve performance, as these 3 
 requests will not collide with each other.
 Another improvement we can make in the client is to pick a deterministic live 
 replica for a given partition doing CAS.  





[jira] [Updated] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7704:


Attachment: 7704-2.1.txt

Attaching a new version which does not cancel the task that was run, and 
updates the unit tests to match the new behaviour

 FileNotFoundException during STREAM-OUT triggers 100% CPU usage
 ---

 Key: CASSANDRA-7704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
Assignee: Benedict
 Fix For: 2.0.10, 2.1.0

 Attachments: 7704-2.1.txt, 7704.txt, backtrace.txt, other-errors.txt


 See attached backtrace which was what triggered this. This stream failed and 
 then ~12 seconds later it emitted that exception. At that point, all CPUs 
 went to 100%. A thread dump shows all the ReadStage threads stuck inside 
 IntervalTree.searchInternal inside of CFS.markReferenced().





[jira] [Updated] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7704:


Attachment: (was: 7704.20.v2.txt)

 FileNotFoundException during STREAM-OUT triggers 100% CPU usage
 ---

 Key: CASSANDRA-7704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
Assignee: Benedict
 Fix For: 2.0.10, 2.1.0

 Attachments: 7704-2.1.txt, 7704.txt, backtrace.txt, other-errors.txt


 See attached backtrace which was what triggered this. This stream failed and 
 then ~12 seconds later it emitted that exception. At that point, all CPUs 
 went to 100%. A thread dump shows all the ReadStage threads stuck inside 
 IntervalTree.searchInternal inside of CFS.markReferenced().





[jira] [Commented] (CASSANDRA-7763) cql_tests static_with_empty_clustering test failure

2014-08-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098406#comment-14098406
 ] 

Benedict commented on CASSANDRA-7763:
-

It's a shame we've only spotted this now, as it's a bit late to optimise this 
again for 2.1, but we should perhaps revisit later (for 3.0), as the 
introduction of these virtual method invocations was a large part of the reason 
for CASSANDRA-6934 in the first place. It should be possible to avoid these 
invocations on most calls, since we only actually encounter static columns 
infrequently, but let's leave it for now.

This patch does need to include the changes to the 
AbstractCType.compareUnsigned, WithCollection.compare() and 
AbstractNativeCell.compare() methods as well, though.



 cql_tests static_with_empty_clustering test failure
 ---

 Key: CASSANDRA-7763
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7763
 Project: Cassandra
  Issue Type: Bug
Reporter: Ryan McGuire
Assignee: Sylvain Lebresne
 Fix For: 2.1 rc6

 Attachments: 7763.txt


 {code}
 ==
 FAIL: static_with_empty_clustering_test (cql_tests.TestCQL)
 --
 Traceback (most recent call last):
   File /home/ryan/git/datastax/cassandra-dtest/tools.py, line 213, in 
 wrapped
 f(obj)
   File /home/ryan/git/datastax/cassandra-dtest/cql_tests.py, line 4082, in 
 static_with_empty_clustering_test
 assert_one(cursor, SELECT * FROM test, ['partition1', '', 'static 
 value', 'value'])
   File /home/ryan/git/datastax/cassandra-dtest/assertions.py, line 40, in 
 assert_one
 assert res == [expected], res
 AssertionError: [[u'partition1', u'', None, None], [u'partition1', u'', None, 
 None], [u'partition1', u'', None, u'value']]
   begin captured logging  
 dtest: DEBUG: cluster ccm directory: /tmp/dtest-Ex54V7
 -  end captured logging  -
 --
 Ran 1 test in 6.866s
 FAILED (failures=1)
 {code}
 regression from CASSANDRA-7455?





[jira] [Commented] (CASSANDRA-7561) On DROP we should invalidate CounterKeyCache as well as Key/Row cache

2014-08-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098811#comment-14098811
 ] 

Benedict commented on CASSANDRA-7561:
-

bq. Well. It shouldn't be throwing any exceptions, AFAIK

CounterCacheKey.getPathInfo() is called during serialization, which is not safe 
if the CF has been dropped (since it will get a null cf back). So we still need 
to address preventing an autosave happening whilst the map contains keys that 
are in a dropped CF, or we need getPathInfo() at least to be safe during this 
(and return a result that is valid for all use cases), whichever is easiest.

 On DROP we should invalidate CounterKeyCache as well as Key/Row cache
 -

 Key: CASSANDRA-7561
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7561
 Project: Cassandra
  Issue Type: Bug
Reporter: Benedict
Assignee: Aleksey Yeschenko
Priority: Minor
 Fix For: 2.1.0

 Attachments: 7561.txt


 We should also probably ensure we don't attempt to auto save _any_ of the 
 caches while they are in an inconsistent state (i.e. there are keys present 
 to be saved that should not be restored, or that would throw exceptions when 
 we save (e.g. CounterCacheKey))



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-7561) On DROP we should invalidate CounterKeyCache as well as Key/Row cache

2014-08-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098811#comment-14098811
 ] 

Benedict edited comment on CASSANDRA-7561 at 8/15/14 5:46 PM:
--

bq. Well. It shouldn't be throwing any exceptions, AFAIK

CounterCacheKey.getPathInfo() is called during serialization, which is not safe 
if the CF has been dropped (since it will get a null cf back). So we still need 
to address preventing an autosave happening whilst the map contains keys that 
are in a dropped CF, or we need getPathInfo() at least to be safe during this 
(and return a result that is valid for all use cases), whichever is easiest.

It looks like this bug may affect the row cache as well, except that we've 
simply never noticed it since the window is too small. I filed this ticket a 
long time ago so cannot remember where/why I saw this happen. Mea culpa for not 
filing it in the ticket in the first place.


was (Author: benedict):
bq. Well. It shouldn't be throwing any exceptions, AFAIK

CounterCacheKey.getPathInfo() is called during serialization, which is not safe 
if the CF has been dropped (since it will get a null cf back). So we still need 
to address preventing an autosave happening whilst the map contains keys that 
are in a dropped CF, or we need getPathInfo() at least to be safe during this 
(and return a result that is valid for all use cases), whichever is easiest.

 On DROP we should invalidate CounterKeyCache as well as Key/Row cache
 -

 Key: CASSANDRA-7561
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7561
 Project: Cassandra
  Issue Type: Bug
Reporter: Benedict
Assignee: Aleksey Yeschenko
Priority: Minor
 Fix For: 2.1.0

 Attachments: 7561.txt


 We should also probably ensure we don't attempt to auto save _any_ of the 
 caches while they are in an inconsistent state (i.e. there are keys present 
 to be saved that should not be restored, or that would throw exceptions when 
 we save (e.g. CounterCacheKey))



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7561) On DROP we should invalidate CounterKeyCache as well as Key/Row cache

2014-08-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098858#comment-14098858
 ] 

Benedict commented on CASSANDRA-7561:
-

Since this is holding up 2.1-rc6, I'm comfortable splitting the remainder of 
the fix out into a separate ticket. The code as it stands at least reduces the 
bug to a window of risk after DROP rather than a guaranteed failure.

 On DROP we should invalidate CounterKeyCache as well as Key/Row cache
 -

 Key: CASSANDRA-7561
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7561
 Project: Cassandra
  Issue Type: Bug
Reporter: Benedict
Assignee: Aleksey Yeschenko
Priority: Minor
 Fix For: 2.1.0

 Attachments: 7561.txt


 We should also probably ensure we don't attempt to auto save _any_ of the 
 caches while they are in an inconsistent state (i.e. there are keys present 
 to be saved that should not be restored, or that would throw exceptions when 
 we save (e.g. CounterCacheKey))



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-7784) DROP table leaves the counter and row cache in a temporarily inconsistent state that, if saved during, will cause an exception to be thrown

2014-08-15 Thread Benedict (JIRA)
Benedict created CASSANDRA-7784:
---

 Summary: DROP table leaves the counter and row cache in a 
temporarily inconsistent state that, if saved during, will cause an exception 
to be thrown
 Key: CASSANDRA-7784
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7784
 Project: Cassandra
  Issue Type: Bug
Reporter: Benedict
Assignee: Aleksey Yeschenko
Priority: Minor


It looks like this is quite a realistic race to hit reasonably often, since we 
forceBlockingFlush after removing from Schema.cfIdMap, so there could be a 
lengthy window to overlap with an auto-save



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reopened CASSANDRA-7743:
-

Tester: Pierre Laporte

I'd like to get confirmation this bug is fixed before resolving it, but no 
reason to hold up rc6 for that.

[~pingtimeout] do you think you'll be able to try this out?

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
Assignee: Benedict
 Fix For: 2.1 rc6


 During a long running test, we ended up with a lot of 
 java.lang.OutOfMemoryError: Direct buffer memory errors on the Cassandra 
 instances.
 Here is an example of stacktrace from system.log :
 {code}
 ERROR [SharedPool-Worker-1] 2014-08-11 11:09:34,610 ErrorMessage.java:218 - 
 Unexpected exception during request
 java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_25]
 at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) 
 ~[na:1.7.0_25]
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
 ~[na:1.7.0_25]
 at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:251)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:112)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:507) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:464)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:378) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:350) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
 {code}
 The test consisted of a 3-nodes cluster of n1-standard-1 GCE instances (1 
 vCPU, 3.75 GB RAM) running cassandra-2.1.0-rc5, and a n1-standard-2 instance 
 running the test.
 After ~2.5 days, several requests start to fail and we see the previous 
 stacktraces in the system.log file.
 The output from linux ‘free’ and ‘meminfo’ suggests that there is still memory 
 available.
 {code}
 $ free -m
              total       used       free     shared    buffers     cached
 Mem:          3702       3532        169          0        161        854
 -/+ buffers/cache:       2516       1185
 Swap:            0          0          0
 $ head -n 4 /proc/meminfo
 MemTotal:3791292 kB
 MemFree:  173568 kB
 Buffers:  165608 kB
 Cached:   874752 kB
 {code}
 These errors do not affect all the queries we run. The cluster is still 
 responsive but is unable to display tracing information using cqlsh :
 {code}
 $ ./bin/nodetool --host 10.240.137.253 status duration_test
 Datacenter: DC1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address Load   Tokens  Owns (effective)  Host ID  
  Rack
 UN  10.240.98.27    925.17 KB  256 100.0%
 41314169-eff5-465f-85ea-d501fd8f9c5e  RAC1
 UN  10.240.137.253  1.1 MB 256 100.0%
 c706f5f9-c5f3-4d5e-95e9-a8903823827e  RAC1
 UN  

[jira] [Updated] (CASSANDRA-6809) Compressed Commit Log

2014-08-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-6809:


Assignee: Branimir Lambov

 Compressed Commit Log
 -

 Key: CASSANDRA-6809
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Assignee: Branimir Lambov
Priority: Minor
  Labels: performance
 Fix For: 3.0


 It seems an unnecessary oversight that we don't compress the commit log. 
 Doing so should improve throughput, but some care will need to be taken to 
 ensure we use as much of a segment as possible. I propose decoupling the 
 writing of the records from the segments. Basically write into a (queue of) 
 DirectByteBuffer, and have the sync thread compress, say, ~64K chunks every X 
 MB written to the CL (where X is ordinarily CLS size), and then pack as many 
 of the compressed chunks into a CLS as possible.
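
The proposal above (buffer records, compress ~64K chunks, pack as many compressed chunks as fit into each segment) can be sketched roughly as follows. The sizes, Deflater usage, and greedy packing are illustrative assumptions, not Cassandra's actual commit log format:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.zip.Deflater;

public class ChunkedLogPacker {
    static final int CHUNK_SIZE = 64 * 1024;    // compress ~64K at a time
    static final int SEGMENT_SIZE = 256 * 1024; // commit log segment size (illustrative)

    // Compress one raw chunk of buffered records with DEFLATE.
    static byte[] compress(byte[] chunk) {
        Deflater d = new Deflater();
        d.setInput(chunk);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished())
            out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }

    // Pack compressed chunks greedily into segments of at most SEGMENT_SIZE,
    // returning how many segments were consumed.
    static int packIntoSegments(Deque<byte[]> compressedChunks) {
        int segments = 0, used = SEGMENT_SIZE; // force opening a first segment
        while (!compressedChunks.isEmpty()) {
            byte[] c = compressedChunks.poll();
            if (used + c.length > SEGMENT_SIZE) { segments++; used = 0; }
            used += c.length;
        }
        return segments;
    }

    public static void main(String[] args) {
        Deque<byte[]> chunks = new ArrayDeque<>();
        byte[] raw = new byte[CHUNK_SIZE]; // zeros compress extremely well
        for (int i = 0; i < 16; i++)
            chunks.add(compress(raw));
        // 16 x 64K of raw records fits in far fewer segments once compressed
        System.out.println("segments=" + packIntoSegments(chunks));
    }
}
```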



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-6572) Workload recording / playback

2014-08-16 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-6572:


Assignee: (was: Lyuben Todorov)

 Workload recording / playback
 -

 Key: CASSANDRA-6572
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Tools
Reporter: Jonathan Ellis
 Fix For: 2.1.1

 Attachments: 6572-trunk.diff


 Write sample mode gets us part way to testing new versions against a real 
 world workload, but we need an easy way to test the query side as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7468) Add time-based execution to cassandra-stress

2014-08-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099525#comment-14099525
 ] 

Benedict commented on CASSANDRA-7468:
-

FTR, I'm planning to address this once CASSANDRA-7519 is committed, since this 
is not super-high priority.

 Add time-based execution to cassandra-stress
 

 Key: CASSANDRA-7468
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7468
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Matt Kennedy
Assignee: Matt Kennedy
Priority: Minor
 Fix For: 2.1.1

 Attachments: 7468v2.txt, trunk-7468-rebase.patch, trunk-7468.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7232) Enable live replay of commit logs

2014-08-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099524#comment-14099524
 ] 

Benedict commented on CASSANDRA-7232:
-

Missed this due to status != Patch Available

I'm not keen on passing properties around using System.get/setProperty after 
system startup. We should modify CommitLogReplay so we can instantiate it with 
a specific PIT, and construct one specifically for this out-of-band restore. 
Also the comment is inaccurate, stating it is the point to restore _from_, not 
_to_. However it would be useful to be able to provide both, as presumably the 
commitlog archive directory will have more logs than needed.
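
The constructor-injection idea suggested above can be sketched as follows (hypothetical class and field names; the real replayer's API differs). The restore window is passed to the replayer directly rather than smuggled through System.setProperty after startup, and both a 'from' and a 'to' point in time are accepted so only the needed archived logs are replayed:

```java
public class ReplayWindowSketch {
    static class Replayer {
        final long restoreFromMillis; // skip mutations before this
        final long restoreToMillis;   // stop replay at this point in time

        Replayer(long from, long to) {
            this.restoreFromMillis = from;
            this.restoreToMillis = to;
        }

        // Decide per-mutation whether it falls inside the restore window.
        boolean shouldReplay(long mutationTimestamp) {
            return mutationTimestamp >= restoreFromMillis
                && mutationTimestamp <= restoreToMillis;
        }
    }

    public static void main(String[] args) {
        Replayer r = new Replayer(1_000, 2_000);
        System.out.println(r.shouldReplay(1_500)); // inside the window
        System.out.println(r.shouldReplay(2_500)); // after 'to': skipped
    }
}
```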

 Enable live replay of commit logs
 -

 Key: CASSANDRA-7232
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7232
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Patrick McFadin
Assignee: Lyuben Todorov
Priority: Minor
 Fix For: 2.0.10

 Attachments: 
 0001-Expose-CommitLog-recover-to-JMX-add-nodetool-cmd-for.patch, 
 0001-TRUNK-JMX-and-nodetool-cmd-for-commitlog-replay.patch


 Replaying commit logs takes a restart but restoring sstables can be an online 
 operation with refresh. In order to restore a point-in-time without a 
 restart, the node needs to live replay the commit logs from JMX and a 
 nodetool command.
 nodetool refreshcommitlogs keyspace table



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-7468) Add time-based execution to cassandra-stress

2014-08-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099525#comment-14099525
 ] 

Benedict edited comment on CASSANDRA-7468 at 8/16/14 6:31 AM:
--

FTR, I'm planning to address this (the lack of presence on user commands, not 
the behaviour with auto mode running the test multiple times, as this is not a 
bug) once CASSANDRA-7519 is committed, since this is not super-high priority.


was (Author: benedict):
FTR, I'm planning to address this once CASSANDRA-7519 is committed, since this 
is not super-high priority.

 Add time-based execution to cassandra-stress
 

 Key: CASSANDRA-7468
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7468
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Matt Kennedy
Assignee: Matt Kennedy
Priority: Minor
 Fix For: 2.1.1

 Attachments: 7468v2.txt, trunk-7468-rebase.patch, trunk-7468.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7704) FileNotFoundException during STREAM-OUT triggers 100% CPU usage

2014-08-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099527#comment-14099527
 ] 

Benedict commented on CASSANDRA-7704:
-

Committed to 2.0, 2.1.0 and 2.1 branches. I overwrote 2.0's contents with 
2.1's, only removing the repairedAt property, since the only other difference 
was the lack of aborted property preventing inconsistent state.

 FileNotFoundException during STREAM-OUT triggers 100% CPU usage
 ---

 Key: CASSANDRA-7704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7704
 Project: Cassandra
  Issue Type: Bug
Reporter: Rick Branson
Assignee: Benedict
 Fix For: 2.0.10, 2.1.0

 Attachments: 7704-2.1.txt, 7704.txt, backtrace.txt, other-errors.txt


 See attached backtrace which was what triggered this. This stream failed and 
 then ~12 seconds later it emitted that exception. At that point, all CPUs 
 went to 100%. A thread dump shows all the ReadStage threads stuck inside 
 IntervalTree.searchInternal inside of CFS.markReferenced().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (CASSANDRA-7754) FileNotFoundException in MemtableFlushWriter

2014-08-17 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict resolved CASSANDRA-7754.
-

Resolution: Not a Problem

[~shalupov] the first exception you posted occurs during creation of the 
initial file for writing; the last exception is unrelated to the other two; and 
the middle one appears to be thrown while aborting a write after some other 
error, at which point the data it had been writing is already missing. So you 
most likely have some problems with your file system. I would check that your 
ACLs are all in order, and look for background cleanup / archive processes. 

I currently doubt there is a problem with C* from the information you've 
posted, especially as this code is exercised regularly and we haven't seen any 
issues elsewhere, but if after further investigation you continue to be 
convinced there is a bug, please reopen the ticket with some more information 
and reproduction steps so we can try to replicate it ourselves.

 FileNotFoundException in MemtableFlushWriter
 

 Key: CASSANDRA-7754
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7754
 Project: Cassandra
  Issue Type: Bug
 Environment: Linux, OpenJDK 1.7
Reporter: Leonid Shalupov
Priority: Critical

 Exception in cassandra logs, after upgrade to 2.1:
 [MemtableFlushWriter:91] ERROR o.a.c.service.CassandraDaemon - Exception in 
 thread Thread[MemtableFlushWriter:91,5,main]
 java.lang.RuntimeException: java.io.FileNotFoundException: 
 /xxx/cassandra/data/system/batchlog-0290003c977e397cac3efdfdc01d626b/system-batchlog-tmp-ka-186-Index.db
  (No such file or directory)
   at 
 org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:75)
  ~[cassandra-all-2.1.0-rc5.jar:2.1.0-rc5]
   at 
 org.apache.cassandra.io.util.SequentialWriter.open(SequentialWriter.java:104) 
 ~[cassandra-all-2.1.0-rc5.jar:2.1.0-rc5]
   at 
 org.apache.cassandra.io.util.SequentialWriter.open(SequentialWriter.java:99) 
 ~[cassandra-all-2.1.0-rc5.jar:2.1.0-rc5]
   at 
 org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.<init>(SSTableWriter.java:550)
  ~[cassandra-all-2.1.0-rc5.jar:2.1.0-rc5]
   at 
 org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:134) 
 ~[cassandra-all-2.1.0-rc5.jar:2.1.0-rc5]
   at 
 org.apache.cassandra.db.Memtable$FlushRunnable.createFlushWriter(Memtable.java:383)
  ~[cassandra-all-2.1.0-rc5.jar:2.1.0-rc5]
   at 
 org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:330)
  ~[cassandra-all-2.1.0-rc5.jar:2.1.0-rc5]
   at 
 org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:314) 
 ~[cassandra-all-2.1.0-rc5.jar:2.1.0-rc5]
   at 
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
  ~[cassandra-all-2.1.0-rc5.jar:2.1.0-rc5]
   at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
 ~[cassandra-all-2.1.0-rc5.jar:2.1.0-rc5]
   at 
 com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
  ~[guava-16.0.jar:na]
   at 
 org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1054)
  ~[cassandra-all-2.1.0-rc5.jar:2.1.0-rc5]
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_65]
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  ~[na:1.7.0_65]
   at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_65]
 Caused by: java.io.FileNotFoundException: 
 /xxx/cassandra/data/system/batchlog-0290003c977e397cac3efdfdc01d626b/system-batchlog-tmp-ka-186-Index.db
  (No such file or directory)
   at java.io.RandomAccessFile.open(Native Method) ~[na:1.7.0_65]
   at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241) 
 ~[na:1.7.0_65]
   at 
 org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:71)
  ~[cassandra-all-2.1.0-rc5.jar:2.1.0-rc5]
   ... 14 common frames omitted



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100197#comment-14100197
 ] 

Benedict commented on CASSANDRA-7743:
-

Did you see the actual error, or have more info than meminfo? Because that is 
not at all conclusive by itself.

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
Assignee: Benedict
 Fix For: 2.1 rc6


 During a long running test, we ended up with a lot of 
 java.lang.OutOfMemoryError: Direct buffer memory errors on the Cassandra 
 instances.
 Here is an example of stacktrace from system.log :
 {code}
 ERROR [SharedPool-Worker-1] 2014-08-11 11:09:34,610 ErrorMessage.java:218 - 
 Unexpected exception during request
 java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_25]
 at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) 
 ~[na:1.7.0_25]
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
 ~[na:1.7.0_25]
 at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:251)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:112)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:507) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:464)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:378) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:350) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
 {code}
 The test consisted of a 3-nodes cluster of n1-standard-1 GCE instances (1 
 vCPU, 3.75 GB RAM) running cassandra-2.1.0-rc5, and a n1-standard-2 instance 
 running the test.
 After ~2.5 days, several requests start to fail and we see the previous 
 stacktraces in the system.log file.
 The output from linux ‘free’ and ‘meminfo’ suggests that there is still memory 
 available.
 {code}
 $ free -m
              total       used       free     shared    buffers     cached
 Mem:          3702       3532        169          0        161        854
 -/+ buffers/cache:       2516       1185
 Swap:            0          0          0
 $ head -n 4 /proc/meminfo
 MemTotal:3791292 kB
 MemFree:  173568 kB
 Buffers:  165608 kB
 Cached:   874752 kB
 {code}
 These errors do not affect all the queries we run. The cluster is still 
 responsive but is unable to display tracing information using cqlsh :
 {code}
 $ ./bin/nodetool --host 10.240.137.253 status duration_test
 Datacenter: DC1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address Load   Tokens  Owns (effective)  Host ID  
  Rack
 UN  10.240.98.27    925.17 KB  256 100.0%
 41314169-eff5-465f-85ea-d501fd8f9c5e  RAC1
 UN  10.240.137.253  1.1 MB 256 100.0%
 c706f5f9-c5f3-4d5e-95e9-a8903823827e  RAC1
 UN  10.240.72.183   896.57 KB  256 100.0%

[jira] [Commented] (CASSANDRA-7519) Further stress improvements to generate more realistic workloads

2014-08-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100237#comment-14100237
 ] 

Benedict commented on CASSANDRA-7519:
-

bq. I plan to run some test workloads to double check the logic, but first cut 
of the code looked good. I left a couple comments on the github branch

Thanks!

bq. I'm not very keen on the new labels you've chosen for the insert section of 
the yaml file, They should be more verbose

Nomenclature is always tricky, and I'm certainly not fixed on them. Although by 
making these more verbose we'll need to make the command line correspondingly 
more verbose to keep the two in sync, which I'm not super keen on, but not too 
fussed about either.

bq. partitions_per_batch maybe?

perhaps partitions_per_operation? because per_batch implies we might change the 
number of partitions between batches, whereas we work with the same partitions 
for the duration of an 'operation' (the n= declared on command line)...

bq. batch_split_count

batches_per_operation?


 Further stress improvements to generate more realistic workloads
 

 Key: CASSANDRA-7519
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7519
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Benedict
Assignee: Benedict
Priority: Minor
  Labels: tools
 Fix For: 2.1.1


 We generally believe that the most common workload is for reads to 
 exponentially prefer most recently written data. However as stress currently 
 behaves we have two id generation modes: sequential and random (although 
 random can be distributed). I propose introducing a new mode which is 
 somewhat like sequential, except we essentially 'look back' from the current 
 id by some amount defined by a distribution. I may possibly make the position 
 only increment as it's first written to also, so that this mode can be run 
 from a clean slate with a mixed workload. This should allow us to generate 
 workloads that are more representative.
 At the same time, I will introduce a timestamp value generator for primary 
 key columns that is strictly ascending, i.e. has some random component but is 
 based off of the actual system time (or some shared monotonically increasing 
 state) so that we can again generate a more realistic workload. This may be 
 challenging to tie in with the new procedurally generated partitions, but I'm 
 sure it can be done without too much difficulty.
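
The 'look back' mode described above can be sketched as follows. The geometric distribution and clamping here are illustrative assumptions, not cassandra-stress's actual generators: writes advance a high-water mark, and reads pick an id a geometrically distributed distance behind it, so recent data is exponentially preferred:

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class LookBackIds {
    final AtomicLong maxWritten = new AtomicLong(0);
    final Random rnd = new Random(42);

    // Writes advance the high-water mark.
    long nextWriteId() { return maxWritten.incrementAndGet(); }

    // Reads 'look back' from the current max by a geometric distance:
    // P(distance = k) ~ (1-p)^k * p, so small look-backs dominate.
    long nextReadId(double p) {
        long max = maxWritten.get();
        long back = (long) (Math.log(1 - rnd.nextDouble()) / Math.log(1 - p));
        return Math.max(1, max - back); // clamp so we never read id <= 0
    }

    public static void main(String[] args) {
        LookBackIds ids = new LookBackIds();
        for (int i = 0; i < 1_000_000; i++) ids.nextWriteId();
        long recent = 0, total = 100_000;
        for (int i = 0; i < total; i++)
            if (ids.nextReadId(0.01) > 999_000) recent++; // within last 1000 ids
        // With p=0.01, nearly all reads land within ~1000 ids of the head
        System.out.println("recent fraction = " + (double) recent / total);
    }
}
```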



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7705) Safer Resource Management

2014-08-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100260#comment-14100260
 ] 

Benedict commented on CASSANDRA-7705:
-

Linked four related tickets

 Safer Resource Management
 -

 Key: CASSANDRA-7705
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7705
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
 Fix For: 3.0


 We've had a spate of bugs recently with bad reference counting. These can 
 have potentially dire consequences, generally either randomly deleting data 
 or giving us infinite loops. 
 Since in 2.1 we only reference count resources that are relatively expensive 
 and infrequently managed (or in places where this safety is probably not as 
 necessary, e.g. SerializingCache), we could without any negative consequences 
 (and only slight code complexity) introduce a safer resource management 
 scheme for these more expensive/infrequent actions.
 Basically, I propose when we want to acquire a resource we allocate an object 
 that manages the reference. This can only be released once; if it is released 
 twice, we fail immediately at the second release, reporting where the bug is 
 (rather than letting it continue fine until the next correct release corrupts 
 the count). The reference counter remains the same, but we obtain guarantees 
 that the reference count itself is never badly maintained, although code 
 using it could mistakenly release its own handle early (typically this is 
 only an issue when cleaning up after a failure, in which case under the new 
 scheme this would be an innocuous error)
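
A minimal sketch of the proposed handle scheme (hypothetical names, not the classes Cassandra ultimately uses): each acquirer gets a handle that may be released exactly once, so a double release fails immediately at the buggy call site instead of silently corrupting the shared count:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class OneShotRef {
    // The shared resource with its reference count.
    static class Counted {
        final AtomicInteger refs = new AtomicInteger(1);
        Handle acquire() { refs.incrementAndGet(); return new Handle(this); }
        void releaseInternal() { refs.decrementAndGet(); }
    }

    // Per-acquirer handle: release() may be called exactly once.
    static class Handle {
        private final Counted owner;
        private final AtomicBoolean released = new AtomicBoolean(false);
        Handle(Counted owner) { this.owner = owner; }
        void release() {
            if (!released.compareAndSet(false, true))
                throw new IllegalStateException("double release"); // fail here, now
            owner.releaseInternal();
        }
    }

    public static void main(String[] args) {
        Counted resource = new Counted();
        Handle h = resource.acquire();
        h.release();
        try { h.release(); } // the bug surfaces at the offending call site
        catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```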



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7220) Nodes hang with 100% CPU load

2014-08-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100261#comment-14100261
 ] 

Benedict commented on CASSANDRA-7220:
-

Any other exceptions in the logs? Looks related to CASSANDRA-7262, 
CASSANDRA-7704, CASSANDRA-7705. It's likely this has been fixed in a newer 
release.

 Nodes hang with 100% CPU load
 -

 Key: CASSANDRA-7220
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7220
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: C* 2.0.7
 4 nodes cluster on 12 core machines
Reporter: Robert Stupp
Assignee: Ryan McGuire
 Attachments: c-12-read-100perc-cpu.zip


 I've run a test that both reads and writes rows.
 After some time, all writes succeeded and all reads stopped.
 Two of the four nodes have 16 of 16 threads of the ReadStage thread pool 
 running. The number of pending tasks continuously grows on these nodes.
 I have attached outputs of the stack traces and some diagnostic output from 
 nodetool tpstats
 nodetool status shows all nodes as UN.
 I had run that test previously without any issues in with the same 
 configuration.
 Some specials from cassandra.yaml:
 - key_cache_size_in_mb: 1024
 - row_cache_size_in_mb: 8192
 The nodes running at 100% CPU are node2 and node3; node1 and node4 are fine.
 I'm not sure if it is reproducible - but it's definitely not good behaviour.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7220) Nodes hang with 100% CPU load

2014-08-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100290#comment-14100290
 ] 

Benedict commented on CASSANDRA-7220:
-

[~rarudduck] it looks like the issue that killed your server was OOM. I can't 
see a reason for this in the logs, so it's possible you simply need to increase 
your heap size, however upgrading may help as there are a LOT of exceptions 
related to CASSANDRA-7756 logged, and it's possible that's somehow causing a 
knock-on effect of some kind.

 Nodes hang with 100% CPU load
 -

 Key: CASSANDRA-7220
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7220
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: C* 2.0.7
 4 nodes cluster on 12 core machines
Reporter: Robert Stupp
Assignee: Ryan McGuire
 Attachments: c-12-read-100perc-cpu.zip, system.log


 I've run a test that both reads and writes rows.
 After some time, all writes succeeded and all reads stopped.
 Two of the four nodes have 16 of 16 threads of the ReadStage thread pool 
 running. The number of pending tasks continuously grows on these nodes.
 I have attached outputs of the stack traces and some diagnostic output from 
 nodetool tpstats
 nodetool status shows all nodes as UN.
 I had run that test previously without any issues in with the same 
 configuration.
 Some specials from cassandra.yaml:
 - key_cache_size_in_mb: 1024
 - row_cache_size_in_mb: 8192
 The nodes running at 100% CPU are node2 and node3; node1 and node4 are fine.
 I'm not sure if it is reproducible - but it's definitely not good behaviour.





[jira] [Commented] (CASSANDRA-7786) Cassandra is shutting down out of no apparent reason

2014-08-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100293#comment-14100293
 ] 

Benedict commented on CASSANDRA-7786:
-

Are you sure you haven't somehow sent the shutdown command over JMX / nodeprobe? 
Possibly a script accidentally has the command embedded? There doesn't seem to 
be any code path that could shut down the server without first logging an 
exception.
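One quick way to check that hypothesis: `nodetool stopdaemon` is the CLI command 
that performs exactly this kind of clean shutdown (stop thrift, stop CQL, 
announce gossip shutdown) on 2.0+, so it's worth grepping local automation for 
it. A sketch - the helper name and the directories you'd pass it are 
illustrative, not real deployment paths:

```shell
# Search a directory of scripts/cron entries for a stray clean-shutdown call.
# Prints the names of any files that invoke `nodetool ... stopdaemon`.
find_stray_shutdown() {
    dir="$1"
    grep -rln 'nodetool[[:space:]].*stopdaemon' "$dir" 2>/dev/null
}
```

Usage would be something like `find_stray_shutdown /etc/cron.d` and the same 
against wherever deployment or maintenance scripts live; checking `crontab -l` 
output for each service account is also worthwhile.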

 Cassandra is shutting down out of no apparent reason
 

 Key: CASSANDRA-7786
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7786
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: C* 2.0.9
Reporter: Or Sher

 We've recently started facing an issue where one of the C* nodes in our dev 
 and CI clusters (thank god it hasn't happened yet in Prod) is shutting down 
 from time to time without any exceptions or errors.
 There is usually something like that in the logs:
 INFO [MemoryMeter:1] 2014-08-15 01:32:43,266 Memtable.java (line 481) 
 CFS(Keyspace='system', ColumnFamily='sstable_activity') liveRatio is 
 14.597030881851438 (just-counted was 14.596825396825396).  calculation took 
 2ms for 84 cells
  INFO [StorageServiceShutdownHook] 2014-08-15 01:40:58,954 ThriftServer.java 
 (line 141) Stop listening to thrift clients
  INFO [StorageServiceShutdownHook] 2014-08-15 01:40:59,007 Server.java (line 
 182) Stop listening for CQL clients
  INFO [StorageServiceShutdownHook] 2014-08-15 01:40:59,011 Gossiper.java 
 (line 1279) Announcing shutdown
  INFO [StorageServiceShutdownHook] 2014-08-15 01:41:01,011 
 MessagingService.java (line 683) Waiting for messaging service to quiesce
  INFO [ACCEPT-/192.168.27.241] 2014-08-15 01:41:01,012 MessagingService.java 
 (line 923) MessagingService has terminated the accept() thread
  INFO [main] 2014-08-17 09:50:56,647 CassandraDaemon.java (line 135) Logging 
 initialized
 You can see the last line in the log is usually written at least 5 minutes 
 before the shutdown, sometimes 30 minutes before.
 I can't reproduce it, as I have no idea why it is happening or how to 
 attack this issue.
 I believe I'm not the only one suffering from this issue, as there was a 
 thread about this behavior on the user mailing list.
 Any thoughts?




