[jira] [Updated] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-6949:
---

Reviewer: Jonathan Ellis  (was: Sam Tunnicliffe)

 Performance regression in tombstone heavy workloads
 ---

 Key: CASSANDRA-6949
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremiah Jordan
Assignee: Sam Tunnicliffe
 Attachments: 
 0001-Remove-expansion-of-RangeTombstones-to-delete-from-2.patch, 6949.txt


 CASSANDRA-5614 causes a huge performance regression in tombstone heavy 
 workloads. The isDeleted checks here cause significant CPU overhead: 
 https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/AtomicSortedColumns.java#L189-L196
 An insert workload that does perfectly fine on 1.2 pegs CPU use at 100% on 
 2.0, with all of the mutation threads sitting in that loop. For example:
 {noformat}
 MutationStage:20 daemon prio=10 tid=0x7fb1c4c72800 nid=0x2249 runnable 
 [0x7fb1b033]
java.lang.Thread.State: RUNNABLE
 at org.apache.cassandra.db.marshal.BytesType.bytesCompare(BytesType.java:45)
 at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:34)
 at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:26)
 at 
 org.apache.cassandra.db.marshal.AbstractType.compareCollectionMembers(AbstractType.java:267)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:85)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
 at 
 org.apache.cassandra.db.RangeTombstoneList.searchInternal(RangeTombstoneList.java:253)
 at 
 org.apache.cassandra.db.RangeTombstoneList.isDeleted(RangeTombstoneList.java:210)
 at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:136)
 at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:123)
 at 
 org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:193)
 at org.apache.cassandra.db.Memtable.resolve(Memtable.java:194)
 at org.apache.cassandra.db.Memtable.put(Memtable.java:158)
 at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:890)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
 at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:201)
 at 
 org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 {noformat}
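 To illustrate the cost pattern in the trace above, here is a hedged, 
 self-contained sketch (not Cassandra's code; names are illustrative): every 
 column added triggers a binary search over the partition's range tombstones, 
 so inserting N columns against M tombstones performs roughly N * log2(M) 
 comparator calls.

```java
import java.util.Arrays;

public class IsDeletedCost {
    // Rough model of the hot loop: each of `columns` inserts does one binary
    // search over a sorted array of `tombstones` bounds, counting comparator
    // calls. This mirrors the shape of RangeTombstoneList.searchInternal,
    // not its actual code.
    static long countComparisons(int columns, int tombstones) {
        long[] counter = new long[1];
        Integer[] bounds = new Integer[tombstones];
        for (int i = 0; i < tombstones; i++)
            bounds[i] = i * 2; // stand-ins for tombstone bounds
        for (int c = 0; c < columns; c++) {
            Arrays.binarySearch(bounds, c, (x, y) -> {
                counter[0]++;
                return Integer.compare(x, y);
            });
        }
        return counter[0];
    }

    public static void main(String[] args) {
        // comparator calls grow as columns * log2(tombstones)
        System.out.println(countComparisons(100_000, 1024));
    }
}
```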



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969709#comment-13969709
 ] 

Benedict commented on CASSANDRA-6949:
-

I assume the only real risk with reverting is that if there are no reads we can 
get uncontrolled growth of the 2i?



[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969715#comment-13969715
 ] 

Sam Tunnicliffe commented on CASSANDRA-6949:


Only until a compaction, which will also remove stale entries.



[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969716#comment-13969716
 ] 

Benedict commented on CASSANDRA-6949:
-

It's worth pointing out that a sensible intersection implementation over two 
ordered sets can be quite efficient and a fairly low computational burden, 
which is possibly a good middle ground. But if there's no real risk to getting 
rid of it, that's probably best.
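The merge-style intersection mentioned here can be sketched as follows (a 
generic illustration, not Cassandra code): walking two sorted sequences in 
lockstep costs O(n + m) comparisons in total, instead of a separate search per 
element.

```java
import java.util.ArrayList;
import java.util.List;

public class SortedIntersect {
    // Merge-style intersection of two ascending, duplicate-free lists.
    // Each step advances at least one cursor, so the cost is O(n + m).
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) { out.add(a.get(i)); i++; j++; }
            else if (cmp < 0) i++; // a is behind; advance it
            else j++;              // b is behind; advance it
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(intersect(List.of(1, 3, 5, 7), List.of(3, 4, 5, 8)));
        // prints [3, 5]
    }
}
```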



[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969724#comment-13969724
 ] 

Benedict commented on CASSANDRA-6949:
-

bq. Only until a compaction, which will also remove stale entries.

Does it? I don't see how...



[jira] [Commented] (CASSANDRA-7040) Replace read/write stage with per-disk access coordination

2014-04-15 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969726#comment-13969726
 ] 

Jason Brown commented on CASSANDRA-7040:


Martin Thompson mentions batching IO events in a talk at the react conf 2014: 
https://www.youtube.com/watch?v=4dfk3ucthN8 . The idea seems reasonable but I 
haven't investigated it yet. 

bq. that may touch the disks

Yeah, the key word here is *may*. You could add in helpers like mincore (and 
the row cache) to help determine when you have nothing in memory and will be 
going to disk.

 Replace read/write stage with per-disk access coordination
 --

 Key: CASSANDRA-7040
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7040
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 As discussed in CASSANDRA-6995, current coordination of access to disk is 
 suboptimal: instead of ensuring disk accesses alone are coordinated, we 
 instead coordinate at the level of operations that may touch the disks, 
 ensuring only so many are proceeding at once. As such, tuning is difficult, 
 and we incur unnecessary delays for operations that would not touch the 
 disk(s).
 Ideally we would instead simply use a shared coordination primitive to gate 
 access to the disk when we perform a rebuffer. This work would dovetail very 
 nicely with any work in CASSANDRA-5863, as we could prevent any blocking or 
 context switching for data that we know to be cached. It also, as far as I 
 can tell, obviates the need for CASSANDRA-5239.
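 A minimal sketch of the coordination primitive the description proposes, with 
 illustrative names (this is not Cassandra's implementation): a per-disk 
 semaphore acquired only when a rebuffer actually has to touch the disk, so 
 cached reads never block.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

public class DiskGate {
    // One semaphore per disk, created lazily; permits cap concurrent accesses
    // to that disk rather than capping in-flight operations overall.
    private final ConcurrentHashMap<String, Semaphore> gates = new ConcurrentHashMap<>();
    private final int permitsPerDisk;

    DiskGate(int permitsPerDisk) {
        this.permitsPerDisk = permitsPerDisk;
    }

    // Called only when a rebuffer actually needs the disk; reads served from
    // cache skip the gate entirely.
    <T> T withDisk(String disk, java.util.function.Supplier<T> io) throws InterruptedException {
        Semaphore gate = gates.computeIfAbsent(disk, d -> new Semaphore(permitsPerDisk));
        gate.acquire();
        try {
            return io.get();
        } finally {
            gate.release();
        }
    }
}
```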





[jira] [Created] (CASSANDRA-7042) Disk space growth until restart

2014-04-15 Thread Zach Aller (JIRA)
Zach Aller created CASSANDRA-7042:
-

 Summary: Disk space growth until restart
 Key: CASSANDRA-7042
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7042
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04
Sun Java 7
Cassandra 2.0.6

Reporter: Zach Aller
Priority: Critical


Cassandra will constantly eat disk space; we're not sure what's causing it, and 
the only thing that seems to fix it is a restart of Cassandra. This happens 
about every 3-5 hrs: we grow from about 350GB to 650GB with no end in sight. 
Once we restart Cassandra it usually all clears itself up and the disks return 
to normal for a while, then something triggers it and usage starts climbing 
again. Sometimes when we restart, compactions pending skyrocket, and if we 
restart a second time the compactions pending drop back to a normal level. One 
other thing to note: the space is not freed until Cassandra starts back up, not 
when it is shut down.

I will get a clean log of before and after restarting next time it happens and 
post it.

Here is a common ERROR in our logs that might be related

ERROR [CompactionExecutor:46] 2014-04-15 09:12:51,040 CassandraDaemon.java 
(line 196) Exception in thread Thread[CompactionExecutor:46,1,main]
java.lang.RuntimeException: java.io.FileNotFoundException: 
/local-project/cassandra_data/data/wxgrid/grid/wxgrid-grid-jb-468677-Data.db 
(No such file or directory)
at 
org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:53)
at 
org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1355)
at 
 org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:67)
at 
org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1161)
at 
org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1173)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getScanners(LeveledCompactionStrategy.java:194)
at 
org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:258)
at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:126)
at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.FileNotFoundException: 
/local-project/cassandra_data/data/wxgrid/grid/wxgrid-grid-jb-468677-Data.db 
(No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(Unknown Source)
at 
 org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
at 
 org.apache.cassandra.io.util.ThrottledReader.<init>(ThrottledReader.java:35)
at 
org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:49)
... 17 more







[jira] [Commented] (CASSANDRA-7040) Replace read/write stage with per-disk access coordination

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969736#comment-13969736
 ] 

Benedict commented on CASSANDRA-7040:
-

bq. You could add in helpers like mincore (and row cache) to help inform you

Or CASSANDRA-5863 :-)

As to batching - that's another step further along: it would be interesting to 
experiment with an intelligent storage manager that requests are submitted to 
and coordinated by, but I think that comes after 5863 + this. There are lots of 
ways we might be able to get improved performance with that approach, but I'm 
not absolutely sure they'll pan out, and they'd be a non-trivial undertaking.



[jira] [Commented] (CASSANDRA-7040) Replace read/write stage with per-disk access coordination

2014-04-15 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969753#comment-13969753
 ] 

Jason Brown commented on CASSANDRA-7040:


CASSANDRA-5863 could be legit, as well :).

As to the intelligent storage manager, I don't think that's necessarily 
blocked by this work, but I do agree it's a non-trivial undertaking.



[jira] [Updated] (CASSANDRA-7028) Allow C* to compile under java 8

2014-04-15 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-7028:
---

Attachment: 7028_v5.patch

 Allow C* to compile under java 8
 

 Key: CASSANDRA-7028
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7028
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Dave Brosius
Assignee: Dave Brosius
Priority: Minor
 Fix For: 3.0

 Attachments: 7028.txt, 7028_v2.txt, 7028_v3.txt, 7028_v4.txt, 
 7028_v5.patch


 antlr 3.2 has a problem with java 8, as described here: 
 http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8015656
 Updating to antlr 3.5.2 solves this; however, the jars are now split up 
 differently, which requires some changes. The generated CqlParser.java also 
 ends up with a method that is too large, so I needed to split that method to 
 reduce its size.
 (patch against trunk)





[jira] [Commented] (CASSANDRA-7028) Allow C* to compile under java 8

2014-04-15 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969765#comment-13969765
 ] 

Joshua McKenzie commented on CASSANDRA-7028:


Ah - good call on the runtime libraries.

v4 lost the full index, so file deletion failed, and the diff has references to 
the new .jar files, which prevents it from applying either with the files in 
place or without them. I've attached a v5 that cleans up some whitespace 
complaints and includes the binary changes, both deletions and additions. We 
should be able to just apply this to trunk and get all the changes in one shot, 
with no need to download libraries separately and place them for the committer.

The diff syntax I used to build this was 'git diff --full-index --binary 
commit1 commit2'. Even with --full-index, if you don't include the --binary 
flag it won't generate the data that goes with the new files you've added, and 
you end up with an invalid patch: it has markers to add the files but no binary 
data to place in them.

I reran the tests on linux against this just to confirm the changes resolving 
HintedHandOffTest didn't munge anything else, and it all looks good on jdk7.  
I'm +1 on the v5 patch; give it a run against trunk and let me know how it 
works for you.



[jira] [Commented] (CASSANDRA-6985) ReadExecutors should not rely on static StorageProxy

2014-04-15 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969798#comment-13969798
 ] 

Yuki Morishita commented on CASSANDRA-6985:
---

Ed,

I don't see the reason to pass StorageProxy to AbstractReadExecutor at all.
It is only used to get live sorted endpoints in getExecutor, so why not just 
pass a List<InetAddress>?

As far as I can see, the only reason the StorageProxy singleton instance exists 
right now is JMX.
Would it be more reasonable (for now?) to leave StorageProxy as a utility 
class/API and separate its management aspect into another class?


 ReadExecutors should not rely on static StorageProxy
 

 Key: CASSANDRA-6985
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6985
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 3.0

 Attachments: CASSANDRA_6985.1.patch


 All the ReadExecutor child classes require use of the StorageProxy to carry 
 out reads. We can pass the StorageProxy along in the constructor.





[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2014-04-15 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969831#comment-13969831
 ] 

T Jake Luciani commented on CASSANDRA-5863:
---

I do think having a set of fast disks for hot data that doesn't fit into memory 
is key because in a large per node deployment you want:

1.  Memory (Really hot data)
2.  SSD (Hot data that doesn't fit in memory)
3.  Spinning disk (Historic cold data) 

[~benedict] you are describing building a custom page cache impl off heap, 
which is pretty ambitious.  Don't you think a baby step would be to rely on the 
OS page cache to start and build a custom one as a phase II?

What would the page size be for uncompressed data?  For compressed data, the 
chunk size (conceptually) fits nicely. 

 In process (uncompressed) page cache
 

 Key: CASSANDRA-5863
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Pavel Yaskevich
  Labels: performance
 Fix For: 2.1 beta2


 Currently, for every read, the CRAR reads each compressed chunk into a 
 byte[], sends it to ICompressor, gets back another byte[] and verifies a 
 checksum.  
 This process is where the majority of time is spent in a read request.  
 Before compression, we would have zero-copy of data and could respond 
 directly from the page-cache.
 It would be useful to have some kind of Chunk cache that could speed up this 
 process for hot data. Initially this could be an off-heap cache, but it would 
 be great to put these decompressed chunks onto a SSD so the hot data lives on 
 a fast disk similar to https://github.com/facebook/flashcache.
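 One way the chunk cache could look, as a hedged sketch with hypothetical names 
 (Cassandra's eventual design may well differ): decompressed chunks keyed by 
 file and chunk offset, with LRU eviction once a capacity cap is hit.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

public class ChunkCache {
    // Illustrative key: which file, and which chunk within it.
    record ChunkKey(String path, long offset) {}

    private final int maxChunks;
    private final LinkedHashMap<ChunkKey, ByteBuffer> cache;

    ChunkCache(int maxChunks) {
        this.maxChunks = maxChunks;
        // accessOrder=true makes iteration order least-recently-used first,
        // so evicting the eldest entry implements LRU.
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<ChunkKey, ByteBuffer> eldest) {
                return size() > ChunkCache.this.maxChunks;
            }
        };
    }

    synchronized ByteBuffer get(String path, long offset) {
        return cache.get(new ChunkKey(path, offset));
    }

    synchronized void put(String path, long offset, ByteBuffer decompressed) {
        cache.put(new ChunkKey(path, offset), decompressed.asReadOnlyBuffer());
    }
}
```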





[jira] [Commented] (CASSANDRA-6572) Workload recording / playback

2014-04-15 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969839#comment-13969839
 ] 

Jonathan Ellis commented on CASSANDRA-6572:
---

How do you deal w/ prepared vs non-prepared queries?  Thinking of 
CASSANDRA-7021 here.

 Workload recording / playback
 -

 Key: CASSANDRA-6572
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Tools
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
 Fix For: 2.0.8

 Attachments: 6572-trunk.diff


 Write sample mode gets us part way to testing new versions against a real 
 world workload, but we need an easy way to test the query side as well.





[jira] [Updated] (CASSANDRA-6572) Workload recording / playback

2014-04-15 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6572:
--

Reviewer: Tyler Hobbs

WDYT [~thobbs], is this non-invasive enough to make it into 2.0?

 Workload recording / playback
 -

 Key: CASSANDRA-6572
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Tools
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
 Fix For: 2.0.8

 Attachments: 6572-trunk.diff


 Write sample mode gets us part way to testing new versions against a real 
 world workload, but we need an easy way to test the query side as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6487) Log WARN on large batch sizes

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969880#comment-13969880
 ] 

Benedict commented on CASSANDRA-6487:
-

I suggest using the ColumnFamily.dataSize() method as Aleksey suggested: in the 
BatchStatement.executeWithConditions() and executeWithoutConditions() methods 
we have access to the fully constructed ColumnFamily objects we will apply. In 
the former we construct a single CF _updates_, and in the latter we can iterate 
over each of the IMutations and call _getColumnFamilies()_.

Warning on the prepared size is probably not meaningful, because it does not 
say anything about how big the data we're applying is.
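A hedged sketch of the check being suggested: sum the serialized data size of every ColumnFamily the batch will apply, and warn past the configured threshold. `dataSize()` mirrors the `ColumnFamily.dataSize()` method named above; everything else (BatchSizeWarner, SizedUpdate) is illustrative, not Cassandra API:

```java
import java.util.List;

// Illustrative batch-size warning check; names other than dataSize() are made up.
public class BatchSizeWarner {
    // Stand-in for a ColumnFamily update; dataSize() mirrors ColumnFamily.dataSize().
    public interface SizedUpdate { long dataSize(); }

    private final long warnThresholdBytes;

    public BatchSizeWarner(long warnThresholdBytes) {
        this.warnThresholdBytes = warnThresholdBytes;
    }

    // Sum the actual data being applied, not the prepared statement size.
    public long totalSize(List<? extends SizedUpdate> updates) {
        long total = 0;
        for (SizedUpdate u : updates)
            total += u.dataSize();
        return total;
    }

    // Caller emits the WARN log line when this returns true.
    public boolean shouldWarn(List<? extends SizedUpdate> updates) {
        return totalSize(updates) > warnThresholdBytes;
    }
}
```

In `executeWithConditions()` this would see the single _updates_ CF; in `executeWithoutConditions()` it would iterate the CFs from each IMutation's `getColumnFamilies()`.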

 Log WARN on large batch sizes
 -

 Key: CASSANDRA-6487
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6487
 Project: Cassandra
  Issue Type: Improvement
Reporter: Patrick McFadin
Assignee: Lyuben Todorov
Priority: Minor
 Fix For: 2.0.8

 Attachments: 6487_trunk.patch, 6487_trunk_v2.patch, 
 cassandra-2.0-6487.diff


 Large batches on a coordinator can cause a lot of node stress. I propose 
 adding a WARN log entry if batch sizes go beyond a configurable size. This 
 will give more visibility to operators on something that can happen on the 
 developer side. 
 New yaml setting with 5k default.
 {{# Log WARN on any batch size exceeding this value. 5k by default.}}
 {{# Caution should be taken on increasing the size of this threshold as it 
 can lead to node instability.}}
 {{batch_size_warn_threshold: 5k}}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7040) Replace read/write stage with per-disk access coordination

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969882#comment-13969882
 ] 

Benedict commented on CASSANDRA-7040:
-

bq.  I don't think that's necessarily blocked by this work

Sure - and if you want to start building one right now, go to town :)

I only mean that I think it builds on the work here and in 5863, as they both 
involve intercepting the points at which we perform disk accesses and inserting 
some (minimal) coordination in between them. Swapping those interception points 
for something more intelligent is probably more straightforward once we've done 
that, and having a cache in which to deposit the result is _probably_ helpful 
too (definitely none of this is 100% essential though).

 Replace read/write stage with per-disk access coordination
 --

 Key: CASSANDRA-7040
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7040
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 As discussed in CASSANDRA-6995, current coordination of access to disk is 
 suboptimal: instead of ensuring disk accesses alone are coordinated, we 
 instead coordinate at the level of operations that may touch the disks, 
 ensuring only so many are proceeding at once. As such, tuning is difficult, 
 and we incur unnecessary delays for operations that would not touch the 
 disk(s).
 Ideally we would instead simply use a shared coordination primitive to gate 
 access to the disk when we perform a rebuffer. This work would dovetail very 
 nicely with any work in CASSANDRA-5863, as we could prevent any blocking or 
 context switching for data that we know to be cached. It also, as far as I 
 can tell, obviates the need for CASSANDRA-5239.
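 As a rough illustration of the idea (not actual Cassandra code, and all names here are hypothetical), gating only the rebuffer with a per-disk coordination primitive could look like:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Illustrative sketch: bound concurrency per disk at the point of actual disk
// access, rather than bounding whole operations in a read/write stage.
// Requests served from cache never touch the gate, so they incur no blocking.
public class PerDiskGate {
    private final ConcurrentMap<String, Semaphore> gates = new ConcurrentHashMap<>();
    private final int permitsPerDisk;

    public PerDiskGate(int permitsPerDisk) {
        this.permitsPerDisk = permitsPerDisk;
    }

    // Wrap only the code path that actually touches the device.
    public <T> T rebuffer(String disk, Supplier<T> diskRead) {
        Semaphore gate = gates.computeIfAbsent(disk, d -> new Semaphore(permitsPerDisk));
        gate.acquireUninterruptibly(); // at most permitsPerDisk concurrent accesses
        try {
            return diskRead.get();     // the actual read from this disk
        } finally {
            gate.release();
        }
    }
}
```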



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7034) commitlog files are 32MB in size, even with a 64bit OS and jvm

2014-04-15 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969886#comment-13969886
 ] 

Donald Smith commented on CASSANDRA-7034:
-

Benedict, I'm aware that *commitlog_total_space_in_mb* has that purpose. What 
I'm raising is that this comment in cassandra.yaml is now wrong: "the default 
size is 32 on 32-bit JVMs, and 1024 on 64-bit JVMs". That's no longer being 
enforced.

 commitlog files are 32MB in size, even with a 64bit  OS and jvm
 ---

 Key: CASSANDRA-7034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7034
 Project: Cassandra
  Issue Type: Bug
Reporter: Donald Smith

 We did a rpm install of cassandra 2.0.6 on CentOS 6.4 running 
 {noformat}
  java -version
 Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
 Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
 {noformat}
 That is the version of java CassandraDaemon is using.
 We used the default setting (None) in cassandra.yaml for 
 commitlog_total_space_in_mb:
 {noformat}
 # Total space to use for commitlogs.  Since commitlog segments are
 # mmapped, and hence use up address space, the default size is 32
 # on 32-bit JVMs, and 1024 on 64-bit JVMs.
 #
 # If space gets above this value (it will round up to the next nearest
 # segment multiple), Cassandra will flush every dirty CF in the oldest
 # segment and remove it.  So a small total commitlog space will tend
 # to cause more flush activity on less-active columnfamilies.
 # commitlog_total_space_in_mb: 4096
 {noformat}
 But our commitlog files are 32MB in size, not 1024MB.
 OpsCenter confirms that commitlog_total_space_in_mb is None.
 I don't think the problem is in cassandra-env.sh, because when I run it 
 manually and echo the  values of the version variables I get:
 {noformat}
 jvmver=1.7.0_40
 JVM_VERSION=1.7.0
 JVM_ARCH=64-Bit
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7034) commitlog files are 32MB in size, even with a 64bit OS and jvm

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969891#comment-13969891
 ] 

Benedict commented on CASSANDRA-7034:
-

Your statement is that your files are 32MB in size. This is correct. On all JVMs 
they should be 32MB in size, and there should be at most 32 of them on a 64-bit 
architecture, except when the data directories are behind the commit log, in 
which case there can be more. On a 32-bit architecture there would be only one 
commit log file.
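The arithmetic behind that, sketched for clarity (illustrative only, not the actual CommitLog code): segment files are always 32MB, and the yaml default controls the *total* space, i.e. how many segments may exist at once:

```java
// Illustrative sketch of the sizing described above; not Cassandra's code.
public class CommitLogSizing {
    static final int SEGMENT_SIZE_MB = 32; // every segment file is 32MB

    // Default total space: one segment's worth on 32-bit JVMs, 1024MB on 64-bit.
    static int defaultTotalSpaceMb(boolean is64Bit) {
        return is64Bit ? 1024 : SEGMENT_SIZE_MB;
    }

    // Total space divided by segment size gives the maximum segment count.
    static int maxSegments(int totalSpaceMb) {
        return Math.max(1, totalSpaceMb / SEGMENT_SIZE_MB);
    }
}
```

So 1024MB total / 32MB per segment = at most 32 files, each 32MB, on a 64-bit JVM.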

 commitlog files are 32MB in size, even with a 64bit  OS and jvm
 ---

 Key: CASSANDRA-7034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7034
 Project: Cassandra
  Issue Type: Bug
Reporter: Donald Smith

 We did a rpm install of cassandra 2.0.6 on CentOS 6.4 running 
 {noformat}
  java -version
 Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
 Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
 {noformat}
 That is the version of java CassandraDaemon is using.
 We used the default setting (None) in cassandra.yaml for 
 commitlog_total_space_in_mb:
 {noformat}
 # Total space to use for commitlogs.  Since commitlog segments are
 # mmapped, and hence use up address space, the default size is 32
 # on 32-bit JVMs, and 1024 on 64-bit JVMs.
 #
 # If space gets above this value (it will round up to the next nearest
 # segment multiple), Cassandra will flush every dirty CF in the oldest
 # segment and remove it.  So a small total commitlog space will tend
 # to cause more flush activity on less-active columnfamilies.
 # commitlog_total_space_in_mb: 4096
 {noformat}
 But our commitlog files are 32MB in size, not 1024MB.
 OpsCenter confirms that commitlog_total_space_in_mb is None.
 I don't think the problem is in cassandra-env.sh, because when I run it 
 manually and echo the  values of the version variables I get:
 {noformat}
 jvmver=1.7.0_40
 JVM_VERSION=1.7.0
 JVM_ARCH=64-Bit
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969927#comment-13969927
 ] 

Benedict commented on CASSANDRA-5863:
-

I think there are at least three issues we're contending with here, and each 
needs its own ticket (eventually). Putting historic data on slow drives is, I 
think, a different problem to putting a cache on some fast disks. Both will be 
helpful. Ideally I think we want the following tiers:

# Uncompressed Memory Cache
# Compressed Memory Cache (disjoint set from 1)
# Compressed SSD cache
# Regular Data
# Archived/Cold/Historic Data

The main distinction is the added regular data layer: any special fast 
disk cache should not store the full sstable hierarchy and its related files; 
it should just store the most popular blocks (or portions of blocks).

bq. Benedict you are describing building a custom page cache impl off heap 
which is pretty ambitious. Don't you think a baby step would be to rely on the 
OS page cache to start and build a custom one as a phase II?

People get very worried when they think they're competing with the kernel 
developers. Often for good reason, but since we don't have to be all things to 
all people we get the opportunity to make economies that aren't always as 
easily available to them. But also we only need to get roughly the same 
performance so we can build on this to make inroads elsewhere. What we're 
talking about here is pretty straightforward - it's one of the less 
challenging problems. A compressed page cache is more challenging, since we 
don't have a uniform size, but it is still probably not too difficult. Take a 
look at my suggestion for a key cache in CASSANDRA-6709 for a detailed 
description of how I would build the offheap structure.

The basic approach I would probably take is this: deal with 4Kb blocks. Any 
blocks we read from disk larger than this we split up into 4Kb chunks and 
insert each into the cache separately*. The cache itself is 8- or 16-way 
associative, with 3 components: a long storing the LRU information for the 
bucket, 16-longs storing identity information for the lookup within the bucket, 
and corresponding positions in a large address space storing each of the 4Kb 
data chunks. Readers always hit the cache, and if they miss they populate the 
cache using the appropriate reader before continuing. Regrettably we don't have 
access to SIMD instructions or we could do a lot of this stuff tremendously 
efficiently, but even without that it should be pretty nippy.

*This allows us to have a greater granularity for eviction and keeps cpu-cache 
traffic when reading from the cache to a minimum. It's also a pretty optimal 
size for reading/writing to SSD if we overflow to disk, and is a sufficiently 
large amount to get good compression for an in-memory compressed cache, whilst 
still being small enough to stream-decompress from main memory without a major 
penalty to look up a small part of it.
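A loose sketch of the set-associative layout described above, to make the three components concrete. Everything here (class name, bucket count, eviction) is illustrative; a real implementation would live off-heap and consult the packed LRU word on eviction:

```java
// Illustrative sketch of a 16-way set-associative chunk cache: per bucket,
// a packed LRU long, 16 identity longs for lookup, and 16 fixed 4KB chunks
// stored contiguously in one flat array. Not Cassandra code.
public class AssociativeChunkCache {
    static final int WAYS = 16;
    static final int CHUNK = 4096;

    private final int buckets;
    private final long[] ids;     // identity per slot; 0 means empty
    private final long[] lruWord; // per-bucket LRU state (unused in this sketch)
    private final byte[] data;    // buckets * WAYS * CHUNK of chunk storage

    public AssociativeChunkCache(int buckets) {
        this.buckets = buckets;
        this.ids = new long[buckets * WAYS];
        this.lruWord = new long[buckets];
        this.data = new byte[buckets * WAYS * CHUNK];
    }

    private int bucketOf(long id) {
        return (Long.hashCode(id) & 0x7fffffff) % buckets;
    }

    // Lookup: linear scan of the bucket's 16 identity longs; -1 on miss.
    public int find(long id) {
        int base = bucketOf(id) * WAYS;
        for (int way = 0; way < WAYS; way++)
            if (ids[base + way] == id)
                return base + way;
        return -1;
    }

    // Insert into the first empty (or matching) way; returns the slot index.
    public int insert(long id, byte[] chunk) {
        int base = bucketOf(id) * WAYS;
        for (int way = 0; way < WAYS; way++) {
            if (ids[base + way] == 0 || ids[base + way] == id) {
                ids[base + way] = id;
                System.arraycopy(chunk, 0, data, (base + way) * CHUNK,
                                 Math.min(chunk.length, CHUNK));
                return base + way;
            }
        }
        return -1; // bucket full; a real implementation evicts the LRU way here
    }
}
```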

As to having a fast disk cache, I also think this is a great idea. But I think 
it fits in as an extension of this and any compressed in-memory cache, as we 
build a tiered-cache architecture.

 In process (uncompressed) page cache
 

 Key: CASSANDRA-5863
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Pavel Yaskevich
  Labels: performance
 Fix For: 2.1 beta2


 Currently, for every read, the CRAR reads each compressed chunk into a 
 byte[], sends it to ICompressor, gets back another byte[] and verifies a 
 checksum.  
 This process is where the majority of time is spent in a read request.  
 Before compression, we would have zero-copy of data and could respond 
 directly from the page-cache.
 It would be useful to have some kind of Chunk cache that could speed up this 
 process for hot data. Initially this could be an off-heap cache, but it would 
 be great to put these decompressed chunks onto a SSD so the hot data lives on 
 a fast disk similar to https://github.com/facebook/flashcache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-7043) CommitLogArchiver thread pool name inconsistent with others

2014-04-15 Thread Chris Lohfink (JIRA)
Chris Lohfink created CASSANDRA-7043:


 Summary: CommitLogArchiver thread pool name inconsistent with 
others
 Key: CASSANDRA-7043
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7043
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Chris Lohfink
Priority: Trivial
 Attachments: namechange.diff

Pretty trivial... The names of all ThreadPoolExecutors are in CamelCase except 
the CommitLogArchiver, which appears as commitlog_archiver. This shows up a 
little more obviously in tpstats output:

{code}
nodetool tpstats

Pool Name                 Active   Pending   Completed   Blocked
ReadStage                      0         0      113702         0
RequestResponseStage           0         0           0         0
...
PendingRangeCalculator         0         0           1         0
commitlog_archiver             0         0           0         0
InternalResponseStage          0         0           0         0
HintedHandoff                  0         0           0         0
{code}

Seems minor enough to update this to CommitLogArchiver, but it may mean changes 
in monitoring applications (although I don't think this particular pool has 
seen much runtime or monitoring attention).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[3/3] git commit: Merge branch 'cassandra-2.1' into trunk

2014-04-15 Thread yukim
Merge branch 'cassandra-2.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2804ce99
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2804ce99
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2804ce99

Branch: refs/heads/trunk
Commit: 2804ce9945a83a696e36b4add7a684b132fdef7c
Parents: fc4ae11 de8a479
Author: Yuki Morishita yu...@apache.org
Authored: Tue Apr 15 17:15:01 2014 -0500
Committer: Yuki Morishita yu...@apache.org
Committed: Tue Apr 15 17:15:01 2014 -0500

--
 CHANGES.txt |  1 +
 .../apache/cassandra/db/ColumnFamilyStore.java  | 18 ++-
 .../repair/RepairMessageVerbHandler.java| 33 +---
 .../apache/cassandra/repair/SnapshotTask.java   |  8 +--
 .../repair/messages/RepairMessage.java  |  3 +-
 .../repair/messages/SnapshotMessage.java| 53 
 6 files changed, 100 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/2804ce99/CHANGES.txt
--



[2/3] git commit: Snapshot only related SSTables when sequential repair

2014-04-15 Thread yukim
Snapshot only related SSTables when sequential repair

patch by yukim; reviewed by jmckenzie for CASSANDRA-7024


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/de8a479f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/de8a479f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/de8a479f

Branch: refs/heads/trunk
Commit: de8a479f2e1a8b536dedf2e6470301709bc3d9dc
Parents: b69f5e3
Author: Yuki Morishita yu...@apache.org
Authored: Tue Apr 15 17:13:45 2014 -0500
Committer: Yuki Morishita yu...@apache.org
Committed: Tue Apr 15 17:13:45 2014 -0500

--
 CHANGES.txt |  1 +
 .../apache/cassandra/db/ColumnFamilyStore.java  | 18 ++-
 .../repair/RepairMessageVerbHandler.java| 33 +---
 .../apache/cassandra/repair/SnapshotTask.java   |  8 +--
 .../repair/messages/RepairMessage.java  |  3 +-
 .../repair/messages/SnapshotMessage.java| 53 
 6 files changed, 100 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/de8a479f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 592eef9..9f34023 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -45,6 +45,7 @@
  * Add failure handler to async callback (CASSANDRA-6747)
  * Fix AE when closing SSTable without releasing reference (CASSANDRA-7000)
  * Clean up IndexInfo on keyspace/table drops (CASSANDRA-6924)
+ * Only snapshot relative SSTables when sequential repair (CASSANDRA-7024)
 Merged from 2.0:
  * Put nodes in hibernate when join_ring is false (CASSANDRA-6961)
  * Allow compaction of system tables during startup (CASSANDRA-6913)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/de8a479f/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java 
b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index ffea243..923ea5b 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -30,6 +30,7 @@ import javax.management.*;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Function;
+import com.google.common.base.Predicate;
 import com.google.common.collect.*;
 import com.google.common.util.concurrent.*;
 import com.google.common.util.concurrent.Futures;
@@ -2153,6 +2154,11 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
 
 public void snapshotWithoutFlush(String snapshotName)
 {
+snapshotWithoutFlush(snapshotName, null);
+}
+
+public void snapshotWithoutFlush(String snapshotName, Predicate<SSTableReader> predicate)
+{
 for (ColumnFamilyStore cfs : concatWithIndexes())
 {
 DataTracker.View currentView = cfs.markCurrentViewReferenced();
@@ -2161,6 +2167,11 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
 {
 for (SSTableReader ssTable : currentView.sstables)
 {
+if (predicate != null && !predicate.apply(ssTable))
+{
+continue;
+}
+
 File snapshotDirectory = Directories.getSnapshotDirectory(ssTable.descriptor, snapshotName);
 ssTable.createLinks(snapshotDirectory.getPath()); // hard links
 if (logger.isDebugEnabled())
@@ -2190,8 +2201,13 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
  */
 public void snapshot(String snapshotName)
 {
+snapshot(snapshotName, null);
+}
+
+public void snapshot(String snapshotName, Predicate<SSTableReader> predicate)
+{
 forceBlockingFlush();
-snapshotWithoutFlush(snapshotName);
+snapshotWithoutFlush(snapshotName, predicate);
 }
 
 public boolean snapshotExists(String snapshotName)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/de8a479f/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
--
diff --git a/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java 
b/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
index bb66b69..d710652 100644
--- a/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
+++ b/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
@@ -18,30 +18,32 @@
 package org.apache.cassandra.repair;
 
 import java.util.ArrayList;
+import java.util.Collections;
 import java.util.List;
 import java.util.UUID;
 import java.util.concurrent.Future;
 

[1/3] git commit: Snapshot only related SSTables when sequential repair

2014-04-15 Thread yukim
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 b69f5e363 -> de8a479f2
  refs/heads/trunk fc4ae115a -> 2804ce994


Snapshot only related SSTables when sequential repair

patch by yukim; reviewed by jmckenzie for CASSANDRA-7024


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/de8a479f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/de8a479f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/de8a479f

Branch: refs/heads/cassandra-2.1
Commit: de8a479f2e1a8b536dedf2e6470301709bc3d9dc
Parents: b69f5e3
Author: Yuki Morishita yu...@apache.org
Authored: Tue Apr 15 17:13:45 2014 -0500
Committer: Yuki Morishita yu...@apache.org
Committed: Tue Apr 15 17:13:45 2014 -0500

--
 CHANGES.txt |  1 +
 .../apache/cassandra/db/ColumnFamilyStore.java  | 18 ++-
 .../repair/RepairMessageVerbHandler.java| 33 +---
 .../apache/cassandra/repair/SnapshotTask.java   |  8 +--
 .../repair/messages/RepairMessage.java  |  3 +-
 .../repair/messages/SnapshotMessage.java| 53 
 6 files changed, 100 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/de8a479f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 592eef9..9f34023 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -45,6 +45,7 @@
  * Add failure handler to async callback (CASSANDRA-6747)
  * Fix AE when closing SSTable without releasing reference (CASSANDRA-7000)
  * Clean up IndexInfo on keyspace/table drops (CASSANDRA-6924)
+ * Only snapshot relative SSTables when sequential repair (CASSANDRA-7024)
 Merged from 2.0:
  * Put nodes in hibernate when join_ring is false (CASSANDRA-6961)
  * Allow compaction of system tables during startup (CASSANDRA-6913)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/de8a479f/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java 
b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index ffea243..923ea5b 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -30,6 +30,7 @@ import javax.management.*;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Function;
+import com.google.common.base.Predicate;
 import com.google.common.collect.*;
 import com.google.common.util.concurrent.*;
 import com.google.common.util.concurrent.Futures;
@@ -2153,6 +2154,11 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
 
 public void snapshotWithoutFlush(String snapshotName)
 {
+snapshotWithoutFlush(snapshotName, null);
+}
+
+public void snapshotWithoutFlush(String snapshotName, Predicate<SSTableReader> predicate)
+{
 for (ColumnFamilyStore cfs : concatWithIndexes())
 {
 DataTracker.View currentView = cfs.markCurrentViewReferenced();
@@ -2161,6 +2167,11 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
 {
 for (SSTableReader ssTable : currentView.sstables)
 {
+if (predicate != null && !predicate.apply(ssTable))
+{
+continue;
+}
+
 File snapshotDirectory = Directories.getSnapshotDirectory(ssTable.descriptor, snapshotName);
 ssTable.createLinks(snapshotDirectory.getPath()); // hard links
 if (logger.isDebugEnabled())
@@ -2190,8 +2201,13 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
  */
 public void snapshot(String snapshotName)
 {
+snapshot(snapshotName, null);
+}
+
+public void snapshot(String snapshotName, Predicate<SSTableReader> predicate)
+{
 forceBlockingFlush();
-snapshotWithoutFlush(snapshotName);
+snapshotWithoutFlush(snapshotName, predicate);
 }
 
 public boolean snapshotExists(String snapshotName)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/de8a479f/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
--
diff --git a/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java 
b/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
index bb66b69..d710652 100644
--- a/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
+++ b/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
@@ -18,30 +18,32 @@
 package org.apache.cassandra.repair;
 
 import 

[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970143#comment-13970143
 ] 

Aleksey Yeschenko commented on CASSANDRA-6949:
--

Probably talking about this - 
https://github.com/apache/cassandra/blob/2804ce9945a83a696e36b4add7a684b132fdef7c/src/java/org/apache/cassandra/db/compaction/LazilyCompactedRow.java#L226-L230

 Performance regression in tombstone heavy workloads
 ---

 Key: CASSANDRA-6949
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremiah Jordan
Assignee: Sam Tunnicliffe
 Attachments: 
 0001-Remove-expansion-of-RangeTombstones-to-delete-from-2.patch, 6949.txt


 CASSANDRA-5614 causes a huge performance regression in tombstone heavy 
 workloads.  The isDeleted checks here cause a huge CPU overhead: 
 https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/AtomicSortedColumns.java#L189-L196
 An insert workload which does perfectly fine on 1.2, pegs CPU use at 100% on 
 2.0, with all of the mutation threads sitting in that loop.  For example:
 {noformat}
 MutationStage:20 daemon prio=10 tid=0x7fb1c4c72800 nid=0x2249 runnable 
 [0x7fb1b033]
java.lang.Thread.State: RUNNABLE
 at org.apache.cassandra.db.marshal.BytesType.bytesCompare(BytesType.java:45)
 at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:34)
 at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:26)
 at 
 org.apache.cassandra.db.marshal.AbstractType.compareCollectionMembers(AbstractType.java:267)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:85)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
 at 
 org.apache.cassandra.db.RangeTombstoneList.searchInternal(RangeTombstoneList.java:253)
 at 
 org.apache.cassandra.db.RangeTombstoneList.isDeleted(RangeTombstoneList.java:210)
 at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:136)
 at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:123)
 at 
 org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:193)
 at org.apache.cassandra.db.Memtable.resolve(Memtable.java:194)
 at org.apache.cassandra.db.Memtable.put(Memtable.java:158)
 at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:890)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
 at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:201)
 at 
 org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (CASSANDRA-7024) Create snapshot selectively during sequential repair

2014-04-15 Thread Yuki Morishita (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita resolved CASSANDRA-7024.
---

Resolution: Fixed

Thanks, committed.
And yes, it looks like SnapshotCommand is not used any more, but I'll leave 
it in for now.

 Create snapshot selectively during sequential repair 
 -

 Key: CASSANDRA-7024
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7024
 Project: Cassandra
  Issue Type: Improvement
Reporter: Yuki Morishita
Assignee: Yuki Morishita
Priority: Minor
 Fix For: 2.1 beta2

 Attachments: 
 0001-Only-snapshot-SSTables-related-to-validating-range.patch


 When doing snapshot repair, right now we snapshot all SSTables, open them and 
 use just part of them for building MerkleTree.
 Instead, we can snapshot and use only SSTables that are necessary to build 
 MerkleTree of interested range.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6936:
---

Assignee: (was: Benedict)

 Make all byte representations of types comparable by their unsigned byte 
 representation only
 

 Key: CASSANDRA-6936
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 This could be a painful change, but is necessary for implementing a 
 trie-based index, and settling for less would be suboptimal; it also should 
 make comparisons cheaper all-round, and since comparison operations are 
 pretty much the majority of C*'s business, this should be easily felt (see 
 CASSANDRA-6553 and CASSANDRA-6934 for an example of some minor changes with 
 major performance impacts). No copying/special casing/slicing should mean 
 fewer opportunities to introduce performance regressions as well.
 Since I have slated for 3.0 a lot of non-backwards-compatible sstable 
 changes, hopefully this shouldn't be too much more of a burden.
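 The target ordering is plain unsigned lexicographic byte comparison; a minimal, self-contained sketch of that comparison (illustrative, not the proposed Cassandra implementation):

```java
// Unsigned lexicographic byte comparison: the ordering all type
// serializations would need to sort under. Illustrative sketch only.
public class UnsignedBytes {
    public static int compare(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff); // compare bytes as unsigned
            if (d != 0)
                return d;
        }
        return a.length - b.length; // a shorter prefix sorts first
    }
}
```

 The payoff is that one loop like this (or a word-at-a-time equivalent) replaces every per-type compare, and it is exactly the ordering a trie-based index needs.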



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (CASSANDRA-6976) Determining replicas to query is very slow with large numbers of nodes or vnodes

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6976:
---

Assignee: (was: Benedict)

 Determining replicas to query is very slow with large numbers of nodes or 
 vnodes
 

 Key: CASSANDRA-6976
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6976
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 2.1


 As described in CASSANDRA-6906, this can be ~100ms for a relatively small 
 cluster with vnodes, which is longer than it will spend in transit on the 
 network. This should be much faster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (CASSANDRA-6935) Make clustering part of primary key a first order component in the storage engine

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6935:
---

Assignee: (was: Benedict)

 Make clustering part of primary key a first order component in the storage 
 engine
 -

 Key: CASSANDRA-6935
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6935
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0


 It would be helpful for a number of upcoming improvements if the clustering 
 part of the primary key were extracted from CellName, and if a ColumnFamily 
 object could store multiple ClusteredRow (or similar) instances, within which 
 each cell is keyed only by the column identifier.
 This would also, by itself, reduce on comparison costs and also permit memory 
 savings in memtables, by sharing the clustering part of the primary key 
 across all cells in the same row. It might also make it easier to move more 
 data off-heap, by constructing an off-heap clustered row, but keeping the 
 partition level object on-heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (CASSANDRA-6861) Eliminate garbage in server-side native transport

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6861:
---

Assignee: (was: Benedict)

 Eliminate garbage in server-side native transport
 -

 Key: CASSANDRA-6861
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6861
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Priority: Minor
  Labels: performance
 Fix For: 2.1 beta2


 Now that we've upgraded to Netty 4, we're generating a lot of garbage that 
 could be avoided, so we should stop doing so. It should be reasonably easy to 
 hook into Netty's pooled buffers, returning them to the pool once a given 
 message is completed.





[jira] [Assigned] (CASSANDRA-6726) Recycle CRAR/RAR buffers independently of their owners, and move them off-heap when possible

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6726:
---

Assignee: (was: Benedict)

 Recycle CRAR/RAR buffers independently of their owners, and move them 
 off-heap when possible
 

 Key: CASSANDRA-6726
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6726
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Priority: Minor
  Labels: performance
 Fix For: 3.0


 Whilst CRAR and RAR are pooled, we could and probably should pool their 
 buffers independently, so that they are not tied to a specific sstable. It may 
 be possible to move the RAR buffer off-heap, and sometimes the CRAR buffer too 
 (e.g. Snappy may support off-heap buffers).





[jira] [Assigned] (CASSANDRA-6755) Optimise CellName/Composite comparisons for NativeCell

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6755:
---

Assignee: (was: Benedict)

 Optimise CellName/Composite comparisons for NativeCell
 --

 Key: CASSANDRA-6755
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6755
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Priority: Minor
  Labels: performance
 Fix For: 3.0


 As discussed in CASSANDRA-6694, to reduce temporary garbage generation we 
 should minimise the incidence of CellName component extraction. The biggest 
 win will be to perform comparisons on Cell where possible, instead of 
 CellName, so that Native*Cell can use its extra information to avoid creating 
 any ByteBuffer objects.





[jira] [Assigned] (CASSANDRA-6809) Compressed Commit Log

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-6809:
---

Assignee: (was: Benedict)

 Compressed Commit Log
 -

 Key: CASSANDRA-6809
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Priority: Minor
  Labels: performance
 Fix For: 3.0


 It seems an unnecessary oversight that we don't compress the commit log. 
 Doing so should improve throughput, but some care will need to be taken to 
 ensure we use as much of a segment as possible. I propose decoupling the 
 writing of the records from the segments. Basically write into a (queue of) 
 DirectByteBuffer, and have the sync thread compress, say, ~64K chunks every X 
 MB written to the CL (where X is ordinarily CLS size), and then pack as many 
 of the compressed chunks into a CLS as possible.
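A rough sketch of the chunking half of that idea, using java.util.zip.Deflater as a stand-in codec (the chunk size, the codec, and the class name are all assumptions for illustration, not the eventual design):

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

public class ChunkedCompressor
{
    static final int CHUNK_SIZE = 64 * 1024; // ~64K chunks, as proposed above

    /** Split the pending commit log bytes into ~64K chunks and compress each
     *  one independently, so the sync thread can pack as many compressed
     *  chunks into a commit log segment as will fit. */
    public static List<byte[]> compressChunks(byte[] pending)
    {
        List<byte[]> out = new ArrayList<>();
        for (int off = 0; off < pending.length; off += CHUNK_SIZE)
        {
            int len = Math.min(CHUNK_SIZE, pending.length - off);
            Deflater deflater = new Deflater(Deflater.BEST_SPEED); // favour throughput over ratio
            deflater.setInput(pending, off, len);
            deflater.finish();
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished())
                bos.write(buf, 0, deflater.deflate(buf));
            deflater.end();
            out.add(bos.toByteArray());
        }
        return out;
    }
}
```

Because chunks are compressed independently, replay only needs to decompress the chunks covering the records it wants, and partially filled segments waste at most one chunk of slack.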





[jira] [Assigned] (CASSANDRA-5019) Still too much object allocation on reads

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-5019:
---

Assignee: (was: Benedict)

 Still too much object allocation on reads
 -

 Key: CASSANDRA-5019
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5019
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
  Labels: performance
 Fix For: 3.0


 ArrayBackedSortedColumns was a step in the right direction but it's still 
 relatively heavyweight thanks to allocating individual Columns.





[jira] [Assigned] (CASSANDRA-7029) Investigate alternative transport protocols for both client and inter-server communications

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reassigned CASSANDRA-7029:
---

Assignee: Benedict

 Investigate alternative transport protocols for both client and inter-server 
 communications
 ---

 Key: CASSANDRA-7029
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7029
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 There are a number of reasons to think we can do better than TCP for our 
 communications:
 1) We can actually tolerate sporadic small message losses, so guaranteed 
 delivery isn't essential (although for larger messages it probably is)
 2) As shown in \[1\] and \[2\], Linux can behave quite suboptimally with 
 regard to TCP message delivery when the system is under load. Judging from 
 the theoretical description, this is likely to apply even when the system 
 load is not high but the number of processes to schedule is high. 
 Cassandra generally has a lot of threads to schedule, so this is quite 
 pertinent for us. UDP performs substantially better here.
 3) Even when the system is not under load, UDP has a lower CPU burden, and 
 that burden is constant regardless of the number of connections it processes. 
 4) On a simple benchmark on my local PC, using non-blocking IO for UDP and 
 busy-spinning on IO, I can actually push 20-40% more throughput through 
 loopback (where TCP should be optimal, as there is no latency), even for very 
 small messages. Since we can see networking taking multiple CPUs' worth of 
 time during a stress test, using a busy-spin for ~100micros after the last 
 message receipt is almost certainly acceptable, especially as we can 
 (ultimately) process inter-server and client communications on the same 
 thread/socket in this model.
 5) We can optimise the threading model heavily: since we generally process 
 very small messages (200 bytes not at all implausible), the thread signalling 
 costs on the processing thread can actually dramatically impede throughput. 
 In general it costs ~10micros to signal (and passing the message to another 
 thread for processing in the current model requires signalling). For 200-byte 
 messages this caps our throughput at 20MB/s.
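The 20MB/s figure follows directly from the signalling cost: one ~10-microsecond wakeup per message bounds a single consumer to ~100k messages/s, which at 200 bytes each is 20MB/s. A throwaway calculation (purely illustrative):

```java
public class SignalCeiling
{
    // Upper bound on throughput when every message costs one thread signal.
    public static double maxThroughputMBps(int messageBytes, double signalMicros)
    {
        double messagesPerSecond = 1_000_000.0 / signalMicros; // one signal (so one message) per signalMicros
        return messagesPerSecond * messageBytes / 1_000_000.0; // bytes/s expressed in MB/s
    }

    public static void main(String[] args)
    {
        // 200-byte messages at ~10us per signal -> 20 MB/s ceiling
        System.out.println(maxThroughputMBps(200, 10.0));
    }
}
```

Halving the signalling cost, or batching several messages per signal, raises the ceiling proportionally, which is why eliminating per-message signalling matters so much more for small messages than large ones.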
 I propose to knock up a highly naive UDP-based connection protocol with 
 super-trivial congestion control over the course of a few days, with the only 
 initial goal being maximum possible performance (not fairness, reliability, 
 or anything else), and trial it in Netty (possibly making some changes to 
 Netty to mitigate thread signalling costs). The reason for knocking up our 
 own here is to get a ceiling on what the absolute limit of potential for this 
 approach is. Assuming this pans out with performance gains in C* proper, we 
 then look to contributing to/forking the udt-java project and see how easy it 
 is to bring performance in line with what we can get with our naive approach 
 (I don't suggest starting here, as the project is using blocking old-IO, and 
 modifying it with latency in mind may be challenging, and we won't know for 
 sure what the best case scenario is).
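A minimal sketch of the busy-spin receive idea using NIO's non-blocking DatagramChannel (the ~100-microsecond quiet window, the method name, and the buffer size are assumptions for illustration; a real implementation would sit inside Netty's event loop rather than a standalone method):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;

public class BusySpinReceive
{
    /**
     * Busy-spin on a non-blocking datagram channel, returning once no new
     * message has arrived for quietMicros. Returns the number of datagrams
     * seen, standing in for handing each one to processing on this thread.
     */
    public static int spinReceive(DatagramChannel channel, long quietMicros) throws IOException
    {
        ByteBuffer buf = ByteBuffer.allocateDirect(1500); // one MTU-sized receive buffer
        long lastReceipt = System.nanoTime();
        int received = 0;
        while (System.nanoTime() - lastReceipt < quietMicros * 1000)
        {
            buf.clear();
            if (channel.receive(buf) != null) // non-blocking: null when nothing is queued
            {
                received++;
                lastReceipt = System.nanoTime();
                // ...process the message on this same thread, no signalling...
            }
        }
        return received;
    }
}
```

The spin avoids both the syscall of a blocking receive and the ~10-microsecond wakeup cost described in point 5, at the price of burning a core while the window is open.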
 \[1\] 
 http://test-docdb.fnal.gov/0016/001648/002/Potential%20Performance%20Bottleneck%20in%20Linux%20TCP.PDF
 \[2\] 
 http://cd-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=1968;filename=Performance%20Analysis%20of%20Linux%20Networking%20-%20Packet%20Receiving%20(Official).pdf;version=2
 Further related reading:
 http://public.dhe.ibm.com/software/commerce/doc/mft/cdunix/41/UDTWhitepaper.pdf
 https://mospace.umsystem.edu/xmlui/bitstream/handle/10355/14482/ChoiUndPerTcp.pdf?sequence=1
 https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html
 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.153.3762&rep=rep1&type=pdf





[jira] [Updated] (CASSANDRA-5220) Repair improvements when using vnodes

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-5220:


Labels: performance repair  (was: )

 Repair improvements when using vnodes
 -

 Key: CASSANDRA-5220
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.2.0 beta 1
Reporter: Brandon Williams
Assignee: Yuki Morishita
  Labels: performance, repair
 Fix For: 2.1 beta2


 Currently when using vnodes, repair takes much longer to complete than 
 without them.  This appears at least in part to be because it's using a 
 session per range and processing them sequentially.  This generates a lot of 
 log spam with vnodes, and while sequential processing is gentler and lighter 
 on hard-disk deployments, SSD-based deployments would often prefer that repair 
 be as fast as possible.





[jira] [Updated] (CASSANDRA-6066) LHF 2i performance improvements

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-6066:


Labels: performance  (was: )

 LHF 2i performance improvements
 ---

 Key: CASSANDRA-6066
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6066
 Project: Cassandra
  Issue Type: Improvement
Reporter: Aleksey Yeschenko
Assignee: Lyuben Todorov
Priority: Minor
  Labels: performance
 Fix For: 2.0.8


 We should perform more aggressive paging over the index partition (costs us 
 nothing) and also fetch the rows from the base table in one slice query (at 
 least the ones belonging to the same partition).





[jira] [Updated] (CASSANDRA-6602) Compaction improvements to optimize time series data

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-6602:


Summary: Compaction improvements to optimize time series data  (was: 
Enhancements to optimize for the storing of time series data)

 Compaction improvements to optimize time series data
 

 Key: CASSANDRA-6602
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6602
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Tupshin Harper
  Labels: performance
 Fix For: 3.0


 There are some unique characteristics of many/most time series use cases that 
 both provide challenges, as well as provide unique opportunities for 
 optimizations.
 One of the major challenges is in compaction. The existing compaction 
 strategies will tend to re-compact data on disk at least a few times over the 
 lifespan of each data point, greatly increasing the cpu and IO costs of that 
 write.
 Compaction exists to
 1) ensure that there aren't too many files on disk
 2) ensure that data that should be contiguous (part of the same partition) is 
 laid out contiguously
 3) delete data due to TTLs or tombstones
 The special characteristics of time series data allow us to optimize away all 
 three.
 Time series data
 1) tends to be delivered in time order, with relatively constrained exceptions
 2) often has a pre-determined and fixed expiration date
 3) never gets deleted prior to its TTL
 4) has relatively predictable ingestion rates
 Note that I filed CASSANDRA-5561, and this ticket potentially replaces or 
 lowers the need for it. In that ticket, jbellis reasonably asks how that 
 compaction strategy is better than disabling compaction.
 Taking that to heart, here is a compaction-strategy-less approach that could 
 be extremely efficient for time-series use cases that follow the above 
 pattern.
 (For context, I'm thinking of an example use case involving lots of streams 
 of time-series data with a 5GB per day ingestion rate, and a 1000 day 
 retention with TTL, resulting in an eventual steady state of 5TB per node)
 1) You have an extremely large memtable (preferably off heap, if/when doable) 
 for the table, and that memtable is sized to be able to hold a lengthy window 
 of time. A typical period might be one day. At the end of that period, you 
 flush the contents of the memtable to an sstable and move to the next one. 
 This is basically identical to current behaviour, but with thresholds 
 adjusted so that you can ensure flushing at predictable intervals. (Open 
 question is whether predictable intervals is actually necessary, or whether 
 just waiting until the huge memtable is nearly full is sufficient)
 2) Combine the behaviour with CASSANDRA-5228 so that sstables will be 
 efficiently dropped once all of their columns have expired. (Another side 
 note: it might be valuable to have a modified version of CASSANDRA-3974 that 
 doesn't bother storing per-column TTL, since it is required that all columns 
 have the same TTL)
 3) Be able to mark column families as read/write only (no explicit deletes), 
 so no tombstones.
 4) Optionally add back an additional type of delete that would delete all 
 data earlier than a particular timestamp, resulting in immediate dropping of 
 obsoleted sstables.
 The result is that for in-order delivered data, Every cell will be laid out 
 optimally on disk on the first pass, and over the course of 1000 days and 5TB 
 of data, there will only be 1000 5GB sstables, so the number of filehandles 
 will be reasonable.
 For exceptions (out-of-order delivery), most cases will be caught by the 
 extended (24 hour+) memtable flush times and merged correctly automatically. 
 For those that were slightly askew at flush time, or were delivered so far 
 out of order that they go in the wrong sstable, there is relatively low 
 overhead to reading from two sstables for a time slice, instead of one, and 
 that overhead would be incurred relatively rarely unless out-of-order 
 delivery was the common case, in which case, this strategy should not be used.
 Another possible optimization to address out-of-order would be to maintain 
 more than one time-centric memtables in memory at a time (e.g. two 12 hour 
 ones), and then you always insert into whichever one of the two owns the 
 appropriate range of time. By delaying flushing the ahead one until we are 
 ready to roll writes over to a third one, we are able to avoid any 
 fragmentation as long as all deliveries come in no more than 12 hours late 
 (in this example, presumably tunable).
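The two-window scheme above can be sketched as follows (a TreeMap stands in for a real memtable; the rolling rule, names, and accessors are purely illustrative):

```java
import java.util.TreeMap;

// Illustrative only: two time-window "memtables". A write goes to whichever
// window owns its timestamp, and the older window is flushed only when writes
// advance past the newer one, so data arriving up to one window late still
// lands in memory instead of fragmenting an already-flushed sstable.
public class WindowedMemtables
{
    private final long windowMillis;
    private long aheadStart; // start of the newer ("ahead") window
    private TreeMap<Long, byte[]> behind = new TreeMap<>();
    private TreeMap<Long, byte[]> ahead = new TreeMap<>();
    private int flushedWindows = 0;

    public WindowedMemtables(long windowMillis, long now)
    {
        this.windowMillis = windowMillis;
        this.aheadStart = (now / windowMillis) * windowMillis;
    }

    public void put(long timestamp, byte[] value)
    {
        // Roll forward when a write lands beyond the ahead window: the behind
        // window is now two windows old and safe to flush.
        while (timestamp >= aheadStart + windowMillis)
        {
            flushedWindows++; // a real implementation would flush `behind` to an sstable here
            behind = ahead;
            ahead = new TreeMap<>();
            aheadStart += windowMillis;
        }
        (timestamp >= aheadStart ? ahead : behind).put(timestamp, value);
    }

    public int behindSize()     { return behind.size(); }
    public int aheadSize()      { return ahead.size(); }
    public int flushedWindows() { return flushedWindows; }
}
```

With 12-hour windows, any delivery no more than 12 hours late is merged in memory; only later arrivals fall through to a second sstable for that time slice.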
 Anything that triggers compactions will have to be looked at, since there 
 won't be any. The one concern I have is the ramification of repair. 
 Initially, at least, I think it would be acceptable to just write one 

[jira] [Commented] (CASSANDRA-6572) Workload recording / playback

2014-04-15 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970268#comment-13970268
 ] 

Aleksey Yeschenko commented on CASSANDRA-6572:
--

I'd say 3.0, with 2.1 being so close, and delayed.

 Workload recording / playback
 -

 Key: CASSANDRA-6572
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Tools
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
 Fix For: 2.0.8

 Attachments: 6572-trunk.diff


 Write sample mode gets us part way to testing new versions against a real 
 world workload, but we need an easy way to test the query side as well.





git commit: Allow cassandra to compile under java 8

2014-04-15 Thread dbrosius
Repository: cassandra
Updated Branches:
  refs/heads/trunk 2804ce994 -> 4d0691759


Allow cassandra to compile under java 8

patch by dbrosius reviewed by jmckenzie for cassandra-7028


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4d069175
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4d069175
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4d069175

Branch: refs/heads/trunk
Commit: 4d0691759a19f1faafe889d765145ae6a5096397
Parents: 2804ce9
Author: Dave Brosius dbros...@mebigfatguy.com
Authored: Tue Apr 15 20:36:16 2014 -0400
Committer: Dave Brosius dbros...@mebigfatguy.com
Committed: Tue Apr 15 20:38:32 2014 -0400

--
 CHANGES.txt                              |   1 +
 build.xml                                |  11 ---
 lib/antlr-3.2.jar                        | Bin 1928009 -> 0 bytes
 lib/antlr-runtime-3.5.2.jar              | Bin 0 -> 167761 bytes
 lib/licenses/antlr-3.2.txt               |  27 --
 lib/licenses/antlr-runtime-3.5.2.txt     |  27 ++
 lib/licenses/stringtemplate-4.0.2.txt    |  27 ++
 lib/stringtemplate-4.0.2.jar             | Bin 0 -> 226406 bytes
 src/java/org/apache/cassandra/cql3/Cql.g |  22 -
 9 files changed, 80 insertions(+), 35 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/4d069175/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index cbf82de..2fbf3ae 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -4,6 +4,7 @@
  * Remove CQL2 (CASSANDRA-5918)
  * Add Thrift get_multi_slice call (CASSANDRA-6757)
  * Optimize fetching multiple cells by name (CASSANDRA-6933)
+ * Allow compilation in java 8 (CASSANDRA-7028)
 
 
 2.1.0-beta2

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4d069175/build.xml
--
diff --git a/build.xml b/build.xml
index 8c4cb7b..9326424 100644
--- a/build.xml
+++ b/build.xml
@@ -190,7 +190,7 @@
 <target name="gen-cli-grammar" depends="check-gen-cli-grammar" unless="cliUpToDate">
   <echo>Building Grammar ${build.src.java}/org/apache/cassandra/cli/Cli.g</echo>
   <java classname="org.antlr.Tool"
-        classpath="${build.lib}/antlr-3.2.jar"
+        classpath="${build.dir.lib}/jars/antlr-3.5.2.jar;${build.lib}/antlr-runtime-3.5.2.jar;${build.lib}/stringtemplate-4.0.2.jar"
         fork="true"
         failonerror="true">
      <arg value="${build.src.java}/org/apache/cassandra/cli/Cli.g" />
@@ -211,7 +211,7 @@
 <target name="gen-cql3-grammar" depends="check-gen-cql3-grammar" unless="cql3current">
   <echo>Building Grammar ${build.src.java}/org/apache/cassandra/cql3/Cql.g ...</echo>
   <java classname="org.antlr.Tool"
-        classpath="${build.lib}/antlr-3.2.jar"
+        classpath="${build.dir.lib}/jars/antlr-3.5.2.jar;${build.lib}/antlr-runtime-3.5.2.jar;${build.lib}/stringtemplate-4.0.2.jar"
         fork="true"
         failonerror="true">
      <arg value="-Xconversiontimeout" />
@@ -330,7 +330,9 @@
   <dependency groupId="org.apache.commons" artifactId="commons-lang3" version="3.1"/>
   <dependency groupId="org.apache.commons" artifactId="commons-math3" version="3.2"/>
   <dependency groupId="com.googlecode.concurrentlinkedhashmap" artifactId="concurrentlinkedhashmap-lru" version="1.3"/>
-  <dependency groupId="org.antlr" artifactId="antlr" version="3.2"/>
+  <dependency groupId="org.antlr" artifactId="antlr" version="3.5.2"/>
+  <dependency groupId="org.antlr" artifactId="antlr-runtime" version="3.5.2"/>
+  <dependency groupId="org.antlr" artifactId="stringtemplate" version="4.0.2"/>
   <dependency groupId="org.slf4j" artifactId="slf4j-api" version="1.7.2"/>
   <dependency groupId="ch.qos.logback" artifactId="logback-core" version="1.1.12"/>
   <dependency groupId="ch.qos.logback" artifactId="logback-classic" version="1.1.12"/>
@@ -403,6 +405,7 @@
   <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster"/>
   <dependency groupId="org.apache.pig" artifactId="pig"/>
   <dependency groupId="com.google.code.findbugs" artifactId="jsr305"/>
+  <dependency groupId="org.antlr" artifactId="antlr"/>
   <dependency groupId="com.datastax.cassandra" artifactId="cassandra-driver-core"/>
 </artifact:pom>

@@ -444,6 +447,8 @@
 <dependency groupId="org.apache.commons" artifactId="commons-math3"/>
 <dependency groupId="com.googlecode.concurrentlinkedhashmap" artifactId="concurrentlinkedhashmap-lru"/>
 <dependency groupId="org.antlr" artifactId="antlr"/>
+<dependency groupId="org.antlr" artifactId="antlr-runtime"/>
+<dependency groupId="org.antlr" artifactId="stringtemplate" version="4.0.2"/>
 <dependency 

[jira] [Commented] (CASSANDRA-7030) Remove JEMallocAllocator

2014-04-15 Thread Vijay (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970297#comment-13970297
 ] 

Vijay commented on CASSANDRA-7030:
--

You are right, I had the synchronization in the test attached to the old ticket 
because initially we had some segfaults, which were fixed in later jemalloc 
releases; the synchronization was never committed into the cassandra repo 
because by then the issue was fixed.

Rerunning the test after removing the locks in the same old test classes, the 
time taken is much better with jemalloc; you might need more runs. The memory 
footprint is better too (malloc is slower and uses more memory comparatively, 
as per my tests).
http://pastebin.com/JtixVvGU

As mentioned earlier, I don't mind removing it either :)

 Remove JEMallocAllocator
 

 Key: CASSANDRA-7030
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7030
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
Priority: Minor
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: 7030.txt


 JEMalloc, whilst having some nice performance properties by comparison to 
 Doug Lea's standard malloc algorithm in principle, is pointless in practice 
 because of the JNA cost. In general it is around 30x more expensive to call 
 than unsafe.allocate(); malloc does not have a variability of response time 
 as extreme as the JNA overhead, so using JEMalloc in Cassandra is never a 
 sensible idea. I doubt if custom JNI would make it worthwhile either.
 I propose removing it.





[jira] [Updated] (CASSANDRA-7030) Remove JEMallocAllocator

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7030:


Attachment: benchmark.21.diff.txt

bq. As mentioned earlier i don't mind removing it either

Well, if it demonstrates an advantage I'd prefer to keep it still :-)

Could you try running my benchmark, so we can compare the more specific stats 
and rule out interference by CLHM? I'm particularly surprised that it is 
anything like as fast, let alone faster, given how dramatically slower it is on 
my box (36MB/s is laughable). It's possible I have an older version of jemalloc 
bundled with Ubuntu (I cannot run multi-threaded, but I think this is down to 
compile options), but I assume the only explanation for such awful performance 
is JNA.

I've attached a diff that should apply to 2.1.

 Remove JEMallocAllocator
 

 Key: CASSANDRA-7030
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7030
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
Priority: Minor
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: 7030.txt, benchmark.21.diff.txt


 JEMalloc, whilst having some nice performance properties by comparison to 
 Doug Lea's standard malloc algorithm in principle, is pointless in practice 
 because of the JNA cost. In general it is around 30x more expensive to call 
 than unsafe.allocate(); malloc does not have a variability of response time 
 as extreme as the JNA overhead, so using JEMalloc in Cassandra is never a 
 sensible idea. I doubt if custom JNI would make it worthwhile either.
 I propose removing it.





[jira] [Updated] (CASSANDRA-7042) Disk space growth until restart

2014-04-15 Thread Zach Aller (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach Aller updated CASSANDRA-7042:
--

Attachment: after.log
before.log

 Disk space growth until restart
 ---

 Key: CASSANDRA-7042
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7042
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04
 Sun Java 7
 Cassandra 2.0.6
Reporter: Zach Aller
Priority: Critical
 Attachments: after.log, before.log


 Cassandra constantly eats disk space; we're not sure what's causing it, and 
 the only thing that seems to fix it is a restart of Cassandra. This happens 
 about every 3-5 hrs: we grow from about 350GB to 650GB with no end in sight. 
 Once we restart Cassandra it usually all clears itself up and disks return to 
 normal for a while, then something triggers it and it starts climbing again. 
 Sometimes when we restart, compactions pending skyrocket, and if we restart a 
 second time the compactions pending drop back to a normal level. One other 
 thing to note is that the space is not freed until Cassandra starts back up, 
 not when it is shut down.
 I will get a clean log of before and after restarting next time it happens 
 and post it.
 Here is a common ERROR in our logs that might be related
 ERROR [CompactionExecutor:46] 2014-04-15 09:12:51,040 CassandraDaemon.java 
 (line 196) Exception in thread Thread[CompactionExecutor:46,1,main]
 java.lang.RuntimeException: java.io.FileNotFoundException: 
 /local-project/cassandra_data/data/wxgrid/grid/wxgrid-grid-jb-468677-Data.db 
 (No such file or directory)
 at 
 org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:53)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1355)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner.&lt;init&gt;(SSTableScanner.java:67)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1161)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1173)
 at 
 org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getScanners(LeveledCompactionStrategy.java:194)
 at 
 org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:258)
 at 
 org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:126)
 at 
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at 
 org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
 at 
 org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
 at 
 org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: java.io.FileNotFoundException: 
 /local-project/cassandra_data/data/wxgrid/grid/wxgrid-grid-jb-468677-Data.db 
 (No such file or directory)
 at java.io.RandomAccessFile.open(Native Method)
 at java.io.RandomAccessFile.&lt;init&gt;(Unknown Source)
 at 
 org.apache.cassandra.io.util.RandomAccessReader.&lt;init&gt;(RandomAccessReader.java:58)
 at 
 org.apache.cassandra.io.util.ThrottledReader.&lt;init&gt;(ThrottledReader.java:35)
 at 
 org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:49)
 ... 17 more





[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables

2014-04-15 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970462#comment-13970462
 ] 

Pavel Yaskevich commented on CASSANDRA-6694:


[~benedict] While working on trying to avoid usage of the Impl classes and 
looking closer at the code, I have a question which, knowing that the future is 
going to be totally off-heap, makes sense to ask now: the current Native*Cell 
classes re-use Impl code from static implementations of interfaces, but some 
methods, e.g. reconcile for Counter(Update)Cell, in certain conditions need to 
generate a new object (for now we allocate a BufferCounterCell, which allows us 
to use CounterCell.Impl.reconcile for both implementations). Do you have an 
action plan for the changes required in that regard for the next step in this 
series, when we are not going to copy things back to heap?

 Slightly More Off-Heap Memtables
 

 Key: CASSANDRA-6694
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2


 The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as 
 the on-heap overhead is still very large. It should not be tremendously 
 difficult to extend these changes so that we allocate entire Cells off-heap, 
 instead of multiple BBs per Cell (with all their associated overhead).
 The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6 
 bytes per cell on average for the btree overhead, for a total overhead of 
 around 20-22 bytes). This translates to 8-byte object overhead, 4-byte 
 address (we will do alignment tricks like the VM to allow us to address a 
 reasonably large memory space, although this trick is unlikely to last us 
 forever, at which point we will have to bite the bullet and accept a 24-byte 
 per cell overhead), and 4-byte object reference for maintaining our internal 
 list of allocations, which is unfortunately necessary since we cannot safely 
 (and cheaply) walk the object graph we allocate otherwise, which is necessary 
 for (allocation-) compaction and pointer rewriting.
 The ugliest thing here is going to be implementing the various CellName 
 instances so that they may be backed by native memory OR heap memory.
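The overhead arithmetic above, spelled out as a sketch (the constants mirror the figures in the description; actual JVM header sizes vary by VM and flags, so treat these as assumptions):

```java
public class CellOverheadBudget
{
    static final int OBJECT_HEADER = 8;   // per-cell Java object header
    static final int OFFHEAP_ADDRESS = 4; // compressed native address (via VM-style alignment tricks)
    static final int ALLOCATION_REF = 4;  // reference kept in the allocator's internal list

    // Target on-heap cost of one off-heap cell: 8 + 4 + 4 = 16 bytes.
    public static int perCellBytes()
    {
        return OBJECT_HEADER + OFFHEAP_ADDRESS + ALLOCATION_REF;
    }

    // Adding the amortised btree overhead (4-6 bytes) gives the 20-22 byte total.
    public static int perCellWithBtree(int btreeBytes)
    {
        return perCellBytes() + btreeBytes;
    }
}
```

If the 4-byte compressed address ever stops being sufficient, the address grows to 8 bytes and the per-cell budget becomes the 24 bytes mentioned above.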





[jira] [Commented] (CASSANDRA-6696) Drive replacement in JBOD can cause data to reappear.

2014-04-15 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969333#comment-13969333
 ] 

Marcus Eriksson commented on CASSANDRA-6696:


pushed a new version to 
https://github.com/krummas/cassandra/commits/marcuse/6696-3 which:

* adds nodetool command to rebalance data over disks so that user can do this 
whenever they want (like after manually adding sstables to the data directories)
* removes diskawarewriter from everything but streams and the rebalancing 
command
* makes the flush executor an array of executors.
* splits ranges based on total partitioner range and makes this feature 
vnodes-only
* supports the old way of doing things for non-vnodes setup (and ordered 
partitioners)

There are still some of my config changes left in, as I bet there will be more 
comments on this.

 Drive replacement in JBOD can cause data to reappear. 
 --

 Key: CASSANDRA-6696
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6696
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: sankalp kohli
Assignee: Marcus Eriksson
 Fix For: 3.0


 In JBOD, when someone gets a bad drive, the bad drive is replaced with a new 
 empty one and repair is run. 
 This can cause deleted data to come back in some cases. This is also true for 
 corrupt sstables, where we delete the corrupt sstable and run repair. 
 Here is an example:
 Say we have 3 nodes A,B and C and RF=3 and GC grace=10days. 
 row=sankalp col=sankalp is written 20 days back and successfully went to all 
 three nodes. 
 Then a delete/tombstone was written successfully for the same row column 15 
 days back. 
 Since this tombstone is older than gc grace, it was compacted away in nodes A 
 and B along with the actual data, so there is no trace of this row column in 
 nodes A and B.
 Now in node C, say the original data is in drive1 and tombstone is in drive2. 
 Compaction has not yet reclaimed the data and tombstone.  
 Drive2 becomes corrupt and was replaced with new empty drive. 
 Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp 
 has come back to life. 
 Now after replacing the drive we run repair. This data will be propagated to 
 all nodes. 
 Note: This is still a problem even if we run repair every gc grace. 
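The failure sequence above can be sketched as a toy last-write-wins merge (a deliberate simplification of Cassandra's actual reconciliation; all names here are hypothetical): once the tombstone survives on no replica, nothing shadows the older live cell any more, so repair propagates it back.

```java
import java.util.List;

// Toy model: each replica holds at most one Cell for the row/column;
// merge is last-write-wins, and value == null models a tombstone.
public class ResurrectionDemo
{
    public static final class Cell
    {
        public final String value;   // null => tombstone
        public final long timestamp; // highest timestamp wins
        public Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    static Cell merge(List<Cell> replicaStates)
    {
        Cell winner = null;
        for (Cell c : replicaStates)
            if (c != null && (winner == null || c.timestamp > winner.timestamp))
                winner = c;
        return winner;
    }

    // true if the merged result is live data rather than a tombstone
    public static boolean isLive(List<Cell> replicaStates)
    {
        Cell merged = merge(replicaStates);
        return merged != null && merged.value != null;
    }
}
```

With the tombstone present on node C the merge resolves to the deletion; with drive2 replaced by an empty disk, only the older live cell remains and the merge resurrects it, exactly as described above.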
  





[jira] [Commented] (CASSANDRA-7030) Remove JEMallocAllocator

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969377#comment-13969377
 ] 

Benedict commented on CASSANDRA-7030:
-

FTR, though, I think the problem with your test is that jemalloc is 
synchronised and malloc is not. This leads to the CLHM not obeying its limits 
as readily as it is asked to (it seems to keep ~3x as much data around in my 
test):

{noformat}
concurrent malloc:
Elapsed: 55.433s
Allocated: 2973Mb
VM total:177
vsz: 6221
rsz: 4501

synchronized malloc:
Elapsed: 96.507s
Allocated: 1026Mb
VM total:187
vsz: 3341
rsz: 1681

synchronized jemalloc:
Elapsed: 263.686s
Allocated: 1027Mb
VM total:192
vsz: 3628
rsz: 1525
{noformat}
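The "synchronized" variants above presumably serialise every native call through a single lock, which is roughly what a jemalloc build without per-thread arenas degenerates to. A minimal sketch of such a wrapper (the Allocator interface here is a stand-in, not Cassandra's IAllocator):

```java
// Wraps an allocator so that all allocate/free calls are serialised
// through the wrapper's monitor, modelling a fully-synchronised malloc.
public class SynchronizedAllocator
{
    public interface Allocator
    {
        long allocate(long size); // returns the native address ("peer")
        void free(long peer);
    }

    private final Allocator delegate;

    public SynchronizedAllocator(Allocator delegate) { this.delegate = delegate; }

    public synchronized long allocate(long size) { return delegate.allocate(size); }

    public synchronized void free(long peer) { delegate.free(peer); }
}
```

The benchmark numbers above show why this matters: serialising allocation slows the run down, but it also throttles producers enough that the cache's eviction listener can keep up, which is why the synchronised runs hold the expected ~1GB live while the concurrent run balloons to ~3GB.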

and for posterity, the code I was running:

{code}
public static void main(String[] args) throws InterruptedException, IOException
{
    String pid = ManagementFactory.getRuntimeMXBean().getName().split("@")[0];
    final IAllocator allocator = new NativeAllocator();
    final AtomicLong total = new AtomicLong();
    EvictionListener<UUID, Memory> listener = new EvictionListener<UUID, Memory>()
    {
        public void onEviction(UUID k, Memory mem)
        {
            total.addAndGet(-mem.size());
            mem.free(allocator);
        }
    };

    final Map<UUID, Memory> map = new ConcurrentLinkedHashMap.Builder<UUID, Memory>()
            .weigher(Weighers.<Memory>singleton())
            .initialCapacity(8 * 65536).maximumWeightedCapacity(2 * 65536)
            .listener(listener).build();
    final AtomicLong elapsed = new AtomicLong();
    final AtomicLong count = new AtomicLong();
    final ExecutorService exec = Executors.newFixedThreadPool(8);
    for (int i = 0 ; i < 8 ; i++)
    {
        final Random rand = new Random(i);
        exec.execute(new Runnable()
        {
            public void run()
            {
                byte[] keyBytes = new byte[16];
                for (int i = 0; i < 100; i++)
                {
                    int size = rand.nextInt(128 * 128);
                    if (size <= 0)
                        continue;
                    rand.nextBytes(keyBytes);
                    long start = System.nanoTime();
                    Memory mem = new Memory(allocator, size);
                    elapsed.addAndGet(System.nanoTime() - start);
                    mem.setMemory(0, mem.size(), (byte) 2);
                    Memory r = map.put(UUID.nameUUIDFromBytes(keyBytes), mem);
                    if (r != null)
                        r.free();
                    total.addAndGet(size);
                    if (count.incrementAndGet() % 1000 == 0)
                        System.out.println("1M");
                }
            }
        });
    }

    exec.shutdown();
    exec.awaitTermination(1L, TimeUnit.HOURS);
    System.out.println(String.format("Elapsed: %.3fs", elapsed.get() * 0.000000001d)); // ns -> s
    System.out.println(String.format("Allocated: %.0fMb", total.get() / (double) (1 << 20)));
    System.out.println(String.format("VM total:%.0f", Runtime.getRuntime().totalMemory() / (double) (1 << 20)));
    memuse("vsz", pid);
    memuse("rsz", pid);
    Thread.sleep(100);
}

private static void memuse(String type, String pid) throws IOException
{
    Process p = new ProcessBuilder().command("ps", "-o", type, pid).redirectErrorStream(true).start();
    BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
    reader.readLine(); // skip the ps header line
    System.out.println(String.format("%s: %.0f", type, Integer.parseInt(reader.readLine()) / 1024d));
}
{code}

 Remove JEMallocAllocator
 

 Key: CASSANDRA-7030
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7030
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
Priority: Minor
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: 7030.txt


 JEMalloc, whilst having some nice performance properties by comparison to 
 Doug Lea's standard malloc algorithm in principle, is pointless in practice 
 because of the JNA cost. In general it is around 30x more expensive to call 
 than unsafe.allocate(); malloc does not have a variability of response time 
 as extreme as the JNA overhead, so using JEMalloc in Cassandra is never a 
 sensible idea. I doubt if custom JNI would make it worthwhile either.
 I propose removing it.





[jira] [Reopened] (CASSANDRA-7030) Remove JEMallocAllocator

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict reopened CASSANDRA-7030:
-


I think there's actually a couple of questions we should answer before closing 
the ticket:

1) without JNI, should we be supporting jemalloc (it is slower and has higher 
overheads in all comparable workloads we can test)?
2) should we be synchronising on malloc/free for jemalloc? Or do we simply hope 
the user has compiled jemalloc in a manner that avoids the issue?

 Remove JEMallocAllocator
 

 Key: CASSANDRA-7030
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7030
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
Priority: Minor
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: 7030.txt


 JEMalloc, whilst having some nice performance properties by comparison to 
 Doug Lea's standard malloc algorithm in principle, is pointless in practice 
 because of the JNA cost. In general it is around 30x more expensive to call 
 than unsafe.allocate(); malloc does not have a variability of response time 
 as extreme as the JNA overhead, so using JEMalloc in Cassandra is never a 
 sensible idea. I doubt if custom JNI would make it worthwhile either.
 I propose removing it.





[jira] [Created] (CASSANDRA-7038) Nodetool rebuild_index requires named indexes argument

2014-04-15 Thread Sam Tunnicliffe (JIRA)
Sam Tunnicliffe created CASSANDRA-7038:
--

 Summary: Nodetool rebuild_index requires named indexes argument
 Key: CASSANDRA-7038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7038
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Trivial


In addition to explicitly listing the indexes to be rebuilt, nodetool 
rebuild_index will also accept just keyspace & columnfamily arguments, 
indicating that all indexes for that ks/cf should be rebuilt.
This doesn't actually work as CFS.rebuildSecondaryIndex requires the explicit 
list. In the 2 arg version, nodetool just passes an empty list here and so the 
rebuild becomes a no-op. As this has been the case since CASSANDRA-3860 
(AFAICT, 80ea03f is the commit that removed this) we may as well just remove 
the option from nodetool, patch attached to do that. 






[jira] [Updated] (CASSANDRA-3680) Add Support for Composite Secondary Indexes

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-3680:
---

Attachment: 7038-2.1.txt
7038-1.2.txt

 Add Support for Composite Secondary Indexes
 ---

 Key: CASSANDRA-3680
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3680
 Project: Cassandra
  Issue Type: Sub-task
Reporter: T Jake Luciani
Assignee: Sylvain Lebresne
  Labels: cql3, secondary_index
 Fix For: 1.2.0 beta 1

 Attachments: 0001-Secondary-indexes-on-composite-columns.txt


 CASSANDRA-2474 and CASSANDRA-3647 add the ability to transpose wide rows 
 differently; for efficiency and functionality, the secondary index API needs 
 to be altered to allow composite indexes.
 I think this will require the IndexManager api to have a 
 maybeIndex(ByteBuffer column) method that SS can call and implement a 
 PerRowSecondaryIndex per column, break the composite into parts and index 
 specific bits, also including the base rowkey.
 Then a search against a TRANSPOSED row or DOCUMENT will be possible.
  





[jira] [Updated] (CASSANDRA-7038) Nodetool rebuild_index requires named indexes argument

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-7038:
---

Attachment: 7038-2.1.txt
7038-1.2.txt

 Nodetool rebuild_index requires named indexes argument
 --

 Key: CASSANDRA-7038
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7038
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Trivial
 Attachments: 7038-1.2.txt, 7038-2.1.txt


 In addition to explicitly listing the indexes to be rebuilt, nodetool 
 rebuild_index will also accept just keyspace & columnfamily arguments, 
 indicating that all indexes for that ks/cf should be rebuilt.
 This doesn't actually work as CFS.rebuildSecondaryIndex requires the explicit 
 list. In the 2 arg version, nodetool just passes an empty list here and so 
 the rebuild becomes a no-op. As this has been the case since CASSANDRA-3860 
 (AFAICT, 80ea03f is the commit that removed this) we may as well just remove 
 the option from nodetool, patch attached to do that. 





[jira] [Updated] (CASSANDRA-3680) Add Support for Composite Secondary Indexes

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-3680:
---

Attachment: (was: 7038-2.1.txt)

 Add Support for Composite Secondary Indexes
 ---

 Key: CASSANDRA-3680
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3680
 Project: Cassandra
  Issue Type: Sub-task
Reporter: T Jake Luciani
Assignee: Sylvain Lebresne
  Labels: cql3, secondary_index
 Fix For: 1.2.0 beta 1

 Attachments: 0001-Secondary-indexes-on-composite-columns.txt


 CASSANDRA-2474 and CASSANDRA-3647 add the ability to transpose wide rows 
 differently; for efficiency and functionality, the secondary index API needs 
 to be altered to allow composite indexes.
 I think this will require the IndexManager api to have a 
 maybeIndex(ByteBuffer column) method that SS can call and implement a 
 PerRowSecondaryIndex per column, break the composite into parts and index 
 specific bits, also including the base rowkey.
 Then a search against a TRANSPOSED row or DOCUMENT will be possible.
  





[jira] [Commented] (CASSANDRA-7030) Remove JEMallocAllocator

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969419#comment-13969419
 ] 

Benedict commented on CASSANDRA-7030:
-

bq. This leads to the CLHM not obeying its limits as readily as it is asked to

Confirmed that the problem I am seeing with concurrent execution (and that I 
would guess is leading to your test results) is down to CLHM. By replacing the 
CLHM with an AtomicReferenceArray to guarantee the bounds I get:

{noformat}
concurrent malloc:
Total Elapsed: 9.708s
Allocate Elapsed: 21.271s
Free Elapsed: 26.023s
Total Allocated: 62483Mb
Rate: 1.290Gb/s
Live Allocated: 1020Mb
VM total:117
vsz: 3149
rsz: 1280

synchronized malloc:
Total Elapsed: 36.526s
Allocate Elapsed: 134.114s
Free Elapsed: 128.416s
Total Allocated: 62483Mb
Rate: 0.232Gb/s
Live Allocated: 1020Mb
VM total:117
vsz: 3213
rsz: 1427

synchronized jemalloc:
Total Elapsed: 217.113s
Allocate Elapsed: 162.753s
Free Elapsed: 1531.215s
Total Allocated: 62483Mb
Rate: 0.036Gb/s
Live Allocated: 1020Mb
VM total:70
vsz: 4084
rsz: 1410
{noformat}

Can you rerun your test with either synchronised malloc, or with an 
AtomicReferenceArray instead of the CLHM, to confirm?
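The AtomicReferenceArray replacement suggested above can be sketched as a fixed slot table: each key hashes to one slot, and a put atomically swaps out whatever occupied it, so live entries can never exceed the array length (at the cost of eviction on collision). A hypothetical sketch, not the actual harness:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hard-bounded "cache": capacity slots, one entry per slot, strict bound
// guaranteed by construction rather than by a lazily-enforced weigher.
public class BoundedSlots<K, V>
{
    private final AtomicReferenceArray<V> slots;

    public BoundedSlots(int capacity)
    {
        slots = new AtomicReferenceArray<V>(capacity);
    }

    // Insert and return the evicted occupant (null if the slot was empty),
    // so the caller can free its native memory immediately.
    public V put(K key, V value)
    {
        int slot = (key.hashCode() & 0x7fffffff) % slots.length();
        return slots.getAndSet(slot, value);
    }
}
```

Unlike the CLHM, which applies its weighted capacity asynchronously and can overshoot under producer pressure, this structure makes the memory bound exact, which is what isolates the allocator cost in the benchmark.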

Note I have reverted my position back to "let's get rid of jemalloc" - without 
more evidence to the contrary: the test I was running that initiated the 
creation of this ticket was measuring elapsed time for both allocate() *and* 
free(), and I dropped the latter from the tests based on your benchmark because 
it's difficult to time the free() calls (as they live in the eviction 
listener). Now I am timing both, and you can see the real-elapsed time and 
per-CPU elapsed times are dramatically higher for jemalloc once both are 
included. The cost of calling free() appears to be disproportionately higher 
for jemalloc.

Note the throughput rate for jemalloc: 36Mb/s. This is really really pathetic!

 Remove JEMallocAllocator
 

 Key: CASSANDRA-7030
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7030
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
Priority: Minor
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: 7030.txt


 JEMalloc, whilst having some nice performance properties by comparison to 
 Doug Lea's standard malloc algorithm in principle, is pointless in practice 
 because of the JNA cost. In general it is around 30x more expensive to call 
 than unsafe.allocate(); malloc does not have a variability of response time 
 as extreme as the JNA overhead, so using JEMalloc in Cassandra is never a 
 sensible idea. I doubt if custom JNI would make it worthwhile either.
 I propose removing it.





[jira] [Updated] (CASSANDRA-3680) Add Support for Composite Secondary Indexes

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-3680:
---

Attachment: (was: 7038-1.2.txt)

 Add Support for Composite Secondary Indexes
 ---

 Key: CASSANDRA-3680
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3680
 Project: Cassandra
  Issue Type: Sub-task
Reporter: T Jake Luciani
Assignee: Sylvain Lebresne
  Labels: cql3, secondary_index
 Fix For: 1.2.0 beta 1

 Attachments: 0001-Secondary-indexes-on-composite-columns.txt


 CASSANDRA-2474 and CASSANDRA-3647 add the ability to transpose wide rows 
 differently; for efficiency and functionality, the secondary index API needs 
 to be altered to allow composite indexes.
 I think this will require the IndexManager api to have a 
 maybeIndex(ByteBuffer column) method that SS can call and implement a 
 PerRowSecondaryIndex per column, break the composite into parts and index 
 specific bits, also including the base rowkey.
 Then a search against a TRANSPOSED row or DOCUMENT will be possible.
  





[jira] [Commented] (CASSANDRA-6924) Data Inserted Immediately After Secondary Index Creation is not Indexed

2014-04-15 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969428#comment-13969428
 ] 

Sam Tunnicliffe commented on CASSANDRA-6924:


This doesn't seem like a regression as the repro script fails for me just as 
consistently on 1.2.15 as it does on later versions.

The issue appears to be that when a ks or cf is dropped, we don't update 
system.IndexInfo to remove the entry for the 2i. Then when the ks/cf & index 
are recreated, we treat the index creation not as a brand new index, but as if 
we're restarting and linking in an existing index to the cf. So we skip the 
buildIndexAsync call that we should make which is what causes some entries to 
never get indexed. 

Fixing this so that we do clean up IndexInfo leads to us running into 
CASSANDRA-5202 on pre-2.1 branches. On 2.1, we see the issues mentioned in 
CASSANDRA-6959 so as Sylvain suggests there, the test needs to be changed to 
wait for schema agreement. This can be achieved with a 1s wait, or by actively 
testing for agreement. Now that the buildIndexAsync call is happening on index 
initialisation, we can insert this wait in one of two places: between the index 
creation and the inserts, or between the inserts and the reads. I've updated 
the dtest accordingly and added another variant which drops just the cf, rather 
than the entire ks (https://github.com/riptano/cassandra-dtest/pull/40). I do 
still see the errors from {{CommitLogSegmentManager}} on 2.1 detailed on 
CASSANDRA-6959 even after applying the patch attached to that issue.

Likewise, using Tyler's original repro script, a 1s sleep before commencing the 
reads is now enough to ensure the run succeeds (on the 2.1 branch).
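"Actively testing for agreement" rather than sleeping a fixed 1s can be sketched as a poll-until-deadline loop. The VersionSource interface below is a hypothetical stand-in for however the test harness collects one schema version string per node (e.g. by querying each node's system tables):

```java
import java.util.Set;

// Poll until every node reports the same schema version, or give up.
public class SchemaAgreement
{
    public interface VersionSource
    {
        Set<String> versions(); // the distinct schema versions currently reported
    }

    public static boolean await(VersionSource source, long timeoutMillis)
    {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline)
        {
            if (source.versions().size() == 1) // all nodes agree
                return true;
            try { Thread.sleep(50); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); return false; }
        }
        return false;
    }
}
```

The advantage over a bare sleep is that the test proceeds as soon as agreement is reached and fails fast (with a clear cause) when it never is.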

On trunk, I get completely different errors running both the dtest & repro.py, 
both with and without the IndexInfo fix:
{code}
ERROR [Thrift:1] 2014-04-14 15:45:10,714 CustomTThreadPoolServer.java:212 - 
Error occurred during processing of message.
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.IllegalArgumentException: fromIndex(34) > toIndex(25)
at 
org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:411) 
~[main/:na]
at 
org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:281)
 ~[main/:na]
at 
org.apache.cassandra.service.MigrationManager.announceColumnFamilyUpdate(MigrationManager.java:242)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.CreateIndexStatement.announceMigration(CreateIndexStatement.java:141)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:71)
 ~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:180)
 ~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:214) 
~[main/:na]
at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:204) 
~[main/:na]
at 
org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1973)
 ~[main/:na]
at 
org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4486)
 ~[thrift/:na]
at 
org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4470)
 ~[thrift/:na]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
~[libthrift-0.9.1.jar:0.9.1]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
~[libthrift-0.9.1.jar:0.9.1]
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
 ~[main/:na]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_51]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.util.concurrent.ExecutionException: 
java.lang.IllegalArgumentException: fromIndex(34) > toIndex(25)
at java.util.concurrent.FutureTask.report(FutureTask.java:122) 
~[na:1.7.0_51]
at java.util.concurrent.FutureTask.get(FutureTask.java:188) 
~[na:1.7.0_51]
at 
org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:407) 
~[main/:na]
... 16 common frames omitted
Caused by: java.lang.IllegalArgumentException: fromIndex(34) > toIndex(25)
at java.util.TimSort.rangeCheck(TimSort.java:921) ~[na:1.7.0_51]
at java.util.TimSort.sort(TimSort.java:182) ~[na:1.7.0_51]
at java.util.Arrays.sort(Arrays.java:727) ~[na:1.7.0_51]
at 
org.apache.cassandra.db.ArrayBackedSortedColumns.sortCells(ArrayBackedSortedColumns.java:113)
 ~[main/:na]
at 

[jira] [Updated] (CASSANDRA-6924) Data Inserted Immediately After Secondary Index Creation is not Indexed

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-6924:
---

Attachment: 6924-2.1.txt

 Data Inserted Immediately After Secondary Index Creation is not Indexed
 ---

 Key: CASSANDRA-6924
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6924
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Tyler Hobbs
Assignee: Sam Tunnicliffe
 Fix For: 2.0.7

 Attachments: 6924-2.1.txt, repro.py


 The head of the cassandra-1.2 branch (currently 1.2.16-tentative) contains a 
 regression from 1.2.15.  Data that is inserted immediately after secondary 
 index creation may never get indexed.
 You can reproduce the issue with a [pycassa integration 
 test|https://github.com/pycassa/pycassa/blob/master/tests/test_autopacking.py#L793]
  by running:
 {noformat}
 nosetests tests/test_autopacking.py:TestKeyValidators.test_get_indexed_slices
 {noformat}
 from the pycassa directory.
 The operation order goes like this:
 # create CF
 # create secondary index
 # insert data
 # query secondary index
 If a short sleep is added in between steps 2 and 3, the data gets indexed and 
 the query is successful.
 If a sleep is only added in between steps 3 and 4, some of the data is never 
 indexed and the query will return incomplete results.  This appears to be the 
 case even if the sleep is relatively long (30s), which makes me think the 
 data may never get indexed.





[jira] [Commented] (CASSANDRA-4718) More-efficient ExecutorService for improved throughput

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969476#comment-13969476
 ] 

Benedict commented on CASSANDRA-4718:
-

[~jasobrown]: Could you upload the full stress outputs for these runs? And also 
try running a separate stress run with a fixed high threadcount and op count?

In particular for CQL, the results in the file are a little bit weird. That 
said, given their consistency for thrift I don't doubt the result is 
meaningful, but it would be good to understand what we're incorporating a bit 
better before committing.

 More-efficient ExecutorService for improved throughput
 --

 Key: CASSANDRA-4718
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4718
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Jason Brown
Priority: Minor
  Labels: performance
 Fix For: 2.1

 Attachments: 4718-v1.patch, PerThreadQueue.java, baq vs trunk.png, op 
 costs of various queues.ods, stress op rate with various queues.ods, 
 v1-stress.out


 Currently all our execution stages dequeue tasks one at a time.  This can 
 result in contention between producers and consumers (although we do our best 
 to minimize this by using LinkedBlockingQueue).
 One approach to mitigating this would be to make consumer threads do more 
 work in bulk instead of just one task per dequeue.  (Producer threads tend 
 to be single-task oriented by nature, so I don't see an equivalent 
 opportunity there.)
 BlockingQueue has a drainTo(collection, int) method that would be perfect for 
 this.  However, no ExecutorService in the jdk supports using drainTo, nor 
 could I google one.
 What I would like to do here is create just such a beast and wire it into (at 
 least) the write and read stages.  (Other possible candidates for such an 
 optimization, such as the CommitLog and OutboundTCPConnection, are not 
 ExecutorService-based and will need to be one-offs.)
 AbstractExecutorService may be useful.  The implementations of 
 ICommitLogExecutorService may also be useful. (Despite the name these are not 
 actual ExecutorServices, although they share the most important properties of 
 one.)
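The drainTo batching described above can be sketched as a hypothetical consumer loop (not the eventual patch): take one task, then drain whatever else is queued in a single call, paying the queue's synchronisation cost once per batch instead of once per task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// One consumer pass: grab the head of the queue, then bulk-drain up to
// maxBatch - 1 more tasks and run them all. Returns the batch size.
public class BatchConsumer
{
    public static int runBatch(BlockingQueue<Runnable> queue, int maxBatch)
    {
        List<Runnable> batch = new ArrayList<Runnable>(maxBatch);
        Runnable first = queue.poll(); // a real stage would block with take()
        if (first == null)
            return 0;
        batch.add(first);
        queue.drainTo(batch, maxBatch - 1); // pull the rest in one bulk operation
        for (Runnable task : batch)
            task.run();
        return batch.size();
    }
}
```

Under load the queue is rarely empty, so most passes amortise one lock acquisition across many tasks; under light load it degenerates gracefully to one task per pass.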





git commit: Clean up IndexInfo on keyspace/table drops

2014-04-15 Thread aleksey
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 6658a6e03 -> b69f5e363


Clean up IndexInfo on keyspace/table drops

patch by Sam Tunnicliffe; reviewed by Aleksey Yeschenko for
CASSANDRA-6924


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b69f5e36
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b69f5e36
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b69f5e36

Branch: refs/heads/cassandra-2.1
Commit: b69f5e363b75543429a25b0909b45dff735c64b2
Parents: 6658a6e
Author: beobal s...@beobal.com
Authored: Mon Apr 14 20:08:31 2014 +0100
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Tue Apr 15 15:17:58 2014 +0300

--
 CHANGES.txt  | 1 +
 src/java/org/apache/cassandra/config/CFMetaData.java | 6 ++
 src/java/org/apache/cassandra/config/KSMetaData.java | 1 +
 3 files changed, 8 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b69f5e36/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index d7c6e71..592eef9 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -44,6 +44,7 @@
  * Ensure safe resource cleanup when replacing sstables (CASSANDRA-6912)
  * Add failure handler to async callback (CASSANDRA-6747)
  * Fix AE when closing SSTable without releasing reference (CASSANDRA-7000)
+ * Clean up IndexInfo on keyspace/table drops (CASSANDRA-6924)
 Merged from 2.0:
  * Put nodes in hibernate when join_ring is false (CASSANDRA-6961)
  * Allow compaction of system tables during startup (CASSANDRA-6913)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b69f5e36/src/java/org/apache/cassandra/config/CFMetaData.java
--
diff --git a/src/java/org/apache/cassandra/config/CFMetaData.java 
b/src/java/org/apache/cassandra/config/CFMetaData.java
index e930de4..72a0fc5 100644
--- a/src/java/org/apache/cassandra/config/CFMetaData.java
+++ b/src/java/org/apache/cassandra/config/CFMetaData.java
@@ -1585,6 +1585,12 @@ public final class CFMetaData
 for (TriggerDefinition td : triggers.values())
 td.deleteFromSchema(mutation, cfName, timestamp);
 
+for (String indexName : 
Keyspace.open(this.ksName).getColumnFamilyStore(this.cfName).getBuiltIndexes())
+{
+ColumnFamily indexCf = mutation.addOrGet(IndexCf);
+
indexCf.addTombstone(indexCf.getComparator().makeCellName(indexName), ldt, 
timestamp);
+}
+
 return mutation;
 }
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b69f5e36/src/java/org/apache/cassandra/config/KSMetaData.java
--
diff --git a/src/java/org/apache/cassandra/config/KSMetaData.java 
b/src/java/org/apache/cassandra/config/KSMetaData.java
index 3d1edb6..d0cb613 100644
--- a/src/java/org/apache/cassandra/config/KSMetaData.java
+++ b/src/java/org/apache/cassandra/config/KSMetaData.java
@@ -242,6 +242,7 @@ public final class KSMetaData
 mutation.delete(SystemKeyspace.SCHEMA_COLUMNS_CF, timestamp);
 mutation.delete(SystemKeyspace.SCHEMA_TRIGGERS_CF, timestamp);
 mutation.delete(SystemKeyspace.SCHEMA_USER_TYPES_CF, timestamp);
+mutation.delete(SystemKeyspace.INDEX_CF, timestamp);
 
 return mutation;
 }



[1/2] git commit: Clean up IndexInfo on keyspace/table drops

2014-04-15 Thread aleksey
Repository: cassandra
Updated Branches:
  refs/heads/trunk 6e97178a5 -> fc4ae115a


Clean up IndexInfo on keyspace/table drops

patch by Sam Tunnicliffe; reviewed by Aleksey Yeschenko for
CASSANDRA-6924


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b69f5e36
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b69f5e36
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b69f5e36

Branch: refs/heads/trunk
Commit: b69f5e363b75543429a25b0909b45dff735c64b2
Parents: 6658a6e
Author: beobal s...@beobal.com
Authored: Mon Apr 14 20:08:31 2014 +0100
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Tue Apr 15 15:17:58 2014 +0300

--
 CHANGES.txt  | 1 +
 src/java/org/apache/cassandra/config/CFMetaData.java | 6 ++
 src/java/org/apache/cassandra/config/KSMetaData.java | 1 +
 3 files changed, 8 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b69f5e36/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index d7c6e71..592eef9 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -44,6 +44,7 @@
  * Ensure safe resource cleanup when replacing sstables (CASSANDRA-6912)
  * Add failure handler to async callback (CASSANDRA-6747)
  * Fix AE when closing SSTable without releasing reference (CASSANDRA-7000)
+ * Clean up IndexInfo on keyspace/table drops (CASSANDRA-6924)
 Merged from 2.0:
  * Put nodes in hibernate when join_ring is false (CASSANDRA-6961)
  * Allow compaction of system tables during startup (CASSANDRA-6913)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b69f5e36/src/java/org/apache/cassandra/config/CFMetaData.java
--
diff --git a/src/java/org/apache/cassandra/config/CFMetaData.java 
b/src/java/org/apache/cassandra/config/CFMetaData.java
index e930de4..72a0fc5 100644
--- a/src/java/org/apache/cassandra/config/CFMetaData.java
+++ b/src/java/org/apache/cassandra/config/CFMetaData.java
@@ -1585,6 +1585,12 @@ public final class CFMetaData
 for (TriggerDefinition td : triggers.values())
 td.deleteFromSchema(mutation, cfName, timestamp);
 
+for (String indexName : 
Keyspace.open(this.ksName).getColumnFamilyStore(this.cfName).getBuiltIndexes())
+{
+ColumnFamily indexCf = mutation.addOrGet(IndexCf);
+
indexCf.addTombstone(indexCf.getComparator().makeCellName(indexName), ldt, 
timestamp);
+}
+
 return mutation;
 }
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b69f5e36/src/java/org/apache/cassandra/config/KSMetaData.java
--
diff --git a/src/java/org/apache/cassandra/config/KSMetaData.java 
b/src/java/org/apache/cassandra/config/KSMetaData.java
index 3d1edb6..d0cb613 100644
--- a/src/java/org/apache/cassandra/config/KSMetaData.java
+++ b/src/java/org/apache/cassandra/config/KSMetaData.java
@@ -242,6 +242,7 @@ public final class KSMetaData
 mutation.delete(SystemKeyspace.SCHEMA_COLUMNS_CF, timestamp);
 mutation.delete(SystemKeyspace.SCHEMA_TRIGGERS_CF, timestamp);
 mutation.delete(SystemKeyspace.SCHEMA_USER_TYPES_CF, timestamp);
+mutation.delete(SystemKeyspace.INDEX_CF, timestamp);
 
 return mutation;
 }



[2/2] git commit: Merge branch 'cassandra-2.1' into trunk

2014-04-15 Thread aleksey
Merge branch 'cassandra-2.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/fc4ae115
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/fc4ae115
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/fc4ae115

Branch: refs/heads/trunk
Commit: fc4ae115ac94b1599d308956590672eaca49e64d
Parents: 6e97178 b69f5e3
Author: Aleksey Yeschenko alek...@apache.org
Authored: Tue Apr 15 15:23:12 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Tue Apr 15 15:23:12 2014 +0300

--
 CHANGES.txt  | 1 +
 src/java/org/apache/cassandra/config/CFMetaData.java | 6 ++
 src/java/org/apache/cassandra/config/KSMetaData.java | 1 +
 3 files changed, 8 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc4ae115/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc4ae115/src/java/org/apache/cassandra/config/CFMetaData.java
--



[jira] [Created] (CASSANDRA-7039) DirectByteBuffer compatible LZ4 methods

2014-04-15 Thread Benedict (JIRA)
Benedict created CASSANDRA-7039:
---

 Summary: DirectByteBuffer compatible LZ4 methods
 Key: CASSANDRA-7039
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7039
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Priority: Minor


As we move more things off-heap, it's becoming more and more essential to be 
able to use DirectByteBuffer (or native pointers) in various places. 
Unfortunately LZ4 doesn't currently support this, despite being JNI based. 
This means we not only have to perform unnecessary copies to de/compress data 
from a DirectByteBuffer, but we can also stall GC: any JNI method operating 
over a Java array using GetPrimitiveArrayCritical enters a critical section 
that prevents GC for its duration. This means STW pauses will be at least as 
long as any running compression/decompression (and no GC will happen until 
they complete, so it's additive).

We should temporarily fork (and then resubmit upstream) jpountz-lz4 to support 
operating over a native pointer, so that we can pass a DBB or a raw pointer we 
have allocated ourselves. This will help improve performance when flushing the 
new offheap memtables, as well as enable us to implement CASSANDRA-6726 and 
finish CASSANDRA-4338.
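The copy overhead described above can be sketched as follows. Here `arrayOnlyCompress` is a hypothetical stand-in for an array-only JNI compressor such as the unforked jpountz-lz4 API; the sketch only illustrates why data in a DirectByteBuffer must first be copied onto the heap before such an API can consume it.

```java
import java.nio.ByteBuffer;

public class DirectBufferCopy {
    // Hypothetical stand-in for an array-only native compressor; a real one
    // would call into JNI here, potentially pinning the array via
    // GetPrimitiveArrayCritical and delaying GC for the duration.
    static byte[] arrayOnlyCompress(byte[] src) {
        return src.clone();
    }

    // To feed a DirectByteBuffer to an array-only API, its bytes must first
    // be copied onto the heap -- the overhead the ticket wants to eliminate.
    static byte[] compressDirect(ByteBuffer direct) {
        byte[] onHeap = new byte[direct.remaining()];
        direct.duplicate().get(onHeap); // copy: off-heap -> heap array
        return arrayOnlyCompress(onHeap);
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        direct.put(new byte[] {1, 2, 3, 4}).flip();
        System.out.println(compressDirect(direct).length); // 4
    }
}
```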



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (CASSANDRA-5020) Time to switch back to byte[] internally?

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict resolved CASSANDRA-5020.
-

Resolution: Not a Problem

This has most likely become not a problem as a result of movement towards 
off-heap memtables + cells, which bring the overheads down as low as we can go 
with a per-cell data structure.

 Time to switch back to byte[] internally?
 -

 Key: CASSANDRA-5020
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5020
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
  Labels: performance
 Fix For: 3.0


 We switched to ByteBuffer for column names and values back in 0.7, which gave 
 us a short term performance boost on mmap'd reads, but we gave that up when 
 we switched to refcounted sstables in 1.0.  (refcounting all the way up the 
 read path would be too painful, so we copy into an on-heap buffer when 
 reading from an sstable, then release the reference.)
 A HeapByteBuffer wastes a lot of memory compared to a byte[] (5 more ints, a 
 long, and a boolean).
 The hard problem here is how to do the arena allocation we do on writes, 
 which has been very successful in reducing STW CMS from heap fragmentation.  
 ByteBuffer is a good fit there.





[jira] [Updated] (CASSANDRA-7039) DirectByteBuffer compatible LZ4 methods

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-7039:


Fix Version/s: 3.0

 DirectByteBuffer compatible LZ4 methods
 ---

 Key: CASSANDRA-7039
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7039
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Priority: Minor
  Labels: performance
 Fix For: 3.0







[jira] [Updated] (CASSANDRA-6755) Optimise CellName/Composite comparisons for NativeCell

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-6755:


Summary: Optimise CellName/Composite comparisons for NativeCell  (was: 
Minimise extraction of CellName components)

 Optimise CellName/Composite comparisons for NativeCell
 --

 Key: CASSANDRA-6755
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6755
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Assignee: Benedict
Priority: Minor
  Labels: performance
 Fix For: 3.0


 As discussed in CASSANDRA-6694, to reduce temporary garbage generation we 
 should minimise the incidence of CellName component extraction. The biggest 
 win will be to perform comparisons on Cell where possible, instead of 
 CellName, so that Native*Cell can use its extra information to avoid creating 
 any ByteBuffer objects





[jira] [Commented] (CASSANDRA-6755) Optimise CellName/Composite comparisons for NativeCell

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969500#comment-13969500
 ] 

Benedict commented on CASSANDRA-6755:
-

An ideal solution would probably be modelled on the util.FastByteOperations 
class.
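A comparison primitive in the spirit of util.FastByteOperations might, in simplified form, look like the byte-at-a-time unsigned comparison below. This is illustrative only; the real class adds Unsafe word-sized fast paths and ByteBuffer overloads, which is what makes it attractive for NativeCell.

```java
public class ByteCompare {
    // Lexicographic unsigned-byte comparison over raw arrays, comparing
    // directly on offsets/lengths so no per-component ByteBuffer objects
    // need to be created for the comparison.
    static int compare(byte[] a, int aOff, int aLen,
                       byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            // Mask to treat bytes as unsigned values (0..255).
            int cmp = (a[aOff + i] & 0xFF) - (b[bOff + i] & 0xFF);
            if (cmp != 0)
                return cmp;
        }
        // Shared prefix equal: the shorter operand sorts first.
        return aLen - bLen;
    }
}
```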

 Optimise CellName/Composite comparisons for NativeCell
 --

 Key: CASSANDRA-6755
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6755
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benedict
Assignee: Benedict
Priority: Minor
  Labels: performance
 Fix For: 3.0







[jira] [Commented] (CASSANDRA-6487) Log WARN on large batch sizes

2014-04-15 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969501#comment-13969501
 ] 

Lyuben Todorov commented on CASSANDRA-6487:
---

Just noticed that we're actually already using the memory meter for checking 
batch size when it might get placed into the prepared statement cache, so why 
not log based on that value (calculated in 
{{BatchStatement#measureForPreparedCache}}). 

 Log WARN on large batch sizes
 -

 Key: CASSANDRA-6487
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6487
 Project: Cassandra
  Issue Type: Improvement
Reporter: Patrick McFadin
Assignee: Lyuben Todorov
Priority: Minor
 Fix For: 2.0.8

 Attachments: 6487_trunk.patch, 6487_trunk_v2.patch, 
 cassandra-2.0-6487.diff


 Large batches on a coordinator can cause a lot of node stress. I propose 
 adding a WARN log entry if batch sizes go beyond a configurable size. This 
 will give more visibility to operators on something that can happen on the 
 developer side. 
 New yaml setting with 5k default.
 {{# Log WARN on any batch size exceeding this value. 5k by default.}}
 {{# Caution should be taken on increasing the size of this threshold as it 
 can lead to node instability.}}
 {{batch_size_warn_threshold: 5k}}
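The proposed check could be sketched roughly as below. `BatchSizeGuard` and `shouldWarn` are hypothetical names, not the patch's actual code; the 5k default comes from the proposed yaml setting.

```java
public class BatchSizeGuard {
    // Default from the proposed batch_size_warn_threshold yaml setting (5k).
    static final long WARN_THRESHOLD_BYTES = 5 * 1024;

    // True when a batch's measured size exceeds the configured threshold.
    static boolean shouldWarn(long batchSizeBytes) {
        return batchSizeBytes > WARN_THRESHOLD_BYTES;
    }

    public static void main(String[] args) {
        long measured = 8 * 1024; // e.g. a memory-meter measurement of the batch
        if (shouldWarn(measured))
            System.out.println("WARN: batch of " + measured
                    + " bytes exceeds " + WARN_THRESHOLD_BYTES + " byte threshold");
    }
}
```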





[jira] [Created] (CASSANDRA-7040) Replace read/write stage with per-disk access coordination

2014-04-15 Thread Benedict (JIRA)
Benedict created CASSANDRA-7040:
---

 Summary: Replace read/write stage with per-disk access coordination
 Key: CASSANDRA-7040
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7040
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
 Fix For: 3.0


As discussed in CASSANDRA-6995, current coordination of access to disk is 
suboptimal: instead of ensuring disk accesses alone are coordinated, we instead 
coordinate at the level of operations that may touch the disks, ensuring only 
so many are proceeding at once. As such, tuning is difficult, and we incur 
unnecessary delays for operations that would not touch the disk(s).

Ideally we would instead simply use a shared coordination primitive to gate 
access to the disk when we perform a rebuffer. This work would dovetail very 
nicely with any work in CASSANDRA-5863, as we could prevent any blocking or 
context switching for data that we know to be cached. It also, as far as I can 
tell, obviates the need for CASSANDRA-5239.
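One minimal way to gate only the actual disk access, as suggested above, is a counting semaphore per disk that is acquired solely around the rebuffer. This is a sketch of the idea, not Cassandra code; the class and method names are invented for illustration.

```java
import java.util.concurrent.Semaphore;

public class DiskGate {
    // One permit pool per physical disk; only operations that actually hit
    // the disk (a rebuffer) take a permit, so cache-served reads never
    // block or context-switch.
    private final Semaphore permits;

    DiskGate(int concurrentAccesses) {
        permits = new Semaphore(concurrentAccesses);
    }

    byte[] read(boolean dataIsCached) {
        if (dataIsCached)
            return new byte[0]; // served without touching the disk: no gating
        permits.acquireUninterruptibly(); // gate only the real disk access
        try {
            return new byte[0]; // placeholder for the actual rebuffer/read
        } finally {
            permits.release();
        }
    }
}
```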





[jira] [Commented] (CASSANDRA-7040) Replace read/write stage with per-disk access coordination

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969514#comment-13969514
 ] 

Benedict commented on CASSANDRA-7040:
-

Further, once we have this, we can experiment with periodically locking access 
to the disks (for short, say 20-50ms periods) in order to let 
compactions/flushes catch up with any outstanding work, if they appear to be 
getting behind.

 Replace read/write stage with per-disk access coordination
 --

 Key: CASSANDRA-7040
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7040
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
  Labels: performance
 Fix For: 3.0







[jira] [Commented] (CASSANDRA-4718) More-efficient ExecutorService for improved throughput

2014-04-15 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969513#comment-13969513
 ] 

Jason Brown commented on CASSANDRA-4718:


OK, will give it a shot today. Also, just noticed I did not tune 
native_transport_max_threads at all (so I have the default of 128). Might play 
with that a bit, as well.

 More-efficient ExecutorService for improved throughput
 --

 Key: CASSANDRA-4718
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4718
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Jason Brown
Priority: Minor
  Labels: performance
 Fix For: 2.1

 Attachments: 4718-v1.patch, PerThreadQueue.java, baq vs trunk.png, op 
 costs of various queues.ods, stress op rate with various queues.ods, 
 v1-stress.out


 Currently all our execution stages dequeue tasks one at a time.  This can 
 result in contention between producers and consumers (although we do our best 
 to minimize this by using LinkedBlockingQueue).
 One approach to mitigating this would be to make consumer threads do more 
 work in bulk instead of just one task per dequeue.  (Producer threads tend 
 to be single-task oriented by nature, so I don't see an equivalent 
 opportunity there.)
 BlockingQueue has a drainTo(collection, int) method that would be perfect for 
 this.  However, no ExecutorService in the jdk supports using drainTo, nor 
 could I google one.
 What I would like to do here is create just such a beast and wire it into (at 
 least) the write and read stages.  (Other possible candidates for such an 
 optimization, such as the CommitLog and OutboundTCPConnection, are not 
 ExecutorService-based and will need to be one-offs.)
 AbstractExecutorService may be useful.  The implementations of 
 ICommitLogExecutorService may also be useful. (Despite the name these are not 
 actual ExecutorServices, although they share the most important properties of 
 one.)
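The drainTo-based consumer described above can be sketched as follows. This is a toy illustration of bulk dequeueing, not a drop-in ExecutorService; `BulkDequeue` and `drainAndRun` are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BulkDequeue {
    // Dequeue up to batchSize tasks in one operation instead of one task per
    // poll, reducing producer/consumer contention on the queue head.
    static int drainAndRun(BlockingQueue<Runnable> queue, int batchSize) {
        List<Runnable> batch = new ArrayList<>(batchSize);
        int drained = queue.drainTo(batch, batchSize);
        for (Runnable task : batch)
            task.run();
        return drained;
    }

    public static void main(String[] args) {
        BlockingQueue<Runnable> q = new LinkedBlockingQueue<>();
        for (int i = 0; i < 10; i++)
            q.add(() -> {});
        System.out.println(drainAndRun(q, 8)); // drains and runs 8 of the 10 tasks
    }
}
```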





[jira] [Commented] (CASSANDRA-6995) Execute local ONE/LOCAL_ONE reads on request thread instead of dispatching to read stage

2014-04-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969515#comment-13969515
 ] 

Benedict commented on CASSANDRA-6995:
-

I've split my suggestion out into another ticket: CASSANDRA-7040

 Execute local ONE/LOCAL_ONE reads on request thread instead of dispatching to 
 read stage
 

 Key: CASSANDRA-6995
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6995
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jason Brown
Assignee: Jason Brown
Priority: Minor
  Labels: performance
 Fix For: 2.0.7

 Attachments: 6995-v1.diff, syncread-stress.txt


 When performing a read local to a coordinator node, AbstractReadExecutor will 
 create a new SP.LocalReadRunnable and drop it into the read stage for 
 asynchronous execution. If you are using a client that intelligently routes 
 read requests to a node holding the data for a given request, and are using 
 CL.ONE/LOCAL_ONE, enqueuing the SP.LocalReadRunnable and waiting for the 
 context switches (and possible NUMA misses) adds unnecessary latency. We can 
 reduce that latency and improve throughput by avoiding the queueing and 
 thread context switching, simply executing the SP.LocalReadRunnable 
 synchronously on the request thread. Testing on a three node cluster (each 
 with 32 cpus, 132 GB ram) yields ~10% improvement in throughput and ~20% 
 speedup on avg/95/99 percentiles (99.9% was about 5-10% improvement).





[jira] [Comment Edited] (CASSANDRA-6487) Log WARN on large batch sizes

2014-04-15 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969501#comment-13969501
 ] 

Lyuben Todorov edited comment on CASSANDRA-6487 at 4/15/14 1:05 PM:


Just noticed that we're actually already using the memory meter for checking 
batch size when it might get placed into the prepared statement cache, so why 
not log based on that value (calculated in 
{{BatchStatement#measureForPreparedCache}}). As for non-prepared batch 
statements, there we can enforce a limit based on count of statements.


was (Author: lyubent):
Just noticed that we're actually already using the memory meter for checking 
batch size when it might get placed into the prepared statement cache, so why 
not log based on that value (calculated in 
{{BatchStatement#measureForPreparedCache}}). 

 Log WARN on large batch sizes
 -

 Key: CASSANDRA-6487
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6487
 Project: Cassandra
  Issue Type: Improvement
Reporter: Patrick McFadin
Assignee: Lyuben Todorov
Priority: Minor
 Fix For: 2.0.8

 Attachments: 6487_trunk.patch, 6487_trunk_v2.patch, 
 cassandra-2.0-6487.diff







[jira] [Updated] (CASSANDRA-5863) In process (uncompressed) page cache

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-5863:


Summary: In process (uncompressed) page cache  (was: Create a Decompressed 
Chunk [block] Cache)

 In process (uncompressed) page cache
 

 Key: CASSANDRA-5863
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: T Jake Luciani
Assignee: Pavel Yaskevich
  Labels: performance
 Fix For: 2.1 beta2


 Currently, for every read, the CRAR reads each compressed chunk into a 
 byte[], sends it to ICompressor, gets back another byte[] and verifies a 
 checksum.  
 This process is where the majority of time is spent in a read request.  
 Before compression, we would have zero-copy of data and could respond 
 directly from the page-cache.
 It would be useful to have some kind of Chunk cache that could speed up this 
 process for hot data. Initially this could be a off heap cache but it would 
 be great to put these decompressed chunks onto a SSD so the hot data lives on 
 a fast disk similar to https://github.com/facebook/flashcache.
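A minimal in-process cache of decompressed chunks, keyed by chunk offset, might look like the LRU sketch below. This is illustrative only: the proposal envisages off-heap or SSD-backed storage rather than heap byte arrays, and `ChunkCache` is an invented name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChunkCache {
    private final int maxChunks;
    // Access-ordered LinkedHashMap gives simple LRU eviction via
    // removeEldestEntry; keys are chunk offsets within the sstable.
    private final LinkedHashMap<Long, byte[]> chunks;

    ChunkCache(int max) {
        this.maxChunks = max;
        this.chunks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxChunks; // evict least-recently-used chunk
            }
        };
    }

    // Returns the cached decompressed chunk, or null on a miss.
    synchronized byte[] get(long offset) {
        return chunks.get(offset);
    }

    // Caches a freshly decompressed (and checksum-verified) chunk.
    synchronized void put(long offset, byte[] decompressed) {
        chunks.put(offset, decompressed);
    }
}
```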





[jira] [Updated] (CASSANDRA-6802) Row cache improvements

2014-04-15 Thread Benedict (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-6802:


Labels: performance  (was: )

 Row cache improvements
 --

 Key: CASSANDRA-6802
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6802
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
  Labels: performance
 Fix For: 3.0


 There are a few things we could do;
 * Start using the native memory constructs from CASSANDRA-6694 to avoid 
 serialization/deserialization costs and to minimize the on-heap overhead
 * Stop invalidating cached rows on writes (update on write instead).





[jira] [Created] (CASSANDRA-7041) Select query returns inconsistent result

2014-04-15 Thread Ngoc Minh Vo (JIRA)
Ngoc Minh Vo created CASSANDRA-7041:
---

 Summary: Select query returns inconsistent result
 Key: CASSANDRA-7041
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7041
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra v2.0.6 (upgraded from v2.0.3)
4-node cluster: Windows7, 12GB JVM
Reporter: Ngoc Minh Vo
Priority: Critical


Hello,

We are running in an issue with C* v2.0.x: CQL queries randomly return empty 
result.
Here is the scenario:
1. Schema:
{noformat}
CREATE TABLE string_values (
  date int,
  field text,
  value text,
  PRIMARY KEY ((date, field), value)
) WITH
  bloom_filter_fp_chance=0.10 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
{noformat}

2. There is no new data imported to the cluster during the test.

3. CQL query:
{noformat}
select * from string_values where date=20140122 and field='SCONYKSP1';
{noformat}

4. In Cqlsh, the same query has been executed several times during a short 
interval (~1-2 seconds). The first query results are empty and then we got the 
data. And from that point, we always get the correct result:
{noformat}
cqlsh:titan_test> select * from string_values where date=20140122 and 
field='SCONYKSP1';
(0 rows)
cqlsh:titan_test> select * from string_values where date=20140122 and 
field='SCONYKSP1';
(0 rows)
... ...
cqlsh:titan_test> select * from string_values where date=20140122 and 
field='SCONYKSP1';
(0 rows)
cqlsh:titan_test> select * from string_values where date=20140122 and 
field='SCONYKSP1';
(0 rows)
cqlsh:titan_test> select * from string_values where date=20140122 and 
field='SCONYKSP1';
 date     | field     | value
----------+-----------+-------------------------
 20140122 | SCONYKSP1 | 201401220251826297a_0_3
(1 rows)
cqlsh:titan_test> select * from string_values where date=20140122 and 
field='SCONYKSP1';
 date     | field     | value
----------+-----------+-------------------------
 20140122 | SCONYKSP1 | 201401220251826297a_0_3
(1 rows)
{noformat}

5. It might relate to some kind of warmup process. We tried to disable 
key/data caching but it does not help.

Upgrading cluster from v2.0.3 to v2.0.6 does not fix the issue (hence, not 
related to CASSANDRA-6555).

A while ago, we posted a report on the Java Driver JIRA: 
https://datastax-oss.atlassian.net/browse/JAVA-217. But it seems that the issue 
is on the server side.

Best regards,
Minh





[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969587#comment-13969587
 ] 

Jeremiah Jordan commented on CASSANDRA-6949:


This code doesn't seem to check whether there are actually indexes on the 
columns before doing all the range tombstone and isDeleted checks.  If all 
those checks are really needed, can we at least only do them if there is 
actually a 2i of some sort on the table?

 Performance regression in tombstone heavy workloads
 ---

 Key: CASSANDRA-6949
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremiah Jordan
Assignee: Sam Tunnicliffe

 CASSANDRA-5614 causes a huge performance regression in tombstone heavy 
 workloads.  The isDeleted checks here cause a huge CPU overhead: 
 https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/AtomicSortedColumns.java#L189-L196
 An insert workload which does perfectly fine on 1.2, pegs CPU use at 100% on 
 2.0, with all of the mutation threads sitting in that loop.  For example:
 {noformat}
 MutationStage:20 daemon prio=10 tid=0x7fb1c4c72800 nid=0x2249 runnable 
 [0x7fb1b033]
java.lang.Thread.State: RUNNABLE
 at org.apache.cassandra.db.marshal.BytesType.bytesCompare(BytesType.java:45)
 at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:34)
 at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:26)
 at 
 org.apache.cassandra.db.marshal.AbstractType.compareCollectionMembers(AbstractType.java:267)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:85)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
 at 
 org.apache.cassandra.db.RangeTombstoneList.searchInternal(RangeTombstoneList.java:253)
 at 
 org.apache.cassandra.db.RangeTombstoneList.isDeleted(RangeTombstoneList.java:210)
 at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:136)
 at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:123)
 at 
 org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:193)
 at org.apache.cassandra.db.Memtable.resolve(Memtable.java:194)
 at org.apache.cassandra.db.Memtable.put(Memtable.java:158)
 at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:890)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
 at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:201)
 at 
 org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 {noformat}





[jira] [Updated] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Jeremiah Jordan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Jordan updated CASSANDRA-6949:
---

Attachment: 6949.txt

Looks like we actually added that check in 2.1.  I don't know if there is more 
we want to do, but is it valid to just check
{noformat}
if (indexer != SecondaryIndexManager.nullUpdater && cm.deletionInfo().hasRanges())
{noformat}
instead of
{noformat}
if (cm.deletionInfo().hasRanges())
{noformat}

 Performance regression in tombstone heavy workloads
 ---

 Key: CASSANDRA-6949
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremiah Jordan
Assignee: Sam Tunnicliffe
 Attachments: 6949.txt







[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2014-04-15 Thread Richard Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969613#comment-13969613
 ] 

Richard Low commented on CASSANDRA-5220:


It's going to be a lot slower when there's little data because there is 
num_tokens times as much work to do. But when there is lots of data the times 
should be pretty much independent of num_tokens because most of repair is spent 
reading data and hashing. I ran some tests when we were developing vnodes 
(sorry, I don't have the data still available) and this was the case. Something 
might have regressed though.

 Repair improvements when using vnodes
 -

 Key: CASSANDRA-5220
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.2.0 beta 1
Reporter: Brandon Williams
Assignee: Yuki Morishita
 Fix For: 2.1 beta2


 Currently when using vnodes, repair takes much longer to complete than 
 without them.  This appears at least in part because it's using a session per 
 range and processing them sequentially.  This generates a lot of log spam 
 with vnodes, and while being gentler and lighter on hard disk deployments, 
 ssd-based deployments would often prefer that repair be as fast as possible.





[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Sergio Bossa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969655#comment-13969655
 ] 

Sergio Bossa commented on CASSANDRA-6949:
-

That's not enough: the PRSI doesn't get notified of column-level deletes (it 
doesn't need to be), so there would still be a performance regression in that 
case, even with that extra check.

 Performance regression in tombstone heavy workloads
 ---

 Key: CASSANDRA-6949
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6949
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremiah Jordan
Assignee: Jeremiah Jordan
 Attachments: 6949.txt


 CASSANDRA-5614 causes a huge performance regression in tombstone heavy 
 workloads.  The isDeleted checks here cause a huge CPU overhead: 
 https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/AtomicSortedColumns.java#L189-L196
 An insert workload which does perfectly fine on 1.2, pegs CPU use at 100% on 
 2.0, with all of the mutation threads sitting in that loop.  For example:
 {noformat}
 MutationStage:20 daemon prio=10 tid=0x7fb1c4c72800 nid=0x2249 runnable [0x7fb1b033]
    java.lang.Thread.State: RUNNABLE
 at org.apache.cassandra.db.marshal.BytesType.bytesCompare(BytesType.java:45)
 at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:34)
 at org.apache.cassandra.db.marshal.UTF8Type.compare(UTF8Type.java:26)
 at org.apache.cassandra.db.marshal.AbstractType.compareCollectionMembers(AbstractType.java:267)
 at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:85)
 at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
 at org.apache.cassandra.db.RangeTombstoneList.searchInternal(RangeTombstoneList.java:253)
 at org.apache.cassandra.db.RangeTombstoneList.isDeleted(RangeTombstoneList.java:210)
 at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:136)
 at org.apache.cassandra.db.DeletionInfo.isDeleted(DeletionInfo.java:123)
 at org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:193)
 at org.apache.cassandra.db.Memtable.resolve(Memtable.java:194)
 at org.apache.cassandra.db.Memtable.put(Memtable.java:158)
 at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:890)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
 at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:201)
 at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
 at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 {noformat}
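 The hot loop is easier to see in miniature. The following is a hypothetical 
 Java sketch (simplified stand-in types, not Cassandra's actual 
 RangeTombstoneList) of the per-cell isDeleted check that every memtable 
 insert pays under this code path:

```java
import java.util.List;

// Hypothetical sketch (NOT Cassandra's real code) of why per-cell
// tombstone checks are costly: for EVERY cell added to the memtable,
// a search over the partition's range tombstones is performed, so an
// insert-heavy workload pays O(cells * log(tombstones)) comparisons.
public class TombstoneCheckSketch {
    // A range tombstone covering [start, end], both inclusive.
    record Range(int start, int end) {}

    // Simplified analogue of RangeTombstoneList.isDeleted: binary-search
    // the sorted, non-overlapping ranges for one covering the cell.
    static boolean isDeleted(List<Range> sorted, int cell) {
        int lo = 0, hi = sorted.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Range r = sorted.get(mid);
            if (cell < r.start) hi = mid - 1;
            else if (cell > r.end) lo = mid + 1;
            else return true; // cell falls inside this tombstone range
        }
        return false;
    }

    public static void main(String[] args) {
        List<Range> tombstones = List.of(new Range(10, 19), new Range(40, 49));
        System.out.println(isDeleted(tombstones, 15)); // true: covered
        System.out.println(isDeleted(tombstones, 25)); // false: not covered
    }
}
```

 Each search is cheap on its own; the regression comes from running it (with 
 comparator calls like the AbstractCompositeType.compare frames above) once 
 per cell on every mutation.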



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-6949:
--

Assignee: Sam Tunnicliffe  (was: Jeremiah Jordan)



[jira] [Updated] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-6949:
---

Attachment: 0001-Remove-expansion-of-RangeTombstones-to-delete-from-2.patch



[jira] [Commented] (CASSANDRA-6949) Performance regression in tombstone heavy workloads

2014-04-15 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969692#comment-13969692
 ] 

Sam Tunnicliffe commented on CASSANDRA-6949:


That will help in the simple case where there are no indexes defined for the 
table, but it won't make a difference if there are. In other words, if the 
table has any indexes defined (including PerRowSecondaryIndexes, for which the 
specifics of the update are meaningless), we'll still iterate over every cell 
in that partition in the memtable to check that it's not covered by the range 
tombstone.
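
To make that concrete, here's a hypothetical sketch (simplified types, not the 
real AtomicSortedColumns code) of the scan just described: applying one range 
tombstone walks every existing cell in the memtable partition so that covered 
cells can be reported to the index for cleanup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch (NOT the real AtomicSortedColumns code) of the
// scan performed when any index is defined: applying a range tombstone
// visits EVERY existing cell in the partition to find the covered ones,
// so each such mutation costs O(partition size).
public class IndexCleanupSketch {
    // Returns the cells covered by the tombstone range [start, end].
    static List<Integer> coveredCells(SortedSet<Integer> partition,
                                      int start, int end) {
        List<Integer> covered = new ArrayList<>();
        for (int cell : partition) {          // full scan of the partition
            if (cell >= start && cell <= end) // tombstone covers this cell
                covered.add(cell);
        }
        return covered;
    }

    public static void main(String[] args) {
        SortedSet<Integer> partition = new TreeSet<>(List.of(1, 5, 12, 30));
        System.out.println(coveredCells(partition, 4, 13)); // [5, 12]
    }
}
```

The wider the partition, the more cells each range-tombstone write has to 
visit, which is why wide, tombstone-heavy partitions are hit hardest.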

Personally, I'd prefer to revert the change to AtomicSortedColumns from 
CASSANDRA-5614 completely. It isn't necessary to ensure correctness in either 
KeysIndex or CompositesIndex as the repair-on-read behaviour cleans up any 
stale index entries (as does compaction). Given that, it doesn't seem worth the 
performance hit to ensure the 2i is kept absolutely in sync like this.

Attaching a patch against 2.0 to remove the ASC changes from 5614.

