date:20140417

[
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972524#comment-13972524
]

Benedict commented on CASSANDRA-6694:
-

So, on the whole I really don't perceive this approach as better: there's now a
great deal of code duplication between each of the correspondingly named cell
implementations (set to get worse still when you finish the refactor for
DecoratedKey). Personally I think the Impl approach is neater as a result of
avoiding that (this may be more pronounced if we decide to optimise equals() as
you suggested). That said, if this moves us forwards I can live with it, if you
can address point 1 below.

There are a few problems though:

# I am *very* opposed to a public setPeer() method. This is a deal breaker for
me - but it can be avoided with a bit more refactoring.
# Your optimised updateDigest function is actually much slower than the old
implementation for all but the smallest values: an optimised version needs to
batch the contents into an array (stored in a ThreadLocal) and call
updateDigest with the array, unless the total size is very small (there's a
crossover point on my laptop of about 12 bytes, under which it's faster to call
update(byte)).
# AbstractNativeCell.getBytes actually calls setBytes
# excessHeapSize... should be unsharedHeapSize...
# There should be no hashCode method in Buffer\*Cell - I removed these for a
reason. Because we can have a Cell that is a CellName, and vice-versa, using a
Cell as a key for a map is likely dangerous. Since we don't do it anywhere,
it's safe to simply remove the methods.
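
A minimal sketch of the batching described in point 2, not the code from either branch. The class name, the 4096-byte scratch size and the use of a direct ByteBuffer as a stand-in for native memory are all assumptions made for illustration; the 12-byte threshold is simply the crossover quoted above and would need measuring on the target machine.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestBatching
{
    // Crossover quoted in the comment above; an assumption that needs measuring per machine.
    static final int SMALL_THRESHOLD = 12;
    // Hypothetical scratch size, chosen only for illustration.
    static final int CHUNK_SIZE = 4096;

    // One scratch array per thread, so concurrent callers never share a buffer.
    private static final ThreadLocal<byte[]> SCRATCH =
        ThreadLocal.withInitial(() -> new byte[CHUNK_SIZE]);

    /** Feeds the remaining bytes of src into digest without moving src's position. */
    static void updateDigest(MessageDigest digest, ByteBuffer src)
    {
        ByteBuffer view = src.duplicate();
        if (view.remaining() < SMALL_THRESHOLD)
        {
            // Tiny values: the comment above reports per-byte updates as faster here.
            while (view.hasRemaining())
                digest.update(view.get());
            return;
        }
        byte[] scratch = SCRATCH.get();
        while (view.hasRemaining())
        {
            int n = Math.min(view.remaining(), scratch.length);
            view.get(scratch, 0, n);      // bulk copy out of (possibly off-heap) memory
            digest.update(scratch, 0, n); // one digest call per chunk
        }
    }

    /** MD5 of src via updateDigest; wraps the checked exception for brevity. */
    static byte[] md5Of(ByteBuffer src)
    {
        try
        {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            updateDigest(digest, src);
            return digest.digest();
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        ByteBuffer offHeap = ByteBuffer.allocateDirect(10_000);
        for (int i = 0; i < 10_000; i++)
            offHeap.put((byte) i);
        offHeap.flip();
        System.out.println(md5Of(offHeap).length); // MD5 digests are 16 bytes
    }
}
```

Working on a duplicate() of the buffer keeps the caller's position untouched, which matters if the same cell is digested more than once.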

There may be other minor issues, I'll hold off giving it a formal review until
we decide the direction we're going. To respond to a few of your comments:

bq. CounterUpdateCell interface is missing as well as NativeCounterUpdateCell
implementation to match it.

There shouldn't be one for the time being - we can never construct one.

bq. CounterUpdateCell should be BufferCounterUpdateCell as it extends BufferCell

Same reason - it doesn't exist as either/or, so I made a conscious decision to
leave it as a CounterUpdateCell: the fact that it extends BufferCell is kind of
unimportant. Its purpose is somewhat different, and I think it is better left
named CounterUpdateCell, as that is its purpose (to carry a counter update as
far as the memtable, and no further).

bq. Impl classes extends another Impl classes which doesn't make much sense as
all of the methods are static.

This brings in the namespace of the extended class' static methods, which is
useful.
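
To illustrate the point about namespaces, here is a small sketch with hypothetical class names and made-up sizes (they are not the real Cassandra Impl classes). Extending a class brings its static methods into scope unqualified, even though statics are hidden rather than overridden.

```java
final class StaticNamespaceDemo
{
    // Hypothetical helper classes; names and sizes are illustrative only.
    static class CellImpl
    {
        static int dataSize(int nameSize, int valueSize)
        {
            return nameSize + valueSize + 8; // 8 bytes assumed for a timestamp
        }
    }

    static class ExpiringCellImpl extends CellImpl
    {
        static int expiringDataSize(int nameSize, int valueSize)
        {
            // dataSize(...) resolves to CellImpl.dataSize without qualification,
            // because static members of the superclass are in scope here.
            return dataSize(nameSize, valueSize) + 8; // 8 more bytes assumed for TTL fields
        }
    }

    public static void main(String[] args)
    {
        System.out.println(ExpiringCellImpl.expiringDataSize(4, 10)); // 30
        // The inherited static is also reachable through the subclass name.
        System.out.println(ExpiringCellImpl.dataSize(4, 10)); // 22
    }
}
```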

bq. When taken out of context like that it doesn't really make sense but what I
meant, there are situation where we don't really need to get BB from the
CellName but can transfer bytes directly (especially for the native cell
implementations).

Sure, but again: scope of ticket, and care needs to be taken when doing this
(e.g. your updateDigest modifications)

Slightly More Off-Heap Memtables

Key: CASSANDRA-6694
URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Benedict
Labels: performance
Fix For: 2.1 beta2

The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as
the on-heap overhead is still very large. It should not be tremendously
difficult to extend these changes so that we allocate entire Cells off-heap,
instead of multiple BBs per Cell (with all their associated overhead).
The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6
bytes per cell on average for the btree overhead, for a total overhead of
around 20-22 bytes). This translates to 8-byte object overhead, 4-byte
address (we will do alignment tricks like the VM to allow us to address a
reasonably large memory space, although this trick is unlikely to last us
forever, at which point we will have to bite the bullet and accept a 24-byte
per cell overhead), and 4-byte object reference for maintaining our internal
list of allocations, which is unfortunately necessary since we cannot safely
(and cheaply) walk the object graph we allocate otherwise, which is necessary
for (allocation-) compaction and pointer rewriting.
The ugliest thing here is going to be implementing the various CellName
instances so that they may be backed by native memory OR heap memory.
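
The arithmetic in the description can be sketched as follows. This is only an illustration: the 8-byte alignment (a 3-bit shift, as with compressed object pointers in the JVM) and every name here are assumptions, since the description says only that alignment tricks will be used.

```java
final class CompressedAddress
{
    // Assumption: allocations are 8-byte aligned, so the low 3 bits are always zero.
    static final int ALIGN_SHIFT = 3;

    /** Packs an aligned native address into 4 bytes, relative to a base address. */
    static int encode(long base, long address)
    {
        long offset = address - base;
        long alignMask = (1L << ALIGN_SHIFT) - 1;
        if (offset < 0 || (offset & alignMask) != 0 || (offset >>> ALIGN_SHIFT) > 0xFFFFFFFFL)
            throw new IllegalArgumentException("address not encodable: " + address);
        return (int) (offset >>> ALIGN_SHIFT);
    }

    /** Recovers the full address from the 4-byte form. */
    static long decode(long base, int encoded)
    {
        // Mask so the int is treated as unsigned before shifting back.
        return base + ((encoded & 0xFFFFFFFFL) << ALIGN_SHIFT);
    }

    /** Bytes reachable from one base with a 4-byte encoded address. */
    static long addressableBytes()
    {
        return (1L << 32) << ALIGN_SHIFT;
    }

    /** Sum of the per-cell components listed in the description. */
    static int perCellOverhead(int objectHeader, int address, int listReference)
    {
        return objectHeader + address + listReference;
    }

    public static void main(String[] args)
    {
        System.out.println(perCellOverhead(8, 4, 4));   // 16
        System.out.println(addressableBytes() >> 30);   // 32 (GiB)
    }
}
```

Adding the 4-6 bytes of btree overhead to the 16 bytes gives the 20-22 byte total quoted above.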

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972776#comment-13972776
 ] 

Pavel Yaskevich commented on CASSANDRA-6694:


To address all of your comments: this is not intended for any kind of review 
yet. It is just a demonstration of the idea, which is why I basically carried 
over all of the methods from the original implementations and didn't rename or 
move anything. Also, I'm fine if methods in both implementations return constant 
values, like serializationFlags or isMarkedForDeleted. Apart from that there is 
not much code duplication, and it will be reduced further when hashCode and 
other methods go away. That would probably leave us with only dataSize and 
serializedSize duplicated, but I guess we can come up with something clever for 
native cells there too. Regarding updateDigest: it's meant as an illustration of 
the kind of thing we can do if we have two different implementations, not 
something optimized for performance yet.

bq. There shouldn't be one for the time being - we can never construct one.

and 

bq. Same reason - it doesn't exist as either/or, so I made a conscious decision 
to leave it as a CounterUpdateCell: the fact that it extends BufferCell is kind 
of unimportant. Its purpose is somewhat different, and I think it is better 
left named CounterUpdateCell, as that is its purpose (to carry a counter update 
as far as the memtable, and no further).

It is constructed in ColumnFamily and ColumnSerializer. If there is supposed to 
be only one implementation for now, let's name it appropriately and use it like 
all the other buffered cells.

bq. This brings in the namespace of the extended class' static methods, which 
is useful.

But why do we care, and what does it give us, given that those interfaces are 
called directly and static methods don't override each other?

bq. Sure, but again: scope of ticket, and care needs to be taken when doing 
this (e.g. your updateDigest modifications)

I don't really follow what you are implying with that. The scope is to 
introduce native implementations that are as optimized as possible, so why 
would we miss out on such low-hanging fruit?




[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972796#comment-13972796
 ] 

Benedict commented on CASSANDRA-6694:
-

bq. the scope is introduce native implementations -as optimized as possible-

Otherwise we need to do a lot more than the changes you are suggesting :)

bq.  Also I'm fine if methods in both implementations are going to return 
constant values like serializationFlags or isMarkedForDeleted

Well, these are still duplication - it is not clear as a result where the 
definition of these behaviours lives. If the semantics change in future, it may 
introduce errors unnecessarily. Either way equals(), reconcile() and 
validateFields() will still be issues. You don't seem to have implemented most 
of these methods yet (looks like your code doesn't actually compile). These 
methods are each non-trivial amounts of code duplication, equals() especially 
so if we optimise it as you want to. CounterCell.diff() will also need to be 
duplicated.

But, like I said, I can probably live with all of this if we address the 
setPeer() issue. equals() should probably still end up in a shared static 
method, at the very least, though.
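
A rough sketch of what a shared static equals() could look like, so that heap-backed and native-backed cells use one definition. The Cell interface here is a hypothetical, much-reduced stand-in, and direct ByteBuffers stand in for native memory; neither is taken from the branches under discussion.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SharedCellEquals
{
    // Hypothetical, much-reduced stand-in for the real Cell interface.
    interface Cell
    {
        ByteBuffer name();
        ByteBuffer value();
        long timestamp();
    }

    /** Single definition of equality, usable by every Cell implementation. */
    static boolean cellsEqual(Cell a, Cell b)
    {
        // ByteBuffer.equals compares remaining content, heap or direct alike.
        return a.timestamp() == b.timestamp()
            && a.name().equals(b.name())
            && a.value().equals(b.value());
    }

    /** Builds a cell whose buffers live on-heap or in direct (off-heap) memory. */
    static Cell cell(String name, String value, long timestamp, boolean direct)
    {
        ByteBuffer n = copy(name, direct);
        ByteBuffer v = copy(value, direct);
        return new Cell()
        {
            public ByteBuffer name() { return n.duplicate(); }
            public ByteBuffer value() { return v.duplicate(); }
            public long timestamp() { return timestamp; }
        };
    }

    private static ByteBuffer copy(String s, boolean direct)
    {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer b = direct ? ByteBuffer.allocateDirect(bytes.length)
                              : ByteBuffer.allocate(bytes.length);
        b.put(bytes);
        b.flip();
        return b;
    }

    public static void main(String[] args)
    {
        Cell onHeap = cell("col", "v1", 42L, false);
        Cell offHeap = cell("col", "v1", 42L, true);
        System.out.println(cellsEqual(onHeap, offHeap)); // true
    }
}
```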



[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables

2014-04-17 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972832#comment-13972832
 ] 

Marcus Eriksson commented on CASSANDRA-6694:


I'm +1 on [~benedict]'s branch (have not looked at the one by [~xedin] yet).

Nits:
* A few methods in Cell.Impl look redundant, isMarkedForDelete/isLive for 
example, kept around for symmetry?
* License header in DeletedCell and ExpiringCell
* Javadoc comment in NativeAllocator looks wrong


[jira] [Created] (CASSANDRA-7050) AbstractColumnFamilyInputFormat AbstractColumnFamilyOutputFormat throw NPE if username is provided but password is null

2014-04-17 Thread Mike Adamson (JIRA)

Mike Adamson created CASSANDRA-7050:
---

 Summary: AbstractColumnFamilyInputFormat  
AbstractColumnFamilyOutputFormat throw NPE if username is provided but password 
is null
 Key: CASSANDRA-7050
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7050
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Mike Adamson
Priority: Minor
 Fix For: 2.0.7


If a username is provided to either of these classes but the password is null 
the thrift layer throws an NPE because it can't handle null values for the 
login.




[jira] [Updated] (CASSANDRA-7050) AbstractColumnFamilyInputFormat AbstractColumnFamilyOutputFormat throw NPE if username is provided but password is null

2014-04-17 Thread Mike Adamson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Adamson updated CASSANDRA-7050:


Attachment: 7050.patch

Patch adds conditional check for password not being null before attempting login
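
The kind of guard described can be sketched like this. It is not the contents of 7050.patch; the class, method and map keys are hypothetical, and the sketch deliberately avoids the real Hadoop and Thrift classes.

```java
import java.util.HashMap;
import java.util.Map;

final class LoginGuard
{
    /**
     * Returns the credentials to send, or null when login should be skipped
     * because either value is missing.
     */
    static Map<String, String> credentialsOrNull(String username, String password)
    {
        // Checking both values avoids handing a null to the login call.
        if (username == null || password == null)
            return null;
        Map<String, String> credentials = new HashMap<>();
        credentials.put("username", username);
        credentials.put("password", password);
        return credentials;
    }

    public static void main(String[] args)
    {
        System.out.println(credentialsOrNull("cassandra", null) == null); // true
        System.out.println(credentialsOrNull("cassandra", "secret").size()); // 2
    }
}
```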


[jira] [Commented] (CASSANDRA-6572) Workload recording / playback


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972845#comment-13972845
 ] 

Benedict commented on CASSANDRA-6572:
-

[~lyubent] a few comments/suggestions about the patch:

# In the query recorder, it would make most sense to keep a writer handle open, 
instead of re-opening the file every time you append a new query. Ideally, we 
would probably have an in-memory buffer of some kind that we write to, and that 
we swap when we have to flush to disk, so that other queries can continue to 
log to the buffer without being impeded by the flush.
# It's quite wasteful to convert the query string to base64 encoded bytes, and 
then to convert that back into a string. We should write the bytes straight 
into the new buffer.
# Since you're using an AtomicInteger, there's no need to use a lock there: we 
can simply increment the counter and check the result (modulo frequency) to see 
if we should log. No need to reset to zero.
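
A minimal sketch of the lock-free sampling suggested in the last point. The class and method names are hypothetical and not taken from the patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class QuerySampler
{
    private final AtomicInteger counter = new AtomicInteger();
    private final int frequency;

    QuerySampler(int frequency)
    {
        if (frequency <= 0)
            throw new IllegalArgumentException("frequency must be positive");
        this.frequency = frequency;
    }

    /** True for every frequency-th call; safe to call from many threads without a lock. */
    boolean shouldLog()
    {
        // floorMod keeps the result non-negative after the counter wraps past Integer.MAX_VALUE,
        // so the counter never needs resetting.
        return Math.floorMod(counter.incrementAndGet(), frequency) == 0;
    }

    public static void main(String[] args)
    {
        QuerySampler sampler = new QuerySampler(3);
        int logged = 0;
        for (int i = 0; i < 9; i++)
            if (sampler.shouldLog())
                logged++;
        System.out.println(logged); // 3
    }
}
```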


 Workload recording / playback
 -

 Key: CASSANDRA-6572
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Tools
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
 Fix For: 2.0.8

 Attachments: 6572-trunk.diff


 Write sample mode gets us part way to testing new versions against a real 
 world workload, but we need an easy way to test the query side as well.




[jira] [Comment Edited] (CASSANDRA-7041) Select query returns inconsistent result

2014-04-17 Thread Ngoc Minh Vo (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970601#comment-13970601
 ] 

Ngoc Minh Vo edited comment on CASSANDRA-7041 at 4/17/14 11:37 AM:
---

We implemented a test method in our Java client to estimate the number of 
attempts a failing query requires to get an expected (non-empty) result.
The number of attempts is between 2 and ~40, and it is very random...

-No issue detected in other column families having more complicated schemas.-
-Hence, it might relate to CF without data columns? (i.e. all columns are 
part of Primary Key)-

This issue appears in multiple tables, not only the simple string_values one.

Thanks in advance for your help.
Best regards,
Minh



 Select query returns inconsistent result
 

 Key: CASSANDRA-7041
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7041
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra v2.0.6 (upgraded from v2.0.3)
 4-node cluster: Windows7, 12GB JVM
Reporter: Ngoc Minh Vo
Priority: Critical

 Hello,
 We are running into an issue with C* v2.0.x: CQL queries randomly return an 
 empty result.
 Here is the scenario:
 1. Schema:
 {noformat}
 CREATE TABLE string_values (
   date int,
   field text,
   value text,
   PRIMARY KEY ((date, field), value)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {noformat}
 2. There is no new data imported to the cluster during the test.
 3. CQL query:
 {noformat}
 select * from string_values where date=20140122 and field='SCONYKSP1';
 {noformat}
 4. In Cqlsh, the same query has been executed several times during a short 
 interval (~1-2 seconds). The first query results are empty and then we got 
 the data. And from that point, we always get the correct result:
 {noformat}
 cqlsh:titan_test> select * from string_values where date=20140122 and field='SCONYKSP1';
 (0 rows)
 cqlsh:titan_test> select * from string_values where date=20140122 and field='SCONYKSP1';
 (0 rows)
 ... ...
 cqlsh:titan_test> select * from string_values where date=20140122 and field='SCONYKSP1';
 (0 rows)
 cqlsh:titan_test> select * from string_values where date=20140122 and field='SCONYKSP1';
 (0 rows)
 cqlsh:titan_test> select * from string_values where date=20140122 and field='SCONYKSP1';
  date     | field     | value
 ----------+-----------+-------------------------
  20140122 | SCONYKSP1 | 201401220251826297a_0_3
 (1 rows)
 cqlsh:titan_test> select * from string_values where date=20140122 and field='SCONYKSP1';
  date     | field     | value
 ----------+-----------+-------------------------
  20140122 | SCONYKSP1 | 201401220251826297a_0_3
 (1 rows)
 {noformat}
 5. It might relate to some kind of warmup process. We tried disabling 
 key/data caching but it did not help.
 Upgrading the cluster from v2.0.3 to v2.0.6 does not fix the issue (hence, it 
 is not related to CASSANDRA-6555).
 A long time ago, we posted a report on the Java Driver JIRA: 
 https://datastax-oss.atlassian.net/browse/JAVA-217. But it seems that the 
 issue is on the server side.
 Best regards,
 Minh




[jira] [Commented] (CASSANDRA-7041) Select query returns inconsistent result

2014-04-17 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972895#comment-13972895
 ] 

Sylvain Lebresne commented on CASSANDRA-7041:
-

I see nothing here that indicates that you're using QUORUM consistency (and you 
do have 4 nodes, though you didn't indicate your replication factor). cqlsh in 
particular uses CL.ONE by default. If you don't use QUORUM consistency (and 
your replication factor is > 1), then what you see is perfectly expected.


[jira] [Updated] (CASSANDRA-7041) Select query returns inconsistent result

2014-04-17 Thread Ngoc Minh Vo (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ngoc Minh Vo updated CASSANDRA-7041:


Reproduced In: 2.0.6, 2.0.3  (was: 2.0.3, 2.0.6)
  Environment: 
Cassandra v2.0.6 (upgraded from v2.0.3)
4-node RF=3, cluster: Windows7, 12GB JVM

  was:
Cassandra v2.0.6 (upgraded from v2.0.3)
4-node cluster: Windows7, 12GB JVM



[jira] [Commented] (CASSANDRA-7041) Select query returns inconsistent result

2014-04-17 Thread Ngoc Minh Vo (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972901#comment-13972901
 ] 

Ngoc Minh Vo commented on CASSANDRA-7041:
-

Hello Sylvain,

Thanks for your prompt answer.
Indeed, the issue is related to discrepancies in our data on date 20140122. 
Queries on other dates worked fine.

Changing CL to QUORUM solved the issue!

Do we need to set CL to QUORUM on write queries as well?
With the default setting (ONE), we didn't get any error during data import.

Best regards,
Minh


[jira] [Commented] (CASSANDRA-6591) un-deprecate cache recentHitRate and expose in o.a.c.metrics

2014-04-17 Thread Chris Burroughs (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972905#comment-13972905
 ] 

Chris Burroughs commented on CASSANDRA-6591:


What's next for this ticket?

 un-deprecate cache recentHitRate and expose in o.a.c.metrics
 

 Key: CASSANDRA-6591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6591
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Burroughs
Assignee: Chris Burroughs
Priority: Minor
 Attachments: j6591-1.2-v1.txt, j6591-1.2-v2.txt, j6591-1.2-v3.txt


 recentHitRate metrics were not added as part of CASSANDRA-4009 because there 
 is not an obvious way to do it with the Metrics library.  Instead hitRate was 
 added as an all time measurement since node restart.
 This does allow changes in cache rate (aka production performance problems)  
 to be detected.  Ideally there would be 1/5/15 moving averages for the hit 
 rate, but I'm not sure how to calculate that.  Instead I propose updating 
 recentHitRate on a fixed interval and exposing that as a Gauge.
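
A minimal sketch of the fixed-interval idea proposed above, assuming two monotonically increasing counters and a periodic sampler. The class and method names (RecentHitRate, recordRequest, sample, getValue) are hypothetical and are not taken from the attached patches or from the Metrics library; getValue() stands in for whatever a Gauge would report.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: hit rate over the most recent sampling interval. */
final class RecentHitRate {
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong requests = new AtomicLong();
    private long lastHits;
    private long lastRequests;
    private volatile double recent = Double.NaN;

    /** Called by the cache on every lookup. */
    void recordRequest(boolean hit) {
        requests.incrementAndGet();
        if (hit)
            hits.incrementAndGet();
    }

    /** Intended to be called on a fixed interval, e.g. by a scheduled task. */
    synchronized void sample() {
        long h = hits.get();
        long r = requests.get();
        long deltaHits = h - lastHits;
        long deltaRequests = r - lastRequests;
        // No requests in the interval: report NaN rather than dividing by zero.
        recent = deltaRequests == 0 ? Double.NaN : (double) deltaHits / deltaRequests;
        lastHits = h;
        lastRequests = r;
    }

    /** The value a gauge would expose: hit rate of the last completed interval. */
    double getValue() {
        return recent;
    }

    /** All-time hit rate since start, for comparison. */
    double allTimeHitRate() {
        long r = requests.get();
        return r == 0 ? Double.NaN : (double) hits.get() / r;
    }

    public static void main(String[] args) {
        RecentHitRate rate = new RecentHitRate();
        for (int i = 0; i < 10; i++)
            rate.recordRequest(i < 8);
        rate.sample();
        System.out.println("recent=" + rate.getValue() + " allTime=" + rate.allTimeHitRate());
    }
}
```

The point of the comparison is that a sudden drop shows up immediately in the interval value, while the all-time value moves only slowly.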




[jira] [Resolved] (CASSANDRA-7041) Select query returns inconsistent result

2014-04-17 Thread Sylvain Lebresne (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-7041.
-

   Resolution: Invalid
Reproduced In: 2.0.6, 2.0.3  (was: 2.0.3, 2.0.6)

Yes, you need QUORUM for both writes and reads if you want to be guaranteed to see 
your write right away. I strongly encourage you to read the documentation to 
understand how consistency levels work, as this is a pretty fundamental concept 
in Cassandra. Most Cassandra introductions you can easily find with Google 
should help you there, but you can probably start 
[here|http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html].
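
The rule behind this answer can be sketched as simple arithmetic: a read is guaranteed to overlap the latest write when the number of replicas written plus the number of replicas read exceeds the replication factor. The sketch below assumes RF=3 as in the reported environment; the consistency levels the reporter actually used are not stated in the ticket, and the class name is hypothetical.

```java
/** Hypothetical sketch of the replica-overlap rule behind QUORUM reads and writes. */
final class ConsistencyOverlap {
    /** Number of replicas in a quorum for a given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must intersect every write set. */
    static boolean readSeesLatestWrite(int writeReplicas, int readReplicas, int replicationFactor) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("QUORUM/QUORUM overlap: " + readSeesLatestWrite(q, q, rf));
        System.out.println("ONE/ONE overlap: " + readSeesLatestWrite(1, 1, rf));
    }
}
```

With RF=3, a write at ONE followed by a read at ONE may hit a replica that has not yet received the write, which matches the empty results seen right after the data was loaded.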

 Select query returns inconsistent result
 

 Key: CASSANDRA-7041
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7041
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra v2.0.6 (upgraded from v2.0.3)
 4-node RF=3, cluster: Windows7, 12GB JVM
Reporter: Ngoc Minh Vo
Priority: Critical

 Hello,
 We are running into an issue with C* v2.0.x: CQL queries randomly return an empty 
 result.
 Here is the scenario:
 1. Schema:
 {noformat}
 CREATE TABLE string_values (
   date int,
   field text,
   value text,
   PRIMARY KEY ((date, field), value)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {noformat}
 2. There is no new data imported to the cluster during the test.
 3. CQL query:
 {noformat}
 select * from string_values where date=20140122 and field='SCONYKSP1';
 {noformat}
 4. In Cqlsh, the same query has been executed several times during a short 
 interval (~1-2 seconds). The first query results are empty and then we got 
 the data. And from that point, we always get the correct result:
 {noformat}
 cqlsh:titan_test> select * from string_values where date=20140122 and field='SCONYKSP1';
 (0 rows)
 cqlsh:titan_test> select * from string_values where date=20140122 and field='SCONYKSP1';
 (0 rows)
 ... ...
 cqlsh:titan_test> select * from string_values where date=20140122 and field='SCONYKSP1';
 (0 rows)
 cqlsh:titan_test> select * from string_values where date=20140122 and field='SCONYKSP1';
 (0 rows)
 cqlsh:titan_test> select * from string_values where date=20140122 and field='SCONYKSP1';
  date     | field     | value
 ----------+-----------+-------------------------
  20140122 | SCONYKSP1 | 201401220251826297a_0_3
 (1 rows)
 cqlsh:titan_test> select * from string_values where date=20140122 and field='SCONYKSP1';
  date     | field     | value
 ----------+-----------+-------------------------
  20140122 | SCONYKSP1 | 201401220251826297a_0_3
 (1 rows)
 {noformat}
 5. It might relate to some kind of warmup process. We tried disabling 
 key/data caching, but it does not help.
 Upgrading the cluster from v2.0.3 to v2.0.6 does not fix the issue (hence, it is not 
 related to CASSANDRA-6555).
 A long time ago, we posted a report on the Java Driver JIRA: 
 https://datastax-oss.atlassian.net/browse/JAVA-217. But it seems that the 
 issue is on the server side.
 Best regards,
 Minh




[jira] [Created] (CASSANDRA-7051) UnsupportedOperationException

2014-04-17 Thread Digant Modha (JIRA)

Digant Modha created CASSANDRA-7051:
---

 Summary: UnsupportedOperationException
 Key: CASSANDRA-7051
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7051
 Project: Cassandra
  Issue Type: Bug
  Components: API, Core
 Environment: Cassandra 2.0.6
Reporter: Digant Modha
Priority: Critical


An UnsupportedOperationException is thrown when using a BatchStatement. This 
is because 
org.apache.cassandra.cql3.statements.BatchStatement.unzipMutations returns a 
collection that does not support add() if the size of the mutations map is 1.

STACK:
throws UnsupportedOperationException.
Daemon Thread [Native-Transport-Requests:1043] (Suspended (entry into method <init> in UnsupportedOperationException))
UnsupportedOperationException.<init>() line: 42 [local variables unavailable]
HashMap$Values(AbstractCollection<E>).add(E) line: 260
HashMap$Values(AbstractCollection<E>).addAll(Collection<? extends E>) line: 342
StorageProxy.mutateWithTriggers(Collection<IMutation>, ConsistencyLevel, boolean) line: 519
BatchStatement.executeWithoutConditions(Collection<IMutation>, ConsistencyLevel) line: 210
BatchStatement.execute(BatchStatement$BatchVariables, boolean, ConsistencyLevel, long) line: 203
BatchStatement.executeWithPerStatementVariables(ConsistencyLevel, QueryState, List<List<ByteBuffer>>) line: 192
QueryProcessor.processBatch(BatchStatement, ConsistencyLevel, QueryState, List<List<ByteBuffer>>, List<Object>) line: 373
BatchMessage.execute(QueryState) line: 206
Message$Dispatcher.messageReceived(ChannelHandlerContext, MessageEvent) line: 304
Message$Dispatcher(SimpleChannelUpstreamHandler).handleUpstream(ChannelHandlerContext, ChannelEvent) line: 70
DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline$DefaultChannelHandlerContext, ChannelEvent) line: 564
DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(ChannelEvent) line: 791
ChannelUpstreamEventRunnable.doRun() line: 43
ChannelUpstreamEventRunnable(ChannelEventRunnable).run() line: 67
RequestThreadPoolExecutor(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1145
ThreadPoolExecutor$Worker.run() line: 615
Thread.run() line: 744

org.apache.cassandra.cql3.statements.BatchStatement:
    private Collection<? extends IMutation> unzipMutations(Map<String, Map<ByteBuffer, IMutation>> mutations)
    {
        // The case where all statement where on the same keyspace is pretty common
        if (mutations.size() == 1)
            return mutations.values().iterator().next().values();

        List<IMutation> ms = new ArrayList<>();
        for (Map<ByteBuffer, IMutation> ksMap : mutations.values())
            ms.addAll(ksMap.values());
        return ms;
    }
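
The failure mode described above can be reproduced with plain JDK collections. The following is a sketch only: String stands in for IMutation, the class and method names are hypothetical, and the copying variant is one possible way to avoid the problem, not necessarily the fix that was committed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reproduction of the unzipMutations shortcut; String stands in for IMutation. */
final class UnzipMutationsDemo {
    /** Mirrors the reported shortcut: returns the map's values() view when there is one keyspace. */
    static Collection<String> unzipAsView(Map<String, Map<String, String>> mutations) {
        if (mutations.size() == 1)
            return mutations.values().iterator().next().values(); // view without add() support
        return unzipAsCopy(mutations);
    }

    /** Always copies into a fresh list, so callers may append to the result. */
    static Collection<String> unzipAsCopy(Map<String, Map<String, String>> mutations) {
        List<String> ms = new ArrayList<>();
        for (Map<String, String> ksMap : mutations.values())
            ms.addAll(ksMap.values());
        return ms;
    }

    /** Tries to append one element, as a caller adding trigger mutations would. */
    static boolean canAppend(Collection<String> c) {
        try {
            c.addAll(Arrays.asList("extra-mutation"));
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    /** Sample input: a single keyspace holding a single mutation. */
    static Map<String, Map<String, String>> singleKeyspace() {
        Map<String, String> byKey = new HashMap<>();
        byKey.put("row1", "mutation1");
        Map<String, Map<String, String>> mutations = new HashMap<>();
        mutations.put("ks1", byKey);
        return mutations;
    }

    public static void main(String[] args) {
        System.out.println("view appendable: " + canAppend(unzipAsView(singleKeyspace())));
        System.out.println("copy appendable: " + canAppend(unzipAsCopy(singleKeyspace())));
    }
}
```

The view returned by HashMap.values() inherits add() from AbstractCollection, which always throws, and that is the frame visible at the top of the stack trace.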





[jira] [Commented] (CASSANDRA-6847) The binary transport doesn't load truststore file

2014-04-17 Thread Jeremiah Jordan (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972980#comment-13972980
 ] 

Jeremiah Jordan commented on CASSANDRA-6847:


Can we put a line in changes.txt for this?  I spent 2 days pulling my hair out 
from this one, and yes I probably should have done a full JIRA search, but I 
would expect require_client_auth being completely broken to show up in 
changes.txt :/

 The binary transport doesn't load truststore file
 -

 Key: CASSANDRA-6847
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6847
 Project: Cassandra
  Issue Type: Bug
Reporter: Mikhail Stepura
Assignee: Mikhail Stepura
Priority: Minor
  Labels: ssl
 Fix For: 1.2.16, 2.0.7, 2.1 beta2

 Attachments: cassandra-2.0-6847.patch


 {code:title=org.apache.cassandra.transport.Server.SecurePipelineFactory}
 this.sslContext = SSLFactory.createSSLContext(encryptionOptions, false);
 {code}
 {{false}} there means that {{truststore}} file won't be loaded in any case. 
 And that means that the file will not be used to validate clients when 
 {{require_client_auth==true}}, making 
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/security/secureNewTrustedUsers_t.html
  meaningless.
 The only way to work around that currently is to start C* with 
 {{-Djavax.net.ssl.trustStore=conf/.truststore}}
 I believe we should load {{truststore}} when {{require_client_auth==true}}.




[jira] [Updated] (CASSANDRA-7047) TriggerExecutor should group mutations by row key

2014-04-17 Thread Sergio Bossa (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Bossa updated CASSANDRA-7047:


Attachment: CASSANDRA-7047.patch

 TriggerExecutor should group mutations by row key
 -

 Key: CASSANDRA-7047
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7047
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sergio Bossa
Assignee: Sergio Bossa
 Attachments: CASSANDRA-7047.patch


 TriggerExecutor doesn't currently group mutations returned by triggers, even 
 if they belong to the same row key: while not harmful per se (at least, I think 
 so), this is definitely a performance problem, because each mutation is a 
 *cluster* mutation, generating more network traffic, more disk IO and more 
 index calls (if present).
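
The grouping asked for here can be sketched as a merge keyed on the row key. This is an illustration under simplifying assumptions: the Mutation class below is a hypothetical stand-in with a string key and a column map, it ignores keyspaces and timestamps, and it is not the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: merge mutations that target the same row key. */
final class GroupByRowKey {
    /** Simplified stand-in for a mutation: one row key plus its column updates. */
    static final class Mutation {
        final String key;
        final Map<String, String> columns = new LinkedHashMap<>();

        Mutation(String key) {
            this.key = key;
        }

        Mutation with(String column, String value) {
            columns.put(column, value);
            return this;
        }
    }

    /** Returns one merged mutation per distinct row key, in first-seen order. */
    static Collection<Mutation> group(List<Mutation> mutations) {
        Map<String, Mutation> byKey = new LinkedHashMap<>();
        for (Mutation m : mutations) {
            Mutation merged = byKey.get(m.key);
            if (merged == null) {
                merged = new Mutation(m.key);
                byKey.put(m.key, merged);
            }
            merged.columns.putAll(m.columns); // later updates to a column win
        }
        return byKey.values();
    }

    public static void main(String[] args) {
        List<Mutation> fromTriggers = new ArrayList<>();
        fromTriggers.add(new Mutation("a").with("c1", "1"));
        fromTriggers.add(new Mutation("a").with("c2", "2"));
        fromTriggers.add(new Mutation("b").with("c1", "3"));
        System.out.println("mutations sent: " + group(fromTriggers).size());
    }
}
```

Each merged entry then costs one cluster mutation instead of one per trigger result.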




[jira] [Updated] (CASSANDRA-7047) TriggerExecutor should group mutations by row key


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-7047:
-

Reviewer: Aleksey Yeschenko


[jira] [Updated] (CASSANDRA-3668) Parallel streaming for sstableloader

2014-04-17 Thread Yuki Morishita (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita updated CASSANDRA-3668:
--

Reviewer: Yuki Morishita  (was: Peter Schuller)

 Parallel streaming for sstableloader
 

 Key: CASSANDRA-3668
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3668
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Manish Zope
Assignee: Joshua McKenzie
Priority: Minor
  Labels: streaming
 Fix For: 2.1 beta2

 Attachments: 3668-1.1-v2.txt, 3668-1.1.txt, 3668_v2.txt, 
 3688-reply_before_closing_writer.txt, sstable-loader performance.txt

   Original Estimate: 48h
  Remaining Estimate: 48h

 One of my colleagues reported a bug regarding the degraded performance 
 of the sstable generator and sstable loader.
 ISSUE :- https://issues.apache.org/jira/browse/CASSANDRA-3589 
 As stated in the above issue, the generator performance is rectified, but the 
 performance of the sstableloader is still an issue.
 3589 is marked as a duplicate of 3618. Both issues show resolved status, but the 
 problem with sstableloader still exists.
 So I am opening another issue so that the sstableloader problem does not go unnoticed.
 FYI: We have tested the generator part with the patch given in 3589. It is 
 working fine.
 Please let us know if you require further input from our side.




[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables

[
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973220#comment-13973220
]

Pavel Yaskevich commented on CASSANDRA-6694:

bq. Well, these are still duplication - it is not clear as a result where the
definition of these behaviours live. If the semantics change in future, it may
introduce errors unnecessarily. Either way equals(), reconcile() and
validateFields() will still be issues. You don't seem to have implemented most
of these methods yet (looks like your code doesn't actually compile). These
methods are each non-trivial amounts of code duplication, equals() especially
so if we optimise it as you want to. CounterCell.diff() will also need to be
duplicated.

Most of the duplicated methods are methods with static behavior which is not
going to change e.g. isMarkedForDelete, getMarkedForDeleteAt or
serializationFlags. CounterCell.diff and reconcile are living in the interface
for now. I will address setPeer(long) problem and hashCode.

Slightly More Off-Heap Memtables


[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973226#comment-13973226
 ] 

Benedict commented on CASSANDRA-6694:
-

bq. CounterCell.diff and reconcile are living in the interface for now

Ah. This is a Java 8 only feature, which is why I missed it. Not really 
feasible.

 Slightly More Off-Heap Memtables
 

 Key: CASSANDRA-6694
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2


 The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as 
 the on-heap overhead is still very large. It should not be tremendously 
 difficult to extend these changes so that we allocate entire Cells off-heap, 
 instead of multiple BBs per Cell (with all their associated overhead).
 The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6 
 bytes per cell on average for the btree overhead, for a total overhead of 
 around 20-22 bytes). This translates to 8-byte object overhead, 4-byte 
 address (we will do alignment tricks like the VM to allow us to address a 
 reasonably large memory space, although this trick is unlikely to last us 
 forever, at which point we will have to bite the bullet and accept a 24-byte 
 per cell overhead), and 4-byte object reference for maintaining our internal 
 list of allocations, which is unfortunately necessary since we cannot safely 
 (and cheaply) walk the object graph we allocate otherwise, which is necessary 
 for (allocation-) compaction and pointer rewriting.
 The ugliest thing here is going to be implementing the various CellName 
 instances so that they may be backed by native memory OR heap memory.
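
The "alignment tricks" mentioned above can be illustrated with a small sketch. It assumes 8-byte alignment (a 3-bit shift, as with the JVM's compressed object pointers); the shift value, class name and method names are assumptions for illustration and are not taken from the ticket's patches.

```java
/** Hypothetical sketch: storing an 8-byte-aligned offset in a 4-byte field. */
final class CompressedAddress {
    /** Assumed alignment of 8 bytes, so the low 3 bits of every offset are zero. */
    static final int SHIFT = 3;

    /** Per-cell overhead from the description: object header + address + reference. */
    static int overheadBytes(int objectHeader, int address, int reference) {
        return objectHeader + address + reference;
    }

    /** Packs an aligned offset into 32 bits by dropping the always-zero low bits. */
    static int encode(long offset) {
        if ((offset & ((1L << SHIFT) - 1)) != 0)
            throw new IllegalArgumentException("offset is not aligned");
        if (offset < 0 || (offset >>> SHIFT) > 0xFFFFFFFFL)
            throw new IllegalArgumentException("offset out of addressable range");
        return (int) (offset >>> SHIFT);
    }

    /** Restores the original offset, treating the stored int as unsigned. */
    static long decode(int compressed) {
        return (compressed & 0xFFFFFFFFL) << SHIFT;
    }

    /** Total bytes reachable with a 32-bit compressed address. */
    static long addressableBytes() {
        return 1L << (32 + SHIFT);
    }

    public static void main(String[] args) {
        long offset = 8L * 3_000_000_000L;
        System.out.println("round trip ok: " + (decode(encode(offset)) == offset));
        System.out.println("overhead: " + overheadBytes(8, 4, 4) + " bytes");
        System.out.println("addressable GiB: " + (addressableBytes() >> 30));
    }
}
```

Under these assumptions a 4-byte field addresses 32 GiB, which is why the description expects the trick to run out eventually.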




[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973297#comment-13973297
 ] 

Pavel Yaskevich commented on CASSANDRA-6694:


I'm not talking about default methods in interfaces, I'm just saying that I 
added static diff/reconcile to CounterCell for now :)


[jira] [Commented] (CASSANDRA-6962) examine shortening path length post-5202

2014-04-17 Thread Yuki Morishita (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973336#comment-13973336
 ] 

Yuki Morishita commented on CASSANDRA-6962:
---

This turns out to be a bit more complex than I first thought because secondary index 
CFs are flushing to the same directory. :(
Any ideas?

 examine shortening path length post-5202
 

 Key: CASSANDRA-6962
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6962
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Yuki Morishita
 Fix For: 2.1 beta2


 From CASSANDRA-5202 discussion:
 {quote}
 Did we give up on this?
 Could we clean up the redundancy a little by moving the ID into the directory 
 name? e.g., ks/cf-uuid/version-generation-component.db
 I'm worried about path length, which is limited on Windows.
 Edit: to give a specific example, for KS foo Table bar we now have
 /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Data.db
 I'm proposing
 /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/ka-1-Data.db
 {quote}
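
As a quick check of what the proposal saves, the two layouts from the quote can be compared directly. The helper names below are hypothetical; the paths are the ones given in the example.

```java
/** Hypothetical sketch comparing the current and proposed sstable file paths. */
final class SstablePathLength {
    /** Current layout: keyspace and table names are repeated in the file name. */
    static String current(String dir, String keyspace, String table, String rest) {
        return dir + "/" + keyspace + "-" + table + "-" + rest;
    }

    /** Proposed layout: the directory already identifies keyspace and table. */
    static String proposed(String dir, String rest) {
        return dir + "/" + rest;
    }

    public static void main(String[] args) {
        String dir = "/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4";
        String before = current(dir, "foo", "bar", "ka-1-Data.db");
        String after = proposed(dir, "ka-1-Data.db");
        System.out.println("characters saved: " + (before.length() - after.length()));
    }
}
```

The saving is the length of the keyspace and table names plus two separators, so it grows with longer names.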




[jira] [Created] (CASSANDRA-7052) Query on compact storage with limit returns extra rows

2014-04-17 Thread Stuart Freeman (JIRA)

Stuart Freeman created CASSANDRA-7052:
-

 Summary: Query on compact storage with limit returns extra rows
 Key: CASSANDRA-7052
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7052
 Project: Cassandra
  Issue Type: Bug
Reporter: Stuart Freeman


I tested this on Cassandra 2.0.6 and 2.0.3 and got the same result on both:

{code}
cqlsh> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE test;
cqlsh:test> CREATE COLUMNFAMILY VerifyPagedColumnQueryStartAndEnd (keyId text, columnName text, value text, PRIMARY KEY (keyId, columnName)) WITH COMPACT STORAGE;
cqlsh:test> INSERT INTO VerifyPagedColumnQueryStartAndEnd (keyId, columnName, value) VALUES ( 'key', 'a', '1' )  ;
cqlsh:test> INSERT INTO VerifyPagedColumnQueryStartAndEnd (keyId, columnName, value) VALUES ( 'key', 'b', '1' )  ;
cqlsh:test> INSERT INTO VerifyPagedColumnQueryStartAndEnd (keyId, columnName, value) VALUES ( 'key', 'c', '1' )  ;
cqlsh:test> INSERT INTO VerifyPagedColumnQueryStartAndEnd (keyId, columnName, value) VALUES ( 'key', 'd', '1' )  ;
cqlsh:test> INSERT INTO VerifyPagedColumnQueryStartAndEnd (keyId, columnName, value) VALUES ( 'key', 'e', '1' )  ;
cqlsh:test> SELECT * FROM VerifyPagedColumnQueryStartAndEnd WHERE keyId = 'key' AND columnName > '' AND columnName <= 'e' LIMIT 2;

 keyId | columnName | value
-------+------------+-------
   key |          a |     1
   key |          b |     1
   key |          c |     1

(3 rows)

cqlsh:test>
{code}




[jira] [Commented] (CASSANDRA-6875) CQL3: select multiple CQL rows in a single partition using IN

2014-04-17 Thread Tyler Hobbs (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973391#comment-13973391
 ] 

Tyler Hobbs commented on CASSANDRA-6875:


Regarding prepared statements, I assume we want to support all of the following:
* {{... WHERE (k, c1) IN ?}}
* {{... WHERE (k, c1) IN (?, ?, ...)}}
* {{... WHERE (k, c1) IN ((?, ?), (?, ?), ...)}}

 CQL3: select multiple CQL rows in a single partition using IN
 -

 Key: CASSANDRA-6875
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6875
 Project: Cassandra
  Issue Type: Bug
  Components: API
Reporter: Nicolas Favre-Felix
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.0.8


 In the spirit of CASSANDRA-4851 and to bring CQL to parity with Thrift, it is 
 important to support reading several distinct CQL rows from a given partition 
 using a distinct set of coordinates for these rows within the partition.
 CASSANDRA-4851 introduced a range scan over the multi-dimensional space of 
 clustering keys. We also need to support a multi-get of CQL rows, 
 potentially using the IN keyword to define a set of clustering keys to 
 fetch at once.
 (reusing the same example\:)
 Consider the following table:
 {code}
 CREATE TABLE test (
   k int,
   c1 int,
   c2 int,
   PRIMARY KEY (k, c1, c2)
 );
 {code}
 with the following data:
 {code}
  k | c1 | c2
 ---+----+----
  0 |  0 |  0
  0 |  0 |  1
  0 |  1 |  0
  0 |  1 |  1
 {code}
 We can fetch a single row or a range of rows, but not a set of them:
 {code}
  SELECT * FROM test WHERE k = 0 AND (c1, c2) IN ((0, 0), (1,1)) ;
 Bad Request: line 1:54 missing EOF at ','
 {code}
 Supporting this syntax would return:
 {code}
  k | c1 | c2
 ---+----+----
  0 |  0 |  0
  0 |  1 |  1
 {code}
 Being able to fetch these two CQL rows in a single read is important to 
 maintain partition-level isolation.
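
The intended semantics of the tuple IN restriction can be modelled in a few lines. This sketch only mimics the selection logic over the sample data from the description; it is not how Cassandra evaluates the query, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of WHERE k = ? AND (c1, c2) IN (...) over in-memory rows. */
final class TupleInSelect {
    /** Rows are {k, c1, c2}; keeps rows in partition k whose (c1, c2) is in the wanted set. */
    static List<int[]> select(List<int[]> rows, int k, Set<List<Integer>> wanted) {
        List<int[]> out = new ArrayList<>();
        for (int[] row : rows)
            if (row[0] == k && wanted.contains(Arrays.asList(row[1], row[2])))
                out.add(row);
        return out;
    }

    /** The four sample rows from the issue description. */
    static List<int[]> sampleRows() {
        List<int[]> rows = new ArrayList<>();
        rows.add(new int[] {0, 0, 0});
        rows.add(new int[] {0, 0, 1});
        rows.add(new int[] {0, 1, 0});
        rows.add(new int[] {0, 1, 1});
        return rows;
    }

    public static void main(String[] args) {
        Set<List<Integer>> wanted = new HashSet<>();
        wanted.add(Arrays.asList(0, 0));
        wanted.add(Arrays.asList(1, 1));
        for (int[] row : select(sampleRows(), 0, wanted))
            System.out.println(Arrays.toString(row));
    }
}
```

Because both rows come out of one pass over a single partition, the result reflects one consistent view of that partition, which is the isolation argument made above.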




[jira] [Comment Edited] (CASSANDRA-6875) CQL3: select multiple CQL rows in a single partition using IN

2014-04-17 Thread Tyler Hobbs (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973391#comment-13973391
 ] 

Tyler Hobbs edited comment on CASSANDRA-6875 at 4/17/14 8:50 PM:
-

Regarding prepared statements, I assume we want to support all of the following:
* {{... WHERE (c1, c2) IN ?}}
* {{... WHERE (c1, c2) IN (?, ?, ...)}}
* {{... WHERE (c1, c2) IN ((?, ?), (?, ?), ...)}}


was (Author: thobbs):
Regarding prepared statements, I assume we want to support all of the following:
* {{... WHERE (k, c1) IN ?}}
* {{... WHERE (k, c1) IN (?, ?, ...)}}
* {{... WHERE (k, c1) IN ((?, ?), (?, ?), ...)}}


[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables

[
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973395#comment-13973395
]

Aleksey Yeschenko commented on CASSANDRA-6694:
--

bq. Its purpose is somewhat different, and I think it is better left named
CounterUpdateCell, as that is its purpose (to carry a counter update as far as
the memtable, and no further).

FWIW it doesn't even make it to a memtable in 2.1, ever. That said, not calling
it BufferCounterUpdateCell would bother my consistency OCD, a lot, and
I'm not done with counters until 3.0. Can you do my OCD a tiny favor and call
it consistently with the other implementations? (: Thanks.

bq. There should be no hashCode method in Buffer*Cell - I removed these for a
reason. Because we can have a Cell that is a CellName, and vice-versa, using a
Cell as a key for a map is likely dangerous. Since we don't do it anywhere,
it's safe to simply remove the methods.

Maybe we should just throw UnsupportedOperationException then, but leave the
methods? I agree that using Cell-s as keys is very unlikely, but stuff like
this has bitten us before.
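
A minimal sketch of that suggestion, assuming a simplified cell class: hashCode() stays declared but fails fast, so any accidental use as a hash key is caught immediately. HashlessCell is a hypothetical name, not one of the classes in either branch.

```java
import java.util.HashMap;

/** Hypothetical cell whose hashCode() fails fast instead of silently misbehaving. */
final class HashlessCell {
    private final String name;

    HashlessCell(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof HashlessCell && ((HashlessCell) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        // Deliberately unsupported: cells are not meant to be keys in hash-based collections.
        throw new UnsupportedOperationException("cells must not be used as hash keys");
    }

    /** Shows that inserting a cell into a HashMap is rejected at the first put(). */
    static boolean usableAsMapKey() {
        try {
            new HashMap<HashlessCell, Integer>().put(new HashlessCell("a"), 1);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("usable as map key: " + usableAsMapKey());
        System.out.println("equals still works: " + new HashlessCell("a").equals(new HashlessCell("a")));
    }
}
```

Compared with removing the method, this turns a silent fallback to the identity hash into an immediate, visible error.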

Haven't read either branch yet, but planning to soon, just wanted to jump at
the opportunity to bikeshed a bit.

Slightly More Off-Heap Memtables


[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables

[
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973405#comment-13973405
]

Benedict commented on CASSANDRA-6694:
-

bq. Can you do my OCD a tiny favor and call it consistently with the other
implementations? (: Thanks.

Sure. I have a preference to keep it that way, but not a strong one.

bq. Maybe we should just throw UnsupportedOperationException then, but leave
the methods? I agree that using Cell-s as keys is very unlikely, but stuff like
this has bitten us before.

Also sure.

Slightly More Off-Heap Memtables


[jira] [Comment Edited] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973435#comment-13973435
 ] 

Pavel Yaskevich edited comment on CASSANDRA-6694 at 4/17/14 9:29 PM:
-

Regarding hashCode that's what we do, I do it in AbstractCell now, Benedict 
does it in both BufferCell and NativeCell.


was (Author: xedin):
Regarding, the hashCode that's what we do, I do it in AbstractCell now, 
Benedict does it in both BufferCell and NativeCell.


[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973435#comment-13973435
 ] 

Pavel Yaskevich commented on CASSANDRA-6694:


Regarding, the hashCode that's what we do, I do it in AbstractCell now, 
Benedict does it in both BufferCell and NativeCell.


[jira] [Commented] (CASSANDRA-6999) Batchlog replay should account for CF truncation records

2014-04-17 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973447#comment-13973447
 ] 

Jonathan Ellis commented on CASSANDRA-6999:
---

Looks to me like the ImmutableSet copy in replayBatch is unnecessary, since 
mutation.without creates a new modifications map rather than modifying the 
original.

Who wins on ties?  Should writtenAt < SystemTable.getTruncatedAt be <=?

Rest LGTM.

 Batchlog replay should account for CF truncation records
 

 Key: CASSANDRA-6999
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6999
 Project: Cassandra
  Issue Type: Bug
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
 Fix For: 1.2.17, 2.0.8, 2.1 beta2


 Just as HHOM does, BM should properly handle column families' truncation 
 records and not replay mutations that are younger than the last known record.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (CASSANDRA-6999) Batchlog replay should account for CF truncation records


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973456#comment-13973456
 ] 

Aleksey Yeschenko commented on CASSANDRA-6999:
--

bq. Looks to me like the ImmutableSet copy in replayBatch is unnecessary, since 
mutation.without creates a new modifications map rather than modifying the 
original.

It is necessary :( 
http://docs.oracle.com/javase/7/docs/api/java/util/Map.html#keySet() - if not 
copied, iterating it might throw a ConcurrentModificationException.

bq. Who wins on ties? Should writtenAt < SystemTable.getTruncatedAt be <=?

Probably. Will alter HHOM to use <= as well.
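
To make the tie case concrete, here is a minimal sketch. The helper name is hypothetical and both arguments are assumed to be millisecond timestamps. With <=, a batch written in the same millisecond as the truncation counts as truncated and is not replayed.

```java
// Hypothetical helper; both arguments are assumed to be milliseconds since the epoch.
final class TruncationTieSketch
{
    // With <=, a write in the same millisecond as the truncation is treated as truncated.
    static boolean isTruncated(long writtenAt, long truncatedAt)
    {
        return writtenAt <= truncatedAt;
    }
}
```

With a strict < the tie would go the other way and the mutation would be replayed after the truncation.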


[jira] [Commented] (CASSANDRA-6999) Batchlog replay should account for CF truncation records


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973484#comment-13973484
 ] 

Aleksey Yeschenko commented on CASSANDRA-6999:
--

NVM, you were right about ImmutableSet copy in replayBatch being unnecessary, 
sorry.


[jira] [Created] (CASSANDRA-7053) USING TIMESTAMP for batches does not work

2014-04-17 Thread Robert Supencheck (JIRA)

Robert Supencheck created CASSANDRA-7053:


 Summary: USING TIMESTAMP for batches does not work
 Key: CASSANDRA-7053
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7053
 Project: Cassandra
  Issue Type: Bug
Reporter: Robert Supencheck


When using the USING TIMESTAMP <timestamp> syntax for a batch statement, the 
supplied timestamp is ignored.




[jira] [Commented] (CASSANDRA-7053) USING TIMESTAMP for batches does not work

2014-04-17 Thread Robert Supencheck (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973522#comment-13973522
 ] 

Robert Supencheck commented on CASSANDRA-7053:
--

Replication steps:

1)  Invoke the cqlsh prompt;
2)  Create a keyspace:
create keyspace test with replication = 
{'class':'SimpleStrategy','replication_factor':1};
3)  Choose to use the keyspace, test;
4)  Create a table in the test keyspace:
CREATE TABLE test_table ( key text PRIMARY KEY, data text) ;
5)  Attempt a batch insert, using a timestamp, in the table, test_table:
BEGIN BATCH USING TIMESTAMP  INSERT INTO test_table (key, data 
) VALUES ( 'key1', 'some data 1'); INSERT INTO test_table (key, data) VALUES ( 
'key2', 'some data 2') ; APPLY BATCH ;
6)  View the timestamps on the newly inserted table entries to observe that the 
timestamps are not as specified:
select writetime(data), key, data from test_table;

 writetime(data)  | key  | data
------------------+------+-------------
 1397772023766000 | key1 | some data 1
 1397772023766000 | key2 | some data 2

(2 rows)

***
The expected behavior is that the timestamps in the resulting table should be 
the timestamp supplied in the batch statement.
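
As a quick check on the output above, the writetime values can be decoded. The sketch below relies on writetime() reporting microseconds since the Unix epoch; the class and method names are hypothetical, and java.time requires Java 8 or later. The value decodes to the time the statement was executed, which is consistent with the server assigning the timestamp itself.

```java
import java.time.Instant;

// Hypothetical helper for decoding a CQL writetime() value.
final class WritetimeSketch
{
    // writetime() values are microseconds since the Unix epoch.
    static Instant fromWritetimeMicros(long micros)
    {
        long seconds = micros / 1_000_000L;
        long nanos = (micros % 1_000_000L) * 1_000L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    public static void main(String[] args)
    {
        // prints 2014-04-17T22:00:23.766Z
        System.out.println(fromWritetimeMicros(1397772023766000L));
    }
}
```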


 USING TIMESTAMP for batches does not work
 -

 Key: CASSANDRA-7053
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7053
 Project: Cassandra
  Issue Type: Bug
Reporter: Robert Supencheck
  Labels: cqlsh

 When using the USING TIMESTAMP <timestamp> syntax for a batch statement, 
 the supplied timestamp is ignored.




git commit: Fix batchlog to account for CF truncation records

Repository: cassandra
Updated Branches:
  refs/heads/cassandra-1.2 fe94e90f4 -> f46c6578c


Fix batchlog to account for CF truncation records

patch by Aleksey Yeschenko; reviewed by Jonathan Ellis for
CASSANDRA-6999


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/f46c6578
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/f46c6578
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/f46c6578

Branch: refs/heads/cassandra-1.2
Commit: f46c6578c2fb905cd88681d80218d89798032e03
Parents: fe94e90
Author: Aleksey Yeschenko alek...@apache.org
Authored: Fri Apr 18 01:36:08 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Fri Apr 18 01:36:08 2014 +0300

--
 CHANGES.txt |   1 +
 .../apache/cassandra/db/BatchlogManager.java| 102 ---
 .../apache/cassandra/db/ColumnFamilyStore.java  |   6 --
 .../cassandra/db/HintedHandOffManager.java  |  16 +--
 .../org/apache/cassandra/db/RowMutation.java|   6 +-
 .../org/apache/cassandra/db/SystemTable.java|  53 +++---
 .../db/commitlog/CommitLogReplayer.java |   4 +-
 .../apache/cassandra/service/StorageProxy.java  |   9 +-
 .../cassandra/db/BatchlogManagerTest.java   |  78 --
 .../apache/cassandra/db/HintedHandOffTest.java  |   2 +-
 10 files changed, 189 insertions(+), 88 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/f46c6578/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 07c09cf..bb08a37 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -5,6 +5,7 @@
  * Schedule schema pulls on change (CASSANDRA-6971)
  * Non-droppable verbs shouldn't be dropped from OTC (CASSANDRA-6980)
  * Shutdown batchlog executor in SS#drain() (CASSANDRA-7025)
+ * Fix batchlog to account for CF truncation records (CASSANDRA-6999)
 
 
 1.2.16

http://git-wip-us.apache.org/repos/asf/cassandra/blob/f46c6578/src/java/org/apache/cassandra/db/BatchlogManager.java
--
diff --git a/src/java/org/apache/cassandra/db/BatchlogManager.java 
b/src/java/org/apache/cassandra/db/BatchlogManager.java
index b8dbadd..ea32e9d 100644
--- a/src/java/org/apache/cassandra/db/BatchlogManager.java
+++ b/src/java/org/apache/cassandra/db/BatchlogManager.java
@@ -24,10 +24,7 @@ import java.lang.management.ManagementFactory;
 import java.net.InetAddress;
 import java.nio.ByteBuffer;
 import java.util.*;
-import java.util.concurrent.CopyOnWriteArraySet;
-import java.util.concurrent.ExecutionException;
-import java.util.concurrent.ScheduledExecutorService;
-import java.util.concurrent.TimeUnit;
+import java.util.concurrent.*;
 import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicLong;
 import javax.management.MBeanServer;
@@ -36,6 +33,7 @@ import javax.management.ObjectName;
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.Iterables;
 import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
 import com.google.common.util.concurrent.RateLimiter;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -254,45 +252,72 @@ public class BatchlogManager implements 
BatchlogManagerMBean
 {
 DataInputStream in = new 
DataInputStream(ByteBufferUtil.inputStream(data));
 int size = in.readInt();
+List<RowMutation> mutations = new ArrayList<RowMutation>(size);
+
 for (int i = 0; i < size; i++)
-replaySerializedMutation(RowMutation.serializer.deserialize(in, 
VERSION), writtenAt, rateLimiter);
+{
+RowMutation mutation = RowMutation.serializer.deserialize(in, 
VERSION);
+
+// Remove CFs that have been truncated since. writtenAt and 
SystemTable#getTruncatedAt() both return millis.
+// We don't abort the replay entirely b/c this can be considered a 
succes (truncated is same as delivered then
+// truncated.
+for (UUID cfId : mutation.getColumnFamilyIds())
+if (writtenAt <= SystemTable.getTruncatedAt(cfId))
+mutation = mutation.without(cfId);
+
+if (!mutation.isEmpty())
+mutations.add(mutation);
+}
+
+if (!mutations.isEmpty())
+replayMutations(mutations, writtenAt, rateLimiter);
 }
 
 /*
  * We try to deliver the mutations to the replicas ourselves if they are 
alive and only resort to writing hints
  * when a replica is down or a write request times out.
  */
-private void replaySerializedMutation(RowMutation mutation, long 
writtenAt, RateLimiter rateLimiter) throws IOException
+private void

[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973534#comment-13973534
 ] 

Pavel Yaskevich commented on CASSANDRA-6694:


Ok, hashCode and setPeer changes are now pushed to the same branch. 
AbstractNativeCell is independent of NativeAllocation now because 
NativeAllocator returns the aligned peer directly, which allows the peer field 
to be made final in AbstractNativeCell. I have also pushed the set/get logic 
for the data size associated with the pointer into NativeAllocator, as it's 
basically its metadata. IMO that's a bit cleaner compared to how it is done in 
Benedict's branch, where NativeAllocation tracks pointer alignment to size 
(internalPeer() { return peer + 4; }) but NativeAllocator takes care of 
allocating 4 additional bytes to the requested size.
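
The arrangement described above can be sketched roughly as follows. This is not the code from either branch: the class is hypothetical, a ByteBuffer stands in for native memory, and alignment is ignored. Only the 4-byte size prefix and a getDataSize lookup by area pointer follow the comment.

```java
import java.nio.ByteBuffer;

// Hypothetical allocator over a ByteBuffer standing in for native memory.
final class SizePrefixedAllocatorSketch
{
    private static final int SIZE_PREFIX = 4; // bytes reserved in front of each allocation
    private final ByteBuffer memory;
    private int next = 0;

    SizePrefixedAllocatorSketch(int capacity)
    {
        memory = ByteBuffer.allocate(capacity);
    }

    // Reserves size + 4 bytes, records the size, and returns the offset of the usable area.
    int allocate(int size)
    {
        if (next + SIZE_PREFIX + size > memory.capacity())
            throw new IllegalStateException("out of sketch memory");
        int base = next;
        memory.putInt(base, size);
        next = base + SIZE_PREFIX + size;
        return base + SIZE_PREFIX;
    }

    // Recovers the size from the 4 bytes that precede the usable area.
    int getDataSize(int areaPointer)
    {
        return memory.getInt(areaPointer - SIZE_PREFIX);
    }
}
```

Callers only ever see the usable area, so nothing outside the allocator needs to add 4 to a peer.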

 Slightly More Off-Heap Memtables
 

 Key: CASSANDRA-6694
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2


 The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as 
 the on-heap overhead is still very large. It should not be tremendously 
 difficult to extend these changes so that we allocate entire Cells off-heap, 
 instead of multiple BBs per Cell (with all their associated overhead).
 The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6 
 bytes per cell on average for the btree overhead, for a total overhead of 
 around 20-22 bytes). This translates to 8-byte object overhead, 4-byte 
 address (we will do alignment tricks like the VM to allow us to address a 
 reasonably large memory space, although this trick is unlikely to last us 
 forever, at which point we will have to bite the bullet and accept a 24-byte 
 per cell overhead), and 4-byte object reference for maintaining our internal 
 list of allocations, which is unfortunately necessary since we cannot safely 
 (and cheaply) walk the object graph we allocate otherwise, which is necessary 
 for (allocation-) compaction and pointer rewriting.
 The ugliest thing here is going to be implementing the various CellName 
 instances so that they may be backed by native memory OR heap memory.
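
The arithmetic in the description can be spelled out in a short sketch. The constants come from the text above. The 8-byte alignment used in the usage note is an assumption borrowed from the JVM's default compressed-pointer scheme, not something the description fixes.

```java
// Hypothetical constants restating the overhead figures from the issue description.
final class CellOverheadSketch
{
    static final int OBJECT_HEADER = 8;      // 8-byte object overhead
    static final int COMPRESSED_ADDRESS = 4; // 4-byte address into native memory
    static final int ALLOCATION_REF = 4;     // 4-byte reference in the list of allocations

    static int perCellOverhead()
    {
        return OBJECT_HEADER + COMPRESSED_ADDRESS + ALLOCATION_REF;
    }

    // Bytes addressable by an unsigned 32-bit index when allocations are aligned to `alignment` bytes.
    static long addressableBytes(int alignment)
    {
        return (1L << 32) * alignment;
    }
}
```

perCellOverhead() gives the 16 bytes quoted above, and adding the 4-6 bytes of btree overhead gives the 20-22 byte total. Under the assumed 8-byte alignment, addressableBytes(8) is 2^35 bytes, i.e. 32 GiB, which is why the trick only lasts until memtables outgrow that space.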




[jira] [Comment Edited] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973534#comment-13973534
 ] 

Pavel Yaskevich edited comment on CASSANDRA-6694 at 4/17/14 10:56 PM:
--

Ok, hashCode and setPeer changes are now pushed to the same branch. 
AbstractNativeCell is independent of NativeAllocation now because 
NativeAllocator returns the aligned peer directly, which allows the peer field 
to be made final in AbstractNativeCell. I have also pushed the set/get logic 
for the data size associated with the pointer into NativeAllocator, as it's 
basically its metadata. IMO that's a bit cleaner compared to how it is done in 
Benedict's branch, where NativeAllocation tracks pointer alignment to size 
(internalPeer() { return peer + 4; }) but NativeAllocator takes care of 
allocating 4 additional bytes to the requested size.


was (Author: xedin):
Ok, hashCode and setPeer changes are now pushed to the same branch. 
AbstractNativeCell is independent of NativeAllocation now because 
NativeAllocator returns the aligned peer directly, which allows the peer field 
to be made final in AbstractNativeCell. I have also pushed the set/get logic 
for the data size associated with the pointer into NativeAllocator, as it's 
basically its metadata. IMO that's a bit cleaner compared to how it is done in 
Benedict's branch, where NativeAllocation tracks pointer alignment to size 
(internalPeer() { return peer + 4; }) but NativeAllocator takes care of 
allocating 4 additional bytes to the requested size.


[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973538#comment-13973538
 ] 

Benedict commented on CASSANDRA-6694:
-

I don't think this is the right approach: with the changes we are making, we 
are pretty much precluding doing anything fancy with GC (we'll have to rely on 
malloc for now). As such, the size no longer provides any useful bookkeeping 
information to the NativeAllocator. It should be dealt with entirely in 
the AbstractNativeCell - its concept of size is entirely unique to it for now. 
This also, separately, makes packing structs of NativeCell a lot more 
straightforward.


[1/2] git commit: Fix batchlog to account for CF truncation records

Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.0 7dbbe9233 -> 384de4b85


Fix batchlog to account for CF truncation records

patch by Aleksey Yeschenko; reviewed by Jonathan Ellis for
CASSANDRA-6999


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/87097066
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/87097066
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/87097066

Branch: refs/heads/cassandra-2.0
Commit: 87097066e7c3c133e333804c4e4b00457b6c989d
Parents: fe94e90
Author: Aleksey Yeschenko alek...@apache.org
Authored: Fri Apr 18 01:36:08 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Fri Apr 18 01:38:55 2014 +0300

--
 CHANGES.txt |   1 +
 .../apache/cassandra/db/BatchlogManager.java| 102 ---
 .../apache/cassandra/db/ColumnFamilyStore.java  |   6 --
 .../cassandra/db/HintedHandOffManager.java  |  16 +--
 .../org/apache/cassandra/db/RowMutation.java|   6 +-
 .../org/apache/cassandra/db/SystemTable.java|  53 +++---
 .../db/commitlog/CommitLogReplayer.java |   4 +-
 .../apache/cassandra/service/StorageProxy.java  |   9 +-
 .../cassandra/db/BatchlogManagerTest.java   |  78 --
 .../apache/cassandra/db/HintedHandOffTest.java  |   2 +-
 10 files changed, 189 insertions(+), 88 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/87097066/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 07c09cf..bb08a37 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -5,6 +5,7 @@
  * Schedule schema pulls on change (CASSANDRA-6971)
  * Non-droppable verbs shouldn't be dropped from OTC (CASSANDRA-6980)
  * Shutdown batchlog executor in SS#drain() (CASSANDRA-7025)
+ * Fix batchlog to account for CF truncation records (CASSANDRA-6999)
 
 
 1.2.16

http://git-wip-us.apache.org/repos/asf/cassandra/blob/87097066/src/java/org/apache/cassandra/db/BatchlogManager.java
--
diff --git a/src/java/org/apache/cassandra/db/BatchlogManager.java 
b/src/java/org/apache/cassandra/db/BatchlogManager.java
index b8dbadd..ea32e9d 100644
--- a/src/java/org/apache/cassandra/db/BatchlogManager.java
+++ b/src/java/org/apache/cassandra/db/BatchlogManager.java
@@ -24,10 +24,7 @@ import java.lang.management.ManagementFactory;
 import java.net.InetAddress;
 import java.nio.ByteBuffer;
 import java.util.*;
-import java.util.concurrent.CopyOnWriteArraySet;
-import java.util.concurrent.ExecutionException;
-import java.util.concurrent.ScheduledExecutorService;
-import java.util.concurrent.TimeUnit;
+import java.util.concurrent.*;
 import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicLong;
 import javax.management.MBeanServer;
@@ -36,6 +33,7 @@ import javax.management.ObjectName;
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.Iterables;
 import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
 import com.google.common.util.concurrent.RateLimiter;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -254,45 +252,72 @@ public class BatchlogManager implements 
BatchlogManagerMBean
 {
 DataInputStream in = new 
DataInputStream(ByteBufferUtil.inputStream(data));
 int size = in.readInt();
+List<RowMutation> mutations = new ArrayList<RowMutation>(size);
+
 for (int i = 0; i < size; i++)
-replaySerializedMutation(RowMutation.serializer.deserialize(in, 
VERSION), writtenAt, rateLimiter);
+{
+RowMutation mutation = RowMutation.serializer.deserialize(in, 
VERSION);
+
+// Remove CFs that have been truncated since. writtenAt and 
SystemTable#getTruncatedAt() both return millis.
+// We don't abort the replay entirely b/c this can be considered a 
succes (truncated is same as delivered then
+// truncated.
+for (UUID cfId : mutation.getColumnFamilyIds())
+if (writtenAt <= SystemTable.getTruncatedAt(cfId))
+mutation = mutation.without(cfId);
+
+if (!mutation.isEmpty())
+mutations.add(mutation);
+}
+
+if (!mutations.isEmpty())
+replayMutations(mutations, writtenAt, rateLimiter);
 }
 
 /*
  * We try to deliver the mutations to the replicas ourselves if they are 
alive and only resort to writing hints
  * when a replica is down or a write request times out.
  */
-private void replaySerializedMutation(RowMutation mutation, long 
writtenAt, RateLimiter rateLimiter) throws IOException
+private void

[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973571#comment-13973571
 ] 

Pavel Yaskevich commented on CASSANDRA-6694:


I just don't like that in NativeAllocation we assume that NativeAllocator has 
reserved 4 bytes for us. So I decided to put everything into NativeAllocator 
and only return the useful space, so we don't have to + 4 every time we need a 
peer. It could be done in AbstractNativeCell, which would allocate size + 4, or 
it could be done in NativeAllocator, which would report on demand how big the 
allocation was based on the area pointer that it returned (which is what 
NativeAllocator.getDataSize(areaPointer) does). Either of those places 
(AbstractNativeCell or NativeAllocator) works for me.


[jira] [Comment Edited] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973582#comment-13973582
 ] 

Benedict edited comment on CASSANDRA-6694 at 4/17/14 11:55 PM:
---

The only reason we were assigning a size in NativeAllocator was to support 
moving the peer around (in which case you need to know how much memory you're 
copying). 

NativeAllocation assuming it has (i.e. _being defined as having_) a size prefix 
is fine when it is tightly coupled with NativeAllocator (like it is in my 
branch) - but once you have it as a final field in another object, 
NativeAllocator should simply have no say in the matter. It never needs to know 
the size of the allocation, so we should just redefine what our 
AbstractNativeCell considers to be its size in its sizeOf() calculation, and 
have the NativeAllocator use that unadulterated value.
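
A minimal sketch of that division of responsibility, under stated assumptions: both classes are hypothetical, a ByteBuffer stands in for native memory, and the cell layout (a 4-byte size followed by the value bytes) is invented for illustration. The allocator hands out exactly the number of bytes it is asked for, and the cell alone decides that its size includes a size field.

```java
import java.nio.ByteBuffer;

// Hypothetical allocator that knows nothing about sizes beyond the request it is given.
final class PlainAllocatorSketch
{
    final ByteBuffer memory;
    private int next = 0;

    PlainAllocatorSketch(int capacity)
    {
        memory = ByteBuffer.allocate(capacity);
    }

    int allocate(int size)
    {
        if (next + size > memory.capacity())
            throw new IllegalStateException("out of sketch memory");
        int peer = next;
        next += size;
        return peer;
    }
}

// Hypothetical cell that defines its own layout: [4-byte size][value bytes].
final class SizedCellSketch
{
    private static final int SIZE_FIELD = 4;
    private final PlainAllocatorSketch allocator;
    private final int peer; // final: the cell never moves in this sketch

    SizedCellSketch(PlainAllocatorSketch allocator, byte[] value)
    {
        this.allocator = allocator;
        int size = sizeOf(value);
        this.peer = allocator.allocate(size); // the allocator sees the unadulterated size
        allocator.memory.putInt(peer, size);
        for (int i = 0; i < value.length; i++)
            allocator.memory.put(peer + SIZE_FIELD + i, value[i]);
    }

    // The cell's own notion of size, including its size field.
    static int sizeOf(byte[] value)
    {
        return SIZE_FIELD + value.length;
    }

    int size()
    {
        return allocator.memory.getInt(peer);
    }

    byte valueAt(int i)
    {
        return allocator.memory.get(peer + SIZE_FIELD + i);
    }
}
```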


was (Author: benedict):
The only reason it was happening in NativeAllocator was to support moving the 
peer around (so you need to know how much memory you're copying). 

NativeAllocation assuming it has (i.e. _being defined as having_) a size prefix 
is fine when it is tightly coupled with NativeAllocator (like it is in my 
branch) - but once you have it as a final field in another object, 
NativeAllocator should simply have no say in the matter. It never needs to know 
the size of the allocation, so we should just redefine what our 
AbstractNativeCell considers to be its size in its sizeOf() calculation.


[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973582#comment-13973582
 ] 

Benedict commented on CASSANDRA-6694:
-

The only reason it was happening in NativeAllocator was to support moving the 
peer around (so you need to know how much memory you're copying). 

NativeAllocation assuming it has (i.e. _being defined as having_) a size prefix 
is fine when it is tightly coupled with NativeAllocator (like it is in my 
branch) - but once you have it as a final field in another object, 
NativeAllocator should simply have no say in the matter. It never needs to know 
the size of the allocation, so we should just redefine what our 
AbstractNativeCell considers to be its size in its sizeOf() calculation.


[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973595#comment-13973595
 ] 

Pavel Yaskevich commented on CASSANDRA-6694:


Sure, if you like that better I will change it right away. Anyhow, if we need 
it in the allocator for some reason, we can change it.


[1/4] git commit: Update versions for 2.0.7 release

Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 de8a479f2 -> 66af6fedc


Update versions for 2.0.7 release


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/7dbbe923
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/7dbbe923
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/7dbbe923

Branch: refs/heads/cassandra-2.1
Commit: 7dbbe9233ce83c2a473ba2510c827a661de99400
Parents: 294c011
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Mon Apr 14 16:43:46 2014 +0200
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Mon Apr 14 16:43:46 2014 +0200

--
 NEWS.txt | 11 ++-
 build.xml|  2 +-
 debian/changelog |  6 ++
 3 files changed, 17 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/7dbbe923/NEWS.txt
--
diff --git a/NEWS.txt b/NEWS.txt
index 18f89bc..05f9392 100644
--- a/NEWS.txt
+++ b/NEWS.txt
@@ -14,6 +14,15 @@ restore snapshots created with the previous major version using the
 using the provided 'sstableupgrade' tool.
 
 
+2.0.7
+=====
+
+Upgrading
+---------
+- Nothing specific to this release, but please see 2.0.6 if you are upgrading
+  from a previous version.
+
+
 2.0.6
 =====
 
@@ -29,7 +38,7 @@ New features
 
 Upgrading
 ---------
-- Nothing specific to this release, but please see 2.0.6 if you are upgrading
+- Nothing specific to this release, but please see 2.0.5 if you are upgrading
   from a previous version.
 
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/7dbbe923/build.xml
--
diff --git a/build.xml b/build.xml
index e6d77d8..5c6c736 100644
--- a/build.xml
+++ b/build.xml
@@ -25,7 +25,7 @@
     <property name="debuglevel" value="source,lines,vars"/>
 
     <!-- default version and SCM information -->
-    <property name="base.version" value="2.0.6"/>
+    <property name="base.version" value="2.0.7"/>
     <property name="scm.connection" value="scm:git://git.apache.org/cassandra.git"/>
     <property name="scm.developerConnection" value="scm:git://git.apache.org/cassandra.git"/>
     <property name="scm.url" value="http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=tree"/>

http://git-wip-us.apache.org/repos/asf/cassandra/blob/7dbbe923/debian/changelog
--
diff --git a/debian/changelog b/debian/changelog
index 6cc4391..37c7425 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+cassandra (2.0.7) unstable; urgency=low
+
+  * New release
+
+ -- Sylvain Lebresne slebre...@apache.org  Mon, 14 Apr 2014 16:42:09 +0200
+
 cassandra (2.0.6) unstable; urgency=low
 
   * New release

[4/4] git commit: Merge branch 'cassandra-2.0' into cassandra-2.1

Merge branch 'cassandra-2.0' into cassandra-2.1

Conflicts:
CHANGES.txt
build.xml
debian/changelog
src/java/org/apache/cassandra/db/BatchlogManager.java
src/java/org/apache/cassandra/db/ColumnFamilyStore.java
src/java/org/apache/cassandra/db/HintedHandOffManager.java
src/java/org/apache/cassandra/db/SystemKeyspace.java
src/java/org/apache/cassandra/service/StorageProxy.java
test/unit/org/apache/cassandra/db/BatchlogManagerTest.java


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/66af6fed
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/66af6fed
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/66af6fed

Branch: refs/heads/cassandra-2.1
Commit: 66af6fedc02eed630028043f8a6f0d3014f193d5
Parents: de8a479 384de4b
Author: Aleksey Yeschenko alek...@apache.org
Authored: Fri Apr 18 03:14:47 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Fri Apr 18 03:14:47 2014 +0300

--
 CHANGES.txt |   1 +
 NEWS.txt|  11 +-
 .../apache/cassandra/db/BatchlogManager.java| 102 +++
 .../apache/cassandra/db/ColumnFamilyStore.java  |   6 --
 .../cassandra/db/HintedHandOffManager.java  |  19 +---
 .../org/apache/cassandra/db/SystemKeyspace.java |  55 +++---
 .../db/commitlog/CommitLogReplayer.java |  12 +--
 .../apache/cassandra/service/StorageProxy.java  |   9 +-
 .../cassandra/db/BatchlogManagerTest.java   |  84 +--
 .../apache/cassandra/db/HintedHandOffTest.java  |  19 ++--
 10 files changed, 214 insertions(+), 104 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/66af6fed/CHANGES.txt
--
diff --cc CHANGES.txt
index 9f34023,ad26f6d..705f1b8
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -108,6 -64,6 +108,7 @@@ Merged from 1.2
   * Schedule schema pulls on change (CASSANDRA-6971)
   * Non-droppable verbs shouldn't be dropped from OTC (CASSANDRA-6980)
   * Shutdown batchlog executor in SS#drain() (CASSANDRA-7025)
++ * Fix batchlog to account for CF truncation records (CASSANDRA-6999)
  
  
  2.0.6

http://git-wip-us.apache.org/repos/asf/cassandra/blob/66af6fed/NEWS.txt
--
diff --cc NEWS.txt
index 9567ef3,05f9392..ac78a73
--- a/NEWS.txt
+++ b/NEWS.txt
@@@ -13,46 -13,16 +13,55 @@@ restore snapshots created with the prev
  'sstableloader' tool. You can upgrade the file format of your snapshots
  using the provided 'sstableupgrade' tool.
  
 +2.1
 +===
 +
 +New features
 +
 +   - SSTable data directory name is slightly changed. Each directory will
 + have hex string appended after CF name, e.g.
 + ks/cf-5be396077b811e3a3ab9dc4b9ac088d/
 + This hex string part represents unique ColumnFamily ID.
 + Note that existing directories are used as is, so only newly created
 + directories after upgrade have new directory name format.
 +   - Saved key cache files also have ColumnFamily ID in their file name.
 +   - It is now possible to do incremental repairs, sstables that have been
 + repaired are marked with a timestamp and not included in the next
 + repair session. Use nodetool repair -par -inc to use this feature.
 + A tool to manually mark/unmark sstables as repaired is available in
 + tools/bin/sstablerepairedset.
 +
 +Upgrading
 +-
 +   - Rolling upgrades from anything pre-2.0.7 is not supported. Furthermore
 + pre-2.0 sstables are not supported. This means that before upgrading
 + a node on 2.1, this node must be started on 2.0 and
 + 'nodetool upgdradesstables' must be run (and this even in the case
 + of not-rolling upgrades).
 +   - For size-tiered compaction users, Cassandra now defaults to ignoring
 + the coldest 5% of sstables.  This can be customized with the
 + cold_reads_to_omit compaction option; 0.0 omits nothing (the old
 + behavior) and 1.0 omits everything.
 +   - Multithreaded compaction has been removed.
 +   - Counters implementation has been changed, replaced by a safer one with
 + less caveats, but different performance characteristics. You might have
 + to change your data model to accomodate the new implementation.
 + (See https://issues.apache.org/jira/browse/CASSANDRA-6504 and the dev
 + blog post at http://www.datastax.com/dev/blog/PLACEHOLDER for details).
 +- (per-table) index_interval parameter has been replaced with
 + min_index_interval and max_index_interval paratemeters. index_interval
 + has been deprecated.
 +
  
+ 2.0.7
+ =
+ 
+ Upgrading
+ -
+ - Nothing specific to this

[2/4] git commit: Fix batchlog to account for CF truncation records

Fix batchlog to account for CF truncation records

patch by Aleksey Yeschenko; reviewed by Jonathan Ellis for
CASSANDRA-6999


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/87097066
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/87097066
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/87097066

Branch: refs/heads/cassandra-2.1
Commit: 87097066e7c3c133e333804c4e4b00457b6c989d
Parents: fe94e90
Author: Aleksey Yeschenko alek...@apache.org
Authored: Fri Apr 18 01:36:08 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Fri Apr 18 01:38:55 2014 +0300

--
 CHANGES.txt |   1 +
 .../apache/cassandra/db/BatchlogManager.java| 102 ---
 .../apache/cassandra/db/ColumnFamilyStore.java  |   6 --
 .../cassandra/db/HintedHandOffManager.java  |  16 +--
 .../org/apache/cassandra/db/RowMutation.java|   6 +-
 .../org/apache/cassandra/db/SystemTable.java|  53 +++---
 .../db/commitlog/CommitLogReplayer.java |   4 +-
 .../apache/cassandra/service/StorageProxy.java  |   9 +-
 .../cassandra/db/BatchlogManagerTest.java   |  78 --
 .../apache/cassandra/db/HintedHandOffTest.java  |   2 +-
 10 files changed, 189 insertions(+), 88 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/87097066/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 07c09cf..bb08a37 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -5,6 +5,7 @@
  * Schedule schema pulls on change (CASSANDRA-6971)
  * Non-droppable verbs shouldn't be dropped from OTC (CASSANDRA-6980)
  * Shutdown batchlog executor in SS#drain() (CASSANDRA-7025)
+ * Fix batchlog to account for CF truncation records (CASSANDRA-6999)
 
 
 1.2.16

http://git-wip-us.apache.org/repos/asf/cassandra/blob/87097066/src/java/org/apache/cassandra/db/BatchlogManager.java
--
diff --git a/src/java/org/apache/cassandra/db/BatchlogManager.java 
b/src/java/org/apache/cassandra/db/BatchlogManager.java
index b8dbadd..ea32e9d 100644
--- a/src/java/org/apache/cassandra/db/BatchlogManager.java
+++ b/src/java/org/apache/cassandra/db/BatchlogManager.java
@@ -24,10 +24,7 @@ import java.lang.management.ManagementFactory;
 import java.net.InetAddress;
 import java.nio.ByteBuffer;
 import java.util.*;
-import java.util.concurrent.CopyOnWriteArraySet;
-import java.util.concurrent.ExecutionException;
-import java.util.concurrent.ScheduledExecutorService;
-import java.util.concurrent.TimeUnit;
+import java.util.concurrent.*;
 import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicLong;
 import javax.management.MBeanServer;
@@ -36,6 +33,7 @@ import javax.management.ObjectName;
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.Iterables;
 import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
 import com.google.common.util.concurrent.RateLimiter;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -254,45 +252,72 @@ public class BatchlogManager implements BatchlogManagerMBean
     {
         DataInputStream in = new DataInputStream(ByteBufferUtil.inputStream(data));
         int size = in.readInt();
+        List<RowMutation> mutations = new ArrayList<RowMutation>(size);
+
         for (int i = 0; i < size; i++)
-            replaySerializedMutation(RowMutation.serializer.deserialize(in, VERSION), writtenAt, rateLimiter);
+        {
+            RowMutation mutation = RowMutation.serializer.deserialize(in, VERSION);
+
+            // Remove CFs that have been truncated since. writtenAt and SystemTable#getTruncatedAt() both return millis.
+            // We don't abort the replay entirely b/c this can be considered a succes (truncated is same as delivered then
+            // truncated.
+            for (UUID cfId : mutation.getColumnFamilyIds())
+                if (writtenAt <= SystemTable.getTruncatedAt(cfId))
+                    mutation = mutation.without(cfId);
+
+            if (!mutation.isEmpty())
+                mutations.add(mutation);
+        }
+
+        if (!mutations.isEmpty())
+            replayMutations(mutations, writtenAt, rateLimiter);
     }
 
     /*
      * We try to deliver the mutations to the replicas ourselves if they are alive and only resort to writing hints
      * when a replica is down or a write request times out.
      */
-    private void replaySerializedMutation(RowMutation mutation, long writtenAt, RateLimiter rateLimiter) throws IOException
+    private void replayMutations(List<RowMutation> mutations, long writtenAt, RateLimiter rateLimiter) throws IOException
 {
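
The filtering step in the hunk above can be sketched in isolation. This is
only an illustration, not the committed code: Mutation, TruncationFilter and
the truncatedAt map are hypothetical stand-ins for RowMutation and
SystemTable#getTruncatedAt(), and the sketch assumes a column family is
dropped when the batch was written at or before the truncation time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical stand-in for a mutation that touches several column families.
final class Mutation
{
    private final Set<UUID> cfIds;

    Mutation(Set<UUID> cfIds)
    {
        this.cfIds = new HashSet<UUID>(cfIds);
    }

    Set<UUID> getColumnFamilyIds()
    {
        // Return a copy so callers can iterate while calling without().
        return new HashSet<UUID>(cfIds);
    }

    Mutation without(UUID cfId)
    {
        Set<UUID> remaining = new HashSet<UUID>(cfIds);
        remaining.remove(cfId);
        return new Mutation(remaining);
    }

    boolean isEmpty()
    {
        return cfIds.isEmpty();
    }
}

final class TruncationFilter
{
    // Assumption: column families that were never truncated are absent from the map.
    static List<Mutation> filterTruncated(List<Mutation> batch, long writtenAt, Map<UUID, Long> truncatedAt)
    {
        List<Mutation> kept = new ArrayList<Mutation>(batch.size());
        for (Mutation mutation : batch)
        {
            for (UUID cfId : mutation.getColumnFamilyIds())
            {
                Long truncated = truncatedAt.get(cfId);
                // Both values are in millis; drop the CF if it was truncated after the batch was written.
                if (truncated != null && writtenAt <= truncated)
                    mutation = mutation.without(cfId);
            }
            // A mutation with nothing left counts as delivered and is skipped.
            if (!mutation.isEmpty())
                kept.add(mutation);
        }
        return kept;
    }

    public static void main(String[] args)
    {
        UUID truncatedCf = UUID.randomUUID();
        UUID liveCf = UUID.randomUUID();
        Map<UUID, Long> truncatedAt = new HashMap<UUID, Long>();
        truncatedAt.put(truncatedCf, 100L);

        List<Mutation> batch = new ArrayList<Mutation>();
        batch.add(new Mutation(new HashSet<UUID>(Arrays.asList(truncatedCf, liveCf))));
        batch.add(new Mutation(Collections.singleton(truncatedCf)));

        // Only the mutation that still touches liveCf survives the filter.
        System.out.println(filterTruncated(batch, 50L, truncatedAt).size());
    }
}
```

Collecting the surviving mutations into a list before replaying them mirrors
the shape of the patch, which replaces the per-mutation replay call with a
single call over the filtered list.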

[5/5] git commit: Merge branch 'cassandra-2.1' into trunk

Merge branch 'cassandra-2.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e1002881
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e1002881
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e1002881

Branch: refs/heads/trunk
Commit: e100288123b055021d9b8873ab86f0dbf5fc9f22
Parents: 4d06917 66af6fe
Author: Aleksey Yeschenko alek...@apache.org
Authored: Fri Apr 18 03:16:14 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Fri Apr 18 03:16:14 2014 +0300

--
 CHANGES.txt |   1 +
 NEWS.txt|  11 +-
 .../apache/cassandra/db/BatchlogManager.java| 102 +++
 .../apache/cassandra/db/ColumnFamilyStore.java  |   6 --
 .../cassandra/db/HintedHandOffManager.java  |  19 +---
 .../org/apache/cassandra/db/SystemKeyspace.java |  55 +++---
 .../db/commitlog/CommitLogReplayer.java |  12 +--
 .../apache/cassandra/service/StorageProxy.java  |   9 +-
 .../cassandra/db/BatchlogManagerTest.java   |  84 +--
 .../apache/cassandra/db/HintedHandOffTest.java  |  19 ++--
 10 files changed, 214 insertions(+), 104 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/e1002881/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e1002881/NEWS.txt
--

[1/5] git commit: Update versions for 2.0.7 release

Repository: cassandra
Updated Branches:
  refs/heads/trunk 4d0691759 -> e10028812


Update versions for 2.0.7 release


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/7dbbe923
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/7dbbe923
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/7dbbe923

Branch: refs/heads/trunk
Commit: 7dbbe9233ce83c2a473ba2510c827a661de99400
Parents: 294c011
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Mon Apr 14 16:43:46 2014 +0200
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Mon Apr 14 16:43:46 2014 +0200

--
 NEWS.txt | 11 ++-
 build.xml|  2 +-
 debian/changelog |  6 ++
 3 files changed, 17 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/7dbbe923/NEWS.txt
--
diff --git a/NEWS.txt b/NEWS.txt
index 18f89bc..05f9392 100644
--- a/NEWS.txt
+++ b/NEWS.txt
@@ -14,6 +14,15 @@ restore snapshots created with the previous major version using the
 using the provided 'sstableupgrade' tool.
 
 
+2.0.7
+=====
+
+Upgrading
+---------
+- Nothing specific to this release, but please see 2.0.6 if you are upgrading
+  from a previous version.
+
+
 2.0.6
 =====
 
@@ -29,7 +38,7 @@ New features
 
 Upgrading
 ---------
-- Nothing specific to this release, but please see 2.0.6 if you are upgrading
+- Nothing specific to this release, but please see 2.0.5 if you are upgrading
   from a previous version.
 
 

http://git-wip-us.apache.org/repos/asf/cassandra/blob/7dbbe923/build.xml
--
diff --git a/build.xml b/build.xml
index e6d77d8..5c6c736 100644
--- a/build.xml
+++ b/build.xml
@@ -25,7 +25,7 @@
     <property name="debuglevel" value="source,lines,vars"/>
 
     <!-- default version and SCM information -->
-    <property name="base.version" value="2.0.6"/>
+    <property name="base.version" value="2.0.7"/>
     <property name="scm.connection" value="scm:git://git.apache.org/cassandra.git"/>
     <property name="scm.developerConnection" value="scm:git://git.apache.org/cassandra.git"/>
     <property name="scm.url" value="http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=tree"/>

http://git-wip-us.apache.org/repos/asf/cassandra/blob/7dbbe923/debian/changelog
--
diff --git a/debian/changelog b/debian/changelog
index 6cc4391..37c7425 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+cassandra (2.0.7) unstable; urgency=low
+
+  * New release
+
+ -- Sylvain Lebresne slebre...@apache.org  Mon, 14 Apr 2014 16:42:09 +0200
+
 cassandra (2.0.6) unstable; urgency=low
 
   * New release

[4/5] git commit: Merge branch 'cassandra-2.0' into cassandra-2.1

Merge branch 'cassandra-2.0' into cassandra-2.1

Conflicts:
CHANGES.txt
build.xml
debian/changelog
src/java/org/apache/cassandra/db/BatchlogManager.java
src/java/org/apache/cassandra/db/ColumnFamilyStore.java
src/java/org/apache/cassandra/db/HintedHandOffManager.java
src/java/org/apache/cassandra/db/SystemKeyspace.java
src/java/org/apache/cassandra/service/StorageProxy.java
test/unit/org/apache/cassandra/db/BatchlogManagerTest.java


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/66af6fed
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/66af6fed
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/66af6fed

Branch: refs/heads/trunk
Commit: 66af6fedc02eed630028043f8a6f0d3014f193d5
Parents: de8a479 384de4b
Author: Aleksey Yeschenko alek...@apache.org
Authored: Fri Apr 18 03:14:47 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Fri Apr 18 03:14:47 2014 +0300

--
 CHANGES.txt |   1 +
 NEWS.txt|  11 +-
 .../apache/cassandra/db/BatchlogManager.java| 102 +++
 .../apache/cassandra/db/ColumnFamilyStore.java  |   6 --
 .../cassandra/db/HintedHandOffManager.java  |  19 +---
 .../org/apache/cassandra/db/SystemKeyspace.java |  55 +++---
 .../db/commitlog/CommitLogReplayer.java |  12 +--
 .../apache/cassandra/service/StorageProxy.java  |   9 +-
 .../cassandra/db/BatchlogManagerTest.java   |  84 +--
 .../apache/cassandra/db/HintedHandOffTest.java  |  19 ++--
 10 files changed, 214 insertions(+), 104 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/66af6fed/CHANGES.txt
--
diff --cc CHANGES.txt
index 9f34023,ad26f6d..705f1b8
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -108,6 -64,6 +108,7 @@@ Merged from 1.2
   * Schedule schema pulls on change (CASSANDRA-6971)
   * Non-droppable verbs shouldn't be dropped from OTC (CASSANDRA-6980)
   * Shutdown batchlog executor in SS#drain() (CASSANDRA-7025)
++ * Fix batchlog to account for CF truncation records (CASSANDRA-6999)
  
  
  2.0.6

http://git-wip-us.apache.org/repos/asf/cassandra/blob/66af6fed/NEWS.txt
--
diff --cc NEWS.txt
index 9567ef3,05f9392..ac78a73
--- a/NEWS.txt
+++ b/NEWS.txt
@@@ -13,46 -13,16 +13,55 @@@ restore snapshots created with the prev
  'sstableloader' tool. You can upgrade the file format of your snapshots
  using the provided 'sstableupgrade' tool.
  
 +2.1
 +===
 +
 +New features
 +
 +   - SSTable data directory name is slightly changed. Each directory will
 + have hex string appended after CF name, e.g.
 + ks/cf-5be396077b811e3a3ab9dc4b9ac088d/
 + This hex string part represents unique ColumnFamily ID.
 + Note that existing directories are used as is, so only newly created
 + directories after upgrade have new directory name format.
 +   - Saved key cache files also have ColumnFamily ID in their file name.
 +   - It is now possible to do incremental repairs, sstables that have been
 + repaired are marked with a timestamp and not included in the next
 + repair session. Use nodetool repair -par -inc to use this feature.
 + A tool to manually mark/unmark sstables as repaired is available in
 + tools/bin/sstablerepairedset.
 +
 +Upgrading
 +-
 +   - Rolling upgrades from anything pre-2.0.7 is not supported. Furthermore
 + pre-2.0 sstables are not supported. This means that before upgrading
 + a node on 2.1, this node must be started on 2.0 and
 + 'nodetool upgdradesstables' must be run (and this even in the case
 + of not-rolling upgrades).
 +   - For size-tiered compaction users, Cassandra now defaults to ignoring
 + the coldest 5% of sstables.  This can be customized with the
 + cold_reads_to_omit compaction option; 0.0 omits nothing (the old
 + behavior) and 1.0 omits everything.
 +   - Multithreaded compaction has been removed.
 +   - Counters implementation has been changed, replaced by a safer one with
 + less caveats, but different performance characteristics. You might have
 + to change your data model to accomodate the new implementation.
 + (See https://issues.apache.org/jira/browse/CASSANDRA-6504 and the dev
 + blog post at http://www.datastax.com/dev/blog/PLACEHOLDER for details).
 +- (per-table) index_interval parameter has been replaced with
 + min_index_interval and max_index_interval paratemeters. index_interval
 + has been deprecated.
 +
  
+ 2.0.7
+ =
+ 
+ Upgrading
+ -
+ - Nothing specific to this release, but

[2/5] git commit: Fix batchlog to account for CF truncation records

Fix batchlog to account for CF truncation records

patch by Aleksey Yeschenko; reviewed by Jonathan Ellis for
CASSANDRA-6999


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/87097066
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/87097066
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/87097066

Branch: refs/heads/trunk
Commit: 87097066e7c3c133e333804c4e4b00457b6c989d
Parents: fe94e90
Author: Aleksey Yeschenko alek...@apache.org
Authored: Fri Apr 18 01:36:08 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Fri Apr 18 01:38:55 2014 +0300

--
 CHANGES.txt |   1 +
 .../apache/cassandra/db/BatchlogManager.java| 102 ---
 .../apache/cassandra/db/ColumnFamilyStore.java  |   6 --
 .../cassandra/db/HintedHandOffManager.java  |  16 +--
 .../org/apache/cassandra/db/RowMutation.java|   6 +-
 .../org/apache/cassandra/db/SystemTable.java|  53 +++---
 .../db/commitlog/CommitLogReplayer.java |   4 +-
 .../apache/cassandra/service/StorageProxy.java  |   9 +-
 .../cassandra/db/BatchlogManagerTest.java   |  78 --
 .../apache/cassandra/db/HintedHandOffTest.java  |   2 +-
 10 files changed, 189 insertions(+), 88 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/87097066/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 07c09cf..bb08a37 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -5,6 +5,7 @@
  * Schedule schema pulls on change (CASSANDRA-6971)
  * Non-droppable verbs shouldn't be dropped from OTC (CASSANDRA-6980)
  * Shutdown batchlog executor in SS#drain() (CASSANDRA-7025)
+ * Fix batchlog to account for CF truncation records (CASSANDRA-6999)
 
 
 1.2.16

http://git-wip-us.apache.org/repos/asf/cassandra/blob/87097066/src/java/org/apache/cassandra/db/BatchlogManager.java
--
diff --git a/src/java/org/apache/cassandra/db/BatchlogManager.java 
b/src/java/org/apache/cassandra/db/BatchlogManager.java
index b8dbadd..ea32e9d 100644
--- a/src/java/org/apache/cassandra/db/BatchlogManager.java
+++ b/src/java/org/apache/cassandra/db/BatchlogManager.java
@@ -24,10 +24,7 @@ import java.lang.management.ManagementFactory;
 import java.net.InetAddress;
 import java.nio.ByteBuffer;
 import java.util.*;
-import java.util.concurrent.CopyOnWriteArraySet;
-import java.util.concurrent.ExecutionException;
-import java.util.concurrent.ScheduledExecutorService;
-import java.util.concurrent.TimeUnit;
+import java.util.concurrent.*;
 import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicLong;
 import javax.management.MBeanServer;
@@ -36,6 +33,7 @@ import javax.management.ObjectName;
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.Iterables;
 import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
 import com.google.common.util.concurrent.RateLimiter;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -254,45 +252,72 @@ public class BatchlogManager implements BatchlogManagerMBean
     {
         DataInputStream in = new DataInputStream(ByteBufferUtil.inputStream(data));
         int size = in.readInt();
+        List<RowMutation> mutations = new ArrayList<RowMutation>(size);
+
         for (int i = 0; i < size; i++)
-            replaySerializedMutation(RowMutation.serializer.deserialize(in, VERSION), writtenAt, rateLimiter);
+        {
+            RowMutation mutation = RowMutation.serializer.deserialize(in, VERSION);
+
+            // Remove CFs that have been truncated since. writtenAt and SystemTable#getTruncatedAt() both return millis.
+            // We don't abort the replay entirely b/c this can be considered a succes (truncated is same as delivered then
+            // truncated.
+            for (UUID cfId : mutation.getColumnFamilyIds())
+                if (writtenAt <= SystemTable.getTruncatedAt(cfId))
+                    mutation = mutation.without(cfId);
+
+            if (!mutation.isEmpty())
+                mutations.add(mutation);
+        }
+
+        if (!mutations.isEmpty())
+            replayMutations(mutations, writtenAt, rateLimiter);
     }
 
     /*
      * We try to deliver the mutations to the replicas ourselves if they are alive and only resort to writing hints
      * when a replica is down or a write request times out.
      */
-    private void replaySerializedMutation(RowMutation mutation, long writtenAt, RateLimiter rateLimiter) throws IOException
+    private void replayMutations(List<RowMutation> mutations, long writtenAt, RateLimiter rateLimiter) throws IOException
 {
-

[jira] [Updated] (CASSANDRA-7053) USING TIMESTAMP for batches does not work


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Stepura updated CASSANDRA-7053:
---

Attachment: cassandra-2.0-7053.patch

Attaching the patch which fixes the described scenario. The {{timestamp}} from 
{{Attributes}} wasn't used in {{executeWithoutConditions}}.
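
To illustrate the kind of fix described, here is a minimal sketch of
preferring a client-supplied timestamp over the server-side default. It is
not the attached patch: BatchAttributes is a hypothetical, simplified
stand-in for the statement attributes, and its method signatures are
assumptions made for the example.

```java
// Hypothetical, simplified stand-in for the attributes parsed from USING TIMESTAMP.
final class BatchAttributes
{
    private final Long timestamp; // null when the client did not supply USING TIMESTAMP

    BatchAttributes(Long timestamp)
    {
        this.timestamp = timestamp;
    }

    boolean isTimestampSet()
    {
        return timestamp != null;
    }

    // Returns the client-supplied timestamp if present, otherwise the server-side default.
    long getTimestamp(long now)
    {
        return timestamp == null ? now : timestamp;
    }
}

final class BatchTimestampExample
{
    public static void main(String[] args)
    {
        long now = 1000L; // stands in for the server-side default timestamp

        // The bug described above amounts to passing 'now' where this value should be used.
        System.out.println(new BatchAttributes(42L).getTimestamp(now));
        System.out.println(new BatchAttributes(null).getTimestamp(now));
    }
}
```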

 USING TIMESTAMP for batches does not work
 -

 Key: CASSANDRA-7053
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7053
 Project: Cassandra
  Issue Type: Bug
Reporter: Robert Supencheck
  Labels: cqlsh
 Attachments: cassandra-2.0-7053.patch


 When using the USING TIMESTAMP timestamp syntax for a batch statement, 
 the supplied timestamp is ignored.




[jira] [Commented] (CASSANDRA-7053) USING TIMESTAMP for batches does not work


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973616#comment-13973616
 ] 

Mikhail Stepura commented on CASSANDRA-7053:


There are a couple of places in {{executeWithConditions}} which use {{now}} 
instead of {{timestamp}}, though. Should those be fixed as well?
{code}

conditions = new CQL3CasConditions(statement.cfm, now);

...
UpdateParameters params = 
statement.makeUpdateParameters(Collections.singleton(key), clusteringPrefix, 
statementVariables, false, cl, now);
{code}

 USING TIMESTAMP for batches does not work
 -

 Key: CASSANDRA-7053
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7053
 Project: Cassandra
  Issue Type: Bug
Reporter: Robert Supencheck
Assignee: Mikhail Stepura
  Labels: cqlsh
 Attachments: cassandra-2.0-7053.patch


 When using the USING TIMESTAMP timestamp syntax for a batch statement, 
 the supplied timestamp is ignored.




[jira] [Updated] (CASSANDRA-7053) USING TIMESTAMP for batches does not work


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Stepura updated CASSANDRA-7053:
---

Fix Version/s: 2.0.8

 USING TIMESTAMP for batches does not work
 -

 Key: CASSANDRA-7053
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7053
 Project: Cassandra
  Issue Type: Bug
Reporter: Robert Supencheck
Assignee: Mikhail Stepura
  Labels: cqlsh
 Fix For: 2.0.8

 Attachments: cassandra-2.0-7053.patch


 When using the USING TIMESTAMP timestamp syntax for a batch statement, 
 the supplied timestamp is ignored.




[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973618#comment-13973618
 ] 

Pavel Yaskevich commented on CASSANDRA-6694:


Done. I have force-pushed to my branch; AbstractNativeCell now handles size, 
and NativeAllocator has nothing to do with it.

 Slightly More Off-Heap Memtables
 

 Key: CASSANDRA-6694
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2


 The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as 
 the on-heap overhead is still very large. It should not be tremendously 
 difficult to extend these changes so that we allocate entire Cells off-heap, 
 instead of multiple BBs per Cell (with all their associated overhead).
 The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6 
 bytes per cell on average for the btree overhead, for a total overhead of 
 around 20-22 bytes). This translates to 8-byte object overhead, 4-byte 
 address (we will do alignment tricks like the VM to allow us to address a 
 reasonably large memory space, although this trick is unlikely to last us 
 forever, at which point we will have to bite the bullet and accept a 24-byte 
 per cell overhead), and 4-byte object reference for maintaining our internal 
 list of allocations, which is unfortunately necessary since we cannot safely 
 (and cheaply) walk the object graph we allocate otherwise, which is necessary 
 for (allocation-) compaction and pointer rewriting.
 The ugliest thing here is going to be implementing the various CellName 
 instances so that they may be backed by native memory OR heap memory.
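
The overhead figures and the alignment trick in the description can be
checked with a short sketch. Everything here is illustrative: the
CompressedAddress class and its methods are hypothetical, the 8-byte
alignment is an assumption borrowed from how the JVM compresses references,
and the reading that 24 bytes comes from rounding 20 bytes up to an 8-byte
boundary is an inference rather than something the description states.

```java
final class CompressedAddress
{
    static final int ALIGNMENT_SHIFT = 3;                              // assumes 8-byte aligned allocations
    static final long MAX_ADDRESSABLE = 1L << (32 + ALIGNMENT_SHIFT);  // 32 GiB reachable with 32 bits

    // Packs an 8-byte-aligned offset below 32 GiB into 32 bits.
    static int encode(long offset)
    {
        if (offset < 0 || offset >= MAX_ADDRESSABLE)
            throw new IllegalArgumentException("offset out of range: " + offset);
        if ((offset & ((1L << ALIGNMENT_SHIFT) - 1)) != 0)
            throw new IllegalArgumentException("offset not 8-byte aligned: " + offset);
        return (int) (offset >>> ALIGNMENT_SHIFT);
    }

    // Recovers the full offset; the mask treats the int as unsigned.
    static long decode(int encoded)
    {
        return (encoded & 0xFFFFFFFFL) << ALIGNMENT_SHIFT;
    }

    // On-heap bytes per cell: object header + address + allocation-list reference,
    // rounded up to a multiple of 8 (assumed object alignment).
    static int perCellOverhead(int addressBytes)
    {
        int objectHeader = 8;
        int allocationListReference = 4;
        int raw = objectHeader + addressBytes + allocationListReference;
        return (raw + 7) & ~7;
    }

    public static void main(String[] args)
    {
        System.out.println(perCellOverhead(4)); // 4-byte compressed address
        System.out.println(perCellOverhead(8)); // full 8-byte address
        System.out.println(decode(encode(1L << 34)) == (1L << 34));
    }
}
```

With a 4-byte address the three fields sum to the 16 bytes quoted above;
adding the 4-6 bytes of btree overhead per cell gives the 20-22 byte total.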




[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973633#comment-13973633
 ] 

Benedict commented on CASSANDRA-6694:
-

Thanks. Although it looks like you haven't updated any of the offsets to work 
with the new layout?

As to the other changes you've made: I do not like the pollution of 
PoolAllocator with supportsNative(). Since this branch is supposed to be 
pushing idiomatic Java usage, let's stick to using interfaces for 
specialisation since we can.
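
As a rough illustration of the point about interfaces, the sketch below
expresses "can allocate natively" as a type rather than as a boolean method
on the base allocator. All names here (Allocator, NativeCapable,
HeapOnlyAllocator, OffHeapAllocator, AllocatorExample) are hypothetical and
are not the classes or interfaces from either branch.

```java
import java.nio.ByteBuffer;

// Base contract: every allocator can hand out on-heap buffers.
interface Allocator
{
    ByteBuffer allocate(int size);
}

// Capability interface: only allocators that can allocate off-heap implement it,
// so the base type needs no supportsNative()-style flag.
interface NativeCapable extends Allocator
{
    ByteBuffer allocateNative(int size);
}

final class HeapOnlyAllocator implements Allocator
{
    public ByteBuffer allocate(int size)
    {
        return ByteBuffer.allocate(size);
    }
}

final class OffHeapAllocator implements NativeCapable
{
    public ByteBuffer allocate(int size)
    {
        return ByteBuffer.allocate(size);
    }

    public ByteBuffer allocateNative(int size)
    {
        return ByteBuffer.allocateDirect(size);
    }
}

final class AllocatorExample
{
    // Code that requires native memory asks for the capability in its signature.
    static ByteBuffer allocateNativeCell(NativeCapable allocator, int size)
    {
        return allocator.allocateNative(size);
    }

    // Code that accepts any allocator lets the type answer the capability question.
    static ByteBuffer allocateCell(Allocator allocator, int size)
    {
        if (allocator instanceof NativeCapable)
            return ((NativeCapable) allocator).allocateNative(size);
        return allocator.allocate(size);
    }

    public static void main(String[] args)
    {
        System.out.println(allocateCell(new OffHeapAllocator(), 16).isDirect());
        System.out.println(allocateCell(new HeapOnlyAllocator(), 16).isDirect());
    }
}
```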


[jira] [Comment Edited] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973633#comment-13973633
 ] 

Benedict edited comment on CASSANDRA-6694 at 4/18/14 12:36 AM:
---

Thanks. Although it looks like you haven't updated any of the offsets to work 
with the new layout?

As to the other changes you've made: I do not like the pollution of 
PoolAllocator with supportsNative() and allocateNative(). Since this branch is 
supposed to be pushing idiomatic Java usage, let's stick to using interfaces 
for specialisation since we can.


was (Author: benedict):
Thanks. Although it looks like you haven't updated any of the offsets to work 
with the new layout?

As to the other changes you've made: I do not like the pollution of 
PoolAllocator with supportsNative(). Since this branch is supposed to be 
pushing idiomatic Java usage, let's stick to using interfaces for 
specialisation since we can.


[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973645#comment-13973645
 ] 

Pavel Yaskevich commented on CASSANDRA-6694:


Why, it does: internalPeer() adds 4 and internalSize() subtracts 4, and all 
get/set methods use internalPeer() + offset. Regarding supportsNative() and 
allocateNative() (and I was waiting for that): I did that because I don't want 
to spend time adding the DataAllocator and DataPool interfaces that your code 
has just yet. Once it's decided which way we want to go, I will remove 
allocateNative() and do the proper work there. This is still intended as just a 
presentation of an idea for how to handle Cell without Impl classes.


[jira] [Commented] (CASSANDRA-6694) Slightly More Off-Heap Memtables

[
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973647#comment-13973647
]

Benedict commented on CASSANDRA-6694:
-

bq. This still intended as just an idea presentation for how to handle Cell
without Impl classes.

OK, cool. Glad we're staying on topic :)

bq. Why it does - internalPeer does + 4 and internalSize does - 4

My mistake. I was expecting to see the static OFFSET fields updated - we should
probably optimise that before we finish up (now that we can), but obviously
fine for now.
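
The two ways of accounting for a 4-byte header that come up in this exchange can be sketched as follows. Everything here is a hypothetical illustration: a `ByteBuffer` stands in for native memory, and the field name and layout are assumptions rather than the patch's actual layout. Variant A keeps body-relative offsets and skips the header in `internalPeer()`; variant B folds the header into the static offset, which saves an addition per access.

```java
import java.nio.ByteBuffer;

// Hypothetical layout: a 4-byte header precedes the cell's fields.
final class HeaderOffsetSketch {
    static final int HEADER_SIZE = 4;

    // Variant A: offsets are relative to the body; accessors skip the header.
    static final int TIMESTAMP_OFFSET = 0;

    static int internalPeer(int peer) { return peer + HEADER_SIZE; }

    static int internalSize(int size) { return size - HEADER_SIZE; }

    static long timestampA(ByteBuffer mem, int peer) {
        return mem.getLong(internalPeer(peer) + TIMESTAMP_OFFSET);
    }

    // Variant B: the header is folded into the static offset itself.
    static final int TIMESTAMP_OFFSET_ABS = HEADER_SIZE + TIMESTAMP_OFFSET;

    static long timestampB(ByteBuffer mem, int peer) {
        return mem.getLong(peer + TIMESTAMP_OFFSET_ABS);
    }
}
```

Both variants read the same bytes; they differ only in where the header adjustment is applied.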


[jira] [Updated] (CASSANDRA-7047) TriggerExecutor should group mutations by row key


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-7047:
-

Attachment: 7047-v2.txt

 TriggerExecutor should group mutations by row key
 -

 Key: CASSANDRA-7047
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7047
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sergio Bossa
Assignee: Sergio Bossa
 Attachments: 7047-v2.txt, CASSANDRA-7047.patch


 TriggerExecutor doesn't currently group mutations returned by triggers even 
 if belonging to the same row key: while harmful per se (at least, I think 
 so), this is definitely a performance problem, because each mutation is a 
 *cluster* mutation, generating more network traffic, more disk IO and more 
 index calls (if present).
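
The grouping described above can be sketched roughly as below. `SimpleMutation` and `MutationGrouping` are hypothetical stand-ins, not the Cassandra classes: a real mutation also carries a keyspace and resolves conflicting cells by timestamp, whereas this simplified version keys on the row key alone and lets later updates overwrite earlier ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a mutation: a row key plus column updates.
final class SimpleMutation {
    final String rowKey;
    final Map<String, String> columns = new LinkedHashMap<>();

    SimpleMutation(String rowKey) { this.rowKey = rowKey; }

    SimpleMutation with(String column, String value) {
        columns.put(column, value);
        return this;
    }
}

final class MutationGrouping {
    // Merge mutations that share a row key so that each key yields one mutation.
    static List<SimpleMutation> groupByRowKey(List<SimpleMutation> mutations) {
        Map<String, SimpleMutation> byKey = new LinkedHashMap<>();
        for (SimpleMutation m : mutations) {
            SimpleMutation merged = byKey.computeIfAbsent(m.rowKey, SimpleMutation::new);
            // Simplification: later updates win on column conflicts.
            merged.columns.putAll(m.columns);
        }
        return new ArrayList<>(byKey.values());
    }
}
```

Three trigger-generated mutations touching two row keys would then be applied as two mutations rather than three.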




[jira] [Commented] (CASSANDRA-7047) TriggerExecutor should group mutations by row key


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973716#comment-13973716
 ] 

Aleksey Yeschenko commented on CASSANDRA-7047:
--

Oh, well. I cleaned up TriggerExecutorTest, and only then realized that 
something was wrong (the TriggerExecutorTest diff was all green and lacked the 
license header).

We already have TriggersTest.java. Can you move the tests there? (and rewrite 
them to match the style of the tests there, too?)

Thanks.


[jira] [Updated] (CASSANDRA-7053) USING TIMESTAMP for batches does not work


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Stepura updated CASSANDRA-7053:
---

Reviewer: Aleksey Yeschenko

[~iamaleksey] could you please review?

 USING TIMESTAMP for batches does not work
 -

 Key: CASSANDRA-7053
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7053
 Project: Cassandra
  Issue Type: Bug
Reporter: Robert Supencheck
Assignee: Mikhail Stepura
  Labels: cqlsh
 Fix For: 2.0.8

 Attachments: cassandra-2.0-7053.patch


 When using the USING TIMESTAMP <timestamp> syntax for a batch statement, 
 the supplied timestamp is ignored.




[jira] [Commented] (CASSANDRA-7047) TriggerExecutor should group mutations by row key