[jira] [Commented] (CASSANDRA-2659) Improve forceDeserialize/getCompactedRow encapsulation

2011-05-17 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034708#comment-13034708
 ] 

Sylvain Lebresne commented on CASSANDRA-2659:
-

nitpicks:
  * we could remove the descriptor argument of the first getCompactedRow() and 
call needDeserialize() for the EchoedRow case.
  * we could use that first getCompactedRow() in SSTableWriter (it's really 
only cosmetic as we forceDeserialize)
  * the comment on that first getCompactedRow() method is not completely 
correct, since the method may purge data (either if the sstable is of an old 
format or if forceDeserialize is set) while the comment suggests it never 
does.

but those are nitpicks, so +1 with or without them

 Improve forceDeserialize/getCompactedRow encapsulation
 --

 Key: CASSANDRA-2659
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2659
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.1

 Attachments: 2659.txt




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-05-17 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: 
0004-Reports-validation-compaction-errors-back-to-repair-v2.patch
0003-Report-streaming-errors-back-to-repair-v2.patch
0002-Register-in-gossip-to-handle-node-failures-v2.patch

0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v2.patch

Attaching rebased patch (against 0.8.1). It also changes the behavior a little 
bit so as not to fail repair right away if a problem occurs (it still throws an 
exception at the end if any problem occurred). It turns out to be slightly 
simpler that way, especially for CASSANDRA-1610.

 Failed Streams Break Repair
 ---

 Key: CASSANDRA-2433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2433
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.4
Reporter: Benjamin Coverston
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.1

 Attachments: 
 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v2.patch, 
 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re.patch, 
 0002-Register-in-gossip-to-handle-node-failures-v2.patch, 
 0002-Register-in-gossip-to-handle-node-failures.patch, 
 0003-Report-streaming-errors-back-to-repair-v2.patch, 
 0003-Report-streaming-errors-back-to-repair.patch, 
 0004-Reports-validation-compaction-errors-back-to-repair-v2.patch, 
 0004-Reports-validation-compaction-errors-back-to-repair.patch


 When running repair in cases where a stream fails, we are seeing multiple 
 problems.
 1. Although retry is initiated and completes, the old stream doesn't seem to 
 clean itself up and repair hangs.
 2. The temp files are left behind, and multiple failures can end up filling up 
 the data partition.
 Together, these issues are making repair very difficult for nearly everyone 
 running repair on a non-trivially sized data set.
 This issue is also being worked on w.r.t. CASSANDRA-2088; however, that was 
 moved to 0.8 for a few reasons. This ticket is to fix the immediate issues 
 that we are seeing in 0.7.



[jira] [Updated] (CASSANDRA-2610) Have the repair of a range repair *all* the replica for that range

2011-05-17 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2610:


Attachment: 0001-Make-repair-repair-all-hosts.patch

Patch against 0.8.1. It applies on top of CASSANDRA-2433 because it changes 
enough common code that I don't want to have to deal with rebasing back and 
forth (and it actually reuses some of the refactoring of CASSANDRA-2433 
anyway).

 Have the repair of a range repair *all* the replica for that range
 --

 Key: CASSANDRA-2610
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2610
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8 beta 1
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.8.1

 Attachments: 0001-Make-repair-repair-all-hosts.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Say you have a range R whose replicas for that range are A, B and C. If you 
 run repair on node A for that range R, when the repair ends you only know that 
 A is fully repaired. B and C are not. That is, B and C are up to date with A 
 as of before the repair, but are not up to date with one another.
 This makes it a pain to schedule optimal cluster repairs, that is, repairing a 
 full cluster without doing work twice (because you would still have to 
 run a repair on B or C, which would make A, B and C redo a validation 
 compaction on R, and with more replicas it's even more annoying).
 However, it is fairly easy during the first repair on A to have it compare 
 all the merkle trees, i.e. the ones for B and C, and ask B or C to stream 
 between them whatever differences they have. 



[jira] [Reopened] (CASSANDRA-2481) C* .deb installs C* init.d scripts such that C* comes up before mdadm and related

2011-05-18 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne reopened CASSANDRA-2481:
-


When installing the debian package for 0.7.6 and 0.8.0-rc1 on ubuntu 11.04 
(natty), I get
{noformat}
Installing new version of config file /etc/init.d/cassandra ...
update-rc.d: error: start|stop arguments not terminated by .
usage: update-rc.d [-n] [-f] basename remove
   update-rc.d [-n] basename defaults [NN | SS KK]
   update-rc.d [-n] basename start|stop NN runlvl [runlvl] [...] .
   update-rc.d [-n] basename disable|enable [S|2|3|4|5]
-n: not really
-f: force
{noformat}

Given that it works like a charm with 0.7.5, I strongly suspect this patch is 
the culprit.

 C* .deb installs C* init.d scripts such that C* comes up before mdadm and 
 related
 -

 Key: CASSANDRA-2481
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2481
 Project: Cassandra
  Issue Type: Bug
  Components: Packaging
Reporter: Matthew F. Dennis
Assignee: paul cannon
Priority: Minor
 Fix For: 0.7.6, 0.8.0

 Attachments: 2481.txt


 the C* .deb packages install the init.d scripts at S20, which is before mdadm 
 and various other services.  This means that when a node reboots, C* is 
 started before the RAID sets are up and mounted, causing C* to think it has no 
 data and attempt bootstrapping again.



[jira] [Updated] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

2011-05-19 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-1278:


Attachment: 0001-Add-bulk-loader-utility.patch

Attaching a patch that implements the simpler idea. It provides a new utility, 
'sstableloader' (basically a fat client), that, given one or more sstables, 
will stream the relevant parts of those sstables to the relevant nodes.

The tool tries to be self-documenting, but basically you must have an sstable 
with -Data and -Index components (we really need the -Index component to be 
able to do anything) in a directory dir whose name is the keyspace, and call 
'sstableloader dir'.

Alternatively, if dir sits on one of the machines of the cluster, you can 
simply use a JMX call with the path to dir as argument.

 Make bulk loading into Cassandra less crappy, more pluggable
 

 Key: CASSANDRA-1278
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jeremy Hanna
Assignee: Sylvain Lebresne
 Fix For: 0.8.1

 Attachments: 0001-Add-bulk-loader-utility.patch, 
 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt

   Original Estimate: 40h
  Time Spent: 40h 40m
  Remaining Estimate: 0h

 Currently bulk loading into Cassandra is a black art.  People are either 
 directed to just do it responsibly with thrift or a higher-level client, or 
 they have to explore the contrib/bmt example - 
 http://wiki.apache.org/cassandra/BinaryMemtable  That contrib module requires 
 delving into the code to find out how it works and then applying it to the 
 given problem.  Using either method, the user also needs to keep in mind that 
 overloading the cluster is possible - which will hopefully be addressed in 
 CASSANDRA-685
 This improvement would be to create a contrib module or set of documents 
 dealing with bulk loading.  Perhaps it could include code in the Core to make 
 it more pluggable for external clients of different types.
 It is just that this is something that many who are new to Cassandra need to 
 do - bulk load their data into Cassandra.



[jira] [Commented] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

2011-05-19 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036231#comment-13036231
 ] 

Sylvain Lebresne commented on CASSANDRA-1278:
-

I'd love to, but as it turns out, it is fairly heavily hardwired into Descriptor 
that the keyspace name is the directory where the file sits. And by hardwired I 
mean that even if you add a constructor to Descriptor to decorrelate the ksname 
field from the directory argument, this doesn't work, because streaming only 
transmits the name of the file (including the directory), not the ksname field, 
and thus would get the wrong name.

That is, I don't think we can do that without adding a new argument to the 
stream header, which felt a bit overkill at first (it's probably doable 
though).  

 Make bulk loading into Cassandra less crappy, more pluggable
 

 Key: CASSANDRA-1278
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jeremy Hanna
Assignee: Sylvain Lebresne
 Fix For: 0.8.1

 Attachments: 0001-Add-bulk-loader-utility.patch, 
 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt

   Original Estimate: 40h
  Time Spent: 40h 40m
  Remaining Estimate: 0h




[jira] [Commented] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

2011-05-19 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036277#comment-13036277
 ] 

Sylvain Lebresne commented on CASSANDRA-1278:
-

I didn't do it because, if I'm correct, the tools stuff doesn't go into releases 
(which I believe is the reason why we don't have cli, sstable2json, ... in 
tools). I figured that's not necessarily something we want users to have to grab 
the source to get. But I suppose we can if we want (at least the script + 
BulkLoader.java; I'd be in favor of leaving SSTableLoader where it is).

 Make bulk loading into Cassandra less crappy, more pluggable
 

 Key: CASSANDRA-1278
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jeremy Hanna
Assignee: Sylvain Lebresne
 Fix For: 0.8.1

 Attachments: 0001-Add-bulk-loader-utility.patch, 
 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt

   Original Estimate: 40h
  Time Spent: 40h 40m
  Remaining Estimate: 0h




[jira] [Commented] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

2011-05-19 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036278#comment-13036278
 ] 

Sylvain Lebresne commented on CASSANDRA-1278:
-

{noformat}
+outputHandler.output("Starting client and waiting 15 seconds for 
gossip ...");
+try
+{
+// Init gossip
+StorageService.instance.initClient();
{noformat}

It is in client-only mode as far as I can tell. Maybe client-only mode is 
screwed up though, I don't know.

 Make bulk loading into Cassandra less crappy, more pluggable
 

 Key: CASSANDRA-1278
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jeremy Hanna
Assignee: Sylvain Lebresne
 Fix For: 0.8.1

 Attachments: 0001-Add-bulk-loader-utility.patch, 
 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt

   Original Estimate: 40h
  Time Spent: 40h 40m
  Remaining Estimate: 0h




[jira] [Updated] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

2011-05-20 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-1278:


Attachment: 0001-Add-bulk-loader-utility-v2.patch

bq. It'd be nice if it printed the filename and the time it took for each time, 
since just having the percentages reset is a bit confusing.

The fact that the percentages reset is really just a bug (I tested at first with 
only one sstable, my bad). Anyway, that's fixed. I also agree with Jonathan's 
objection about printing the filename, and in general I'm not sure giving too 
much information is really necessary.

bq. Also, this should respect SS.RING_DELAY

Yes, I think it was the fat client that wasn't respecting it; it was waiting 
for a hardcoded time of 5 seconds, which is almost always not enough. I've 
updated SS.initClient() to use RING_DELAY instead.


Attaching v2 that:
  * uses RING_DELAY
  * updates the progress indication so that percentages work. It also adds, for 
each host, the number of files that should be transferred to it and how many 
already have been. Lastly, it adds a total percentage as well as approximate 
transfer rate info.


 Make bulk loading into Cassandra less crappy, more pluggable
 

 Key: CASSANDRA-1278
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jeremy Hanna
Assignee: Sylvain Lebresne
 Fix For: 0.8.1

 Attachments: 0001-Add-bulk-loader-utility-v2.patch, 
 0001-Add-bulk-loader-utility.patch, 1278-cassandra-0.7-v2.txt, 
 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt

   Original Estimate: 40h
  Time Spent: 40h 40m
  Remaining Estimate: 0h




[jira] [Commented] (CASSANDRA-2280) Request specific column families using StreamIn

2011-05-23 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037826#comment-13037826
 ] 

Sylvain Lebresne commented on CASSANDRA-2280:
-

* If we're going to put this in 0.8.1 (which we should), we cannot rely on 
MessagingService.VERSION_07. We must bump the version for 0.8.0. It turns out 
CASSANDRA-2433 already has this problem, so I suggest we introduce a 
MS.VERSION_080 and stick to that (as a side note, when that's done, we should 
be careful with StreamRequestMessage, as it will have a 0.7 and a 0.8.0 part, 
i.e., we shouldn't blindly s/VERSION_07/VERSION_080 in there).
* In StreamHeader and StreamRequestMessage, Iterables.size() is used. Is there 
a reason for that? Though google collections are probably smart enough not to 
do a full iteration to compute the size when possible, in theory we can't 
really be sure, so I don't see why not use .size() (and use a Collection 
instead of an Iterable in StreamHeader, although see the next point).
* Why are we sending the cfs in StreamHeader at all? It's never used and I 
don't see why it should be (StreamInSession will know what it receives with each 
file; there is no reason it should know upfront what request initiated 
the streaming).

 Request specific column families using StreamIn
 ---

 Key: CASSANDRA-2280
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2280
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Jonathan Ellis
 Fix For: 0.8.1

 Attachments: 
 0001-Allow-specific-column-families-to-be-requested-for-str.txt, 
 0001-Allow-specific-column-families-to-be-requested-for-str.txt, 2280-v3.txt


 StreamIn.requestRanges only specifies a keyspace, meaning that requesting a 
 range will request it for all column families: if you have a large number of 
 CFs, this can cause quite a headache.



[jira] [Commented] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

2011-05-23 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038007#comment-13038007
 ] 

Sylvain Lebresne commented on CASSANDRA-1278:
-

Note that I'm marking this resolved since it has been committed. However, as 
it stands, sstableloader doesn't handle failures very well (because streaming 
doesn't). Once CASSANDRA-2433 is committed, this can easily be improved.

 Make bulk loading into Cassandra less crappy, more pluggable
 

 Key: CASSANDRA-1278
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jeremy Hanna
Assignee: Sylvain Lebresne
 Fix For: 0.8.1

 Attachments: 0001-Add-bulk-loader-utility-v2.patch, 
 0001-Add-bulk-loader-utility.patch, 1278-cassandra-0.7-v2.txt, 
 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt

   Original Estimate: 40h
  Time Spent: 40h 40m
  Remaining Estimate: 0h




[jira] [Commented] (CASSANDRA-2690) Make the release build fail if the publish to central repository also fails

2011-05-23 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038124#comment-13038124
 ] 

Sylvain Lebresne commented on CASSANDRA-2690:
-

+1

 Make the release build fail if the publish to central repository also fails
 ---

 Key: CASSANDRA-2690
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2690
 Project: Cassandra
  Issue Type: Improvement
  Components: Packaging
Affects Versions: 0.7.7, 0.8.0
Reporter: Stephen Connolly
 Attachments: CASSANDRA-2690-v-trunk.patch, CASSANDRA-2690-v0.7.patch, 
 CASSANDRA-2690-v0.8.patch


 If the publish to Central fails for one artifact that failure is not picked 
 up.



[jira] [Commented] (CASSANDRA-1978) get_range_slices: allow key and token to be interoperable

2011-05-24 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038420#comment-13038420
 ] 

Sylvain Lebresne commented on CASSANDRA-1978:
-

What about computing the token from the key for CASSANDRA-2003?

Right now it's a legitimate thing to do to walk all keys, until CASSANDRA-1034. 
After CASSANDRA-1034 it will risk missing some keys, but we'll then have 
this problem with hadoop too. But then I'm not sure the fix attached here is 
the right one. I think the right fix will be to only allow keys in 
range_slices, but also to specify whether we're asking for a bound or a range. 
That is, changing KeyRange to something like:
{noformat}
struct KeyRange {
1: required binary start_key,
2: required binary end_key,
3: required boolean start_inclusive,
5: required i32 count=100
}
{noformat}

Anyway, we should probably defer that to later, but unless I'm missing 
something this shouldn't block CASSANDRA-2003.

 get_range_slices: allow key and token to be interoperable
 -

 Key: CASSANDRA-1978
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1978
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Kelvin Kakugawa
Assignee: Kelvin Kakugawa
Priority: Minor
 Fix For: 0.8.1

 Attachments: 
 0001-CASSANDRA-1978-allow-key-token-to-be-interoperable-i.patch


 problem: get_range_slices requires two keys or two tokens, so we can't walk a 
 randomly partitioned cluster by token.
 solution: allow keys and tokens to be mixed.  however, if one side is a 
 token, promote the bounds to a dht.Range, instead of a dht.Bounds.



[jira] [Updated] (CASSANDRA-2675) java.io.IOError: java.io.EOFException with version 0.7.6

2011-05-24 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2675:


Attachment: 0002-Avoid-modifying-super-column-in-memtable-being-flush.patch
0001-Don-t-remove-columns-from-super-columns-in-memtable.patch

I was able to reproduce, thanks for the java version.

I think the problem is that reads can remove subcolumns from a super column 
that happens to be in a memtable being flushed. If a subcolumn becomes gc-able 
between the time the super column's column count was written on disk and the 
time the subcolumn itself would be written, we won't write it and will end up 
with short super columns (hence the EOFException). Note that this should not 
happen with a reasonable gc_grace value (one such that nothing that gets 
flushed is gc-able).

The first attached patch fixes this by making reads copy the super column 
before modifying it (0.7 patch).

I think there is a related second bug: when we reduce super columns (in 
QueryFilter), if we merge multiple super columns with the same name, we merge 
them into the first super column. That is, we may end up adding subcolumns to 
a super column that is in an in-memory memtable. Most of the time this will be 
harmless, except for some useless data duplication. But if that happens to a 
super column (in a memtable) being flushed and, as above, between the write of 
the column count and the actual column writes, we may end up with too long a 
super column. This could result in unreachable columns (i.e., data loss 
effectively) and quite probably some weird corruption during a compaction.

The second patch fixes this second problem.

I haven't been able to reproduce with the 2 attached patches, and the test has 
been running for more than an hour.
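The copy-before-modify fix described above can be sketched as follows. This is a deliberately simplified, hypothetical model (a plain sorted map standing in for a super column's subcolumns, with "deleted at" timestamps as values), not Cassandra's actual SuperColumn/QueryFilter classes:

```java
import java.util.Map;
import java.util.TreeMap;

public class CopyOnPurge {
    // Return a copy of the super column with gc-able subcolumns removed,
    // leaving the original (possibly mid-flush in a memtable) untouched.
    // Any subcolumn whose deletion timestamp is before gcBefore is purgeable.
    static TreeMap<String, Long> purgedCopy(Map<String, Long> superColumn, long gcBefore) {
        TreeMap<String, Long> copy = new TreeMap<>(superColumn);
        copy.values().removeIf(deletedAt -> deletedAt < gcBefore);
        return copy;
    }

    public static void main(String[] args) {
        TreeMap<String, Long> sc = new TreeMap<>();
        sc.put("a", 10L);   // deleted long ago: gc-able
        sc.put("b", 100L);  // deleted recently: must be kept
        TreeMap<String, Long> purged = purgedCopy(sc, 50L);
        // The read path serves the purged copy; the memtable still holds both
        // subcolumns, so the column count written at flush stays consistent
        // with the subcolumns actually written.
        System.out.println(purged.size() + "/" + sc.size());
    }
}
```

The point of the copy is the last invariant: the flush path sees the same subcolumn set before and after the read, so the serialized count can never disagree with the serialized subcolumns.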


 java.io.IOError: java.io.EOFException with version 0.7.6 
 -

 Key: CASSANDRA-2675
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2675
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
 Environment: Reproduced on single Cassandra node (CentOS 5.5)
 Reproduced on single Cassandra node (Windows Server 2008)
Reporter: rene kochen
Assignee: Sylvain Lebresne
 Fix For: 0.7.7

 Attachments: 
 0001-Don-t-remove-columns-from-super-columns-in-memtable.patch, 
 0002-Avoid-modifying-super-column-in-memtable-being-flush.patch, 
 CassandraIssue.zip, CassandraIssueJava.zip


 I use the following data-model
 column_metadata: []
 name: Customers
 column_type: Super
 gc_grace_seconds: 60
 I have a super-column-family with a single row.
 Within this row I have a single super-column.
 Within this super-column, I concurrently create, read and delete columns.
 I have three threads:
 - Do in a loop: add a column to the super-column.
 - Do in a loop: delete a random column from the super-column.
 - Do in a loop: read the super-column (with all columns).
 After running the above threads concurrently, I always receive one of the 
 following errors:
 ERROR 17:09:57,036 Fatal exception in thread Thread[ReadStage:81,5,main]
 java.io.IOError: java.io.EOFException
 at 
 org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
 at 
 org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
 at 
 org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
 at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(Unknown 
 Source)
 at java.util.concurrent.ConcurrentSkipListMap.init(Unknown Source)
 at 
 org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
 at 
 org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
 at 
 org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
 at 
 org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79)
 at 
 org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
 at 
 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
   

[jira] [Created] (CASSANDRA-2698) Instrument repair to be able to assess its efficiency (precision)

2011-05-24 Thread Sylvain Lebresne (JIRA)
Instrument repair to be able to assess its efficiency (precision)
--

 Key: CASSANDRA-2698
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor


Some reports indicate that repair sometimes transfers huge amounts of data. One 
hypothesis is that the merkle tree precision may deteriorate too much at some 
data size. To check this hypothesis, it would be reasonable to gather statistics 
during the merkle tree building of how many rows each merkle tree range accounts 
for (and the size that this represents). It is probably an interesting statistic 
to have anyway.
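
A toy sketch of the kind of statistic proposed (hypothetical, not the actual MerkleTree code): while walking rows in token order during validation, bucket rows per leaf range and report the worst case, since a leaf covering many rows means coarse hash resolution there.

```java
import java.util.Arrays;

// Toy sketch: bucket rows into 2^depth equal-width leaf ranges over a
// [0, tokenSpace) token space and track how many rows each leaf covers.
// This toy partitioning stands in for the real merkle tree's token ranges.
class MerkleLeafStats {
    final long[] rowsPerLeaf;
    final long tokenSpace;

    MerkleLeafStats(int depth, long tokenSpace) {
        this.rowsPerLeaf = new long[1 << depth];
        this.tokenSpace = tokenSpace;
    }

    void addRow(long token) {
        // Which equal-width leaf range owns this token (integer division).
        int leaf = (int) (token * rowsPerLeaf.length / tokenSpace);
        rowsPerLeaf[leaf]++;
    }

    long maxRowsPerLeaf() {
        return Arrays.stream(rowsPerLeaf).max().orElse(0);
    }
}
```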

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2675) java.io.IOError: java.io.EOFException with version 0.7.6

2011-05-24 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038571#comment-13038571
 ] 

Sylvain Lebresne commented on CASSANDRA-2675:
-

Yes I agree, patch 2 is actually enough.

 java.io.IOError: java.io.EOFException with version 0.7.6 
 -

 Key: CASSANDRA-2675
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2675
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
 Environment: Reproduced on single Cassandra node (CentOS 5.5)
 Reproduced on single Cassandra node (Windows Server 2008)
Reporter: rene kochen
Assignee: Sylvain Lebresne
 Fix For: 0.7.7

 Attachments: 
 0001-Don-t-remove-columns-from-super-columns-in-memtable.patch, 
 0002-Avoid-modifying-super-column-in-memtable-being-flush.patch, 
 CassandraIssue.zip, CassandraIssueJava.zip


 I use the following data-model
 column_metadata: []
 name: Customers
 column_type: Super
 gc_grace_seconds: 60
 I have a super-column-family with a single row.
 Within this row I have a single super-column.
 Within this super-column, I concurrently create, read and delete columns.
 I have three threads:
 - Do in a loop: add a column to the super-column.
 - Do in a loop: delete a random column from the super-column.
 - Do in a loop: read the super-column (with all columns).
 After running the above threads concurrently, I always receive one of the 
 following errors:
 ERROR 17:09:57,036 Fatal exception in thread Thread[ReadStage:81,5,main]
 java.io.IOError: java.io.EOFException
 at 
 org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
 at 
 org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
 at 
 org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
 at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(Unknown 
 Source)
 at java.util.concurrent.ConcurrentSkipListMap.init(Unknown Source)
 at 
 org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
 at 
 org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
 at 
 org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
 at 
 org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79)
 at 
 org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
 at 
 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
 at 
 org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
 at 
 org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:130)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1390)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1267)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1195)
 at org.apache.cassandra.db.Table.getRow(Table.java:324)
 at 
 org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
 at 
 org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:451)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
 Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: java.io.EOFException
 at java.io.RandomAccessFile.readByte(Unknown Source)
 at 
 org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:324)
 at 
 

[jira] [Commented] (CASSANDRA-2280) Request specific column families using StreamIn

2011-05-24 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038600#comment-13038600
 ] 

Sylvain Lebresne commented on CASSANDRA-2280:
-

* In SSTableLoader, calling Table.open() isn't really neat: in the case 
of the 'external' bulk loader, it's a fat client, so that will imply creating 
directories, etc... for no good reason (I haven't tested, but I would not be 
surprised if it actually throws an exception). We'd better give an empty list. 
Or even better (in my opinion), my next point.
* I don't find it very logical for StreamOutSession to take a collection of 
cfs. The coupling seems unnecessary. The problem we're solving is to ask 
another node to transfer us some range for some CF. So what about having the 
list of CFs only in StreamRequestMessage and adding the list of cfs to use as an 
argument to StreamOut.transferRanges()? We don't need it anywhere else.
* In StreamRequestMessage, we should write the operation type even if version 
is VERSION_080 (same for deserialization). Nitpick: couldn't we use the cf 
ids instead of the names?
* In StreamRequestMessage, the field is a Collection but we're still using 
Iterables.size() inside. Pretty sure that doesn't leave much option :) I mean, 
my remark was more about asking why add something that may make people wonder 
for no reason, since that's not something that is widespread in the code. 
Anyway, just saying, I don't care.
* I suppose the bump of MessagingService from 2 to 81 was on purpose? (I don't 
mind, just pointing out to make sure)


 Request specific column families using StreamIn
 ---

 Key: CASSANDRA-2280
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2280
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Jonathan Ellis
 Fix For: 0.8.1

 Attachments: 
 0001-Allow-specific-column-families-to-be-requested-for-str.txt, 
 0001-Allow-specific-column-families-to-be-requested-for-str.txt, 2280-v3.txt, 
 2280-v4.txt


 StreamIn.requestRanges only specifies a keyspace, meaning that requesting a 
 range will request it for all column families: if you have a large number of 
 CFs, this can cause quite a headache.



[jira] [Updated] (CASSANDRA-2675) java.io.IOError: java.io.EOFException with version 0.7.6

2011-05-24 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2675:


Attachment: 
0002-Avoid-modifying-super-column-in-memtable-being-flush-v2.patch

bq. Is it? Don't you still have the problem of a tombstone cleanup modifying 
things mid-flush?

Patch 2 makes sure that the cf returned by getTopLevelColumns() doesn't have 
any super column that is an alias of a super column in some memtable. So then 
we don't care what consumers of the result of getTopLevelColumns() do. Even if 
they remove columns, the 'being flushed' super column won't be affected.

The idea of not always copying in the first patch was to not incur the copy in 
all the parts of the code that don't care (mainly compaction). But anyway, I 
do think that patch 2 is enough.

Attaching v2 of patch 2 to use isEmpty.
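
The aliasing problem above boils down to handing consumers a reference to a live mutable container. A minimal sketch of the fix's idea (hypothetical class, not the actual SuperColumn code): deep-copy the column container in read results so consumers can remove columns without touching what a flush may be serializing.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: the read path hands out a copy of the column
// container instead of an alias to the memtable's live instance.
class SuperColumnSketch {
    final ConcurrentSkipListMap<String, Long> columns = new ConcurrentSkipListMap<>();

    // Copy the container (the map), not just the reference, so that
    // tombstone cleanup on the result cannot mutate the memtable mid-flush.
    SuperColumnSketch copyForRead() {
        SuperColumnSketch copy = new SuperColumnSketch();
        copy.columns.putAll(columns);
        return copy;
    }
}
```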

 java.io.IOError: java.io.EOFException with version 0.7.6 
 -

 Key: CASSANDRA-2675
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2675
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
 Environment: Reproduced on single Cassandra node (CentOS 5.5)
 Reproduced on single Cassandra node (Windows Server 2008)
Reporter: rene kochen
Assignee: Sylvain Lebresne
 Fix For: 0.7.7

 Attachments: 
 0001-Don-t-remove-columns-from-super-columns-in-memtable.patch, 
 0002-Avoid-modifying-super-column-in-memtable-being-flush-v2.patch, 
 0002-Avoid-modifying-super-column-in-memtable-being-flush.patch, 
 CassandraIssue.zip, CassandraIssueJava.zip


 I use the following data-model
 column_metadata: []
 name: Customers
 column_type: Super
 gc_grace_seconds: 60
 I have a super-column-family with a single row.
 Within this row I have a single super-column.
 Within this super-column, I concurrently create, read and delete columns.
 I have three threads:
 - Do in a loop: add a column to the super-column.
 - Do in a loop: delete a random column from the super-column.
 - Do in a loop: read the super-column (with all columns).
 After running the above threads concurrently, I always receive one of the 
 following errors:
 ERROR 17:09:57,036 Fatal exception in thread Thread[ReadStage:81,5,main]
 java.io.IOError: java.io.EOFException
 at 
 org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
 at 
 org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
 at 
 org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
 at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(Unknown 
 Source)
 at java.util.concurrent.ConcurrentSkipListMap.init(Unknown Source)
 at 
 org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
 at 
 org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
 at 
 org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
 at 
 org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79)
 at 
 org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
 at 
 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
 at 
 org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
 at 
 org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:130)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1390)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1267)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1195)
 at org.apache.cassandra.db.Table.getRow(Table.java:324)
 at 
 

[jira] [Commented] (CASSANDRA-2669) Scrub does not close files

2011-05-25 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039213#comment-13039213
 ] 

Sylvain Lebresne commented on CASSANDRA-2669:
-

+1

 Scrub does not close files
 --

 Key: CASSANDRA-2669
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2669
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 0.7.3
Reporter: Daniel Doubleday
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.7.7, 0.8.1

 Attachments: 2669.txt


 After scrubbing I find that cassandra process still holds file handles to the 
 deleted sstables:
 {noformat}
 root@blnrzh047:/mnt/cassandra# jps
 6932 Jps
 32359 CassandraDaemon
 32398 CassandraJmxHttpServer
 root@blnrzh047:/mnt/cassandra# du -sh .
 315G  .
 root@blnrzh047:/mnt/cassandra# df -h .
 FilesystemSize  Used Avail Use% Mounted on
 /dev/md0  1.1T  626G  420G  60% /mnt/cassandra
 root@blnrzh047:/mnt/cassandra# lsof | grep /mnt
 java  32359root  356r  REG9,0   24
 4194599 /mnt/cassandra/data/system/Migrations-f-13-Index.db (deleted)
 java  32359root  357r  REG9,0   329451
 4194547 /mnt/cassandra/data/system/HintsColumnFamily-f-588-Data.db (deleted)
 java  32359root  358r  REG9,0   22
 4194546 /mnt/cassandra/data/system/HintsColumnFamily-f-588-Index.db (deleted)
 java  32359root  359r  REG9,0   313225
 4194534 /mnt/cassandra/data/system/HintsColumnFamily-f-587-Data.db (deleted)
 java  32359root  360r  REG9,0   22
 4194494 /mnt/cassandra/data/system/HintsColumnFamily-f-587-Index.db (deleted)
 java  32359root  361r  REG9,030452
 4194636 /mnt/cassandra/data/system/Schema-f-13-Data.db (deleted)
 java  32359root  362r  REG9,0  484
 4194635 /mnt/cassandra/data/system/Schema-f-13-Index.db (deleted)
 {noformat}
 I guess there's a missing dataFile.close() in CompactionManager:648



[jira] [Commented] (CASSANDRA-2716) avoid allocating a new serializer per ColumnFamily (row)

2011-05-27 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040133#comment-13040133
 ] 

Sylvain Lebresne commented on CASSANDRA-2716:
-

+1

 avoid allocating a new serializer per ColumnFamily (row)
 

 Key: CASSANDRA-2716
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2716
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Trivial
 Fix For: 0.7.7, 0.8.1

 Attachments: 2716.txt


 Column.serializer and Supercolumn.serializer both allocate new objects with 
 each call. The most frequent offender is the ColumnFamily constructor.



[jira] [Commented] (CASSANDRA-2718) NPE in SSTableWriter when no ReplayPosition availible

2011-05-27 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040236#comment-13040236
 ] 

Sylvain Lebresne commented on CASSANDRA-2718:
-

+1

 NPE in SSTableWriter when no ReplayPosition availible
 -

 Key: CASSANDRA-2718
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2718
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.1
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Trivial
 Fix For: 0.8.1

 Attachments: 
 v1-0001-CASSANDRA-2718-avoide-NPE-when-bypassing-commitlog.txt


 The following NPE occurs when durable_writes is set to false
 {noformat}
 ERROR 09:20:30,378 Fatal exception in thread Thread[FlushWriter:11,5,main]
 java.lang.RuntimeException: java.lang.NullPointerException
   at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:619)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.cassandra.db.commitlog.ReplayPosition$ReplayPositionSerializer.serialize(ReplayPosition.java:127)
   at 
 org.apache.cassandra.io.sstable.SSTableWriter.writeMetadata(SSTableWriter.java:209)
   at 
 org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:187)
   at 
 org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:173)
   at 
 org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:253)
   at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
   at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
   at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
   ... 3 more
 {noformat}



[jira] [Updated] (CASSANDRA-2641) AbstractBounds.normalize should deal with overlapping ranges

2011-05-27 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2641:


Attachment: 0001-Make-normalize-deoverlap-ranges.patch

 AbstractBounds.normalize should deal with overlapping ranges
 

 Key: CASSANDRA-2641
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2641
 Project: Cassandra
  Issue Type: Test
  Components: Core
Reporter: Stu Hood
Assignee: Stu Hood
Priority: Minor
 Fix For: 0.8.1

 Attachments: 0001-Assert-non-overlapping-ranges-in-normalize.txt, 
 0001-Make-normalize-deoverlap-ranges.patch, 
 0002-Don-t-use-overlapping-ranges-in-tests.txt


 Apparently no consumers have encountered it in production, but 
 AbstractBounds.normalize does not handle overlapping ranges. If given 
 overlapping ranges, the output will be sorted but still overlapping, for 
 which SSTableReader.getPositionsForRanges will choose ranges in an SSTable 
 that may overlap.
 We should either add an assert in normalize(), or in getPositionsForRanges() 
 to ensure that this never bites us in production.



[jira] [Reopened] (CASSANDRA-2641) AbstractBounds.normalize should deal with overlapping ranges

2011-05-27 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne reopened CASSANDRA-2641:
-


Sorry, I was much too quick in reviewing this. The patch has two problems:
 * It works only for Bounds, not Range. It will say that (1, 2] and (2, 3] are 
overlapping, but they're not.
 * It does the check on the unsorted input list, which is another reason why 
it will incorrectly report overlaps.

Because I'm stubborn, I'm attaching a patch that takes the approach of making 
normalize() deoverlap overlapping ranges. It also adds a number of unit tests 
for normalize.
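
The deoverlapping behavior described can be sketched as a plain sorted-interval merge (hypothetical stand-alone version over [left, right] token pairs, not the actual AbstractBounds code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch of normalize()'s deoverlap step:
// sort ranges by left token, then merge any range that starts before
// (or at) the point where the previous merged range ends.
class NormalizeSketch {
    static List<long[]> deoverlap(List<long[]> ranges) {
        List<long[]> sorted = new ArrayList<>(ranges);
        sorted.sort((a, b) -> Long.compare(a[0], b[0]));
        List<long[]> out = new ArrayList<>();
        for (long[] r : sorted) {
            if (out.isEmpty() || out.get(out.size() - 1)[1] < r[0]) {
                out.add(new long[]{r[0], r[1]});        // disjoint: keep as a new range
            } else {
                long[] last = out.get(out.size() - 1);  // overlapping or adjacent: extend
                last[1] = Math.max(last[1], r[1]);
            }
        }
        return out;
    }
}
```

Note that adjacent half-open ranges like (1, 2] and (2, 3] merge into (1, 3] here, which is harmless for normalize's purpose: the output covers the same tokens and contains no overlap.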


 AbstractBounds.normalize should deal with overlapping ranges
 

 Key: CASSANDRA-2641
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2641
 Project: Cassandra
  Issue Type: Test
  Components: Core
Reporter: Stu Hood
Assignee: Stu Hood
Priority: Minor
 Fix For: 0.8.1

 Attachments: 0001-Assert-non-overlapping-ranges-in-normalize.txt, 
 0001-Make-normalize-deoverlap-ranges.patch, 
 0002-Don-t-use-overlapping-ranges-in-tests.txt


 Apparently no consumers have encountered it in production, but 
 AbstractBounds.normalize does not handle overlapping ranges. If given 
 overlapping ranges, the output will be sorted but still overlapping, for 
 which SSTableReader.getPositionsForRanges will choose ranges in an SSTable 
 that may overlap.
 We should either add an assert in normalize(), or in getPositionsForRanges() 
 to ensure that this never bites us in production.



[jira] [Resolved] (CASSANDRA-2709) sstableloader throws an exception when RF > 1

2011-05-27 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-2709.
-

Resolution: Duplicate

It's CASSANDRA-2641 that is buggy. As one can see in this message, those ranges 
do not overlap. I've reopened CASSANDRA-2641 to fix it, so I'm closing this one 
as a duplicate.

 sstableloader throws an exception when RF > 1
 ---

 Key: CASSANDRA-2709
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2709
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.1
Reporter: Brandon Williams
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.8.1


 {noformat}
 Exception in thread main java.lang.AssertionError: Overlapping ranges 
 passed to normalize: see CASSANDRA-2461: 
 (113427455640312821154458202477256070484,170141183460469231731687303715884105726]
  and 
 [(56713727820156410577229101238628035242,113427455640312821154458202477256070484]]
 at 
 org.apache.cassandra.dht.AbstractBounds.normalize(AbstractBounds.java:104)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:497)
 at 
 org.apache.cassandra.streaming.StreamOut.createPendingFiles(StreamOut.java:168)
 at 
 org.apache.cassandra.streaming.StreamOut.transferSSTables(StreamOut.java:148)
 at 
 org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:128)
 at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:61)
 {noformat}
 However, it does appear to keep streaming files.



[jira] [Commented] (CASSANDRA-2709) sstableloader throws an exception when RF > 1

2011-05-27 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040256#comment-13040256
 ] 

Sylvain Lebresne commented on CASSANDRA-2709:
-

I've also reverted the buggy assertion, so this should not be a problem anymore.

 sstableloader throws an exception when RF > 1
 ---

 Key: CASSANDRA-2709
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2709
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.1
Reporter: Brandon Williams
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.8.1


 {noformat}
 Exception in thread main java.lang.AssertionError: Overlapping ranges 
 passed to normalize: see CASSANDRA-2461: 
 (113427455640312821154458202477256070484,170141183460469231731687303715884105726]
  and 
 [(56713727820156410577229101238628035242,113427455640312821154458202477256070484]]
 at 
 org.apache.cassandra.dht.AbstractBounds.normalize(AbstractBounds.java:104)
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:497)
 at 
 org.apache.cassandra.streaming.StreamOut.createPendingFiles(StreamOut.java:168)
 at 
 org.apache.cassandra.streaming.StreamOut.transferSSTables(StreamOut.java:148)
 at 
 org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:128)
 at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:61)
 {noformat}
 However, it does appear to keep streaming files.



[jira] [Updated] (CASSANDRA-2641) AbstractBounds.normalize should deal with overlapping ranges

2011-05-27 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2641:


Attachment: 0001-Make-normalize-deoverlap-ranges.patch

 AbstractBounds.normalize should deal with overlapping ranges
 

 Key: CASSANDRA-2641
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2641
 Project: Cassandra
  Issue Type: Test
  Components: Core
Reporter: Stu Hood
Assignee: Stu Hood
Priority: Minor
 Fix For: 0.8.1

 Attachments: 0001-Assert-non-overlapping-ranges-in-normalize.txt, 
 0001-Make-normalize-deoverlap-ranges.patch, 
 0002-Don-t-use-overlapping-ranges-in-tests.txt


 Apparently no consumers have encountered it in production, but 
 AbstractBounds.normalize does not handle overlapping ranges. If given 
 overlapping ranges, the output will be sorted but still overlapping, for 
 which SSTableReader.getPositionsForRanges will choose ranges in an SSTable 
 that may overlap.
 We should either add an assert in normalize(), or in getPositionsForRanges() 
 to ensure that this never bites us in production.



[jira] [Resolved] (CASSANDRA-2719) Super Column Counters Increment on Read

2011-05-27 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-2719.
-

   Resolution: Duplicate
Fix Version/s: 0.8.0

This turns out to be a duplicate of CASSANDRA-2675. I've thus committed it to 
0.8.0 too (it was already committed on the 0.7 and 0.8 branches).

 Super Column Counters Increment on Read
 ---

 Key: CASSANDRA-2719
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2719
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8 beta 1
 Environment: Tested on 0.8.0-rc1 on both a 3 node cluster and single 
 instance.
Reporter: Greg Hinkle
 Fix For: 0.8.0

 Attachments: SuperCountTest.java


 Running a large number of batch increments on a set of counters in a super CF 
 seems to put some of the counters into a strange state where they increment 
 every time you read from them. Including just doing a list or get from the 
 cli. Will attach test that reproduces problem.
 For example, after running the test (and it completing and the process 
 stopping).
 [default@Chires] get CountTest[01];
 = (super_column=1306512590369,
  (counter=n, value=25625))
 Returned 1 results.
 [default@Chires] get CountTest[01];
 = (super_column=1306512590369,
  (counter=n, value=26610))
 Returned 1 results.
 From debug logs at the same time:
 DEBUG 12:42:13,899 get_slice
 DEBUG 12:42:13,899 Command/ConsistencyLevel is 
 SliceFromReadCommand(table='Chires', key='01', 
 column_parent='QueryPath(columnFamilyName='CountTest', 
 superColumnName='null', columnName='null')', start='', finish='', 
 reversed=false, count=100)/ONE
 DEBUG 12:42:13,899 Blockfor/repair is 1/true; setting up requests to 
 /127.0.0.2
 DEBUG 12:42:13,899 reading data from /127.0.0.2
 DEBUG 12:42:13,900 Processing response on a callback from 210570@/127.0.0.2
 DEBUG 12:42:13,900 Preprocessed data response
 DEBUG 12:42:13,900 Read: 1 ms.



[jira] [Commented] (CASSANDRA-2721) nodetool statusthrift exception while node starts up

2011-05-30 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041031#comment-13041031
 ] 

Sylvain Lebresne commented on CASSANDRA-2721:
-

I don't think nodetool statusthrift exists yet, but yeah, makes sense, +1. BUT, 
let's just put it in 0.8.1, for the sake of making Eric's job easier 
when he re-rolls 0.8.0 (in the meantime, any hypothetical implementation of a 
nodetool statusthrift could just catch the IllegalStateException).

 nodetool statusthrift exception while node starts up
 

 Key: CASSANDRA-2721
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2721
 Project: Cassandra
  Issue Type: Bug
Reporter: Chris Goffinet
Assignee: Chris Goffinet
Priority: Trivial
 Fix For: 0.8.0

 Attachments: 
 0001-If-RPCServer-isn-t-started-just-return-false-instead.patch


 We noticed when calling nodetool statusthrift, while a node is starting up, 
 it throws an exception. I think the proper behavior should be just return 
 false, instead of throwing an exception if RPC server hasn't started yet. 
 That way this stack trace won't have to be thrown in nodetool:
 Exception in thread main 
 java.lang.IllegalStateException: No configured RPC daemon
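
The proposed fix boils down to a guard instead of a throw (hypothetical class and field names, not the actual daemon code):

```java
// Hypothetical sketch of the proposed behavior: report "not running"
// instead of throwing while the RPC daemon is still starting up.
class ThriftStatusSketch {
    Object rpcServer;  // stays null until startup finishes configuring the server

    boolean isRPCServerRunning() {
        if (rpcServer == null)
            return false;  // previously: throw new IllegalStateException("No configured RPC daemon")
        return true;
    }
}
```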



[jira] [Commented] (CASSANDRA-2722) nodetool statusthrift

2011-05-30 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041057#comment-13041057
 ] 

Sylvain Lebresne commented on CASSANDRA-2722:
-

printisThriftServerRunning() should have the 'i' of 'is' in caps. Also, 
printing just 'true' or 'false' is maybe a bit harsh (maybe 'Running' or 'Not 
running' would be slightly more friendly and not so much harder to use in a 
script). But apart from those nitpicks, +1.

 nodetool statusthrift
 -

 Key: CASSANDRA-2722
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2722
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Chris Goffinet
Priority: Trivial
 Fix For: 0.8.1

 Attachments: 
 0001-Added-the-ability-to-check-thrift-status-in-nodetool.patch


 Provide the status of thrift server, if it's running or not.



[jira] [Assigned] (CASSANDRA-2653) index scan errors out when zero columns are requested

2011-05-30 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne reassigned CASSANDRA-2653:
---

Assignee: Sylvain Lebresne  (was: Jonathan Ellis)

 index scan errors out when zero columns are requested
 -

 Key: CASSANDRA-2653
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2653
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Jonathan Ellis
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.7.7

 Attachments: v1-0001-CASSANDRA-2653-reproduce-regression.txt


 As reported by Tyler Hobbs as an addendum to CASSANDRA-2401,
 {noformat}
 ERROR 16:13:38,864 Fatal exception in thread Thread[ReadStage:16,5,main]
 java.lang.AssertionError: No data found for 
 SliceQueryFilter(start=java.nio.HeapByteBuffer[pos=10 lim=10 cap=30], 
 finish=java.nio.HeapByteBuffer[pos=17 lim=17 cap=30], reversed=false, 
 count=0] in DecoratedKey(81509516161424251288255223397843705139, 
 6b657931):QueryPath(columnFamilyName='cf', superColumnName='null', 
 columnName='null') (original filter 
 SliceQueryFilter(start=java.nio.HeapByteBuffer[pos=10 lim=10 cap=30], 
 finish=java.nio.HeapByteBuffer[pos=17 lim=17 cap=30], reversed=false, 
 count=0]) from expression 'cf.626972746864617465 EQ 1'
   at 
 org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1517)
   at 
 org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
   at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}



[jira] [Updated] (CASSANDRA-2653) index scan errors out when zero columns are requested

2011-05-30 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2653:


Attachment: 0001-Reset-SSTII-in-EchoedRow-constructor.patch

This is indeed compaction related (but not related to secondary indexing at
all). The problem is that compaction may lose some rows.

Because of the way the ReducingIterator works, when we create a new
{Pre|Lazy|Echoed}CompactedRow, we have already decoded the next row key and
the file pointer is after that next row key. Both PreCompactedRow and
LazilyCompactedRow handle this correctly by resetting their
SSTableIdentityIterator before reading (SSTII.getColumnFamilyWithColumns()
does it for PreCompactedRow, and LazilyCompactedRow calls SSTII.reset()
directly). But EchoedRow doesn't. Hence when EchoedRow.isEmpty() is called, it
calls SSTII.hasNext(), which compares the current file pointer to the
finishedAt value of the iterator. Since the pointer is on the next row, this
test will always fail and the row will be skipped.

Attaching a patch against 0.8 with a (smaller) unit test.

Note that luckily this doesn't affect 0.7, because it only uses EchoedRow for
cleanup compactions, and cleanup compactions do not use ReducingIterator (and
thus the underlying SSTII won't have changed when the EchoedRow is built).
I would still be in favor of committing the patch there too, just to make sure
we don't hit this later.
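The reset-before-read pattern can be illustrated with simplified stand-ins (the classes below are hypothetical sketches, not the real SSTableIdentityIterator/EchoedRow):

```java
import java.nio.ByteBuffer;

// Stand-in for SSTableIdentityIterator: tracks a shared "file" position and
// knows where its row starts and ends.
class RowIterator {
    private final ByteBuffer file;   // stands in for the sstable file
    private final int rowStart;      // where this row's columns begin
    private final int finishedAt;    // where this row ends

    RowIterator(ByteBuffer file, int rowStart, int finishedAt) {
        this.file = file;
        this.rowStart = rowStart;
        this.finishedAt = finishedAt;
    }

    void reset() { file.position(rowStart); }                 // what SSTII.reset() does
    boolean hasNext() { return file.position() < finishedAt; }
}

// Stand-in for EchoedRow with the fix applied.
class EchoedRowSketch {
    private final RowIterator iter;

    EchoedRowSketch(RowIterator iter) {
        this.iter = iter;
        // The fix: reset in the constructor. Without it, the shared pointer is
        // already past finishedAt (on the next row key), hasNext() always
        // returns false, and isEmpty() wrongly reports the row as empty.
        iter.reset();
    }

    boolean isEmpty() { return !iter.hasNext(); }
}
```

With the reset in place, a row whose pointer has already advanced onto the next key is still read from its own start, so it is no longer skipped.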

 index scan errors out when zero columns are requested
 -

 Key: CASSANDRA-2653
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2653
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Jonathan Ellis
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.7.7

 Attachments: 0001-Reset-SSTII-in-EchoedRow-constructor.patch, 
 v1-0001-CASSANDRA-2653-reproduce-regression.txt


 As reported by Tyler Hobbs as an addendum to CASSANDRA-2401,
 {noformat}
 ERROR 16:13:38,864 Fatal exception in thread Thread[ReadStage:16,5,main]
 java.lang.AssertionError: No data found for 
 SliceQueryFilter(start=java.nio.HeapByteBuffer[pos=10 lim=10 cap=30], 
 finish=java.nio.HeapByteBuffer[pos=17 lim=17 cap=30], reversed=false, 
 count=0] in DecoratedKey(81509516161424251288255223397843705139, 
 6b657931):QueryPath(columnFamilyName='cf', superColumnName='null', 
 columnName='null') (original filter 
 SliceQueryFilter(start=java.nio.HeapByteBuffer[pos=10 lim=10 cap=30], 
 finish=java.nio.HeapByteBuffer[pos=17 lim=17 cap=30], reversed=false, 
 count=0]) from expression 'cf.626972746864617465 EQ 1'
   at 
 org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1517)
   at 
 org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
   at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}



[jira] [Commented] (CASSANDRA-2405) should expose 'time since last successful repair' for easier aes monitoring

2011-06-01 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042175#comment-13042175
 ] 

Sylvain Lebresne commented on CASSANDRA-2405:
-

This needs rebasing. First, a few small remarks:
  * It seems we store the time in microseconds, but then, when computing the 
time since last repair, we use System.currentTimeMillis() - stored_time.
  * I would be in favor of calling the system table REPAIR_INFO, because the 
truth is I think it would make sense to record a number of other statistics on 
repair, and it doesn't hurt to make the system table less specific. That also 
means we should probably not force any type for the value (though that can be 
easily changed later, so it's not a big deal for this patch).
  * I think we usually put the code to query the system table in SystemTable, 
so I would move it there from AntiEntropy.
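The unit mismatch in the first remark, sketched with a hypothetical helper (not the patch's actual code):

```java
import java.util.concurrent.TimeUnit;

class RepairTimeSketch {
    // The mismatch: a milliseconds clock minus a microseconds value.
    static long timeSinceBroken(long storedMicros) {
        return System.currentTimeMillis() - storedMicros;
    }

    // Consistent version: convert the stored value to milliseconds first.
    static long timeSinceMillis(long storedMicros) {
        return System.currentTimeMillis() - TimeUnit.MICROSECONDS.toMillis(storedMicros);
    }
}
```

Subtracting a microseconds value from a milliseconds clock yields a large negative number rather than an elapsed time, which is exactly the kind of silent bug the remark points at.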

Then more generally, a given repair involves multiple states and multiple 
nodes, so I don't think keeping only one timestamp is enough. Right now, we 
save the time of the last scheduled validation compaction on each node. With 
only that, we're missing the information people would need to make any 
reasonably informed decision:
  * First, this does not correspond to the last repair session started on 
that node, since the validation can be a request from another node. People may 
be interested in that information.
  * Second, given that a repair concerns a given range, keeping only one 
general number is wrong (it would suggest the node has been repaired recently 
even when only one range out of 3 or 5 has actually been repaired).
  * Third, though recording the start of the validation compaction is 
important, this says nothing about the success of the repair (and we all know 
failures during repair do happen, if only because it's a fairly long operation 
during which nodes can die). So we need to record some info on the success of 
the operation if we don't want to return misleading information. It turns out 
this is easy to record on the node coordinating the repair, maybe not so much 
on the other nodes participating in the repair.

Truth is, I'm not so sure what the simplest way to handle this is. Maybe one 
option could be to only register the start and end time of a repair session on 
the coordinator of the repair (adding the info of which range was repaired).

Also, what do people think of keeping a history (instead of just keeping the 
last number)? I'm thinking a little bit ahead here, but what about storing one 
super column per repair, where the super column name would be the repair 
session id (a TimeUUID really) and the columns info about that repair? For 
this patch we would only record the range for that session, the start time and 
the end time (or maybe one end time for each node). But we would populate this 
a little bit further with stuff like CASSANDRA-2698. I think having such a 
history would be fairly interesting.
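A rough sketch of that layout, modeled with plain maps (all names hypothetical; the real thing would be a system column family):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

class RepairHistorySketch {
    // row key = column family repaired; super column name = session id
    // (a TimeUUID in practice); subcolumns = per-session details.
    static final Map<String, Map<UUID, Map<String, String>>> REPAIR_INFO = new HashMap<>();

    static void recordSession(String cf, UUID sessionId, String range,
                              long startedAt, long endedAt) {
        Map<String, String> details = new LinkedHashMap<>();
        details.put("range", range);
        details.put("started_at", Long.toString(startedAt));
        details.put("ended_at", Long.toString(endedAt));
        REPAIR_INFO.computeIfAbsent(cf, k -> new HashMap<>()).put(sessionId, details);
    }
}
```

Later tickets (e.g. CASSANDRA-2698) could then add more subcolumns per session without changing the layout.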


 should expose 'time since last successful repair' for easier aes monitoring
 ---

 Key: CASSANDRA-2405
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 0.8.1

 Attachments: CASSANDRA-2405-v2.patch, CASSANDRA-2405.patch


 The practical implementation issues of actually ensuring repair runs is 
 somewhat of an undocumented/untreated issue.
 One hopefully low hanging fruit would be to at least expose the time since 
 last successful repair for a particular column family, to make it easier to 
 write a correct script to monitor for lack of repair in a non-buggy fashion.



[jira] [Commented] (CASSANDRA-2673) AssertionError post truncate

2011-06-01 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042199#comment-13042199
 ] 

Sylvain Lebresne commented on CASSANDRA-2673:
-

+1

 AssertionError post truncate
 

 Key: CASSANDRA-2673
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2673
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.0
 Environment: linux 64-bit ubuntu. deb package (datastax). (Random 
 partitioner)
Reporter: Marko Mikulicic
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.7.7, 0.8.1

 Attachments: 2673.txt


 I had 3 nodes with about 100G in a CF. I run truncate on that CF from 
 cassandra-cli. Then I run cleanup for that CF. I saw this exception shortly 
 after.
  INFO [FlushWriter:5] 2011-05-20 02:56:42,699 Memtable.java (line 157) 
 Writing Memtable-body@1278535630(26722 bytes, 1 operations)
  INFO [FlushWriter:5] 2011-05-20 02:56:42,706 Memtable.java (line 172) 
 Completed flushing /var/lib/cassandra/data/dnet/body-f-1892-Data.db (26915 
 bytes)
  INFO [NonPeriodicTasks:1] 2011-05-20 02:59:55,981 SSTable.java (line 147) 
 Deleted /var/lib/cassandra/data/dnet/body-f-1892
  INFO [NonPeriodicTasks:1] 2011-05-20 02:59:55,982 SSTable.java (line 147) 
 Deleted /var/lib/cassandra/data/dnet/body-f-1889
  INFO [NonPeriodicTasks:1] 2011-05-20 02:59:55,983 SSTable.java (line 147) 
 Deleted /var/lib/cassandra/data/dnet/body-f-1890
  INFO [NonPeriodicTasks:1] 2011-05-20 02:59:55,983 SSTable.java (line 147) 
 Deleted /var/lib/cassandra/data/dnet/body-f-1888
  INFO [NonPeriodicTasks:1] 2011-05-20 02:59:55,984 SSTable.java (line 147) 
 Deleted /var/lib/cassandra/data/dnet/body-f-1887
  INFO [CompactionExecutor:1] 2011-05-20 03:02:08,724 CompactionManager.java 
 (line 750) Cleaned up to 
 /var/lib/cassandra/data/dnet/body-tmp-f-1891-Data.db.  25,629,365,173 to 
 25,629,365,173 (~100% of original) bytes for 884,546 keys.  Time: 1,165,900ms.
 ERROR [CompactionExecutor:1] 2011-05-20 03:02:08,727 
 AbstractCassandraDaemon.java (line 114) Fatal exception in thread 
 Thread[CompactionExecutor:1,1,main]
 java.lang.AssertionError
   at 
 org.apache.cassandra.io.sstable.SSTableTracker.replace(SSTableTracker.java:108)
   at 
 org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:1037)
   at 
 org.apache.cassandra.db.CompactionManager.doCleanupCompaction(CompactionManager.java:769)
   at 
 org.apache.cassandra.db.CompactionManager.access$500(CompactionManager.java:56)
   at 
 org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:173)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)



[jira] [Commented] (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal

2011-06-01 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1304#comment-1304
 ] 

Sylvain Lebresne commented on CASSANDRA-2231:
-

You'd have to apply the patch on CASSANDRA-2355 first.

 Add CompositeType comparer to the comparers provided in 
 org.apache.cassandra.db.marshal
 ---

 Key: CASSANDRA-2231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2231
 Project: Cassandra
  Issue Type: New Feature
  Components: Contrib
Reporter: Ed Anuff
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.8.1

 Attachments: 
 0001-Add-compositeType-and-DynamicCompositeType-v2.patch, 
 0001-Add-compositeType-and-DynamicCompositeType-v3.patch, 
 0001-Add-compositeType-and-DynamicCompositeType-v4.patch, 
 0001-Add-compositeType-and-DynamicCompositeType_0.7.patch, 
 CompositeType-and-DynamicCompositeType.patch, 
 edanuff-CassandraCompositeType-1e253c4.zip


 CompositeType is a custom comparer that makes it possible to create 
 comparable composite values out of the basic types that Cassandra currently 
 supports, such as Long, UUID, etc.  This is very useful in both the creation 
 of custom inverted indexes using columns in a skinny row, where each column 
 name is a composite value, and also when using Cassandra's built-in secondary 
 index support, where it can be used to encode the values in the columns that 
 Cassandra indexes.  One scenario for the usage of these is documented here: 
 http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html.  Source for 
 contribution is attached and has been previously maintained on github here: 
 https://github.com/edanuff/CassandraCompositeType



[jira] [Commented] (CASSANDRA-2654) Work around native heap leak in sun.nio.ch.Util affecting IncomingTcpConnection

2011-06-01 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042268#comment-13042268
 ] 

Sylvain Lebresne commented on CASSANDRA-2654:
-

looks good, +1

 Work around native heap leak in sun.nio.ch.Util affecting 
 IncomingTcpConnection
 ---

 Key: CASSANDRA-2654
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2654
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.6.13, 0.7.5, 0.8.0 beta 2
 Environment: OpenJDK Runtime Environment (IcedTea6 1.9.7) 
 (6b20-1.9.7-0ubuntu1~10.04.1)
 OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
 Also observed on Sun/Oracle JDK. Probably platform- and os-independent.
Reporter: Hannes Schmidt
 Fix For: 0.6.12

 Attachments: 2654-v2.txt, 2654-v3.txt, 2654-v4-0.7.txt, 2654-v4.txt, 
 chunking.diff


 NIO's leaky, per-thread caching of direct buffers in combination with 
 IncomingTcpConnection's eager buffering of messages leads to leakage of large 
 amounts of native heap. Details in [1]. More on the root cause in [2]. Even 
 though it doesn't fix the leak, attached patch has been found to alleviate 
 the problem by keeping the size of each direct buffer modest.



[jira] [Commented] (CASSANDRA-2280) Request specific column families using StreamIn

2011-06-07 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045323#comment-13045323
 ] 

Sylvain Lebresne commented on CASSANDRA-2280:
-

In the StreamRequestMessage deserializer, in the version  VERSION_080 part, the
type is deserialized again; it should be removed.

It needs rebasing (at least for the 0.8 branch), so I didn't run the tests with 
it, but it looks good otherwise.

 Request specific column families using StreamIn
 ---

 Key: CASSANDRA-2280
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2280
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Jonathan Ellis
 Fix For: 0.8.1

 Attachments: 
 0001-Allow-specific-column-families-to-be-requested-for-str.txt, 
 0001-Allow-specific-column-families-to-be-requested-for-str.txt, 2280-v3.txt, 
 2280-v4.txt, 2280-v5.txt


 StreamIn.requestRanges only specifies a keyspace, meaning that requesting a 
 range will request it for all column families: if you have a large number of 
 CFs, this can cause quite a headache.



[jira] [Commented] (CASSANDRA-833) fix consistencylevel during bootstrap

2011-06-08 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046017#comment-13046017
 ] 

Sylvain Lebresne commented on CASSANDRA-833:


+1

 fix consistencylevel during bootstrap
 -

 Key: CASSANDRA-833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-833
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.5
Reporter: Jonathan Ellis
Assignee: Sylvain Lebresne
 Fix For: 0.8.1

 Attachments: 0001-Increase-CL-with-boostrapping-leaving-node.patch, 
 833-v2.txt


 As originally designed, bootstrap nodes should *always* get *all* writes 
 under any consistencylevel, so when bootstrap finishes the operator can run 
 cleanup on the old nodes w/o fear that he might lose data.
 but if a bootstrap operation fails or is aborted, that means all writes will 
 fail until the ex-bootstrapping node is decommissioned.  so starting in 
 CASSANDRA-722, we just ignore dead nodes in consistencylevel calculations.
 but this breaks the original design.  CASSANDRA-822 adds a partial fix for 
 this (just adding bootstrap targets into the RF targets and hinting 
 normally), but this is still broken under certain conditions.  The real fix 
 is to consider consistencylevel for two sets of nodes:
   1. the RF targets as currently existing (no pending ranges)
   2.  the RF targets as they will exist after all movement ops are done
 If we satisfy CL for both sets then we will always be in good shape.
 I'm not sure if we can easily calculate 2. from the current TokenMetadata, 
 though.



[jira] [Commented] (CASSANDRA-2590) row delete breaks read repair

2011-06-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046398#comment-13046398
 ] 

Sylvain Lebresne commented on CASSANDRA-2590:
-

+1 on v4, we do need both calls.

That being said, we should probably refactor that part of the code someday, 
because it is not the cleanest thing ever. And there are probably ways to avoid 
those two phases (which do some duplicate work, I believe).

 row delete breaks read repair 
 --

 Key: CASSANDRA-2590
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2590
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.7.7, 0.8.1

 Attachments: 0001-2590-v3.patch, 
 0001-cf-resolve-test-and-possible-solution-for-read-repai.patch, 2590-v2.txt, 
 2590-v4-0.7.txt


 related to CASSANDRA-2589 
 Working at CL ALL can get inconsistent reads after row deletion. Reproduced 
 on the 0.7 and 0.8 source. 
 Steps to reproduce:
 # two node cluster with rf 2 and HH turned off
 # insert rows via cli 
 # flush both nodes 
 # shutdown node 1
 # connect to node 2 via cli and delete one row
 # bring up node 1
 # connect to node 1 via cli and issue get with CL ALL 
 # first get returns the deleted row, second get returns zero rows.
 RowRepairResolver.resolveSuperSet() resolves a local CF with the old row 
 columns, and the remote CF which is marked for deletion. CF.resolve() does 
 not pay attention to the deletion flags and the resolved CF has both 
 markedForDeletion set and a column with a lower timestamp. The return from 
 resolveSuperSet() is used as the return for the read without checking if the 
 cols are relevant. 
 Also when RowRepairResolver.maybeScheduleRepairs() runs it sends two 
 mutations. Node 1 is given the row-level deletion, and Node 2 is given a 
 mutation to write the old (and now deleted) column from node 2. I have some 
 log traces for this if needed. 
 A quick fix is to check for relevant columns in the RowRepairResolver, will 
 attach shortly.



[jira] [Updated] (CASSANDRA-1034) Remove assumption that Key to Token is one-to-one

2011-06-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-1034:


Attachment: (was: 
0002-Remove-assumption-that-token-and-keys-are-one-to-one-v2.patch)

 Remove assumption that Key to Token is one-to-one
 -

 Key: CASSANDRA-1034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1034
 Project: Cassandra
  Issue Type: Bug
Reporter: Stu Hood
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.0

 Attachments: 
 0001-Make-range-accept-both-Token-and-DecoratedKey.patch, 
 0002-LengthPartitioner.patch, 1034_v1.txt


 get_range_slices assumes that Tokens do not collide and converts a KeyRange 
 to an AbstractBounds. For RandomPartitioner, this assumption isn't safe, and 
 would lead to a very weird heisenberg.
 Converting AbstractBounds to use a DecoratedKey would solve this, because the 
 byte[] key portion of the DecoratedKey can act as a tiebreaker. 
 Alternatively, we could make DecoratedKey extend Token, and then use 
 DecoratedKeys in places where collisions are unacceptable.



[jira] [Updated] (CASSANDRA-1034) Remove assumption that Key to Token is one-to-one

2011-06-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-1034:


Attachment: (was: 
0002-Remove-assumption-that-token-and-keys-are-one-to-one.patch)

 Remove assumption that Key to Token is one-to-one
 -

 Key: CASSANDRA-1034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1034
 Project: Cassandra
  Issue Type: Bug
Reporter: Stu Hood
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.0

 Attachments: 
 0001-Make-range-accept-both-Token-and-DecoratedKey.patch, 
 0002-LengthPartitioner.patch, 1034_v1.txt


 get_range_slices assumes that Tokens do not collide and converts a KeyRange 
 to an AbstractBounds. For RandomPartitioner, this assumption isn't safe, and 
 would lead to a very weird heisenberg.
 Converting AbstractBounds to use a DecoratedKey would solve this, because the 
 byte[] key portion of the DecoratedKey can act as a tiebreaker. 
 Alternatively, we could make DecoratedKey extend Token, and then use 
 DecoratedKeys in places where collisions are unacceptable.



[jira] [Updated] (CASSANDRA-1034) Remove assumption that Key to Token is one-to-one

2011-06-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-1034:


Attachment: 
1034-2-Remove-assumption-that-token-and-keys-are-one-to-one-v3.patch
1034-1-Generify-AbstractBounds-v3.patch

Patch rebased, this is against trunk.

 Remove assumption that Key to Token is one-to-one
 -

 Key: CASSANDRA-1034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1034
 Project: Cassandra
  Issue Type: Bug
Reporter: Stu Hood
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.0

 Attachments: 
 0001-Make-range-accept-both-Token-and-DecoratedKey.patch, 
 0002-LengthPartitioner.patch, 1034-1-Generify-AbstractBounds-v3.patch, 
 1034-2-Remove-assumption-that-token-and-keys-are-one-to-one-v3.patch, 
 1034_v1.txt


 get_range_slices assumes that Tokens do not collide and converts a KeyRange 
 to an AbstractBounds. For RandomPartitioner, this assumption isn't safe, and 
 would lead to a very weird heisenberg.
 Converting AbstractBounds to use a DecoratedKey would solve this, because the 
 byte[] key portion of the DecoratedKey can act as a tiebreaker. 
 Alternatively, we could make DecoratedKey extend Token, and then use 
 DecoratedKeys in places where collisions are unacceptable.



[jira] [Commented] (CASSANDRA-1034) Remove assumption that Key to Token is one-to-one

2011-06-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046477#comment-13046477
 ] 

Sylvain Lebresne commented on CASSANDRA-1034:
-

bq. One way to remove toSplitValue would be to use DecoratedKey everywhere;

I'm not saying it's not possible, but I think this is overkill (in the changes 
it involves). Moreover, all the code that deals with topology really only cares 
about tokens; that's the right abstraction for those parts of the code. So I 
really (really) doubt using DecoratedKey everywhere would be cleaner. Of 
course, anyone is free to actually do the experiment and prove me wrong. I also 
don't think it would remove the need for splitValue; it would just maybe call 
it differently.

bq. The equivalent of today's Token is a DecoratedKey for that token with a 
null key

This is only true today because we assume keys and tokens are one-to-one. The 
goal is to change that. If multiple keys can have the same token (by definition 
the token is really the hash of a key), then the statement above is false. If a 
token corresponds to an infinite set of keys (which is the case with md5, btw; 
we just ignore it), then replacing a token by a given key *cannot* work.
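The tiebreaker idea from the issue description can be sketched with a hypothetical simplified class: since many keys hash to the same token, ordering must compare the token first and fall back to the raw key.

```java
// Hypothetical simplified DecoratedKey, not the real Cassandra class.
class SimpleDecoratedKey implements Comparable<SimpleDecoratedKey> {
    final long token;    // hash of the key (md5-derived in RandomPartitioner)
    final String key;    // the raw key acts as a tiebreaker

    SimpleDecoratedKey(long token, String key) {
        this.token = token;
        this.key = key;
    }

    public int compareTo(SimpleDecoratedKey o) {
        int cmp = Long.compare(token, o.token);
        return cmp != 0 ? cmp : key.compareTo(o.key); // key breaks token ties
    }
}
```

With this ordering, two distinct keys that collide on a token are still distinguishable, which is what makes range bounds over DecoratedKey safe where bounds over bare Tokens are not.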

Overall, it could be that there is a better way to do this, but having spent 
some time on this, I have reasonable confidence that it fixes the issue at hand 
without being too disruptive (which is not to say there aren't a few points 
here and there that couldn't be improved).

 Remove assumption that Key to Token is one-to-one
 -

 Key: CASSANDRA-1034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1034
 Project: Cassandra
  Issue Type: Bug
Reporter: Stu Hood
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.0

 Attachments: 
 0001-Make-range-accept-both-Token-and-DecoratedKey.patch, 
 0002-LengthPartitioner.patch, 1034-1-Generify-AbstractBounds-v3.patch, 
 1034-2-Remove-assumption-that-token-and-keys-are-one-to-one-v3.patch, 
 1034_v1.txt


 get_range_slices assumes that Tokens do not collide and converts a KeyRange 
 to an AbstractBounds. For RandomPartitioner, this assumption isn't safe, and 
 would lead to a very weird heisenberg.
 Converting AbstractBounds to use a DecoratedKey would solve this, because the 
 byte[] key portion of the DecoratedKey can act as a tiebreaker. 
 Alternatively, we could make DecoratedKey extend Token, and then use 
 DecoratedKeys in places where collisions are unacceptable.



[jira] [Commented] (CASSANDRA-2231) Add CompositeType comparer to the comparers provided in org.apache.cassandra.db.marshal

2011-06-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046500#comment-13046500
 ] 

Sylvain Lebresne commented on CASSANDRA-2231:
-

The comment still applies to DynamicCompositeType, but what the comment doesn't 
say is that if you use 0x01 as the end-of-component byte, it expects you to 
have no remaining component. The error message says that apparently there are 
some bytes remaining after that 0x01. You can look at the discussion above on 
this ticket for why it doesn't make sense to have anything after a 0x01.
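The rule being described can be sketched with a toy encoding (a deliberate simplification: real DynamicCompositeType components are length-prefixed and carry comparator information; here each component is just one payload byte followed by its end-of-component byte):

```java
class EndOfComponentSketch {
    // A 0x01 end-of-component byte means "everything up to here", so it is
    // only legal on the last component: any bytes after it are an error.
    static boolean isValid(byte[] encoded) {
        for (int i = 0; i + 1 < encoded.length; i += 2) {
            boolean isLast = (i + 2 >= encoded.length);
            if (encoded[i + 1] == 0x01 && !isLast)
                return false; // bytes remain after a 0x01 end-of-component
        }
        return true;
    }
}
```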

 Add CompositeType comparer to the comparers provided in 
 org.apache.cassandra.db.marshal
 ---

 Key: CASSANDRA-2231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2231
 Project: Cassandra
  Issue Type: New Feature
  Components: Contrib
Reporter: Ed Anuff
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.8.1

 Attachments: 
 0001-Add-compositeType-and-DynamicCompositeType-v2.patch, 
 0001-Add-compositeType-and-DynamicCompositeType-v3.patch, 
 0001-Add-compositeType-and-DynamicCompositeType-v4.patch, 
 0001-Add-compositeType-and-DynamicCompositeType_0.7.patch, 
 CompositeType-and-DynamicCompositeType.patch, 
 edanuff-CassandraCompositeType-1e253c4.zip


 CompositeType is a custom comparer that makes it possible to create 
 comparable composite values out of the basic types that Cassandra currently 
 supports, such as Long, UUID, etc.  This is very useful in both the creation 
 of custom inverted indexes using columns in a skinny row, where each column 
 name is a composite value, and also when using Cassandra's built-in secondary 
 index support, where it can be used to encode the values in the columns that 
 Cassandra indexes.  One scenario for the usage of these is documented here: 
 http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html.  Source for 
 contribution is attached and has been previously maintained on github here: 
 https://github.com/edanuff/CassandraCompositeType



[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-06-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: 
0004-Reports-validation-compaction-errors-back-to-repair-v3.patch
0003-Report-streaming-errors-back-to-repair-v3.patch
0002-Register-in-gossip-to-handle-node-failures-v3.patch

0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v3.patch

Attaching v3 rebased (on 0.8).

bq. Since we're not trying to control throughput or monitor sessions, could we 
just use Stage.MISC?

The thing is that repair sessions are very long-lived, and MISC is single 
threaded, so that would block other tasks that are not supposed to block. We 
could make MISC multi-threaded, but even then it's not a good idea to mix 
short-lived and long-lived tasks on the same stage.
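The blocking concern can be demonstrated with plain executors (a sketch, not the actual Cassandra stage classes): with a single worker, a long-lived task queued first delays every short task behind it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SingleThreadedStageSketch {
    // Returns how long a short task waits behind a long-lived one on a
    // single-threaded "stage".
    static long shortTaskDelayMillis() {
        ExecutorService misc = Executors.newSingleThreadExecutor();
        try {
            long submitted = System.currentTimeMillis();
            misc.submit(() -> {                      // stands in for a repair session
                try { Thread.sleep(200); } catch (InterruptedException e) { }
            });
            Future<Long> shortTask = misc.submit(System::currentTimeMillis);
            return shortTask.get() - submitted;      // short task ran only after the long one
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            misc.shutdown();
        }
    }
}
```

A dedicated (or multi-threaded) stage avoids exactly this head-of-line blocking.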

bq. I think RepairSession.exception needs to be volatile to ensure that the 
awoken thread sees it

Done in v3.
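The visibility concern behind that remark, sketched with hypothetical simplified fields: the exception is written by one thread (e.g. a streaming or failure-detection thread) and read by the awoken repair thread, so it must be volatile, or be read under the same lock used for the wait/notify.

```java
class RepairSessionSketch {
    private volatile Exception exception;  // volatile: written and read by different threads
    private final Object condition = new Object();

    void fail(Exception e) {
        exception = e;                     // publish the failure...
        synchronized (condition) {
            condition.notifyAll();         // ...then wake the waiting repair thread
        }
    }

    Exception awaitOutcome(long timeoutMillis) {
        synchronized (condition) {
            try {
                condition.wait(timeoutMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
        return exception;                  // volatile read: guaranteed to see the write
    }
}
```

Without `volatile` (or a read under the shared lock), the Java memory model allows the awoken thread to see a stale null even after the notify.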

bq. Would it be better if RepairSession implemented 
IEndpointStateChangeSubscriber directly?

Good idea, it's slightly simpler; done in v3.

bq. The endpoint set needs to be threadsafe, since it will be modified by the 
endpoint state change thread, and the AE_STAGE thread

Done in v3. That will probably change with CASSANDRA-2610 anyway (which I have 
to update)

bq. Should StreamInSession.retries be volatile/atomic? (likely they won't retry 
quickly enough for it to be a problem, but...)

I did not change that, but if it's a problem for retries not to be volatile, I 
suspect having StreamInSession.current not volatile is also a problem. But 
really, I'd be curious to see that become a problem.

bq. Playing devil's advocate: would sending a half-built tree in case of 
failure still be useful?

I don't think it is. Or more precisely, if you do send a half-built tree, 
you'll have to be careful that the other side doesn't consider what's missing 
as ranges not being in sync (I don't think people will be happy with tons of 
data being streamed just because we happen to have a bug that makes compaction 
throw an exception during the validation). So I think you cannot do much with a 
half-built tree, and it would add complication, for a case where people will 
need to restart the repair anyway once whatever happened is fixed.

bq. success might need to be volatile as well

Done in v3.


 Failed Streams Break Repair
 ---

 Key: CASSANDRA-2433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2433
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Benjamin Coverston
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.1

 Attachments: 
 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v2.patch, 
 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v3.patch, 
 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re.patch, 
 0002-Register-in-gossip-to-handle-node-failures-v2.patch, 
 0002-Register-in-gossip-to-handle-node-failures-v3.patch, 
 0002-Register-in-gossip-to-handle-node-failures.patch, 
 0003-Report-streaming-errors-back-to-repair-v2.patch, 
 0003-Report-streaming-errors-back-to-repair-v3.patch, 
 0003-Report-streaming-errors-back-to-repair.patch, 
 0004-Reports-validation-compaction-errors-back-to-repair-v2.patch, 
 0004-Reports-validation-compaction-errors-back-to-repair-v3.patch, 
 0004-Reports-validation-compaction-errors-back-to-repair.patch


 When running repair in cases where a stream fails, we are seeing multiple problems.
 1. Although retry is initiated and completes, the old stream doesn't seem to 
 clean itself up and repair hangs.
 2. The temp files are left behind and multiple failures can end up filling up 
 the data partition.
 These issues together are making repair very difficult for nearly everyone 
 running repair on a non-trivial sized data set.
 This issue is also being worked on w.r.t. CASSANDRA-2088; however, that was 
 moved to 0.8 for a few reasons. This ticket is to fix the immediate issues 
 that we are seeing in 0.7.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2759) Scrub could lose increments and replicate that loss

2011-06-10 Thread Sylvain Lebresne (JIRA)
Scrub could lose increments and replicate that loss
---

 Key: CASSANDRA-2759
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2759
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
 Fix For: 0.8.1


If scrub cannot 'repair' a corrupted row, it will skip it. On node A, if the 
row contains some sub-counts for A's id, those will be lost forever, since A is 
the source of truth on its current id. We should thus renew node A's id when 
that happens to avoid this (not unlike what we do in cleanup).



[jira] [Assigned] (CASSANDRA-2759) Scrub could lose increments and replicate that loss

2011-06-10 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne reassigned CASSANDRA-2759:
---

Assignee: Sylvain Lebresne

 Scrub could lose increments and replicate that loss
 ---

 Key: CASSANDRA-2759
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2759
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: counters
 Fix For: 0.8.1


 If scrub cannot 'repair' a corrupted row, it will skip it. On node A, if the 
 row contains some sub-counts for A's id, those will be lost forever, since A 
 is the source of truth on its current id. We should thus renew node A's id 
 when that happens to avoid this (not unlike what we do in cleanup).



[jira] [Updated] (CASSANDRA-2759) Scrub could lose increments and replicate that loss

2011-06-10 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2759:


Attachment: 0001-Renew-nodeId-in-scrub-when-skipping-rows.patch

Attached patch against 0.8.

The patch also adds a new startup option to renew the node id on startup. This 
could be useful if someone loses one of their sstables (because of a bad disk, 
for instance) and doesn't want to fully decommission that node.

This could arguably be split into another ticket, though.

 Scrub could lose increments and replicate that loss
 ---

 Key: CASSANDRA-2759
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2759
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: counters
 Fix For: 0.8.1

 Attachments: 0001-Renew-nodeId-in-scrub-when-skipping-rows.patch


 If scrub cannot 'repair' a corrupted row, it will skip it. On node A, if the 
 row contains some sub-counts for A's id, those will be lost forever, since A 
 is the source of truth on its current id. We should thus renew node A's id 
 when that happens to avoid this (not unlike what we do in cleanup).



[jira] [Commented] (CASSANDRA-2759) Scrub could lose increments and replicate that loss

2011-06-10 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047285#comment-13047285
 ] 

Sylvain Lebresne commented on CASSANDRA-2759:
-

It's picking a new UUID for the current node to use for new counter increments.

The problem is that on a given node we store deltas for its current nodeId (to 
avoid a synchronized read-before-write, though I'm starting to wonder if that 
was the smartest choice). Anyway, if scrub skips a row, it may skip some of 
those deltas. Let's say at first there are no increments coming for this row 
with A as 'first distinguished replica'. So far we are still kind of good, 
because on a read (with CL > ONE) the result coming from A will have a 
'version' for its own sub-count smaller than the one on the other replicas, so 
we will use the sub-count from those replicas and return the correct value.

However, as soon as A acknowledges new increments for this row, it will start 
inserting new deltas while it is not intrinsically up to date, which will 
result in a definitive undercount.

The goal of renewing the node id of A is to make sure that the second part 
never happens (because after the renewal A will add new deltas as A', not A 
anymore).

Anyway, now that I've engaged my brain, this patch doesn't really work, because 
A will never be repaired by the other nodes for its now-inconsistent value.

So I have no clue how to actually fix that.
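The undercount mechanism described above can be captured in a toy model. This is an illustrative simplification under assumed semantics, not Cassandra's actual counter context code: each replica stores, per node id, a (clock, count) shard, and merging keeps the shard with the highest clock.

```java
// Toy model of counter shards. A shard is a {clock, count} pair for one
// node id; when two replicas' copies of the same shard meet, the one with
// the highest clock wins the merge.
public class CounterShardSketch
{
    public static long[] merge(long[] a, long[] b)
    {
        return a[0] >= b[0] ? a : b;
    }

    public static void main(String[] args)
    {
        long[] staleOnA  = {3, 30}; // A's shard for its own id after scrub skipped deltas
        long[] onReplica = {5, 50}; // another replica's (correct) copy of A's shard

        // Before A writes again: A's stale shard loses the merge, so reads
        // at CL > ONE still return the correct count of 50.
        assert merge(staleOnA, onReplica)[1] == 50;

        // Now A acknowledges a new increment of +10 on top of its stale
        // state, bumping its clock past everyone else's...
        long[] newOnA = {6, 30 + 10};

        // ...so its shard wins the merge everywhere and the count
        // permanently reads 40 instead of the true 60: a definitive
        // undercount that repair cannot fix.
        assert merge(newOnA, onReplica)[1] == 40;
    }
}
```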

 Scrub could lose increments and replicate that loss
 ---

 Key: CASSANDRA-2759
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2759
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: counters
 Fix For: 0.8.1

 Attachments: 0001-Renew-nodeId-in-scrub-when-skipping-rows.patch


 If scrub cannot 'repair' a corrupted row, it will skip it. On node A, if the 
 row contains some sub-counts for A's id, those will be lost forever, since A 
 is the source of truth on its current id. We should thus renew node A's id 
 when that happens to avoid this (not unlike what we do in cleanup).



[jira] [Commented] (CASSANDRA-2759) Scrub could lose increments and replicate that loss

2011-06-10 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047306#comment-13047306
 ] 

Sylvain Lebresne commented on CASSANDRA-2759:
-

It may be that the best short-term fix here is to make scrub *not* skip rows on 
counter column families (though CASSANDRA-2614 would change that to 'never ever 
skip rows') and just throw a RuntimeException.

 Scrub could lose increments and replicate that loss
 ---

 Key: CASSANDRA-2759
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2759
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: counters
 Fix For: 0.8.1

 Attachments: 0001-Renew-nodeId-in-scrub-when-skipping-rows.patch


 If scrub cannot 'repair' a corrupted row, it will skip it. On node A, if the 
 row contains some sub-counts for A's id, those will be lost forever, since A 
 is the source of truth on its current id. We should thus renew node A's id 
 when that happens to avoid this (not unlike what we do in cleanup).



[jira] [Updated] (CASSANDRA-2759) Scrub could lose increments and replicate that loss

2011-06-14 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2759:


Attachment: 0001-Don-t-skip-rows-on-scrub-for-counter-CFs.patch

Attaching patch to simply re-throw the exception instead of skipping the row 
for counter column families.

bq. Only if you actually did have a counter in the column_metadata, right?

right.

 Scrub could lose increments and replicate that loss
 ---

 Key: CASSANDRA-2759
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2759
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: counters
 Fix For: 0.8.1

 Attachments: 0001-Don-t-skip-rows-on-scrub-for-counter-CFs.patch, 
 0001-Renew-nodeId-in-scrub-when-skipping-rows.patch


 If scrub cannot 'repair' a corrupted row, it will skip it. On node A, if the 
 row contains some sub-counts for A's id, those will be lost forever, since A 
 is the source of truth on its current id. We should thus renew node A's id 
 when that happens to avoid this (not unlike what we do in cleanup).





[jira] [Commented] (CASSANDRA-2641) AbstractBounds.normalize should deal with overlapping ranges

2011-06-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049042#comment-13049042
 ] 

Sylvain Lebresne commented on CASSANDRA-2641:
-

bq. it overlaps quite a bit with StorageProxy.getRestrictedRanges: is there 
anything there that can be reused?

getRestrictedRanges splits a range at different tokens. This patch is about 
merging overlapping ranges as part of normalize. Not sure I follow what could 
be reused here. And in any case, I'm in favor of not refactoring anything that 
is not necessary for this patch. It is not worth it.

 AbstractBounds.normalize should deal with overlapping ranges
 

 Key: CASSANDRA-2641
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2641
 Project: Cassandra
  Issue Type: Test
  Components: Core
Reporter: Stu Hood
Assignee: Stu Hood
Priority: Minor
 Fix For: 0.8.1

 Attachments: 0001-Assert-non-overlapping-ranges-in-normalize.txt, 
 0001-Make-normalize-deoverlap-ranges.patch, 
 0002-Don-t-use-overlapping-ranges-in-tests.txt


 Apparently no consumers have encountered it in production, but 
 AbstractBounds.normalize does not handle overlapping ranges. If given 
 overlapping ranges, the output will be sorted but still overlapping, in which 
 case SSTableReader.getPositionsForRanges will choose ranges in an SSTable 
 that may overlap.
 We should either add an assert in normalize(), or in getPositionsForRanges() 
 to ensure that this never bites us in production.





[jira] [Resolved] (CASSANDRA-2752) repair fails with java.io.EOFException

2011-06-14 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-2752.
-

Resolution: Fixed
  Reviewer: slebresne
  Assignee: Jonathan Ellis  (was: Terje Marthinussen)

 repair fails with java.io.EOFException
 --

 Key: CASSANDRA-2752
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2752
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
Reporter: Terje Marthinussen
Assignee: Jonathan Ellis
Priority: Critical
 Fix For: 0.8.1

 Attachments: 2752.txt


 Issuing repair on node 1  (1.10.42.81) in a cluster quickly fails with
 INFO [AntiEntropyStage:1] 2011-06-09 19:02:47,999 AntiEntropyService.java 
 (line 234) Queueing comparison #Differencer #TreeRequest 
 manual-repair-0c17c5f9-583f-4a31-a6d4-a9e7306fb46e, /1
 .10.42.82, (JP,XXX), (Token(bytes[6e]),Token(bytes[313039])]
  INFO [AntiEntropyStage:1] 2011-06-09 19:02:48,026 AntiEntropyService.java 
 (line 468) Endpoints somewhere/1.10.42.81 and /1.10.42.82 have 2 range(s) out 
 of sync for (JP,XXX) on (Token(bytes[6e]),Token(bytes[313039])]
  INFO [AntiEntropyStage:1] 2011-06-09 19:02:48,026 AntiEntropyService.java 
 (line 485) Performing streaming repair of 2 ranges for #TreeRequest 
 manual-repair-0c17c5f9-583f-4a31-a6d4-a9e7306
 fb46e, /1.10.42.82, (JP,XXX), (Token(bytes[6e]),Token(bytes[313039])]
  INFO [AntiEntropyStage:1] 2011-06-09 19:02:48,030 StreamOut.java (line 173) 
 Stream context metadata [/data/cassandra/node0/data/JP/XXX-g-3-Data.db 
 sections=1 progress=0/36592 - 0%], 1 sstables.
  INFO [AntiEntropyStage:1] 2011-06-09 19:02:48,031 StreamOutSession.java 
 (line 174) Streaming to /1.10.42.82
 ERROR [CompactionExecutor:9] 2011-06-09 19:02:48,970 
 AbstractCassandraDaemon.java (line 113) Fatal exception in thread 
 Thread[CompactionExecutor:9,1,main]
 java.io.EOFException
 at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.doIndexing(SSTableWriter.java:457)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.index(SSTableWriter.java:364)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$Builder.build(SSTableWriter.java:315)
 at 
 org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1099)
 at 
 org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1090)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 On .82
 ERROR [CompactionExecutor:12] 2011-06-09 19:02:48,051 
 AbstractCassandraDaemon.java (line 113) Fatal exception in thread 
 Thread[CompactionExecutor:12,1,main]
 java.io.EOFException
 at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.doIndexing(SSTableWriter.java:457)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.index(SSTableWriter.java:364)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$Builder.build(SSTableWriter.java:315)
 at 
 org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1099)
 at 
 org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1090)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 ERROR [Thread-132] 2011-06-09 19:02:48,051 AbstractCassandraDaemon.java (line 
 113) Fatal exception in thread Thread[Thread-132,5,main]
 java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
 java.io.EOFException
 at 
 org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:152)
 at 
 org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:63)
 at 
 org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:155)
 at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
 Caused by: java.util.concurrent.ExecutionException: java.io.EOFException
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
 at 

[jira] [Commented] (CASSANDRA-2752) repair fails with java.io.EOFException

2011-06-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049045#comment-13049045
 ] 

Sylvain Lebresne commented on CASSANDRA-2752:
-

Good catch. +1 (committed).
Thanks Terje.

 repair fails with java.io.EOFException
 --

 Key: CASSANDRA-2752
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2752
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
Reporter: Terje Marthinussen
Assignee: Terje Marthinussen
Priority: Critical
 Fix For: 0.8.1

 Attachments: 2752.txt


 Issuing repair on node 1  (1.10.42.81) in a cluster quickly fails with
 INFO [AntiEntropyStage:1] 2011-06-09 19:02:47,999 AntiEntropyService.java 
 (line 234) Queueing comparison #Differencer #TreeRequest 
 manual-repair-0c17c5f9-583f-4a31-a6d4-a9e7306fb46e, /1
 .10.42.82, (JP,XXX), (Token(bytes[6e]),Token(bytes[313039])]
  INFO [AntiEntropyStage:1] 2011-06-09 19:02:48,026 AntiEntropyService.java 
 (line 468) Endpoints somewhere/1.10.42.81 and /1.10.42.82 have 2 range(s) out 
 of sync for (JP,XXX) on (Token(bytes[6e]),Token(bytes[313039])]
  INFO [AntiEntropyStage:1] 2011-06-09 19:02:48,026 AntiEntropyService.java 
 (line 485) Performing streaming repair of 2 ranges for #TreeRequest 
 manual-repair-0c17c5f9-583f-4a31-a6d4-a9e7306
 fb46e, /1.10.42.82, (JP,XXX), (Token(bytes[6e]),Token(bytes[313039])]
  INFO [AntiEntropyStage:1] 2011-06-09 19:02:48,030 StreamOut.java (line 173) 
 Stream context metadata [/data/cassandra/node0/data/JP/XXX-g-3-Data.db 
 sections=1 progress=0/36592 - 0%], 1 sstables.
  INFO [AntiEntropyStage:1] 2011-06-09 19:02:48,031 StreamOutSession.java 
 (line 174) Streaming to /1.10.42.82
 ERROR [CompactionExecutor:9] 2011-06-09 19:02:48,970 
 AbstractCassandraDaemon.java (line 113) Fatal exception in thread 
 Thread[CompactionExecutor:9,1,main]
 java.io.EOFException
 at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.doIndexing(SSTableWriter.java:457)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.index(SSTableWriter.java:364)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$Builder.build(SSTableWriter.java:315)
 at 
 org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1099)
 at 
 org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1090)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 On .82
 ERROR [CompactionExecutor:12] 2011-06-09 19:02:48,051 
 AbstractCassandraDaemon.java (line 113) Fatal exception in thread 
 Thread[CompactionExecutor:12,1,main]
 java.io.EOFException
 at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.doIndexing(SSTableWriter.java:457)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.index(SSTableWriter.java:364)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$Builder.build(SSTableWriter.java:315)
 at 
 org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1099)
 at 
 org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1090)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 ERROR [Thread-132] 2011-06-09 19:02:48,051 AbstractCassandraDaemon.java (line 
 113) Fatal exception in thread Thread[Thread-132,5,main]
 java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
 java.io.EOFException
 at 
 org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:152)
 at 
 org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:63)
 at 
 org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:155)
 at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
 Caused by: java.util.concurrent.ExecutionException: java.io.EOFException
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
 at 

[jira] [Created] (CASSANDRA-2767) ConcurrentModificationException in AntiEntropyService.getNeighbors()

2011-06-14 Thread Sylvain Lebresne (JIRA)
ConcurrentModificationException in AntiEntropyService.getNeighbors()


 Key: CASSANDRA-2767
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2767
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 0.8.1








[jira] [Updated] (CASSANDRA-2767) ConcurrentModificationException in AntiEntropyService.getNeighbors()

2011-06-14 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2767:


Attachment: 0001-Fix-ConcurrentModificationException.patch

 ConcurrentModificationException in AntiEntropyService.getNeighbors()
 

 Key: CASSANDRA-2767
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2767
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.1

 Attachments: 0001-Fix-ConcurrentModificationException.patch

   Original Estimate: 1h
  Remaining Estimate: 1h







[jira] [Updated] (CASSANDRA-2767) ConcurrentModificationException in AntiEntropyService.getNeighbors()

2011-06-14 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2767:


Attachment: (was: 0001-Fix-ConcurrentModificationException.patch)

 ConcurrentModificationException in AntiEntropyService.getNeighbors()
 

 Key: CASSANDRA-2767
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2767
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.1

 Attachments: 0001-Fix-ConcurrentModificationException.patch

   Original Estimate: 1h
  Remaining Estimate: 1h







[jira] [Updated] (CASSANDRA-2767) ConcurrentModificationException in AntiEntropyService.getNeighbors()

2011-06-14 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2767:


Attachment: 0001-Fix-ConcurrentModificationException.patch

 ConcurrentModificationException in AntiEntropyService.getNeighbors()
 

 Key: CASSANDRA-2767
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2767
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.1

 Attachments: 0001-Fix-ConcurrentModificationException.patch

   Original Estimate: 1h
  Remaining Estimate: 1h







[jira] [Commented] (CASSANDRA-2768) AntiEntropyService excluding nodes that are on version 0.7 or sooner

2011-06-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049078#comment-13049078
 ] 

Sylvain Lebresne commented on CASSANDRA-2768:
-

The important part here is that this is not a repair specific thing per se. The 
important part of the stack trace is the 'Excluding ...' part.
It is triggered because of the following code in AES.getNeighbors:
{noformat}
  if (Gossiper.instance.getVersion(endpoint) <= MessagingService.VERSION_07)
  {
      logger.info("Excluding " + endpoint + " from repair because it is on "
                  + "version 0.7 or sooner. You should consider updating "
                  + "this node before running repair again.");
      neighbors.remove(endpoint);
  }
{noformat}
Since Sasha has reportedly verified that all nodes report being on 0.8.0, this 
suggests a Gossiper bug that reports the wrong version (even after node 
restarts).

The exception itself has been fixed in CASSANDRA-2767 and should not be the 
focus of attention here.
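The exception pattern itself (the one fixed in CASSANDRA-2767) is the classic remove-while-iterating bug, which can be sketched as follows. Method and class names are illustrative, not the actual getNeighbors() code.

```java
import java.util.Iterator;
import java.util.Set;

// Sketch of the ConcurrentModificationException pattern and its standard
// fix: when removing elements during iteration, mutate through the
// Iterator, not through the collection itself.
public class NeighborFilterSketch
{
    public static Set<String> excludeOldVersions(Set<String> neighbors, Set<String> oldNodes)
    {
        // A for-each loop calling neighbors.remove(endpoint) directly would
        // throw ConcurrentModificationException on the next iteration step.
        for (Iterator<String> it = neighbors.iterator(); it.hasNext(); )
        {
            String endpoint = it.next();
            if (oldNodes.contains(endpoint))
                it.remove(); // safe: removal goes through the iterator
        }
        return neighbors;
    }
}
```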

 AntiEntropyService excluding nodes that are on version 0.7 or sooner
 

 Key: CASSANDRA-2768
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2768
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
 Environment: 4 node environment -- 
 Originally 0.7.6-2 with a Keyspace defined with RF=3
 Upgraded all nodes ( 1 at a time ) to version 0.8.0:  For each node, the node 
 was shut down, new version was turned on, using the existing data files / 
 directories and a nodetool repair was run.  
Reporter: Sasha Dolgy
Assignee: Sylvain Lebresne

 When I run nodetool repair on any of the nodes, the 
 /var/log/cassandra/system.log reports errors similar to:
 INFO [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 
 21:28:39,877 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from 
 repair because it is on version 0.7 or sooner. You should consider updating 
 this node before running repair again.
 ERROR [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 
 21:28:39,877 AbstractCassandraDaemon.java (line 113) Fatal exception in 
 thread Thread[manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec,5,RMI 
 Runtime]
 java.util.ConcurrentModificationException
   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
   at java.util.HashMap$KeyIterator.next(HashMap.java:828)
   at 
 org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:173)
   at 
 org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:776)
 The INFO message and subsequent ERROR message are logged for 2 nodes .. I 
 suspect that this is because RF=3.  
 nodetool ring shows that all nodes are up.  
 Client connections (read / write) are not having issues..  
 nodetool version on all nodes shows that each node is 0.8.0
 At suggestion of some contributors, I have restarted each node and tried to 
 run a nodetool repair again ... the result is the same with the messages 
 being logged.





[jira] [Updated] (CASSANDRA-2758) nodetool repair never finishes. Loops forever through merkle trees?

2011-06-14 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2758:


Attachment: 0001-Fix-MerkleTree.init-to-not-create-non-sensical-trees.patch

MerkleTree.init(), which is used to create the merkle tree in case there is no 
data, was creating a nonsensical tree by stopping its iteration too late.

Attached patch to fix (and dropping the priority to minor because it has very 
little chance of hitting anyone in any real-life situation).

 nodetool repair never finishes. Loops forever through merkle trees?
 ---

 Key: CASSANDRA-2758
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2758
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
Reporter: Terje Marthinussen
Assignee: Sylvain Lebresne
 Fix For: 0.8.1

 Attachments: 
 0001-Fix-MerkleTree.init-to-not-create-non-sensical-trees.patch


 I am not sure all the steps here are needed, but as part of testing something 
 else, I set up
 node1: initial_token: 1
 node2: initial_token: 5
 Then:
 {noformat}
 create keyspace myks 
  with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
  with strategy_options = [{ replication_factor:2 }];
 use myks;
 create column family test with comparator = AsciiType and column_metadata=[ 
 {column_name: 'up_', validation_class: LongType, index_type: 0}, 
 {column_name: 'del_', validation_class: LongType, index_type: 0} ]
  and keys_cached = 10 and rows_cached = 1 and 
 min_compaction_threshold = 2;
 quit;
 {noformat}
 Doing nodetool repair after this gets both nodes busy looping forever.
 A quick look at one node in Eclipse makes me guess it's having fun spinning 
 through merkle trees, but I have to admit I have not looked at it for long.





[jira] [Updated] (CASSANDRA-2758) nodetool repair never finishes. Loops forever through merkle trees?

2011-06-14 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2758:


Priority: Minor  (was: Major)

 nodetool repair never finishes. Loops forever through merkle trees?
 ---

 Key: CASSANDRA-2758
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2758
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
Reporter: Terje Marthinussen
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.8.1

 Attachments: 
 0001-Fix-MerkleTree.init-to-not-create-non-sensical-trees.patch


 I am not sure all the steps here are needed, but as part of testing something 
 else, I set up
 node1: initial_token: 1
 node2: initial_token: 5
 Then:
 {noformat}
 create keyspace myks 
  with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
  with strategy_options = [{ replication_factor:2 }];
 use myks;
 create column family test with comparator = AsciiType and column_metadata=[ 
 {column_name: 'up_', validation_class: LongType, index_type: 0}, 
 {column_name: 'del_', validation_class: LongType, index_type: 0} ]
  and keys_cached = 10 and rows_cached = 1 and 
 min_compaction_threshold = 2;
 quit;
 {noformat}
 Doing nodetool repair after this gets both nodes busy looping forever.
 A quick look at one node in Eclipse makes me guess it's having fun spinning 
 through merkle trees, but I have to admit I have not looked at it for long.





[jira] [Commented] (CASSANDRA-2679) Move some column creation logic into Column factory functions

2011-06-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049118#comment-13049118
 ] 

Sylvain Lebresne commented on CASSANDRA-2679:
-

lgtm, but can we rename the 'get' to 'create' (or 'make'), as this better 
suggests what those methods do.

 Move some column creation logic into Column factory functions
 -

 Key: CASSANDRA-2679
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2679
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Stu Hood
Priority: Minor
 Fix For: 1.0

 Attachments: 
 0001-CASSANDRA-2679-Move-Deleted-Expiring-switch-and-contex.txt


 Expiring and Counter columns have extra creation logic that is better 
 encapsulated when implemented inside a factory function.





[jira] [Commented] (CASSANDRA-2766) ConcurrentModificationException during node recovery

2011-06-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049191#comment-13049191
 ] 

Sylvain Lebresne commented on CASSANDRA-2766:
-

lgtm +1 on v2

 ConcurrentModificationException during node recovery
 

 Key: CASSANDRA-2766
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2766
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Terje Marthinussen
Assignee: Jonathan Ellis
 Attachments: 2766-v2.txt, 2766.txt


 Testing some node recovery operations.
 In this case:
 1. Data is being added/updated as it would in production
 2. repair is running on other nodes in the cluster
 3. we wiped data on this node and started up again, but before repair was 
 actually started on this node (but it had gotten data through the regular 
 data feed) we got this error.
 I see no indication in the logs that outgoing streams have been started, but 
 the node has finished one incoming stream before this (I guess from some 
 other node doing repair).
  INFO [CompactionExecutor:11] 2011-06-14 14:15:09,078 SSTableReader.java 
 (line 155) Opening /data/cassandra/node1/data/JP/test-g-8
  INFO [CompactionExecutor:13] 2011-06-14 14:15:09,079 SSTableReader.java 
 (line 155) Opening /data/cassandra/node1/data/JP/test-g-10
  INFO [HintedHandoff:1] 2011-06-14 14:15:26,623 HintedHandOffManager.java 
 (line 302) Started hinted handoff for endpoint /1.10.42.216
  INFO [HintedHandoff:1] 2011-06-14 14:15:26,623 HintedHandOffManager.java 
 (line 358) Finished hinted handoff of 0 rows to endpoint /1.10.42.216
  INFO [CompactionExecutor:9] 2011-06-14 14:15:29,417 SSTableReader.java (line 
 155) Opening /data/cassandra/node1/data/JP/Datetest-g-2
 ERROR [Thread-84] 2011-06-14 14:15:36,755 AbstractCassandraDaemon.java (line 
 113) Fatal exception in thread Thread[Thread-84,5,main]
 java.util.ConcurrentModificationException
 at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
 at java.util.AbstractList$Itr.next(AbstractList.java:343)
 at 
 org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:132)
 at 
 org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:63)
 at 
 org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:155)
 at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
 ERROR [Thread-79] 2011-06-14 14:15:36,755 AbstractCassandraDaemon.java (line 
 113) Fatal exception in thread Thread[Thread-79,5,main]
 java.util.ConcurrentModificationException
 at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
 at java.util.AbstractList$Itr.next(AbstractList.java:343)
 at 
 org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:132)
 at 
 org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:63)
 at 
 org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:155)
 at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
 ERROR [Thread-83] 2011-06-14 14:15:36,755 AbstractCassandraDaemon.java (line 
 113) Fatal exception in thread Thread[Thread-83,5,main]
 java.util.ConcurrentModificationException
 at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
 at java.util.AbstractList$Itr.next(AbstractList.java:343)
 at 
 org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:132)
 at 
 org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:63)
 at 
 org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:155)
 at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
 ERROR [Thread-85] 2011-06-14 14:15:36,755 AbstractCassandraDaemon.java (line 
 113) Fatal exception in thread Thread[Thread-85,5,main]
 java.util.ConcurrentModificationException
 at 
 java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
 at java.util.AbstractList$Itr.next(AbstractList.java:343)
 at 
 org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:132)
 at 
 org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:63)
 at 
 org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:155)
 at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
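The traces above all come from AbstractList's fail-fast iterator: closeIfFinished iterates a list of pending files while another thread mutates it. A minimal, self-contained sketch of the failure mode (the file names and list contents here are made up, not Cassandra's actual state):

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeDemo {
    // Returns true if iterating while mutating throws, as in the traces above.
    static boolean triggersCme() {
        List<String> files = new ArrayList<>();
        files.add("test-g-1-Data.db");
        files.add("test-g-2-Data.db");
        files.add("test-g-3-Data.db");
        try {
            // The for-each iterator checks modCount on every next(); removing
            // through the list (not the iterator) invalidates it, just as a
            // concurrent mutation from another stream thread would.
            for (String f : files)
                files.remove(f);
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(triggersCme());
    }
}
```

One common fix is to swap the list for a java.util.concurrent collection (e.g. CopyOnWriteArrayList), or to synchronize both the mutation and the iteration on the same lock.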



[jira] [Commented] (CASSANDRA-2614) create Column and CounterColumn in the same column family

2011-06-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049308#comment-13049308
 ] 

Sylvain Lebresne commented on CASSANDRA-2614:
-

Turns out there is a tiny hiccup.
For a Deletion (inside a thrift Mutation), the timestamp is required to be set 
if the CF is a regular one, but not if it is a counter CF. More importantly, 
for a counter CF the timestamp should be a server-generated timestamp.

If we allow rows to mix counters and regular columns, then when facing a 
Deletion for a full row, there is no way to accommodate those two requirements. 
It would kind of be a shame not to do this because of Deletion, but I don't 
see a good way around it (other than changing the API, which would move this 
ticket to 1.0).

Ideas ? 

 create Column and CounterColumn in the same column family
 -

 Key: CASSANDRA-2614
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2614
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Dave Rav
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.8.1


 create Column and CounterColumn in the same column family





[jira] [Commented] (CASSANDRA-2774) one way to make counter delete work better

2011-06-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049688#comment-13049688
 ] 

Sylvain Lebresne commented on CASSANDRA-2774:
-

Consider three nodes A, B and C with RF=2 and a given counter c whose replica 
set is {A, B}.
Consider a single client issuing the following operations (in order) while 
connected to node A:
# client increment c by +2 at CL.ONE
# client delete c at CL.ONE
# client increment c by +3 at CL.ONE
# client reads c at CL.ALL

The *only* valid answer the client should ever get on its last read is 3.  Any 
other value is a break of the consistency level contract and not something we 
can expect people to be happy with. Any other answer means that deletes are 
broken (and this *is* the problem with the actual implementation).

However, because the writes are made at CL.ONE in the example above, at the time 
the read is issued the only thing we know for sure is that each write has been 
received by at least one node, but not necessarily the same one each time. 
Depending on the actual timing and on which node happens to acknowledge each 
write, when the read reaches the nodes you can have a lot of different 
situations, including:
* A and B have both received the 3 writes in the right order: they will both 
return 3, the 'right' answer.
* A received the deletion (the two increments are still on the wire, yet to be 
received) and B received the two increments (the delete is still on the wire, 
yet to be received). A will return the tombstone, B will return 5. You can 
assign all the epoch numbers you want; there is no way to return 3 to the 
client. It will be either 5 or 0.

So the same query will result in different answers depending on the internal 
timing of events, and will sometimes return an answer that breaks the 
contract. Removes of counters are broken, and the only safe way to use them is 
for permanent removal with no subsequent inserts. This patch doesn't fix that.

Btw, it's not too hard to come up with the same kind of example using only 
QUORUM reads and writes (but you'll need one more replica and a few more steps).


 one way to make counter delete work better
 --

 Key: CASSANDRA-2774
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2774
 Project: Cassandra
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Yang Yang
 Attachments: counter_delete.diff


 current Counter does not work with delete, because different merging orders of 
 sstables produce different results, for example:
 add 1
 delete 
 add 2
 if the merging happens in the order 1-2, then (1,2)-3, the result we see will 
 be 2;
 if the merging is 1-3, then (1,3)-2, the result will be 3.
 the issue is that a delete currently cannot separate out adds made before it 
 from adds made after it. supposedly a delete is to create a completely new 
 incarnation of the counter, or a new lifetime, or epoch. the new approach 
 utilizes the concept of an epoch number, so that each delete bumps up the 
 epoch number. since each write is replicated (replicate on write is almost 
 always enabled in practice; if this is a concern, we could further force ROW 
 in case of delete), the epoch number is global to a replica set.
 changes are attached, existing tests pass fine, some tests are modified since 
 the semantics are changed a bit. some cql tests do not pass in the original 
 0.8.0 source; that's not the fault of this change.
 see details at 
 http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%3cbanlktikqcglsnwtt-9hvqpseoo7sf58...@mail.gmail.com%3E
 the goal of this is to make delete work ( at least with consistent behavior, 
 yes in case of long network partition, the behavior is not ideal, but it's 
 consistent with the definition of logical clock), so that we could have 
 expiring Counters
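The merge-order problem described above can be shown with a toy model (illustrative only; this is not Cassandra's actual Column hierarchy): increments merge by summing, a tombstone shadows whatever it is merged against by timestamp, and so the final value depends on which pair is merged first.

```java
public class CounterMergeDemo {
    // Simplified cell: either an increment (sum of deltas) or a tombstone.
    // Timestamps decide tombstone-vs-increment, but increments merge by
    // summing and keeping the max timestamp -- which is what makes the
    // result depend on merge order.
    static final class Cell {
        final boolean tombstone;
        final long value;
        final long ts;
        Cell(boolean tombstone, long value, long ts) {
            this.tombstone = tombstone; this.value = value; this.ts = ts;
        }
    }

    static Cell merge(Cell a, Cell b) {
        if (!a.tombstone && !b.tombstone)
            return new Cell(false, a.value + b.value, Math.max(a.ts, b.ts));
        // tombstone vs anything: the latest timestamp wins
        return a.ts >= b.ts ? a : b;
    }

    static long mergeInOrder(Cell x, Cell y, Cell z) {
        Cell r = merge(merge(x, y), z);
        return r.tombstone ? 0 : r.value;
    }

    public static void main(String[] args) {
        Cell add1 = new Cell(false, 1, 1);  // add 1
        Cell del  = new Cell(true,  0, 2);  // delete
        Cell add2 = new Cell(false, 2, 3);  // add 2
        // ((add1 + del) + add2): the tombstone shadows add1, result is 2
        System.out.println(mergeInOrder(add1, del, add2));
        // ((add1 + add2) + del): the increments sum to 3 first, result is 3
        System.out.println(mergeInOrder(add1, add2, del));
    }
}
```

An epoch number, as proposed, would let the merge discard increments from an older epoch regardless of the order in which sstables are combined.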





[jira] [Updated] (CASSANDRA-2769) Cannot Create Duplicate Compaction Marker

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2769:


Attachment: 0002-Only-compact-what-has-been-succesfully-marked-as-com.patch
0001-Do-compact-only-smallerSSTables.patch
0001-0.8.0-Remove-useless-unmarkCompacting-in-doCleanup.patch

Alright, there are a bunch of problems, one of which affects 0.8 and trunk and 
could cause this stack trace. The others are due to CASSANDRA-1610 and thus only 
affect trunk (but one of those can also result in the attached stack trace).

The problem affecting 0.8 and trunk is a leftover line in doCleanup() that 
wrongly unmarks a sstable from the compacting set before it has been removed 
from the active set of sstables. Thus another compaction could start compacting 
this sstable and we would end up marking the file as compacted twice (and we 
would have duplicated the sstable, which is a problem for counters).
Patch 0001-0.8.0-Remove-useless-unmarkCompacting-in-doCleanup.patch removes it 
and is against 0.8.

Trunk has a few problems of its own:
* If disk space is not sufficient to compact all sstables, it computes the 
smallestSSTables set that fits, but doesn't use it. Attached first patch 
(0001-Do-compact-only-smallerSSTables.patch) fixes that.
* The CompactionTask logic wrongly decorrelates the set of sstables that are 
successfully marked from the ones it actually compacts. That is, it grabs a 
list of sstables it wants to compact, then calls markCompacting on them, but 
does not check whether all of them were successfully marked, and compacts the 
original list instead.
  In effect, a task will recompact sstables that are already being compacted by 
another task, and the given file will be compacted twice (or more) and marked 
compacted multiple times.
  Attached patch 
(0002-Only-compact-what-has-been-succesfully-marked-as-com.patch) fixes this by 
changing the sstables set of a given CompactionTask to whatever has been 
successfully marked only. Since the marking involves updating the task, I've 
move the logic to AbstractCompactionTask where it seems to make more sense to 
me.
* For some reason, the markCompacting added for CompactionTasks was refusing to 
mark (and compact) anything if the set of sstables was bigger than 
MaxCompactionThreshold. This means that as soon as the number of sstables (of 
the same size) in the column family exceeded the threshold, no compaction would 
be started. This is not the expected behavior. The second patch also fixes this 
by reusing the original markCompacting, which handles this correctly.
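The fix in the second patch can be sketched as follows (method and field names are illustrative, not DataTracker's actual API): marking happens atomically under a lock, and a task's working set becomes exactly the subset it managed to mark, never the list it originally asked for.

```java
import java.util.HashSet;
import java.util.Set;

public class MarkingDemo {
    // Sstables currently owned by some compaction task.
    static final Set<String> compacting = new HashSet<>();

    // Atomically claim candidates; returns only the ones actually claimed.
    static synchronized Set<String> markCompacting(Set<String> candidates) {
        Set<String> marked = new HashSet<>();
        for (String sstable : candidates)
            if (compacting.add(sstable))  // false if another task owns it
                marked.add(sstable);
        return marked;
    }

    public static void main(String[] args) {
        Set<String> taskA = markCompacting(Set.of("sst-1", "sst-2"));
        Set<String> taskB = markCompacting(Set.of("sst-2", "sst-3"));
        // sst-2 belongs to task A only; task B gets just sst-3, so no
        // sstable is ever compacted (and marked compacted) twice.
        System.out.println(taskA);
        System.out.println(taskB);
    }
}
```

Task B silently loses sst-2 to task A and compacts only what it owns, which is the behavior the patch restores.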


 Cannot Create Duplicate Compaction Marker
 -

 Key: CASSANDRA-2769
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2769
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Benjamin Coverston
Assignee: Sylvain Lebresne
 Fix For: 0.8.1, 1.0

 Attachments: 
 0001-0.8.0-Remove-useless-unmarkCompacting-in-doCleanup.patch, 
 0001-Do-compact-only-smallerSSTables.patch, 
 0002-Only-compact-what-has-been-succesfully-marked-as-com.patch


 Concurrent compaction can trigger the following exception when two threads 
 compact the same sstable. DataTracker attempts to prevent this but apparently 
 not successfully.
 java.io.IOError: java.io.IOException: Unable to create compaction marker
   at 
 org.apache.cassandra.io.sstable.SSTableReader.markCompacted(SSTableReader.java:638)
   at 
 org.apache.cassandra.db.DataTracker.removeOldSSTablesSize(DataTracker.java:321)
   at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:294)
   at 
 org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:255)
   at 
 org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:932)
   at 
 org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
   at 
 org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:119)
   at 
 org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:102)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:680)
 Caused by: java.io.IOException: Unable to create compaction marker
   at 
 org.apache.cassandra.io.sstable.SSTableReader.markCompacted(SSTableReader.java:634)
   ... 12 more


[jira] [Resolved] (CASSANDRA-2641) AbstractBounds.normalize should deal with overlapping ranges

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-2641.
-

   Resolution: Fixed
Fix Version/s: (was: 0.8.1)
   1.0
 Reviewer: stuhood  (was: slebresne)
 Assignee: Sylvain Lebresne  (was: Stu Hood)

Committed to 1.0. Since I'm pretty sure we don't generate overlapping ranges so 
far, it's not worth taking the risk of putting it in 0.8.

 AbstractBounds.normalize should deal with overlapping ranges
 

 Key: CASSANDRA-2641
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2641
 Project: Cassandra
  Issue Type: Test
  Components: Core
Reporter: Stu Hood
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.0

 Attachments: 0001-Assert-non-overlapping-ranges-in-normalize.txt, 
 0001-Make-normalize-deoverlap-ranges.patch, 
 0002-Don-t-use-overlapping-ranges-in-tests.txt


 Apparently no consumers have encountered it in production, but 
 AbstractBounds.normalize does not handle overlapping ranges. If given 
 overlapping ranges, the output will be sorted but still overlapping, for 
 which SSTableReader.getPositionsForRanges will choose ranges in an SSTable 
 that may overlap.
 We should either add an assert in normalize(), or in getPositionsForRanges() 
 to ensure that this never bites us in production.





[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: (was: 
0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re.patch)

 Failed Streams Break Repair
 ---

 Key: CASSANDRA-2433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2433
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Benjamin Coverston
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.1


 When running repair in cases where a stream fails, we are seeing multiple 
 problems.
 1. Although retry is initiated and completes, the old stream doesn't seem to 
 clean itself up, and repair hangs.
 2. The temp files are left behind, and multiple failures can end up filling up 
 the data partition.
 Together, these issues are making repair very difficult for nearly everyone 
 running repair on a non-trivially sized data set.
 This issue is also being worked on w.r.t. CASSANDRA-2088; however, that was 
 moved to 0.8 for a few reasons. This ticket is to fix the immediate issues 
 that we are seeing in 0.7.





[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: (was: 
0002-Register-in-gossip-to-handle-node-failures-v3.patch)





[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: (was: 
0002-Register-in-gossip-to-handle-node-failures-v2.patch)





[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: (was: 
0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v3.patch)





[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: (was: 
0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v2.patch)





[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: (was: 0002-Register-in-gossip-to-handle-node-failures.patch)





[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: (was: 
0004-Reports-validation-compaction-errors-back-to-repair-v2.patch)





[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: (was: 
0004-Reports-validation-compaction-errors-back-to-repair.patch)





[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: (was: 0003-Report-streaming-errors-back-to-repair-v2.patch)





[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: (was: 0003-Report-streaming-errors-back-to-repair-v3.patch)





[jira] [Updated] (CASSANDRA-2433) Failed Streams Break Repair

2011-06-15 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2433:


Attachment: 
0004-Reports-validation-compaction-errors-back-to-repair-v4.patch
0003-Report-streaming-errors-back-to-repair-v4.patch
0002-Register-in-gossip-to-handle-node-failures-v4.patch

0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v4.patch

Attaching v4, which is rebased and simply makes the retries variable in 
StreamInSession volatile after all (I've removed the old versions because it 
was a mess).

 Failed Streams Break Repair
 ---

 Key: CASSANDRA-2433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2433
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Benjamin Coverston
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.1

 Attachments: 
 0001-Put-repair-session-on-a-Stage-and-add-a-method-to-re-v4.patch, 
 0002-Register-in-gossip-to-handle-node-failures-v4.patch, 
 0003-Report-streaming-errors-back-to-repair-v4.patch, 
 0004-Reports-validation-compaction-errors-back-to-repair-v4.patch


 Running repair in cases where a stream fails we are seeing multiple problems.
 1. Although retry is initiated and completes, the old stream doesn't seem to 
 clean itself up and repair hangs.
 2. The temp files are left behind and multiple failures can end up filling up 
 the data partition.
 These issues together are making repair very difficult for nearly everyone 
 running repair on a non-trivially sized data set.
 This issue is also being worked on w.r.t. CASSANDRA-2088; however, that was 
 moved to 0.8 for a few reasons. This ticket is to fix the immediate issues 
 that we are seeing in 0.7.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2405) should expose 'time since last successful repair' for easier aes monitoring

2011-06-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049770#comment-13049770
 ] 

Sylvain Lebresne commented on CASSANDRA-2405:
-

The problem with using the completion time as the (Super)Column name is that 
you have to wait for the end of the repair to store anything. First, this will 
not capture started-but-failed sessions (which, while not mandatory, could be 
nice, especially as soon as we start keeping a bit more info; this could help 
troubleshooting). Second, it will be a pain to have to keep some of the 
information until the end (the processingStartedAt is a first sign of this). 
And third, we may want to keep some info on, say, merkle tree creation on all 
replicas participating in the repair, even though we only store the completion 
time on the node initiating the repair.

So I would propose something like:
  row key: KS/CF
  super column name: repair session name (a TimeUUID)
  columns: the infos on the session (range, start and end time, number of range 
repaired, bytes transferred, ...)

That is roughly the same thing as you propose, but with the super column name 
being the repair session name.

Now, because the repair session names are TimeUUIDs (well, right now it is a 
string including a UUID, but we can change it to a simple TimeUUID easily), the 
sessions will be ordered by creation time. So getting the last successful 
repair is probably not too hard: just grab the last 1000 created sessions and 
find the last successful one.
And if we want, we can even use another specific index row that associates 
'completion time' -> 'session UUID' (and thanks to the new DynamicCompositeType 
we can have some rows ordered by TimeUUIDType and some others ordered by 
LongType without the need for multiple system tables).

 should expose 'time since last successful repair' for easier aes monitoring
 ---

 Key: CASSANDRA-2405
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 0.8.1

 Attachments: CASSANDRA-2405-v2.patch, CASSANDRA-2405-v3.patch, 
 CASSANDRA-2405.patch


 The practical implementation issues of actually ensuring repair runs are 
 somewhat of an undocumented/untreated issue.
 One hopefully low-hanging fruit would be to at least expose the time since 
 the last successful repair for a particular column family, to make it easier 
 to write a correct script to monitor for lack of repair in a non-buggy 
 fashion.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2405) should expose 'time since last successful repair' for easier aes monitoring

2011-06-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049784#comment-13049784
 ] 

Sylvain Lebresne commented on CASSANDRA-2405:
-

I'm keen on adding persisted stats for repair for CASSANDRA-2698. Recording 
the start and end time of repair also amounts to persisting stats on repair. 
Given that, I don't care too much about what the description of this says, but 
I'm pretty much opposed to doing anything here that would make CASSANDRA-2698 
harder than it needs to be unless there is a good reason, and I don't see one. 
I'm happy with making this a duplicate or dependency of CASSANDRA-2698 though.

 should expose 'time since last successful repair' for easier aes monitoring
 ---

 Key: CASSANDRA-2405
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 0.8.1

 Attachments: CASSANDRA-2405-v2.patch, CASSANDRA-2405-v3.patch, 
 CASSANDRA-2405.patch


 The practical implementation issues of actually ensuring repair runs are 
 somewhat of an undocumented/untreated issue.
 One hopefully low-hanging fruit would be to at least expose the time since 
 the last successful repair for a particular column family, to make it easier 
 to write a correct script to monitor for lack of repair in a non-buggy 
 fashion.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2774) one way to make counter delete work better

2011-06-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049834#comment-13049834
 ] 

Sylvain Lebresne commented on CASSANDRA-2774:
-


bq. I think with quorum delete you will guarantee timing to be consistent with 
the client and then achieve the client-expected result in your case; I'd like 
to hear your counter example

Consider a cluster with RF=3 and a counter c replicated on nodes A, B and C.  
Consider that all operations are done by the same client connected to some other 
node (it doesn't have to be the same each time, but it can be). All operations 
are performed at the QUORUM consistency level.

The client does the following operations:
# increment c by 1
# delete c
# increment c by 1
# reads c

Because QUORUM is 2, depending on internal timings (latency on the wire and 
such), either only 2 or all 3 nodes will have seen each write once it is acked 
to the client. Again, for the same inputs and depending on timing, the client 
could get a variety of results on the read:
* 1 if each node has received each operation in the order issued.
* 0 or 2, if for instance, by the time the read is issued:
** the first increment only reached B and C
** the deletion only reached A and C
** the second increment only reached A and B, and it happens that the first 
two nodes answering the read are B and C. The exact value depends on the exact 
rules for dealing with the epoch number, but in any case, B would only have the 
two increments and C would have the first increment and the deletion (issued 
after the increment, so the deletion wins). So B will answer 2 and C will 
answer a tombstone. Whatever resolution the coordinator does, it just cannot 
return 1 that time.
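
A toy model of that scenario (illustrative code, not Cassandra's actual counter
implementation; all names are made up): each replica applies only the operations
it received, and a local read sums the increments newer than the latest delete
that replica knows about.

```java
import java.util.List;

// Toy model of per-replica counter state under QUORUM (illustrative only).
// An op carries a timestamp; a replica's local read sums the adds issued
// after the newest delete that replica has seen.
class CounterReplicaModel {
    record Op(boolean isDelete, long value, long ts) {}

    // Local read on one replica; null stands for a tombstone answer.
    static Long localRead(List<Op> received) {
        long latestDelete = Long.MIN_VALUE;
        for (Op op : received)
            if (op.isDelete()) latestDelete = Math.max(latestDelete, op.ts());
        long sum = 0;
        boolean anyLiveAdd = false;
        for (Op op : received)
            if (!op.isDelete() && op.ts() > latestDelete) {
                sum += op.value();
                anyLiveAdd = true;
            }
        return anyLiveAdd ? sum : null;
    }
}
```

With inc(ts=1), delete(ts=2), inc(ts=3): replica B (which missed the delete)
locally reads 2, while replica C (which missed the second increment) reads a
tombstone, so no resolution of {B, C} can yield the 1 the client expects.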


 one way to make counter delete work better
 --

 Key: CASSANDRA-2774
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2774
 Project: Cassandra
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Yang Yang
 Attachments: counter_delete.diff


 current Counter does not work with delete, because different merging orders 
 of sstables would produce different results, for example:
 add 1
 delete
 add 2
 if the merging happens in the order 1->2, (1,2)->3, the result we see will be 2
 if merging is: 1->3, (1,3)->2, the result will be 3.
 the issue is that a delete can not separate out previous adds from adds 
 later than the delete. supposedly a delete is to create a completely new 
 incarnation of the counter, or a new lifetime, or epoch. the new approach 
 utilizes the concept of an epoch number, so that each delete bumps up the 
 epoch number. since each write is replicated (replicate on write is almost 
 always enabled in practice; if this is a concern, we could further force ROW 
 in case of delete), the epoch number is global to a replica set.
 changes are attached, existing tests pass fine, some tests are modified since 
 the semantics are changed a bit. some cql tests do not pass in the original 
 0.8.0 source; that's not the fault of this change.
 see details at 
 http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%3cbanlktikqcglsnwtt-9hvqpseoo7sf58...@mail.gmail.com%3E
 the goal of this is to make delete work ( at least with consistent behavior, 
 yes in case of long network partition, the behavior is not ideal, but it's 
 consistent with the definition of logical clock), so that we could have 
 expiring Counters
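
The merge-order dependence described in that example can be reproduced with a
toy reconcile function (a sketch, not the real sstable merge code; it assumes
an add/add merge sums the deltas while any merge involving a delete is decided
by the newer timestamp):

```java
// Toy counter-cell merge showing why merge order matters (illustrative only).
// add(1, ts=1), delete(ts=2), add(2, ts=3): merging (1,2) first then 3 yields
// 2, while merging (1,3) first then 2 yields 3.
class CounterMergeModel {
    record Cell(long value, long ts, boolean isDelete) {}

    static Cell merge(Cell a, Cell b) {
        if (!a.isDelete() && !b.isDelete()) // two adds: sum the deltas
            return new Cell(a.value() + b.value(), Math.max(a.ts(), b.ts()), false);
        return a.ts() >= b.ts() ? a : b;    // a delete shadows anything older
    }
}
```

With c1 = add(1, ts=1), d = delete(ts=2), c3 = add(2, ts=3): merge(merge(c1, d),
c3) gives 2, but merge(merge(c1, c3), d) gives 3 — the two orders disagree,
which is the inconsistency the epoch number is meant to remove.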

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2405) should expose 'time since last successful repair' for easier aes monitoring

2011-06-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049882#comment-13049882
 ] 

Sylvain Lebresne commented on CASSANDRA-2405:
-

Hum, the thing is that there will be many repair sessions for a given set of 
KS/CF and range. So you need one of the keys (either the row key or the 
supercolumn name) to be the session_id (or anything that is unique to a 
session). If you use a row for each KS/CF pair and one super column for each 
session, you will have one super column for each repair made in a session (or 
kind of; you will indeed have multiple merkle trees for instance, one for each 
replica, but we can easily prefix the column with the replica name if need be).

 should expose 'time since last successful repair' for easier aes monitoring
 ---

 Key: CASSANDRA-2405
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 0.8.1

 Attachments: CASSANDRA-2405-v2.patch, CASSANDRA-2405-v3.patch, 
 CASSANDRA-2405.patch


 The practical implementation issues of actually ensuring repair runs are 
 somewhat of an undocumented/untreated issue.
 One hopefully low-hanging fruit would be to at least expose the time since 
 the last successful repair for a particular column family, to make it easier 
 to write a correct script to monitor for lack of repair in a non-buggy 
 fashion.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2369) support replication decisions per-key

2011-06-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049899#comment-13049899
 ] 

Sylvain Lebresne commented on CASSANDRA-2369:
-

Let me also add that if you allow that, load balancing will be a bitch. One may 
argue it should be the problem of whoever wants to use this, but I'm not sure 
that providing tools that make foot-shooting too easy is such a good idea.

 support replication decisions per-key
 -

 Key: CASSANDRA-2369
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2369
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jonathan Ellis
Assignee: Vijay
Priority: Minor
 Fix For: 1.0


 Currently the replicationstrategy gets a token and a keyspace with which to 
 decide how to place replicas.  for per-row replication this is insufficient 
 because tokenization is lossy (CASSANDRA-1034).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2774) one way to make counter delete work better

2011-06-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049912#comment-13049912
 ] 

Sylvain Lebresne commented on CASSANDRA-2774:
-

bq. but as I stated in my last comment, at least we can be sure that the new 
approach guarantees some common agreement eventually. 

It is already the case with the current implementation. Once compaction has 
compacted the deletes, all nodes will reach a common agreement.

bq. it would be nice if we achieve the agreement in case of quorum, but that's 
not my main argument

My main argument is that this patch slightly changes the behavior here and 
there, but I don't think it adds any tangible new guarantee that people can 
work with. On the other side, it adds a fairly heavy performance hit by adding 
a read before write on every replica (and though you won't necessarily do a 
read for every write, you will do that read more often than not as soon as the 
set of counters you're incrementing is not small enough).

 one way to make counter delete work better
 --

 Key: CASSANDRA-2774
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2774
 Project: Cassandra
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Yang Yang
 Attachments: counter_delete.diff


 current Counter does not work with delete, because different merging orders 
 of sstables would produce different results, for example:
 add 1
 delete
 add 2
 if the merging happens in the order 1->2, (1,2)->3, the result we see will be 2
 if merging is: 1->3, (1,3)->2, the result will be 3.
 the issue is that a delete can not separate out previous adds from adds 
 later than the delete. supposedly a delete is to create a completely new 
 incarnation of the counter, or a new lifetime, or epoch. the new approach 
 utilizes the concept of an epoch number, so that each delete bumps up the 
 epoch number. since each write is replicated (replicate on write is almost 
 always enabled in practice; if this is a concern, we could further force ROW 
 in case of delete), the epoch number is global to a replica set.
 changes are attached, existing tests pass fine, some tests are modified since 
 the semantics are changed a bit. some cql tests do not pass in the original 
 0.8.0 source; that's not the fault of this change.
 see details at 
 http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%3cbanlktikqcglsnwtt-9hvqpseoo7sf58...@mail.gmail.com%3E
 the goal of this is to make delete work ( at least with consistent behavior, 
 yes in case of long network partition, the behavior is not ideal, but it's 
 consistent with the definition of logical clock), so that we could have 
 expiring Counters

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2769) Cannot Create Duplicate Compaction Marker

2011-06-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049923#comment-13049923
 ] 

Sylvain Lebresne commented on CASSANDRA-2769:
-

Alright, I've committed the 0.8 patch. I'll have a look at the checks.

 Cannot Create Duplicate Compaction Marker
 -

 Key: CASSANDRA-2769
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2769
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Benjamin Coverston
Assignee: Sylvain Lebresne
 Fix For: 0.8.1, 1.0

 Attachments: 
 0001-0.8.0-Remove-useless-unmarkCompacting-in-doCleanup.patch, 
 0001-Do-compact-only-smallerSSTables.patch, 
 0002-Only-compact-what-has-been-succesfully-marked-as-com.patch


 Concurrent compaction can trigger the following exception when two threads 
 compact the same sstable. DataTracker attempts to prevent this but apparently 
 not successfully.
 java.io.IOError: java.io.IOException: Unable to create compaction marker
   at 
 org.apache.cassandra.io.sstable.SSTableReader.markCompacted(SSTableReader.java:638)
   at 
 org.apache.cassandra.db.DataTracker.removeOldSSTablesSize(DataTracker.java:321)
   at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:294)
   at 
 org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:255)
   at 
 org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:932)
   at 
 org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
   at 
 org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:119)
   at 
 org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:102)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:680)
 Caused by: java.io.IOException: Unable to create compaction marker
   at 
 org.apache.cassandra.io.sstable.SSTableReader.markCompacted(SSTableReader.java:634)
   ... 12 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2405) should expose 'time since last successful repair' for easier aes monitoring

2011-06-15 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049962#comment-13049962
 ] 

Sylvain Lebresne commented on CASSANDRA-2405:
-

Nothing against that, though if we're going to have only a handful of rows in 
each, it could be more efficient/cleaner to use the DynamicCompositeType 
instead of creating two different CFs. Though if you absolutely prefer 2 CFs, 
I won't fight against it.

 should expose 'time since last successful repair' for easier aes monitoring
 ---

 Key: CASSANDRA-2405
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Peter Schuller
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 0.8.1

 Attachments: CASSANDRA-2405-v2.patch, CASSANDRA-2405-v3.patch, 
 CASSANDRA-2405.patch


 The practical implementation issues of actually ensuring repair runs are 
 somewhat of an undocumented/untreated issue.
 One hopefully low-hanging fruit would be to at least expose the time since 
 the last successful repair for a particular column family, to make it easier 
 to write a correct script to monitor for lack of repair in a non-buggy 
 fashion.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2521) Move away from Phantom References for Compaction/Memtable

2011-06-16 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050287#comment-13050287
 ] 

Sylvain Lebresne commented on CASSANDRA-2521:
-

Actually I started working on this yesterday evening and I think I'm almost 
done. So re-assigning to myself for now :)

 Move away from Phantom References for Compaction/Memtable
 -

 Key: CASSANDRA-2521
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2521
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Sylvain Lebresne
 Fix For: 1.0


 http://wiki.apache.org/cassandra/MemtableSSTable
 Let's move to using reference counting instead of relying on GC to be called 
 in StorageService.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2521) Move away from Phantom References for Compaction/Memtable

2011-06-16 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2521:


Attachment: 0001-Use-reference-counting-to-decide-when-a-sstable-can-.patch

Attaching a patch against trunk. Tests are passing and it seems to work, at 
least with small tests. I started a stress test on a 3-node cluster, with a 
repair and a major compaction started towards the end; compacted files did 
wait until fully streamed before being removed, and I didn't hit any bump (I 
did hit CASSANDRA-2769 a bunch of times, but that's another story).

Still, this is a fairly tricky problem so it could use other eyes. The basics 
are fairly simple though: each time a thread wants to do something with an 
SSTableReader, it acquires a reference to that sstable and releases it when 
done. SSTableReader just keeps a counter of acquired references. Once the 
sstable has been marked compacted, we wait until all acquired references have 
been released. When that's the case, the file can be removed.
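
A minimal sketch of that reference-counting scheme (illustrative class and
method names, not the actual patch): the file is deleted only once it is both
marked compacted and down to zero acquired references.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of sstable reference counting (names are made up).
// Readers bracket their work with acquire()/release(); the last release
// after markCompacted() triggers the deletion.
class SSTableRefSketch {
    private final AtomicInteger references = new AtomicInteger(0);
    private volatile boolean compacted = false;
    private volatile boolean deleted = false;

    void acquire() { references.incrementAndGet(); }

    void release() {
        if (references.decrementAndGet() == 0 && compacted)
            delete();
    }

    void markCompacted() {
        compacted = true;
        if (references.get() == 0)
            delete();
    }

    private synchronized void delete() {
        if (!deleted) {
            deleted = true; // the real code would remove the on-disk files here
        }
    }

    boolean isDeleted() { return deleted; }
}
```

The failure modes are visible in the sketch: a consumer that never calls
acquire() can see the file vanish mid-read, and one that forgets release()
keeps the file around until the next restart.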

Obviously the main drawback of this approach (compared to the phantom-reference 
one) is that there is room for error. If a consumer forgets to acquire a 
reference (or does it in a non-thread-safe manner), the sstable can be removed 
while still in use. Thankfully there are not so many places in the code that 
need to do this, so hopefully I haven't missed any.

The other thing is that if a reference on an sstable is acquired, it should be 
released (otherwise the sstable will not be removed until the next restart). 
I've tried to ensure this using try-catch blocks, but it's not really possible 
with the way streaming works. However, if streaming fails, it's not really 
worse than before, since the files were not cleaned due to the (failed) session 
staying in the global map of streaming sessions. CASSANDRA-2433 should fix 
that in most cases anyway.

Last thing is http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4715154. In 
other words, the deletion of a file won't work until the mmapping is finalized 
(aka GC, where art thou), at least on Windows. For that reason, when the 
deletion of a file fails (after the usual number of retries, which btw may make 
less sense now), the deletion task is saved in a global list. If Cassandra is 
low on disk space, it will still trigger a GC, after which it will reschedule 
all failed files in the hope they can now be deleted. There is also a JMX call 
to retry this rescheduling.
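
That retry mechanism can be sketched as a global queue of failed deletions that
a later pass (e.g. after a forced GC, or triggered via JMX) drains; the names
below are illustrative, not the actual patch:

```java
import java.io.File;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of "save failed deletions and retry later" (illustrative names):
// deletions that fail (e.g. a still-mmapped file on Windows) are queued,
// and a retry pass drains the queue after the mapping has been finalized.
class FailedDeletionsSketch {
    private static final Queue<File> failed = new ConcurrentLinkedQueue<>();

    static void delete(File f) {
        if (f.exists() && !f.delete())
            failed.add(f); // couldn't delete now, remember it for later
    }

    // e.g. called after a GC when disk space runs low, or from a JMX call
    static void retryFailed() {
        int n = failed.size();
        for (int i = 0; i < n; i++) {
            File f = failed.poll();
            if (f != null)
                delete(f); // re-queues itself if the deletion fails again
        }
    }

    static int pendingCount() { return failed.size(); }
}
```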


 Move away from Phantom References for Compaction/Memtable
 -

 Key: CASSANDRA-2521
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2521
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Goffinet
Assignee: Sylvain Lebresne
 Fix For: 1.0

 Attachments: 
 0001-Use-reference-counting-to-decide-when-a-sstable-can-.patch


 http://wiki.apache.org/cassandra/MemtableSSTable
 Let's move to using reference counting instead of relying on GC to be called 
 in StorageService.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2782) Create a debian package for the CQL drivers

2011-06-16 Thread Sylvain Lebresne (JIRA)
Create a debian package for the CQL drivers
---

 Key: CASSANDRA-2782
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2782
 Project: Cassandra
  Issue Type: Wish
  Components: Packaging
Reporter: Sylvain Lebresne
Priority: Minor


Since the CQL drivers are not released in lockstep with Cassandra, they are 
excluded from the Cassandra debian package. Creating a debian package for them 
could make Debian users' lives a bit easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2783) Explore adding replay ID for counters

2011-06-16 Thread Sylvain Lebresne (JIRA)
Explore adding replay ID for counters
-

 Key: CASSANDRA-2783
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2783
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne


If a counter write returns a TimeoutException, the client cannot retry its 
write without risking an overcount.

One idea to fix this would be to allow the client to specify a replay ID, 
unique to the write, with each counter write. If the write times out, the 
client would resubmit the write with the same replay ID, and the system would 
ensure that the write is only replayed if the previous attempt was not 
persisted.

Of course, the last part of this (the system would ensure ...) is the hard 
part. Still worth exploring I believe.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2769) Cannot Create Duplicate Compaction Marker

2011-06-17 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2769:


Attachment: 
0002-Only-compact-what-has-been-succesfully-marked-as-com-v2.patch
0001-Do-compact-only-smallerSSTables-v2.patch

bq. For trunk patches, I'm not comfortable w/ 0001 reassigning the sstables 
field on general principles either. We could have the compaction proceed using 
smallerSSTables as a simpler alternative, but in general this organization 
feels like negative progress from the 0.8 
doCompaction/doCompactionWithoutSizeEstimation.

Attaching v2 that doesn't reassign the sstables field.

bq. I think Alan has a good point. I don't think it's an appropriate role of 
the data tracker to modify the set of sstables to be compacted in a task.

I do not disagree with that. However, I'd like us to fix trunk as a first 
priority. It's a pain to work on other issues (CASSANDRA-2521 for instance) 
while it is broken (and the goal must be to do our best to always have a 
working trunk). The attached patches don't really change any behavior, they 
just fix the bugs, so let's get that in first before thinking about 
refactoring.


 Cannot Create Duplicate Compaction Marker
 -

 Key: CASSANDRA-2769
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2769
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Benjamin Coverston
Assignee: Sylvain Lebresne
 Fix For: 0.8.2

 Attachments: 
 0001-0.8.0-Remove-useless-unmarkCompacting-in-doCleanup.patch, 
 0001-Do-compact-only-smallerSSTables-v2.patch, 
 0001-Do-compact-only-smallerSSTables.patch, 
 0002-Only-compact-what-has-been-succesfully-marked-as-com-v2.patch, 
 0002-Only-compact-what-has-been-succesfully-marked-as-com.patch


 Concurrent compaction can trigger the following exception when two threads 
 compact the same sstable. DataTracker attempts to prevent this but apparently 
 not successfully.
 java.io.IOError: java.io.IOException: Unable to create compaction marker
   at 
 org.apache.cassandra.io.sstable.SSTableReader.markCompacted(SSTableReader.java:638)
   at 
 org.apache.cassandra.db.DataTracker.removeOldSSTablesSize(DataTracker.java:321)
   at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:294)
   at 
 org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:255)
   at 
 org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:932)
   at 
 org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
   at 
 org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:119)
   at 
 org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:102)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:680)
 Caused by: java.io.IOException: Unable to create compaction marker
   at 
 org.apache.cassandra.io.sstable.SSTableReader.markCompacted(SSTableReader.java:634)
   ... 12 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2521) Move away from Phantom References for Compaction/Memtable

2011-06-17 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2521:


Attachment: 0002-Force-unmapping-files-before-deletion-v2.patch

0001-Use-reference-counting-to-decide-when-a-sstable-can-v2.patch

Attaching rebased first patch and a second patch to implement the Cleaner 
trick.

I have confirmed on an example that, at least on Linux, it does force the 
unmapping: the JVM crashes if you try to access the buffer after the unmapping.

This is the biggest drawback of this approach imho. If we screw up the 
reference counting and some thread does access the mapping, we won't get a 
nice exception; the JVM will simply crash (with the headache of having to 
figure out whether it is a bug on our side or a JVM bug). But for the quick 
testing I've done, it seems to work correctly.

 Move away from Phantom References for Compaction/Memtable
 -

 Key: CASSANDRA-2521
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2521
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Goffinet
Assignee: Sylvain Lebresne
 Fix For: 1.0

 Attachments: 
 0001-Use-reference-counting-to-decide-when-a-sstable-can-.patch, 
 0001-Use-reference-counting-to-decide-when-a-sstable-can-v2.patch, 
 0002-Force-unmapping-files-before-deletion-v2.patch


 http://wiki.apache.org/cassandra/MemtableSSTable
 Let's move to using reference counting instead of relying on GC to be called 
 in StorageService.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2788) Add startup option renew the NodeId (for counters)

2011-06-17 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2788:


Attachment: 0001-Option-to-renew-the-NodeId-on-startup.patch

 Add startup option renew the NodeId (for counters)
 --

 Key: CASSANDRA-2788
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2788
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: counters
 Fix For: 0.8.2

 Attachments: 0001-Option-to-renew-the-NodeId-on-startup.patch


 If an sstable of a counter column family is corrupted, the only safe solution 
 a user has right now is to:
 # Remove the NodeId system table to force the node to regenerate a new NodeId 
 (and thus stop incrementing on its previous, corrupted, subcount)
 # Remove all the sstables for that column family on that node (this is 
 important because otherwise the node will never get repaired for its 
 previous subcount)
 This is far from being ideal, but I think this is the price we pay for 
 avoiding the read-before-write. In any case, the first step (removing the 
 NodeId system table) happens to remove the list of old NodeIds this node 
 has, which could prevent us from merging the other potential previous 
 NodeIds. This is OK but sub-optimal. This ticket proposes to add a new 
 startup flag to make the node renew its NodeId, thus replacing this first 
 step.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2788) Add startup option renew the NodeId (for counters)

2011-06-17 Thread Sylvain Lebresne (JIRA)
Add startup option renew the NodeId (for counters)
--

 Key: CASSANDRA-2788
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2788
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.8.2
 Attachments: 0001-Option-to-renew-the-NodeId-on-startup.patch

If an sstable of a counter column family is corrupted, the only safe solution a 
user has right now is to:
# Remove the NodeId System table to force the node to regenerate a new NodeId 
(and thus stop incrementing its previous, corrupted, subcount)
# Remove all the sstables for that column family on that node (this is 
important because otherwise the node will never get repaired for its 
previous subcount)

This is far from ideal, but I think this is the price we pay for avoiding 
the read-before-write. In any case, the first step (removing the NodeId system 
table) happens to remove the list of the old NodeIds this node has, which could 
prevent us from merging the other potential previous NodeIds. This is ok but 
sub-optimal. This ticket proposes to add a new startup flag to make the node 
renew its NodeId, replacing that first step.





[jira] [Commented] (CASSANDRA-2786) After a minor compaction, deleted key-slices are visible again

2011-06-17 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051110#comment-13051110
 ] 

Sylvain Lebresne commented on CASSANDRA-2786:
-

The java version would be really cool :)

 After a minor compaction, deleted key-slices are visible again
 --

 Key: CASSANDRA-2786
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2786
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
 Environment: Single node with empty database
Reporter: rene kochen
 Attachments: CassandraIssue.zip


 After a minor compaction, deleted key-slices are visible again.
 Steps to reproduce:
 1) Insert a row named test.
 2) Insert 50 rows. During this step, test is included in a major 
 compaction.
 3) Delete row named test.
 4) Insert 50 rows. During this step, test is included in a minor 
 compaction.
 After step 4, row test is live again.
 Test environment:
 Single node with empty database.
 Standard configured super column family (I see this behavior with several 
 gc_grace settings, big and small values):
 create column family Customers with column_type = 'Super' and comparator = 
 'BytesType';
 In Cassandra 0.7.6 I observe the expected behavior, i.e. after step 4, the 
 row is still deleted.
 I've included a .NET program to reproduce the problem. I will add a Java 
 version later on.





[jira] [Commented] (CASSANDRA-2653) index scan errors out when zero columns are requested

2011-06-19 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051798#comment-13051798
 ] 

Sylvain Lebresne commented on CASSANDRA-2653:
-

This really primarily fixes the error from Jake's test cases. I'll have to 
admit that's the only thing I looked at. I did not realize the original problem 
was not necessarily related, so it is very possible (even likely) this does not 
fix the zero-columns-requested problem.

 index scan errors out when zero columns are requested
 -

 Key: CASSANDRA-2653
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2653
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Jonathan Ellis
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.8.1

 Attachments: 0001-Reset-SSTII-in-EchoedRow-constructor.patch, 
 v1-0001-CASSANDRA-2653-reproduce-regression.txt


 As reported by Tyler Hobbs as an addendum to CASSANDRA-2401,
 {noformat}
 ERROR 16:13:38,864 Fatal exception in thread Thread[ReadStage:16,5,main]
 java.lang.AssertionError: No data found for 
 SliceQueryFilter(start=java.nio.HeapByteBuffer[pos=10 lim=10 cap=30], 
 finish=java.nio.HeapByteBuffer[pos=17 lim=17 cap=30], reversed=false, 
 count=0] in DecoratedKey(81509516161424251288255223397843705139, 
 6b657931):QueryPath(columnFamilyName='cf', superColumnName='null', 
 columnName='null') (original filter 
 SliceQueryFilter(start=java.nio.HeapByteBuffer[pos=10 lim=10 cap=30], 
 finish=java.nio.HeapByteBuffer[pos=17 lim=17 cap=30], reversed=false, 
 count=0]) from expression 'cf.626972746864617465 EQ 1'
   at 
 org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1517)
   at 
 org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
   at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {noformat}





[jira] [Commented] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-20 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051836#comment-13051836
 ] 

Sylvain Lebresne commented on CASSANDRA-2735:
-

The goal here is not to have TTL for counters (or anything else). The goal is 
to have a compaction strategy that, as part of what it does, can throw away 
entire sstables when their content is considered old enough (and that's actually 
only part of the strategy, not necessarily its primary goal). As it turns out, 
this will roughly (and the rough part is important) amount to expiring data, 
including counters. But it will be a very heavy hammer; in particular it will 
only work if all the counters/data in the column family have the exact same 
expiration time. And it won't work at all for, say, counters that you would want 
to start re-incrementing after expiration. But again, that is not the goal of 
this ticket.
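The "throw away entire sstables" part of the idea can be sketched like this. The names are hypothetical and Cassandra's real compaction API differs; this only models the selection rule:

```java
// Illustrative sketch (not Cassandra's actual API): if the newest cell in an
// sstable is older than some expiration cutoff, the whole file can be
// deleted wholesale instead of being compacted row by row.
import java.util.ArrayList;
import java.util.List;

public class TimestampExpirationSketch {
    record SSTable(String name, long maxTimestampMicros) {}

    /** Returns the sstables whose newest cell is older than the cutoff. */
    static List<SSTable> expired(List<SSTable> tables, long cutoffMicros) {
        List<SSTable> out = new ArrayList<>();
        for (SSTable t : tables)
            if (t.maxTimestampMicros() < cutoffMicros) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        List<SSTable> tables = List.of(
            new SSTable("old-1", 100), new SSTable("new-1", 900));
        // With a cutoff of 500, only "old-1" is droppable wholesale.
        assert expired(tables, 500).size() == 1;
        assert expired(tables, 500).get(0).name().equals("old-1");
        System.out.println("ok");
    }
}
```

This also makes the "heavy hammer" caveat visible: the rule is per-sstable, so it only expires data correctly if everything in the file shares the same expiration horizon.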

 Timestamp Based Compaction Strategy
 ---

 Key: CASSANDRA-2735
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Alan Liang
Assignee: Alan Liang
Priority: Minor
  Labels: compaction
 Attachments: 0004-timestamp-bucketed-compaction-strategy.patch


 Compaction strategy implementation based on max timestamp ordering of the 
 sstables while satisfying max sstable size, min and max compaction 
 thresholds. It also handles expiration of sstables based on a timestamp.





[jira] [Commented] (CASSANDRA-2773) Index manager cannot support deleting and inserting into a row in the same mutation

2011-06-20 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051871#comment-13051871
 ] 

Sylvain Lebresne commented on CASSANDRA-2773:
-

Hmm, we cannot remove the column from cf in ignoreObsoleteMutations() because 
cf is the original column family from the row mutation, and that's racy with 
the commit log write (à la CASSANDRA-2604). We could clone the column family, 
but maybe it's simpler to add validation logic after all? In any case, it would 
be worth adding a comment in Table.apply() or 
Table.ignoreObsoleteMutations(). 

 Index manager cannot support deleting and inserting into a row in the same 
 mutation
 -

 Key: CASSANDRA-2773
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2773
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Boris Yen
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2773.txt


 I use Hector 0.8.0-1 and Cassandra 0.8.
 1. Create a mutator using the Hector API. 
 2. Insert a few columns into the mutator for key key1, cf standard. 
 3. Add a deletion to the mutator to delete the record of key1, cf 
 standard.
 4. Repeat 2 and 3.
 5. Execute the mutator.
 The result: the connection seems to be held by the server forever; it never 
 returns. When I tried to restart Cassandra I saw unsupportedexception: 
 Index manager cannot support deleting and inserting into a row in the same 
 mutation, and Cassandra is dead forever unless I delete the commitlog. 
 I would expect to get an exception when I execute the mutator, not after I 
 restart Cassandra.





[jira] [Commented] (CASSANDRA-2795) Autodelete empty rows

2011-06-20 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051909#comment-13051909
 ] 

Sylvain Lebresne commented on CASSANDRA-2795:
-

bq. I tested setting gc_grace very low (tried 0 and 1) in a single node, and 
the row didn't disappear.

Ok, to be precise, you need a compaction occurring after gc_grace has 
passed. So you'll need to flush after the insertion, wait for the column to 
expire, force a compaction, wait for it to finish, and then issue the read.
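The purge rule behind this boils down to a single comparison. A minimal sketch, not Cassandra's actual compaction code:

```java
// Minimal model of the rule described above: during compaction, an
// expired/deleted column may only be purged once gc_grace seconds have
// passed since its local deletion time. Until then it must survive
// compaction so it can still shadow stale replicas during repair.
public class GcGraceSketch {
    /** true if a tombstone written at deletionTimeSec may be purged now. */
    static boolean purgeable(long deletionTimeSec, int gcGraceSec, long nowSec) {
        return deletionTimeSec + gcGraceSec <= nowSec;
    }

    public static void main(String[] args) {
        long deleted = 1_000_000L;   // column deleted at t = 1,000,000 s
        int gcGrace = 864_000;       // default: 10 days
        assert !purgeable(deleted, gcGrace, deleted + 1);       // too early
        assert purgeable(deleted, gcGrace, deleted + gcGrace);  // grace elapsed
        System.out.println("ok");
    }
}
```

With gc_grace = 0 the comparison is immediately true, but a compaction still has to run after the column expires for the purge to actually happen, which is the step the reporter was missing.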

 Autodelete empty rows
 -

 Key: CASSANDRA-2795
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2795
 Project: Cassandra
  Issue Type: Improvement
  Components: Core, Tools
Affects Versions: 0.8.0
Reporter: Pau Rodriguez

 In a system where every column expires using TTL, the rows persist, and they 
 are empty.
 It would be nice to also delete them when the last column has expired.
 I understand that this may be difficult to synchronize across the whole 
 cluster.
 If this behavior isn't good for all cases, maybe it can be configured by a 
 per-column-family setting.
 Alternatively, there could be a tool to remove empty rows across the cluster; 
 the problem with doing that via the API is the time between when the check is 
 done and when the remove is sent.
 I think it is preferable for this to happen when the last column has expired.
 Thanks in advance.





[jira] [Resolved] (CASSANDRA-2795) Autodelete empty rows

2011-06-20 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-2795.
-

Resolution: Not A Problem

What you are seeing is range ghosts: 
http://wiki.apache.org/cassandra/FAQ#range_ghosts

The row *is* correctly deleted when all columns expire. It will no longer show 
up as a range ghost once gc_grace seconds have passed.

 Autodelete empty rows
 -

 Key: CASSANDRA-2795
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2795
 Project: Cassandra
  Issue Type: Improvement
  Components: Core, Tools
Affects Versions: 0.8.0
Reporter: Pau Rodriguez

 In a system where every column expires using TTL, the rows persist, and they 
 are empty.
 It would be nice to also delete them when the last column has expired.
 I understand that this may be difficult to synchronize across the whole 
 cluster.
 If this behavior isn't good for all cases, maybe it can be configured by a 
 per-column-family setting.
 Alternatively, there could be a tool to remove empty rows across the cluster; 
 the problem with doing that via the API is the time between when the check is 
 done and when the remove is sent.
 I think it is preferable for this to happen when the last column has expired.
 Thanks in advance.





[jira] [Issue Comment Edited] (CASSANDRA-2793) SSTable Corrupt (negative) value length encountered exception blocks compaction.

2011-06-20 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051915#comment-13051915
 ] 

Sylvain Lebresne edited comment on CASSANDRA-2793 at 6/20/11 10:46 AM:
---

bq. Hi the issue reported was that the sstable corruption is blocking 
compaction with the consequence the bucket of sstables Cassandra wants to 
compact just grows and you get huge cpu load (from repeated attempts at 
compaction and increasing read inefficiency).

This is a dupe of CASSANDRA-2261.

bq. the trace also shows that it has just skipped the corrupted row so in fact 
it hasn't solved the problem at all.

In most cases of corruption, there is not much more we can do than skip the 
row. As long as the corruption is local and you don't use RF=1, this is 
usually not a big deal (which does not mean corruption is something we should 
be happy with).

bq. The corruption itself is also an issue

Corruption can take two forms: either we have a bug, or the corruption is 
external (a bad hard drive, for instance). Hard drive corruption does happen, 
and there is not much we can do about it (well, actually we should use 
checksums to at least better detect it: CASSANDRA-1717). On the bug front, 
since I see this happens on a super column family, it could be due to a race 
fixed by CASSANDRA-2675.
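The checksumming idea referenced above (CASSANDRA-1717) amounts to storing a CRC alongside each block of data when it is written and verifying it on read, so disk corruption is detected up front instead of surfacing later as, say, a negative value length. A minimal sketch, not Cassandra's implementation:

```java
// Minimal sketch of per-block checksumming: record a CRC32 when a block is
// written, recompute and compare when it is read back.
import java.util.zip.CRC32;

public class ChecksumSketch {
    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    /** Verify a data block against the checksum recorded at write time. */
    static boolean verify(byte[] data, long storedCrc) {
        return crc(data) == storedCrc;
    }

    public static void main(String[] args) {
        byte[] block = "row data".getBytes();
        long stored = crc(block);          // computed at write time
        assert verify(block, stored);      // clean read: checksum matches
        block[0] ^= 0x1;                   // simulate a flipped bit on disk
        assert !verify(block, stored);     // corruption detected on read
        System.out.println("ok");
    }
}
```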



 SSTable Corrupt (negative) value length encountered exception blocks 
 compaction.
 --

 Key: CASSANDRA-2793
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2793
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
 Environment: Ubuntu
Reporter: Dominic Williams

 A node was consistently experiencing high CPU load. Examination of the logs 
 showed that compaction of an sstable was failing with an error:
  INFO [CompactionExecutor:1] 2011-06-17 00:18:51,676 CompactionManager.java 
 (line 395) Compacting 
 [SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6993-Data.db'),SSTableReader(
 path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6994-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6995-Data.db'),SSTableReader(path='/var/opt/cassandra
 /data/FightMyMonster/UserMonsters-f-6996-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6998-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/Use
 rMonsters-f-7000-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7002-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7004-Data.db
 '),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7006-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7008-Data.db'),SSTableReader(path='/
 var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7012-Data.db'),SSTableReader(path='/var/opt/cassandra/data/F
 ightMyMonster/UserMonsters-f-7014-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7016-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonste
 rs-f-7018-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7020-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7022-Data.db'),SSTa
 
