[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-07-04 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059334#comment-13059334
 ] 

Mck SembWever commented on CASSANDRA-2388:
--

{quote}2) If we ARE in that situation, the right solution would be to send 
the job to a TT whose local replica IS live, not to read the data from a 
nonlocal replica. How can we signal that?{quote}To /really/ solve this issue, 
can we do the following? 
In CFIF.getRangeMap(), take out of each range any endpoints that are not alive. 
A client connection already exists in this method. Filtering out dead 
endpoints wouldn't be difficult, and it would move tasks *to* the data, making 
use of replicas. This approach does need a new method in cassandra.thrift, e.g. 
{{list<string> describe_alive_nodes()}}
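For illustration only, a sketch of that filtering step: given the endpoints for each range and the set of live nodes (as the proposed, currently hypothetical describe_alive_nodes() would report), drop the dead ones. The types here are simplified stand-ins for what CFIF.getRangeMap() actually handles:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LiveEndpointFilter
{
    /**
     * Keep, for each range, only the endpoints the cluster reports as alive.
     * aliveNodes stands in for the result of the proposed describe_alive_nodes().
     */
    public static List<List<String>> filterDead(List<List<String>> rangeEndpoints, Set<String> aliveNodes)
    {
        List<List<String>> filtered = new ArrayList<List<String>>();
        for (List<String> endpoints : rangeEndpoints)
        {
            List<String> live = new ArrayList<String>();
            for (String endpoint : endpoints)
                if (aliveNodes.contains(endpoint))
                    live.add(endpoint);
            filtered.add(live);
        }
        return filtered;
    }
}
```

A range left with an empty endpoint list would then signal that no replica for that split is reachable at all.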

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2527) Add ability to snapshot data as input to hadoop jobs

2011-07-04 Thread Wojciech Meler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059356#comment-13059356
 ] 

Wojciech Meler commented on CASSANDRA-2527:
---

It would be great to have more generic client access to snapshot data. Maybe 
snapshots should be visible as new keyspaces? Or maybe we should throw away 
snapshots and start cloning keyspaces? If a cloned keyspace could be read-only, 
it would work out of the box :).

 Add ability to snapshot data as input to hadoop jobs
 

 Key: CASSANDRA-2527
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2527
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna
  Labels: hadoop

 It is desirable to have immutable inputs to hadoop jobs for the duration of 
 the job.  That way re-execution of individual tasks do not alter the output.  
 One way to accomplish this would be to snapshot the data that is used as 
 input to a job.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently

2011-07-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059362#comment-13059362
 ] 

Sylvain Lebresne commented on CASSANDRA-2851:
-

Why would it be OK for single-character inputs and not other odd-sized inputs? 
An odd-sized input doesn't (ever) correspond to a valid byte array, so I'd say 
either we always silently add a 0 to make it fit or we never do it. I am 
actually in favor of throwing an exception rather than coping with it 
silently, since it's more likely to indicate a user error than to be helpful 
(but maybe that addition of a '0' in front was there for a reason?).
I'll note that even though I can't imagine why people would generate odd-sized 
hex input, since it has been allowed so far there is a chance someone out there 
does it, and it would be a regression for them. So maybe we should target 1.0 
for the sake of making the minor upgrade as smooth as possible for everybody.

On the patch side, we must make sure every consumer of hexToBytes() handles the 
new exception (or we make it a NumberFormatException, but I don't think that is 
a good idea). For instance, at least BytesType.fromString() should catch the 
IllegalArgumentException and rethrow a MarshalException; otherwise CQL will 
fail messily on odd-sized inputs.
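As a sketch of the behaviour argued for above (hypothetical code, not the attached patch): a strict hexToBytes() that throws IllegalArgumentException on odd-length or non-hex input, which a caller like BytesType.fromString() could then catch and rethrow as a MarshalException:

```java
public class StrictHex
{
    /**
     * Strict hex decoding: rejects odd-length and non-hex input instead of
     * silently padding. Illustrative only; not the attached cassandra-2851.diff.
     */
    public static byte[] hexToBytes(String str)
    {
        if (str.length() % 2 == 1)
            throw new IllegalArgumentException("A hex string representing bytes must have an even length: " + str);
        byte[] bytes = new byte[str.length() / 2];
        for (int i = 0; i < bytes.length; i++)
        {
            // Character.digit returns -1 for anything outside [0-9a-fA-F]
            int hi = Character.digit(str.charAt(2 * i), 16);
            int lo = Character.digit(str.charAt(2 * i + 1), 16);
            if (hi == -1 || lo == -1)
                throw new IllegalArgumentException("Non-hex character in: " + str);
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return bytes;
    }
}
```

The catch-and-rethrow in BytesType.fromString() would then just wrap the IllegalArgumentException so CQL surfaces a MarshalException instead.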

 hex-to-bytes conversion accepts invalid inputs silently
 ---

 Key: CASSANDRA-2851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: cassandra-2851.diff


 FBUtilities.hexToBytes() has a minor bug - it copes with single-character 
 inputs by prepending "0", which is OK - but it does this for any input with 
 an odd number of characters, which is probably incorrect.
 {noformat}
 if (str.length() % 2 == 1)
     str = "0" + str;
 {noformat}
 Given 'fff' as an input, can we really assume that this should be '0fff'? 
 Isn't this just an error?
 Add the following to FBUtilitiesTest to demonstrate:
 {noformat}
 String[] badvalues = new String[]{"000", "fff"};

 for (int i = 0; i < badvalues.length; i++)
     try
     {
         FBUtilities.hexToBytes(badvalues[i]);
         fail("Invalid hex value accepted " + badvalues[i]);
     }
     catch (Exception e) {}
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2846) Changing replication_factor using update keyspace not working

2011-07-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059373#comment-13059373
 ] 

Jonas Borgström commented on CASSANDRA-2846:


Jonathan, thanks for your fast response. Your patch works for me.

 Changing replication_factor using update keyspace not working
 ---

 Key: CASSANDRA-2846
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.1
 Environment: A clean 0.8.1 install using the default configuration
Reporter: Jonas Borgström
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2846.txt


 Unless I've misunderstood the new way to do this with 0.8 I think update 
 keyspace is broken:
 {code}
 [default@unknown] create keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:1}];
 37f70d40-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test;
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 [default@unknown] update keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:2}];
 489fe220-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test; 
   
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 {code}
 Isn't the second describe keyspace supposed to say 
 "replication_factor:2"?
 Relevant bits from system.log:
 {code}
 Migration.java (line 116) Applying migration 
 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Test<rep 
 strategy:SimpleStrategy{}durable_writes: true> to Test<rep 
 strategy:SimpleStrategy{}durable_writes: true>
 UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual 
 operations
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2852) Cassandra CLI - Import Keyspace Definitions from File - Comments do partially interpret characters/commands

2011-07-04 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-2852:
---

Attachment: CASSANDRA-2852.patch

Can be applied on both the 0.7 and 0.8 branches.

 Cassandra CLI - Import Keyspace Definitions from File - Comments do 
 partially interpret characters/commands
 -

 Key: CASSANDRA-2852
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2852
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 0.7.0
 Environment: Win Vista 
Reporter: jens mueller
Assignee: Pavel Yaskevich
Priority: Trivial
 Fix For: 0.7.7, 0.8.2

 Attachments: CASSANDRA-2852.patch


 Hello, 
 using: bin/cassandra-cli -host localhost --file conf/schema-sample.txt
 with schema-sample.txt having contents like this:
 /* here are a lot of comments,
 like this sample create keyspace;
 and so on
 */
 will result in an error: 
 Line 1 = Syntax Error at Position 323: mismatched character 'EOF' 
 expecting '*'
 The cause is the "keyspace;" statement - the semicolon causes the error.
 However:
 Writing the word "keyspace;" with quotes does NOT lead to the error.
 So this works: 
 /* here are a lot of comments,
 like this sample create "keyspace";
 and so on
 */
 From my point of view this is an error. Everything between the start comment 
 /* and the end comment */ should be treated as a comment and not be 
 interpreted in any way. That's the definition of a comment: it is not 
 interpreted at all. 
 Otherwise this must be documented somewhere very prominently; as it is, this 
 behaviour leads to unnecessary time wasted searching for the cause, and it 
 makes commenting out statements much more cumbersome.
 Platform: Windows Vista
 thanks

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-04 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059401#comment-13059401
 ] 

Mck SembWever commented on CASSANDRA-1125:
--

bq. using KeyRange but with tokens (which Thrift also uses for start-exclusive)
This is my preference. I'll make a patch for it.

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[Cassandra Wiki] Update of ClientOptions by PriitKallas

2011-07-04 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ClientOptions page has been changed by PriitKallas:
http://wiki.apache.org/cassandra/ClientOptions?action=diff&rev1=131&rev2=132

Comment:
Added link to the new high-level PHP Cassandra Client Library

   * Ruby:
* Cassandra: http://github.com/fauna/cassandra
   * PHP:
+   * Cassandra PHP Client Library: 
https://github.com/kallaspriit/Cassandra-PHP-Client-Library
* phpcassa: http://github.com/thobbs/phpcassa
  
  == Older clients ==


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2850:


Attachment: 2850-v2.patch

Attaching a so-called v2 version that avoids the string object creation for
each byte by encoding each char separately. This version shows a 30% speedup
on the 10MB array conversion (and a ~15% speedup on the 1K array conversion)
compared to the version in the previous patch. It will also generate less
garbage.

I've also broadened the scope of this ticket because hexToBytes also needs
some love (actually even more so), and the v2 patch ships with an improved
version of hexToBytes. As it turns out, hexToBytes was really naive and was
calling substring() on every 2 characters, generating a lot of String objects.
On a micro-benchmark converting strings of 1000 characters, the attached
version shows a ~13x (!) speedup. It also generates much less garbage.

To add to what David said, let's note that those methods used not to matter
too much (they were used in non-performance-sensitive places, like debug/error
messages, or SSTable2json (though performance in those tools doesn't hurt)),
but they are now used by CQL for BytesType.
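For illustration, here is the kind of encoding the v2 description refers to (a sketch, not the attached 2850-v2.patch): each nibble is written straight into a pre-sized char[], so no StringBuilder resizing happens and no intermediate String is created per byte:

```java
public class FastBytesToHex
{
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /** Encode bytes to lowercase hex with one allocation for the output buffer. */
    public static String bytesToHex(byte[] bytes)
    {
        char[] out = new char[bytes.length * 2]; // pre-sized: no resizes, no appends
        for (int i = 0; i < bytes.length; i++)
        {
            out[2 * i] = DIGITS[(bytes[i] >> 4) & 0xf]; // high nibble
            out[2 * i + 1] = DIGITS[bytes[i] & 0xf];    // low nibble
        }
        return new String(out);
    }
}
```

The symmetric decode would similarly use Character.digit on char pairs rather than substring(), which is where the quoted ~13x win comes from.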

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently

2011-07-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059413#comment-13059413
 ] 

Jonathan Ellis commented on CASSANDRA-2851:
---

bq. maybe that addition of a '0' in front was there for a reason

I think it's there b/c of Integer.toHexString: "This value is converted to a 
string of ASCII digits in hexadecimal (base 16) with no extra leading 0s."

Our bytesToHex does pad... but only for single-digit results.  So if we fix 
hexToBytes we'll introduce an incompatibility. (Granted, a minor one.)
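The Integer.toHexString behaviour behind this is easy to demonstrate (plain JDK, nothing Cassandra-specific): it never pads, so a single-digit value comes back as one character, which is why an encoder built on it produces odd-length strings unless it pads each byte itself.

```java
public class HexPadding
{
    public static void main(String[] args)
    {
        System.out.println(Integer.toHexString(0x0f)); // prints "f" - no leading zero
        System.out.println(Integer.toHexString(0xff)); // prints "ff"
        // So {0x0f, 0xff} encoded byte-by-byte without per-byte padding
        // would yield "fff" - exactly the odd-length input under discussion.
    }
}
```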

 hex-to-bytes conversion accepts invalid inputs silently
 ---

 Key: CASSANDRA-2851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: cassandra-2851.diff


 FBUtilities.hexToBytes() has a minor bug - it copes with single-character 
 inputs by prepending "0", which is OK - but it does this for any input with 
 an odd number of characters, which is probably incorrect.
 {noformat}
 if (str.length() % 2 == 1)
     str = "0" + str;
 {noformat}
 Given 'fff' as an input, can we really assume that this should be '0fff'? 
 Isn't this just an error?
 Add the following to FBUtilitiesTest to demonstrate:
 {noformat}
 String[] badvalues = new String[]{"000", "fff"};

 for (int i = 0; i < badvalues.length; i++)
     try
     {
         FBUtilities.hexToBytes(badvalues[i]);
         fail("Invalid hex value accepted " + badvalues[i]);
     }
     catch (Exception e) {}
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[Cassandra Wiki] Update of ClientOptions06 by PriitKallas

2011-07-04 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ClientOptions06 page has been changed by PriitKallas:
http://wiki.apache.org/cassandra/ClientOptions06?action=diff&rev1=5&rev2=6

* Jassandra: http://code.google.com/p/jassandra/
* Kundera: http://code.google.com/p/kundera/
   * PHP :
+   * PHP Cassandra Client Library: 
http://github.com/kallaspriit/Cassandra-PHP-Client-Library
* Pandra: http://github.com/mjpearson/Pandra/tree/master
* PHP Cassa: http://github.com/hoan/phpcassa [port of pycassa to PHP]
   * Clojure :


svn commit: r1142647 - in /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra: db/ hadoop/ io/sstable/ streaming/

2011-07-04 Thread jbellis
Author: jbellis
Date: Mon Jul  4 13:02:05 2011
New Revision: 1142647

URL: http://svn.apache.org/viewvc?rev=1142647&view=rev
Log:
revert incomplete changes

Modified:

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ConfigHelper.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/io/sstable/IndexHelper.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/io/sstable/SSTableIdentityIterator.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/PendingFile.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/StreamInSession.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/StreamOut.java

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java?rev=1142647&r1=1142646&r2=1142647&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java
 Mon Jul  4 13:02:05 2011
@@ -130,12 +130,6 @@ public class ColumnFamilySerializer impl
 public void deserializeColumns(DataInput dis, ColumnFamily cf, boolean 
intern, boolean fromRemote) throws IOException
 {
 int size = dis.readInt();
-deserializeColumns(dis, cf, size, intern, fromRemote);
-}
-
-/* column count is already read from DataInput */
-public void deserializeColumns(DataInput dis, ColumnFamily cf, int size, 
boolean intern, boolean fromRemote) throws IOException
-{
 ColumnFamilyStore interner = intern ? 
Table.open(CFMetaData.getCF(cf.id()).left).getColumnFamilyStore(cf.id()) : null;
 for (int i = 0; i < size; ++i)
 {

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java?rev=1142647&r1=1142646&r2=1142647&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
 Mon Jul  4 13:02:05 2011
@@ -35,9 +35,10 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.db.IColumn;
-import org.apache.cassandra.dht.IPartitioner;
-import org.apache.cassandra.dht.Range;
-import org.apache.cassandra.thrift.*;
+import org.apache.cassandra.thrift.Cassandra;
+import org.apache.cassandra.thrift.InvalidRequestException;
+import org.apache.cassandra.thrift.TokenRange;
+import org.apache.cassandra.thrift.TBinaryProtocol;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.mapreduce.*;
 import org.apache.thrift.TException;
@@ -100,43 +101,11 @@ public class ColumnFamilyInputFormat ext
 
 try
 {
-KeyRange jobKeyRange = ConfigHelper.getInputKeyRange(conf);
-IPartitioner partitioner = null;
-Range jobRange = null;
-if (jobKeyRange != null)
-{
-partitioner = 
ConfigHelper.getPartitioner(context.getConfiguration());
-assert partitioner.preservesOrder() : 
"ConfigHelper.setInputKeyRange(..) can only be used with a order preserving 
paritioner";
-jobRange = new 
Range(partitioner.getToken(jobKeyRange.start_key),
- partitioner.getToken(jobKeyRange.end_key),
- partitioner);
-}
-
 List<Future<List<InputSplit>>> splitfutures = new 
ArrayList<Future<List<InputSplit>>>();
 for (TokenRange range : masterRangeNodes)
 {
-if (jobRange == null)
-{
 // for each range, pick a live owner and ask it to compute 
bite-sized splits
 splitfutures.add(executor.submit(new SplitCallable(range, 
conf)));
-}
-else
-{
-Range dhtRange = new 
Range(partitioner.getTokenFactory().fromString(range.start_token),
-   
partitioner.getTokenFactory().fromString(range.end_token),
-

[jira] [Updated] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-07-04 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-2388:
-

Attachment: CASSANDRA-2388-extended.patch

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059445#comment-13059445
 ] 

Terje Marthinussen commented on CASSANDRA-2816:
---

Things definitely seem to be improved overall, but weird things still happen.

So... 12 node cluster, this is maybe ugly, I know, but start repair on all of 
them.
Most nodes are fine, but one goes crazy. Disk use is now 3-4 times what it was 
before the repair started, and it does not seem to be done yet.

I really have no idea if this is the case, but I am getting the hunch that this 
node has ended up streaming out some of the data it is getting in. Would this 
be possible?


 Repair doesn't synchronize merkle tree creation properly
 

 Key: CASSANDRA-2816
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2816
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.2

 Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch


 Being a little slow, I just realized after having opened CASSANDRA-2811 and 
 CASSANDRA-2815 that there is a more general problem with repair.
 When a repair is started, it will send a number of merkle tree requests to 
 its neighbors as well as to itself, and it assumes for correctness that the 
 building of those trees will be started on every node at roughly the same 
 time (if not, we end up comparing data snapshots taken at different times and 
 will thus mistakenly repair a lot of useless data). This is bogus for many 
 reasons:
 * Because validation compaction runs on the same executor as other 
 compactions, the start of the validation on the different nodes is subject to 
 other compactions. 0.8 mitigates this in a way by being multi-threaded (and 
 thus there is less chance of being blocked a long time by a long-running 
 compaction), but the compaction executor being bounded, it's still a problem.
 * If you run a nodetool repair without arguments, it will repair every CF. 
 As a consequence it will generate lots of merkle tree requests, and all of 
 those requests will be issued at the same time. Because even in 0.8 the 
 compaction executor is bounded, some of those validations will end up being 
 queued behind the first ones. Even assuming that the different validations 
 are submitted in the same order on each node (which isn't guaranteed either), 
 there is no guarantee that on all nodes the first validation will take the 
 same time, hence desynchronizing the queued ones.
 Overall, it is important for the precision of repair that, for a given CF and 
 range (which is the unit at which trees are computed), we make sure that all 
 nodes will start the validation at the same time (or, since we can't do 
 magic, as close as possible).
 One (reasonably simple) proposition to fix this would be to have repair 
 schedule validation compactions across nodes one by one (i.e., one CF/range 
 at a time), waiting for all nodes to return their tree before submitting the 
 next request. Then on each node, we should make sure that the node will start 
 the validation compaction as soon as requested. For that, we probably want to 
 have a specific executor for validation compaction, and then either:
 * we fail the whole repair whenever one node is not able to execute the 
 validation compaction right away (because no thread is available right away), 
 or
 * we simply tell the user that if they start too many repairs in parallel, 
 they may start seeing some of those repairing more data than they should.
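The proposed scheduling can be sketched roughly as follows (an illustration only, not the attached patch; strings stand in for real merkle trees and node handles, and TreeRequester for the actual validation request):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SequentialTreeRequests
{
    /** Hypothetical stand-in for sending a validation request to one node. */
    public interface TreeRequester { String requestTree(String node, String cfRange); }

    /** Handle CF/ranges strictly one at a time; within one, ask all nodes in parallel. */
    public static List<String> requestAll(List<String> cfRanges, List<String> nodes, final TreeRequester requester)
    {
        ExecutorService executor = Executors.newFixedThreadPool(nodes.size());
        List<String> trees = new ArrayList<String>();
        try
        {
            for (final String cfRange : cfRanges)
            {
                List<Future<String>> futures = new ArrayList<Future<String>>();
                for (final String node : nodes)
                    futures.add(executor.submit(new Callable<String>()
                    {
                        public String call() { return requester.requestTree(node, cfRange); }
                    }));
                // block until every node has returned its tree for this CF/range
                // before submitting the requests for the next one
                for (Future<String> future : futures)
                {
                    try { trees.add(future.get()); }
                    catch (Exception e) { throw new RuntimeException(e); }
                }
            }
        }
        finally
        {
            executor.shutdown();
        }
        return trees;
    }
}
```

The point of the structure is only the barrier between CF/ranges: no node ever holds validation work for range N+1 while some node is still computing its tree for range N.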

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-04 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-1125:
-

Attachment: CASSANDRA-1125.patch

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch, 
 CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059459#comment-13059459
 ] 

Sylvain Lebresne commented on CASSANDRA-2816:
-

bq. So... 12 node cluster, this is maybe ugly, I know, but start repair on all 
of them.

Is it started on all of them? If so, this is kind of expected, in the sense 
that the patch assumes that each node does not do more than 2 repairs (for any 
column family) at the same time (this is configurable through the new 
concurrent_validators option, but it's probably better to stick to 2 and 
stagger the repairs). If you do more than that (that is, if you did repair on 
all nodes at the same time with RF > 2), then we're back to our old demons.

bq. I have really no idea if this is the case, but I am getting the hunch that 
this node has ended up streaming out some of the data it is getting in. Would 
this be possible?

Not really. That is, it could be that you create a merkle tree on some data 
and, once you start streaming, you're picking up data that was just streamed 
to you and wasn't there when computing the tree. This patch is supposed to fix 
this in part, but it can still happen if you do repairs in parallel on 
neighboring nodes. However, you shouldn't get into a situation where 2 nodes 
stream forever because they pick up what was just streamed to them, because 
what is streamed is determined at the very beginning of the streaming session.

So my first question would be: were all those repairs started in parallel? If 
yes, you shall not do this :). CASSANDRA-2606 and CASSANDRA-2610 are here to 
help make the repair of a full cluster much easier (and more efficient), but 
right now it's more about getting patches in one at a time.
If the repairs were started one at a time in a rolling fashion, then we do 
have an unknown problem somewhere.

 Repair doesn't synchronize merkle tree creation properly
 

 Key: CASSANDRA-2816
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2816
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.2

 Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch


 Being a little slow, I just realized after having opened CASSANDRA-2811 and 
 CASSANDRA-2815 that there is a more general problem with repair.
 When a repair is started, it sends a number of merkle tree requests to its 
 neighbors as well as to itself, and assumes, for correctness, that the 
 building of those trees starts on every node at roughly the same time (if 
 not, we end up comparing data snapshotted at different times and will thus 
 mistakenly repair a lot of data needlessly). This is bogus for several reasons:
 * Because validation compaction runs on the same executor as other 
 compactions, the start of the validation on the different nodes is subject to 
 other compactions. 0.8 mitigates this somewhat by being multi-threaded (so 
 there is less chance of being blocked for a long time by a long-running 
 compaction), but since the compaction executor is bounded, it is still a 
 problem.
 * If you run nodetool repair without arguments, it repairs every CF. As a 
 consequence it generates lots of merkle tree requests, and all of those 
 requests are issued at the same time. Because the compaction executor is 
 bounded even in 0.8, some of those validations end up queued behind the first 
 ones. Even assuming the different validations are submitted in the same order 
 on each node (which isn't guaranteed either), there is no guarantee that the 
 first validation will take the same time on every node, hence desynchronizing 
 the queued ones.
 Overall, it is important for the precision of repair that, for a given CF and 
 range (the unit at which trees are computed), all nodes start the validation 
 at the same time (or, since we can't do magic, as close to it as possible).
 One (reasonably simple) proposal to fix this would be to have repair schedule 
 validation compactions across nodes one by one (i.e., one CF/range at a 
 time), waiting for all nodes to return their tree before submitting the next 
 request. Then, on each node, we should make sure the validation compaction 
 starts as soon as it is requested. For that, we probably want a dedicated 
 executor for validation compaction, and then:
 * either we fail the whole repair whenever one node is not able to execute 
 the validation compaction right away (because no threads are available),
 * or we simply tell the user that if they start too many repairs in 
 parallel, they may see some of them repairing more data than they should.
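The one-by-one scheduling idea above can be sketched roughly as follows. This is a hypothetical illustration, not Cassandra's actual AntiEntropyService API: `requestTree`, the unit strings, and the endpoint list are all stand-ins. The point is simply that the coordinator blocks on every node's tree for one CF/range before requesting the next.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch: request merkle trees for one CF/range unit at a time,
// waiting for every endpoint to answer before moving on to the next unit.
public class SequentialTreeRequests {
    // Stand-in for sending a tree request to one endpoint and awaiting its tree.
    static CompletableFuture<String> requestTree(ExecutorService pool, String unit, String endpoint) {
        return CompletableFuture.supplyAsync(() -> unit + "@" + endpoint, pool);
    }

    // Returns the units in the order their validations completed.
    public static List<String> repair(List<String> units, List<String> endpoints) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(endpoints.size());
        List<String> done = new ArrayList<>();
        try {
            for (String unit : units) {
                List<CompletableFuture<String>> trees = new ArrayList<>();
                for (String ep : endpoints)
                    trees.add(requestTree(pool, unit, ep));
                // Block until all replicas have returned their tree for this
                // unit before issuing requests for the next one.
                CompletableFuture.allOf(trees.toArray(new CompletableFuture[0])).get();
                done.add(unit);
            }
        } finally {
            pool.shutdown();
        }
        return done;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(repair(List.of("ks1/cf1:(0,100]", "ks1/cf2:(0,100]"),
                                  List.of("10.0.0.1", "10.0.0.2")));
    }
}
```

A real implementation would also need the per-node dedicated validation executor described above, so that a request received by a replica is not queued behind ordinary compactions.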

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

svn commit: r1142690 - in /cassandra/trunk: ./ src/java/org/apache/cassandra/db/ src/java/org/apache/cassandra/db/filter/ test/unit/org/apache/cassandra/db/ test/unit/org/apache/cassandra/db/compactio

2011-07-04 Thread slebresne
Author: slebresne
Date: Mon Jul  4 14:36:11 2011
New Revision: 1142690

URL: http://svn.apache.org/viewvc?rev=1142690&view=rev
Log:
Reset CF and SC deletion time after gc_grace
patch by slebresne; reviewed by jbellis for CASSANDRA-2317

Added:

cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
Modified:
cassandra/trunk/CHANGES.txt
cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamily.java
cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java
cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
cassandra/trunk/src/java/org/apache/cassandra/db/IColumnContainer.java
cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java
cassandra/trunk/src/java/org/apache/cassandra/db/SuperColumn.java
cassandra/trunk/src/java/org/apache/cassandra/db/filter/QueryFilter.java
cassandra/trunk/test/unit/org/apache/cassandra/db/RowTest.java

cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java
cassandra/trunk/test/unit/org/apache/cassandra/service/RowResolverTest.java

Modified: cassandra/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1142690&r1=1142689&r2=1142690&view=diff
==
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Mon Jul  4 14:36:11 2011
@@ -10,6 +10,7 @@
  * clean up tmp files after failed compaction (CASSANDRA-2468)
  * restrict repair streaming to specific columnfamilies (CASSANDRA-2280)
  * don't bother persisting columns shadowed by a row tombstone (CASSANDRA-2589)
+ * reset CF and SC deletion times after gc_grace (CASSANDRA-2317)
 
 
 0.8.2

Added: 
cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java?rev=1142690&view=auto
==
--- 
cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java 
(added)
+++ 
cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java 
Mon Jul  4 14:36:11 2011
@@ -0,0 +1,212 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db;
+
+import java.nio.ByteBuffer;
+import java.security.MessageDigest;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Map;
+import java.util.SortedSet;
+import java.util.concurrent.ConcurrentSkipListMap;
+import java.util.concurrent.atomic.AtomicReference;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.db.filter.QueryPath;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.io.ICompactSerializer2;
+import org.apache.cassandra.io.util.IIterableColumns;
+import org.apache.cassandra.utils.FBUtilities;
+
+public abstract class AbstractColumnContainer implements IColumnContainer, 
IIterableColumns
+{
+private static Logger logger = 
LoggerFactory.getLogger(AbstractColumnContainer.class);
+
+protected final ConcurrentSkipListMap<ByteBuffer, IColumn> columns;
+protected final AtomicReference<DeletionInfo> deletionInfo = new 
AtomicReference<DeletionInfo>(new DeletionInfo());
+
+protected AbstractColumnContainer(ConcurrentSkipListMap<ByteBuffer, 
IColumn> columns)
+{
+this.columns = columns;
+}
+
+@Deprecated // TODO this is a hack to set initial value outside constructor
+public void delete(int localtime, long timestamp)
+{
+deletionInfo.set(new DeletionInfo(timestamp, localtime));
+}
+
+public void delete(AbstractColumnContainer cc2)
+{
+// Keeping deletion info for max markedForDeleteAt value
+DeletionInfo current;
+DeletionInfo cc2Info = cc2.deletionInfo.get();
+while (true)
+{
+ current = deletionInfo.get();
+ if (current.markedForDeleteAt >= cc2Info.markedForDeleteAt || 
deletionInfo.compareAndSet(current, cc2Info))
+ break;

[jira] [Commented] (CASSANDRA-2317) Column family deletion time is not always reset after gc_grace

2011-07-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059460#comment-13059460
 ] 

Sylvain Lebresne commented on CASSANDRA-2317:
-

Committed to trunk (as I agree this should really go there).

bq. doesn't this mean that for a CF w/ no tombstone, we create a new 
deletioninfo every call to maybeReset?

You're right; I've added a current.localDeletionTime == Integer.MIN_VALUE check to 
the condition to escape early in that case.
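The early escape can be illustrated with a minimal sketch; `DeletionInfo` here is a hypothetical stand-in with just the two fields discussed, and `maybeReset` is an illustration of the logic, not the real Cassandra method:

```java
public class DeletionInfoReset {
    // Minimal stand-in for Cassandra's DeletionInfo (hypothetical fields).
    static final class DeletionInfo {
        final long markedForDeleteAt;
        final int localDeletionTime;
        DeletionInfo() { this(Long.MIN_VALUE, Integer.MIN_VALUE); } // "live": no tombstone
        DeletionInfo(long markedForDeleteAt, int localDeletionTime) {
            this.markedForDeleteAt = markedForDeleteAt;
            this.localDeletionTime = localDeletionTime;
        }
    }

    // Escape early when there is no tombstone, so we don't allocate a fresh
    // DeletionInfo on every call; otherwise reset once gc_grace has passed.
    static DeletionInfo maybeReset(DeletionInfo current, int gcBefore) {
        if (current.localDeletionTime == Integer.MIN_VALUE)
            return current;               // no tombstone at all: nothing to reset
        if (current.localDeletionTime < gcBefore)
            return new DeletionInfo();    // tombstone older than gc_grace: reset
        return current;                   // tombstone still within gc_grace
    }
}
```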

 Column family deletion time is not always reset after gc_grace
 

 Key: CASSANDRA-2317
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2317
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.6
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.0

 Attachments: 
 0001-Add-AbstractColumnContainer-to-factor-common-parts-o.patch, 
 0002-Add-unit-test.patch, 
 0003-Reset-CF-and-SC-deletion-time-after-compaction.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Follow up of CASSANDRA-2305.
 Reproducible (thanks to Jeffrey Wang) by: 
 Create a CF with gc_grace_seconds = 0 and no row cache.
 Insert row X, col A with timestamp 0.
 Insert row X, col B with timestamp 2.
 Remove row X with timestamp 1 (expect col A to disappear, col B to stay).
 Wait 1 second.
 Force flush and compaction.
 Insert row X, col A with timestamp 0.
 Read row X, col A (see nothing).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




buildbot success in ASF Buildbot on cassandra-trunk

2011-07-04 Thread buildbot
The Buildbot has detected a restored build on builder cassandra-trunk while 
building ASF Buildbot.
Full details are available at:
 http://ci.apache.org/builders/cassandra-trunk/builds/1407

Buildbot URL: http://ci.apache.org/

Buildslave for this Build: isis_ubuntu

Build Reason: scheduler
Build Source Stamp: [branch cassandra/trunk] 1142690
Blamelist: slebresne

Build succeeded!

sincerely,
 -The Buildbot



[jira] [Commented] (CASSANDRA-2317) Column family deletion time is not always reset after gc_grace

2011-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059477#comment-13059477
 ] 

Hudson commented on CASSANDRA-2317:
---

Integrated in Cassandra #948 (See 
[https://builds.apache.org/job/Cassandra/948/])
Reset CF and SC deletion time after gc_grace
patch by slebresne; reviewed by jbellis for CASSANDRA-2317

slebresne : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1142690
Files : 
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamily.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/QueryFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/SuperColumn.java
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/test/unit/org/apache/cassandra/service/RowResolverTest.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/RowTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/IColumnContainer.java
* 
/cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java


 Column family deletion time is not always reset after gc_grace
 

 Key: CASSANDRA-2317
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2317
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.6
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.0

 Attachments: 
 0001-Add-AbstractColumnContainer-to-factor-common-parts-o.patch, 
 0002-Add-unit-test.patch, 
 0003-Reset-CF-and-SC-deletion-time-after-compaction.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Follow up of CASSANDRA-2305.
 Reproducible (thanks to Jeffrey Wang) by: 
 Create a CF with gc_grace_seconds = 0 and no row cache.
 Insert row X, col A with timestamp 0.
 Insert row X, col B with timestamp 2.
 Remove row X with timestamp 1 (expect col A to disappear, col B to stay).
 Wait 1 second.
 Force flush and compaction.
 Insert row X, col A with timestamp 0.
 Read row X, col A (see nothing).





[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently

2011-07-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059484#comment-13059484
 ] 

Sylvain Lebresne commented on CASSANDRA-2851:
-

bq. Our bytesToHex does pad... but only for single-digit results. So if we fix 
hexToBytes we'll introduce an incompatibility. (Granted, a minor one.)

I don't understand. There is no such thing as padding when you convert a byte 
array to hex (Integer.toHexString returns only the right number of 
hexadecimal digits because it has no reason to do otherwise, but that's an 
implementation detail of bytesToHex). A byte is always 8 bits, never 4, and the 
output of bytesToHex will *always* have an even number of characters (as it 
should). Our hexToBytes just happens to semi-randomly add a "0" in front to 
turn buggy input with an odd number of characters into an even one, on the 
off chance that a client used the (stupid) optimization of removing at most one 
leading 0 to save some space or something. In my opinion, it would be better to 
simply refuse odd-sized input, because it is more likely a truncated input 
(and people using stupid clients should fix them, though I'm OK with saying 
that we'll only force them to fix it on a major upgrade).

 hex-to-bytes conversion accepts invalid inputs silently
 ---

 Key: CASSANDRA-2851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: cassandra-2851.diff


 FBUtilities.hexToBytes() has a minor bug - it copes with single-character 
 inputs by prepending "0", which is OK - but it does this for any input with 
 an odd number of characters, which is probably incorrect.
 {noformat}
 if (str.length() % 2 == 1)
     str = "0" + str;
 {noformat}
 Given 'fff' as an input, can we really assume that this should be '0fff'? 
 Isn't this just an error?
 Add the following to FBUtilitiesTest to demonstrate:
 {noformat}
 String[] badvalues = new String[]{"000", "fff"};

 for (int i = 0; i < badvalues.length; i++)
     try
     {
         FBUtilities.hexToBytes(badvalues[i]);
         fail("Invalid hex value accepted " + badvalues[i]);
     } catch (Exception e){}
 {noformat}





[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059485#comment-13059485
 ] 

Terje Marthinussen commented on CASSANDRA-2816:
---

Cool!

Then you confirmed what I have sort of believed for a while, though my 
understanding of the code has been a bit in conflict with
http://wiki.apache.org/cassandra/Operations
which says:
"It is safe to run repair against multiple machines at the same time, but to 
minimize the impact on your application workload it is recommended to wait for 
it to complete on one node before invoking it against the next."

I have always read that as: if you have the HW, go for it!

May I change it to:
"It is safe to run repair against multiple machines at the same time. However, 
to minimize the amount of data transferred during a repair, careful 
synchronization is required between the nodes taking part in the repair. 

This is difficult to do if nodes holding replicas of the same data run repair 
at the same time, and doing so can in extreme cases generate excessive 
transfers of data. 

Improvements are being worked on, but for now, avoid scheduling repair on 
several nodes with replicas of the same data at the same time."



 Repair doesn't synchronize merkle tree creation properly
 

 Key: CASSANDRA-2816
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2816
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.2

 Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch


 Being a little slow, I just realized after having opened CASSANDRA-2811 and 
 CASSANDRA-2815 that there is a more general problem with repair.
 When a repair is started, it sends a number of merkle tree requests to its 
 neighbors as well as to itself, and assumes, for correctness, that the 
 building of those trees starts on every node at roughly the same time (if 
 not, we end up comparing data snapshotted at different times and will thus 
 mistakenly repair a lot of data needlessly). This is bogus for several reasons:
 * Because validation compaction runs on the same executor as other 
 compactions, the start of the validation on the different nodes is subject to 
 other compactions. 0.8 mitigates this somewhat by being multi-threaded (so 
 there is less chance of being blocked for a long time by a long-running 
 compaction), but since the compaction executor is bounded, it is still a 
 problem.
 * If you run nodetool repair without arguments, it repairs every CF. As a 
 consequence it generates lots of merkle tree requests, and all of those 
 requests are issued at the same time. Because the compaction executor is 
 bounded even in 0.8, some of those validations end up queued behind the first 
 ones. Even assuming the different validations are submitted in the same order 
 on each node (which isn't guaranteed either), there is no guarantee that the 
 first validation will take the same time on every node, hence desynchronizing 
 the queued ones.
 Overall, it is important for the precision of repair that, for a given CF and 
 range (the unit at which trees are computed), all nodes start the validation 
 at the same time (or, since we can't do magic, as close to it as possible).
 One (reasonably simple) proposal to fix this would be to have repair schedule 
 validation compactions across nodes one by one (i.e., one CF/range at a 
 time), waiting for all nodes to return their tree before submitting the next 
 request. Then, on each node, we should make sure the validation compaction 
 starts as soon as it is requested. For that, we probably want a dedicated 
 executor for validation compaction, and then:
 * either we fail the whole repair whenever one node is not able to execute 
 the validation compaction right away (because no threads are available),
 * or we simply tell the user that if they start too many repairs in 
 parallel, they may see some of them repairing more data than they should.





[jira] [Created] (CASSANDRA-2855) Add hadoop support option to skip rows with empty columns

2011-07-04 Thread Jeremy Hanna (JIRA)
Add hadoop support option to skip rows with empty columns
-

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna


We have been finding that range ghosts appear in results from Hadoop via Pig.  
This could also happen if rows don't have data for the slice predicate that is 
given.  This leads to having to do a painful amount of defensive checking on 
the Pig side, especially in the case of range ghosts.

We would like to add an option to skip rows that have no column values in them.  
That functionality existed before in core Cassandra but was removed because of 
the performance penalty of that checking.  However, Hadoop support in the 
RecordReader is batch oriented anyway, so individual row reading performance 
isn't as much of an issue.  Also, we would make it an optional config parameter 
for each job, so people wouldn't have to incur that penalty if they are 
confident that there won't be empty rows or they don't care.

It could be a parameter cassandra.skip.empty.rows taking true/false.
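A minimal sketch of the proposed filtering is below, assuming a hypothetical `Row` type and a raw iterator standing in for the Hadoop RecordReader internals; the config key `cassandra.skip.empty.rows` is the one proposed above, but nothing else here is Cassandra's actual API:

```java
import java.util.*;

// Sketch of skipping rows with no column values in a record-reader-style loop.
// Row and the raw iterator are hypothetical stand-ins for RecordReader internals.
public class SkipEmptyRows {
    record Row(String key, Map<String, byte[]> columns) {}

    private final Iterator<Row> raw;
    private final boolean skipEmpty; // would come from conf.getBoolean("cassandra.skip.empty.rows", false)

    SkipEmptyRows(Iterator<Row> raw, boolean skipEmpty) {
        this.raw = raw;
        this.skipEmpty = skipEmpty;
    }

    // Returns the next row, skipping range ghosts / empty rows when configured,
    // or null when the input is exhausted.
    Row nextRow() {
        while (raw.hasNext()) {
            Row r = raw.next();
            if (skipEmpty && r.columns().isEmpty())
                continue; // range ghost, or no data for the slice predicate
            return r;
        }
        return null;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("ghost", Map.of()),
            new Row("real", Map.of("col", new byte[]{1})));
        SkipEmptyRows reader = new SkipEmptyRows(rows.iterator(), true);
        System.out.println(reader.nextRow().key());
    }
}
```

Since the RecordReader already fetches rows in batches, the extra emptiness check per row is cheap relative to the batch round trip, which is the argument made above.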





[Cassandra Wiki] Update of InstallThrift by Joe Stein

2011-07-04 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The InstallThrift page has been changed by Joe Stein:
http://wiki.apache.org/cassandra/InstallThrift?action=diff&rev1=13&rev2=14

  '''NOTE:''' If you arrived here for the purpose of writing your first 
application, please consider using a [[ClientOptions|higher-level client]] 
instead of thrift directly.
  
- [[http://incubator.apache.org/thrift|Thrift]] historically did not have 
tagged releases and Cassandra used trunk revisions of it. As of Cassandra 0.7, 
Thrift 0.5 is used. For Cassandra 0.6, you have to use the matching version of 
Thrift. Under such circumstances, installing thrift is a bit of a bitch.  We 
are sorry about that, but we don't know of a better way to support a vast 
number of clients mostly automagically.
+ [[http://thrift.apache.org/|Thrift]] historically did not have tagged 
releases and Cassandra used trunk revisions of it; however, as of Cassandra 0.8, 
Thrift 0.6 is used and available for 
[[http://thrift.apache.org/download/|download]].  With Cassandra 0.7, Thrift 
0.5 is used. For Cassandra 0.6, you have to use the matching version of Thrift. 
Under such circumstances, installing thrift is a bit of a bitch.  We are sorry 
about that, but we don't know of a better way to support a vast number of 
clients mostly automagically.
  
+ If installing Thrift 0.6 on a Mac for use with Cassandra 0.8 and you get an 
error building the 'thrift.protocol.fastbinary' extension during `make`, then 
you might need to work around https://issues.apache.org/jira/browse/THRIFT-1143 
by going to thrift-0.6.1/lib/py and running `sudo ARCHFLAGS="-arch x86_64" 
python setup.py install`
+ 
- Important note: you need to install the svn revision of thrift that matches 
the revision that your version of Cassandra uses (if not using 0.7 with Thrift 
0.5). This can be found in the Cassandra Home/lib directory - e.g. 
`libthrift-917130.jar` means that version of Cassandra uses svn revision 917130 
of thrift.
+ Important note: If using Cassandra 0.6 then you need to install the svn 
revision of thrift that matches the revision that your version of Cassandra 
uses (if not using 0.8 with Thrift 0.6 nor 0.7 with Thrift 0.5). This can be 
found in the Cassandra Home/lib directory - e.g. `libthrift-917130.jar` means 
that version of Cassandra uses svn revision 917130 of thrift.
  
   1. `aptitude install libboost-dev python-dev autoconf automake pkg-config 
make libtool flex bison build-essential` (or the equivalent on your system) 
(assumes you are interested in building for python; omit python-dev otherwise)
   1. Grab the thrift source with the revision that your version of Cassandra 
uses: e.g. `svn co -r 917130 http://svn.apache.org/repos/asf/thrift/trunk 
thrift`


[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently

2011-07-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059500#comment-13059500
 ] 

Jonathan Ellis commented on CASSANDRA-2851:
---

You're right, I was misreading how we were using Integer.toHexString.

 hex-to-bytes conversion accepts invalid inputs silently
 ---

 Key: CASSANDRA-2851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: cassandra-2851.diff


 FBUtilities.hexToBytes() has a minor bug - it copes with single-character 
 inputs by prepending "0", which is OK - but it does this for any input with 
 an odd number of characters, which is probably incorrect.
 {noformat}
 if (str.length() % 2 == 1)
     str = "0" + str;
 {noformat}
 Given 'fff' as an input, can we really assume that this should be '0fff'? 
 Isn't this just an error?
 Add the following to FBUtilitiesTest to demonstrate:
 {noformat}
 String[] badvalues = new String[]{"000", "fff"};

 for (int i = 0; i < badvalues.length; i++)
     try
     {
         FBUtilities.hexToBytes(badvalues[i]);
         fail("Invalid hex value accepted " + badvalues[i]);
     } catch (Exception e){}
 {noformat}





svn commit: r1142725 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/cli/CliClient.java

2011-07-04 Thread jbellis
Author: jbellis
Date: Mon Jul  4 16:20:14 2011
New Revision: 1142725

URL: http://svn.apache.org/viewvc?rev=1142725&view=rev
Log:
fix CLI perpetuating obsolete KsDef.replication_factor
patch by jbellis; tested by Jonas Borgström for CASSANDRA-2846

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1142725&r1=1142724&r2=1142725&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Mon Jul  4 16:20:14 2011
@@ -13,6 +13,7 @@
  * Correctly set default for replicate_on_write (CASSANDRA-2835)
  * improve nodetool compactionstats formatting (CASSANDRA-2844)
  * fix index-building status display (CASSANDRA-2853)
+ * fix CLI perpetuating obsolete KsDef.replication_factor (CASSANDRA-2846)
 
 
 0.8.1

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java?rev=1142725&r1=1142724&r2=1142725&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java
 Mon Jul  4 16:20:14 2011
@@ -1072,7 +1072,10 @@ public class CliClient
 private KsDef updateKsDefAttributes(Tree statement, KsDef ksDefToUpdate)
 {
 KsDef ksDef = new KsDef(ksDefToUpdate);
-
+// server helpfully sets deprecated replication factor when it sends a 
KsDef back, for older clients.
+// we need to unset that on the new KsDef we create to avoid being 
treated as a legacy client in return.
+ksDef.unsetReplication_factor();
+
 // removing all column definitions - thrift system_update_keyspace 
method requires that 
 ksDef.setCf_defs(new LinkedList<CfDef>());
 




svn commit: r1142727 - in /cassandra/branches/cassandra-0.7: CHANGES.txt src/java/org/apache/cassandra/cli/CliMain.java

2011-07-04 Thread jbellis
Author: jbellis
Date: Mon Jul  4 16:22:12 2011
New Revision: 1142727

URL: http://svn.apache.org/viewvc?rev=1142727&view=rev
Log:
improve cli treatment of multiline comments
patch by pyaskevich; reviewed by jbellis for CASSANDRA-2852

Modified:
cassandra/branches/cassandra-0.7/CHANGES.txt

cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java

Modified: cassandra/branches/cassandra-0.7/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/CHANGES.txt?rev=1142727&r1=1142726&r2=1142727&view=diff
==
--- cassandra/branches/cassandra-0.7/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.7/CHANGES.txt Mon Jul  4 16:22:12 2011
@@ -31,6 +31,7 @@
(CASSANDRA-2841)
  * allow deleting a row and updating indexed columns in it in the
same mutation (CASSANDRA-2773)
+ * improve cli treatment of multiline comments (CASSANDRA-2852)
 
 
 0.7.6

Modified: 
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java?rev=1142727&r1=1142726&r2=1142727&view=diff
==
--- 
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java 
(original)
+++ 
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java 
Mon Jul  4 16:22:12 2011
@@ -365,6 +365,8 @@ public class CliMain
 String line = "";
 String currentStatement = "";
 
+boolean commentedBlock = false;
+
 while ((line = reader.readLine()) != null)
 {
 line = line.trim();
@@ -373,6 +375,18 @@ public class CliMain
 if (line.isEmpty() || line.startsWith("--"))
 continue;
 
+if (line.startsWith("/*"))
+commentedBlock = true;
+
+if (line.startsWith("*/") || line.endsWith("*/"))
+{
+commentedBlock = false;
+continue;
+}
+
+if (commentedBlock) // skip commented lines
+continue;
+
 currentStatement += line;
 
 if (line.endsWith(";"))




svn commit: r1142729 - in /cassandra/branches/cassandra-0.8: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/cli/ test/unit/org/apache/cassandra/db/

2011-07-04 Thread jbellis
Author: jbellis
Date: Mon Jul  4 16:23:36 2011
New Revision: 1142729

URL: http://svn.apache.org/viewvc?rev=1142729&view=rev
Log:
merge from 0.7

Modified:
cassandra/branches/cassandra-0.8/   (props changed)
cassandra/branches/cassandra-0.8/CHANGES.txt
cassandra/branches/cassandra-0.8/contrib/   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
   (props changed)

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliMain.java

cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java

Propchange: cassandra/branches/cassandra-0.8/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Mon Jul  4 16:23:36 2011
@@ -1,5 +1,5 @@
 
/cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291
-/cassandra/branches/cassandra-0.7:1026516-1140567,1140928,1141129,1141213,1141217
+/cassandra/branches/cassandra-0.7:1026516-1142727
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
 /cassandra/branches/cassandra-0.8:1090934-1125013,1125041
 /cassandra/branches/cassandra-0.8.0:1125021-1130369

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1142729&r1=1142728&r2=1142729&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Mon Jul  4 16:23:36 2011
@@ -14,6 +14,7 @@
  * improve nodetool compactionstats formatting (CASSANDRA-2844)
  * fix index-building status display (CASSANDRA-2853)
  * fix CLI perpetuating obsolete KsDef.replication_factor (CASSANDRA-2846)
+ * improve cli treatment of multiline comments (CASSANDRA-2852)
 
 
 0.8.1

Propchange: cassandra/branches/cassandra-0.8/contrib/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Mon Jul  4 16:23:36 2011
@@ -1,5 +1,5 @@
 
/cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009
-/cassandra/branches/cassandra-0.7/contrib:1026516-1140567,1140928,1141129,1141213,1141217
+/cassandra/branches/cassandra-0.7/contrib:1026516-1142727
 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654
 /cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125041
 /cassandra/branches/cassandra-0.8.0/contrib:1125021-1130369

Propchange: 
cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Mon Jul  4 16:23:36 2011
@@ -1,5 +1,5 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
-/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1140567,1140928,1141129,1141213,1141217
+/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1142727
 
/cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654
 
/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125041
 
/cassandra/branches/cassandra-0.8.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1125021-1130369

Propchange: 
cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Mon Jul  4 16:23:36 2011
@@ -1,5 +1,5 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
-/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1140567,1140928,1141129,1141213,1141217
+/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1142727
 

[Cassandra Wiki] Update of FAQ by TylerHobbs

2011-07-04 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The FAQ page has been changed by TylerHobbs:
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=122&rev2=123

Comment:
Add CassandraClusterAdmin to the list of GUI admins

  <<Anchor(gui)>>
  
  == Is there a GUI admin tool for Cassandra? ==
- The closest is [[http://github.com/driftx/chiton|chiton]], a GTK data browser.
+  * [[http://github.com/driftx/chiton|chiton]], a GTK data browser.
- 
- Another java UI http://code.google.com/p/cassandra-gui, a Swing data browser.
+  * [[http://code.google.com/p/cassandra-gui|cassandra-gui]], a Swing data browser.
+  * [[https://github.com/sebgiroux/Cassandra-Cluster-Admin|Cassandra Cluster Admin]], a PHP-based web UI.
  
  <<Anchor(a_long_is_exactly_8_bytes)>>
  


[jira] [Commented] (CASSANDRA-2852) Cassandra CLI - Import Keyspace Definitions from File - Comments do partially interpret characters/commands

2011-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059513#comment-13059513
 ] 

Hudson commented on CASSANDRA-2852:
---

Integrated in Cassandra-0.7 #520 (See 
[https://builds.apache.org/job/Cassandra-0.7/520/])
improve cli treatment of multiline comments
patch by pyaskevich; reviewed by jbellis for CASSANDRA-2852

jbellis : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1142727
Files : 
* /cassandra/branches/cassandra-0.7/CHANGES.txt
* 
/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java


 Cassandra CLI - Import Keyspace Definitions from File - Comments do 
 partially interpret characters/commands
 -

 Key: CASSANDRA-2852
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2852
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 0.7.0
 Environment: Win Vista 
Reporter: jens mueller
Assignee: Pavel Yaskevich
Priority: Trivial
 Fix For: 0.7.7, 0.8.2

 Attachments: CASSANDRA-2852.patch


 Hello, 
 using: bin/cassandra-cli -host localhost --file conf/schema-sample.txt
 with schema-sample.txt having contents like this:
 /* here are a lot of comments,
 like this sample create keyspace;
 and so on
 */
 Will result in an error: 
 Line 1 => Syntax Error at Position 323: mismatched character '<EOF>' 
 expecting '*'
 The cause is the "keyspace;" statement => the semicolon ";" causes the error.
 However:
 Writing the word "keyspace;" with quotes does NOT lead to the error.
 So this works: 
 /* here are a lot of comments,
 like this sample create "keyspace;"
 and so on
 */
 From my point of view this is an error. Everything between the start comment 
 /* and the end comment */ should be treated as a comment and not be 
 interpreted in any way. That's the definition of a comment: it is not 
 interpreted at all. 
 Otherwise this must be documented somewhere very prominently, or it will 
 lead to unnecessary time wasted searching for this odd behaviour. It also 
 makes commenting out statements much more cumbersome.
 Platform: Windows Vista
 thanks

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2846) Changing replication_factor using update keyspace not working

2011-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059518#comment-13059518
 ] 

Hudson commented on CASSANDRA-2846:
---

Integrated in Cassandra-0.8 #204 (See 
[https://builds.apache.org/job/Cassandra-0.8/204/])
fix CLI perpetuating obsolete KsDef.replication_factor
patch by jbellis; tested by Jonas Borgström for CASSANDRA-2846

jbellis : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1142725
Files : 
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java
* /cassandra/branches/cassandra-0.8/CHANGES.txt


 Changing replication_factor using update keyspace not working
 ---

 Key: CASSANDRA-2846
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.1
 Environment: A clean 0.8.1 install using the default configuration
Reporter: Jonas Borgström
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2846.txt


 Unless I've misunderstood the new way to do this with 0.8, I think "update 
 keyspace" is broken:
 {code}
 [default@unknown] create keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:1}];
 37f70d40-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test;
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 [default@unknown] update keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:2}];
 489fe220-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test; 
   
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 {code}
 Isn't the second "describe keyspace" supposed to say 
 "replication_factor:2"?
 Relevant bits from system.log:
 {code}
 Migration.java (line 116) Applying migration 
 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep 
 strategy:SimpleStrategy{}durable_writes: true to Testrep 
 strategy:SimpleStrategy{}durable_writes: true
 UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual 
 operations
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059526#comment-13059526
 ] 

David Allsopp commented on CASSANDRA-2851:
--

The origin of the current behaviour is CASSANDRA-1411 
https://issues.apache.org/jira/browse/CASSANDRA-1411 if that helps...



 hex-to-bytes conversion accepts invalid inputs silently
 ---

 Key: CASSANDRA-2851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: cassandra-2851.diff


 FBUtilities.hexToBytes() has a minor bug - it copes with single-character 
 inputs by prepending "0", which is OK - but it does this for any input with 
 an odd number of characters, which is probably incorrect.
 {noformat}
 if (str.length() % 2 == 1)
     str = "0" + str;
 {noformat}
 Given 'fff' as an input, can we really assume that this should be '0fff'? 
 Isn't this just an error?
 Add the following to FBUtilitiesTest to demonstrate:
 {noformat}
 String[] badvalues = new String[]{"000", "fff"};

 for (int i = 0; i < badvalues.length; i++)
     try
     {
         FBUtilities.hexToBytes(badvalues[i]);
         fail("Invalid hex value accepted " + badvalues[i]);
     } catch (Exception e) {}
 {noformat}
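 A strict decoder along the lines argued for above could look like the
 following sketch. This is hypothetical, not the attached cassandra-2851.diff;
 the class name StrictHex is illustrative only.

```java
import java.util.Arrays;

// Hypothetical strict hex decoder: rejects odd-length and non-hex input
// instead of silently prepending "0". Sketch only, not the attached patch.
public class StrictHex {
    public static byte[] hexToBytes(String str) {
        if (str.length() % 2 == 1)
            throw new NumberFormatException("Odd-length hex string: " + str);
        byte[] out = new byte[str.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(str.charAt(2 * i), 16);
            int lo = Character.digit(str.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0)
                throw new NumberFormatException("Non-hex character in: " + str);
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(hexToBytes("cafe")));
        // "fff" would now fail fast rather than decoding as if it were "0fff"
    }
}
```

 With this variant the badvalues test above passes, since both "000" and
 "fff" have odd length and are rejected.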

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-04 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1125:
--

Attachment: 1125-v3.txt

v3 makes the KeyRange an implementation detail (setInputRange just takes 
Strings for start and end) and fixes a reference to the key fields in CFIF.

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, 
 CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently

2011-07-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059582#comment-13059582
 ] 

Jonathan Ellis commented on CASSANDRA-2851:
---

Good point, David.

Sounds like the problem is thinking of this as a generic hex conversion 
function, rather than as hex that specifically represents bytes.

 hex-to-bytes conversion accepts invalid inputs silently
 ---

 Key: CASSANDRA-2851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: cassandra-2851.diff


 FBUtilities.hexToBytes() has a minor bug - it copes with single-character 
 inputs by prepending "0", which is OK - but it does this for any input with 
 an odd number of characters, which is probably incorrect.
 {noformat}
 if (str.length() % 2 == 1)
     str = "0" + str;
 {noformat}
 Given 'fff' as an input, can we really assume that this should be '0fff'? 
 Isn't this just an error?
 Add the following to FBUtilitiesTest to demonstrate:
 {noformat}
 String[] badvalues = new String[]{"000", "fff"};

 for (int i = 0; i < badvalues.length; i++)
     try
     {
         FBUtilities.hexToBytes(badvalues[i]);
         fail("Invalid hex value accepted " + badvalues[i]);
     } catch (Exception e) {}
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059589#comment-13059589
 ] 

David Allsopp commented on CASSANDRA-2850:
--

I think you mean (bytes.remaining() * 2) not (bytes.remaining() / 2) - we need 
twice as many chars as bytes.

Also, shouldn't byteToChar[] have length 16, not 256?

Not sure what string creation you are referring to?

I attach 2 further versions of bytesToHex (as another benchmark class 3). 
Results are below (I've had to increase the number of repeats so the stats are 
significant!).

v3 uses 'normal' code and is another 20% faster for large values, and _another_ 
factor of 2 faster than v2, i.e. 7-10 times faster than the original.

v4 uses nasty reflection to avoid doing an arraycopy on the byte array - this 
avoids a large chunk of memory (all the previous solutions end up doing an 
arraycopy somewhere). This is now 11-13 times faster than the original.

20M old: 1482
20M new: 360
20M  v2: 249
20M  v3: 203
20M  v4: 125

old: 2137
new: 859
 v2: 718
 v3: 203
 v4: 156

old: 2138
new: 843
 v2: 733
 v3: 188
 v4: 156
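For readers following along, the kind of change being benchmarked can be
sketched as below: a pre-sized, 16-entry lookup-table bytesToHex. This is an
illustration of the technique, not necessarily identical to the attached
v2/v3/v4 code; the class name HexSketch is made up.

```java
// Illustrative lookup-table conversion: pre-size the output array and
// index a 16-entry hex-digit table, avoiding the per-byte method calls
// and StringBuilder resizes of the original implementation.
public class HexSketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    public static String bytesToHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2]; // twice as many chars as bytes
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            out[2 * i] = HEX[b >>> 4];      // high nibble
            out[2 * i + 1] = HEX[b & 0x0F]; // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(bytesToHex(new byte[]{(byte) 0xCA, (byte) 0xFE})); // prints "cafe"
    }
}
```

The remaining cost is the arraycopy inside new String(char[]), which is what
the reflection-based v4 described above tries to avoid.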




 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: BytesToHexBenchmark3.java

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: (was: BytesToHexBenchmark3.java)

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: BytesToHexBenchmark3.java

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059589#comment-13059589
 ] 

David Allsopp edited comment on CASSANDRA-2850 at 7/4/11 7:46 PM:
--

I think you mean (bytes.remaining() * 2) not (bytes.remaining() / 2) - we need 
twice as many chars as bytes.

Also, shouldn't byteToChar[] have length 16, not 256?

Not sure what string creation you are referring to?

I attach 2 further versions of bytesToHex (as another benchmark class 3). 
Results are below (I've had to increase the number of repeats so the stats are 
significant!).

v3 uses 'normal' code and is another 20% faster for large values, and _another_ 
factor of 2 faster than v2, i.e. 7-10 times faster than the original.

v4 uses nasty reflection to avoid doing an arraycopy on the byte array - this 
avoids a large chunk of memory (all the previous solutions end up doing an 
arraycopy somewhere). This is now 11-13 times faster than the original.

20M old: 1482
20M new: 360
20M  v2: 249
20M  v3: 203
20M  v4: 125

old: 2137
new: 859
 v2: 718
 v3: 203
 v4: 156

old: 2138
new: 843
 v2: 733
 v3: 188
 v4: 156




  was (Author: dallsopp):
I think you mean (bytes.remaining() * 2) not (bytes.remaining() / 2) - we 
need twice as many chars as bytes.

Also, shouldn't byteToChar[] have length 16, not 256.

Not sure what string creation you are referring to?

I attach 2 further versions of bytesToHex (as another benchmark class 3). 
Results are below (I've had to increasse the number of repeats so the stats are 
significant!).

v3 uses 'normal' code and is another 20% faster for large values, and _another_ 
factor of 2 faster than v2, i.e. 7-10 time sfatser than the original.

v4 uses nasty reflection to avoid doing an arraycopy on the byte array - this 
avoids a large chunk of memory (all the previous solutions end up doing an 
arraycopy somewhere). This is now 11-13 times fatser than the original.

20M old: 1482
20M new: 360
20M  v2: 249
20M  v3: 203
20M  v4: 125

old: 2137
new: 859
 v2: 718
 v3: 203
 v4: 156

old: 2138
new: 843
 v2: 733
 v3: 188
 v4: 156



  
 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059595#comment-13059595
 ] 

David Allsopp commented on CASSANDRA-2850:
--

An issue with using hex at all is that we can't represent the maximum 2GB 
column value. If we have Integer.MAX_VALUE bytes, then we need twice as many 
chars - and arrays in Java are limited to Integer.MAX_VALUE.



 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059607#comment-13059607
 ] 

David Allsopp commented on CASSANDRA-2850:
--

I can't improve any further on Sylvain's hexToByte - nice work!

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[Cassandra Wiki] Update of StorageConfiguration by AlexisLeQuoc

2011-07-04 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The StorageConfiguration page has been changed by AlexisLeQuoc:
http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=58&rev2=59

  
   * '''replica_placement_strategy''' and '''replication_factor'''
  
+ === Pre-0.8.1 ===
  Strategy: Setting this to the class that implements 
{{{IReplicaPlacementStrategy}}} will change the way the node picker works. Out 
of the box, Cassandra provides 
{{{org.apache.cassandra.locator.RackUnawareStrategy}}} and 
{{{org.apache.cassandra.locator.RackAwareStrategy}}} (place one replica in a 
different datacenter, and the others on different racks in the same one.)
  
  Note that the replication factor (RF) is the ''total'' number of nodes onto 
which the data will be placed.  So, a replication factor of 1 means that only 1 
node will have the data.  It does '''not''' mean that one ''other'' node will 
have the data.
  
  Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'. RF 
of at least 2 is highly recommended, keeping in mind that your effective number 
of nodes is (N total nodes / RF).
+ 
+ === 0.8.1 ===
+ Strategy: Setting this to the class that implements 
{{{IReplicaPlacementStrategy}}} will change the way the node picker works. Out 
of the box, Cassandra provides 
{{{org.apache.cassandra.locator.SimpleStrategy}}}, 
{{{org.apache.cassandra.locator.LocalStrategy}}} and  
{{{org.apache.cassandra.locator.NetworkTopologyStrategy}}} (place one replica 
in a different datacenter, and the others on different racks in the same one.)
+ 
+ Note that the replication factor (RF) is the ''total'' number of nodes onto 
which the data will be placed.  So, a replication factor of 1 means that only 1 
node will have the data.  It does '''not''' mean that one ''other'' node will 
have the data.
+ 
+ Defaults are: 'org.apache.cassandra.locator.NetworkTopologyStrategy' and '1'. 
RF of at least 2 is highly recommended, keeping in mind that your effective 
number of nodes is (N total nodes / RF).
  
  == per-ColumnFamily Settings ==
   * '''comment''' and '''name'''


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: BytesToHexBenchmark3.java

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: (was: BytesToHexBenchmark3.java)

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059622#comment-13059622
 ] 

David Allsopp commented on CASSANDRA-2850:
--

Update - the benchmark version 3 was running v3 twice, not v3 then v4. Have 
re-attached. New results are:

20M old: 1435
20M new: 376
20M  v2: 405
20M  v3: 141
20M  v4: 93
20M old: 1265
20M new: 360
20M  v2: 234
20M  v3: 187
20M  v4: 78
20M old: 1233
20M new: 376
20M  v2: 452
20M  v3: 125
20M  v4: 63

old: 2184
new: 906
 v2: 577
 v3: 188
 v4: 172

old: 2215
new: 937
 v2: 593
 v3: 188
 v4: 156


 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: (was: BytesToHexBenchmark3.java)

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: BytesToHexBenchmark3.java

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059622#comment-13059622
 ] 

David Allsopp edited comment on CASSANDRA-2850 at 7/4/11 9:48 PM:
--

Update - the benchmark version 3 was running v3 twice, not v3 then v4. Have 
re-attached. New results are 15-19x faster for 20MB values, 13-14x faster for 
1KB values.

20M old: 1435
20M new: 376
20M  v2: 405
20M  v3: 141
20M  v4: 93
20M old: 1265
20M new: 360
20M  v2: 234
20M  v3: 187
20M  v4: 78
20M old: 1233
20M new: 376
20M  v2: 452
20M  v3: 125
20M  v4: 63

old: 2184
new: 906
 v2: 577
 v3: 188
 v4: 172

old: 2215
new: 937
 v2: 593
 v3: 188
 v4: 156


  was (Author: dallsopp):
Update - the benchmark version 3 was running v3 twice, not v3 then v4. Have 
re-attached. New results are:

20M old: 1435
20M new: 376
20M  v2: 405
20M  v3: 141
20M  v4: 93
20M old: 1265
20M new: 360
20M  v2: 234
20M  v3: 187
20M  v4: 78
20M old: 1233
20M new: 376
20M  v2: 452
20M  v3: 125
20M  v4: 63

old: 2184
new: 906
 v2: 577
 v3: 188
 v4: 172

old: 2215
new: 937
 v2: 593
 v3: 188
 v4: 156

  




[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059633#comment-13059633
 ] 

David Allsopp commented on CASSANDRA-2850:
--

Although the bytesToHex reflection hack is a bit horrible, it makes a huge 
difference with really big values - I've just been trying different input sizes 
(with -Xmx4g -Xms4g on a 6GB machine) and the JVM falls over with OOM at about 
300MB for all the other versions, but copes with 675MB for v4. 

With the other versions, for a byte array of size N, we also need at least 2N 
for the StringBuilder or char[], then another 2N for the String (because the 
normal String constructors and methods always do an arraycopy of the input 
array) - so at least 5N in total.

I wonder where else in the code this sort of thing occurs...?
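For reference, the pre-sizing idea discussed in this ticket is easy to sketch. The helper below is hypothetical (not the attached patch, and not Cassandra's actual ByteBufferUtil code): one char[] of exactly 2*N, a table lookup per nibble, and a single String allocation at the end.

```java
// Hypothetical sketch of a pre-sized hex conversion. Note that
// String.valueOf still copies the char[] once - that is the extra 2N
// discussed above, which the v4 reflection hack is presumed to avoid.
public class HexSketch {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    public static String bytesToHex(byte[] bytes) {
        // Pre-sized: no StringBuilder re-sizes behind the scenes.
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            out[2 * i] = DIGITS[b >>> 4];     // high nibble
            out[2 * i + 1] = DIGITS[b & 0x0F]; // low nibble
        }
        return String.valueOf(out); // copies out[] into the String
    }

    public static void main(String[] args) {
        System.out.println(bytesToHex(new byte[] { 0x00, (byte) 0xFF, 0x1A }));
        // prints "00ff1a"
    }
}
```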





[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059637#comment-13059637
 ] 

Terje Marthinussen commented on CASSANDRA-2816:
---

Regardless of the documentation change, however, I don't think it should be 
possible to actually trigger a scenario like this in the first place.

The system should protect the user from that.

I also noticed that in this case we have RF=3. The node that is going somewhat 
crazy is number 6; however, during the repair it logs that it talks to, 
compares with, and streams data to nodes 4, 5, 7 and 8.

Seems like a couple of nodes too many?

 Repair doesn't synchronize merkle tree creation properly
 

 Key: CASSANDRA-2816
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2816
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.2

 Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch


 Being a little slow, I just realized after having opened CASSANDRA-2811 and 
 CASSANDRA-2815 that there is a more general problem with repair.
 When a repair is started, it sends a number of merkle tree requests to its 
 neighbors as well as to itself, and assumes for correctness that the building 
 of those trees starts on every node at roughly the same time (if not, we end 
 up comparing data snapshots taken at different times and will thus mistakenly 
 repair a lot of useless data). This is bogus for many reasons:
 * Because validation compaction runs on the same executor as other 
 compactions, the start of the validation on the different nodes is subject to 
 other compactions. 0.8 mitigates this somewhat by being multi-threaded (so 
 there is less chance of being blocked a long time by a long-running 
 compaction), but the compaction executor being bounded, it is still a problem.
 * If you run nodetool repair without arguments, it will repair every CF. As a 
 consequence it will generate lots of merkle tree requests, and all of those 
 requests will be issued at the same time. Because even in 0.8 the compaction 
 executor is bounded, some of those validations will end up being queued 
 behind the first ones. Even assuming that the different validations are 
 submitted in the same order on each node (which isn't guaranteed either), 
 there is no guarantee that on all nodes the first validation will take the 
 same time, hence desynchronizing the queued ones.
 Overall, it is important for the precision of repair that for a given CF and 
 range (which is the unit at which trees are computed), we make sure that all 
 nodes start the validation at the same time (or, since we can't do magic, as 
 close to it as possible).
 One (reasonably simple) proposition to fix this would be to have repair 
 schedule validation compactions across nodes one by one (i.e., one CF/range 
 at a time), waiting for all nodes to return their tree before submitting the 
 next request. Then on each node, we should make sure that the node starts the 
 validation compaction as soon as requested. For that, we probably want to 
 have a specific executor for validation compaction and either:
 * fail the whole repair whenever one node is not able to execute the 
 validation compaction right away (because no thread is available right 
 away), or
 * simply tell the user that if he starts too many repairs in parallel, he 
 may start seeing some of them repairing more data than they should.
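 The one-by-one scheduling proposed above can be sketched roughly as follows 
 (hypothetical interface and names - this is not Cassandra's actual 
 AntiEntropyService API): request the validation for a single CF/range from 
 all neighbors at once, then block until every tree is back before submitting 
 the next CF/range.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Rough sketch of the proposed one-by-one scheduling (hypothetical names).
public class RepairSketch {
    interface Neighbour {
        // Asynchronously start a validation compaction for (cf, range);
        // implementations call done.countDown() once the tree is returned.
        void requestValidation(String cf, String range, CountDownLatch done);
    }

    static void repair(List<String> cfRanges, List<Neighbour> neighbours)
            throws InterruptedException {
        for (String cfRange : cfRanges) {
            String[] parts = cfRange.split("/", 2); // "cf/range"
            CountDownLatch done = new CountDownLatch(neighbours.size());
            for (Neighbour n : neighbours)
                n.requestValidation(parts[0], parts[1], done);
            done.await(); // all trees back before the next CF/range
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StringBuilder log = new StringBuilder();
        Neighbour n = (cf, range, done) -> {
            log.append(cf).append(':').append(range).append(' ');
            done.countDown();
        };
        repair(List.of("cf1/r1", "cf1/r2"), List.of(n, n));
        System.out.println(log.toString().trim());
    }
}
```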

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059643#comment-13059643
 ] 

Jonathan Ellis commented on CASSANDRA-2816:
---

bq. May I change to

Sure.

bq. The system should protect the user from that

I'm not sure that in a p2p design we can posit an omniscient "the system".





[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059655#comment-13059655
 ] 

Terje Marthinussen commented on CASSANDRA-2816:
---

bq. I'm not sure that in a p2p design we can posit an omniscient "the system".

Is that a philosophical statement? :)

As Cassandra, at least for now, is a p2p network with fairly clearly defined 
boundaries, I will continue calling it a system for now :)

However, looking at it from the p2p viewpoint, the user potentially has no 
clue about where replicas are stored and, given this, it may be impossible for 
the user to issue repair manually on more than one node at a time without 
getting into trouble. Given a large enough p2p setup, it would also be 
non-trivial to schedule a complete repair without ending up with two or more 
repairs running on the same replica set.

Since Cassandra does not checkpoint the synchronization, it is forced to 
rescan everything on every repair; repairs easily take so long that you are 
forced to run them on several nodes at a time if you are going to finish 
repairing all nodes within 10 days...

Anyway, this is way outside the scope of this jira :)





[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059658#comment-13059658
 ] 

Terje Marthinussen commented on CASSANDRA-2816:
---

bq. I also noticed that in this case, we have RF=3. The node that is going 
somewhat crazy is number 6; however, during the repair it logs that it talks 
to, compares with, and streams data to nodes 4, 5, 7 and 8.

This is maybe correct. Node 7 will replicate to nodes 6 and 8, so 6 and 8 
would share data.

So, to be safe, even with this patch, only every 4th node can run repair at 
the same time if RF=3? But you still need to run repair on each of those 4 
nodes to make sure it is all repaired?





[jira] [Issue Comment Edited] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059658#comment-13059658
 ] 

Terje Marthinussen edited comment on CASSANDRA-2816 at 7/5/11 2:31 AM:
---

bq. I also noticed that in this case, we have RF=3. The node that is going 
somewhat crazy is number 6; however, during the repair it logs that it talks 
to, compares with, and streams data to nodes 4, 5, 7 and 8.

This is maybe correct. Node 7 will replicate to nodes 6 and 8, so 6 and 8 
would share data.

So, to be safe, even with this patch, only every 4th node can run repair at 
the same time if RF=3? But you still need to run repair on each of those 4 
nodes to make sure it is all repaired?

As for the comment I made earlier: to me it looks like, if the repair starts 
triggering transfers on a large scale, the files the node gets streamed in 
are not streamed out, but they may get compacted before the repair finishes, 
and I suspect the compacted file then gets streamed out and the repair just 
never finishes.

  was (Author: terjem):
bq. I also noticed that in this case, we have RF3. The node which is going 
somewhat crazy is number 6, however during the repair, it does log that it 
talks compares and streams data with node 4, 5, 7 and 8.

This is maybe correct. Node 7 will replicate to node 6 and 8 so 6 and 8 would 
share data.

So, to make things safe, even with this patch, every 4th node can run repair at 
the same time if RF=3?, but you still need to run repair on each of those 4 
nodes to make sure it is all repaired?
  