[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-07-04 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059334#comment-13059334
 ] 

Mck SembWever commented on CASSANDRA-2388:
--

{quote}2) If we ARE in that situation, the right solution would be to send 
the job to a TT whose local replica IS live, not to read the data from a 
nonlocal replica. How can we signal that?{quote}To /really/ solve this issue, 
can we do the following? 
In CFIF.getRangeMap(), take out of each range any endpoints that are not alive. 
A client connection already exists in this method. Filtering out dead 
endpoints wouldn't be difficult, and it would move tasks *to* the data, making 
use of replicas. This approach does need a new method in cassandra.thrift, e.g. 
{{list<string> describe_alive_nodes()}}
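For illustration only, a sketch of that filtering step: given the endpoints for each range and the set of live nodes (as the proposed, currently hypothetical describe_alive_nodes() would report), drop the dead ones. The types here are simplified stand-ins for what CFIF.getRangeMap() actually handles:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LiveEndpointFilter
{
    /**
     * Keep, for each range, only the endpoints the cluster reports as alive.
     * aliveNodes stands in for the result of the proposed describe_alive_nodes().
     */
    public static List<List<String>> filterDead(List<List<String>> rangeEndpoints, Set<String> aliveNodes)
    {
        List<List<String>> filtered = new ArrayList<List<String>>();
        for (List<String> endpoints : rangeEndpoints)
        {
            List<String> live = new ArrayList<String>();
            for (String endpoint : endpoints)
                if (aliveNodes.contains(endpoint))
                    live.add(endpoint);
            filtered.add(live);
        }
        return filtered;
    }
}
```

A range left with an empty endpoint list would then signal that no replica for that split is reachable at all.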

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2527) Add ability to snapshot data as input to hadoop jobs

2011-07-04 Thread Wojciech Meler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059356#comment-13059356
 ] 

Wojciech Meler commented on CASSANDRA-2527:
---

It would be great to have more generic client access to snapshot data. Maybe 
snapshots should be visible as new keyspaces? Or maybe we should throw away 
snapshots and start cloning keyspaces? If a cloned keyspace could be read-only, 
it would work out of the box :).

 Add ability to snapshot data as input to hadoop jobs
 

 Key: CASSANDRA-2527
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2527
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna
  Labels: hadoop

 It is desirable to have immutable inputs to hadoop jobs for the duration of 
 the job.  That way re-execution of individual tasks do not alter the output.  
 One way to accomplish this would be to snapshot the data that is used as 
 input to a job.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently

2011-07-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059362#comment-13059362
 ] 

Sylvain Lebresne commented on CASSANDRA-2851:
-

Why would it be OK for single-character inputs and not other odd-sized inputs? 
An odd-sized input doesn't (ever) correspond to a valid byte array, so I'd say 
either we always silently add a 0 to make it fit or we never do it. I am 
actually in favor of throwing an exception rather than coping with it 
silently, since it's more likely to indicate a user error than to be helpful 
(but maybe that addition of a '0' in front was there for a reason?).
I'll note that even though I can't imagine why people would generate odd-sized 
hex input, since it has been allowed so far there is a chance someone out there 
does it, and it would be a regression for them. So maybe we should target 1.0 
for the sake of making the minor upgrade as smooth as possible for everybody.

On the patch side, we must make sure every consumer of hexToBytes() handles the 
new exception (or we make it a NumberFormatException, but I don't think that is 
a good idea). For instance, at least BytesType.fromString() should catch the 
IllegalArgumentException and rethrow a MarshalException; otherwise CQL will 
fail messily on odd-sized inputs.
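As a sketch of the behaviour argued for above (hypothetical code, not the attached patch): a strict hexToBytes() that throws IllegalArgumentException on odd-length or non-hex input, which a caller like BytesType.fromString() could then catch and rethrow as a MarshalException:

```java
public class StrictHex
{
    /**
     * Strict hex decoding: rejects odd-length and non-hex input instead of
     * silently padding. Illustrative only; not the attached cassandra-2851.diff.
     */
    public static byte[] hexToBytes(String str)
    {
        if (str.length() % 2 == 1)
            throw new IllegalArgumentException("A hex string representing bytes must have an even length: " + str);
        byte[] bytes = new byte[str.length() / 2];
        for (int i = 0; i < bytes.length; i++)
        {
            // Character.digit returns -1 for anything outside [0-9a-fA-F]
            int hi = Character.digit(str.charAt(2 * i), 16);
            int lo = Character.digit(str.charAt(2 * i + 1), 16);
            if (hi == -1 || lo == -1)
                throw new IllegalArgumentException("Non-hex character in: " + str);
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return bytes;
    }
}
```

The catch-and-rethrow in BytesType.fromString() would then just wrap the IllegalArgumentException so CQL surfaces a MarshalException instead.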

 hex-to-bytes conversion accepts invalid inputs silently
 ---

 Key: CASSANDRA-2851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: cassandra-2851.diff


 FBUtilities.hexToBytes() has a minor bug - it copes with single-character 
 inputs by prepending "0", which is OK - but it does this for any input with 
 an odd number of characters, which is probably incorrect.
 {noformat}
 if (str.length() % 2 == 1)
     str = "0" + str;
 {noformat}
 Given 'fff' as an input, can we really assume that this should be '0fff'? 
 Isn't this just an error?
 Add the following to FBUtilitiesTest to demonstrate:
 {noformat}
 String[] badvalues = new String[]{"000", "fff"};

 for (int i = 0; i < badvalues.length; i++)
     try
     {
         FBUtilities.hexToBytes(badvalues[i]);
         fail("Invalid hex value accepted " + badvalues[i]);
     }
     catch (Exception e) {}
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2846) Changing replication_factor using update keyspace not working

2011-07-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059373#comment-13059373
 ] 

Jonas Borgström commented on CASSANDRA-2846:


Jonathan, thanks for your fast response. Your patch works for me.

 Changing replication_factor using update keyspace not working
 ---

 Key: CASSANDRA-2846
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.1
 Environment: A clean 0.8.1 install using the default configuration
Reporter: Jonas Borgström
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2846.txt


 Unless I've misunderstood the new way to do this with 0.8 I think update 
 keyspace is broken:
 {code}
 [default@unknown] create keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:1}];
 37f70d40-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test;
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 [default@unknown] update keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:2}];
 489fe220-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test; 
   
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 {code}
 Isn't the second describe keyspace supposed to say 
 "replication_factor:2"?
 Relevant bits from system.log:
 {code}
 Migration.java (line 116) Applying migration 
 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Test<rep 
 strategy:SimpleStrategy{}durable_writes: true> to Test<rep 
 strategy:SimpleStrategy{}durable_writes: true>
 UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual 
 operations
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2852) Cassandra CLI - Import Keyspace Definitions from File - Comments do partially interpret characters/commands

2011-07-04 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-2852:
---

Attachment: CASSANDRA-2852.patch

Can be applied on both the 0.7 and 0.8 branches.

 Cassandra CLI - Import Keyspace Definitions from File - Comments do 
 partially interpret characters/commands
 -

 Key: CASSANDRA-2852
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2852
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 0.7.0
 Environment: Win Vista 
Reporter: jens mueller
Assignee: Pavel Yaskevich
Priority: Trivial
 Fix For: 0.7.7, 0.8.2

 Attachments: CASSANDRA-2852.patch


 Hello, 
 using: bin/cassandra-cli -host localhost --file conf/schema-sample.txt
 with schema-sample.txt having contents like this:
 /* here are a lot of comments,
 like this sample create keyspace;
 and so on
 */
 will result in an error: 
 Line 1 = Syntax Error at Position 323: mismatched character 'EOF' 
 expecting '*'
 The cause is the "keyspace;" statement - the semicolon causes the error.
 However:
 Writing the word "keyspace;" with quotes does NOT lead to the error.
 So this works: 
 /* here are a lot of comments,
 like this sample create "keyspace";
 and so on
 */
 From my point of view this is an error. Everything between the start comment 
 /* and the end comment */ should be treated as a comment and not be 
 interpreted in any way. That's the definition of a comment: it is not 
 interpreted at all. 
 Otherwise this must be documented somewhere very prominently; as it is, this 
 behaviour leads to unnecessary time wasted searching for the cause, and it 
 makes commenting out statements much more cumbersome.
 Platform: Windows Vista
 thanks

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-04 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059401#comment-13059401
 ] 

Mck SembWever commented on CASSANDRA-1125:
--

bq. using KeyRange but with tokens (which Thrift also uses for start-exclusive)
This is my preference. I'll make a patch for it.

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[Cassandra Wiki] Update of ClientOptions by PriitKallas

2011-07-04 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ClientOptions page has been changed by PriitKallas:
http://wiki.apache.org/cassandra/ClientOptions?action=diff&rev1=131&rev2=132

Comment:
Added link to the new high-level PHP Cassandra Client Library

   * Ruby:
* Cassandra: http://github.com/fauna/cassandra
   * PHP:
+   * Cassandra PHP Client Library: 
https://github.com/kallaspriit/Cassandra-PHP-Client-Library
* phpcassa: http://github.com/thobbs/phpcassa
  
  == Older clients ==


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2850:


Attachment: 2850-v2.patch

Attaching a so-called v2 version that avoids the string object creation for
each byte by encoding each char separately. This version shows a 30% speedup
on the 10MB array conversion (and a ~15% speedup on the 1K array conversion)
compared to the version in the previous patch. It will also generate less
garbage.

I've also broadened the scope of this ticket because hexToBytes also needs
some love (actually even more so), and the v2 patch ships with an improved
version of hexToBytes. As it turns out, hexToBytes was really naive and was
calling substring() on every 2 characters, generating a lot of String objects.
On a micro-benchmark converting strings of 1000 characters, the attached
version shows a ~13x (!) speedup. It also generates much less garbage.

To add to what David said, let's note that those methods used not to matter
too much (they were used in non-performance-sensitive places, like debug/error
messages, or SSTable2json (though performance in those tools doesn't hurt)),
but they are now used by CQL for BytesType.
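For illustration, here is the kind of encoding the v2 description refers to (a sketch, not the attached 2850-v2.patch): each nibble is written straight into a pre-sized char[], so no StringBuilder resizing happens and no intermediate String is created per byte:

```java
public class FastBytesToHex
{
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /** Encode bytes to lowercase hex with one allocation for the output buffer. */
    public static String bytesToHex(byte[] bytes)
    {
        char[] out = new char[bytes.length * 2]; // pre-sized: no resizes, no appends
        for (int i = 0; i < bytes.length; i++)
        {
            out[2 * i] = DIGITS[(bytes[i] >> 4) & 0xf]; // high nibble
            out[2 * i + 1] = DIGITS[bytes[i] & 0xf];    // low nibble
        }
        return new String(out);
    }
}
```

The symmetric decode would similarly use Character.digit on char pairs rather than substring(), which is where the quoted ~13x win comes from.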

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently

2011-07-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059413#comment-13059413
 ] 

Jonathan Ellis commented on CASSANDRA-2851:
---

bq. maybe that addition of a '0' in front was there for a reason

I think it's there b/c of Integer.toHexString: "This value is converted to a 
string of ASCII digits in hexadecimal (base 16) with no extra leading 0s."

Our bytesToHex does pad... but only for single-digit results.  So if we fix 
hexToBytes we'll introduce an incompatibility. (Granted, a minor one.)
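The Integer.toHexString behaviour behind this is easy to demonstrate (plain JDK, nothing Cassandra-specific): it never pads, so a single-digit value comes back as one character, which is why an encoder built on it produces odd-length strings unless it pads each byte itself.

```java
public class HexPadding
{
    public static void main(String[] args)
    {
        System.out.println(Integer.toHexString(0x0f)); // prints "f" - no leading zero
        System.out.println(Integer.toHexString(0xff)); // prints "ff"
        // So {0x0f, 0xff} encoded byte-by-byte without per-byte padding
        // would yield "fff" - exactly the odd-length input under discussion.
    }
}
```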

 hex-to-bytes conversion accepts invalid inputs silently
 ---

 Key: CASSANDRA-2851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: cassandra-2851.diff


 FBUtilities.hexToBytes() has a minor bug - it copes with single-character 
 inputs by prepending "0", which is OK - but it does this for any input with 
 an odd number of characters, which is probably incorrect.
 {noformat}
 if (str.length() % 2 == 1)
     str = "0" + str;
 {noformat}
 Given 'fff' as an input, can we really assume that this should be '0fff'? 
 Isn't this just an error?
 Add the following to FBUtilitiesTest to demonstrate:
 {noformat}
 String[] badvalues = new String[]{"000", "fff"};

 for (int i = 0; i < badvalues.length; i++)
     try
     {
         FBUtilities.hexToBytes(badvalues[i]);
         fail("Invalid hex value accepted " + badvalues[i]);
     }
     catch (Exception e) {}
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[Cassandra Wiki] Update of ClientOptions06 by PriitKallas

2011-07-04 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ClientOptions06 page has been changed by PriitKallas:
http://wiki.apache.org/cassandra/ClientOptions06?action=diff&rev1=5&rev2=6

* Jassandra: http://code.google.com/p/jassandra/
* Kundera: http://code.google.com/p/kundera/
   * PHP :
+   * PHP Cassandra Client Library: 
http://github.com/kallaspriit/Cassandra-PHP-Client-Library
* Pandra: http://github.com/mjpearson/Pandra/tree/master
* PHP Cassa: http://github.com/hoan/phpcassa [port of pycassa to PHP]
   * Clojure :


svn commit: r1142647 - in /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra: db/ hadoop/ io/sstable/ streaming/

2011-07-04 Thread jbellis
Author: jbellis
Date: Mon Jul  4 13:02:05 2011
New Revision: 1142647

URL: http://svn.apache.org/viewvc?rev=1142647&view=rev
Log:
revert incomplete changes

Modified:

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ConfigHelper.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/io/sstable/IndexHelper.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/io/sstable/SSTableIdentityIterator.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/PendingFile.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/StreamInSession.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/StreamOut.java

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java?rev=1142647&r1=1142646&r2=1142647&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java
 Mon Jul  4 13:02:05 2011
@@ -130,12 +130,6 @@ public class ColumnFamilySerializer impl
 public void deserializeColumns(DataInput dis, ColumnFamily cf, boolean 
intern, boolean fromRemote) throws IOException
 {
 int size = dis.readInt();
-deserializeColumns(dis, cf, size, intern, fromRemote);
-}
-
-/* column count is already read from DataInput */
-public void deserializeColumns(DataInput dis, ColumnFamily cf, int size, 
boolean intern, boolean fromRemote) throws IOException
-{
 ColumnFamilyStore interner = intern ? 
Table.open(CFMetaData.getCF(cf.id()).left).getColumnFamilyStore(cf.id()) : null;
 for (int i = 0; i < size; ++i)
 {

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java?rev=1142647&r1=1142646&r2=1142647&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
 Mon Jul  4 13:02:05 2011
@@ -35,9 +35,10 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.db.IColumn;
-import org.apache.cassandra.dht.IPartitioner;
-import org.apache.cassandra.dht.Range;
-import org.apache.cassandra.thrift.*;
+import org.apache.cassandra.thrift.Cassandra;
+import org.apache.cassandra.thrift.InvalidRequestException;
+import org.apache.cassandra.thrift.TokenRange;
+import org.apache.cassandra.thrift.TBinaryProtocol;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.mapreduce.*;
 import org.apache.thrift.TException;
@@ -100,43 +101,11 @@ public class ColumnFamilyInputFormat ext
 
 try
 {
-KeyRange jobKeyRange = ConfigHelper.getInputKeyRange(conf);
-IPartitioner partitioner = null;
-Range jobRange = null;
-if (jobKeyRange != null)
-{
-partitioner = 
ConfigHelper.getPartitioner(context.getConfiguration());
-assert partitioner.preservesOrder() : 
"ConfigHelper.setInputKeyRange(..) can only be used with a order preserving 
paritioner";
-jobRange = new 
Range(partitioner.getToken(jobKeyRange.start_key),
- partitioner.getToken(jobKeyRange.end_key),
- partitioner);
-}
-
 List<Future<List<InputSplit>>> splitfutures = new 
ArrayList<Future<List<InputSplit>>>();
 for (TokenRange range : masterRangeNodes)
 {
-if (jobRange == null)
-{
 // for each range, pick a live owner and ask it to compute 
bite-sized splits
 splitfutures.add(executor.submit(new SplitCallable(range, 
conf)));
-}
-else
-{
-Range dhtRange = new 
Range(partitioner.getTokenFactory().fromString(range.start_token),
-   
partitioner.getTokenFactory().fromString(range.end_token),
-

[jira] [Updated] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-07-04 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-2388:
-

Attachment: CASSANDRA-2388-extended.patch

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059445#comment-13059445
 ] 

Terje Marthinussen commented on CASSANDRA-2816:
---

Things definitely seem to be improved overall, but weird things still happen.

So... 12 node cluster, this is maybe ugly, I know, but start repair on all of 
them.
Most nodes are fine, but one goes crazy. Disk use is now 3-4 times what it was 
before the repair started, and it does not seem to be done yet.

I really have no idea if this is the case, but I am getting the hunch that this 
node has ended up streaming out some of the data it is getting in. Would this 
be possible?


 Repair doesn't synchronize merkle tree creation properly
 

 Key: CASSANDRA-2816
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2816
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.2

 Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch


 Being a little slow, I just realized after having opened CASSANDRA-2811 and 
 CASSANDRA-2815 that there is a more general problem with repair.
 When a repair is started, it will send a number of merkle tree requests to 
 its neighbors as well as to itself, and it assumes for correctness that the 
 building of those trees will be started on every node at roughly the same 
 time (if not, we end up comparing data snapshots taken at different times and 
 will thus mistakenly repair a lot of useless data). This is bogus for many 
 reasons:
 * Because validation compaction runs on the same executor as other 
 compactions, the start of the validation on the different nodes is subject to 
 other compactions. 0.8 mitigates this in a way by being multi-threaded (and 
 thus there is less chance of being blocked a long time by a long-running 
 compaction), but the compaction executor being bounded, it's still a problem.
 * If you run a nodetool repair without arguments, it will repair every CF. 
 As a consequence it will generate lots of merkle tree requests, and all of 
 those requests will be issued at the same time. Because even in 0.8 the 
 compaction executor is bounded, some of those validations will end up being 
 queued behind the first ones. Even assuming that the different validations 
 are submitted in the same order on each node (which isn't guaranteed either), 
 there is no guarantee that on all nodes the first validation will take the 
 same time, hence desynchronizing the queued ones.
 Overall, it is important for the precision of repair that, for a given CF and 
 range (which is the unit at which trees are computed), we make sure that all 
 nodes will start the validation at the same time (or, since we can't do 
 magic, as close as possible).
 One (reasonably simple) proposition to fix this would be to have repair 
 schedule validation compactions across nodes one by one (i.e., one CF/range 
 at a time), waiting for all nodes to return their tree before submitting the 
 next request. Then on each node, we should make sure that the node will start 
 the validation compaction as soon as requested. For that, we probably want to 
 have a specific executor for validation compaction, and then either:
 * we fail the whole repair whenever one node is not able to execute the 
 validation compaction right away (because no thread is available right away), 
 or
 * we simply tell the user that if they start too many repairs in parallel, 
 they may start seeing some of those repairing more data than they should.
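The proposed scheduling can be sketched roughly as follows (an illustration only, not the attached patch; strings stand in for real merkle trees and node handles, and TreeRequester for the actual validation request):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SequentialTreeRequests
{
    /** Hypothetical stand-in for sending a validation request to one node. */
    public interface TreeRequester { String requestTree(String node, String cfRange); }

    /** Handle CF/ranges strictly one at a time; within one, ask all nodes in parallel. */
    public static List<String> requestAll(List<String> cfRanges, List<String> nodes, final TreeRequester requester)
    {
        ExecutorService executor = Executors.newFixedThreadPool(nodes.size());
        List<String> trees = new ArrayList<String>();
        try
        {
            for (final String cfRange : cfRanges)
            {
                List<Future<String>> futures = new ArrayList<Future<String>>();
                for (final String node : nodes)
                    futures.add(executor.submit(new Callable<String>()
                    {
                        public String call() { return requester.requestTree(node, cfRange); }
                    }));
                // block until every node has returned its tree for this CF/range
                // before submitting the requests for the next one
                for (Future<String> future : futures)
                {
                    try { trees.add(future.get()); }
                    catch (Exception e) { throw new RuntimeException(e); }
                }
            }
        }
        finally
        {
            executor.shutdown();
        }
        return trees;
    }
}
```

The point of the structure is only the barrier between CF/ranges: no node ever holds validation work for range N+1 while some node is still computing its tree for range N.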

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-04 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-1125:
-

Attachment: CASSANDRA-1125.patch

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch, 
 CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059459#comment-13059459
 ] 

Sylvain Lebresne commented on CASSANDRA-2816:
-

bq. So... 12 node cluster, this is maybe ugly, I know, but start repair on all 
of them.

Is it started on all of them? If so, this is kind of expected, in the sense 
that the patch assumes that each node does not do more than 2 repairs (for any 
column family) at the same time (this is configurable through the new 
concurrent_validators option, but it's probably better to stick to 2 and 
stagger the repairs). If you do more than that (that is, if you did repair on 
all nodes at the same time with RF > 2), then we're back to our old demons.

bq. I have really no idea if this is the case, but I am getting the hunch that 
this node has ended up streaming out some of the data it is getting in. Would 
this be possible?

Not really. That is, it could be that you create a merkle tree on some data 
and, once you start streaming, you're picking up data that was just streamed 
to you and wasn't there when computing the tree. This patch is supposed to fix 
this in part, but it can still happen if you do repairs in parallel on 
neighboring nodes. However, you shouldn't get into a situation where 2 nodes 
stream forever because they pick up what was just streamed to them, because 
what is streamed is determined at the very beginning of the streaming session.

So my first question would be: were all those repairs started in parallel? If 
yes, you shall not do this :). CASSANDRA-2606 and CASSANDRA-2610 are here to 
help make the repair of a full cluster much easier (and more efficient), but 
right now it's more about getting patches in one at a time.
If the repairs were started one at a time in a rolling fashion, then we do 
have an unknown problem somewhere.

 Repair doesn't synchronize merkle tree creation properly
 

 Key: CASSANDRA-2816
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2816
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.2

 Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch


 Being a little slow, I just realized after having opened CASSANDRA-2811 and 
 CASSANDRA-2815 that there is a more general problem with repair.
 When a repair is started, it sends a number of merkle tree requests to its 
 neighbors as well as to itself, and assumes, for correctness, that the 
 building of those trees starts on every node at roughly the same time (if 
 not, we end up comparing data snapshotted at different times and will thus 
 mistakenly repair a lot of data needlessly). This is bogus for several reasons:
 * Because validation compaction runs on the same executor as other 
 compactions, the start of the validation on the different nodes is subject to 
 other compactions. 0.8 mitigates this somewhat by being multi-threaded (so 
 there is less chance of being blocked for a long time by a long-running 
 compaction), but since the compaction executor is bounded, it is still a 
 problem.
 * If you run nodetool repair without arguments, it repairs every CF. As a 
 consequence it generates lots of merkle tree requests, and all of those 
 requests are issued at the same time. Because the compaction executor is 
 bounded even in 0.8, some of those validations end up queued behind the first 
 ones. Even assuming the different validations are submitted in the same order 
 on each node (which isn't guaranteed either), there is no guarantee that the 
 first validation will take the same time on every node, hence desynchronizing 
 the queued ones.
 Overall, it is important for the precision of repair that, for a given CF and 
 range (the unit at which trees are computed), all nodes start the validation 
 at the same time (or, since we can't do magic, as close to it as possible).
 One (reasonably simple) proposal to fix this would be to have repair schedule 
 validation compactions across nodes one by one (i.e., one CF/range at a 
 time), waiting for all nodes to return their tree before submitting the next 
 request. Then, on each node, we should make sure the validation compaction 
 starts as soon as it is requested. For that, we probably want a dedicated 
 executor for validation compaction, and then:
 * either we fail the whole repair whenever one node is not able to execute 
 the validation compaction right away (because no threads are available),
 * or we simply tell the user that if they start too many repairs in 
 parallel, they may see some of them repairing more data than they should.
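The one-by-one scheduling idea above can be sketched roughly as follows. This is a hypothetical illustration, not Cassandra's actual AntiEntropyService API: `requestTree`, the unit strings, and the endpoint list are all stand-ins. The point is simply that the coordinator blocks on every node's tree for one CF/range before requesting the next.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch: request merkle trees for one CF/range unit at a time,
// waiting for every endpoint to answer before moving on to the next unit.
public class SequentialTreeRequests {
    // Stand-in for sending a tree request to one endpoint and awaiting its tree.
    static CompletableFuture<String> requestTree(ExecutorService pool, String unit, String endpoint) {
        return CompletableFuture.supplyAsync(() -> unit + "@" + endpoint, pool);
    }

    // Returns the units in the order their validations completed.
    public static List<String> repair(List<String> units, List<String> endpoints) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(endpoints.size());
        List<String> done = new ArrayList<>();
        try {
            for (String unit : units) {
                List<CompletableFuture<String>> trees = new ArrayList<>();
                for (String ep : endpoints)
                    trees.add(requestTree(pool, unit, ep));
                // Block until all replicas have returned their tree for this
                // unit before issuing requests for the next one.
                CompletableFuture.allOf(trees.toArray(new CompletableFuture[0])).get();
                done.add(unit);
            }
        } finally {
            pool.shutdown();
        }
        return done;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(repair(List.of("ks1/cf1:(0,100]", "ks1/cf2:(0,100]"),
                                  List.of("10.0.0.1", "10.0.0.2")));
    }
}
```

A real implementation would also need the per-node dedicated validation executor described above, so that a request received by a replica is not queued behind ordinary compactions.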

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

svn commit: r1142690 - in /cassandra/trunk: ./ src/java/org/apache/cassandra/db/ src/java/org/apache/cassandra/db/filter/ test/unit/org/apache/cassandra/db/ test/unit/org/apache/cassandra/db/compactio

2011-07-04 Thread slebresne
Author: slebresne
Date: Mon Jul  4 14:36:11 2011
New Revision: 1142690

URL: http://svn.apache.org/viewvc?rev=1142690&view=rev
Log:
Reset CF and SC deletion time after gc_grace
patch by slebresne; reviewed by jbellis for CASSANDRA-2317

Added:

cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
Modified:
cassandra/trunk/CHANGES.txt
cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamily.java
cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java
cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
cassandra/trunk/src/java/org/apache/cassandra/db/IColumnContainer.java
cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java
cassandra/trunk/src/java/org/apache/cassandra/db/SuperColumn.java
cassandra/trunk/src/java/org/apache/cassandra/db/filter/QueryFilter.java
cassandra/trunk/test/unit/org/apache/cassandra/db/RowTest.java

cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java
cassandra/trunk/test/unit/org/apache/cassandra/service/RowResolverTest.java

Modified: cassandra/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1142690&r1=1142689&r2=1142690&view=diff
==
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Mon Jul  4 14:36:11 2011
@@ -10,6 +10,7 @@
  * clean up tmp files after failed compaction (CASSANDRA-2468)
  * restrict repair streaming to specific columnfamilies (CASSANDRA-2280)
  * don't bother persisting columns shadowed by a row tombstone (CASSANDRA-2589)
+ * reset CF and SC deletion times after gc_grace (CASSANDRA-2317)
 
 
 0.8.2

Added: 
cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java?rev=1142690&view=auto
==
--- 
cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java 
(added)
+++ 
cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java 
Mon Jul  4 14:36:11 2011
@@ -0,0 +1,212 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db;
+
+import java.nio.ByteBuffer;
+import java.security.MessageDigest;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.Map;
+import java.util.SortedSet;
+import java.util.concurrent.ConcurrentSkipListMap;
+import java.util.concurrent.atomic.AtomicReference;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.db.filter.QueryPath;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.io.ICompactSerializer2;
+import org.apache.cassandra.io.util.IIterableColumns;
+import org.apache.cassandra.utils.FBUtilities;
+
+public abstract class AbstractColumnContainer implements IColumnContainer, 
IIterableColumns
+{
+private static Logger logger = 
LoggerFactory.getLogger(AbstractColumnContainer.class);
+
+protected final ConcurrentSkipListMap<ByteBuffer, IColumn> columns;
+protected final AtomicReference<DeletionInfo> deletionInfo = new 
AtomicReference<DeletionInfo>(new DeletionInfo());
+
+protected AbstractColumnContainer(ConcurrentSkipListMap<ByteBuffer, 
IColumn> columns)
+{
+this.columns = columns;
+}
+
+@Deprecated // TODO this is a hack to set initial value outside constructor
+public void delete(int localtime, long timestamp)
+{
+deletionInfo.set(new DeletionInfo(timestamp, localtime));
+}
+
+public void delete(AbstractColumnContainer cc2)
+{
+// Keeping deletion info for max markedForDeleteAt value
+DeletionInfo current;
+DeletionInfo cc2Info = cc2.deletionInfo.get();
+while (true)
+{
+ current = deletionInfo.get();
+ if (current.markedForDeleteAt >= cc2Info.markedForDeleteAt || 
deletionInfo.compareAndSet(current, cc2Info))
+ break;

[jira] [Commented] (CASSANDRA-2317) Column family deletion time is not always reset after gc_grace

2011-07-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059460#comment-13059460
 ] 

Sylvain Lebresne commented on CASSANDRA-2317:
-

Committed to trunk (as I agree this should really go there).

bq. doesn't this mean that for a CF w/ no tombstone, we create a new 
deletioninfo every call to maybeReset?

You're right; I've added a current.localDeletionTime == Integer.MIN_VALUE check to 
the condition to escape early in that case.
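The early escape can be illustrated with a minimal sketch; `DeletionInfo` here is a hypothetical stand-in with just the two fields discussed, and `maybeReset` is an illustration of the logic, not the real Cassandra method:

```java
public class DeletionInfoReset {
    // Minimal stand-in for Cassandra's DeletionInfo (hypothetical fields).
    static final class DeletionInfo {
        final long markedForDeleteAt;
        final int localDeletionTime;
        DeletionInfo() { this(Long.MIN_VALUE, Integer.MIN_VALUE); } // "live": no tombstone
        DeletionInfo(long markedForDeleteAt, int localDeletionTime) {
            this.markedForDeleteAt = markedForDeleteAt;
            this.localDeletionTime = localDeletionTime;
        }
    }

    // Escape early when there is no tombstone, so we don't allocate a fresh
    // DeletionInfo on every call; otherwise reset once gc_grace has passed.
    static DeletionInfo maybeReset(DeletionInfo current, int gcBefore) {
        if (current.localDeletionTime == Integer.MIN_VALUE)
            return current;               // no tombstone at all: nothing to reset
        if (current.localDeletionTime < gcBefore)
            return new DeletionInfo();    // tombstone older than gc_grace: reset
        return current;                   // tombstone still within gc_grace
    }
}
```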

 Column family deletion time is not always reset after gc_grace
 

 Key: CASSANDRA-2317
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2317
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.6
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.0

 Attachments: 
 0001-Add-AbstractColumnContainer-to-factor-common-parts-o.patch, 
 0002-Add-unit-test.patch, 
 0003-Reset-CF-and-SC-deletion-time-after-compaction.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Follow up of CASSANDRA-2305.
 Reproducible (thanks to Jeffrey Wang) by: 
 Create a CF with gc_grace_seconds = 0 and no row cache.
 Insert row X, col A with timestamp 0.
 Insert row X, col B with timestamp 2.
 Remove row X with timestamp 1 (expect col A to disappear, col B to stay).
 Wait 1 second.
 Force flush and compaction.
 Insert row X, col A with timestamp 0.
 Read row X, col A (see nothing).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




buildbot success in ASF Buildbot on cassandra-trunk

2011-07-04 Thread buildbot
The Buildbot has detected a restored build on builder cassandra-trunk while 
building ASF Buildbot.
Full details are available at:
 http://ci.apache.org/builders/cassandra-trunk/builds/1407

Buildbot URL: http://ci.apache.org/

Buildslave for this Build: isis_ubuntu

Build Reason: scheduler
Build Source Stamp: [branch cassandra/trunk] 1142690
Blamelist: slebresne

Build succeeded!

sincerely,
 -The Buildbot



[jira] [Commented] (CASSANDRA-2317) Column family deletion time is not always reset after gc_grace

2011-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059477#comment-13059477
 ] 

Hudson commented on CASSANDRA-2317:
---

Integrated in Cassandra #948 (See 
[https://builds.apache.org/job/Cassandra/948/])
Reset CF and SC deletion time after gc_grace
patch by slebresne; reviewed by jbellis for CASSANDRA-2317

slebresne : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1142690
Files : 
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamily.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/QueryFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/SuperColumn.java
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/test/unit/org/apache/cassandra/service/RowResolverTest.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/RowTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/IColumnContainer.java
* 
/cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java


 Column family deletion time is not always reset after gc_grace
 

 Key: CASSANDRA-2317
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2317
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.6
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.0

 Attachments: 
 0001-Add-AbstractColumnContainer-to-factor-common-parts-o.patch, 
 0002-Add-unit-test.patch, 
 0003-Reset-CF-and-SC-deletion-time-after-compaction.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Follow up of CASSANDRA-2305.
 Reproducible (thanks to Jeffrey Wang) by: 
 Create a CF with gc_grace_seconds = 0 and no row cache.
 Insert row X, col A with timestamp 0.
 Insert row X, col B with timestamp 2.
 Remove row X with timestamp 1 (expect col A to disappear, col B to stay).
 Wait 1 second.
 Force flush and compaction.
 Insert row X, col A with timestamp 0.
 Read row X, col A (see nothing).





[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently

2011-07-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059484#comment-13059484
 ] 

Sylvain Lebresne commented on CASSANDRA-2851:
-

bq. Our bytesToHex does pad... but only for single-digit results. So if we fix 
hexToBytes we'll introduce an incompatibility. (Granted, a minor one.)

I don't understand. There is no such thing as padding when you convert a byte 
array to hex (Integer.toHexString returns only the right number of 
hexadecimal digits because it has no reason to do otherwise, but that's an 
implementation detail of bytesToHex). A byte is always 8 bits, never 4, and the 
output of bytesToHex will *always* have an even number of characters (as it 
should). Our hexToBytes just happens to semi-randomly add a "0" in front to 
turn buggy input with an odd number of characters into an even one, on the 
off chance that a client used the (stupid) optimization of removing at most one 
leading 0 to save some space or something. In my opinion, it would be better to 
simply refuse odd-sized input, because it is more likely a truncated input 
(and people using stupid clients should fix them, though I'm OK with saying 
that we'll only force them to fix it on a major upgrade).

 hex-to-bytes conversion accepts invalid inputs silently
 ---

 Key: CASSANDRA-2851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: cassandra-2851.diff


 FBUtilities.hexToBytes() has a minor bug - it copes with single-character 
 inputs by prepending "0", which is OK - but it does this for any input with 
 an odd number of characters, which is probably incorrect.
 {noformat}
 if (str.length() % 2 == 1)
     str = "0" + str;
 {noformat}
 Given 'fff' as an input, can we really assume that this should be '0fff'? 
 Isn't this just an error?
 Add the following to FBUtilitiesTest to demonstrate:
 {noformat}
 String[] badvalues = new String[]{"000", "fff"};

 for (int i = 0; i < badvalues.length; i++)
     try
     {
         FBUtilities.hexToBytes(badvalues[i]);
         fail("Invalid hex value accepted " + badvalues[i]);
     } catch (Exception e){}
 {noformat}





[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059485#comment-13059485
 ] 

Terje Marthinussen commented on CASSANDRA-2816:
---

Cool!

Then you confirmed what I have sort of believed for a while, though my 
understanding of the code has been a bit in conflict with
http://wiki.apache.org/cassandra/Operations
which says:
"It is safe to run repair against multiple machines at the same time, but to 
minimize the impact on your application workload it is recommended to wait for 
it to complete on one node before invoking it against the next."

I have always read that as: if you have the HW, go for it!

May I change it to:
"It is safe to run repair against multiple machines at the same time. However, 
to minimize the amount of data transferred during a repair, careful 
synchronization is required between the nodes taking part in the repair. 

This is difficult to do if nodes holding replicas of the same data run repair 
at the same time, and doing so can in extreme cases generate excessive 
transfers of data. 

Improvements are being worked on, but for now, avoid scheduling repair on 
several nodes with replicas of the same data at the same time."



 Repair doesn't synchronize merkle tree creation properly
 

 Key: CASSANDRA-2816
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2816
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.2

 Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch


 Being a little slow, I just realized after having opened CASSANDRA-2811 and 
 CASSANDRA-2815 that there is a more general problem with repair.
 When a repair is started, it sends a number of merkle tree requests to its 
 neighbors as well as to itself, and assumes, for correctness, that the 
 building of those trees starts on every node at roughly the same time (if 
 not, we end up comparing data snapshotted at different times and will thus 
 mistakenly repair a lot of data needlessly). This is bogus for several reasons:
 * Because validation compaction runs on the same executor as other 
 compactions, the start of the validation on the different nodes is subject to 
 other compactions. 0.8 mitigates this somewhat by being multi-threaded (so 
 there is less chance of being blocked for a long time by a long-running 
 compaction), but since the compaction executor is bounded, it is still a 
 problem.
 * If you run nodetool repair without arguments, it repairs every CF. As a 
 consequence it generates lots of merkle tree requests, and all of those 
 requests are issued at the same time. Because the compaction executor is 
 bounded even in 0.8, some of those validations end up queued behind the first 
 ones. Even assuming the different validations are submitted in the same order 
 on each node (which isn't guaranteed either), there is no guarantee that the 
 first validation will take the same time on every node, hence desynchronizing 
 the queued ones.
 Overall, it is important for the precision of repair that, for a given CF and 
 range (the unit at which trees are computed), all nodes start the validation 
 at the same time (or, since we can't do magic, as close to it as possible).
 One (reasonably simple) proposal to fix this would be to have repair schedule 
 validation compactions across nodes one by one (i.e., one CF/range at a 
 time), waiting for all nodes to return their tree before submitting the next 
 request. Then, on each node, we should make sure the validation compaction 
 starts as soon as it is requested. For that, we probably want a dedicated 
 executor for validation compaction, and then:
 * either we fail the whole repair whenever one node is not able to execute 
 the validation compaction right away (because no threads are available),
 * or we simply tell the user that if they start too many repairs in 
 parallel, they may see some of them repairing more data than they should.





[jira] [Created] (CASSANDRA-2855) Add hadoop support option to skip rows with empty columns

2011-07-04 Thread Jeremy Hanna (JIRA)
Add hadoop support option to skip rows with empty columns
-

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna


We have been finding that range ghosts appear in results from Hadoop via Pig.  
This could also happen if rows don't have data for the slice predicate that is 
given.  This leads to having to do a painful amount of defensive checking on 
the Pig side, especially in the case of range ghosts.

We would like to add an option to skip rows that have no column values in them.  
That functionality existed before in core Cassandra but was removed because of 
the performance penalty of that checking.  However, Hadoop support in the 
RecordReader is batch oriented anyway, so individual row reading performance 
isn't as much of an issue.  Also, we would make it an optional config parameter 
for each job, so people wouldn't have to incur that penalty if they are 
confident that there won't be empty rows or they don't care.

It could be a parameter cassandra.skip.empty.rows taking true/false.
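A minimal sketch of the proposed filtering is below, assuming a hypothetical `Row` type and a raw iterator standing in for the Hadoop RecordReader internals; the config key `cassandra.skip.empty.rows` is the one proposed above, but nothing else here is Cassandra's actual API:

```java
import java.util.*;

// Sketch of skipping rows with no column values in a record-reader-style loop.
// Row and the raw iterator are hypothetical stand-ins for RecordReader internals.
public class SkipEmptyRows {
    record Row(String key, Map<String, byte[]> columns) {}

    private final Iterator<Row> raw;
    private final boolean skipEmpty; // would come from conf.getBoolean("cassandra.skip.empty.rows", false)

    SkipEmptyRows(Iterator<Row> raw, boolean skipEmpty) {
        this.raw = raw;
        this.skipEmpty = skipEmpty;
    }

    // Returns the next row, skipping range ghosts / empty rows when configured,
    // or null when the input is exhausted.
    Row nextRow() {
        while (raw.hasNext()) {
            Row r = raw.next();
            if (skipEmpty && r.columns().isEmpty())
                continue; // range ghost, or no data for the slice predicate
            return r;
        }
        return null;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("ghost", Map.of()),
            new Row("real", Map.of("col", new byte[]{1})));
        SkipEmptyRows reader = new SkipEmptyRows(rows.iterator(), true);
        System.out.println(reader.nextRow().key());
    }
}
```

Since the RecordReader already fetches rows in batches, the extra emptiness check per row is cheap relative to the batch round trip, which is the argument made above.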





[Cassandra Wiki] Update of InstallThrift by Joe Stein

2011-07-04 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The InstallThrift page has been changed by Joe Stein:
http://wiki.apache.org/cassandra/InstallThrift?action=diff&rev1=13&rev2=14

  '''NOTE:''' If you arrived here for the purpose of writing your first 
application, please consider using a [[ClientOptions|higher-level client]] 
instead of thrift directly.
  
- [[http://incubator.apache.org/thrift|Thrift]] historically did not have 
tagged releases and Cassandra used trunk revisions of it. As of Cassandra 0.7, 
Thrift 0.5 is used. For Cassandra 0.6, you have to use the matching version of 
Thrift. Under such circumstances, installing thrift is a bit of a bitch.  We 
are sorry about that, but we don't know of a better way to support a vast 
number of clients mostly automagically.
+ [[http://thrift.apache.org/|Thrift]] historically did not have tagged 
releases and Cassandra used trunk revisions of it; however, as of Cassandra 0.8, 
Thrift 0.6 is used and available for 
[[http://thrift.apache.org/download/|download]].  With Cassandra 0.7, Thrift 
0.5 is used. For Cassandra 0.6, you have to use the matching version of Thrift. 
Under such circumstances, installing thrift is a bit of a bitch.  We are sorry 
about that, but we don't know of a better way to support a vast number of 
clients mostly automagically.
  
+ If installing Thrift 0.6 on a Mac for use with Cassandra 0.8 and you get an 
error building the 'thrift.protocol.fastbinary' extension during `make`, then 
you might need to work around https://issues.apache.org/jira/browse/THRIFT-1143 
by going to thrift-0.6.1/lib/py and running `sudo ARCHFLAGS="-arch x86_64" 
python setup.py install`
+ 
- Important note: you need to install the svn revision of thrift that matches 
the revision that your version of Cassandra uses (if not using 0.7 with Thrift 
0.5). This can be found in the Cassandra Home/lib directory - e.g. 
`libthrift-917130.jar` means that version of Cassandra uses svn revision 917130 
of thrift.
+ Important note: If using Cassandra 0.6 then you need to install the svn 
revision of thrift that matches the revision that your version of Cassandra 
uses (if not using 0.8 with Thrift 0.6 nor 0.7 with Thrift 0.5). This can be 
found in the Cassandra Home/lib directory - e.g. `libthrift-917130.jar` means 
that version of Cassandra uses svn revision 917130 of thrift.
  
   1. `aptitude install libboost-dev python-dev autoconf automake pkg-config 
make libtool flex bison build-essential` (or the equivalent on your system) 
(assumes you are interested in building for python; omit python-dev otherwise)
   1. Grab the thrift source with the revision that your version of Cassandra 
uses: e.g. `svn co -r 917130 http://svn.apache.org/repos/asf/thrift/trunk 
thrift`


[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently

2011-07-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059500#comment-13059500
 ] 

Jonathan Ellis commented on CASSANDRA-2851:
---

You're right, I was misreading how we were using Integer.toHexString.

 hex-to-bytes conversion accepts invalid inputs silently
 ---

 Key: CASSANDRA-2851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: cassandra-2851.diff


 FBUtilities.hexToBytes() has a minor bug - it copes with single-character 
 inputs by prepending "0", which is OK - but it does this for any input with 
 an odd number of characters, which is probably incorrect.
 {noformat}
 if (str.length() % 2 == 1)
     str = "0" + str;
 {noformat}
 Given 'fff' as an input, can we really assume that this should be '0fff'? 
 Isn't this just an error?
 Add the following to FBUtilitiesTest to demonstrate:
 {noformat}
 String[] badvalues = new String[]{"000", "fff"};

 for (int i = 0; i < badvalues.length; i++)
     try
     {
         FBUtilities.hexToBytes(badvalues[i]);
         fail("Invalid hex value accepted " + badvalues[i]);
     } catch (Exception e){}
 {noformat}





svn commit: r1142725 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/cli/CliClient.java

2011-07-04 Thread jbellis
Author: jbellis
Date: Mon Jul  4 16:20:14 2011
New Revision: 1142725

URL: http://svn.apache.org/viewvc?rev=1142725&view=rev
Log:
fix CLI perpetuating obsolete KsDef.replication_factor
patch by jbellis; tested by Jonas Borgström for CASSANDRA-2846

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1142725&r1=1142724&r2=1142725&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Mon Jul  4 16:20:14 2011
@@ -13,6 +13,7 @@
  * Correctly set default for replicate_on_write (CASSANDRA-2835)
  * improve nodetool compactionstats formatting (CASSANDRA-2844)
  * fix index-building status display (CASSANDRA-2853)
+ * fix CLI perpetuating obsolete KsDef.replication_factor (CASSANDRA-2846)
 
 
 0.8.1

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java?rev=1142725&r1=1142724&r2=1142725&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java
 Mon Jul  4 16:20:14 2011
@@ -1072,7 +1072,10 @@ public class CliClient
 private KsDef updateKsDefAttributes(Tree statement, KsDef ksDefToUpdate)
 {
 KsDef ksDef = new KsDef(ksDefToUpdate);
-
+// server helpfully sets deprecated replication factor when it sends a 
KsDef back, for older clients.
+// we need to unset that on the new KsDef we create to avoid being 
treated as a legacy client in return.
+ksDef.unsetReplication_factor();
+
 // removing all column definitions - thrift system_update_keyspace 
method requires that 
 ksDef.setCf_defs(new LinkedList<CfDef>());
 




svn commit: r1142727 - in /cassandra/branches/cassandra-0.7: CHANGES.txt src/java/org/apache/cassandra/cli/CliMain.java

2011-07-04 Thread jbellis
Author: jbellis
Date: Mon Jul  4 16:22:12 2011
New Revision: 1142727

URL: http://svn.apache.org/viewvc?rev=1142727&view=rev
Log:
improve cli treatment of multiline comments
patch by pyaskevich; reviewed by jbellis for CASSANDRA-2852

Modified:
cassandra/branches/cassandra-0.7/CHANGES.txt

cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java

Modified: cassandra/branches/cassandra-0.7/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/CHANGES.txt?rev=1142727&r1=1142726&r2=1142727&view=diff
==
--- cassandra/branches/cassandra-0.7/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.7/CHANGES.txt Mon Jul  4 16:22:12 2011
@@ -31,6 +31,7 @@
(CASSANDRA-2841)
  * allow deleting a row and updating indexed columns in it in the
same mutation (CASSANDRA-2773)
+ * improve cli treatment of multiline comments (CASSANDRA-2852)
 
 
 0.7.6

Modified: 
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java?rev=1142727&r1=1142726&r2=1142727&view=diff
==
--- 
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java 
(original)
+++ 
cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java 
Mon Jul  4 16:22:12 2011
@@ -365,6 +365,8 @@ public class CliMain
 String line = "";
 String currentStatement = "";
 
+boolean commentedBlock = false;
+
 while ((line = reader.readLine()) != null)
 {
 line = line.trim();
@@ -373,6 +375,18 @@ public class CliMain
 if (line.isEmpty() || line.startsWith("--"))
 continue;
 
+if (line.startsWith("/*"))
+commentedBlock = true;
+
+if (line.startsWith("*/") || line.endsWith("*/"))
+{
+commentedBlock = false;
+continue;
+}
+
+if (commentedBlock) // skip commented lines
+continue;
+
 currentStatement += line;
 
 if (line.endsWith(";"))




svn commit: r1142729 - in /cassandra/branches/cassandra-0.8: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/cli/ test/unit/org/apache/cassandra/db/

2011-07-04 Thread jbellis
Author: jbellis
Date: Mon Jul  4 16:23:36 2011
New Revision: 1142729

URL: http://svn.apache.org/viewvc?rev=1142729&view=rev
Log:
merge from 0.7

Modified:
cassandra/branches/cassandra-0.8/   (props changed)
cassandra/branches/cassandra-0.8/CHANGES.txt
cassandra/branches/cassandra-0.8/contrib/   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
   (props changed)

cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
   (props changed)

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliMain.java

cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java

Propchange: cassandra/branches/cassandra-0.8/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Mon Jul  4 16:23:36 2011
@@ -1,5 +1,5 @@
 
/cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291
-/cassandra/branches/cassandra-0.7:1026516-1140567,1140928,1141129,1141213,1141217
+/cassandra/branches/cassandra-0.7:1026516-1142727
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
 /cassandra/branches/cassandra-0.8:1090934-1125013,1125041
 /cassandra/branches/cassandra-0.8.0:1125021-1130369

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1142729&r1=1142728&r2=1142729&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Mon Jul  4 16:23:36 2011
@@ -14,6 +14,7 @@
  * improve nodetool compactionstats formatting (CASSANDRA-2844)
  * fix index-building status display (CASSANDRA-2853)
  * fix CLI perpetuating obsolete KsDef.replication_factor (CASSANDRA-2846)
+ * improve cli treatment of multiline comments (CASSANDRA-2852)
 
 
 0.8.1

Propchange: cassandra/branches/cassandra-0.8/contrib/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Mon Jul  4 16:23:36 2011
@@ -1,5 +1,5 @@
 
/cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009
-/cassandra/branches/cassandra-0.7/contrib:1026516-1140567,1140928,1141129,1141213,1141217
+/cassandra/branches/cassandra-0.7/contrib:1026516-1142727
 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654
 /cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125041
 /cassandra/branches/cassandra-0.8.0/contrib:1125021-1130369

Propchange: 
cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Mon Jul  4 16:23:36 2011
@@ -1,5 +1,5 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
-/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1140567,1140928,1141129,1141213,1141217
+/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1142727
 
/cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654
 
/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125041
 
/cassandra/branches/cassandra-0.8.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1125021-1130369

Propchange: 
cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Mon Jul  4 16:23:36 2011
@@ -1,5 +1,5 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
-/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1140567,1140928,1141129,1141213,1141217
+/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1142727
 

[Cassandra Wiki] Update of FAQ by TylerHobbs

2011-07-04 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The FAQ page has been changed by TylerHobbs:
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=122&rev2=123

Comment:
Add CassandraClusterAdmin to the list of GUI admins

  <<Anchor(gui)>>
  
  == Is there a GUI admin tool for Cassandra? ==
- The closest is [[http://github.com/driftx/chiton|chiton]], a GTK data browser.
+  * [[http://github.com/driftx/chiton|chiton]], a GTK data browser.
- 
- Another java UI http://code.google.com/p/cassandra-gui, a Swing data browser.
+  * [[http://code.google.com/p/cassandra-gui|cassandra-gui]], a Swing data browser.
+  * [[https://github.com/sebgiroux/Cassandra-Cluster-Admin|Cassandra Cluster Admin]], a PHP-based web UI.
  
  <<Anchor(a_long_is_exactly_8_bytes)>>
  


[jira] [Commented] (CASSANDRA-2852) Cassandra CLI - Import Keyspace Definitions from File - Comments do partially interpret characters/commands

2011-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059513#comment-13059513
 ] 

Hudson commented on CASSANDRA-2852:
---

Integrated in Cassandra-0.7 #520 (See 
[https://builds.apache.org/job/Cassandra-0.7/520/])
improve cli treatment of multiline comments
patch by pyaskevich; reviewed by jbellis for CASSANDRA-2852

jbellis : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1142727
Files : 
* /cassandra/branches/cassandra-0.7/CHANGES.txt
* 
/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java


 Cassandra CLI - Import Keyspace Definitions from File - Comments do 
 partially interpret characters/commands
 -

 Key: CASSANDRA-2852
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2852
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 0.7.0
 Environment: Win Vista 
Reporter: jens mueller
Assignee: Pavel Yaskevich
Priority: Trivial
 Fix For: 0.7.7, 0.8.2

 Attachments: CASSANDRA-2852.patch


 Hello, 
 using: bin/cassandra-cli -host localhost --file conf/schema-sample.txt
 with schema-sample.txt having contents like this:
 /* here are a lot of comments,
 like this sample create keyspace;
 and so on
 */
 Will result in an error: 
 Line 1 => Syntax Error at Position 323: mismatched character '<EOF>' 
 expecting '*'
 The cause is the "keyspace;" statement => the semicolon ";" causes the error.
 However:
 Writing the word "keyspace;" with quotes does NOT lead to the error.
 So this works: 
 /* here are a lot of comments,
 like this sample create "keyspace;"
 and so on
 */
 From my point of view this is an error. Everything between the start comment 
 /* and the end comment */ should be treated as a comment and not be 
 interpreted in any way. That's the definition of a comment: it is not 
 interpreted at all. 
 Otherwise this must be documented somewhere very prominently, or it will 
 lead to unnecessary time wasted searching for this odd behaviour. It also 
 makes commenting out statements much more cumbersome.
 Platform: Windows Vista
 thanks

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2846) Changing replication_factor using update keyspace not working

2011-07-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059518#comment-13059518
 ] 

Hudson commented on CASSANDRA-2846:
---

Integrated in Cassandra-0.8 #204 (See 
[https://builds.apache.org/job/Cassandra-0.8/204/])
fix CLI perpetuating obsolete KsDef.replication_factor
patch by jbellis; tested by Jonas Borgström for CASSANDRA-2846

jbellis : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1142725
Files : 
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java
* /cassandra/branches/cassandra-0.8/CHANGES.txt


 Changing replication_factor using update keyspace not working
 ---

 Key: CASSANDRA-2846
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.1
 Environment: A clean 0.8.1 install using the default configuration
Reporter: Jonas Borgström
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2846.txt


 Unless I've misunderstood the new way to do this with 0.8, I think "update 
 keyspace" is broken:
 {code}
 [default@unknown] create keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:1}];
 37f70d40-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test;
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 [default@unknown] update keyspace Test with placement_strategy = 
 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
 [{replication_factor:2}];
 489fe220-a3e9-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@unknown] describe keyspace Test; 
   
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 {code}
 Isn't the second "describe keyspace" supposed to say 
 "replication_factor:2"?
 Relevant bits from system.log:
 {code}
 Migration.java (line 116) Applying migration 
 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep 
 strategy:SimpleStrategy{}durable_writes: true to Testrep 
 strategy:SimpleStrategy{}durable_writes: true
 UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual 
 operations
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059526#comment-13059526
 ] 

David Allsopp commented on CASSANDRA-2851:
--

The origin of the current behaviour is CASSANDRA-1411 
https://issues.apache.org/jira/browse/CASSANDRA-1411 if that helps...



 hex-to-bytes conversion accepts invalid inputs silently
 ---

 Key: CASSANDRA-2851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: cassandra-2851.diff


 FBUtilities.hexToBytes() has a minor bug - it copes with single-character 
 inputs by prepending "0", which is OK - but it does this for any input with 
 an odd number of characters, which is probably incorrect.
 {noformat}
 if (str.length() % 2 == 1)
     str = "0" + str;
 {noformat}
 Given 'fff' as an input, can we really assume that this should be '0fff'? 
 Isn't this just an error?
 Add the following to FBUtilitiesTest to demonstrate:
 {noformat}
 String[] badvalues = new String[]{"000", "fff"};

 for (int i = 0; i < badvalues.length; i++)
     try
     {
         FBUtilities.hexToBytes(badvalues[i]);
         fail("Invalid hex value accepted " + badvalues[i]);
     } catch (Exception e) {}
 {noformat}
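 A strict decoder along the lines argued for above could look like the
 following sketch. This is hypothetical, not the attached cassandra-2851.diff;
 the class name StrictHex is illustrative only.

```java
import java.util.Arrays;

// Hypothetical strict hex decoder: rejects odd-length and non-hex input
// instead of silently prepending "0". Sketch only, not the attached patch.
public class StrictHex {
    public static byte[] hexToBytes(String str) {
        if (str.length() % 2 == 1)
            throw new NumberFormatException("Odd-length hex string: " + str);
        byte[] out = new byte[str.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(str.charAt(2 * i), 16);
            int lo = Character.digit(str.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0)
                throw new NumberFormatException("Non-hex character in: " + str);
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(hexToBytes("cafe")));
        // "fff" would now fail fast rather than decoding as if it were "0fff"
    }
}
```

 With this variant the badvalues test above passes, since both "000" and
 "fff" have odd length and are rejected.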

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-04 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1125:
--

Attachment: 1125-v3.txt

v3 makes the KeyRange an implementation detail (setInputRange just takes 
Strings for start and end) and fixes a reference to the key fields in CFIF.

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, 
 CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently

2011-07-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059582#comment-13059582
 ] 

Jonathan Ellis commented on CASSANDRA-2851:
---

Good point, David.

Sounds like the problem is thinking of this as a generic hex conversion 
function, rather than as hex that specifically represents bytes.

 hex-to-bytes conversion accepts invalid inputs silently
 ---

 Key: CASSANDRA-2851
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: cassandra-2851.diff


 FBUtilities.hexToBytes() has a minor bug - it copes with single-character 
 inputs by prepending "0", which is OK - but it does this for any input with 
 an odd number of characters, which is probably incorrect.
 {noformat}
 if (str.length() % 2 == 1)
     str = "0" + str;
 {noformat}
 Given 'fff' as an input, can we really assume that this should be '0fff'? 
 Isn't this just an error?
 Add the following to FBUtilitiesTest to demonstrate:
 {noformat}
 String[] badvalues = new String[]{"000", "fff"};

 for (int i = 0; i < badvalues.length; i++)
     try
     {
         FBUtilities.hexToBytes(badvalues[i]);
         fail("Invalid hex value accepted " + badvalues[i]);
     } catch (Exception e) {}
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059589#comment-13059589
 ] 

David Allsopp commented on CASSANDRA-2850:
--

I think you mean (bytes.remaining() * 2) not (bytes.remaining() / 2) - we need 
twice as many chars as bytes.

Also, shouldn't byteToChar[] have length 16, not 256?

Not sure what string creation you are referring to?

I attach 2 further versions of bytesToHex (as another benchmark class 3). 
Results are below (I've had to increase the number of repeats so the stats are 
significant!).

v3 uses 'normal' code and is another 20% faster for large values, and _another_ 
factor of 2 faster than v2, i.e. 7-10 times faster than the original.

v4 uses nasty reflection to avoid doing an arraycopy on the byte array - this 
avoids a large chunk of memory (all the previous solutions end up doing an 
arraycopy somewhere). This is now 11-13 times faster than the original.

20M old: 1482
20M new: 360
20M  v2: 249
20M  v3: 203
20M  v4: 125

old: 2137
new: 859
 v2: 718
 v3: 203
 v4: 156

old: 2138
new: 843
 v2: 733
 v3: 188
 v4: 156
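For readers following along, the kind of change being benchmarked can be
sketched as below: a pre-sized, 16-entry lookup-table bytesToHex. This is an
illustration of the technique, not necessarily identical to the attached
v2/v3/v4 code; the class name HexSketch is made up.

```java
// Illustrative lookup-table conversion: pre-size the output array and
// index a 16-entry hex-digit table, avoiding the per-byte method calls
// and StringBuilder resizes of the original implementation.
public class HexSketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    public static String bytesToHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2]; // twice as many chars as bytes
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            out[2 * i] = HEX[b >>> 4];      // high nibble
            out[2 * i + 1] = HEX[b & 0x0F]; // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(bytesToHex(new byte[]{(byte) 0xCA, (byte) 0xFE})); // prints "cafe"
    }
}
```

The remaining cost is the arraycopy inside new String(char[]), which is what
the reflection-based v4 described above tries to avoid.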




 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: BytesToHexBenchmark3.java

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: (was: BytesToHexBenchmark3.java)

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: BytesToHexBenchmark3.java

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059589#comment-13059589
 ] 

David Allsopp edited comment on CASSANDRA-2850 at 7/4/11 7:46 PM:
--

I think you mean (bytes.remaining() * 2) not (bytes.remaining() / 2) - we need 
twice as many chars as bytes.

Also, shouldn't byteToChar[] have length 16, not 256?

Not sure what string creation you are referring to?

I attach 2 further versions of bytesToHex (as another benchmark class 3). 
Results are below (I've had to increase the number of repeats so the stats are 
significant!).

v3 uses 'normal' code and is another 20% faster for large values, and _another_ 
factor of 2 faster than v2, i.e. 7-10 times faster than the original.

v4 uses nasty reflection to avoid doing an arraycopy on the byte array - this 
avoids a large chunk of memory (all the previous solutions end up doing an 
arraycopy somewhere). This is now 11-13 times faster than the original.

20M old: 1482
20M new: 360
20M  v2: 249
20M  v3: 203
20M  v4: 125

old: 2137
new: 859
 v2: 718
 v3: 203
 v4: 156

old: 2138
new: 843
 v2: 733
 v3: 188
 v4: 156




  was (Author: dallsopp):
I think you mean (bytes.remaining() * 2) not (bytes.remaining() / 2) - we 
need twice as many chars as bytes.

Also, shouldn't byteToChar[] have length 16, not 256.

Not sure what string creation you are referring to?

I attach 2 further versions of bytesToHex (as another benchmark class 3). 
Results are below (I've had to increasse the number of repeats so the stats are 
significant!).

v3 uses 'normal' code and is another 20% faster for large values, and _another_ 
factor of 2 faster than v2, i.e. 7-10 time sfatser than the original.

v4 uses nasty reflection to avoid doing an arraycopy on the byte array - this 
avoids a large chunk of memory (all the previous solutions end up doing an 
arraycopy somewhere). This is now 11-13 times fatser than the original.

20M old: 1482
20M new: 360
20M  v2: 249
20M  v3: 203
20M  v4: 125

old: 2137
new: 859
 v2: 718
 v3: 203
 v4: 156

old: 2138
new: 843
 v2: 733
 v3: 188
 v4: 156



  
 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059595#comment-13059595
 ] 

David Allsopp commented on CASSANDRA-2850:
--

An issue with using hex at all is that we can't represent the maximum 2GB 
column value. If we have Integer.MAX_VALUE bytes, then we need twice as many 
chars - and arrays in Java are limited to Integer.MAX_VALUE.



 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059607#comment-13059607
 ] 

David Allsopp commented on CASSANDRA-2850:
--

I can't improve any further on Sylvain's hexToByte - nice work!

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[Cassandra Wiki] Update of StorageConfiguration by AlexisLeQuoc

2011-07-04 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The StorageConfiguration page has been changed by AlexisLeQuoc:
http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=58&rev2=59

  
   * '''replica_placement_strategy''' and '''replication_factor'''
  
+ === Pre-0.8.1 ===
  Strategy: Setting this to the class that implements 
{{{IReplicaPlacementStrategy}}} will change the way the node picker works. Out 
of the box, Cassandra provides 
{{{org.apache.cassandra.locator.RackUnawareStrategy}}} and 
{{{org.apache.cassandra.locator.RackAwareStrategy}}} (place one replica in a 
different datacenter, and the others on different racks in the same one.)
  
  Note that the replication factor (RF) is the ''total'' number of nodes onto 
which the data will be placed.  So, a replication factor of 1 means that only 1 
node will have the data.  It does '''not''' mean that one ''other'' node will 
have the data.
  
  Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'. RF 
of at least 2 is highly recommended, keeping in mind that your effective number 
of nodes is (N total nodes / RF).
+ 
+ === 0.8.1 ===
+ Strategy: Setting this to the class that implements 
{{{IReplicaPlacementStrategy}}} will change the way the node picker works. Out 
of the box, Cassandra provides 
{{{org.apache.cassandra.locator.SimpleStrategy}}}, 
{{{org.apache.cassandra.locator.LocalStrategy}}} and  
{{{org.apache.cassandra.locator.NetworkTopologyStrategy}}} (place one replica 
in a different datacenter, and the others on different racks in the same one.)
+ 
+ Note that the replication factor (RF) is the ''total'' number of nodes onto 
which the data will be placed.  So, a replication factor of 1 means that only 1 
node will have the data.  It does '''not''' mean that one ''other'' node will 
have the data.
+ 
+ Defaults are: 'org.apache.cassandra.locator.NetworkTopologyStrategy' and '1'. 
RF of at least 2 is highly recommended, keeping in mind that your effective 
number of nodes is (N total nodes / RF).
  
  == per-ColumnFamily Settings ==
   * '''comment''' and '''name'''


[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: BytesToHexBenchmark3.java

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: (was: BytesToHexBenchmark3.java)

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059622#comment-13059622
 ] 

David Allsopp commented on CASSANDRA-2850:
--

Update - the benchmark version 3 was running v3 twice, not v3 then v4. Have 
re-attached. New results are:

20M old: 1435
20M new: 376
20M  v2: 405
20M  v3: 141
20M  v4: 93
20M old: 1265
20M new: 360
20M  v2: 234
20M  v3: 187
20M  v4: 78
20M old: 1233
20M new: 376
20M  v2: 452
20M  v3: 125
20M  v4: 63

old: 2184
new: 906
 v2: 577
 v3: 188
 v4: 172

old: 2215
new: 937
 v2: 593
 v3: 188
 v4: 156


 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: (was: BytesToHexBenchmark3.java)

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Allsopp updated CASSANDRA-2850:
-

Attachment: BytesToHexBenchmark3.java

 Converting bytes to hex string is unnecessarily slow
 

 Key: CASSANDRA-2850
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
 Fix For: 0.8.2

 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, 
 BytesToHexBenchmark2.java, BytesToHexBenchmark3.java, cassandra-2850a.diff


 ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the 
 StringBuilder (so several re-sizes will be needed behind the scenes) and it 
 makes quite a few method calls per byte.
 (OK, this may be a premature optimisation, but I couldn't resist, and it's a 
 small change)
 Will attach patch shortly that speeds it up by about x3, plus benchmarking 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059622#comment-13059622
 ] 

David Allsopp edited comment on CASSANDRA-2850 at 7/4/11 9:48 PM:
--

Update - the benchmark version 3 was running v3 twice, not v3 then v4. Have 
re-attached. New results are 15-19x faster for 20MB values, 13-14x faster for 
1KB values.

20M old: 1435
20M new: 376
20M  v2: 405
20M  v3: 141
20M  v4: 93
20M old: 1265
20M new: 360
20M  v2: 234
20M  v3: 187
20M  v4: 78
20M old: 1233
20M new: 376
20M  v2: 452
20M  v3: 125
20M  v4: 63

old: 2184
new: 906
 v2: 577
 v3: 188
 v4: 172

old: 2215
new: 937
 v2: 593
 v3: 188
 v4: 156


  was (Author: dallsopp):
Update - the benchmark version 3 was running v3 twice, not v3 then v4. Have 
re-attached. New results are:

20M old: 1435
20M new: 376
20M  v2: 405
20M  v3: 141
20M  v4: 93
20M old: 1265
20M new: 360
20M  v2: 234
20M  v3: 187
20M  v4: 78
20M old: 1233
20M new: 376
20M  v2: 452
20M  v3: 125
20M  v4: 63

old: 2184
new: 906
 v2: 577
 v3: 188
 v4: 172

old: 2215
new: 937
 v2: 593
 v3: 188
 v4: 156

  




[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow

2011-07-04 Thread David Allsopp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059633#comment-13059633
 ] 

David Allsopp commented on CASSANDRA-2850:
--

Although the bytesToHex reflection hack is a bit horrible, it makes a huge 
difference with really big values - I've just been trying different input sizes 
(with -Xmx4g -Xms4g on a 6GB machine) and the JVM falls over with OOM at about 
300MB for all the other versions, but copes with 675MB for v4. 

With the other versions, for a byte array of size N, we also need at least 2N 
for the StringBuilder or char[], then another 2N for the String (because the 
normal String constructors and methods always do an arraycopy of the input 
array) - so at least 5N in total.

I wonder where else in the code this sort of thing occurs...?
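For reference, the pre-sizing idea discussed in this ticket is easy to sketch. The helper below is hypothetical (not the attached patch, and not Cassandra's actual ByteBufferUtil code): one char[] of exactly 2*N, a table lookup per nibble, and a single String allocation at the end.

```java
// Hypothetical sketch of a pre-sized hex conversion. Note that
// String.valueOf still copies the char[] once - that is the extra 2N
// discussed above, which the v4 reflection hack is presumed to avoid.
public class HexSketch {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    public static String bytesToHex(byte[] bytes) {
        // Pre-sized: no StringBuilder re-sizes behind the scenes.
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            out[2 * i] = DIGITS[b >>> 4];     // high nibble
            out[2 * i + 1] = DIGITS[b & 0x0F]; // low nibble
        }
        return String.valueOf(out); // copies out[] into the String
    }

    public static void main(String[] args) {
        System.out.println(bytesToHex(new byte[] { 0x00, (byte) 0xFF, 0x1A }));
        // prints "00ff1a"
    }
}
```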





[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059637#comment-13059637
 ] 

Terje Marthinussen commented on CASSANDRA-2816:
---

Regardless of the documentation change, however, I don't think it should be 
possible to actually trigger a scenario like this in the first place.

The system should protect the user from that.

I also noticed that in this case we have RF=3. The node that is going somewhat 
crazy is number 6; however, during the repair it logs that it talks to, 
compares with, and streams data to nodes 4, 5, 7 and 8.

Seems like a couple of nodes too many?

 Repair doesn't synchronize merkle tree creation properly
 

 Key: CASSANDRA-2816
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2816
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
  Labels: repair
 Fix For: 0.8.2

 Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch


 Being a little slow, I just realized after having opened CASSANDRA-2811 and 
 CASSANDRA-2815 that there is a more general problem with repair.
 When a repair is started, it sends a number of merkle tree requests to its 
 neighbors as well as to itself, and assumes for correctness that the building 
 of those trees starts on every node at roughly the same time (if not, we end 
 up comparing data snapshots taken at different times and will thus mistakenly 
 repair a lot of useless data). This is bogus for many reasons:
 * Because validation compaction runs on the same executor as other 
 compactions, the start of the validation on the different nodes is subject to 
 other compactions. 0.8 mitigates this somewhat by being multi-threaded (so 
 there is less chance of being blocked a long time by a long-running 
 compaction), but the compaction executor being bounded, it is still a problem.
 * If you run nodetool repair without arguments, it will repair every CF. As a 
 consequence it will generate lots of merkle tree requests, and all of those 
 requests will be issued at the same time. Because even in 0.8 the compaction 
 executor is bounded, some of those validations will end up being queued 
 behind the first ones. Even assuming that the different validations are 
 submitted in the same order on each node (which isn't guaranteed either), 
 there is no guarantee that on all nodes the first validation will take the 
 same time, hence desynchronizing the queued ones.
 Overall, it is important for the precision of repair that for a given CF and 
 range (which is the unit at which trees are computed), we make sure that all 
 nodes start the validation at the same time (or, since we can't do magic, as 
 close to it as possible).
 One (reasonably simple) proposition to fix this would be to have repair 
 schedule validation compactions across nodes one by one (i.e., one CF/range 
 at a time), waiting for all nodes to return their tree before submitting the 
 next request. Then on each node, we should make sure that the node starts the 
 validation compaction as soon as requested. For that, we probably want to 
 have a specific executor for validation compaction and either:
 * fail the whole repair whenever one node is not able to execute the 
 validation compaction right away (because no thread is available right 
 away), or
 * simply tell the user that if he starts too many repairs in parallel, he 
 may start seeing some of them repairing more data than they should.
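 The one-by-one scheduling proposed above can be sketched roughly as follows 
 (hypothetical interface and names - this is not Cassandra's actual 
 AntiEntropyService API): request the validation for a single CF/range from 
 all neighbors at once, then block until every tree is back before submitting 
 the next CF/range.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Rough sketch of the proposed one-by-one scheduling (hypothetical names).
public class RepairSketch {
    interface Neighbour {
        // Asynchronously start a validation compaction for (cf, range);
        // implementations call done.countDown() once the tree is returned.
        void requestValidation(String cf, String range, CountDownLatch done);
    }

    static void repair(List<String> cfRanges, List<Neighbour> neighbours)
            throws InterruptedException {
        for (String cfRange : cfRanges) {
            String[] parts = cfRange.split("/", 2); // "cf/range"
            CountDownLatch done = new CountDownLatch(neighbours.size());
            for (Neighbour n : neighbours)
                n.requestValidation(parts[0], parts[1], done);
            done.await(); // all trees back before the next CF/range
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StringBuilder log = new StringBuilder();
        Neighbour n = (cf, range, done) -> {
            log.append(cf).append(':').append(range).append(' ');
            done.countDown();
        };
        repair(List.of("cf1/r1", "cf1/r2"), List.of(n, n));
        System.out.println(log.toString().trim());
    }
}
```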

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059643#comment-13059643
 ] 

Jonathan Ellis commented on CASSANDRA-2816:
---

bq. May I change to

Sure.

bq. The system should protect the user from that

I'm not sure that in a p2p design we can posit an omniscient "the system".





[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059655#comment-13059655
 ] 

Terje Marthinussen commented on CASSANDRA-2816:
---

bq. I'm not sure that in a p2p design we can posit an omniscient "the system".

Is that a philosophical statement? :)

As Cassandra, at least for now, is a p2p network with fairly clearly defined 
boundaries, I will continue calling it a system for now :)

However, looking at it from the p2p viewpoint, the user potentially has no 
clue about where replicas are stored and, given this, it may be impossible for 
the user to issue repair manually on more than one node at a time without 
getting into trouble. Given a large enough p2p setup, it would also be 
non-trivial to schedule a complete repair without ending up with two or more 
repairs running on the same replica set.

Since Cassandra does not checkpoint the synchronization, it is forced to 
rescan everything on every repair; repairs easily take so long that you are 
forced to run them on several nodes at a time if you are going to finish 
repairing all nodes within 10 days...

Anyway, this is way outside the scope of this jira :)





[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059658#comment-13059658
 ] 

Terje Marthinussen commented on CASSANDRA-2816:
---

bq. I also noticed that in this case, we have RF=3. The node that is going 
somewhat crazy is number 6; however, during the repair it logs that it talks 
to, compares with, and streams data to nodes 4, 5, 7 and 8.

This is maybe correct. Node 7 will replicate to nodes 6 and 8, so 6 and 8 
would share data.

So, to be safe, even with this patch, only every 4th node can run repair at 
the same time if RF=3? But you still need to run repair on each of those 4 
nodes to make sure it is all repaired?





[jira] [Issue Comment Edited] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly

2011-07-04 Thread Terje Marthinussen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059658#comment-13059658
 ] 

Terje Marthinussen edited comment on CASSANDRA-2816 at 7/5/11 2:31 AM:
---

bq. I also noticed that in this case, we have RF=3. The node that is going 
somewhat crazy is number 6; however, during the repair it logs that it talks 
to, compares with, and streams data to nodes 4, 5, 7 and 8.

This is maybe correct. Node 7 will replicate to nodes 6 and 8, so 6 and 8 
would share data.

So, to be safe, even with this patch, only every 4th node can run repair at 
the same time if RF=3? But you still need to run repair on each of those 4 
nodes to make sure it is all repaired?

As for the comment I made earlier: to me it looks like, if the repair starts 
triggering transfers on a large scale, the files the node gets streamed in 
are not streamed out, but they may get compacted before the repair finishes, 
and I suspect the compacted file then gets streamed out and the repair just 
never finishes.

  was (Author: terjem):
bq. I also noticed that in this case, we have RF3. The node which is going 
somewhat crazy is number 6, however during the repair, it does log that it 
talks compares and streams data with node 4, 5, 7 and 8.

This is maybe correct. Node 7 will replicate to node 6 and 8 so 6 and 8 would 
share data.

So, to make things safe, even with this patch, every 4th node can run repair at 
the same time if RF=3?, but you still need to run repair on each of those 4 
nodes to make sure it is all repaired?
  