[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059334#comment-13059334 ] Mck SembWever commented on CASSANDRA-2388:
--

{quote}2) If we ARE in that situation, the right solution would be to send the job to a TT whose local replica IS live, not to read the data from a nonlocal replica. How can we signal that?{quote}

To /really/ solve this issue, can we do the following? In CFIF.getRangeMap(), take out of each range any endpoints that are not alive. A client connection already exists in this method. Filtering out dead endpoints wouldn't be difficult, and it would move tasks *to* the data, making use of replicas. This approach does need a new method in cassandra.thrift, e.g. {{list<string> describe_alive_nodes()}}.

ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

Key: CASSANDRA-2388
URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
Project: Cassandra
Issue Type: Bug
Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
Labels: hadoop, inputformat
Fix For: 0.7.7, 0.8.2
Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch

ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
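The filtering proposed above can be sketched in isolation. This is only an illustration of the idea, not Cassandra's actual API: the endpoint lists and the set a hypothetical {{describe_alive_nodes()}} would return are stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: given the endpoints owning a token range and the set
// of nodes a describe_alive_nodes()-style call reports as live, keep only
// the live endpoints so Hadoop schedules the task on a node that actually
// holds the data.
public class LiveEndpointFilter
{
    public static List<String> filterLive(List<String> rangeEndpoints, Set<String> aliveNodes)
    {
        List<String> live = new ArrayList<String>();
        for (String endpoint : rangeEndpoints)
            if (aliveNodes.contains(endpoint))
                live.add(endpoint);
        return live;
    }

    public static void main(String[] args)
    {
        Set<String> alive = new HashSet<String>(Arrays.asList("10.0.0.1", "10.0.0.3"));
        List<String> endpoints = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        // 10.0.0.2 is down, so the split should only advertise the two live replicas
        System.out.println(filterLive(endpoints, alive));
    }
}
```

A split whose live list comes back empty would still have to fail, but only when every replica for that range is actually down.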
[jira] [Commented] (CASSANDRA-2527) Add ability to snapshot data as input to hadoop jobs
[ https://issues.apache.org/jira/browse/CASSANDRA-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059356#comment-13059356 ] Wojciech Meler commented on CASSANDRA-2527:
--

It would be great to have more generic client access to snapshot data. Maybe snapshots should be visible as new keyspaces? Or maybe we should throw away snapshots and start cloning keyspaces? If a cloned keyspace could be read-only, it would work out of the box :).

Add ability to snapshot data as input to hadoop jobs

Key: CASSANDRA-2527
URL: https://issues.apache.org/jira/browse/CASSANDRA-2527
Project: Cassandra
Issue Type: Improvement
Reporter: Jeremy Hanna
Labels: hadoop

It is desirable to have immutable inputs to hadoop jobs for the duration of the job. That way re-execution of individual tasks does not alter the output. One way to accomplish this would be to snapshot the data that is used as input to a job.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently
[ https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059362#comment-13059362 ] Sylvain Lebresne commented on CASSANDRA-2851:
--

Why would it be OK for single-character inputs and not other odd-sized inputs? An odd-sized input doesn't (ever) correspond to a valid byte array, so I'd say either we always silently add a 0 to make it fit, or we never do it. I actually am in favor of throwing an exception rather than coping with it silently, since it's more likely to indicate a user error than to be helpful (but maybe that addition of a '0' in front was there for a reason?). I'll note that even though I can't imagine why people would generate odd-sized hex input, since it has been allowed so far there is a chance someone out there does it, and it would be a regression for that guy. So maybe we should target 1.0 for the sake of making minor upgrades as smooth for everybody as can be.

On the patch side, we must make sure every consumer of hexToBytes() handles the new exception (or make it a NumberFormatException, but I don't think this is a good idea). For instance, at least BytesType.fromString() should catch the IllegalArgumentException and rethrow a MarshalException, otherwise CQL will crap its pants on odd-sized inputs.

hex-to-bytes conversion accepts invalid inputs silently

Key: CASSANDRA-2851
URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
Fix For: 0.8.2
Attachments: cassandra-2851.diff

FBUtilities.hexToBytes() has a minor bug - it copes with single-character inputs by prepending 0, which is OK - but it does this for any input with an odd number of characters, which is probably incorrect.
{noformat}
if (str.length() % 2 == 1)
    str = "0" + str;
{noformat}
Given 'fff' as an input, can we really assume that this should be '0fff'? Isn't this just an error?
Add the following to FBUtilitiesTest to demonstrate:
{noformat}
String[] badvalues = new String[]{"000", "fff"};
for (int i = 0; i < badvalues.length; i++)
    try
    {
        FBUtilities.hexToBytes(badvalues[i]);
        fail("Invalid hex value accepted: " + badvalues[i]);
    }
    catch (Exception e) {}
{noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
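The behaviour being argued for can be sketched as a strict converter that rejects odd-length input rather than padding it. This is a minimal illustration, not Cassandra's FBUtilities:

```java
// A strict hex-to-bytes sketch: any odd-length string is rejected with an
// IllegalArgumentException instead of being silently prepended with '0'.
public class StrictHex
{
    public static byte[] hexToBytes(String str)
    {
        if (str.length() % 2 == 1)
            throw new IllegalArgumentException("odd-length hex string: " + str);
        byte[] bytes = new byte[str.length() / 2];
        for (int i = 0; i < bytes.length; i++)
        {
            // Character.digit returns -1 for non-hex characters
            int hi = Character.digit(str.charAt(2 * i), 16);
            int lo = Character.digit(str.charAt(2 * i + 1), 16);
            if (hi == -1 || lo == -1)
                throw new IllegalArgumentException("invalid hex character in: " + str);
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return bytes;
    }

    public static void main(String[] args)
    {
        System.out.println(hexToBytes("0fff").length); // even length decodes to 2 bytes
        try
        {
            hexToBytes("fff"); // odd length: rejected rather than padded
        }
        catch (IllegalArgumentException e)
        {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Callers such as a BytesType.fromString() equivalent would then catch the IllegalArgumentException and rethrow their own marshalling error, as discussed above.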
[jira] [Commented] (CASSANDRA-2846) Changing replication_factor using update keyspace not working
[ https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059373#comment-13059373 ] Jonas Borgström commented on CASSANDRA-2846:
--

Jonathan, thanks for your fast response. Your patch works for me.

Changing replication_factor using update keyspace not working

Key: CASSANDRA-2846
URL: https://issues.apache.org/jira/browse/CASSANDRA-2846
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.8.1
Environment: A clean 0.8.1 install using the default configuration
Reporter: Jonas Borgström
Assignee: Jonathan Ellis
Priority: Minor
Fix For: 0.8.2
Attachments: 2846.txt

Unless I've misunderstood the new way to do this with 0.8, I think update keyspace is broken:
{code}
[default@unknown] create keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}];
37f70d40-a3e9-11e0--242d50cf1fbf
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] describe keyspace Test;
Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:1]
  Column Families:
[default@unknown] update keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:2}];
489fe220-a3e9-11e0--242d50cf1fbf
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] describe keyspace Test;
Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:1]
  Column Families:
{code}
Isn't the second describe keyspace supposed to say replication_factor:2?
Relevant bits from system.log:
{code}
Migration.java (line 116) Applying migration 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep strategy:SimpleStrategy{}durable_writes: true to Testrep strategy:SimpleStrategy{}durable_writes: true
UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual operations
{code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2852) Cassandra CLI - Import Keyspace Definitions from File - Comments do partitially interpret characters/commands
[ https://issues.apache.org/jira/browse/CASSANDRA-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-2852:
--

Attachment: CASSANDRA-2852.patch

Can be applied on both the 0.7 and 0.8 branches.

Cassandra CLI - Import Keyspace Definitions from File - Comments do partitially interpret characters/commands

Key: CASSANDRA-2852
URL: https://issues.apache.org/jira/browse/CASSANDRA-2852
Project: Cassandra
Issue Type: Bug
Components: Tools
Affects Versions: 0.7.0
Environment: Win Vista
Reporter: jens mueller
Assignee: Pavel Yaskevich
Priority: Trivial
Fix For: 0.7.7, 0.8.2
Attachments: CASSANDRA-2852.patch

Hello, using:
bin/cassandra-cli -host localhost --file conf/schema-sample.txt
with schema-sample.txt having contents like this:
/* here are a lot of comments, like this sample create keyspace; and so on */
will result in an error:
Line 1: Syntax error at position 323: mismatched character 'EOF' expecting '*'

The cause is the keyspace; statement: the semicolon causes the error. However, writing the word "keyspace;" with quotes does NOT lead to the error, so this works:
/* here are a lot of comments, like this sample create "keyspace;" and so on */

From my point of view this is an error. Everything between the start comment /* and the end comment */ should be treated as a comment and not be interpreted in any way. That's the definition of a comment: it is not interpreted at all. Or this must be documented somewhere very prominently, otherwise it will lead to unnecessary time wasted searching for this odd behaviour. And it makes commenting out statements much more cumbersome.

Platform: Windows Vista

thanks

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
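The behaviour the reporter expects can be illustrated with a minimal comment-stripping sketch. This is not the CLI's actual ANTLR grammar, just an illustration of the rule that everything between /* and */ is ignored, so a ';' inside a comment can never terminate a statement:

```java
// Minimal sketch of block-comment skipping: the scanner jumps from each
// "/*" to the matching "*/" without inspecting the characters in between.
public class CommentStripper
{
    public static String strip(String input)
    {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length())
        {
            if (input.startsWith("/*", i))
            {
                int end = input.indexOf("*/", i + 2);
                // unterminated comment: drop the rest of the input
                i = (end == -1) ? input.length() : end + 2;
            }
            else
            {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args)
    {
        // the semicolon inside the comment does not end any statement
        System.out.println(strip("/* sample create keyspace; and so on */use Test;"));
    }
}
```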
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059401#comment-13059401 ] Mck SembWever commented on CASSANDRA-1125:
--

bq. using KeyRange but with tokens (which Thrift also uses for start-exclusive)

This is my preference. I'll make a patch for it.

Filter out ColumnFamily rows that aren't part of the query

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 1.0
Attachments: 1125-formatted.txt, CASSANDRA-1125.patch

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[Cassandra Wiki] Update of ClientOptions by PriitKallas
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification.

The ClientOptions page has been changed by PriitKallas:
http://wiki.apache.org/cassandra/ClientOptions?action=diff&rev1=131&rev2=132

Comment: Added link to the new high-level PHP Cassandra Client Library

  * Ruby:
    * Cassandra: http://github.com/fauna/cassandra
  * PHP:
+   * Cassandra PHP Client Library: https://github.com/kallaspriit/Cassandra-PHP-Client-Library
    * phpcassa: http://github.com/thobbs/phpcassa

== Older clients ==
[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-2850:
--

Attachment: 2850-v2.patch

Attaching a so-called v2 version that avoids the String object creation for each byte by encoding each char separately. This version shows a 30% speedup on the 10MB array conversion (and ~15% speedup on the 1K array conversion) compared to the version in the previous patch. It will also generate less garbage.

I've also broadened the scope of this ticket because hexToBytes also needs some love (actually even more so), and the v2 patch ships with an improved version of hexToBytes. As it turns out, hexToBytes was really naive and was calling substring() on every 2 characters, generating a lot of String objects. On a micro-benchmark converting strings of 1000 characters, the attached version shows a ~13x (!) speedup. It also generates much less garbage.

To add to what David said, let's note that those methods used to not matter too much (they were used in non-performance-sensitive places, like debug/error messages, or sstable2json (though performance in those tools doesn't hurt)), but they are now used by CQL for BytesType.

Converting bytes to hex string is unnecessarily slow

Key: CASSANDRA-2850
URL: https://issues.apache.org/jira/browse/CASSANDRA-2850
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
Fix For: 0.8.2
Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, cassandra-2850a.diff

ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte. (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change.) Will attach patch shortly that speeds it up by about 3x, plus benchmarking test.
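The shape of the optimisation described above can be sketched as follows. This is an assumed, self-contained illustration of the technique (exact output size, per-nibble table lookup, no intermediate Strings), not the committed patch:

```java
// Fast bytes-to-hex: pre-size the output buffer to exactly 2 chars per byte
// and encode each nibble through a lookup table, so no String or
// StringBuilder resizing happens per byte.
public class FastHex
{
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    public static String bytesToHex(byte[] bytes)
    {
        char[] out = new char[bytes.length * 2]; // exact size, no resizing
        for (int i = 0; i < bytes.length; i++)
        {
            int b = bytes[i] & 0xff;
            out[2 * i] = HEX[b >>> 4];      // high nibble
            out[2 * i + 1] = HEX[b & 0x0f]; // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args)
    {
        System.out.println(bytesToHex(new byte[]{ 0x0a, (byte) 0xff, 0x00 })); // 0aff00
    }
}
```

Note that, unlike an Integer.toHexString-based version, this always emits two characters per byte, so single-digit values come out zero-padded.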
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently
[ https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059413#comment-13059413 ] Jonathan Ellis commented on CASSANDRA-2851:
--

bq. maybe that addition of a '0' in front was there for a reason

I think it's there b/c of Integer.toHexString: "This value is converted to a string of ASCII digits in hexadecimal (base 16) with no extra leading 0s." Our bytesToHex does pad... but only for single-digit results. So if we fix hexToBytes we'll introduce an incompatibility. (Granted, a minor one.)

hex-to-bytes conversion accepts invalid inputs silently

Key: CASSANDRA-2851
URL: https://issues.apache.org/jira/browse/CASSANDRA-2851
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.7.6, 0.8.1
Reporter: David Allsopp
Priority: Minor
Fix For: 0.8.2
Attachments: cassandra-2851.diff

FBUtilities.hexToBytes() has a minor bug - it copes with single-character inputs by prepending 0, which is OK - but it does this for any input with an odd number of characters, which is probably incorrect.
{noformat}
if (str.length() % 2 == 1)
    str = "0" + str;
{noformat}
Given 'fff' as an input, can we really assume that this should be '0fff'? Isn't this just an error?

Add the following to FBUtilitiesTest to demonstrate:
{noformat}
String[] badvalues = new String[]{"000", "fff"};
for (int i = 0; i < badvalues.length; i++)
    try
    {
        FBUtilities.hexToBytes(badvalues[i]);
        fail("Invalid hex value accepted: " + badvalues[i]);
    }
    catch (Exception e) {}
{noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
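The compatibility concern above is easy to demonstrate: Integer.toHexString drops leading zeros, so hex built with it can be odd-length, which a stricter hexToBytes would then reject.

```java
// Integer.toHexString emits no leading zeros, so its output length depends
// on the value: three digits for 0xfff, a single digit for 0x0f.
public class HexStringDemo
{
    public static void main(String[] args)
    {
        System.out.println(Integer.toHexString(0xfff)); // fff (odd length)
        System.out.println(Integer.toHexString(0x0f));  // f (single digit)
    }
}
```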
[Cassandra Wiki] Update of ClientOptions06 by PriitKallas
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification.

The ClientOptions06 page has been changed by PriitKallas:
http://wiki.apache.org/cassandra/ClientOptions06?action=diff&rev1=5&rev2=6

  * Jassandra: http://code.google.com/p/jassandra/
  * Kundera: http://code.google.com/p/kundera/
  * PHP :
+   * PHP Cassandra Client Library: http://github.com/kallaspriit/Cassandra-PHP-Client-Library
    * Pandra: http://github.com/mjpearson/Pandra/tree/master
    * PHP Cassa: http://github.com/hoan/phpcassa [port of pycassa to PHP]
  * Clojure :
svn commit: r1142647 - in /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra: db/ hadoop/ io/sstable/ streaming/
Author: jbellis
Date: Mon Jul 4 13:02:05 2011
New Revision: 1142647

URL: http://svn.apache.org/viewvc?rev=1142647&view=rev
Log: revert incomplete changes

Modified:
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ConfigHelper.java
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/io/sstable/IndexHelper.java
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/io/sstable/SSTableIdentityIterator.java
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/PendingFile.java
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/StreamInSession.java
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/StreamOut.java

Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java?rev=1142647&r1=1142646&r2=1142647&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java (original)
+++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java Mon Jul 4 13:02:05 2011
@@ -130,12 +130,6 @@ public class ColumnFamilySerializer impl
     public void deserializeColumns(DataInput dis, ColumnFamily cf, boolean intern, boolean fromRemote) throws IOException
     {
         int size = dis.readInt();
-        deserializeColumns(dis, cf, size, intern, fromRemote);
-    }
-
-    /* column count is already read from DataInput */
-    public void deserializeColumns(DataInput dis, ColumnFamily cf, int size, boolean intern, boolean fromRemote) throws IOException
-    {
         ColumnFamilyStore interner = intern ? Table.open(CFMetaData.getCF(cf.id()).left).getColumnFamilyStore(cf.id()) : null;
         for (int i = 0; i < size; ++i)
         {

Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java?rev=1142647&r1=1142646&r2=1142647&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java (original)
+++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java Mon Jul 4 13:02:05 2011
@@ -35,9 +35,10 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.db.IColumn;
-import org.apache.cassandra.dht.IPartitioner;
-import org.apache.cassandra.dht.Range;
-import org.apache.cassandra.thrift.*;
+import org.apache.cassandra.thrift.Cassandra;
+import org.apache.cassandra.thrift.InvalidRequestException;
+import org.apache.cassandra.thrift.TokenRange;
+import org.apache.cassandra.thrift.TBinaryProtocol;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.mapreduce.*;
 import org.apache.thrift.TException;
@@ -100,43 +101,11 @@ public class ColumnFamilyInputFormat ext
         try
         {
-            KeyRange jobKeyRange = ConfigHelper.getInputKeyRange(conf);
-            IPartitioner partitioner = null;
-            Range jobRange = null;
-            if (jobKeyRange != null)
-            {
-                partitioner = ConfigHelper.getPartitioner(context.getConfiguration());
-                assert partitioner.preservesOrder() : "ConfigHelper.setInputKeyRange(..) can only be used with a order preserving paritioner";
-                jobRange = new Range(partitioner.getToken(jobKeyRange.start_key),
-                                     partitioner.getToken(jobKeyRange.end_key),
-                                     partitioner);
-            }
-
             List<Future<List<InputSplit>>> splitfutures = new ArrayList<Future<List<InputSplit>>>();
             for (TokenRange range : masterRangeNodes)
             {
-                if (jobRange == null)
-                {
                     // for each range, pick a live owner and ask it to compute bite-sized splits
                     splitfutures.add(executor.submit(new SplitCallable(range, conf)));
-                }
-                else
-                {
-                    Range dhtRange = new Range(partitioner.getTokenFactory().fromString(range.start_token),
                                                partitioner.getTokenFactory().fromString(range.end_token),
-
[jira] [Updated] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-2388:
--

Attachment: CASSANDRA-2388-extended.patch

ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

Key: CASSANDRA-2388
URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
Project: Cassandra
Issue Type: Bug
Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
Labels: hadoop, inputformat
Fix For: 0.7.7, 0.8.2
Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch

ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly
[ https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059445#comment-13059445 ] Terje Marthinussen commented on CASSANDRA-2816:
--

Things definitely seem to be improved overall, but weird things still happen. So... 12 node cluster, this is maybe ugly, I know, but start repair on all of them. Most nodes are fine, but one goes crazy. Disk use is now 3-4 times what it was before the repair started, and it does not seem to be done yet. I really have no idea if this is the case, but I am getting the hunch that this node has ended up streaming out some of the data it is getting in. Would this be possible?

Repair doesn't synchronize merkle tree creation properly

Key: CASSANDRA-2816
URL: https://issues.apache.org/jira/browse/CASSANDRA-2816
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Labels: repair
Fix For: 0.8.2
Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch

Being a little slow, I just realized after having opened CASSANDRA-2811 and CASSANDRA-2815 that there is a more general problem with repair. When a repair is started, it will send a number of merkle tree requests to its neighbors as well as itself, and assume for correctness that the building of those trees will be started on every node at roughly the same time (if not, we end up comparing data snapshots taken at different times and will thus mistakenly repair a lot of useless data). This is bogus for many reasons:
* Because validation compaction runs on the same executor as other compactions, the start of the validation on the different nodes is subject to other compactions. 0.8 mitigates this in a way by being multi-threaded (and thus there is less chance of being blocked a long time by a long-running compaction), but the compaction executor being bounded, it's still a problem.
* If you run a nodetool repair without arguments, it will repair every CF. As a consequence it will generate lots of merkle tree requests, and all of those requests will be issued at the same time. Because even in 0.8 the compaction executor is bounded, some of those validations will end up being queued behind the first ones. Even assuming that the different validations are submitted in the same order on each node (which isn't guaranteed either), there is no guarantee that on all nodes the first validation will take the same time, hence desynchronizing the queued ones.

Overall, it is important for the precision of repair that for a given CF and range (which is the unit at which trees are computed), we make sure that all nodes will start the validation at the same time (or, since we can't do magic, as close as possible). One (reasonably simple) proposition to fix this would be to have repair schedule validation compactions across nodes one by one (i.e., one CF/range at a time), waiting for all nodes to return their tree before submitting the next request. Then on each node, we should make sure that the node will start the validation compaction as soon as requested. For that, we probably want to have a specific executor for validation compaction and:
* either we fail the whole repair whenever one node is not able to execute the validation compaction right away (because no threads are available right away), or
* we simply tell the user that if he starts too many repairs in parallel, he may start seeing some of those repairing more data than they should.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
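The "one CF/range at a time" scheduling proposed above can be sketched with a plain executor. This is an illustrative stand-in, not Cassandra's repair code: the "trees" are strings, and the blocking get() plays the role of waiting for each node's validation to finish before the next range is requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of serialized validation scheduling: request the merkle tree for
// one range from every node, block until all trees are back, then move on.
public class SerializedValidation
{
    public static List<String> validateRange(ExecutorService pool, final String range, List<String> nodes) throws Exception
    {
        List<Future<String>> futures = new ArrayList<Future<String>>();
        for (final String node : nodes)
        {
            futures.add(pool.submit(new Callable<String>()
            {
                public String call()
                {
                    // stand-in for the real validation compaction
                    return "tree(" + node + ", " + range + ")";
                }
            }));
        }
        List<String> trees = new ArrayList<String>();
        for (Future<String> f : futures)
            trees.add(f.get()); // next range is not started until all trees are in
        return trees;
    }

    public static void main(String[] args) throws Exception
    {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (String range : Arrays.asList("(0, 100]", "(100, 200]"))
            System.out.println(validateRange(pool, range, Arrays.asList("node1", "node2")));
        pool.shutdown();
    }
}
```

The design point is simply that the requester, not the nodes, enforces the synchronization: since the next range's requests are only submitted after every tree for the current range has returned, the per-node compaction queues can no longer drift apart.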
[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-1125:
--

Attachment: CASSANDRA-1125.patch

Filter out ColumnFamily rows that aren't part of the query

Key: CASSANDRA-1125
URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
Fix For: 1.0
Attachments: 1125-formatted.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch

Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly
[ https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059459#comment-13059459 ] Sylvain Lebresne commented on CASSANDRA-2816:
--

bq. So... 12 node cluster, this is maybe ugly, I know, but start repair on all of them.

Is it started on all of them? If so, this is kind of expected, in the sense that the patch assumes that each node does not do more than 2 repairs (for any column family) at the same time (this is configurable through the new concurrent_validators option, but it's probably better to stick to 2 and stagger the repairs). If you do more than that (that is, if you did repair on all nodes at the same time with RF > 2), then we're back to our old demons.

bq. I have really no idea if this is the case, but I am getting the hunch that this node has ended up streaming out some of the data it is getting in. Would this be possible?

Not really. That is, it could be that you create a merkle tree on some data and, once you start streaming, you're picking up data that was just streamed to you and wasn't there when computing the tree. This patch is supposed to fix this in part, but it can still happen if you do repairs in parallel on neighboring nodes. However, you shouldn't get into a situation where 2 nodes stream forever because they pick up what was just streamed to them, for instance, because what is streamed is determined at the very beginning of the streaming session. So my first question would be: were all those repairs started in parallel? If yes, you shall not do this :). CASSANDRA-2606 and CASSANDRA-2610 are here to help make the repair of a full cluster much easier (and more efficient), but right now it's more about getting patches in one at a time. If the repairs were started one at a time in a rolling fashion, then we do have an unknown problem somewhere.

Repair doesn't synchronize merkle tree creation properly

Key: CASSANDRA-2816
URL: https://issues.apache.org/jira/browse/CASSANDRA-2816
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Labels: repair
Fix For: 0.8.2
Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch

Being a little slow, I just realized after having opened CASSANDRA-2811 and CASSANDRA-2815 that there is a more general problem with repair. When a repair is started, it will send a number of merkle tree requests to its neighbors as well as itself, and assume for correctness that the building of those trees will be started on every node at roughly the same time (if not, we end up comparing data snapshots taken at different times and will thus mistakenly repair a lot of useless data). This is bogus for many reasons:
* Because validation compaction runs on the same executor as other compactions, the start of the validation on the different nodes is subject to other compactions. 0.8 mitigates this in a way by being multi-threaded (and thus there is less chance of being blocked a long time by a long-running compaction), but the compaction executor being bounded, it's still a problem.
* If you run a nodetool repair without arguments, it will repair every CF. As a consequence it will generate lots of merkle tree requests, and all of those requests will be issued at the same time. Because even in 0.8 the compaction executor is bounded, some of those validations will end up being queued behind the first ones. Even assuming that the different validations are submitted in the same order on each node (which isn't guaranteed either), there is no guarantee that on all nodes the first validation will take the same time, hence desynchronizing the queued ones.

Overall, it is important for the precision of repair that for a given CF and range (which is the unit at which trees are computed), we make sure that all nodes will start the validation at the same time (or, since we can't do magic, as close as possible). One (reasonably simple) proposition to fix this would be to have repair schedule validation compactions across nodes one by one (i.e., one CF/range at a time), waiting for all nodes to return their tree before submitting the next request. Then on each node, we should make sure that the node will start the validation compaction as soon as requested. For that, we probably want to have a specific executor for validation compaction and:
* either we fail the whole repair whenever one node is not able to execute the validation compaction right away (because no threads are available right away), or
* we simply tell the user that if he starts too many repairs in parallel, he may start seeing some of those repairing more data than they should.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
svn commit: r1142690 - in /cassandra/trunk: ./ src/java/org/apache/cassandra/db/ src/java/org/apache/cassandra/db/filter/ test/unit/org/apache/cassandra/db/ test/unit/org/apache/cassandra/db/compactio
Author: slebresne Date: Mon Jul 4 14:36:11 2011 New Revision: 1142690 URL: http://svn.apache.org/viewvc?rev=1142690view=rev Log: Reset CF and SC deletion time after gc_grace patch by slebresne; reviewed by jbellis for CASSANDRA-2317 Added: cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java Modified: cassandra/trunk/CHANGES.txt cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamily.java cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java cassandra/trunk/src/java/org/apache/cassandra/db/IColumnContainer.java cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java cassandra/trunk/src/java/org/apache/cassandra/db/SuperColumn.java cassandra/trunk/src/java/org/apache/cassandra/db/filter/QueryFilter.java cassandra/trunk/test/unit/org/apache/cassandra/db/RowTest.java cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java cassandra/trunk/test/unit/org/apache/cassandra/service/RowResolverTest.java Modified: cassandra/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1142690r1=1142689r2=1142690view=diff == --- cassandra/trunk/CHANGES.txt (original) +++ cassandra/trunk/CHANGES.txt Mon Jul 4 14:36:11 2011 @@ -10,6 +10,7 @@ * clean up tmp files after failed compaction (CASSANDRA-2468) * restrict repair streaming to specific columnfamilies (CASSANDRA-2280) * don't bother persisting columns shadowed by a row tombstone (CASSANDRA-2589) + * reset CF and SC deletion times after gc_grace (CASSANDRA-2317) 0.8.2 Added: cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java?rev=1142690view=auto == --- cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java (added) +++ 
cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java Mon Jul 4 14:36:11 2011 @@ -0,0 +1,212 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.cassandra.db; + +import java.nio.ByteBuffer; +import java.security.MessageDigest; +import java.util.Collection; +import java.util.Iterator; +import java.util.Map; +import java.util.SortedSet; +import java.util.concurrent.ConcurrentSkipListMap; +import java.util.concurrent.atomic.AtomicReference; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import org.apache.cassandra.config.CFMetaData; +import org.apache.cassandra.config.DatabaseDescriptor; +import org.apache.cassandra.db.filter.QueryPath; +import org.apache.cassandra.db.marshal.AbstractType; +import org.apache.cassandra.io.ICompactSerializer2; +import org.apache.cassandra.io.util.IIterableColumns; +import org.apache.cassandra.utils.FBUtilities; + +public abstract class AbstractColumnContainer implements IColumnContainer, IIterableColumns +{ +private static Logger logger = LoggerFactory.getLogger(AbstractColumnContainer.class); + +protected final ConcurrentSkipListMap<ByteBuffer, IColumn> columns; +protected final AtomicReference<DeletionInfo> deletionInfo = new AtomicReference<DeletionInfo>(new DeletionInfo()); + +protected AbstractColumnContainer(ConcurrentSkipListMap<ByteBuffer, IColumn> columns) +{ +this.columns = columns; +} + +@Deprecated // TODO this is a hack to set initial value outside constructor +public void delete(int localtime, long timestamp) +{ +deletionInfo.set(new DeletionInfo(timestamp, localtime)); +} + +public void delete(AbstractColumnContainer cc2) +{ +// Keeping deletion info for max markedForDeleteAt value +DeletionInfo current; +DeletionInfo cc2Info = cc2.deletionInfo.get(); +while (true) +{ + current = deletionInfo.get(); + if (current.markedForDeleteAt >= cc2Info.markedForDeleteAt || deletionInfo.compareAndSet(current,
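The compare-and-set retry loop in the commit excerpt above (which the mail truncates) merges two containers' deletion info by keeping the larger markedForDeleteAt. A minimal standalone sketch of that lock-free pattern, using a simplified stand-in for Cassandra's DeletionInfo class (not the real one):

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the "keep the max markedForDeleteAt" CAS loop from the excerpt
// above. DeletionInfo here is a simplified hypothetical stand-in.
public class DeletionMergeSketch {
    static final class DeletionInfo {
        final long markedForDeleteAt;
        final int localDeletionTime;
        DeletionInfo(long markedForDeleteAt, int localDeletionTime) {
            this.markedForDeleteAt = markedForDeleteAt;
            this.localDeletionTime = localDeletionTime;
        }
    }

    private final AtomicReference<DeletionInfo> deletionInfo =
        new AtomicReference<>(new DeletionInfo(Long.MIN_VALUE, Integer.MIN_VALUE));

    // Retry until our info is already superseded or the CAS succeeds; a failed
    // CAS means a concurrent writer won, so we re-read and re-compare.
    public void delete(DeletionInfo other) {
        while (true) {
            DeletionInfo current = deletionInfo.get();
            if (current.markedForDeleteAt >= other.markedForDeleteAt
                || deletionInfo.compareAndSet(current, other))
                break;
        }
    }

    public long markedForDeleteAt() {
        return deletionInfo.get().markedForDeleteAt;
    }
}
```

The point of the loop is that two concurrent deletes can never lose an update: whichever has the older timestamp is discarded no matter the interleaving.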
[jira] [Commented] (CASSANDRA-2317) Column family deletion time is not always reseted after gc_grace
[ https://issues.apache.org/jira/browse/CASSANDRA-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059460#comment-13059460 ] Sylvain Lebresne commented on CASSANDRA-2317: - Committed to trunk (as I agree this should really go there). bq. doesn't this mean that for a CF w/ no tombstone, we create a new deletioninfo every call to maybeReset? You're right, I've included a current.localDeletionTime == Integer.MIN_VALUE check in the condition to escape early in that case. Column family deletion time is not always reseted after gc_grace Key: CASSANDRA-2317 URL: https://issues.apache.org/jira/browse/CASSANDRA-2317 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.6 Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 1.0 Attachments: 0001-Add-AbstractColumnContainer-to-factor-common-parts-o.patch, 0002-Add-unit-test.patch, 0003-Reset-CF-and-SC-deletion-time-after-compaction.patch Original Estimate: 1h Remaining Estimate: 1h Follow up of CASSANDRA-2305. Reproducible (thanks to Jeffrey Wang) by: Create a CF with gc_grace_seconds = 0 and no row cache. Insert row X, col A with timestamp 0. Insert row X, col B with timestamp 2. Remove row X with timestamp 1 (expect col A to disappear, col B to stay). Wait 1 second. Force flush and compaction. Insert row X, col A with timestamp 0. Read row X, col A (see nothing). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
buildbot success in ASF Buildbot on cassandra-trunk
The Buildbot has detected a restored build on builder cassandra-trunk while building ASF Buildbot. Full details are available at: http://ci.apache.org/builders/cassandra-trunk/builds/1407 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: isis_ubuntu Build Reason: scheduler Build Source Stamp: [branch cassandra/trunk] 1142690 Blamelist: slebresne Build succeeded! sincerely, -The Buildbot
[jira] [Commented] (CASSANDRA-2317) Column family deletion time is not always reseted after gc_grace
[ https://issues.apache.org/jira/browse/CASSANDRA-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059477#comment-13059477 ] Hudson commented on CASSANDRA-2317: --- Integrated in Cassandra #948 (See [https://builds.apache.org/job/Cassandra/948/]) Reset CF and SC deletion time after gc_grace patch by slebresne; reviewed by jbellis for CASSANDRA-2317 slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1142690 Files : * /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamily.java * /cassandra/trunk/src/java/org/apache/cassandra/db/filter/QueryFilter.java * /cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java * /cassandra/trunk/src/java/org/apache/cassandra/db/SuperColumn.java * /cassandra/trunk/CHANGES.txt * /cassandra/trunk/test/unit/org/apache/cassandra/service/RowResolverTest.java * /cassandra/trunk/test/unit/org/apache/cassandra/db/RowTest.java * /cassandra/trunk/src/java/org/apache/cassandra/db/IColumnContainer.java * /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java * /cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java * /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java * /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilySerializer.java Column family deletion time is not always reseted after gc_grace Key: CASSANDRA-2317 URL: https://issues.apache.org/jira/browse/CASSANDRA-2317 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.6 Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 1.0 Attachments: 0001-Add-AbstractColumnContainer-to-factor-common-parts-o.patch, 0002-Add-unit-test.patch, 0003-Reset-CF-and-SC-deletion-time-after-compaction.patch Original Estimate: 1h Remaining Estimate: 1h Follow up of CASSANDRA-2305. Reproducible (thanks to Jeffrey Wang) by: Create a CF with gc_grace_seconds = 0 and no row cache. 
Insert row X, col A with timestamp 0. Insert row X, col B with timestamp 2. Remove row X with timestamp 1 (expect col A to disappear, col B to stay). Wait 1 second. Force flush and compaction. Insert row X, col A with timestamp 0. Read row X, col A (see nothing). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently
[ https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059484#comment-13059484 ] Sylvain Lebresne commented on CASSANDRA-2851: - bq. Our bytesToHex does pad... but only for single-digit results. So if we fix hexToBytes we'll introduce an incompatibility. (Granted, a minor one.) I don't understand. There is no such thing as padding when you convert a byte array to hex (Integer.toHexString returns only the right number of hexadecimal digits because it has no reason to do otherwise, but that's an implementation detail of bytesToHex). A byte is always 8 bits, never 4, and the output of bytesToHex will *always* have an even number of characters (as it should). Our hexToBytes just happens to semi-randomly add a "0" in front to transform buggy input with an odd number of characters into an even one, on the off chance that a client used the (stupid) optimization of removing at most one leading 0 to save some space or something. In my opinion, it would be better to simply refuse odd-sized input because it is more likely truncated input (and people using stupid clients should fix them, though I'm OK with saying that we'll force them to fix it only on a major upgrade). hex-to-bytes conversion accepts invalid inputs silently --- Key: CASSANDRA-2851 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.6, 0.8.1 Reporter: David Allsopp Priority: Minor Fix For: 0.8.2 Attachments: cassandra-2851.diff FBUtilities.hexToBytes() has a minor bug - it copes with single-character inputs by prepending "0", which is OK - but it does this for any input with an odd number of characters, which is probably incorrect. {noformat} if (str.length() % 2 == 1) str = "0" + str; {noformat} Given 'fff' as an input, can we really assume that this should be '0fff'? Isn't this just an error?
Add the following to FBUtilitiesTest to demonstrate: {noformat} String[] badvalues = new String[]{"000", "fff"}; for (int i = 0; i < badvalues.length; i++) try { FBUtilities.hexToBytes(badvalues[i]); fail("Invalid hex value accepted: " + badvalues[i]); } catch (Exception e) {} {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
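A strict variant of the conversion, along the lines Sylvain proposes, would refuse odd-length input outright instead of left-padding. A sketch of what that could look like (an illustration only, not the actual FBUtilities code; the padding branch is simply replaced by an exception):

```java
// Hypothetical strict hex-to-bytes conversion: odd-length input is rejected
// as likely truncated, and non-hex characters are rejected as well.
public class StrictHex {
    public static byte[] hexToBytes(String str) {
        if (str.length() % 2 == 1)
            throw new NumberFormatException("Odd-length hex string: " + str);
        byte[] out = new byte[str.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Character.digit returns -1 for anything that is not a hex digit.
            int hi = Character.digit(str.charAt(2 * i), 16);
            int lo = Character.digit(str.charAt(2 * i + 1), 16);
            if (hi == -1 || lo == -1)
                throw new NumberFormatException("Non-hex character in: " + str);
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

With this behaviour, 'fff' raises an exception rather than being silently read as '0fff', which is exactly the test case the report adds to FBUtilitiesTest.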
[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly
[ https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059485#comment-13059485 ] Terje Marthinussen commented on CASSANDRA-2816: --- Cool! Then you confirmed what I have sort of believed for a while, but my understanding of the code has been a bit in conflict with http://wiki.apache.org/cassandra/Operations which says: It is safe to run repair against multiple machines at the same time, but to minimize the impact on your application workload it is recommended to wait for it to complete on one node before invoking it against the next. I have always read that as: if you have the HW, go for it! May I change it to: It is safe to run repair against multiple machines at the same time. However, to minimize the amount of data transferred during a repair, careful synchronization is required between the nodes taking part in the repair. This is difficult to do if nodes holding replicas of the same data run repair at the same time, and doing so can in extreme cases generate excessive transfers of data. Improvements are being worked on, but for now, avoid scheduling repair on several nodes with replicas of the same data at the same time. Repair doesn't synchronize merkle tree creation properly Key: CASSANDRA-2816 URL: https://issues.apache.org/jira/browse/CASSANDRA-2816 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Labels: repair Fix For: 0.8.2 Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch Being a little slow, I just realized after having opened CASSANDRA-2811 and CASSANDRA-2815 that there is a more general problem with repair.
When a repair is started, it will send a number of merkle tree requests to its neighbors as well as to itself, and assume for correctness that the building of those trees is started on every node at roughly the same time (if not, we end up comparing data snapshots taken at different times and will thus mistakenly repair a lot of useless data). This is bogus for many reasons: * Because validation compaction runs on the same executor as other compactions, the start of the validation on the different nodes is subject to other compactions. 0.8 mitigates this somewhat by being multi-threaded (and thus there is less chance of being blocked a long time by a long-running compaction), but the compaction executor being bounded, it's still a problem. * If you run nodetool repair without arguments, it will repair every CF. As a consequence it will generate lots of merkle tree requests, and all of those requests will be issued at the same time. Because even in 0.8 the compaction executor is bounded, some of those validations will end up being queued behind the first ones. Even assuming that the different validations are submitted in the same order on each node (which isn't guaranteed either), there is no guarantee that on all nodes the first validation will take the same time, hence desynchronizing the queued ones. Overall, it is important for the precision of repair that for a given CF and range (which is the unit at which trees are computed), we make sure that all nodes start the validation at the same time (or, since we can't do magic, as close as possible). One (reasonably simple) proposition to fix this would be to have repair schedule validation compactions across nodes one by one (i.e., one CF/range at a time), waiting for all nodes to return their tree before submitting the next request. Then on each node, we should make sure that the node starts the validation compaction as soon as requested.
For that, we probably want to have a specific executor for validation compaction, and: * either we fail the whole repair whenever one node is not able to execute the validation compaction right away (because no threads are available right away), * or we simply tell the user that if he starts too many repairs in parallel, he may see some of them repairing more data than they should. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
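The "one CF/range at a time" scheduling proposed in the ticket can be sketched as a simple loop that blocks on all replies before issuing the next request. Everything below is hypothetical scaffolding (a Function stands in for merkle tree validation, plain strings for endpoints); it only illustrates the serialization, not the real repair code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch: request a tree for ONE range at a time from every endpoint, and
// block until all endpoints answer before moving on to the next range, so
// validations for a given range start as close together as possible.
public class SerialValidationSketch {
    public static <R, T> List<List<T>> requestTrees(List<R> ranges,
                                                    List<String> endpoints,
                                                    ExecutorService pool,
                                                    Function<R, T> validate) {
        List<List<T>> trees = new ArrayList<>();
        for (R range : ranges) {
            List<Future<T>> pending = new ArrayList<>();
            for (String endpoint : endpoints)      // fan out one CF/range to all nodes
                pending.add(pool.submit(() -> validate.apply(range)));
            List<T> forRange = new ArrayList<>();
            for (Future<T> f : pending) {
                try {
                    forRange.add(f.get());         // wait here; nothing else gets queued
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            trees.add(forRange);
        }
        return trees;
    }
}
```

The design trade-off is the one the ticket names: serializing requests keeps trees comparable at the cost of making a whole-keyspace repair strictly sequential across CF/range units.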
[jira] [Created] (CASSANDRA-2855) Add hadoop support option to skip rows with empty columns
Add hadoop support option to skip rows with empty columns - Key: CASSANDRA-2855 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Jeremy Hanna Assignee: Jeremy Hanna We have been finding that range ghosts appear in results from Hadoop via Pig. This could also happen if rows don't have data for the slice predicate that is given. This leads to having to do a painful amount of defensive checking on the Pig side, especially in the case of range ghosts. We would like to add an option to skip rows that have no column values in them. That functionality existed before in core Cassandra but was removed because of the performance penalty of the checking. However, the Hadoop support in the RecordReader is batch oriented anyway, so individual row-reading performance isn't as much of an issue. Also, we would make it an optional config parameter for each job, so people wouldn't have to incur that penalty if they are confident there won't be empty rows, or if they don't care. It could be a parameter cassandra.skip.empty.rows set to true/false. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
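As a rough illustration of the proposed option (the parameter name cassandra.skip.empty.rows comes from the ticket text; the row model below is a hypothetical simplification, not the actual ColumnFamilyRecordReader code):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Sketch of the proposed opt-in filter: rows is a simplified stand-in for a
// batch of rows (row key -> column values under the slice predicate).
public class SkipEmptyRowsSketch {
    public static Map<String, List<String>> filterRows(Map<String, List<String>> rows,
                                                       Properties conf) {
        // Default false so existing jobs keep their behaviour (and their speed).
        boolean skip = Boolean.parseBoolean(
            conf.getProperty("cassandra.skip.empty.rows", "false"));
        if (!skip)
            return rows;
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : rows.entrySet())
            if (!e.getValue().isEmpty())   // drop range ghosts / columnless rows
                out.put(e.getKey(), e.getValue());
        return out;
    }
}
```

Because the filter runs over an already-fetched batch, the per-row cost is a single emptiness check, which is the ticket's argument for why the old performance objection no longer applies.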
[Cassandra Wiki] Update of InstallThrift by Joe Stein
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The InstallThrift page has been changed by Joe Stein: http://wiki.apache.org/cassandra/InstallThrift?action=diff&rev1=13&rev2=14 '''NOTE:''' If you arrived here for the purpose of writing your first application, please consider using a [[ClientOptions|higher-level client]] instead of thrift directly. - [[http://incubator.apache.org/thrift|Thrift]] historically did not have tagged releases and Cassandra used trunk revisions of it. As of Cassandra 0.7, Thrift 0.5 is used. For Cassandra 0.6, you have to use the matching version of Thrift. Under such circumstances, installing thrift is a bit of a bitch. We are sorry about that, but we don't know of a better way to support a vast number of clients mostly automagically. + [[http://thrift.apache.org/|Thrift]] historically did not have tagged releases and Cassandra used trunk revisions of it; however, as of Cassandra 0.8, Thrift 0.6 is used and available for [[http://thrift.apache.org/download/|download]]. With Cassandra 0.7, Thrift 0.5 is used. For Cassandra 0.6, you have to use the matching version of Thrift. Under such circumstances, installing thrift is a bit of a bitch. We are sorry about that, but we don't know of a better way to support a vast number of clients mostly automagically. + If installing Thrift 0.6 on a Mac for use with Cassandra 0.8 and you get an error building the 'thrift.protocol.fastbinary' extension during `make`, then you might need to work around https://issues.apache.org/jira/browse/THRIFT-1143 by going to thrift-0.6.1/lib/py and running `sudo ARCHFLAGS="-arch x86_64" python setup.py install` + - Important note: you need to install the svn revision of thrift that matches the revision that your version of Cassandra uses (if not using 0.7 with Thrift 0.5). This can be found in the Cassandra Home/lib directory - e.g. `libthrift-917130.jar` means that version of Cassandra uses svn revision 917130 of thrift. + Important note: If using Cassandra 0.6, then you need to install the svn revision of thrift that matches the revision that your version of Cassandra uses (if not using 0.8 with Thrift 0.6 or 0.7 with Thrift 0.5). This can be found in the Cassandra Home/lib directory - e.g. `libthrift-917130.jar` means that version of Cassandra uses svn revision 917130 of thrift. 1. `aptitude install libboost-dev python-dev autoconf automake pkg-config make libtool flex bison build-essential` (or the equivalent on your system) (assumes you are interested in building for python; omit python-dev otherwise) 1. Grab the thrift source with the revision that your version of Cassandra uses: e.g. `svn co -r 917130 http://svn.apache.org/repos/asf/thrift/trunk thrift`
[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently
[ https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059500#comment-13059500 ] Jonathan Ellis commented on CASSANDRA-2851: --- You're right, I was misreading how we were using Integer.toHexString. hex-to-bytes conversion accepts invalid inputs silently --- Key: CASSANDRA-2851 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.6, 0.8.1 Reporter: David Allsopp Priority: Minor Fix For: 0.8.2 Attachments: cassandra-2851.diff FBUtilities.hexToBytes() has a minor bug - it copes with single-character inputs by prepending "0", which is OK - but it does this for any input with an odd number of characters, which is probably incorrect. {noformat} if (str.length() % 2 == 1) str = "0" + str; {noformat} Given 'fff' as an input, can we really assume that this should be '0fff'? Isn't this just an error? Add the following to FBUtilitiesTest to demonstrate: {noformat} String[] badvalues = new String[]{"000", "fff"}; for (int i = 0; i < badvalues.length; i++) try { FBUtilities.hexToBytes(badvalues[i]); fail("Invalid hex value accepted: " + badvalues[i]); } catch (Exception e) {} {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
svn commit: r1142725 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/cli/CliClient.java
Author: jbellis Date: Mon Jul 4 16:20:14 2011 New Revision: 1142725 URL: http://svn.apache.org/viewvc?rev=1142725&view=rev Log: fix CLI perpetuating obsolete KsDef.replication_factor patch by jbellis; tested by Jonas Borgström for CASSANDRA-2846 Modified: cassandra/branches/cassandra-0.8/CHANGES.txt cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java Modified: cassandra/branches/cassandra-0.8/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1142725&r1=1142724&r2=1142725&view=diff == --- cassandra/branches/cassandra-0.8/CHANGES.txt (original) +++ cassandra/branches/cassandra-0.8/CHANGES.txt Mon Jul 4 16:20:14 2011 @@ -13,6 +13,7 @@ * Correctly set default for replicate_on_write (CASSANDRA-2835) * improve nodetool compactionstats formatting (CASSANDRA-2844) * fix index-building status display (CASSANDRA-2853) + * fix CLI perpetuating obsolete KsDef.replication_factor (CASSANDRA-2846) 0.8.1 Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java?rev=1142725&r1=1142724&r2=1142725&view=diff == --- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java (original) +++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java Mon Jul 4 16:20:14 2011 @@ -1072,7 +1072,10 @@ public class CliClient private KsDef updateKsDefAttributes(Tree statement, KsDef ksDefToUpdate) { KsDef ksDef = new KsDef(ksDefToUpdate); - +// server helpfully sets deprecated replication factor when it sends a KsDef back, for older clients. +// we need to unset that on the new KsDef we create to avoid being treated as a legacy client in return. +ksDef.unsetReplication_factor(); + // removing all column definitions - thrift system_update_keyspace method requires that ksDef.setCf_defs(new LinkedList<CfDef>());
svn commit: r1142727 - in /cassandra/branches/cassandra-0.7: CHANGES.txt src/java/org/apache/cassandra/cli/CliMain.java
Author: jbellis Date: Mon Jul 4 16:22:12 2011 New Revision: 1142727 URL: http://svn.apache.org/viewvc?rev=1142727&view=rev Log: improve cli treatment of multiline comments patch by pyaskevich; reviewed by jbellis for CASSANDRA-2852 Modified: cassandra/branches/cassandra-0.7/CHANGES.txt cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java Modified: cassandra/branches/cassandra-0.7/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/CHANGES.txt?rev=1142727&r1=1142726&r2=1142727&view=diff == --- cassandra/branches/cassandra-0.7/CHANGES.txt (original) +++ cassandra/branches/cassandra-0.7/CHANGES.txt Mon Jul 4 16:22:12 2011 @@ -31,6 +31,7 @@ (CASSANDRA-2841) * allow deleting a row and updating indexed columns in it in the same mutation (CASSANDRA-2773) + * improve cli treatment of multiline comments (CASSANDRA-2852) 0.7.6 Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java?rev=1142727&r1=1142726&r2=1142727&view=diff == --- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java (original) +++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java Mon Jul 4 16:22:12 2011 @@ -365,6 +365,8 @@ public class CliMain String line = ""; String currentStatement = ""; +boolean commentedBlock = false; + while ((line = reader.readLine()) != null) { line = line.trim(); @@ -373,6 +375,18 @@ public class CliMain if (line.isEmpty() || line.startsWith("--")) continue; +if (line.startsWith("/*")) +commentedBlock = true; + +if (line.startsWith("*/") || line.endsWith("*/")) +{ +commentedBlock = false; +continue; +} + +if (commentedBlock) // skip commented lines +continue; + currentStatement += line; if (line.endsWith(";"))
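The comment-skipping logic in this commit can be exercised in isolation; the sketch below mirrors the commentedBlock flag from the patch in a standalone method (the surrounding CliMain/reader plumbing is omitted, so this is an illustrative restatement, not the actual class):

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the multiline-comment handling added to CliMain:
// lines inside /* ... */ blocks and lines starting with -- are dropped
// before statements are assembled.
public class CommentSkipSketch {
    public static List<String> stripComments(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean commentedBlock = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("--"))
                continue;                 // single-line comment or blank
            if (line.startsWith("/*"))
                commentedBlock = true;    // block comment opens
            if (line.startsWith("*/") || line.endsWith("*/")) {
                commentedBlock = false;   // block comment closes on this line
                continue;
            }
            if (commentedBlock)           // skip lines inside the block
                continue;
            out.add(line);
        }
        return out;
    }
}
```

With this in place, a `create keyspace;` inside a `/* ... */` block is never fed to the statement parser, which is exactly the failure CASSANDRA-2852 reports.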
svn commit: r1142729 - in /cassandra/branches/cassandra-0.8: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/cli/ test/unit/org/apache/cassandra/db/
Author: jbellis Date: Mon Jul 4 16:23:36 2011 New Revision: 1142729 URL: http://svn.apache.org/viewvc?rev=1142729view=rev Log: merge from 0.7 Modified: cassandra/branches/cassandra-0.8/ (props changed) cassandra/branches/cassandra-0.8/CHANGES.txt cassandra/branches/cassandra-0.8/contrib/ (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java (props changed) cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliMain.java cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java Propchange: cassandra/branches/cassandra-0.8/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Mon Jul 4 16:23:36 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291 -/cassandra/branches/cassandra-0.7:1026516-1140567,1140928,1141129,1141213,1141217 +/cassandra/branches/cassandra-0.7:1026516-1142727 /cassandra/branches/cassandra-0.7.0:1053690-1055654 /cassandra/branches/cassandra-0.8:1090934-1125013,1125041 /cassandra/branches/cassandra-0.8.0:1125021-1130369 Modified: cassandra/branches/cassandra-0.8/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1142729r1=1142728r2=1142729view=diff == --- cassandra/branches/cassandra-0.8/CHANGES.txt (original) +++ cassandra/branches/cassandra-0.8/CHANGES.txt Mon Jul 4 16:23:36 2011 @@ -14,6 +14,7 @@ * improve nodetool compactionstats formatting 
(CASSANDRA-2844) * fix index-building status display (CASSANDRA-2853) * fix CLI perpetuating obsolete KsDef.replication_factor (CASSANDRA-2846) + * improve cli treatment of multiline comments (CASSANDRA-2852) 0.8.1 Propchange: cassandra/branches/cassandra-0.8/contrib/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Mon Jul 4 16:23:36 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009 -/cassandra/branches/cassandra-0.7/contrib:1026516-1140567,1140928,1141129,1141213,1141217 +/cassandra/branches/cassandra-0.7/contrib:1026516-1142727 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654 /cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125041 /cassandra/branches/cassandra-0.8.0/contrib:1125021-1130369 Propchange: cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Mon Jul 4 16:23:36 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1131291 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1140567,1140928,1141129,1141213,1141217 +/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1142727 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654 /cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125041 /cassandra/branches/cassandra-0.8.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1125021-1130369 Propchange: cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Mon Jul 4 16:23:36 2011 @@ -1,5 +1,5 @@ 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:922689-1052356,1052358-1053452,1053454,1053456-1131291 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1140567,1140928,1141129,1141213,1141217 +/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1142727
[Cassandra Wiki] Update of FAQ by TylerHobbs
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The FAQ page has been changed by TylerHobbs: http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=122&rev2=123 Comment: Add CassandraClusterAdmin to the list of GUI admins Anchor(gui) == Is there a GUI admin tool for Cassandra? == - The closest is [[http://github.com/driftx/chiton|chiton]], a GTK data browser. + * [[http://github.com/driftx/chiton|chiton]], a GTK data browser. - Another java UI http://code.google.com/p/cassandra-gui, a Swing data browser. + * [[http://code.google.com/p/cassandra-gui|cassandra-gui]], a Swing data browser. + * [[https://github.com/sebgiroux/Cassandra-Cluster-Admin|Cassandra Cluster Admin]], a PHP-based web UI. Anchor(a_long_is_exactly_8_bytes)
[jira] [Commented] (CASSANDRA-2852) Cassandra CLI - Import Keyspace Definitions from File - Comments do partitially interpret characters/commands
[ https://issues.apache.org/jira/browse/CASSANDRA-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059513#comment-13059513 ] Hudson commented on CASSANDRA-2852: --- Integrated in Cassandra-0.7 #520 (See [https://builds.apache.org/job/Cassandra-0.7/520/]) improve cli treatment of multiline comments patch by pyaskevich; reviewed by jbellis for CASSANDRA-2852 jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1142727 Files : * /cassandra/branches/cassandra-0.7/CHANGES.txt * /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/cli/CliMain.java Cassandra CLI - Import Keyspace Definitions from File - Comments do partitially interpret characters/commands - Key: CASSANDRA-2852 URL: https://issues.apache.org/jira/browse/CASSANDRA-2852 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 0.7.0 Environment: Win Vista Reporter: jens mueller Assignee: Pavel Yaskevich Priority: Trivial Fix For: 0.7.7, 0.8.2 Attachments: CASSANDRA-2852.patch Hello, using: bin/cassandra-cli -host localhost --file conf/schema-sample.txt with schema-sample.txt having contents like this: /* here are a lot of comments, like this sample create keyspace; and so on */ Will result in an error: Line 1 = Syntax Error at Position 323: mismatched charackter 'EOF' expecting '*' The cause is the "keyspace;" text: the semicolon causes the error. However, writing the word "keyspace;" with quotes does NOT lead to the error, so this works: /* here are a lot of comments, like this sample create "keyspace;" and so on */ From my point of view this is an error. Everything between the start comment /* and the end comment */ should be treated as a comment and not be interpreted in any way. That is the definition of a comment: it is not interpreted at all. Otherwise this must be documented somewhere very prominently, or it will lead to unnecessary time wasted searching for this odd behaviour.
And it makes commenting out statements much more cumbersome. Platform: Windows Vista thanks -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2846) Changing replication_factor using update keyspace not working
[ https://issues.apache.org/jira/browse/CASSANDRA-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059518#comment-13059518 ] Hudson commented on CASSANDRA-2846: --- Integrated in Cassandra-0.8 #204 (See [https://builds.apache.org/job/Cassandra-0.8/204/]) fix CLI perpetuating obsolete KsDef.replication_factor patch by jbellis; tested by Jonas Borgström for CASSANDRA-2846 jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1142725 Files : * /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java * /cassandra/branches/cassandra-0.8/CHANGES.txt Changing replication_factor using update keyspace not working --- Key: CASSANDRA-2846 URL: https://issues.apache.org/jira/browse/CASSANDRA-2846 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.1 Environment: A clean 0.8.1 install using the default configuration Reporter: Jonas Borgström Assignee: Jonathan Ellis Priority: Minor Fix For: 0.8.2 Attachments: 2846.txt Unless I've misunderstood the new way to do this with 0.8 I think update keyspace is broken: {code} [default@unknown] create keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}]; 37f70d40-a3e9-11e0--242d50cf1fbf Waiting for schema agreement... ... schemas agree across the cluster [default@unknown] describe keyspace Test; Keyspace: Test: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:1] Column Families: [default@unknown] update keyspace Test with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:2}]; 489fe220-a3e9-11e0--242d50cf1fbf Waiting for schema agreement... ... 
schemas agree across the cluster [default@unknown] describe keyspace Test; Keyspace: Test: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:1] Column Families: {code} Isn't the second describe keyspace supposed to say replication_factor:2? Relevant bits from system.log: {code} Migration.java (line 116) Applying migration 489fe220-a3e9-11e0--242d50cf1fbf Update keyspace Testrep strategy:SimpleStrategy{}durable_writes: true to Testrep strategy:SimpleStrategy{}durable_writes: true UpdateKeyspace.java (line 74) Keyspace updated. Please perform any manual operations {code}
[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently
[ https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059526#comment-13059526 ] David Allsopp commented on CASSANDRA-2851: -- The origin of the current behaviour is CASSANDRA-1411 https://issues.apache.org/jira/browse/CASSANDRA-1411 if that helps... hex-to-bytes conversion accepts invalid inputs silently --- Key: CASSANDRA-2851 URL: https://issues.apache.org/jira/browse/CASSANDRA-2851 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.6, 0.8.1 Reporter: David Allsopp Priority: Minor Fix For: 0.8.2 Attachments: cassandra-2851.diff FBUtilities.hexToBytes() has a minor bug - it copes with single-character inputs by prepending "0", which is OK - but it does this for any input with an odd number of characters, which is probably incorrect. {noformat} if (str.length() % 2 == 1) str = "0" + str; {noformat} Given 'fff' as an input, can we really assume that this should be '0fff'? Isn't this just an error? Add the following to FBUtilitiesTest to demonstrate: {noformat} String[] badvalues = new String[]{"000", "fff"}; for (int i = 0; i < badvalues.length; i++) try { FBUtilities.hexToBytes(badvalues[i]); fail("Invalid hex value accepted: " + badvalues[i]); } catch (Exception e){} {noformat}
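As a rough illustration of the behaviour being argued for above, a strict hex-to-bytes conversion would reject odd-length input instead of silently zero-padding it. This is a hypothetical sketch (class name, messages, and structure are illustrative, not Cassandra's actual FBUtilities code):

```java
import java.util.Arrays;

// Hypothetical strict variant of a hexToBytes() helper: an odd-length
// string cannot have come from encoding whole bytes, so it is an error.
public final class StrictHex {
    public static byte[] hexToBytes(String str) {
        if (str.length() % 2 == 1)
            throw new NumberFormatException("A hex string representing bytes must have an even length: " + str);
        byte[] bytes = new byte[str.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int hi = Character.digit(str.charAt(2 * i), 16);
            int lo = Character.digit(str.charAt(2 * i + 1), 16);
            if (hi == -1 || lo == -1)
                throw new NumberFormatException("Non-hex character in: " + str);
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(hexToBytes("0fff"))); // [15, -1]
        try {
            hexToBytes("fff"); // odd length: rejected rather than padded to '0fff'
        } catch (NumberFormatException e) {
            System.out.println("rejected: fff");
        }
    }
}
```

With this variant, the FBUtilitiesTest snippet above would pass, since both "000" and "fff" have odd length and throw.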
[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1125: -- Attachment: 1125-v3.txt v3 makes the KeyRange an implementation detail (setInputRange just takes Strings for start and end) and fixes a reference to the key fields in CFIF. Filter out ColumnFamily rows that aren't part of the query -- Key: CASSANDRA-1125 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Jeremy Hanna Assignee: Mck SembWever Priority: Minor Fix For: 1.0 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2851) hex-to-bytes conversion accepts invalid inputs silently
[ https://issues.apache.org/jira/browse/CASSANDRA-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059582#comment-13059582 ] Jonathan Ellis commented on CASSANDRA-2851: --- Good point, David. Sounds like the problem is thinking of this as a generic hex conversion function, rather than as hex that specifically represents bytes.
[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059589#comment-13059589 ] David Allsopp commented on CASSANDRA-2850: -- I think you mean (bytes.remaining() * 2) not (bytes.remaining() / 2) - we need twice as many chars as bytes. Also, shouldn't byteToChar[] have length 16, not 256? Not sure what string creation you are referring to? I attach 2 further versions of bytesToHex (as another benchmark, class 3). Results are below (I've had to increase the number of repeats so the stats are significant!). v3 uses 'normal' code and is another 20% faster for large values, and _another_ factor of 2 faster than v2, i.e. 7-10 times faster than the original. v4 uses nasty reflection to avoid doing an arraycopy on the byte array - this avoids allocating a large chunk of memory (all the previous solutions end up doing an arraycopy somewhere). This is now 11-13 times faster than the original. 20M old: 1482 20M new: 360 20M v2: 249 20M v3: 203 20M v4: 125 old: 2137 new: 859 v2: 718 v3: 203 v4: 156 old: 2138 new: 843 v2: 733 v3: 188 v4: 156 Converting bytes to hex string is unnecessarily slow Key: CASSANDRA-2850 URL: https://issues.apache.org/jira/browse/CASSANDRA-2850 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7.6, 0.8.1 Reporter: David Allsopp Priority: Minor Fix For: 0.8.2 Attachments: 2850-v2.patch, BytesToHexBenchmark.java, BytesToHexBenchmark2.java, cassandra-2850a.diff ByteBufferUtil.bytesToHex() is unnecessarily slow - it doesn't pre-size the StringBuilder (so several re-sizes will be needed behind the scenes) and it makes quite a few method calls per byte. (OK, this may be a premature optimisation, but I couldn't resist, and it's a small change) Will attach patch shortly that speeds it up by about x3, plus benchmarking test.
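For readers following the thread, the kind of speed-up being benchmarked can be sketched as a single pre-sized char[] (twice as many chars as bytes) filled through a 16-entry nibble lookup table, with one String construction at the end. This is an illustrative reconstruction, not the attached patch:

```java
import java.nio.ByteBuffer;

// Sketch of the benchmarked approach: pre-sized char[] plus a 16-entry
// nibble table, avoiding StringBuilder resizes and per-byte method calls.
// Names are illustrative, not Cassandra's actual ByteBufferUtil code.
public final class FastHex {
    private static final char[] NIBBLE = "0123456789abcdef".toCharArray();

    public static String bytesToHex(ByteBuffer bytes) {
        final int offset = bytes.position();
        final int len = bytes.remaining();
        char[] out = new char[len * 2];           // twice as many chars as bytes
        for (int i = 0; i < len; i++) {
            int b = bytes.get(offset + i) & 0xFF; // absolute get: buffer position unchanged
            out[2 * i]     = NIBBLE[b >>> 4];
            out[2 * i + 1] = NIBBLE[b & 0x0F];
        }
        return new String(out);                   // one final arraycopy inside String
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.wrap(new byte[]{0x0f, (byte) 0xff, 0x00});
        System.out.println(bytesToHex(bb)); // 0fff00
    }
}
```

Note the final `new String(out)` still copies; the "v4" reflection variant discussed later in the thread is precisely about eliminating that last copy.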
[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Allsopp updated CASSANDRA-2850: - Attachment: BytesToHexBenchmark3.java
[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Allsopp updated CASSANDRA-2850: - Attachment: (was: BytesToHexBenchmark3.java)
[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Allsopp updated CASSANDRA-2850: - Attachment: BytesToHexBenchmark3.java
[jira] [Issue Comment Edited] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059589#comment-13059589 ] David Allsopp edited comment on CASSANDRA-2850 at 7/4/11 7:46 PM: -- I think you mean (bytes.remaining() * 2) not (bytes.remaining() / 2) - we need twice as many chars as bytes. Also, shouldn't byteToChar[] have length 16, not 256? Not sure what string creation you are referring to? I attach 2 further versions of bytesToHex (as another benchmark class 3). Results are below (I've had to increase the number of repeats so the stats are significant!). v3 uses 'normal' code and is another 20% faster for large values, and _another_ factor of 2 faster than v2, i.e. 7-10 times faster than the original. v4 uses nasty reflection to avoid doing an arraycopy on the byte array - this avoids a large chunk of memory (all the previous solutions end up doing an arraycopy somewhere). This is now 11-13 times faster than the original. 20M old: 1482 20M new: 360 20M v2: 249 20M v3: 203 20M v4: 125 old: 2137 new: 859 v2: 718 v3: 203 v4: 156 old: 2138 new: 843 v2: 733 v3: 188 v4: 156
[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059595#comment-13059595 ] David Allsopp commented on CASSANDRA-2850: -- An issue with using hex at all is that we can't represent the maximum 2GB column value. If we have Integer.MAX_VALUE bytes, then we need twice as many chars - and arrays in Java are limited to Integer.MAX_VALUE.
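The arithmetic behind that limit is worth making explicit: encoding N bytes needs 2N chars, and Java array lengths are ints, so anything over Integer.MAX_VALUE / 2 bytes cannot be hex-encoded into a single array. A small sketch (the helper name is hypothetical):

```java
// Illustration of the 2GB point above: a value of Integer.MAX_VALUE bytes
// would need 2 * Integer.MAX_VALUE chars, which no single Java array can hold.
public final class HexLimit {
    // Returns true iff a value of this many bytes can be hex-encoded
    // into one char[]. Uses long arithmetic to avoid int overflow.
    public static boolean fitsInHex(long valueLengthInBytes) {
        return valueLengthInBytes * 2 <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(fitsInHex(1024));              // true
        System.out.println(fitsInHex(Integer.MAX_VALUE)); // false
    }
}
```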
[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059607#comment-13059607 ] David Allsopp commented on CASSANDRA-2850: -- I can't improve any further on Sylvain's hexToByte - nice work!
[Cassandra Wiki] Update of StorageConfiguration by AlexisLeQuoc
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The StorageConfiguration page has been changed by AlexisLeQuoc: http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=58&rev2=59 * '''replica_placement_strategy''' and '''replication_factor''' + === Pre-0.8.1 === Strategy: Setting this to the class that implements {{{IReplicaPlacementStrategy}}} will change the way the node picker works. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.RackUnawareStrategy}}} and {{{org.apache.cassandra.locator.RackAwareStrategy}}} (place one replica in a different datacenter, and the others on different racks in the same one.) Note that the replication factor (RF) is the ''total'' number of nodes onto which the data will be placed. So, a replication factor of 1 means that only 1 node will have the data. It does '''not''' mean that one ''other'' node will have the data. Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'. RF of at least 2 is highly recommended, keeping in mind that your effective number of nodes is (N total nodes / RF). + + === 0.8.1 === + Strategy: Setting this to the class that implements {{{IReplicaPlacementStrategy}}} will change the way the node picker works. Out of the box, Cassandra provides {{{org.apache.cassandra.locator.SimpleStrategy}}}, {{{org.apache.cassandra.locator.LocalStrategy}}} and {{{org.apache.cassandra.locator.NetworkTopologyStrategy}}} (place one replica in a different datacenter, and the others on different racks in the same one.) + + Note that the replication factor (RF) is the ''total'' number of nodes onto which the data will be placed. So, a replication factor of 1 means that only 1 node will have the data. It does '''not''' mean that one ''other'' node will have the data. + + Defaults are: 'org.apache.cassandra.locator.NetworkTopologyStrategy' and '1'. 
RF of at least 2 is highly recommended, keeping in mind that your effective number of nodes is (N total nodes / RF). == per-ColumnFamily Settings == * '''comment''' and '''name'''
[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Allsopp updated CASSANDRA-2850: - Attachment: BytesToHexBenchmark3.java
[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Allsopp updated CASSANDRA-2850: - Attachment: (was: BytesToHexBenchmark3.java)
[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059622#comment-13059622 ] David Allsopp commented on CASSANDRA-2850: -- Update - the benchmark version 3 was running v3 twice, not v3 then v4. Have re-attached. New results are: 20M old: 1435 20M new: 376 20M v2: 405 20M v3: 141 20M v4: 93 20M old: 1265 20M new: 360 20M v2: 234 20M v3: 187 20M v4: 78 20M old: 1233 20M new: 376 20M v2: 452 20M v3: 125 20M v4: 63 old: 2184 new: 906 v2: 577 v3: 188 v4: 172 old: 2215 new: 937 v2: 593 v3: 188 v4: 156
[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Allsopp updated CASSANDRA-2850: - Attachment: (was: BytesToHexBenchmark3.java)
[jira] [Updated] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Allsopp updated CASSANDRA-2850: - Attachment: BytesToHexBenchmark3.java
[jira] [Issue Comment Edited] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059622#comment-13059622 ] David Allsopp edited comment on CASSANDRA-2850 at 7/4/11 9:48 PM: -- Update - the benchmark version 3 was running v3 twice, not v3 then v4. Have re-attached. New results are 15-19x faster for 20MB values, 13-14x faster for 1KB values. 20M old: 1435 20M new: 376 20M v2: 405 20M v3: 141 20M v4: 93 20M old: 1265 20M new: 360 20M v2: 234 20M v3: 187 20M v4: 78 20M old: 1233 20M new: 376 20M v2: 452 20M v3: 125 20M v4: 63 old: 2184 new: 906 v2: 577 v3: 188 v4: 172 old: 2215 new: 937 v2: 593 v3: 188 v4: 156
[jira] [Commented] (CASSANDRA-2850) Converting bytes to hex string is unnecessarily slow
[ https://issues.apache.org/jira/browse/CASSANDRA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059633#comment-13059633 ] David Allsopp commented on CASSANDRA-2850: -- Although the bytesToHex reflection hack is a bit horrible, it makes a huge difference with really big values - I've just been trying different input sizes (with -Xmx4g -Xms4g on a 6GB machine) and the JVM falls over with OOM at about 300MB for all the other versions, but copes with 675MB for v4. With the other versions, for byte array size N, we also need at least 2N for the StringBuilder or char[], then another 2N for the String (because the normal String constructors and methods always do an arraycopy of the input array) - i.e. 5N in total. I wonder where else in the code this sort of thing occurs...?
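The reflection hack being described presumably targets the package-private String(int offset, int count, char[] value) constructor that old JDKs (up to Java 7u6) used to adopt a char[] without copying it; that constructor no longer exists on modern JVMs, so any such trick needs a copying fallback. A hedged sketch of the idea (not the attached benchmark code):

```java
import java.lang.reflect.Constructor;

// Sketch of the "nasty reflection" trick: try the old package-private
// String(int offset, int count, char[] value) constructor, which wrapped
// the char[] without an arraycopy. On JVMs where it is missing or
// inaccessible, fall back to the normal (copying) constructor.
public final class NoCopyString {
    public static String wrap(char[] chars) {
        try {
            Constructor<String> c =
                String.class.getDeclaredConstructor(int.class, int.class, char[].class);
            c.setAccessible(true);
            return c.newInstance(0, chars.length, chars); // no arraycopy on old JDKs
        } catch (Exception e) {
            // NoSuchMethodException on Java 7u6+, or access denied: copy instead
            return new String(chars);
        }
    }

    public static void main(String[] args) {
        System.out.println(wrap("cafe".toCharArray())); // cafe
    }
}
```

Either branch produces the same String contents; only the allocation behaviour differs, which is exactly the 5N-vs-3N memory point made above.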
[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly
[ https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059637#comment-13059637 ] Terje Marthinussen commented on CASSANDRA-2816: --- Regardless of the change of documentation, however, I don't think it should be possible to actually trigger a scenario like this in the first place. The system should protect the user from that. I also noticed that in this case, we have RF=3. The node which is going somewhat crazy is number 6; however, during the repair, it does log that it talks to, compares, and streams data with nodes 4, 5, 7 and 8. Seems like a couple of nodes too many? Repair doesn't synchronize merkle tree creation properly Key: CASSANDRA-2816 URL: https://issues.apache.org/jira/browse/CASSANDRA-2816 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Labels: repair Fix For: 0.8.2 Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch Being a little slow, I just realized after having opened CASSANDRA-2811 and CASSANDRA-2815 that there is a more general problem with repair. When a repair is started, it will send a number of merkle tree requests to its neighbors as well as to itself, and assumes for correctness that the building of those trees will be started on every node at roughly the same time (if not, we end up comparing data snapshots taken at different times and will thus mistakenly repair a lot of useless data). This is bogus for many reasons: * Because validation compaction runs on the same executor as other compactions, the start of the validation on the different nodes is subject to other compactions. 0.8 mitigates this in a way by being multi-threaded (and thus there is less chance of being blocked a long time by a long-running compaction), but the compaction executor being bounded, it's still a problem. * If you run a nodetool repair without arguments, it will repair every CF. 
As a consequence it will generate lots of merkle tree requests and all of those requests will be issued at the same time. Because even in 0.8 the compaction executor is bounded, some of those validations will end up being queued behind the first ones. Even assuming that the different validations are submitted in the same order on each node (which isn't guaranteed either), there is no guarantee that on all nodes the first validation will take the same time, hence desynchronizing the queued ones. Overall, it is important for the precision of repair that for a given CF and range (which is the unit at which trees are computed), we make sure that all nodes will start the validation at the same time (or, since we can't do magic, as close as possible). One (reasonably simple) proposition to fix this would be to have repair schedule validation compactions across nodes one by one (i.e., one CF/range at a time), waiting for all nodes to return their tree before submitting the next request. Then on each node, we should make sure that the node will start the validation compaction as soon as requested. For that, we probably want to have a specific executor for validation compaction and: * either we fail the whole repair whenever one node is not able to execute the validation compaction right away (because no threads are available right away), or * we simply tell the user that if he starts too many repairs in parallel, he may start seeing some of those repairing more data than they should.
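The first option above ("fail the whole repair whenever one node cannot execute the validation right away") maps naturally onto a dedicated validation executor that either starts a task immediately or rejects it, so a validation can never be silently queued behind other work and run against a later snapshot. An illustrative sketch of that idea (not the attached patch) using a SynchronousQueue so nothing is ever queued:

```java
import java.util.concurrent.*;

// "Run now or reject": a single validation thread with no queue. A second
// validation request arriving while the thread is busy gets a
// RejectedExecutionException instead of waiting behind the first one.
public final class ValidationExecutorSketch {
    public static ThreadPoolExecutor newValidationExecutor() {
        return new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),       // direct handoff, no queueing
                new ThreadPoolExecutor.AbortPolicy());  // busy => reject
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor validation = newValidationExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // First validation starts immediately and holds the only thread.
        validation.execute(() -> {
            started.countDown();
            try { release.await(); } catch (InterruptedException ignored) { }
        });
        started.await();
        try {
            validation.execute(() -> { }); // second request while busy...
        } catch (RejectedExecutionException e) {
            // ...fails fast, so the coordinator can abort the repair instead
            // of comparing trees built from desynchronized snapshots.
            System.out.println("validation rejected; repair should fail fast");
        }
        release.countDown();
        validation.shutdown();
    }
}
```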
[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly
[ https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059643#comment-13059643 ] Jonathan Ellis commented on CASSANDRA-2816: ---

bq. May I change to

Sure.

bq. The system should protect the user from that

I'm not sure that in a p2p design we can posit an omniscient "the system".
[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly
[ https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059655#comment-13059655 ] Terje Marthinussen commented on CASSANDRA-2816: ---

bq. I'm not sure that in a p2p design we can posit an omniscient the system.

Is that a philosophical statement? :) As Cassandra, at least for now, is a p2p network with fairly clearly defined boundaries, I will continue calling it a system for now :)

However, looking at it from the p2p viewpoint, the user potentially has no clue about where replicas are stored, and given that, it may be impossible to issue repair manually on more than one node at a time without getting into trouble. Given a large enough p2p setup, it would also be non-trivial to schedule a complete repair without ending up with two or more repairs running on the same replica set.

Since Cassandra does not checkpoint the synchronization, it is forced to rescan everything on every repair, and repairs easily take so long that you are forced to run them on several nodes at a time if you are going to finish repairing all nodes in 10 days...

Anyway, this is way outside the scope of this jira :)
[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly
[ https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059658#comment-13059658 ] Terje Marthinussen commented on CASSANDRA-2816: ---

bq. I also noticed that in this case, we have RF3. The node which is going somewhat crazy is number 6, however during the repair, it does log that it talks compares and streams data with node 4, 5, 7 and 8.

This may be correct. Node 7 will replicate to nodes 6 and 8, so 6 and 8 would share data.

So, to make things safe, even with this patch, only every 4th node can run repair at the same time if RF=3? But you still need to run repair on each of those 4 nodes to make sure everything is repaired?
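The replica-overlap question above can be illustrated with a toy ring model. This is an assumption-laden sketch (SimpleStrategy-style placement on a plain ring, no vnodes, invented function names), not Cassandra code: if each range owned by node i is replicated on i, i+1, ..., i+RF-1, then a repair on node n involves every node from n-(RF-1) to n+(RF-1).

```java
import java.util.*;

// Toy model of ring replica placement (SimpleStrategy-like, no vnodes):
// the range owned by node i is replicated on nodes i, i+1, ..., i+RF-1.
// A repair on node n therefore touches every node sharing a replica with
// n, i.e. the nodes from n-(RF-1) through n+(RF-1) (wrapping around).
public class RepairNeighbors {
    static SortedSet<Integer> touchedBy(int node, int rf, int ringSize) {
        SortedSet<Integer> touched = new TreeSet<>();
        for (int d = -(rf - 1); d <= rf - 1; d++)
            touched.add(Math.floorMod(node + d, ringSize));
        touched.remove(node); // exclude the repairing node itself
        return touched;
    }

    public static void main(String[] args) {
        // Node 6 in a 10-node ring with RF=3 touches 4, 5, 7 and 8 --
        // matching the log described in the comment above.
        System.out.println(touchedBy(6, 3, 10)); // [4, 5, 7, 8]
    }
}
```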
[jira] [Issue Comment Edited] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly
[ https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059658#comment-13059658 ] Terje Marthinussen edited comment on CASSANDRA-2816 at 7/5/11 2:31 AM: ---

bq. I also noticed that in this case, we have RF3. The node which is going somewhat crazy is number 6, however during the repair, it does log that it talks compares and streams data with node 4, 5, 7 and 8.

This may be correct. Node 7 will replicate to nodes 6 and 8, so 6 and 8 would share data.

So, to make things safe, even with this patch, only every 4th node can run repair at the same time if RF=3? But you still need to run repair on each of those 4 nodes to make sure everything is repaired?

As for the comment I made earlier: to me it looks like if the repair starts triggering transfers on a large scale, the files the node gets streamed in will not be streamed out; they may get compacted before the repair finishes, and I suspect the compacted files then get streamed out, so the repair just never finishes.

was (Author: terjem):

bq. I also noticed that in this case, we have RF3. The node which is going somewhat crazy is number 6, however during the repair, it does log that it talks compares and streams data with node 4, 5, 7 and 8.

This may be correct. Node 7 will replicate to nodes 6 and 8, so 6 and 8 would share data.

So, to make things safe, even with this patch, only every 4th node can run repair at the same time if RF=3? But you still need to run repair on each of those 4 nodes to make sure everything is repaired?
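The "specific executor for validation compaction" option from the issue description, in its fail-fast variant, can be sketched with a standard java.util.concurrent pool. This is a hypothetical illustration, not Cassandra's actual executor: a SynchronousQueue plus the default AbortPolicy means a submitted validation either starts on an idle thread immediately or is rejected, so the coordinator could translate the rejection into failing the whole repair instead of letting a queued validation build its tree from a desynchronized snapshot.

```java
import java.util.concurrent.*;

// Fail-fast validation executor sketch: SynchronousQueue does no
// buffering, so submit() either hands the task to an idle thread right
// away or (AbortPolicy) throws RejectedExecutionException -- there is no
// queue that could silently delay the start of a validation compaction.
public class ValidationExecutor {
    static ThreadPoolExecutor newValidationExecutor(int threads) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),               // no queueing
                new ThreadPoolExecutor.AbortPolicy());  // reject instead
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor exec = newValidationExecutor(1);
        CountDownLatch release = new CountDownLatch(1);
        // First validation occupies the only thread.
        exec.submit(() -> { try { release.await(); } catch (InterruptedException e) {} });
        try {
            // Second validation while the thread is busy: rejected, which
            // the repair coordinator could turn into aborting the repair.
            exec.submit(() -> {});
            System.out.println("unexpectedly accepted");
        } catch (RejectedExecutionException e) {
            System.out.println("validation rejected: no thread available");
        }
        release.countDown();
        exec.shutdown();
    }
}
```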