[jira] [Commented] (CASSANDRA-3634) compare string vs. binary prepared statement parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188336#comment-13188336 ]

Sylvain Lebresne commented on CASSANDRA-3634:
---------------------------------------------

bq. It can all be done on the client side if you have the full current schema available which, of course, is doable but expensive (in time) to get in place.

I think we could send enough info with the CqlPreparedResult, i.e., replace the count by a list of types, like what we do for CqlResult. That would be simpler for drivers than keeping the full schema somewhere and probably parsing the initial prepared query to figure out which part of the schema each marker corresponds to.

There would be the slight issue of someone changing the validation of a given value between preparation and execution, but I don't think it's a big deal at all to say that you'll have to re-prepare queries if you do that. How often do you actually change a value validation function anyway? And even if you do, you'd better change it to something that is compatible with the previous type for CQL, so in fact most changes would not be a problem.

compare string vs. binary prepared statement parameters
-------------------------------------------------------

                 Key: CASSANDRA-3634
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3634
             Project: Cassandra
          Issue Type: Sub-task
          Components: API, Core
            Reporter: Eric Evans
            Assignee: Eric Evans
            Priority: Minor
              Labels: cql
             Fix For: 1.1

Perform benchmarks to compare the performance of string and pre-serialized binary parameters to prepared statements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (CASSANDRA-3750) Migrations and Schema CFs use disk space proportional to the square of the number of CFs
[ https://issues.apache.org/jira/browse/CASSANDRA-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-3750.
-----------------------------------------

    Resolution: Duplicate

While it is not yet committed, CASSANDRA-1391 will almost surely fix that, so marking this one as duplicate.

Migrations and Schema CFs use disk space proportional to the square of the number of CFs
----------------------------------------------------------------------------------------

                 Key: CASSANDRA-3750
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3750
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.0.1
         Environment: Linux (CentOS 5.7)
            Reporter: John Chakerian
         Attachments: fit.png

The system keyspace grows proportional to the square of the number of CFs (more likely, it grows quadratically with the number of schema changes in general). The major offenders in the keyspace are the Migrations table and the Schema table. On clusters with very large numbers of CFs (in the low thousands), we think that these large system tables may be contributing to various performance issues.

The approximate expression is: s = 0.0003253*n^2 + 2.58, where n is # of keyspaces + # of schemas and s is the size of the system keyspace in megabytes. See attached plot of the regression curve showing fit.

Sampled data:
{noformat}
NUM_CFS   SYSTEM_SIZE_IN_MB
100       4.4
200       15
300       32
400       55
500       85
600       120
700       162
800       211
900       266
1000      327
{noformat}

This was hit in 1.0.1, but is almost certainly not version specific.
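To make the fitted expression easier to check against the sampled data, here is a minimal Java sketch that evaluates s = 0.0003253*n^2 + 2.58 for each sampled CF count. The class and method names are made up for illustration; only the coefficients and the sample values come from the ticket.

```java
public class SystemKeyspaceSizeEstimate {
    // Coefficients copied from the ticket's fitted expression s = 0.0003253*n^2 + 2.58.
    static final double QUADRATIC_COEFF = 0.0003253;
    static final double INTERCEPT_MB = 2.58;

    /** Estimated size of the system keyspace in MB for a count of n (hypothetical helper). */
    public static double estimateMb(int n) {
        return QUADRATIC_COEFF * n * n + INTERCEPT_MB;
    }

    public static void main(String[] args) {
        // Sampled data from the ticket: NUM_CFS and SYSTEM_SIZE_IN_MB.
        int[] numCfs = {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000};
        double[] measuredMb = {4.4, 15, 32, 55, 85, 120, 162, 211, 266, 327};
        for (int i = 0; i < numCfs.length; i++) {
            System.out.printf("%5d CFs: measured %6.1f MB, fitted %6.1f MB%n",
                    numCfs[i], measuredMb[i], estimateMb(numCfs[i]));
        }
    }
}
```

The fit tracks the larger samples closely and is looser at the small end, which is consistent with the ticket calling the expression approximate.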
[jira] [Updated] (CASSANDRA-3545) Fix very low Secondary Index performance
[ https://issues.apache.org/jira/browse/CASSANDRA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3545:
----------------------------------------

    Fix Version/s:     (was: 1.0.6)
                   1.1

Fix very low Secondary Index performance
----------------------------------------

                 Key: CASSANDRA-3545
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3545
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.7.0
            Reporter: Evgeny Ryabitskiy
            Assignee: Sylvain Lebresne
             Fix For: 1.1
         Attachments: 0001-3545.patch, 0002-cleanup.patch

While performing index search + value filtering over a large index row (~100k keys per index value) in chunks (512-1024 keys each), search time is about 8-12 seconds, which is very slow.

After profiling I got this picture:
60% of search time is spent calculating MD5 hashes with MessageDigest (of course, this is because of RandomPartitioner).
33% of search time (half of all MD5 hash calculation time) is spent calculating MD5 twice when comparing two row keys while rotating the index row to startKey (when performing the search query for the next chunk).

I see several performance improvements:
1) Use a better algorithm to find startKey in the sorted collection, one that is faster than iterating over all keys. This solution comes first because it is simple, needs only local code changes, and should solve the problem (speeding up search several times over).
2) Don't calculate the MD5 hash for startKey every time. It's optimal to compute it once (so search will be twice as fast). This also needs only local code changes.
3) Think about something faster than MD5 for hashing (like a TigerRandomPartitioner with a Tiger/128 hash). This needs research, and maybe that research has already been done.
4) Don't use Tokens (with MD5 hash for RandomPartitioner) for comparing and sorting keys in index rows. In index rows, keys can be stored and compared with a simple byte comparator. This solution requires huge code changes.

I'm going to start with the first solution. Further improvements can be handled in follow-up tickets.
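Improvements 1) and 2) above can be sketched in a few lines of Java. This is not the attached patch and it does not use Cassandra's Token or partitioner classes: it assumes keys are plain strings, that each key's MD5 digest (read as a non-negative integer) stands in for its token, and that the tokens are already held in a sorted list. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class StartKeySearch {
    /** MD5 of the key as a non-negative integer (a stand-in for a real token). */
    public static BigInteger md5Token(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java platform
        }
    }

    /**
     * Position of startToken in the sorted list if present, otherwise the
     * position where it would be inserted. Uses O(log n) comparisons.
     */
    public static int firstAtOrAfter(List<BigInteger> sortedTokens, BigInteger startToken) {
        int idx = Collections.binarySearch(sortedTokens, startToken);
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            tokens.add(md5Token("key" + i)); // each key is hashed exactly once
        }
        Collections.sort(tokens);

        BigInteger start = md5Token("key42"); // startKey is hashed once, not per comparison
        int pos = firstAtOrAfter(tokens, start);
        System.out.println("startKey found at index " + pos);
    }
}
```

The point of the sketch is the shape of the work: one digest for startKey, then a logarithmic number of integer comparisons, instead of a linear scan that rehashes on every step.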
[jira] [Commented] (CASSANDRA-3743) Lower memory consumption used by index sampling
[ https://issues.apache.org/jira/browse/CASSANDRA-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188349#comment-13188349 ]

Radim Kolar commented on CASSANDRA-3743:
----------------------------------------

I am working on it now.

Lower memory consumption used by index sampling
-----------------------------------------------

                 Key: CASSANDRA-3743
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3743
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 1.0.6
            Reporter: Radim Kolar

Currently j.o.a.c.io.sstable.indexsummary is implemented as an ArrayList of KeyPosition (RowPosition key, long offset). I propose to change it to:
RowPosition keys[]
long offsets[]
and use standard binary search on it. This will lower the number of Java objects used per entry from 2 (KeyPosition + RowPosition) to 1 (RowPosition).
For building these arrays, the convenient ArrayList class can be used, followed by a call to .toArray() on it.
This is very important because index sampling uses a lot of memory on nodes with billions of rows.
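The proposed layout might look roughly like the sketch below. It is an illustration of the idea only, not the attached patch: String stands in for RowPosition, the class name is invented, and the lookup returns the offset of the last sampled key that is less than or equal to the requested key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParallelArrayIndexSummary {
    private final String[] keys;  // stand-in for RowPosition[]; must be sorted ascending
    private final long[] offsets; // offsets[i] is the index-file offset for keys[i]

    public ParallelArrayIndexSummary(List<String> sortedKeys, List<Long> sortedOffsets) {
        if (sortedKeys.size() != sortedOffsets.size()) {
            throw new IllegalArgumentException("keys and offsets must have the same length");
        }
        keys = sortedKeys.toArray(new String[0]);
        // toArray() on a List<Long> would give boxed Long objects, so copy into a primitive array.
        offsets = new long[sortedOffsets.size()];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = sortedOffsets.get(i);
        }
    }

    /** Offset of the last sampled key <= key, or -1 if key sorts before every sample. */
    public long floorOffset(String key) {
        int idx = Arrays.binarySearch(keys, key);
        if (idx < 0) {
            idx = -idx - 2; // (insertion point) - 1
        }
        return idx < 0 ? -1 : offsets[idx];
    }

    public static void main(String[] args) {
        List<String> k = new ArrayList<>(Arrays.asList("apple", "mango", "pear"));
        List<Long> o = new ArrayList<>(Arrays.asList(0L, 4096L, 8192L));
        ParallelArrayIndexSummary summary = new ParallelArrayIndexSummary(k, o);
        System.out.println(summary.floorOffset("orange"));
    }
}
```

Keeping the offsets in a primitive long[] is what removes the per-entry wrapper object; the keys array still holds one object per entry, matching the "from 2 to 1" count in the description.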
[jira] [Created] (CASSANDRA-3752) bulk loader no longer finds sstables
bulk loader no longer finds sstables Key: CASSANDRA-3752 URL: https://issues.apache.org/jira/browse/CASSANDRA-3752 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1 Reporter: Brandon Williams Fix For: 1.1 It looks like CASSANDRA-2749 broke it: {noformat} WARN 13:02:20,107 Invalid file 'Standard1' in data directory /var/lib/cassandra/data/Keyspace1. {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3743) Lower memory consumption used by index sampling
[ https://issues.apache.org/jira/browse/CASSANDRA-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated CASSANDRA-3743:
----------------------------------

    Attachment: cassandra-3743.txt

Lower memory consumption used by index sampling
-----------------------------------------------

                 Key: CASSANDRA-3743
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3743
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 1.0.6
            Reporter: Radim Kolar
              Labels: optimization
             Fix For: 1.0.8
         Attachments: cassandra-3743.txt

Currently j.o.a.c.io.sstable.indexsummary is implemented as an ArrayList of KeyPosition (RowPosition key, long offset). I propose to change it to:
RowPosition keys[]
long offsets[]
and use standard binary search on it. This will lower the number of Java objects used per entry from 2 (KeyPosition + RowPosition) to 1 (RowPosition).
For building these arrays, the convenient ArrayList class can be used, followed by a call to .toArray() on it.
This is very important because index sampling uses a lot of memory on nodes with billions of rows.
[Cassandra Wiki] Update of CassandraLimitations by JonathanEllis
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification.

The CassandraLimitations page has been changed by JonathanEllis:
http://wiki.apache.org/cassandra/CassandraLimitations?action=diff&rev1=28&rev2=29

  == Stuff that isn't likely to change ==
   * All data for a single row must fit (on disk) on a single machine in the cluster. Because row keys alone are used to determine the nodes responsible for replicating their data, the amount of data associated with a single key has this upper bound.
-  * A single column value may not be larger than 2GB.
+  * A single column value may not be larger than 2GB. (However, large values are read into memory when requested, so in practice small number of MB is more appropriate.)
   * The maximum of column per row is 2 billion.
   * The key (and column names) must be under 64K bytes.
[1/3] git commit: change bind parms from string to bytes
Updated Branches:
  refs/heads/trunk 0456b7eb2 -> 7c92fc52e


change bind parms from string to bytes

Patch by eevans; reviewed by Rick Shaw for CASSANDRA-3634

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/7c92fc52
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/7c92fc52
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/7c92fc52

Branch: refs/heads/trunk
Commit: 7c92fc52ec9aebd1906441a676cec28aa8c07967
Parents: ce29659
Author: Eric Evans <eev...@sym-link.com>
Authored: Thu Dec 15 09:33:42 2011 -0600
Committer: Eric Evans <eev...@apache.org>
Committed: Wed Jan 18 09:00:17 2012 -0600

----------------------------------------------------------------------
 .../apache/cassandra/cql/AbstractModification.java |    5 ++-
 .../org/apache/cassandra/cql/BatchStatement.java   |    3 +-
 .../cassandra/cql/CreateColumnFamilyStatement.java |    4 +-
 .../org/apache/cassandra/cql/DeleteStatement.java  |    6 ++--
 .../org/apache/cassandra/cql/QueryProcessor.java   |   22 +++---
 src/java/org/apache/cassandra/cql/Term.java        |    4 +-
 .../org/apache/cassandra/cql/UpdateStatement.java  |    6 ++--
 .../apache/cassandra/thrift/CassandraServer.java   |    2 +-
 8 files changed, 27 insertions(+), 25 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cassandra/blob/7c92fc52/src/java/org/apache/cassandra/cql/AbstractModification.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/cql/AbstractModification.java b/src/java/org/apache/cassandra/cql/AbstractModification.java
index 38f323b..3a0b8cb 100644
--- a/src/java/org/apache/cassandra/cql/AbstractModification.java
+++ b/src/java/org/apache/cassandra/cql/AbstractModification.java
@@ -20,6 +20,7 @@
  */
 package org.apache.cassandra.cql;
 
+import java.nio.ByteBuffer;
 import java.util.List;
 
 import org.apache.cassandra.db.IMutation;
@@ -103,7 +104,7 @@ public abstract class AbstractModification
      *
      * @throws InvalidRequestException on the wrong request
      */
-    public abstract List<IMutation> prepareRowMutations(String keyspace, ClientState clientState, List<String> variables)
+    public abstract List<IMutation> prepareRowMutations(String keyspace, ClientState clientState, List<ByteBuffer> variables)
     throws org.apache.cassandra.thrift.InvalidRequestException;
 
     /**
@@ -117,6 +118,6 @@ public abstract class AbstractModification
      *
      * @throws InvalidRequestException on the wrong request
      */
-    public abstract List<IMutation> prepareRowMutations(String keyspace, ClientState clientState, Long timestamp, List<String> variables)
+    public abstract List<IMutation> prepareRowMutations(String keyspace, ClientState clientState, Long timestamp, List<ByteBuffer> variables)
     throws org.apache.cassandra.thrift.InvalidRequestException;
 }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/7c92fc52/src/java/org/apache/cassandra/cql/BatchStatement.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/cql/BatchStatement.java b/src/java/org/apache/cassandra/cql/BatchStatement.java
index 650b53d..2781833 100644
--- a/src/java/org/apache/cassandra/cql/BatchStatement.java
+++ b/src/java/org/apache/cassandra/cql/BatchStatement.java
@@ -20,6 +20,7 @@
  */
 package org.apache.cassandra.cql;
 
+import java.nio.ByteBuffer;
 import java.util.LinkedList;
 import java.util.List;
 
@@ -76,7 +77,7 @@ public class BatchStatement
         return timeToLive;
     }
 
-    public List<IMutation> getMutations(String keyspace, ClientState clientState, List<String> variables)
+    public List<IMutation> getMutations(String keyspace, ClientState clientState, List<ByteBuffer> variables)
     throws InvalidRequestException
     {
         List<IMutation> batch = new LinkedList<IMutation>();

http://git-wip-us.apache.org/repos/asf/cassandra/blob/7c92fc52/src/java/org/apache/cassandra/cql/CreateColumnFamilyStatement.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/cql/CreateColumnFamilyStatement.java b/src/java/org/apache/cassandra/cql/CreateColumnFamilyStatement.java
index 0f371f7..93b8331 100644
--- a/src/java/org/apache/cassandra/cql/CreateColumnFamilyStatement.java
+++ b/src/java/org/apache/cassandra/cql/CreateColumnFamilyStatement.java
@@ -55,7 +55,7 @@ public class CreateColumnFamilyStatement
     }
 
     /** Perform validation of parsed params */
-    private void validate(List<String> variables) throws InvalidRequestException
+    private void validate(List<ByteBuffer> variables) throws InvalidRequestException
     {
         cfProps.validate();
 
@@ -164,7 +164,7 @@ public class CreateColumnFamilyStatement
      * @return a CFMetaData instance corresponding to the values
[2/3] git commit: generated thrift code
generated thrift code

Patch by eevans; reviewed by Rick Shaw for CASSANDRA-3634

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ce29659a
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ce29659a
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ce29659a

Branch: refs/heads/trunk
Commit: ce29659ae29f8449023a27b6d80f1034767d302c
Parents: 0456b7e
Author: Eric Evans <eev...@sym-link.com>
Authored: Thu Dec 15 09:21:35 2011 -0600
Committer: Eric Evans <eev...@apache.org>
Committed: Wed Jan 18 08:59:54 2012 -0600

----------------------------------------------------------------------
 interface/cassandra.thrift                         |    2 +-
 .../org/apache/cassandra/thrift/Cassandra.java     | 9070 +++
 2 files changed, 4496 insertions(+), 4576 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cassandra/blob/ce29659a/interface/cassandra.thrift
----------------------------------------------------------------------
diff --git a/interface/cassandra.thrift b/interface/cassandra.thrift
index a35387b..a0298e5 100644
--- a/interface/cassandra.thrift
+++ b/interface/cassandra.thrift
@@ -709,7 +709,7 @@ service Cassandra {
    * Executes a prepared CQL (Cassandra Query Language) statement by passing an id token and a list of variables
    * to bind and returns a CqlResult containing the results.
    */
-  CqlResult execute_prepared_cql_query(1:required i32 itemId, 2:required list<string> values)
+  CqlResult execute_prepared_cql_query(1:required i32 itemId, 2:required list<binary> values)
     throws (1:InvalidRequestException ire,
             2:UnavailableException ue,
             3:TimedOutException te,
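With the interface change above, a client passes pre-serialized bytes instead of strings. The following Java sketch shows what building such a value list might look like on the client side, using only java.nio. It assumes (this is not stated in the commit) that the two bound columns expect UTF-8 text and an 8-byte big-endian long; the class and helper names are invented, and the actual Thrift call is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BindValues {
    /** Encodes a long as 8 bytes, big-endian (ByteBuffer's default byte order). */
    public static ByteBuffer fromLong(long v) {
        ByteBuffer b = ByteBuffer.allocate(8);
        b.putLong(v);
        b.flip(); // make the written bytes readable from position 0
        return b;
    }

    /** Encodes a string as its UTF-8 bytes. */
    public static ByteBuffer fromUtf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // One ByteBuffer per bind marker, in marker order.
        List<ByteBuffer> values = new ArrayList<>();
        values.add(fromUtf8("alice"));
        values.add(fromLong(42L));
        for (ByteBuffer v : values) {
            System.out.println(v.remaining() + " bytes");
        }
        // A real client would now hand 'values' to execute_prepared_cql_query.
    }
}
```

The trade-off this illustrates is that the client, not the server, now has to know each marker's type in order to serialize correctly.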
[jira] [Resolved] (CASSANDRA-3634) compare string vs. binary prepared statement parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Evans resolved CASSANDRA-3634. --- Resolution: Fixed committed (7c92fc52) compare string vs. binary prepared statement parameters --- Key: CASSANDRA-3634 URL: https://issues.apache.org/jira/browse/CASSANDRA-3634 Project: Cassandra Issue Type: Sub-task Components: API, Core Reporter: Eric Evans Assignee: Eric Evans Priority: Minor Labels: cql Fix For: 1.1 Perform benchmarks to compare the performance of string and pre-serialized binary parameters to prepared statements. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[1/2] git commit: fix merge error; recompile Cassandra.java with thrift 0.7.0
Updated Branches:
  refs/heads/trunk 7c92fc52e -> bce44ff32


fix merge error; recompile Cassandra.java with thrift 0.7.0

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bce44ff3
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bce44ff3
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bce44ff3

Branch: refs/heads/trunk
Commit: bce44ff3297c36260a2adabea5f77c6b0b9dfe92
Parents: 7c92fc5
Author: Eric Evans <eev...@apache.org>
Authored: Wed Jan 18 10:54:33 2012 -0600
Committer: Eric Evans <eev...@apache.org>
Committed: Wed Jan 18 10:54:33 2012 -0600

----------------------------------------------------------------------
 .../org/apache/cassandra/thrift/Cassandra.java     | 9036 ---
 1 files changed, 4558 insertions(+), 4478 deletions(-)
----------------------------------------------------------------------
[Cassandra Wiki] Update of Committers by JonathanEllis
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification.

The Committers page has been changed by JonathanEllis:
http://wiki.apache.org/cassandra/Committers?action=diff&rev1=18&rev2=19

Comment:
add Aaron

  ||Sylvain Lebresne||Mar 2011||Datastax||PMC member, Release manager||
  ||Pavel Yaskevich||Aug 2011||Datastax|| ||
  ||Vijay Parthasarathy||Jan 2012||Netflix|| ||
+ ||Aaron Morton||Jan 2012||Independent|| ||
[jira] [Commented] (CASSANDRA-3507) Proposal: separate cqlsh from CQL drivers
[ https://issues.apache.org/jira/browse/CASSANDRA-3507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188586#comment-13188586 ]

paul cannon commented on CASSANDRA-3507:
----------------------------------------

Whatever we're going to do here for 1.1, we probably want to get started. Is there any further input?

In particular, will C* lose points if it gets distributed (as a tarball, at least) without any client software?

Proposal: separate cqlsh from CQL drivers
-----------------------------------------

                 Key: CASSANDRA-3507
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3507
             Project: Cassandra
          Issue Type: Improvement
          Components: Packaging, Tools
    Affects Versions: 1.0.3
         Environment: Debian-based systems
            Reporter: paul cannon
            Assignee: paul cannon
            Priority: Minor
              Labels: cql, cqlsh
             Fix For: 1.1

Whereas:
* It has been shown to be very desirable to decouple the release cycles of Cassandra from the various client CQL drivers, and
* It is also desirable to include a good interactive CQL client with releases of Cassandra, and
* It is not desirable for Cassandra releases to depend on 3rd-party software which is neither bundled with Cassandra nor readily available for every target platform, but
* Any good interactive CQL client will require a CQL driver;

Therefore, be it resolved that:
* cqlsh will not use an official or supported CQL driver, but will include its own private CQL driver, not intended for use by anything else, and
* the Cassandra project will still recommend installing and using a proper CQL driver for client software.

To ease maintenance, the private CQL driver included with cqlsh may very well be created by copying the python CQL driver from one directory into another, but the user shouldn't rely on this. Maybe we even ought to take some minor steps to discourage its use for other purposes.

Thoughts?
[jira] [Created] (CASSANDRA-3753) Update CqlPreparedResult to provide type information
Update CqlPreparedResult to provide type information Key: CASSANDRA-3753 URL: https://issues.apache.org/jira/browse/CASSANDRA-3753 Project: Cassandra Issue Type: Improvement Components: API Affects Versions: 1.1 Reporter: Jonathan Ellis Priority: Critical Fix For: 1.1 As discussed on CASSANDRA-3634, adding type information to a prepared statement would allow more client-side error checking. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
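As a rough illustration of the client-side checking this ticket has in mind, the Java sketch below validates a list of bound values against a list of per-marker type names before anything is sent. Everything here is hypothetical: the ticket does not define the shape of the type information, so the sketch simply assumes one type-name string per bind marker and checks only the marker count and the byte length of two fixed-width types.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

public class PreparedTypeCheck {
    /** Expected byte length for a fixed-width type name, or -1 if variable-length or unknown. */
    static int fixedLength(String typeName) {
        switch (typeName) {
            case "LongType":  return 8; // assumed 8-byte encoding
            case "Int32Type": return 4; // assumed 4-byte encoding
            default:          return -1;
        }
    }

    /** Throws IllegalArgumentException if the values cannot match the marker types. */
    public static void validate(List<String> markerTypes, List<ByteBuffer> values) {
        if (markerTypes.size() != values.size()) {
            throw new IllegalArgumentException(
                    "expected " + markerTypes.size() + " values, got " + values.size());
        }
        for (int i = 0; i < values.size(); i++) {
            int expected = fixedLength(markerTypes.get(i));
            int actual = values.get(i).remaining();
            if (expected >= 0 && actual != expected) {
                throw new IllegalArgumentException(
                        "marker " + i + " (" + markerTypes.get(i) + ") needs "
                                + expected + " bytes, got " + actual);
            }
        }
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("UTF8Type", "LongType");
        List<ByteBuffer> values = Arrays.asList(
                ByteBuffer.wrap(new byte[] {'h', 'i'}),
                ByteBuffer.allocate(8));
        validate(types, values); // passes silently
        System.out.println("values look consistent with marker types");
    }
}
```

A check like this catches mistakes before a round trip to the server, which is the kind of client-side error checking the ticket refers to.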
[jira] [Commented] (CASSANDRA-3634) compare string vs. binary prepared statement parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188623#comment-13188623 ] Jonathan Ellis commented on CASSANDRA-3634: --- bq. I think we could send enough info with the CqlPreparedResult, i.e, replace the count by a list of types Created CASSANDRA-3753 to follow up on that. compare string vs. binary prepared statement parameters --- Key: CASSANDRA-3634 URL: https://issues.apache.org/jira/browse/CASSANDRA-3634 Project: Cassandra Issue Type: Sub-task Components: API, Core Reporter: Eric Evans Assignee: Eric Evans Priority: Minor Labels: cql Fix For: 1.1 Perform benchmarks to compare the performance of string and pre-serialized binary parameters to prepared statements. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3740) While using BulkOutputFormat unneccessarily look for the cassandra.yaml file.
[ https://issues.apache.org/jira/browse/CASSANDRA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-3740:
----------------------------------------

    Attachment: 0003-use-output-partitioner.txt
                0002-Prevent-loading-from-yaml.txt
                0001-Make-DD-the-canonical-partitioner-source.txt

While using BulkOutputFormat unneccessarily look for the cassandra.yaml file.
------------------------------------------------------------------------------

                 Key: CASSANDRA-3740
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3740
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
    Affects Versions: 1.1
            Reporter: Samarth Gahire
            Assignee: Brandon Williams
              Labels: cassandra, hadoop, mapreduce
             Fix For: 1.1
         Attachments: 0001-Make-DD-the-canonical-partitioner-source.txt, 0002-Prevent-loading-from-yaml.txt, 0003-use-output-partitioner.txt

I am trying to use BulkOutputFormat to stream the data from the map phase of a Hadoop job. I have set the Cassandra-related configuration using ConfigHelper. I have also looked into the Cassandra code, and it seems Cassandra takes care not to look for the cassandra.yaml file.
But when I run the job I still get the following error:
{
12/01/13 11:30:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/01/13 11:30:04 INFO input.FileInputFormat: Total input paths to process : 1
12/01/13 11:30:04 INFO mapred.JobClient: Running job: job_201201130910_0015
12/01/13 11:30:05 INFO mapred.JobClient: map 0% reduce 0%
12/01/13 11:30:23 INFO mapred.JobClient: Task Id : attempt_201201130910_0015_m_00_0, Status : FAILED
java.lang.Throwable: Child Error
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
attempt_201201130910_0015_m_00_0: Cannot locate cassandra.yaml
attempt_201201130910_0015_m_00_0: Fatal configuration error; unable to start server.
}
Also, let me know how I can make this cassandra.yaml file available to the Hadoop MapReduce job.
[jira] [Updated] (CASSANDRA-3749) Allow rangeSlice queries to be start/end inclusive/exclusive
[ https://issues.apache.org/jira/browse/CASSANDRA-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3749:
--------------------------------------

    Attachment: 3749-comments.txt

Even after updating the comments to make things a bit more clear (attached), I'm still confused by the remainder/split dance. For instance, if we are splitting a Bounds on bounds.right, that means that remainder overlaps Bounds entirely and so we should add that to the ranges, but instead we skip it.

Allow rangeSlice queries to be start/end inclusive/exclusive
------------------------------------------------------------

                 Key: CASSANDRA-3749
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3749
             Project: Cassandra
          Issue Type: Sub-task
          Components: Core
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
             Fix For: 1.1
         Attachments: 3749-comments.txt, 3749.patch

Currently, given two keys k1 and k2, we can only do a rangeSlice on the intervals (k1, k2] (Range) and [k1, k2] (Bounds). CQL works around this manually, by querying one more row if the start is exclusive and removing the start/end post-query if necessary. This doesn't work, however, with the new option introduced by CASSANDRA-3742.
So this ticket proposes to add support (internally) for doing a rangeSlice on the intervals (k1, k2) and [k1, k2).
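The four interval kinds discussed in this ticket differ only in whether each endpoint is included. A minimal Java sketch of that idea follows. It is not Cassandra's Range or Bounds implementation: it assumes plain string keys ordered by compareTo, ignores wrap-around on the token ring, and uses an invented class name.

```java
public class KeyInterval {
    private final String start;
    private final String end;
    private final boolean startInclusive;
    private final boolean endInclusive;

    public KeyInterval(String start, boolean startInclusive, String end, boolean endInclusive) {
        this.start = start;
        this.startInclusive = startInclusive;
        this.end = end;
        this.endInclusive = endInclusive;
    }

    /** True if key lies inside the interval, honoring each endpoint's inclusiveness. */
    public boolean contains(String key) {
        int cmpStart = key.compareTo(start);
        int cmpEnd = key.compareTo(end);
        boolean afterStart = startInclusive ? cmpStart >= 0 : cmpStart > 0;
        boolean beforeEnd = endInclusive ? cmpEnd <= 0 : cmpEnd < 0;
        return afterStart && beforeEnd;
    }

    public static void main(String[] args) {
        KeyInterval range = new KeyInterval("k1", false, "k2", true);   // (k1, k2]
        KeyInterval bounds = new KeyInterval("k1", true, "k2", true);   // [k1, k2]
        KeyInterval open = new KeyInterval("k1", false, "k2", false);   // (k1, k2)
        KeyInterval halfOpen = new KeyInterval("k1", true, "k2", false); // [k1, k2)
        System.out.println(range.contains("k1") + " " + bounds.contains("k1")
                + " " + open.contains("k2") + " " + halfOpen.contains("k1"));
    }
}
```

Supporting all four combinations directly removes the need for the query-one-extra-row-and-trim workaround described above.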
[jira] [Updated] (CASSANDRA-3743) Lower memory consumption used by index sampling
[ https://issues.apache.org/jira/browse/CASSANDRA-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3743:
--------------------------------------

             Reviewer: yukim
    Affects Version/s:     (was: 1.0.6)
                       1.0.0
        Fix Version/s:     (was: 1.0.8)
                       1.1
             Assignee: Radim Kolar

Lower memory consumption used by index sampling
-----------------------------------------------

                 Key: CASSANDRA-3743
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3743
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 1.0.0
            Reporter: Radim Kolar
            Assignee: Radim Kolar
              Labels: optimization
             Fix For: 1.1
         Attachments: cassandra-3743.txt

Currently j.o.a.c.io.sstable.indexsummary is implemented as an ArrayList of KeyPosition (RowPosition key, long offset). I propose to change it to:
RowPosition keys[]
long offsets[]
and use standard binary search on it. This will lower the number of Java objects used per entry from 2 (KeyPosition + RowPosition) to 1 (RowPosition).
For building these arrays, the convenient ArrayList class can be used, followed by a call to .toArray() on it.
This is very important because index sampling uses a lot of memory on nodes with billions of rows.
[jira] [Updated] (CASSANDRA-3744) Nodetool.bat double quotes classpath
[ https://issues.apache.org/jira/browse/CASSANDRA-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3744: -- Attachment: 3744-v2.txt I see what you mean. v2 attached to generalize fix to other .bats. Nodetool.bat double quotes classpath Key: CASSANDRA-3744 URL: https://issues.apache.org/jira/browse/CASSANDRA-3744 Project: Cassandra Issue Type: Bug Components: Tools Environment: Windows Reporter: Nick Bailey Assignee: Nick Bailey Priority: Minor Fix For: 1.0.8 Attachments: 0001-Don-t-double-quote-classpath.patch, 3744-v2.txt Windows sucks and double quoting things breaks stuff. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3736) -Dreplace_token leaves old node (IP) in the gossip with the token.
[ https://issues.apache.org/jira/browse/CASSANDRA-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188817#comment-13188817 ] Jackson Chung commented on CASSANDRA-3736: -- looks like fix from CASSANDRA-3747 got the fix. the replacement node would still get this once: INFO [GossipStage:1] 2012-01-18 23:45:56,412 Gossiper.java (line 834) Node /50.56.58.55 is now part of the cluster INFO [GossipStage:1] 2012-01-18 23:45:56,412 Gossiper.java (line 800) InetAddress /50.56.58.55 is now UP INFO [GossipStage:1] 2012-01-18 23:45:56,413 StorageService.java (line 1016) Nodes /50.56.58.55 and action-quick2/50.56.31.186 have the same token 85070591730234615865843651857942052864. Ignoring /50.56.58.55 INFO [GossipTasks:1] 2012-01-18 23:46:05,805 Gossiper.java (line 814) InetAddress /50.56.58.55 is now dead. INFO [GossipTasks:1] 2012-01-18 23:46:26,819 Gossiper.java (line 628) FatClient /50.56.58.55 has been silent for 3ms, removing from gossip but its quiet after that. the other node would receive the same info also: INFO [GossipTasks:1] 2012-01-18 23:45:57,486 Gossiper.java (line 628) FatClient /50.56.58.55 has been silent for 3ms, removing from gossip and the gossipinfo of those nodes are the matching: $ ./bin/nodetool -h 50.56.31.186 gossipinfo /50.56.59.68 RELEASE_VERSION:1.0.7-SNAPSHOT LOAD:6820.0 RPC_ADDRESS:50.56.59.68 STATUS:NORMAL,0 SCHEMA:--1000-- action-quick2/50.56.31.186 RELEASE_VERSION:1.0.7-SNAPSHOT RPC_ADDRESS:50.56.31.186 STATUS:NORMAL,85070591730234615865843651857942052864 LOAD:11372.0 SCHEMA:--1000-- $ ./bin/nodetool -h 50.56.59.68 gossipinfo action-quick/50.56.59.68 SCHEMA:--1000-- RELEASE_VERSION:1.0.7-SNAPSHOT LOAD:6820.0 RPC_ADDRESS:50.56.59.68 STATUS:NORMAL,0 /50.56.31.186 SCHEMA:--1000-- RELEASE_VERSION:1.0.7-SNAPSHOT LOAD:11372.0 RPC_ADDRESS:50.56.31.186 STATUS:NORMAL,85070591730234615865843651857942052864 -Dreplace_token leaves old node (IP) in the gossip with the token. 
--
Key: CASSANDRA-3736
URL: https://issues.apache.org/jira/browse/CASSANDRA-3736
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.0.0
Reporter: Jackson Chung
Assignee: Vijay
Fix For: 1.0.8
Attachments: 0001-CASSANDRA-3736.patch

https://issues.apache.org/jira/browse/CASSANDRA-957 introduced a -Dreplace_token; however, the replaced IP keeps showing up in the Gossiper when starting the replacement node:

{noformat}
INFO [Thread-2] 2012-01-12 23:59:35,162 CassandraDaemon.java (line 213) Listening for thrift clients...
INFO [GossipStage:1] 2012-01-12 23:59:35,173 Gossiper.java (line 836) Node /50.56.59.68 has restarted, now UP
INFO [GossipStage:1] 2012-01-12 23:59:35,174 Gossiper.java (line 804) InetAddress /50.56.59.68 is now UP
INFO [GossipStage:1] 2012-01-12 23:59:35,175 StorageService.java (line 988) Node /50.56.59.68 state jump to normal
INFO [GossipStage:1] 2012-01-12 23:59:35,176 Gossiper.java (line 836) Node /50.56.58.55 has restarted, now UP
INFO [GossipStage:1] 2012-01-12 23:59:35,176 Gossiper.java (line 804) InetAddress /50.56.58.55 is now UP
INFO [GossipStage:1] 2012-01-12 23:59:35,177 StorageService.java (line 1016) Nodes /50.56.58.55 and action-quick2/50.56.31.186 have the same token 85070591730234615865843651857942052864. Ignoring /50.56.58.55
INFO [GossipTasks:1] 2012-01-12 23:59:45,048 Gossiper.java (line 818) InetAddress /50.56.58.55 is now dead.
INFO [GossipTasks:1] 2012-01-13 00:00:06,062 Gossiper.java (line 632) FatClient /50.56.58.55 has been silent for 3ms, removing from gossip
INFO [GossipStage:1] 2012-01-13 00:01:06,320 Gossiper.java (line 838) Node /50.56.58.55 is now part of the cluster
INFO [GossipStage:1] 2012-01-13 00:01:06,320 Gossiper.java (line 804) InetAddress /50.56.58.55 is now UP
INFO [GossipStage:1] 2012-01-13 00:01:06,321 StorageService.java (line 1016) Nodes /50.56.58.55 and action-quick2/50.56.31.186 have the same token 85070591730234615865843651857942052864. Ignoring /50.56.58.55
INFO [GossipTasks:1] 2012-01-13 00:01:16,106 Gossiper.java (line 818) InetAddress /50.56.58.55 is now dead.
INFO [GossipTasks:1] 2012-01-13 00:01:37,121 Gossiper.java (line 632) FatClient /50.56.58.55 has been silent for 3ms, removing from gossip
INFO [GossipStage:1] 2012-01-13 00:02:37,352 Gossiper.java (line 838) Node /50.56.58.55 is now part of the cluster
INFO [GossipStage:1] 2012-01-13 00:02:37,353 Gossiper.java (line 804) InetAddress /50.56.58.55 is now UP
INFO [GossipStage:1] 2012-01-13 00:02:37,353 StorageService.java (line
[jira] [Updated] (CASSANDRA-3668) Performance of sstableloader is affected in 1.0.x
[ https://issues.apache.org/jira/browse/CASSANDRA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-3668:
--
Attachment: 0003-Add-threads-option-to-sstableloader.patch
            0002-Allow-concurrent-stream-in-StreamOutSession.patch
            0001-Allow-multiple-connection-in-StreamInSession.patch

The attached patches add a threads option (-t) to sstableloader. The option lets you configure the number of threads per destination. I made the patches against trunk because in the 1.0 branch a streaming socket is one-to-one with an incoming stream session. I got better throughput with 4 threads and observed little impact on the target node's CPU and memory.

Performance of sstableloader is affected in 1.0.x
-
Key: CASSANDRA-3668
URL: https://issues.apache.org/jira/browse/CASSANDRA-3668
Project: Cassandra
Issue Type: Bug
Components: API
Affects Versions: 1.0.7
Reporter: Manish Zope
Assignee: Yuki Morishita
Fix For: 1.0.8
Attachments: 0001-Allow-multiple-connection-in-StreamInSession.patch, 0002-Allow-concurrent-stream-in-StreamOutSession.patch, 0003-Add-threads-option-to-sstableloader.patch, 3688-reply_before_closing_writer.txt, sstable-loader performance.txt
Original Estimate: 48h
Remaining Estimate: 48h

One of my colleagues reported a bug regarding the degraded performance of the sstable generator and sstable loader.
ISSUE: https://issues.apache.org/jira/browse/CASSANDRA-3589
As stated in the above issue, the generator performance has been rectified, but the performance of sstableloader is still an issue. 3589 is marked as a duplicate of 3618. Both issues show resolved status, but the problem with sstableloader still exists. So I am opening another issue so that the sstableloader problem does not go unnoticed.
FYI: We have tested the generator part with the patch given in 3589. It's working fine.
Please let us know if you require further input from our side.
[jira] [Updated] (CASSANDRA-3668) Performance of sstableloader is affected in 1.0.x
[ https://issues.apache.org/jira/browse/CASSANDRA-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3668:
--
Reviewer: jbellis
Priority: Minor (was: Major)
Affects Version/s: 1.0.0 (was: 1.0.7)
Fix Version/s: 1.1 (was: 1.0.8)

Performance of sstableloader is affected in 1.0.x
-
Key: CASSANDRA-3668
URL: https://issues.apache.org/jira/browse/CASSANDRA-3668
Project: Cassandra
Issue Type: Bug
Components: API
Affects Versions: 1.0.0
Reporter: Manish Zope
Assignee: Yuki Morishita
Priority: Minor
Fix For: 1.1
Attachments: 0001-Allow-multiple-connection-in-StreamInSession.patch, 0002-Allow-concurrent-stream-in-StreamOutSession.patch, 0003-Add-threads-option-to-sstableloader.patch, 3688-reply_before_closing_writer.txt, sstable-loader performance.txt
Original Estimate: 48h
Remaining Estimate: 48h

One of my colleagues reported a bug regarding the degraded performance of the sstable generator and sstable loader.
ISSUE: https://issues.apache.org/jira/browse/CASSANDRA-3589
As stated in the above issue, the generator performance has been rectified, but the performance of sstableloader is still an issue. 3589 is marked as a duplicate of 3618. Both issues show resolved status, but the problem with sstableloader still exists. So I am opening another issue so that the sstableloader problem does not go unnoticed.
FYI: We have tested the generator part with the patch given in 3589. It's working fine.
Please let us know if you require further input from our side.
[jira] [Commented] (CASSANDRA-3736) -Dreplace_token leaves old node (IP) in the gossip with the token.
[ https://issues.apache.org/jira/browse/CASSANDRA-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188911#comment-13188911 ]

Vijay commented on CASSANDRA-3736:
--

Yes, and the fix attached to this ticket will also remove the node from the System table while replacing, so you won't even see the following message:

INFO [GossipStage:1] 2012-01-18 23:45:56,412 Gossiper.java (line 800) InetAddress /50.56.58.55 is now UP

The problem is that we remove the node after 30 seconds. Meanwhile, gossip lets the other node know about .55, hence the message on the other node. The patch fixes this by removing the information from the System table in the first place, rather than having a restart trigger it to reappear. Can you try redoing the test? It doesn't reappear in my tests.

-Dreplace_token leaves old node (IP) in the gossip with the token.
--
Key: CASSANDRA-3736
URL: https://issues.apache.org/jira/browse/CASSANDRA-3736
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.0.0
Reporter: Jackson Chung
Assignee: Vijay
Fix For: 1.0.8
Attachments: 0001-CASSANDRA-3736.patch

https://issues.apache.org/jira/browse/CASSANDRA-957 introduced a -Dreplace_token; however, the replaced IP keeps showing up in the Gossiper when starting the replacement node:

{noformat}
INFO [Thread-2] 2012-01-12 23:59:35,162 CassandraDaemon.java (line 213) Listening for thrift clients...
INFO [GossipStage:1] 2012-01-12 23:59:35,173 Gossiper.java (line 836) Node /50.56.59.68 has restarted, now UP
INFO [GossipStage:1] 2012-01-12 23:59:35,174 Gossiper.java (line 804) InetAddress /50.56.59.68 is now UP
INFO [GossipStage:1] 2012-01-12 23:59:35,175 StorageService.java (line 988) Node /50.56.59.68 state jump to normal
INFO [GossipStage:1] 2012-01-12 23:59:35,176 Gossiper.java (line 836) Node /50.56.58.55 has restarted, now UP
INFO [GossipStage:1] 2012-01-12 23:59:35,176 Gossiper.java (line 804) InetAddress /50.56.58.55 is now UP
INFO [GossipStage:1] 2012-01-12 23:59:35,177 StorageService.java (line 1016) Nodes /50.56.58.55 and action-quick2/50.56.31.186 have the same token 85070591730234615865843651857942052864. Ignoring /50.56.58.55
INFO [GossipTasks:1] 2012-01-12 23:59:45,048 Gossiper.java (line 818) InetAddress /50.56.58.55 is now dead.
INFO [GossipTasks:1] 2012-01-13 00:00:06,062 Gossiper.java (line 632) FatClient /50.56.58.55 has been silent for 3ms, removing from gossip
INFO [GossipStage:1] 2012-01-13 00:01:06,320 Gossiper.java (line 838) Node /50.56.58.55 is now part of the cluster
INFO [GossipStage:1] 2012-01-13 00:01:06,320 Gossiper.java (line 804) InetAddress /50.56.58.55 is now UP
INFO [GossipStage:1] 2012-01-13 00:01:06,321 StorageService.java (line 1016) Nodes /50.56.58.55 and action-quick2/50.56.31.186 have the same token 85070591730234615865843651857942052864. Ignoring /50.56.58.55
INFO [GossipTasks:1] 2012-01-13 00:01:16,106 Gossiper.java (line 818) InetAddress /50.56.58.55 is now dead.
INFO [GossipTasks:1] 2012-01-13 00:01:37,121 Gossiper.java (line 632) FatClient /50.56.58.55 has been silent for 3ms, removing from gossip
INFO [GossipStage:1] 2012-01-13 00:02:37,352 Gossiper.java (line 838) Node /50.56.58.55 is now part of the cluster
INFO [GossipStage:1] 2012-01-13 00:02:37,353 Gossiper.java (line 804) InetAddress /50.56.58.55 is now UP
INFO [GossipStage:1] 2012-01-13 00:02:37,353 StorageService.java (line 1016) Nodes /50.56.58.55 and action-quick2/50.56.31.186 have the same token 85070591730234615865843651857942052864. Ignoring /50.56.58.55
INFO [GossipTasks:1] 2012-01-13 00:02:47,158 Gossiper.java (line 818) InetAddress /50.56.58.55 is now dead.
INFO [GossipStage:1] 2012-01-13 00:02:50,162 Gossiper.java (line 818) InetAddress /50.56.58.55 is now dead.
INFO [GossipStage:1] 2012-01-13 00:02:50,163 StorageService.java (line 1156) Removing token 122029383590318827259508597176866581733 for /50.56.58.55
{noformat}

In the above, /50.56.58.55 was the replaced IP. I tried adding Gossiper.instance.removeEndpoint(endpoint); in StorageService.java where the message 'Nodes %s and %s have the same token %s. Ignoring %s,' is logged, but that seems to have fixed this only temporarily. Here is a ring output:

{noformat}
riptano@action-quick:~/work/cassandra$ ./bin/nodetool -h localhost ring
Address         DC          Rack        Status State   Load            Owns    Token
                                                                               85070591730234615865843651857942052864
50.56.59.68     datacenter1 rack1       Up     Normal  6.67 KB         85.56%
[jira] [Commented] (CASSANDRA-3743) Lower memory consumption used by index sampling
[ https://issues.apache.org/jira/browse/CASSANDRA-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188950#comment-13188950 ]

Radim Kolar commented on CASSANDRA-3743:

The patch is against 1.0. This was expected to go into the 1.0 branch. It's a small change: KeyPosition was referenced by just one other class.

Lower memory consumption used by index sampling
---
Key: CASSANDRA-3743
URL: https://issues.apache.org/jira/browse/CASSANDRA-3743
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 1.0.0
Reporter: Radim Kolar
Assignee: Radim Kolar
Labels: optimization
Fix For: 1.1
Attachments: cassandra-3743.txt

Currently j.o.a.c.io.sstable.indexsummary is implemented as an ArrayList of KeyPosition (RowPosition key, long offset). I propose to change it to:

RowPosition keys[]
long offsets[]

and use a standard binary search on it. This will lower the number of Java objects used per entry from 2 (KeyPosition + RowPosition) to 1 (RowPosition). To build these arrays, the convenient ArrayList class can be used, followed by a call to .toArray() on it. This is very important because index sampling uses a lot of memory on nodes with billions of rows.
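
To make the layout proposed in CASSANDRA-3743 concrete, here is a minimal Java sketch of a sampled index kept as two parallel arrays and searched with a binary search. It is not the attached patch: the class name ParallelArraySummary, its Builder, the floorOffset method, and the generic key type K (standing in for RowPosition) are all made up for illustration, and the lookup semantics (last sampled key at or before the target) are an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the parallel-array layout: one key object per
 * entry plus a primitive long, with no per-entry wrapper object.
 * K stands in for Cassandra's RowPosition; no name here is Cassandra's.
 */
final class ParallelArraySummary<K extends Comparable<K>> {
    private final Object[] keys;   // sampled keys, in ascending order
    private final long[] offsets;  // offsets[i] belongs to keys[i]

    private ParallelArraySummary(Object[] keys, long[] offsets) {
        this.keys = keys;
        this.offsets = offsets;
    }

    /** Collects samples in growable lists, then freezes them into arrays. */
    static final class Builder<K extends Comparable<K>> {
        private final List<K> keys = new ArrayList<>();
        private final List<Long> offsets = new ArrayList<>();

        /** Samples must be added in ascending key order. */
        Builder<K> add(K key, long offset) {
            keys.add(key);
            offsets.add(offset);
            return this;
        }

        ParallelArraySummary<K> build() {
            long[] packed = new long[offsets.size()];
            for (int i = 0; i < packed.length; i++) {
                packed[i] = offsets.get(i); // unbox once, at build time
            }
            return new ParallelArraySummary<>(keys.toArray(), packed);
        }
    }

    int size() {
        return keys.length;
    }

    /**
     * Binary search for the last sampled key that is <= target.
     * Returns its offset, or -1 if target sorts before every sample.
     */
    long floorOffset(K target) {
        int lo = 0;
        int hi = keys.length - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            @SuppressWarnings("unchecked")
            K candidate = (K) keys[mid];
            if (candidate.compareTo(target) <= 0) {
                found = mid;   // candidate qualifies; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? -1 : offsets[found];
    }
}
```

The temporary boxed Long values in the builder become garbage after build(), so the long-lived structure holds only the key objects and one long[] array, which is the saving the ticket describes.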