[jira] [Created] (CASSANDRA-4918) Remove CQL3 arbitrary select limit

2012-11-06 Thread Sylvain Lebresne (JIRA)
Sylvain Lebresne created CASSANDRA-4918:
---

 Summary: Remove CQL3 arbitrary select limit
 Key: CASSANDRA-4918
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4918
 Project: Cassandra
  Issue Type: Bug
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 1.2.0


Let it be clear however that until CASSANDRA-4415 is resolved, it will put us 
in a situation where it will be easy to write queries that time out (and 
potentially OOM the server). That being said, even with the auto-limit it's not 
too hard to write queries that time out if you're not at least a bit careful, 
and so far we've always answered that by saying 'you have to be mindful of how 
much data your query is asking for'. And while I'm all for adding protection 
against OOMing the server as suggested by Jonathan on CASSANDRA-4304, I think 
the arbitrary auto-limit is the worst possible solution to this problem.

Note that until CASSANDRA-4415 is resolved I wouldn't be totally opposed to 
forcing people to provide a LIMIT on select queries if we really think it will 
avoid lots of surprises, though tbh I do think it would be enough to just 
continue to be vocal about the fact that 'you have to be mindful of how much 
data your query is asking for' and its follow-up 'you should use an explicit 
LIMIT if in doubt about how much data will be returned'.

But I am *strongly opposed* to keeping the current arbitrary limit because it 
makes very little sense imo, the little sense it makes will completely 
vanish once CASSANDRA-4415 is here, and I don't want to break the API and do a 
CQL4 to be able to remove that limit later.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-4918) Remove CQL3 arbitrary select limit

2012-11-06 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-4918:


Attachment: 4918.txt

Trivial patch attached

 Remove CQL3 arbitrary select limit
 --

 Key: CASSANDRA-4918
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4918
 Project: Cassandra
  Issue Type: Bug
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 1.2.0

 Attachments: 4918.txt


 Let it be clear however that until CASSANDRA-4415 is resolved, it will put us 
 in a situation where it will be easy to write queries that time out (and 
 potentially OOM the server). That being said, even with the auto-limit it's 
 not too hard to write queries that time out if you're not at least a bit 
 careful, and so far we've always answered that by saying 'you have to be 
 mindful of how much data your query is asking for'. And while I'm all for 
 adding protection against OOMing the server as suggested by Jonathan on 
 CASSANDRA-4304, I think the arbitrary auto-limit is the worst possible 
 solution to this problem.
 Note that until CASSANDRA-4415 is resolved I wouldn't be totally opposed to 
 forcing people to provide a LIMIT on select queries if we really think it 
 will avoid lots of surprises, though tbh I do think it would be enough to 
 just continue to be vocal about the fact that 'you have to be mindful of how 
 much data your query is asking for' and its follow-up 'you should use an 
 explicit LIMIT if in doubt about how much data will be returned'.
 But I am *strongly opposed* to keeping the current arbitrary limit because it 
 makes very little sense imo, the little sense it makes will completely 
 vanish once CASSANDRA-4415 is here, and I don't want to break the API and do 
 a CQL4 to be able to remove that limit later.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4915) CQL should force limit when query samples data.

2012-11-06 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491326#comment-13491326
 ] 

Sylvain Lebresne commented on CASSANDRA-4915:
-

I agree that us doing a full scan for that kind of query is confusing. In fact, 
it's a breach of our otherwise consistently applied rule: we don't allow 
queries that are not indexed. In this case we don't have an index and fall back 
to a full scan, which we never do otherwise.

So the logical thing would just be to refuse that type of query (but to be 
clear, I think {{SELECT * FROM videos}} should always be allowed, because there 
is no surprise there: you've asked for everything). We talked about allowing 
indexing the components of the clustering key (and though it's not done yet, I 
see no reason not to do it eventually), and once that is done we will be able 
to do those queries efficiently; it's only then that we should, again in 
theory, allow them.

Now in practice there is the fact that those queries more or less correspond to 
range slice queries, and there is a good chance people would complain if we 
disallow them. I do note that it's not fully equivalent to the thrift case 
however, in the sense that in the thrift case you're literally asking for some 
sub-slice of all rows (or at least a range of rows), and in the result you will 
get all the rows, but with an empty set of columns if the provided filter 
selected nothing. In CQL3, you select _only_ the rows _where_ some predicate is 
true, so you won't get all those internal rows that have nothing for you.
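The distinction can be sketched in a few lines of toy Java (this is not Cassandra code; all names here are invented for illustration):

```java
import java.util.*;

public class SliceSemantics
{
    // Thrift-style range_slice: every row in the range comes back, possibly
    // with an empty column set when the filter matches nothing in that row.
    static Map<String, List<String>> thriftRangeSlice(Map<String, List<String>> rows, String wanted)
    {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : rows.entrySet())
        {
            List<String> matched = new ArrayList<>();
            for (String col : e.getValue())
                if (col.equals(wanted))
                    matched.add(col);
            result.put(e.getKey(), matched); // row present even when matched is empty
        }
        return result;
    }

    // CQL3-style WHERE: only rows where the predicate holds are returned at all.
    static Map<String, List<String>> cqlWhere(Map<String, List<String>> rows, String wanted)
    {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : rows.entrySet())
            if (e.getValue().contains(wanted))
                result.put(e.getKey(), e.getValue());
        return result;
    }
}
```

So a thrift client sees every row key it scanned over, while a CQL3 client only ever sees rows that satisfied the predicate.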

bq. force people to supply a LIMIT clause

I really don't think this is a LIMIT problem, and thus I don't think forcing (or 
doing anything with) LIMIT is the solution. Namely, if you have billions of 
rows and none of them has {{videoname = 'My funny cat'}}, then whatever limit 
you provide (even 1), this query will time out. Now I have some things to 
say about LIMIT and I've created CASSANDRA-4918 for that, but this is a 
completely orthogonal problem imo.
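A toy sketch of why LIMIT does not bound the cost of an unindexed query (illustration only, not Cassandra's read path): with no matching rows, even LIMIT 1 must visit the whole data set before giving up.

```java
import java.util.*;

public class LimitScan
{
    static int rowsScanned;

    static List<String> scanWithLimit(List<String> rows, String predicate, int limit)
    {
        rowsScanned = 0;
        List<String> result = new ArrayList<>();
        for (String row : rows)
        {
            rowsScanned++;
            if (row.equals(predicate))
            {
                result.add(row);
                if (result.size() >= limit)
                    break; // the limit only short-circuits once matches are found
            }
        }
        return result;
    }
}
```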

So in terms of solutions, here are the ones I would suggest, in order of 
preference:
# we could add a new {{ALLOW FULL SCAN}} option to {{SELECT}} queries that 
would explicitly say 'I allow the engine to do a full scan and thus I 
understand my query performance may suck immensely'. We would then not allow 
queries like
{noformat}
SELECT * FROM videos WHERE videoname = 'My funny cat'
{noformat}
  until we support 2ndary indexing of videoname, but we would allow
{noformat}
SELECT * FROM videos WHERE videoname = 'My funny cat' ALLOW FULL SCAN
{noformat}
  (alternative syntax could be 'ALLOW NON-INDEXED SCAN' or whatever). I think 
this would be in line with what we want for Cassandra: make the user explicitly 
conscious of the performance implications of their queries. We could even later 
extend the support of this 'ALLOW FULL SCAN' bit by bit to other types of 
queries we refuse today (though I'm certainly not implying this should be a 
priority).
# if others really don't like my previous idea, I do think that the logical 
next best thing is to refuse that type of query, pure and simple.
# as a last resort (though I don't really like it tbh), we could add some form 
of simple explain that would tell you whether a query is indexed or not (but I 
largely prefer the 'you have to explicitly say you're fine with non-indexed' 
solution).
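The gist of option 1 could look roughly like the following validation step (purely hypothetical sketch: the class and parameter names are invented, and the real change would live in the CQL3 statement classes):

```java
public class SelectValidator
{
    // Reject unindexed predicates unless the statement carries an explicit
    // ALLOW FULL SCAN flag; a predicate-free SELECT * stays allowed since
    // there is no surprise there.
    static void validate(boolean predicateIsIndexed, boolean hasPredicate, boolean allowFullScan)
    {
        if (hasPredicate && !predicateIsIndexed && !allowFullScan)
            throw new IllegalArgumentException(
                "Query requires a full scan; add ALLOW FULL SCAN to accept the performance cost");
    }
}
```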


 CQL should force limit when query samples data.
 ---

 Key: CASSANDRA-4915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4915
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.2.0 beta 1
Reporter: Edward Capriolo
Priority: Minor

 When issuing a query like:
 {noformat}
 CREATE TABLE videos (
   videoid uuid,
   videoname varchar,
   username varchar,
   description varchar,
   tags varchar,
   upload_date timestamp,
   PRIMARY KEY (videoid,videoname)
 );
 SELECT * FROM videos WHERE videoname = 'My funny cat';
 {noformat}
 Cassandra samples some data using get_range_slice and then applies the query.
 This is very confusing to me, because as an end user I am not sure if the 
 query is fast because Cassandra is performing an optimized query (over an 
 index, or using a slicePredicate) or if Cassandra is simply sampling some 
 random rows and returning me some results. 
 My suggestions:
 1) force people to supply a LIMIT clause on any query that is going to
 page over get_range_slice
 2) having some type of explain support so I can establish if this
 query will work in the
 I will champion suggestion 1) because CQL has put itself in a rather unique, 
 un-SQL-like position by applying an automatic limit clause without the user 
 asking for it. I also do not believe the CQL language should let the user 
 issue queries that will not work as intended with 

git commit: Allow static CF definition with COMPACT STORAGE

2012-11-06 Thread slebresne
Updated Branches:
  refs/heads/cassandra-1.1 988c10fd3 -> 77ee3109e


Allow static CF definition with COMPACT STORAGE

patch by slebresne; reviewed by jbellis for CASSANDRA-4910


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/77ee3109
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/77ee3109
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/77ee3109

Branch: refs/heads/cassandra-1.1
Commit: 77ee3109e547013c08007e546921ac50137923d9
Parents: 988c10f
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Tue Nov 6 11:14:46 2012 +0100
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Tue Nov 6 11:14:46 2012 +0100

--
 CHANGES.txt|1 +
 .../statements/CreateColumnFamilyStatement.java|5 +
 2 files changed, 2 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/77ee3109/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 5f5ea89..c033172 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -12,6 +12,7 @@
  * (CQL) fix CREATE COLUMNFAMILY permissions check (CASSANDRA-4864)
  * Fix DynamicCompositeType same type comparison (CASSANDRA-4711)
  * Fix duplicate SSTable reference when stream session failed (CASSANDRA-3306)
+ * Allow static CF definition with compact storage (CASSANDRA-4910)
 
 
 1.1.6

http://git-wip-us.apache.org/repos/asf/cassandra/blob/77ee3109/src/java/org/apache/cassandra/cql3/statements/CreateColumnFamilyStatement.java
--
diff --git 
a/src/java/org/apache/cassandra/cql3/statements/CreateColumnFamilyStatement.java
 
b/src/java/org/apache/cassandra/cql3/statements/CreateColumnFamilyStatement.java
index 286f265..3d77053 100644
--- 
a/src/java/org/apache/cassandra/cql3/statements/CreateColumnFamilyStatement.java
+++ 
b/src/java/org/apache/cassandra/cql3/statements/CreateColumnFamilyStatement.java
@@ -222,11 +222,8 @@ public class CreateColumnFamilyStatement extends 
SchemaAlteringStatement
 stmt.comparator = CFDefinition.definitionType;
 }
 
-        if (useCompactStorage)
+        if (useCompactStorage && !stmt.columnAliases.isEmpty())
         {
-            // There should at least have been one column alias
-            if (stmt.columnAliases.isEmpty())
-                throw new InvalidRequestException("COMPACT STORAGE 
requires at least one column part of the clustering key, none found");
             // There should be only one column definition remaining, 
which gives us the default validator.
             if (stmt.columns.isEmpty())
                 throw new InvalidRequestException("COMPACT STORAGE 
requires one definition not part of the PRIMARY KEY, none found");



[2/2] git commit: Allow static CF definition with COMPACT STORAGE

2012-11-06 Thread slebresne
Allow static CF definition with COMPACT STORAGE

patch by slebresne; reviewed by jbellis for CASSANDRA-4910


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/77ee3109
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/77ee3109
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/77ee3109

Branch: refs/heads/trunk
Commit: 77ee3109e547013c08007e546921ac50137923d9
Parents: 988c10f
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Tue Nov 6 11:14:46 2012 +0100
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Tue Nov 6 11:14:46 2012 +0100

--
 CHANGES.txt|1 +
 .../statements/CreateColumnFamilyStatement.java|5 +
 2 files changed, 2 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/77ee3109/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 5f5ea89..c033172 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -12,6 +12,7 @@
  * (CQL) fix CREATE COLUMNFAMILY permissions check (CASSANDRA-4864)
  * Fix DynamicCompositeType same type comparison (CASSANDRA-4711)
  * Fix duplicate SSTable reference when stream session failed (CASSANDRA-3306)
+ * Allow static CF definition with compact storage (CASSANDRA-4910)
 
 
 1.1.6

http://git-wip-us.apache.org/repos/asf/cassandra/blob/77ee3109/src/java/org/apache/cassandra/cql3/statements/CreateColumnFamilyStatement.java
--
diff --git 
a/src/java/org/apache/cassandra/cql3/statements/CreateColumnFamilyStatement.java
 
b/src/java/org/apache/cassandra/cql3/statements/CreateColumnFamilyStatement.java
index 286f265..3d77053 100644
--- 
a/src/java/org/apache/cassandra/cql3/statements/CreateColumnFamilyStatement.java
+++ 
b/src/java/org/apache/cassandra/cql3/statements/CreateColumnFamilyStatement.java
@@ -222,11 +222,8 @@ public class CreateColumnFamilyStatement extends 
SchemaAlteringStatement
 stmt.comparator = CFDefinition.definitionType;
 }
 
-        if (useCompactStorage)
+        if (useCompactStorage && !stmt.columnAliases.isEmpty())
         {
-            // There should at least have been one column alias
-            if (stmt.columnAliases.isEmpty())
-                throw new InvalidRequestException("COMPACT STORAGE 
requires at least one column part of the clustering key, none found");
             // There should be only one column definition remaining, 
which gives us the default validator.
             if (stmt.columns.isEmpty())
                 throw new InvalidRequestException("COMPACT STORAGE 
requires one definition not part of the PRIMARY KEY, none found");



[1/2] git commit: Merge branch 'cassandra-1.1' into trunk

2012-11-06 Thread slebresne
Updated Branches:
  refs/heads/trunk 5467fb52f -> 2821490b1


Merge branch 'cassandra-1.1' into trunk

Conflicts:

src/java/org/apache/cassandra/cql3/statements/CreateColumnFamilyStatement.java


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2821490b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2821490b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2821490b

Branch: refs/heads/trunk
Commit: 2821490b1011b92aff58ec0ec76647818df6
Parents: 5467fb5 77ee310
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Tue Nov 6 11:22:55 2012 +0100
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Tue Nov 6 11:22:55 2012 +0100

--
 CHANGES.txt |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/2821490b/CHANGES.txt
--
diff --cc CHANGES.txt
index b64237e,c033172..a84b3b0
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -67,77 -12,9 +67,76 @@@ Merged from 1.1
   * (CQL) fix CREATE COLUMNFAMILY permissions check (CASSANDRA-4864)
   * Fix DynamicCompositeType same type comparison (CASSANDRA-4711)
   * Fix duplicate SSTable reference when stream session failed (CASSANDRA-3306)
 - * Allow static CF definition with compact storage (CASSANDRA-4910)
  
  
- 
 +1.2-beta1
 + * add atomic_batch_mutate (CASSANDRA-4542, -4635)
 + * increase default max_hint_window_in_ms to 3h (CASSANDRA-4632)
 + * include message initiation time to replicas so they can more
 +   accurately drop timed-out requests (CASSANDRA-2858)
 + * fix clientutil.jar dependencies (CASSANDRA-4566)
 + * optimize WriteResponse (CASSANDRA-4548)
 + * new metrics (CASSANDRA-4009)
 + * redesign KEYS indexes to avoid read-before-write (CASSANDRA-2897)
 + * debug tracing (CASSANDRA-1123)
 + * parallelize row cache loading (CASSANDRA-4282)
 + * Make compaction, flush JBOD-aware (CASSANDRA-4292)
 + * run local range scans on the read stage (CASSANDRA-3687)
 + * clean up ioexceptions (CASSANDRA-2116)
 + * add disk_failure_policy (CASSANDRA-2118)
 + * Introduce new json format with row level deletion (CASSANDRA-4054)
 + * remove redundant name column from schema_keyspaces (CASSANDRA-4433)
 + * improve nodetool ring handling of multi-dc clusters (CASSANDRA-3047)
 + * update NTS calculateNaturalEndpoints to be O(N log N) (CASSANDRA-3881)
 + * add UseCondCardMark XX jvm settings on jdk 1.7 (CASSANDRA-4366)
 + * split up rpc timeout by operation type (CASSANDRA-2819)
 + * rewrite key cache save/load to use only sequential i/o (CASSANDRA-3762)
 + * update MS protocol with a version handshake + broadcast address id
 +   (CASSANDRA-4311)
 + * multithreaded hint replay (CASSANDRA-4189)
 + * add inter-node message compression (CASSANDRA-3127)
 + * remove COPP (CASSANDRA-2479)
 + * Track tombstone expiration and compact when tombstone content is
 +   higher than a configurable threshold, default 20% (CASSANDRA-3442, 4234)
 + * update MurmurHash to version 3 (CASSANDRA-2975)
 + * (CLI) track elapsed time for `delete' operation (CASSANDRA-4060)
 + * (CLI) jline version is bumped to 1.0 to properly  support
 +   'delete' key function (CASSANDRA-4132)
 + * Save IndexSummary into new SSTable 'Summary' component (CASSANDRA-2392, 
4289)
 + * Add support for range tombstones (CASSANDRA-3708)
 + * Improve MessagingService efficiency (CASSANDRA-3617)
 + * Avoid ID conflicts from concurrent schema changes (CASSANDRA-3794)
 + * Set thrift HSHA server thread limit to unlimited by default 
(CASSANDRA-4277)
 + * Avoids double serialization of CF id in RowMutation messages
 +   (CASSANDRA-4293)
 + * stream compressed sstables directly with java nio (CASSANDRA-4297)
 + * Support multiple ranges in SliceQueryFilter (CASSANDRA-3885)
 + * Add column metadata to system column families (CASSANDRA-4018)
 + * (cql3) Always use composite types by default (CASSANDRA-4329)
 + * (cql3) Add support for set, map and list (CASSANDRA-3647)
 + * Validate date type correctly (CASSANDRA-4441)
 + * (cql3) Allow definitions with only a PK (CASSANDRA-4361)
 + * (cql3) Add support for row key composites (CASSANDRA-4179)
 + * improve DynamicEndpointSnitch by using reservoir sampling (CASSANDRA-4038)
 + * (cql3) Add support for 2ndary indexes (CASSANDRA-3680)
 + * (cql3) fix defining more than one PK to be invalid (CASSANDRA-4477)
 + * remove schema agreement checking from all external APIs (Thrift, CQL and 
CQL3) (CASSANDRA-4487)
 + * add Murmur3Partitioner and make it default for new installations 
(CASSANDRA-3772, 4621)
 + * (cql3) update pseudo-map syntax to use map syntax (CASSANDRA-4497)
 + * Finer grained exceptions hierarchy and provides error code with exceptions 
(CASSANDRA-3979)
 + * Adds events push to binary protocol 

[jira] [Commented] (CASSANDRA-4482) In-memory merkle trees for repair

2012-11-06 Thread Stefan Fleiter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491369#comment-13491369
 ] 

Stefan Fleiter commented on CASSANDRA-4482:
---

Won't the in-memory trees still be useful, especially for doing repairs under 
heavy load?
With Apache Cassandra anti-entropy, finding the inconsistencies adds a big 
additional load on the servers, while with continuous anti-entropy only the 
amount of inconsistencies adds load.
A cheap continuous anti-entropy which repairs more than 99% of all 
inconsistencies automatically and can be active even while the cluster is under 
heavy load seems beneficial to me.
This is especially the case if QUORUM reads/writes cannot be used, or for a 
recovering cluster after an outage of several nodes for more than 
max_hint_window_in_ms.
Getting things 100% correct can, for some scenarios, wait longer than 
repairing most inconsistencies.
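For readers unfamiliar with why Merkle trees make inconsistency *detection* cheap: replicas exchange a small tree of hashes and only descend into subtrees whose hashes differ, so comparison cost scales with the number of differences rather than the data size. A minimal toy version (not Cassandra's MerkleTree class; names invented):

```java
import java.util.*;

public class MiniMerkle
{
    static int hashLeaf(String v) { return v.hashCode(); }
    static int combine(int l, int r) { return 31 * l + r; }

    // Build a complete binary tree of hashes over the leaves
    // (leaf count must be a power of two; internal node i has children 2i and 2i+1).
    static int[] build(String[] leaves)
    {
        int n = leaves.length;
        int[] tree = new int[2 * n];
        for (int i = 0; i < n; i++) tree[n + i] = hashLeaf(leaves[i]);
        for (int i = n - 1; i >= 1; i--) tree[i] = combine(tree[2 * i], tree[2 * i + 1]);
        return tree;
    }

    // Collect indices of differing leaves, skipping any subtree whose root hash matches.
    static List<Integer> diff(int[] a, int[] b, int node, int n, List<Integer> out)
    {
        if (a[node] == b[node]) return out;       // identical subtree: never visited
        if (node >= n) { out.add(node - n); return out; }
        diff(a, b, 2 * node, n, out);
        diff(a, b, 2 * node + 1, n, out);
        return out;
    }
}
```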


 In-memory merkle trees for repair
 -

 Key: CASSANDRA-4482
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4482
 Project: Cassandra
  Issue Type: New Feature
Reporter: Marcus Eriksson

 this sounds cool, we should reimplement it in the open source cassandra;
 http://www.acunu.com/2/post/2012/07/incremental-repair.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[cassandra-jdbc] push by wfs...@gmail.com - Support for Collections Step #1 on 2012-11-05 04:38 GMT

2012-11-06 Thread cassandra-jdbc . apache-extras . org

Revision: 87e732a6c911
Author:   Rick Shaw wfs...@gmail.com
Date: Sun Nov  4 19:51:06 2012
Log:  Support for Collections Step #1
http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/source/detail?r=87e732a6c911

Added:
 /src/main/java/org/apache/cassandra/cql/jdbc/ListMaker.java
 /src/main/java/org/apache/cassandra/cql/jdbc/MapMaker.java
 /src/main/java/org/apache/cassandra/cql/jdbc/Pair.java
 /src/main/java/org/apache/cassandra/cql/jdbc/SetMaker.java
 /src/test/java/org/apache/cassandra/cql/jdbc/CollectionsTest.java
Modified:
 /src/main/java/org/apache/cassandra/cql/jdbc/CassandraResultSet.java
 /src/main/java/org/apache/cassandra/cql/jdbc/CassandraResultSetExtras.java
 /src/main/java/org/apache/cassandra/cql/jdbc/ColumnDecoder.java
 /src/main/java/org/apache/cassandra/cql/jdbc/TypedColumn.java
 /src/main/java/org/apache/cassandra/cql/jdbc/Utils.java
 /src/test/java/org/apache/cassandra/cql/jdbc/DataSourceTest.java
 /src/test/java/org/apache/cassandra/cql/jdbc/SpashScreenTest.java
 /src/test/resources/log4j.properties

===
--- /dev/null
+++ /src/main/java/org/apache/cassandra/cql/jdbc/ListMaker.java	Sun Nov  4  
19:51:06 2012

@@ -0,0 +1,85 @@
+ /*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql.jdbc;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+public class ListMaker<T>
+{
+    // interning instances
+    private static final Map<AbstractJdbcType<?>, ListMaker> instances = new HashMap<AbstractJdbcType<?>, ListMaker>();
+
+    public final AbstractJdbcType<T> elements;
+
+    public static synchronized <T> ListMaker<T> getInstance(AbstractJdbcType<T> elements)
+    {
+        ListMaker<T> t = instances.get(elements);
+        if (t == null)
+        {
+            t = new ListMaker<T>(elements);
+            instances.put(elements, t);
+        }
+        return t;
+    }
+
+    private ListMaker(AbstractJdbcType<T> elements)
+    {
+        this.elements = elements;
+    }
+
+    public List<T> compose(ByteBuffer bytes)
+    {
+        ByteBuffer input = bytes.duplicate();
+        int n = input.getShort();
+        List<T> l = new ArrayList<T>(n);
+        for (int i = 0; i < n; i++)
+        {
+            int s = input.getShort();
+            byte[] data = new byte[s];
+            input.get(data);
+            ByteBuffer databb = ByteBuffer.wrap(data);
+            l.add(elements.compose(databb));
+        }
+        return l;
+    }
+
+    /**
+     * Layout is: {@code <n><s_1><b_1>...<s_n><b_n>}
+     * where:
+     *   n is the number of elements
+     *   s_i is the number of bytes composing the ith element
+     *   b_i is the s_i bytes composing the ith element
+     */
+    public ByteBuffer decompose(List<T> value)
+    {
+        List<ByteBuffer> bbs = new ArrayList<ByteBuffer>(value.size());
+        int size = 0;
+        for (T elt : value)
+        {
+            ByteBuffer bb = elements.decompose(elt);
+            bbs.add(bb);
+            size += 2 + bb.remaining();
+        }
+        return Utils.pack(bbs, value.size(), size);
+    }
+}
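The serialization layout ListMaker documents — a leading element count, then a length prefix per element — can be round-tripped in a standalone sketch (this is illustrative code, not part of cassandra-jdbc; it specializes the element type to UTF-8 strings):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ListCodec
{
    // Layout: <n><s_1><b_1>...<s_n><b_n>, with n and each s_i as 2-byte shorts.
    static ByteBuffer encode(List<String> value)
    {
        int size = 2; // leading element count
        List<byte[]> raw = new ArrayList<>();
        for (String s : value)
        {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            raw.add(b);
            size += 2 + b.length; // per-element length prefix + payload
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putShort((short) value.size());
        for (byte[] b : raw)
        {
            out.putShort((short) b.length);
            out.put(b);
        }
        out.flip();
        return out;
    }

    static List<String> decode(ByteBuffer bytes)
    {
        ByteBuffer input = bytes.duplicate(); // never consume the caller's buffer
        int n = input.getShort();
        List<String> l = new ArrayList<>(n);
        for (int i = 0; i < n; i++)
        {
            byte[] data = new byte[input.getShort()];
            input.get(data);
            l.add(new String(data, StandardCharsets.UTF_8));
        }
        return l;
    }
}
```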
===
--- /dev/null
+++ /src/main/java/org/apache/cassandra/cql/jdbc/MapMaker.java	Sun Nov  4  
19:51:06 2012

@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql.jdbc;
+
+import java.nio.ByteBuffer;
+import 

git commit: Fix row key population with old-style mapred interface. Patch by Ben Kempe, reviewed by brandonwilliams for CASSANDRA-4834

2012-11-06 Thread brandonwilliams
Updated Branches:
  refs/heads/cassandra-1.1 77ee3109e -> d909fb4fa


Fix row key population with old-style mapred interface.
Patch by Ben Kempe, reviewed by brandonwilliams for CASSANDRA-4834


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/d909fb4f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/d909fb4f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/d909fb4f

Branch: refs/heads/cassandra-1.1
Commit: d909fb4faf7de41b3cf19f19b48f962ab0e6fb32
Parents: 77ee310
Author: Brandon Williams brandonwilli...@apache.org
Authored: Tue Nov 6 06:23:15 2012 -0600
Committer: Brandon Williams brandonwilli...@apache.org
Committed: Tue Nov 6 06:23:15 2012 -0600

--
 .../cassandra/hadoop/ColumnFamilyRecordReader.java |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/d909fb4f/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java
--
diff --git a/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java 
b/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java
index c662932..83e436b 100644
--- a/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java
+++ b/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java
@@ -483,7 +483,7 @@ public class ColumnFamilyRecordReader extends 
RecordReaderByteBuffer, SortedMap
 return endOfData();
 
 PairByteBuffer, SortedMapByteBuffer, IColumn next = 
wideColumns.next();
-lastColumn = next.right.values().iterator().next().name();
+lastColumn = 
next.right.values().iterator().next().name().duplicate();
 
 maybeIncreaseRowCounter(next);
 return next;
@@ -556,7 +556,7 @@ public class ColumnFamilyRecordReader extends 
RecordReaderByteBuffer, SortedMap
 if (this.nextKeyValue())
 {
 key.clear();
-key.put(this.getCurrentKey());
+key.put(this.getCurrentKey().duplicate());
 key.flip();
 
 value.clear();



git commit: Update CQL3 documentation

2012-11-06 Thread slebresne
Updated Branches:
  refs/heads/trunk 2821490b1 -> 3b425b591


Update CQL3 documentation

patch by urandom; reviewed by slebresne for CASSANDRA-4879


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3b425b59
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3b425b59
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3b425b59

Branch: refs/heads/trunk
Commit: 3b425b5911c5d610537f8c18222d0c6f59e1635d
Parents: 2821490
Author: Sylvain Lebresne sylv...@datastax.com
Authored: Tue Nov 6 15:08:54 2012 +0100
Committer: Sylvain Lebresne sylv...@datastax.com
Committed: Tue Nov 6 15:08:54 2012 +0100

--
 doc/cql3/CQL.textile   |  307 ++-
 .../cassandra/cql3/operations/ListOperation.java   |2 +-
 2 files changed, 210 insertions(+), 99 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/3b425b59/doc/cql3/CQL.textile
--
diff --git a/doc/cql3/CQL.textile b/doc/cql3/CQL.textile
index 175bac0..76e0b33 100644
--- a/doc/cql3/CQL.textile
+++ b/doc/cql3/CQL.textile
@@ -2,8 +2,6 @@
 
 h1. Cassandra Query Language (CQL) v3.0.0
 
-p(banner). Please note that as of Cassandra 1.1, CQL v3.0.0 is considered beta 
and while the bulk of the language should be fixed by now, small breaking 
changes may be introduced until CQL v3.0.0 is final (in Cassandra 1.2).
-
 
  span id=tableOfContents
 
@@ -49,10 +47,11 @@ p. There is a second kind of identifiers called _quoted 
identifiers_ defined by
 
 h3(#constants). Constants
 
-CQL defines 3 kinds of _implicitly-typed constants_: strings, numbers and 
uuids:
+CQL defines 4 kinds of _implicitly-typed constants_: strings, numbers, uuids 
and booleans:
* A string constant is an arbitrary sequence of characters enclosed 
by single-quote(@'@). One can include a single-quote in a string by repeating 
it, e.g. @'It''s raining today'@. Those are not to be confused with quoted 
identifiers that use double-quotes.
 * Numeric constants are either integer constant defined by @-?[0-9]+@ or a 
float constant defined by @-?[0-9]+.[0-9]*@.
 * A UUID:http://en.wikipedia.org/wiki/Universally_unique_identifier constant 
is defined by @hex{8}-hex{4}-hex{4}-hex{4}-hex{12}@ where @hex@ is an 
hexadecimal character, e.g. @[0-9a-fA-F]@ and @{4}@ is the number of such 
characters.
+* A boolean constant is either @true@ or @false@ up to case-insensitivity 
(i.e. @True@ is a valid boolean constant).
 
 
 h3. Comments
@@ -70,7 +69,7 @@ CQL consists of statements. As in SQL, these statements can 
be divided in 3 cate
 * Data manipulation statements, that allow to change data
 * Queries, to look up data
 
-All statements end with a semicolon (@;@) but that semicolon can be omitted 
when dealing with a single statement. The supported statements are described in 
the following sections. When describing the grammar of said statement, we will 
reuse the non-terminal symbol defined below:
+All statements end with a semicolon (@;@) but that semicolon can be omitted 
when dealing with a single statement. The supported statements are described in 
the following sections. When describing the grammar of said statements, we will 
reuse the non-terminal symbols defined below:
 
 bc(syntax).. 
 identifier ::= any quoted or unquoted identifier, excluding reserved keywords
@@ -81,17 +80,32 @@ bc(syntax)..
  float ::= a float constant
 number ::= integer | float
   uuid ::= a uuid constant
-
-  term ::= identifier
-   | string
-   | number
-   | uuid
-   | '?'
-  int-term ::= identifier
-   | '?'
+   boolean ::= a boolean constant
+
+  final-term ::= string
+ | number
+ | uuid
+ | boolean
+term ::= final-term
+ | '?'
+int-term ::= integer
+ | '?'
+
+  collection-literal ::= map-literal
+ | set-literal
+ | list-literal
+ map-literal ::= '{' ( final-term ':' final-term ( ',' 
final-term ':' final-term )* )? '}'
+ set-literal ::= '{' ( final-term ( ',' final-term )* )? '}'
+list-literal ::= '[' ( final-term ( ',' final-term )* )? ']'
+
+  properties ::= property (AND property)*
+property ::= identifier '=' ( value | map-literal )
+   value ::= identifier | string | number | boolean
 p. 
The question mark (@?@) in the syntax above is a bind variable for "prepared 
statements":#preparedStatement.
 
The @properties@ production is used by statements that create and alter 
keyspaces and tables. Each @property@ is either a _simple_ one, in which case 
it just has a value, or a _map_ one, in which case its value is a map grouping 

[jira] [Updated] (CASSANDRA-4021) CFS.scrubDataDirectories tries to delete nonexistent orphans

2012-11-06 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-4021:


Attachment: node1.log

dtestbot managed to randomly reproduce this morning.  It looks like a race 
between compaction cleanup and forcible shutdown, then startup.  Log attached.

 CFS.scrubDataDirectories tries to delete nonexistent orphans
 

 Key: CASSANDRA-4021
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4021
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7 beta 2
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
  Labels: datastax_qa
 Attachments: 4021.txt, node1.log


 The check only looks for a missing data file, then deletes all other 
 components; however, it's possible for the data file and another component to 
 be missing, causing an error:
 {noformat}
  WARN 17:19:28,765 Removing orphans for 
 /var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-24492:
  [Index.db, Filter.db, Digest.sha1, Statistics.db, Data.db]
 ERROR 17:19:28,766 Exception encountered during startup
 java.lang.AssertionError: attempted to delete non-existing file 
 system-HintsColumnFamily-hd-24492-Index.db
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49)
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:357)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:352)
 at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:105)
 java.lang.AssertionError: attempted to delete non-existing file 
 system-HintsColumnFamily-hd-24492-Index.db
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49)
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:357)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:352)
 at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:105)
 Exception encountered during startup: attempted to delete non-existing file 
 system-HintsColumnFamily-hd-24492-Index.db
 {noformat}



[jira] [Updated] (CASSANDRA-4861) Consider separating tracing from log4j

2012-11-06 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-4861:


Attachment: 4861.txt

Attaching patch for this.

The gist of the patch is that instead of calling logger.debug(...) to trace 
stuff, Tracing.trace(...) is now called. Note that I've made it so that this 
latter method also logs the message at TRACE. We can change that, but I figured 
that anything that belongs to query tracing probably belongs to trace logging 
too, and that avoids duplicating calls (one to trace, one to log). There is also 
a Tracing.traceAndDebugLog for when you want to log at DEBUG (but as it happens, 
I think most traced things don't belong in the DEBUG log, because if we start 
logging per-query things at DEBUG it very quickly makes the debug log unusable).

As far as I can tell, for inserts and selects the things logged are pretty much 
the same, except that I took the liberty of adding tracing of CQL3 statement 
parsing/preparation/validation, but that's a detail. I've also made a few very 
minor updates to one or two messages while at it when I thought it improved 
them, but I'm happy to revert those bits if someone disagrees.

I do want to note that this does change what is traced if you, say, trace a 
CREATE TABLE statement. Currently, in trunk, the trace is unreadable, because we 
trace messages like:
{noformat}
Renaming 
/home/mcmanus/.ccm/test/node1/data/system/schema_columnfamilies/system-schema_columnfamilies-tmp-ia-2-CompressionInfo.db
 to 
/home/mcmanus/.ccm/test/node1/data/system/schema_columnfamilies/system-schema_columnfamilies-ia-2-CompressionInfo.d
Completed flushing 
/home/mcmanus/.ccm/test/node1/data/system/schema_columnfamilies/system-schema_columnfamilies-ia-2-Data.db
 (646 bytes) for commitlog position ReplayPosition(segmentId=1352213221524, 
position=53258)
Creating IntervalNode from [[DecoratedKey(2008276574632865675, 73797374656d), 
DecoratedKey(5501786289152180687, 
73797374656d5f747261636573)](SSTableReader(path='/home/mcmanus/.ccm/test/node1/data/system/schema_columnfamilies/system-schema_columnfamilies-ia-1-Data.db')),
 [DecoratedKey(-6017608668500074083, 74657374), 
DecoratedKey(-6017608668500074083, 
74657374)](SSTableReader(path='/home/mcmanus/.ccm/test/node1/data/system/schema_columnfamilies/system-schema_columnfamilies-ia-2-Data.db'))]
completed reading (0 ms; 0 keys) saved cache 
/home/mcmanus/.ccm/test/node1/saved_caches/test-foo-KeyCache-b.db
...
{noformat}
which are obviously not traced with this patch. This also restores the debug log 
to its former glory (without all those annoying 'Acquiring switchLock' and 
'Appending to commitlog' messages all over the place).


 Consider separating tracing from log4j
 --

 Key: CASSANDRA-4861
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4861
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.2.0 beta 1
Reporter: Sylvain Lebresne
 Fix For: 1.2.0

 Attachments: 4861.txt


 Currently (as far as I understand), tracing is implemented as a log4j 
 appender that intercepts all log messages and writes them to a system table. 
 I'm sorry not to have brought that up during the initial review (it's hard to 
 follow every ticket), but before we release this I'd like to have a serious 
 discussion on that choice, because I'm not convinced (at all) that it's a good 
 idea. Namely, I can see the following drawbacks:
 # The main one is that this *forces* every debug message to be traced and, 
 conversely, every traced message to be logged at debug. But I strongly think 
 that debug logging and query tracing are not the same thing. Don't get me 
 wrong, there is clearly a large intersection between those two things (which 
 is fine), but I do think that *identifying* them is a mistake. More 
 concretely:
  ** Consider some of the messages we log at debug in CFS:
{noformat}
logger.debug("memtable is already frozen; another thread must be flushing it");
logger.debug("forceFlush requested but everything is clean in {}", columnFamily);
logger.debug("Checking for sstables overlapping {}", sstables);
{noformat}
Those messages are useful for debugging and have a place in the log at 
 debug, but they are noise as far as query tracing is concerned (none have any 
 concrete impact on query performance; they just describe what the code has 
 done). Or take the following ones from CompactionManager:
{noformat}
logger.debug("Background compaction is still running for {}.{} ({} remaining). Skipping",
 new Object[] {cfs.table.name, cfs.columnFamily, count});
logger.debug("Scheduling a background task check for {}.{} with {}",
 new Object[] {cfs.table.name, cfs.columnFamily,
 cfs.getCompactionStrategy().getClass().getSimpleName()});

[jira] [Commented] (CASSANDRA-4915) CQL should force limit when query samples data.

2012-11-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491523#comment-13491523
 ] 

Edward Capriolo commented on CASSANDRA-4915:


What do you think about forcing the construct 'WHERE token(key) >= 0'? This is 
like the limit concept, but I believe it makes it clear that this is a range 
scanning query starting at some key.

 CQL should force limit when query samples data.
 ---

 Key: CASSANDRA-4915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4915
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.2.0 beta 1
Reporter: Edward Capriolo
Priority: Minor

 When issuing a query like:
 {noformat}
 CREATE TABLE videos (
   videoid uuid,
   videoname varchar,
   username varchar,
   description varchar,
   tags varchar,
   upload_date timestamp,
   PRIMARY KEY (videoid,videoname)
 );
 SELECT * FROM videos WHERE videoname = 'My funny cat';
 {noformat}
 Cassandra samples some data using get_range_slice and then applies the query.
 This is very confusing to me, because as an end user I am not sure if the query 
 is fast because Cassandra is performing an optimized query (over an index, or 
 using a slicePredicate) or if Cassandra is simply sampling some random rows 
 and returning me some results. 
 My suggestions:
 1) force people to supply a LIMIT clause on any query that is going to
 page over get_range_slice
 2) have some type of explain support so I can establish if this
 query will work in the
 I will champion suggestion 1) because CQL has put itself in a rather unique 
 un-SQL-like position by applying an automatic limit clause without the user 
 asking for it. I also do not believe the CQL language should let the user 
 issue queries that will not work as intended with larger-than-auto-limit 
 size data sets.
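 A sketch of suggestion 1), reusing the videos table above (the LIMIT value of 
 100 is an arbitrary illustration):
 {noformat}
 SELECT * FROM videos WHERE videoname = 'My funny cat' LIMIT 100;
 {noformat}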



[jira] [Commented] (CASSANDRA-4482) In-memory merkle trees for repair

2012-11-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491525#comment-13491525
 ] 

Jonathan Ellis commented on CASSANDRA-4482:
---

bq. A cheap continuous Anti Entropy which repairs more than 99% of all 
inconsistencies automatically and can be active even if the cluster is under 
heavy load seems beneficial to me.

You already get this with hinted handoff.

 In-memory merkle trees for repair
 -

 Key: CASSANDRA-4482
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4482
 Project: Cassandra
  Issue Type: New Feature
Reporter: Marcus Eriksson

 this sounds cool, we should reimplement it in the open source cassandra;
 http://www.acunu.com/2/post/2012/07/incremental-repair.html



[jira] [Updated] (CASSANDRA-4767) Need some indication of node repair success or failure

2012-11-06 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-4767:
--

  Component/s: Tools
Affects Version/s: (was: 1.1.4)
Fix Version/s: 1.2.0
   1.1.7
 Assignee: Yuki Morishita
   Labels: jmx  (was: )

Yes, JMX is the right place for this.

One possible API: a List of Map<String, String>:

{'Session': session id,
 'Initiator': node coordinating the repair,
 'Status': 'Pending'|'Validating'|'Repairing'|'Success'|'Failed',
 'Started': start timestamp,
 'Finished': finish timestamp}

If this is unintrusive enough we can try to get it into 1.1.x; otherwise, an 
early 1.2 release.

/cc [~j.casares]

 Need some indication of node repair success or failure
 --

 Key: CASSANDRA-4767
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4767
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Ahmed Bashir
Assignee: Yuki Morishita
  Labels: jmx
 Fix For: 1.1.7, 1.2.0


 We are currently verifying node repair status via basic log analysis.  In 
 order to automatically track the status of periodic node repair jobs, it 
 would be better to have an indicator (through JMX perhaps).



[jira] [Commented] (CASSANDRA-4915) CQL should force limit when query samples data.

2012-11-06 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491549#comment-13491549
 ] 

Sylvain Lebresne commented on CASSANDRA-4915:
-

bq. What do you think about forcing the construct 'WHERE token(key) >= 0'?

I honestly don't think it solves the problem. There is nothing in that construct 
telling you that we won't use an index to answer your query, and the query will 
almost surely time out if you have lots of rows but few matching the 
videoname = 'My funny cat' predicate. And in fact, when/if we support indexing 
on a clustering key component (videoname in that case), it will make complete 
sense to do an indexed query with a 'token(key) > 0' condition (meaning, we 
allow this for indexed queries today, and that doesn't imply the query is a full 
scan).
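
For illustration (hypothetical queries against the videos table from the issue 
description, and assuming a secondary index existed on username), the point is 
that the token condition alone does not distinguish these two cases:
{noformat}
-- full range scan over partitions
SELECT * FROM videos WHERE token(videoid) > 0;

-- indexed query that also restricts the token range
SELECT * FROM videos WHERE username = 'jdoe' AND token(videoid) > 0;
{noformat}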

 CQL should force limit when query samples data.
 ---

 Key: CASSANDRA-4915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4915
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.2.0 beta 1
Reporter: Edward Capriolo
Priority: Minor

 When issuing a query like:
 {noformat}
 CREATE TABLE videos (
   videoid uuid,
   videoname varchar,
   username varchar,
   description varchar,
   tags varchar,
   upload_date timestamp,
   PRIMARY KEY (videoid,videoname)
 );
 SELECT * FROM videos WHERE videoname = 'My funny cat';
 {noformat}
 Cassandra samples some data using get_range_slice and then applies the query.
 This is very confusing to me, because as an end user I am not sure if the query 
 is fast because Cassandra is performing an optimized query (over an index, or 
 using a slicePredicate) or if Cassandra is simply sampling some random rows 
 and returning me some results. 
 My suggestions:
 1) force people to supply a LIMIT clause on any query that is going to
 page over get_range_slice
 2) have some type of explain support so I can establish if this
 query will work in the
 I will champion suggestion 1) because CQL has put itself in a rather unique 
 un-SQL-like position by applying an automatic limit clause without the user 
 asking for it. I also do not believe the CQL language should let the user 
 issue queries that will not work as intended with larger-than-auto-limit 
 size data sets.



[jira] [Created] (CASSANDRA-4919) StorageProxy.getRangeSlice sometimes returns incorrect number of columns

2012-11-06 Thread JIRA
Piotr Kołaczkowski created CASSANDRA-4919:
-

 Summary: StorageProxy.getRangeSlice sometimes returns incorrect 
number of columns
 Key: CASSANDRA-4919
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4919
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.6
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski


When deployed on a single node, the number of returned columns is correct.
When deployed on a cluster, the total number of returned columns is slightly 
lower than desired. 



[jira] [Updated] (CASSANDRA-4919) StorageProxy.getRangeSlice sometimes returns incorrect number of columns

2012-11-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Kołaczkowski updated CASSANDRA-4919:
--

Attachment: 0001-Fix-getRangeSlice-paging-reset-predicate-after-fetch.patch

Attaching a patch fixing paged column iteration.

 StorageProxy.getRangeSlice sometimes returns incorrect number of columns
 

 Key: CASSANDRA-4919
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4919
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.6
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Attachments: 
 0001-Fix-getRangeSlice-paging-reset-predicate-after-fetch.patch


 When deployed on a single node, the number of returned columns is correct.
 When deployed on a cluster, the total number of returned columns is slightly 
 lower than desired. 



[jira] [Updated] (CASSANDRA-4829) Make consistency level configurable in cqlsh

2012-11-06 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-4829:
-

Priority: Minor  (was: Trivial)
Assignee: Aleksey Yeschenko

 Make consistency level configurable in cqlsh
 

 Key: CASSANDRA-4829
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4829
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.2.0 beta 1
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
Priority: Minor
  Labels: cqlsh

 CASSANDRA-4734 moved consistency level to the protocol, so cqlsh needs a way 
 to change consistency level from the default (ONE).



[jira] [Commented] (CASSANDRA-4919) StorageProxy.getRangeSlice sometimes returns incorrect number of columns

2012-11-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491568#comment-13491568
 ] 

Jonathan Ellis commented on CASSANDRA-4919:
---

Is there a dtest for this?

 StorageProxy.getRangeSlice sometimes returns incorrect number of columns
 

 Key: CASSANDRA-4919
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4919
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.6
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Attachments: 
 0001-Fix-getRangeSlice-paging-reset-predicate-after-fetch.patch


 When deployed on a single node, the number of returned columns is correct.
 When deployed on a cluster, the total number of returned columns is slightly 
 lower than desired. 



[jira] [Updated] (CASSANDRA-4803) CFRR wide row iterators improvements

2012-11-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Kołaczkowski updated CASSANDRA-4803:
--

Attachment: 0007-Fallback-to-describe_splits-in-case-describe_splits_.patch

Attaching a patch that also allows generating splits when talking to an older 
version of the Thrift server.

 CFRR wide row iterators improvements
 

 Key: CASSANDRA-4803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4803
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.0
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Fix For: 1.1.7, 1.2.0

 Attachments: 0001-Wide-row-iterator-counts-rows-not-columns.patch, 
 0002-Fixed-bugs-in-describe_splits.-CFRR-uses-row-counts-.patch, 
 0003-Fixed-get_paged_slice-memtable-and-sstable-column-it.patch, 
 0004-Better-token-range-wrap-around-handling-in-CFIF-CFRR.patch, 
 0005-Fixed-handling-of-start_key-end_token-in-get_range_s.patch, 
 0006-Code-cleanup-refactoring-in-CFRR.-Fixed-bug-with-mis.patch, 
 0007-Fallback-to-describe_splits-in-case-describe_splits_.patch


 {code}
 public float getProgress()
 {
     // TODO this is totally broken for wide rows
     // the progress is likely to be reported slightly off the actual but close enough
     float progress = ((float) iter.rowsRead() / totalRowCount);
     return progress > 1.0F ? 1.0F : progress;
 }
 {code}
 The problem is that iter.rowsRead() does not return the number of rows read 
 from the wide row iterator, but the number of *columns* (every row is counted 
 multiple times). 



[jira] [Created] (CASSANDRA-4920) Add Collation to abstract type to provide standard sort order for Strings

2012-11-06 Thread Sidharth (JIRA)
Sidharth created CASSANDRA-4920:
---

 Summary: Add Collation to abstract type to provide standard sort 
order for Strings
 Key: CASSANDRA-4920
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4920
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Affects Versions: 1.2.0 beta 1
Reporter: Sidharth


Adding a way to sort UTF8 based on a standard order (collation) is very useful. 
Say, for example, you have wide rows where you cannot use Cassandra's standard 
indexes (secondary/primary index). Let's say each column had a string value that 
was either alphanumeric or purely numeric.

Now let's say I want to index these values in a materialized view so I could 
look up things by range of values (a range makes sense as a standard ordering 
over my alphanumeric and numeric strings, i.e. 12 > 1).

More specifically, I add these values into a CompositeType and SliceRange over 
them for the index to work, and I don't really care whether it's an alpha or a 
numeric; it should be in an order that follows collation semantics as follows:
1) If the string is numeric then it should be comparable like a numeric.
2) If it's an alpha then it should be comparable like a normal string.
3) If it's alphanumeric then a contiguous sequence of numbers in the string 
should be compared as numbers, like c10 > c2.
4) UTF8 type strings are assumed everywhere.



[jira] [Updated] (CASSANDRA-4920) Add Collation to abstract type to provide standard sort order for Strings

2012-11-06 Thread Sidharth (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharth updated CASSANDRA-4920:


Priority: Minor  (was: Major)

 Add Collation to abstract type to provide standard sort order for Strings
 -

 Key: CASSANDRA-4920
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4920
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Affects Versions: 1.2.0 beta 1
Reporter: Sidharth
Priority: Minor
  Labels: cassandra

 Adding a way to sort UTF8 based on a standard order (collation) is very 
 useful. Say, for example, you have wide rows where you cannot use Cassandra's 
 standard indexes (secondary/primary index). Let's say each column had a string 
 value that was either alphanumeric or purely numeric.
 Now let's say I want to index these values in a materialized view so I could 
 look up things by range of values (a range makes sense as a standard ordering 
 over my alphanumeric and numeric strings, i.e. 12 > 1).
 More specifically, I add these values into a CompositeType and SliceRange over 
 them for the index to work, and I don't really care whether it's an alpha or a 
 numeric; it should be in an order that follows collation semantics as follows:
 1) If the string is numeric then it should be comparable like a numeric.
 2) If it's an alpha then it should be comparable like a normal string.
 3) If it's alphanumeric then a contiguous sequence of numbers in the string 
 should be compared as numbers, like c10 > c2.
 4) UTF8 type strings are assumed everywhere.



[jira] [Commented] (CASSANDRA-4767) Need some indication of node repair success or failure

2012-11-06 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491602#comment-13491602
 ] 

Nick Bailey commented on CASSANDRA-4767:


Our approach with these JMX-type operations in the past has been to let the JMX 
call block until it finishes. That does have some downsides, though, like 
checking progress and JMX timeouts.

If we do this, we should hopefully make it generic enough to hook all of our 
long-running JMX calls into.

 Need some indication of node repair success or failure
 --

 Key: CASSANDRA-4767
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4767
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Ahmed Bashir
Assignee: Yuki Morishita
  Labels: jmx
 Fix For: 1.1.7, 1.2.0


 We are currently verifying node repair status via basic log analysis.  In 
 order to automatically track the status of periodic node repair jobs, it 
 would be better to have an indicator (through JMX perhaps).



[jira] [Updated] (CASSANDRA-4920) Add Collation to abstract type to provide standard sort order for Strings

2012-11-06 Thread Sidharth (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharth updated CASSANDRA-4920:


Description: 
Adding a way to sort UTF8 based on the collation semantics described below can 
be useful.

Use case: Say, for example, you have wide rows where you cannot use Cassandra's 
standard indexes (secondary/primary index). Let's say each column had a string 
value that was either alphanumeric or purely numeric and you wanted an index by 
value. More specifically, you want to slice range over a bunch of column values 
and say get me all the IDs associated with values ABC to XYZ. As usual, I would 
index these values in a materialized view.

More specifically, I create an index CF and add these values into a 
CompositeType column and SliceRange over them for the indexing to work, and I 
don't really care whether it's an alpha or a numeric as long as it's ordered by 
the following collation semantics:
1) If the string is numeric then it should be comparable like a numeric.
2) If it's an alpha then it should be comparable like a normal string.
3) If it's alphanumeric then a contiguous sequence of numbers in the string 
should be compared as numbers, like c10 > c2.
4) UTF8 type strings are assumed everywhere.

How this helps:
1) You don't end up creating multiple CFs for different value types.
2) You don't have to write boilerplate to do complicated type detection 
manually in the application.

  was:
Adding a way to sort UTF8 based on a standard order (collation) is very useful. 
Say, for example, you have wide rows where you cannot use Cassandra's standard 
indexes (secondary/primary index). Let's say each column had a string value that 
was either alphanumeric or purely numeric.

Now let's say I want to index these values in a materialized view so I could 
look up things by range of values (a range makes sense as a standard ordering 
over my alphanumeric and numeric strings, i.e. 12 > 1).

More specifically, I add these values into a CompositeType and SliceRange over 
them for the index to work, and I don't really care whether it's an alpha or a 
numeric; it should be in an order that follows collation semantics as follows:
1) If the string is numeric then it should be comparable like a numeric.
2) If it's an alpha then it should be comparable like a normal string.
3) If it's alphanumeric then a contiguous sequence of numbers in the string 
should be compared as numbers, like c10 > c2.
4) UTF8 type strings are assumed everywhere.


 Add Collation to abstract type to provide standard sort order for Strings
 -

 Key: CASSANDRA-4920
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4920
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Affects Versions: 1.2.0 beta 1
Reporter: Sidharth
Priority: Minor
  Labels: cassandra

 Adding a way to sort UTF8 based on the collation semantics described below can 
 be useful.
 Use case: Say, for example, you have wide rows where you cannot use Cassandra's 
 standard indexes (secondary/primary index). Let's say each column had a string 
 value that was either alphanumeric or purely numeric and you wanted an index by 
 value. More specifically, you want to slice range over a bunch of column values 
 and say get me all the IDs associated with values ABC to XYZ. As usual, I would 
 index these values in a materialized view.
 More specifically, I create an index CF and add these values into a 
 CompositeType column and SliceRange over them for the indexing to work, and I 
 don't really care whether it's an alpha or a numeric as long as it's ordered by 
 the following collation semantics:
 1) If the string is numeric then it should be comparable like a numeric.
 2) If it's an alpha then it should be comparable like a normal string.
 3) If it's alphanumeric then a contiguous sequence of numbers in the string 
 should be compared as numbers, like c10 > c2.
 4) UTF8 type strings are assumed everywhere.
 How this helps:
 1) You don't end up creating multiple CFs for different value types.
 2) You don't have to write boilerplate to do complicated type detection 
 manually in the application.



[jira] [Updated] (CASSANDRA-4920) Add Collation semantics to abstract type to provide standard sort order for Strings

2012-11-06 Thread Sidharth (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharth updated CASSANDRA-4920:


Summary: Add Collation semantics to abstract type to provide standard sort 
order for Strings  (was: Add Collation to abstract type to provide standard 
sort order for Strings)

 Add Collation semantics to abstract type to provide standard sort order for 
 Strings
 ---

 Key: CASSANDRA-4920
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4920
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Affects Versions: 1.2.0 beta 1
Reporter: Sidharth
Priority: Minor
  Labels: cassandra

 Adding a way to sort UTF8 strings based on the collation semantics described 
 below can be useful. 
 Use case: say you have wide rows where you cannot use Cassandra's standard 
 indexes (secondary/primary). Suppose each column had a string value that was 
 either alphanumeric or purely numeric, and you wanted an index by value. 
 More specifically, you want to slice-range over a bunch of column values and 
 say get me all the IDs associated with values ABC to XYZ. As usual, I would 
 index these values in a materialized view: I create an index CF, add these 
 values into a CompositeType column, and SliceRange over them for the 
 indexing to work. I don't really care whether a value is alpha or numeric as 
 long as it is ordered by the following collation semantics:
 1) If the string is numeric, it should compare like a number.
 2) If it is alphabetic, it should compare like a normal string.
 3) If it is alphanumeric, a contiguous sequence of digits in the string 
 should be compared numerically, so that c10 > c2.
 4) UTF8-type strings are assumed everywhere.
 How this helps:
 1) You don't end up creating multiple CFs for different value types.
 2) You don't have to write boilerplate to do complicated type detection 
 manually in the application. 
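The collation described above is essentially a natural-sort ("alphanum") comparison. A minimal Python sketch of such a comparator key (not Cassandra code; the helper name is made up for illustration):

```python
import re

def collation_key(s):
    """Split a string into alternating text and digit runs; digit runs
    compare numerically, so 'c10' sorts after 'c2'. UTF-8 text assumed."""
    parts = re.split(r'(\d+)', s)
    # Tag each run so ints and strings never compare against each other.
    return [(1, int(p)) if p.isdigit() else (0, p) for p in parts if p != '']

values = ['c10', 'c2', 'abc', '100', '20']
print(sorted(values, key=collation_key))  # numeric runs ordered as numbers
```

Note the key handles all three cases with one rule: pure numerics become a single numeric run, pure alphas a single text run, and mixed strings alternate.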

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-4920) Add collation semantics to abstract type to provide standard sort order for Strings

2012-11-06 Thread Sidharth (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharth updated CASSANDRA-4920:


Summary: Add collation semantics to abstract type to provide standard sort 
order for Strings  (was: Add Collation semantics to abstract type to provide 
standard sort order for Strings)

 Add collation semantics to abstract type to provide standard sort order for 
 Strings
 ---

 Key: CASSANDRA-4920
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4920
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Affects Versions: 1.2.0 beta 1
Reporter: Sidharth
Priority: Minor
  Labels: cassandra

 Adding a way to sort UTF8 strings based on the collation semantics described 
 below can be useful. 
 Use case: say you have wide rows where you cannot use Cassandra's standard 
 indexes (secondary/primary). Suppose each column had a string value that was 
 either alphanumeric or purely numeric, and you wanted an index by value. 
 More specifically, you want to slice-range over a bunch of column values and 
 say get me all the IDs associated with values ABC to XYZ. As usual, I would 
 index these values in a materialized view: I create an index CF, add these 
 values into a CompositeType column, and SliceRange over them for the 
 indexing to work. I don't really care whether a value is alpha or numeric as 
 long as it is ordered by the following collation semantics:
 1) If the string is numeric, it should compare like a number.
 2) If it is alphabetic, it should compare like a normal string.
 3) If it is alphanumeric, a contiguous sequence of digits in the string 
 should be compared numerically, so that c10 > c2.
 4) UTF8-type strings are assumed everywhere.
 How this helps:
 1) You don't end up creating multiple CFs for different value types.
 2) You don't have to write boilerplate to do complicated type detection 
 manually in the application. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4919) StorageProxy.getRangeSlice sometimes returns incorrect number of columns

2012-11-06 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491614#comment-13491614
 ] 

Brandon Williams commented on CASSANDRA-4919:
-

There's a wide row test and a range slice test, but not a combination of the 
two.

 StorageProxy.getRangeSlice sometimes returns incorrect number of columns
 

 Key: CASSANDRA-4919
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4919
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.6
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
 Attachments: 
 0001-Fix-getRangeSlice-paging-reset-predicate-after-fetch.patch


 When deployed on a single node, number of columns is correct.
 When deployed on a cluster, total number of returned columns is slightly 
 lower than desired. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4767) Need some indication of node repair success or failure

2012-11-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491615#comment-13491615
 ] 

Jonathan Ellis commented on CASSANDRA-4767:
---

None of the other ones have the concept of a unique session, so that's 
broadening the scope significantly.

 Need some indication of node repair success or failure
 --

 Key: CASSANDRA-4767
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4767
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Ahmed Bashir
Assignee: Yuki Morishita
  Labels: jmx
 Fix For: 1.1.7, 1.2.0


 We are currently verifying node repair status via basic log analysis.  In 
 order to automatically track the status of periodic node repair jobs, it 
 would be better to have an indicator (through JMX perhaps).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-4245) Provide a locale/collation-aware text comparator

2012-11-06 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-4245:
--

Summary: Provide a locale/collation-aware text comparator  (was: Provide a 
UT8Type (case insensitive) comparator)

 Provide a locale/collation-aware text comparator
 

 Key: CASSANDRA-4245
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4245
 Project: Cassandra
  Issue Type: New Feature
Reporter: Ertio Lew
Assignee: amorton
Priority: Minor

 It is a common use case to use a bunch of entity names as column names  then 
 use the row as a search index, using search by range. For such use cases  
 others, it is useful to have a UTF8 comparator that provides case insensitive 
 ordering of columns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (CASSANDRA-4920) Add collation semantics to abstract type to provide standard sort order for Strings

2012-11-06 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-4920.
---

Resolution: Duplicate

see CASSANDRA-4245

 Add collation semantics to abstract type to provide standard sort order for 
 Strings
 ---

 Key: CASSANDRA-4920
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4920
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Affects Versions: 1.2.0 beta 1
Reporter: Sidharth
Priority: Minor
  Labels: cassandra

 Adding a way to sort UTF8 strings based on the collation semantics described 
 below can be useful. 
 Use case: say you have wide rows where you cannot use Cassandra's standard 
 indexes (secondary/primary). Suppose each column had a string value that was 
 either alphanumeric or purely numeric, and you wanted an index by value. 
 More specifically, you want to slice-range over a bunch of column values and 
 say get me all the IDs associated with values ABC to XYZ. As usual, I would 
 index these values in a materialized view: I create an index CF, add these 
 values into a CompositeType column, and SliceRange over them for the 
 indexing to work. I don't really care whether a value is alpha or numeric as 
 long as it is ordered by the following collation semantics:
 1) If the string is numeric, it should compare like a number.
 2) If it is alphabetic, it should compare like a normal string.
 3) If it is alphanumeric, a contiguous sequence of digits in the string 
 should be compared numerically, so that c10 > c2.
 4) UTF8-type strings are assumed everywhere.
 How this helps:
 1) You don't end up creating multiple CFs for different value types.
 2) You don't have to write boilerplate to do complicated type detection 
 manually in the application. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-4921) improve cqlsh COPY FROM performance

2012-11-06 Thread Jonathan Ellis (JIRA)
Jonathan Ellis created CASSANDRA-4921:
-

 Summary: improve cqlsh COPY FROM performance
 Key: CASSANDRA-4921
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4921
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 1.1.2
Reporter: Jonathan Ellis
Assignee: Aleksey Yeschenko
Priority: Minor
 Fix For: 1.1.7, 1.2.0


Profiling shows that prepare_inline takes the vast majority of cqlsh COPY FROM 
time, particularly on csv rows with many columns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4921) improve cqlsh COPY FROM performance

2012-11-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491658#comment-13491658
 ] 

Jonathan Ellis commented on CASSANDRA-4921:
---

It looks like we can work around the regexp performance problems simply by 
omitting them if params is empty (which it is since cqlsh does its own param 
substitution).

To do this cleanly we should update cqlsh in 1.1 to be compatible with 
python-cql master.

However, since 1.1 is unlikely to need further python-cql enhancements, 
creating a 1.1 branch with the params fix is also fine.
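The workaround described above can be sketched roughly as follows; `prepare_inline` and `PARAM_RE` here are hypothetical stand-ins, not the actual python-cql implementation:

```python
import re

# Hypothetical stand-in for python-cql's parameter substitution; the point
# is the early return, which skips the regexp pass entirely.
PARAM_RE = re.compile(r':(\w+)')

def prepare_inline(query, params):
    # Workaround: when there are no params to substitute (cqlsh does its
    # own substitution), don't pay for the regexp pass at all.
    if not params:
        return query
    return PARAM_RE.sub(lambda m: repr(params[m.group(1)]), query)

print(prepare_inline("SELECT * FROM t WHERE k = :k", {"k": 42}))
print(prepare_inline("SELECT * FROM t WHERE k = 42", {}))
```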

 improve cqlsh COPY FROM performance
 ---

 Key: CASSANDRA-4921
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4921
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 1.1.2
Reporter: Jonathan Ellis
Assignee: Aleksey Yeschenko
Priority: Minor
 Fix For: 1.1.7, 1.2.0


 Profiling shows that prepare_inline takes the vast majority of cqlsh COPY 
 FROM time, particularly on csv rows with many columns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

2012-11-06 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491678#comment-13491678
 ] 

Michael Kjellman commented on CASSANDRA-4813:
-

Also, while we no longer throw the EOFException further down the stack as far 
as I can tell from my testing, the net result is an IOException being thrown 
in the Reducer.

 Problem using BulkOutputFormat while streaming several SSTables 
 simultaneously from a given node.
 -

 Key: CASSANDRA-4813
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.0
 Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop 
 nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. 
 The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 
 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using 
 Hadoop 0.20.2.
Reporter: Ralph Romanos
Assignee: Yuki Morishita
Priority: Minor
  Labels: Bulkoutputformat, Hadoop, SSTables
 Fix For: 1.2.0

 Attachments: 4813.txt


 The issue occurs when streaming SSTables simultaneously from the same node to 
 a Cassandra cluster using sstableloader. It seems to me that Cassandra cannot 
 handle receiving SSTables simultaneously from the same node. However, when it 
 receives simultaneously SSTables from two different nodes, everything works 
 fine. As a consequence, when using BulkOutputFormat to generate SSTables and 
 stream them to a cassandra cluster, I cannot use more than one reducer per 
 node otherwise I get a java.io.EOFException in the tasktracker's logs and a 
 java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-4874) Possible authorization handling improvements

2012-11-06 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-4874:
-

Fix Version/s: 1.2.0 rc1

 Possible authorization handling improvements
 --

 Key: CASSANDRA-4874
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4874
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.1.6, 1.2.0 beta 1
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
  Labels: security
 Fix For: 1.2.0 rc1


 I'll create another issue with my suggestions about fixing/improving 
 IAuthority interfaces. This one lists possible improvements that aren't 
 related to grant/revoke methods.
 Inconsistencies:
 - CREATE COLUMNFAMILY: P.CREATE on the KS in CQL2 vs. P.CREATE on the CF in 
 CQL3 and Thrift
 - BATCH: P.UPDATE or P.DELETE on CF in CQL2 vs. P.UPDATE in CQL3 and Thrift 
 (despite remove* in Thrift asking for P.DELETE)
 - DELETE: P.DELETE in CQL2 and Thrift vs. P.UPDATE in CQL3
 - DROP INDEX: no checks in CQL2 vs. P.ALTER on the CF in CQL3
 Other issues/suggestions
 - CQL2 DROP INDEX should require authorization
 - current permission checks are inconsistent, since they are performed 
 separately by the CQL2 query processor, the Thrift CassandraServer, and the 
 CQL3 statement classes.
 We should move them to one place: SomeClassWithABetterName.authorize(Operation, 
 KS, CF, User), where Operation would be an enum 
 (ALTER_KEYSPACE, ALTER_TABLE, CREATE_TABLE, CREATE, USE, UPDATE, etc.) and CF 
 would be nullable.
 - we don't respect the hierarchy when checking for permissions, or, to be 
 more specific, we are doing it wrong. Take CQL3 INSERT as an example:
 we require P.UPDATE on the CF, or FULL_ACCESS on either the KS or the CF. 
 However, having P.UPDATE on the KS won't allow you to perform the statement; 
 only FULL_ACCESS will.
 I doubt this was intentional, and if it was, I say it's wrong. P.UPDATE on 
 the KS should allow you to do updates on the KS's CFs.
 Examples in 
 http://www.datastax.com/dev/blog/dynamic-permission-allocation-in-cassandra-1-1
  point to it being a bug, since REVOKE UPDATE ON ks FROM omega is there.
 - currently we lack a way to set permission on cassandra/keyspaces resource. 
 I think we should be able to do it. See the following point on why.
 - currently to create a keyspace you must have a P.CREATE permission on that 
 keyspace THAT DOESN'T EVEN EXIST YET. So only a superuser can create a 
 keyspace,
 or a superuser must first grant you a permission to create it. Which doesn't 
 look right to me. P.CREATE on cassandra/keyspaces should allow you to create 
 new
 keyspaces without an explicit permission for each of them.
 - same goes for CREATE TABLE: you need P.CREATE on that not-yet-existing CF 
 or FULL_ACCESS on the whole KS. P.CREATE on the KS won't do. This is wrong.
 - since permissions don't map directly to statements, we should describe 
 clearly in the documentation what permissions are required by each CQL 
 statement/Thrift method.
 Full list of current permission requirements: https://gist.github.com/3978182
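The centralized-check suggestion above could look roughly like this sketch, with hypothetical names (`Operation`, `authorize`, the grants map) and a hierarchy-respecting lookup, per the P.UPDATE-on-the-KS point:

```python
from enum import Enum

class Operation(Enum):
    # Hypothetical enum along the lines suggested in the ticket.
    UPDATE = "UPDATE"
    SELECT = "SELECT"
    CREATE_TABLE = "CREATE_TABLE"

def authorize(grants, op, ks, cf=None):
    """Single choke point for permission checks. Respects the resource
    hierarchy: a grant on the keyspace covers every CF beneath it."""
    resources = [("keyspaces",), ("keyspaces", ks)]
    if cf is not None:
        resources.append(("keyspaces", ks, cf))
    return any(op in grants.get(r, set()) for r in resources)

grants = {("keyspaces", "ks1"): {Operation.UPDATE}}
print(authorize(grants, Operation.UPDATE, "ks1", "users"))  # True: KS grant covers the CF
print(authorize(grants, Operation.SELECT, "ks1", "users"))  # False
```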

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-4875) Possible improvements to IAuthority[2] interface

2012-11-06 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-4875:
-

Fix Version/s: 1.2.0 rc1

 Possible improvements to IAuthority[2] interface
 

 Key: CASSANDRA-4875
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4875
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.1.6, 1.2.0 beta 1
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
  Labels: security
 Fix For: 1.2.0 rc1


 CASSANDRA-4874 is about general improvements to authorization handling, this 
 one is about IAuthority[2] in particular.
 - 'LIST GRANTS OF user' should become 'LIST PERMISSIONS [ON resource] [OF 
 user]'.
 Currently there is no way to see all the permissions on the resource, only 
 all the permissions of a particular user.
 - IAuthority2.listPermissions() should return a generic collection of 
 ResourcePermission or something similar, not CQLResult or ResultMessage.
 That's the wrong level of abstraction. I know this issue has been raised here: 
 https://issues.apache.org/jira/browse/CASSANDRA-4490?focusedCommentId=13449732&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13449732,
  but I think it's possible to change this. Returning a list of {resource, 
  user, permission, grant_option} tuples should be possible.
 - We should get rid of Permission.NO_ACCESS. An empty list of permissions 
 should mean the absence of any permission, not some magical 
 Permission.NO_ACCESS value.
 It's insecure, error-prone, and ambiguous (what if a user has both 
 FULL_ACCESS and NO_ACCESS permissions?). If it's meant to be a way to strip 
 a user of all permissions on the resource, then it should be replaced with 
 some form of REVOKE statement. Something like 'REVOKE ALL PERMISSIONS' 
 sounds more logical to me than GRANT NO_ACCESS.
 - The previous point will probably require adding a revokeAllPermissions() 
 method to make it explicit; special-casing IAuthority2.revoke() won't do
 - IAuthority2.grant() and IAuthority2.revoke() accept a CFName instance for 
 the resource, which has its ks and cf fields swapped if cf is omitted. This 
 may cause a real security issue if an IAuthority2 implementer doesn't know 
 about the problem. We should pass the resource as a collection of strings 
 ([cassandra, keyspaces[, ks_name][, cf_name]]) instead, the way we pass it 
 to IAuthority.authorize().
 - We should probably get rid of FULL_ACCESS as well, at least as a valid 
 permission value (but maybe allow it in the CQL statement) and add an 
 equivalent IAuthority2.grantAllPermissions(), separately. Why? Imagine the 
 following sequence: GRANT FULL_ACCESS ON resource FOR user; REVOKE SELECT ON 
 resource FROM user; should the user be allowed to SELECT anymore?
 I say no, he shouldn't. Full access should be represented by a list of all 
 permissions, not by a magical special value.
 - P.DELETE probably should go in favour of P.UPDATE even for TRUNCATE. 
 Presence of P.DELETE will definitely confuse users, who might think that it 
 is somehow required to delete data, when it isn't. You can overwrite every 
 value if you have P.UPDATE with TTL=1 and get the same result. We should also 
 drop P.INSERT. Leave P.UPDATE (or rename it to P.MODIFY). P.MODIFY_DATA + 
 P.READ_DATA should replace P.UPDATE, P.SELECT and P.DELETE.
 - I suggest a new syntax to allow setting permissions on the 
 cassandra/keyspaces resource: GRANT permission ON * FOR user.
 The interface has to change because of the CFName argument to grant() and 
 revoke(), and since it's going to be broken anyway (and has been introduced 
 recently), I think we are in a position to make some other improvements while 
 at it.
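The "full access as an explicit permission set" point above can be sketched like this (hypothetical helper names, not the IAuthority2 API):

```python
# All known permissions; "full access" is simply the complete set, so a
# later REVOKE of one permission behaves predictably instead of leaving a
# magical FULL_ACCESS marker in place.
ALL_PERMISSIONS = {"CREATE", "ALTER", "DROP", "SELECT", "UPDATE"}

perms = {}  # (user, resource) -> explicit set of granted permissions

def grant_all(user, resource):
    perms.setdefault((user, resource), set()).update(ALL_PERMISSIONS)

def revoke(user, resource, permission):
    perms.get((user, resource), set()).discard(permission)

grant_all("alice", "ks1")
revoke("alice", "ks1", "SELECT")
print("SELECT" in perms[("alice", "ks1")])  # False: the revoke sticks
```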

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4921) improve cqlsh COPY FROM performance

2012-11-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491719#comment-13491719
 ] 

Jonathan Ellis commented on CASSANDRA-4921:
---

Further testing shows that the version of python-cql in trunk does not have 
this regexp performance problem.  But we still want to fix this for 1.1.7 as 
well.

 improve cqlsh COPY FROM performance
 ---

 Key: CASSANDRA-4921
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4921
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Affects Versions: 1.1.2
Reporter: Jonathan Ellis
Assignee: Aleksey Yeschenko
Priority: Minor
 Fix For: 1.1.7, 1.2.0


 Profiling shows that prepare_inline takes the vast majority of cqlsh COPY 
 FROM time, particularly on csv rows with many columns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4915) CQL should force limit when query samples data.

2012-11-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491730#comment-13491730
 ] 

Jonathan Ellis commented on CASSANDRA-4915:
---

Short of real native paging (CASSANDRA-4415), I don't think this is really 
preventable.  {{ALLOW FULL SCAN}} would only give you a false sense of 
security; consider {{SELECT * FROM users WHERE first_name='Ben' AND 
last_name='Higgenbotham'}}.  If first_name is indexed but not last_name, and 
you have millions of Bens and a handful of Higgenbothams, you have the same 
problem even though our simplistic heuristic of "is it indexed?" would 
consider it safe.
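A toy sketch of why the indexed-column heuristic gives a false sense of security: the indexed clause narrows to every 'Ben' row, and the unindexed clause is then applied as a per-row filter. Illustrative only, not Cassandra's actual read path:

```python
from collections import defaultdict

# Toy model: a secondary index on first_name, followed by row-level
# filtering on the unindexed last_name predicate.
rows = [("Ben", "Smith")] * 1_000_000 + [("Ben", "Higgenbotham")]
index = defaultdict(list)
for i, (first, _) in enumerate(rows):
    index[first].append(i)

scanned = 0
matches = []
for i in index["Ben"]:                    # indexed clause: cheap to enumerate
    scanned += 1
    if rows[i][1] == "Higgenbotham":      # unindexed clause: per-row filter
        matches.append(rows[i])

print(scanned, len(matches))  # scans ~1M rows to produce a single match
```

The query "looks" indexed, yet the work done is proportional to the number of Bens, not the number of Higgenbothams.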

 CQL should force limit when query samples data.
 ---

 Key: CASSANDRA-4915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4915
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.2.0 beta 1
Reporter: Edward Capriolo
Priority: Minor

 When issuing a query like:
 {noformat}
 CREATE TABLE videos (
   videoid uuid,
   videoname varchar,
   username varchar,
   description varchar,
   tags varchar,
   upload_date timestamp,
   PRIMARY KEY (videoid,videoname)
 );
 SELECT * FROM videos WHERE videoname = 'My funny cat';
 {noformat}
 Cassandra samples some data using get_range_slice and then applies the query.
 This is very confusing to me, because as an end user I am not sure if the 
 query is fast because Cassandra is performing an optimized query (over an 
 index, or using a SlicePredicate) or if Cassandra is simply sampling some 
 random rows and returning me some results. 
 My suggestions:
 1) force people to supply a LIMIT clause on any query that is going to
 page over get_range_slice
 2) having some type of explain support so I can establish if this
 query will work in the
 I will champion suggestion 1) because CQL has put itself in a rather unique, 
 un-SQL-like position by applying an automatic limit clause without the user 
 asking for one. I also do not believe the CQL language should let the user 
 issue queries that will not work as intended with larger-than-auto-limit 
 size data sets.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4915) CQL should force limit when query samples data.

2012-11-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491734#comment-13491734
 ] 

Jonathan Ellis commented on CASSANDRA-4915:
---

Note that while implicit {{LIMIT}} does not prevent expensive queries like 
this, it does keep you from OOMing the server!  So it is useful in that respect.

I don't see why we'd need a CQL4 when we don't need that anymore, though.  We 
respect user-specified {{LIMIT}} already, and relying on the limit being X vs 
10X or 0.1X is silly.  But we could codify that as Cassandra may, but is not 
required to, impose a limit if none is specified.

 CQL should force limit when query samples data.
 ---

 Key: CASSANDRA-4915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4915
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.2.0 beta 1
Reporter: Edward Capriolo
Priority: Minor

 When issuing a query like:
 {noformat}
 CREATE TABLE videos (
   videoid uuid,
   videoname varchar,
   username varchar,
   description varchar,
   tags varchar,
   upload_date timestamp,
   PRIMARY KEY (videoid,videoname)
 );
 SELECT * FROM videos WHERE videoname = 'My funny cat';
 {noformat}
 Cassandra samples some data using get_range_slice and then applies the query.
 This is very confusing to me, because as an end user I am not sure if the 
 query is fast because Cassandra is performing an optimized query (over an 
 index, or using a SlicePredicate) or if Cassandra is simply sampling some 
 random rows and returning me some results. 
 My suggestions:
 1) force people to supply a LIMIT clause on any query that is going to
 page over get_range_slice
 2) having some type of explain support so I can establish if this
 query will work in the
 I will champion suggestion 1) because CQL has put itself in a rather unique, 
 un-SQL-like position by applying an automatic limit clause without the user 
 asking for one. I also do not believe the CQL language should let the user 
 issue queries that will not work as intended with larger-than-auto-limit 
 size data sets.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-4922) TTransportException thrown many times

2012-11-06 Thread Michael Kjellman (JIRA)
Michael Kjellman created CASSANDRA-4922:
---

 Summary: TTransportException thrown many times 
 Key: CASSANDRA-4922
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4922
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.2.0 beta 1
 Environment: setup log4j to log DEBUG warnings, hadoop 1.0.3, 3 node 
dev cluster rf=3
Reporter: Michael Kjellman


I'm seeing a ton of these stack traces when either streaming into Cassandra 
with BOF or reading with ColumnFamilyInputFormat.

DEBUG [Thrift:16] 2012-11-06 11:18:27,933 CustomTThreadPoolServer.java (line 
210) Thrift transport error occurred during processing of message.
org.apache.thrift.transport.TTransportException
at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at 
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:200)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-4922) TTransportException thrown many times while Hadoop streams in or out from cluster

2012-11-06 Thread Michael Kjellman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Kjellman updated CASSANDRA-4922:


Summary: TTransportException thrown many times while Hadoop streams in or 
out from cluster  (was: TTransportException thrown many times )

 TTransportException thrown many times while Hadoop streams in or out from 
 cluster
 -

 Key: CASSANDRA-4922
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4922
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.2.0 beta 1
 Environment: setup log4j to log DEBUG warnings, hadoop 1.0.3, 3 node 
 dev cluster rf=3
Reporter: Michael Kjellman

 I'm seeing a ton of these stack traces when either streaming into Cassandra 
 with BOF or reading with ColumnFamilyInputFormat.
 DEBUG [Thrift:16] 2012-11-06 11:18:27,933 CustomTThreadPoolServer.java (line 
 210) Thrift transport error occurred during processing of message.
 org.apache.thrift.transport.TTransportException
   at 
 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
   at 
 org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
   at 
 org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
   at 
 org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
   at 
 org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
   at 
 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
   at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:200)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:722)
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-4923) cqlsh COPY FROM command requires primary key in first column of CSV

2012-11-06 Thread J.B. Langston (JIRA)
J.B. Langston created CASSANDRA-4923:


 Summary: cqlsh COPY FROM command requires primary key in first 
column of CSV
 Key: CASSANDRA-4923
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4923
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Affects Versions: 1.2.0 beta 1, 1.1.5
Reporter: J.B. Langston


The cqlsh COPY FROM command requires the primary key to be in the first column 
of the CSV, even if the field list shows that the primary key is in a different 
position.

CREATE TABLE cbp01us ( 
   naics text PRIMARY KEY 
   ) WITH 
   comment='' AND 
   comparator=text AND 
   read_repair_chance=0.10 AND 
   gc_grace_seconds=864000 AND 
   default_validation=text AND 
   min_compaction_threshold=4 AND 
   max_compaction_threshold=32 AND 
   replicate_on_write='true' AND 
   compaction_strategy_class='SizeTieredCompactionStrategy' AND 
   compression_parameters:sstable_compression='SnappyCompressor';

copy cbp01us 
(uscode,naics,empflag,emp,qp1,ap,est,f1_4,e1_4,q1_4,a1_4,n1_4,f5_9,e5_9,q5_9,a5_9,n5_9,f10_19,e10_19,q10_19,a10_19,n10_19,f20_49,e20_49,q20_49,a20_49,n20_49,f50_99,e50_99,q50_99,a50_99,n50_99,f100_249,e100_249,q100_249,a100_249,n100_249,f250_499,e250_499,q250_499,a250_499,n250_499,f500_999,e500_999,q500_999,a500_999,n500_999,f1000,e1000,q1000,a1000,n1000)
 from 'cbp01us.txt' with header=true;
Bad Request: Expected key 'NAICS' to be present in WHERE clause for 'cbp01us'
Aborting import at record #0 (line 1). Previously-inserted values still present.
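Until this is fixed, one possible client-side workaround (a sketch only, not part of cqlsh; the column names come from the report above) is to rewrite the CSV so the primary key column leads:

```python
# Hypothetical client-side workaround (a sketch, not part of cqlsh):
# rewrite the CSV so the primary key column ("naics" in the report
# above) comes first, matching what COPY FROM currently expects.
import csv, io

def move_key_first(rows, key):
    """Reorder every row so the `key` column leads."""
    header = rows[0]
    i = header.index(key)  # current position of the primary key column
    return [[r[i]] + r[:i] + r[i + 1:] for r in rows]

# A tiny stand-in for cbp01us.txt with the key in the second column:
src = io.StringIO("uscode,naics,empflag\nUS,1133,A\n")
rows = move_key_first(list(csv.reader(src)), "naics")
out = io.StringIO()
csv.writer(out).writerows(rows)  # rows now lead with naics
print(out.getvalue())
```

The COPY FROM column list would then also need to be reordered to match the rewritten file.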



[jira] [Updated] (CASSANDRA-4923) cqlsh COPY FROM command requires primary key in first column of CSV

2012-11-06 Thread J.B. Langston (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.B. Langston updated CASSANDRA-4923:
-

Description: 
The cqlsh COPY FROM command requires the primary key to be in the first column 
of the CSV, even if the field list shows that the primary key is in a different 
position.

Test data available from 
ftp://ftp.census.gov/Econ2001_And_Earlier/CBP_CSV/cbp01us.txt

CREATE TABLE cbp01us ( 
   naics text PRIMARY KEY 
   ) WITH 
   comment='' AND 
   comparator=text AND 
   read_repair_chance=0.10 AND 
   gc_grace_seconds=864000 AND 
   default_validation=text AND 
   min_compaction_threshold=4 AND 
   max_compaction_threshold=32 AND 
   replicate_on_write='true' AND 
   compaction_strategy_class='SizeTieredCompactionStrategy' AND 
   compression_parameters:sstable_compression='SnappyCompressor';

copy cbp01us 
(uscode,naics,empflag,emp,qp1,ap,est,f1_4,e1_4,q1_4,a1_4,n1_4,f5_9,e5_9,q5_9,a5_9,n5_9,f10_19,e10_19,q10_19,a10_19,n10_19,f20_49,e20_49,q20_49,a20_49,n20_49,f50_99,e50_99,q50_99,a50_99,n50_99,f100_249,e100_249,q100_249,a100_249,n100_249,f250_499,e250_499,q250_499,a250_499,n250_499,f500_999,e500_999,q500_999,a500_999,n500_999,f1000,e1000,q1000,a1000,n1000)
 from 'cbp01us.txt' with header=true;
Bad Request: Expected key 'NAICS' to be present in WHERE clause for 'cbp01us'
Aborting import at record #0 (line 1). Previously-inserted values still present.





[jira] [Assigned] (CASSANDRA-4923) cqlsh COPY FROM command requires primary key in first column of CSV

2012-11-06 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko reassigned CASSANDRA-4923:


Assignee: Aleksey Yeschenko




[jira] [Commented] (CASSANDRA-4767) Need some indication of node repair success or failure

2012-11-06 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491795#comment-13491795
 ] 

Nick Bailey commented on CASSANDRA-4767:


Darn.

 Need some indication of node repair success or failure
 --

 Key: CASSANDRA-4767
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4767
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Ahmed Bashir
Assignee: Yuki Morishita
  Labels: jmx
 Fix For: 1.1.7, 1.2.0


 We are currently verifying node repair status via basic log analysis.  In 
 order to automatically track the status of periodic node repair jobs, it 
 would be better to have an indicator (through JMX perhaps).



[jira] [Created] (CASSANDRA-4924) Make CQL 3 data accessible via thrift.

2012-11-06 Thread amorton (JIRA)
amorton created CASSANDRA-4924:
--

 Summary: Make CQL 3 data accessible via thrift.
 Key: CASSANDRA-4924
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4924
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.2.0
Reporter: amorton


Following the changes from CASSANDRA-4377, data created using CQL 3 is not 
visible via the thrift interface. 

This goes against the spirit of many comments by the project that the thrift 
API is not going away. Statements such as "Internally, both CQL3 and thrift 
use the same storage engine, so all future improvements to this engine will 
impact both of them equally" 
(http://www.datastax.com/dev/blog/thrift-to-cql3) and the CQL3 and thrift 
examples given at http://www.datastax.com/dev/blog/cql3-for-cassandra-experts 
gave the impression CQL 3 was a layer on top of the core storage engine. It now 
appears to be an incompatible format change. 

It makes it impossible to explain to existing users how CQL 3 stores its 
data. 

It also creates an all-or-nothing approach to trying CQL 3. 

My request is to make all data written by CQL 3 readable via the thrift API. 

An example of using the current 1.2 trunk is below:

{noformat}
cqlsh:cass_college> CREATE TABLE UserTweets 
               ... (
               ...     tweet_id    bigint,
               ...     user_name   text,
               ...     body        text,
               ...     timestamp   timestamp,
               ...     PRIMARY KEY (user_name, tweet_id)
               ... );
cqlsh:cass_college> INSERT INTO 
               ... UserTweets
               ... (tweet_id, body, user_name, timestamp)
               ... VALUES
               ... (1, 'The Tweet', 'fred', 1352150816917);
cqlsh:cass_college> 
cqlsh:cass_college> 
cqlsh:cass_college> select * from UserTweets;

 user_name | tweet_id | body  | timestamp
---+--+---+--
  fred |1 | The Tweet | 2012-11-06 10:26:56+1300
{noformat}

and in the CLI

{noformat}
[default@cass_college] show schema;
create keyspace cass_college
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 3}
  and durable_writes = true;

use cass_college;



[default@cass_college] list UserTweets;
UserTweets not found in current keyspace.
[default@cass_college] 
{noformat}
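To make the "how CQL 3 stores its data" question concrete, here is a sketch of how the row above lands in the storage engine, under the CompositeType cell-name encoding as commonly described (two-byte big-endian length, component bytes, then a 0x00 end-of-component byte). This is an illustration only, not an official API; the authoritative layout is the CompositeType implementation:

```python
# A sketch of how the CQL3 row above lands in the storage engine: the
# 'body' cell of the row is named by a CompositeType of the clustering
# key (tweet_id) plus the CQL column name. Per component: a 2-byte
# big-endian length, the component bytes, then a 0x00 end-of-component
# byte. Illustration only; CompositeType is the authoritative layout.
import struct

def composite(*components):
    out = b""
    for c in components:
        out += struct.pack(">H", len(c)) + c + b"\x00"
    return out

tweet_id = struct.pack(">q", 1)           # bigint clustering component
cell_name = composite(tweet_id, b"body")  # (tweet_id, column name)
print(cell_name.hex())
```

That encoding is what a thrift client would have to produce by hand to read or write these cells, which is exactly the manual work the ticket is about.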






[jira] [Commented] (CASSANDRA-4915) CQL should force limit when query samples data.

2012-11-06 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491820#comment-13491820
 ] 

Sylvain Lebresne commented on CASSANDRA-4915:
-

bq. and relying on the limit being X vs 10X or 0.1X is silly

While I agree on the silliness, I don't fully share your optimism that people 
won't start relying on it. I also would prefer being able to clearly specify 
that 'without limit we return as many results as there are' (with the technical 
limitation that it's Integer.MAX_VALUE), rather than having to settle for 
'without limit we return results with a limit that depends on the weather, the 
exact value of which you shouldn't rely on'. I also think that having an 
arbitrary default limit is a very bad OOM protection (it's still fairly easy 
to OOM even with the 10,000 limit unless you are mindful of your query). 
But I'd rather discuss that in CASSANDRA-4918 for the sake of not mixing 
unrelated issues.

Because I do think there is an issue here that has nothing whatsoever to do 
with the limit and preventing OOMing. That issue is that we allow some queries 
that do not scale with the number of records in the database. And to be clear, 
'not scale with the number of records in the database' means that even for a 
*constant* query output the execution time doesn't scale. Those queries are:
# the one in the description of this ticket
# as Jonathan said (and I don't disagree with his statement), secondary index 
queries with additional restrictions.

Now I agree that we can't completely protect people against those short of 
refusing the queries. But I do think we have some discrepancies in what we 
support and don't support: we refuse 'SELECT * FROM t WHERE partition_key = .. 
AND clustering_key_part2 = ...' on the grounds that because 
clustering_key_part1 is not provided, we would have to do a full scan of the 
internal row, and the inefficiency of that would be too surprising for the user. 
But we do allow the query in the description of this ticket even though 
honestly it's the same kind of query (i.e., a query for which we don't have 
*any* index to start with).

And I don't like discrepancies. Or in other words, we've claimed that an 
advantage of Cassandra is that query performance is predictable, but queries 
that for the same output (even a very small one) have an execution time 
proportional to the number of records in the database are imho the exact 
definition of query performance being unpredictable (or at least 
non-scalable). So I think it would be of interest to clarify what exactly 
we guarantee in terms of query performance being predictable. And for that 
I see a number of options:
# We leave things as they are, but then the rules for when a query will have 
predictable performance (which for me means that the performance will depend 
almost only on the query output) are fairly opaque and not very coherent. 
In particular, it feels random to refuse queries that would 
require a full internal row scan when we happily allow ones that require an 
entire ring scan.
# We get strict about allowing only queries that we can guarantee have 
predictable performance (with the definition above, which I think is reasonable). 
That does mean refusing the query in the description, but also queries 
on secondary indexes that have more than one restriction, which probably makes 
that solution too restrictive to be desirable.
# We try to hit some middle ground: while we can't guarantee predictability 
for every query we allow, we at least make the rules for when predictability 
is guaranteed easy to understand and follow. My proposition for 'ALLOW FULL 
SCAN' above was an attempt at that. If we allow that, and unless I forget 
something (which is possible), I think we can say that a query will have 
predictable performance unless it either uses a secondary index or uses 
'allow full scan'. And for secondary indexes we can refine that a bit and say 
'it will still have guaranteed predictable performance if you only use one 
restriction in the query'. But at least we'd have a clear guarantee without 
secondary indexes, and I do think that 1) it's very useful and 2) it's not 
crazy to say that secondary indexes involve more complex processing and thus 
offer fewer guarantees in terms of predictability.

In favor of my third point, I want to mention that this is exactly the 
guarantee that thrift provides today, because a non-secondary-index query in 
thrift always gives you predictable performance in the sense that the query 
performance will be proportional to the query output (which you can control 
with the limit): a get_range_slice in thrift (without IndexExpression) with 
a count of 1 will only ever scan one row (and if that one row doesn't have 
anything for the filter, the result will be an empty row), but that is *not* 
how CQL3 
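The 'constant output, unbounded work' argument above can be sketched with a toy model (illustrative only, not Cassandra code): with a rare predicate and LIMIT 1, a filtering scan examines ever more rows as the table grows, even though the result set never changes size.

```python
# A toy model (illustrative only, not Cassandra code) of the
# predictability argument: with a rare predicate and LIMIT 1, a
# filtering scan's cost grows with the table size even though the
# query output stays constant.
def filtered_scan(rows, predicate, limit):
    """Scan rows in order; return (matches, rows_examined)."""
    matches, examined = [], 0
    for r in rows:
        examined += 1
        if predicate(r):
            matches.append(r)
            if len(matches) == limit:
                break
    return matches, examined

small = list(range(1_000))
large = list(range(100_000))
m1, ex1 = filtered_scan(small, lambda r: r == small[-1], 1)
m2, ex2 = filtered_scan(large, lambda r: r == large[-1], 1)
print(ex1, ex2)  # 1000 vs 100000: same output size, 100x the work
```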

[jira] [Commented] (CASSANDRA-4924) Make CQL 3 data accessible via thrift.

2012-11-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491839#comment-13491839
 ] 

Jonathan Ellis commented on CASSANDRA-4924:
---

bq. Following the changes from CASSANDRA-4377 data created using CQL 3 is not 
visible via the thrift interface. 

More accurately, the table definitions are hidden, because you really shouldn't 
be trying to manually perform CQL3-style encoding.  But if you really insist, 
batch_mutate, get_range_slice, and friends will still work as advertised, and 
even validate the bytes you give them.

This seems like a reasonable compromise to me: tools designed for Thrift will 
not try to scribble over cql3 tables inadvertently because they aren't aware of 
the difference, but if you know your schema and want to do limited manual 
encoding that is available to you.



[jira] [Commented] (CASSANDRA-4924) Make CQL 3 data accessible via thrift.

2012-11-06 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491844#comment-13491844
 ] 

Nate McCall commented on CASSANDRA-4924:


I can't really know my schema programmatically unless describe_keyspace works 
as advertised (advertised != buried in a readme or similar). 

If we are looking for usability, intentionally adding an impedance mismatch 
between CQL3 and thrift is not helping. 



[jira] [Commented] (CASSANDRA-4924) Make CQL 3 data accessible via thrift.

2012-11-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491855#comment-13491855
 ] 

Jonathan Ellis commented on CASSANDRA-4924:
---

You tell me, then.  Suppose we add new fields to CfDef for the extra metadata 
CQL3 uses, and return them in describe_keyspace.  How do you keep a tool 
written for C* 1.0, one that doesn't know these fields exist, from 
misinterpreting it based on the fields that it *does* know about?
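The round-trip hazard Jonathan describes can be sketched like this (hypothetical field names, not the real CfDef schema): an old client that rebuilds a CfDef from only the fields it knows silently drops any new metadata when it writes the definition back.

```python
# A sketch of the round-trip hazard described above, with hypothetical
# field names (not the real CfDef schema): an old client rebuilds a
# CfDef from only the fields it knows, silently dropping new metadata.
OLD_CLIENT_FIELDS = {"name", "comparator_type", "comment"}

def old_client_roundtrip(cf_def):
    """Copy a CfDef keeping only fields a C* 1.0-era client knows."""
    return {k: v for k, v in cf_def.items() if k in OLD_CLIENT_FIELDS}

server_def = {
    "name": "UserTweets",
    "comparator_type": "CompositeType(LongType,UTF8Type)",
    "comment": "",
    "cql3_clustering_keys": ["tweet_id"],  # hypothetical new field
}
print(old_client_roundtrip(server_def))  # the new metadata is gone
```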



[jira] [Commented] (CASSANDRA-4924) Make CQL 3 data accessible via thrift.

2012-11-06 Thread amorton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491868#comment-13491868
 ] 

amorton commented on CASSANDRA-4924:


bq. More accurately, the table definitions are hidden, because you really 
shouldn't be trying to manually perform CQL3-style encoding. But if you really 
insist, batch_mutate, get_range_slice, and friends will still work as 
advertised, and even validate the bytes you give them.

Are you saying this is a restriction in the cli only? 

bq. This seems like a reasonable compromise to me: tools designed for Thrift 
will not try to scribble over cql3 tables inadvertently because they aren't 
aware of the difference, but if you know your schema and want to do limited 
manual encoding that is available to you.

Statements like "This CQL definition will store data in exactly the same way 
as the thrift definition above" 
(http://www.datastax.com/dev/blog/thrift-to-cql3) created the impression the 
two APIs were using the storage engine in an equal but different way.

Hiding table definitions from one API because it does things the other will not 
understand goes against those statements and that impression.

I'm trying to understand *what* CQL 3 is, how to explain it, and how to help 
people migrate to it. It looks like CQL 3 is a table-oriented, schema-driven 
API that is incompatible with the previous Thrift/RPC way of using Cassandra. 

That's a big change to come in with an API upgrade. 

IMHO making the CQL 3 data readable and/or writable via thrift would make it 
easier at a human level. If there is a real danger of thrift API clients 
essentially corrupting CQL 3 data, could a config setting be added to allow 
read and/or write? 





[jira] [Commented] (CASSANDRA-4924) Make CQL 3 data accessible via thrift.

2012-11-06 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13491876#comment-13491876
 ] 

Nate McCall commented on CASSANDRA-4924:


bq. How do you keep a tool written for C* 1.0, that doesn't know these fields 
exist, from misinterpreting it based on the fields that it does know about?

Like this is the first change to the CfDef struct? We've been managing that on 
the client for the past 2.5 years with well-typed exceptions (most times :) 
when you do something dumb. All versions of Cassandra and Hector from 0.8 
through 1.0 and 1.1 work together as much as possible. 'Interop works fine - 
just don't do anything with programmatic keyspace/cf stuff' is a frequent 
reply on our mailing list for such cases. Users are fine with that. 

 Make CQL 3 data accessible via thrift.
 --

 Key: CASSANDRA-4924
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4924
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.2.0
Reporter: amorton

 Following the changes from CASSANDRA-4377, data created using CQL 3 is not 
 visible via the thrift interface. 
 This goes against the spirit of many comments by the project that the thrift 
 API is not going away. These statements, and ones such as "Internally, both 
 CQL3 and thrift use the same storage engine, so all future improvements to 
 this engine will impact both of them equally." 
 (http://www.datastax.com/dev/blog/thrift-to-cql3), and the CQL3 and thrift 
 examples given here 
 http://www.datastax.com/dev/blog/cql3-for-cassandra-experts gave the 
 impression CQL 3 was a layer on top of the core storage engine. It now 
 appears to be an incompatible format change. 
 It makes it impossible to explain to existing users how CQL 3 stores 
 its data. 
 It also creates an all-or-nothing approach to trying CQL 3. 
 My request is to make all data written by CQL 3 readable via the thrift API. 
 An example of using the current 1.2 trunk is below:
 {noformat}
 cqlsh:cass_college> CREATE TABLE UserTweets 
 ... (
 ...     tweet_id    bigint,
 ...     user_name   text,
 ...     body        text,
 ...     timestamp   timestamp,
 ...     PRIMARY KEY (user_name, tweet_id)
 ... );
 cqlsh:cass_college> INSERT INTO 
 ... UserTweets
 ... (tweet_id, body, user_name, timestamp)
 ... VALUES
 ... (1, 'The Tweet', 'fred', 1352150816917);
 cqlsh:cass_college> 
 cqlsh:cass_college> 
 cqlsh:cass_college> select * from UserTweets;
  user_name | tweet_id | body      | timestamp
 -----------+----------+-----------+--------------------------
       fred |        1 | The Tweet | 2012-11-06 10:26:56+1300
 {noformat}
 and in the CLI
 {noformat}
 [default@cass_college] show schema;
 create keyspace cass_college
   with placement_strategy = 'SimpleStrategy'
   and strategy_options = {replication_factor : 3}
   and durable_writes = true;
 use cass_college;
 [default@cass_college] list UserTweets;
 UserTweets not found in current keyspace.
 [default@cass_college] 
 {noformat}
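For readers unfamiliar with the storage layer, a rough sketch (my own illustration, not from the ticket) of why the thrift view looks foreign: a CQL3 row like the one above is laid out as composite storage-engine columns. `composite_name` below is a hypothetical helper mimicking CompositeType encoding:

```python
import struct

def composite_name(*components):
    # CompositeType encoding: each component is a 2-byte big-endian length,
    # the raw bytes, then a 0x00 end-of-component byte.
    out = b""
    for c in components:
        raw = c if isinstance(c, bytes) else str(c).encode()
        out += struct.pack(">H", len(raw)) + raw + b"\x00"
    return out

# The CQL3 row (user_name='fred', tweet_id=1) stores the 'body' cell under a
# column named from the clustering value plus the CQL column name, so a
# thrift client sees composite names rather than the plain columns it expects.
name = composite_name(struct.pack(">q", 1), "body")
print(name.hex())
```

This is why "list UserTweets" in the CLI has nothing recognizable to show: the column names are packed composites, not the CQL column names.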



[jira] [Updated] (CASSANDRA-4875) Possible improvements to IAuthority[2] interface

2012-11-06 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-4875:
-

Attachment: 0001-Rip-out-IAuthority2-from-1.1.txt

 Possible improvements to IAuthority[2] interface
 

 Key: CASSANDRA-4875
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4875
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.1.6, 1.2.0 beta 1
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
  Labels: security
 Fix For: 1.2.0 rc1

 Attachments: 0001-Rip-out-IAuthority2-from-1.1.txt


 CASSANDRA-4874 is about general improvements to authorization handling, this 
 one is about IAuthority[2] in particular.
 - 'LIST GRANTS OF user' should become 'LIST PERMISSIONS [ON resource] [OF 
 user]'.
 Currently there is no way to see all the permissions on a resource, only 
 all the permissions of a particular user.
 - IAuthority2.listPermissions() should return a generic collection of 
 ResourcePermission or something, not CQLResult or ResultMessage.
 That's the wrong level of abstraction. I know this issue has been raised here - 
 https://issues.apache.org/jira/browse/CASSANDRA-4490?focusedCommentId=13449732&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13449732,
  but I think it's possible to change this. Returning a list of {resource, 
 user, permission, grant_option} tuples should be possible.
 - We should get rid of Permission.NO_ACCESS. An empty list of permissions 
 should mean absence of any permission, not some magical Permission.NO_ACCESS 
 value.
 It's insecure, error-prone and also ambiguous (what if a user has both 
 FULL_ACCESS and NO_ACCESS permissions?). If it's meant to be a way to strip a 
 user of all permissions on the resource, then it should be replaced with some 
 form of REVOKE statement. Something like 'REVOKE ALL PERMISSIONS' sounds more 
 logical than GRANT NO_ACCESS to me.
 - The previous point will probably require adding a revokeAllPermissions() 
 method to make it explicit; special-casing IAuthority2.revoke() won't do.
 - IAuthority2.grant() and IAuthority2.revoke() accept a CFName instance for a 
 resource, which has its ks and cf fields swapped if cf is omitted. This may 
 cause a real security issue if the IAuthority2 implementer doesn't know about 
 the issue. We should pass the resource as a collection of strings ([cassandra, 
 keyspaces[, ks_name][, cf_name]]) instead, the way we pass it to 
 IAuthority.authorize().
 - We should probably get rid of FULL_ACCESS as well, at least as a valid 
 permission value (but maybe allow it in the CQL statement) and add an 
 equivalent IAuthority2.grantAllPermissions(), separately. Why? Imagine the 
 following sequence: GRANT FULL_ACCESS ON resource FOR user; REVOKE SELECT ON 
 resource FROM user; should the user be allowed to SELECT anymore?
 I say no, he shouldn't. Full access should be represented by a list of all 
 permissions, not by a magical special value.
 - P.DELETE probably should go in favour of P.UPDATE even for TRUNCATE. 
 Presence of P.DELETE will definitely confuse users, who might think that it 
 is somehow required to delete data, when it isn't. You can overwrite every 
 value if you have P.UPDATE with TTL=1 and get the same result. We should also 
 drop P.INSERT. Leave P.UPDATE (or rename it to P.MODIFY). P.MODIFY_DATA + 
 P.READ_DATA should replace P.UPDATE, P.SELECT and P.DELETE.
 - I suggest new syntax to allow setting permissions on cassandra/keyspaces 
 resource: GRANT permission ON * FOR user.
 The interface has to change because of the CFName argument to grant() and 
 revoke(), and since it's going to be broken anyway (and has been introduced 
 recently), I think we are in a position to make some other improvements while 
 at it.
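The FULL_ACCESS point can be illustrated with a toy model (my sketch, not part of the proposal; the permission names are hypothetical stand-ins for Cassandra's Permission enum). Representing full access as an explicit set of concrete permissions makes the GRANT-then-REVOKE sequence behave as expected:

```python
# Hypothetical permission names; full access is just the set of all of them,
# not a magic FULL_ACCESS value that a later REVOKE cannot subtract from.
ALL_PERMISSIONS = {"CREATE", "ALTER", "DROP", "SELECT", "UPDATE", "DELETE"}

granted = set(ALL_PERMISSIONS)   # GRANT FULL_ACCESS == grant every concrete permission
granted.discard("SELECT")        # REVOKE SELECT removes exactly one of them

print("SELECT" in granted)       # prints False: the revoke actually takes effect
```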



[jira] [Created] (CASSANDRA-4925) Failure Detector should log or ignore sudden time change to the past

2012-11-06 Thread sankalp kohli (JIRA)
sankalp kohli created CASSANDRA-4925:


 Summary: Failure Detector should log or ignore sudden time change 
to the past
 Key: CASSANDRA-4925
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4925
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: sankalp kohli
Priority: Minor


If a machine goes back in time all of a sudden because of a problem, Gossip 
will insert a negative interArrivalTime. 
This will decrease the mean value and can cause this machine to mark other 
nodes as down and then mark them up as time passes. 
We should log such occurrences.
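A minimal sketch of the suggested guard (my illustration; the names are hypothetical, not Cassandra's FailureDetector API): log and drop negative intervals instead of folding them into the mean:

```python
def add_interval(intervals, arrival_ms, last_arrival_ms, log=print):
    # If the clock jumped backwards the interval is negative; logging and
    # skipping it keeps the mean inter-arrival time from being dragged down.
    interval = arrival_ms - last_arrival_ms
    if interval < 0:
        log(f"clock moved backwards by {-interval} ms; ignoring interval")
        return intervals
    return intervals + [interval]

history = add_interval([], arrival_ms=1000, last_arrival_ms=1500)   # negative: skipped
history = add_interval(history, arrival_ms=2000, last_arrival_ms=1000)
print(history)  # prints [1000]
```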



[jira] [Commented] (CASSANDRA-4924) Make CQL 3 data accessible via thrift.

2012-11-06 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491893#comment-13491893
 ] 

Jason Brown commented on CASSANDRA-4924:


My two cents: the initial impact of trying out the new feature (cql3), and not 
being able to compare it with something I already know (cassandra-cli), is a 
little confusing. I think Sylvain is onto something when, in the other ticket, 
he mentions this may be a documentation issue, but where would a new cql3 user 
(experienced with the old-school way) go to figure out what happened to their 
shiny new cql3 data? 

While it's a little more work to keep supporting thrift, it might be worth it to 
ensure a smoother on-ramp for cql3. Perhaps Aaron's idea of a config option would 
help lower the barrier to exit from thrift for users.

Personally, I'd just want the cli, and clients like astyanax/hector, to be able 
to describe_keyspace to show me the cql3 table exists, and reads would be nice, 
too, so I could confirm the cql3 data. Anything else is a freebie when trying 
to on-ramp to cql3. 

 Make CQL 3 data accessible via thrift.
 --



[jira] [Commented] (CASSANDRA-4924) Make CQL 3 data accessible via thrift.

2012-11-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491917#comment-13491917
 ] 

Edward Capriolo commented on CASSANDRA-4924:


We really can not live without it. If you need to insert into this CF from 
thrift/hector, having the CLI output show you the 'old school' way is the most 
helpful.

 Make CQL 3 data accessible via thrift.
 --



[jira] [Commented] (CASSANDRA-4924) Make CQL 3 data accessible via thrift.

2012-11-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491930#comment-13491930
 ] 

Edward Capriolo commented on CASSANDRA-4924:


I would be equally happy if cqlsh can provide me the information on how to 
format hector and thrift queries to throw data into this column family.


 Make CQL 3 data accessible via thrift.
 --



[jira] [Created] (CASSANDRA-4926) CQL3: IN relation does not work for last part of composite key

2012-11-06 Thread Roland Mechler (JIRA)
Roland Mechler created CASSANDRA-4926:
-

 Summary: CQL3: IN relation does not work for last part of 
composite key
 Key: CASSANDRA-4926
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4926
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.1.6
Reporter: Roland Mechler


Everything below was done using cqlsh -3, against single node Cassandra 1.1.6.

I created a test table with the following definition:

CREATE TABLE testtable (
  a text,
  b text,
  c text,
  data text,
  PRIMARY KEY (a, b, c)
) WITH
  comment='' AND
  caching='KEYS_ONLY' AND
  read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='SnappyCompressor';

and inserted some data:

INSERT INTO testtable (a, b, c, data) VALUES ('1', '1', '1', 'data 1');
INSERT INTO testtable (a, b, c, data) VALUES ('1', '1', '2', 'data 2');
INSERT INTO testtable (a, b, c, data) VALUES ('1', '1', '3', 'data 3');

so a full table query produced:

cqlsh:testkeyspace> select * from testtable;
 a | b | c | data
---+---+---+--------
 1 | 1 | 1 | data 1
 1 | 1 | 2 | data 2
 1 | 1 | 3 | data 3

Next I tried the following query with an IN relation on the second part of the 
key:

cqlsh:testkeyspace> select * from testtable where a = '1' and b in ('1', '2');
Bad Request: PRIMARY KEY part b cannot be restricted by IN relation (only the 
first and last parts can)

which is apparently not supported, but the error message suggests an IN 
relation in the last part is supported, so I tried:

cqlsh:testkeyspace> select * from testtable where a = '1' and b = '1' and c in 
('1', '2');
cqlsh:testkeyspace> 

which did not produce an error, but also returned no results. I was expecting 2 
rows. This seems to be a bug?

Note that I get the expected result if the IN relation contains only 1 value:

cqlsh:testkeyspace> select * from testtable where a = '1' and b = '1' and c in 
('1');
 a | b | c | data
---+---+---+--------
 1 | 1 | 1 | data 1

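Until this is fixed, the same rows can be assembled client-side; a sketch of the workaround (mine, not from the report) that emulates the IN relation with one equality query per value:

```python
# Toy data matching the testtable rows above.
rows = [("1", "1", "1", "data 1"), ("1", "1", "2", "data 2"), ("1", "1", "3", "data 3")]

def query_eq(a, b, c):
    # Stand-in for one "WHERE a = ... AND b = ... AND c = ..." query.
    return [r for r in rows if r[:3] == (a, b, c)]

# Emulate: WHERE a = '1' AND b = '1' AND c IN ('1', '2')
result = [r for c in ("1", "2") for r in query_eq("1", "1", c)]
print(len(result))  # prints 2: the rows the IN query should have returned
```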



git commit: reapply fix for CASSANDRA-4599

2012-11-06 Thread yukim
Updated Branches:
  refs/heads/trunk 929b26ecf - 0507dbe1c


reapply fix for CASSANDRA-4599


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0507dbe1
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0507dbe1
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0507dbe1

Branch: refs/heads/trunk
Commit: 0507dbe1cdedbb616b49ecd8240dbe4fa119f35d
Parents: 929b26e
Author: Yuki Morishita yu...@apache.org
Authored: Tue Nov 6 17:42:52 2012 -0600
Committer: Yuki Morishita yu...@apache.org
Committed: Tue Nov 6 17:42:52 2012 -0600

--
 .../apache/cassandra/tracing/TracingAppender.java  |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/0507dbe1/src/java/org/apache/cassandra/tracing/TracingAppender.java
--
diff --git a/src/java/org/apache/cassandra/tracing/TracingAppender.java 
b/src/java/org/apache/cassandra/tracing/TracingAppender.java
index 4d4a9a0..ff8fc62 100644
--- a/src/java/org/apache/cassandra/tracing/TracingAppender.java
+++ b/src/java/org/apache/cassandra/tracing/TracingAppender.java
@@ -49,6 +49,7 @@ public class TracingAppender extends AppenderSkeleton
 return;
 
 final int elapsed = state.elapsed();
+final String threadName = event.getThreadName();
 final ByteBuffer eventId = 
ByteBufferUtil.bytes(UUIDGen.makeType1UUIDFromHost(FBUtilities.getBroadcastAddress()));
 StageManager.getStage(Stage.TRACING).execute(new WrappedRunnable()
 {
@@ -57,7 +58,7 @@ public class TracingAppender extends AppenderSkeleton
 CFMetaData cfMeta = CFMetaData.TraceEventsCf;
 ColumnFamily cf = ColumnFamily.create(cfMeta);
                 addColumn(cf, buildName(cfMeta, eventId, bytes("source")), FBUtilities.getBroadcastAddress());
-                addColumn(cf, buildName(cfMeta, eventId, bytes("thread")), event.getThreadName());
+                addColumn(cf, buildName(cfMeta, eventId, bytes("thread")), threadName);
                 addColumn(cf, buildName(cfMeta, eventId, bytes("source_elapsed")), elapsed);
                 addColumn(cf, buildName(cfMeta, eventId, bytes("activity")), event.getMessage().toString());
 RowMutation mutation = new RowMutation(Tracing.TRACE_KS, 
state.sessionIdBytes);
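The point of the patch is that `event.getThreadName()` must be read on the logging thread, before the work is handed off to the TRACING stage. A simplified Python analogue of that bug class (my sketch, not the Cassandra code):

```python
import queue
import threading

log = queue.Queue()

def trace(event_thread_name):
    # Capture thread-sensitive state eagerly, like `threadName` in the patch;
    # reading it lazily inside the deferred task could observe the wrong thread.
    captured = event_thread_name
    def task():
        log.put(captured)   # safe: uses the value captured on the calling thread
    t = threading.Thread(target=task)
    t.start()
    t.join()

trace("rpc-thread-1")
recorded = log.get()
print(recorded)  # prints rpc-thread-1
```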



[jira] [Commented] (CASSANDRA-4021) CFS.scrubDataDirectories tries to delete nonexistent orphans

2012-11-06 Thread Boris Yen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492021#comment-13492021
 ] 

Boris Yen commented on CASSANDRA-4021:
--

I might be wrong. From the log, it looks like C* was started twice in a row. Is 
this a possible cause? Is there any way to work around this?

 CFS.scrubDataDirectories tries to delete nonexistent orphans
 

 Key: CASSANDRA-4021
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4021
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7 beta 2
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
  Labels: datastax_qa
 Attachments: 4021.txt, node1.log


 The check only looks for a missing data file, then deletes all other 
 components. However, it's possible for the data file and another component to 
 both be missing, causing an error:
 {noformat}
  WARN 17:19:28,765 Removing orphans for 
 /var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-24492:
  [Index.db, Filter.db, Digest.sha1, Statistics.db, Data.db]
 ERROR 17:19:28,766 Exception encountered during startup
 java.lang.AssertionError: attempted to delete non-existing file 
 system-HintsColumnFamily-hd-24492-Index.db
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49)
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:357)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:352)
 at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:105)
 java.lang.AssertionError: attempted to delete non-existing file 
 system-HintsColumnFamily-hd-24492-Index.db
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49)
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:357)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:352)
 at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:105)
 Exception encountered during startup: attempted to delete non-existing file 
 system-HintsColumnFamily-hd-24492-Index.db
 {noformat}
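A sketch of the shape of the fix (my reconstruction with hypothetical names, not the attached patch): orphan cleanup should delete only the components that still exist, since the Data.db file and another component may both be missing:

```python
import os
import tempfile

def remove_orphans(directory, base, components):
    # Delete only components that are present; asserting that every listed
    # component exists is what tripped the startup failure in the log above.
    removed = []
    for comp in components:
        path = os.path.join(directory, f"{base}-{comp}")
        if os.path.exists(path):
            os.remove(path)
            removed.append(comp)
    return removed

with tempfile.TemporaryDirectory() as d:
    base = "system-HintsColumnFamily-hd-24492"
    for comp in ("Filter.db", "Statistics.db"):   # Index.db is already missing
        open(os.path.join(d, f"{base}-{comp}"), "w").close()
    removed = remove_orphans(d, base, ("Index.db", "Filter.db", "Statistics.db"))
    print(removed)  # prints ['Filter.db', 'Statistics.db']
```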
