[jira] [Commented] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)

2013-09-03 Thread Les Hazlewood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756895#comment-13756895
 ] 

Les Hazlewood commented on CASSANDRA-5959:
--

[~slebresne] Thanks for the added comments!  We're fine upgrading to 2.0 now 
that it has been made final and released, so we will be able to benefit from 
the CASSANDRA-4693 fix.
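
For anyone following along, here is a rough sketch of how we expect to use that 
on 2.0 with the 2.0 Java driver: one batch of bound prepared statements sent as 
a single request. (The {{session}} object, the class/method names, and the 
choice of an unlogged batch are illustrative assumptions on my part, not a 
prescription.)

{code}
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Sketch: all 50,000 bound inserts for one wide row go into a single
// BatchStatement, so the whole row is shipped in one driver request.
public class WideRowBatchSketch {
    public static void insertRow(Session session, String rowId, String[] values) {
        PreparedStatement insert = session.prepare(
                "INSERT INTO results (row_id, index, value) VALUES (?, ?, ?)");
        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        for (int i = 0; i < values.length; i++) {
            batch.add(insert.bind(rowId, i, values[i]));
        }
        session.execute(batch);
    }
}
{code}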

That said, as you suggested, I do like the idea of adding the proposed syntax as 
a convenience (understanding that it wouldn't be a performance improvement). But 
since this issue originally reflected our software's performance needs (and not 
those of a person using cqlsh), our particular concern has been satisfied by the 
release of C* 2.0.

I'll let someone else resurrect this issue if they feel it is desirable enough 
to consume C* software development time/resources. :)

 CQL3 support for multi-column insert in a single operation (Batch Insert / 
 Batch Mutate)
 

 Key: CASSANDRA-5959
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5959
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Drivers
Reporter: Les Hazlewood
  Labels: CQL

 h3. Impetus for this Request
 (from the original [question on 
 StackOverflow|http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque]):
 I want to insert a single row with 50,000 columns into Cassandra 1.2.9. 
 Before inserting, I have all the data for the entire row ready to go (in 
 memory):
 {code}
 +---------+------+------+------+------+-------+
 |         | 0    | 1    | 2    | ...  | 49999 |
 | row_id  +------+------+------+------+-------+
 |         | text | text | text | ...  | text  |
 +---------+------+------+------+------+-------+
 {code}
 The column names are integers, allowing slicing for pagination. Each column 
 value is the text stored at that particular index.
 CQL3 table definition:
 {code}
 create table results (
 row_id text,
 index int,
 value text,
 primary key (row_id, index)
 ) 
 with compact storage;
 {code}
 As I already have the row_id and all 50,000 name/value pairs in memory, I 
 just want to insert a single row into Cassandra in a single request/operation 
 so it is as fast as possible.
 The only thing I can seem to find is to execute the following 50,000 times:
 {code}
 INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
 {code}
 where the first {{?}} is an index counter ({{i}}) and the second {{?}} is the 
 text value to store at location {{i}}.
 With the Datastax Java Driver client and C* server on the same development 
 machine, this took a full minute to execute.
 Oddly enough, the same 50,000 insert statements in a [Datastax Java Driver 
 Batch|http://www.datastax.com/drivers/java/apidocs/com/datastax/driver/core/querybuilder/QueryBuilder.html#batch(com.datastax.driver.core.Statement...)]
  on the same machine took 7.5 minutes.  I thought batches were supposed to be 
 _faster_ than individual inserts?
 We tried instead with a Thrift client (Astyanax) and the same insert via a 
 [MutationBatch|http://netflix.github.io/astyanax/javadoc/com/netflix/astyanax/MutationBatch.html].
   This took _235 milliseconds_.
 h3. Feature Request
 As a result of this performance testing, this issue requests that CQL3 support 
 batch mutations as a single operation (statement), ensuring the same 
 speed/performance benefits as existing Thrift clients.
 Example suggested syntax (based on the above example table/column family):
 {code}
 insert into results (row_id, (index,value)) values 
 ((0,text0), (1,text1), (2,text2), ..., (N,textN));
 {code}
 Each value in the {{values}} clause is a tuple.  The first tuple element is 
 the column name, the second tuple element is the column value.  This seems to 
 be the most simple/accurate representation of what happens during a batch 
 insert/mutate.
 Not having this CQL feature forced us to remove the Datastax Java Driver 
 (which we liked) in favor of Astyanax because Astyanax supports this 
 behavior.  We desire feature/performance parity between Thrift and 
 CQL3/Datastax Java Driver, so we hope this request improves both CQL3 and the 
 Driver.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)

2013-08-30 Thread Les Hazlewood (JIRA)
Les Hazlewood created CASSANDRA-5959:


 Summary: CQL3 support for multi-column insert in a single 
operation (Batch Insert / Batch Mutate)
 Key: CASSANDRA-5959
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5959
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Drivers
Reporter: Les Hazlewood


h3. Impetus for this Request

(from the original [question on 
StackOverflow|http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque]):

I want to insert a single row with 50,000 columns into Cassandra 1.2.9. Before 
inserting, I have all the data for the entire row ready to go (in memory):

{code}
+---------+------+------+------+------+-------+
|         | 0    | 1    | 2    | ...  | 49999 |
| row_id  +------+------+------+------+-------+
|         | text | text | text | ...  | text  |
+---------+------+------+------+------+-------+
{code}

The column names are integers, allowing slicing for pagination. Each column 
value is the text stored at that particular index.

CQL3 table definition:

{code}
create table results (
row_id text,
index int,
value text,
primary key (row_id, index)
) 
with compact storage;
{code}

As I already have the row_id and all 50,000 name/value pairs in memory, I just 
want to insert a single row into Cassandra in a single request/operation so it 
is as fast as possible.

The only thing I can seem to find is to execute the following 50,000 times:

{code}
INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
{code}

where the first {{?}} is an index counter ({{i}}) and the second {{?}} is 
the text value to store at location {{i}}.
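
In driver terms, that loop looks roughly like the following (a sketch only; the 
{{session}} object, the class/method names, and binding the row id as a third 
placeholder are illustrative):

{code}
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Sketch: one network round trip per column value -- the same prepared
// INSERT is executed 50,000 times for a single wide row.
public class OneByOneInsertSketch {
    public static void insertRow(Session session, String rowId, String[] values) {
        PreparedStatement insert = session.prepare(
                "INSERT INTO results (row_id, index, value) VALUES (?, ?, ?)");
        for (int i = 0; i < values.length; i++) {
            session.execute(insert.bind(rowId, i, values[i]));
        }
    }
}
{code}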

With the Datastax Java Driver client and C* server on the same development 
machine, this took a full minute to execute.

Oddly enough, the same 50,000 insert statements in a [Datastax Java Driver 
Batch|http://www.datastax.com/drivers/java/apidocs/com/datastax/driver/core/querybuilder/QueryBuilder.html#batch(com.datastax.driver.core.Statement...)]
 on the same machine took 7.5 minutes.  I thought batches were supposed to be 
_faster_ than individual inserts?
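
For reference, that batch was built roughly along these lines (a sketch; the 
{{session}} object and the class/method names are illustrative):

{code}
import com.datastax.driver.core.Session;
import com.datastax.driver.core.querybuilder.Batch;
import com.datastax.driver.core.querybuilder.QueryBuilder;

// Sketch: all 50,000 inserts are added to one client-side Batch and
// executed as a single statement -- the path that took ~7.5 minutes.
public class DriverBatchSketch {
    public static void insertRow(Session session, String rowId, String[] values) {
        Batch batch = QueryBuilder.batch();
        for (int i = 0; i < values.length; i++) {
            batch.add(QueryBuilder.insertInto("results")
                    .value("row_id", rowId)
                    .value("index", i)
                    .value("value", values[i]));
        }
        session.execute(batch);
    }
}
{code}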

We tried instead with a Thrift client (Astyanax) and the same insert via a 
[MutationBatch|http://netflix.github.io/astyanax/javadoc/com/netflix/astyanax/MutationBatch.html].
  This took _235 milliseconds_.
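
The Astyanax version, for comparison (again a sketch; the {{keyspace}} object, 
the column family definition, and the class/method names are illustrative):

{code}
import com.netflix.astyanax.ColumnListMutation;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.IntegerSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

// Sketch: the whole wide row is staged in one MutationBatch and sent with a
// single execute() call over Thrift -- the path that took ~235 ms.
public class AstyanaxBatchSketch {
    private static final ColumnFamily<String, Integer> RESULTS =
            new ColumnFamily<String, Integer>("results",
                    StringSerializer.get(), IntegerSerializer.get());

    public static void insertRow(Keyspace keyspace, String rowId, String[] values)
            throws ConnectionException {
        MutationBatch batch = keyspace.prepareMutationBatch();
        ColumnListMutation<Integer> row = batch.withRow(RESULTS, rowId);
        for (int i = 0; i < values.length; i++) {
            row.putColumn(i, values[i], null);
        }
        batch.execute();
    }
}
{code}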


h3. Feature Request

As a result of this performance testing, this issue requests that CQL3 support 
batch mutations as a single operation (statement), ensuring the same 
speed/performance benefits as existing Thrift clients.

Example suggested syntax (based on the above example table/column family):

{code}
insert into results (row_id, (index,value)) values 
((0,text0), (1,text1), (2,text2), ..., (N,textN));
{code}

Each value in the {{values}} clause is a tuple.  The first tuple element is the 
column name, the second tuple element is the column value.  This seems to be 
the most simple/accurate representation of what happens during a batch 
insert/mutate.
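
(For completeness: the closest construct CQL3 offers today, as far as we can 
tell, is wrapping the individual statements in a BATCH, which still spells out 
one INSERT per column. A sketch, with the UNLOGGED choice and literal values 
being illustrative:)

{code}
BEGIN UNLOGGED BATCH
  INSERT INTO results (row_id, index, value) VALUES ('my_row_id', 0, 'text0');
  INSERT INTO results (row_id, index, value) VALUES ('my_row_id', 1, 'text1');
  ...
  INSERT INTO results (row_id, index, value) VALUES ('my_row_id', 49999, 'text49999');
APPLY BATCH;
{code}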

Not having this CQL feature forced us to remove the Datastax Java Driver (which 
we liked) in favor of Astyanax because Astyanax supports this behavior.  We 
desire feature/performance parity between Thrift and CQL3/Datastax Java Driver, 
so we hope this request improves both CQL3 and the Driver.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)

2013-08-30 Thread Les Hazlewood (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Les Hazlewood updated CASSANDRA-5959:
-

Description: 
h3. Impetus for this Request

(from the original [question on 
StackOverflow|http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque]):

I want to insert a single row with 50,000 columns into Cassandra 1.2.9. Before 
inserting, I have all the data for the entire row ready to go (in memory):

{code}
+---------+------+------+------+------+-------+
|         | 0    | 1    | 2    | ...  | 49999 |
| row_id  +------+------+------+------+-------+
|         | text | text | text | ...  | text  |
+---------+------+------+------+------+-------+
{code}

The column names are integers, allowing slicing for pagination. Each column 
value is the text stored at that particular index.

CQL3 table definition:

{code}
create table results (
row_id text,
index int,
value text,
primary key (row_id, index)
) 
with compact storage;
{code}

As I already have the row_id and all 50,000 name/value pairs in memory, I just 
want to insert a single row into Cassandra in a single request/operation so it 
is as fast as possible.

The only thing I can seem to find is to execute the following 50,000 times:

{code}
INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
{code}

where the first {{?}} is an index counter ({{i}}) and the second {{?}} is 
the text value to store at location {{i}}.

With the Datastax Java Driver client and C* server on the same development 
machine, this took a full minute to execute.

Oddly enough, the same 50,000 insert statements in a [Datastax Java Driver 
Batch|http://www.datastax.com/drivers/java/apidocs/com/datastax/driver/core/querybuilder/QueryBuilder.html#batch(com.datastax.driver.core.Statement...)]
 on the same machine took 7.5 minutes.  I thought batches were supposed to be 
_faster_ than individual inserts?

We tried instead with a Thrift client (Astyanax) and the same insert via a 
[MutationBatch|http://netflix.github.io/astyanax/javadoc/com/netflix/astyanax/MutationBatch.html].
  This took _235 milliseconds_.


h3. Feature Request

As a result of this performance testing, this issue requests that CQL3 support 
batch mutations as a single operation (statement), ensuring the same 
speed/performance benefits as existing Thrift clients.

Example suggested syntax (based on the above example table/column family):


{code}
insert into results (row_id, (index,value)) values 
((0,text0), (1,text1), (2,text2), ..., (N,textN));
{code}

Each value in the {{values}} clause is a tuple.  The first tuple element is the 
column name, the second tuple element is the column value.  This seems to be 
the most simple/accurate representation of what happens during a batch 
insert/mutate.

Not having this CQL feature forced us to remove the Datastax Java Driver (which 
we liked) in favor of Astyanax because Astyanax supports this behavior.  We 
desire feature/performance parity between Thrift and CQL3/Datastax Java Driver, 
so we hope this request improves both CQL3 and the Driver.


[jira] [Commented] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)

2013-08-30 Thread Les Hazlewood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755160#comment-13755160
 ] 

Les Hazlewood commented on CASSANDRA-5959:
--

[~iamaleksey] Do you expect this to have the same performance as a Batch Mutate?

 CQL3 support for multi-column insert in a single operation (Batch Insert / 
 Batch Mutate)
 

 Key: CASSANDRA-5959
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5959
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Drivers
Reporter: Les Hazlewood
  Labels: CQL

 h3. Impetus for this Request
 (from the original [question on 
 StackOverflow|http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque]):
 I want to insert a single row with 50,000 columns into Cassandra 1.2.9. 
 Before inserting, I have all the data for the entire row ready to go (in 
 memory):
 {code}
 +---------+------+------+------+------+-------+
 |         | 0    | 1    | 2    | ...  | 49999 |
 | row_id  +------+------+------+------+-------+
 |         | text | text | text | ...  | text  |
 +---------+------+------+------+------+-------+
 {code}
 The column names are integers, allowing slicing for pagination. Each column 
 value is the text stored at that particular index.
 CQL3 table definition:
 {code}
 create table results (
 row_id text,
 index int,
 value text,
 primary key (row_id, index)
 ) 
 with compact storage;
 {code}
 As I already have the row_id and all 50,000 name/value pairs in memory, I 
 just want to insert a single row into Cassandra in a single request/operation 
 so it is as fast as possible.
 The only thing I can seem to find is to execute the following 50,000 times:
 {code}
 INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
 {code}
 where the first {{?}} is an index counter ({{i}}) and the second {{?}} is the 
 text value to store at location {{i}}.
 With the Datastax Java Driver client and C* server on the same development 
 machine, this took a full minute to execute.
 Oddly enough, the same 50,000 insert statements in a [Datastax Java Driver 
 Batch|http://www.datastax.com/drivers/java/apidocs/com/datastax/driver/core/querybuilder/QueryBuilder.html#batch(com.datastax.driver.core.Statement...)]
  on the same machine took 7.5 minutes.  I thought batches were supposed to be 
 _faster_ than individual inserts?
 We tried instead with a Thrift client (Astyanax) and the same insert via a 
 [MutationBatch|http://netflix.github.io/astyanax/javadoc/com/netflix/astyanax/MutationBatch.html].
   This took _235 milliseconds_.
 h3. Feature Request
 As a result of this performance testing, this issue requests that CQL3 support 
 batch mutations as a single operation (statement), ensuring the same 
 speed/performance benefits as existing Thrift clients.
 Example suggested syntax (based on the above example table/column family):
 {code}
 insert into results (row_id, (index,value)) values 
 ((0,text0), (1,text1), (2,text2), ..., (N,textN));
 {code}
 Each value in the {{values}} clause is a tuple.  The first tuple element is 
 the column name, the second tuple element is the column value.  This seems to 
 be the most simple/accurate representation of what happens during a batch 
 insert/mutate.
 Not having this CQL feature forced us to remove the Datastax Java Driver 
 (which we liked) in favor of Astyanax because Astyanax supports this 
 behavior.  We desire feature/performance parity between Thrift and 
 CQL3/Datastax Java Driver, so we hope this request improves both CQL3 and the 
 Driver.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)

2013-08-30 Thread Les Hazlewood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755171#comment-13755171
 ] 

Les Hazlewood commented on CASSANDRA-5959:
--

[~dbros...@apache.org] While the suggestion to support slice queries on a 
Cassandra collection would work for my particular example use case, I don't 
think it would be the ideal solution for C* in general: the suggested solution 
would not work for any collection larger than 65,535 elements since that is the 
C* max collection size.  If I choose to use a wide row for more columns, I'd 
expect the query to work on that as well.

Thanks for the idea!

 CQL3 support for multi-column insert in a single operation (Batch Insert / 
 Batch Mutate)
 

 Key: CASSANDRA-5959
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5959
 Project: Cassandra
  Issue Type: New Feature
  Components: Core, Drivers
Reporter: Les Hazlewood
  Labels: CQL

 h3. Impetus for this Request
 (from the original [question on 
 StackOverflow|http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque]):
 I want to insert a single row with 50,000 columns into Cassandra 1.2.9. 
 Before inserting, I have all the data for the entire row ready to go (in 
 memory):
 {code}
 +---------+------+------+------+------+-------+
 |         | 0    | 1    | 2    | ...  | 49999 |
 | row_id  +------+------+------+------+-------+
 |         | text | text | text | ...  | text  |
 +---------+------+------+------+------+-------+
 {code}
 The column names are integers, allowing slicing for pagination. Each column 
 value is the text stored at that particular index.
 CQL3 table definition:
 {code}
 create table results (
 row_id text,
 index int,
 value text,
 primary key (row_id, index)
 ) 
 with compact storage;
 {code}
 As I already have the row_id and all 50,000 name/value pairs in memory, I 
 just want to insert a single row into Cassandra in a single request/operation 
 so it is as fast as possible.
 The only thing I can seem to find is to execute the following 50,000 times:
 {code}
 INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
 {code}
 where the first {{?}} is an index counter ({{i}}) and the second {{?}} is the 
 text value to store at location {{i}}.
 With the Datastax Java Driver client and C* server on the same development 
 machine, this took a full minute to execute.
 Oddly enough, the same 50,000 insert statements in a [Datastax Java Driver 
 Batch|http://www.datastax.com/drivers/java/apidocs/com/datastax/driver/core/querybuilder/QueryBuilder.html#batch(com.datastax.driver.core.Statement...)]
  on the same machine took 7.5 minutes.  I thought batches were supposed to be 
 _faster_ than individual inserts?
 We tried instead with a Thrift client (Astyanax) and the same insert via a 
 [MutationBatch|http://netflix.github.io/astyanax/javadoc/com/netflix/astyanax/MutationBatch.html].
   This took _235 milliseconds_.
 h3. Feature Request
 As a result of this performance testing, this issue requests that CQL3 support 
 batch mutations as a single operation (statement), ensuring the same 
 speed/performance benefits as existing Thrift clients.
 Example suggested syntax (based on the above example table/column family):
 {code}
 insert into results (row_id, (index,value)) values 
 ((0,text0), (1,text1), (2,text2), ..., (N,textN));
 {code}
 Each value in the {{values}} clause is a tuple.  The first tuple element is 
 the column name, the second tuple element is the column value.  This seems to 
 be the most simple/accurate representation of what happens during a batch 
 insert/mutate.
 Not having this CQL feature forced us to remove the Datastax Java Driver 
 (which we liked) in favor of Astyanax because Astyanax supports this 
 behavior.  We desire feature/performance parity between Thrift and 
 CQL3/Datastax Java Driver, so we hope this request improves both CQL3 and the 
 Driver.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira