[jira] [Commented] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
[ https://issues.apache.org/jira/browse/CASSANDRA-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756895#comment-13756895 ] Les Hazlewood commented on CASSANDRA-5959:
--
[~slebresne] Thanks for the added comments! We're fine upgrading to 2.0 now that it has been made final and released, so we will be able to benefit from the CASSANDRA-4693 fix. As you suggested, I do like the idea of adding the proposed syntax as a convenience (understanding that it wouldn't be a performance improvement). But since this issue originally reflected our performance needs within our software (and not a person using cqlsh), our particular concern has been satisfied with the release of C* 2.0. I'll let someone else resurrect this issue if they feel it is desirable enough to consume C* software development time/resources. :)

CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
Key: CASSANDRA-5959
URL: https://issues.apache.org/jira/browse/CASSANDRA-5959
Project: Cassandra
Issue Type: New Feature
Components: Core, Drivers
Reporter: Les Hazlewood
Labels: CQL

h3. Impetus for this Request

(from the original [question on StackOverflow|http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque]): I want to insert a single row with 50,000 columns into Cassandra 1.2.9. Before inserting, I have all the data for the entire row ready to go (in memory):

{code}
        +------+------+------+-----+-------+
        |   0  |   1  |   2  | ... | 49999 |
 row_id +------+------+------+-----+-------+
        | text | text | text | ... | text  |
        +------+------+------+-----+-------+
{code}

The column names are integers, allowing slicing for pagination. Each column value is the value at that particular index.
CQL3 table definition:

{code}
create table results (
    row_id text,
    index int,
    value text,
    primary key (row_id, index)
) with compact storage;
{code}

As I already have the row_id and all 50,000 name/value pairs in memory, I just want to insert a single row into Cassandra in a single request/operation so it is as fast as possible. The only thing I can seem to find is to execute the following 50,000 times:

{code}
INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
{code}

where the first {{?}} is an index counter ({{i}}) and the second {{?}} is the text value to store at location {{i}}. With the Datastax Java Driver client and C* server on the same development machine, this took a full minute to execute. Oddly enough, the same 50,000 insert statements in a [Datastax Java Driver Batch|http://www.datastax.com/drivers/java/apidocs/com/datastax/driver/core/querybuilder/QueryBuilder.html#batch(com.datastax.driver.core.Statement...)] on the same machine took 7.5 minutes. I thought batches were supposed to be _faster_ than individual inserts?

We tried instead with a Thrift client (Astyanax) and the same insert via a [MutationBatch|http://netflix.github.io/astyanax/javadoc/com/netflix/astyanax/MutationBatch.html]. This took _235 milliseconds_.

h3. Feature Request

As a result of this performance testing, this issue requests that CQL3 support batch mutation operations as a single operation (statement) to ensure the same speed/performance benefits as existing Thrift clients. Example suggested syntax (based on the above example table/column family):

{code}
insert into results (row_id, (index, value)) values ((0, text0), (1, text1), (2, text2), ..., (N, textN));
{code}

Each value in the {{values}} clause is a tuple: the first tuple element is the column name, the second is the column value. This seems to be the most simple/accurate representation of what happens during a batch insert/mutate.
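For comparison, the closest construct CQL3 already offers is a batch statement. A minimal sketch of the same insert using {{BEGIN UNLOGGED BATCH}} (unlogged, since every row here lands in the same partition and batch-log atomicity is unnecessary overhead) might look like:

{code}
BEGIN UNLOGGED BATCH
  INSERT INTO results (row_id, index, value) VALUES ('my_row_id', 0, 'text0');
  INSERT INTO results (row_id, index, value) VALUES ('my_row_id', 1, 'text1');
  -- ... one INSERT per column ...
APPLY BATCH;
{code}

Note that each statement inside the batch still carries its own per-statement overhead, which is why this form is a convenience rather than the single-mutation fast path this issue requests.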
Not having this CQL feature forced us to remove the Datastax Java Driver (which we liked) in favor of Astyanax, because Astyanax supports this behavior. We desire feature/performance parity between Thrift and CQL3/Datastax Java Driver, so we hope this request improves both CQL3 and the Driver.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
Les Hazlewood created CASSANDRA-5959:

Summary: CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
Key: CASSANDRA-5959
URL: https://issues.apache.org/jira/browse/CASSANDRA-5959
Project: Cassandra
Issue Type: New Feature
Components: Core, Drivers
Reporter: Les Hazlewood
[jira] [Updated] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
[ https://issues.apache.org/jira/browse/CASSANDRA-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Les Hazlewood updated CASSANDRA-5959:

Description: (as in the issue description above)
[jira] [Commented] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
[ https://issues.apache.org/jira/browse/CASSANDRA-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13755160#comment-13755160 ] Les Hazlewood commented on CASSANDRA-5959:
--
[~iamaleksey] Do you expect this to have the same performance as a Batch Mutate?
[jira] [Commented] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
[ https://issues.apache.org/jira/browse/CASSANDRA-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13755171#comment-13755171 ] Les Hazlewood commented on CASSANDRA-5959:
--
[~dbros...@apache.org] While the suggestion to support slice queries on a Cassandra collection would work for my particular example use case, I don't think it would be the ideal solution for C* in general: the suggested solution would not work for any collection larger than 65,535 elements, since that is the C* max collection size. If I choose to use a wide row for more columns, I'd expect the query to work on that as well. Thanks for the idea!
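The pagination the wide-row layout enables (the "slicing" mentioned in the issue description) is already expressible as an ordinary CQL3 range over the clustering column. A sketch, assuming the {{results}} table from the description and a page size of 1000:

{code}
-- first page of the row
SELECT index, value FROM results WHERE row_id = 'my_row_id' LIMIT 1000;

-- next page: resume after the last index seen on the previous page
SELECT index, value FROM results WHERE row_id = 'my_row_id' AND index > 999 LIMIT 1000;
{code}

Because {{index}} is the clustering column, this slice works regardless of how many columns the row holds, which is the property the collection-based alternative discussed in the comment above would lose past 65,535 elements.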