[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486115#comment-13486115 ]

Edward Capriolo commented on CASSANDRA-4815:

I do not think set and get are syntactic features that should be left out of this discussion. I was doing some blogging this weekend and came to the re-realization that BigTable just provides a simple low-level API. So it's fairly hard for us to argue that Cassandra should not have a simple set and get.

Thinking further into this, I think the new transport only being able to execute CQL queries is a huge defect. We are going to continually have these discussions about what we can and can't do in CQL that we can do in thrift. We should not have to spend time designing CQL features to solve impedance mismatches between RPC and query languages, and we should not be redesigning Cassandra so every operation fits into a CQL language.

We have to face reality: it is going to be quite awkward for clients to maintain multiple connection pools for client requests, one for thrift, one for cql2, one for cql3, one for cql4, etc. The new transport should be able to piggyback thrift requests somehow, so that a user only needs to maintain a single client connection.

Make CQL work naturally with wide rows

Key: CASSANDRA-4815
URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
Project: Cassandra
Issue Type: Wish
Reporter: Edward Capriolo
Attachments: cql feature set updated.png, table.png

I find that CQL3 is quite obtuse and does not provide me a language useful for accessing my data. First, let's point out how we should design Cassandra data.

1) Denormalize
2) Eliminate seeks
3) Design for read
4) Optimize for blind writes

So here is a schema that abides by these tried and tested rules, which large production users are employing today. Say we have a table of movie objects:

Movie
  Name
  Description
  - tags (string)
  - credits composite(role string, name string)
  -1 likesToday
  -1 blacklisted

The above structure is a movie. Notice it holds a mix of static and dynamic columns, but the overall number of columns is not very large. (Even if it were larger, this is OK as well.) Notice this table is not just a single one-to-many relationship: it has 1-to-1 data and it has two sets of 1-to-many data.

The schema today is declared something like this:

create column family movies with default_comparator=UTF8Type and column_metadata = [
  {column_name: blacklisted, validation_class: int},
  {column_name: likestoday, validation_class: long},
  {column_name: description, validation_class: UTF8Type}
];

We should be able to insert data like this:

set ['Cassandra Database, not looking for a seQL']['blacklisted']=1;
set ['Cassandra Database, not looking for a seQL']['likesToday']=34;
set ['Cassandra Database, not looking for a seQL']['credits-dir']='director:asf';
set ['Cassandra Database, not looking for a seQL']['credits-jir']='jiraguy:bob';
set ['Cassandra Database, not looking for a seQL']['tags-action']='';
set ['Cassandra Database, not looking for a seQL']['tags-adventure']='';
set ['Cassandra Database, not looking for a seQL']['tags-romance']='';
set ['Cassandra Database, not looking for a seQL']['tags-programming']='';

This is the correct way to do it: 1 seek to find all the information related to a movie. As long as this row does not get large, there is no reason to optimize by breaking data into other column families. (Notice you cannot transpose this, because movies is two 1-to-many relationships of potentially different types.)

Let's look at the CQL3 way to do this design. First, contrary to the original design of Cassandra, CQL does not like wide rows. It also does not have a good way of dealing with dynamic rows together with static rows. You have two options:

Option 1: lose all schema

create table movies ( name string, column blob, value blob, primary key(name)) with compact storage.

This method is not so hot: we have now lost all our validators, and by the way you have to physically shut down everything, rename files, and recreate your schema if you want to inform Cassandra that a current table should be compact. This could at the very least be just a metadata change. Also, you cannot add column schema either.

Option 2: normalize (is even worse)

create table movie (name String, description string, likestoday int, blacklisted int);
create table movecredits( name string, role string, personname string, primary key(name,role) );
create table movetags( name string, tag string, primary key (name,tag) );

This is a terrible design. Of the 4 key characteristics of how Cassandra data should be designed, it fails 3. It does not:

1) Denormalize
2) Eliminate seeks
3) Design for read

Why is Cassandra steering toward this course, by making a language that does not understand wide rows? So what can be done? My suggestions:

Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a virtual view that is compact storage, with no work to migrate data and recreate schemas. Every table should have a compact view for the schemaless, or a simple query hint like /*transposed*/ should make this change. Metadata
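The "eliminate seeks" argument in the issue description can be made concrete with a small model. The sketch below is only an illustration under stated assumptions: `CountingStore` and the `load_*`/`read_*` helpers are hypothetical names, a "seek" is approximated as one row lookup by key, and nothing here talks to Cassandra. It compares the single wide row with the three-table layout of Option 2.

```python
class CountingStore:
    """Hypothetical in-memory stand-in for a set of column families.

    Every get_row() call is counted as one lookup, used here as a rough
    proxy for the "seek" the issue description talks about.
    """

    def __init__(self):
        self.tables = {}
        self.lookups = 0

    def put(self, table, key, column, value):
        self.tables.setdefault(table, {}).setdefault(key, {})[column] = value

    def get_row(self, table, key):
        self.lookups += 1
        return dict(self.tables.get(table, {}).get(key, {}))


def load_wide(store, name):
    # One row holds the static columns and both one-to-many sets.
    store.put("movies", name, "blacklisted", 1)
    store.put("movies", name, "likesToday", 34)
    store.put("movies", name, "credits-dir", "director:asf")
    store.put("movies", name, "tags-action", "")
    store.put("movies", name, "tags-romance", "")


def load_normalized(store, name):
    # Option 2 from the description: three separate tables.
    store.put("movie", name, "blacklisted", 1)
    store.put("movie", name, "likestoday", 34)
    store.put("movecredits", name, "dir", "director:asf")
    store.put("movetags", name, "action", "")
    store.put("movetags", name, "romance", "")


def read_wide(store, name):
    return store.get_row("movies", name)


def read_normalized(store, name):
    return (
        store.get_row("movie", name),
        store.get_row("movecredits", name),
        store.get_row("movetags", name),
    )


wide, normalized = CountingStore(), CountingStore()
load_wide(wide, "some movie")
load_normalized(normalized, "some movie")
read_wide(wide, "some movie")
read_normalized(normalized, "some movie")
print(wide.lookups, normalized.lookups)
```

In this model the wide layout needs one lookup per movie and the normalized layout needs three, which is the cost the description objects to. Real read cost depends on caching, SSTable layout, and replica placement, none of which the model captures.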
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486158#comment-13486158 ]

Jonathan Ellis commented on CASSANDRA-4815:

I'm still not sure we're on the same page as far as GET and SET go. I'm saying that functionally, if you have
{code}
create column family test;
{code}
(all cli defaults -- everything is bytes), then
{code}
set test['ff']['dd'] = 'cc';
{code}
in the cli (translation to Thrift left as an exercise for the reader) is EXACTLY the same as
{code}
insert into test(key, column1, value) values ('ff', 'dd', 'cc');
{code}
in cql.

If you think we're missing functionality here then let's clear that up. But if you're hung up on the syntax then we'll have to agree to disagree.
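The claimed equivalence between the cli `set` and the CQL `insert` can be pictured with a toy storage model. This is a sketch under assumptions, not Cassandra code: `cli_set` and `cql_insert` are hypothetical helpers, the store is a plain nested dict, and the CQL column names `key`, `column1`, and `value` are taken from the comment above.

```python
def cli_set(store, row_key, column_name, column_value):
    """Model of the cli statement: set test[row_key][column_name] = column_value;"""
    store.setdefault(row_key, {})[column_name] = column_value


def cql_insert(store, row):
    """Model of: insert into test(key, column1, value) values (...);

    The named CQL columns are mapped back onto the same three storage
    positions the cli statement addresses by position.
    """
    store.setdefault(row["key"], {})[row["column1"]] = row["value"]


via_cli, via_cql = {}, {}
cli_set(via_cli, b"ff", b"dd", b"cc")
cql_insert(via_cql, {"key": b"ff", "column1": b"dd", "value": b"cc"})
print(via_cli == via_cql)
```

Both calls leave the same single cell in the model, so the difference between the two statements is one of naming rather than of what gets stored.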
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482161#comment-13482161 ]

Sylvain Lebresne commented on CASSANDRA-4815:

bq. Now to insert into this table I need to format everything as hex

Not for prepared statements, where all the values will be in binary.

What I mean here is that as far as CQL-the-language is concerned, you can absolutely use it to think of Cassandra as a memcache (in fact, I'd say that the hard part would be to think of Cassandra as a relational database, because it's not a relational database, at least not a full-blown one, and CQL doesn't change that).

Now if the remark is that it's less convenient to work with blobs in cqlsh than it was with the cli, then I can agree with that and I'm fine trying to fix it, but let's maybe keep that to CASSANDRA-3799.
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482439#comment-13482439 ]

Edward Capriolo commented on CASSANDRA-4815:

@Jonathan agreed. My main concern is that a user can set schema-less columns. This currently looks possible with compact storage tables but not possible with non-compact storage tables.

I attached that table to show what features a user has with one client vs. the other. I am not necessarily arguing that CQL should have a given feature in that table. I was only trying to show that, depending on the client, users have access to some features and not others. I also wanted to highlight how all the different clients have strengths and deficiencies.

Internally I have to sell things to people, and I just wanted to show that CQL and cqlsh are weak in comparison to the CLI for schema-less columns. My QA person, as an example, learned how set and assume worked in the CLI, and they had functions like ascii(). These things are missing and people are affected. So my table is not there to list all the things CQL should support, just to show the reality of what users are faced with.
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482445#comment-13482445 ]

Edward Capriolo commented on CASSANDRA-4815:

@Jonathan I disagree with the attachment. For slicing composites in CQL3 you have 'YES'. But with compact storage we can only slice based on the first value of the composite. This is why I said 'KINDA': a composite row might be very wide. If the first value of the composite has 100,000 entries equal to 5, and the second part of the composite has high cardinality, that cannot be sliced effectively.

The way I would say this is: CQL3 can effectively slice the composites it created in schema-full tables; CQL3 can slice only on the first column of the composite in a schema-less table. Sylvain agreed with this above:

{quote}
It's possibly nitpicking, but I would talk of a difficulty in properly paginating composites. But yes, that's one of the very few things that CQL3 is not currently very good at. But we'll fix it (and the good thing about having a query language is that it will be trivial to fix it without a backward incompatible breaking change).
{quote}
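The concern about slicing only on the first component can be illustrated with sorted tuples standing in for composite column names. This is a sketch under assumptions: `slice_first` and `slice_both` are hypothetical helpers over an in-memory list, and the data mirrors the "100,000 entries equal to 5" case from the comment. It takes no position on what CQL3 could or could not express at the time.

```python
import bisect


def slice_first(columns, first):
    """Return every composite whose first component equals `first`.

    `columns` must be a sorted list of (first, second) tuples, so the
    matching entries form one contiguous run.
    """
    firsts = [name[0] for name in columns]
    lo = bisect.bisect_left(firsts, first)
    hi = bisect.bisect_right(firsts, first)
    return columns[lo:hi]


def slice_both(columns, first, second_lo, second_hi):
    """Return composites with the given first component and a second
    component in the inclusive range [second_lo, second_hi]."""
    lo = bisect.bisect_left(columns, (first, second_lo))
    hi = bisect.bisect_right(columns, (first, second_hi))
    return columns[lo:hi]


columns = sorted([(4, 7)] + [(5, n) for n in range(100_000)] + [(6, 1)])
print(len(slice_first(columns, 5)))
print(slice_both(columns, 5, 10, 12))
```

Restricting only the first component returns the whole 100,000-entry run, while bounding the second component as well narrows the result to three entries. That gap is what makes a first-component-only slice impractical on a very wide row.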
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482464#comment-13482464 ]

Sylvain Lebresne commented on CASSANDRA-4815:

Though there is some truth to slices on composites currently having a few limitations in CQL3, "CQL3 can slice only on the first column of composite" is not true either (regardless of it being a schema-less or a compact/non-compact table). I've just created CASSANDRA-4851 to lift that limitation (and as I explain in that ticket, you can slice any component, not only the first one, but you cannot page simultaneously on both in a way, which imo is only useful for pagination in real life). Nevertheless, it is a current limitation, but let it be clear that we intend to fix it.

I have the feeling that there is a misunderstanding, in that some seem to believe that we intend to limit the possible use cases for Cassandra with CQL3. That is absolutely not the case. In fact, aside from CASSANDRA-4851 (which I think is fairly specific) and creating a secondary index on a specific column of a wide row (a feature that I've only ever seen one person use, and even he agrees that it was kind of a hack, and for which CASSANDRA-3782 is open nonetheless), I'm not aware of any use cases that thrift supports but CQL3 doesn't.

And when I say that, I'm including things as dynamic as using DynamicCompositeType (which I don't particularly encourage anyone to use btw; I'm still looking for a compelling use case where it is truly necessary). That is, CQL3 doesn't provide any nice syntax to work with DynamicCompositeType, but you can still use it the same way you do in thrift (the syntax will be pretty much as convenient as in thrift, that is, not very convenient at all, but you can do it and it's not worse than in thrift).
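The distinction between slicing one component and paging across both can be shown with a small model. The sketch below is an illustration under assumptions and not the design of CASSANDRA-4851: `page_after` and `naive_page_after` are hypothetical helpers, and composite column names are modeled as sorted Python tuples. Resuming a page needs a comparison on the whole composite, which independent per-component conditions do not give you.

```python
import bisect


def page_after(columns, last_seen, limit):
    """Resume strictly after `last_seen` using whole-tuple ordering."""
    start = bisect.bisect_right(columns, last_seen)
    return columns[start:start + limit]


def naive_page_after(columns, last_seen, limit):
    """Resume using independent per-component conditions
    (first > last_first AND second > last_second)."""
    last_first, last_second = last_seen
    matches = [c for c in columns if c[0] > last_first and c[1] > last_second]
    return matches[:limit]


columns = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
print(page_after(columns, (1, 2), limit=2))
print(naive_page_after(columns, (1, 2), limit=2))
```

With the last seen entry at (1, 2), tuple ordering yields (1, 3) and (2, 1) as the next page, while the per-component version returns only (2, 3) and silently skips three entries. This is the kind of gap that shows up for pagination but not for an ordinary slice on a single component.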
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481264#comment-13481264 ]

Sylvain Lebresne commented on CASSANDRA-4815:

bq. Can I create a schema less table?

Yes. The following as-schemaless-as-can-possibly-be thrift/cli definition:
{noformat}
create column family schemaless
  with key_validation_class = BytesType
  and comparator = BytesType
  and default_validation_class = BytesType
{noformat}
is *equivalent* to the following CQL3 definition:
{noformat}
CREATE TABLE schemaless (
  key blob,
  column blob,
  value blob,
  PRIMARY KEY (key, column)
) WITH COMPACT STORAGE
{noformat}
And to be clear, when I say equivalent, I mean equivalent. If you create the first definition above, you can use the column family in CQL3 as if it had been defined by the second definition (as in, you don't have to do the CREATE TABLE itself), or you can create the table in CQL3 first with the second query and query it in thrift exactly as if it had been created by the first definition.

The composite primary key is what tells CQL3 that it's a transposed wide CF. In other words, in CQL3, 'key' will map to the row key, 'column' will map to the internal column name and 'value' will map to the internal column value. I note that 'key', 'column' and 'value' are the default names that CQL3 picks for you when you haven't explicitly defined more user-friendly ones (in other words, when you upgrade from thrift). CASSANDRA-4822 is open to allow you to rename those defaults to more user-friendly names if you so wish (and to be clear, doing so has no impact whatsoever on what is stored; it just declares the new names as CQL3 metadata).

bq. I guess this is slightly more difficult to express composite slices.

It's possibly nitpicking, but I would talk of a difficulty in properly paginating composites. But yes, that's one of the very few things that CQL3 is not currently very good at. But we'll fix it (and the good thing about having a query language is that it will be trivial to fix it without a backward-incompatible breaking change). That being said, I do believe that once you start working through real-life examples, it's not really a blocker. Most of the time, when you use composites in real life, you want to slice over one of the components, which works fine. That's why it's really more a problem for slightly more complex pagination over composite wide rows. There is also CASSANDRA-4415, which will remove the need for a good part of the manual pagination people do right now.

bq. If we have an old style schema don't we need to be able to alter a current table.

As explained above, thrift CFs *are* directly accessible from CQL3 (without any redefinition, and that's why trying to create the table in CQL3 is not legal). However, you won't get nice column names if you do so (but rather the generic 'key', 'column' and 'value' names above). Again, CASSANDRA-4822 will allow you to declare nice names without having to do complex operations (like trashing your thrift schema so that CQL3 allows the redefinition).

bq. What is going to happen if Cassandra and the CQL language actually adds true composite row keys?

It does already: CASSANDRA-4179. You just declare
{noformat}
PRIMARY KEY ((id_part1, id_part2), tag_name)
{noformat}
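The "transposed wide CF" mapping described in this comment can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not Cassandra code: the names `transpose` and `untranspose` are hypothetical, and the in-memory dictionaries stand in for a BytesType column family.

```python
from typing import Dict, List, Tuple

# Thrift-style view: row key -> {internal column name -> column value}.
WideRows = Dict[bytes, Dict[bytes, bytes]]
# CQL3-style view: one (key, column, value) tuple per internal cell.
CqlRow = Tuple[bytes, bytes, bytes]


def transpose(wide_rows: WideRows) -> List[CqlRow]:
    """Flatten every internal cell into one CQL3-style row (hypothetical helper)."""
    rows: List[CqlRow] = []
    for key, columns in wide_rows.items():
        # Assumption: cells are ordered by raw byte comparison, which is
        # also how Python orders bytes objects.
        for column in sorted(columns):
            rows.append((key, column, columns[column]))
    return rows


def untranspose(rows: List[CqlRow]) -> WideRows:
    """Rebuild the thrift-style view from CQL3-style rows (hypothetical helper)."""
    wide: WideRows = {}
    for key, column, value in rows:
        wide.setdefault(key, {})[column] = value
    return wide


thrift_view = {b"movie1": {b"tags-romance": b"", b"blacklisted": b"\x01"}}
cql_view = transpose(thrift_view)
```

Because `untranspose(transpose(x))` gives back `x`, the two views carry the same information, which is the sense in which the two definitions are described as equivalent.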
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481506#comment-13481506 ]

Jonathan Ellis commented on CASSANDRA-4815:

These are good questions and I wanted to reach a wider audience than the Jira followers, so I wrote a blog post to address the questions here: http://www.datastax.com/dev/blog/cql3-for-cassandra-experts

Please let me know if that clarifies things.

(Note that all the examples there work in 1.1 as well as 1.2, with the exception of the cql3 CREATE and ALTER for song_tags. Those require 1.2. All the pasted output is in fact from 1.1.)
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481513#comment-13481513 ]

Jonathan Ellis commented on CASSANDRA-4815:

To add to that: if you want mostly static columns but some schemaless ones, then you can throw the schemaless ones in a Map. This will NOT be easily accessible from Thrift -- but it's a good example of the kinds of things that cql3 makes easier.
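As a rough illustration of the Map suggestion, the Python sketch below models one movie row whose static columns and map entries all live side by side as cells of a single storage row. It is a simplified model under an explicit assumption (each map entry becomes its own cell, named by the map column plus the entry key); the helper `cells_for_row` is hypothetical and the column names are borrowed from the movie example in the issue description.

```python
from typing import Dict, List, Tuple

# A cell is (composite cell name, value); names are tuples of strings.
Cell = Tuple[Tuple[str, ...], str]


def cells_for_row(static: Dict[str, str],
                  maps: Dict[str, Dict[str, str]]) -> List[Cell]:
    """Lay out static columns and map entries as cells of one storage row."""
    cells: List[Cell] = []
    for name, value in static.items():
        # A static column is a single cell named after the column.
        cells.append(((name,), value))
    for map_name, entries in maps.items():
        for entry_key, entry_value in entries.items():
            # Assumption: one cell per map entry, named (map column, entry key).
            cells.append(((map_name, entry_key), entry_value))
    # Cells are kept sorted by name, so one map's entries are contiguous.
    return sorted(cells)


movie_cells = cells_for_row(
    {"description": "d", "blacklisted": "1"},
    {"credits": {"director": "asf", "jiraguy": "bob"}},
)
```

Under that assumption, reading the whole movie still touches a single row, while the dynamic part needs no per-entry schema.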
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481525#comment-13481525 ]

Jonathan Ellis commented on CASSANDRA-4815:

bq. What does this mean? 'We can do that'. If we have an old style schema don't we need to be able to alter a current table.

Only if you want to add meaningful names. That's what this next part is saying:

If we simply use the old schema directly as-is, Cassandra will give cell names and values autogenerated CQL3 names: column1, column2, and so forth. Here I’m accessing the data inserted earlier from CQL2, but with cqlsh --cql3:
{noformat}
SELECT * FROM song_tags;

 id                                   | column1 | value
--------------------------------------+---------+-------
 8a172618-b121-4136-bb10-f665cfc469eb |    2007 |
 8a172618-b121-4136-bb10-f665cfc469eb |  covers |
 a3e64f8f-bd44-4f28-b8d9-6938726e34d4 |    1973 |
 a3e64f8f-bd44-4f28-b8d9-6938726e34d4 |   blues |
{noformat}
... that said, as Sylvain points out, we do have CASSANDRA-4822 open to allow changing those default names without dropping and recreating the table definition.

bq. Does it make sense to implement CLI like SET and GET?

Not in CQL-the-language, and I don't think even in cqlsh-the-utility. I understand the appeal of the convenience, but the abstraction leakage it would introduce threatens to undo all the work we're doing to make CQL3 something you can use on its own terms.

(As far as performance goes, prepared statements make the length of the string being parsed initially a non-issue.)
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481793#comment-13481793 ]

Edward Capriolo commented on CASSANDRA-4815:

{noformat}Not in CQL-the-language, and I don't think even in cqlsh-the-utility. I understand the appeal of the convenience, but the abstraction leakage it would introduce threatens to undo all the work we're doing to make CQL3 something you can use on its own terms.{noformat}

But what if I want to think of Cassandra as a memcache, not a relational database? This is one of my ticket points: CQL should support all the use cases it can. You are calling it abstraction leakage, but I think of it as a natural way of working with Cassandra.

But I do agree that SELECTs are better than cli 'get' in most cases. What is missing is the SET side.
{noformat}
CREATE TABLE schemaless (
  key blob,
  column blob,
  value blob,
  PRIMARY KEY (key, column)
) WITH COMPACT STORAGE
{noformat}
Now, to insert into this table, I need to format everything as hex.

INSERT INTO SCHEMALESS (key,column,value) VALUES ('HEX','HEX','HEX');

The CLI has many useful functions like ascii(' ') or utf8(' '). Assume does not seem to have an effect here. This is discussed in CASSANDRA-3799.
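To make the hex-formatting burden concrete, here is a small Python sketch of what a client currently has to do by hand before inserting text into the all-blob table. The helpers `blob_hex` and `insert_statement` are hypothetical, and the quoted-hex literal form is copied from the INSERT shown in this comment; other Cassandra versions may expect a different blob literal syntax.

```python
def blob_hex(text: str, encoding: str = "utf-8") -> str:
    """Hex-encode a string, roughly what the CLI's utf8('...') spares you."""
    return text.encode(encoding).hex()


def insert_statement(key: str, column: str, value: str) -> str:
    """Build the INSERT for the all-blob 'schemaless' table (hypothetical helper).

    Assumption: blobs are written as quoted hex strings, as in the comment.
    """
    return (
        "INSERT INTO schemaless (key, column, value) "
        "VALUES ('%s', '%s', '%s');"
        % (blob_hex(key), blob_hex(column), blob_hex(value))
    )


statement = insert_statement("ed", "age", "30")
print(statement)
```

The statement is correct but unreadable, which is the usability gap this comment is pointing at.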
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480971#comment-13480971 ]

Edward Capriolo commented on CASSANDRA-4815:

Also, without having read much of the background tickets, I am pretty curious about the syntax
{noformat}
PRIMARY KEY (id, tag_name)
{noformat}
What is going to happen if Cassandra and the CQL language actually adds true composite row keys?
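The question turns on which parts of a PRIMARY KEY declaration form the row key and which become components of the internal column name. Elsewhere in this thread, the nested form `PRIMARY KEY ((id_part1, id_part2), tag_name)` is given as the way to declare a composite row key. The Python sketch below shows that reading of the two forms; `split_primary_key` is a hypothetical helper that only handles these simple shapes, not a real CQL parser.

```python
from typing import List, Tuple


def split_primary_key(spec: str) -> Tuple[List[str], List[str]]:
    """Split the text inside PRIMARY KEY ( ... ) into
    (row key parts, column name parts).

    Assumption: a parenthesised first element is a composite row key;
    otherwise only the first name is the row key.
    """
    spec = spec.strip()
    if spec.startswith("("):
        close = spec.index(")")
        row_key = [part.strip() for part in spec[1:close].split(",")]
        rest = spec[close + 1:]
    else:
        first, _, rest = spec.partition(",")
        row_key = [first.strip()]
    column_name_parts = [part.strip() for part in rest.split(",") if part.strip()]
    return row_key, column_name_parts


simple = split_primary_key("id, tag_name")
composite = split_primary_key("(id_part1, id_part2), tag_name")
```

Here `simple` treats `id` alone as the row key, while `composite` groups `id_part1` and `id_part2` into it; in both cases `tag_name` ends up on the column-name side.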
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480976#comment-13480976 ]

Edward Capriolo commented on CASSANDRA-4815:

So in Cassandra 1.2.0 you can create a table with no columns other than the primary key
{noformat}
cqlsh:testkeyspace> create table sample ( keycolumn varchar, primary key (keycolumn) );
{noformat}
But you can not insert into it.

cqlsh:testkeyspace> insert into sample ( keycolumn, 'age' ) values ('ed','30') ;

And, rather surprisingly, it creates a table with metadata I did not ask for. It assumes the comparator is a composite of a single UTF8Type
{noformat}
create column family sample
  with column_type = 'Standard'
  and comparator = 'CompositeType(org.apache.cassandra.db.marshal.UTF8Type)'
  and default_validation_class = 'UTF8Type'
{noformat}
Not what I was going for :(
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481106#comment-13481106 ]

Nick Bailey commented on CASSANDRA-4815:

bq. What is going to happen if Cassandra and the CQL language actually adds true composite row keys?

CASSANDRA-4179

Make CQL work naturally with wide rows
--
Key: CASSANDRA-4815
URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
Project: Cassandra
Issue Type: Wish
Reporter: Edward Capriolo

I find that CQL3 is quite obtuse and does not provide me a language useful for accessing my data. First, let's point out how we should design Cassandra data.

1) Denormalize
2) Eliminate seeks
3) Design for read
4) Optimize for blind writes

So here is a schema that abides by these tried and tested rules, which large production users are employing today. Say we have a table of movie objects:

    Movie
    Name
    Description
    - tags (string)
    - credits composite(role string, name string )
    -1 likesToday
    -1 blacklisted

The above structure is a movie. Notice it holds a mix of static and dynamic columns, but the overall number of columns is not very large. (Even if it were larger, this is OK as well.) Notice this table is not just a single one-to-many relationship: it has 1-to-1 data and it has two sets of 1-to-many data.

The schema today is declared something like this:

    create column family movies with default_comparator=UTF8Type and
    column_metadata = [
      {column_name: blacklisted, validation_class: int},
      {column_name: likestoday, validation_class: long},
      {column_name: description, validation_class: UTF8Type}
    ];

We should be able to insert data like this:

    set ['Cassandra Database, not looking for a seQL']['blacklisted']=1;
    set ['Cassandra Database, not looking for a seQL']['likesToday']=34;
    set ['Cassandra Database, not looking for a seQL']['credits-dir']='director:asf';
    set ['Cassandra Database, not looking for a seQL']['credits-jir']='jiraguy:bob';
    set ['Cassandra Database, not looking for a seQL']['tags-action']='';
    set ['Cassandra Database, not looking for a seQL']['tags-adventure']='';
    set ['Cassandra Database, not looking for a seQL']['tags-romance']='';
    set ['Cassandra Database, not looking for a seQL']['tags-programming']='';

This is the correct way to do it: 1 seek to find all the information related to a movie. As long as this row does not get large, there is no reason to optimize by breaking data into other column families. (Notice you cannot transpose this, because movies is two 1-to-many relationships of potentially different types.)

Let's look at the CQL3 way to do this design. First, contrary to the original design of Cassandra, CQL does not like wide rows. It also does not have a good way of dealing with dynamic rows together with static rows. You have two options:

Option 1: lose all schema

    create table movies ( name string, column blob, value blob, primary key(name)) with compact storage.

This method is not so hot: we have now lost all our validators, and by the way you have to physically shut down everything, rename files, and recreate your schema if you want to inform Cassandra that a current table should be compact. This could at the very least be just a metadata change. Also, you cannot add column schema either.

Option 2: Normalize (is even worse)

    create table movie (name String, description string, likestoday int, blacklisted int);
    create table movecredits( name string, role string, personname string, primary key(name,role) );
    create table movetags( name string, tag string, primary key (name,tag) );

This is a terrible design. Of the 4 key characteristics of how Cassandra data should be designed, it fails 3. It does not:

1) Denormalize
2) Eliminate seeks
3) Design for read

Why is Cassandra steering toward this course, by making a language that does not understand wide rows?

So what can be done? My suggestions:

Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a virtual view that is compact storage, with no work to migrate data and recreate schemas. Every table should have a compact view for the schemaless, or a simple query hint like /*transposed*/ should make this change.

Metadata should be definable by regex. For example, all columns named tag* are of type string.

CQL should have the column[slice_start] .. column[slice_end] operator from cql2.

CQL should support current users. Users should not have to switch between CQL versions, and possibly thrift, to work with wide rows. The language should work for them even if it is not expressly designed for them. Some of these features are already part of cql2, so they should be carried over.

Also what needs to not happen is someone to make a
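To make the "1 seek" argument and the requested column[slice_start] .. column[slice_end] operator concrete, here is a minimal Python sketch of the movie row as a single sorted map. It is only an in-memory model, not Cassandra code: the `column_slice` helper and its half-open bounds are assumptions made for illustration, and plain string ordering stands in for a UTF8Type comparator.

```python
from bisect import bisect_left

# One wide row for a single movie key. All data for the movie lives together,
# so reading it is one lookup by row key.
row = {
    "blacklisted": 1,
    "likesToday": 34,
    "credits-dir": "director:asf",
    "credits-jir": "jiraguy:bob",
    "tags-action": "",
    "tags-adventure": "",
    "tags-romance": "",
    "tags-programming": "",
}


def column_slice(row, start, end):
    """Return (name, value) pairs with start <= name < end, in sorted name order.

    Hypothetical helper: models a column slice over one row, with the
    half-open range chosen here for simplicity.
    """
    names = sorted(row)              # comparator order (code-point order here)
    lo = bisect_left(names, start)   # first name >= start
    hi = bisect_left(names, end)     # first name >= end
    return [(name, row[name]) for name in names[lo:hi]]


# "." sorts immediately after "-", so this range covers every "tags-*" column.
tags = column_slice(row, "tags-", "tags.")
credits = column_slice(row, "credits-", "credits.")
print([name for name, _ in tags])
print(credits)
```

Both one-to-many sets come out of the same row with a contiguous range each, which is the property the normalized Option 2 gives up.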
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481108#comment-13481108 ]

Edward Capriolo commented on CASSANDRA-4815:

Working with CQL today, another idea came to me. Does it make sense to implement CLI-like SET and GET? SET and GET are actually fairly natural ways to work with schema-less Cassandra. Also, in terms of performance, a CLI set statement is smaller than the equivalent insert into. This would serve as a no-nonsense way to get data into a CF.
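As a rough illustration of how small the SET and GET idea from the comment above is, the following Python sketch models a schema-less column family with only those two operations. The `ColumnFamily` class and its method signatures are hypothetical and exist only for this example; they are not part of the CLI, CQL, or any Cassandra client.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy in-memory stand-in for a schema-less column family."""

    def __init__(self):
        # row key -> {column name -> value}
        self._rows = defaultdict(dict)

    def set(self, key, column, value):
        """Blind write: no read before write, and no schema needed for the column."""
        self._rows[key][column] = value

    def get(self, key, column=None):
        """Return one column value, or the whole row sorted by column name."""
        row = self._rows.get(key, {})
        if column is None:
            return dict(sorted(row.items()))
        return row[column]


movies = ColumnFamily()
title = "Cassandra Database, not looking for a seQL"
movies.set(title, "blacklisted", 1)
movies.set(title, "tags-action", "")
print(movies.get(title, "blacklisted"))
print(movies.get(title))
```

Any column name can be written without declaring it first, which is the schema-less behaviour the comment asks CQL to keep reachable.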
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480883#comment-13480883 ]

Edward Capriolo commented on CASSANDRA-4815:

Thanks for building that, Jonathan. It clears up a couple of things. I still have some questions and possible feature requests.

Can I create a schema-less table?

{noformat}
cqlsh:testkeyspace> create table simple (a varchar, primary key(a) );
Bad Request: No definition found that is not part of the PRIMARY KEY
{noformat}
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480884#comment-13480884 ]

Edward Capriolo commented on CASSANDRA-4815:

The syntax suggests you should be able to create a table that only has a primary key.

{noformat}
CREATE TABLE <cfname> ( <colname> <type> PRIMARY KEY [, <colname> <type> [, ...]] )
  [WITH <optionname> = <val> [AND <optionname> = <val> [...]]];
{noformat}

I think a user SHOULD be able to do this. Because Cassandra can be schemaless, CQL should provide a way to create this type of table (since we can SELECT * from tables created from the CLI).
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480886#comment-13480886 ]

Jeremy Hanna commented on CASSANDRA-4815:

I let Ed know that in 1.2 there is support for creating a table with only a primary key (thanks, Patrick). He did ask a good question: is CQL3 going to be relatively set in stone in 1.2? If people implement to CQL3, that's not going to change, is it?
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480889#comment-13480889 ]

Edward Capriolo commented on CASSANDRA-4815:

This is possibly more nitpicky, because it seems hard to do.

    create column family compositetry with key_validation_class=UTF8Type and comparator='CompositeType(UTF8Type,UTF8Type)';
    [default@testkeyspace] set compositetry ['a']['b:c']=UTF8('d');
    [default@testkeyspace] set compositetry ['a']['d:e']=UTF8('f');
    [default@testkeyspace] set compositetry ['a']['h:i']=UTF8('j');

    cqlsh:testkeyspace> select * from compositetry where key='a' and column1>='b' and column1<'h';
     key | column1 | column2 | value
    -----+---------+---------+-------
       a |       b |       c |    64
       a |       d |       e |    66

    cqlsh:testkeyspace> select * from compositetry where key='a' and column1>='b' and column1<'h' and column2='c';
    Bad Request: PRIMARY KEY part column2 cannot be restricted (preceding part column1 is either not restricted or by a non-EQ relation)
    Perhaps you meant to use CQL 2? Try using the -2 option when starting cqlsh.

I guess composite slices are slightly more difficult to express.
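The Bad Request above follows from a rule that can be read directly off the error message: a part of the composite can be restricted only if the part before it is restricted by equality. The Python sketch below is a simplified model of that rule, not Cassandra's actual validation code; the function name and the 'EQ' / 'RANGE' labels are assumptions made for illustration.

```python
def check_restrictions(clustering_columns, restrictions):
    """Validate WHERE-clause restrictions on composite column parts.

    clustering_columns: parts of the composite, in comparator order.
    restrictions: maps a part name to 'EQ' or 'RANGE'; unrestricted parts are absent.
    Returns None when the restrictions are allowed, otherwise an error message.
    """
    previous = None  # (name, kind) of the preceding part; kind may be None
    for name in clustering_columns:
        kind = restrictions.get(name)
        if kind is not None and previous is not None:
            prev_name, prev_kind = previous
            if prev_kind != "EQ":
                return (
                    f"PRIMARY KEY part {name} cannot be restricted "
                    f"(preceding part {prev_name} is either not restricted "
                    f"or by a non-EQ relation)"
                )
        previous = (name, kind)
    return None


parts = ["column1", "column2"]
# First query in the comment: a range on column1 only.
print(check_restrictions(parts, {"column1": "RANGE"}))
# Second query: the same range plus column2 = 'c'.
print(check_restrictions(parts, {"column1": "RANGE", "column2": "EQ"}))
```

The underlying reason is that a range on the first part selects a contiguous run of cells, while an extra equality on the second part would pick scattered cells out of that run, so it is no longer a single slice.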
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480890#comment-13480890 ]

Edward Capriolo commented on CASSANDRA-4815:

Also, from the article, this statement:

For the song tags, we have two choices. If we need to be compatible with data from an old-style schema, we can do that as follows:

    CREATE TABLE song_tags ( id uuid, tag_name text, PRIMARY KEY (id, tag_name) );

What does "we can do that" mean? If we have an old-style schema, don't we need to be able to alter a current table? That can't be done:

    cqlsh:testkeyspace> CREATE TABLE song_tags ( id uuid, tag_name text, b text, PRIMARY KEY (id, tag_name) );
    Bad Request: org.apache.cassandra.config.ConfigurationException: Cannot add already existing column family 'song_tags' to keyspace 'testkeyspace'.

This is why I suggest VIEW tables make sense. All the CQL2 / CQL3 tables look like logical constructs on top of physical column families. Maybe defining multiple logical tables that store data to the same physical ones is the best bet long term.
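The "logical tables over one physical column family" idea in the comment above can be sketched as two read-only projections of the same stored cells. This Python model is purely illustrative: the row key, the tag values, and both view functions are hypothetical, and the output column names for the raw view simply follow the key / column1 / value naming that cqlsh prints for CLI-created column families earlier in this thread.

```python
# Physical column family: row key -> {column name -> value}.
# Tags are stored as column names with empty values (illustrative data).
physical_cf = {
    "song-1": {"blues": b"", "1973": b""},
}


def view_song_tags(cf):
    """Logical view 1: one row per (id, tag_name), like the song_tags table."""
    return [
        {"id": key, "tag_name": column}
        for key in sorted(cf)
        for column in sorted(cf[key])
    ]


def view_raw(cf):
    """Logical view 2: generic (key, column1, value) rows over the same cells."""
    return [
        {"key": key, "column1": column, "value": cf[key][column]}
        for key in sorted(cf)
        for column in sorted(cf[key])
    ]


# Both views read the same cells; neither rewrites or migrates the data.
print(view_song_tags(physical_cf))
print(view_raw(physical_cf))
```

Because each view is just a mapping function over the stored cells, adding a second view is a metadata-level operation in this model, which is the property the comment is asking for.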
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479294#comment-13479294 ] Jonathan Ellis commented on CASSANDRA-4815: --- Hi Ed, I wrote a longish blog post over at http://www.datastax.com/dev/blog/cql3-for-cassandra-experts showing how use cases like this are handled in CQL3, with no rewriting of data. Give that a read and let me know if you have further questions! (All the examples in that post, except for the one using {{Set}}, are from Cassandra 1.1.6.) Make CQL work naturally with wide rows -- Key: CASSANDRA-4815 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815 Project: Cassandra Issue Type: Wish Reporter: Edward Capriolo I find that CQL3 is quite obtuse and does not provide me a language useful for accessing my data. First, lets point out how we should design Cassandra data. 1) Denormalize 2) Eliminate seeks 3) Design for read 4) optimize for blind writes So here is a schema that abides by these tried and tested rules large production uses are employing today. Say we have a table of movie objects: Movie Name Description - tags (string) - credits composite(role string, name string ) -1 likesToday -1 blacklisted The above structure is a movie notice it hold a mix of static and dynamic columns, but the other all number of columns is not very large. (even if it was larger this is OK as well) Notice this table is not just a single one to many relationship, it has 1 to 1 data and it has two sets of 1 to many data. 
The schema today is declared something like this: create column family movies with default_comparator=UTF8Type and column_metadata = [ {column_name: blacklisted, validation_class: int}, {column_name: likestoday, validation_class: long}, {column_name: description, validation_class: UTF8Type} ]; We should be able to insert data like this: set ['Cassandra Database, not looking for a seQL']['blacklisted']=1; set ['Cassandra Database, not looking for a seQL']['likesToday']=34; set ['Cassandra Database, not looking for a seQL']['credits-dir']='director:asf'; set ['Cassandra Database, not looking for a seQL']['credits-jir]='jiraguy:bob'; set ['Cassandra Database, not looking for a seQL']['tags-action']=''; set ['Cassandra Database, not looking for a seQL']['tags-adventure']=''; set ['Cassandra Database, not looking for a seQL']['tags-romance']=''; set ['Cassandra Database, not looking for a seQL']['tags-programming']=''; This is the correct way to do it. 1 seek to find all the information related to a movie. As long as this row does not get large there is no reason to optimize by breaking data into other column families. (Notice you can not transpose this because movies is two 1-to-many relationships of potentially different types) Lets look at the CQL3 way to do this design: First, contrary to the original design of cassandra CQL does not like wide rows. It also does not have a good way to dealing with dynamic rows together with static rows either. You have two options: Option 1: lose all schema create table movies ( name string, column blob, value blob, primary key(name)) with compact storage. This method is not so hot we have not lost all our validators, and by the way you have to physically shutdown everything and rename files and recreate your schema if you want to inform cassandra that a current table should be compact. This could at very least be just a metadata change. Also you can not add column schema either. 

Option 2: Normalize (is even worse)

create table movie (name String, description string, likestoday int, blacklisted int);
create table movecredits( name string, role string, personname string, primary key(name,role) );
create table movetags( name string, tag string, primary key (name,tag) );

This is a terrible design. Of the 4 key characteristics of how Cassandra data should be designed, it fails 3. It does not:

1) Denormalize
2) Eliminate seeks
3) Design for read

Why is Cassandra steering toward this course by making a language that does not understand wide rows?

So what can be done? My suggestions:

- Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a virtual view that is compact storage, with no work to migrate data and recreate schemas. Every table should have a compact view for the schemaless, or a simple query hint like /*transposed*/ should make this change.
- Metadata should be definable by regex. For example, all columns named tag* are of type string.
- CQL should have the column[slice_start] .. column[slice_end] operator from cql2.
- CQL should support current users. Users should not have to switch between CQL versions, and possibly thrift, to work with wide rows. The language should work for them even if it is not expressly designed for them.
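
To make the "metadata definable by regex" suggestion concrete, here is a minimal Python sketch of pattern-based column validation. It is only an illustration of the idea and not anything Cassandra implements: the `PATTERN_VALIDATORS` table, the `validate` function, and the choice of Python types as stand-ins for validators are all assumptions. The column names come from the movie example in the ticket.

```python
import re

# Hypothetical pattern-to-type table. The layout and the use of Python
# types as validators are assumptions made for illustration only.
PATTERN_VALIDATORS = [
    (re.compile(r"tags-.*"), str),
    (re.compile(r"credits-.*"), str),
    (re.compile(r"blacklisted|likesToday"), int),
]


def validate(column_name, value):
    """Return value if it satisfies the first matching pattern's type.

    Columns that match no pattern are accepted unchecked (schemaless).
    """
    for pattern, expected_type in PATTERN_VALIDATORS:
        if pattern.fullmatch(column_name):
            if not isinstance(value, expected_type):
                raise TypeError(
                    f"{column_name!r} expects {expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )
            return value
    return value


validate("tags-action", "")  # accepted: matches tags-.* and is a string
validate("likesToday", 34)   # accepted: matches the int pattern
```

Under this sketch a dynamic column such as `tags-romance` gets a type without being declared individually, while a column matching no pattern stays unvalidated.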
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477109#comment-13477109 ]

Nick Bailey commented on CASSANDRA-4815:

Isn't this the main reason behind collections support?

{noformat}
CREATE TABLE movies (
  movie_id int PRIMARY KEY,
  blacklisted int,
  credits map<text, text>,
  description text,
  likes_today int,
  name text,
  tags set<text>
);
{noformat}
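
As a rough illustration of how the prefix-style wide row from the ticket lines up with the collections-based table in this comment, the Python sketch below regroups the `tags-*` and `credits-*` columns into a set and a map. The function name `to_collections` and the dictionary layout are hypothetical, and the sketch keeps the ticket's original column names (for example `likesToday`) instead of renaming them to match the CQL table. It models the data shape only and says nothing about how Cassandra stores collections.

```python
def to_collections(name, columns):
    """Regroup prefix-named wide-row columns into a set and a map.

    'tags-<tag>' columns become members of a 'tags' set, and
    'credits-<x>' columns, whose values look like 'role:person',
    become entries of a 'credits' map. Other columns pass through.
    """
    movie = {"name": name, "tags": set(), "credits": {}}
    for column, value in columns.items():
        if column.startswith("tags-"):
            movie["tags"].add(column[len("tags-"):])
        elif column.startswith("credits-"):
            role, _, person = value.partition(":")
            movie["credits"][role] = person
        else:
            movie[column] = value
    return movie


# The row inserted by the cassandra-cli statements in the ticket.
wide_row = {
    "blacklisted": 1,
    "likesToday": 34,
    "credits-dir": "director:asf",
    "credits-jir": "jiraguy:bob",
    "tags-action": "",
    "tags-adventure": "",
    "tags-romance": "",
    "tags-programming": "",
}

movie = to_collections("Cassandra Database, not looking for a seQL", wide_row)
```

The result still lives under a single key, which is the point of the comment: the two 1-to-many relationships become `tags` and `credits` without splitting the movie across tables.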
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477114#comment-13477114 ]

T Jake Luciani commented on CASSANDRA-4815:
---

I agree CQL3 is a step towards requiring more schema... I think for a lot of people that's a good thing, and for others it's not.

The core of the issue here, IMO, is not how we can change CQL3 to fit your use case. It's whether CQL3 will eventually be the only way to access Cassandra in N years, or whether we can always rely on there being the old, more schemaless API.
[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477218#comment-13477218 ]

Edward Capriolo commented on CASSANDRA-4815:

As I mentioned towards the end of the ticket, this feature request is not to support future theoretical use cases; it is to support the dominant current use case. It is not just my use case. It is the use case that the Cassandra project originally advocated.

http://www.slideshare.net/lomakin.andrey/apache-cassandra-part-1-principles-data-model slide 30

- columns aren't fixed
- columns can be sorted
- columns can be queried for a certain range

I am fine if Cassandra adds new features that benefit from more schema. I am fine with Cassandra adding collections and think these are a great idea. But I see no technical reason why CQL can't support both old and new use cases. This is especially disturbing since the project offers no eloquent way to get from now to the future. Switching to COMPACT STORAGE is a pain, and rewriting all the data into a new collection-based design is not necessarily a good use of resources.

Someone once told me Avro was the future of Cassandra. I am asking for features to support the now.
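
The three properties cited from the slide (columns are not fixed, are sorted, and can be queried by range) can be modelled in a few lines of Python. This is a toy in-memory sketch, not Cassandra's storage engine: the `WideRow` class and its `slice` method are hypothetical names, and the half-open `[start, end)` range is a simplification chosen here for the example.

```python
from bisect import bisect_left


class WideRow:
    """Toy model of one wide row: column names kept in sorted order."""

    def __init__(self, columns):
        self._values = dict(columns)
        self._names = sorted(self._values)  # plain string ordering

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name < end."""
        lo = bisect_left(self._names, start)
        hi = bisect_left(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


row = WideRow({
    "blacklisted": 1,
    "likesToday": 34,
    "credits-dir": "director:asf",
    "credits-jir": "jiraguy:bob",
    "tags-action": "",
    "tags-adventure": "",
    "tags-romance": "",
    "tags-programming": "",
})

# '.' sorts immediately after '-', so this range covers every column
# whose name starts with 'tags-'.
tag_columns = row.slice("tags-", "tags.")
```

Because the names are sorted, all tags for a movie form one contiguous range, which is what lets a single slice over the row return them without touching a second table.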