[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486115#comment-13486115
 ] 

Edward Capriolo commented on CASSANDRA-4815:


I do not think set and get are syntactic features that should be left out of 
this discussion. I was doing some blogging this weekend and came to the 
re-realization that BigTable just provides a simple, low-level API. So it's 
fairly hard for us to argue that Cassandra should not have a simple set and 
get. 

Thinking further about this, I think the fact that the new transport can only 
execute CQL queries is a huge defect. We are going to keep having these 
discussions about what we can do in Thrift but can't do in CQL.

We should not have to spend time designing CQL features to solve impedance 
mismatches between RPC and query languages, and we should not be redesigning 
Cassandra so that every operation fits into CQL.

We have to face reality: it is going to be quite awkward for clients to 
maintain multiple connection pools, one for Thrift, one for CQL2, one for 
CQL3, one for CQL4, etc. The new transport should be able to piggyback Thrift 
requests somehow, so that a user only needs to maintain a single client 
connection. 

 Make CQL work naturally with wide rows
 --

 Key: CASSANDRA-4815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
 Project: Cassandra
  Issue Type: Wish
Reporter: Edward Capriolo
 Attachments: cql feature set updated.png, table.png


 I find that CQL3 is quite obtuse and does not provide me with a language that 
 is useful for accessing my data. First, let's point out how we should design 
 Cassandra data:
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 4) Optimize for blind writes
 So here is a schema that abides by these tried-and-tested rules, which large 
 production users are employing today. 
 Say we have a table of movie objects:
 Movie
 Name
 Description
 - tags   (string)
 - credits composite(role string, name string )
 -1 likesToday
 -1 blacklisted
 The above structure is a movie. Notice that it holds a mix of static and 
 dynamic columns, but the overall number of columns is not very large (even if 
 it were larger, this would be OK as well). Notice also that this table is not 
 just a single one-to-many relationship: it has one-to-one data and two sets of 
 one-to-many data.
 The schema today is declared something like this:
 create column family movies
   with comparator = UTF8Type
   and key_validation_class = UTF8Type
   and default_validation_class = UTF8Type
   and column_metadata =
   [
 {column_name: blacklisted, validation_class: Int32Type},
 {column_name: likesToday, validation_class: LongType},
 {column_name: description, validation_class: UTF8Type}
   ];
 We should be able to insert data like this:
 set movies['Cassandra Database, not looking for a seQL']['blacklisted']=1;
 set movies['Cassandra Database, not looking for a seQL']['likesToday']=34;
 set movies['Cassandra Database, not looking for a seQL']['credits-dir']='director:asf';
 set movies['Cassandra Database, not looking for a seQL']['credits-jir']='jiraguy:bob';
 set movies['Cassandra Database, not looking for a seQL']['tags-action']='';
 set movies['Cassandra Database, not looking for a seQL']['tags-adventure']='';
 set movies['Cassandra Database, not looking for a seQL']['tags-romance']='';
 set movies['Cassandra Database, not looking for a seQL']['tags-programming']='';
 This is the correct way to do it: one seek finds all the information related 
 to a movie. As long as this row does not get large, there is no reason to 
 optimize by breaking the data into other column families. (Notice that you 
 cannot transpose this, because a movie has two one-to-many relationships of 
 potentially different types.)
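To make the "one seek" argument concrete, here is a minimal Python sketch, not Cassandra code. It models the single `movies` row as a map of column name to value, assumes UTF8Type ordering (approximated by Python's code-point string order), and reads each group of prefixed dynamic columns as one contiguous slice. The in-memory `movies` dict and the `prefix_slice` helper are hypothetical, for illustration only.

```python
from bisect import bisect_left

# Hypothetical in-memory model of the single 'movies' row above:
# row key -> {column name -> value}. Not Cassandra code.
ROW_KEY = "Cassandra Database, not looking for a seQL"
movies = {
    ROW_KEY: {
        "blacklisted": 1,
        "likesToday": 34,
        "credits-dir": "director:asf",
        "credits-jir": "jiraguy:bob",
        "tags-action": "",
        "tags-adventure": "",
        "tags-romance": "",
        "tags-programming": "",
    }
}


def prefix_slice(row: dict, prefix: str) -> list:
    """Return (name, value) pairs whose name starts with `prefix`, in sorted order.

    Models a column slice [prefix, end) over the sorted row, so the group is
    read as one contiguous range instead of scanning every column.
    """
    # Python compares str by code point, which matches UTF-8 byte order.
    names = sorted(row)
    # Exclusive upper bound: the prefix with its last character bumped by one.
    end = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo, hi = bisect_left(names, prefix), bisect_left(names, end)
    return [(name, row[name]) for name in names[lo:hi]]


# One lookup by row key returns the static and dynamic columns together.
row = movies[ROW_KEY]
tags = [name[len("tags-"):] for name, _ in prefix_slice(row, "tags-")]
credit_cols = prefix_slice(row, "credits-")
```

Because all `credits-` and all `tags-` columns sort next to each other, each group is a contiguous range of the same row; that naming convention is what lets two one-to-many relationships share one row.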
 Let's look at the CQL3 way to do this design.
 First, contrary to the original design of Cassandra, CQL does not like wide 
 rows. It also does not have a good way of dealing with dynamic columns together 
 with static columns in the same row.
 You have two options:
 Option 1: lose all schema
 create table movies (name text, column blob, value blob, primary key (name, column)) with compact storage;
 This method is not so hot: we have now lost all our validators. And by the 
 way, you have to physically shut down everything, rename files and recreate 
 your schema if you want to inform Cassandra that a current table should be 
 compact. This could at the very least be just a metadata change. Also, you 
 cannot add column schema either.
 Option 2: normalize (even worse)
 create table movie (name text primary key, description text, likestoday int, blacklisted int);
 create table movecredits (name text, role text, personname text, primary key (name, role));
 create table movetags (name text, tag text, primary key (name, tag));
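A rough way to see the cost of Option 2 is to count partition lookups. The sketch below is a hypothetical in-memory model (the `Table` class and its lookup counter are assumptions, not a driver or storage API) that assembles one movie from the three normalized tables.

```python
from collections import defaultdict


class Table:
    """Hypothetical table: partition key -> rows. Each get() is counted as one seek."""

    def __init__(self):
        self.partitions = defaultdict(list)
        self.lookups = 0

    def insert(self, key, row):
        self.partitions[key].append(row)

    def get(self, key):
        self.lookups += 1
        return list(self.partitions.get(key, []))


movie, movecredits, movetags = Table(), Table(), Table()
name = "Cassandra Database, not looking for a seQL"
movie.insert(name, {"description": None, "likestoday": 34, "blacklisted": 1})
movecredits.insert(name, {"role": "director", "personname": "asf"})
movecredits.insert(name, {"role": "jiraguy", "personname": "bob"})
for tag in ("action", "adventure", "romance", "programming"):
    movetags.insert(name, {"tag": tag})


def load_movie(key):
    """Assemble one movie from the three normalized tables (three partition reads)."""
    return {
        "movie": movie.get(key),
        "credits": movecredits.get(key),
        # Sorted to mimic 'tag' being a clustering column.
        "tags": sorted(r["tag"] for r in movetags.get(key)),
    }


result = load_movie(name)
seeks = sum(t.lookups for t in (movie, movecredits, movetags))  # 3, versus 1 for the wide row
```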
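 This is a terrible design. Of the four key characteristics of how Cassandra 
 data should be designed, it fails three. It does not:
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 Why is Cassandra steering toward this course by making a language that does 
 not understand wide rows?
 So what can be done? My suggestions: 
 Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a 
 virtual view that is compact storage, with no work to migrate data and 
 recreate schemas. Every table should have a compact view for schemaless 
 access, or a simple query hint like /*transposed*/ should make this change.
 Metadata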

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-29 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486158#comment-13486158
 ] 

Jonathan Ellis commented on CASSANDRA-4815:
---

I'm still not sure we're on the same page as far as GET and SET go.

I'm saying that functionally, if you have

{code}
create column family test;
{code}

(all cli defaults -- everything is bytes), then

{code}
set test['ff']['dd'] = 'cc';
{code}

in the cli (translation to Thrift left as an exercise for the reader) is 
EXACTLY the same as

{code}
insert into test(key, column1, value) values ('ff', 'dd', 'cc');
{code}

in cql.

If you think we're missing functionality here then let's clear that up.  But if 
you're hung up on the syntax then we'll have to agree to disagree.
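The equivalence can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the real code paths: it assumes that, with BytesType everywhere, both the cli and CQL3 of this era read the quoted literals as hex, and that `key`, `column1` and `value` are the CQL3 names for the row key, cell name and cell value. `write_cell`, `cli_set` and `cql_insert` are hypothetical helpers.

```python
# Hypothetical model of the all-BytesType 'test' column family:
# row key -> {cell name -> cell value}, all raw bytes. Not Cassandra code.


def write_cell(store: dict, row_key: bytes, name: bytes, value: bytes) -> None:
    """The single storage-level write that both front ends reduce to."""
    store.setdefault(row_key, {})[name] = value


def cli_set(store: dict, key: str, column: str, value: str) -> None:
    """Models: set test[key][column] = value;  (BytesType literals are hex)."""
    write_cell(store, bytes.fromhex(key), bytes.fromhex(column), bytes.fromhex(value))


def cql_insert(store: dict, row: dict) -> None:
    """Models: insert into test(key, column1, value) values (...);

    'key' is the row key, 'column1' the cell name and 'value' the cell value.
    """
    write_cell(store, bytes.fromhex(row["key"]), bytes.fromhex(row["column1"]),
               bytes.fromhex(row["value"]))


store_cli: dict = {}
store_cql: dict = {}
cli_set(store_cli, "ff", "dd", "cc")
cql_insert(store_cql, {"key": "ff", "column1": "dd", "value": "cc"})
assert store_cli == store_cql == {b"\xff": {b"\xdd": b"\xcc"}}
```

The point of the model is that only the front-end syntax differs; both calls end in the same `write_cell`.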

 Make CQL work naturally with wide rows
 --

 Key: CASSANDRA-4815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
 Project: Cassandra
  Issue Type: Wish
Reporter: Edward Capriolo
 Attachments: cql feature set updated.png, table.png


 I find that CQL3 is quite obtuse and does not provide me a language useful 
 for accessing my data. First, lets point out how we should design Cassandra 
 data. 
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 4) optimize for blind writes
 So here is a schema that abides by these tried and tested rules large 
 production uses are employing today. 
 Say we have a table of movie objects:
 Movie
 Name
 Description
 - tags   (string)
 - credits composite(role string, name string )
 -1 likesToday
 -1 blacklisted
 The above structure is a movie notice it hold a mix of static and dynamic 
 columns, but the other all number of columns is not very large. (even if it 
 was larger this is OK as well) Notice this table is not just 
 a single one to many relationship, it has 1 to 1 data and it has two sets of 
 1 to many data.
 The schema today is declared something like this:
 create column family movies
 with default_comparator=UTF8Type and
   column_metadata =
   [
 {column_name: blacklisted, validation_class: int},
 {column_name: likestoday, validation_class: long},
 {column_name: description, validation_class: UTF8Type}
   ];
 We should be able to insert data like this:
 set ['Cassandra Database, not looking for a seQL']['blacklisted']=1;
 set ['Cassandra Database, not looking for a seQL']['likesToday']=34;
 set ['Cassandra Database, not looking for a 
 seQL']['credits-dir']='director:asf';
 set ['Cassandra Database, not looking for a 
 seQL']['credits-jir]='jiraguy:bob';
 set ['Cassandra Database, not looking for a seQL']['tags-action']='';
 set ['Cassandra Database, not looking for a seQL']['tags-adventure']='';
 set ['Cassandra Database, not looking for a seQL']['tags-romance']='';
 set ['Cassandra Database, not looking for a seQL']['tags-programming']='';
 This is the correct way to do it. 1 seek to find all the information related 
 to a movie. As long as this row does
 not get large there is no reason to optimize by breaking data into other 
 column families. (Notice you can not transpose this
 because movies is two 1-to-many relationships of potentially different types)
 Lets look at the CQL3 way to do this design:
 First, contrary to the original design of cassandra CQL does not like wide 
 rows. It also does not have a good way to dealing with dynamic rows together 
 with static rows either.
 You have two options:
 Option 1: lose all schema
 create table movies ( name string, column blob, value blob, primary 
 key(name)) with compact storage.
 This method is not so hot we have not lost all our validators, and by the way 
 you have to physically shutdown everything and rename files and recreate your 
 schema if you want to inform cassandra that a current table should be 
 compact. This could at very least be just a metadata change. Also you can not 
 add column schema either.
 Option 2  Normalize (is even worse)
 create table movie (name String, description string, likestoday int, 
 blacklisted int);
 create table movecredits( name string, role string, personname string, 
 primary key(name,role) );
 create table movetags( name string, tag string, primary key (name,tag) );
 This is a terrible design, of the 4 key characteristics how cassandra data 
 should be designed it fails 3:
 It does not:
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 Why is Cassandra steering toward this course, by making a language that does 
 not understand wide rows?
 So what can be done? My suggestions: 
 Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a 
 virtual view that is compact storage with no work to migrate data and 
 recreate schemas. Every table should have a compact view for the schemaless, 
 or a simple query hint like /*transposed*/ should make this change.
 Metadata 

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-23 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482161#comment-13482161
 ] 

Sylvain Lebresne commented on CASSANDRA-4815:
-

bq. Now to insert into this table I need to format everything as hex

Not for prepared statements, where all the values will be in binary. What I 
mean here is that, as far as CQL-the-language is concerned, you can absolutely 
use it to think of Cassandra as a memcache (in fact, I'd say that the hard part 
would be to think of Cassandra as a relational database, because it's not a 
relational database, at least not a full-blown one, and CQL doesn't change that).

Now if the remark is that it's less convenient to work with blobs in cqlsh than 
it was with the cli, then I can agree to that and I'm fine trying to fix it, 
but let's maybe keep that to CASSANDRA-3799. 
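The difference between inlined and bound values can be sketched as follows. Assumptions: the `schemaless` table name is hypothetical, the `0x`-prefixed hex blob constant is the form used by later CQL3 versions, and no real driver API is shown; the helpers only build a string or a (statement, parameters) tuple.

```python
def inline_blob_literal(value: bytes) -> str:
    """Render bytes as a blob constant embedded in the query text.

    Assumes the 0x-prefixed hexadecimal blob constant form; older CQL3
    versions may have expected a quoted hex string instead.
    """
    return "0x" + value.hex()


def build_inline_insert(key: bytes, column: bytes, value: bytes) -> str:
    """Non-prepared path: every value must be hex-encoded into the statement string."""
    return "INSERT INTO schemaless (key, column, value) VALUES (%s, %s, %s)" % (
        inline_blob_literal(key),
        inline_blob_literal(column),
        inline_blob_literal(value),
    )


def build_prepared_insert(key: bytes, column: bytes, value: bytes):
    """Prepared path: the statement has bind markers; values travel as raw bytes."""
    return "INSERT INTO schemaless (key, column, value) VALUES (?, ?, ?)", (key, column, value)
```

Only the inline path ever needs hex; in the prepared path the bytes are passed through untouched.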

 Make CQL work naturally with wide rows
 --

 Key: CASSANDRA-4815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
 Project: Cassandra
  Issue Type: Wish
Reporter: Edward Capriolo

 I find that CQL3 is quite obtuse and does not provide me a language useful 
 for accessing my data. First, lets point out how we should design Cassandra 
 data. 
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 4) optimize for blind writes
 So here is a schema that abides by these tried and tested rules large 
 production uses are employing today. 
 Say we have a table of movie objects:
 Movie
 Name
 Description
 - tags   (string)
 - credits composite(role string, name string )
 -1 likesToday
 -1 blacklisted
 The above structure is a movie notice it hold a mix of static and dynamic 
 columns, but the other all number of columns is not very large. (even if it 
 was larger this is OK as well) Notice this table is not just 
 a single one to many relationship, it has 1 to 1 data and it has two sets of 
 1 to many data.
 The schema today is declared something like this:
 create column family movies
 with default_comparator=UTF8Type and
   column_metadata =
   [
 {column_name: blacklisted, validation_class: int},
 {column_name: likestoday, validation_class: long},
 {column_name: description, validation_class: UTF8Type}
   ];
 We should be able to insert data like this:
 set ['Cassandra Database, not looking for a seQL']['blacklisted']=1;
 set ['Cassandra Database, not looking for a seQL']['likesToday']=34;
 set ['Cassandra Database, not looking for a 
 seQL']['credits-dir']='director:asf';
 set ['Cassandra Database, not looking for a 
 seQL']['credits-jir]='jiraguy:bob';
 set ['Cassandra Database, not looking for a seQL']['tags-action']='';
 set ['Cassandra Database, not looking for a seQL']['tags-adventure']='';
 set ['Cassandra Database, not looking for a seQL']['tags-romance']='';
 set ['Cassandra Database, not looking for a seQL']['tags-programming']='';
 This is the correct way to do it. 1 seek to find all the information related 
 to a movie. As long as this row does
 not get large there is no reason to optimize by breaking data into other 
 column families. (Notice you can not transpose this
 because movies is two 1-to-many relationships of potentially different types)
 Lets look at the CQL3 way to do this design:
 First, contrary to the original design of cassandra CQL does not like wide 
 rows. It also does not have a good way to dealing with dynamic rows together 
 with static rows either.
 You have two options:
 Option 1: lose all schema
 create table movies ( name string, column blob, value blob, primary 
 key(name)) with compact storage.
 This method is not so hot we have not lost all our validators, and by the way 
 you have to physically shutdown everything and rename files and recreate your 
 schema if you want to inform cassandra that a current table should be 
 compact. This could at very least be just a metadata change. Also you can not 
 add column schema either.
 Option 2  Normalize (is even worse)
 create table movie (name String, description string, likestoday int, 
 blacklisted int);
 create table movecredits( name string, role string, personname string, 
 primary key(name,role) );
 create table movetags( name string, tag string, primary key (name,tag) );
 This is a terrible design, of the 4 key characteristics how cassandra data 
 should be designed it fails 3:
 It does not:
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 Why is Cassandra steering toward this course, by making a language that does 
 not understand wide rows?
 So what can be done? My suggestions: 
 Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a 
 virtual view that is compact storage with no work to migrate data and 
 recreate schemas. Every table should have a compact view for the schemaless, 
 or a simple query hint like /*transposed*/ should 

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-23 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482439#comment-13482439
 ] 

Edward Capriolo commented on CASSANDRA-4815:


@Jonathan agreed. My main concern is that a user can set schema-less columns. 
This currently looks possible with compact storage tables but not possible with 
non-compact storage tables. 

I attached that table to show what features a user has with one client vs the 
other. I am not necessarily arguing that CQL should have a given feature in 
that table. I was only trying to show that, depending on the client, users 
have access to some features and not others. 

Also, I wanted to highlight how all the different clients have strengths and 
deficiencies. Internally I have to sell things to people, and I just wanted to 
show that CQL and cqlsh are weak compared to the CLI for schema-less columns. 

For example, my QA person learned how set and assume worked in the CLI, and 
the CLI had functions like ascii(). These things are missing, and people are 
affected. 

So my table is not meant to list all the things CQL should support, just to 
show the reality users are faced with.
 
 

 Make CQL work naturally with wide rows
 --

 Key: CASSANDRA-4815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
 Project: Cassandra
  Issue Type: Wish
Reporter: Edward Capriolo
 Attachments: cql feature set updated.png, table.png


 I find that CQL3 is quite obtuse and does not provide me a language useful 
 for accessing my data. First, lets point out how we should design Cassandra 
 data. 
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 4) optimize for blind writes
 So here is a schema that abides by these tried and tested rules large 
 production uses are employing today. 
 Say we have a table of movie objects:
 Movie
 Name
 Description
 - tags   (string)
 - credits composite(role string, name string )
 -1 likesToday
 -1 blacklisted
 The above structure is a movie notice it hold a mix of static and dynamic 
 columns, but the other all number of columns is not very large. (even if it 
 was larger this is OK as well) Notice this table is not just 
 a single one to many relationship, it has 1 to 1 data and it has two sets of 
 1 to many data.
 The schema today is declared something like this:
 create column family movies
 with default_comparator=UTF8Type and
   column_metadata =
   [
 {column_name: blacklisted, validation_class: int},
 {column_name: likestoday, validation_class: long},
 {column_name: description, validation_class: UTF8Type}
   ];
 We should be able to insert data like this:
 set ['Cassandra Database, not looking for a seQL']['blacklisted']=1;
 set ['Cassandra Database, not looking for a seQL']['likesToday']=34;
 set ['Cassandra Database, not looking for a 
 seQL']['credits-dir']='director:asf';
 set ['Cassandra Database, not looking for a 
 seQL']['credits-jir]='jiraguy:bob';
 set ['Cassandra Database, not looking for a seQL']['tags-action']='';
 set ['Cassandra Database, not looking for a seQL']['tags-adventure']='';
 set ['Cassandra Database, not looking for a seQL']['tags-romance']='';
 set ['Cassandra Database, not looking for a seQL']['tags-programming']='';
 This is the correct way to do it. 1 seek to find all the information related 
 to a movie. As long as this row does
 not get large there is no reason to optimize by breaking data into other 
 column families. (Notice you can not transpose this
 because movies is two 1-to-many relationships of potentially different types)
 Lets look at the CQL3 way to do this design:
 First, contrary to the original design of cassandra CQL does not like wide 
 rows. It also does not have a good way to dealing with dynamic rows together 
 with static rows either.
 You have two options:
 Option 1: lose all schema
 create table movies ( name string, column blob, value blob, primary 
 key(name)) with compact storage.
 This method is not so hot we have not lost all our validators, and by the way 
 you have to physically shutdown everything and rename files and recreate your 
 schema if you want to inform cassandra that a current table should be 
 compact. This could at very least be just a metadata change. Also you can not 
 add column schema either.
 Option 2  Normalize (is even worse)
 create table movie (name String, description string, likestoday int, 
 blacklisted int);
 create table movecredits( name string, role string, personname string, 
 primary key(name,role) );
 create table movetags( name string, tag string, primary key (name,tag) );
 This is a terrible design, of the 4 key characteristics how cassandra data 
 should be designed it fails 3:
 It does not:
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 Why is Cassandra steering toward 

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-23 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482445#comment-13482445
 ] 

Edward Capriolo commented on CASSANDRA-4815:


@Jonathan
I disagree with the attachment. For slicing composites in CQL3 you have 'YES'. 
But with compact storage we can only slice based on the first component of the 
composite. This is why I said 'KINDA': a composite might be in a very wide 
row. Thus, if 100,000 columns have 5 as the first component of the composite 
and the second component has high cardinality, that cannot be sliced 
effectively. 

The way I would put it: CQL3 can effectively slice the composites it created 
in schema-full tables, but CQL3 can slice only on the first column of a 
composite in a schema-less table.
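The scenario can be illustrated with a small Python model of a wide row whose column names are (first, second) composites, sorted component by component. It is a sketch, not CQL3 behavior: `slice_first_only` stands for a slice restricted to the first component followed by client-side filtering, and `slice_both` for a contiguous storage-level slice from (5, 500) to (5, 510). Both helpers are hypothetical, and integer components are assumed.

```python
from bisect import bisect_left

# Hypothetical wide row whose column names are (first, second) composites, kept
# in comparator order: by first component, then by second.
# 100,000 columns share the first component 5, as in the scenario above.
columns = sorted([(5, i) for i in range(100_000)] + [(4, 0), (6, 0)])


def slice_first_only(cols, first):
    """Every column whose first component equals `first` (a one-component slice)."""
    lo = bisect_left(cols, (first,))      # (first,) sorts before any (first, x)
    hi = bisect_left(cols, (first + 1,))  # assumes integer components
    return cols[lo:hi]


def slice_both(cols, first, second_lo, second_hi):
    """Contiguous slice from (first, second_lo) up to (first, second_hi), exclusive."""
    lo = bisect_left(cols, (first, second_lo))
    hi = bisect_left(cols, (first, second_hi))
    return cols[lo:hi]


# Goal: first component 5, second component in [500, 510).
read_then_filter = slice_first_only(columns, 5)  # pulls all 100,000 columns
wanted = [c for c in read_then_filter if 500 <= c[1] < 510]
direct = slice_both(columns, 5, 500, 510)        # touches only the 10 it needs
```

Both approaches return the same ten columns; the difference is how much of the row has to be read to get them.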

Sylvain agreed with this above:
{quote}It's possibly nitpicking, but I would talk of a difficulty in properly 
paginating composites. But yes, that's one of the very few things that CQL3 is 
not currently very good at. But we'll fix it (and the good thing about having a 
query language is that it will be trivial to fix it without a backward 
incompatible breaking change). {quote}

 Make CQL work naturally with wide rows
 --

 Key: CASSANDRA-4815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
 Project: Cassandra
  Issue Type: Wish
Reporter: Edward Capriolo
 Attachments: cql feature set updated.png, table.png


 I find that CQL3 is quite obtuse and does not provide me a language useful 
 for accessing my data. First, lets point out how we should design Cassandra 
 data. 
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 4) optimize for blind writes
 So here is a schema that abides by these tried and tested rules large 
 production uses are employing today. 
 Say we have a table of movie objects:
 Movie
 Name
 Description
 - tags   (string)
 - credits composite(role string, name string )
 -1 likesToday
 -1 blacklisted
 The above structure is a movie notice it hold a mix of static and dynamic 
 columns, but the other all number of columns is not very large. (even if it 
 was larger this is OK as well) Notice this table is not just 
 a single one to many relationship, it has 1 to 1 data and it has two sets of 
 1 to many data.
 The schema today is declared something like this:
 create column family movies
 with default_comparator=UTF8Type and
   column_metadata =
   [
 {column_name: blacklisted, validation_class: int},
 {column_name: likestoday, validation_class: long},
 {column_name: description, validation_class: UTF8Type}
   ];
 We should be able to insert data like this:
 set ['Cassandra Database, not looking for a seQL']['blacklisted']=1;
 set ['Cassandra Database, not looking for a seQL']['likesToday']=34;
 set ['Cassandra Database, not looking for a 
 seQL']['credits-dir']='director:asf';
 set ['Cassandra Database, not looking for a 
 seQL']['credits-jir]='jiraguy:bob';
 set ['Cassandra Database, not looking for a seQL']['tags-action']='';
 set ['Cassandra Database, not looking for a seQL']['tags-adventure']='';
 set ['Cassandra Database, not looking for a seQL']['tags-romance']='';
 set ['Cassandra Database, not looking for a seQL']['tags-programming']='';
 This is the correct way to do it. 1 seek to find all the information related 
 to a movie. As long as this row does
 not get large there is no reason to optimize by breaking data into other 
 column families. (Notice you can not transpose this
 because movies is two 1-to-many relationships of potentially different types)
 Lets look at the CQL3 way to do this design:
 First, contrary to the original design of cassandra CQL does not like wide 
 rows. It also does not have a good way to dealing with dynamic rows together 
 with static rows either.
 You have two options:
 Option 1: lose all schema
 create table movies ( name string, column blob, value blob, primary 
 key(name)) with compact storage.
 This method is not so hot we have not lost all our validators, and by the way 
 you have to physically shutdown everything and rename files and recreate your 
 schema if you want to inform cassandra that a current table should be 
 compact. This could at very least be just a metadata change. Also you can not 
 add column schema either.
 Option 2  Normalize (is even worse)
 create table movie (name String, description string, likestoday int, 
 blacklisted int);
 create table movecredits( name string, role string, personname string, 
 primary key(name,role) );
 create table movetags( name string, tag string, primary key (name,tag) );
 This is a terrible design, of the 4 key characteristics how cassandra data 
 should be designed it fails 3:
 It does not:
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 Why is Cassandra steering toward this 

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-23 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482464#comment-13482464
 ] 

Sylvain Lebresne commented on CASSANDRA-4815:
-

Though there is some truth to the claim that slices on composites currently 
have a few limitations in CQL3, "CQL3 can slice only on the first column of 
composite" is not true either (regardless of whether the table is schema-less 
or compact/non-compact). I've just created CASSANDRA-4851 to lift that 
limitation (as I explain in that ticket, you can slice on any component, not 
only the first one, but you cannot slice on several components simultaneously, 
which imo is only useful for pagination in real life). Nevertheless, it is a 
current limitation, and let it be clear that we intend to fix it.
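Why a simultaneous restriction on several components matters mostly for pagination can be sketched with tuple comparison: resuming after the last column seen needs the bound (a, b) > (x, y) on the whole composite, which is not the same as restricting each component separately. `page_after` is a hypothetical helper over an in-memory sorted list, not a CQL3 feature.

```python
from bisect import bisect_right


def page_after(cols, last_seen, page_size):
    """Next page of a sorted composite row, starting strictly after `last_seen`.

    The bound compares whole tuples, (a, b) > (x, y), so a page can cross from
    one first-component value to the next.
    """
    start = 0 if last_seen is None else bisect_right(cols, last_seen)
    return cols[start:start + page_size]


cols = sorted((a, b) for a in range(3) for b in range(3))
pages, last = [], None
while True:
    page = page_after(cols, last, 4)
    if not page:
        break
    pages.append(page)
    last = page[-1]

# Restricting each component separately after (1, 0) silently skips (2, 0).
naive = [c for c in cols if c[0] >= 1 and c[1] > 0]
```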

I have the feeling that there is a misunderstanding, in that some seem to 
believe that we intend to limit the possible use cases for Cassandra with CQL3. 
That is absolutely not the case. In fact, aside from CASSANDRA-4851 (which I 
think is fairly specific) and creating a secondary index on a specific column 
of a wide row (a feature that I've only ever seen one person use, and even he 
agrees that it was kind of a hack, and for which CASSANDRA-3782 is open 
nonetheless), I'm not aware of any use cases that Thrift supports but CQL3 
doesn't. And when I say that, I'm including things as dynamic as using 
DynamicCompositeType (which I don't particularly encourage anyone to use btw; 
I'm still looking for a compelling use case where it is truly necessary). That 
is, CQL3 doesn't provide any nice syntax to work with DynamicCompositeType, but 
you can still use it the same way you do in Thrift (the syntax will be about 
as convenient as in Thrift, that is, not very convenient at all, but you can do 
it and it's not worse than in Thrift).

 Make CQL work naturally with wide rows
 --

 Key: CASSANDRA-4815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
 Project: Cassandra
  Issue Type: Wish
Reporter: Edward Capriolo
 Attachments: cql feature set updated.png, table.png


 I find that CQL3 is quite obtuse and does not provide me a language useful 
 for accessing my data. First, lets point out how we should design Cassandra 
 data. 
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 4) optimize for blind writes
 So here is a schema that abides by these tried and tested rules large 
 production uses are employing today. 
 Say we have a table of movie objects:
 Movie
 Name
 Description
 - tags   (string)
 - credits composite(role string, name string )
 -1 likesToday
 -1 blacklisted
 The above structure is a movie notice it hold a mix of static and dynamic 
 columns, but the other all number of columns is not very large. (even if it 
 was larger this is OK as well) Notice this table is not just 
 a single one to many relationship, it has 1 to 1 data and it has two sets of 
 1 to many data.
 The schema today is declared something like this:
 create column family movies
 with default_comparator=UTF8Type and
   column_metadata =
   [
 {column_name: blacklisted, validation_class: int},
 {column_name: likestoday, validation_class: long},
 {column_name: description, validation_class: UTF8Type}
   ];
 We should be able to insert data like this:
 set ['Cassandra Database, not looking for a seQL']['blacklisted']=1;
 set ['Cassandra Database, not looking for a seQL']['likesToday']=34;
 set ['Cassandra Database, not looking for a 
 seQL']['credits-dir']='director:asf';
 set ['Cassandra Database, not looking for a 
 seQL']['credits-jir]='jiraguy:bob';
 set ['Cassandra Database, not looking for a seQL']['tags-action']='';
 set ['Cassandra Database, not looking for a seQL']['tags-adventure']='';
 set ['Cassandra Database, not looking for a seQL']['tags-romance']='';
 set ['Cassandra Database, not looking for a seQL']['tags-programming']='';
 This is the correct way to do it. 1 seek to find all the information related 
 to a movie. As long as this row does
 not get large there is no reason to optimize by breaking data into other 
 column families. (Notice you can not transpose this
 because movies is two 1-to-many relationships of potentially different types)
 Lets look at the CQL3 way to do this design:
 First, contrary to the original design of cassandra CQL does not like wide 
 rows. It also does not have a good way to dealing with dynamic rows together 
 with static rows either.
 You have two options:
 Option 1: lose all schema
 create table movies ( name string, column blob, value blob, primary 
 key(name)) with compact storage.
 This method is not so hot we have not lost all our validators, and by the way 
 you have to physically shutdown everything and rename files and recreate your 
 schema if you want to inform cassandra that a current 

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-22 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481264#comment-13481264
 ] 

Sylvain Lebresne commented on CASSANDRA-4815:
-


bq. Can I create a schema less table?

Yes. The following as-schemaless-as-can-possibly-be thrift/cli definition:
{noformat}
create column family schemaless
  with key_validation_class = BytesType
  and comparator = BytesType
  and default_validation_class = BytesType;
{noformat}
is *equivalent* to the following CQL3 definition
{noformat}
CREATE TABLE schemaless (
  key blob,
  column blob,
  value blob,
  PRIMARY KEY (key, column)
) WITH COMPACT STORAGE
{noformat}
And to be clear, when I say equivalent, I mean equivalent. If you create the 
first definition above, you can use the column family in CQL3 as if it was 
defined by the second definition (as in, you don't have to do the CREATE TABLE 
itself), or you can create the table in CQL3 first with the second query and 
query it in thrift exactly as if it had been created by the first definition.

The composite primary key is what tells CQL3 that it's a transposed wide CF. 
In other words, in CQL3, 'key' will map to the row key, 'column' will map to 
the internal column name and 'value' will map to the internal column value. 
Note that 'key', 'column' and 'value' are the default names that CQL3 picks 
for you when you haven't explicitly defined user-friendlier ones (in other 
words, when you upgrade from thrift). CASSANDRA-4822 is open to allow you to 
rename those default names to more user-friendly ones if you so wish (and to 
be clear, doing so has no impact whatsoever on what is stored; it just 
declares the new names as CQL3 metadata).
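The mapping described here can be sketched as a pair of pure functions. Assumptions: internal rows are modeled as dicts of column name to value, and the default names `key`, `column` and `value` are used as in this comment (Jonathan's cli/CQL comparison in this thread uses `column1` instead). `transpose_wide_row` and `untranspose` are hypothetical helpers, not Cassandra functions.

```python
def transpose_wide_row(row_key, cells):
    """One internal wide row -> CQL3 rows, one per cell, ordered by cell name."""
    return [{"key": row_key, "column": name, "value": value}
            for name, value in sorted(cells.items())]


def untranspose(cql_rows):
    """Inverse mapping: group CQL3 rows back into internal wide rows."""
    storage = {}
    for r in cql_rows:
        storage.setdefault(r["key"], {})[r["column"]] = r["value"]
    return storage


# Hypothetical BytesType data: one internal row with two cells.
wide = {b"k1": {b"\x02": b"b", b"\x01": b"a"}}
rows = [r for key, cells in wide.items() for r in transpose_wide_row(key, cells)]
```

The round trip is lossless, which is the sense in which the two definitions are equivalent: renaming the columns (as CASSANDRA-4822 proposes) would change only the dict keys on the CQL3 side.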

bq. I guess this is slightly more difficult to express composite slices.

It's possibly nitpicking, but I would talk of a difficulty in properly 
paginating composites. But yes, that's one of the very few things that CQL3 is 
not currently very good at. But we'll fix it (and the good thing about having a 
query language is that it will be trivial to fix it without a 
backward-incompatible change). That being said, I do believe that once you 
start working through real-life examples, it's not really a blocker. Most of the 
time, when you use composites in real life, you want to slice over one of the 
components, which works fine. That's why it's really more a problem for slightly 
more complex pagination over composite wide rows. There is also CASSANDRA-4415, 
which will remove the need for a good part of the manual pagination people do 
right now.
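The manual pagination mentioned above usually means remembering the last composite cell name returned and asking for the next slice strictly after it. A Python sketch of that pattern, assuming cell names are modelled as tuples (tuple comparison goes component by component, like a composite comparator, and for UTF-8 text code-point order matches byte order). `next_page` and `all_pages` are hypothetical helpers, not a Cassandra or driver API.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# A composite cell name such as (role, name), compared component by component.
Composite = Tuple[str, str]


def next_page(names: List[Composite], last_seen: Optional[Composite],
              page_size: int) -> List[Composite]:
    """Return up to page_size names strictly after last_seen.

    names must already be sorted, as the cells of a wide row are.
    """
    start = 0 if last_seen is None else bisect_right(names, last_seen)
    return names[start:start + page_size]


def all_pages(names: List[Composite], page_size: int) -> List[List[Composite]]:
    """Walk the whole row page by page, restarting after the last name seen."""
    pages: List[List[Composite]] = []
    last: Optional[Composite] = None
    while True:
        page = next_page(names, last, page_size)
        if not page:
            return pages
        pages.append(page)
        last = page[-1]
```

The point of restarting from the full last-seen tuple, rather than from its first component only, is that a page boundary may fall in the middle of a group of cells sharing the same leading component.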

bq. If we have an old style schema don't we need to be able to alter a current 
table.

As explained above, thrift CFs *are* directly accessible from CQL3 (without 
any redefinition, and that's why trying to create the table in CQL3 is not 
legal). However, you won't get nice column names if you do so (but rather the 
'key', 'column' and 'value' generic names above). Again, CASSANDRA-4822 will 
allow you to declare nice names without having to do complex operations (like 
trashing your thrift schema so that CQL3 allows the redefinition).

bq. What is going to happen if Cassandra and the CQL language actually adds 
true composite row keys?

It does already: CASSANDRA-4179. You just declare:
{noformat}
PRIMARY KEY ((id_part1, id_part2), tag_name)
{noformat}
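In that declaration the inner parentheses make (id_part1, id_part2) the row (partition) key, while tag_name stays a clustering column that orders cells inside each row. The grouping can be modelled in Python as below; this only illustrates the semantics under that reading, it is not how Cassandra serializes the key, and `group_by_partition` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One CQL3 row of a table declared with
#   PRIMARY KEY ((id_part1, id_part2), tag_name)
# written as (id_part1, id_part2, tag_name).
Row = Tuple[str, str, str]


def group_by_partition(rows: List[Row]) -> Dict[Tuple[str, str], List[str]]:
    """Group rows by the composite row key; tag names are kept in sorted order."""
    partitions: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for part1, part2, tag in rows:
        partitions[(part1, part2)].append(tag)
    return {pk: sorted(tags) for pk, tags in partitions.items()}
```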


 Make CQL work naturally with wide rows
 --

 Key: CASSANDRA-4815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
 Project: Cassandra
  Issue Type: Wish
Reporter: Edward Capriolo

 I find that CQL3 is quite obtuse and does not provide me a language useful 
 for accessing my data. First, let's point out how we should design Cassandra 
 data. 
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 4) optimize for blind writes
 So here is a schema that abides by these tried and tested rules, which large 
 production users are employing today. 
 Say we have a table of movie objects:
 Movie
 Name
 Description
 - tags   (string)
 - credits composite(role string, name string )
 -1 likesToday
 -1 blacklisted
 The above structure is a movie; notice it holds a mix of static and dynamic 
 columns, but the overall number of columns is not very large (even if it 
 were larger, this would be OK as well). Notice this table is not just 
 a single one-to-many relationship: it has 1-to-1 data and two sets of 
 1-to-many data.
 The schema today is declared something like this:
 create column family movies
 with default_comparator=UTF8Type and
   column_metadata =
   [
 {column_name: blacklisted, validation_class: int},
 {column_name: likestoday, validation_class: long},
 {column_name: description, validation_class: UTF8Type}
   ];
 We should be able to insert data like this:
 set ['Cassandra Database, not looking for a seQL']['blacklisted']=1;
 set ['Cassandra 

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-22 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481506#comment-13481506
 ] 

Jonathan Ellis commented on CASSANDRA-4815:
---

These are good questions and I wanted to reach a wider audience than the Jira 
followers, so I wrote a blog post to address the questions here: 
http://www.datastax.com/dev/blog/cql3-for-cassandra-experts

Please let me know if that clarifies things.

(Note that all the examples there work in 1.1 as well as 1.2, with the 
exception of the cql3 CREATE and ALTER for song_tags.  Those require 1.2.  All 
the pasted output is in fact from 1.1.)


[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-22 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481513#comment-13481513
 ] 

Jonathan Ellis commented on CASSANDRA-4815:
---

To add to that, if you want mostly static columns but some schemaless ones, 
you can throw the schemaless ones in a Map.  This will NOT be easily 
accessible from Thrift -- but it's a good example of the kinds of things that 
cql3 makes easier.
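As a rough illustration of that suggestion, the movie from the issue description could keep its fixed fields as regular columns and push the dynamic tags and credits into a single map column (in CQL3 terms, a column declared as something like `map<text, text>`). The Python dataclass below only models that row shape; `Movie` and `extras` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Movie:
    """Hypothetical shape of one CQL3 row: static columns plus one map column."""
    name: str                 # row key
    description: str = ""
    likes_today: int = 0
    blacklisted: int = 0
    # The schemaless part lives in one map-typed column instead of free-form cells.
    extras: Dict[str, str] = field(default_factory=dict)


movie = Movie(name="Cassandra Database, not looking for a seQL",
              likes_today=34, blacklisted=1)
movie.extras["credits-dir"] = "director:asf"
movie.extras["tags-action"] = ""
```

Everything for the movie still lives in one row, which is the single-seek property the issue description asks for.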

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-22 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481525#comment-13481525
 ] 

Jonathan Ellis commented on CASSANDRA-4815:
---

bq. What does this mean? 'We can do that'. If we have an old style schema don't 
we need to be able to alter a current table.

Only if you want to add meaningful names.  That's what this next part is saying:

If we simply use the old schema directly as-is, Cassandra will give cell names 
and values autogenerated CQL3 names: column1, column2, and so forth. Here I’m 
accessing the data inserted earlier from CQL2, but with cqlsh --cql3:

{noformat}
SELECT * FROM song_tags;

 id                                   | column1 | value
--------------------------------------+---------+-------
 8a172618-b121-4136-bb10-f665cfc469eb |    2007 |
 8a172618-b121-4136-bb10-f665cfc469eb |  covers |
 a3e64f8f-bd44-4f28-b8d9-6938726e34d4 |    1973 |
 a3e64f8f-bd44-4f28-b8d9-6938726e34d4 |   blues |
{noformat}
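The naming rule described here (column1, column2, and so forth for the cell-name components, plus value for the cell value) is simple enough to state as code. This is a reconstruction from the text above for illustration, not Cassandra's implementation; `default_cql3_names` is a hypothetical helper, and the row-key column name (id in the output) is left out because it comes from the existing key alias.

```python
from typing import List


def default_cql3_names(cell_name_components: int) -> List[str]:
    """Autogenerated CQL3 names for a thrift CF used as-is.

    Each component of the cell name (comparator) gets column1, column2, ...,
    and the cell value is exposed as 'value'.
    """
    return [f"column{i}" for i in range(1, cell_name_components + 1)] + ["value"]
```

With a single-component comparator, as for song_tags, this yields `column1` and `value`, matching the header of the output above.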

... that said, as Sylvain points out we do have CASSANDRA-4822 open to allow 
changing those default names without dropping and recreating the table 
definition.

bq. Does it make sense to implement CLI like SET and GET?

Not in CQL-the-language, and I don't think even in cqlsh-the-utility.  I 
understand the appeal of the convenience, but the abstraction leakage it would 
introduce threatens to undo all the work we're doing to make CQL3 something you 
can use on its own terms.

(As far as performance goes, prepared statements make the length of the string 
being parsed initially a non-issue.)

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481793#comment-13481793
 ] 

Edward Capriolo commented on CASSANDRA-4815:


{noformat}Not in CQL-the-language, and I don't think even in cqlsh-the-utility. 
I understand the appeal of the convenience, but the abstraction leakage it 
would introduce threatens to undo all the work we're doing to make CQL3 
something you can use on its own terms.{noformat}

But what if I want to think of Cassandra as a memcache, not a relational 
database? This is one of my ticket points: CQL should support all the use cases 
it can. You are calling it abstraction leakage, but I think of it as a natural 
way of working with Cassandra. But I do agree that SELECTs are better than 
cli 'get' in most cases. What is missing is the SET side. 

{noformat}
CREATE TABLE schemaless (
  key blob,
  column blob,
  value blob,
  PRIMARY KEY (key, column)
) WITH COMPACT STORAGE
{noformat}

Now to insert into this table I need to format everything as hex:
{noformat}
INSERT INTO SCHEMALESS (key,column,value) VALUES ('HEX','HEX','HEX');
{noformat}
The CLI has many useful functions like ascii(' ') or utf8(' '). ASSUME does 
not seem to have an effect here. This is discussed in CASSANDRA-3799.
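Until such conversion functions exist, the hex has to be produced on the client side. A Python sketch of that step, assuming UTF-8 text and the quoted-hex form used in the statement above (later CQL3 releases write blob constants as unquoted `0x...` literals instead, so check the version you run). `to_blob_hex` and `insert_statement` are hypothetical helpers; a real client should bind values through a driver rather than build strings.

```python
def to_blob_hex(text: str, encoding: str = "utf-8") -> str:
    """Encode text to bytes and return the hex digits for a blob value."""
    return text.encode(encoding).hex()


def insert_statement(key: str, column: str, value: str) -> str:
    """Build the INSERT above with every part hex-encoded, in the quoted form it uses."""
    k, c, v = (to_blob_hex(part) for part in (key, column, value))
    return f"INSERT INTO schemaless (key, column, value) VALUES ('{k}', '{c}', '{v}');"
```

For instance, `insert_statement("ed", "age", "30")` produces `VALUES ('6564', '616765', '3330')`, which shows why typing these by hand in a shell is tedious.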

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-21 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480971#comment-13480971
 ] 

Edward Capriolo commented on CASSANDRA-4815:


Also, without reading much of the background tickets, I am pretty curious 
about the syntax

{noformat}
PRIMARY KEY (id, tag_name)
{noformat} 

What is going to happen if Cassandra and the CQL language actually adds true 
composite row keys? 



[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-21 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480976#comment-13480976
 ] 

Edward Capriolo commented on CASSANDRA-4815:


So in Cassandra 1.2.0 you can create a table with no columns other than the 
primary key:
{noformat}
cqlsh:testkeyspace> create table sample ( keycolumn varchar, primary key (keycolumn) );
{noformat}
But you cannot insert into it:
{noformat}
cqlsh:testkeyspace> insert into sample ( keycolumn, 'age' ) values ('ed','30');
{noformat}

And, rather surprisingly, it creates a table with metadata I did not ask for. It 
assumes the comparator is a composite of a single UTF8Type:

{noformat}
create column family sample
  with column_type = 'Standard'
  and comparator = 'CompositeType(org.apache.cassandra.db.marshal.UTF8Type)'
  and default_validation_class = 'UTF8Type'
{noformat}

Not what I was going for :(

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-21 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481106#comment-13481106
 ] 

Nick Bailey commented on CASSANDRA-4815:


bq. What is going to happen if Cassandra and the CQL language actually add 
true composite row keys?

CASSANDRA-4179

 Make CQL work naturally with wide rows
 --

 Key: CASSANDRA-4815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
 Project: Cassandra
  Issue Type: Wish
Reporter: Edward Capriolo

 I find that CQL3 is quite obtuse and does not provide me a language useful 
 for accessing my data. First, let's point out how we should design Cassandra 
 data:
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 4) Optimize for blind writes
 So here is a schema that abides by these tried and tested rules, which large 
 production users are employing today.
 Say we have a table of movie objects:
 Movie
 Name
 Description
 - tags   (string)
 - credits composite(role string, name string )
 -1 likesToday
 -1 blacklisted
 The above structure is a movie. Notice it holds a mix of static and dynamic 
 columns, but the overall number of columns is not very large (even if it 
 were larger, this would be OK as well). Notice also that this table is not 
 just a single one-to-many relationship: it has one-to-one data and it has 
 two sets of one-to-many data.
 The schema today is declared something like this:
 create column family movies
   with key_validation_class=UTF8Type and
   comparator=UTF8Type and
   default_validation_class=UTF8Type and
   column_metadata =
   [
     {column_name: blacklisted, validation_class: Int32Type},
     {column_name: likestoday, validation_class: LongType},
     {column_name: description, validation_class: UTF8Type}
   ];
 We should be able to insert data like this:
 set movies['Cassandra Database, not looking for a seQL']['blacklisted']=1;
 set movies['Cassandra Database, not looking for a seQL']['likestoday']=34;
 set movies['Cassandra Database, not looking for a seQL']['credits-dir']='director:asf';
 set movies['Cassandra Database, not looking for a seQL']['credits-jir']='jiraguy:bob';
 set movies['Cassandra Database, not looking for a seQL']['tags-action']='';
 set movies['Cassandra Database, not looking for a seQL']['tags-adventure']='';
 set movies['Cassandra Database, not looking for a seQL']['tags-romance']='';
 set movies['Cassandra Database, not looking for a seQL']['tags-programming']='';
 This is the correct way to do it: one seek finds all the information related 
 to a movie. As long as this row does not get large, there is no reason to 
 optimize by breaking the data into other column families. (Notice you cannot 
 transpose this, because a movie holds two one-to-many relationships of 
 potentially different types.)
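To make the "one seek" argument concrete, here is a minimal Python sketch. It models a row under a UTF8Type comparator as a sorted map from column name to value. It illustrates the data shape only and is not Cassandra's storage engine. WideRow and set_column are hypothetical names, and a plain dict keyed by row key stands in for the column family.

```python
from bisect import bisect_left

class WideRow:
    """One row: column names kept sorted, as a UTF8Type comparator orders them."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def set(self, column, value):
        if column not in self._values:
            # Insert the new name at its sorted position.
            self._names.insert(bisect_left(self._names, column), column)
        self._values[column] = value

    def columns(self):
        return [(name, self._values[name]) for name in self._names]

# Hypothetical stand-in for the column family: row key -> WideRow.
movies = {}

def set_column(key, column, value):
    movies.setdefault(key, WideRow()).set(column, value)

key = "Cassandra Database, not looking for a seQL"
set_column(key, "blacklisted", 1)
set_column(key, "likestoday", 34)
set_column(key, "credits-dir", "director:asf")
set_column(key, "tags-action", "")
set_column(key, "tags-romance", "")

# A single lookup by row key returns static and dynamic columns together.
print([name for name, _ in movies[key].columns()])
```

Static columns (blacklisted, likestoday) and dynamic ones (credits-*, tags-*) come back interleaved in comparator order from the one lookup.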
 Let's look at the CQL3 way to do this design.
 First, contrary to the original design of Cassandra, CQL does not like wide 
 rows. It also does not have a good way of dealing with dynamic columns 
 together with static columns.
 You have two options:
 Option 1: lose all schema
 create table movies (name text, column blob, value blob, 
 primary key (name, column)) with compact storage;
 This method is not so hot: we have now lost all our validators. And by the 
 way, you have to physically shut down everything, rename files, and recreate 
 your schema if you want to inform Cassandra that a current table should be 
 compact. At the very least this could be just a metadata change. Also, you 
 cannot add column schema either.
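The following is a rough Python sketch of what Option 1 exposes. In a compact table whose column and value are blobs, each cell of the wide row surfaces as one (name, column, value) CQL3 row of raw bytes, so no per-column validator interprets the values any more. The byte layouts used are 4-byte and 8-byte big-endian, matching Int32Type and LongType. The compact_view helper is hypothetical.

```python
def compact_view(key, row):
    """Expose each cell of one wide row as a (name, column, value) tuple.

    row maps column name (str) -> value already serialized to bytes, which is
    all a blob-typed compact table can show.
    """
    return [(key, name.encode("utf-8"), value) for name, value in sorted(row.items())]

row = {
    "blacklisted": (1).to_bytes(4, "big"),  # Int32Type layout: 4-byte big-endian
    "likestoday": (34).to_bytes(8, "big"),  # LongType layout: 8-byte big-endian
    "tags-action": b"",                     # tag columns carry empty values
}

for cql_row in compact_view("Cassandra Database, not looking for a seQL", row):
    print(cql_row)
```

Nothing in the output says that blacklisted is an int or likestoday a long. That is the "lost all our validators" problem in miniature.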
 Option 2: normalize (even worse)
 create table movie (name text primary key, description text, likestoday int, 
 blacklisted int);
 create table moviecredits (name text, role text, personname text, 
 primary key (name, role));
 create table movietags (name text, tag text, primary key (name, tag));
 This is a terrible design: of the four key characteristics of how Cassandra 
 data should be designed, it fails three.
 It does not:
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 Why is Cassandra steering toward this course by making a language that does 
 not understand wide rows?
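Counting reads shows the cost of the normalized option. In this hypothetical Python sketch, each dict stands in for one of three normalized tables keyed by movie name, and read() tallies one partition lookup per call. Assembling a single movie takes three lookups, where the wide-row design needs one.

```python
from collections import Counter

# Tally of partition lookups per table (one lookup ~ one seek).
lookups = Counter()

def read(table_name, table, key):
    lookups[table_name] += 1
    return table.get(key)

# Hypothetical stand-ins for the three normalized tables, keyed by movie name.
movie = {"m": {"description": "d", "likestoday": 34, "blacklisted": 1}}
movie_credits = {"m": [("dir", "asf"), ("jir", "bob")]}
movie_tags = {"m": ["action", "adventure"]}

def load_normalized(name):
    # Three separate partitions must be read to rebuild one movie.
    result = dict(read("movie", movie, name))
    result["credits"] = read("credits", movie_credits, name)
    result["tags"] = read("tags", movie_tags, name)
    return result

result = load_normalized("m")
print(sum(lookups.values()))  # 3, versus 1 for the single wide row
```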
 So what can be done? My suggestions:
 Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a 
 virtual compact-storage view that requires no work to migrate data or 
 recreate schemas. Every table should have a compact view for schemaless use, 
 or a simple query hint like /*transposed*/ should make this change.
 Metadata should be definable by regex. For example, all columns named tag* 
 are of type string.
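Here is a Python sketch of one way regex-defined metadata could behave. The first pattern that fully matches a column name selects the validator, and unmatched names fall back to a default. This is a hypothetical design for illustration: column_metadata entries name individual columns, and the patterns and Python types below are stand-ins for real validators.

```python
import re

# Hypothetical regex-keyed metadata: (pattern, expected Python type) pairs,
# checked in order; the first full match wins.
METADATA = [
    (re.compile(r"tag.*"), str),
    (re.compile(r"credits-.*"), str),
    (re.compile(r"likestoday|blacklisted"), int),
    (re.compile(r"description"), str),
]

def validator_for(column, default=bytes):
    for pattern, kind in METADATA:
        if pattern.fullmatch(column):
            return kind
    return default  # unmatched columns stay untyped (raw bytes)

def validate(column, value):
    kind = validator_for(column)
    if not isinstance(value, kind):
        raise TypeError(f"{column!r} expects {kind.__name__}, got {type(value).__name__}")
    return value

validate("tags-action", "")
validate("likestoday", 34)
```

With a rule like this, a new dynamic tag column is typed the moment it is written, and no per-column schema change is needed.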
 CQL should have the column[slice_start] .. column[slice_end] operator from 
 CQL2.
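The slice operator amounts to two binary searches over a row's sorted column names. Below is a minimal Python sketch. It assumes both bounds are inclusive, as in a Thrift SliceRange, and column_slice is a hypothetical name. Using 'tags-~' as the end bound works because the tilde sorts after ASCII letters, so every tags-* column is selected.

```python
from bisect import bisect_left, bisect_right

def column_slice(sorted_names, start, end):
    """Return the contiguous run of names with start <= name <= end."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, end)
    return sorted_names[lo:hi]

names = sorted([
    "blacklisted", "credits-dir", "credits-jir", "likestoday",
    "tags-action", "tags-adventure", "tags-programming", "tags-romance",
])

print(column_slice(names, "tags-", "tags-~"))
```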
 CQL should support current users: users should not have to switch between 
 CQL versions, and possibly Thrift, to work with wide rows. The language 
 should work for them even if it was not expressly designed for them. Some 
 of these features are already part of CQL2, so they should be carried over.
 Also what needs to not happen is someone to make a 

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-21 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481108#comment-13481108
 ] 

Edward Capriolo commented on CASSANDRA-4815:


Working with CQL today, another idea came to me. Does it make sense to implement 
CLI-like SET and GET? SET and GET are actually fairly natural ways to work with 
schemaless Cassandra. Also, in terms of performance, a CLI set statement is 
smaller than the equivalent INSERT INTO. This would serve as a no-nonsense way 
to get data into a CF.
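To compare the two forms, here is a hypothetical Python sketch that rewrites a simple CLI-style set into a CQL3 INSERT. It targets a compact table assumed to have the columns key, column1 and value, which are the names cqlsh shows for CLI-created tables. It handles only quoted string keys, column names and values, and it is not a real Cassandra tool.

```python
import re

# Matches: set <cf>['<key>']['<column>']='<value>';
SET_RE = re.compile(r"set\s+(\w+)\['([^']*)'\]\['([^']*)'\]\s*=\s*'([^']*)';")

def set_to_insert(stmt):
    """Translate one simple CLI set statement into a CQL3 INSERT string."""
    m = SET_RE.fullmatch(stmt.strip())
    if m is None:
        raise ValueError("unsupported SET statement: " + stmt)
    cf, key, column, value = m.groups()
    # Assumed compact-table layout: (key, column1, value).
    return (f"INSERT INTO {cf} (key, column1, value) "
            f"VALUES ('{key}', '{column}', '{value}');")

cli = "set movies['m1']['tags-action']='x';"
cql = set_to_insert(cli)
print(len(cli), len(cql))
```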

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-20 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480883#comment-13480883
 ] 

Edward Capriolo commented on CASSANDRA-4815:


Thanks for building that, Jonathan. It clears up a couple of things. I still have 
some questions/possible feature requests.

Can I create a schemaless table?

{noformat}
cqlsh:testkeyspace> create table simple (a varchar, primary key(a) );
Bad Request: No definition found that is not part of the PRIMARY KEY
{noformat}


[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-20 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480884#comment-13480884
 ] 

Edward Capriolo commented on CASSANDRA-4815:


The syntax suggests you should be able to create a table that only has a 
primary key.

{noformat}
CREATE TABLE cfname ( colname type PRIMARY KEY [,
colname type [, ...]] )
   [WITH optionname = val [AND optionname = val [...]]];
{noformat}

I think a user SHOULD be able to do this. Because Cassandra can be schemaless, 
CQL should provide a way to create this type of table (after all, we can 
already SELECT * from tables created from the CLI).

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-20 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480886#comment-13480886
 ] 

Jeremy Hanna commented on CASSANDRA-4815:
-

I let Ed know that in 1.2 there was support for creating a table with only a 
primary key (thanks Patrick).  He did ask a good question - is CQL3 going to be 
relatively set in stone in 1.2?  If people implement to CQL3, that's not going 
to change is it?

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-20 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480889#comment-13480889
 ] 

Edward Capriolo commented on CASSANDRA-4815:


This is possibly more nitpicky because it seems hard to do.

create column family compositetry with key_validation_class=UTF8Type and 
comparator='CompositeType(UTF8Type,UTF8Type)';

[default@testkeyspace] set compositetry ['a']['b:c']=UTF8('d');  
[default@testkeyspace] set compositetry ['a']['d:e']=UTF8('f');
[default@testkeyspace] set compositetry ['a']['h:i']=UTF8('j');

cqlsh:testkeyspace> select * from compositetry where key='a' and column1>='b' 
and column1<'h';
 key | column1 | column2 | value
-----+---------+---------+-------
   a |       b |       c |    64
   a |       d |       e |    66

cqlsh:testkeyspace> select * from compositetry where key='a' and column1>='b' 
and column1<'h' and column2='c';
Bad Request: PRIMARY KEY part column2 cannot be restricted (preceding part 
column1 is either not restricted or by a non-EQ relation)
Perhaps you meant to use CQL 2? Try using the -2 option when starting cqlsh.

I guess composite slices are slightly more difficult to express.
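Sorting composite cells shows why the second query is rejected. With CompositeType(UTF8Type,UTF8Type), cells are ordered by (column1, column2), and a slice must be one contiguous run in that order. This Python sketch adds two hypothetical cells, b:z and d:c, to the three set above. The range on column1 alone is contiguous, but adding column2='c' selects cells with a gap between them.

```python
# Cells of a CompositeType(UTF8Type, UTF8Type) row, kept in comparator order:
# sorted by column1, then column2. b:z and d:c are hypothetical extra cells.
cells = sorted([("b", "c"), ("d", "e"), ("h", "i"), ("b", "z"), ("d", "c")])

def is_contiguous(selected, ordered):
    """True if the selected cells form one unbroken run of the ordering."""
    if not selected:
        return True
    positions = [ordered.index(cell) for cell in selected]
    return positions == list(range(positions[0], positions[0] + len(positions)))

# column1 >= 'b' AND column1 < 'h': a single contiguous slice.
range_only = [c for c in cells if "b" <= c[0] < "h"]
# Adding column2 = 'c' picks cells that are separated by b:z.
with_eq = [c for c in range_only if c[1] == "c"]

print(is_contiguous(range_only, cells), is_contiguous(with_eq, cells))  # True False
```

Serving the second query would therefore need several slices or a filter pass, not one contiguous read.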

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-20 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480890#comment-13480890
 ] 

Edward Capriolo commented on CASSANDRA-4815:


Also from the article this statement:

For the song tags, we have two choices. If we need to be compatible with data 
from an old-style schema, we can do that as follows:

CREATE TABLE song_tags (
id uuid,
tag_name text,
PRIMARY KEY (id, tag_name)
);

What does "we can do that" mean here? If we have an old-style schema, don't we 
need to be able to alter a current table? That can't be done:

cqlsh:testkeyspace> CREATE TABLE song_tags ( id uuid, tag_name text, b text,  
Bad Request: org.apache.cassandra.config.ConfigurationException: Cannot add 
already existing column family 'song_tags' to keyspace 'testkeyspace'.

This is why I think VIEW tables make sense. All the CQL2/CQL3 tables look 
like logical constructs on top of physical column families. Defining 
multiple logical tables that store data in the same physical ones may be 
the best bet long term.
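Below is a minimal Python sketch of the VIEW idea. It models a physical column family as row key -> {column name: value}, and each logical table is only a projection over the same cells, so adding one needs no data migration. tags_view and static_view are hypothetical names. tags_view mirrors the song_tags (id, tag_name) shape from the article.

```python
# Hypothetical physical column family: row key -> {column name: value}.
physical = {
    "song1": {"tags-rock": b"", "tags-live": b"", "title": b"Intro"},
}

def tags_view(store):
    """Transposed view: one (id, tag_name) row per tags-* column."""
    return [(key, name[len("tags-"):])
            for key, cols in store.items()
            for name in sorted(cols) if name.startswith("tags-")]

def static_view(store, columns):
    """Static view: one row per key with a fixed set of named columns."""
    return [(key, {c: cols.get(c) for c in columns}) for key, cols in store.items()]

print(tags_view(physical))
print(static_view(physical, ["title"]))
```

A write to the physical row shows up in every view at once, because no view holds its own copy of the data.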

 Make CQL work naturally with wide rows
 --

 Key: CASSANDRA-4815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
 Project: Cassandra
  Issue Type: Wish
Reporter: Edward Capriolo

 I find that CQL3 is quite obtuse and does not provide me a language useful 
 for accessing my data. First, lets point out how we should design Cassandra 
 data. 
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 4) optimize for blind writes
 So here is a schema that abides by these tried and tested rules large 
 production uses are employing today. 
 Say we have a table of movie objects:
 Movie
 Name
 Description
 - tags   (string)
 - credits composite(role string, name string )
 -1 likesToday
 -1 blacklisted
 The above structure is a movie notice it hold a mix of static and dynamic 
 columns, but the other all number of columns is not very large. (even if it 
 was larger this is OK as well) Notice this table is not just 
 a single one to many relationship, it has 1 to 1 data and it has two sets of 
 1 to many data.
 The schema today is declared something like this:
 create column family movies
 with comparator = UTF8Type and
   column_metadata =
   [
 {column_name: blacklisted, validation_class: int},
 {column_name: likesToday, validation_class: long},
 {column_name: description, validation_class: UTF8Type}
   ];
 We should be able to insert data like this:
 set movies['Cassandra Database, not looking for a seQL']['blacklisted']=1;
 set movies['Cassandra Database, not looking for a seQL']['likesToday']=34;
 set movies['Cassandra Database, not looking for a seQL']['credits-dir']='director:asf';
 set movies['Cassandra Database, not looking for a seQL']['credits-jir']='jiraguy:bob';
 set movies['Cassandra Database, not looking for a seQL']['tags-action']='';
 set movies['Cassandra Database, not looking for a seQL']['tags-adventure']='';
 set movies['Cassandra Database, not looking for a seQL']['tags-romance']='';
 set movies['Cassandra Database, not looking for a seQL']['tags-programming']='';
 This is the correct way to do it: one seek finds all the information related 
 to a movie. As long as this row does not get large, there is no reason to 
 optimize by breaking the data into other column families. (Notice that you 
 cannot transpose this, because a movie has two 1-to-many relationships of 
 potentially different types.)
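The "one seek" property rests on the row's columns being kept sorted by name, so every tag (or every credit) is one contiguous range. A minimal sketch, assuming names sort like a UTF8Type comparator (Python compares code points, which matches byte order for these ASCII names) and treating the `tags-`/`credits-` prefixes purely as the naming convention used above. `make_row`, `slice_columns` and `prefix_slice` are hypothetical helpers; `bisect` stands in for the row's column index.

```python
# Toy model of the movie as one wide row with sorted column names.
# Assumption: names sort like a UTF8Type comparator; the 'tags-' and
# 'credits-' prefixes are only a naming convention.
from bisect import bisect_left
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column name, value)


def make_row(columns: Dict[str, str]) -> List[Column]:
    """Keep columns sorted by name, as the row is laid out."""
    return sorted(columns.items())


def slice_columns(row: List[Column], start: str, end: str) -> List[Column]:
    """All columns with start <= name < end, found by binary search."""
    names = [name for name, _ in row]
    return row[bisect_left(names, start):bisect_left(names, end)]


def prefix_slice(row: List[Column], prefix: str) -> List[Column]:
    """All columns whose name starts with a non-empty prefix.

    The exclusive end bound bumps the prefix's last character,
    e.g. 'tags-' -> 'tags.', so the range covers exactly that prefix.
    """
    if not prefix:
        raise ValueError("prefix must be non-empty")
    end = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return slice_columns(row, prefix, end)


if __name__ == "__main__":
    movie = make_row({
        "blacklisted": "1",
        "likesToday": "34",
        "credits-dir": "director:asf",
        "credits-jir": "jiraguy:bob",
        "tags-action": "",
        "tags-adventure": "",
        "tags-romance": "",
        "tags-programming": "",
    })
    print(prefix_slice(movie, "tags-"))
    print(prefix_slice(movie, "credits-"))
```

A single range over the sorted names returns every tag, and the whole row is still one read, which is the design point the paragraph makes.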
 Let's look at the CQL3 way to do this design. First, contrary to the original 
 design of Cassandra, CQL does not like wide rows. It also has no good way of 
 dealing with dynamic rows together with static rows.
 You have two options:
 Option 1: lose all schema
 create table movies (name text, column blob, value blob, 
 primary key (name, column)) with compact storage;
 This method is not so hot: we have now lost all our validators. And by the 
 way, you have to physically shut down everything, rename files, and recreate 
 your schema if you want to inform Cassandra that a current table should be 
 compact. At the very least, this could be just a metadata change. You also 
 cannot add column schema.
 Option 2: Normalize (even worse)
 create table movie (name text primary key, description text, 
 likestoday int, blacklisted int);
 create table moviecredits (name text, role text, personname text, 
 primary key (name, role));
 create table movietags (name text, tag text, primary key (name, tag));
 This is a terrible design. Of the four key characteristics of how Cassandra 
 data should be designed, it fails three.
 It does not:
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
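The "eliminate seeks" failure can be made concrete by counting the partitions each design touches to assemble one movie. A toy sketch, assuming each table is a dict keyed by partition key and counting one lookup per partition read (roughly one seek on a cold read; real counts also depend on SSTables, caches and compaction). `CountingStore`, `load_denormalized`, `load_normalized` and the table names are illustrative, not a real API.

```python
# Toy comparison of partitions touched to assemble one movie.
# Assumption: one partition read ~ one lookup; names are illustrative.
from typing import Dict, List, Tuple

Cells = List[Tuple[str, str]]


class CountingStore:
    """Holds named tables and counts partition lookups."""

    def __init__(self, tables: Dict[str, Dict[str, Cells]]) -> None:
        self.tables = tables
        self.lookups = 0

    def read_partition(self, table: str, key: str) -> Cells:
        self.lookups += 1
        return self.tables[table].get(key, [])


def load_denormalized(store: CountingStore, name: str) -> Dict[str, str]:
    """Everything about the movie lives in one partition."""
    return dict(store.read_partition("movies", name))


def load_normalized(store: CountingStore, name: str) -> Dict[str, object]:
    """Option 2: the same movie is split across three tables."""
    movie: Dict[str, object] = dict(store.read_partition("movie", name))
    movie["credits"] = dict(store.read_partition("moviecredits", name))
    movie["tags"] = [tag for tag, _ in store.read_partition("movietags", name)]
    return movie


if __name__ == "__main__":
    key = "Cassandra Database, not looking for a seQL"
    wide = CountingStore({"movies": {key: [("likesToday", "34"), ("tags-action", "")]}})
    load_denormalized(wide, key)
    normalized = CountingStore({
        "movie": {key: [("likestoday", "34")]},
        "moviecredits": {key: [("director", "asf")]},
        "movietags": {key: [("action", "")]},
    })
    load_normalized(normalized, key)
    print(wide.lookups, normalized.lookups)
```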
 Why is Cassandra steering toward this course by making a language that does 
 not understand wide rows?
 So what can be done? My suggestions: 
 Cassandra needs to lose the COMPACT STORAGE 

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-18 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479294#comment-13479294
 ] 

Jonathan Ellis commented on CASSANDRA-4815:
---

Hi Ed,

I wrote a longish blog post over at 
http://www.datastax.com/dev/blog/cql3-for-cassandra-experts showing how use 
cases like this are handled in CQL3, with no rewriting of data.  Give that a 
read and let me know if you have further questions!

(All the examples in that post, except for the one using {{Set}}, are from 
Cassandra 1.1.6.)

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-16 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477109#comment-13477109
 ] 

Nick Bailey commented on CASSANDRA-4815:


Isn't this the main reason behind collections support?

{noformat}
CREATE TABLE movies (
  movie_id int PRIMARY KEY,
  blacklisted int,
  credits map<text, text>,
  description text,
  likes_today int,
  name text,
  tags set<text>
);
{noformat}
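To see why collections address the "one seek" concern, here is a toy sketch of one logical row with a map and a set flattened into a single wide row. It assumes each collection element becomes its own cell, written here as "column:element" for readability (actual CQL3 cells use composite names), with set members carrying an empty value. `flatten_row` is a hypothetical helper, not Cassandra code.

```python
# Toy model: one CQL3 row with collections stored as one wide row.
# Assumption: each collection element is its own cell, named here
# "column:element" for readability; set members have empty values.
from typing import Dict, List, Tuple


def flatten_row(row: Dict[str, object], partition_key: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (row key, sorted cells) for one logical CQL3 row."""
    cells: List[Tuple[str, str]] = []
    for column, value in row.items():
        if column == partition_key:
            continue  # the partition key becomes the row key, not a cell
        if isinstance(value, dict):  # map<text, text>: one cell per entry
            cells.extend((f"{column}:{k}", str(v)) for k, v in value.items())
        elif isinstance(value, (set, frozenset)):  # set<text>: one cell per member
            cells.extend((f"{column}:{member}", "") for member in value)
        else:  # plain static column
            cells.append((column, str(value)))
    return str(row[partition_key]), sorted(cells)


if __name__ == "__main__":
    key, cells = flatten_row(
        {
            "movie_id": 1,
            "name": "Cassandra Database, not looking for a seQL",
            "blacklisted": 1,
            "likes_today": 34,
            "credits": {"director": "asf", "jiraguy": "bob"},
            "tags": {"action", "romance"},
        },
        partition_key="movie_id",
    )
    print(key)
    for cell in cells:
        print(cell)
```

In this model the static columns, credits and tags all land in one partition, so the movie is still read in one go.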

 Make CQL work naturally with wide rows
 --

 Key: CASSANDRA-4815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
 Project: Cassandra
  Issue Type: Wish
Reporter: Edward Capriolo

 I find that CQL3 is quite obtuse and does not give me a useful language 
 for accessing my data. First, let's point out how we should design Cassandra 
 data:
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 4) Optimize for blind writes
 Here is a schema that abides by these tried-and-tested rules, which large 
 production users are employing today.
 Say we have a table of movie objects:
 Movie
 Name
 Description
 - tags   (string)
 - credits composite(role string, name string )
 -1 likesToday
 -1 blacklisted
 The above structure is a movie. Notice that it holds a mix of static and 
 dynamic columns, but the overall number of columns is not very large (even 
 if it were larger, this would be OK as well). Notice also that this table is 
 not just a single one-to-many relationship: it has 1-to-1 data and two sets 
 of 1-to-many data.
 The schema today is declared something like this:
 create column family movies
 with comparator = UTF8Type and
   column_metadata =
   [
 {column_name: blacklisted, validation_class: int},
 {column_name: likesToday, validation_class: long},
 {column_name: description, validation_class: UTF8Type}
   ];
 We should be able to insert data like this:
 set movies['Cassandra Database, not looking for a seQL']['blacklisted']=1;
 set movies['Cassandra Database, not looking for a seQL']['likesToday']=34;
 set movies['Cassandra Database, not looking for a seQL']['credits-dir']='director:asf';
 set movies['Cassandra Database, not looking for a seQL']['credits-jir']='jiraguy:bob';
 set movies['Cassandra Database, not looking for a seQL']['tags-action']='';
 set movies['Cassandra Database, not looking for a seQL']['tags-adventure']='';
 set movies['Cassandra Database, not looking for a seQL']['tags-romance']='';
 set movies['Cassandra Database, not looking for a seQL']['tags-programming']='';
 This is the correct way to do it: one seek finds all the information related 
 to a movie. As long as this row does not get large, there is no reason to 
 optimize by breaking the data into other column families. (Notice that you 
 cannot transpose this, because a movie has two 1-to-many relationships of 
 potentially different types.)
 Let's look at the CQL3 way to do this design. First, contrary to the original 
 design of Cassandra, CQL does not like wide rows. It also has no good way of 
 dealing with dynamic rows together with static rows.
 You have two options:
 Option 1: lose all schema
 create table movies (name text, column blob, value blob, 
 primary key (name, column)) with compact storage;
 This method is not so hot: we have now lost all our validators. And by the 
 way, you have to physically shut down everything, rename files, and recreate 
 your schema if you want to inform Cassandra that a current table should be 
 compact. At the very least, this could be just a metadata change. You also 
 cannot add column schema.
 Option 2: Normalize (even worse)
 create table movie (name text primary key, description text, 
 likestoday int, blacklisted int);
 create table moviecredits (name text, role text, personname text, 
 primary key (name, role));
 create table movietags (name text, tag text, primary key (name, tag));
 This is a terrible design. Of the four key characteristics of how Cassandra 
 data should be designed, it fails three.
 It does not:
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 Why is Cassandra steering toward this course by making a language that does 
 not understand wide rows?
 So what can be done? My suggestions: 
 Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a 
 virtual view that is compact storage, with no work to migrate data and 
 recreate schemas. Every table should have a compact view for schemaless 
 access, or a simple query hint like /*transposed*/ should make this change.
 Metadata should be definable by regex. For example, all columns named tag* 
 are of type string.
 CQL should have the column[slice_start] .. column[slice_end] operator from 
 CQL2.
 CQL should support current users; users should not have to switch between 
 CQL versions, and possibly Thrift, to work with wide rows. The language 
 should work for them even if it is not expressly designed for them. Some of 
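The "metadata definable by regex" suggestion is a proposal, not an existing Cassandra feature, so the following is only a hypothetical sketch of how a validator lookup could work. It assumes ordered (pattern, validator) rules where the first full match wins and unmatched columns fall back to plain text. `RULES`, `validator_for` and `validate` are invented names, the `likesToday`/`blacklisted` rules are illustrative, and the glob `tag*` is written as the regex `tag.*`.

```python
# Hypothetical sketch of "metadata definable by regex": the validator is
# chosen by the first pattern that fully matches the column name.
# Cassandra has no such feature; all names here are invented.
import re
from typing import Callable, List, Tuple

Validator = Callable[[str], object]

# Ordered rules: the first full match wins. The glob "tag*" from the
# suggestion is written as the regex "tag.*".
RULES: List[Tuple["re.Pattern[str]", Validator]] = [
    (re.compile(r"tag.*"), str),
    (re.compile(r"likesToday"), int),
    (re.compile(r"blacklisted"), int),
]


def validator_for(column: str,
                  rules: List[Tuple["re.Pattern[str]", Validator]] = RULES,
                  default: Validator = str) -> Validator:
    """Return the validator of the first rule matching the whole name."""
    for pattern, validator in rules:
        if pattern.fullmatch(column):
            return validator
    return default  # unmatched columns fall back to plain text


def validate(column: str, raw: str) -> object:
    """Convert raw text with the matching validator (ValueError if invalid)."""
    return validator_for(column)(raw)


if __name__ == "__main__":
    print(repr(validate("tags-action", "")))  # matched by tag.*, kept as text
    print(validate("likesToday", "34"))       # parsed as an int
    try:
        validate("blacklisted", "yes")
    except ValueError as err:
        print("rejected:", err)
```

Checking rules in order keeps the lookup predictable when patterns overlap; a real design would also have to decide how such rules interact with explicit per-column metadata.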

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-16 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477114#comment-13477114
 ] 

T Jake Luciani commented on CASSANDRA-4815:
---

I agree CQL3 is a step towards requiring more schema... I think for a lot of 
people that's a good thing, and for others it's not.

The core of the issue here, IMO, is not how we can change CQL3 to fit your use 
case. It's whether CQL3 will eventually be the only way to access Cassandra in 
N years, or whether we can always rely on the old, more schemaless API being 
there.

[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-16 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477218#comment-13477218
 ] 

Edward Capriolo commented on CASSANDRA-4815:


As I mentioned towards the end of the ticket, this feature request is not to 
support future theoretical use cases; it is to support the dominant current use 
case. It is not just my use case. It is the use case that the Cassandra project 
originally advocated.
 
http://www.slideshare.net/lomakin.andrey/apache-cassandra-part-1-principles-data-model
slide 30
-columns aren't fixed
-columns can be sorted
-columns can be queried for a certain range

I am fine if Cassandra adds new features that benefit from more schema, and I 
am fine with Cassandra adding collections; I think these are a great idea. But I 
see no technical reason why CQL can't support both old and new use cases. This 
is especially disturbing since the project offers no elegant way to get from 
now to the future. Switching to COMPACT STORAGE is a pain, and rewriting all 
the data into a new collection-based design is not necessarily a good use of 
resources.

Someone once told me Avro was the future of Cassandra. I am asking for features 
to support the now. 
