Re: CQL and undefined columns
On Wed, Jul 31, 2013 at 3:10 PM, Jonathan Haddad j...@jonhaddad.com wrote: "It's advised you do not use compact storage, as it's primarily for backwards compatibility."
Many Apache Cassandra experts do not advise against using COMPACT STORAGE. [1] Use CQL3 non-COMPACT STORAGE if you want to, but there are also valid reasons not to use it. Asserting that there is some good reason you should not use COMPACT STORAGE (other than range ghosts?) seems inaccurate. :)
=Rob
[1] http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/legacy_tables
Re: CQL and undefined columns
On Mon, Aug 5, 2013 at 10:27 AM, Robert Coli rc...@eventbrite.com wrote: "Asserting that there is some good reason you should not use COMPACT STORAGE (other than range ghosts?) seems inaccurate."
The CQL docs recommend not using it - I didn't just make that up. :) COMPACT STORAGE imposes the limit that you can't add columns to your tables. For those of us who are heavy CQL users, this limitation is a total deal breaker.
-- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: CQL and undefined columns
On Mon, Aug 5, 2013 at 1:56 PM, Jonathan Haddad j...@jonhaddad.com wrote: "COMPACT STORAGE imposes the limit that you can't add columns to your tables."
That is absolutely false. If anything, CQL is imposing the limits! It is simple to prove. Try something like this:
create table abc (x int primary key);
insert into abc (x, y) values (1, 5);
and watch CQL reject the insert, saying something to the effect of "y? What's that? Did you mean CQL2 or 1.5, or hamburgers?" Then go to the Cassandra CLI and do this:
create column family abd;
set abd[utf8('row1')][utf8('y')] = utf8('5');
set abd[utf8('row1')][utf8('z')] = utf8('4');
AND IT WORKS!
I have noticed a nomenclature starting to spring up around the term "legacy tables", with docs built around what you can't do with them. Frankly it makes me nuts, because this little-known web company named Google produced a white paper about what a ColumnFamily data model could do: http://en.wikipedia.org/wiki/BigTable . Cassandra was built on the BigTable/ColumnFamily data model. There was also this big movement called NoSQL, where people wanted to break free of query languages and rigid schemas.
Re: CQL and undefined columns
On Mon, Aug 5, 2013 at 1:36 PM, Edward Capriolo edlinuxg...@gmail.com wrote: "If anything CQL is imposing the limits!"
If you expected your CQL3 query to work, then I think you've missed the point of CQL completely. For many of us, adding a query layer that gives us predictable column names, but still lets us use wide rows on disk, is a huge benefit. Why would I want to reinvent a system for structured data when the DB can handle it for me? I get a bunch of stuff for free with CQL, which decreases my development time, which is the resource I happen to be most bottlenecked on. Feel free to continue to use Thrift's wide row structure, with ad hoc columns. No one is stopping you.
-- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: CQL and undefined columns
On 08/05/2013 02:36 PM, Edward Capriolo wrote: "If anything CQL is imposing the limits! Simple to prove."
From the Cassandra 1.2 Manual: "Using the compact storage directive prevents you from adding more than one column that is not part of the PRIMARY KEY. At this time, updates to data in a table created with compact storage are not allowed. The table with compact storage that uses a compound primary key must define at least one clustering column. Unless you specify WITH COMPACT STORAGE, CQL creates a table with non-compact storage."
Note that CQL collection columns (e.g. map) can do something a lot like your CLI example. Also note that your example doesn't even use WITH COMPACT STORAGE, although I don't think that would make any difference in that case. Also, there's really no need to be snarky; respectful communication is much more appreciated.
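As a rough illustration of the manual text quoted in this message, the restriction might look like this in CQL3. The table and column names are invented for the example, and the commented-out statement is only expected to be rejected on the basis of that manual text; it has not been checked against a particular release, and the exact error is not shown.

```cql
-- Accepted: compound primary key with one clustering column and exactly
-- one column outside the PRIMARY KEY.
CREATE TABLE readings (
    sensor text,
    taken_at timestamp,
    reading double,
    PRIMARY KEY (sensor, taken_at)
) WITH COMPACT STORAGE;

-- Expected to be rejected (assumption based on the manual text): two
-- columns outside the PRIMARY KEY on a compact storage table.
-- CREATE TABLE readings_two_cols (
--     sensor text, taken_at timestamp, reading double, unit text,
--     PRIMARY KEY (sensor, taken_at)
-- ) WITH COMPACT STORAGE;

-- Without the directive the same shape is allowed, and the table can
-- still gain declared columns later.
CREATE TABLE readings_plain (
    sensor text,
    taken_at timestamp,
    reading double,
    unit text,
    PRIMARY KEY (sensor, taken_at)
);
ALTER TABLE readings_plain ADD notes text;
```

This is the CQL-level view only; it says nothing about what can be written to the same column family through Thrift or the CLI.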
Re: CQL and undefined columns
On Mon, Aug 5, 2013 at 4:57 PM, Jonathan Haddad j...@jonhaddad.com wrote: "Feel free to continue to use thrift's wide row structure, with ad hoc columns. No one is stopping you."
Thanks. I was not trying to stop you from doing it your way either. You said this: "COMPACT STORAGE imposes the limit that you can't add columns to your tables." I was demonstrating that you are incorrect. I then went on to point out that Cassandra is a ColumnFamily data store that was designed around BigTable. You could always add columns dynamically, because being schema-less is one of the key components of a ColumnFamily data store.
I know which CQL document you are loosely referencing that implies you cannot add columns to compact storage. If that were true, Cassandra would never have been a ColumnFamily data store. I have found several documents championing CQL and its constructs which suggest that some things cannot be done with compact storage. In reality those are shortcomings of the CQL language. I say this because the language cannot easily accommodate the original schema system.
Many applications that are already written and performing well do NOT fit well into the CQL model of non-compact storage (which does not have a name, by the way, probably because the opposite of compact is sparse, and how would SPARSE STORAGE sound?). Implying that all the original stuff is legacy and that you should probably avoid it is wrong. In many cases compact storage is the best way to store things, because it is the smallest.
Re: CQL and undefined columns
On Mon, Aug 5, 2013 at 2:37 PM, Edward Capriolo edlinuxg...@gmail.com wrote: "I was demonstrating you are incorrect."
CQL maps a series of logical rows into a single physical row by transposing multiple rows, based on partition and clustering keys, into slices of a row. The point is to add a loose schema on top of a wide row, which allows you to stop reimplementing common patterns. Yes, you can go in and mess with your tables via the cassandra-cli, but that's not exactly proving me wrong. You've simply removed the constraints of CQL and written data to the table at a lower level that doesn't deal with schema enforcement.
-- Jon Haddad http://www.rustyrazorblade.com
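The transposition described in this message can be pictured with a small table. This is only a sketch: the table, column names and data are invented, and the layout shown in the comments is a simplified view of how a non-compact CQL3 table is stored, not an exact dump of the cells.

```cql
CREATE TABLE timeline (
    user_id text,          -- partition key: selects the physical row
    posted_at timestamp,   -- clustering key: orders the slices inside that row
    author text,
    body text,
    PRIMARY KEY (user_id, posted_at)
);

INSERT INTO timeline (user_id, posted_at, author, body)
VALUES ('jon', '2013-08-05 14:00:00+0000', 'rob', 'first');
INSERT INTO timeline (user_id, posted_at, author, body)
VALUES ('jon', '2013-08-05 15:00:00+0000', 'ed', 'second');

-- CQL presents two logical rows:
--   user_id | posted_at        | author | body
--   jon     | 2013-08-05 14:00 | rob    | first
--   jon     | 2013-08-05 15:00 | ed     | second
--
-- Underneath there is a single wide physical row keyed by 'jon', roughly:
--   (2013-08-05 14:00, 'author') = 'rob'
--   (2013-08-05 14:00, 'body')   = 'first'
--   (2013-08-05 15:00, 'author') = 'ed'
--   (2013-08-05 15:00, 'body')   = 'second'

SELECT * FROM timeline WHERE user_id = 'jon';
```

Writing to the same column family through the CLI bypasses the declared columns, which is the lower-level access the message refers to.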
Re: CQL and undefined columns
On Wed, Jul 31, 2013 at 03:10:54PM -0700, Jonathan Haddad wrote: "It's advised you do not use compact storage, as it's primarily for backwards compatibility."
Yes indeed, I understand what it does and why now, but only because I was pointed to the thrift-to-cql document. The CQL documentation itself doesn't make it at all clear. I was originally under the impression that the way 'COMPACT STORAGE' works was the way CQL works by default, because that's the natural assumption until it's explained why it doesn't work that way. I was pointing out that either the thrift-to-cql document must be wrong, or the CQL document must be wrong, because they contradict each other.
Re: CQL and undefined columns
2013/8/1 Jon Ribbens jon-cassan...@unequivocal.co.uk wrote: "Yes indeed, I understand what it does and why now, but only because I was pointed to the thrift-to-cql document."
I am glad this document helped you. I like to point to this 'thrift-to-cql' document, since it was really useful to me when I found it, even though I had to read it in full at least three times and still need to refer back to parts of it sometimes because of the complexity of what it explains. @Sylvain, you did a really good job with this blog post. Thanks a lot; be sure I will continue sharing it.
Alain
Re: CQL and undefined columns
2013/7/31 Jon Ribbens jon-cassan...@unequivocal.co.uk wrote: "I thought that part of the point of Cassandra was that, unlike a standard relational database, each row does not have to have the same set of columns. I don't understand how this squares with CQL. If I want to have a table (column family?) with a few fixed columns that are relevant to every row, I can create that with CQL's CREATE TABLE, but if I then want to set extra columns with arbitrary names on various rows, how do I tell CQL what type those columns are? Or is this feature of Cassandra now deprecated?"
I like to point to this article from Sylvain, which is really well written: http://www.datastax.com/dev/blog/thrift-to-cql3 It explains a lot of things and is really interesting for people who used Cassandra before CQL3. Actually, the old dynamic columns were defined this way:
CREATE TABLE test ( key text, column1 text, value text, PRIMARY KEY (key, column1) ) WITH COMPACT STORAGE;
This is still doable with CQL3: column1 would be your column name, and value the value of your column. As the primary key is composed of key + column1, you can add as many columns as you want. Another way to do it is to add columns dynamically (ALTER TABLE ...; afaik, this is lock-free and does not slow performance too much).
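A minimal sketch of the two approaches mentioned in this message, assuming a keyspace is already selected. The row keys, column names and values are made up for illustration, and the `users` table is hypothetical.

```cql
-- Approach 1: dynamic columns as (key, column1) pairs.
CREATE TABLE test (
    key text,
    column1 text,
    value text,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE;

-- Each INSERT adds one "column" to the wide row identified by key.
INSERT INTO test (key, column1, value) VALUES ('user1', 'email', 'user1@example.com');
INSERT INTO test (key, column1, value) VALUES ('user1', 'city', 'Paris');
-- Another row can carry a completely different set of column names.
INSERT INTO test (key, column1, value) VALUES ('user2', 'nickname', 'bob');

-- Read every dynamic column of one row, or a single one by name.
SELECT column1, value FROM test WHERE key = 'user1';
SELECT value FROM test WHERE key = 'user1' AND column1 = 'email';

-- Approach 2: a regular (non-compact) table that gains a declared column later.
CREATE TABLE users (
    id text PRIMARY KEY,
    name text
);
ALTER TABLE users ADD nickname text;
```

In the first approach every dynamic column shares the single type declared for `value` (text here); in the second, each added column gets its own declared type.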
Re: CQL and undefined columns
Oops, sorry about the double post.
Alain
Re: CQL and undefined columns
On Wed, Jul 31, 2013 at 02:21:52PM +0200, Alain RODRIGUEZ wrote: "I like to point to this article from Sylvain, which is really well written. http://www.datastax.com/dev/blog/thrift-to-cql3"
Ah, thank you, it looks like a combination of a multi-column PRIMARY KEY and use of collections may well suffice for what I want. I must admit that I did not find any of this particularly obvious from the CQL documentation.
By the way, http://cassandra.apache.org/doc/cql3/CQL.html#createTableStmt says "A table with COMPACT STORAGE must also define at least one clustering key", which seems to contradict definition 2 in the thrift-to-cql3 document you pointed me to.
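For the original use case (a few fixed columns plus arbitrary extra ones per row), the collection-based option might look roughly like this. The `items` table, its columns and the data are invented for illustration; this is a sketch of one possible design, not something taken from the article.

```cql
CREATE TABLE items (
    id text PRIMARY KEY,
    name text,                 -- fixed columns, relevant to every row
    created timestamp,
    extras map<text, text>     -- arbitrary per-row name/value pairs
);

INSERT INTO items (id, name, created, extras)
VALUES ('item1', 'widget', '2013-07-31 12:00:00+0000', {'colour': 'red'});

-- Add or overwrite a single extra "column" without touching the others.
UPDATE items SET extras['size'] = 'large' WHERE id = 'item1';

SELECT name, extras FROM items WHERE id = 'item1';
```

The map declares one key type and one value type for all entries, so in this sketch every extra value is text; extras of different types would need separate maps or their own declared columns.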
Re: CQL and undefined columns
On Wed, Jul 31, 2013 at 1:32 PM, Jon Ribbens jon-cassan...@unequivocal.co.uk wrote: "Ah, thank you, it looks like a combination of multi-column PRIMARY KEY and use of collections may well suffice for what I want."
You should also profile what your data looks like on disk before picking a format. One form or the other may turn out to be less efficient because of extra disk overhead.
Re: CQL and undefined columns
On Wed, Jul 31, 2013 at 2:54 PM, Edward Capriolo edlinuxg...@gmail.com wrote: "You should also profile what your data looks like on disk before picking a format."
It's advised you do not use compact storage, as it's primarily for backwards compatibility.
"The first of these options is COMPACT STORAGE. This option is mainly targeted towards backward compatibility with some table definitions created before CQL3. But it also provides a slightly more compact layout of data on disk, though at the price of flexibility and extensibility, and for that reason is not recommended unless for the backward compatibility reason."
-- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade