mysterious 'column1' in cql describe
Hi all! We have encountered the following problem. We create our column families via Hector like this:

    ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition("mykeyspace", "mycf");
    cfdef.setColumnType(ColumnType.STANDARD);
    cfdef.setComparatorType(ComparatorType.UTF8TYPE);
    cfdef.setDefaultValidationClass("BytesType");
    cfdef.setKeyValidationClass("UTF8Type");
    cfdef.setReadRepairChance(0.1);
    cfdef.setGcGraceSeconds(864000);
    cfdef.setMinCompactionThreshold(4);
    cfdef.setMaxCompactionThreshold(32);
    cfdef.setReplicateOnWrite(true);
    cfdef.setCompactionStrategy("SizeTieredCompactionStrategy");
    Map<String, String> compressionOptions = new HashMap<String, String>();
    compressionOptions.put("sstable_compression", "");
    cfdef.setCompressionOptions(compressionOptions);
    cluster.addColumnFamily(cfdef, true);

When we describe this column family via cqlsh we get this:

    CREATE TABLE mycf (
      key text,
      column1 text,
      value blob,
      PRIMARY KEY (key, column1)
    ) WITH COMPACT STORAGE
      AND bloom_filter_fp_chance=0.01
      AND caching='KEYS_ONLY'
      AND comment=''
      AND dclocal_read_repair_chance=0.00
      AND gc_grace_seconds=864000
      AND read_repair_chance=0.10
      AND replicate_on_write='true'
      AND populate_io_cache_on_flush='false'
      AND compaction={'class': 'SizeTieredCompactionStrategy'}
      AND compression={};

As you can see there is a mysterious column1, and moreover it is added to the primary key. We thought this was wrong, so we tried to get rid of it.
We managed to do it by adding explicit column definitions like this:

    BasicColumnDefinition cdef = new BasicColumnDefinition();
    cdef.setName(StringSerializer.get().toByteBuffer("mycolumn"));
    cdef.setValidationClass(ComparatorType.BYTESTYPE.getTypeName());
    cdef.setIndexType(ColumnIndexType.CUSTOM);
    cfdef.addColumnDefinition(cdef);

After this the primary key was just PRIMARY KEY (key). The effect of this was overwhelming - we got a tremendous performance improvement, and according to the stats the key cache began working, while previously its hit ratio was close to zero. My questions are: 1) What is this all about? Is what we did right? 2) In this project we can provide explicit column definitions, but in another project we have some column families where this is not possible, because the column names are dynamic (based on timestamps). If what we did is right, how can we adapt this solution to the dynamic column name case?
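For context on where column1 comes from: under COMPACT STORAGE, CQL3 exposes each (row key, column name, column value) storage cell of a dynamic thrift column family as one CQL row. A rough illustration of that mapping in plain Python (this is not Cassandra code; the sample keys and timestamps are invented):

```python
# Illustration only: how CQL3 presents a dynamic thrift column family
# under COMPACT STORAGE. Each storage cell (row key, column name, value)
# becomes one CQL row with columns (key, column1, value).

def thrift_to_cql_rows(cells):
    """cells: list of (row_key, column_name, column_value) storage cells."""
    return [{"key": k, "column1": name, "value": val}
            for (k, name, val) in cells]

cells = [
    ("user1", "2013-08-30T09:00", b"event-a"),
    ("user1", "2013-08-30T09:05", b"event-b"),
]
rows = thrift_to_cql_rows(cells)
# Two storage cells in one thrift row become two CQL rows sharing the
# same partition key; 'column1' holds the (dynamic) column name.
```

This is why a thrift CF with no declared columns shows up in cqlsh with the generated column1 name in the primary key.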
Re: how can i get the column value? Need help!.. cassandra 1.2.8 and pig 0.11.1
I try this:

    rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
    dump rows;
    ILLUSTRATE rows;
    describe rows;

    values2 = FOREACH rows GENERATE TOTUPLE(id) AS (mycolumn:tuple(name,value));
    dump values2;
    describe values2;

But I get these results:

    | rows | id:chararray | age:int   | title:chararray |
    |      | (id, 6)      | (age, 30) | (title, QA)     |

    rows: {id: chararray,age: int,title: chararray}

    2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: left is tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray)), right is org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)

Or:

    values2 = FOREACH rows GENERATE TOTUPLE(id);
    dump values2;
    describe values2;

and the results are:

    (((id,6)))
    (((id,5)))
    values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}

Aggg!

Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com

2013/8/28 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com:

hi: I can not understand why the schema is defined like id:chararray,age:int,title:chararray and not as tuples or bags of tuples, given that we have key-value pair columns. I tried again to change the schema but it does not work. Any ideas? Perhaps the issue is in the definition of the CQL3 tables? Regards

2013/8/28 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com:

hi all: Regards. Still I can not resolve this issue. Does anybody have this issue, or has anybody tried to test this simple example? I am stumped; I can not find a working solution.
I appreciate any comment or help.

2013/8/22 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com:

hi all: I'm testing the new CqlStorage() with cassandra 1.2.8 and pig 0.11.1. I am using this sample test data: http://frommyworkshop.blogspot.com.es/2013/07/hadoop-map-reduce-with-cassandra.html

And I load and dump data right with this script:

    rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
    dump rows;
    describe rows;

results:

    ((id,6),(age,30),(title,QA))
    ((id,5),(age,30),(title,QA))
    rows: {id: chararray,age: int,title: chararray}

But I can not get the column values. I tried to define other schemas in LOAD like I used with CassandraStorage() (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-Pig-how-to-get-column-values-td5641158.html), for example:

    rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage() AS (columns: bag {T: tuple(name, value)});

and I get this error:

    2013-08-22 12:24:45,426 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema: left is columns:bag{T:tuple(name:bytearray,value:bytearray)}, right is id:chararray,age:int,title:chararray

I tried FLATTEN, SUBSTRING and SPLIT UDFs, but I have not gotten good results. For example, when I FLATTEN, I get a set of tuples like:

    (title,QA)
    (title,QA)

    2013-08-22 12:42:20,673 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    A: {title: chararray}

but I can not get the value QA. SUBSTRING only works on the name, for example:

    B = FOREACH A GENERATE SUBSTRING(title,2,5);
    dump B;
    describe B;

results:

    (tle)
    (tle)
    B: {chararray}

I tried, like Eric Lee in the other mail, and got the same results. Anyway, what I really want is the column value, not the name. Is there a way to do that? I listed all of the failed attempts I made below.
- colnames = FOREACH cols GENERATE $1; and was told $1 was out of bounds.
- casted = FOREACH cols GENERATE (tuple(chararray, chararray))$0; but all I got back were empty tuples.
- values = FOREACH cols GENERATE $0.$1; but I got an error telling me data byte array can't be casted to tuple.

Please, I will appreciate any help. Regards

--
Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com
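One detail worth double-checking in the LOAD URIs above: the query parameters are joined with '&' and the where_clause value must be URL-encoded (age%3D30 is the encoding of age=30). A quick sketch of building such a URI with Python's standard urllib (illustration only; the keyspace and table names are the ones from the example above):

```python
from urllib.parse import quote

# Build a CqlStorage load URI for Pig: query parameters are joined
# with '&' and the where_clause value must be percent-encoded.
def cql_load_uri(keyspace, table, page_size, split_size, where):
    params = "page_size=%d&split_size=%d&where_clause=%s" % (
        page_size, split_size, quote(where))
    return "cql://%s/%s?%s" % (keyspace, table, params)

uri = cql_load_uri("keyspace1", "test", 1, 4, "age=30")
# -> cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30
```

If the '&' separators are dropped (as easily happens when pasting into mail), CqlStorage sees a single mangled parameter.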
Re: CqlStorage creates wrong schema for Pig
Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com

2013/8/26 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com:

hi Chad. I have this issue too. I sent a mail to the pig-user list and I still can not resolve this; I can not access the column values. In that mail I wrote some things that I tried without results, and information about this issue: http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3ccajeg_hq9s2po3_xytzx5xki4j1mao8q26jydg2wndy_kyiv...@mail.gmail.com%3E

I hope someone replies with a comment, idea or solution about this issue or bug. I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but I have not configured the environment to debug and trace this issue. I only found some comments like the following, which I do not fully understand:

/**
 * A LoadStoreFunc for retrieving data from and storing data to Cassandra
 *
 * A row from a standard CF will be returned as nested tuples:
 * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
 */

If you find some idea or solution, please post it. Thanks.

2013/8/23 Chad Johnston cjohns...@megatome.com:

(I'm using Cassandra 1.2.8 and Pig 0.11.1.) I'm loading some simple data from Cassandra into Pig using CqlStorage. The CqlStorage loader defines a Pig schema based on the Cassandra schema, but it seems to be wrong. If I do:

    data = LOAD 'cql://bookdata/books' USING CqlStorage();
    DESCRIBE data;

I get this:

    data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}

However, if I DUMP data, I get results like these:

    ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

Clearly the results from Cassandra are key/value pairs, as would be expected. I don't know why the schema generated by CqlStorage() would be so different. This is really causing me problems when trying to access the column values. I tried a naive approach of FLATTENing each tuple, then trying to access the values that way:

    flattened = FOREACH data GENERATE FLATTEN(isbn), FLATTEN(booktitle), ...
    values = FOREACH flattened GENERATE $1 AS ISBN, $3 AS BookTitle, ...

As soon as I try to access field $5, Pig complains about the index being out of bounds. Is there a way to solve the schema/reality mismatch? Am I doing something wrong, or have I stumbled across a defect? Thanks, Chad
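What both posters are after - the value half of each (name, value) pair - is easy to state outside Pig. A plain-Python illustration of the desired transform (this is not a Pig workaround, just a statement of the target shape; the sample row is taken from the dumps above):

```python
# Each dumped row is a sequence of (column_name, column_value) pairs,
# e.g. (("id", 6), ("age", 30), ("title", "QA")).
# The posters want just the values, keyed by column name.

def row_to_dict(row):
    return {name: value for (name, value) in row}

row = (("id", 6), ("age", 30), ("title", "QA"))
d = row_to_dict(row)
# d["title"] yields the column value "QA", not the name "title"
```

The Pig difficulty is that CqlStorage advertises a flat schema (id, age, title) while actually delivering these pair tuples, so neither positional access nor the declared field names line up with the data.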
Re: successful use of shuffle?
+1. I am still afraid of this step. You can avoid it, though, by introducing new nodes with vnodes enabled and then removing the old ones. This should work. My problem is that I am not really confident in vnodes either... Any feedback on this transition, and then on the use of vnodes, would be great indeed. Alain

2013/8/29 Robert Coli rc...@eventbrite.com:

Hi! I've been wondering... is there anyone in the cassandra-user audience who has used the shuffle feature successfully on a non-toy-or-testing cluster? If so, could you describe the experience you had and any problems you encountered? Thanks! =Rob
Re: mysterious 'column1' in cql describe
The short story is that you're probably not up to date on how CQL and thrift table definitions relate to one another, and it may not be exactly how you think it is. If you haven't done so, I'd suggest reading http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows (it should answer your question about the dynamic column name case) and http://www.datastax.com/dev/blog/thrift-to-cql3 (it should help explain how CQL3 interprets thrift tables, and why you saw what you saw). -- Sylvain

On Fri, Aug 30, 2013 at 9:50 AM, Alexander Shutyaev shuty...@gmail.com wrote:
Re: mysterious 'column1' in cql describe
Thanks, Sylvain! I'll read them most thoroughly, but after a quick glance I wish to repeat another (implied) question of mine that I believe is not answered in those articles. Why does the explicit definition of columns in a column family significantly improve performance and the key cache hit ratio (the latter being almost zero when there are no explicit column definitions)?

2013/8/30 Sylvain Lebresne sylv...@datastax.com:
[RELEASE] Apache Cassandra 1.2.9 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 1.2.9. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here: http://cassandra.apache.org/ Downloads of source and binary distributions are listed in our download section: http://cassandra.apache.org/download/ This version is a maintenance/bug fix release[1] on the 1.2 series. As always, please pay attention to the release notes[2] and let us know[3] if you encounter any problems. Enjoy! [1]: http://goo.gl/2UVSW5 (CHANGES.txt) [2]: http://goo.gl/lOZAdM (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA
Re: mysterious 'column1' in cql describe
Why does the explicit definition of columns in a column family significantly improve performance and key cache hit ratio (the last one being almost zero when there are no explicit column definitions)?

It doesn't, not in itself at least. So something else has changed, or something is wrong in your before/after comparison. But it's hard to say without at least a minimum of information on how you actually observed such a significant performance improvement (which queries, for instance). As for the key cache hit rate, adding a column definition certainly has no effect on it in itself. But defining a new secondary index might, and the code to add the column you've provided does have a setIndexType call. Again, it's hard to be definitive on that, because the code you've shown sets a CUSTOM index type without providing any index options, which is *invalid* (and rejected as such by Cassandra). So either the code above is not complete, or it's not the one you've used, or Hector is doing some weird stuff behind your back. In any case, if an index creation there has been, then *that* could easily explain a before/after performance difference. -- Sylvain
RE: Cassandra-shuffle fails
Hi, "Failed to enable shuffling" is thrown when an IOException occurs in the constructor JMXConnection(endpoint, port). See Shuffle.enableRelocations() in org.apache.cassandra.tools. Have you set up credentials for JMX? Regards, Romain

From: Tamar Rosen ta...@correlor.com
To: user@cassandra.apache.org
Cc: Vitaly Sourikov vit...@correlor.com, Yair Pinyan y...@correlor.com
Date: 29/08/2013 17:35
Subject: Cassandra-shuffle fails

Hi, We recently upgraded from version 1.1 to 1.2. It all went well, including setting up vnodes, but shuffle fails. We have 2 nodes, hosted on Amazon AWS. The steps we took (on each of our nodes) are pretty straightforward:

1. upgrade binaries
2. adjust cassandra.yaml (keep token)
3. nodetool upgradesstables
4. change cassandra.yaml to vnodes rather than tokens
5. restart cassandra
6. cassandra-shuffle create

All the above went fine. However, the following fails:

    cassandra-shuffle enable
    Failed to enable shuffling on 10.194.230.175!

Note: 1. The failure is immediate and consistent. 2. Calling shuffle create on either node prepares the shuffle files for both. 3. I made sure both servers are communicating fine on both 9160 and 7199. Any help will be greatly appreciated. Tamar

Tamar Rosen
Senior Data Architect
Correlor.com
map/reduce performance time and sstable reader...
Has anyone done performance tests on sstable reading vs. M/R? I did a quick test reading all SSTables in an LCS column family on 23 tables, and the average time sstable2json took (writing to /dev/null to make it faster) was 7 seconds per table (reading to stdout took 16 seconds per table). This then worked out to an estimate of 12.5 hours, up to 27 hours (from the to-stdout calculation). I suspect the map/reduce time may be much worse, since there are not as many repeated rows in LCS. I.e., I am wondering if I should just read from SSTables directly instead of using map/reduce? I am about to dig around in the code of M/R and sstable2json to see what each is doing specifically. Thanks, Dean
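For what it's worth, the extrapolation above implies a total sstable count in the thousands (the 23 tables measured are presumably a sample, since 23 x 7 s is only a few minutes). A quick back-of-the-envelope check - the per-table timings are from the post, while the implied totals are inferred, not stated:

```python
# Back-of-the-envelope check of the timing estimate above.
# 7 s/sstable (to /dev/null) and 16 s/sstable (to stdout) are the
# measured averages; the total sstable count is inferred from the hours.

fast_s, slow_s = 7, 16
low_estimate_h, high_estimate_h = 12.5, 27

implied_tables_low = low_estimate_h * 3600 / fast_s    # roughly 6400
implied_tables_high = high_estimate_h * 3600 / slow_s  # roughly 6100
```

Both figures land in the same ballpark, which suggests the 12.5 h and 27 h numbers come from the same underlying sstable count at the two per-table speeds.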
is there a SSTableInput for Map/Reduce instead of ColumnFamily?
Is there an SSTableInput for Map/Reduce instead of ColumnFamily (which uses thrift)? We are not worried about repeated reads since we are idempotent, but would rather have the direct speed (even if we had to read from a snapshot, that would be fine). (We would most likely run our M/R on 4 of the 12 nodes we have, since we have RF=3 right now.) Thanks, Dean
RE: Truncate question
Thank you all for your responses. Yes, I have cleared the snapshots post truncate operation. Thanks, SC

Date: Thu, 29 Aug 2013 21:41:25 -0400
Subject: Re: Truncate question
From: dmcne...@gmail.com
To: user@cassandra.apache.org

You would, however, want to clear the snapshot folder afterward, right? I thought that truncate, like drop table, created a snapshot (unless that feature has been disabled in your yaml). On Thu, Aug 29, 2013 at 6:51 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Aug 29, 2013 at 3:48 PM, S C as...@outlook.com wrote: Do we have to run nodetool repair or nodetool cleanup after truncating a Column Family? No. Why would you? =Rob
Re: Upgrade from 1.0.9 to 1.2.8
Does your previous snapshot include the system keyspace? I haven't tried upgrading from 1.0.x and then rolling back, but it's possible there are some backwards-incompatible changes. Other than that, make sure you also rolled back your config files.

On Aug 30, 2013, at 8:57 AM, Mike Neir m...@liquidweb.com wrote:
Re: Upgrade from 1.0.9 to 1.2.8
Sorry, I didn't see the test procedure, it's still early.

On Aug 30, 2013, at 8:57 AM, Mike Neir m...@liquidweb.com wrote:
Re: Upgrade from 1.0.9 to 1.2.8
On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir m...@liquidweb.com wrote: I'm faced with the need to update a 36 node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd still like to have a roll-back plan in case the rolling upgrade goes sideways. Upgrading two major versions online is an unsupported operation. I would not expect it to work. Is there a detailed reason you believe it should work between these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9, released yesterday. Everyone headed to 2.0 has to pass through 1.2.9. =Rob
Upgrade from 1.0.9 to 1.2.8
Greetings folks, I'm faced with the need to update a 36-node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd still like to have a roll-back plan in case the rolling upgrade goes sideways. I've tried to upgrade a single node in my dev cluster, then roll back using a snapshot taken previously, but things don't appear to be going smoothly. The node will rejoin the ring eventually, but not before spending some time in the Joining state as shown by nodetool ring, and spewing a ton of error messages similar to the following:

    ERROR [MutationStage:31] 2013-08-29 14:07:20,530 RowMutationVerbHandler.java (line 61) Error in row mutation
    org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178

My test procedure is as follows:

1) nodetool -h localhost snapshot
2) nodetool -h localhost drain
3) service cassandra stop
4) back up cassandra configs
5) remove cassandra 1.0.9
6) install cassandra 1.2.8
7) restore cassandra configs, alter them to remove configuration entries no longer used
8) start cassandra 1.2.8, let it run for a bit, then drain/stop it
9) remove cassandra 1.2.8
10) reinstall cassandra 1.0.9
11) restore original cassandra configs
12) remove any commit logs present
13) remove folders for the system_auth and system_traces keyspaces (since they don't seem to be present in 1.0.9)
14) move snapshots back to where they should be for 1.0.9 and remove cass 1.2.8 data:

    # cd /var/lib/cassandra/data/$KEYSPACE/
    # mv */snapshots/$TIMESTAMP/* .
    # find . -mindepth 1 -type d -exec rm -rf {} \;
    # cd /var/lib/cassandra/data/system
    # mv */snapshots/$TIMESTAMP/* .
    # find . -mindepth 1 -type d -exec rm -rf {} \;

15) start cassandra 1.0.9
16) observe cassandra system.log

Does anyone have any insight on things I may be doing wrong, or whether this is just an unavoidable pain point caused by rolling back?
It seems that since there are no schema changes going on, the node should be able to just hop back into the cluster without error and without transitioning through the Joining state. -- Mike Neir Liquid Web, Inc. Infrastructure Administrator
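Step 14 is the part most likely to go wrong by hand. As an illustration only (not Mike's actual tooling), the same file shuffling can be sketched in Python; the directory layout, snapshot tag, and file names below are invented for the example:

```python
import shutil
import tempfile
from pathlib import Path

def restore_snapshot(keyspace_dir: Path, tag: str) -> None:
    # mv */snapshots/$TIMESTAMP/* .  -- pull snapshotted SSTables out of the
    # 1.2-style per-CF directories into the flat 1.0-style keyspace directory
    for snap_file in keyspace_dir.glob(f"*/snapshots/{tag}/*"):
        shutil.move(str(snap_file), str(keyspace_dir / snap_file.name))
    # find . -mindepth 1 -type d -exec rm -rf {} \;  -- drop the now-empty
    # per-CF directories and any other 1.2.8 leftovers
    for sub in keyspace_dir.iterdir():
        if sub.is_dir():
            shutil.rmtree(sub)

# demo against a throwaway directory, not a real data dir
root = Path(tempfile.mkdtemp())
snap = root / "mycf" / "snapshots" / "1377800000"
snap.mkdir(parents=True)
(snap / "mycf-hc-1-Data.db").write_text("sstable bytes")
restore_snapshot(root, "1377800000")
restored = sorted(p.name for p in root.iterdir())
print(restored)  # ['mycf-hc-1-Data.db']
```

Running it against a scratch directory first is a cheap way to verify the restore leaves only flat SSTable files behind before touching /var/lib/cassandra.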
Re: Upgrade from 1.0.9 to 1.2.8
If you have multiple DCs you at least want to upgrade to 1.0.11. There is an issue where you might get errors during cross DC replication. On Fri, Aug 30, 2013 at 9:41 AM, Mike Neir m...@liquidweb.com wrote: [snip]
Re: Upgrade from 1.0.9 to 1.2.8
In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there is no need to do streaming operations (move/repair/bootstrap/etc). The reading I've done confirms that 1.2.x should be network-compatible with 1.0.x, sans streaming operations. Datastax seems to indicate here that doing a rolling upgrade from 1.0.x to 1.2.x is viable: http://www.datastax.com/documentation/cassandra/1.2/webhelp/#upgrade/upgradeC_c.html#concept_ds_nht_czr_ck See the second bullet point in the Prerequisites section. I'll look into 1.2.9. It wasn't available when I started my testing. MN On 08/30/2013 12:15 PM, Robert Coli wrote: On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir m...@liquidweb.com mailto:m...@liquidweb.com wrote: I'm faced with the need to update a 36 node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd still like to have a roll-back plan in case the rolling upgrade goes sideways. Upgrading two major versions online is an unsupported operation. I would not expect it to work. Is there a detailed reason you believe it should work between these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9, released yesterday. Everyone headed to 2.0 has to pass through 1.2.9. =Rob -- Mike Neir Liquid Web, Inc. Infrastructure Administrator
Update-Replace
Hi, I have a use case where I periodically need to apply updates to a wide row that should replace the whole row. A straightforward insert/update only replaces the values present in the executed statement, keeping the remaining data around. Is there a smooth way to do a replace with C*, or do I have to handle this in the application (e.g. doing a delete and then a write, or coming up with a more clever data model)? Jan
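Absent a single "replace row" operation, the usual workaround is exactly the delete-then-write Jan mentions, grouped in a batch with explicit timestamps so the new values are not shadowed by the tombstone. A hedged sketch (the table and column names are invented, and real code should use a driver with prepared statements rather than string building):

```python
import time

def replace_row_batch(table: str, key: str, columns: dict) -> str:
    """Build a CQL batch that replaces an entire wide row: one row-level
    DELETE at timestamp t, then INSERTs at t+1. The inserts must carry a
    strictly higher timestamp, because at equal timestamps the delete
    wins and the new values would be dropped."""
    t = int(time.time() * 1_000_000)  # microseconds, Cassandra's convention
    stmts = [f"DELETE FROM {table} USING TIMESTAMP {t} WHERE key = '{key}'"]
    stmts += [
        f"INSERT INTO {table} (key, column1, value) "
        f"VALUES ('{key}', '{name}', '{val}') USING TIMESTAMP {t + 1}"
        for name, val in columns.items()
    ]
    return "BEGIN BATCH\n  " + ";\n  ".join(stmts) + ";\nAPPLY BATCH;"

batch = replace_row_batch("mycf", "row42", {"a": "1", "b": "2"})
print(batch)
```

The t / t+1 split is the important design point; a batch alone does not order the delete before the inserts.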
[ANNOUNCE] Polidoro - A Cassandra client in Scala
Hi all, We've open sourced Polidoro. It's a Cassandra client in Scala on top of Astyanax and in the style of Cascal. Find it at https://github.com/SpotRight/Polidoro -Lanny Ripple SpotRight, Inc - http://spotright.com
Re: Upgrade from 1.0.9 to 1.2.8
You probably want to go to 1.0.11/12 first no matter what. For the least chance of issues you should then go to 1.1.12. While there is a high probability that going straight from 1.0.x to 1.2 will work, you have the best chance of no failures if you go through 1.1.12; there are some edge cases that can cause errors if you don't do that. -Jeremiah On Aug 30, 2013, at 11:41 AM, Mike Neir m...@liquidweb.com wrote: [snip]
Re: CQL Thrift
If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: Hi, If i a create a table with CQL3 as create table user(user_id text PRIMARY KEY, first_name text, last_name text, emailid text); and create index as: create index on user(first_name); then inserted some data as: insert into user(user_id,first_name,last_name,emailId) values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in'); Then if update same column family using Cassandra-cli as: update column family user with key_validation_class='UTF8Type' and column_metadata=[{column_name:last_name, validation_class:'UTF8Type', index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', index_type:KEYS}]; Now if i connect via cqlsh and explore user table, i can see column first_name,last_name are not part of table structure anymore. Here is the output: CREATE TABLE user ( key text PRIMARY KEY ) WITH bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0.00 AND gc_grace_seconds=864000 AND read_repair_chance=0.10 AND replicate_on_write='true' AND populate_io_cache_on_flush='false' AND compaction={'class': 'SizeTieredCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'}; cqlsh:cql3usage select * from user; user_id - @mevivs I understand that, CQL3 and thrift interoperability is an issue. But this looks to me a very basic scenario. Any suggestions? Or If anybody can explain a reason behind this? -Vivek
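Jon's "don't mix" rule has a practical corollary: everything Vivek did through cassandra-cli can be expressed in CQL3 itself, so the CQL3 schema metadata never gets clobbered. A sketch of the CQL-only path, reusing the table and index from the thread:

```
CREATE TABLE user (
  user_id text PRIMARY KEY,
  first_name text,
  last_name text,
  emailid text
);

-- secondary indexes via CQL3 instead of cassandra-cli column_metadata
CREATE INDEX ON user (first_name);
CREATE INDEX ON user (last_name);
```

Staying in cqlsh for both the table and the indexes keeps the column definitions visible to DESCRIBE.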
Re: is there a SSTAbleInput for Map/Reduce instead of ColumnFamily?
FYI: http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html -Jeremiah On Aug 30, 2013, at 9:21 AM, Hiller, Dean dean.hil...@nrel.gov wrote: is there a SSTableInput for Map/Reduce instead of ColumnFamily (which uses thrift)? We are not worried about repeated reads since we are idempotent but would rather have the direct speed (even if we had to read from a snapshot, it would be fine). (We would most likely run our M/R on 4 nodes of the 12 nodes we have since we have RF=3 right now). Thanks, Dean
Re: CQL Thrift
And surprisingly if i alter table as: alter table user add first_name text; alter table user add last_name text; It gives me back the columns with values, but still no indexes. Thrift and CQL3 depend on the same storage engine. Do they really maintain different metadata for the same column family? -Vivek On Fri, Aug 30, 2013 at 11:08 PM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
in my case, I built a temporal database on top of Cassandra, so it's absolutely key. Dynamic columns are super powerful, which relational database have no equivalent. For me, that is one of the top 3 reasons for using Cassandra. On Fri, Aug 30, 2013 at 2:03 PM, Vivek Mishra mishra.v...@gmail.com wrote: If you talk about comparator. Yes, that's a valid point and not possible with CQL3. -Vivek On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin wool...@gmail.com wrote: I use dynamic columns all the time and they vary in type. With CQL you can define a default type, but you can't insert specific types of data for column name and value. It forces you to use all bytes or all strings, which would require coverting it to other types. thrift is much more powerful in that respect. not everyone needs to take advantage of the full power of dynamic columns. On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've build our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. 
On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
Hi, I understand that, but i want to understand the reason behind such behavior? Is it because of maintaining different metadata objects for CQL3 and thrift? Any suggestion? -Vivek On Fri, Aug 30, 2013 at 11:15 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
Could you please give a more concrete example? On Aug 30, 2013, at 11:10 AM, Peter Lin wool...@gmail.com wrote: in my case, I built a temporal database on top of Cassandra, so it's absolutely key. Dynamic columns are super powerful, which relational database have no equivalent. For me, that is one of the top 3 reasons for using Cassandra. On Fri, Aug 30, 2013 at 2:03 PM, Vivek Mishra mishra.v...@gmail.com wrote: If you talk about comparator. Yes, that's a valid point and not possible with CQL3. -Vivek On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin wool...@gmail.com wrote: I use dynamic columns all the time and they vary in type. With CQL you can define a default type, but you can't insert specific types of data for column name and value. It forces you to use all bytes or all strings, which would require coverting it to other types. thrift is much more powerful in that respect. not everyone needs to take advantage of the full power of dynamic columns. On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've build our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. 
On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
-- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: CQL Thrift
Just curious - what do you need to do that requires thrift? We've build our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. I agree, but partly. You can always create a column family with key, column and value, store any number of arbitrary column names in column, and each corresponding value in value. I find it much easier. Coming back to the original question, i think the differentiator is how column metadata is treated in thrift and CQL3. What i do not understand is: if two sets of metadata objects (CqlMetadata, CFDef) are maintained for the same column family, why would updating one cause trouble for the other? -Vivek On Fri, Aug 30, 2013 at 11:23 PM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
I use dynamic columns all the time and they vary in type. With CQL you can define a default type, but you can't insert specific types of data for column name and value. It forces you to use all bytes or all strings, which would require converting to other types. Thrift is much more powerful in that respect. Not everyone needs to take advantage of the full power of dynamic columns. On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've build our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
True for newly built platform(s), but what about existing apps built using thrift? As per http://www.datastax.com/dev/blog/thrift-to-cql3 it should be easy. I am just curious to understand the real reason behind such behavior. -Vivek On Fri, Aug 30, 2013 at 11:28 PM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've build our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
If you talk about comparator. Yes, that's a valid point and not possible with CQL3. -Vivek On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin wool...@gmail.com wrote: I use dynamic columns all the time and they vary in type. With CQL you can define a default type, but you can't insert specific types of data for column name and value. It forces you to use all bytes or all strings, which would require coverting it to other types. thrift is much more powerful in that respect. not everyone needs to take advantage of the full power of dynamic columns. On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've build our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. 
On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
In the interest of education and discussion: I didn't mean to say CQL3 doesn't support dynamic columns. The example from the page shows a default type defined in the create statement:

create column family data with key_validation_class=Int32Type and comparator=DateType and default_validation_class=FloatType;

If I try to insert a dynamic column that uses a double for the column name and a string for the column value, it will throw an error. The kind of use case I'm talking about defines a minimum number of static columns; most of the columns added at runtime have different name and value types. This is specific to my use case.

Having said that, I believe it would be possible to provide that kind of feature in CQL, but the trade-off is that it deviates from SQL. The grammar would have to allow type declarations in the column list and functions in the values. Something like

insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values ('abc123', 'some string', double(102.211))

where doubleType(newcol1) and string(newcol2) are dynamic columns. I know many people find thrift hard to grok and struggle with it, but I'm a firm believer in taking time to learn. Every developer should take time to read the Cassandra source code and the source code of the driver they're using.

On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.com wrote: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows

On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote: My biased perspective: I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns.

On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix.
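[Editor's note: the hypothetical per-column typing Peter describes works over Thrift because column names and values travel as raw byte buffers, with the client choosing a serializer per column. A minimal Python sketch of that idea - the helper names `double_type` and `utf8_type` are illustrative only, not a real client API:]

```python
import struct

def double_type(value: float) -> bytes:
    """Serialize the way Cassandra's DoubleType expects: 8-byte big-endian IEEE 754."""
    return struct.pack(">d", value)

def utf8_type(value: str) -> bytes:
    """Serialize the way Cassandra's UTF8Type expects: plain UTF-8 bytes."""
    return value.encode("utf-8")

# A dynamic column whose *name* is a double and whose *value* is a string --
# legal over Thrift, since the server just sees two byte buffers.
column_name = double_type(102.211)
column_value = utf8_type("some string")

assert len(column_name) == 8
assert struct.unpack(">d", column_name)[0] == 102.211
```

In a Thrift client these buffers would be handed to the insert call directly; the comparator declared on the column family (here DateType in Peter's example) is what constrains which name encodings the server will accept.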
Re: CQL Thrift
It sounds like you want this:

create table data ( pk int, colname blob, value blob, primary key (pk, colname));

That gives you arbitrary columns (cleverly labeled colname) in a single row, where the value is value. If you don't want the overhead of storing colname in every row, try with compact storage. Does this solve the problem, or am I missing something?

On Aug 30, 2013, at 11:45 AM, Peter Lin wool...@gmail.com wrote: You could dynamically create new tables at runtime and insert rows into the new table, but is that better than using thrift and putting it into a regular dynamic column with the exact name type and value type? It would mean that if there are 20 dynamic columns of different types, you'd have to execute 21 queries to rebuild the data. That's basically the same as using EAV (entity-attribute-value) tables in relational databases. Having used that approach in the past to build temporal databases, it doesn't scale well.

On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra mishra.v...@gmail.com wrote: Create a column family as:

create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob);

insert into dynamicTable(key, nameAsDouble, valueAsBlob) values (key, double(102.211), textAsBlob('valueInBytes'));

Do you think it will work in case the column name is a double? -Vivek

On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote: In the interest of education and discussion. I didn't mean to say CQL3 doesn't support dynamic columns. The example from the page shows a default type defined in the create statement: create column family data with key_validation_class=Int32Type and comparator=DateType and default_validation_class=FloatType; If I try to insert a dynamic column that uses a double for the column name and a string for the column value, it will throw an error. The kind of use case I'm talking about defines a minimum number of static columns. Most of the columns added at runtime have different name and value types. This is specific to my use case.
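[Editor's note: with Jon's blob-named schema, slice queries stay useful only if the byte encoding of the name sorts the same way as the original values, since Cassandra compares blob clustering columns as unsigned bytes. A Python sketch of that property for big-endian doubles - shown for non-negative values only; negative doubles would need sign-bit fix-ups:]

```python
import struct

def encode_colname(value: float) -> bytes:
    """Encode a numeric column name for the 'colname blob' clustering column."""
    return struct.pack(">d", value)

values = [0.5, 1.0, 2.75, 10.0, 102.211]
encoded = [encode_colname(v) for v in values]

# Unsigned lexicographic (byte-wise) order matches numeric order,
# so range slices on colname behave like range queries on the doubles.
assert sorted(encoded) == encoded
```

This is why the blob-typed table can still serve ordered wide-row reads: the application owns the typing, and the database only ever sees comparable byte strings.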
Re: CQL Thrift
On Fri, Aug 30, 2013 at 10:58 AM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've built our entire platform using CQL3 and we haven't hit any issues.

Here's one thing: if you're using wide rows and you want to do anything other than just append individual columns to the row, then CQL3 (as it functions currently) is way too slow. I just created the following Jira issue 5 minutes ago because we've been fighting with this for the last 2 days. Our workaround was to swap out CQL3 + DataStax Java Driver in favor of Astyanax for this particular use case: https://issues.apache.org/jira/browse/CASSANDRA-5959 Cheers, -- Les Hazlewood | @lhazlewood CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282
Re: CQL Thrift
Did you try to explore CQL3 collection support for the same? You can definitely save on the number of rows with that. The point I am trying to make is that you can achieve it via CQL3 (Jonathan's blog: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows). I agree with you that thrift may still have some valid points, but considering the latest development around new Cassandra features, I think CQL3 is the path to follow. -Vivek

On Sat, Aug 31, 2013 at 12:15 AM, Peter Lin wool...@gmail.com wrote: You could dynamically create new tables at runtime and insert rows into the new table, but is that better than using thrift and putting it into a regular dynamic column with the exact name type and value type? It would mean that if there are 20 dynamic columns of different types, you'd have to execute 21 queries to rebuild the data. That's basically the same as using EAV tables in relational databases. Having used that approach in the past to build temporal databases, it doesn't scale well.

On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra mishra.v...@gmail.com wrote: Create a column family as: create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob); insert into dynamicTable(key, nameAsDouble, valueAsBlob) values (key, double(102.211), textAsBlob('valueInBytes')); Do you think it will work in case the column name is a double? -Vivek

On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote: In the interest of education and discussion. I didn't mean to say CQL3 doesn't support dynamic columns. The example from the page shows a default type defined in the create statement: create column family data with key_validation_class=Int32Type and comparator=DateType and default_validation_class=FloatType; If I try to insert a dynamic column that uses a double for the column name and a string for the column value, it will throw an error. The kind of use case I'm talking about defines a minimum number of static columns.
Re: CQL Thrift
@lhazlewood https://issues.apache.org/jira/browse/CASSANDRA-5959

BEGIN BATCH ... multiple insert statements ... APPLY BATCH

It doesn't work for you? -Vivek

On Sat, Aug 31, 2013 at 12:21 AM, Les Hazlewood lhazlew...@apache.org wrote: On Fri, Aug 30, 2013 at 10:58 AM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've built our entire platform using CQL3 and we haven't hit any issues. Here's one thing: if you're using wide rows and you want to do anything other than just append individual columns to the row, then CQL3 (as it functions currently) is way too slow. I just created the following Jira issue 5 minutes ago because we've been fighting with this for the last 2 days. Our workaround was to swap out CQL3 + DataStax Java Driver in favor of Astyanax for this particular use case: https://issues.apache.org/jira/browse/CASSANDRA-5959 Cheers, -- Les Hazlewood | @lhazlewood CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282
CQL Thrift
Hi, if I create a table with CQL3 as

create table user(user_id text PRIMARY KEY, first_name text, last_name text, emailid text);

and create an index as

create index on user(first_name);

then insert some data as

insert into user(user_id,first_name,last_name,emailId) values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');

and then update the same column family using cassandra-cli as

update column family user with key_validation_class='UTF8Type' and column_metadata=[{column_name:last_name, validation_class:'UTF8Type', index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', index_type:KEYS}];

then, if I connect via cqlsh and explore the user table, I can see the columns first_name and last_name are not part of the table structure anymore. Here is the output:

CREATE TABLE user ( key text PRIMARY KEY ) WITH bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0.00 AND gc_grace_seconds=864000 AND read_repair_chance=0.10 AND replicate_on_write='true' AND populate_io_cache_on_flush='false' AND compaction={'class': 'SizeTieredCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'};

cqlsh:cql3usage> select * from user;

 user_id
---------
 @mevivs

I understand that CQL3 and thrift interoperability is an issue, but this looks to me like a very basic scenario. Any suggestions? Can anybody explain the reason behind this? -Vivek
Re: Upgrade from 1.0.9 to 1.2.8
Is there anything you can link that describes the pitfalls you mention? I'd like a bit more information. Just for clarity's sake, are you recommending 1.0.9 -> 1.0.12 -> 1.1.12 -> 1.2.x? Or would 1.0.9 -> 1.1.12 -> 1.2.x suffice? Regarding the placement strategy mentioned in a different post, I'm using the Simple placement strategy, with the RackInferringSnitch. How does that play into the bugs mentioned previously about cross-DC replication? MN

On 08/30/2013 01:28 PM, Jeremiah D Jordan wrote: You probably want to go to 1.0.11/12 first no matter what. If you want the least chance of issues you should then go to 1.1.12. While there is a high probability that going straight from 1.0.x to 1.2 will work, you have the best chance of no failures if you go through 1.1.12. There are some edge cases that can cause errors if you don't do that. -Jeremiah
Re: CQL Thrift
CQL3 collections are meant to store list, set, and map data. Plus, collections currently do not support secondary indexes. The point is that you often don't know what columns are needed at design time. If you know what's needed, use static columns. Using a list, set or map to store data you don't know and can't predict feels like a hammer solution. Cassandra has this super powerful and useful feature that developers can use via thrift. The last time I looked, DataStax's official statement was that thrift isn't going away, so I take them at their word.

On Fri, Aug 30, 2013 at 2:51 PM, Vivek Mishra mishra.v...@gmail.com wrote: Did you try to explore CQL3 collection support for the same? You can definitely save on the number of rows with that. The point I am trying to make is that you can achieve it via CQL3 (Jonathan's blog: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows). I agree with you that thrift may still have some valid points, but considering the latest development around new Cassandra features, I think CQL3 is the path to follow. -Vivek

On Sat, Aug 31, 2013 at 12:15 AM, Peter Lin wool...@gmail.com wrote: You could dynamically create new tables at runtime and insert rows into the new table, but is that better than using thrift and putting it into a regular dynamic column with the exact name type and value type? It would mean that if there are 20 dynamic columns of different types, you'd have to execute 21 queries to rebuild the data. That's basically the same as using EAV tables in relational databases. Having used that approach in the past to build temporal databases, it doesn't scale well.

On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra mishra.v...@gmail.com wrote: Create a column family as: create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob); insert into dynamicTable(key, nameAsDouble, valueAsBlob) values (key, double(102.211), textAsBlob('valueInBytes'));
Re: successful use of shuffle?
You need to introduce the new vnode-enabled nodes in a new DC, or you will have similar issues to https://issues.apache.org/jira/browse/CASSANDRA-5525

Add a vnode DC: http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html

Point clients to the new DC.

Remove the non-vnode DC: http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_decomission_dc_t.html

-Jeremiah

On Aug 30, 2013, at 3:04 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: +1. I am still afraid of this step. Yet you can avoid it by introducing new nodes, with vnodes enabled, and then removing the old ones. This should work. My problem is that I am not really confident in vnodes either... Any experience shared on this transition, and on the use of vnodes, would be great indeed. Alain

2013/8/29 Robert Coli rc...@eventbrite.com: Hi! I've been wondering... is there anyone in the cassandra-user audience who has used the shuffle feature successfully on a non-toy-or-testing cluster? If so, could you describe the experience you had and any problems you encountered? Thanks! =Rob
Re: CQL Thrift
On Fri, Aug 30, 2013 at 11:56 AM, Vivek Mishra mishra.v...@gmail.com wrote: @lhazlewood https://issues.apache.org/jira/browse/CASSANDRA-5959 Begin batch ... multiple insert statements ... apply batch. It doesn't work for you? -Vivek

According to the OP, batching inserts is slow. The SO thread [1] mentions that in their environment a BATCH takes 1.5 min, while the Thrift-based approach takes around 235 ms.

[1] http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque

-- :- a) Alex Popescu Sen. Product Manager @ DataStax @al3xandru
Re: CQL Thrift
It seems really strange to me that you're creating a table with specific types and then trying to deviate from them. Why not just use the blob type? Then you can store whatever you want in there. The whole point of adding strong typing is to adhere to it. I wouldn't consider it a fault of the database that it does what you asked it to.

On Aug 30, 2013, at 11:33 AM, Peter Lin wool...@gmail.com wrote: In the interest of education and discussion. I didn't mean to say CQL3 doesn't support dynamic columns. The example from the page shows a default type defined in the create statement: create column family data with key_validation_class=Int32Type and comparator=DateType and default_validation_class=FloatType; If I try to insert a dynamic column that uses a double for the column name and a string for the column value, it will throw an error. The kind of use case I'm talking about defines a minimum number of static columns. Most of the columns added at runtime have different name and value types. This is specific to my use case. Having said that, I believe it would be possible to provide that kind of feature in CQL, but the trade-off is that it deviates from SQL. The grammar would have to allow type declarations in the column list and functions in the values. Something like insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values ('abc123', 'some string', double(102.211)), where doubleType(newcol1) and string(newcol2) are dynamic columns. I know many people find thrift hard to grok and struggle with it, but I'm a firm believer in taking time to learn. Every developer should take time to read the Cassandra source code and the source code of the driver they're using.

On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.com wrote: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows

On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote: My biased perspective: I find the sweet spot is thrift for insert/update and CQL for select queries.
Re: CQL3 wide row and slow inserts - is there a single insert alternative?
Well, it appears that this just isn't possible. I created CASSANDRA-5959 as a result (backstory + performance testing results are described in the issue): https://issues.apache.org/jira/browse/CASSANDRA-5959 -- Les Hazlewood | @lhazlewood CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282

On Thu, Aug 29, 2013 at 12:04 PM, Les Hazlewood lhazlew...@apache.org wrote: Hi all, We're using a Cassandra table to store search results in a table/column family that looks like this:

        | 0       | 1       | 2       | ...
 row_id | text... | text... | text... | ...

The column name is the index # (an integer) of the location in the overall result set. The value is the result at that particular index. This is great because pagination becomes a simple slice query on the column name. Large result sets are split into multiple rows - we're limiting row size on disk to around 6 or 7 MB. For our particular result entries, this means we can get around 50,000 columns in a single row. When we create the rows, we have the entire data available in the application at the time the row insert is necessary. Using CQL3, an initial implementation had one INSERT statement per column. This was killing performance (not to mention the # of tombstones it created). Here's the CQL3 table definition:

create table query_results ( row_id text, shard_num int, list_index int, result text, primary key ((row_id, shard_num), list_index)) with compact storage;

(The row key is row_id + shard_num. The 'cluster column' is list_index.) I don't want to execute 50,000 INSERT statements for a single row. We have all of the data up front - I want to execute a single INSERT. Is this possible? We're using the Datastax Java Driver. Thanks for any help! Les
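[Editor's note: the sharding scheme above is plain arithmetic, and a short sketch may make it concrete. `RESULTS_PER_SHARD` and the helper names are assumptions for illustration, based on the ~50,000-columns-per-row figure in the post:]

```python
RESULTS_PER_SHARD = 50_000  # rough per-row cap implied by the 6-7 MB row-size limit

def locate(global_index: int):
    """Map a position in the overall result set to (shard_num, list_index)."""
    return divmod(global_index, RESULTS_PER_SHARD)

def page_bounds(page: int, page_size: int):
    """Slice bounds for one page: (shard_num, first list_index, last list_index).
    Assumes pages never straddle a shard, i.e. RESULTS_PER_SHARD % page_size == 0."""
    start = page * page_size
    shard, first = locate(start)
    return shard, first, first + page_size - 1

assert locate(0) == (0, 0)
assert locate(50_000) == (1, 0)       # first result of the second shard
assert page_bounds(2, 100) == (0, 200, 299)
```

A page read then becomes one partition-restricted range query, e.g. `SELECT result FROM query_results WHERE row_id = ? AND shard_num = ? AND list_index >= ? AND list_index <= ?`.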
Re: CQL Thrift
Create a column family as:

create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob);

insert into dynamicTable(key, nameAsDouble, valueAsBlob) values (key, double(102.211), textAsBlob('valueInBytes'));

Do you think it will work in case the column name is a double? -Vivek

On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote: In the interest of education and discussion. I didn't mean to say CQL3 doesn't support dynamic columns. The example from the page shows a default type defined in the create statement: create column family data with key_validation_class=Int32Type and comparator=DateType and default_validation_class=FloatType; If I try to insert a dynamic column that uses a double for the column name and a string for the column value, it will throw an error. The kind of use case I'm talking about defines a minimum number of static columns. Most of the columns added at runtime have different name and value types. This is specific to my use case. Having said that, I believe it would be possible to provide that kind of feature in CQL, but the trade-off is that it deviates from SQL. The grammar would have to allow type declarations in the column list and functions in the values. Something like insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values ('abc123', 'some string', double(102.211)), where doubleType(newcol1) and string(newcol2) are dynamic columns. I know many people find thrift hard to grok and struggle with it, but I'm a firm believer in taking time to learn. Every developer should take time to read the Cassandra source code and the source code of the driver they're using.

On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.com wrote: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows

On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote: My biased perspective: I find the sweet spot is thrift for insert/update and CQL for select queries.
Re: CQL Thrift
Yes, that's correct, and that's a scaled number. In practice, on the local dev machine, CQL3 inserting 10,000 columns (for 1 row) in a BATCH took 1.5 minutes. 50,000 columns (the desired amount) in a BATCH took 7.5 minutes. The same Thrift functionality took _235 milliseconds_. That's almost 2,000 times faster (3 orders of magnitude)! However, according to Aleksey Yeschenko, this performance problem has been addressed in 2.0 beta 1 via https://issues.apache.org/jira/browse/CASSANDRA-4693. I'll reserve judgement until I can performance-test 2.0 beta 1 ;)

Cheers,

--
Les Hazlewood | @lhazlewood
CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282

On Fri, Aug 30, 2013 at 12:50 PM, Alex Popescu al...@datastax.com wrote:

On Fri, Aug 30, 2013 at 11:56 AM, Vivek Mishra mishra.v...@gmail.com wrote:

@lhazlewood https://issues.apache.org/jira/browse/CASSANDRA-5959

    begin batch
    <multiple insert statements>
    apply batch

Doesn't that work for you?

-Vivek

According to the OP, batching the inserts is slow: the SO thread [1] mentions that in their environment a BATCH takes 1.5 min, while the Thrift-based approach takes around 235 ms.

[1] http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque

--
:- a)
Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru
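Until a fixed version is deployed, one generic client-side mitigation is to cap batch sizes rather than send one giant BATCH. A minimal Python sketch of the chunking (the chunk size of 1,000 is arbitrary for illustration, not a recommendation from this thread):

```python
def chunked(items, size):
    """Yield successive fixed-size chunks of a list; the last may be smaller."""
    for i in range(0, len(items), size):
        yield items[i : i + size]

# 50,000 column inserts split into batches of 1,000 -> 50 round trips,
# each small enough to avoid the pathological single-BATCH behavior.
columns = list(range(50_000))
batches = list(chunked(columns, 1_000))
print(len(batches), "batches")
```

Each chunk would then be sent as its own BATCH statement by whatever driver is in use.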
Re: CQL Thrift
This has nothing to do with compact storage. Cassandra supports arbitrary dynamic columns of different name/value types today. If people are happy with the SQL metaphor, then CQL is fine. Then again, if the SQL metaphor were good for temporal databases, there wouldn't be so many failed temporal databases built on relational databases. I've built over 4 bi-temporal databases on RDBs over the last 12 years, so it's not something that was done lightly; it came from years of pain. I won't bore others with the challenges of building temporal databases.

On Fri, Aug 30, 2013 at 2:51 PM, Jon Haddad j...@jonhaddad.com wrote:

It sounds like you want this:

    create table data (
        pk int,
        colname blob,
        value blob,
        primary key (pk, colname));

That gives you arbitrary columns (cleverly labeled colname) in a single row, where the value is value. If you don't want the overhead of storing colname in every row, try WITH COMPACT STORAGE. Does this solve the problem, or am I missing something?

On Aug 30, 2013, at 11:45 AM, Peter Lin wool...@gmail.com wrote:

You could dynamically create new tables at runtime and insert rows into the new table, but is that better than using thrift and putting it into a regular dynamic column with the exact name type and value type? That would mean that if there are 20 dynamic columns of different types, you'd have to execute 21 queries to rebuild the data. That's basically the same as using EAV tables in relational databases. Having used that approach in the past to build temporal databases, it doesn't scale well.

On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra mishra.v...@gmail.com wrote:

Create a column family as:

    create table dynamicTable (key text, nameAsDouble double, valueAsBlob blob);

    insert into dynamicTable (key, nameAsDouble, valueAsBlob)
    values ('key', 102.211, textAsBlob('valueInBytes'));

Do you think it will work in the case where the column names are doubles?

-Vivek

On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote: In the interest of education and discussion.
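Jon's (pk, colname, value) layout can be modeled in a few lines. A toy Python sketch (purely illustrative, not Cassandra internals) showing the essence of the pattern: arbitrary column names kept sorted inside one partition, the way clustering keys are, so that range slices over names stay cheap:

```python
import bisect

class WideRow:
    """Toy model of one partition of a (pk, colname, value) table:
    dynamic column names act as clustering keys, kept in sorted order."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = []  # values, parallel to _names

    def put(self, colname, value):
        """Insert or overwrite one dynamic column."""
        i = bisect.bisect_left(self._names, colname)
        if i < len(self._names) and self._names[i] == colname:
            self._values[i] = value
        else:
            self._names.insert(i, colname)
            self._values.insert(i, value)

    def slice(self, start, end):
        """Inclusive range query over column names, like a clustering slice."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return list(zip(self._names[lo:hi], self._values[lo:hi]))

# Timestamp-named columns, as in the dynamic-column use case:
row = WideRow()
row.put("2013-08-30", b"v1")
row.put("2013-08-31", b"v2")
row.put("2013-08-29", b"v0")
hits = row.slice("2013-08-30", "2013-08-31")
```

Note what the model does not give you: every value is a blob, so per-column value types (Peter's objection) still have to be encoded and decoded by the application.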
Re: CQL Thrift
My biased perspective: I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns.

On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix.
Re: CqlStorage creates wrong schema for Pig
I threw together a quick UDF to work around this issue. It just extracts the value portion of the tuple while taking advantage of the CqlStorage-generated schema to keep the type correct. You can get it here: https://github.com/iamthechad/cqlstorage-udf

I'll see if I can find more useful information and open a defect, since that's what this seems to be.

Chad

On Fri, Aug 30, 2013 at 2:02 AM, Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com wrote:

I tried this:

    rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
    dump rows;
    ILLUSTRATE rows;
    describe rows;

    values2 = FOREACH rows GENERATE TOTUPLE(id) as (mycolumn:tuple(name,value));
    dump values2;
    describe values2;

But I get these results:

    ---------------------------------------------------------
    | rows | id:chararray | age:int   | title:chararray     |
    ---------------------------------------------------------
    |      | (id, 6)      | (age, 30) | (title, QA)         |
    ---------------------------------------------------------

    rows: {id: chararray,age: int,title: chararray}

    2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031:
    Incompatable field schema: left is
    tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray)), right is
    org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)

or:

    values2 = FOREACH rows GENERATE TOTUPLE(id);
    dump values2;
    describe values2;

and the results are:

    ...
    (((id,6)))
    (((id,5)))

    values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}

Aggg!

Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com

2013/8/26 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com

Hi Chad,

I have this issue too. I sent a mail to the pig user list and I still can't resolve it: I cannot access the column values. In that mail I describe some things I tried without results, along with more information about this issue:

http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3ccajeg_hq9s2po3_xytzx5xki4j1mao8q26jydg2wndy_kyiv...@mail.gmail.com%3E

I hope someone replies with a comment, idea or solution for this issue or bug.
I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but I have not configured the environment to debug and trace this issue. I only found some comments like the following, which I do not fully understand:

    /**
     * A LoadStoreFunc for retrieving data from and storing data to Cassandra
     *
     * A row from a standard CF will be returned as nested tuples:
     * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
     */

If you find some idea or solution, please post it.

Thanks

2013/8/23 Chad Johnston cjohns...@megatome.com

(I'm using Cassandra 1.2.8 and Pig 0.11.1.)

I'm loading some simple data from Cassandra into Pig using CqlStorage. The CqlStorage loader defines a Pig schema based on the Cassandra schema, but it seems to be wrong. If I do:

    data = LOAD 'cql://bookdata/books' USING CqlStorage();
    DESCRIBE data;

I get this:

    data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}

However, if I DUMP data, I get results like these:

    ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

Clearly the results from Cassandra are key/value pairs, as would be expected. I don't know why the schema generated by CqlStorage() is so different. This is really causing me problems when trying to access the column values. I tried a naive approach of FLATTENing each tuple, then trying to access the values that way:

    flattened = FOREACH data GENERATE FLATTEN(isbn), FLATTEN(booktitle), ...
    values = FOREACH flattened GENERATE $1 AS ISBN, $3 AS BookTitle, ...

As soon as I try to access field $5, Pig complains about the index being out of bounds.

Is there a way to solve the schema/reality mismatch? Am I doing something wrong, or have I stumbled across a defect?

Thanks,
Chad
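Chad's workaround UDF boils down to turning the dumped (name, value) pairs back into name-addressable fields instead of positional ones. A minimal Python sketch of the same idea (a hypothetical helper for illustration, not the actual Java UDF from the repository above):

```python
def extract_values(row):
    """row: a sequence of (name, value) pairs, as CqlStorage actually
    dumps them. Returns a dict so fields can be fetched by column name
    rather than by fragile positional index ($1, $3, ...)."""
    return {name: value for name, value in row}

# A record shaped like the DUMP output in Chad's message:
record = [
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
]
fields = extract_values(record)
print(fields["booktitle"])
```

The real fix, as noted above, is for the loader's declared schema to match the tuples it emits; until then, this kind of value extraction bridges the gap.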
Selecting multiple rows with composite partition keys using CQL3
Hello,

I've been trying to figure out how to port my application to CQL3 based on http://cassandra.apache.org/doc/cql3/CQL.html. I have a table with the primary key ((app, name), timestamp), so the partition key is composite (on app and name). I'm trying to figure out if there is a way to select multiple rows that span partition keys. Basically, I am trying to do:

    SELECT ... WHERE (app = 'foo' AND name = 'bar' AND timestamp = 123)
            OR (app = 'foo' AND name = 'hello' AND timestamp = 123)
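Since CQL3 has no OR across composite partition keys, one common client-side workaround is to issue one query per partition and merge the results. A minimal Python sketch of that fan-out (the query_one callable and fake_db stub are hypothetical stand-ins for a real driver call, just to keep the example runnable):

```python
def select_many(query_one, partitions, timestamp):
    """Issue one single-partition query per (app, name) pair and merge.
    query_one(app, name, timestamp) -> list of row dicts."""
    results = []
    for app, name in partitions:
        results.extend(query_one(app, name, timestamp))
    return results

# Stub standing in for the real per-partition SELECT:
fake_db = {
    ("foo", "bar", 123):   [{"app": "foo", "name": "bar",   "ts": 123}],
    ("foo", "hello", 123): [{"app": "foo", "name": "hello", "ts": 123}],
}
rows = select_many(
    lambda app, name, ts: fake_db.get((app, name, ts), []),
    [("foo", "bar"), ("foo", "hello")],
    123,
)
```

In practice the per-partition queries can be issued concurrently, since each one is routed independently to the replicas owning that partition.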
Data Modeling help for representing a survey form.
I have an existing system in Postgres that I would like to move to Cassandra. The system is for building registration forms for conferences. For example, you might want to build a registration form (or survey) that has a bunch of questions on it. An overview of this system is whiteboarded here: http://paste2.org/JeHP1tV0

What I'm trying to figure out is how this data should be structured in a de-normalized way. The basic queries would be:

1. Give me all surveys for an account.
2. Give me all questions for a survey.
3. Give me all responses for a survey.
4. Give me all responses for a specific question.
5. Compare responses for the question "What is your favorite color" with people who answered the question "What is your gender", i.e. a crosstab of males/females and the colors they like.
6. Give me a time series of how many people responded to a question per hour.

The reason I would like to get it onto Cassandra is that at peak times this is an extremely write-heavy application: everyone comes in all at once when a conference opens registration or a new survey goes out.

Also, if anyone is in the bay area and wants to discuss Cassandra data modeling over some beers, let me know!

Thanks,
John
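The usual Cassandra answer to a query list like this is one denormalized table per query, all written together on every response. A toy Python sketch of that write fan-out for queries 3, 4 and 6 (plain dicts stand in for tables; every name here is hypothetical, not a schema recommendation):

```python
from collections import defaultdict

# One in-memory stand-in per query the data must serve:
responses_by_survey = defaultdict(list)    # query 3: all responses for a survey
responses_by_question = defaultdict(list)  # query 4: responses for one question
responses_per_hour = defaultdict(int)      # query 6: hourly response counts

def record_response(survey_id, question_id, answer, epoch_seconds):
    """Write-heavy, read-optimized: fan one response out to every table
    at write time, so each query above is a single-partition read."""
    responses_by_survey[survey_id].append((question_id, answer))
    responses_by_question[(survey_id, question_id)].append(answer)
    hour_bucket = epoch_seconds - epoch_seconds % 3600  # truncate to the hour
    responses_per_hour[(survey_id, question_id, hour_bucket)] += 1

# Two responses landing in the same hour bucket:
record_response("s1", "q_color", "blue", 7200)
record_response("s1", "q_color", "red", 7260)
```

In Cassandra each dict key would be a partition key and each counter a counter column; the crosstab in query 5 would likewise get its own table keyed on the (color, gender) answer pair.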
Is it possible to synchronous run Cassandra Triggers?
Hi all,

I am interested in using the new Cassandra Triggers feature to implement a synchronous (or asynchronous but with a deadline) index on Cassandra. The Trigger API allows one to define a mutation job to be done (in the future), but is there any way to control when the (asynchronously executed) job is actually executed? Or is there any way to control the execution model of triggers, e.g. to switch between a synchronous mode and an asynchronous mode?

Regards,
Yun
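What "synchronous" would mean here can be illustrated with a toy model (this is not the Cassandra trigger API, which augments mutations in Java; all names are invented): the trigger derives an index mutation from the base write, and that mutation is applied in the same call, before the write is acknowledged.

```python
# Toy stand-ins for a base table and its secondary index:
table = {}
index = {}

def index_trigger(key, value):
    """Trigger: derive index mutations from a base-table write.
    Returns (table_name, index_key, index_value) tuples."""
    return [("index", value, key)]

def write(key, value, triggers=(index_trigger,)):
    """Synchronous model: base write and all trigger-derived writes are
    applied before the caller gets an acknowledgement."""
    table[key] = value
    for trig in triggers:
        for _tbl, idx_key, idx_val in trig(key, value):
            index[idx_key] = idx_val
    return True  # acked only after the index is also up to date

write("row1", "blue")
```

An asynchronous-with-deadline variant would instead queue the trigger output and ack immediately, with a separate consumer obliged to drain the queue within the deadline; whether Cassandra's trigger execution can be controlled that way is exactly the open question above.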