mysterious 'column1' in cql describe

2013-08-30 Thread Alexander Shutyaev
Hi all!

We have encountered the following problem. We create our column families
via hector like this:

ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition(
    "mykeyspace", "mycf");
cfdef.setColumnType(ColumnType.STANDARD);
cfdef.setComparatorType(ComparatorType.UTF8TYPE);
cfdef.setDefaultValidationClass("BytesType");
cfdef.setKeyValidationClass("UTF8Type");
cfdef.setReadRepairChance(0.1);
cfdef.setGcGraceSeconds(864000);
cfdef.setMinCompactionThreshold(4);
cfdef.setMaxCompactionThreshold(32);
cfdef.setReplicateOnWrite(true);
cfdef.setCompactionStrategy("SizeTieredCompactionStrategy");
Map<String, String> compressionOptions = new HashMap<String, String>();
compressionOptions.put("sstable_compression", "");
cfdef.setCompressionOptions(compressionOptions);
cluster.addColumnFamily(cfdef, true);

When we *describe* this column family via *cqlsh* we get this:

CREATE TABLE mycf (
  key text,
  column1 text,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={};

As you can see there is a mysterious *column1* and moreover it is added to
the primary key. We thought this was wrong, so we tried to get rid of it.
We managed to do so by adding explicit column definitions like this:

BasicColumnDefinition cdef = new BasicColumnDefinition();
cdef.setName(StringSerializer.get().toByteBuffer("mycolumn"));
cdef.setValidationClass(ComparatorType.BYTESTYPE.getTypeName());
cdef.setIndexType(ColumnIndexType.CUSTOM);
cfdef.addColumnDefinition(cdef);

After this, the primary key looked like:

PRIMARY KEY (key)

The effect of this was *overwhelming* - we got a tremendous performance
improvement and according to stats, the key cache began working while
previously its hit ratio was close to zero.

My questions are

1) What is this all about? Is what we did right?
2) In this project we can provide explicit column definitions. But in
another project we have some column families where this is not possible
because column names are dynamic (based on timestamps). If what we did is
right - how can we adapt this solution to the dynamic column name case?
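
One way the dynamic-column-name case is commonly modeled in CQL3 (a sketch
with illustrative names, not from the thread): the timestamp that used to be
the thrift column name becomes a clustering column, so each thrift column
turns into one CQL3 row and the schema itself stays fixed.

CREATE TABLE events (
  key text,
  ts timestamp,  -- the dynamic column name, one CQL3 row per thrift column
  value blob,
  PRIMARY KEY (key, ts)
) WITH COMPACT STORAGE;

SELECT ts, value FROM events WHERE key = 'somekey';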


Re: how can I get the column value? Need help!.. cassandra 1.2.8 and pig 0.11.1

2013-08-30 Thread Miguel Angel Martin junquera
I try this:

rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();

dump rows;

ILLUSTRATE rows;

describe rows;

values2 = FOREACH rows GENERATE TOTUPLE(id) AS (mycolumn:tuple(name,value));

dump values2;

describe values2;

But I get these results:



--------------------------------------------------------------------
| rows | id:chararray   | age:int   | title:chararray   |
--------------------------------------------------------------------
|      | (id, 6)        | (age, 30) | (title, QA)       |
--------------------------------------------------------------------

rows: {id: chararray,age: int,title: chararray}
2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1031: Incompatable field schema: left is
tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray)), right is
org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)





or





values2 = FOREACH rows GENERATE TOTUPLE(id);
dump values2;
describe values2;




and  the results are:


...
(((id,6)))
(((id,5)))
values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}



Aggg!






Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com



2013/8/28 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com

 hi:

 I can not understand why the schema is defined like
 id:chararray,age:int,title:chararray
 and not like tuples or bags of tuples, if we have pairs of key-value
 columns.

 I tried another time to change the schema, but it does not work.

 any ideas ...

 perhaps the issue is in the definition of the cql3 tables?

 regards


 2013/8/28 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com

 hi all:


 Regards

 Still I can not resolve this issue.

 does anybody have this issue or try to test this simple example?


 I am stumped; I can not find a working solution.

 I appreciate any comment or help


 2013/8/22 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com

 hi all:




 I'm testing the new CqlStorage() with cassandra 1.2.8 and pig 0.11.1


 I am using this sample data test:


 http://frommyworkshop.blogspot.com.es/2013/07/hadoop-map-reduce-with-cassandra.html

 And I load and dump data right with this script:

 rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();

 dump rows;
 describe rows;

 results:

 ((id,6),(age,30),(title,QA))

 ((id,5),(age,30),(title,QA))

 rows: {id: chararray,age: int,title: chararray}


 But I can not get the column values.

 I tried to define other schemas in LOAD, like I used with
 CassandraStorage()


 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-Pig-how-to-get-column-values-td5641158.html


 example:

 rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage() AS (columns: bag {T: tuple(name, value)});


 and I get this error:

 2013-08-22 12:24:45,426 [main] ERROR org.apache.pig.tools.grunt.Grunt
 - ERROR 1031: Incompatable schema: left is
 columns:bag{T:tuple(name:bytearray,value:bytearray)}, right is
 id:chararray,age:int,title:chararray




 I tried to use FLATTEN, SUBSTRING, and SPLIT UDFs but I have not got good
 results:

 Example:


- when I flatten, I get a set of tuples like

 (title,QA)

 (title,QA)

 2013-08-22 12:42:20,673 [main] INFO
  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
 input paths to process : 1

 A: {title: chararray}



 but I can not get the value QA

 SUBSTRING only works with title



 example:

 B = FOREACH A GENERATE SUBSTRING(title,2,5);

 dump B;
 describe B;

 results:

 (tle)
 (tle)
 B: {chararray}




 I tried this, like Eric Lee in the other mail, and have the same results:


  Anyways, what I really want is the column value, not the name. Is there
 a way to do that? I listed all of the failed attempts I made below.

- colnames = FOREACH cols GENERATE $1 and was told $1 was out of
bounds.
- casted = FOREACH cols GENERATE (tuple(chararray, chararray))$0;
but all I got back were empty tuples
- values = FOREACH cols GENERATE $0.$1; but I got an error telling
me data byte array can't be casted to tuple


 Please, I will appreciate any help


 Regards









 --

 Miguel Angel Martín Junquera
 Analyst Engineer.
 miguelangel.mar...@brainsins.com
 Tel. / Fax: (+34) 91 485 56 66
 http://www.brainsins.com
 Smart eCommerce
 Madrid: http://goo.gl/4B5kv
 London: http://goo.gl/uIXdv
 Barcelona: http://goo.gl/NZslW

 Before printing this e-mail, consider whether it is necessary.
 Spanish law protects the secrecy of communications. This e-mail is
 strictly confidential and is addressed exclusively to its intended
 recipient. If that is not you, please do not forward or copy this
 transmission, and notify us as soon as possible.




 

Re: CqlStorage creates wrong schema for Pig

2013-08-30 Thread Miguel Angel Martin junquera
I try this:

rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();

dump rows;

ILLUSTRATE rows;

describe rows;

values2 = FOREACH rows GENERATE TOTUPLE(id) AS (mycolumn:tuple(name,value));

dump values2;

describe values2;

But I get these results:



--------------------------------------------------------------------
| rows | id:chararray   | age:int   | title:chararray   |
--------------------------------------------------------------------
|      | (id, 6)        | (age, 30) | (title, QA)       |
--------------------------------------------------------------------

rows: {id: chararray,age: int,title: chararray}
2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1031: Incompatable field schema: left is
tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray)), right is
org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)





or





values2 = FOREACH rows GENERATE TOTUPLE(id);
dump values2;
describe values2;




and  the results are:


...
(((id,6)))
(((id,5)))
values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}



Aggg!





Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com



2013/8/26 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com

 hi Chad,

 I have this issue.

 I sent a mail to the user-pig list and I still can not resolve this, and I can
 not access the column values.
 In that mail I wrote some things that I tried without results... and
 information about this issue.



 http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3ccajeg_hq9s2po3_xytzx5xki4j1mao8q26jydg2wndy_kyiv...@mail.gmail.com%3E



 I hope someone replies with a comment, idea or solution about this issue or
 bug.


 I have reviewed the CqlStorage class in the cassandra 1.2.8 code but I have
 not configured the environment to debug and trace this issue.

 I only found some comments like the following, but I do not fully understand
 them:


 /**
  * A LoadStoreFunc for retrieving data from and storing data to Cassandra
  *
  * A row from a standard CF will be returned as nested tuples:
  * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
  */


 If you find some idea or solution, please post it.

 thanks









 2013/8/23 Chad Johnston cjohns...@megatome.com

 (I'm using Cassandra 1.2.8 and Pig 0.11.1)

 I'm loading some simple data from Cassandra into Pig using CqlStorage.
 The CqlStorage loader defines a Pig schema based on the Cassandra schema,
 but it seems to be wrong.

 If I do:

 data = LOAD 'cql://bookdata/books' USING CqlStorage();
 DESCRIBE data;

 I get this:

 data: {isbn: chararray,bookauthor: chararray,booktitle:
 chararray,publisher: chararray,yearofpublication: int}

 However, if I DUMP data, I get results like these:

 ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the
 Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

 Clearly the results from Cassandra are key/value pairs, as would be
 expected. I don't know why the schema generated by CqlStorage() would be so
 different.

 This is really causing me problems trying to access the column values. I
 tried a naive approach of FLATTENing each tuple, then trying to access the
 values that way:

 flattened = FOREACH data GENERATE
   FLATTEN(isbn),
   FLATTEN(booktitle),
   ...
 values = FOREACH flattened GENERATE
   $1 AS ISBN,
   $3 AS BookTitle,
   ...

 As soon as I try to access field $5, Pig complains about the index being
 out of bounds.

 Is there a way to solve the schema/reality mismatch? Am I doing something
 wrong, or have I stumbled across a defect?

 Thanks,
 Chad





Re: successful use of shuffle?

2013-08-30 Thread Alain RODRIGUEZ
+1.

I am still afraid of this step. Yet you can avoid it by introducing new
nodes with vnodes enabled and then removing the old ones. This should work.

My problem is that I am not really confident in vnodes either...

Any feedback on this transition, and then on the use of vnodes, would be
great indeed.

Alain


2013/8/29 Robert Coli rc...@eventbrite.com

 Hi!

 I've been wondering... is there anyone in the cassandra-user audience who
 has used the shuffle feature successfully on a non-toy-or-testing cluster? If
 so, could you describe the experience you had and any problems you
 encountered?

 Thanks!

 =Rob



Re: mysterious 'column1' in cql describe

2013-08-30 Thread Sylvain Lebresne
The short story is that you're probably not up to date on how CQL and
thrift table definitions relate to one another, and it may not be exactly
how you think it is. If you haven't done so, I'd suggest reading
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
(should answer your question about the dynamic column name case) and
http://www.datastax.com/dev/blog/thrift-to-cql3 (should help explain how
CQL3 interprets thrift tables, and why you saw what you saw).

--
Sylvain
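
For illustration, and assuming the mycf definition quoted below (the query is
a sketch, not from the thread): the CQL3 view shown by cqlsh is queryable
as-is, with column1 carrying the dynamic thrift column name and value its
cell, one CQL3 row per thrift column.

SELECT column1, value FROM mycf WHERE key = 'somekey';

This is exactly the mapping the thrift-to-cql3 post describes: the thrift
comparator surfaces as the clustering column column1, and the default
validation class as value.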


On Fri, Aug 30, 2013 at 9:50 AM, Alexander Shutyaev shuty...@gmail.com wrote:

 Hi all!

 We have encountered the following problem. We create our column families
 via hector like this:

 ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition(
     "mykeyspace", "mycf");
 cfdef.setColumnType(ColumnType.STANDARD);
 cfdef.setComparatorType(ComparatorType.UTF8TYPE);
 cfdef.setDefaultValidationClass("BytesType");
 cfdef.setKeyValidationClass("UTF8Type");
 cfdef.setReadRepairChance(0.1);
 cfdef.setGcGraceSeconds(864000);
 cfdef.setMinCompactionThreshold(4);
 cfdef.setMaxCompactionThreshold(32);
 cfdef.setReplicateOnWrite(true);
 cfdef.setCompactionStrategy("SizeTieredCompactionStrategy");
 Map<String, String> compressionOptions = new HashMap<String, String>();
 compressionOptions.put("sstable_compression", "");
 cfdef.setCompressionOptions(compressionOptions);
 cluster.addColumnFamily(cfdef, true);

 When we *describe* this column family via *cqlsh* we get this:

 CREATE TABLE mycf (
   key text,
   column1 text,
   value blob,
   PRIMARY KEY (key, column1)
 ) WITH COMPACT STORAGE AND
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={};

 As you can see there is a mysterious *column1* and moreover it is added
 to the primary key. We thought this was wrong, so we tried to get rid of
 it. We managed to do so by adding explicit column definitions like this:

 BasicColumnDefinition cdef = new BasicColumnDefinition();
 cdef.setName(StringSerializer.get().toByteBuffer("mycolumn"));
 cdef.setValidationClass(ComparatorType.BYTESTYPE.getTypeName());
 cdef.setIndexType(ColumnIndexType.CUSTOM);
 cfdef.addColumnDefinition(cdef);

 After this, the primary key looked like:

 PRIMARY KEY (key)

 The effect of this was *overwhelming* - we got a tremendous performance
 improvement and according to stats, the key cache began working while
 previously its hit ratio was close to zero.

 My questions are

 1) What is this all about? Is what we did right?
 2) In this project we can provide explicit column definitions. But in
 another project we have some column families where this is not possible
 because column names are dynamic (based on timestamps). If what we did is
 right - how can we adapt this solution to the dynamic column name case?



Re: mysterious 'column1' in cql describe

2013-08-30 Thread Alexander Shutyaev
Thanks, Sylvain! I'll read them most thoroughly, but after a quick glance I
wish to repeat another (implied) question of mine that I believe will not be
answered in those articles.

Why does the explicit definition of columns in a column family
significantly improve performance and key cache hit ratio (the last one
being almost zero when there are no explicit column definitions)?


2013/8/30 Sylvain Lebresne sylv...@datastax.com

 The short story is that you're probably not up to date on how CQL and
 thrift table definitions relate to one another, and it may not be exactly
 how you think it is. If you haven't done so, I'd suggest reading
 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
 (should answer your question about the dynamic column name case) and
 http://www.datastax.com/dev/blog/thrift-to-cql3 (should help explain how
 CQL3 interprets thrift tables, and why you saw what you saw).

 --
 Sylvain


 On Fri, Aug 30, 2013 at 9:50 AM, Alexander Shutyaev shuty...@gmail.com wrote:

 Hi all!

 We have encountered the following problem. We create our column families
 via hector like this:

 ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition(
     "mykeyspace", "mycf");
 cfdef.setColumnType(ColumnType.STANDARD);
 cfdef.setComparatorType(ComparatorType.UTF8TYPE);
 cfdef.setDefaultValidationClass("BytesType");
 cfdef.setKeyValidationClass("UTF8Type");
 cfdef.setReadRepairChance(0.1);
 cfdef.setGcGraceSeconds(864000);
 cfdef.setMinCompactionThreshold(4);
 cfdef.setMaxCompactionThreshold(32);
 cfdef.setReplicateOnWrite(true);
 cfdef.setCompactionStrategy("SizeTieredCompactionStrategy");
 Map<String, String> compressionOptions = new HashMap<String, String>();
 compressionOptions.put("sstable_compression", "");
 cfdef.setCompressionOptions(compressionOptions);
 cluster.addColumnFamily(cfdef, true);

 When we *describe* this column family via *cqlsh* we get this:

 CREATE TABLE mycf (
   key text,
   column1 text,
   value blob,
   PRIMARY KEY (key, column1)
 ) WITH COMPACT STORAGE AND
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={};

 As you can see there is a mysterious *column1* and moreover it is added
 to the primary key. We thought this was wrong, so we tried to get rid of
 it. We managed to do so by adding explicit column definitions like this:

 BasicColumnDefinition cdef = new BasicColumnDefinition();
 cdef.setName(StringSerializer.get().toByteBuffer("mycolumn"));
 cdef.setValidationClass(ComparatorType.BYTESTYPE.getTypeName());
 cdef.setIndexType(ColumnIndexType.CUSTOM);
 cfdef.addColumnDefinition(cdef);

 After this, the primary key looked like:

 PRIMARY KEY (key)

 The effect of this was *overwhelming* - we got a tremendous performance
 improvement and according to stats, the key cache began working while
 previously its hit ratio was close to zero.

 My questions are

 1) What is this all about? Is what we did right?
 2) In this project we can provide explicit column definitions. But in
 another project we have some column families where this is not possible
 because column names are dynamic (based on timestamps). If what we did is
 right - how can we adapt this solution to the dynamic column name case?





[RELEASE] Apache Cassandra 1.2.9 released

2013-08-30 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.9.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1] on the 1.2 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: http://goo.gl/2UVSW5 (CHANGES.txt)
[2]: http://goo.gl/lOZAdM (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: mysterious 'column1' in cql describe

2013-08-30 Thread Sylvain Lebresne
 Why does the explicit definition of columns in a column family
 significantly improve performance and key cache hit ratio (the last one
 being almost zero when there are no explicit column definitions)?


It doesn't, not in itself at least. So something else has changed, or
something is wrong in your comparison of before/after. But it's hard to say
without at least a minimum of information on how you actually observed such
a significant performance improvement (which queries, for instance).

As for the key cache hit rate, adding a column definition certainly has no
effect on it in itself. But defining a new secondary index might, and the code
to add the column you've provided does have a setIndexType. Again, it's hard
to be definitive on that because the code you've shown sets a CUSTOM index
type without providing any indexOption, which is *invalid* (and rejected as
such by Cassandra). So either the code above is not complete, or it's not the
one you've used, or Hector is doing some weird stuff behind your back. In
any case, if an index was created, then *that* could easily
explain a before-after performance difference.

--
Sylvain
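
For reference, a minimal sketch of what a plain secondary index definition
looks like in CQL3 (the index name is illustrative, and this is the regular
KEYS-style index; the quoted code used CUSTOM, which additionally requires
index options):

CREATE INDEX mycolumn_idx ON mycf (mycolumn);

Creating an index changes the write path for the indexed column, which is the
kind of side effect that could account for the before/after difference
discussed above.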





 2013/8/30 Sylvain Lebresne sylv...@datastax.com

 The short story is that you're probably not up to date on how CQL and
 thrift table definitions relate to one another, and it may not be exactly
 how you think it is. If you haven't done so, I'd suggest reading
 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
 (should answer your question about the dynamic column name case) and
 http://www.datastax.com/dev/blog/thrift-to-cql3 (should help explain how
 CQL3 interprets thrift tables, and why you saw what you saw).

 --
 Sylvain


 On Fri, Aug 30, 2013 at 9:50 AM, Alexander Shutyaev shuty...@gmail.com wrote:

 Hi all!

 We have encountered the following problem. We create our column families
 via hector like this:

 ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition(
     "mykeyspace", "mycf");
 cfdef.setColumnType(ColumnType.STANDARD);
 cfdef.setComparatorType(ComparatorType.UTF8TYPE);
 cfdef.setDefaultValidationClass("BytesType");
 cfdef.setKeyValidationClass("UTF8Type");
 cfdef.setReadRepairChance(0.1);
 cfdef.setGcGraceSeconds(864000);
 cfdef.setMinCompactionThreshold(4);
 cfdef.setMaxCompactionThreshold(32);
 cfdef.setReplicateOnWrite(true);
 cfdef.setCompactionStrategy("SizeTieredCompactionStrategy");
 Map<String, String> compressionOptions = new HashMap<String, String>();
 compressionOptions.put("sstable_compression", "");
 cfdef.setCompressionOptions(compressionOptions);
 cluster.addColumnFamily(cfdef, true);

 When we *describe* this column family via *cqlsh* we get this:

 CREATE TABLE mycf (
   key text,
   column1 text,
   value blob,
   PRIMARY KEY (key, column1)
 ) WITH COMPACT STORAGE AND
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={};

 As you can see there is a mysterious *column1* and moreover it is added
 to the primary key. We thought this was wrong, so we tried to get rid of
 it. We managed to do so by adding explicit column definitions like this:

 BasicColumnDefinition cdef = new BasicColumnDefinition();
 cdef.setName(StringSerializer.get().toByteBuffer("mycolumn"));
 cdef.setValidationClass(ComparatorType.BYTESTYPE.getTypeName());
 cdef.setIndexType(ColumnIndexType.CUSTOM);
 cfdef.addColumnDefinition(cdef);

 After this, the primary key looked like:

 PRIMARY KEY (key)

 The effect of this was *overwhelming* - we got a tremendous performance
 improvement and according to stats, the key cache began working while
 previously its hit ratio was close to zero.

 My questions are

 1) What is this all about? Is what we did right?
 2) In this project we can provide explicit column definitions. But in
 another project we have some column families where this is not possible
 because column names are dynamic (based on timestamps). If what we did is
 right - how can we adapt this solution to the dynamic column name case?






RE: Cassandra-shuffle fails

2013-08-30 Thread Romain HARDOUIN
Hi,

"Failed to enable shuffling" is thrown when an IOException occurs in the 
constructor JMXConnection(endpoint, port).
See Shuffle.enableRelocations() in org.apache.cassandra.tools.

Have you set up credentials for JMX?

Regards,
Romain



From:   Tamar Rosen ta...@correlor.com
To:     user@cassandra.apache.org
Cc:     Vitaly Sourikov vit...@correlor.com, Yair Pinyan 
y...@correlor.com
Date:   29/08/2013 17:35
Subject:        Cassandra-shuffle fails



Hi,

We recently upgraded from version 1.1 to 1.2
It all went well, including setting up vnodes, but shuffle fails. 

We have 2 nodes, hosted on Amazon AWS

The steps we took (on each of our nodes) are pretty straight forward:
1. upgrade binaries
2. adjust cassandra.yaml (keep token)
3. nodetool upgradesstables
4. change cassandra.yaml to vnodes rather than tokens
5. restart cassandra
6. cassandra-shuffle create. 

All the above went fine. However, the following fails:
cassandra-shuffle enable
Failed to enable shuffling on 10.194.230.175!

Note:
1. The failure is immediate, and consistent.  
2. Calling shuffle create on either node prepares the shuffle files for 
both. 
3. I made sure both servers are communicating fine on both 9160 and 7199.

Any help will be greatly appreciated.

Tamar

Tamar Rosen
Senior Data Architect
Correlor.com




 


map/reduce performance time and sstable reader…

2013-08-30 Thread Hiller, Dean
Has anyone done performance tests on sstable reading vs. M/R?  I did a quick 
test reading all SSTables in an LCS column family on 23 tables and took the 
average time sstable2json took (to /dev/null to make it faster), which was 7 
seconds per table (reading to stdout took 16 seconds per table).  This then 
worked out to an estimate of 12.5 hours up to 27 hours (from the to-stdout 
calculation).  I am suspecting the map/reduce time may be much worse since 
there are not as many repeated rows in LCS.

I.e., I am wondering if I should just read from SSTables directly instead of 
using map/reduce?  I am about to dig around in the code of M/R and 
sstable2json to see what each is doing specifically.

Thanks,
Dean


is there an SSTableInput for Map/Reduce instead of ColumnFamily?

2013-08-30 Thread Hiller, Dean
is there an SSTableInput for Map/Reduce instead of ColumnFamily (which uses 
thrift)?

We are not worried about repeated reads since we are idempotent but would 
rather have the direct speed (even if we had to read from a snapshot, it would 
be fine).

(We would most likely run our M/R on 4 nodes of the 12 nodes we have since we 
have RF=3 right now).

Thanks,
Dean


RE: Truncate question

2013-08-30 Thread S C
Thank you all for your responses. Yes, I have cleared the snapshots after the 
truncate operation.

Thanks,
SC
Date: Thu, 29 Aug 2013 21:41:25 -0400
Subject: Re: Truncate question
From: dmcne...@gmail.com
To: user@cassandra.apache.org

You would, however, want to clear the snapshot folder afterward, right?  I 
thought that truncate, like drop table, created a snapshot (unless that 
feature had been disabled in your yaml).


On Thu, Aug 29, 2013 at 6:51 PM, Robert Coli rc...@eventbrite.com wrote:

On Thu, Aug 29, 2013 at 3:48 PM, S C as...@outlook.com wrote:





Do we have to run nodetool repair or nodetool cleanup after Truncating a 
Column Family?
No. Why would you?


=Rob 

  

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jon Haddad
Does your previous snapshot include the system keyspace?  I haven't tried 
upgrading from 1.0.x then rolling back, but it's possible there are some 
backwards-incompatible changes. Other than that, make sure you also rolled 
back your config files?

On Aug 30, 2013, at 8:57 AM, Mike Neir m...@liquidweb.com wrote:

 Greetings folks,
 
 I'm faced with the need to update a 36 node cluster with roughly 25T of data 
 on disk to a version of cassandra in the 1.2.x series. While it seems that 
 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling 
 upgrade, I'd still like to have a roll-back plan in case the rolling upgrade 
 goes sideways.
 
 I've tried to upgrade a single node in my dev cluster, then roll back using a 
 snapshot taken previously, but things don't appear to be going smoothly. The 
 node will rejoin the ring eventually, but not before spending some time in the 
 "Joining" state as shown by nodetool ring, and spewing a ton of error 
 messages similar to the following:
 
 ERROR [MutationStage:31] 2013-08-29 14:07:20,530 RowMutationVerbHandler.java 
 (line 61) Error in row mutation
 org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178
 
 My test procedure is as follows:
 1)  nodetool -h localhost snapshot
 2)  nodetool -h localhost drain
 3)  service cassandra stop
 4)  back up cassandra configs
 5)  remove cassandra 1.0.9
 6)  install cassandra 1.2.8
 7)  restore cassandra configs, alter them to remove configuration entries no 
 longer used
 8)  start cassandra 1.2.8, let it run for a bit, then drain/stop it
 9)  remove cassandra 1.2.8
 10) reinstall cassandra 1.0.9
 11) restore original cassandra configs
 12) remove any commit logs present
 13) remove folders for system_auth and system_traces Keyspaces (since they 
 don't seem to be present in 1.0.9)
 14) Move snapshots back to where they should be for 1.0.9 and remove cass 
 1.2.8 data
  # cd /var/lib/cassandra/data/$KEYSPACE/
  # mv */snapshots/$TIMESTAMP/* .
  # find . -mindepth 1 -type d -exec rm -rf {} \;
  # cd /var/lib/cassandra/data/system
  # mv */snapshots/$TIMESTAMP/* .
  # find . -mindepth 1 -type d -exec rm -rf {} \;
 15) start cassandra 1.0.9
 16) observe cassandra system.log
 
 Does anyone have any insight on things I may be doing wrong, or whether this 
 is just an unavoidable pain point caused by rolling back? It seems that since 
 there are no schema changes going on, the node should be able to just hop 
 back into the cluster without error and without transitioning through the 
 Joining state.
 
 -- 
 
 
 
 Mike Neir
 Liquid Web, Inc.
 Infrastructure Administrator
 



Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jon Haddad
Sorry, I didn't see the test procedure, it's still early.

On Aug 30, 2013, at 8:57 AM, Mike Neir m...@liquidweb.com wrote:

 Greetings folks,
 
 I'm faced with the need to update a 36 node cluster with roughly 25T of data 
 on disk to a version of cassandra in the 1.2.x series. While it seems that 
 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling 
 upgrade, I'd still like to have a roll-back plan in case the rolling upgrade 
 goes sideways.
 
 I've tried to upgrade a single node in my dev cluster, then roll back using a 
 snapshot taken previously, but things don't appear to be going smoothly. The 
 node will rejoin the ring eventually, but not before spending some time in the 
 "Joining" state as shown by nodetool ring, and spewing a ton of error 
 messages similar to the following:
 
 ERROR [MutationStage:31] 2013-08-29 14:07:20,530 RowMutationVerbHandler.java 
 (line 61) Error in row mutation
 org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178
 
 My test procedure is as follows:
 1)  nodetool -h localhost snapshot
 2)  nodetool -h localhost drain
 3)  service cassandra stop
 4)  back up cassandra configs
 5)  remove cassandra 1.0.9
 6)  install cassandra 1.2.8
 7)  restore cassandra configs, alter them to remove configuration entries no 
 longer used
 8)  start cassandra 1.2.8, let it run for a bit, then drain/stop it
 9)  remove cassandra 1.2.8
 10) reinstall cassandra 1.0.9
 11) restore original cassandra configs
 12) remove any commit logs present
 13) remove folders for system_auth and system_traces Keyspaces (since they 
 don't seem to be present in 1.0.9)
 14) Move snapshots back to where they should be for 1.0.9 and remove cass 
 1.2.8 data
  # cd /var/lib/cassandra/data/$KEYSPACE/
  # mv */snapshots/$TIMESTAMP/* .
  # find . -mindepth 1 -type d -exec rm -rf {} \;
  # cd /var/lib/cassandra/data/system
  # mv */snapshots/$TIMESTAMP/* .
  # find . -mindepth 1 -type d -exec rm -rf {} \;
 15) start cassandra 1.0.9
 16) observe cassandra system.log
 
 Does anyone have any insight on things I may be doing wrong, or whether this 
 is just an unavoidable pain point caused by rolling back? It seems that since 
 there are no schema changes going on, the node should be able to just hop 
 back into the cluster without error and without transitioning through the 
 Joining state.
 
 -- 
 
 
 
 Mike Neir
 Liquid Web, Inc.
 Infrastructure Administrator
 



Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Robert Coli
On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir m...@liquidweb.com wrote:

 I'm faced with the need to update a 36 node cluster with roughly 25T of
 data on disk to a version of cassandra in the 1.2.x series. While it seems
 that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a
 rolling upgrade, I'd still like to have a roll-back plan in case the
 rolling upgrade goes sideways.


Upgrading two major versions online is an unsupported operation. I would
not expect it to work. Is there a detailed reason you believe it should
work between these versions? Also, instead of 1.2.8 you should upgrade to
1.2.9, released yesterday. Everyone headed to 2.0 has to pass through 1.2.9.

=Rob


Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Mike Neir

Greetings folks,

I'm faced with the need to update a 36 node cluster with roughly 25T of data on 
disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 
will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd 
still like to have a roll-back plan in case the rolling upgrade goes sideways.


I've tried to upgrade a single node in my dev cluster, then roll back using a 
snapshot taken previously, but things don't appear to be going smoothly. The 
node will rejoin the ring eventually, but not before spending some time in the 
"Joining" state as shown by nodetool ring, and spewing a ton of error messages 
similar to the following:


ERROR [MutationStage:31] 2013-08-29 14:07:20,530 RowMutationVerbHandler.java 
(line 61) Error in row mutation

org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178

My test procedure is as follows:
1)  nodetool -h localhost snapshot
2)  nodetool -h localhost drain
3)  service cassandra stop
4)  back up cassandra configs
5)  remove cassandra 1.0.9
6)  install cassandra 1.2.8
7)  restore cassandra configs, alter them to remove configuration entries no 
longer used

8)  start cassandra 1.2.8, let it run for a bit, then drain/stop it
9)  remove cassandra 1.2.8
10) reinstall cassandra 1.0.9
11) restore original cassandra configs
12) remove any commit logs present
13) remove folders for system_auth and system_traces Keyspaces (since they don't 
seem to be present in 1.0.9)

14) Move snapshots back to where they should be for 1.0.9 and remove cass 1.2.8 
data
  # cd /var/lib/cassandra/data/$KEYSPACE/
  # mv */snapshots/$TIMESTAMP/* .
  # find . -mindepth 1 -type d -exec rm -rf {} \;
  # cd /var/lib/cassandra/data/system
  # mv */snapshots/$TIMESTAMP/* .
  # find . -mindepth 1 -type d -exec rm -rf {} \;
15) start cassandra 1.0.9
16) observe cassandra system.log

Does anyone have any insight on things I may be doing wrong, or whether this is 
just an unavoidable pain point caused by rolling back? It seems that since there 
are no schema changes going on, the node should be able to just hop back into 
the cluster without error and without transitioning through the Joining state.


--



Mike Neir
Liquid Web, Inc.
Infrastructure Administrator



Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Mohit Anchlia
If you have multiple DCs you at least want to upgrade to 1.0.11. There is
an issue where you might get errors during cross DC replication.

On Fri, Aug 30, 2013 at 9:41 AM, Mike Neir m...@liquidweb.com wrote:

 In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there
 is no need to do streaming operations (move/repair/bootstrap/etc). The
 reading I've done confirms that 1.2.x should be network-compatible with
 1.0.x, sans streaming operations. Datastax seems to indicate here that
 doing a rolling upgrade from 1.0.x to 1.2.x is viable:

 http://www.datastax.com/documentation/cassandra/1.2/webhelp/#upgrade/upgradeC_c.html#concept_ds_nht_czr_ck

 See the second bullet point in the Prerequisites section.

 I'll look into 1.2.9. It wasn't available when I started my testing.

 MN


 On 08/30/2013 12:15 PM, Robert Coli wrote:

 On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir m...@liquidweb.com wrote:

 I'm faced with the need to update a 36 node cluster with roughly 25T
 of data
 on disk to a version of cassandra in the 1.2.x series. While it seems
 that
 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a
 rolling
 upgrade, I'd still like to have a roll-back plan in case the rolling
 upgrade
 goes sideways.


 Upgrading two major versions online is an unsupported operation. I would
 not
 expect it to work. Is there a detailed reason you believe it should work
 between
 these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9,
 released
 yesterday. Everyone headed to 2.0 has to pass through 1.2.9.

 =Rob


  --



 Mike Neir
 Liquid Web, Inc.
 Infrastructure Administrator




Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Mike Neir
In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there is no 
need to do streaming operations (move/repair/bootstrap/etc). The reading I've 
done confirms that 1.2.x should be network-compatible with 1.0.x, sans streaming 
operations. Datastax seems to indicate here that doing a rolling upgrade from 
1.0.x to 1.2.x is viable:


http://www.datastax.com/documentation/cassandra/1.2/webhelp/#upgrade/upgradeC_c.html#concept_ds_nht_czr_ck

See the second bullet point in the Prerequisites section.

I'll look into 1.2.9. It wasn't available when I started my testing.

MN

On 08/30/2013 12:15 PM, Robert Coli wrote:

On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir m...@liquidweb.com wrote:

I'm faced with the need to update a 36 node cluster with roughly 25T of data
on disk to a version of cassandra in the 1.2.x series. While it seems that
1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling
upgrade, I'd still like to have a roll-back plan in case the rolling upgrade
goes sideways.


Upgrading two major versions online is an unsupported operation. I would not
expect it to work. Is there a detailed reason you believe it should work between
these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9, released
yesterday. Everyone headed to 2.0 has to pass through 1.2.9.

=Rob


--



Mike Neir
Liquid Web, Inc.
Infrastructure Administrator



Update-Replace

2013-08-30 Thread Jan Algermissen
Hi,

I have a use case where I periodically need to apply updates to a wide row 
that should replace the whole row.

A straightforward insert/update only replaces values that are present in the 
executed statement, keeping the remaining data around.

Is there a smooth way to do a replace with C*, or do I have to handle this in 
the application (e.g. doing a delete and then a write, or coming up with a 
more clever data model)?

Jan
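
A minimal sketch of the delete-then-write approach mentioned in the question,
with an illustrative table; none of this is from the original mail. Explicit
timestamps matter here: if the row delete and the re-insert carry the same
timestamp, the tombstone wins, so the delete is given a slightly older one.

BEGIN BATCH
  DELETE FROM wide_row USING TIMESTAMP 1000 WHERE key = 'k1';
  INSERT INTO wide_row (key, col, val) VALUES ('k1', 'c1', 'v1') USING TIMESTAMP 1001;
  INSERT INTO wide_row (key, col, val) VALUES ('k1', 'c2', 'v2') USING TIMESTAMP 1001;
APPLY BATCH;

After this, any column of the old row that is not re-inserted is gone, which
is the replace semantics asked about.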

[ANNOUNCE] Polidoro - A Cassandra client in Scala

2013-08-30 Thread Lanny Ripple
Hi all,

We've open sourced Polidoro.  It's a Cassandra client in Scala on top of 
Astyanax and in the style of Cascal.

Find it at https://github.com/SpotRight/Polidoro

  -Lanny Ripple
  SpotRight, Inc - http://spotright.com

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jeremiah D Jordan
You probably want to go to 1.0.11/12 first no matter what.  If you want the 
least chance of issues you should then go to 1.1.12.  While there is a high 
probability that going from 1.0.X to 1.2 will work, you have the best chance 
at no failures if you go through 1.1.12.  There are some edge cases that can 
cause errors if you don't do that.

-Jeremiah


On Aug 30, 2013, at 11:41 AM, Mike Neir m...@liquidweb.com wrote:

 In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there is 
 no need to do streaming operations (move/repair/bootstrap/etc). The reading 
 I've done confirms that 1.2.x should be network-compatible with 1.0.x, sans 
 streaming operations. Datastax seems to indicate here that doing a rolling 
 upgrade from 1.0.x to 1.2.x is viable:
 
 http://www.datastax.com/documentation/cassandra/1.2/webhelp/#upgrade/upgradeC_c.html#concept_ds_nht_czr_ck
 
 See the second bullet point in the Prerequisites section.
 
 I'll look into 1.2.9. It wasn't available when I started my testing.
 
 MN
 
 On 08/30/2013 12:15 PM, Robert Coli wrote:
 On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir m...@liquidweb.com wrote:
 
I'm faced with the need to update a 36 node cluster with roughly 25T of 
 data
on disk to a version of cassandra in the 1.2.x series. While it seems that
1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling
upgrade, I'd still like to have a roll-back plan in case the rolling 
 upgrade
goes sideways.
 
 
 Upgrading two major versions online is an unsupported operation. I would not
 expect it to work. Is there a detailed reason you believe it should work 
 between
 these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9, released
 yesterday. Everyone headed to 2.0 has to pass through 1.2.9.
 
 =Rob
 
 -- 
 
 
 
 Mike Neir
 Liquid Web, Inc.
 Infrastructure Administrator
 



Re: CQL Thrift

2013-08-30 Thread Jon Haddad
If you're going to work with CQL, work with CQL.  If you're going to work with 
Thrift, work with Thrift.  Don't mix.

On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 If I create a table with CQL3 as 
 
 create table user(user_id text PRIMARY KEY, first_name text, last_name text, 
 emailid text);
 
 and create index as:
 create index on user(first_name);
 
 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId) 
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');
 
 
 Then if I update the same column family using cassandra-cli as:
 
 update column family user with key_validation_class='UTF8Type' and 
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type', 
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', 
 index_type:KEYS}];
 
 
 Now if I connect via cqlsh and explore the user table, I can see that columns 
 first_name and last_name are not part of the table structure anymore. Here is 
 the output:
 
 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};
 
 cqlsh:cql3usage select * from user;
 
  user_id
 -
  @mevivs
 
 
 
 
 
 I understand that CQL3 and thrift interoperability is an issue, but this 
 looks to me like a very basic scenario.



 Any suggestions? Or if anybody can explain the reason behind this?
 
 -Vivek
 
 
 
 



Re: is there a SSTAbleInput for Map/Reduce instead of ColumnFamily?

2013-08-30 Thread Jeremiah D Jordan
FYI: 
http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html

-Jeremiah

On Aug 30, 2013, at 9:21 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 is there a SSTableInput for Map/Reduce instead of ColumnFamily (which uses 
 thrift)?
 
 We are not worried about repeated reads since we are idempotent but would 
 rather have the direct speed (even if we had to read from a snapshot, it 
 would be fine).
 
 (We would most likely run our M/R on 4 nodes of the 12 nodes we have since we 
 have RF=3 right now).
 
 Thanks,
 Dean



Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
And surprisingly, if I alter the table as:

alter table user add first_name text;
alter table user add last_name text;

it gives me back the columns with values, but still no indexes.

Thrift and CQL3 depend on the same storage engine. Do they really maintain
different metadata for the same column family?

-Vivek



On Fri, Aug 30, 2013 at 11:08 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if I connect via cqlsh and explore the user table, I can see that columns
 first_name and last_name are not part of the table structure anymore. Here is
 the output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage select * from user;

  user_id
 -
  @mevivs





 I understand that CQL3 and thrift interoperability is an issue, but this
 looks to me like a very basic scenario.



 Any suggestions? Or if anybody can explain the reason behind this?

 -Vivek







Re: CQL Thrift

2013-08-30 Thread Peter Lin
in my case, I built a temporal database on top of Cassandra, so it's
absolutely key.

Dynamic columns are super powerful; relational databases have no
equivalent. For me, that is one of the top 3 reasons for using Cassandra.



On Fri, Aug 30, 2013 at 2:03 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 If you are talking about the comparator: yes, that's a valid point and not
 possible with CQL3.

 -Vivek


 On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin wool...@gmail.com wrote:


 I use dynamic columns all the time and they vary in type.

 With CQL you can define a default type, but you can't insert specific
 types of data for column name and value. It forces you to use all bytes or
  all strings, which would require converting it to other types.

 thrift is much more powerful in that respect.

 not everyone needs to take advantage of the full power of dynamic columns.


 On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote:

  Just curious - what do you need to do that requires thrift?  We've built
 our entire platform using CQL3 and we haven't hit any issues.

 On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote:


  my biased perspective: I find the sweet spot is thrift for insert/update
 and CQL for select queries.

 CQL is too limiting and negates the power of storing arbitrary data
 types in dynamic columns.


 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to
 work with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com
 wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if I connect via cqlsh and explore the user table, I can see that columns
 first_name and last_name are not part of the table structure anymore. Here is
 the output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage select * from user;

  user_id
 -
  @mevivs





 I understand that CQL3 and thrift interoperability is an issue, but
 this looks to me like a very basic scenario.



 Any suggestions? Or if anybody can explain the reason behind this?

 -Vivek












Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
Hi,
I understand that, but I want to understand the reason behind
such behavior. Is it because of maintaining different metadata objects for
CQL3 and thrift?

Any suggestion?

-Vivek


On Fri, Aug 30, 2013 at 11:15 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to work
 with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if I connect via cqlsh and explore the user table, I can see that columns
 first_name and last_name are not part of the table structure anymore. Here is
 the output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage select * from user;

  user_id
 -
  @mevivs





 I understand that CQL3 and thrift interoperability is an issue, but this
 looks to me like a very basic scenario.



 Any suggestions? Or if anybody can explain the reason behind this?

 -Vivek








Re: CQL Thrift

2013-08-30 Thread Jon Haddad
Could you please give a more concrete example?  

On Aug 30, 2013, at 11:10 AM, Peter Lin wool...@gmail.com wrote:

 
 in my case, I built a temporal database on top of Cassandra, so it's 
 absolutely key.
 
 Dynamic columns are super powerful; relational databases have no 
 equivalent. For me, that is one of the top 3 reasons for using Cassandra.
 
 
 
 On Fri, Aug 30, 2013 at 2:03 PM, Vivek Mishra mishra.v...@gmail.com wrote:
 If you are talking about the comparator: yes, that's a valid point and not 
 possible with CQL3.
 
 -Vivek
 
 
 On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin wool...@gmail.com wrote:
 
 I use dynamic columns all the time and they vary in type.
 
 With CQL you can define a default type, but you can't insert specific types 
 of data for column name and value. It forces you to use all bytes or all 
  strings, which would require converting it to other types.
 
 thrift is much more powerful in that respect.
 
 not everyone needs to take advantage of the full power of dynamic columns.
 
 
 On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote:
  Just curious - what do you need to do that requires thrift?  We've built our 
 entire platform using CQL3 and we haven't hit any issues.  
 
 On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote:
 
 
  my biased perspective: I find the sweet spot is thrift for insert/update and 
 CQL for select queries.
 
 CQL is too limiting and negates the power of storing arbitrary data types in 
 dynamic columns.
 
 
 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:
 If you're going to work with CQL, work with CQL.  If you're going to work 
 with Thrift, work with Thrift.  Don't mix.
 
 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:
 
 Hi,
 If I create a table with CQL3 as 
 
 create table user(user_id text PRIMARY KEY, first_name text, last_name 
 text, emailid text);
 
 and create index as:
 create index on user(first_name);
 
 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId) 
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');
 
 
 Then if I update the same column family using cassandra-cli as:
 
 update column family user with key_validation_class='UTF8Type' and 
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type', 
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', 
 index_type:KEYS}];
 
 
 Now if I connect via cqlsh and explore the user table, I can see that columns 
 first_name and last_name are not part of the table structure anymore. Here is 
 the output:
 
 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};
 
 cqlsh:cql3usage select * from user;
 
  user_id
 -
  @mevivs
 
 
 
 
 
 I understand that CQL3 and thrift interoperability is an issue, but this 
 looks to me like a very basic scenario.



 Any suggestions? Or if anybody can explain the reason behind this?
 
 -Vivek
 
 
 
 
 
 
 
 
 
 



Re: CQL Thrift

2013-08-30 Thread Jonathan Ellis
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows


On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote:


  my biased perspective: I find the sweet spot is thrift for insert/update and
 CQL for select queries.

 CQL is too limiting and negates the power of storing arbitrary data types
 in dynamic columns.


 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to work
 with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if I connect via cqlsh and explore the user table, I can see that columns
 first_name and last_name are not part of the table structure anymore. Here is
 the output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage select * from user;

  user_id
 -
  @mevivs





 I understand that CQL3 and thrift interoperability is an issue, but this
 looks to me like a very basic scenario.



 Any suggestions? Or if anybody can explain the reason behind this?

 -Vivek









-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: CQL Thrift

2013-08-30 Thread Jon Haddad
Just curious - what do you need to do that requires thrift?  We've built our 
entire platform using CQL3 and we haven't hit any issues.  

On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote:

 
 my biased perspective: I find the sweet spot is thrift for insert/update and 
 CQL for select queries.
 
 CQL is too limiting and negates the power of storing arbitrary data types in 
 dynamic columns.
 
 
 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:
 If you're going to work with CQL, work with CQL.  If you're going to work 
 with Thrift, work with Thrift.  Don't mix.
 
 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:
 
 Hi,
 If I create a table with CQL3 as 
 
 create table user(user_id text PRIMARY KEY, first_name text, last_name text, 
 emailid text);
 
 and create index as:
 create index on user(first_name);
 
 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId) 
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');
 
 
 Then if I update the same column family using Cassandra-cli as: 
 
 update column family user with key_validation_class='UTF8Type' and 
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type', 
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', 
 index_type:KEYS}];
 
 
 Now if i connect via cqlsh and explore user table, i can see column 
 first_name,last_name are not part of table structure anymore. Here is the 
 output:
 
 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};
 
 cqlsh:cql3usage select * from user;
 
  user_id
 -
  @mevivs
 
 
 
 
 
 I understand that, CQL3 and thrift interoperability is an issue. But this 
 looks to me a very basic scenario.
 
 
 
 Any suggestions? Or If anybody can explain a reason behind this?
 
 -Vivek
 
 
 
 
 
 



Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
CQL is too limiting and negates the power of storing arbitrary data types
in dynamic columns.

I agree, but only partly. You can always create a column family with key,
column, and value columns, store any number of arbitrary column names in
column, and store each corresponding value in value. I find it much easier;
see the sketch below.
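
A minimal sketch of that layout (the names here are illustrative, not from
the original mail):

CREATE TABLE dynamic_cf (
  key text,
  column text,
  value blob,
  PRIMARY KEY (key, column)
) WITH COMPACT STORAGE;

INSERT INTO dynamic_cf (key, column, value)
VALUES ('row1', 'any-column-name', textAsBlob('any value'));

Any number of (column, value) pairs can be stored per key this way, at the
cost of encoding every value as a blob.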

Coming back to the original question, I think the differentiator is how
column metadata is treated in Thrift versus CQL3. What I do not understand
is: if two sets of metadata objects (CqlMetadata, CFDef) are maintained for
the same column family, why would updating one cause trouble for the other?

-Vivek


On Fri, Aug 30, 2013 at 11:23 PM, Peter Lin wool...@gmail.com wrote:


 my bias perspective, I find the sweet spot is thrift for insert/update and
 CQL for select queries.

 CQL is too limiting and negates the power of storing arbitrary data types
 in dynamic columns.


 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to work
 with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using Cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if i connect via cqlsh and explore user table, i can see column
 first_name,last_name are not part of table structure anymore. Here is the
 output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage select * from user;

  user_id
 -
  @mevivs





 I understand that, CQL3 and thrift interoperability is an issue. But this
 looks to me a very basic scenario.



 Any suggestions? Or If anybody can explain a reason behind this?

 -Vivek









Re: CQL Thrift

2013-08-30 Thread Peter Lin
I use dynamic columns all the time and they vary in type.

With CQL you can define a default type, but you can't insert specific types
of data for the column name and value. It forces you to use all bytes or all
strings, which would require converting them to other types.

thrift is much more powerful in that respect.

not everyone needs to take advantage of the full power of dynamic columns.


On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote:

 Just curious - what do you need to do that requires thrift?  We've built
 our entire platform using CQL3 and we haven't hit any issues.

 On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote:


 my bias perspective, I find the sweet spot is thrift for insert/update and
 CQL for select queries.

 CQL is too limiting and negates the power of storing arbitrary data types
 in dynamic columns.


 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to work
 with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using Cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if i connect via cqlsh and explore user table, i can see column
 first_name,last_name are not part of table structure anymore. Here is the
 output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage select * from user;

  user_id
 -
  @mevivs





 I understand that, CQL3 and thrift interoperability is an issue. But this
 looks to me a very basic scenario.



 Any suggestions? Or If anybody can explain a reason behind this?

 -Vivek










Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
True for newly built platforms, but what about existing apps built using
Thrift? As per http://www.datastax.com/dev/blog/thrift-to-cql3 it should be
easy.

I am just curious to understand the real reason behind such behavior.

-Vivek



On Fri, Aug 30, 2013 at 11:28 PM, Jon Haddad j...@jonhaddad.com wrote:

 Just curious - what do you need to do that requires thrift?  We've built
 our entire platform using CQL3 and we haven't hit any issues.

 On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote:


 my bias perspective, I find the sweet spot is thrift for insert/update and
 CQL for select queries.

 CQL is too limiting and negates the power of storing arbitrary data types
 in dynamic columns.


 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to work
 with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using Cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if i connect via cqlsh and explore user table, i can see column
 first_name,last_name are not part of table structure anymore. Here is the
 output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage select * from user;

  user_id
 -
  @mevivs





 I understand that, CQL3 and thrift interoperability is an issue. But this
 looks to me a very basic scenario.



 Any suggestions? Or If anybody can explain a reason behind this?

 -Vivek










Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
If you're talking about the comparator, then yes, that's a valid point and
not possible with CQL3.

-Vivek


On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin wool...@gmail.com wrote:


 I use dynamic columns all the time and they vary in type.

 With CQL you can define a default type, but you can't insert specific
 types of data for column name and value. It forces you to use all bytes or
 all strings, which would require converting them to other types.

 thrift is much more powerful in that respect.

 not everyone needs to take advantage of the full power of dynamic columns.


 On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote:

 Just curious - what do you need to do that requires thrift?  We've built
 our entire platform using CQL3 and we haven't hit any issues.

 On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote:


 my bias perspective, I find the sweet spot is thrift for insert/update
 and CQL for select queries.

 CQL is too limiting and negates the power of storing arbitrary data types
 in dynamic columns.


 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to
 work with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com
 wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using Cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if i connect via cqlsh and explore user table, i can see column
 first_name,last_name are not part of table structure anymore. Here is the
 output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage select * from user;

  user_id
 -
  @mevivs





 I understand that, CQL3 and thrift interoperability is an issue. But
 this looks to me a very basic scenario.



 Any suggestions? Or If anybody can explain a reason behind this?

 -Vivek











Re: CQL Thrift

2013-08-30 Thread Peter Lin
In the interest of education and discussion.

I didn't mean to say CQL3 doesn't support dynamic columns. The example from
that page shows a default type defined in the create statement:

create column family data
with key_validation_class=Int32Type
 and comparator=DateType
 and default_validation_class=FloatType;


If I try to insert a dynamic column that uses a double for the column name
and a string for the column value, it will throw an error. The kind of use
case I'm talking about defines a minimum number of static columns; most of
the columns added at runtime have different name and value types. This is
specific to my use case.

Having said that, I believe it would be possible to provide that kind of
feature in CQL, but the trade-off is that it deviates from SQL. The grammar
would have to allow type declarations in the column list and functions in
the values. Something like

insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values
('abc123', 'some string', double(102.211))

where doubleType(newcol1) and string(newcol2) are dynamic columns.

I know many people find thrift hard to grok and struggle with it, but I'm a
firm believer in taking time to learn. Every developer should take time to
read cassandra source code and the source code for the driver they're using.



On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.com wrote:

 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows


 On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote:


 my bias perspective, I find the sweet spot is thrift for insert/update
 and CQL for select queries.

 CQL is too limiting and negates the power of storing arbitrary data types
 in dynamic columns.


 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to
 work with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com
 wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using Cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if i connect via cqlsh and explore user table, i can see column
 first_name,last_name are not part of table structure anymore. Here is the
 output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage select * from user;

  user_id
 -
  @mevivs





 I understand that, CQL3 and thrift interoperability is an issue. But
 this looks to me a very basic scenario.



 Any suggestions? Or If anybody can explain a reason behind this?

 -Vivek









 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: CQL Thrift

2013-08-30 Thread Jon Haddad
It sounds like you want this:

create table data ( pk int, colname blob, value blob, primary key (pk, 
colname));

that gives you arbitrary columns (cleverly labeled colname) in a single row, 
with the value stored in the value column. 

If you don't want the overhead of storing colname in every row, try with 
compact storage.

Does this solve the problem, or am I missing something?
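
For what it's worth, a minimal sketch of putting a typed name/value pair
into that table with the CQL3 blob-conversion functions (textAsBlob,
doubleAsBlob; the matching blobAsText/blobAsDouble would be used on read):

INSERT INTO data (pk, colname, value)
VALUES (1, doubleAsBlob(102.211), textAsBlob('some string'));

This keeps the schema fully generic, at the cost of pushing type knowledge
into the application.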

On Aug 30, 2013, at 11:45 AM, Peter Lin wool...@gmail.com wrote:

 
 you could dynamically create new tables at runtime and insert rows into the 
 new table, but is that better than using thrift and putting it into a regular 
 dynamic column with the exact name type and value type?
 
 that would mean if there's 20 dynamic columns of different types, you'd have 
 to execute 21 queries to rebuild the data. That's basically the same as using 
 EVA tables in relational databases.
 
 Having used that approach in the past to build temporal databases, it doesn't 
 scale well.
 
 
 
 On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra mishra.v...@gmail.com wrote:
 create a column family as:
 
 create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob);
 
 insert into dynamicTable(key, nameAsDouble, valueAsBlob) values ( key, 
 double(102.211), textAsBlob('valueInBytes').
 
 Do you think, it will work in case column name are double?
 
 -Vivek
 
 
 On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote:
 
 In the interest of education and discussion.
 
 I didn't mean to say CQL3 doesn't support dynamic columns. The example from 
 the page shows default type defined in the create statement.
 create column family data 
 with key_validation_class=Int32Type 
  and comparator=DateType 
  and default_validation_class=FloatType;
 
 
 If I try to insert a dynamic column that uses double for column name and 
 string for column value, it will throw an error. The kind of use case I'm 
 talking about defines a minimum number of static columns. Most of the columns 
 that are added at runtime are different name and value type. This is specific 
 to my use case.
 
 Having said that, I believe it would be possible to provide that kind of 
 feature in CQL, but the trade off is it deviates from SQL. The grammar would 
 have to allow type declaration in the columns list and functions in the 
 values. Something like
 
 insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values 
 ('abc123', some string, double(102.211))
 
 doubleType(newcol1) and string(newcol2) are dynamic columns.
 
 I know many people find thrift hard to grok and struggle with it, but I'm a 
 firm believer in taking time to learn. Every developer should take time to 
 read cassandra source code and the source code for the driver they're using.
 
 
 
 On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.com wrote:
 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
 
 
 On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote:
 
 my bias perspective, I find the sweet spot is thrift for insert/update and 
 CQL for select queries.
 
 CQL is too limiting and negates the power of storing arbitrary data types in 
 dynamic columns.
 
 
 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:
 If you're going to work with CQL, work with CQL.  If you're going to work 
 with Thrift, work with Thrift.  Don't mix.
 
 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:
 
 Hi,
 If I create a table with CQL3 as 
 
 create table user(user_id text PRIMARY KEY, first_name text, last_name text, 
 emailid text);
 
 and create index as:
 create index on user(first_name);
 
 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId) 
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');
 
 
 Then if I update the same column family using Cassandra-cli as: 
 
 update column family user with key_validation_class='UTF8Type' and 
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type', 
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', 
 index_type:KEYS}];
 
 
 Now if i connect via cqlsh and explore user table, i can see column 
 first_name,last_name are not part of table structure anymore. Here is the 
 output:
 
 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};
 
 cqlsh:cql3usage select * from user;
 
  user_id
 -
  @mevivs
 
 
 
 
 
 I understand that, CQL3 and thrift interoperability is an issue. But this 
 looks to me a very basic scenario.
 
 
 
 Any suggestions? Or If anybody can explain a reason behind this?
 
 -Vivek
 
 
 
 
 
 
 
 
 
 -- 
 Jonathan 

Re: CQL Thrift

2013-08-30 Thread Les Hazlewood
On Fri, Aug 30, 2013 at 10:58 AM, Jon Haddad j...@jonhaddad.com wrote:

 Just curious - what do you need to do that requires thrift?  We've built
 our entire platform using CQL3 and we haven't hit any issues.


Here's one thing: If you're using wide rows and you want to do anything
other than just append individual columns to the row, then CQL3 (as it
functions currently) is way too slow.

I just created the following Jira issue 5 minutes ago because we've been
fighting with this issue for the last 2 days. Our workaround was to swap
out CQL3 + DataStax Java Driver in favor of Astyanax for this particular
use case:

https://issues.apache.org/jira/browse/CASSANDRA-5959

Cheers,

--
Les Hazlewood | @lhazlewood
CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282


Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
Did you try to explore CQL3 collection support for the same? You can
definitely save on the number of rows with that; see the sketch below.
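
For example, a minimal sketch with illustrative names (this assumes all
values share the map's declared value type, text here):

CREATE TABLE user_attrs (
  key text PRIMARY KEY,
  attrs map<text, text>
);

UPDATE user_attrs SET attrs['first_name'] = 'vivek' WHERE key = '@mevivs';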

The point I am trying to make is that you can achieve this via CQL3 (see
Jonathan's blog:
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows).

I agree that Thrift may still have some valid points to prove, but
considering the latest development around new Cassandra features, I think
CQL3 is the path to follow.


-Vivek


On Sat, Aug 31, 2013 at 12:15 AM, Peter Lin wool...@gmail.com wrote:


 you could dynamically create new tables at runtime and insert rows into
 the new table, but is that better than using thrift and putting it into a
 regular dynamic column with the exact name type and value type?

 that would mean if there's 20 dynamic columns of different types, you'd
 have to execute 21 queries to rebuild the data. That's basically the same
 as using EVA tables in relational databases.

 Having used that approach in the past to build temporal databases, it
 doesn't scale well.



 On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra mishra.v...@gmail.comwrote:

 create a column family as:

 create table dynamicTable(key text, nameAsDouble double, valueAsBlob
 blob);

 insert into dynamicTable(key, nameAsDouble, valueAsBlob) values ( key, 
 double(102.211),
 textAsBlob('valueInBytes').

 Do you think, it will work in case column name are double?

 -Vivek


 On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote:


 In the interest of education and discussion.

 I didn't mean to say CQL3 doesn't support dynamic columns. The example
 from the page shows default type defined in the create statement.

 create column family data
 with key_validation_class=Int32Type
  and comparator=DateType
  and default_validation_class=FloatType;


 If I try to insert a dynamic column that uses double for column name and
 string for column value, it will throw an error. The kind of use case I'm
 talking about defines a minimum number of static columns. Most of the
 columns that are added at runtime are different name and value type. This
 is specific to my use case.

 Having said that, I believe it would be possible to provide that kind
 of feature in CQL, but the trade off is it deviates from SQL. The grammar
 would have to allow type declaration in the columns list and functions in
 the values. Something like

 insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values
 ('abc123', some string, double(102.211))

 doubleType(newcol1) and string(newcol2) are dynamic columns.

 I know many people find thrift hard to grok and struggle with it, but
 I'm a firm believer in taking time to learn. Every developer should take
 time to read cassandra source code and the source code for the driver
 they're using.



 On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.comwrote:


 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows


 On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote:


 my bias perspective, I find the sweet spot is thrift for insert/update
 and CQL for select queries.

 CQL is too limiting and negates the power of storing arbitrary data
 types in dynamic columns.


 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to
 work with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com
 wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text,
 last_name text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using Cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if i connect via cqlsh and explore user table, i can see column
 first_name,last_name are not part of table structure anymore. Here is the
 output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage select * from user;

  user_id
 -
  @mevivs





 I understand that, CQL3 and thrift interoperability is an issue. But
 this looks to me a very basic scenario.



 Any suggestions? Or If anybody can explain a reason behind this?

 -Vivek









Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
@lhazlewood

https://issues.apache.org/jira/browse/CASSANDRA-5959

BEGIN BATCH
  ... multiple INSERT statements ...
APPLY BATCH;
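
For example, against the query_results table from that issue (a sketch; the
row id and index values are illustrative):

BEGIN BATCH
  INSERT INTO query_results (row_id, shard_num, list_index, result)
  VALUES ('r1', 0, 0, 'first result');
  INSERT INTO query_results (row_id, shard_num, list_index, result)
  VALUES ('r1', 0, 1, 'second result');
APPLY BATCH;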

It doesn't work for you?

-Vivek
On Sat, Aug 31, 2013 at 12:21 AM, Les Hazlewood lhazlew...@apache.orgwrote:

 On Fri, Aug 30, 2013 at 10:58 AM, Jon Haddad j...@jonhaddad.com wrote:

 Just curious - what do you need to do that requires thrift?  We've built
 our entire platform using CQL3 and we haven't hit any issues.


 Here's one thing: If you're using wide rows and you want to do anything
 other than just append individual columns to the row, then CQL3 (as it
 functions currently) is way too slow.

 I just created the following Jira issue 5 minutes ago because we've been
 fighting with this issue for the last 2 days. Our workaround was to swap
 out CQL3 + DataStax Java Driver in favor of Astyanax for this particular
 use case:

 https://issues.apache.org/jira/browse/CASSANDRA-5959

 Cheers,

 --
 Les Hazlewood | @lhazlewood
 CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282



CQL Thrift

2013-08-30 Thread Vivek Mishra
Hi,
If I create a table with CQL3 as

create table user(user_id text PRIMARY KEY, first_name text, last_name
text, emailid text);

and create index as:
create index on user(first_name);

then insert some data as:
insert into user(user_id,first_name,last_name,emailId)
values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


Then if I update the same column family using Cassandra-cli as:

update column family user with key_validation_class='UTF8Type' and
column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
index_type:KEYS}];


Now if I connect via cqlsh and explore the user table, I can see that the
columns first_name and last_name are no longer part of the table structure.
Here is the output:

CREATE TABLE user (
  key text PRIMARY KEY
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

cqlsh:cql3usage> select * from user;

 user_id
-
 @mevivs





I understand that CQL3 and Thrift interoperability is an issue, but this
looks to me like a very basic scenario.



Any suggestions? Or can anybody explain the reason behind this?

-Vivek


Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Mike Neir
Is there anything that you can link that describes the pitfalls you mention? I'd 
like a bit more information. Just for clarity's sake, are you recommending 1.0.9 
-> 1.0.12 -> 1.1.12 -> 1.2.x? Or would 1.0.9 -> 1.1.12 -> 1.2.x suffice?


Regarding the placement strategy mentioned in a different post, I'm using the 
Simple placement strategy, with the RackInferringSnitch. How does that play into 
the bugs mentioned previously about cross-DC replication?


MN

On 08/30/2013 01:28 PM, Jeremiah D Jordan wrote:

You probably want to go to 1.0.11/12 first no matter what.  If you want the least 
chance of issues you should then go to 1.1.12.  While there is a high probability 
that going from 1.0.X -> 1.2 will work, you have the best chance of no failures 
if you go through 1.1.12.  There are some edge cases that can cause errors if you 
don't do that.

-Jeremiah




Re: CQL Thrift

2013-08-30 Thread Peter Lin
CQL3 collections are meant to store lists, sets, and maps. Plus,
collections currently do not support secondary indexes.

The point is often you don't know what columns are needed at design time.
If you know what's needed, use static columns.

Using a list, set, or map to store data you don't know about and can't
predict in advance feels like a hammer solution. Cassandra already has this
super powerful and useful feature that developers can use via Thrift.

The last time I looked, DataStax's official statement was that Thrift isn't
going away, so I take them at their word.



On Fri, Aug 30, 2013 at 2:51 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 Did you try to explore CQL3 collection support for the same? You can
 definitely save on number of rows with that.

 Point which i am trying to make out is, you can achieve it via CQL3 (
 Jonathan's blog :
 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
 )

 I agree with you that still thrift may have some valid points to prove,
 but considering latest development around new Cassandra features, i think
 CQL3 is the path to follow.


 -Vivek


 On Sat, Aug 31, 2013 at 12:15 AM, Peter Lin wool...@gmail.com wrote:


 you could dynamically create new tables at runtime and insert rows into
 the new table, but is that better than using thrift and putting it into a
 regular dynamic column with the exact name type and value type?

 that would mean if there's 20 dynamic columns of different types, you'd
 have to execute 21 queries to rebuild the data. That's basically the same
 as using EVA tables in relational databases.

  Having used that approach in the past to build temporal databases, it
 doesn't scale well.



 On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra mishra.v...@gmail.comwrote:

 create a column family as:

 create table dynamicTable(key text, nameAsDouble double, valueAsBlob
 blob);

 insert into dynamicTable(key, nameAsDouble, valueAsBlob) values ( key, 
 double(102.211),
 textAsBlob('valueInBytes').

 Do you think, it will work in case column name are double?

 -Vivek


 On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote:


 In the interest of education and discussion.

 I didn't mean to say CQL3 doesn't support dynamic columns. The example
 from the page shows default type defined in the create statement.

 create column family data
 with key_validation_class=Int32Type
  and comparator=DateType
  and default_validation_class=FloatType;


 If I try to insert a dynamic column that uses double for column name
 and string for column value, it will throw an error. The kind of use case
 I'm talking about defines a minimum number of static columns. Most of the
 columns that are added at runtime are different name and value type. This
 is specific to my use case.

 Having said that, I believe it would be possible to provide that kind
 of feature in CQL, but the trade off is it deviates from SQL. The grammar
 would have to allow type declaration in the columns list and functions in
 the values. Something like

 insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values
 ('abc123', some string, double(102.211))

 doubleType(newcol1) and string(newcol2) are dynamic columns.

 I know many people find thrift hard to grok and struggle with it, but
 I'm a firm believer in taking time to learn. Every developer should take
 time to read cassandra source code and the source code for the driver
 they're using.



 On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.comwrote:


 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows


 On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote:


 my bias perspective, I find the sweet spot is thrift for
 insert/update and CQL for select queries.

 CQL is too limiting and negates the power of storing arbitrary data
 types in dynamic columns.


 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.comwrote:

 If you're going to work with CQL, work with CQL.  If you're going to
 work with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com
 wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text,
 last_name text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using Cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if i connect via cqlsh and explore user table, i can see column
 first_name,last_name are not part of table structure anymore. Here is 
 the
 output:

 CREATE TABLE user (
   key text PRIMARY KEY

Re: successful use of shuffle?

2013-08-30 Thread Jeremiah D Jordan
You need to introduce the new vnode-enabled nodes in a new DC, or you will
have issues similar to https://issues.apache.org/jira/browse/CASSANDRA-5525

Add vnode DC:
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html

Point clients to new DC

Remove non vnode DC:
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_decomission_dc_t.html

-Jeremiah

On Aug 30, 2013, at 3:04 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 +1.
 
 I am still afraid of this step. Yet you can avoid it by introducing new 
 nodes, with vnodes enabled, and then removing the old ones. This should work.
 
 My problem is that I am not really confident in vnodes either...
 
 Any experience shared on this transition, and on the use of vnodes, would be 
 great indeed.
 
 Alain
 
 
 2013/8/29 Robert Coli rc...@eventbrite.com
 Hi!
 
 I've been wondering... is there anyone in the cassandra-user audience who has 
 used the shuffle feature successfully on a non-toy-or-testing cluster? If so, 
 could you describe the experience you had and any problems you encountered?
 
 Thanks!
 
 =Rob
 



Re: CQL Thrift

2013-08-30 Thread Alex Popescu
On Fri, Aug 30, 2013 at 11:56 AM, Vivek Mishra mishra.v...@gmail.comwrote:

 @lhazlewood

 https://issues.apache.org/jira/browse/CASSANDRA-5959

 Begin batch

  multiple insert statements.

 apply batch

 It doesn't work for you?

 -Vivek


According to the OP, batching inserts is slow. The SO thread [1] mentions
that in their environment a BATCH takes 1.5 min, while the Thrift-based
approach takes around 235 ms.

[1]
http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque
-- 

:- a)


Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru


Re: CQL Thrift

2013-08-30 Thread Jon Haddad
It seems really strange to me that you're creating a table with specific types 
and then trying to deviate from them. Why not just use the blob type? Then you 
can store whatever you want in there.

The whole point of adding strong typing is to adhere to it.  I wouldn't 
consider it a fault of the database that it does what you asked it to.

On Aug 30, 2013, at 11:33 AM, Peter Lin wool...@gmail.com wrote:

 
 In the interest of education and discussion.
 
 I didn't mean to say CQL3 doesn't support dynamic columns. The example from 
 the page shows default type defined in the create statement.
 create column family data 
 with key_validation_class=Int32Type 
  and comparator=DateType 
  and default_validation_class=FloatType;
 
 
 If I try to insert a dynamic column that uses double for column name and 
 string for column value, it will throw an error. The kind of use case I'm 
 talking about defines a minimum number of static columns. Most of the columns 
 that are added at runtime are different name and value type. This is specific 
 to my use case.
 
 Having said that, I believe it would be possible to provide that kind of 
 feature in CQL, but the trade off is it deviates from SQL. The grammar would 
 have to allow type declaration in the columns list and functions in the 
 values. Something like
 
 insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values 
 ('abc123', some string, double(102.211))
 
 doubleType(newcol1) and string(newcol2) are dynamic columns.
 
 I know many people find thrift hard to grok and struggle with it, but I'm a 
 firm believer in taking time to learn. Every developer should take time to 
 read cassandra source code and the source code for the driver they're using.
 
 
 
 On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.com wrote:
 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
 
 
 On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote:
 
 my bias perspective, I find the sweet spot is thrift for insert/update and 
 CQL for select queries.
 
 CQL is too limiting and negates the power of storing arbitrary data types in 
 dynamic columns.
 
 
 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:
 If you're going to work with CQL, work with CQL.  If you're going to work 
 with Thrift, work with Thrift.  Don't mix.
 
 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:
 
 Hi,
 If I create a table with CQL3 as 
 
 create table user(user_id text PRIMARY KEY, first_name text, last_name text, 
 emailid text);
 
 and create index as:
 create index on user(first_name);
 
 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId) 
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');
 
 
 Then if I update the same column family using Cassandra-cli as: 
 
 update column family user with key_validation_class='UTF8Type' and 
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type', 
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', 
 index_type:KEYS}];
 
 
 Now if i connect via cqlsh and explore user table, i can see column 
 first_name,last_name are not part of table structure anymore. Here is the 
 output:
 
 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};
 
 cqlsh:cql3usage select * from user;
 
  user_id
 -
  @mevivs
 
 
 
 
 
 I understand that, CQL3 and thrift interoperability is an issue. But this 
 looks to me a very basic scenario.
 
 
 
 Any suggestions? Or If anybody can explain a reason behind this?
 
 -Vivek
 
 
 
 
 
 
 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced
 



Re: CQL3 wide row and slow inserts - is there a single insert alternative?

2013-08-30 Thread Les Hazlewood
Well, it appears that this just isn't possible.  I created CASSANDRA-5959
as a result.  (Backstory + performance testing results are described in the
issue):

https://issues.apache.org/jira/browse/CASSANDRA-5959

--
Les Hazlewood | @lhazlewood
CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282

On Thu, Aug 29, 2013 at 12:04 PM, Les Hazlewood lhazlew...@apache.orgwrote:

 Hi all,

 We're using a Cassandra table to store search results in a
 table/column family that that look like this:

 ++-+-+-+
 || 0   | 1   | 2   | ...
 ++-+-+-+
 | row_id | text... | text... | text... | ...

 The column name is the index # (an integer) of the location in the
 overall result set.  The value is the result at that particular index.
  This is great because pagination becomes a simple slice query on the
 column name.
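
 For example, fetching results 100-149 of a row would be a single range
 query on the clustering column (a sketch against the table defined below;
 'r1' is an illustrative row id):

 SELECT list_index, result FROM query_results
 WHERE row_id = 'r1' AND shard_num = 0
 AND list_index >= 100 AND list_index < 150;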

 Large result sets are split into multiple rows - we're limiting row
 size on disk to be around 6 or 7 MB.  For our particular result
 entries, this means we can get around 50,000 columns in a single row.

 When we create the rows, we have the entire data available in the
 application at the time the row insert is necessary.

 Using CQL3, an initial implementation had one INSERT statement per
 column.  This was killing performance (not to mention the # of
 tombstones it created).

 Here's the CQL3 table definition:

 create table query_results (
 row_id text,
 shard_num int,
 list_index int,
 result text,
 primary key ((row_id, shard_num), list_index)
 ) with compact storage;

 (the row key is row_id + shard_num.  The clustering column is list_index).

 I don't want to execute 50,000 INSERT statements for a single row.  We
 have all of the data up front - I want to execute a single INSERT.

 Is this possible?

 We're using the Datastax Java Driver.

 Thanks for any help!

 Les



Re: CQL Thrift

2013-08-30 Thread Vivek Mishra
create a column family as:

create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob);

insert into dynamicTable(key, nameAsDouble, valueAsBlob) values (
key, double(102.211),
textAsBlob('valueInBytes').

Do you think, it will work in case column name are double?

-Vivek


On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote:


 In the interest of education and discussion.

 I didn't mean to say CQL3 doesn't support dynamic columns. The example
 from the page shows default type defined in the create statement.

 create column family data
 with key_validation_class=Int32Type
  and comparator=DateType
  and default_validation_class=FloatType;


 If I try to insert a dynamic column that uses double for column name and
 string for column value, it will throw an error. The kind of use case I'm
 talking about defines a minimum number of static columns. Most of the
 columns that are added at runtime are different name and value type. This
 is specific to my use case.

 Having said that, I believe it would be possible to provide that kind of
 feature in CQL, but the trade off is it deviates from SQL. The grammar
 would have to allow type declaration in the columns list and functions in
 the values. Something like

 insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values
 ('abc123', some string, double(102.211))

 doubleType(newcol1) and string(newcol2) are dynamic columns.

 I know many people find thrift hard to grok and struggle with it, but I'm
 a firm believer in taking time to learn. Every developer should take time
 to read cassandra source code and the source code for the driver they're
 using.



 On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.com wrote:


 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows


 On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote:


 my bias perspective, I find the sweet spot is thrift for insert/update
 and CQL for select queries.

 CQL is too limiting and negates the power of storing arbitrary data
 types in dynamic columns.


 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to
 work with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com
 wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using Cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if i connect via cqlsh and explore user table, i can see column
 first_name,last_name are not part of table structure anymore. Here is the
 output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage select * from user;

  user_id
 -
  @mevivs





 I understand that, CQL3 and thrift interoperability is an issue. But
 this looks to me a very basic scenario.



 Any suggestions? Or If anybody can explain a reason behind this?

 -Vivek









 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced





Re: CQL Thrift

2013-08-30 Thread Les Hazlewood
Yes, that's correct - and that's a scaled number.  In practice:

On the local dev machine, CQL3 inserting 10,000 columns (for 1 row) in a
BATCH took 1.5 minutes.  50,000 columns (the desired amount) in a BATCH
took 7.5 minutes.  The same Thrift functionality took _235 milliseconds_.
 That's almost 2,000 times faster (3 orders of magnitude difference)!

However, according to Aleksey Yeschenko, this performance problem has been
addressed in 2.0 beta 1 via
https://issues.apache.org/jira/browse/CASSANDRA-4693.

I'll reserve judgement until I can performance-test 2.0 beta 1 ;)

Cheers,

--
Les Hazlewood | @lhazlewood
CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282

On Fri, Aug 30, 2013 at 12:50 PM, Alex Popescu al...@datastax.com wrote:

 On Fri, Aug 30, 2013 at 11:56 AM, Vivek Mishra mishra.v...@gmail.comwrote:

 @lhazlewood

 https://issues.apache.org/jira/browse/CASSANDRA-5959

 Begin batch

  multiple insert statements.

 apply batch

 It doesn't work for you?

 -Vivek


 According to the OP batching inserts is slow. The SO thread [1] mentions
 that the in their environment BATCH takes 1.5min, while the Thrift-based
 approach is around 235millis.

 [1]
 http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque
 --

 :- a)


 Alex Popescu
 Sen. Product Manager @ DataStax
 @al3xandru



Re: CQL Thrift

2013-08-30 Thread Peter Lin
This has nothing to do with compact storage.

Cassandra supports arbitrary dynamic columns of different name/value type
today. If people are happy with SQL metaphor, then CQL is fine.

Then again, if the SQL metaphor were good for temporal databases, there wouldn't
be so many failed temporal databases built on RDBs. I've built over four
bi-temporal databases on RDBs over the last 12 years, so this is not a position
taken lightly.

It came from years of pain. I won't bore others with the challenges of
building temporal databases.




On Fri, Aug 30, 2013 at 2:51 PM, Jon Haddad j...@jonhaddad.com wrote:

 It sounds like you want this:

 create table data ( pk int, colname blob, value blob, primary key (pk,
 colname));

 that gives you arbitrary columns (cleverly labeled colname) in a single
 row, where the value is value.

 If you don't want the overhead of storing colname in every row, try with
 compact storage.

 Does this solve the problem, or am I missing something?

 On Aug 30, 2013, at 11:45 AM, Peter Lin wool...@gmail.com wrote:


 you could dynamically create new tables at runtime and insert rows into
 the new table, but is that better than using thrift and putting it into a
 regular dynamic column with the exact name type and value type?

 that would mean if there's 20 dynamic columns of different types, you'd
 have to execute 21 queries to rebuild the data. That's basically the same
 as using EVA tables in relational databases.

 Having used that approach in the past to build temporal databases, it
 doesn't scale well.



 On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra mishra.v...@gmail.comwrote:

 create a column family as:

 create table dynamicTable(key text, nameAsDouble double, valueAsBlob
 blob);

 insert into dynamicTable(key, nameAsDouble, valueAsBlob) values ( key, 
 double(102.211),
 textAsBlob('valueInBytes').

 Do you think, it will work in case column name are double?

 -Vivek


 On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote:


 In the interest of education and discussion.

 I didn't mean to say CQL3 doesn't support dynamic columns. The example
 from the page shows default type defined in the create statement.

 create column family data
 with key_validation_class=Int32Type
  and comparator=DateType
  and default_validation_class=FloatType;


 If I try to insert a dynamic column that uses double for column name and
 string for column value, it will throw an error. The kind of use case I'm
 talking about defines a minimum number of static columns. Most of the
 columns that are added at runtime are different name and value type. This
 is specific to my use case.

 Having said that, I believe it would be possible to provide that kind
 of feature in CQL, but the trade off is it deviates from SQL. The grammar
 would have to allow type declaration in the columns list and functions in
 the values. Something like

 insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values
 ('abc123', some string, double(102.211))

 doubleType(newcol1) and string(newcol2) are dynamic columns.

 I know many people find thrift hard to grok and struggle with it, but
 I'm a firm believer in taking time to learn. Every developer should take
 time to read cassandra source code and the source code for the driver
 they're using.



 On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.comwrote:


 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows


 On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote:


 my bias perspective, I find the sweet spot is thrift for insert/update
 and CQL for select queries.

 CQL is too limiting and negates the power of storing arbitrary data
 types in dynamic columns.


 On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to
 work with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com
 wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text,
 last_name text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using Cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if i connect via cqlsh and explore user table, i can see column
 first_name,last_name are not part of table structure anymore. Here is the
 output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   

Re: CQL Thrift

2013-08-30 Thread Peter Lin
My biased perspective: I find the sweet spot is Thrift for insert/update and
CQL for select queries.

CQL is too limiting and negates the power of storing arbitrary data types
in dynamic columns.


On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to work
 with Thrift, work with Thrift.  Don't mix.

 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 If I create a table with CQL3 as

 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);

 and create index as:
 create index on user(first_name);

 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');


 Then if I update the same column family using Cassandra-cli as:

 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];


 Now if i connect via cqlsh and explore user table, i can see column
 first_name,last_name are not part of table structure anymore. Here is the
 output:

 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh:cql3usage select * from user;

  user_id
 -
  @mevivs





 I understand that, CQL3 and thrift interoperability is an issue. But this
 looks to me a very basic scenario.



 Any suggestions? Or If anybody can explain a reason behind this?

 -Vivek








Re: CqlStorage creates wrong schema for Pig

2013-08-30 Thread Chad Johnston
I threw together a quick UDF to work around this issue. It just extracts
the value portion of the tuple while taking advantage of the
CqlStorage-generated schema to keep the types correct.

You can get it here: https://github.com/iamthechad/cqlstorage-udf

I'll see if I can find more useful information and open a defect, since
that's what this seems to be.

Chad


On Fri, Aug 30, 2013 at 2:02 AM, Miguel Angel Martin junquera 
mianmarjun.mailingl...@gmail.com wrote:

 I try this:

 *rows = LOAD
 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING
 CqlStorage();*

 *dump rows;*

 *ILLUSTRATE rows;*

 *describe rows;*

 *
 *

 *values2= FOREACH rows GENERATE  TOTUPLE (id) as
 (mycolumn:tuple(name,value));*

 *dump values2;*

 *describe values2;*
 *
 *

 But I get this results:



 -
 | rows | id:chararray   | age:int   | title:chararray   |
 -
 |  | (id, 6)| (age, 30) | (title, QA)   |
 -

 rows: {id: chararray,age: int,title: chararray}
 2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt -
 ERROR 1031: Incompatable field schema: left is
 tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray)), right is
 org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)





 or



 

 values2 = FOREACH rows GENERATE TOTUPLE (id);
 dump values2;
 describe values2;




 and  the results are:


 ...
 (((id,6)))
 (((id,5)))
 values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}



 Aggg!





 Miguel Angel Martín Junquera
 Analyst Engineer.
 miguelangel.mar...@brainsins.com



 2013/8/26 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com

 Hi Chad,

 I have this same issue.

 I sent a mail to the pig-user list and I still can not resolve it; I can
 not access the column values.
 In that mail I describe some things that I tried without results, plus
 other information about this issue.



 http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3ccajeg_hq9s2po3_xytzx5xki4j1mao8q26jydg2wndy_kyiv...@mail.gmail.com%3E



 I hope someone can reply with a comment, idea, or solution for this issue
 or bug.


 I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but I
 have not configured the environment to debug and trace this issue.

 I only found some comments like the following, which I do not fully
 understand:


 /**
  * A LoadStoreFunc for retrieving data from and storing data to Cassandra
  *
  * A row from a standard CF will be returned as nested tuples:
  * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
  */


 If you find some idea or solution, please post it.

 thanks









 2013/8/23 Chad Johnston cjohns...@megatome.com

 (I'm using Cassandra 1.2.8 and Pig 0.11.1)

 I'm loading some simple data from Cassandra into Pig using CqlStorage.
 The CqlStorage loader defines a Pig schema based on the Cassandra schema,
 but it seems to be wrong.

 If I do:

 data = LOAD 'cql://bookdata/books' USING CqlStorage();
 DESCRIBE data;

 I get this:

 data: {isbn: chararray,bookauthor: chararray,booktitle:
 chararray,publisher: chararray,yearofpublication: int}

 However, if I DUMP data, I get results like these:

 ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the
 Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

 Clearly the results from Cassandra are key/value pairs, as would be
 expected. I don't know why the schema generated by CqlStorage() would be so
 different.

 This is really causing me problems trying to access the column values. I
 tried a naive approach of FLATTENing each tuple, then trying to access the
 values that way:

 flattened = FOREACH data GENERATE
   FLATTEN(isbn),
   FLATTEN(booktitle),
   ...
 values = FOREACH flattened GENERATE
   $1 AS ISBN,
   $3 AS BookTitle,
   ...

 As soon as I try to access field $5, Pig complains about the index being
 out of bounds.

 Is there a way to solve the schema/reality mismatch? Am I doing
 something wrong, or have I stumbled across a defect?

 Thanks,
 Chad






Selecting multiple rows with composite partition keys using CQL3

2013-08-30 Thread Carl Lerche
Hello,

I've been trying to figure out how to port my application to CQL3 based on
http://cassandra.apache.org/doc/cql3/CQL.html.

I have a table with a primary key: ( (app, name), timestamp ). So, the
partition key would be composite (on app and name). I'm trying to figure
out if there is a way to select multiple rows that span partition keys.
Basically, I am trying to do:

SELECT .. WHERE (app = 'foo' AND name = 'bar' AND timestamp = 123) OR (app
= 'foo' AND name='hello' AND timestamp = 123)
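
CQL3 has no OR, but when the two restrictions differ only on the last
partition-key component, an IN clause can express the same query. A sketch
under that assumption (the table name here is hypothetical):

-- IN is allowed on the last column of the partition key:
SELECT * FROM metrics
WHERE app = 'foo'
  AND name IN ('bar', 'hello')
  AND timestamp = 123;

Otherwise the usual approach is to issue the two SELECTs separately and
merge the results client-side.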


Data Modeling help for representing a survey form.

2013-08-30 Thread John Anderson
I have an existing system in postgres that I would like to move to
cassandra.  The system is for building registration forms for conferences.

For example, you might want to build a registration form (or survey) that
has a bunch of questions on it.  An overview of this system I whiteboarded
here: http://paste2.org/JeHP1tV0

What I'm trying to figure out is how this data should be structured in a
de-normalized way; a rough sketch follows the query list below.

Basic queries would be:
1. Give me all surveys for an account
2. Give me all questions for a survey
3. Give me all responses for a survey
4. Give me all responses for a specific question
5. Compare responses for question "What is your favorite color" with people
who answered question "What is your gender", i.e. a crosstab of
males/females and the colors they like.
6. Give me a time series of how many people responded to a question per hour
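
One possible denormalized sketch for queries 1 through 4 and 6 (all table
and column names are hypothetical, following Cassandra's table-per-query
convention):

CREATE TABLE surveys_by_account (
  account_id uuid,
  survey_id  timeuuid,
  title      text,
  PRIMARY KEY (account_id, survey_id)    -- query 1
);

CREATE TABLE questions_by_survey (
  survey_id   timeuuid,
  question_id timeuuid,
  question    text,
  PRIMARY KEY (survey_id, question_id)   -- query 2
);

-- Query 4 directly; query 3 by iterating over the survey's questions:
CREATE TABLE responses_by_question (
  survey_id   timeuuid,
  question_id timeuuid,
  responder   uuid,
  answer      text,
  PRIMARY KEY ((survey_id, question_id), responder)
);

-- Query 6: a counter bucketed per hour.
CREATE TABLE responses_per_hour (
  question_id timeuuid,
  hour        timestamp,
  responses   counter,
  PRIMARY KEY (question_id, hour)
);

Query 5 (the crosstab) is probably easiest to compute client-side or in a
batch job over the per-question responses.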

The reason I would like to get it onto Cassandra is that at peak times this
is an extremely write-heavy application: people are registering for a
conference that just launched, or filling out a new survey, so everyone
comes in all at once.

Also, if anyone is in the bay area and wants to discuss cassandra data
modeling over some beers, let me know!

Thanks,
John


Is it possible to synchronous run Cassandra Triggers?

2013-08-30 Thread yun peng
Hi, All
I am interested in using the new Cassandra Triggers feature to implement a
synchronous (or asynchronous, but with a deadline) index on Cassandra.

The Trigger API allows one to define a mutation job to run (in the future),
but is there any way to control when the (asynchronously executed) job is
actually executed? Or is there any way to control the execution model of
Triggers, e.g. to switch between a synchronous and an asynchronous model?
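
For reference, a sketch of how a trigger is attached in CQL3 (Cassandra 2.0
syntax; the trigger, keyspace, table, and class names here are illustrative):

-- The named class must implement org.apache.cassandra.triggers.ITrigger
-- and be deployed on every node (e.g. a jar in the conf/triggers directory).
CREATE TRIGGER index_trigger ON mykeyspace.mytable
USING 'com.example.MyIndexTrigger';

As far as I can tell, the API itself does not expose a synchronous versus
asynchronous switch.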

Regards
Yun