mysterious 'column1' in cql describe
Hi all! We have encountered the following problem. We create our column families via Hector like this:

    ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition("mykeyspace", "mycf");
    cfdef.setColumnType(ColumnType.STANDARD);
    cfdef.setComparatorType(ComparatorType.UTF8TYPE);
    cfdef.setDefaultValidationClass("BytesType");
    cfdef.setKeyValidationClass("UTF8Type");
    cfdef.setReadRepairChance(0.1);
    cfdef.setGcGraceSeconds(864000);
    cfdef.setMinCompactionThreshold(4);
    cfdef.setMaxCompactionThreshold(32);
    cfdef.setReplicateOnWrite(true);
    cfdef.setCompactionStrategy("SizeTieredCompactionStrategy");
    Map<String, String> compressionOptions = new HashMap<String, String>();
    compressionOptions.put("sstable_compression", "");
    cfdef.setCompressionOptions(compressionOptions);
    cluster.addColumnFamily(cfdef, true);

When we describe this column family via cqlsh we get this:

    CREATE TABLE mycf (
      key text,
      column1 text,
      value blob,
      PRIMARY KEY (key, column1)
    ) WITH COMPACT STORAGE
      AND bloom_filter_fp_chance=0.01
      AND caching='KEYS_ONLY'
      AND comment=''
      AND dclocal_read_repair_chance=0.00
      AND gc_grace_seconds=864000
      AND read_repair_chance=0.10
      AND replicate_on_write='true'
      AND populate_io_cache_on_flush='false'
      AND compaction={'class': 'SizeTieredCompactionStrategy'}
      AND compression={};

As you can see there is a mysterious column1, and moreover it is added to the primary key. We thought this was wrong, so we tried to get rid of it.
We managed to do it by adding explicit column definitions like this:

    BasicColumnDefinition cdef = new BasicColumnDefinition();
    cdef.setName(StringSerializer.get().toByteBuffer("mycolumn"));
    cdef.setValidationClass(ComparatorType.BYTESTYPE.getTypeName());
    cdef.setIndexType(ColumnIndexType.CUSTOM);
    cfdef.addColumnDefinition(cdef);

After this the primary key was just PRIMARY KEY (key). The effect of this was overwhelming - we got a tremendous performance improvement, and according to the stats the key cache began working, while previously its hit ratio was close to zero. My questions are: 1) What is this all about? Is what we did right? 2) In this project we can provide explicit column definitions, but in another project we have some column families where this is not possible, because the column names are dynamic (based on timestamps). If what we did is right, how can we adapt this solution to the dynamic column name case?
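For context on where column1 comes from: under COMPACT STORAGE, CQL3 exposes each (row key, column name, column value) storage cell of a dynamic thrift column family as one CQL row. A rough illustration of that mapping in plain Python (this is not Cassandra code; the sample keys and timestamps are invented):

```python
# Illustration only: how CQL3 presents a dynamic thrift column family
# under COMPACT STORAGE. Each storage cell (row key, column name, value)
# becomes one CQL row with columns (key, column1, value).

def thrift_to_cql_rows(cells):
    """cells: list of (row_key, column_name, column_value) storage cells."""
    return [{"key": k, "column1": name, "value": val}
            for (k, name, val) in cells]

cells = [
    ("user1", "2013-08-30T09:00", b"event-a"),
    ("user1", "2013-08-30T09:05", b"event-b"),
]
rows = thrift_to_cql_rows(cells)
# Two storage cells in one thrift row become two CQL rows sharing the
# same partition key; 'column1' holds the (dynamic) column name.
```

This is why a thrift CF with no declared columns shows up in cqlsh with the generated column1 name in the primary key.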
Re: how can i get the column value? Need help!.. cassandra 1.2.8 and pig 0.11.1
I try this:

    rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
    dump rows;
    ILLUSTRATE rows;
    describe rows;

    values2 = FOREACH rows GENERATE TOTUPLE(id) AS (mycolumn:tuple(name,value));
    dump values2;
    describe values2;

But I get these results:

    | rows | id:chararray | age:int   | title:chararray |
    |      | (id, 6)      | (age, 30) | (title, QA)     |

    rows: {id: chararray,age: int,title: chararray}

    2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable field schema: left is tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray)), right is org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)

Or:

    values2 = FOREACH rows GENERATE TOTUPLE(id);
    dump values2;
    describe values2;

and the results are:

    (((id,6)))
    (((id,5)))
    values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}

Aggg!

Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com

2013/8/28 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com:

hi: I can not understand why the schema is defined like id:chararray,age:int,title:chararray and not as tuples or bags of tuples, given that we have key-value pair columns. I tried again to change the schema but it does not work. Any ideas? Perhaps the issue is in the definition of the CQL3 tables? Regards

2013/8/28 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com:

hi all: Regards. Still I can not resolve this issue. Does anybody have this issue, or has anybody tried to test this simple example? I am stumped; I can not find a working solution.
I appreciate any comment or help.

2013/8/22 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com:

hi all: I'm testing the new CqlStorage() with cassandra 1.2.8 and pig 0.11.1. I am using this sample test data: http://frommyworkshop.blogspot.com.es/2013/07/hadoop-map-reduce-with-cassandra.html

And I load and dump data right with this script:

    rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
    dump rows;
    describe rows;

results:

    ((id,6),(age,30),(title,QA))
    ((id,5),(age,30),(title,QA))
    rows: {id: chararray,age: int,title: chararray}

But I can not get the column values. I tried to define other schemas in LOAD like I used with CassandraStorage() (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-Pig-how-to-get-column-values-td5641158.html), for example:

    rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage() AS (columns: bag {T: tuple(name, value)});

and I get this error:

    2013-08-22 12:24:45,426 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031: Incompatable schema: left is columns:bag{T:tuple(name:bytearray,value:bytearray)}, right is id:chararray,age:int,title:chararray

I tried FLATTEN, SUBSTRING and SPLIT UDFs, but I have not gotten good results. For example, when I FLATTEN, I get a set of tuples like:

    (title,QA)
    (title,QA)

    2013-08-22 12:42:20,673 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    A: {title: chararray}

but I can not get the value QA. SUBSTRING only works on the name, for example:

    B = FOREACH A GENERATE SUBSTRING(title,2,5);
    dump B;
    describe B;

results:

    (tle)
    (tle)
    B: {chararray}

I tried, like Eric Lee in the other mail, and got the same results. Anyway, what I really want is the column value, not the name. Is there a way to do that? I listed all of the failed attempts I made below.
- colnames = FOREACH cols GENERATE $1; and was told $1 was out of bounds.
- casted = FOREACH cols GENERATE (tuple(chararray, chararray))$0; but all I got back were empty tuples.
- values = FOREACH cols GENERATE $0.$1; but I got an error telling me data byte array can't be casted to tuple.

Please, I will appreciate any help. Regards

--
Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com
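One detail worth double-checking in the LOAD URIs above: the query parameters are joined with '&' and the where_clause value must be URL-encoded (age%3D30 is the encoding of age=30). A quick sketch of building such a URI with Python's standard urllib (illustration only; the keyspace and table names are the ones from the example above):

```python
from urllib.parse import quote

# Build a CqlStorage load URI for Pig: query parameters are joined
# with '&' and the where_clause value must be percent-encoded.
def cql_load_uri(keyspace, table, page_size, split_size, where):
    params = "page_size=%d&split_size=%d&where_clause=%s" % (
        page_size, split_size, quote(where))
    return "cql://%s/%s?%s" % (keyspace, table, params)

uri = cql_load_uri("keyspace1", "test", 1, 4, "age=30")
# -> cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30
```

If the '&' separators are dropped (as easily happens when pasting into mail), CqlStorage sees a single mangled parameter.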
Re: CqlStorage creates wrong schema for Pig
Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com

2013/8/26 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com:

hi Chad. I have this issue too. I sent a mail to the pig-user list and I still can not resolve this; I can not access the column values. In that mail I wrote some things that I tried without results, and information about this issue: http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3ccajeg_hq9s2po3_xytzx5xki4j1mao8q26jydg2wndy_kyiv...@mail.gmail.com%3E

I hope someone replies with a comment, idea or solution about this issue or bug. I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but I have not configured the environment to debug and trace this issue. I only found some comments like the following, which I do not fully understand:

/**
 * A LoadStoreFunc for retrieving data from and storing data to Cassandra
 *
 * A row from a standard CF will be returned as nested tuples:
 * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
 */

If you find some idea or solution, please post it. Thanks.

2013/8/23 Chad Johnston cjohns...@megatome.com:

(I'm using Cassandra 1.2.8 and Pig 0.11.1.) I'm loading some simple data from Cassandra into Pig using CqlStorage. The CqlStorage loader defines a Pig schema based on the Cassandra schema, but it seems to be wrong. If I do:

    data = LOAD 'cql://bookdata/books' USING CqlStorage();
    DESCRIBE data;

I get this:

    data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}

However, if I DUMP data, I get results like these:

    ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

Clearly the results from Cassandra are key/value pairs, as would be expected. I don't know why the schema generated by CqlStorage() would be so different. This is really causing me problems when trying to access the column values. I tried a naive approach of FLATTENing each tuple, then trying to access the values that way:

    flattened = FOREACH data GENERATE FLATTEN(isbn), FLATTEN(booktitle), ...
    values = FOREACH flattened GENERATE $1 AS ISBN, $3 AS BookTitle, ...

As soon as I try to access field $5, Pig complains about the index being out of bounds. Is there a way to solve the schema/reality mismatch? Am I doing something wrong, or have I stumbled across a defect? Thanks, Chad
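What both posters are after - the value half of each (name, value) pair - is easy to state outside Pig. A plain-Python illustration of the desired transform (this is not a Pig workaround, just a statement of the target shape; the sample row is taken from the dumps above):

```python
# Each dumped row is a sequence of (column_name, column_value) pairs,
# e.g. (("id", 6), ("age", 30), ("title", "QA")).
# The posters want just the values, keyed by column name.

def row_to_dict(row):
    return {name: value for (name, value) in row}

row = (("id", 6), ("age", 30), ("title", "QA"))
d = row_to_dict(row)
# d["title"] yields the column value "QA", not the name "title"
```

The Pig difficulty is that CqlStorage advertises a flat schema (id, age, title) while actually delivering these pair tuples, so neither positional access nor the declared field names line up with the data.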
Re: successful use of shuffle?
+1. I am still afraid of this step. You can avoid it, though, by introducing new nodes with vnodes enabled and then removing the old ones. This should work. My problem is that I am not really confident in vnodes either... Any feedback on this transition, and then on the use of vnodes, would be great indeed. Alain

2013/8/29 Robert Coli rc...@eventbrite.com:

Hi! I've been wondering... is there anyone in the cassandra-user audience who has used the shuffle feature successfully on a non-toy-or-testing cluster? If so, could you describe the experience you had and any problems you encountered? Thanks! =Rob
Re: mysterious 'column1' in cql describe
The short story is that you're probably not up to date on how CQL and thrift table definitions relate to one another, and it may not be exactly how you think it is. If you haven't done so, I'd suggest reading http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows (it should answer your question about the dynamic column name case) and http://www.datastax.com/dev/blog/thrift-to-cql3 (it should help explain how CQL3 interprets thrift tables, and why you saw what you saw). -- Sylvain

On Fri, Aug 30, 2013 at 9:50 AM, Alexander Shutyaev shuty...@gmail.com wrote:
Re: mysterious 'column1' in cql describe
Thanks, Sylvain! I'll read them most thoroughly, but after a quick glance I wish to repeat another (implied) question of mine that I believe is not answered in those articles. Why does the explicit definition of columns in a column family significantly improve performance and the key cache hit ratio (the latter being almost zero when there are no explicit column definitions)?

2013/8/30 Sylvain Lebresne sylv...@datastax.com:
[RELEASE] Apache Cassandra 1.2.9 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 1.2.9. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here: http://cassandra.apache.org/ Downloads of source and binary distributions are listed in our download section: http://cassandra.apache.org/download/ This version is a maintenance/bug fix release[1] on the 1.2 series. As always, please pay attention to the release notes[2] and let us know[3] if you encounter any problems. Enjoy! [1]: http://goo.gl/2UVSW5 (CHANGES.txt) [2]: http://goo.gl/lOZAdM (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA
Re: mysterious 'column1' in cql describe
Why does the explicit definition of columns in a column family significantly improve performance and key cache hit ratio (the last one being almost zero when there are no explicit column definitions)?

It doesn't, not in itself at least. So something else has changed, or something is wrong in your before/after comparison. But it's hard to say without at least a minimum of information on how you actually observed such a significant performance improvement (which queries, for instance). As for the key cache hit rate, adding a column definition certainly has no effect on it in itself. But defining a new secondary index might, and the code to add the column you've provided does have a setIndexType call. Again, it's hard to be definitive on that, because the code you've shown sets a CUSTOM index type without providing any index options, which is *invalid* (and rejected as such by Cassandra). So either the code above is not complete, or it's not the one you've used, or Hector is doing some weird stuff behind your back. In any case, if an index creation there has been, then *that* could easily explain a before/after performance difference. -- Sylvain
RE: Cassandra-shuffle fails
Hi, "Failed to enable shuffling" is thrown when an IOException occurs in the constructor JMXConnection(endpoint, port). See Shuffle.enableRelocations() in org.apache.cassandra.tools. Have you set up credentials for JMX? Regards, Romain

From: Tamar Rosen ta...@correlor.com
To: user@cassandra.apache.org
Cc: Vitaly Sourikov vit...@correlor.com, Yair Pinyan y...@correlor.com
Date: 29/08/2013 17:35
Subject: Cassandra-shuffle fails

Hi, We recently upgraded from version 1.1 to 1.2. It all went well, including setting up vnodes, but shuffle fails. We have 2 nodes, hosted on Amazon AWS. The steps we took (on each of our nodes) are pretty straightforward:

1. upgrade binaries
2. adjust cassandra.yaml (keep token)
3. nodetool upgradesstables
4. change cassandra.yaml to vnodes rather than tokens
5. restart cassandra
6. cassandra-shuffle create

All the above went fine. However, the following fails:

    cassandra-shuffle enable
    Failed to enable shuffling on 10.194.230.175!

Note: 1. The failure is immediate and consistent. 2. Calling shuffle create on either node prepares the shuffle files for both. 3. I made sure both servers are communicating fine on both 9160 and 7199. Any help will be greatly appreciated. Tamar

Tamar Rosen
Senior Data Architect
Correlor.com
map/reduce performance time and sstable reader...
Has anyone done performance tests on sstable reading vs. M/R? I did a quick test reading all SSTables in an LCS column family on 23 tables, and the average time sstable2json took (writing to /dev/null to make it faster) was 7 seconds per table (reading to stdout took 16 seconds per table). This then worked out to an estimate of 12.5 hours, up to 27 hours (from the to-stdout calculation). I suspect the map/reduce time may be much worse, since there are not as many repeated rows in LCS. I.e., I am wondering if I should just read from SSTables directly instead of using map/reduce? I am about to dig around in the code of M/R and sstable2json to see what each is doing specifically. Thanks, Dean
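For what it's worth, the extrapolation above implies a total sstable count in the thousands (the 23 tables measured are presumably a sample, since 23 x 7 s is only a few minutes). A quick back-of-the-envelope check - the per-table timings are from the post, while the implied totals are inferred, not stated:

```python
# Back-of-the-envelope check of the timing estimate above.
# 7 s/sstable (to /dev/null) and 16 s/sstable (to stdout) are the
# measured averages; the total sstable count is inferred from the hours.

fast_s, slow_s = 7, 16
low_estimate_h, high_estimate_h = 12.5, 27

implied_tables_low = low_estimate_h * 3600 / fast_s    # roughly 6400
implied_tables_high = high_estimate_h * 3600 / slow_s  # roughly 6100
```

Both figures land in the same ballpark, which suggests the 12.5 h and 27 h numbers come from the same underlying sstable count at the two per-table speeds.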
is there a SSTableInput for Map/Reduce instead of ColumnFamily?
Is there an SSTableInput for Map/Reduce instead of ColumnFamily (which uses thrift)? We are not worried about repeated reads since we are idempotent, but would rather have the direct speed (even if we had to read from a snapshot, that would be fine). (We would most likely run our M/R on 4 of the 12 nodes we have, since we have RF=3 right now.) Thanks, Dean
RE: Truncate question
Thank you all for your responses. Yes, I have cleared the snapshots post truncate operation. Thanks, SC

Date: Thu, 29 Aug 2013 21:41:25 -0400
Subject: Re: Truncate question
From: dmcne...@gmail.com
To: user@cassandra.apache.org

You would, however, want to clear the snapshot folder afterward, right? I thought that truncate, like drop table, created a snapshot (unless that feature has been disabled in your yaml). On Thu, Aug 29, 2013 at 6:51 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Aug 29, 2013 at 3:48 PM, S C as...@outlook.com wrote: Do we have to run nodetool repair or nodetool cleanup after truncating a Column Family? No. Why would you? =Rob
Re: Upgrade from 1.0.9 to 1.2.8
Does your previous snapshot include the system keyspace? I haven't tried upgrading from 1.0.x and then rolling back, but it's possible there are some backwards-incompatible changes. Other than that, make sure you also rolled back your config files.

On Aug 30, 2013, at 8:57 AM, Mike Neir m...@liquidweb.com wrote:
Re: Upgrade from 1.0.9 to 1.2.8
Sorry, I didn't see the test procedure, it's still early.

On Aug 30, 2013, at 8:57 AM, Mike Neir m...@liquidweb.com wrote:
Re: Upgrade from 1.0.9 to 1.2.8
On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir m...@liquidweb.com wrote: I'm faced with the need to update a 36 node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd still like to have a roll-back plan in case the rolling upgrade goes sideways. Upgrading two major versions online is an unsupported operation. I would not expect it to work. Is there a detailed reason you believe it should work between these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9, released yesterday. Everyone headed to 2.0 has to pass through 1.2.9. =Rob
Upgrade from 1.0.9 to 1.2.8
Greetings folks, I'm faced with the need to update a 36-node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd still like to have a roll-back plan in case the rolling upgrade goes sideways. I've tried to upgrade a single node in my dev cluster, then roll back using a snapshot taken previously, but things don't appear to be going smoothly. The node will rejoin the ring eventually, but not before spending some time in the Joining state as shown by nodetool ring, and spewing a ton of error messages similar to the following:

    ERROR [MutationStage:31] 2013-08-29 14:07:20,530 RowMutationVerbHandler.java (line 61) Error in row mutation
    org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178

My test procedure is as follows:

1) nodetool -h localhost snapshot
2) nodetool -h localhost drain
3) service cassandra stop
4) back up cassandra configs
5) remove cassandra 1.0.9
6) install cassandra 1.2.8
7) restore cassandra configs, alter them to remove configuration entries no longer used
8) start cassandra 1.2.8, let it run for a bit, then drain/stop it
9) remove cassandra 1.2.8
10) reinstall cassandra 1.0.9
11) restore original cassandra configs
12) remove any commit logs present
13) remove folders for the system_auth and system_traces keyspaces (since they don't seem to be present in 1.0.9)
14) move snapshots back to where they should be for 1.0.9 and remove cass 1.2.8 data:

    # cd /var/lib/cassandra/data/$KEYSPACE/
    # mv */snapshots/$TIMESTAMP/* .
    # find . -mindepth 1 -type d -exec rm -rf {} \;
    # cd /var/lib/cassandra/data/system
    # mv */snapshots/$TIMESTAMP/* .
    # find . -mindepth 1 -type d -exec rm -rf {} \;

15) start cassandra 1.0.9
16) observe cassandra system.log

Does anyone have any insight on things I may be doing wrong, or whether this is just an unavoidable pain point caused by rolling back?
It seems that since there are no schema changes going on, the node should be able to just hop back into the cluster without error and without transitioning through the Joining state. -- Mike Neir Liquid Web, Inc. Infrastructure Administrator
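Step 14 is the part most likely to go wrong by hand. As an illustration only (not Mike's actual tooling), the same file shuffling can be sketched in Python; the directory layout, snapshot tag, and file names below are invented for the example:

```python
import shutil
import tempfile
from pathlib import Path

def restore_snapshot(keyspace_dir: Path, tag: str) -> None:
    # mv */snapshots/$TIMESTAMP/* .  -- pull snapshotted SSTables out of the
    # 1.2-style per-CF directories into the flat 1.0-style keyspace directory
    for snap_file in keyspace_dir.glob(f"*/snapshots/{tag}/*"):
        shutil.move(str(snap_file), str(keyspace_dir / snap_file.name))
    # find . -mindepth 1 -type d -exec rm -rf {} \;  -- drop the now-empty
    # per-CF directories and any other 1.2.8 leftovers
    for sub in keyspace_dir.iterdir():
        if sub.is_dir():
            shutil.rmtree(sub)

# demo against a throwaway directory, not a real data dir
root = Path(tempfile.mkdtemp())
snap = root / "mycf" / "snapshots" / "1377800000"
snap.mkdir(parents=True)
(snap / "mycf-hc-1-Data.db").write_text("sstable bytes")
restore_snapshot(root, "1377800000")
restored = sorted(p.name for p in root.iterdir())
print(restored)  # ['mycf-hc-1-Data.db']
```

Running it against a scratch directory first is a cheap way to verify the restore leaves only flat SSTable files behind before touching /var/lib/cassandra.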
Re: Upgrade from 1.0.9 to 1.2.8
If you have multiple DCs you at least want to upgrade to 1.0.11. There is an issue where you might get errors during cross DC replication. On Fri, Aug 30, 2013 at 9:41 AM, Mike Neir m...@liquidweb.com wrote: [snip]
Re: Upgrade from 1.0.9 to 1.2.8
In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there is no need to do streaming operations (move/repair/bootstrap/etc). The reading I've done confirms that 1.2.x should be network-compatible with 1.0.x, sans streaming operations. Datastax seems to indicate here that doing a rolling upgrade from 1.0.x to 1.2.x is viable: http://www.datastax.com/documentation/cassandra/1.2/webhelp/#upgrade/upgradeC_c.html#concept_ds_nht_czr_ck See the second bullet point in the Prerequisites section. I'll look into 1.2.9. It wasn't available when I started my testing. MN On 08/30/2013 12:15 PM, Robert Coli wrote: On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir m...@liquidweb.com mailto:m...@liquidweb.com wrote: I'm faced with the need to update a 36 node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd still like to have a roll-back plan in case the rolling upgrade goes sideways. Upgrading two major versions online is an unsupported operation. I would not expect it to work. Is there a detailed reason you believe it should work between these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9, released yesterday. Everyone headed to 2.0 has to pass through 1.2.9. =Rob -- Mike Neir Liquid Web, Inc. Infrastructure Administrator
Update-Replace
Hi, I have a use case where I periodically need to apply updates to a wide row that should replace the whole row. A straightforward insert/update only replaces the values present in the executed statement, keeping the remaining data around. Is there a smooth way to do a replace with C*, or do I have to handle this in the application (e.g. doing a delete and then a write, or coming up with a more clever data model)? Jan
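Absent a single "replace row" operation, the usual workaround is exactly the delete-then-write Jan mentions, grouped in a batch with explicit timestamps so the new values are not shadowed by the tombstone. A hedged sketch (the table and column names are invented, and real code should use a driver with prepared statements rather than string building):

```python
import time

def replace_row_batch(table: str, key: str, columns: dict) -> str:
    """Build a CQL batch that replaces an entire wide row: one row-level
    DELETE at timestamp t, then INSERTs at t+1. The inserts must carry a
    strictly higher timestamp, because at equal timestamps the delete
    wins and the new values would be dropped."""
    t = int(time.time() * 1_000_000)  # microseconds, Cassandra's convention
    stmts = [f"DELETE FROM {table} USING TIMESTAMP {t} WHERE key = '{key}'"]
    stmts += [
        f"INSERT INTO {table} (key, column1, value) "
        f"VALUES ('{key}', '{name}', '{val}') USING TIMESTAMP {t + 1}"
        for name, val in columns.items()
    ]
    return "BEGIN BATCH\n  " + ";\n  ".join(stmts) + ";\nAPPLY BATCH;"

batch = replace_row_batch("mycf", "row42", {"a": "1", "b": "2"})
print(batch)
```

The t / t+1 split is the important design point; a batch alone does not order the delete before the inserts.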
[ANNOUNCE] Polidoro - A Cassandra client in Scala
Hi all, We've open sourced Polidoro. It's a Cassandra client in Scala on top of Astyanax and in the style of Cascal. Find it at https://github.com/SpotRight/Polidoro -Lanny Ripple SpotRight, Inc - http://spotright.com
Re: Upgrade from 1.0.9 to 1.2.8
You probably want to go to 1.0.11/12 first no matter what. For the least chance of issues you should then go to 1.1.12. While there is a high probability that going straight from 1.0.x to 1.2 will work, you have the best chance of no failures if you go through 1.1.12; there are some edge cases that can cause errors if you don't do that. -Jeremiah On Aug 30, 2013, at 11:41 AM, Mike Neir m...@liquidweb.com wrote: [snip]
Re: CQL Thrift
If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: Hi, If i a create a table with CQL3 as create table user(user_id text PRIMARY KEY, first_name text, last_name text, emailid text); and create index as: create index on user(first_name); then inserted some data as: insert into user(user_id,first_name,last_name,emailId) values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in'); Then if update same column family using Cassandra-cli as: update column family user with key_validation_class='UTF8Type' and column_metadata=[{column_name:last_name, validation_class:'UTF8Type', index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', index_type:KEYS}]; Now if i connect via cqlsh and explore user table, i can see column first_name,last_name are not part of table structure anymore. Here is the output: CREATE TABLE user ( key text PRIMARY KEY ) WITH bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0.00 AND gc_grace_seconds=864000 AND read_repair_chance=0.10 AND replicate_on_write='true' AND populate_io_cache_on_flush='false' AND compaction={'class': 'SizeTieredCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'}; cqlsh:cql3usage select * from user; user_id - @mevivs I understand that, CQL3 and thrift interoperability is an issue. But this looks to me a very basic scenario. Any suggestions? Or If anybody can explain a reason behind this? -Vivek
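Jon's "don't mix" rule has a practical corollary: everything Vivek did through cassandra-cli can be expressed in CQL3 itself, so the CQL3 schema metadata never gets clobbered. A sketch of the CQL-only path, reusing the table and index from the thread:

```
CREATE TABLE user (
  user_id text PRIMARY KEY,
  first_name text,
  last_name text,
  emailid text
);

-- secondary indexes via CQL3 instead of cassandra-cli column_metadata
CREATE INDEX ON user (first_name);
CREATE INDEX ON user (last_name);
```

Staying in cqlsh for both the table and the indexes keeps the column definitions visible to DESCRIBE.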
Re: is there a SSTAbleInput for Map/Reduce instead of ColumnFamily?
FYI: http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html -Jeremiah On Aug 30, 2013, at 9:21 AM, Hiller, Dean dean.hil...@nrel.gov wrote: is there a SSTableInput for Map/Reduce instead of ColumnFamily (which uses thrift)? We are not worried about repeated reads since we are idempotent but would rather have the direct speed (even if we had to read from a snapshot, it would be fine). (We would most likely run our M/R on 4 nodes of the 12 nodes we have since we have RF=3 right now). Thanks, Dean
Re: CQL Thrift
And surprisingly if i alter table as: alter table user add first_name text; alter table user add last_name text; It gives me back the columns with values, but still no indexes. Thrift and CQL3 depend on the same storage engine. Do they really maintain different metadata for the same column family? -Vivek On Fri, Aug 30, 2013 at 11:08 PM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
in my case, I built a temporal database on top of Cassandra, so it's absolutely key. Dynamic columns are super powerful, which relational database have no equivalent. For me, that is one of the top 3 reasons for using Cassandra. On Fri, Aug 30, 2013 at 2:03 PM, Vivek Mishra mishra.v...@gmail.com wrote: If you talk about comparator. Yes, that's a valid point and not possible with CQL3. -Vivek On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin wool...@gmail.com wrote: I use dynamic columns all the time and they vary in type. With CQL you can define a default type, but you can't insert specific types of data for column name and value. It forces you to use all bytes or all strings, which would require coverting it to other types. thrift is much more powerful in that respect. not everyone needs to take advantage of the full power of dynamic columns. On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've build our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. 
On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
Hi, I understand that, but i want to understand the reason behind such behavior? Is it because of maintaining different metadata objects for CQL3 and thrift? Any suggestion? -Vivek On Fri, Aug 30, 2013 at 11:15 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
Could you please give a more concrete example? On Aug 30, 2013, at 11:10 AM, Peter Lin wool...@gmail.com wrote: in my case, I built a temporal database on top of Cassandra, so it's absolutely key. Dynamic columns are super powerful, which relational database have no equivalent. For me, that is one of the top 3 reasons for using Cassandra. On Fri, Aug 30, 2013 at 2:03 PM, Vivek Mishra mishra.v...@gmail.com wrote: If you talk about comparator. Yes, that's a valid point and not possible with CQL3. -Vivek On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin wool...@gmail.com wrote: I use dynamic columns all the time and they vary in type. With CQL you can define a default type, but you can't insert specific types of data for column name and value. It forces you to use all bytes or all strings, which would require coverting it to other types. thrift is much more powerful in that respect. not everyone needs to take advantage of the full power of dynamic columns. On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've build our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. 
On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
-- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: CQL Thrift
Just curious - what do you need to do that requires thrift? We've build our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. I agree, but partly. You can always create a column family with key, column and value, store any number of arbitrary column names in column, and each corresponding value in value. I find it much easier. Coming back to the original question, i think the differentiator is how column metadata is treated in thrift and CQL3. What i do not understand is: if two sets of metadata objects (CqlMetadata, CFDef) are maintained for the same column family, why would updating one cause trouble for the other? -Vivek On Fri, Aug 30, 2013 at 11:23 PM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
I use dynamic columns all the time and they vary in type. With CQL you can define a default type, but you can't insert specific types of data for column name and value. It forces you to use all bytes or all strings, which would require converting to other types. Thrift is much more powerful in that respect. Not everyone needs to take advantage of the full power of dynamic columns. On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've build our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
True for newly built platform(s), but what about existing apps built using thrift? As per http://www.datastax.com/dev/blog/thrift-to-cql3 it should be easy. I am just curious to understand the real reason behind such behavior. -Vivek On Fri, Aug 30, 2013 at 11:28 PM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've build our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
If you talk about comparator. Yes, that's a valid point and not possible with CQL3. -Vivek On Fri, Aug 30, 2013 at 11:31 PM, Peter Lin wool...@gmail.com wrote: I use dynamic columns all the time and they vary in type. With CQL you can define a default type, but you can't insert specific types of data for column name and value. It forces you to use all bytes or all strings, which would require coverting it to other types. thrift is much more powerful in that respect. not everyone needs to take advantage of the full power of dynamic columns. On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've build our entire platform using CQL3 and we haven't hit any issues. On Aug 30, 2013, at 10:53 AM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. 
On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: [snip]
Re: CQL Thrift
In the interest of education and discussion: I didn't mean to say CQL3 doesn't support dynamic columns. The example from the page shows a default type defined in the create statement:

create column family data with key_validation_class=Int32Type and comparator=DateType and default_validation_class=FloatType;

If I try to insert a dynamic column that uses a double for the column name and a string for the column value, it will throw an error. The kind of use case I'm talking about defines a minimum number of static columns; most of the columns added at runtime have different name and value types. This is specific to my use case.

Having said that, I believe it would be possible to provide that kind of feature in CQL, but the trade-off is that it deviates from SQL. The grammar would have to allow type declarations in the column list and functions in the values. Something like

insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values ('abc123', 'some string', double(102.211))

where doubleType(newcol1) and string(newcol2) are dynamic columns. I know many people find thrift hard to grok and struggle with it, but I'm a firm believer in taking time to learn. Every developer should take time to read the Cassandra source code and the source code of the driver they're using.

On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.com wrote: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows

On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote: My biased perspective: I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns.

On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix.
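[Editor's note: the hypothetical per-column typing Peter describes works over Thrift because column names and values travel as raw byte buffers, with the client choosing a serializer per column. A minimal Python sketch of that idea - the helper names `double_type` and `utf8_type` are illustrative only, not a real client API:]

```python
import struct

def double_type(value: float) -> bytes:
    """Serialize the way Cassandra's DoubleType expects: 8-byte big-endian IEEE 754."""
    return struct.pack(">d", value)

def utf8_type(value: str) -> bytes:
    """Serialize the way Cassandra's UTF8Type expects: plain UTF-8 bytes."""
    return value.encode("utf-8")

# A dynamic column whose *name* is a double and whose *value* is a string --
# legal over Thrift, since the server just sees two byte buffers.
column_name = double_type(102.211)
column_value = utf8_type("some string")

assert len(column_name) == 8
assert struct.unpack(">d", column_name)[0] == 102.211
```

In a Thrift client these buffers would be handed to the insert call directly; the comparator declared on the column family (here DateType in Peter's example) is what constrains which name encodings the server will accept.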
Re: CQL Thrift
It sounds like you want this:

create table data ( pk int, colname blob, value blob, primary key (pk, colname));

That gives you arbitrary columns (cleverly labeled colname) in a single row, where the value is value. If you don't want the overhead of storing colname in every row, try with compact storage. Does this solve the problem, or am I missing something?

On Aug 30, 2013, at 11:45 AM, Peter Lin wool...@gmail.com wrote: You could dynamically create new tables at runtime and insert rows into the new table, but is that better than using thrift and putting it into a regular dynamic column with the exact name type and value type? It would mean that if there are 20 dynamic columns of different types, you'd have to execute 21 queries to rebuild the data. That's basically the same as using EAV (entity-attribute-value) tables in relational databases. Having used that approach in the past to build temporal databases, it doesn't scale well.

On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra mishra.v...@gmail.com wrote: Create a column family as:

create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob);

insert into dynamicTable(key, nameAsDouble, valueAsBlob) values (key, double(102.211), textAsBlob('valueInBytes'));

Do you think it will work in case the column name is a double? -Vivek

On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote: In the interest of education and discussion. I didn't mean to say CQL3 doesn't support dynamic columns. The example from the page shows a default type defined in the create statement: create column family data with key_validation_class=Int32Type and comparator=DateType and default_validation_class=FloatType; If I try to insert a dynamic column that uses a double for the column name and a string for the column value, it will throw an error. The kind of use case I'm talking about defines a minimum number of static columns. Most of the columns added at runtime have different name and value types. This is specific to my use case.
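[Editor's note: with Jon's blob-named schema, slice queries stay useful only if the byte encoding of the name sorts the same way as the original values, since Cassandra compares blob clustering columns as unsigned bytes. A Python sketch of that property for big-endian doubles - shown for non-negative values only; negative doubles would need sign-bit fix-ups:]

```python
import struct

def encode_colname(value: float) -> bytes:
    """Encode a numeric column name for the 'colname blob' clustering column."""
    return struct.pack(">d", value)

values = [0.5, 1.0, 2.75, 10.0, 102.211]
encoded = [encode_colname(v) for v in values]

# Unsigned lexicographic (byte-wise) order matches numeric order,
# so range slices on colname behave like range queries on the doubles.
assert sorted(encoded) == encoded
```

This is why the blob-typed table can still serve ordered wide-row reads: the application owns the typing, and the database only ever sees comparable byte strings.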
Re: CQL Thrift
On Fri, Aug 30, 2013 at 10:58 AM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've built our entire platform using CQL3 and we haven't hit any issues.

Here's one thing: if you're using wide rows and you want to do anything other than just append individual columns to the row, then CQL3 (as it functions currently) is way too slow. I just created the following Jira issue 5 minutes ago because we've been fighting with this for the last 2 days. Our workaround was to swap out CQL3 + DataStax Java Driver in favor of Astyanax for this particular use case: https://issues.apache.org/jira/browse/CASSANDRA-5959 Cheers, -- Les Hazlewood | @lhazlewood CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282
Re: CQL Thrift
Did you try to explore CQL3 collection support for the same? You can definitely save on the number of rows with that. The point I am trying to make is that you can achieve it via CQL3 (Jonathan's blog: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows). I agree with you that thrift may still have some valid points, but considering the latest development around new Cassandra features, I think CQL3 is the path to follow. -Vivek

On Sat, Aug 31, 2013 at 12:15 AM, Peter Lin wool...@gmail.com wrote: You could dynamically create new tables at runtime and insert rows into the new table, but is that better than using thrift and putting it into a regular dynamic column with the exact name type and value type? It would mean that if there are 20 dynamic columns of different types, you'd have to execute 21 queries to rebuild the data. That's basically the same as using EAV tables in relational databases. Having used that approach in the past to build temporal databases, it doesn't scale well.

On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra mishra.v...@gmail.com wrote: Create a column family as: create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob); insert into dynamicTable(key, nameAsDouble, valueAsBlob) values (key, double(102.211), textAsBlob('valueInBytes')); Do you think it will work in case the column name is a double? -Vivek

On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote: In the interest of education and discussion. I didn't mean to say CQL3 doesn't support dynamic columns. The example from the page shows a default type defined in the create statement: create column family data with key_validation_class=Int32Type and comparator=DateType and default_validation_class=FloatType; If I try to insert a dynamic column that uses a double for the column name and a string for the column value, it will throw an error. The kind of use case I'm talking about defines a minimum number of static columns.
Re: CQL Thrift
@lhazlewood https://issues.apache.org/jira/browse/CASSANDRA-5959

BEGIN BATCH ... multiple insert statements ... APPLY BATCH

It doesn't work for you? -Vivek

On Sat, Aug 31, 2013 at 12:21 AM, Les Hazlewood lhazlew...@apache.org wrote: On Fri, Aug 30, 2013 at 10:58 AM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've built our entire platform using CQL3 and we haven't hit any issues. Here's one thing: if you're using wide rows and you want to do anything other than just append individual columns to the row, then CQL3 (as it functions currently) is way too slow. I just created the following Jira issue 5 minutes ago because we've been fighting with this for the last 2 days. Our workaround was to swap out CQL3 + DataStax Java Driver in favor of Astyanax for this particular use case: https://issues.apache.org/jira/browse/CASSANDRA-5959 Cheers, -- Les Hazlewood | @lhazlewood CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282
CQL Thrift
Hi, if I create a table with CQL3 as

create table user(user_id text PRIMARY KEY, first_name text, last_name text, emailid text);

and create an index as

create index on user(first_name);

then insert some data as

insert into user(user_id,first_name,last_name,emailId) values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');

and then update the same column family using cassandra-cli as

update column family user with key_validation_class='UTF8Type' and column_metadata=[{column_name:last_name, validation_class:'UTF8Type', index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', index_type:KEYS}];

then, if I connect via cqlsh and explore the user table, I can see the columns first_name and last_name are not part of the table structure anymore. Here is the output:

CREATE TABLE user ( key text PRIMARY KEY ) WITH bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0.00 AND gc_grace_seconds=864000 AND read_repair_chance=0.10 AND replicate_on_write='true' AND populate_io_cache_on_flush='false' AND compaction={'class': 'SizeTieredCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'};

cqlsh:cql3usage> select * from user;

 user_id
---------
 @mevivs

I understand that CQL3 and thrift interoperability is an issue, but this looks to me like a very basic scenario. Any suggestions? Can anybody explain the reason behind this? -Vivek
Re: Upgrade from 1.0.9 to 1.2.8
Is there anything you can link that describes the pitfalls you mention? I'd like a bit more information. Just for clarity's sake, are you recommending 1.0.9 -> 1.0.12 -> 1.1.12 -> 1.2.x? Or would 1.0.9 -> 1.1.12 -> 1.2.x suffice? Regarding the placement strategy mentioned in a different post, I'm using the Simple placement strategy, with the RackInferringSnitch. How does that play into the bugs mentioned previously about cross-DC replication? MN

On 08/30/2013 01:28 PM, Jeremiah D Jordan wrote: You probably want to go to 1.0.11/12 first no matter what. If you want the least chance of issues you should then go to 1.1.12. While there is a high probability that going straight from 1.0.x to 1.2 will work, you have the best chance of no failures if you go through 1.1.12. There are some edge cases that can cause errors if you don't do that. -Jeremiah
Re: CQL Thrift
CQL3 collections are meant to store list, set, and map data. Plus, collections currently do not support secondary indexes. The point is that you often don't know what columns are needed at design time. If you know what's needed, use static columns. Using a list, set or map to store data you don't know and can't predict feels like a hammer solution. Cassandra has this super powerful and useful feature that developers can use via thrift. The last time I looked, DataStax's official statement was that thrift isn't going away, so I take them at their word.

On Fri, Aug 30, 2013 at 2:51 PM, Vivek Mishra mishra.v...@gmail.com wrote: Did you try to explore CQL3 collection support for the same? You can definitely save on the number of rows with that. The point I am trying to make is that you can achieve it via CQL3 (Jonathan's blog: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows). I agree with you that thrift may still have some valid points, but considering the latest development around new Cassandra features, I think CQL3 is the path to follow. -Vivek

On Sat, Aug 31, 2013 at 12:15 AM, Peter Lin wool...@gmail.com wrote: You could dynamically create new tables at runtime and insert rows into the new table, but is that better than using thrift and putting it into a regular dynamic column with the exact name type and value type? It would mean that if there are 20 dynamic columns of different types, you'd have to execute 21 queries to rebuild the data. That's basically the same as using EAV tables in relational databases. Having used that approach in the past to build temporal databases, it doesn't scale well.

On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra mishra.v...@gmail.com wrote: Create a column family as: create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob); insert into dynamicTable(key, nameAsDouble, valueAsBlob) values (key, double(102.211), textAsBlob('valueInBytes'));
Re: successful use of shuffle?
You need to introduce the new vnode-enabled nodes in a new DC, or you will have similar issues to https://issues.apache.org/jira/browse/CASSANDRA-5525

Add a vnode DC: http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html

Point clients to the new DC.

Remove the non-vnode DC: http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_decomission_dc_t.html

-Jeremiah

On Aug 30, 2013, at 3:04 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: +1. I am still afraid of this step. Yet you can avoid it by introducing new nodes, with vnodes enabled, and then removing the old ones. This should work. My problem is that I am not really confident in vnodes either... Any experience shared on this transition, and on the use of vnodes, would be great indeed. Alain

2013/8/29 Robert Coli rc...@eventbrite.com: Hi! I've been wondering... is there anyone in the cassandra-user audience who has used the shuffle feature successfully on a non-toy-or-testing cluster? If so, could you describe the experience you had and any problems you encountered? Thanks! =Rob
Re: CQL Thrift
On Fri, Aug 30, 2013 at 11:56 AM, Vivek Mishra mishra.v...@gmail.com wrote: @lhazlewood https://issues.apache.org/jira/browse/CASSANDRA-5959 Begin batch ... multiple insert statements ... apply batch. It doesn't work for you? -Vivek

According to the OP, batching inserts is slow. The SO thread [1] mentions that in their environment a BATCH takes 1.5 min, while the Thrift-based approach takes around 235 ms.

[1] http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque

-- :- a) Alex Popescu Sen. Product Manager @ DataStax @al3xandru
Re: CQL Thrift
It seems really strange to me that you're creating a table with specific types and then trying to deviate from them. Why not just use the blob type? Then you can store whatever you want in there. The whole point of adding strong typing is to adhere to it. I wouldn't consider it a fault of the database that it does what you asked it to.

On Aug 30, 2013, at 11:33 AM, Peter Lin wool...@gmail.com wrote: In the interest of education and discussion. I didn't mean to say CQL3 doesn't support dynamic columns. The example from the page shows a default type defined in the create statement: create column family data with key_validation_class=Int32Type and comparator=DateType and default_validation_class=FloatType; If I try to insert a dynamic column that uses a double for the column name and a string for the column value, it will throw an error. The kind of use case I'm talking about defines a minimum number of static columns. Most of the columns added at runtime have different name and value types. This is specific to my use case. Having said that, I believe it would be possible to provide that kind of feature in CQL, but the trade-off is that it deviates from SQL. The grammar would have to allow type declarations in the column list and functions in the values. Something like insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values ('abc123', 'some string', double(102.211)), where doubleType(newcol1) and string(newcol2) are dynamic columns. I know many people find thrift hard to grok and struggle with it, but I'm a firm believer in taking time to learn. Every developer should take time to read the Cassandra source code and the source code of the driver they're using.

On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.com wrote: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows

On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote: My biased perspective: I find the sweet spot is thrift for insert/update and CQL for select queries.
Re: CQL3 wide row and slow inserts - is there a single insert alternative?
Well, it appears that this just isn't possible. I created CASSANDRA-5959 as a result (backstory + performance testing results are described in the issue): https://issues.apache.org/jira/browse/CASSANDRA-5959 -- Les Hazlewood | @lhazlewood CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282

On Thu, Aug 29, 2013 at 12:04 PM, Les Hazlewood lhazlew...@apache.org wrote: Hi all, We're using a Cassandra table to store search results in a table/column family that looks like this:

        | 0       | 1       | 2       | ...
 row_id | text... | text... | text... | ...

The column name is the index # (an integer) of the location in the overall result set. The value is the result at that particular index. This is great because pagination becomes a simple slice query on the column name. Large result sets are split into multiple rows - we're limiting row size on disk to around 6 or 7 MB. For our particular result entries, this means we can get around 50,000 columns in a single row. When we create the rows, we have the entire data available in the application at the time the row insert is necessary. Using CQL3, an initial implementation had one INSERT statement per column. This was killing performance (not to mention the # of tombstones it created). Here's the CQL3 table definition:

create table query_results ( row_id text, shard_num int, list_index int, result text, primary key ((row_id, shard_num), list_index)) with compact storage;

(The row key is row_id + shard_num. The 'cluster column' is list_index.) I don't want to execute 50,000 INSERT statements for a single row. We have all of the data up front - I want to execute a single INSERT. Is this possible? We're using the Datastax Java Driver. Thanks for any help! Les
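[Editor's note: the sharding scheme above is plain arithmetic, and a short sketch may make it concrete. `RESULTS_PER_SHARD` and the helper names are assumptions for illustration, based on the ~50,000-columns-per-row figure in the post:]

```python
RESULTS_PER_SHARD = 50_000  # rough per-row cap implied by the 6-7 MB row-size limit

def locate(global_index: int):
    """Map a position in the overall result set to (shard_num, list_index)."""
    return divmod(global_index, RESULTS_PER_SHARD)

def page_bounds(page: int, page_size: int):
    """Slice bounds for one page: (shard_num, first list_index, last list_index).
    Assumes pages never straddle a shard, i.e. RESULTS_PER_SHARD % page_size == 0."""
    start = page * page_size
    shard, first = locate(start)
    return shard, first, first + page_size - 1

assert locate(0) == (0, 0)
assert locate(50_000) == (1, 0)       # first result of the second shard
assert page_bounds(2, 100) == (0, 200, 299)
```

A page read then becomes one partition-restricted range query, e.g. `SELECT result FROM query_results WHERE row_id = ? AND shard_num = ? AND list_index >= ? AND list_index <= ?`.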
Re: CQL Thrift
Create a column family as:

create table dynamicTable(key text, nameAsDouble double, valueAsBlob blob);

insert into dynamicTable(key, nameAsDouble, valueAsBlob) values (key, double(102.211), textAsBlob('valueInBytes'));

Do you think it will work in case the column name is a double? -Vivek

On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote: In the interest of education and discussion. I didn't mean to say CQL3 doesn't support dynamic columns. The example from the page shows a default type defined in the create statement: create column family data with key_validation_class=Int32Type and comparator=DateType and default_validation_class=FloatType; If I try to insert a dynamic column that uses a double for the column name and a string for the column value, it will throw an error. The kind of use case I'm talking about defines a minimum number of static columns. Most of the columns added at runtime have different name and value types. This is specific to my use case. Having said that, I believe it would be possible to provide that kind of feature in CQL, but the trade-off is that it deviates from SQL. The grammar would have to allow type declarations in the column list and functions in the values. Something like insert into mytable (KEY, doubleType(newcol1), string(newcol2)) values ('abc123', 'some string', double(102.211)), where doubleType(newcol1) and string(newcol2) are dynamic columns. I know many people find thrift hard to grok and struggle with it, but I'm a firm believer in taking time to learn. Every developer should take time to read the Cassandra source code and the source code of the driver they're using.

On Fri, Aug 30, 2013 at 2:18 PM, Jonathan Ellis jbel...@gmail.com wrote: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows

On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote: My biased perspective: I find the sweet spot is thrift for insert/update and CQL for select queries.
Re: CQL Thrift
Yes, that's correct, and that's a scaled number. In practice, on the local dev machine, CQL3 inserting 10,000 columns (for 1 row) in a BATCH took 1.5 minutes. 50,000 columns (the desired amount) in a BATCH took 7.5 minutes. The same Thrift functionality took _235 milliseconds_. That's almost 2,000 times faster (3 orders of magnitude)! However, according to Aleksey Yeschenko, this performance problem has been addressed in 2.0 beta 1 via https://issues.apache.org/jira/browse/CASSANDRA-4693. I'll reserve judgement until I can performance-test 2.0 beta 1 ;)

Cheers,

--
Les Hazlewood | @lhazlewood
CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282

On Fri, Aug 30, 2013 at 12:50 PM, Alex Popescu al...@datastax.com wrote:

On Fri, Aug 30, 2013 at 11:56 AM, Vivek Mishra mishra.v...@gmail.com wrote:

@lhazlewood https://issues.apache.org/jira/browse/CASSANDRA-5959

    begin batch
    <multiple insert statements>
    apply batch

Doesn't that work for you?

-Vivek

According to the OP, batching the inserts is slow: the SO thread [1] mentions that in their environment a BATCH takes 1.5 min, while the Thrift-based approach takes around 235 ms.

[1] http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque

--
:- a)
Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru
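Until a fixed version is deployed, one generic client-side mitigation is to cap batch sizes rather than send one giant BATCH. A minimal Python sketch of the chunking (the chunk size of 1,000 is arbitrary for illustration, not a recommendation from this thread):

```python
def chunked(items, size):
    """Yield successive fixed-size chunks of a list; the last may be smaller."""
    for i in range(0, len(items), size):
        yield items[i : i + size]

# 50,000 column inserts split into batches of 1,000 -> 50 round trips,
# each small enough to avoid the pathological single-BATCH behavior.
columns = list(range(50_000))
batches = list(chunked(columns, 1_000))
print(len(batches), "batches")
```

Each chunk would then be sent as its own BATCH statement by whatever driver is in use.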
Re: CQL Thrift
This has nothing to do with compact storage. Cassandra supports arbitrary dynamic columns of different name/value types today. If people are happy with the SQL metaphor, then CQL is fine. Then again, if the SQL metaphor were good for temporal databases, there wouldn't be so many failed temporal databases built on relational databases. I've built over 4 bi-temporal databases on RDBs over the last 12 years, so it's not something that was done lightly; it came from years of pain. I won't bore others with the challenges of building temporal databases.

On Fri, Aug 30, 2013 at 2:51 PM, Jon Haddad j...@jonhaddad.com wrote:

It sounds like you want this:

    create table data (
        pk int,
        colname blob,
        value blob,
        primary key (pk, colname));

That gives you arbitrary columns (cleverly labeled colname) in a single row, where the value is value. If you don't want the overhead of storing colname in every row, try WITH COMPACT STORAGE. Does this solve the problem, or am I missing something?

On Aug 30, 2013, at 11:45 AM, Peter Lin wool...@gmail.com wrote:

You could dynamically create new tables at runtime and insert rows into the new table, but is that better than using thrift and putting it into a regular dynamic column with the exact name type and value type? That would mean that if there are 20 dynamic columns of different types, you'd have to execute 21 queries to rebuild the data. That's basically the same as using EAV tables in relational databases. Having used that approach in the past to build temporal databases, it doesn't scale well.

On Fri, Aug 30, 2013 at 2:40 PM, Vivek Mishra mishra.v...@gmail.com wrote:

Create a column family as:

    create table dynamicTable (key text, nameAsDouble double, valueAsBlob blob);

    insert into dynamicTable (key, nameAsDouble, valueAsBlob)
    values ('key', 102.211, textAsBlob('valueInBytes'));

Do you think it will work in the case where the column names are doubles?

-Vivek

On Sat, Aug 31, 2013 at 12:03 AM, Peter Lin wool...@gmail.com wrote: In the interest of education and discussion.
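Jon's (pk, colname, value) layout can be modeled in a few lines. A toy Python sketch (purely illustrative, not Cassandra internals) showing the essence of the pattern: arbitrary column names kept sorted inside one partition, the way clustering keys are, so that range slices over names stay cheap:

```python
import bisect

class WideRow:
    """Toy model of one partition of a (pk, colname, value) table:
    dynamic column names act as clustering keys, kept in sorted order."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = []  # values, parallel to _names

    def put(self, colname, value):
        """Insert or overwrite one dynamic column."""
        i = bisect.bisect_left(self._names, colname)
        if i < len(self._names) and self._names[i] == colname:
            self._values[i] = value
        else:
            self._names.insert(i, colname)
            self._values.insert(i, value)

    def slice(self, start, end):
        """Inclusive range query over column names, like a clustering slice."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return list(zip(self._names[lo:hi], self._values[lo:hi]))

# Timestamp-named columns, as in the dynamic-column use case:
row = WideRow()
row.put("2013-08-30", b"v1")
row.put("2013-08-31", b"v2")
row.put("2013-08-29", b"v0")
hits = row.slice("2013-08-30", "2013-08-31")
```

Note what the model does not give you: every value is a blob, so per-column value types (Peter's objection) still have to be encoded and decoded by the application.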
Re: CQL Thrift
My biased perspective: I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns.

On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote:

If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix.
Re: CqlStorage creates wrong schema for Pig
I threw together a quick UDF to work around this issue. It just extracts the value portion of the tuple while taking advantage of the CqlStorage-generated schema to keep the type correct. You can get it here: https://github.com/iamthechad/cqlstorage-udf

I'll see if I can find more useful information and open a defect, since that's what this seems to be.

Chad

On Fri, Aug 30, 2013 at 2:02 AM, Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com wrote:

I tried this:

    rows = LOAD 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING CqlStorage();
    dump rows;
    ILLUSTRATE rows;
    describe rows;

    values2 = FOREACH rows GENERATE TOTUPLE(id) as (mycolumn:tuple(name,value));
    dump values2;
    describe values2;

But I get these results:

    ---------------------------------------------------------
    | rows | id:chararray | age:int   | title:chararray     |
    ---------------------------------------------------------
    |      | (id, 6)      | (age, 30) | (title, QA)         |
    ---------------------------------------------------------

    rows: {id: chararray,age: int,title: chararray}

    2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1031:
    Incompatable field schema: left is
    tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray)), right is
    org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)

or:

    values2 = FOREACH rows GENERATE TOTUPLE(id);
    dump values2;
    describe values2;

and the results are:

    ...
    (((id,6)))
    (((id,5)))

    values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}

Aggg!

Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com

2013/8/26 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com

Hi Chad,

I have this issue too. I sent a mail to the pig user list and I still can't resolve it: I cannot access the column values. In that mail I describe some things I tried without results, along with more information about this issue:

http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3ccajeg_hq9s2po3_xytzx5xki4j1mao8q26jydg2wndy_kyiv...@mail.gmail.com%3E

I hope someone replies with a comment, idea or solution for this issue or bug.
I have reviewed the CqlStorage class in the Cassandra 1.2.8 code, but I have not configured the environment to debug and trace this issue. I only found some comments like the following, which I do not fully understand:

    /**
     * A LoadStoreFunc for retrieving data from and storing data to Cassandra
     *
     * A row from a standard CF will be returned as nested tuples:
     * (((key1, value1), (key2, value2)), ((name1, val1), (name2, val2))).
     */

If you find some idea or solution, please post it.

Thanks

2013/8/23 Chad Johnston cjohns...@megatome.com

(I'm using Cassandra 1.2.8 and Pig 0.11.1.)

I'm loading some simple data from Cassandra into Pig using CqlStorage. The CqlStorage loader defines a Pig schema based on the Cassandra schema, but it seems to be wrong. If I do:

    data = LOAD 'cql://bookdata/books' USING CqlStorage();
    DESCRIBE data;

I get this:

    data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}

However, if I DUMP data, I get results like these:

    ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

Clearly the results from Cassandra are key/value pairs, as would be expected. I don't know why the schema generated by CqlStorage() is so different. This is really causing me problems when trying to access the column values. I tried a naive approach of FLATTENing each tuple, then trying to access the values that way:

    flattened = FOREACH data GENERATE FLATTEN(isbn), FLATTEN(booktitle), ...
    values = FOREACH flattened GENERATE $1 AS ISBN, $3 AS BookTitle, ...

As soon as I try to access field $5, Pig complains about the index being out of bounds.

Is there a way to solve the schema/reality mismatch? Am I doing something wrong, or have I stumbled across a defect?

Thanks,
Chad
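Chad's workaround UDF boils down to turning the dumped (name, value) pairs back into name-addressable fields instead of positional ones. A minimal Python sketch of the same idea (a hypothetical helper for illustration, not the actual Java UDF from the repository above):

```python
def extract_values(row):
    """row: a sequence of (name, value) pairs, as CqlStorage actually
    dumps them. Returns a dict so fields can be fetched by column name
    rather than by fragile positional index ($1, $3, ...)."""
    return {name: value for name, value in row}

# A record shaped like the DUMP output in Chad's message:
record = [
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
]
fields = extract_values(record)
print(fields["booktitle"])
```

The real fix, as noted above, is for the loader's declared schema to match the tuples it emits; until then, this kind of value extraction bridges the gap.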
Selecting multiple rows with composite partition keys using CQL3
Hello,

I've been trying to figure out how to port my application to CQL3 based on http://cassandra.apache.org/doc/cql3/CQL.html. I have a table with the primary key ((app, name), timestamp), so the partition key is composite (on app and name). I'm trying to figure out if there is a way to select multiple rows that span partition keys. Basically, I am trying to do:

    SELECT ... WHERE (app = 'foo' AND name = 'bar' AND timestamp = 123)
            OR (app = 'foo' AND name = 'hello' AND timestamp = 123)
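Since CQL3 has no OR across composite partition keys, one common client-side workaround is to issue one query per partition and merge the results. A minimal Python sketch of that fan-out (the query_one callable and fake_db stub are hypothetical stand-ins for a real driver call, just to keep the example runnable):

```python
def select_many(query_one, partitions, timestamp):
    """Issue one single-partition query per (app, name) pair and merge.
    query_one(app, name, timestamp) -> list of row dicts."""
    results = []
    for app, name in partitions:
        results.extend(query_one(app, name, timestamp))
    return results

# Stub standing in for the real per-partition SELECT:
fake_db = {
    ("foo", "bar", 123):   [{"app": "foo", "name": "bar",   "ts": 123}],
    ("foo", "hello", 123): [{"app": "foo", "name": "hello", "ts": 123}],
}
rows = select_many(
    lambda app, name, ts: fake_db.get((app, name, ts), []),
    [("foo", "bar"), ("foo", "hello")],
    123,
)
```

In practice the per-partition queries can be issued concurrently, since each one is routed independently to the replicas owning that partition.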
Data Modeling help for representing a survey form.
I have an existing system in Postgres that I would like to move to Cassandra. The system is for building registration forms for conferences. For example, you might want to build a registration form (or survey) that has a bunch of questions on it. An overview of this system is whiteboarded here: http://paste2.org/JeHP1tV0

What I'm trying to figure out is how this data should be structured in a de-normalized way. The basic queries would be:

1. Give me all surveys for an account.
2. Give me all questions for a survey.
3. Give me all responses for a survey.
4. Give me all responses for a specific question.
5. Compare responses for the question "What is your favorite color" with people who answered the question "What is your gender", i.e. a crosstab of males/females and the colors they like.
6. Give me a time series of how many people responded to a question per hour.

The reason I would like to get it onto Cassandra is that at peak times this is an extremely write-heavy application: everyone comes in all at once when a conference opens registration or a new survey goes out.

Also, if anyone is in the bay area and wants to discuss Cassandra data modeling over some beers, let me know!

Thanks,
John
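The usual Cassandra answer to a query list like this is one denormalized table per query, all written together on every response. A toy Python sketch of that write fan-out for queries 3, 4 and 6 (plain dicts stand in for tables; every name here is hypothetical, not a schema recommendation):

```python
from collections import defaultdict

# One in-memory stand-in per query the data must serve:
responses_by_survey = defaultdict(list)    # query 3: all responses for a survey
responses_by_question = defaultdict(list)  # query 4: responses for one question
responses_per_hour = defaultdict(int)      # query 6: hourly response counts

def record_response(survey_id, question_id, answer, epoch_seconds):
    """Write-heavy, read-optimized: fan one response out to every table
    at write time, so each query above is a single-partition read."""
    responses_by_survey[survey_id].append((question_id, answer))
    responses_by_question[(survey_id, question_id)].append(answer)
    hour_bucket = epoch_seconds - epoch_seconds % 3600  # truncate to the hour
    responses_per_hour[(survey_id, question_id, hour_bucket)] += 1

# Two responses landing in the same hour bucket:
record_response("s1", "q_color", "blue", 7200)
record_response("s1", "q_color", "red", 7260)
```

In Cassandra each dict key would be a partition key and each counter a counter column; the crosstab in query 5 would likewise get its own table keyed on the (color, gender) answer pair.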
Is it possible to synchronous run Cassandra Triggers?
Hi all,

I am interested in using the new Cassandra Triggers feature to implement a synchronous (or asynchronous but with a deadline) index on Cassandra. The Trigger API allows one to define a mutation job to be done (in the future), but is there any way to control when the (asynchronously executed) job is actually executed? Or is there any way to control the execution model of triggers, e.g. to switch between a synchronous mode and an asynchronous mode?

Regards,
Yun
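What "synchronous" would mean here can be illustrated with a toy model (this is not the Cassandra trigger API, which augments mutations in Java; all names are invented): the trigger derives an index mutation from the base write, and that mutation is applied in the same call, before the write is acknowledged.

```python
# Toy stand-ins for a base table and its secondary index:
table = {}
index = {}

def index_trigger(key, value):
    """Trigger: derive index mutations from a base-table write.
    Returns (table_name, index_key, index_value) tuples."""
    return [("index", value, key)]

def write(key, value, triggers=(index_trigger,)):
    """Synchronous model: base write and all trigger-derived writes are
    applied before the caller gets an acknowledgement."""
    table[key] = value
    for trig in triggers:
        for _tbl, idx_key, idx_val in trig(key, value):
            index[idx_key] = idx_val
    return True  # acked only after the index is also up to date

write("row1", "blue")
```

An asynchronous-with-deadline variant would instead queue the trigger output and ack immediately, with a separate consumer obliged to drain the queue within the deadline; whether Cassandra's trigger execution can be controlled that way is exactly the open question above.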