Re: Restoring all cluster from snapshots

2015-06-08 Thread Robert Coli
On Mon, Jun 8, 2015 at 6:22 AM, Anton Koshevoy nowa...@gmail.com wrote:

 - sudo rm -rf /db/cassandra/cr/data0*/system/*


This removes the schema. You can't load SSTables for column families which
don't exist.

=Rob


Re: Cassandra crashes daily; nothing on the log

2015-06-08 Thread Bryan Holladay
It could be the Linux kernel killing Cassandra because of memory usage. When
this happens, nothing is logged in Cassandra. Check the system
logs (/var/log/messages) and look for a message saying "Out of Memory ... kill
process ..."
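
For example, something like this should surface the kernel's messages (a
sketch only -- the log path varies by distro: Debian/Ubuntu use
/var/log/syslog, RHEL/CentOS use /var/log/messages):

  sudo grep -iE "out of memory|oom-killer|killed process" /var/log/messages /var/log/syslog 2>/dev/null
  # dmesg keeps the same kernel messages if the log files have already rotated:
  dmesg | grep -iE "oom|killed process"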

On Mon, Jun 8, 2015 at 1:37 PM, Paulo Motta pauloricard...@gmail.com
wrote:

 try checking your system logs (generally /var/log/syslog) to check if the
 cassandra process was killed by the OS oom-killer

 2015-06-06 15:39 GMT-03:00 Brian Sam-Bodden bsbod...@integrallis.com:

 Berk,
1 GB is not enough to run C*, the minimum memory we use on Digital
 Ocean is 4GB.

 Cheers,
 Brian
 http://integrallis.com

 On Sat, Jun 6, 2015 at 10:50 AM, graffit...@yahoo.com wrote:

 Hi all,

 I've installed Cassandra on a test server hosted on Digital Ocean. The
 server has 1GB RAM, and is running a single docker container alongside C*.
 Somehow, every night, the Cassandra instance crashes. The annoying part is
 that I cannot see anything wrong with the log files, so I can't tell what's
 going on.

 The log files are here:
 http://pastebin.com/Zquu5wvd

 Do you have any idea what's going on? Can you suggest some ways I can
 try to troubleshoot this?

 Thanks!
  Berk




 --
 Cheers,
 Brian
 http://www.integrallis.com





Re: sstableloader usage doubts

2015-06-08 Thread Robert Coli
On Mon, Jun 8, 2015 at 6:58 AM, ZeroUno zerozerouno...@gmail.com wrote:

 So you mean that refresh needs to be used if the cluster is running, but
 if I stopped cassandra while copying the sstables then refresh is useless?
 So the error "No new SSTables were found" during my refresh attempt is due
 to the fact that the sstables in my data dir were not new because already
 loaded, and not to the files not being found?


Yes. You should be able to see logs of it opening the files it finds in the
data dir.


 So... if I stop the two nodes on the first DC, restore their sstables'
 files, and then restart the nodes, nothing else needs to be done on the
 first DC?


Be careful to avoid bootstrapping, but yes.


 And on the second DC instead I just need to do nodetool rebuild --
 FirstDC on _both_ nodes?


Yes.
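
As a rough sketch of those two steps (assuming the schema already exists on
both DCs and FirstDC is the source DC name):

  # on each restored first-DC node, set in cassandra.yaml before starting it:
  #   auto_bootstrap: false
  # then, on each node of the second DC, stream the data from the first DC:
  nodetool rebuild -- FirstDC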

=Rob


Re: Cassandra crashes daily; nothing on the log

2015-06-08 Thread Paulo Motta
try checking your system logs (generally /var/log/syslog) to check if the
cassandra process was killed by the OS oom-killer

2015-06-06 15:39 GMT-03:00 Brian Sam-Bodden bsbod...@integrallis.com:

 Berk,
1 GB is not enough to run C*, the minimum memory we use on Digital
 Ocean is 4GB.

 Cheers,
 Brian
 http://integrallis.com

 On Sat, Jun 6, 2015 at 10:50 AM, graffit...@yahoo.com wrote:

 Hi all,

 I've installed Cassandra on a test server hosted on Digital Ocean. The
 server has 1GB RAM, and is running a single docker container alongside C*.
 Somehow, every night, the Cassandra instance crashes. The annoying part is
 that I cannot see anything wrong with the log files, so I can't tell what's
 going on.

 The log files are here:
 http://pastebin.com/Zquu5wvd

 Do you have any idea what's going on? Can you suggest some ways I can try
 to troubleshoot this?

 Thanks!
  Berk




 --
 Cheers,
 Brian
 http://www.integrallis.com



Re: Restoring all cluster from snapshots

2015-06-08 Thread Anton Koshevoy
Rob, thanks for the answer.

I just followed the instructions from
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_snapshot_restore_new_cluster.html

If I don't remove the system table data, the test cluster starts interfering with the
production cluster. How can I avoid this situation?



On June 8, 2015 at 9:48:30 PM, Robert Coli (rc...@eventbrite.com) wrote:

On Mon, Jun 8, 2015 at 6:22 AM, Anton Koshevoy nowa...@gmail.com wrote:
- sudo rm -rf /db/cassandra/cr/data0*/system/*

This removes the schema. You can't load SSTables for column families which 
don't exist.
 
=Rob



RE: Restoring all cluster from snapshots

2015-06-08 Thread Sanjay Baronia
Yes, you shouldn't delete the system directory. The next steps are: reconfigure the
test cluster with new IP addresses, clear the gossip information, and then
boot the test cluster.
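
A minimal sketch of those steps on each test node (a packaged install is
assumed; paths and option names per 2.1):

  sudo service cassandra stop
  # 1. cassandra.yaml: point listen_address / broadcast_address / seeds at the test IPs
  # 2. forget the production ring state on the first start, e.g. in cassandra-env.sh:
  #      JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
  sudo service cassandra start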

If you are running Cassandra on VMware, then you may also want to look at this
solution (http://www.triliodata.com/wp-content/uploads/2015/04/Cassandra-Trilio-Data-Sheet4.pdf)
from Trilio Data, where you can create a Cassandra backup and restore it to a
test cluster.

Regards,

Sanjay

_
Sanjay Baronia
VP of Product & Solutions Management
TrilioData
(c) 508-335-2306
sanjay.baro...@triliodata.com
http://www.triliodata.com/

Experience Trilio in action: please click here (mailto:i...@triliodata.com?subject=Demo%20Request) to request a demo
today!

From: Anton Koshevoy [mailto:nowa...@gmail.com]
Sent: Monday, June 8, 2015 4:42 PM
To: user@cassandra.apache.org
Subject: Re: Restoring all cluster from snapshots

Rob, thanks for the answer.

I just followed the instructions from
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_snapshot_restore_new_cluster.html

If I don't remove the system table data, the test cluster starts interfering with the
production cluster. How can I avoid this situation?




On June 8, 2015 at 9:48:30 PM, Robert Coli (rc...@eventbrite.com) wrote:
On Mon, Jun 8, 2015 at 6:22 AM, Anton Koshevoy nowa...@gmail.com wrote:
- sudo rm -rf /db/cassandra/cr/data0*/system/*

This removes the schema. You can't load SSTables for column families which 
don't exist.

=Rob



Deserialize the collection type data from the SSTable file

2015-06-08 Thread java8964
Hi, Cassandra users:

I have a question related to how to deserialize the new collection type data
in Cassandra 2.x. (The exact version is C* 2.0.10.)

I created the following example table in CQLSH:

CREATE TABLE coupon (
  account_id bigint,
  campaign_id uuid,
  ...,
  discount_info map<text, text>,
  ...,
  PRIMARY KEY (account_id, campaign_id)
)

The other columns can be ignored in this case. Then I inserted one row of test
data like this:

insert into coupon (account_id, campaign_id, discount_info) values (111, uuid(),
{'test_key':'test_value'});

After this, I got the SSTable files. I used sstable2json to check the output:

$ ./resources/cassandra/bin/sstable2json /xxx/test-coupon-jb-1-Data.db
[
{key: 006f,columns:
[[0336e50d-21aa-4b3a-9f01-989a8c540e54:,,1433792922055000],
[0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info,0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:!,1433792922054999,t,1433792922],
[0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:746573745f6b6579,746573745f76616c7565,1433792922055000]]}
]

What I want is to get the {test_key : test_value} key/value pair that I put
into the discount_info column. I followed the sstable2json code and tried to
deserialize the data by myself, but to my surprise I cannot make it work, even
though I tried several ways; I kept getting exceptions.

From what I researched, I know that Cassandra stores campaign_id +
discount_info + another ByteBuffer as a composite column in this case. When
I deserialize this column name, I get the following dumped out as a String:

0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:746573745f6b6579

It includes 3 parts: the first part is the uuid of the campaign_id. The 2nd
part is discount_info, which is the static name I defined in the table. The
3rd part is a byte array of length 46, which I am not sure about.

The corresponding value part of this composite column is another byte array of
length 10, hex 746573745f76616c7565 if I dump it out.

Now, here is what I did, and I am not sure why it doesn't work. First, I
assumed the value part stores the real value I put in the Map, so I did the
following:

ByteBuffer value = ByteBufferUtil.clone(column.value());

MapType<String, String> result = MapType.getInstance(UTF8Type.instance, UTF8Type.instance);
Map<String, String> output = result.compose(value);

// it gave me the following exception:
// org.apache.cassandra.serializers.MarshalException: Not enough bytes to read a map

Then I thought that the real value must be stored as part of the column name
(the 3rd part of 46 bytes), so I did this:

MapType<String, String> result = MapType.getInstance(UTF8Type.instance, UTF8Type.instance);
Map<String, String> output = result.compose(third_part.value);

// I got the following exception:

java.lang.IllegalArgumentException
  at java.nio.Buffer.limit(Buffer.java:267)
  at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:587)
  at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596)
  at org.apache.cassandra.serializers.MapSerializer.deserialize(MapSerializer.java:63)
  at org.apache.cassandra.serializers.MapSerializer.deserialize(MapSerializer.java:28)
  at org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:142)

I can get all the other non-collection type data, but I cannot get the data
from the Map. My questions are:

1) How does Cassandra store the collection data in the SSTable files? From the
length of the bytes, it is most likely part of the composite column. If so,
why did I get the exceptions above?

2) sstable2json doesn't deserialize the real data out of the collection type,
so I don't have an example to follow. Am I using the wrong way to compose the
Map type data?

Thanks

Yong

Re: Deserialize the collection type data from the SSTable file

2015-06-08 Thread Daniel Chia
I'm not sure why sstable2json doesn't work for collections, but if you're
into reading raw sstables we use the following code with good success:

https://github.com/coursera/aegisthus/blob/77c73f6259f2a30d3d8ca64578be5c13ecc4e6f4/aegisthus-hadoop/src/main/java/org/coursera/mapreducer/CQLMapper.java#L85

Thanks,
Daniel

On Mon, Jun 8, 2015 at 1:22 PM, java8964 java8...@hotmail.com wrote:

 Hi, Cassandra users:

 I have a question related to how to deserialize the new collection type
  data in Cassandra 2.x. (The exact version is C* 2.0.10.)

 I create the following example tables in the CQLSH:

 CREATE TABLE coupon (
   account_id bigint,
   campaign_id uuid,
   ...,
   discount_info map<text, text>,
   ...,
   PRIMARY KEY (account_id, campaign_id)
 )

 The other columns can be ignored in this case. Then I inserted into the
 one test data like this:

 insert into coupon (account_id, campaign_id, discount_info) values
 (111,uuid(), {'test_key':'test_value'});

 After this, I got the SSTable files. I use the sstable2json file to check
 the output:

 $./resources/cassandra/bin/sstable2json /xxx/test-coupon-jb-1-Data.db
 [
 {key: 006f,columns:
 [[0336e50d-21aa-4b3a-9f01-989a8c540e54:,,1433792922055000],
 [0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info,0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:!,1433792922054999,t,1433792922],
 [0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:746573745f6b6579,746573745f76616c7565,1433792922055000]]}
 ]

 What I want is to get the {test_key : test_value} key/value pair that I put
  into the discount_info column. I followed the sstable2json code and tried to
  deserialize the data by myself, but to my surprise I cannot make it work,
  even though I tried several ways; I kept getting exceptions.

 From what I researched, I know that Cassandra put the campaign_id +
 discount_info + Another ByteBuffer as composite column in this case.
 When I deserialize this columnName, I got the following dumped out as
 String:

 0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:746573745f6b6579.

 It includes 3 parts: the first part is the uuid of the campaign_id. The
  2nd part is discount_info, which is the static name I defined in the
  table. The 3rd part is a byte array of length 46, which I am not sure
  about.

 The corresponding value part of this composite column is another byte
 array as length of 10, hex as 746573745f76616c7565 if I dump it out.

 Now, here is what I did and not sure why it doesn't work.
 First, I assume the value part stores the real value I put in the Map, so
 I did the following:

 ByteBuffer value = ByteBufferUtil.clone(column.value());

 MapType<String, String> result = MapType.getInstance(UTF8Type.instance,
  UTF8Type.instance);
  Map<String, String> output = result.compose(value);

 // it gave me the following exception: 
 org.apache.cassandra.serializers.MarshalException: Not enough bytes to read a 
 map

 Then I thought that the real value must be stored as part of the column
  name (the 3rd part of 46 bytes), so I did this:

 MapType<String, String> result = MapType.getInstance(UTF8Type.instance,
  UTF8Type.instance);
  Map<String, String> output = result.compose(third_part.value);

 // I got the following exception:

 java.lang.IllegalArgumentException
   at java.nio.Buffer.limit(Buffer.java:267)
   at 
 org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:587)
   at 
 org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596)
   at 
 org.apache.cassandra.serializers.MapSerializer.deserialize(MapSerializer.java:63)
   at 
 org.apache.cassandra.serializers.MapSerializer.deserialize(MapSerializer.java:28)
   at 
 org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:142)


 I can get all the other non-collection type data, but I cannot get the data
  from the Map. My questions are:

  1) How does Cassandra store the collection data in the SSTable files? From
  the length of the bytes, it is most likely part of the composite column. If
  so, why did I get the exceptions above?

  2) sstable2json doesn't deserialize the real data out of the collection
  type, so I don't have an example to follow. Am I using the wrong way to
  compose the Map type data?


 Thanks


 Yong




Re: Restoring all cluster from snapshots

2015-06-08 Thread Alain RODRIGUEZ
I think you just have to do a "DESC KEYSPACE mykeyspace;" from one node of
the production cluster, then copy the output and import it into your dev
cluster using cqlsh -f output.cql.

Take care: at the start of the output you might want to change DC names, RF,
or strategy.

Also, if you don't want to restart nodes you can load data by using
nodetool refresh mykeyspace mycf
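
A sketch of the whole sequence (host, keyspace and table names are placeholders):

  cqlsh PROD_NODE_IP -e "DESC KEYSPACE mykeyspace" > mykeyspace.cql
  # edit DC names / replication settings in mykeyspace.cql if needed, then:
  cqlsh TEST_NODE_IP -f mykeyspace.cql
  # after copying the snapshot sstables into the running test node's data dir:
  nodetool refresh mykeyspace mycf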

C*heers

Alain

2015-06-08 22:42 GMT+02:00 Anton Koshevoy nowa...@gmail.com:

 Rob, thanks for the answer.

 I just followed the instructions from
 http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_snapshot_restore_new_cluster.html

 If I don't remove the system table data, the test cluster starts interfering
  with the production cluster. How can I avoid this situation?



 On June 8, 2015 at 9:48:30 PM, Robert Coli (rc...@eventbrite.com) wrote:

  On Mon, Jun 8, 2015 at 6:22 AM, Anton Koshevoy nowa...@gmail.com wrote:

  - sudo rm -rf /db/cassandra/cr/data0*/system/*


 This removes the schema. You can't load SSTables for column families which
 don't exist.

  =Rob




Re: Restoring all cluster from snapshots

2015-06-08 Thread Robert Coli
On Mon, Jun 8, 2015 at 2:52 PM, Sanjay Baronia 
sanjay.baro...@triliodata.com wrote:

  Yes, you shouldn't delete the system directory. The next steps are:
 reconfigure the test cluster with new IP addresses, clear the gossip
  information, and then boot the test cluster.


If you don't delete the system directory, you run the risk of the test
cluster nodes joining the source cluster.

Just start a single node on the new cluster, empty, and create the schema
on it.

Then do the rest of the process.

=Rob


Re: DSE 4.7 security

2015-06-08 Thread Jack Krupansky
Cassandra authorization is at the keyspace and table level. Click on the
GRANT link on the doc page, to get more info:
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/grant_r.html

Which says: "Permissions to access all keyspaces, a named keyspace, or a
table can be granted to a user."

There is no finer-grain authorization at the row, column, or cell level.
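
As a sketch of the granularity that is available (the role/user names here are
made up):

  # default superuser credentials shown purely as an example
  cqlsh -u cassandra -p cassandra -e "GRANT SELECT ON KEYSPACE demo TO analyst;"
  cqlsh -u cassandra -p cassandra -e "GRANT MODIFY ON TABLE demo.events TO writer;"

Anything finer-grained (per row, per cell) would have to be enforced by the
application.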

You might want to open a Jira for this valuable feature.

-- Jack Krupansky

On Sun, Jun 7, 2015 at 5:19 PM, Moshe Kranc moshekr...@gmail.com wrote:

 The DSE 4.7 documentation says: "You use the familiar relational database
  GRANT/REVOKE paradigm to grant or revoke permissions to access Cassandra data."

 Does this mean authorization is per table?

 What if I need finer grain authorization, e.g., per row or even per cell
 (e.g., a specific column in a specific row may not be seen by users in a
 group)?

 Do I need to implement this in my application, because Cassandra does not
 support it?



auto clear data with ttl

2015-06-08 Thread 曹志富
I have C* 2.1.5 and store some data with TTL. I reduced gc_grace_seconds to
zero.

But it seems to have no effect.

Did I miss something?
--
Ranger Tsao


Re: auto clear data with ttl

2015-06-08 Thread Aiman Parvaiz
A gc_grace of zero will remove tombstones without any delay once compaction runs,
so it's possible that the SSTables containing the tombstones still need to be
compacted. You can either wait for compaction to happen or run a manual compaction,
depending on your compaction strategy. Manual compaction does have some
drawbacks, so please read about it first.
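
A sketch of the two options (keyspace/table names are placeholders; the
subproperties shown are for size-tiered compaction on 2.1):

  # let single-SSTable tombstone compactions kick in more eagerly:
  cqlsh -e "ALTER TABLE myks.mytable WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'unchecked_tombstone_compaction': 'true',
    'tombstone_threshold': '0.2' };"
  # or force a one-off major compaction, accepting its drawbacks:
  nodetool compact myks mytable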

Sent from my iPhone

 On Jun 8, 2015, at 7:26 PM, 曹志富 cao.zh...@gmail.com wrote:
 
 I have C* 2.1.5 and store some data with TTL. I reduced gc_grace_seconds to zero.
 
 But it seems to have no effect.
 
 Did I miss something?
 --
 Ranger Tsao


Re: auto clear data with ttl

2015-06-08 Thread 曹志富
Thank you. I have changed unchecked_tombstone_compaction to true. A major
compaction would produce one big sstable, so I think this is a better choice.

--
Ranger Tsao

2015-06-09 11:16 GMT+08:00 Aiman Parvaiz ai...@flipagram.com:

 A gc_grace of zero will remove tombstones without any delay once compaction
  runs, so it's possible that the SSTables containing the tombstones still need
  to be compacted. You can either wait for compaction to happen or run a
  manual compaction, depending on your compaction strategy. Manual compaction
  does have some drawbacks, so please read about it first.

 Sent from my iPhone

 On Jun 8, 2015, at 7:26 PM, 曹志富 cao.zh...@gmail.com wrote:

 I have C* 2.1.5 and store some data with TTL. I reduced gc_grace_seconds to
  zero.

  But it seems to have no effect.

 Did I miss something?
 --
 Ranger Tsao




C* 2.0.15 - java.lang.NegativeArraySizeException

2015-06-08 Thread Aiman Parvaiz
Hi everyone
I am running C* 2.0.9 and decided to do a rolling upgrade. Added a node of
C* 2.0.15 in the existing cluster and saw this twice:

Jun  9 02:27:20 prod-cass23.localdomain cassandra: 2015-06-09 02:27:20,658
INFO CompactionExecutor:4 CompactionTask.runMayThrow - Compacting
[SSTableReader(path='/var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-37-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-40-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-42-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-38-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-39-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-44-Data.db')]



Jun  9 02:27:20 prod-cass23.localdomain cassandra: 2015-06-09 02:27:20,669
ERROR CompactionExecutor:4 CassandraDaemon.uncaughtException - Exception in
thread Thread[CompactionExecutor:4,1,main]
Jun  9 02:27:20 prod-cass23.localdomain
*java.lang.NegativeArraySizeException*
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:335)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:462)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:448)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:432)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableReader.getAncestors(SSTableReader.java:1366)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata.createCollector(SSTableMetadata.java:134)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.db.compaction.CompactionTask.createCompactionWriter(CompactionTask.java:316)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:162)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
Jun  9 02:27:20 prod-cass23.localdomain at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
Jun  9 02:27:20 prod-cass23.localdomain at
java.util.concurrent.FutureTask.run(FutureTask.java:262)
Jun  9 02:27:20 prod-cass23.localdomain at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
Jun  9 02:27:20 prod-cass23.localdomain at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Jun  9 02:27:20 prod-cass23.localdomain at
java.lang.Thread.run(Thread.java:745)
Jun  9 02:27:47 prod-cass23.localdomain cassandra: 2015-06-09 02:27:47,725
INFO main StorageService.setMode - JOINING: Starting to bootstrap...

As you can see, this happened the first time even before joining. Stack trace
from the second occasion:

Jun  9 02:32:15 prod-cass23.localdomain cassandra: 2015-06-09 02:32:15,097
ERROR CompactionExecutor:6 CassandraDaemon.uncaughtException - Exception in
thread Thread[CompactionExecutor:6,1,main]
Jun  9 02:32:15 prod-cass23.localdomain java.lang.NegativeArraySizeException
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:335)
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:462)
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:448)
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:432)
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableReader.getAncestors(SSTableReader.java:1366)
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata.createCollector(SSTableMetadata.java:134)
Jun  9 02:32:15 prod-cass23.localdomain at

Re: Hbase vs Cassandra

2015-06-08 Thread Ajay
Hi All,

Thanks for all the input. I posted the same question in the HBase forum and got
more responses.

Posting the consolidated list here.

Our case is that a central team builds and maintains the platform (Cassandra
as a service). We have a couple of use cases which fit Cassandra, like
time-series data. But as a platform team, we need to know more features and
use cases which fit or are best handled in Cassandra, and also to understand the
use cases where HBase performs better (we might need to have it as a service
too).

*Cassandra:*

1) From 2013 both can still be relevant:
http://www.pythian.com/blog/watch-hbase-vs-cassandra/

2) Here are some use cases from PlanetCassandra.org of companies who chose
Cassandra over HBase after evaluation, or migrated to Cassandra from HBase.
The eComNext interview cited on the page touches on time-series data;
http://planetcassandra.org/hbase-to-cassandra-migration/

3) From googling, the most popular advantages of Cassandra over HBase are that
it is easy to deploy, maintain & monitor, and has no single point of failure.

4) From our six months of research and POC experience with Cassandra, CQL is
pretty limited. Though CQL is targeted at real-time read and write, there are
cases where we need to pull out data differently and are OK with a little more
latency. But Cassandra doesn't support that; we need MapReduce or Spark for
those. Then the debate starts: why Cassandra and why not HBase, if we need
Hadoop/Spark for MapReduce anyway?

I expected a few more technical features/use cases that are best handled by
Cassandra (and how they work).

*HBase:*

1) As for the #4 you might be interested in reading
https://aphyr.com/posts/294-call-me-maybe-cassandra
Not sure if there is comparable article about HBase (anybody knows?) but it
can give you another perspective about what else to keep an eye on
regarding these systems.

2) See http://hbase.apache.org/book.html#perf.network.call_me_maybe

3) http://blog.parsely.com/post/1928/cass/
*Anyone have any comments on this?*

4) 1. No killer features comparing to hbase
2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool for
Cassandra but it doesn't support vnodes.
3. Rumors say it fast when it works;) the reason- it can silently drop data
you try to write.
4. Timeseries is a nightmare. The easiest approach is just replicate data
to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala

5)  Migrated from Cassandra to HBase.
Reasons:
Scan is fast with HBase. It fits better with time series data model. Please
look at opentsdb. Cassandra models it with large rows.
Server side filtering. You can use to filter some of your time series data
on the server side.
Hbase has a better integration with hadoop in general. We had to write our
own bulk loader using mapreduce for cassandra. hbase has already had a tool
for that. There is a nice integration with flume and kite.
High availability didn't matter for us. 10 secs down is fine for our use
cases. HBase started to support eventually consistent reads.

6) Coprocessor framework (custom code inside Region Server and
MasterServers), which Cassandra is missing, afaik.
   Coprocessors have been widely used by hBase users (Phoenix SQL, for
example) since inception (in 0.92).
* HBase security model is more mature and align well with Hadoop/HDFS
security. Cassandra provides just basic authentication/authorization/SSL
encryption, no Kerberos, no end-to-end data encryption,
no cell level security.

7) Another point to add is the new HBase read high-availability using
timeline-consistent region replicas feature from HBase 1.0 onward, which
brings HBase closer to Cassandra in term of Read Availability during
node failures.  You have a choice for Read Availability now.
https://issues.apache.org/jira/browse/HBASE-10070

8) Hbase can do range scans, and one can attack many problems with range
scans. Cassandra can't do range scans.

9) HBase is a distributed, consistent, sorted key value store. The sorted
bit allows for range scans in addition to the point gets that all K/V
stores support. Nothing more, nothing less.
It happens to store its data in HDFS by default, and we provide convenient
input and output formats for map reduce.

*Neutral:*
1)
http://khangaonkar.blogspot.com/2013/09/cassandra-vs-hbase-which-nosql-store-do.html

2) The fundamental differences that come to mind are:
* HBase is always consistent. Machine outages lead to inability to read or
write data on that machine. With Cassandra you can always write.

* Cassandra defaults to a random partitioner, so range scans are not
possible (by default)
* HBase has a range partitioner (if you don't want that the client has to
prefix the rowkey with a prefix of a hash of the rowkey). The main feature
that set HBase apart are range scans.

* HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc.
You can map reduce directly into HFiles and map those into HBase instantly.

* Cassandra has a dedicated company supporting (and promoting) it.
* Getting 

Re: Hbase vs Cassandra

2015-06-08 Thread Ajay
Hi Jens,

All the points listed weren't from me. I posted the HBase vs Cassandra
question in both forums and consolidated the responses here for the discussion.


On Mon, Jun 8, 2015 at 2:27 PM, Jens Rantil jens.ran...@tink.se wrote:

 Hi,

 Some minor comments:

  2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
 for Cassandra but it doesn't support vnodes.

 Not entirely sure what you mean here, but we ran Cloudera for a while and
 Cloudera Manager was buggy and hard to debug. Overall, our experience
 wasn't very good. This was definitely also due to us not knowing how all
 the Cloudera packages were configured.


* This is one of the responses I got from the HBase forum. DataStax
OpsCenter is there, but it seems it doesn't support the latest Cassandra
versions (we tried it a couple of times and there were bugs too). *


  HBase is always consistent. Machine outages lead to inability to read
 or write data on that machine. With Cassandra you can always write.

 Sort of true. You can decide write consistency and throw an exception if
 write didn't go through consistently. However, do note that Cassandra will
 never rollback failed writes which means writes aren't atomic (as in ACID).

 * If I understand correctly, you mean that when we write with QUORUM and
Cassandra writes to some machines but fails to write to others, it throws an
exception if QUORUM isn't satisfied, leaving the data inconsistent without
rolling back? *


 We chose Cassandra over HBase mostly due to ease of managability. We are a
 small team, and my feeling is that you will want dedicated people taking
 care of a Hadoop cluster if you are going down the HBase path. A Cassandra
 cluster can be handled by a single engineer and is, in my opinion, easier
 to maintain.


* This is the most popular reason for Cassandra over HBase. But this
alone is not a sufficient driver. *


 Cheers,
 Jens

 On Mon, Jun 8, 2015 at 9:59 AM, Ajay ajay.ga...@gmail.com wrote:

 Hi All,

 Thanks for all the input. I posted the same question in HBase forum and
 got more response.

 Posting the consolidated list here.

 Our case is that a central team builds and maintain the platform
 (Cassandra as a service). We have couple of usecases which fits Cassandra
 like time-series data. But as a platform team, we need to know more
 features and usecases which fits or best handled in Cassandra. Also to
 understand the usecases where HBase performs better (we might need to have
 it as a service too).

 *Cassandra:*

 1) From 2013 both can still be relevant:
 http://www.pythian.com/blog/watch-hbase-vs-cassandra/

 2) Here are some use cases from PlanetCassandra.org of companies who
 chose Cassandra over HBase after evaluation, or migrated to Cassandra from
 HBase.
 The eComNext interview cited on the page touches on time-series data;
 http://planetcassandra.org/hbase-to-cassandra-migration/

 3) From googling, the most popular advantages of Cassandra over HBase are
  that it is easy to deploy, maintain & monitor, and has no single point of
  failure.

 4) From our six months research and POC experience in Cassandra, CQL is
 pretty limited. Though CQL is targeted for Real time Read and Write, there
 are cases where need to pull out data differently and we are OK with little
 more latency. But Cassandra doesn't support that. We need MapReduce or
 Spark for those. Then the debate starts why Cassandra and why not HBase if
 we need Hadoop/Spark for MapReduce.

 Expected a few more technical features/usecases that is best handled by
 Cassandra (and how it works).

 *HBase:*

 1) As for the #4 you might be interested in reading
 https://aphyr.com/posts/294-call-me-maybe-cassandra
 Not sure if there is comparable article about HBase (anybody knows?) but
 it can give you another perspective about what else to keep an eye on
 regarding these systems.

 2) See http://hbase.apache.org/book.html#perf.network.call_me_maybe

 3) http://blog.parsely.com/post/1928/cass/
 *Anyone have any comments on this?*

 4) 1. No killer features comparing to hbase
 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
 for Cassandra but it doesn't support vnodes.
 3. Rumors say it fast when it works;) the reason- it can silently drop
 data you try to write.
 4. Timeseries is a nightmare. The easiest approach is just replicate data
 to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala

 5)  Migrated from Cassandra to HBase.
 Reasons:
 Scan is fast with HBase. It fits better with time series data model.
 Please look at opentsdb. Cassandra models it with large rows.
 Server side filtering. You can use to filter some of your time series
 data on the server side.
 Hbase has a better integration with hadoop in general. We had to write
 our own bulk loader using mapreduce for cassandra. hbase has already had a
 tool for that. There is a nice integration with flume and kite.
 High availability didn't matter for us. 10 secs down is fine for our use
  cases. HBase started to support eventually 

Re: Ghost compaction process

2015-06-08 Thread Tim Heckman
Does `nodetool compactionstats` show nothing running as well? Also, for
posterity what are some details of the setup (C* version, etc.)?
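
(As a quick sketch of the checks being compared here, run on the affected node:)

  nodetool compactionstats
  nodetool tpstats | grep -i compaction
  cqlsh -e "SELECT * FROM system.compactions_in_progress;"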

-Tim

--
Tim Heckman
Operations Engineer
PagerDuty, Inc.


On Sun, Jun 7, 2015 at 6:40 PM, Arturas Raizys artu...@noantidot.com
wrote:

 Hello,

 I'm having a problem where on 1 node I have a continuous compaction process
  running and consuming CPU. nodetool tpstats shows 1 compaction in
  progress, but if I try to query the system.compactions_in_progress table, I
  see 0 records. This never-ending compaction slows down the node and it
  becomes laggy.
 I'm willing to hire a contractor to solve this problem if anyone is
 interested.


 Cheers,
 Arturas



Ghost compaction process

2015-06-08 Thread Arturas Raizys
Hello,

I'm having a problem where on 1 node I have a continuous compaction process
running and consuming CPU. nodetool tpstats shows 1 compaction in
progress, but if I try to query the system.compactions_in_progress table, I
see 0 records. This never-ending compaction slows down the node and it
becomes laggy.
I'm willing to hire a contractor to solve this problem if anyone is
interested.


Cheers,
Arturas


Re: Ghost compaction process

2015-06-08 Thread Arturas Raizys
Hi,

 Does `nodetool compactionstats` show nothing running as well? Also, for
 posterity what are some details of the setup (C* version, etc.)?

`nodetool compactionstats` does not return anything, it just waits.
If I do enable DEBUG logging, I see this line popping up while executing
`nodetool compactionstats` :
DEBUG [RMI TCP Connection(1856)-127.0.0.1] 2015-06-08 09:29:46,043
LeveledManifest.java:680 - Estimating [0, 0, 0, 0, 0, 0, 0, 0, 0]
compactions to do for system.paxos

I'm running Cassandra 2.1.14 on a 7-node cluster. We're using small VMs
with 8GB of RAM and SSDs. Our data size per node with RF=2 is ~40GB. Load
is ~1000 writes/second. Most of the data has a TTL of 2 weeks.


Cheers,
Arturas


Re: Hbase vs Cassandra

2015-06-08 Thread Jens Rantil
Hi,

Some minor comments:

 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
for Cassandra but it doesn't support vnodes.

Not entirely sure what you mean here, but we ran Cloudera for a while and
Cloudera Manager was buggy and hard to debug. Overall, our experience
wasn't very good. This was definitely also due to us not knowing how all
the Cloudera packages were configured.

 HBase is always consistent. Machine outages lead to inability to read or
write data on that machine. With Cassandra you can always write.

Sort of true. You can decide write consistency and throw an exception if
write didn't go through consistently. However, do note that Cassandra will
never rollback failed writes which means writes aren't atomic (as in ACID).

We chose Cassandra over HBase mostly due to ease of managability. We are a
small team, and my feeling is that you will want dedicated people taking
care of a Hadoop cluster if you are going down the HBase path. A Cassandra
cluster can be handled by a single engineer and is, in my opinion, easier
to maintain.

Cheers,
Jens

On Mon, Jun 8, 2015 at 9:59 AM, Ajay ajay.ga...@gmail.com wrote:

 Hi All,

 Thanks for all the input. I posted the same question in HBase forum and
 got more response.

 Posting the consolidated list here.

 Our case is that a central team builds and maintain the platform
 (Cassandra as a service). We have couple of usecases which fits Cassandra
 like time-series data. But as a platform team, we need to know more
 features and usecases which fits or best handled in Cassandra. Also to
 understand the usecases where HBase performs better (we might need to have
 it as a service too).

 *Cassandra:*

 1) From 2013 both can still be relevant:
 http://www.pythian.com/blog/watch-hbase-vs-cassandra/

 2) Here are some use cases from PlanetCassandra.org of companies who chose
 Cassandra over HBase after evaluation, or migrated to Cassandra from HBase.
 The eComNext interview cited on the page touches on time-series data;
 http://planetcassandra.org/hbase-to-cassandra-migration/

 3) From googling, the most popular advantages of Cassandra over HBase are
  that it is easy to deploy, maintain & monitor, and has no single point of
  failure.

 4) From our six months research and POC experience in Cassandra, CQL is
 pretty limited. Though CQL is targeted for Real time Read and Write, there
 are cases where need to pull out data differently and we are OK with little
 more latency. But Cassandra doesn't support that. We need MapReduce or
 Spark for those. Then the debate starts why Cassandra and why not HBase if
 we need Hadoop/Spark for MapReduce.

 Expected a few more technical features/usecases that is best handled by
 Cassandra (and how it works).

 *HBase:*

 1) As for the #4 you might be interested in reading
 https://aphyr.com/posts/294-call-me-maybe-cassandra
 Not sure if there is comparable article about HBase (anybody knows?) but
 it can give you another perspective about what else to keep an eye on
 regarding these systems.

 2) See http://hbase.apache.org/book.html#perf.network.call_me_maybe

 3) http://blog.parsely.com/post/1928/cass/
 *Anyone have any comments on this?*

 4) 1. No killer features comparing to hbase
 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
 for Cassandra but it doesn't support vnodes.
 3. Rumors say it fast when it works;) the reason- it can silently drop
 data you try to write.
 4. Timeseries is a nightmare. The easiest approach is just replicate data
 to hdfs, partition it by hour/day and run spark/scalding/pig/hive/Impala

 5)  Migrated from Cassandra to HBase.
 Reasons:
 Scan is fast with HBase. It fits better with time series data model.
 Please look at opentsdb. Cassandra models it with large rows.
 Server side filtering. You can use to filter some of your time series data
 on the server side.
 Hbase has a better integration with hadoop in general. We had to write our
 own bulk loader using mapreduce for cassandra. hbase has already had a tool
 for that. There is a nice integration with flume and kite.
 High availability didn't matter for us. 10 secs down is fine for our use
  cases. HBase started to support eventually consistent reads.

 6) Coprocessor framework (custom code inside Region Server and
 MasterServers), which Cassandra is missing, afaik.
Coprocessors have been widely used by hBase users (Phoenix SQL, for
 example) since inception (in 0.92).
 * HBase security model is more mature and align well with Hadoop/HDFS
 security. Cassandra provides just basic authentication/authorization/SSL
 encryption, no Kerberos, no end-to-end data encryption,
 no cell level security.

 7) Another point to add is the new HBase read high-availability using
 timeline-consistent region replicas feature from HBase 1.0 onward, which
 brings HBase closer to Cassandra in term of Read Availability during
 node failures.  You have a choice for Read Availability now.
 https://issues.apache.org/jira/browse/HBASE-10070

Re: Ghost compaction process

2015-06-08 Thread Carlos Rolo
HI,

Is it 2.0.14 or 2.1.4? If you are on 2.1.4 I would recommend an upgrade to
2.1.5 regardless of that issue.

From the data you provide it is difficult to assess what the issue is. If
you are running with RF=2 you can always add another node and kill that one
if it is the only node that shows the problem. With a 40GB load that is not a
big issue.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Jun 8, 2015 at 4:04 AM, Arturas Raizys artu...@noantidot.com
wrote:

 Hi,

  Does `nodetool compactionstats` show nothing running as well? Also, for
  posterity what are some details of the setup (C* version, etc.)?

 `nodetool compactionstats` does not return anything, it just waits.
  If I do enable DEBUG logging, I see this line popping up while executing
 `nodetool compactionstats` :
 DEBUG [RMI TCP Connection(1856)-127.0.0.1] 2015-06-08 09:29:46,043
 LeveledManifest.java:680 - Estimating [0, 0, 0, 0, 0, 0, 0, 0, 0]
 compactions to do for system.paxos

 I'm running Cassandra 2.1.14 on a 7-node cluster. We're using small VMs
  with 8GB of RAM and SSDs. Our data size per node with RF=2 is ~40GB. Load
  is ~1000 writes/second. Most of the data has a TTL of 2 weeks.


 Cheers,
 Arturas







Re: Hbase vs Cassandra

2015-06-08 Thread Jens Rantil
On Mon, Jun 8, 2015 at 11:16 AM, Ajay ajay.ga...@gmail.com wrote:

  If I understand correctly, you mean when we write with QUORUM and
 Cassandra writes to few machines and fails to write to few machines and
 throws exception if it doesn't satisfy QUORUM, leaving it inconsistent and
 doesn't rollback?.


Yes.

/Jens


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook https://www.facebook.com/#!/tink.se Linkedin
http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary
 Twitter https://twitter.com/tink


Re: sstableloader usage doubts

2015-06-08 Thread ZeroUno

On 05/06/15 at 22:40, Robert Coli wrote:


On Fri, Jun 5, 2015 at 7:53 AM, Sebastian Estevez
sebastian.este...@datastax.com
wrote:

Since you only restored one dc's sstables, you should be able to
rebuild them on the second DC.

Refresh means pick up new SSTables that have been directly added to
the data directory.

Rebuild means stream data from other replicas to re create SSTables
from scratch.

Sebastian's response is correct; use rebuild. Sorry that I missed that
specific aspect of your question!


Thank you both.

So you mean that refresh needs to be used if the cluster is running, 
but if I stopped cassandra while copying the sstables then refresh is 
useless? So the error "No new SSTables were found" during my refresh 
attempt is due to the fact that the sstables in my data dir were not 
new because already loaded, and not to the files not being found?


So... if I stop the two nodes on the first DC, restore their sstables' 
files, and then restart the nodes, nothing else needs to be done on the 
first DC?


And on the second DC instead I just need to do nodetool rebuild -- 
FirstDC on _both_ nodes?


--
01



Restoring all cluster from snapshots

2015-06-08 Thread Anton Koshevoy
Hello all.

I need to transfer and start a copy of the production cluster in a test
environment. My steps:

- nodetool snapshot -t `hostname`-#{cluster_name}-#{timestamp} -p #{jmx_port}
- nodetool ring -p #{jmx_port} | grep `/sbin/ifconfig eth0 | grep 'inet addr' | 
awk -F: '{print $2}' | awk '{print $1}'` | awk '{ print $NF }' | tr '\\n' ',' | 
sudo tee /etc/cassandra/#{cluster_name}.conf/tokens.txt
- rsync snapshots to the backup machine
- copy files to the 2 test servers in the same folders as on production.
- sudo rm -rf /db/cassandra/cr/data0*/system/*
- paste list of initial_token from step 2 to the cassandra.yaml file on each 
server
- start both test servers.

And instead of gigabytes of my keyspaces I see only:

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.40.231.3   151.06 KB  256     100.0%            c505db2f-d14a-4044-949f-cb952ec022f6  RACK01
UN  10.40.231.31  134.59 KB  256     100.0%            12879849-ade0-4dcb-84c0-abb3db996ba7  RACK01

And there is no mention of my keyspaces here:

[cqlsh 5.0.1 | Cassandra 2.1.3 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh>
cqlsh> describe keyspaces

system_traces  system

cqlsh>

What am I missing in this process?

Re: Avoiding Data Duplication

2015-06-08 Thread Paulo Motta
Some options I can think of:

1 - depending on your data size and stime query frequency, you may use
spark to perform queries filtering by server time in the log table, maybe
within a device time window to reduce the dataset your spark job will need
to go through. more info on the spark connector:
https://github.com/datastax/spark-cassandra-connector

2 - if dtime and stime are almost always in the same date bucket
(day/hour/minute/second), you may create an additional table stime_log
with the same structure, but where the date bucket refers to the sdate field. so,
when you have an entry where stime and dtime are not from the same bucket,
you should insert that entry in both the log and stime_log tables. when you
want to query entries by stime, you take the distinct union of the query of
both tables in your client application. this way, you only duplicate
delayed data.

3 - if your data field is big and you can't afford duplicating it,
create an additional table stime_log, but do not store the data field,
only the metadata (imei, date, dtime, stime). so when you want to query by
stime, first query the stime_log, and then query the original log table to
fetch the data field.
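
A sketch of what that metadata-only table might look like (the keyspace name is
a placeholder, and sdate is assumed to be a date bucket derived from stime):

  cqlsh -e "CREATE TABLE myks.stime_log (
    imei ascii,
    sdate ascii,
    stime timestamp,
    dtime timestamp,
    PRIMARY KEY ((imei, sdate), stime))
    WITH CLUSTERING ORDER BY (stime DESC);"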

2015-06-05 18:10 GMT-03:00 Abhishek Singh Bailoo 
abhishek.singh.bai...@gmail.com:

 Hello!

 I have a column family to log in data coming from my GPS devices.

 CREATE TABLE log(
   imei ascii,
   date ascii,
   dtime timestamp,
   data ascii,
   stime timestamp,
   PRIMARY KEY ((imei, date), dtime))
   WITH CLUSTERING ORDER BY (dtime DESC)
 ;

 It is the standard schema for modeling time series data where
 imei is the unique ID associated with each GPS device
 date is the date taken from dtime
 dtime is the date-time coming from the device
 data is all the latitude, longitude etc that the device is sending us
 stime is the date-time stamp of the server

 The reason why I put dtime in the primary key as the clustering column is
 because most of our queries are done on device time. There can be a delay
 of a few minutes to a few hours (or a few days! in rare cases) between
 dtime and stime if the network is not available.

 However, now we want to query on server time as well for the purpose of
 debugging. These queries will be not as common as queries on  device time.
 Say for every 100 queries on dtime there will be just 1 query on stime.

 What options do I have?

 1. Secondary Index - not possible because stime is a timestamp and CQL does
  not allow me to put < or > in the query for a secondary index

 2. Data duplication - I can build another column family where I will index
 by stime but that means I am storing twice as much data. I know everyone
 says that write operations are cheap and storage is cheap but how? If I
 have to buy twice as many machines on AWS EC2 each with their own ephemeral
 storage, then my bill doubles up!

 Any other ideas I can try?

 Many Thanks,
 Abhishek



[RELEASE] Apache Cassandra 2.2.0-rc1 released

2015-06-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.0-rc1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a release candidate[1] on the 2.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you were to
encounter any problem.

Enjoy!

[1]: http://goo.gl/pBjybx (CHANGES.txt)
[2]: http://goo.gl/E1RiHd (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.6 released

2015-06-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.6. We are now calling the 2.1 series stable and suitable for
production.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please
pay attention to the release notes[2] and let us know[3] if you were to
encounter any problem.

Enjoy!

[1]: http://goo.gl/8aR9L2 (CHANGES.txt)
[2]: http://goo.gl/dstU4D (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA