Re: Astyanax returns empty row

2013-01-16 Thread Sávio Teles
I ran the tests with only one machine, so CL_ONE is not the problem. Am I right?

2013/1/15 Hiller, Dean dean.hil...@nrel.gov

 What is your consistency level set to?  If you set it to CL_ONE, you could
 get different results or is your database constant and unchanging?

 Dean

 From: Sávio Teles savio.te...@lupa.inf.ufg.br
 Reply-To: user@cassandra.apache.org
 Date: Tuesday, January 15, 2013 5:43 AM
 To: user@cassandra.apache.org
 Subject: Astyanax returns empty row


 Sometimes Astyanax returns an empty row for a specific key. For example, on
 the first attempt it returns an empty row for that key, but on the second
 attempt it returns the desired row.




-- 
Best regards,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
MSc student in Computer Science - UFG
Software Architect
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG


Re: How many BATCH inserts is too many?

2013-01-16 Thread Alan Ristić
Thanks all for the clarification and your views. Queues and async are definitely
the way to go. Anyway, I'll take the pull+aggregate approach for now; it should
work better for a start. (If someone has the same "follows" app problem, there is
great research here: http://research.yahoo.com/files/sigmod278-silberstein.pdf )

Best regards,
Alan Ristić

w: www.microhint.com -- visit, thanks
f: facebook.com/microhint -- like, thanks
t: twitter.com/microhint_com -- follow, thanks
m: 040 423 688


2013/1/14 Vitalii Tymchyshyn tiv...@gmail.com

 Well, for me it was better to use async operations than batches. That way you
 are not bitten by latency, but can control everything per-operation. You
 will need to support a kind of window though, but this window can be
 quite small, like 10-20 ops.
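
A minimal sketch of that windowed-async pattern in plain Java (illustrative only;
Row and insertAsync() are placeholders for whatever per-row asynchronous write the
client library exposes, not a real client API):

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Future;

    // Keeps at most WINDOW async writes in flight instead of sending one big batch.
    class WindowedWriter {
        private static final int WINDOW = 16;          // "quite small, like 10-20 ops"
        private final Queue<Future<?>> inFlight = new ArrayDeque<Future<?>>();

        void writeAll(Iterable<Row> rows) throws ExecutionException, InterruptedException {
            for (Row row : rows) {
                if (inFlight.size() >= WINDOW) {
                    inFlight.poll().get();              // wait for the oldest write to finish
                }
                inFlight.add(insertAsync(row));         // hypothetical per-row async insert
            }
            while (!inFlight.isEmpty()) {
                inFlight.poll().get();                  // drain the remaining window
            }
        }

        // Placeholders: Row and insertAsync() are assumptions, not a real client API.
        static class Row {}
        private Future<?> insertAsync(Row row) { throw new UnsupportedOperationException(); }
    }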


 2013/1/14 Wei Zhu wz1...@yahoo.com

 Another potential issue is when some failure happens to some of the
 mutations. Are the atomic batches in 1.2 designed to resolve this?

 http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2

 -Wei
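
For reference, a minimal sketch of the 1.2 atomic batch Wei links to, sent as CQL3
over Thrift (illustrative only: it assumes an already-connected, keyspace-bound
Cassandra.Client, and the table and column names are made up):

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Compression;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.utils.ByteBufferUtil;

    class AtomicBatchExample {
        // In 1.2, BEGIN BATCH is atomic by default: the coordinator writes the batch
        // to the batchlog first, so a mid-flight coordinator failure gets replayed
        // rather than leaving only some of the mutations applied.
        static void writeBatch(Cassandra.Client client) throws Exception {
            String cql = "BEGIN BATCH "
                       + "  INSERT INTO users (user_id, name) VALUES ('u1', 'alice'); "
                       + "  INSERT INTO users_by_name (name, user_id) VALUES ('alice', 'u1'); "
                       + "APPLY BATCH;";
            ByteBuffer query = ByteBufferUtil.bytes(cql);
            client.execute_cql3_query(query, Compression.NONE, ConsistencyLevel.QUORUM);
        }
    }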

 - Original Message -
 From: aaron morton aa...@thelastpickle.com
 To: user@cassandra.apache.org
 Sent: Sunday, January 13, 2013 7:57:56 PM
 Subject: Re: How many BATCH inserts is too many?

 With regard to a large number of records in a batch mutation there are
 some potential issues.


 Each row becomes a task in the write thread pool on each replica. If a
 single client sends 1,000 rows in a mutation it will take time for the
 (default) 32 threads in the write pool to work through the mutations. While
 they are doing this other clients / requests will appear to be starved /
 stalled.


 There are also issues with the max message size in thrift and cql over
 thrift.


 IMHO, as a rule of thumb, don't go over a few hundred if you have a high
 number of concurrent writers.
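
A trivial sketch of that rule of thumb, splitting a large set of rows into bounded
batches (illustrative only; RowMutation and sendBatch() stand in for whatever batch
call the client exposes, not a real API):

    import java.util.ArrayList;
    import java.util.List;

    // Sends rows in chunks of at most MAX_BATCH_ROWS instead of one huge mutation.
    class ChunkedBatchWriter {
        private static final int MAX_BATCH_ROWS = 200;   // "a few hundred"

        void writeAll(List<RowMutation> rows) {
            for (int start = 0; start < rows.size(); start += MAX_BATCH_ROWS) {
                int end = Math.min(start + MAX_BATCH_ROWS, rows.size());
                sendBatch(new ArrayList<RowMutation>(rows.subList(start, end)));
            }
        }

        // Placeholders: RowMutation and sendBatch() are assumptions, not a real client API.
        static class RowMutation {}
        private void sendBatch(List<RowMutation> batch) { throw new UnsupportedOperationException(); }
    }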


 Cheers








 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand


 @aaronmorton
 http://www.thelastpickle.com


 On 14/01/2013, at 12:56 AM, Radim Kolar  h...@filez.com  wrote:


 Do not use Cassandra for implementing a queueing system with high
 throughput. It does not scale because of tombstone management. Use HornetQ;
 it's an amazingly fast broker, but it has quite slow persistence if you want to
 create queues significantly larger than your memory and use selectors for
 searching for specific messages in them.

 My point is: for implementing a queue, a message broker is what you want.





 --
 Best regards,
  Vitalii Tymchyshyn



read path, I have missed something

2013-01-16 Thread Carlos Pérez Miguel
Hi,

I am trying to understand the read path in Cassandra. I've read Cassandra's
documentation and it seems that the read path is like this:

- The client contacts a proxy node which performs the operation on a
certain object
- The proxy node sends requests to every replica of that object
- Replica nodes answer eventually if they are up
- After the first R replicas answer, the proxy node returns the value to the
client.
- If some of the replicas are not up to date and read repair is active, the proxy
node updates those replicas.

Ok, so far so good.

But now I found some inconsistencies that I don't understand:

Let's suppose that we have a 5 node cluster: x1, x2, x3, x4 and x5,
with replication factor 3, read_repair_chance=0.0, autobootstrap=false
and caching=NONE.
We have keyspace KS1 and column family CF1.

With this configuration, we know that if any node crashes and erases its
data directories, it will be necessary to run nodetool repair on that node in
order to repair it and gather information from its replica companions.

So, let's suppose that x1, x2 and x3 are the endpoints which store the data
KS1.CF1['data1'].
If x1 crashes (losing all its data), and we execute get KS1.CF1['data1']
with consistency level ALL, the operation will fail. That is OK to my
understanding.

If we restart node x1 and don't execute nodetool repair, and repeat the
operation get KS1.CF1['data1'] using consistency ALL, we will obtain the
original data! Why? One of the nodes doesn't have any data about
KS1.CF1['data1']. OK, let's suppose that as all the required nodes answer,
even if one doesn't have data, the operation ends correctly.

Now let's repeat the same procedure with the rest of nodes, that is:

1- stop x1, erase data, logs, cache and commitlog from x1
2- restart x1 and don't repair it
3- stop x2, erase data, logs, cache and commitlog from x2
4- restart x2 and don't repair it
5- stop x3, erase data, logs, cache and commitlog from x3
6- restart x3 and don't repair it
7- execute get KS1.CF1['data1'] with consistency level ALL - it still returns
the correct data!

Where did that data come from? The endpoints are supposed to be empty of
data. I tried this using cassandra-cli and Cassandra's Ruby client, and the
result is always the same. What did I miss?

Thank you for reading until the end, ;)

Bye

Carlos Pérez Miguel


Pig / Map Reduce on Cassandra

2013-01-16 Thread cscetbon.ext
Hi,

I know the DataStax Enterprise package provides Brisk, but is there a community
version? Is it easy to interface Hadoop with Cassandra as the storage, or do we
absolutely have to use Brisk for that?
I know CassandraFS is natively available in Cassandra 1.2, the version I use,
so is there a way/procedure to interface Hadoop with Cassandra as the storage?

Thanks 
_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
France Telecom - Orange decline toute responsabilite si ce message a ete 
altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, France Telecom - Orange is not liable for messages 
that have been modified, changed or falsified.
Thank you.



Re: Astyanax returns empty row

2013-01-16 Thread Sávio Teles
We have multiple clients reading the same row key. It makes no sense that it
fails on one machine. When we use Thrift directly, Cassandra always returns the
correct result.


2013/1/16 Sávio Teles savio.te...@lupa.inf.ufg.br

 I ran the tests with only one machine, so the CL_ONE is not the problem.
 Am i right?





-- 
Best regards,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
MSc student in Computer Science - UFG
Software Architect
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG


Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread James Schappet
Here are a few examples I have worked on, reading from xml.gz files and then
writing to Cassandra.


https://github.com/jschappet/medline

You will also need:

https://github.com/jschappet/medline-base



These examples are Hadoop Jobs using Cassandra as the Data Store.

This one is a good place to start.
https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/LoadMedline/StartJob.java

        ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE,
            COLUMN_FAMILY);
        ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE,
            outputPath);

        job.setMapperClass(MapperToCassandra.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        LOG.info("Writing output to Cassandra");
        //job.setReducerClass(ReducerToCassandra.class);
        job.setOutputFormatClass(ColumnFamilyOutputFormat.class);

        ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
        //org.apache.cassandra.dht.LocalPartitioner
        ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
        ConfigHelper.setPartitioner(job.getConfiguration(),
            "org.apache.cassandra.dht.RandomPartitioner");
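
As a companion to that job setup, here is a minimal sketch of the kind of map-only
mapper ColumnFamilyOutputFormat consumes: it expects ByteBuffer row keys and
List<Mutation> values (illustrative only, not the medline project's actual mapper;
the row key and column name below are made up):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.Collections;
    import java.util.List;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.Mutation;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits one column per input line as a Cassandra mutation.
    public class LineToCassandraMapper
            extends Mapper<LongWritable, Text, ByteBuffer, List<Mutation>> {

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            Column col = new Column();
            col.setName(ByteBufferUtil.bytes("body"));            // hypothetical column name
            col.setValue(ByteBufferUtil.bytes(line.toString()));
            col.setTimestamp(System.currentTimeMillis() * 1000);  // microseconds

            Mutation mutation = new Mutation();
            mutation.setColumn_or_supercolumn(new ColumnOrSuperColumn().setColumn(col));

            ByteBuffer rowKey = ByteBufferUtil.bytes("line-" + offset.get()); // hypothetical key
            context.write(rowKey, Collections.singletonList(mutation));
        }
    }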






On 1/16/13 7:37 AM, cscetbon@orange.com cscetbon@orange.com
wrote:

Hi,

I know that DataStax Enterprise package provide Brisk, but is there a
community version ? Is it easy to interface Hadoop with Cassandra as the
storage or do we absolutely have to use Brisk for that ?
I know CassandraFS is natively available in cassandra 1.2, the version I
use, so is there a way/procedure to interface hadoop with Cassandra as
the storage ?

Thanks 





Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread cscetbon.ext
I don't want to write to Cassandra, as it replicates data from another
datacenter; I just want to use Hadoop jobs (Pig and Hive) to read data from
it. I would like to use the same configuration as
http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-cluster but
I want to know if there are alternatives to the DataStax Enterprise package.

Thanks
On Jan 16, 2013, at 3:59 PM, James Schappet jschap...@gmail.com wrote:

 Here are a few examples I have worked on, reading from xml.gz files then
 writing to cassandara.
 
 
 https://github.com/jschappet/medline
 
 You will also need:
 
 https://github.com/jschappet/medline-base
 
 
 
 These examples are Hadoop Jobs using Cassandra as the Data Store.
 
 This one is a good place to start.
 https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/ic
 ts/jobs/LoadMedline/StartJob.java
 
 ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE,
 COLUMN_FAMILY);
   ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE,
 outputPath);
 
job.setMapperClass(MapperToCassandra.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
 
   LOG.info(Writing output to Cassandra);
   //job.setReducerClass(ReducerToCassandra.class);
   job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
 
ConfigHelper.setRpcPort(job.getConfiguration(), 9160);
//org.apache.cassandra.dht.LocalPartitioner
ConfigHelper.setInitialAddress(job.getConfiguration(),
 localhost);
ConfigHelper.setPartitioner(job.getConfiguration(),
 org.apache.cassandra.dht.RandomPartitioner);
 
 
 
 
 
 
 On 1/16/13 7:37 AM, cscetbon@orange.com cscetbon@orange.com
 wrote:
 
 Hi,
 
 I know that DataStax Enterprise package provide Brisk, but is there a
 community version ? Is it easy to interface Hadoop with Cassandra as the
 storage or do we absolutely have to use Brisk for that ?
 I know CassandraFS is natively available in cassandra 1.2, the version I
 use, so is there a way/procedure to interface hadoop with Cassandra as
 the storage ?
 
 Thanks 
 
 
 


_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
France Telecom - Orange decline toute responsabilite si ce message a ete 
altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, France Telecom - Orange is not liable for messages 
that have been modified, changed or falsified.
Thank you.



Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread James Schappet
Try this one then; it reads from Cassandra, then writes back to Cassandra,
but you could change the write to wherever you would like.



        getConf().set(IN_COLUMN_NAME, columnName);

        Job job = new Job(getConf(), "ProcessRawXml");
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        job.setNumReduceTasks(0);

        job.setJarByClass(StartJob.class);
        job.setMapperClass(ParseMapper.class);
        job.setOutputKeyClass(ByteBuffer.class);
        //job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(ColumnFamilyOutputFormat.class);

        ConfigHelper.setOutputColumnFamily(job.getConfiguration(),
            KEYSPACE, COLUMN_FAMILY);
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
        //org.apache.cassandra.dht.LocalPartitioner
        ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
        ConfigHelper.setPartitioner(job.getConfiguration(),
            "org.apache.cassandra.dht.RandomPartitioner");
        ConfigHelper.setInputColumnFamily(job.getConfiguration(),
            KEYSPACE, COLUMN_FAMILY);

        SlicePredicate predicate = new
            SlicePredicate().setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
        //  SliceRange slice_range = new SliceRange();
        //  slice_range.setStart(ByteBufferUtil.bytes(startPoint));
        //  slice_range.setFinish(ByteBufferUtil.bytes(endPoint));
        //
        //  predicate.setSlice_range(slice_range);
        ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

        job.waitForCompletion(true);


https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/ProcessXml/StartJob.java








On 1/16/13 9:22 AM, cscetbon@orange.com cscetbon@orange.com
wrote:

I don't want to write to Cassandra as it replicates data from another
datacenter, but I just want to use Hadoop Jobs (Pig and Hive) to read
data from it. I would like to use the same configuration as
http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-cluster
 but I want to know if there are alternatives to DataStax Enterprise
package.

Thanks
On Jan 16, 2013, at 3:59 PM, James Schappet jschap...@gmail.com wrote:

 Here are a few examples I have worked on, reading from xml.gz files then
 writing to cassandara.
 
 
 https://github.com/jschappet/medline
 
 You will also need:
 
 https://github.com/jschappet/medline-base
 
 
 
 These examples are Hadoop Jobs using Cassandra as the Data Store.
 
 This one is a good place to start.
 
https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/
ic
 ts/jobs/LoadMedline/StartJob.java
 
 ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE,
 COLUMN_FAMILY);
  ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE,
 outputPath);
 
job.setMapperClass(MapperToCassandra.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
 
  LOG.info(Writing output to Cassandra);
  //job.setReducerClass(ReducerToCassandra.class);
  job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
 
ConfigHelper.setRpcPort(job.getConfiguration(), 9160);
//org.apache.cassandra.dht.LocalPartitioner
ConfigHelper.setInitialAddress(job.getConfiguration(),
 localhost);
ConfigHelper.setPartitioner(job.getConfiguration(),
 org.apache.cassandra.dht.RandomPartitioner);
 
 
 
 
 
 
 On 1/16/13 7:37 AM, cscetbon@orange.com cscetbon@orange.com
 wrote:
 
 Hi,
 
 I know that DataStax Enterprise package provide Brisk, but is there a
 community version ? Is it easy to interface Hadoop with Cassandra as
the
 storage or do we absolutely have to use Brisk for that ?
 I know CassandraFS is natively available in cassandra 1.2, the version
I
 use, so is there a way/procedure to interface hadoop with Cassandra as
 the storage ?
 
 Thanks 
 


Cassandra 1.1.2 - 1.1.8 upgrade

2013-01-16 Thread Mike

Hello,

We are looking to upgrade our Cassandra cluster from 1.1.2 to 1.1.8 (or 
possibly 1.1.9 depending on timing).  It is my understanding that rolling 
upgrades of Cassandra are supported, so as we upgrade our cluster, we can do 
so one node at a time without experiencing downtime.


Has anyone had any gotchas recently that I should be aware of before 
performing this upgrade?


In order to upgrade, is the only thing that needs to change the JAR 
files?  Can everything else remain as-is?


Thanks,
-Mike


Re: read path, I have missed something

2013-01-16 Thread Sylvain Lebresne
You're missing the correct definition of read_repair_chance.

When you do a read at CL.ALL, all replicas are waited upon and the results
from all those replicas are compared. From that, we can extract which nodes
are not up to date, i.e. which ones can be read repaired. And if some node
needs to be repaired, we do it. Always, whatever the value of
read_repair_chance is.

Now if you do a read at CL.ONE, if you only end up querying 1 replica, you
will never be able to do read repair. That's where read_repair_chance comes
into play. What it really controls is how often we query *more* replicas
than strictly required by the consistency level. And it happens that the
reason you would want to do that is read repair, hence the option name. But
read repair potentially kicks in anytime more than one replica answers a
query. One corollary is that read_repair_chance has no impact
whatsoever at CL.ALL.
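
To make that concrete, here is a deliberately simplified illustration of the
coordinator-side choice Sylvain describes (plain Java pseudologic, not actual
Cassandra source; blockFor is the number of replicas the consistency level waits
for, e.g. 1 for ONE and RF for ALL):

    import java.util.Random;

    // Simplified illustration of how read_repair_chance influences replica selection.
    class ReplicaSelection {
        private static final Random RANDOM = new Random();

        static int replicasToQuery(int replicationFactor, int blockFor, double readRepairChance) {
            // CL.ALL already queries every replica, so read_repair_chance changes nothing there.
            if (blockFor >= replicationFactor) {
                return replicationFactor;
            }
            // Otherwise, with probability read_repair_chance, query *all* replicas instead of
            // just the minimum; the extra responses can then be compared and repaired.
            if (RANDOM.nextDouble() < readRepairChance) {
                return replicationFactor;
            }
            return blockFor;
        }
    }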

--
Sylvain


On Wed, Jan 16, 2013 at 1:55 PM, Carlos Pérez Miguel cperez...@gmail.com wrote:

 Hi,

 I am trying to understand the read path in Cassandra. I've read
 Cassandra's documentation and it seems that the read path is like this:

 - Client contacts with a proxy node which performs the operation over
 certain object
 - Proxy node sends requests to every replica of that object
 - Replica nodes answers eventually if they are up
 - After the first R replicas answer, the proxy node returns value to the
 client.
 - If some of the replicas are non updated and readrepair is active, proxy
 node updates those replicas.

 Ok, so far so good.

 But now I found some incoherences that I don't understand:

 Let's suppose that we have a 5 node cluster: x1, x2, x3, x4 and x5
 each with replication factor 3, read_repair_chance=0.0,
 autobootstrap=false and caching=NONE
 We have keyspace KS1 and colunfamily CF1.

 With this configuration, we know that if any node crashes and erases its
 data directories it will be necesary to run nodetool repair in that node in
 order to repair that node and gather information from its replica
 companions.

 So, let's suppose that x1, x2 and x3 are the endpoint which stores the
 data KS1.CF1['data1']
 If x1 crashes (loosing all its data), and we execute get KS1.CF1['data1']
 with consistency level ALL, the operation will fail. That is ok to my
 understanding.

 If we restart x1 node and doesn't execute nodetool repair and repeat the
 operation get KS1.CF1['data1'] using consistency ALL, we will obtain the
 original data! Why? one of the nodes doesn't have any data about
 KS1.CF1['data1']. Ok, let's suppose that as all the required nodes answer,
 even if one doesn't have data, the operation ends correctly.

 Now let's repeat the same procedure with the rest of nodes, that is:

 1- stop x1, erase data, logs, cache and commitlog from x1
 2- restart x1 adn don't repair it
 3- stop x2, erase data, logs, cache and commitlog from x2
 4- restart x2 adn don't repair it
 5- stop x3, erase data, logs, cache and commitlog from x3
 6- restart x3 adn don't repair it
 7- execute get KS1.CF1['data1'] with consistency level ALL - still return
 the correct data!

 Where did that data come from? the endpoint is supposed to be empty of
 data. I tried this using cassandra-cli and cassandra's ruby client and the
 result is always the same. What did I miss?

 Thank you for reading until the end, ;)

 Bye

 Carlos Pérez Miguel



Re: Cassandra 1.1.2 - 1.1.8 upgrade

2013-01-16 Thread Jason Wee
Always check NEWS.txt. For instance, for Cassandra 1.1.3 you need to
run nodetool upgradesstables if your CF has counters.


On Wed, Jan 16, 2013 at 11:58 PM, Mike mthero...@yahoo.com wrote:

 Hello,

 We are looking to upgrade our Cassandra cluster from 1.1.2 - 1.1.8 (or
 possibly 1.1.9 depending on timing).  It is my understanding that rolling
 upgrades of Cassandra is supported, so as we upgrade our cluster, we can do
 so one node at a time without experiencing downtime.

 Has anyone had any gotchas recently that I should be aware of before
 performing this upgrade?

 In order to upgrade, is the only thing that needs to change are the JAR
 files?  Can everything remain as-is?

 Thanks,
 -Mike



Re: Cassandra 1.1.2 - 1.1.8 upgrade

2013-01-16 Thread Mike

Thanks for pointing that out.

Given upgradesstables can only be run on a live node, does anyone know 
if there is a danger of having this node in the cluster while this is 
being performed?  Also, can anyone confirm this only needs to be done on 
counter column families, or on all column families (the former 
makes sense, I'm just making sure)?


-Mike

On 1/16/2013 11:08 AM, Jason Wee wrote:
always check NEWS.txt for instance for cassandra 1.1.3 you need to 
run nodetool upgradesstables if your cf has counter.



On Wed, Jan 16, 2013 at 11:58 PM, Mike mthero...@yahoo.com 
mailto:mthero...@yahoo.com wrote:


Hello,

We are looking to upgrade our Cassandra cluster from 1.1.2 -
1.1.8 (or possibly 1.1.9 depending on timing).  It is my
understanding that rolling upgrades of Cassandra is supported, so
as we upgrade our cluster, we can do so one node at a time without
experiencing downtime.

Has anyone had any gotchas recently that I should be aware of
before performing this upgrade?

In order to upgrade, is the only thing that needs to change are
the JAR files?  Can everything remain as-is?

Thanks,
-Mike






Re: read path, I have missed something

2013-01-16 Thread Carlos Pérez Miguel
Ah, OK. Now I understand where the data came from: when using CL.ALL,
read repair always repairs inconsistent data.

Thanks a lot, Sylvain.


Carlos Pérez Miguel


2013/1/17 Sylvain Lebresne sylv...@datastax.com

 You're missing the correct definition of read_repair_chance.

 When you do a read at CL.ALL, all replicas are wait upon and the results
 from all those replicas are compared. From that, we can extract which nodes
 are not up to date, i.e. which ones can be read repair. And if some node
 need to be repair, we do it. Always, whatever the value of
 read_repair_chance is.

 Now if you do a read at CL.ONE, if you only end up querying 1 replica, you
 will never be able to do read repair. That's where read_repair_chance come
 into play. What it really control, is how often we query *more* replica
 than strictly required by the consistency level. And it happens that the
 reason you would want to do that is because of read repair and hence the
 option name. But read repair potentially kicks in anytime more than replica
 answer a query. One corollary is that read_repair_chance has no impact
 whatsoever at CL.ALL.

 --
 Sylvain


 On Wed, Jan 16, 2013 at 1:55 PM, Carlos Pérez Miguel 
 cperez...@gmail.comwrote:

 Hi,

 I am trying to understand the read path in Cassandra. I've read
 Cassandra's documentation and it seems that the read path is like this:

 - Client contacts with a proxy node which performs the operation over
 certain object
 - Proxy node sends requests to every replica of that object
 - Replica nodes answers eventually if they are up
 - After the first R replicas answer, the proxy node returns value to the
 client.
 - If some of the replicas are non updated and readrepair is active, proxy
 node updates those replicas.

 Ok, so far so good.

 But now I found some incoherences that I don't understand:

 Let's suppose that we have a 5 node cluster: x1, x2, x3, x4 and x5
 each with replication factor 3, read_repair_chance=0.0,
 autobootstrap=false and caching=NONE
 We have keyspace KS1 and colunfamily CF1.

 With this configuration, we know that if any node crashes and erases its
 data directories it will be necesary to run nodetool repair in that node in
 order to repair that node and gather information from its replica
 companions.

 So, let's suppose that x1, x2 and x3 are the endpoint which stores the
 data KS1.CF1['data1']
 If x1 crashes (loosing all its data), and we execute get KS1.CF1['data1']
 with consistency level ALL, the operation will fail. That is ok to my
 understanding.

 If we restart x1 node and doesn't execute nodetool repair and repeat the
 operation get KS1.CF1['data1'] using consistency ALL, we will obtain the
 original data! Why? one of the nodes doesn't have any data about
 KS1.CF1['data1']. Ok, let's suppose that as all the required nodes answer,
 even if one doesn't have data, the operation ends correctly.

 Now let's repeat the same procedure with the rest of nodes, that is:

 1- stop x1, erase data, logs, cache and commitlog from x1
 2- restart x1 adn don't repair it
 3- stop x2, erase data, logs, cache and commitlog from x2
 4- restart x2 adn don't repair it
 5- stop x3, erase data, logs, cache and commitlog from x3
 6- restart x3 adn don't repair it
 7- execute get KS1.CF1['data1'] with consistency level ALL - still
 return the correct data!

 Where did that data come from? the endpoint is supposed to be empty of
 data. I tried this using cassandra-cli and cassandra's ruby client and the
 result is always the same. What did I miss?

 Thank you for reading until the end, ;)

 Bye

 Carlos Pérez Miguel





Re: Cassandra 1.1.2 - 1.1.8 upgrade

2013-01-16 Thread Michael Kjellman
upgradesstables is safe, but it is essentially a compaction (because sstables are 
immutable, it rewrites each sstable in the new format), so you'll want to do it 
when traffic is low to avoid IO issues.

upgradesstables always needs to be done between majors. While 1.1.2 - 1.1.8 is 
not a major, due to an unforeseen bug in the conversion to microseconds you'll 
need to run upgradesstables.

You can check whether all of your sstables have been upgraded by looking at their 
file names. Your current files should be -hd-; 1.1.8 files will be -hf-.

I don't remember there being changes to cassandra.yaml between 1.1.2 and 1.1.7, 
but you might want to check out a clean copy and compare it to your yaml to make 
sure none of the recommended defaults have changed and no new config options are 
required.

Otherwise: nodetool drain, stop the service, upgrade to the new release, start 
the service, then run upgradesstables.

Hinted handoff should take care of anything missed while the node is down, but if 
you want to be extra safe you can do a repair -pr on every node; you should be 
doing that on a regular basis anyway!

Hope this helps.

Best,
michael

From: Mike mthero...@yahoo.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, January 16, 2013 8:15 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra 1.1.2 - 1.1.8 upgrade

Thanks for pointing that out.

Given upgradesstables can only be run on a live node, does anyone know if there 
is a danger of having this node in the cluster while this is being performed?  
Also, can anyone confirm this only needs to be done on counter counter column 
families, or all column families (the former makes sense, I'm just making sure).

-Mike

On 1/16/2013 11:08 AM, Jason Wee wrote:
always check NEWS.txt for instance for cassandra 1.1.3 you need to run nodetool 
upgradesstables if your cf has counter.


On Wed, Jan 16, 2013 at 11:58 PM, Mike 
mthero...@yahoo.commailto:mthero...@yahoo.com wrote:
Hello,

We are looking to upgrade our Cassandra cluster from 1.1.2 - 1.1.8 (or 
possibly 1.1.9 depending on timing).  It is my understanding that rolling 
upgrades of Cassandra is supported, so as we upgrade our cluster, we can do so 
one node at a time without experiencing downtime.

Has anyone had any gotchas recently that I should be aware of before performing 
this upgrade?

In order to upgrade, is the only thing that needs to change are the JAR files?  
Can everything remain as-is?

Thanks,
-Mike




Re: read path, I have missed something

2013-01-16 Thread Renato Marroquín Mogrovejo
Hi there,

I am sorry to get into this thread with more questions, but isn't the
gossip protocol in charge of triggering read repair automatically
anytime a node comes back into the ring? I mean, if a node is down and then
we get that node up and running again, wouldn't it be synchronized
automatically?
Thanks!


Renato M.

2013/1/16 Carlos Pérez Miguel cperez...@gmail.com:
 a, ok. Now I understand where the data came from. When using CL.ALL
 read_repair always repairs inconsistent data.

 Thanks a lot, Sylvain.


 Carlos Pérez Miguel


 2013/1/17 Sylvain Lebresne sylv...@datastax.com

 You're missing the correct definition of read_repair_chance.

 When you do a read at CL.ALL, all replicas are wait upon and the results
 from all those replicas are compared. From that, we can extract which nodes
 are not up to date, i.e. which ones can be read repair. And if some node
 need to be repair, we do it. Always, whatever the value of
 read_repair_chance is.

 Now if you do a read at CL.ONE, if you only end up querying 1 replica, you
 will never be able to do read repair. That's where read_repair_chance come
 into play. What it really control, is how often we query *more* replica than
 strictly required by the consistency level. And it happens that the reason
 you would want to do that is because of read repair and hence the option
 name. But read repair potentially kicks in anytime more than replica answer
 a query. One corollary is that read_repair_chance has no impact whatsoever
 at CL.ALL.

 --
 Sylvain


 On Wed, Jan 16, 2013 at 1:55 PM, Carlos Pérez Miguel cperez...@gmail.com
 wrote:

 Hi,

 I am trying to understand the read path in Cassandra. I've read
 Cassandra's documentation and it seems that the read path is like this:

 - Client contacts with a proxy node which performs the operation over
 certain object
 - Proxy node sends requests to every replica of that object
 - Replica nodes answers eventually if they are up
 - After the first R replicas answer, the proxy node returns value to the
 client.
 - If some of the replicas are non updated and readrepair is active, proxy
 node updates those replicas.

 Ok, so far so good.

 But now I found some incoherences that I don't understand:

 Let's suppose that we have a 5 node cluster: x1, x2, x3, x4 and x5
 each with replication factor 3, read_repair_chance=0.0,
 autobootstrap=false and caching=NONE
 We have keyspace KS1 and colunfamily CF1.

 With this configuration, we know that if any node crashes and erases its
 data directories it will be necesary to run nodetool repair in that node in
 order to repair that node and gather information from its replica
 companions.

 So, let's suppose that x1, x2 and x3 are the endpoint which stores the
 data KS1.CF1['data1']
 If x1 crashes (loosing all its data), and we execute get KS1.CF1['data1']
 with consistency level ALL, the operation will fail. That is ok to my
 understanding.

 If we restart x1 node and doesn't execute nodetool repair and repeat the
 operation get KS1.CF1['data1'] using consistency ALL, we will obtain the
 original data! Why? one of the nodes doesn't have any data about
 KS1.CF1['data1']. Ok, let's suppose that as all the required nodes answer,
 even if one doesn't have data, the operation ends correctly.

 Now let's repeat the same procedure with the rest of nodes, that is:

 1- stop x1, erase data, logs, cache and commitlog from x1
 2- restart x1 adn don't repair it
 3- stop x2, erase data, logs, cache and commitlog from x2
 4- restart x2 adn don't repair it
 5- stop x3, erase data, logs, cache and commitlog from x3
 6- restart x3 adn don't repair it
 7- execute get KS1.CF1['data1'] with consistency level ALL - still
 return the correct data!

 Where did that data come from? the endpoint is supposed to be empty of
 data. I tried this using cassandra-cli and cassandra's ruby client and the
 result is always the same. What did I miss?

 Thank you for reading until the end, ;)

 Bye

 Carlos Pérez Miguel





Re: How can OpsCenter show me Read Request Latency where there are no read requests??

2013-01-16 Thread Tyler Hobbs
When you view OpsCenter metrics, you're generating a small number of reads
to fetch the metric data, which is why your read count is near zero instead
of actually being zero.  Since reads are still occurring, Cassandra will
continue to show a read latency.  Basically, you're just viewing the
latency on the reads to fetch metric data.

Normally the number of reads required to view metrics are small enough that
they only make a minor difference in your overall read latency average, but
when you have no other reads occurring, they're the only reads that are
included in the average.
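
A toy illustration of that averaging effect, with made-up numbers (not OpsCenter's
actual sampling):

    class LatencyAverageExample {
        public static void main(String[] args) {
            // With application traffic: 10,000 reads at ~300 us plus 5 metric reads at ~2,000 us.
            double busy = (10000 * 300.0 + 5 * 2000.0) / (10000 + 5);
            // With no application traffic: only the 5 metric reads contribute to the average.
            double idle = (5 * 2000.0) / 5;
            System.out.printf("busy avg: %.1f us, idle avg: %.1f us%n", busy, idle);  // ~300.8 vs 2000.0
        }
    }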


On Tue, Jan 15, 2013 at 9:28 PM, Brian Tarbox tar...@cabotresearch.com wrote:

 I am making heavy use of DataStax OpsCenter to help tune my system and it's
 great.

 And yet it's puzzling.  I see my clients do a burst of reads, causing the
 OpsCenter Read Requests chart to go up and stay up until the clients finish
 doing their reads.  The read request latency chart also goes up, but it
 stays up even after all the reads are done.  At last glance I've had next
 to zero reads for 10 minutes but still have a read request latency that's
 basically unchanged from when there were actual reads.

 How am I to interpret this?

 Thanks.

 Brian Tarbox




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: read path, I have missed something

2013-01-16 Thread Sylvain Lebresne

 I mean if a node is down, then
 we get that node up and running again, wouldn't it be synchronized
 automatically?


It will, thanks to hinted handoff (not gossip; gossip only handles the ring
topology and a bunch of metadata, it doesn't deal with data synchronization
at all). But hinted handoff is not bulletproof (if only because hints
expire after some time if they are not delivered). And you're
right, that's probably why Carlos' example worked as he observed it,
especially since he didn't mention reads between his stop/erase/restart
steps. Anyway, my description of read_repair_chance is still correct if
someone wonders about that :)

--
Sylvain



 Thanks!


 Renato M.

 2013/1/16 Carlos Pérez Miguel cperez...@gmail.com:
  a, ok. Now I understand where the data came from. When using CL.ALL
  read_repair always repairs inconsistent data.
 
  Thanks a lot, Sylvain.
 
 
  Carlos Pérez Miguel
 
 
  2013/1/17 Sylvain Lebresne sylv...@datastax.com
 
  You're missing the correct definition of read_repair_chance.
 
  When you do a read at CL.ALL, all replicas are wait upon and the results
  from all those replicas are compared. From that, we can extract which
 nodes
  are not up to date, i.e. which ones can be read repair. And if some node
  need to be repair, we do it. Always, whatever the value of
  read_repair_chance is.
 
  Now if you do a read at CL.ONE, if you only end up querying 1 replica,
 you
  will never be able to do read repair. That's where read_repair_chance
 come
  into play. What it really control, is how often we query *more* replica
 than
  strictly required by the consistency level. And it happens that the
 reason
  you would want to do that is because of read repair and hence the option
  name. But read repair potentially kicks in anytime more than replica
 answer
  a query. One corollary is that read_repair_chance has no impact
 whatsoever
  at CL.ALL.
 
  --
  Sylvain
 
 
  On Wed, Jan 16, 2013 at 1:55 PM, Carlos Pérez Miguel 
 cperez...@gmail.com
  wrote:
 
  Hi,
 
  I am trying to understand the read path in Cassandra. I've read
  Cassandra's documentation and it seems that the read path is like this:
 
  - Client contacts with a proxy node which performs the operation over
  certain object
  - Proxy node sends requests to every replica of that object
  - Replica nodes answers eventually if they are up
  - After the first R replicas answer, the proxy node returns value to
 the
  client.
  - If some of the replicas are non updated and readrepair is active,
 proxy
  node updates those replicas.
 
  Ok, so far so good.
 
  But now I found some incoherences that I don't understand:
 
  Let's suppose that we have a 5 node cluster: x1, x2, x3, x4 and x5
  each with replication factor 3, read_repair_chance=0.0,
  autobootstrap=false and caching=NONE
  We have keyspace KS1 and colunfamily CF1.
 
  With this configuration, we know that if any node crashes and erases
 its
  data directories it will be necesary to run nodetool repair in that
 node in
  order to repair that node and gather information from its replica
  companions.
 
  So, let's suppose that x1, x2 and x3 are the endpoint which stores the
  data KS1.CF1['data1']
  If x1 crashes (loosing all its data), and we execute get
 KS1.CF1['data1']
  with consistency level ALL, the operation will fail. That is ok to my
  understanding.
 
  If we restart x1 node and doesn't execute nodetool repair and repeat
 the
  operation get KS1.CF1['data1'] using consistency ALL, we will obtain
 the
  original data! Why? one of the nodes doesn't have any data about
  KS1.CF1['data1']. Ok, let's suppose that as all the required nodes
 answer,
  even if one doesn't have data, the operation ends correctly.
 
  Now let's repeat the same procedure with the rest of nodes, that is:
 
  1- stop x1, erase data, logs, cache and commitlog from x1
  2- restart x1 adn don't repair it
  3- stop x2, erase data, logs, cache and commitlog from x2
  4- restart x2 adn don't repair it
  5- stop x3, erase data, logs, cache and commitlog from x3
  6- restart x3 adn don't repair it
  7- execute get KS1.CF1['data1'] with consistency level ALL - still
  return the correct data!
 
  Where did that data come from? the endpoint is supposed to be empty of
  data. I tried this using cassandra-cli and cassandra's ruby client and
 the
  result is always the same. What did I miss?
 
  Thank you for reading until the end, ;)
 
  Bye
 
  Carlos Pérez Miguel
 
 




Re: How can OpsCenter show me Read Request Latency where there are no read requests??

2013-01-16 Thread Brian Tarbox
Hmm, that makes sense, but then why is the latency for the reads that fetch the
metric often so high (several thousand µs), and why does it so closely
track the latency of my normal reads?


On Wed, Jan 16, 2013 at 12:14 PM, Tyler Hobbs ty...@datastax.com wrote:

 When you view OpsCenter metrics, you're generating a small number of reads
 to fetch the metric data, which is why your read count is near zero instead
 of actually being zero.  Since reads are still occurring, Cassandra will
 continue to show a read latency.  Basically, you're just viewing the
 latency on the reads to fetch metric data.

 Normally the number of reads required to view metrics are small enough
 that they only make a minor difference in your overall read latency
 average, but when you have no other reads occurring, they're the only reads
 that are included in the average.


 On Tue, Jan 15, 2013 at 9:28 PM, Brian Tarbox tar...@cabotresearch.comwrote:

 I am making heavy use of DataStax OpsCenter to help tune my system and
 its great.

 And yet puzzling.  I see my clients do a burst of Reads causing the
 OpsCenter Read Requests chart to go up and stay up until the clients finish
 doing their reads.  The read request latency chart also goes upbut it
 stays up even after all the reads are done.  At last glance I've had next
 to zero reads for 10 minutes but still have a read request latency thats
 basically unchanged from when there were actual reads.

 How am I to interpret this?

 Thanks.

 Brian Tarbox




 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: read path, I have missed something

2013-01-16 Thread Renato Marroquín Mogrovejo
Thanks for the explanation Sylvain!

2013/1/16 Sylvain Lebresne sylv...@datastax.com:
 I mean if a node is down, then
 we get that node up and running again, wouldn't it be synchronized
 automatically?


 It will, thanks to hinted handoff (not gossip, gossip only handle the ring
 topology and a bunch of metadata, it doesn't deal with data synchronization
 at all). But hinted handoff are not bulletproof (if only because hinted
 handoff expire after some time if they are not delivered). And you're right,
 that's probably why Carlos' example worked as he observed it, especially
 since he didn't mentioned reads between his stop/erase/restart steps.
 Anyway, my description of read_repair_chance is still correct if someone
 wonder about that :)

 --
 Sylvain



 Thanks!


 Renato M.

 2013/1/16 Carlos Pérez Miguel cperez...@gmail.com:
  a, ok. Now I understand where the data came from. When using CL.ALL
  read_repair always repairs inconsistent data.
 
  Thanks a lot, Sylvain.
 
 
  Carlos Pérez Miguel
 
 
  2013/1/17 Sylvain Lebresne sylv...@datastax.com
 
  You're missing the correct definition of read_repair_chance.
 
  When you do a read at CL.ALL, all replicas are wait upon and the
  results
  from all those replicas are compared. From that, we can extract which
  nodes
  are not up to date, i.e. which ones can be read repair. And if some
  node
  need to be repair, we do it. Always, whatever the value of
  read_repair_chance is.
 
  Now if you do a read at CL.ONE, if you only end up querying 1 replica,
  you
  will never be able to do read repair. That's where read_repair_chance
  come
  into play. What it really control, is how often we query *more* replica
  than
  strictly required by the consistency level. And it happens that the
  reason
  you would want to do that is because of read repair and hence the
  option
  name. But read repair potentially kicks in anytime more than replica
  answer
  a query. One corollary is that read_repair_chance has no impact
  whatsoever
  at CL.ALL.
 
  --
  Sylvain
 
 
  On Wed, Jan 16, 2013 at 1:55 PM, Carlos Pérez Miguel
  cperez...@gmail.com
  wrote:
 
  Hi,
 
  I am trying to understand the read path in Cassandra. I've read
  Cassandra's documentation and it seems that the read path is like
  this:
 
  - Client contacts with a proxy node which performs the operation over
  certain object
  - Proxy node sends requests to every replica of that object
  - Replica nodes answers eventually if they are up
  - After the first R replicas answer, the proxy node returns value to
  the
  client.
  - If some of the replicas are non updated and readrepair is active,
  proxy
  node updates those replicas.
 
  Ok, so far so good.
 
  But now I found some incoherences that I don't understand:
 
  Let's suppose that we have a 5 node cluster: x1, x2, x3, x4 and x5
  each with replication factor 3, read_repair_chance=0.0,
  autobootstrap=false and caching=NONE
  We have keyspace KS1 and colunfamily CF1.
 
  With this configuration, we know that if any node crashes and erases
  its
  data directories it will be necesary to run nodetool repair in that
  node in
  order to repair that node and gather information from its replica
  companions.
 
  So, let's suppose that x1, x2 and x3 are the endpoint which stores the
  data KS1.CF1['data1']
  If x1 crashes (loosing all its data), and we execute get
  KS1.CF1['data1']
  with consistency level ALL, the operation will fail. That is ok to my
  understanding.
 
  If we restart x1 node and doesn't execute nodetool repair and repeat
  the
  operation get KS1.CF1['data1'] using consistency ALL, we will obtain
  the
  original data! Why? one of the nodes doesn't have any data about
  KS1.CF1['data1']. Ok, let's suppose that as all the required nodes
  answer,
  even if one doesn't have data, the operation ends correctly.
 
  Now let's repeat the same procedure with the rest of nodes, that is:
 
  1- stop x1, erase data, logs, cache and commitlog from x1
  2- restart x1 adn don't repair it
  3- stop x2, erase data, logs, cache and commitlog from x2
  4- restart x2 adn don't repair it
  5- stop x3, erase data, logs, cache and commitlog from x3
  6- restart x3 adn don't repair it
  7- execute get KS1.CF1['data1'] with consistency level ALL - still
  return the correct data!
 
  Where did that data come from? the endpoint is supposed to be empty of
  data. I tried this using cassandra-cli and cassandra's ruby client and
  the
  result is always the same. What did I miss?
 
  Thank you for reading until the end, ;)
 
  Bye
 
  Carlos Pérez Miguel
 
 




trying to use row_cache (b/c we have hot rows) but nodetool info says zero requests

2013-01-16 Thread Brian Tarbox
We have quite wide rows and do a lot of concentrated processing on each
row... so I thought I'd try the row cache on one node in my cluster to see
if I could detect an effect of using it.

The problem is that nodetool info says that even with a two gig row cache
we're getting zero requests.  Since my client program is actively
processing, and since the key cache shows lots of activity, I'm puzzled.

Shouldn't any read of a column cause the entire row to be loaded?

My entire data file is only 32 gig right now, so it's hard to imagine the 2
gig is too small to hold even a single row.

Any suggestions how to proceed are appreciated.

Thanks.

Brian Tarbox


unsubscribe

2013-01-16 Thread Leonid Ilyevsky


Leonid Ilyevsky
Moon Capital Management, LP
499 Park Avenue
New York, NY 10022
P: (212) 652-4586
F: (212) 652-4501
E: lilyev...@mooncapital.com





Re: unsubscribe

2013-01-16 Thread Michael Kjellman
Writing to the list:         user@cassandra.apache.org
Subscription address:        user-subscr...@cassandra.apache.org
Digest subscription address: user-digest-subscr...@cassandra.apache.org
Unsubscription address:      user-unsubscr...@cassandra.apache.org
Getting help with the list:  user-h...@cassandra.apache.org

From: Leonid Ilyevsky lilyev...@mooncapital.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, January 16, 2013 10:39 AM
To: user@cassandra.apache.org
Subject: unsubscribe



Leonid Ilyevsky
Moon Capital Management, LP
499 Park Avenue
New York, NY 10022
P: (212) 652-4586
F: (212) 652-4501
E: lilyev...@mooncapital.commailto:lilyev...@mooncapital.com





Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread Michael Kjellman
Brisk is pretty much stagnant. I think someone forked it to work with 1.0,
but I'm not sure how that is going. You'll need to pay for DSE to get CFS
(which is essentially Brisk) if you want to use any modern version of C*.

Best,
Michael

On 1/16/13 11:17 AM, cscetbon@orange.com cscetbon@orange.com
wrote:

Thanks, I understand that your code uses the Hadoop interface of Cassandra
to read from it with a job. However, I would like to know how
to bring the pieces (Hive + Pig + Hadoop) together with Cassandra as the
storage layer, not to get code to test it. I have found the repository
https://github.com/riptano/brisk which might be a good start for it.

Regards 

On Jan 16, 2013, at 4:27 PM, James Schappet jschap...@gmail.com wrote:

 Try this one then, it reads from cassandra, then writes back to
cassandra,
 but you could change the write to where ever you would like.
 
 
 
   getConf().set(IN_COLUMN_NAME, columnName );
 
  Job job = new Job(getConf(), ProcessRawXml);
job.setInputFormatClass(ColumnFamilyInputFormat.class);
  job.setNumReduceTasks(0);
 
job.setJarByClass(StartJob.class);
job.setMapperClass(ParseMapper.class);
job.setOutputKeyClass(ByteBuffer.class);
//job.setOutputValueClass(Text.class);
  job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
 
ConfigHelper.setOutputColumnFamily(job.getConfiguration(),
 KEYSPACE, COLUMN_FAMILY);
job.setInputFormatClass(ColumnFamilyInputFormat.class);
ConfigHelper.setRpcPort(job.getConfiguration(), 9160);
//org.apache.cassandra.dht.LocalPartitioner
  ConfigHelper.setInitialAddress(job.getConfiguration(),
 localhost);
  ConfigHelper.setPartitioner(job.getConfiguration(),
 org.apache.cassandra.dht.RandomPartitioner);
  ConfigHelper.setInputColumnFamily(job.getConfiguration(),
 KEYSPACE, COLUMN_FAMILY);
 
 
  SlicePredicate predicate = new
 
SlicePredicate().setColumn_names(Arrays.asList(ByteBufferUtil.bytes(colum
nN
 ame)));
 //   SliceRange slice_range = new SliceRange();
 //   slice_range.setStart(ByteBufferUtil.bytes(startPoint));
 //   slice_range.setFinish(ByteBufferUtil.bytes(endPoint));
 //   
 //   predicate.setSlice_range(slice_range);
  ConfigHelper.setInputSlicePredicate(job.getConfiguration(),
 predicate);
 
  job.waitForCompletion(true);
 
 
 
https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/
ic
 ts/jobs/ProcessXml/StartJob.java
 
 
 
 
 
 
 
 
 On 1/16/13 9:22 AM, cscetbon@orange.com cscetbon@orange.com
 wrote:
 
 I don't want to write to Cassandra as it replicates data from another
 datacenter, but I just want to use Hadoop Jobs (Pig and Hive) to read
 data from it. I would like to use the same configuration as
 
http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-clust
er
 but I want to know if there are alternatives to DataStax Enterprise
 package.
 
 Thanks
 On Jan 16, 2013, at 3:59 PM, James Schappet jschap...@gmail.com
wrote:
 
 Here are a few examples I have worked on, reading from xml.gz files
then
 writing to cassandara.
 
 
 https://github.com/jschappet/medline
 
 You will also need:
 
 https://github.com/jschappet/medline-base
 
 
 
 These examples are Hadoop Jobs using Cassandra as the Data Store.
 
 This one is a good place to start.
 
 
https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiow
a/
 ic
 ts/jobs/LoadMedline/StartJob.java
 
 ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE,
 COLUMN_FAMILY);
ConfigHelper.setOutputColumnFamily(job.getConfiguration(),
KEYSPACE,
 outputPath);
 
   job.setMapperClass(MapperToCassandra.class);
   job.setOutputKeyClass(Text.class);
   job.setOutputValueClass(Text.class);
 
LOG.info(Writing output to Cassandra);
//job.setReducerClass(ReducerToCassandra.class);
job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
 
   ConfigHelper.setRpcPort(job.getConfiguration(), 9160);
   //org.apache.cassandra.dht.LocalPartitioner
   ConfigHelper.setInitialAddress(job.getConfiguration(),
 localhost);
   ConfigHelper.setPartitioner(job.getConfiguration(),
 org.apache.cassandra.dht.RandomPartitioner);
 
 
 
 
 
 
 On 1/16/13 7:37 AM, cscetbon@orange.com
cscetbon@orange.com
 wrote:
 
 Hi,
 
 I know that DataStax Enterprise package provide Brisk, but is there a
 community version ? Is it easy to interface Hadoop with Cassandra as
 the
 storage or do we absolutely have to use Brisk for that ?
 I know CassandraFS is natively available in cassandra 1.2, the
version
 I
 use, so is there a way/procedure to interface hadoop with Cassandra
as
 the storage ?
 
 Thanks 
 
 

LCS not removing rows with all TTL expired columns

2013-01-16 Thread Bryan Talbot
On cassandra 1.1.5 with a write heavy workload, we're having problems
getting rows to be compacted away (removed) even though all columns have
expired TTL.  We've tried size tiered and now leveled and are seeing the
same symptom: the data stays around essentially forever.

Currently we write all columns with a TTL of 72 hours (259200 seconds) and
expect to add 10 GB of data to this CF per day per node.  Each node
currently has 73 GB for the affected CF and shows no indications that old
rows will be removed on their own.

Why aren't rows being removed?  Below is some data from a sample row which
should have been removed several days ago but is still around even though
it has been involved in numerous compactions since being expired.

$ ./bin/nodetool -h localhost getsstables metrics request_summary
459fb460-5ace-11e2-9b92-11d67b6163b4
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$ ls -alF
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
-rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$ ./bin/sstable2json
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
-k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 %x')
{
34353966623436302d356163652d313165322d396239322d313164363762363136336234:
[[app_name,50f21d3d,1357785277207001,d],
[client_ip,50f21d3d,1357785277207001,d],
[client_req_id,50f21d3d,1357785277207001,d],
[mysql_call_cnt,50f21d3d,1357785277207001,d],
[mysql_duration_us,50f21d3d,1357785277207001,d],
[mysql_failure_call_cnt,50f21d3d,1357785277207001,d],
[mysql_success_call_cnt,50f21d3d,1357785277207001,d],
[req_duration_us,50f21d3d,1357785277207001,d],
[req_finish_time_us,50f21d3d,1357785277207001,d],
[req_method,50f21d3d,1357785277207001,d],
[req_service,50f21d3d,1357785277207001,d],
[req_start_time_us,50f21d3d,1357785277207001,d],
[success,50f21d3d,1357785277207001,d]]
}


Decoding the column timestamps shows that the columns were written at
Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan
2013 02:34:37 GMT.  The date of the SSTable shows that it was generated on
Jan 16, which is 3 days after all columns have TTL-ed out.
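
For anyone double-checking that math, here is a minimal stand-alone snippet
that reproduces those dates. It assumes (as is the case for our writers) that
the column timestamps are microseconds since the epoch; the class name is just
for illustration:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TtlCheck {
    public static void main(String[] args) {
        long writeMicros = 1357785277207001L; // column timestamp from the sstable2json output above
        long ttlSeconds = 259200L;            // the 72 hour TTL
        SimpleDateFormat fmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss z", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        // java.util.Date wants milliseconds, so divide the microseconds by 1000
        System.out.println("written: " + fmt.format(new Date(writeMicros / 1000)));
        System.out.println("expired: " + fmt.format(new Date(writeMicros / 1000 + ttlSeconds * 1000)));
        // prints:
        //   written: Thu, 10 Jan 2013 02:34:37 GMT
        //   expired: Sun, 13 Jan 2013 02:34:37 GMT
    }
}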


The schema shows that gc_grace is set to 0 since this data is write-once,
read-seldom and is never updated or deleted.

create column family request_summary
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 0
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'NONE'
  and bloom_filter_fp_chance = 1.0
  and compression_options = {'chunk_length_kb' : '64',
'sstable_compression' :
'org.apache.cassandra.io.compress.SnappyCompressor'};


Thanks in advance for help in understanding why rows such as this are not
removed!

-Bryan


Re: Query column names

2013-01-16 Thread Renato Marroquín Mogrovejo
What I mean is: is there a way of doing the following, but using Hector?


-
public static void main(String[] args) throws Exception {
    Connector conn = new Connector();
    Cassandra.Client client = conn.connect();

    SlicePredicate predicate = new SlicePredicate();
    List<byte[]> colNames = new ArrayList<byte[]>();
    colNames.add("a".getBytes());
    colNames.add("b".getBytes());
    predicate.column_names = colNames;

    ColumnParent parent = new ColumnParent("Standard1");

    byte[] key = "k1".getBytes();
    List<ColumnOrSuperColumn> results =
        client.get_slice(key, parent, predicate, ConsistencyLevel.ONE);

    for (ColumnOrSuperColumn cosc : results) {
        Column c = cosc.column;
        System.out.println(new String(c.name, "UTF-8") + " : "
            + new String(c.value, "UTF-8"));
    }

    conn.close();

    System.out.println("All done.");
}
-


Thanks!

2013/1/16 Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com:
 Hi,

 I am facing some problems while retrieving some events from a column
 family. I am using the event name plus the timestamp of when it
 occurred as the column name.
 The thing is that now I want to find out the latest event, and I don't
 know how to query for the last event without a RangeSlicesQuery,
 getting all rows and columns, and checking one by one.
 Is there any better way of doing this using the Hector client?

 [default@clickstream] list click_event;
 ---
 RowKey: 
 706d63666164696e3a31396132613664322d633730642d343139362d623638642d396663663638343766333563
 = (column=start:2013-01-13 18:14:59.244, value=, timestamp=1358118943979000)
 = (column=stop:2013-01-13 18:15:56.793,
 value=323031332d30312d31332031383a31353a35382e333437,
 timestamp=1358118960946000)

 Thanks in advance!


 Renato M.


Re: AWS EMR - Cassandra

2013-01-16 Thread Marcelo Elias Del Valle
William,

I just saw your message today. I am using Cassandra + Amazon EMR
(hadoop 1.0.3) but I am not using PIG as you are. I set my configuration
vars in Java, as I have a custom jar file and I am using
ColumnFamilyInputFormat.
  However, if I understood your problem correctly, the only thing you have to
 do is to set environment vars when running cluster tasks, right? Take a
 look at this link:
 http://sujee.net/tech/articles/hadoop/amazon-emr-beyond-basics/
 As it shows, you can run EMR with some command line arguments that
 specify a script to be executed before the job starts, on each machine in
 the cluster. This way, you would be able to correctly set the vars you need.
  Out of curiosity, could you share what you are using for cassandra
 storage? I am currently using EC2 local disks, but I am looking for an
 alternative.

Best regards,
Marcelo.


2013/1/4 William Oberman ober...@civicscience.com

 So I've made it work, but I don't get it yet.

 I have no idea why my DIY server works when I set the environment
 variables on the machine that kicks off pig (master), and in EMR it
 doesn't.  I recompiled ConfigHelper and CassandraStorage with tons of
 debugging, and in EMR I can see the hadoop Configuration object get the
 proper values on the master node, and I can see it does NOT propagate to
 the task threads.

 The other part that was driving me nuts could be made more user friendly.
  The issue is this: I started to try to set
 cassandra.thrift.address, cassandra.thrift.port,
 cassandra.partitioner.class in mapred-site.xml, and it didn't work.  After
 even more painful debugging, I noticed that the only time Cassandra sets
 the input/output versions of those settings (and these input/output
 specific versions are the only versions really used!) is when Cassandra
 maps the system environment variables.  So, having cassandra.thrift.address
 in mapred-site.xml does NOTHING, as I needed to
 have cassandra.output.thrift.address set.  It would be much nicer if the
 get{Input/Output}XYZ checked for the existence of getXYZ
 if get{Input/Output}XYZ is empty/null.  E.g. in getOutputThriftAddress(),
 if that setting is null, it would have been nice if that method returned
 getThriftAddress().  My problem went away when I put the full cross product
 in the XML. E.g. cassandra.input.thrift.address
 and cassandra.output.thrift.address (and port, and partitioner).

 I still want to know why the old easy way (of setting the 3 system
 variables on the pig starter box, and having the config flow into the task
 trackers) doesn't work!

 will


 On Fri, Jan 4, 2013 at 9:04 AM, William Oberman 
 ober...@civicscience.comwrote:

 On all tasktrackers, I see:
 java.io.IOException: PIG_OUTPUT_INITIAL_ADDRESS or PIG_INITIAL_ADDRESS
 environment variable not set
 at
 org.apache.cassandra.hadoop.pig.CassandraStorage.setStoreLocation(CassandraStorage.java:821)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:170)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setUpContext(PigOutputCommitter.java:112)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:86)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.init(PigOutputCommitter.java:67)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:279)
 at org.apache.hadoop.mapred.Task.initialize(Task.java:515)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:358)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)


 On Thu, Jan 3, 2013 at 10:45 PM, aaron morton aa...@thelastpickle.comwrote:

 Instead, I get an error from CassandraStorage that the initial address
 isn't set (on the slave, the master is ok).

 Can you post the full error ?

 Cheers
-
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 4/01/2013, at 11:15 AM, William Oberman ober...@civicscience.com
 wrote:

 Anyone ever try to read or write directly between EMR - Cassandra?

 I'm running various Cassandra resources in Ec2, so the physical
 connection part is pretty easy using security groups.  But, I'm having
 some configuration issues.  I have managed to get Cassandra + Hadoop
 working in the past using a DIY hadoop cluster, and looking at the
 configurations in the two environments (EMR vs DIY), I'm not sure what's
 different that is causing my failures...  I should probably note I'm using

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread cscetbon.ext
Here is the point. You're right this github repository has not been updated for 
a year and a half. I thought brisk was just a bundle of some technologies and 
that it was possible to install the same components and make them work together 
without using this bundle :(


On Jan 16, 2013, at 8:22 PM, Michael Kjellman mkjell...@barracuda.com wrote:

 Brisk is pretty much stagnant. I think someone forked it to work with 1.0
 but not sure how that is going. You'll need to pay for DSE to get CFS
 (which is essentially Brisk) if you want to use any modern version of C*.
 
 Best,
 Michael
 
 On 1/16/13 11:17 AM, cscetbon@orange.com cscetbon@orange.com
 wrote:
 
 Thanks I understand that your code uses the hadoop interface of Cassandra
 to be able to read from it with a job. However I would like to know how
 to bring pieces (hive + pig + hadoop) together with cassandra as the
 storage layer, not to get code to test it. I have found repository
 https://github.com/riptano/brisk which might be a good start for it
 
 Regards
 
 On Jan 16, 2013, at 4:27 PM, James Schappet jschap...@gmail.com wrote:
 
 Try this one then, it reads from cassandra, then writes back to
 cassandra,
 but you could change the write to where ever you would like.
 
 
 
  getConf().set(IN_COLUMN_NAME, columnName);

  Job job = new Job(getConf(), "ProcessRawXml");
  job.setInputFormatClass(ColumnFamilyInputFormat.class);
  job.setNumReduceTasks(0);

  job.setJarByClass(StartJob.class);
  job.setMapperClass(ParseMapper.class);
  job.setOutputKeyClass(ByteBuffer.class);
  //job.setOutputValueClass(Text.class);
  job.setOutputFormatClass(ColumnFamilyOutputFormat.class);

  ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
  job.setInputFormatClass(ColumnFamilyInputFormat.class);
  ConfigHelper.setRpcPort(job.getConfiguration(), 9160);
  //org.apache.cassandra.dht.LocalPartitioner
  ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
  ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
  ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);

  SlicePredicate predicate = new SlicePredicate()
      .setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
  //   SliceRange slice_range = new SliceRange();
  //   slice_range.setStart(ByteBufferUtil.bytes(startPoint));
  //   slice_range.setFinish(ByteBufferUtil.bytes(endPoint));
  //
  //   predicate.setSlice_range(slice_range);
  ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);
 
 job.waitForCompletion(true);
 
 
 
  https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/ProcessXml/StartJob.java
 
 
 
 
 
 
 
 
 On 1/16/13 9:22 AM, cscetbon@orange.com cscetbon@orange.com
 wrote:
 
 I don't want to write to Cassandra as it replicates data from another
 datacenter, but I just want to use Hadoop Jobs (Pig and Hive) to read
 data from it. I would like to use the same configuration as
 
  http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-cluster
 but I want to know if there are alternatives to DataStax Enterprise
 package.
 
 Thanks
 On Jan 16, 2013, at 3:59 PM, James Schappet jschap...@gmail.com
 wrote:
 
 Here are a few examples I have worked on, reading from xml.gz files
 then
  writing to cassandra.
 
 
 https://github.com/jschappet/medline
 
 You will also need:
 
 https://github.com/jschappet/medline-base
 
 
 
 These examples are Hadoop Jobs using Cassandra as the Data Store.
 
 This one is a good place to start.
 
 
  https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/LoadMedline/StartJob.java
 
  ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
  ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, outputPath);

  job.setMapperClass(MapperToCassandra.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(Text.class);

  LOG.info("Writing output to Cassandra");
  //job.setReducerClass(ReducerToCassandra.class);
  job.setOutputFormatClass(ColumnFamilyOutputFormat.class);

  ConfigHelper.setRpcPort(job.getConfiguration(), 9160);
  //org.apache.cassandra.dht.LocalPartitioner
  ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
  ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
 
 
 
 
 
 
 On 1/16/13 7:37 AM, cscetbon@orange.com
 cscetbon@orange.com
 wrote:
 
 Hi,
 
 I know that DataStax Enterprise package provide Brisk, but is there a
 community version ? Is it easy to interface Hadoop with Cassandra as
 the
 storage or do we absolutely have to use 

Cassandra at Amazon AWS

2013-01-16 Thread Marcelo Elias Del Valle
Hello,

   I am currently using hadoop + cassandra at amazon AWS. Cassandra runs on
EC2 and my hadoop process runs at EMR. For cassandra storage, I am using
local EC2 EBS disks.
   My system is running fine for my tests, but to me it's not a good setup
for production. I need my system to perform well, especially for writes
on cassandra, but the amount of data could grow really big, taking several
Tb of total storage.
My first guess was using S3 as storage, and I saw this can be done with the
Cloudian package, but I wouldn't like to become dependent on a
pre-packaged solution, and I found it's kind of expensive for more than
100Tb: http://www.cloudian.com/pricing.html
I saw some discussion on the internet about using EBS or ephemeral disks
for storage at Amazon too.

My question is: does someone on this list have the same problem as me?
What are you using as a solution for Cassandra's storage when running it at
Amazon AWS?

Any thoughts would be highly appreciated.

Best regards,
-- 
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr


Re: Query column names

2013-01-16 Thread Renato Marroquín Mogrovejo
After searching for a while I found what I was looking for [1].
Hope it helps someone else (:


Renato M.

[1] http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1
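
In case it is useful to someone: below is a rough, untested sketch (Hector API
names from memory, keyspace/CF names taken from my example, row key is a
placeholder) of grabbing just the newest column of a row with a reversed slice
of count 1. Note this only returns the latest event if the column comparator
sorts your column names in time order, which is the kind of ordering the
composite column layout in [1] can give you:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class LatestEvent {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("clickstream", cluster);

        String rowKey = "some-row-key"; // placeholder: the row you are interested in

        SliceQuery<String, String, String> query = HFactory.createSliceQuery(
                keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
        query.setColumnFamily("click_event");
        query.setKey(rowKey);
        // start = null, finish = null, reversed = true, count = 1 -> only the last column
        query.setRange(null, null, true, 1);

        ColumnSlice<String, String> slice = query.execute().get();
        if (!slice.getColumns().isEmpty()) {
            HColumn<String, String> newest = slice.getColumns().get(0);
            System.out.println(newest.getName() + " = " + newest.getValue());
        }

        HFactory.shutdownCluster(cluster);
    }
}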

2013/1/16 Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com:
 What I mean is that if there is a way of doing this but using Hector:


 -
 public static void main(String[] args) throws Exception {
     Connector conn = new Connector();
     Cassandra.Client client = conn.connect();

     SlicePredicate predicate = new SlicePredicate();
     List<byte[]> colNames = new ArrayList<byte[]>();
     colNames.add("a".getBytes());
     colNames.add("b".getBytes());
     predicate.column_names = colNames;

     ColumnParent parent = new ColumnParent("Standard1");

     byte[] key = "k1".getBytes();
     List<ColumnOrSuperColumn> results =
         client.get_slice(key, parent, predicate, ConsistencyLevel.ONE);

     for (ColumnOrSuperColumn cosc : results) {
         Column c = cosc.column;
         System.out.println(new String(c.name, "UTF-8") + " : "
             + new String(c.value, "UTF-8"));
     }

     conn.close();

     System.out.println("All done.");
 }
 -


 Thanks!

 2013/1/16 Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com:
 Hi,

 I am facing some problems while retrieving some events from a column
 family. I am using the event name plus the timestamp of when it
 occurred as the column name.
 The thing is that now I want to find out the latest event, and I don't
 know how to query for the last event without a RangeSlicesQuery,
 getting all rows and columns, and checking one by one.
 Is there any better way of doing this using the Hector client?

 [default@clickstream] list click_event;
 ---
 RowKey: 
 706d63666164696e3a31396132613664322d633730642d343139362d623638642d396663663638343766333563
 = (column=start:2013-01-13 18:14:59.244, value=, timestamp=1358118943979000)
 = (column=stop:2013-01-13 18:15:56.793,
 value=323031332d30312d31332031383a31353a35382e333437,
 timestamp=1358118960946000)

 Thanks in advance!


 Renato M.


Re: AWS EMR - Cassandra

2013-01-16 Thread William Oberman
DataStax recommended (forget the reference) to use the ephemeral disks in
RAID0, which is what I've been running for well over a year now in
production.

In terms of how I'm doing Cassandra/AWS/Hadoop, I started by doing the
split data center thing (one DC for low latency queries, one DC for
hadoop).  But, that's a lot of system management.  And compute is the most
expensive part of AWS, and you need a LOT of compute to run this setup.  I
tried doing Cassandra EC2 cluster -> snapshot -> clone cluster with hadoop
overlay -> ETL to S3 using hadoop -> EMR for real work.  But that's kind of
a pain too (and the ETL to S3 wasn't very fast).

Now I'm going after the SStables directly(*), which sounds like how Netflix
does it.  You can do incremental updates, if you're careful.

(*) Cassandra EC2 -> backup to local EBS -> remap EBS to another box ->
sstable2json over new sstables -> S3 (splitting into ~100MB parts), then
use EMR to consume the JSON part files.

will


On Wed, Jan 16, 2013 at 3:30 PM, Marcelo Elias Del Valle mvall...@gmail.com
 wrote:

 William,

 I just saw your message today. I am using Cassandra + Amazon EMR
 (hadoop 1.0.3) but I am not using PIG as you are. I set my configuration
 vars in Java, as I have a custom jar file and I am using
 ColumnFamilyInputFormat.
 However, if I understood well your problem, the only thing you have to
 do is to set environment vars when running cluster tasks, right? Take a
 look a this link:
 http://sujee.net/tech/articles/hadoop/amazon-emr-beyond-basics/
 As it shows, you can run EMR setting some command line arguments that
 specify a script to be executed before the job starts, in each machine in
 the cluster. This way, you would be able to correctly set the vars you need.
  Out of curiosity, could you share what are you using for cassandra
 storage? I am currently using EC2 local disks, but I am looking for an
 alternative.

 Best regards,
 Marcelo.


 2013/1/4 William Oberman ober...@civicscience.com

 So I've made it work, but I don't get it yet.

 I have no idea why my DIY server works when I set the environment
 variables on the machine that kicks off pig (master), and in EMR it
 doesn't.  I recompiled ConfigHelper and CassandraStorage with tons of
 debugging, and in EMR I can see the hadoop Configuration object get the
 proper values on the master node, and I can see it does NOT propagate to
 the task threads.

 The other part that was driving me nuts could be made more user friendly.
  The issue is this: I started to try to set
 cassandra.thrift.address, cassandra.thrift.port,
 cassandra.partitioner.class in mapred-site.xml, and it didn't work.  After
 even more painful debugging, I noticed that the only time Cassandra sets
 the input/output versions of those settings (and these input/output
 specific versions are the only versions really used!) is when Cassandra
 maps the system environment variables.  So, having cassandra.thrift.address
 in mapred-site.xml does NOTHING, as I needed to
 have cassandra.output.thrift.address set.  It would be much nicer if the
 get{Input/Output}XYZ checked for the existence of getXYZ
 if get{Input/Output}XYZ is empty/null.  E.g. in getOutputThriftAddress(),
 if that setting is null, it would have been nice if that method returned
 getThriftAddress().  My problem went away when I put the full cross product
 in the XML. E.g. cassandra.input.thrift.address
 and cassandra.output.thrift.address (and port, and partitioner).

 I still want to know why the old easy way (of setting the 3 system
 variables on the pig starter box, and having the config flow into the task
 trackers) doesn't work!

 will


 On Fri, Jan 4, 2013 at 9:04 AM, William Oberman ober...@civicscience.com
  wrote:

 On all tasktrackers, I see:
 java.io.IOException: PIG_OUTPUT_INITIAL_ADDRESS or PIG_INITIAL_ADDRESS
 environment variable not set
 at
 org.apache.cassandra.hadoop.pig.CassandraStorage.setStoreLocation(CassandraStorage.java:821)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:170)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setUpContext(PigOutputCommitter.java:112)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:86)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.init(PigOutputCommitter.java:67)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:279)
 at org.apache.hadoop.mapred.Task.initialize(Task.java:515)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:358)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 

Re: Cassandra at Amazon AWS

2013-01-16 Thread Ben Chobot
We use cassandra on ephemeral drives. Yes, that means we need more nodes to 
hold more data, but doesn't that play into cassandra's strengths?

It sounds like you're trying to vertically scale your cassandra cluster.

On Jan 16, 2013, at 12:42 PM, Marcelo Elias Del Valle wrote:

 Hello, 
 
I am currently using hadoop + cassandra at amazon AWS. Cassandra runs on 
 EC2 and my hadoop process runs at EMR. For cassandra storage, I am using 
 local EC2 EBS disks. 
My system is running fine for my tests, but to me it's not a good setup 
 for production. I need my system to perform well for specially for writes on 
 cassandra, but the amount of data could grow really big, taking several Tb of 
 total storage. 
 My first guess was using S3 as a storage and I saw this can be done by 
 using Cloudian package, but I wouldn't like to become dependent on a 
 pre-package solution and I found it's kind of expensive for more than 100Tb: 
 http://www.cloudian.com/pricing.html
 I saw some discussion at internet about using EBS or ephemeral disks for 
 storage at Amazon too. 
 
 My question is: does someone on this list have the same problem as me? 
 What are you using as solution to Cassandra's storage when running it at 
 Amazon AWS?
 
 Any thoughts would be highly appreciatted. 
 
 Best regards,
 -- 
 Marcelo Elias Del Valle
 http://mvalle.com - @mvallebr



Re: Cassandra at Amazon AWS

2013-01-16 Thread Andrey Ilinykh
Storage size is not a problem, you can always add more nodes. Anyway, it is
not recommended to have nodes with more than 500G (compaction and repair take
forever). EC2 m1.large has 800G of ephemeral storage, EC2 m1.xlarge 1.6T.
I'd recommend xlarge, it has 4 CPUs, so maintenance procedures don't affect
performance a lot.

Andrey


On Wed, Jan 16, 2013 at 12:42 PM, Marcelo Elias Del Valle 
mvall...@gmail.com wrote:

 Hello,

I am currently using hadoop + cassandra at amazon AWS. Cassandra runs
 on EC2 and my hadoop process runs at EMR. For cassandra storage, I am using
 local EC2 EBS disks.
My system is running fine for my tests, but to me it's not a good setup
 for production. I need my system to perform well for specially for writes
 on cassandra, but the amount of data could grow really big, taking several
 Tb of total storage.
 My first guess was using S3 as a storage and I saw this can be done by
 using Cloudian package, but I wouldn't like to become dependent on a
 pre-package solution and I found it's kind of expensive for more than
 100Tb: http://www.cloudian.com/pricing.html
 I saw some discussion at internet about using EBS or ephemeral disks
 for storage at Amazon too.

 My question is: does someone on this list have the same problem as me?
 What are you using as solution to Cassandra's storage when running it at
 Amazon AWS?

 Any thoughts would be highly appreciatted.

 Best regards,
 --
 Marcelo Elias Del Valle
 http://mvalle.com - @mvallebr



Re: AWS EMR - Cassandra

2013-01-16 Thread Marcelo Elias Del Valle
That's good info! Thanks!


2013/1/16 William Oberman ober...@civicscience.com

 DataStax recommended (forget the reference) to use the ephemeral disks in
 RAID0, which is what I've been running for well over a year now in
 production.

 In terms of how I'm doing Cassandra/AWS/Hadoop, I started by doing the
 split data center thing (one DC for low latency queries, one DC for
 hadoop).  But, that's a lot of system management.  And compute is the most
 expensive part of AWS, and you need a LOT of compute to run this setup.  I
 tried doing Cassandra EC2 cluster -> snapshot -> clone cluster with hadoop
 overlay -> ETL to S3 using hadoop -> EMR for real work.  But that's kind of
 a pain too (and the ETL to S3 wasn't very fast).

 Now I'm going after the SStables directly(*), which sounds like how
 Netflix does it.  You can do incremental updates, if you're careful.

 (*) Cassandra EC2 -> backup to local EBS -> remap EBS to another box ->
 sstable2json over new sstables -> S3 (splitting into ~100MB parts), then
 use EMR to consume the JSON part files.

 will



 On Wed, Jan 16, 2013 at 3:30 PM, Marcelo Elias Del Valle 
 mvall...@gmail.com wrote:

 William,

 I just saw your message today. I am using Cassandra + Amazon EMR
 (hadoop 1.0.3) but I am not using PIG as you are. I set my configuration
 vars in Java, as I have a custom jar file and I am using
 ColumnFamilyInputFormat.
 However, if I understood well your problem, the only thing you have
 to do is to set environment vars when running cluster tasks, right? Take a
 look a this link:
 http://sujee.net/tech/articles/hadoop/amazon-emr-beyond-basics/
 As it shows, you can run EMR setting some command line arguments that
 specify a script to be executed before the job starts, in each machine in
 the cluster. This way, you would be able to correctly set the vars you need.
  Out of curiosity, could you share what are you using for cassandra
 storage? I am currently using EC2 local disks, but I am looking for an
 alternative.

 Best regards,
 Marcelo.


 2013/1/4 William Oberman ober...@civicscience.com

 So I've made it work, but I don't get it yet.

 I have no idea why my DIY server works when I set the environment
 variables on the machine that kicks off pig (master), and in EMR it
 doesn't.  I recompiled ConfigHelper and CassandraStorage with tons of
 debugging, and in EMR I can see the hadoop Configuration object get the
 proper values on the master node, and I can see it does NOT propagate to
 the task threads.

 The other part that was driving me nuts could be made more user
 friendly.  The issue is this: I started to try to set
 cassandra.thrift.address, cassandra.thrift.port,
 cassandra.partitioner.class in mapred-site.xml, and it didn't work.  After
 even more painful debugging, I noticed that the only time Cassandra sets
 the input/output versions of those settings (and these input/output
 specific versions are the only versions really used!) is when Cassandra
 maps the system environment variables.  So, having cassandra.thrift.address
 in mapred-site.xml does NOTHING, as I needed to
 have cassandra.output.thrift.address set.  It would be much nicer if the
 get{Input/Output}XYZ checked for the existence of getXYZ
 if get{Input/Output}XYZ is empty/null.  E.g. in getOutputThriftAddress(),
 if that setting is null, it would have been nice if that method returned
 getThriftAddress().  My problem went away when I put the full cross product
 in the XML. E.g. cassandra.input.thrift.address
 and cassandra.output.thrift.address (and port, and partitioner).

 I still want to know why the old easy way (of setting the 3 system
 variables on the pig starter box, and having the config flow into the task
 trackers) doesn't work!

 will


 On Fri, Jan 4, 2013 at 9:04 AM, William Oberman 
 ober...@civicscience.com wrote:

 On all tasktrackers, I see:
 java.io.IOException: PIG_OUTPUT_INITIAL_ADDRESS or PIG_INITIAL_ADDRESS
 environment variable not set
 at
 org.apache.cassandra.hadoop.pig.CassandraStorage.setStoreLocation(CassandraStorage.java:821)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:170)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setUpContext(PigOutputCommitter.java:112)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:86)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.init(PigOutputCommitter.java:67)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:279)
 at org.apache.hadoop.mapred.Task.initialize(Task.java:515)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:358)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at 

Re: Cassandra at Amazon AWS

2013-01-16 Thread Jared Biel
We're currently using Cassandra on EC2 at very low scale (a 2 node
cluster on m1.large instances in two regions.) I don't believe that
EBS is recommended for performance reasons. Also, it's proven to be
very unreliable in the past (most of the big/notable AWS outages were
due to EBS issues.) We've moved 99% of our instances off of EBS.

As other have said, if you require more space in the future it's easy
to add more nodes to the cluster. I've found this page
(http://www.ec2instances.info/) very useful in determining the amount
of space each instance type has. Note that by default only one
ephemeral drive is attached and you must specify all ephemeral drives
that you want to use at launch time. Also, you can create a RAID 0 of
all local disks to provide maximum speed and space.


On 16 January 2013 20:42, Marcelo Elias Del Valle mvall...@gmail.com wrote:
 Hello,

I am currently using hadoop + cassandra at amazon AWS. Cassandra runs on
 EC2 and my hadoop process runs at EMR. For cassandra storage, I am using
 local EC2 EBS disks.
My system is running fine for my tests, but to me it's not a good setup
 for production. I need my system to perform well for specially for writes on
 cassandra, but the amount of data could grow really big, taking several Tb
 of total storage.
 My first guess was using S3 as a storage and I saw this can be done by
 using Cloudian package, but I wouldn't like to become dependent on a
 pre-package solution and I found it's kind of expensive for more than 100Tb:
 http://www.cloudian.com/pricing.html
 I saw some discussion at internet about using EBS or ephemeral disks for
 storage at Amazon too.

 My question is: does someone on this list have the same problem as me?
 What are you using as solution to Cassandra's storage when running it at
 Amazon AWS?

 Any thoughts would be highly appreciatted.

 Best regards,
 --
 Marcelo Elias Del Valle
 http://mvalle.com - @mvallebr


Cassandra timeout whereas it is not much busy

2013-01-16 Thread Nicolas Lalevée
Hi,

I am seeing a strange behavior that I am not able to understand.

I have 6 nodes with cassandra-1.0.12. Each node has 8G of RAM. I have a 
replication factor of 3.

---
my story is maybe too long, so here is a shorter version, while keeping what I 
wrote below in case someone has the patience to read my bad english ;)

I got into a situation where my cluster was generating a lot of timeouts on 
our frontend, whereas I could not see any major trouble in the internal stats. 
Actually cpu and read and write counts on the column families were quite low. A mess 
until I switched from java7 to java6 and forced the use of jamm. After the 
switch, cpu and read and write counts were going up again, and the timeouts were gone. 
I have seen this behavior while reducing the xmx too.

What could be blocking cassandra from utilizing the whole resources of the 
machine ? Are there metrics I didn't see which could explain this ?

---
Here is the long story.

When I first set my cluster up, I blindly gave 6G of heap to the cassandra 
nodes, thinking that the more a java process has, the smoother it runs, while 
keeping some RAM for the disk cache. We got some new feature deployed, and 
things went to hell, with some machines up to 60% wa. I give credit to 
cassandra because there were not that many timeouts received on the web frontend; 
it was kind of slow but it was kind of working. With some optimizations, we 
reduced the pressure of the new feature, but it was still at 40% wa.

At that time I didn't have much monitoring, just heap and cpu. I read some 
articles on how to tune, and I learned that the disk cache is quite important 
because cassandra relies on it to be the read cache. So I tried many xmx values, 
and 3G seemed kind of the lowest possible. So on 2 of the 6 nodes, I set 
xmx to 3.3G. Amazingly, I saw the wa go down to 10%. Quite happy with that, I 
changed the xmx to 3.3G on each node. But then things really went to hell, a lot 
of timeouts on the frontend. It was not working at all. So I rolled back.

After some time, probably because the data of the new feature grew to its 
nominal size, things went again to very high %wa, and cassandra was not able to 
keep up. So we kind of reverted the feature; the column family is still used 
but only by one thread on the frontend. The wa was reduced to 20%, but things 
continued to not work properly; from time to time, a bunch of timeouts are 
raised on our frontend.

In the meantime, I took time to do some proper monitoring of cassandra: column 
family read and write counts, latency, memtable size, but also the dropped 
messages, the pending tasks, the timeouts between nodes. It's just a start but 
it gives me a first nice view of what is actually going on.

I tried again reducing the xmx on one node. Cassandra is not complaining about 
not having enough heap, memtables are not flushed insanely every second, the 
number of reads and writes is reduced compared to the other nodes, the cpu is 
lower too, there are not many pending tasks, and no more than 1 or 2 messages are 
dropped from time to time. Everything indicates that there is probably room for 
more work, but the node doesn't take it. Even its read and write latencies are 
lower than on the other nodes. But if I keep it long enough with this xmx, 
timeouts start to be raised on the frontends.
After some individual node experiments, the cluster was starting to be quite 
sick. Even with 6G, the %wa was reducing, read and write counts too, on kind 
of every node. And more and more timeouts were raised on the frontend.
The only thing that I could see worrying is the heap climbing slowly above the 
75% threshold and from time to time suddenly dropping from 95% to 70%. I looked 
at the full gc counter, not much pressure.
And another thing was some "Timed out replaying hints to /10.0.0.56; aborting 
further deliveries" in the log. But it is logged as info, so I guess it is not very 
important.

After some long, useless staring at the monitoring graphs, I gave a try to using 
openjdk 6b24 rather than openjdk 7u9, and forced cassandra to load jamm, 
since in 1.0 the init script blacklists openjdk. Node after node, I saw that 
the heap was behaving more like I am used to seeing on jamm based apps, some nice ups 
and downs rather than a long and slow climb. But read and write counts were 
still low on every node, and timeouts were still bursting on our frontend.
A continuing mess until I restarted the first node of the cluster. There was 
still one node left to switch to java6 + jamm, but as soon as I restarted my first 
node, every node started working more, %wa climbing, read and write counts 
climbing, no more timeouts on the frontend, the frontend being then fast as 
hell.

I understand that my cluster is probably under capacity. But I don't understand 
how, since there is something within cassandra which seems to block the full use of 
the machine resources. It seems kind of related to the heap, but I don't know 
how. Any idea ?
I intend to start monitoring more metrics, but do you have any 

Re: Pig / Map Reduce on Cassandra

2013-01-16 Thread Brandon Williams
On Wed, Jan 16, 2013 at 2:37 PM,  cscetbon@orange.com wrote:
 Here is the point. You're right this github repository has not been updated 
 for a year and a half. I thought brisk was just a bundle of some technologies 
 and that it was possible to install the same components and make them work 
 together without using this bundle :(

You can install hadoop manually alongside Cassandra as well as pig.
Pig support is in C*'s tree in o.a.c.hadoop.pig.  You won't get CFS,
but it's not a hard requirement, either.

-Brandon


Re: How can OpsCenter show me Read Request Latency where there are no read requests??

2013-01-16 Thread Tyler Hobbs
A few milliseconds (or a few thousand usecs) isn't terribly high,
considering that number includes at least one round trip between nodes.
I'm not sure about the tracking behavior that you're describing -- could
you provide some more details or perhaps screenshots?


On Wed, Jan 16, 2013 at 12:16 PM, Brian Tarbox tar...@cabotresearch.comwrote:

 Hmm, that makes sense, but then why is the latency for the reads that fetch the
 metric often so high (several thousand uSecs), and why does it so closely
 track the latency of my normal reads?


 On Wed, Jan 16, 2013 at 12:14 PM, Tyler Hobbs ty...@datastax.com wrote:

 When you view OpsCenter metrics, you're generating a small number of
 reads to fetch the metric data, which is why your read count is near zero
 instead of actually being zero.  Since reads are still occurring, Cassandra
 will continue to show a read latency.  Basically, you're just viewing the
 latency on the reads to fetch metric data.

 Normally the number of reads required to view metrics are small enough
 that they only make a minor difference in your overall read latency
 average, but when you have no other reads occurring, they're the only reads
 that are included in the average.


 On Tue, Jan 15, 2013 at 9:28 PM, Brian Tarbox 
 tar...@cabotresearch.comwrote:

 I am making heavy use of DataStax OpsCenter to help tune my system and
 its great.

 And yet puzzling.  I see my clients do a burst of Reads causing the
 OpsCenter Read Requests chart to go up and stay up until the clients finish
 doing their reads.  The read request latency chart also goes up, but it
 stays up even after all the reads are done.  At last glance I've had next
 to zero reads for 10 minutes but still have a read request latency that's
 basically unchanged from when there were actual reads.

 How am I to interpret this?

 Thanks.

 Brian Tarbox




 --
 Tyler Hobbs
 DataStax http://datastax.com/





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: trying to use row_cache (b/c we have hot rows) but nodetool info says zero requests

2013-01-16 Thread Edward Capriolo
You have to change the column family caching setting from keys_only to
rows_only (or all), otherwise the row cache will not be on for this cf.
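
For example, from cassandra-cli something like this should turn it on (the
column family name is a placeholder for yours):

update column family YourWideRowCF with caching = 'rows_only';

After that, nodetool info should start showing non-zero row cache requests as
reads come in ('all' also works if you want both the key and row caches).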

On Wednesday, January 16, 2013, Brian Tarbox tar...@cabotresearch.com
wrote:
 We have quite wide rows and do a lot of concentrated processing on each
row...so I thought I'd try the row cache on one node in my cluster to see
if I could detect an effect of using it.
 The problem is that nodetool info says that even with a two gig row_cache
we're getting zero requests.  Since my client program is actively
processing, and since keycache shows lots of activity I'm puzzled.
 Shouldn't any read of a column cause the entire row to be loaded?
 My entire data file is only 32 gig right now, so it's hard to imagine the 2
gig is too small to hold even a single row?
 Any suggestions how to proceed are appreciated.
 Thanks.
 Brian Tarbox


Re: Starting Cassandra

2013-01-16 Thread Edward Capriolo
I think at this point cassandra startup scripts should reject unsupported JVM
versions, since cassandra won't even start with many jvms at this point.

On Tuesday, January 15, 2013, Michael Kjellman mkjell...@barracuda.com
wrote:
 Do yourself a favor and get a copy of the Oracle 7 JDK (now with more
security patches too!)
 On Jan 15, 2013, at 1:44 AM, Sloot, Hans-Peter 
hans-peter.sl...@atos.net wrote:

 I managed to install apache-cassandra-1.2.0-bin.tar.gz



 With java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.x86_64 I still get the
segmentation fault.

 However with java-1.7.0-openjdk-1.7.0.3-2.1.0.1.el6.7.x86_64 everything
runs fine.



 Regards Hans-Peter



 From: aaron morton [mailto:aa...@thelastpickle.com]
 Sent: dinsdag 15 januari 2013 1:20
 To: user@cassandra.apache.org
 Subject: Re: Starting Cassandra



 DSE includes hadoop files. It looks like the installation is broken. I
would start again if possible and/or ask the peeps at Data Stax about your
particular OS / JVM configuration.



 In the past I've used this to set a particular JVM when multiple ones are
installed…



 update-alternatives --set java /usr/lib/jvm/java-6-sun/jre/bin/java



 Cheers



 -

 Aaron Morton

 Freelance Cassandra Developer

 New Zealand



 @aaronmorton

 http://www.thelastpickle.com



 On 11/01/2013, at 10:55 PM, Sloot, Hans-Peter hans-peter.sl...@atos.net
wrote:

 Hi,

 I removed the open-jdk packages which caused the dse* packages to be
uninstalled too and installed jdk6u38.



 But when I installed the dse packages yum also downloaded and installed
the open-jdk packages.

  



Re: LCS not removing rows with all TTL expired columns

2013-01-16 Thread Andrey Ilinykh
To get a column removed you have to meet two requirements:
1. the column should be expired
2. after that, the SSTable holding it gets compacted

I guess your expired columns have been propagated to high tier (high level)
SSTables, which get compacted rarely.
So, you have to wait until those high tier SSTables get compacted.

Andrey



On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.comwrote:

 On cassandra 1.1.5 with a write heavy workload, we're having problems
 getting rows to be compacted away (removed) even though all columns have
 expired TTL.  We've tried size tiered and now leveled and are seeing the
 same symptom: the data stays around essentially forever.

 Currently we write all columns with a TTL of 72 hours (259200 seconds) and
 expect to add 10 GB of data to this CF per day per node.  Each node
 currently has 73 GB for the affected CF and shows no indications that old
 rows will be removed on their own.

 Why aren't rows being removed?  Below is some data from a sample row which
 should have been removed several days ago but is still around even though
 it has been involved in numerous compactions since being expired.

 $ ./bin/nodetool -h localhost getsstables metrics request_summary
 459fb460-5ace-11e2-9b92-11d67b6163b4

 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 $ ls -alF
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 $ ./bin/sstable2json
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 %x')
 {
 34353966623436302d356163652d313165322d396239322d313164363762363136336234:
 [[app_name,50f21d3d,1357785277207001,d],
 [client_ip,50f21d3d,1357785277207001,d],
 [client_req_id,50f21d3d,1357785277207001,d],
 [mysql_call_cnt,50f21d3d,1357785277207001,d],
 [mysql_duration_us,50f21d3d,1357785277207001,d],
 [mysql_failure_call_cnt,50f21d3d,1357785277207001,d],
 [mysql_success_call_cnt,50f21d3d,1357785277207001,d],
 [req_duration_us,50f21d3d,1357785277207001,d],
 [req_finish_time_us,50f21d3d,1357785277207001,d],
 [req_method,50f21d3d,1357785277207001,d],
 [req_service,50f21d3d,1357785277207001,d],
 [req_start_time_us,50f21d3d,1357785277207001,d],
 [success,50f21d3d,1357785277207001,d]]
 }


 Decoding the column timestamps to shows that the columns were written at
 Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan
 2013 02:34:37 GMT.  The date of the SSTable shows that it was generated on
 Jan 16 which is 3 days after all columns have TTL-ed out.


 The schema shows that gc_grace is set to 0 since this data is write-once,
 read-seldom and is never updated or deleted.

 create column family request_summary
   with column_type = 'Standard'
   and comparator = 'UTF8Type'
   and default_validation_class = 'UTF8Type'
   and key_validation_class = 'UTF8Type'
   and read_repair_chance = 0.1
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 0
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = true
   and compaction_strategy =
 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
   and caching = 'NONE'
   and bloom_filter_fp_chance = 1.0
   and compression_options = {'chunk_length_kb' : '64',
 'sstable_compression' :
 'org.apache.cassandra.io.compress.SnappyCompressor'};


 Thanks in advance for help in understanding why rows such as this are not
 removed!

 -Bryan




Re: LCS not removing rows with all TTL expired columns

2013-01-16 Thread Bryan Talbot
According to the timestamps (see original post) the SSTable was written
(thus compacted) 3 days after all columns for that row had
expired and 6 days after the row was created; yet all columns are still
showing up in the SSTable.  Note that a get for that key shows no rows,
so that part is working correctly, but the data is
lugged around far longer than it should be -- maybe forever.


-Bryan


On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 To get column removed you have to meet two requirements
 1. column should be expired
 2. after that CF gets compacted

 I guess your expired columns are propagated to high tier CF, which gets
 compacted rarely.
 So, you have to wait when high tier CF gets compacted.

 Andrey



 On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.comwrote:

 On cassandra 1.1.5 with a write heavy workload, we're having problems
 getting rows to be compacted away (removed) even though all columns have
 expired TTL.  We've tried size tiered and now leveled and are seeing the
 same symptom: the data stays around essentially forever.

 Currently we write all columns with a TTL of 72 hours (259200 seconds)
 and expect to add 10 GB of data to this CF per day per node.  Each node
 currently has 73 GB for the affected CF and shows no indications that old
 rows will be removed on their own.

 Why aren't rows being removed?  Below is some data from a sample row
 which should have been removed several days ago but is still around even
 though it has been involved in numerous compactions since being expired.

 $ ./bin/nodetool -h localhost getsstables metrics request_summary
 459fb460-5ace-11e2-9b92-11d67b6163b4

 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 $ ls -alF
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

 $ ./bin/sstable2json
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 %x')
 {
 34353966623436302d356163652d313165322d396239322d313164363762363136336234:
 [[app_name,50f21d3d,1357785277207001,d],
 [client_ip,50f21d3d,1357785277207001,d],
 [client_req_id,50f21d3d,1357785277207001,d],
 [mysql_call_cnt,50f21d3d,1357785277207001,d],
 [mysql_duration_us,50f21d3d,1357785277207001,d],
 [mysql_failure_call_cnt,50f21d3d,1357785277207001,d],
 [mysql_success_call_cnt,50f21d3d,1357785277207001,d],
 [req_duration_us,50f21d3d,1357785277207001,d],
 [req_finish_time_us,50f21d3d,1357785277207001,d],
 [req_method,50f21d3d,1357785277207001,d],
 [req_service,50f21d3d,1357785277207001,d],
 [req_start_time_us,50f21d3d,1357785277207001,d],
 [success,50f21d3d,1357785277207001,d]]
 }


 Decoding the column timestamps to shows that the columns were written at
 Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan
 2013 02:34:37 GMT.  The date of the SSTable shows that it was generated on
 Jan 16 which is 3 days after all columns have TTL-ed out.


 The schema shows that gc_grace is set to 0 since this data is write-once,
 read-seldom and is never updated or deleted.

 create column family request_summary
   with column_type = 'Standard'
   and comparator = 'UTF8Type'
   and default_validation_class = 'UTF8Type'
   and key_validation_class = 'UTF8Type'
   and read_repair_chance = 0.1
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 0
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = true
   and compaction_strategy =
 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
   and caching = 'NONE'
   and bloom_filter_fp_chance = 1.0
   and compression_options = {'chunk_length_kb' : '64',
 'sstable_compression' :
 'org.apache.cassandra.io.compress.SnappyCompressor'};


 Thanks in advance for help in understanding why rows such as this are not
 removed!

 -Bryan





Webinar: Using Storm for Distributed Processing on Cassandra

2013-01-16 Thread Brian O'Neill
Just an FYI --

We will be hosting a webinar tomorrow demonstrating the use of Storm
as a distributed processing layer on top of Cassandra.

I'll be tag teaming with Taylor Goetz, the original author of storm-cassandra.
http://www.datastax.com/resources/webinars/collegecredit

It is part of the C*ollege Credit Webinar Series from Datastax.

All are welcome.

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


Re: Cassandra 1.2 thrift migration

2013-01-16 Thread aaron morton
 Any idea whether interoperability b/w Thrift and CQL should work properly in 
 1.2?
AFAIK the only incompatibility is CQL 3 between pre 1.2 and 1.2.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/01/2013, at 1:24 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 Is there any document to follow, in case I migrate the cassandra thrift API to 
 the 1.2 release? Is it backward compatible with previous releases? 
 While migrating Kundera to cassandra 1.2, it is complaining about various data 
 types, giving weird errors like:
 
 While connecting from cassandra-cli:
 
 
 Exception in thread main java.lang.OutOfMemoryError: Java heap space 
 at java.lang.AbstractStringBuilder.init(AbstractStringBuilder.java:45) 
 at java.lang.StringBuilder.init(StringBuilder.java:80) 
 at java.math.BigDecimal.getValueString(BigDecimal.java:2885) 
 at java.math.BigDecimal.toPlainString(BigDecimal.java:2869) 
 at org.apache.cassandra.cql.jdbc.JdbcDecimal.getString(JdbcDecimal.java:72) 
 at org.apache.cassandra.db.marshal.DecimalType.getString(DecimalType.java:62) 
 at org.apache.cassandra.cli.CliClient.printSliceList(CliClient.java:2873) 
 at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1486) 
 at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272) 
 at 
 org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:210)
  
 at org.apache.cassandra.cli.CliMain.main(CliMain.java:337) 
 
 
 
 And sometimes results in Server Crash.
 
 
 Any idea whether interoperability b/w Thrift and CQL should work properly in 
 1.2?
 
 -Vivek



Re: error when creating column family using cql3 and persisting data using thrift

2013-01-16 Thread aaron morton
The thrift request is not sending a composite type where it should. CQL 3 uses 
composites in a lot of places. 

What was your table definition?

Are you using a high level client or rolling your own? 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/01/2013, at 5:32 AM, James Schappet jschap...@gmail.com wrote:

 I also saw this while testing the 
 https://github.com/boneill42/naughty-or-nice example project.
 
 
 
 
 --Jimmy
 
 
 From: Kuldeep Mishra kuld.cs.mis...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Tuesday, January 15, 2013 10:29 AM
 To: user@cassandra.apache.org
 Subject: error when creating column family using cql3 and persisting data 
 using thrift
 
 Hi,
 I am facing the following problem when creating a column family using cql3 and 
 trying to persist data using thrift 1.2.0
 in cassandra-1.2.0.
 
 Details: 
 InvalidRequestException(why:Not enough bytes to read value of component 0)
 at 
 org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20833)
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
 at 
 org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
 at 
 org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
 at 
 com.impetus.client.cassandra.thrift.ThriftClient.onPersist(ThriftClient.java:157)
 
 
 
 Please help me.
 
 
 -- 
 Thanks and Regards
 Kuldeep Kumar Mishra
 +919540965199



Re: write count increase after 1.2 update

2013-01-16 Thread aaron morton
You *may* be seeing this https://issues.apache.org/jira/browse/CASSANDRA-2503

It was implemented in 1.1.0 but perhaps data in the original cluster is more 
compacted than the new one. 

Are the increases for all CF's or just a few?
Do you have a workload of infrequent writes to rows followed by wide reads?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/01/2013, at 6:23 AM, Reik Schatz reik.sch...@gmail.com wrote:

 Hi, we are running a 1.1.6 (datastax) test cluster with 6 nodes. After the 
 recent 1.2 release we have set up a second cluster - also having 6 nodes 
 running 1.2 (datastax).
 
 They are now running in parallel. We noticed an increase in the number of 
 writes in our monitoring tool (Datadog). The tool is using the write count 
 statistic of nodetool cfstats. So we ran nodetool cfstats on one node in each 
 cluster. To get an initial write count. Then we ran it again after 60 sec. It 
 looks like the 1.2 received about twice the amount of writes. 
 
 The way our application is designed is that the writes are idempotent, so we 
 don't see a size increase. Were there any changes in between 1.1.6  1.2 that 
 could explain this behavior?
 
 I know that 1.2 has the concept of virtual nodes, to spread out the data more 
 evenly. So if the write count value was actually the sum of all writes to 
 all nodes in the cluster, this increase would make sense.
 
 Reik
 
 ps. the clusters are not 100% identical. i.e. since bloom filters are now 
 off-heap, we changed settings for heap size and memtables. Cluster 1.1.6: 
 heap 8G, memtables 1/3 of heap. Cluster 1.2.0: heap 4G, memtables 2G. Not 
 sure it can have an impact on the problem.



Re: Astyanax returns empty row

2013-01-16 Thread aaron morton
If you think you have located a bug in Astyanax please submit it to 
https://github.com/Netflix/astyanax

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/01/2013, at 3:44 AM, Sávio Teles savio.te...@lupa.inf.ufg.br wrote:

 We have multiple clients reading the same row key. It makes no sense for it to fail on 
 one machine. When we use Thrift, Cassandra always returns the correct result.
 
  
 2013/1/16 Sávio Teles savio.te...@lupa.inf.ufg.br
 I ran the tests with only one machine, so the CL_ONE is not the problem. Am i 
 right?
 
 
 
 
 -- 
 Atenciosamente,
 Sávio S. Teles de Oliveira
 voice: +55 62 9136 6996
 http://br.linkedin.com/in/savioteles
 Mestrando em Ciências da Computação - UFG 
 Arquiteto de Software
 Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG



Re: Cassandra timeout whereas it is not much busy

2013-01-16 Thread aaron morton
Check the disk utilisation using iostat -x 5
If you are on a VM / in the cloud check for CPU steal. 
Check the logs for messages from the GCInspector, the ParNew events are times 
the JVM is paused. 
Look at the times dropped messages are logged and try to correlate them with 
other server events. 

If you have a lot of secondary indexes, or a lot of memtables flushing at the same 
time, you may be blocking behind the global Switch Lock. If you use secondary 
indexes, make sure the memtable_flush_queue_size is set correctly; see the 
comments in the yaml file.

If you have a lot of CF's flushing at the same time, and there are no messages 
from the MeteredFlusher, it may be that the commit log segment is too big for the number 
of CF's you have. When a segment needs to be recycled, all dirty CF's are 
flushed; if you have a lot of CF's this can result in blocking around the 
switch lock. Try reducing the commitlog_segment_size_in_mb so that fewer CF's 
are flushed. 

Hope that helps
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/01/2013, at 10:30 AM, Nicolas Lalevée nicolas.lale...@hibnet.org wrote:

 Hi,
 
 I have a strange behavior I am not able to understand.
 
 I have 6 nodes with cassandra-1.0.12. Each node has 8G of RAM. I have a 
 replication factor of 3.
 
 ---
 my story is maybe too long, so I'm trying a shorter version here, while keeping 
 what I wrote below in case someone has the patience to read my bad English ;)
 
 I got into a situation where my cluster was generating a lot of timeouts on 
 our frontend, whereas I could not see any major trouble in the internal 
 stats. Actually cpu and read & write counts on the column families were quite 
 low. It was a mess until I switched from java7 to java6 and forced the use of jamm. 
 After the switch, cpu and read & write counts were going up again, and the timeouts 
 were gone. I have seen this behavior while reducing the xmx too.
 
 What could be blocking cassandra from utilizing the whole resources of the 
 machine? Are there metrics I didn't see which could explain this?
 
 ---
 Here is the long story.
 
 When I first set my cluster up, I blindly gave 6G of heap to the cassandra 
 nodes, thinking that the more memory a java process has, the smoother it runs, 
 while keeping some RAM for the disk cache. We got a new feature deployed, and 
 things went to hell, with some machines up to 60% wa. I give credit to 
 cassandra because there were not that many timeouts received on the web 
 frontend; it was kind of slow but it was kind of working. With some 
 optimizations, we reduced the pressure of the new feature, but it was still 
 at 40% wa.
 
 At that time I didn't have much monitoring, just heap and cpu. I read some 
 articles about tuning, and I learned that the disk cache is quite important 
 because cassandra relies on it as the read cache. So I tried many xmx values, 
 and 3G seemed about the lowest possible. So on 2 of the 6 nodes, I set the xmx 
 to 3.3G. Amazingly, I saw the wa drop to 10%. Quite happy with that, 
 I changed the xmx to 3.3G on each node. But then things really went to hell, with a 
 lot of timeouts on the frontend. It was not working at all. So I rolled back.
 
 After some time, probably because the data of the new feature grew to its 
 nominal size, things went again to very high %wa, and cassandra was not able 
 to keep up. So we kind of reverted the feature; the column family is still 
 used but only by one thread on the frontend. The wa was reduced to 20%, but 
 things continued to not work properly: from time to time, a bunch of 
 timeouts are raised on our frontend.
 
 In the meantime, I took time to do some proper monitoring of cassandra: 
 column family read & write counts, latency, memtable size, but also the 
 dropped messages, the pending tasks, and the timeouts between nodes. It's just a 
 start but it gives me a first nice view of what is actually going on.
 
 I tried again reducing the xmx on one node. Cassandra is not complaining 
 about not having enough heap, memtables are not being flushed insanely every 
 second, the number of reads and writes is reduced compared to the other 
 nodes, the cpu is lower too, there are not many pending tasks, and no more 
 than 1 or 2 messages are dropped from time to time. Everything indicates that 
 there is probably room for more work, but the node doesn't take it. Even its 
 read and write latencies are lower than on the other nodes. But if I keep 
 this xmx long enough, timeouts start to be raised on the frontends.
 After some individual node experiments, the cluster was starting to be quite 
 sick. Even with 6G, the %wa was going down, and read and write counts too, on 
 pretty much every node. And more and more timeouts were raised on the frontend.
 The only thing I could see that was worrying is the heap climbing slowly above 
 the 75% threshold and from time to time suddenly dropping from 95% to 70%. I 
 looked at the full gc counter: not much pressure.
 And 

Re: LCS not removing rows with all TTL expired columns

2013-01-16 Thread aaron morton
Minor compaction (with Size Tiered) will only purge tombstones if all fragments 
of a row are contained in the SSTables being compacted. So if you have a long 
lived row, that is present in many size tiers, the columns will not be purged. 

  (thus compacted) 3 days after all columns for that row had expired
Tombstones have to get on disk, even if you set gc_grace_seconds to 0. If 
not, they do not get a chance to delete previous versions of the column which 
already exist on disk. So when the compaction ran, your ExpiringColumn was 
turned into a DeletedColumn and placed on disk. 

I would expect the next round of compaction to remove these columns. 

There is a new feature in 1.2 that may help you here. It will do a special 
compaction of individual sstables when they have a certain proportion of dead 
columns https://issues.apache.org/jira/browse/CASSANDRA-3442 
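
For reference, a minimal sketch of switching that feature on once on 1.2, using the same thrift execute_cql3_query call style seen elsewhere on this list. The tombstone_threshold sub-option name comes from that ticket; the 0.2 value, the method name, and the assumption that this CF is reachable from CQL3 are mine to verify:

```java
import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Compression;
import org.apache.cassandra.thrift.ConsistencyLevel;

public final class TombstoneCompactionExample {
    // Sketch only: sets the CASSANDRA-3442 tombstone_threshold so a single-SSTable
    // compaction kicks in when an SSTable's estimated droppable-tombstone ratio
    // exceeds 0.2.
    public static void enable(Cassandra.Client conn) throws Exception {
        String alterCql = "ALTER TABLE request_summary WITH compaction = "
                + "{'class': 'SizeTieredCompactionStrategy', 'tombstone_threshold': 0.2}";
        conn.execute_cql3_query(ByteBuffer.wrap(alterCql.getBytes()),
                Compression.NONE, ConsistencyLevel.ONE);
    }
}
```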

Also interested to know if LCS helps. 

Cheers
 

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/01/2013, at 2:55 PM, Bryan Talbot btal...@aeriagames.com wrote:

 According to the timestamps (see original post) the SSTable was written (thus 
 compacted) 3 days after all columns for that row had expired and 6 
 days after the row was created; yet all columns are still showing up in the 
 SSTable.  Note that a get for that key shows no rows, so that part is 
 working correctly, but the data is lugged around far longer 
 than it should be -- maybe forever.
 
 
 -Bryan
 
 
 On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh ailin...@gmail.com wrote:
 To get a column removed you have to meet two requirements: 
 1. the column should be expired
 2. after that, the CF gets compacted
 
 I guess your expired columns have been propagated to the high tier SSTables, which get 
 compacted rarely.
 So you have to wait until the high tier gets compacted.  
 
 Andrey
 
 
 
 On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.com wrote:
 On cassandra 1.1.5 with a write heavy workload, we're having problems getting 
 rows to be compacted away (removed) even though all columns have expired TTL. 
  We've tried size tiered and now leveled and are seeing the same symptom: the 
 data stays around essentially forever.  
 
 Currently we write all columns with a TTL of 72 hours (259200 seconds) and 
 expect to add 10 GB of data to this CF per day per node.  Each node currently 
 has 73 GB for the affected CF and shows no indications that old rows will be 
 removed on their own.
 
 Why aren't rows being removed?  Below is some data from a sample row which 
 should have been removed several days ago but is still around even though it 
 has been involved in numerous compactions since being expired.
 
 $ ./bin/nodetool -h localhost getsstables metrics request_summary 
 459fb460-5ace-11e2-9b92-11d67b6163b4
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 
 $ ls -alF 
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 
 $ ./bin/sstable2json 
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
 {
 "34353966623436302d356163652d313165322d396239322d313164363762363136336234": [
 ["app_name","50f21d3d",1357785277207001,"d"], 
 ["client_ip","50f21d3d",1357785277207001,"d"], 
 ["client_req_id","50f21d3d",1357785277207001,"d"], 
 ["mysql_call_cnt","50f21d3d",1357785277207001,"d"], 
 ["mysql_duration_us","50f21d3d",1357785277207001,"d"], 
 ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"], 
 ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"], 
 ["req_duration_us","50f21d3d",1357785277207001,"d"], 
 ["req_finish_time_us","50f21d3d",1357785277207001,"d"], 
 ["req_method","50f21d3d",1357785277207001,"d"], 
 ["req_service","50f21d3d",1357785277207001,"d"], 
 ["req_start_time_us","50f21d3d",1357785277207001,"d"], 
 ["success","50f21d3d",1357785277207001,"d"]]
 }
 
 
 Decoding the column timestamps shows that the columns were written at 
 "Thu, 10 Jan 2013 02:34:37 GMT" and that their TTL expired at "Sun, 13 Jan 
 2013 02:34:37 GMT".  The date of the SSTable shows that it was generated on 
 Jan 16, which is 3 days after all columns have TTL-ed out.
 
 
 The schema shows that gc_grace is set to 0 since this data is write-once, 
 read-seldom and is never updated or deleted.
 
 create column family request_summary
   with column_type = 'Standard'
   and comparator = 'UTF8Type'
   and default_validation_class = 'UTF8Type'
   and key_validation_class = 'UTF8Type'
   and read_repair_chance = 0.1
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 0
   and min_compaction_threshold = 4
   and max_compaction_threshold = 32
   and replicate_on_write = true
   and compaction_strategy = 
 

Re: Cassandra Consistency problem with NTP

2013-01-16 Thread Russell Haering
One solution is to only read up to (now - 1 second). If this is a public
API where you want to guarantee full consistency (ie, if you have added a
message to the queue, it will definitely appear to be there) you can
instead delay requests for 1 second before reading up to the moment that
the request was received.

In either of these approaches you can tune the time offset based on how
closely synchronized you believe you can keep your clocks. The tradeoff, of
course, will be increased latency.
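
A minimal sketch of that read-horizon idea in plain Java (the names and the
1-second bound are assumptions to tune against how well NTP holds your clocks
together):

```java
import java.util.concurrent.TimeUnit;

public final class ReadHorizon {
    // Assumed upper bound on clock skew between nodes; tune to your NTP accuracy.
    private static final long MAX_CLOCK_SKEW_MICROS = TimeUnit.SECONDS.toMicros(1);

    // Upper bound (microseconds, Cassandra's usual timestamp unit) for a FIFO read:
    // anything written after this horizon is left for the next poll, so a write
    // stamped slightly "in the past" by a lagging node cannot be skipped.
    public static long upperBoundMicros() {
        long nowMicros = TimeUnit.MILLISECONDS.toMicros(System.currentTimeMillis());
        return nowMicros - MAX_CLOCK_SKEW_MICROS;
    }
}
```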


On Wed, Jan 16, 2013 at 5:56 PM, Jason Tang ares.t...@gmail.com wrote:

 Hi

 I am using Cassandra in a message bus solution; the major responsibility
 of cassandra is recording the incoming requests for later consumption.

 One strategy is First In, First Out (FIFO), so I need to get the stored
 requests in reverse order.

 I use NTP to synchronize the system time for the nodes in the cluster (4
 nodes).

 But the local time of each node still has some inaccuracy, around 40 ms.

 The consistency level is ALL for writes and ONE for reads, and the
 replication factor is 3.

 But here is the problem:
 Request A comes to node One at local time 10:00:01.000 PM
 Request B comes to node Two at local time 10:00:00.980 PM

 The correct order is A --> B
 But the timestamp order is B --> A

 So is there any way for Cassandra to keep the correct order for read
 operations? (e.g. a logical timestamp?)

 Or does Cassandra strongly depend on the time synchronization solution?

 BRs
 //Tang







Re: Cassandra Consistency problem with NTP

2013-01-16 Thread Jason Tang
A delayed read is acceptable, but the problem is still there:
Request A comes to node One at local time 10:00:01.000 PM
Request B comes to node Two at local time 10:00:00.980 PM

The correct order is A --> B.
I am not sure how node C will handle the data: although A came before B,
B's timestamp is earlier than A's.



2013/1/17 Russell Haering russellhaer...@gmail.com

 One solution is to only read up to (now - 1 second). If this is a public
 API where you want to guarantee full consistency (ie, if you have added a
 message to the queue, it will definitely appear to be there) you can
 instead delay requests for 1 second before reading up to the moment that
 the request was received.

 In either of these approaches you can tune the time offset based on how
 closely synchronized you believe you can keep your clocks. The tradeoff, of
 course, will be increased latency.


 On Wed, Jan 16, 2013 at 5:56 PM, Jason Tang ares.t...@gmail.com wrote:

 Hi

 I am using Cassandra in a message bus solution; the major responsibility
 of cassandra is recording the incoming requests for later consumption.

 One strategy is First In, First Out (FIFO), so I need to get the stored
 requests in reverse order.

 I use NTP to synchronize the system time for the nodes in the cluster (4
 nodes).

 But the local time of each node still has some inaccuracy, around 40 ms.

 The consistency level is ALL for writes and ONE for reads, and the
 replication factor is 3.

 But here is the problem:
 Request A comes to node One at local time 10:00:01.000 PM
 Request B comes to node Two at local time 10:00:00.980 PM

 The correct order is A --> B
 But the timestamp order is B --> A

 So is there any way for Cassandra to keep the correct order for read
 operations? (e.g. a logical timestamp?)

 Or does Cassandra strongly depend on the time synchronization solution?

 BRs
 //Tang








Re: error when creating column family using cql3 and persisting data using thrift

2013-01-16 Thread Kuldeep Mishra
Hi Aaron,
I am using the thrift client.

Here is column family creation script:-
```
String colFamily = "CREATE COLUMNFAMILY users (key varchar PRIMARY KEY, "
        + "full_name varchar, birth_date int, state varchar)";
conn.execute_cql3_query(ByteBuffer.wrap(colFamily.getBytes()),
        Compression.NONE, ConsistencyLevel.ONE);
```

and thrift operation code :-

```
Cassandra.Client conn;

Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
        new HashMap<ByteBuffer, Map<String, List<Mutation>>>();

List<Mutation> insertion_list = new ArrayList<Mutation>();

Mutation mut = new Mutation();
Column column = new Column(ByteBuffer.wrap("full_name".getBytes()));
column.setValue(ByteBuffer.wrap("emp".getBytes()));
mut.setColumn_or_supercolumn(new ColumnOrSuperColumn().setColumn(column));
insertion_list.add(mut);

Map<String, List<Mutation>> columnFamilyValues = new HashMap<String, List<Mutation>>();
columnFamilyValues.put("users", insertion_list);

mutationMap.put(ByteBuffer.wrap("K".getBytes()), columnFamilyValues);

conn.batch_mutate(mutationMap, ConsistencyLevel.ONE);
```

and error stack trace :-
```
InvalidRequestException(why:Not enough bytes to read value of component 0)
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20833)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
        at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
```
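
As Aaron points out below, CQL3 tables use composite comparators internally,
which is what the "component 0" validation is complaining about. A commonly
suggested workaround, sketched here with the same conn as above (verify it
fits your data model before relying on it), is to declare the table WITH
COMPACT STORAGE so thrift sees plain, non-composite column names:

```java
// Sketch of a thrift-compatible variant of the same table: COMPACT STORAGE keeps
// the comparator non-composite, so batch_mutate with plain column names validates.
// (Assumes the original table is dropped first or a different name is used.)
String compactColFamily = "CREATE COLUMNFAMILY users (key varchar PRIMARY KEY, "
        + "full_name varchar, birth_date int, state varchar) WITH COMPACT STORAGE";
conn.execute_cql3_query(ByteBuffer.wrap(compactColFamily.getBytes()),
        Compression.NONE, ConsistencyLevel.ONE);
```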



Thanks
Kuldeep Mishra

On Thu, Jan 17, 2013 at 8:40 AM, aaron morton aa...@thelastpickle.comwrote:

 The thrift request is not sending a composite type where it should. CQL 3
 uses composites in a lot of places.

 What was your table definition?

 Are you using a high level client or rolling your own?

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 16/01/2013, at 5:32 AM, James Schappet jschap...@gmail.com wrote:

 I also saw this while testing the
 https://github.com/boneill42/naughty-or-nice example project.




 --Jimmy


 From: Kuldeep Mishra kuld.cs.mis...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Tuesday, January 15, 2013 10:29 AM
 To: user@cassandra.apache.org
 Subject: error when creating column family using cql3 and persisting data
 using thrift

 Hi,
 I am facing the following problem when creating a column family using cql3 and
 trying to persist data using thrift 1.2.0
 in cassandra-1.2.0.

 Details:
 InvalidRequestException(why:Not enough bytes to read value of component 0)
 at
 org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20833)
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
 at
 org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
 at
 org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
 at
 com.impetus.client.cassandra.thrift.ThriftClient.onPersist(ThriftClient.java:157)



 Please help me.


 --
 Thanks and Regards
 Kuldeep Kumar Mishra
 +919540965199





-- 
Thanks and Regards
Kuldeep Kumar Mishra
+919540965199


Re: Cassandra Consistency problem with NTP

2013-01-16 Thread Sylvain Lebresne
I'm not sure I fully understand your problem. You seem to be talking about
ordering the requests in the order they are generated. But in that case,
you will rely on the ordering of columns within whatever row you store
requests A and B in, and that order depends on the column names, which in
turn are client provided and don't depend at all on the time
synchronization of the cluster nodes. And since you are able to say that
request A comes before B, I suppose this means said requests are generated
from the same source. In which case you just need to make sure that the
column names storing each request respect the correct ordering.

The column timestamps Cassandra uses are there to decide which update *to the
same column* is the more recent one. So they only come into play if your
requests A and B update the same column and you're interested in knowing which
one of the updates will win when you read. But even if that's your case (which
doesn't sound like it at all from your description), the column timestamp
is only generated server side if you use CQL. And even in that latter case,
it's a convenience and you can force a timestamp client side if you really
wish. In other words, Cassandra's dependency on time synchronization is not a
strong one even in that case. But again, that doesn't seem at all to be the
problem you are trying to solve.
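
For what it's worth, a minimal sketch of supplying that client-side timestamp
over thrift (the class and method names here are illustrative, not from this
thread):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;
import org.apache.cassandra.thrift.Column;

public final class ClientTimestamps {
    // Over thrift the write timestamp is whatever the client supplies, by
    // convention in microseconds. It only decides which write wins when two
    // updates hit the same column; it does not order different columns or rows.
    public static Column column(String name, byte[] value) {
        Column column = new Column(ByteBuffer.wrap(name.getBytes()));
        column.setValue(ByteBuffer.wrap(value));
        column.setTimestamp(TimeUnit.MILLISECONDS.toMicros(System.currentTimeMillis()));
        return column;
    }
}
```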

--
Sylvain


On Thu, Jan 17, 2013 at 2:56 AM, Jason Tang ares.t...@gmail.com wrote:

 Hi

 I am using Cassandra in a message bus solution; the major responsibility
 of cassandra is recording the incoming requests for later consumption.

 One strategy is First In, First Out (FIFO), so I need to get the stored
 requests in reverse order.

 I use NTP to synchronize the system time for the nodes in the cluster (4
 nodes).

 But the local time of each node still has some inaccuracy, around 40 ms.

 The consistency level is ALL for writes and ONE for reads, and the
 replication factor is 3.

 But here is the problem:
 Request A comes to node One at local time 10:00:01.000 PM
 Request B comes to node Two at local time 10:00:00.980 PM

 The correct order is A --> B
 But the timestamp order is B --> A

 So is there any way for Cassandra to keep the correct order for read
 operations? (e.g. a logical timestamp?)

 Or does Cassandra strongly depend on the time synchronization solution?

 BRs
 //Tang







Re: Cassandra Consistency problem with NTP

2013-01-16 Thread Jason Tang
Yes, Sylvain, you are correct.
When I say A comes before B, it means the client ensures the order;
actually, B will be sent only after getting the response to request A.

And yes, A and B do not update the same record, so it is not a typical
Cassandra consistency problem.

And yes, the column name is provided by the client, and right now I use the
local timestamp; the local times of A and B are not synchronized well, so I
have a problem.

So what I want is for Cassandra to provide some information to the client to
indicate that A is stored before B, e.g. a globally unique timestamp or row
order.
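
Cassandra itself won't hand back a global sequence, but since the same client
already serializes A before B (B only goes out after A's response), that
client can mint the ordering key itself and use it as the column name. A
minimal sketch under that single-producer assumption (the class name and bit
layout are mine):

```java
import java.util.concurrent.atomic.AtomicLong;

public final class OrderingKey {
    private static final AtomicLong SEQ = new AtomicLong();

    // High bits: the issuing client's clock in milliseconds; low 20 bits: a local
    // sequence (about a million distinct values per millisecond). Because the same
    // clock and counter stamp both A and B, A always gets the smaller key no matter
    // which Cassandra node each write lands on.
    public static long next() {
        long millis = System.currentTimeMillis();
        long seq = SEQ.incrementAndGet() & 0xFFFFF;
        return (millis << 20) | seq;
    }
}
```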




2013/1/17 Sylvain Lebresne sylv...@datastax.com

 I'm not sure I fully understand your problem. You seem to be talking about
 ordering the requests in the order they are generated. But in that case,
 you will rely on the ordering of columns within whatever row you store
 requests A and B in, and that order depends on the column names, which in
 turn are client provided and don't depend at all on the time
 synchronization of the cluster nodes. And since you are able to say that
 request A comes before B, I suppose this means said requests are generated
 from the same source. In which case you just need to make sure that the
 column names storing each request respect the correct ordering.

 The column timestamps Cassandra uses are there to decide which update *to the
 same column* is the more recent one. So they only come into play if your
 requests A and B update the same column and you're interested in knowing which
 one of the updates will win when you read. But even if that's your case (which
 doesn't sound like it at all from your description), the column timestamp
 is only generated server side if you use CQL. And even in that latter case,
 it's a convenience and you can force a timestamp client side if you really
 wish. In other words, Cassandra's dependency on time synchronization is not a
 strong one even in that case. But again, that doesn't seem at all to be the
 problem you are trying to solve.

 --
 Sylvain


 On Thu, Jan 17, 2013 at 2:56 AM, Jason Tang ares.t...@gmail.com wrote:

 Hi

 I am using Cassandra in a message bus solution; the major responsibility
 of cassandra is recording the incoming requests for later consumption.

 One strategy is First In, First Out (FIFO), so I need to get the stored
 requests in reverse order.

 I use NTP to synchronize the system time for the nodes in the cluster (4
 nodes).

 But the local time of each node still has some inaccuracy, around 40 ms.

 The consistency level is ALL for writes and ONE for reads, and the
 replication factor is 3.

 But here is the problem:
 Request A comes to node One at local time 10:00:01.000 PM
 Request B comes to node Two at local time 10:00:00.980 PM

 The correct order is A --> B
 But the timestamp order is B --> A

 So is there any way for Cassandra to keep the correct order for read
 operations? (e.g. a logical timestamp?)

 Or does Cassandra strongly depend on the time synchronization solution?

 BRs
 //Tang