Re: Astyanax returns empty row
I ran the tests with only one machine, so CL_ONE is not the problem. Am I right?

2013/1/15 Hiller, Dean dean.hil...@nrel.gov:
What is your consistency level set to? If you set it to CL_ONE you could get different results. Or is your database constant and unchanging?
Dean

From: Sávio Teles savio.te...@lupa.inf.ufg.br
Reply-To: user@cassandra.apache.org
Date: Tuesday, January 15, 2013 5:43 AM
To: user@cassandra.apache.org
Subject: Astyanax returns empty row sometimes

Astyanax returns an empty row for a specific key. For example, on the first attempt Astyanax returns an empty row for a specific key, but on the second attempt it returns the desired row.

--
Best regards,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
MSc student in Computer Science - UFG
Software Architect
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
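For reference, a read at a stronger consistency level in Astyanax looks roughly like the sketch below (a minimal, untested sketch; the column family name and key are placeholders, and keyspace is assumed to be an already-initialized Astyanax Keyspace):

    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.model.ColumnList;
    import com.netflix.astyanax.model.ConsistencyLevel;
    import com.netflix.astyanax.serializers.StringSerializer;

    public class QuorumReadExample {
        static ColumnList<String> readRow(Keyspace keyspace) throws ConnectionException {
            // Placeholder column family definition.
            ColumnFamily<String, String> cf = new ColumnFamily<String, String>(
                    "MyCF", StringSerializer.get(), StringSerializer.get());
            // Read one row at CL_QUORUM instead of the default CL_ONE.
            return keyspace.prepareQuery(cf)
                    .setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
                    .getKey("some-key")
                    .execute()
                    .getResult();
        }
    }

On a single node CL_ONE and CL_QUORUM behave the same, which is consistent with the observation above that the consistency level is not the culprit.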
Re: How many BATCH inserts is too many?
Thanks all for the clarification and your views. Queues and async are definitely the way to go. Anyway, I'll take the pull+aggregate approach for now; it should work better for a start. (If someone has the same "follows" app problem, there is a great piece of research: http://research.yahoo.com/files/sigmod278-silberstein.pdf)

Regards,
Alan Ristić
w: www.microhint.com
f: facebook.com/microhint
t: twitter.com/microhint_com
m: 040 423 688

2013/1/14 Vitalii Tymchyshyn tiv...@gmail.com:
Well, for me it was better to use async operations than batches. That way you are not bitten by latency, but can control everything per-operation. You will need to support a kind of window, though. But this window can be quite low, like 10-20 ops.

2013/1/14 Wei Zhu wz1...@yahoo.com:
Another potential issue is when some failure happens to some of the mutations. Are atomic batches in 1.2 designed to resolve this? http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
-Wei

----- Original Message -----
From: aaron morton aa...@thelastpickle.com
To: user@cassandra.apache.org
Sent: Sunday, January 13, 2013 7:57:56 PM
Subject: Re: How many BATCH inserts is too many?

With regard to a large number of records in a batch mutation, there are some potential issues. Each row becomes a task in the write thread pool on each replica. If a single client sends 1,000 rows in a mutation, it will take time for the (default) 32 threads in the write pool to work through the mutations. While they are doing this, other clients / requests will appear to be starved / stalled. There are also issues with the max message size in Thrift and CQL over Thrift. IMHO, as a rule of thumb, don't go over a few hundred if you have a high number of concurrent writers.

Cheers
-----
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 14/01/2013, at 12:56 AM, Radim Kolar h...@filez.com wrote:
Do not use Cassandra for implementing a queueing system with high throughput. It does not scale because of tombstone management. Use HornetQ; it's an amazingly fast broker, but it has quite slow persistence if you want to create queues significantly larger than your memory and use selectors for searching for specific messages in them. My point is: for implementing a queue, a message broker is what you want.
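Vitalii's async-with-window approach can be sketched as follows (a hypothetical illustration: AsyncClient and insertAsync are placeholders for whatever asynchronous call your driver offers, and the future type is assumed to be Guava's ListenableFuture):

    import java.util.List;
    import java.util.concurrent.Executor;
    import java.util.concurrent.Semaphore;
    import com.google.common.util.concurrent.ListenableFuture;

    public class WindowedWriter {
        // Placeholder for a driver that exposes asynchronous single-row writes.
        interface AsyncClient {
            ListenableFuture<Void> insertAsync(Object mutation);
        }

        static void writeWithWindow(AsyncClient client, List<Object> mutations,
                                    Executor executor) throws InterruptedException {
            final Semaphore window = new Semaphore(20);      // 10-20 in-flight ops, as suggested
            for (Object m : mutations) {
                window.acquire();                            // blocks while the window is full
                client.insertAsync(m).addListener(new Runnable() {
                    public void run() { window.release(); }  // free a slot on completion
                }, executor);
            }
            window.acquire(20);                              // drain: wait for outstanding writes
            window.release(20);
        }
    }

This keeps error handling per-operation (each future can be inspected individually) while bounding the number of outstanding requests.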
read path, I have missed something
Hi, I am trying to understand the read path in Cassandra. I've read Cassandra's documentation and it seems that the read path is like this:

- The client contacts a proxy node, which performs the operation over a certain object.
- The proxy node sends requests to every replica of that object.
- Replica nodes answer eventually, if they are up.
- After the first R replicas answer, the proxy node returns the value to the client.
- If some of the replicas are not up to date and read repair is active, the proxy node updates those replicas.

Ok, so far so good. But now I found some inconsistencies that I don't understand. Let's suppose that we have a 5-node cluster: x1, x2, x3, x4 and x5, with replication factor 3, read_repair_chance=0.0, autobootstrap=false and caching=NONE. We have keyspace KS1 and column family CF1. With this configuration, we know that if any node crashes and erases its data directories, it will be necessary to run nodetool repair on that node in order to repair it and gather information from its replica companions.

So, let's suppose that x1, x2 and x3 are the endpoints which store the data KS1.CF1['data1']. If x1 crashes (losing all its data), and we execute get KS1.CF1['data1'] with consistency level ALL, the operation will fail. That is ok to my understanding. If we restart the x1 node, don't execute nodetool repair, and repeat the operation get KS1.CF1['data1'] using consistency ALL, we will obtain the original data! Why? One of the nodes doesn't have any data about KS1.CF1['data1']. Ok, let's suppose that as all the required nodes answer, even if one doesn't have data, the operation ends correctly. Now let's repeat the same procedure with the rest of the nodes, that is:

1- stop x1; erase data, logs, cache and commitlog from x1
2- restart x1 and don't repair it
3- stop x2; erase data, logs, cache and commitlog from x2
4- restart x2 and don't repair it
5- stop x3; erase data, logs, cache and commitlog from x3
6- restart x3 and don't repair it
7- execute get KS1.CF1['data1'] with consistency level ALL - it still returns the correct data!

Where did that data come from? The endpoint is supposed to be empty of data. I tried this using cassandra-cli and Cassandra's Ruby client and the result is always the same. What did I miss?

Thank you for reading until the end ;)

Bye

Carlos Pérez Miguel
Pig / Map Reduce on Cassandra
Hi, I know that the DataStax Enterprise package provides Brisk, but is there a community version? Is it easy to interface Hadoop with Cassandra as the storage, or do we absolutely have to use Brisk for that? I know CassandraFS is natively available in Cassandra 1.2, the version I use, so is there a way/procedure to interface Hadoop with Cassandra as the storage?

Thanks
Re: Astyanax returns empty row
We have multiple clients reading the same row key, and it makes no sense for the read to fail on only one machine. When we use Thrift, Cassandra always returns the correct result.

2013/1/16 Sávio Teles savio.te...@lupa.inf.ufg.br:
I ran the tests with only one machine, so CL_ONE is not the problem. Am I right?

--
Best regards,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
MSc student in Computer Science - UFG
Software Architect
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
Re: Pig / Map Reduce on Cassandra
Here are a few examples I have worked on, reading from xml.gz files and then writing to Cassandra: https://github.com/jschappet/medline

You will also need: https://github.com/jschappet/medline-base

These examples are Hadoop jobs using Cassandra as the data store. This one is a good place to start: https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/LoadMedline/StartJob.java

    ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
    ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, outputPath);
    job.setMapperClass(MapperToCassandra.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    LOG.info("Writing output to Cassandra");
    //job.setReducerClass(ReducerToCassandra.class);
    job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
    ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
    //org.apache.cassandra.dht.LocalPartitioner
    ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
    ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");

On 1/16/13 7:37 AM, cscetbon@orange.com wrote:
Hi, I know that the DataStax Enterprise package provides Brisk, but is there a community version? Is it easy to interface Hadoop with Cassandra as the storage, or do we absolutely have to use Brisk for that? I know CassandraFS is natively available in Cassandra 1.2, the version I use, so is there a way/procedure to interface Hadoop with Cassandra as the storage? Thanks
Re: Pig / Map Reduce on Cassandra
I don't want to write to Cassandra, as it replicates data from another datacenter; I just want to use Hadoop jobs (Pig and Hive) to read data from it. I would like to use the same configuration as http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-cluster but I want to know if there are alternatives to the DataStax Enterprise package.

Thanks

On Jan 16, 2013, at 3:59 PM, James Schappet jschap...@gmail.com wrote:
Here are a few examples I have worked on, reading from xml.gz files and then writing to Cassandra: https://github.com/jschappet/medline [...]
Re: Pig / Map Reduce on Cassandra
Try this one then; it reads from Cassandra, then writes back to Cassandra, but you could change the write to wherever you would like: https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/ProcessXml/StartJob.java

    getConf().set(IN_COLUMN_NAME, columnName);
    Job job = new Job(getConf(), "ProcessRawXml");
    job.setInputFormatClass(ColumnFamilyInputFormat.class);
    job.setNumReduceTasks(0);
    job.setJarByClass(StartJob.class);
    job.setMapperClass(ParseMapper.class);
    job.setOutputKeyClass(ByteBuffer.class);
    //job.setOutputValueClass(Text.class);
    job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
    ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
    job.setInputFormatClass(ColumnFamilyInputFormat.class);
    ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
    //org.apache.cassandra.dht.LocalPartitioner
    ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
    ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
    ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
    SlicePredicate predicate = new SlicePredicate()
            .setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
    // SliceRange slice_range = new SliceRange();
    // slice_range.setStart(ByteBufferUtil.bytes(startPoint));
    // slice_range.setFinish(ByteBufferUtil.bytes(endPoint));
    // predicate.setSlice_range(slice_range);
    ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);
    job.waitForCompletion(true);

On 1/16/13 9:22 AM, cscetbon@orange.com wrote:
I don't want to write to Cassandra, as it replicates data from another datacenter; I just want to use Hadoop jobs (Pig and Hive) to read data from it. I would like to use the same configuration as http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-cluster but I want to know if there are alternatives to the DataStax Enterprise package. [...]
Cassandra 1.1.2 -> 1.1.8 upgrade
Hello,

We are looking to upgrade our Cassandra cluster from 1.1.2 to 1.1.8 (or possibly 1.1.9 depending on timing). It is my understanding that rolling upgrades of Cassandra are supported, so as we upgrade our cluster, we can do so one node at a time without experiencing downtime. Has anyone hit any gotchas recently that I should be aware of before performing this upgrade? In order to upgrade, are the JAR files the only thing that needs to change? Can everything else remain as-is?

Thanks,
-Mike
Re: read path, I have missed something
You're missing the correct definition of read_repair_chance.

When you do a read at CL.ALL, all replicas are waited upon and the results from all those replicas are compared. From that, we can extract which nodes are not up to date, i.e. which ones can be read repaired. And if some node needs to be repaired, we do it. Always, whatever the value of read_repair_chance is.

Now if you do a read at CL.ONE, and you only end up querying 1 replica, you will never be able to do read repair. That's where read_repair_chance comes into play. What it really controls is how often we query *more* replicas than strictly required by the consistency level. And it happens that the reason you would want to do that is read repair, hence the option name. But read repair potentially kicks in any time more than one replica answers a query. One corollary is that read_repair_chance has no impact whatsoever at CL.ALL.

--
Sylvain

On Wed, Jan 16, 2013 at 1:55 PM, Carlos Pérez Miguel cperez...@gmail.com wrote:
Hi, I am trying to understand the read path in Cassandra. [...]
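A rough sketch of the coordinator-side decision Sylvain describes (a simplified illustration, not Cassandra's actual code):

    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;

    public class ReadRepairChanceSketch {
        public static void main(String[] args) {
            List<String> replicas = Arrays.asList("x1", "x2", "x3"); // RF = 3
            double readRepairChance = 0.1;
            int blockFor = 1;                       // CL.ONE waits for 1 replica
            boolean queryAll = new Random().nextDouble() < readRepairChance;
            List<String> contacted = queryAll ? replicas : replicas.subList(0, blockFor);
            // Responses from every *contacted* replica are compared and stale
            // ones repaired. At CL.ALL, blockFor == replicas.size(), so the
            // chance value never changes which replicas are contacted.
            System.out.println("querying " + contacted);
        }
    }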
Re: Cassandra 1.1.2 -> 1.1.8 upgrade
Always check NEWS.txt. For instance, for Cassandra 1.1.3 you need to run nodetool upgradesstables if your CF has counters.

On Wed, Jan 16, 2013 at 11:58 PM, Mike mthero...@yahoo.com wrote:
Hello, we are looking to upgrade our Cassandra cluster from 1.1.2 to 1.1.8 (or possibly 1.1.9 depending on timing). [...]
Re: Cassandra 1.1.2 -> 1.1.8 upgrade
Thanks for pointing that out. Given upgradesstables can only be run on a live node, does anyone know if there is a danger in having this node in the cluster while this is being performed? Also, can anyone confirm this only needs to be done on counter column families, or on all column families? (The former makes sense, I'm just making sure.)

-Mike

On 1/16/2013 11:08 AM, Jason Wee wrote:
Always check NEWS.txt. For instance, for Cassandra 1.1.3 you need to run nodetool upgradesstables if your CF has counters. [...]
Re: read path, I have missed something
Ah, ok. Now I understand where the data came from. When using CL.ALL, read repair always repairs inconsistent data. Thanks a lot, Sylvain.

Carlos Pérez Miguel

2013/1/17 Sylvain Lebresne sylv...@datastax.com:
You're missing the correct definition of read_repair_chance. [...]
Re: Cassandra 1.1.2 -> 1.1.8 upgrade
upgradesstables is safe, but it is essentially compaction (because sstables are immutable, it rewrites the sstable in the new format), so you'll want to do it when traffic is low to avoid IO issues.

upgradesstables always needs to be done between majors. While 1.1.2 -> 1.1.8 is not a major, due to an unforeseen bug in the conversion to microseconds you'll need to run upgradesstables. You can check whether all of your sstables have been upgraded by looking at the file names: your files should be -hd-; 1.1.8 will be -hf-.

I don't remember there being changes to cassandra.yaml between 1.1.2 and 1.1.7, but you might want to check out a clean copy and compare it to your yaml to make sure none of the recommended defaults have changed and no new config options are required.

Otherwise: nodetool drain, stop the service, upgrade to the new release, start the service, upgradesstables. Hinted handoff should take care of anything while the node is down, but if you want to be extra safe you can do a repair -pr on every node. You should be doing that on a regular basis anyways!

Hope this helps.

Best,
michael

From: Mike mthero...@yahoo.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, January 16, 2013 8:15 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra 1.1.2 -> 1.1.8 upgrade

Thanks for pointing that out. Given upgradesstables can only be run on a live node, does anyone know if there is a danger in having this node in the cluster while this is being performed? [...]
Re: read path, I have missed something
Hi there, I am sorry to jump into this thread with more questions, but isn't the gossip protocol in charge of performing the read repair automatically any time a new node comes into the ring? I mean, if a node is down and then we get that node up and running again, wouldn't it be synchronized automatically?

Thanks!

Renato M.

2013/1/16 Carlos Pérez Miguel cperez...@gmail.com:
Ah, ok. Now I understand where the data came from. When using CL.ALL, read repair always repairs inconsistent data. Thanks a lot, Sylvain. [...]
Re: How can OpsCenter show me Read Request Latency when there are no read requests?
When you view OpsCenter metrics, you're generating a small number of reads to fetch the metric data, which is why your read count is near zero instead of actually being zero. Since reads are still occurring, Cassandra will continue to show a read latency. Basically, you're just viewing the latency of the reads that fetch metric data.

Normally the number of reads required to view metrics is small enough that they make only a minor difference in your overall read latency average, but when you have no other reads occurring, they're the only reads included in the average.

On Tue, Jan 15, 2013 at 9:28 PM, Brian Tarbox tar...@cabotresearch.com wrote:
I am making heavy use of DataStax OpsCenter to help tune my system and it's great. And yet puzzling. I see my clients do a burst of reads, causing the OpsCenter Read Requests chart to go up and stay up until the clients finish doing their reads. The read request latency chart also goes up, but it stays up even after all the reads are done. At last glance I've had next to zero reads for 10 minutes but still have a read request latency that's basically unchanged from when there were actual reads. How am I to interpret this? Thanks.

Brian Tarbox

--
Tyler Hobbs
DataStax
http://datastax.com/
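To make the averaging effect concrete, here is a toy calculation (all request counts and latencies below are made-up numbers):

    public class LatencyAverage {
        public static void main(String[] args) {
            // Busy period: 10,000 app reads at 800 us plus 20 metric reads at 3,000 us.
            double busy = (10000 * 800.0 + 20 * 3000.0) / (10000 + 20);
            // Idle period: only the 20 metric reads enter the average.
            double idle = (20 * 3000.0) / 20;
            System.out.printf("busy avg = %.0f us, idle avg = %.0f us%n", busy, idle);
            // busy avg ~= 804 us (metric reads barely matter); idle avg = 3,000 us.
        }
    }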
Re: read path, I have missed something
I mean if a node is down, then we get that node up and running again, wouldn't it be synchronized automatically?

It will, thanks to hinted handoff (not gossip; gossip only handles the ring topology and a bunch of metadata, it doesn't deal with data synchronization at all). But hinted handoff is not bulletproof (if only because hints expire after some time if they are not delivered). And you're right, that's probably why Carlos' example worked as he observed it, especially since he didn't mention reads between his stop/erase/restart steps.

Anyway, my description of read_repair_chance is still correct if anyone wonders about that :)

--
Sylvain

2013/1/16 Carlos Pérez Miguel cperez...@gmail.com:
Ah, ok. Now I understand where the data came from. When using CL.ALL, read repair always repairs inconsistent data. Thanks a lot, Sylvain. [...]
Re: How can OpsCenter show me Read Request Latency when there are no read requests?
Hmm, that makes sense, but then why is the latency for the reads that fetch the metrics often so high (several thousand us), and why does it so closely track the latency of my normal reads?

On Wed, Jan 16, 2013 at 12:14 PM, Tyler Hobbs ty...@datastax.com wrote:
When you view OpsCenter metrics, you're generating a small number of reads to fetch the metric data, which is why your read count is near zero instead of actually being zero. [...]
Re: read path, I have missed something
Thanks for the explanation, Sylvain!

2013/1/16 Sylvain Lebresne sylv...@datastax.com:
It will, thanks to hinted handoff (not gossip; gossip only handles the ring topology and a bunch of metadata, it doesn't deal with data synchronization at all). [...]
trying to use row_cache (b/c we have hot rows) but nodetool info says zero requests
We have quite wide rows and do a lot of concentrated processing on each row, so I thought I'd try the row cache on one node in my cluster to see if I could detect an effect from using it. The problem is that nodetool info says that even with a two-gig row cache we're getting zero requests. Since my client program is actively processing, and since the key cache shows lots of activity, I'm puzzled. Shouldn't any read of a column cause the entire row to be loaded? My entire data file is only 32 gig right now, so it's hard to imagine that 2 gig is too small to hold even a single row. Any suggestions on how to proceed are appreciated. Thanks.

Brian Tarbox
unsubscribe
Leonid Ilyevsky
Moon Capital Management, LP
499 Park Avenue, New York, NY 10022
P: (212) 652-4586 F: (212) 652-4501
E: lilyev...@mooncapital.com
Re: unsubscribe
Writing to the list: user@cassandra.apache.org
Subscription address: user-subscr...@cassandra.apache.org
Digest subscription address: user-digest-subscr...@cassandra.apache.org
Unsubscription address: user-unsubscr...@cassandra.apache.org
Getting help with the list: user-h...@cassandra.apache.org

From: Leonid Ilyevsky lilyev...@mooncapital.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, January 16, 2013 10:39 AM
To: user@cassandra.apache.org
Subject: unsubscribe
Re: Pig / Map Reduce on Cassandra
Brisk is pretty much stagnant. I think someone forked it to work with 1.0, but I'm not sure how that is going. You'll need to pay for DSE to get CFS (which is essentially Brisk) if you want to use any modern version of C*.

Best,
Michael

On 1/16/13 11:17 AM, cscetbon@orange.com wrote:
Thanks. I understand that your code uses the Hadoop interface of Cassandra to be able to read from it with a job. However, I would like to know how to bring the pieces (Hive + Pig + Hadoop) together with Cassandra as the storage layer, not to get code to test it. I have found the repository https://github.com/riptano/brisk which might be a good start for it.

Regards
LCS not removing rows with all TTL expired columns
On Cassandra 1.1.5 with a write-heavy workload, we're having problems getting rows to be compacted away (removed) even though all columns have expired TTLs. We've tried size-tiered and now leveled compaction and are seeing the same symptom: the data stays around essentially forever. Currently we write all columns with a TTL of 72 hours (259200 seconds) and expect to add 10 GB of data to this CF per day per node. Each node currently has 73 GB for the affected CF and shows no indication that old rows will be removed on their own.

Why aren't rows being removed? Below is some data from a sample row which should have been removed several days ago but is still around, even though it has been involved in numerous compactions since being expired.

$ ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$ ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
-rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$ ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
{
"34353966623436302d356163652d313165322d396239322d313164363762363136336234": [["app_name","50f21d3d",1357785277207001,"d"], ["client_ip","50f21d3d",1357785277207001,"d"], ["client_req_id","50f21d3d",1357785277207001,"d"], ["mysql_call_cnt","50f21d3d",1357785277207001,"d"], ["mysql_duration_us","50f21d3d",1357785277207001,"d"], ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"], ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"], ["req_duration_us","50f21d3d",1357785277207001,"d"], ["req_finish_time_us","50f21d3d",1357785277207001,"d"], ["req_method","50f21d3d",1357785277207001,"d"], ["req_service","50f21d3d",1357785277207001,"d"], ["req_start_time_us","50f21d3d",1357785277207001,"d"], ["success","50f21d3d",1357785277207001,"d"]]
}

Decoding the column timestamps shows that the columns were written at Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan 2013 02:34:37 GMT. The date of the SSTable shows that it was generated on Jan 16, which is 3 days after all columns TTL-ed out. The schema shows that gc_grace is set to 0, since this data is write-once, read-seldom and is never updated or deleted.

create column family request_summary
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 0
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'NONE'
  and bloom_filter_fp_chance = 1.0
  and compression_options = {'chunk_length_kb' : '64', 'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};

Thanks in advance for help in understanding why rows such as this are not removed!

-Bryan
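For reference, the timestamp decoding above can be reproduced with a few lines (a small sketch; column timestamps are microseconds since the epoch, and java.util.Date prints in the local time zone):

    public class DecodeTimestamp {
        public static void main(String[] args) {
            long writeMicros = 1357785277207001L;   // column timestamp from sstable2json
            long ttlSeconds = 259200;               // the 72-hour TTL
            System.out.println(new java.util.Date(writeMicros / 1000));
            System.out.println(new java.util.Date(writeMicros / 1000 + ttlSeconds * 1000));
            // ~Thu Jan 10 02:34:37 GMT 2013 (write) and ~Sun Jan 13 02:34:37 GMT 2013 (expiry)
        }
    }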
Re: Query column names
What I meant is: is there a way of doing this but using Hector? -

public static void main(String[] args) throws Exception {
    Connector conn = new Connector();
    Cassandra.Client client = conn.connect();
    SlicePredicate predicate = new SlicePredicate();
    List<ByteBuffer> colNames = new ArrayList<ByteBuffer>();
    colNames.add(ByteBuffer.wrap("a".getBytes()));
    colNames.add(ByteBuffer.wrap("b".getBytes()));
    predicate.column_names = colNames;
    ColumnParent parent = new ColumnParent("Standard1");
    ByteBuffer key = ByteBuffer.wrap("k1".getBytes());
    List<ColumnOrSuperColumn> results = client.get_slice(key, parent, predicate, ConsistencyLevel.ONE);
    for (ColumnOrSuperColumn cosc : results) {
        Column c = cosc.column;
        System.out.println(new String(c.getName(), "UTF-8") + " : " + new String(c.getValue(), "UTF-8"));
    }
    conn.close();
    System.out.println("All done.");
}

- Thanks! 2013/1/16 Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com: Hi, I am facing some problems while retrieving some events from a column family. I am using as column name the event name plus the timestamp of when it occurred. The thing is that now I want to find out the latest event, and I don't know how to ask for the last event without a RangeSlicesQuery, getting all rows and columns, and checking one by one. Is there any other better way of doing this using the Hector client?

[default@clickstream] list click_event;
---
RowKey: 706d63666164696e3a31396132613664322d633730642d343139362d623638642d396663663638343766333563
=> (column=start:2013-01-13 18:14:59.244, value=, timestamp=1358118943979000)
=> (column=stop:2013-01-13 18:15:56.793, value=323031332d30312d31332031383a31353a35382e333437, timestamp=1358118960946000)

Thanks in advance! Renato M.
Re: AWS EMR - Cassandra
William, I just saw your message today. I am using Cassandra + Amazon EMR (hadoop 1.0.3) but I am not using PIG as you are. I set my configuration vars in Java, as I have a custom jar file and I am using ColumnFamilyInputFormat. However, if I understood your problem well, the only thing you have to do is to set environment vars when running cluster tasks, right? Take a look at this link: http://sujee.net/tech/articles/hadoop/amazon-emr-beyond-basics/ As it shows, you can run EMR setting some command line arguments that specify a script to be executed before the job starts, in each machine in the cluster. This way, you would be able to correctly set the vars you need. Out of curiosity, could you share what you are using for cassandra storage? I am currently using EC2 local disks, but I am looking for an alternative. Best regards, Marcelo. 2013/1/4 William Oberman ober...@civicscience.com So I've made it work, but I don't get it yet. I have no idea why my DIY server works when I set the environment variables on the machine that kicks off pig (master), and in EMR it doesn't. I recompiled ConfigHelper and CassandraStorage with tons of debugging, and in EMR I can see the hadoop Configuration object get the proper values on the master node, and I can see it does NOT propagate to the task threads. The other part that was driving me nuts could be made more user friendly. The issue is this: I started to try to set cassandra.thrift.address, cassandra.thrift.port, cassandra.partitioner.class in mapred-site.xml, and it didn't work. After even more painful debugging, I noticed that the only time Cassandra sets the input/output versions of those settings (and these input/output specific versions are the only versions really used!) is when Cassandra maps the system environment variables. So, having cassandra.thrift.address in mapred-site.xml does NOTHING, as I needed to have cassandra.output.thrift.address set. It would be much nicer if the get{Input/Output}XYZ checked for the existence of getXYZ if get{Input/Output}XYZ is empty/null. E.g. in getOutputThriftAddress(), if that setting is null, it would have been nice if that method returned getThriftAddress(). My problem went away when I put the full cross product in the XML. E.g. cassandra.input.thrift.address and cassandra.output.thrift.address (and port, and partitioner). I still want to know why the old easy way (of setting the 3 system variables on the pig starter box, and having the config flow into the task trackers) doesn't work!
will On Fri, Jan 4, 2013 at 9:04 AM, William Oberman ober...@civicscience.comwrote: On all tasktrackers, I see: java.io.IOException: PIG_OUTPUT_INITIAL_ADDRESS or PIG_INITIAL_ADDRESS environment variable not set at org.apache.cassandra.hadoop.pig.CassandraStorage.setStoreLocation(CassandraStorage.java:821) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:170) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setUpContext(PigOutputCommitter.java:112) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:86) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.init(PigOutputCommitter.java:67) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:279) at org.apache.hadoop.mapred.Task.initialize(Task.java:515) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:358) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132) at org.apache.hadoop.mapred.Child.main(Child.java:249) On Thu, Jan 3, 2013 at 10:45 PM, aaron morton aa...@thelastpickle.comwrote: Instead, I get an error from CassandraStorage that the initial address isn't set (on the slave, the master is ok). Can you post the full error ? Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 4/01/2013, at 11:15 AM, William Oberman ober...@civicscience.com wrote: Anyone ever try to read or write directly between EMR - Cassandra? I'm running various Cassandra resources in Ec2, so the physical connection part is pretty easy using security groups. But, I'm having some configuration issues. I have managed to get Cassandra + Hadoop working in the past using a DIY hadoop cluster, and looking at the configurations in the two environments (EMR vs DIY), I'm not sure what's different that is causing my failures... I should probably note I'm using
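For reference, the "full cross product" William describes would look something like the following mapred-site.xml fragment. The address and partitioner values here are placeholders; the port/partitioner property names are assumed to follow the same input/output pattern as the address ones he lists:

<property><name>cassandra.input.thrift.address</name><value>10.0.0.1</value></property>
<property><name>cassandra.input.thrift.port</name><value>9160</value></property>
<property><name>cassandra.input.partitioner.class</name><value>org.apache.cassandra.dht.RandomPartitioner</value></property>
<property><name>cassandra.output.thrift.address</name><value>10.0.0.1</value></property>
<property><name>cassandra.output.thrift.port</name><value>9160</value></property>
<property><name>cassandra.output.partitioner.class</name><value>org.apache.cassandra.dht.RandomPartitioner</value></property>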
Re: Pig / Map Reduce on Cassandra
Here is the point. You're right, this github repository has not been updated for a year and a half. I thought brisk was just a bundle of some technologies and that it was possible to install the same components and make them work together without using this bundle :( On Jan 16, 2013, at 8:22 PM, Michael Kjellman mkjell...@barracuda.com wrote: Brisk is pretty much stagnant. I think someone forked it to work with 1.0 but not sure how that is going. You'll need to pay for DSE to get CFS (which is essentially Brisk) if you want to use any modern version of C*. Best, Michael On 1/16/13 11:17 AM, cscetbon@orange.com cscetbon@orange.com wrote: Thanks. I understand that your code uses the hadoop interface of Cassandra to be able to read from it with a job. However I would like to know how to bring the pieces (hive + pig + hadoop) together with cassandra as the storage layer, not to get code to test it. I have found the repository https://github.com/riptano/brisk which might be a good start for it. Regards On Jan 16, 2013, at 4:27 PM, James Schappet jschap...@gmail.com wrote: Try this one then, it reads from cassandra, then writes back to cassandra, but you could change the write to wherever you would like.

getConf().set(IN_COLUMN_NAME, columnName);
Job job = new Job(getConf(), "ProcessRawXml");
job.setInputFormatClass(ColumnFamilyInputFormat.class);
job.setNumReduceTasks(0);
job.setJarByClass(StartJob.class);
job.setMapperClass(ParseMapper.class);
job.setOutputKeyClass(ByteBuffer.class);
//job.setOutputValueClass(Text.class);
job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
job.setInputFormatClass(ColumnFamilyInputFormat.class);
ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
//org.apache.cassandra.dht.LocalPartitioner
ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
SlicePredicate predicate = new SlicePredicate().setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
// SliceRange slice_range = new SliceRange();
// slice_range.setStart(ByteBufferUtil.bytes(startPoint));
// slice_range.setFinish(ByteBufferUtil.bytes(endPoint));
//
// predicate.setSlice_range(slice_range);
ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);
job.waitForCompletion(true);

https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/ProcessXml/StartJob.java

On 1/16/13 9:22 AM, cscetbon@orange.com cscetbon@orange.com wrote: I don't want to write to Cassandra as it replicates data from another datacenter, but I just want to use Hadoop Jobs (Pig and Hive) to read data from it. I would like to use the same configuration as http://www.datastax.com/dev/blog/hadoop-mapreduce-in-the-cassandra-cluster but I want to know if there are alternatives to the DataStax Enterprise package. Thanks On Jan 16, 2013, at 3:59 PM, James Schappet jschap...@gmail.com wrote: Here are a few examples I have worked on, reading from xml.gz files then writing to cassandra. https://github.com/jschappet/medline You will also need: https://github.com/jschappet/medline-base These examples are Hadoop Jobs using Cassandra as the Data Store. This one is a good place to start.
https://github.com/jschappet/medline/blob/master/src/main/java/edu/uiowa/icts/jobs/LoadMedline/StartJob.java

ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, outputPath);
job.setMapperClass(MapperToCassandra.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
LOG.info("Writing output to Cassandra");
//job.setReducerClass(ReducerToCassandra.class);
job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
//org.apache.cassandra.dht.LocalPartitioner
ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");

On 1/16/13 7:37 AM, cscetbon@orange.com cscetbon@orange.com wrote: Hi, I know that the DataStax Enterprise package provides Brisk, but is there a community version ? Is it easy to interface Hadoop with Cassandra as the storage or do we absolutely have to use
Cassandra at Amazon AWS
Hello, I am currently using hadoop + cassandra at amazon AWS. Cassandra runs on EC2 and my hadoop process runs at EMR. For cassandra storage, I am using local EC2 EBS disks. My system is running fine for my tests, but to me it's not a good setup for production. I need my system to perform well, especially for writes on cassandra, but the amount of data could grow really big, taking several Tb of total storage. My first guess was using S3 as storage, and I saw this can be done by using the Cloudian package, but I wouldn't like to become dependent on a pre-packaged solution, and I found it's kind of expensive for more than 100Tb: http://www.cloudian.com/pricing.html I saw some discussion on the internet about using EBS or ephemeral disks for storage at Amazon too. My question is: does someone on this list have the same problem as me? What are you using as a solution for Cassandra's storage when running it at Amazon AWS? Any thoughts would be highly appreciated. Best regards, -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
Re: Query column names
After searching for a while I found what I was looking for [1] Hope it helps someone else (: Renato M. [1] http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1 2013/1/16 Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com: What I meant is: is there a way of doing this but using Hector? -

public static void main(String[] args) throws Exception {
    Connector conn = new Connector();
    Cassandra.Client client = conn.connect();
    SlicePredicate predicate = new SlicePredicate();
    List<ByteBuffer> colNames = new ArrayList<ByteBuffer>();
    colNames.add(ByteBuffer.wrap("a".getBytes()));
    colNames.add(ByteBuffer.wrap("b".getBytes()));
    predicate.column_names = colNames;
    ColumnParent parent = new ColumnParent("Standard1");
    ByteBuffer key = ByteBuffer.wrap("k1".getBytes());
    List<ColumnOrSuperColumn> results = client.get_slice(key, parent, predicate, ConsistencyLevel.ONE);
    for (ColumnOrSuperColumn cosc : results) {
        Column c = cosc.column;
        System.out.println(new String(c.getName(), "UTF-8") + " : " + new String(c.getValue(), "UTF-8"));
    }
    conn.close();
    System.out.println("All done.");
}

- Thanks! 2013/1/16 Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com: Hi, I am facing some problems while retrieving some events from a column family. I am using as column name the event name plus the timestamp of when it occurred. The thing is that now I want to find out the latest event, and I don't know how to ask for the last event without a RangeSlicesQuery, getting all rows and columns, and checking one by one. Is there any other better way of doing this using the Hector client?

[default@clickstream] list click_event;
---
RowKey: 706d63666164696e3a31396132613664322d633730642d343139362d623638642d396663663638343766333563
=> (column=start:2013-01-13 18:14:59.244, value=, timestamp=1358118943979000)
=> (column=stop:2013-01-13 18:15:56.793, value=323031332d30312d31332031383a31353a35382e333437, timestamp=1358118960946000)

Thanks in advance! Renato M.
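The composite columns article covers the modeling side; for the narrower question of fetching just the latest event for a row without scanning everything, a reversed slice with a count of 1 is the usual Hector trick. A rough sketch (the keyspace object and row key variable are hypothetical, assuming string serializers throughout):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SliceQuery;

// Slice one row in reverse comparator order, keeping only the first
// (i.e. newest) column, instead of pulling every column back.
StringSerializer se = StringSerializer.get();
SliceQuery<String, String, String> q = HFactory.createSliceQuery(keyspace, se, se, se);
q.setColumnFamily("click_event");
q.setKey(rowKey); // hypothetical row key
q.setRange(null, null, true, 1); // reversed=true, count=1 => latest column
QueryResult<ColumnSlice<String, String>> result = q.execute();
for (HColumn<String, String> col : result.get().getColumns()) {
    System.out.println(col.getName() + " = " + col.getValue());
}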
Re: AWS EMR - Cassandra
DataStax recommended (forget the reference) to use the ephemeral disks in RAID0, which is what I've been running for well over a year now in production. In terms of how I'm doing Cassandra/AWS/Hadoop, I started by doing the split data center thing (one DC for low latency queries, one DC for hadoop). But, that's a lot of system management. And compute is the most expensive part of AWS, and you need a LOT of compute to run this setup. I tried doing Cassandra EC2 cluster - snapshot - clone cluster with hadoop overlay - ETL to S3 using hadoop - EMR for real work. But that's kind of a pain too (and the ETL to S3 wasn't very fast). Now I'm going after the SStables directly(*), which sounds like how Netflix does it. You can do incremental updates, if you're careful. (*) Cassandra EC2 - backup to local EBS - remap EBS to another box - sstable2json over new sstables - S3 (splitting into ~100MB parts), then use EMR to consume the JSON part files. will On Wed, Jan 16, 2013 at 3:30 PM, Marcelo Elias Del Valle mvall...@gmail.com wrote: William, I just saw your message today. I am using Cassandra + Amazon EMR (hadoop 1.0.3) but I am not using PIG as you are. I set my configuration vars in Java, as I have a custom jar file and I am using ColumnFamilyInputFormat. However, if I understood well your problem, the only thing you have to do is to set environment vars when running cluster tasks, right? Take a look a this link: http://sujee.net/tech/articles/hadoop/amazon-emr-beyond-basics/ As it shows, you can run EMR setting some command line arguments that specify a script to be executed before the job starts, in each machine in the cluster. This way, you would be able to correctly set the vars you need. Out of curiosity, could you share what are you using for cassandra storage? I am currently using EC2 local disks, but I am looking for an alternative. Best regards, Marcelo. 2013/1/4 William Oberman ober...@civicscience.com So I've made it work, but I don't get it yet. I have no idea why my DIY server works when I set the environment variables on the machine that kicks off pig (master), and in EMR it doesn't. I recompiled ConfigHelper and CassandraStorage with tons of debugging, and in EMR I can see the hadoop Configuration object get the proper values on the master node, and I can see it does NOT propagate to the task threads. The other part that was driving me nuts could be made more user friendly. The issue is this: I started to try to set cassandra.thrift.address, cassandra.thrift.port, cassandra.partitioner.class in mapred-site.xml, and it didn't work. After even more painful debugging, I noticed that the only time Cassandra sets the input/output versions of those settings (and these input/output specific versions are the only versions really used!) is when Cassandra maps the system environment variables. So, having cassandra.thrift.address in mapred-site.xml does NOTHING, as I needed to have cassandra.output.thrift.address set. It would be much nicer if the get{Input/Output}XYZ checked for the existence of getXYZ if get{Input/Output}XYZ is empty/null. E.g. in getOutputThriftAddress(), if that setting is null, it would have been nice if that method returned getThriftAddress(). My problem went away when I put the full cross product in the XML. E.g. cassandra.input.thrift.address and cassandra.output.thrift.address (and port, and partitioner). I still want to know why the old easy way (of setting the 3 system variables on the pig starter box, and having the config flow into the task trackers) doesn't work! 
will On Fri, Jan 4, 2013 at 9:04 AM, William Oberman ober...@civicscience.com wrote: On all tasktrackers, I see: java.io.IOException: PIG_OUTPUT_INITIAL_ADDRESS or PIG_INITIAL_ADDRESS environment variable not set at org.apache.cassandra.hadoop.pig.CassandraStorage.setStoreLocation(CassandraStorage.java:821) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:170) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setUpContext(PigOutputCommitter.java:112) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:86) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.init(PigOutputCommitter.java:67) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:279) at org.apache.hadoop.mapred.Task.initialize(Task.java:515) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:358) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at
Re: Cassandra at Amazon AWS
We use cassandra on ephemeral drives. Yes, that means we need more nodes to hold more data, but doesn't that play into cassandra's strengths? It sounds like you're trying to vertically scale your cassandra cluster. On Jan 16, 2013, at 12:42 PM, Marcelo Elias Del Valle wrote: Hello, I am currently using hadoop + cassandra at amazon AWS. Cassandra runs on EC2 and my hadoop process runs at EMR. For cassandra storage, I am using local EC2 EBS disks. My system is running fine for my tests, but to me it's not a good setup for production. I need my system to perform well for specially for writes on cassandra, but the amount of data could grow really big, taking several Tb of total storage. My first guess was using S3 as a storage and I saw this can be done by using Cloudian package, but I wouldn't like to become dependent on a pre-package solution and I found it's kind of expensive for more than 100Tb: http://www.cloudian.com/pricing.html I saw some discussion at internet about using EBS or ephemeral disks for storage at Amazon too. My question is: does someone on this list have the same problem as me? What are you using as solution to Cassandra's storage when running it at Amazon AWS? Any thoughts would be highly appreciatted. Best regards, -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
Re: Cassandra at Amazon AWS
Storage size is not a problem, you can always add more nodes. Anyway, it is not recommended to have nodes with more than 500G (compaction and repair take forever). EC2 m1.large has 800G of ephemeral storage, EC2 m1.xlarge 1.6T. I'd recommend xlarge, it has 4 CPUs, so maintenance procedures don't affect performance a lot. Andrey On Wed, Jan 16, 2013 at 12:42 PM, Marcelo Elias Del Valle mvall...@gmail.com wrote: Hello, I am currently using hadoop + cassandra at amazon AWS. Cassandra runs on EC2 and my hadoop process runs at EMR. For cassandra storage, I am using local EC2 EBS disks. My system is running fine for my tests, but to me it's not a good setup for production. I need my system to perform well, especially for writes on cassandra, but the amount of data could grow really big, taking several Tb of total storage. My first guess was using S3 as storage, and I saw this can be done by using the Cloudian package, but I wouldn't like to become dependent on a pre-packaged solution, and I found it's kind of expensive for more than 100Tb: http://www.cloudian.com/pricing.html I saw some discussion on the internet about using EBS or ephemeral disks for storage at Amazon too. My question is: does someone on this list have the same problem as me? What are you using as a solution for Cassandra's storage when running it at Amazon AWS? Any thoughts would be highly appreciated. Best regards, -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
Re: AWS EMR - Cassandra
That's good info! Thanks! 2013/1/16 William Oberman ober...@civicscience.com DataStax recommended (forget the reference) to use the ephemeral disks in RAID0, which is what I've been running for well over a year now in production. In terms of how I'm doing Cassandra/AWS/Hadoop, I started by doing the split data center thing (one DC for low latency queries, one DC for hadoop). But, that's a lot of system management. And compute is the most expensive part of AWS, and you need a LOT of compute to run this setup. I tried doing Cassandra EC2 cluster - snapshot - clone cluster with hadoop overlay - ETL to S3 using hadoop - EMR for real work. But that's kind of a pain too (and the ETL to S3 wasn't very fast). Now I'm going after the SStables directly(*), which sounds like how Netflix does it. You can do incremental updates, if you're careful. (*) Cassandra EC2 - backup to local EBS - remap EBS to another box - sstable2json over new sstables - S3 (splitting into ~100MB parts), then use EMR to consume the JSON part files. will On Wed, Jan 16, 2013 at 3:30 PM, Marcelo Elias Del Valle mvall...@gmail.com wrote: William, I just saw your message today. I am using Cassandra + Amazon EMR (hadoop 1.0.3) but I am not using PIG as you are. I set my configuration vars in Java, as I have a custom jar file and I am using ColumnFamilyInputFormat. However, if I understood well your problem, the only thing you have to do is to set environment vars when running cluster tasks, right? Take a look a this link: http://sujee.net/tech/articles/hadoop/amazon-emr-beyond-basics/ As it shows, you can run EMR setting some command line arguments that specify a script to be executed before the job starts, in each machine in the cluster. This way, you would be able to correctly set the vars you need. Out of curiosity, could you share what are you using for cassandra storage? I am currently using EC2 local disks, but I am looking for an alternative. Best regards, Marcelo. 2013/1/4 William Oberman ober...@civicscience.com So I've made it work, but I don't get it yet. I have no idea why my DIY server works when I set the environment variables on the machine that kicks off pig (master), and in EMR it doesn't. I recompiled ConfigHelper and CassandraStorage with tons of debugging, and in EMR I can see the hadoop Configuration object get the proper values on the master node, and I can see it does NOT propagate to the task threads. The other part that was driving me nuts could be made more user friendly. The issue is this: I started to try to set cassandra.thrift.address, cassandra.thrift.port, cassandra.partitioner.class in mapred-site.xml, and it didn't work. After even more painful debugging, I noticed that the only time Cassandra sets the input/output versions of those settings (and these input/output specific versions are the only versions really used!) is when Cassandra maps the system environment variables. So, having cassandra.thrift.address in mapred-site.xml does NOTHING, as I needed to have cassandra.output.thrift.address set. It would be much nicer if the get{Input/Output}XYZ checked for the existence of getXYZ if get{Input/Output}XYZ is empty/null. E.g. in getOutputThriftAddress(), if that setting is null, it would have been nice if that method returned getThriftAddress(). My problem went away when I put the full cross product in the XML. E.g. cassandra.input.thrift.address and cassandra.output.thrift.address (and port, and partitioner). 
I still want to know why the old easy way (of setting the 3 system variables on the pig starter box, and having the config flow into the task trackers) doesn't work! will On Fri, Jan 4, 2013 at 9:04 AM, William Oberman ober...@civicscience.com wrote: On all tasktrackers, I see: java.io.IOException: PIG_OUTPUT_INITIAL_ADDRESS or PIG_INITIAL_ADDRESS environment variable not set at org.apache.cassandra.hadoop.pig.CassandraStorage.setStoreLocation(CassandraStorage.java:821) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:170) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setUpContext(PigOutputCommitter.java:112) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:86) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.init(PigOutputCommitter.java:67) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:279) at org.apache.hadoop.mapred.Task.initialize(Task.java:515) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:358) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at
Re: Cassandra at Amazon AWS
We're currently using Cassandra on EC2 at very low scale (a 2 node cluster on m1.large instances in two regions.) I don't believe that EBS is recommended for performance reasons. Also, it's proven to be very unreliable in the past (most of the big/notable AWS outages were due to EBS issues.) We've moved 99% of our instances off of EBS. As others have said, if you require more space in the future it's easy to add more nodes to the cluster. I've found this page (http://www.ec2instances.info/) very useful in determining the amount of space each instance type has. Note that by default only one ephemeral drive is attached and you must specify all ephemeral drives that you want to use at launch time. Also, you can create a RAID 0 of all local disks to provide maximum speed and space. On 16 January 2013 20:42, Marcelo Elias Del Valle mvall...@gmail.com wrote: Hello, I am currently using hadoop + cassandra at amazon AWS. Cassandra runs on EC2 and my hadoop process runs at EMR. For cassandra storage, I am using local EC2 EBS disks. My system is running fine for my tests, but to me it's not a good setup for production. I need my system to perform well, especially for writes on cassandra, but the amount of data could grow really big, taking several Tb of total storage. My first guess was using S3 as storage, and I saw this can be done by using the Cloudian package, but I wouldn't like to become dependent on a pre-packaged solution, and I found it's kind of expensive for more than 100Tb: http://www.cloudian.com/pricing.html I saw some discussion on the internet about using EBS or ephemeral disks for storage at Amazon too. My question is: does someone on this list have the same problem as me? What are you using as a solution for Cassandra's storage when running it at Amazon AWS? Any thoughts would be highly appreciated. Best regards, -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
Cassandra timeout whereas it is not much busy
Hi, I have a strange behavior I am not able to understand. I have 6 nodes with cassandra-1.0.12. Each node has 8G of RAM. I have a replication factor of 3. --- my story is maybe too long, so here is a short version first, keeping the long one below in case someone has the patience to read it ;) I got into a situation where my cluster was generating a lot of timeouts on our frontend, whereas I could not see any major trouble in the internal stats. Actually, cpu and read/write counts on the column families were quite low. A mess, until I switched from java7 to java6 and forced the use of jamm. After the switch, cpu and read/write counts were going up again, and the timeouts were gone. I have seen this behavior while reducing the xmx too. What could be blocking cassandra from utilizing the whole resources of the machine ? Are there metrics I didn't look at which could explain this ? --- Here is the long story. When I first set my cluster up, I blindly gave 6G of heap to the cassandra nodes, thinking that the more memory a java process has, the smoother it runs, while still keeping some RAM for the disk cache. We got some new feature deployed, and things went to hell, some machines up to 60% wa. I give credit to cassandra because there were not that many timeouts received on the web frontend; it was kind of slow but it was kind of working. With some optimizations, we reduced the pressure of the new feature, but it was still at 40% wa. At that time I didn't have much monitoring, just heap and cpu. I read some articles on how to tune, and I learned that the disk cache is quite important because cassandra relies on it as the read cache. So I tried many xmx values, and 3G seems to be about the lowest possible. So on 2 of the 6 nodes, I set the xmx to 3.3G. Amazingly, I saw the wa go down to 10%. Quite happy with that, I changed the xmx to 3.3G on every node. But then things really went to hell, with a lot of timeouts on the frontend. It was not working at all. So I rolled back. After some time, probably because the data of the new feature grew to its nominal size, things went again to very high %wa, and cassandra was not able to keep up. So we kind of reverted the feature; the column family is still used, but only by one thread on the frontend. The wa was reduced to 20%, but things continued to not work properly; from time to time, a bunch of timeouts were raised on our frontend. In the meantime, I took time to do some proper monitoring of cassandra: column family read/write counts, latency, memtable size, but also the dropped messages, the pending tasks, and the timeouts between nodes. It's just a start, but it gave me a first nice view of what is actually going on. I tried again reducing the xmx on one node. Cassandra is not complaining about having not enough heap, memtables are not being flushed insanely every second, the number of reads and writes is reduced compared to the other nodes, the cpu is lower too, there are not many pending tasks, and no more than 1 or 2 messages dropped from time to time. Everything indicates that there is probably room for more work, but the node doesn't take it. Even its read and write latencies are lower than on the other nodes. But if I keep this xmx long enough, timeouts start to rise on the frontends. After some individual node experiments, the cluster was starting to be quite sick. Even with 6G, the %wa was dropping, and read and write counts too, on pretty much every node. And more and more timeouts were raised on the frontend.
The only thing that I could see that was worrying is the heap climbing slowly above the 75% threshold and from time to time suddenly dropping from 95% to 70%. I looked at the full gc counter, not much pressure. And another thing was some Timed out replaying hints to /10.0.0.56; aborting further deliveries entries in the log. But they are logged as info, so I guess not that important. After some long, useless staring at the monitoring graphs, I gave a try to using openjdk 6b24 rather than openjdk 7u9, and forced cassandra to load jamm, since in 1.0 the init script blacklists the openjdk. Node after node, I saw that the heap was behaving more like I am used to seeing on java-based apps, some nice ups and downs rather than a long and slow climb. But read and write counts were still low on every node, and timeouts were still bursting on our frontend. A continuing mess, until I restarted the first node of the cluster. There was still one node left to switch to java6 + jamm, but as soon as I restarted my first node, every node started working more: %wa climbing, read/write counts climbing, no more timeouts on the frontend, the frontend then being fast as hell. I understand that my cluster is probably under capacity. But I don't understand how, since something within cassandra seems to block the full use of the machine resources. It seems kind of related to the heap, but I don't know how. Any idea ? I intend to start monitoring more metrics, but do you have any
Re: Pig / Map Reduce on Cassandra
On Wed, Jan 16, 2013 at 2:37 PM, cscetbon@orange.com wrote: Here is the point. You're right this github repository has not been updated for a year and a half. I thought brisk was just a bundle of some technologies and that it was possible to install the same components and make them work together without using this bundle :( You can install hadoop manually alongside Cassandra as well as pig. Pig support is in C*'s tree in o.a.c.hadoop.pig. You won't get CFS, but it's not a hard requirement, either. -Brandon
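To make that concrete: with plain Apache Hadoop and Pig installed next to Cassandra, the storage handler in o.a.c.hadoop.pig is loaded straight from a Pig script. The usual load statement from that era looks roughly like this (keyspace and column family names are placeholders, based on the examples shipped in the Cassandra source tree): rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING org.apache.cassandra.hadoop.pig.CassandraStorage(); The PIG_INITIAL_ADDRESS-style environment variables seen in the EMR thread above are what tell CassandraStorage which cluster to contact.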
Re: How can OpsCenter show me Read Request Latency where there are no read requests??
A few milliseconds (or a few thousand usecs) isn't terribly high, considering that number includes at least one round trip between nodes. I'm not sure about the tracking behavior that you're describing -- could you provide some more details or perhaps screenshots? On Wed, Jan 16, 2013 at 12:16 PM, Brian Tarbox tar...@cabotresearch.com wrote: Hmm, that makes sense, but then why is the latency for the reads that get the metric often so high (several thousand uSecs) and why does it so closely track the latency of my normal reads? On Wed, Jan 16, 2013 at 12:14 PM, Tyler Hobbs ty...@datastax.com wrote: When you view OpsCenter metrics, you're generating a small number of reads to fetch the metric data, which is why your read count is near zero instead of actually being zero. Since reads are still occurring, Cassandra will continue to show a read latency. Basically, you're just viewing the latency on the reads to fetch metric data. Normally the number of reads required to view metrics are small enough that they only make a minor difference in your overall read latency average, but when you have no other reads occurring, they're the only reads that are included in the average. On Tue, Jan 15, 2013 at 9:28 PM, Brian Tarbox tar...@cabotresearch.com wrote: I am making heavy use of DataStax OpsCenter to help tune my system and it's great. And yet puzzling. I see my clients do a burst of Reads causing the OpsCenter Read Requests chart to go up and stay up until the clients finish doing their reads. The read request latency chart also goes up but it stays up even after all the reads are done. At last glance I've had next to zero reads for 10 minutes but still have a read request latency that's basically unchanged from when there were actual reads. How am I to interpret this? Thanks. Brian Tarbox -- Tyler Hobbs DataStax http://datastax.com/ -- Tyler Hobbs DataStax http://datastax.com/
Re: trying to use row_cache (b/c we have hot rows) but nodetool info says zero requests
You have to change the column family caching setting from keys_only to rows_only or all, otherwise the row cache will not be on for this cf. On Wednesday, January 16, 2013, Brian Tarbox tar...@cabotresearch.com wrote: We have quite wide rows and do a lot of concentrated processing on each row...so I thought I'd try the row cache on one node in my cluster to see if I could detect an effect of using it. The problem is that nodetool info says that even with a two gig row_cache we're getting zero requests. Since my client program is actively processing, and since keycache shows lots of activity, I'm puzzled. Shouldn't any read of a column cause the entire row to be loaded? My entire data file is only 32 gig right now so it's hard to imagine the 2 gig is too small to hold even a single row? Any suggestions how to proceed are appreciated. Thanks. Brian Tarbox
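In cassandra-cli that change is a one-liner, mirroring the DDL shown earlier in this digest (the column family name is a placeholder): update column family MyCF with caching = 'rows_only'; Using 'all' instead enables the row cache while also keeping the key cache on.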
Re: Starting Cassandra
I think at this point the cassandra startup scripts should reject unsupported JVM versions, since cassandra won't even start with many JVMs at this point. On Tuesday, January 15, 2013, Michael Kjellman mkjell...@barracuda.com wrote: Do yourself a favor and get a copy of the Oracle 7 JDK (now with more security patches too!) On Jan 15, 2013, at 1:44 AM, Sloot, Hans-Peter hans-peter.sl...@atos.net wrote: I managed to install apache-cassandra-1.2.0-bin.tar.gz with java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.x86_64 I still get the segmentation fault. However with java-1.7.0-openjdk-1.7.0.3-2.1.0.1.el6.7.x86_64 everything runs fine. Regards Hans-Peter From: aaron morton [mailto:aa...@thelastpickle.com] Sent: dinsdag 15 januari 2013 1:20 To: user@cassandra.apache.org Subject: Re: Starting Cassandra DSE includes hadoop files. It looks like the installation is broken. I would start again if possible and/or ask the peeps at Data Stax about your particular OS / JVM configuration. In the past I've used this to set a particular JVM when multiple ones are installed… update-alternatives --set java /usr/lib/jvm/java-6-sun/jre/bin/java Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 11/01/2013, at 10:55 PM, Sloot, Hans-Peter hans-peter.sl...@atos.net wrote: Hi, I removed the open-jdk packages which caused the dse* packages to be uninstalled too and installed jdk6u38. But when I installed the dse packages yum also downloaded and installed the open-jdk packages. -- Join Barracuda Networks in the fight against hunger. To learn how you can help in your community, please visit: http://on.fb.me/UAdL4f
Re: LCS not removing rows with all TTL expired columns
To get a column removed you have to meet two requirements: 1. the column should be expired 2. after that, the CF gets compacted I guess your expired columns have been propagated to high tier sstables, which get compacted rarely. So, you have to wait until the high tier sstables get compacted. Andrey On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.com wrote: On cassandra 1.1.5 with a write heavy workload, we're having problems getting rows to be compacted away (removed) even though all columns have expired TTL. We've tried size tiered and now leveled and are seeing the same symptom: the data stays around essentially forever. Currently we write all columns with a TTL of 72 hours (259200 seconds) and expect to add 10 GB of data to this CF per day per node. Each node currently has 73 GB for the affected CF and shows no indications that old rows will be removed on their own. Why aren't rows being removed? Below is some data from a sample row which should have been removed several days ago but is still around even though it has been involved in numerous compactions since being expired. $ ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db $ ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db $ ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"') { 34353966623436302d356163652d313165322d396239322d313164363762363136336234: [[app_name,50f21d3d,1357785277207001,d], [client_ip,50f21d3d,1357785277207001,d], [client_req_id,50f21d3d,1357785277207001,d], [mysql_call_cnt,50f21d3d,1357785277207001,d], [mysql_duration_us,50f21d3d,1357785277207001,d], [mysql_failure_call_cnt,50f21d3d,1357785277207001,d], [mysql_success_call_cnt,50f21d3d,1357785277207001,d], [req_duration_us,50f21d3d,1357785277207001,d], [req_finish_time_us,50f21d3d,1357785277207001,d], [req_method,50f21d3d,1357785277207001,d], [req_service,50f21d3d,1357785277207001,d], [req_start_time_us,50f21d3d,1357785277207001,d], [success,50f21d3d,1357785277207001,d]] } Decoding the column timestamps shows that the columns were written at Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan 2013 02:34:37 GMT. The date of the SSTable shows that it was generated on Jan 16 which is 3 days after all columns have TTL-ed out. The schema shows that gc_grace is set to 0 since this data is write-once, read-seldom and is never updated or deleted. create column family request_summary with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and key_validation_class = 'UTF8Type' and read_repair_chance = 0.1 and dclocal_read_repair_chance = 0.0 and gc_grace = 0 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy' and caching = 'NONE' and bloom_filter_fp_chance = 1.0 and compression_options = {'chunk_length_kb' : '64', 'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'}; Thanks in advance for help in understanding why rows such as this are not removed! -Bryan
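One way to test requirement 2 by hand, at least under size tiered compaction, is to force a major compaction of just that column family and then re-run the getsstables/sstable2json checks from the original post (same style as the nodetool calls above; host, keyspace and CF names match Bryan's example): ./bin/nodetool -h localhost compact metrics request_summary Note this is only meaningful for size tiered compaction; a manual major compaction does not apply in the same way under leveled compaction.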
Re: LCS not removing rows with all TTL expired columns
According to the timestamps (see original post) the SSTable was written (thus compacted) 3 days after all columns for that row had expired and 6 days after the row was created; yet all columns are still showing up in the SSTable. Note that a get for that key shows no rows, so that's working correctly, but the data is lugged around far longer than it should be -- maybe forever. -Bryan On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh ailin...@gmail.com wrote: To get a column removed you have to meet two requirements: 1. the column should be expired 2. after that, the CF gets compacted I guess your expired columns have been propagated to high tier sstables, which get compacted rarely. So, you have to wait until the high tier sstables get compacted. Andrey On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.com wrote: On cassandra 1.1.5 with a write heavy workload, we're having problems getting rows to be compacted away (removed) even though all columns have expired TTL. We've tried size tiered and now leveled and are seeing the same symptom: the data stays around essentially forever. Currently we write all columns with a TTL of 72 hours (259200 seconds) and expect to add 10 GB of data to this CF per day per node. Each node currently has 73 GB for the affected CF and shows no indications that old rows will be removed on their own. Why aren't rows being removed? Below is some data from a sample row which should have been removed several days ago but is still around even though it has been involved in numerous compactions since being expired. $ ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db $ ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db $ ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"') { 34353966623436302d356163652d313165322d396239322d313164363762363136336234: [[app_name,50f21d3d,1357785277207001,d], [client_ip,50f21d3d,1357785277207001,d], [client_req_id,50f21d3d,1357785277207001,d], [mysql_call_cnt,50f21d3d,1357785277207001,d], [mysql_duration_us,50f21d3d,1357785277207001,d], [mysql_failure_call_cnt,50f21d3d,1357785277207001,d], [mysql_success_call_cnt,50f21d3d,1357785277207001,d], [req_duration_us,50f21d3d,1357785277207001,d], [req_finish_time_us,50f21d3d,1357785277207001,d], [req_method,50f21d3d,1357785277207001,d], [req_service,50f21d3d,1357785277207001,d], [req_start_time_us,50f21d3d,1357785277207001,d], [success,50f21d3d,1357785277207001,d]] } Decoding the column timestamps shows that the columns were written at Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan 2013 02:34:37 GMT. The date of the SSTable shows that it was generated on Jan 16 which is 3 days after all columns have TTL-ed out. The schema shows that gc_grace is set to 0 since this data is write-once, read-seldom and is never updated or deleted.
create column family request_summary with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and key_validation_class = 'UTF8Type' and read_repair_chance = 0.1 and dclocal_read_repair_chance = 0.0 and gc_grace = 0 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy' and caching = 'NONE' and bloom_filter_fp_chance = 1.0 and compression_options = {'chunk_length_kb' : '64', 'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'}; Thanks in advance for help in understanding why rows such as this are not removed! -Bryan
Webinar: Using Storm for Distributed Processing on Cassandra
Just an FYI -- We will be hosting a webinar tomorrow demonstrating the use of Storm as a distributed processing layer on top of Cassandra. I'll be tag teaming with Taylor Goetz, the original author of storm-cassandra. http://www.datastax.com/resources/webinars/collegecredit It is part of the C*ollege Credit Webinar Series from Datastax. All are welcome. -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://brianoneill.blogspot.com/ twitter: @boneill42
Re: Cassandra 1.2 thrift migration
Any idea whether interoperability b/w Thrift and CQL should work properly in 1.2? AFAIK the only incompatibility is CQL 3 between pre 1.2 and 1.2. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 16/01/2013, at 1:24 AM, Vivek Mishra mishra.v...@gmail.com wrote: Hi, Is there any document to follow, in case i migrate cassandra thrift API to 1.2 release? Is it backward compatible with previous releases? While migrating Kundera to cassandra 1.2, it is complaining on various data types. Giving weird errors like: While connecting from cassandra-cli: Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.lang.AbstractStringBuilder.init(AbstractStringBuilder.java:45) at java.lang.StringBuilder.init(StringBuilder.java:80) at java.math.BigDecimal.getValueString(BigDecimal.java:2885) at java.math.BigDecimal.toPlainString(BigDecimal.java:2869) at org.apache.cassandra.cql.jdbc.JdbcDecimal.getString(JdbcDecimal.java:72) at org.apache.cassandra.db.marshal.DecimalType.getString(DecimalType.java:62) at org.apache.cassandra.cli.CliClient.printSliceList(CliClient.java:2873) at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1486) at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272) at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:210) at org.apache.cassandra.cli.CliMain.main(CliMain.java:337) And sometimes results in Server Crash. Any idea whether interoperability b/w Thrift and CQL should work properly in 1.2? -Vivek
Re: error when creating column family using cql3 and persisting data using thrift
The thrift request is not sending a composite type where it should. CQL 3 uses composites in a lot of places. What was your table definition? Are you using a high level client or rolling your own? Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 16/01/2013, at 5:32 AM, James Schappet jschap...@gmail.com wrote: I also saw this while testing the https://github.com/boneill42/naughty-or-nice example project. --Jimmy From: Kuldeep Mishra kuld.cs.mis...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, January 15, 2013 10:29 AM To: user@cassandra.apache.org Subject: error when creating column family using cql3 and persisting data using thrift Hi, I am facing following problem, when creating column family using cql3 and trying to persist data using thrift 1.2.0 in cassandra-1.2.0. Details: InvalidRequestException(why:Not enough bytes to read value of component 0) at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20833) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964) at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950) at com.impetus.client.cassandra.thrift.ThriftClient.onPersist(ThriftClient.java:157) Please help me. -- Thanks and Regards Kuldeep Kumar Mishra +919540965199
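For background: a table created with CQL 3 stores its data under composite column names, so a raw Thrift mutation against it has to send names in the CompositeType wire format -- for each component, a 2-byte big-endian length, the component bytes, then a 0x00 end-of-component byte. A hand-rolled illustration of encoding a single UTF-8 component (a hypothetical helper, not from Kundera or the Cassandra API):

import java.nio.ByteBuffer;

// Encode one text component in the CompositeType on-wire format
// expected for column names of a CQL 3 table.
public static ByteBuffer compositeName(String component) throws Exception {
    byte[] bytes = component.getBytes("UTF-8");
    ByteBuffer bb = ByteBuffer.allocate(2 + bytes.length + 1);
    bb.putShort((short) bytes.length); // 2-byte component length
    bb.put(bytes);                     // component value
    bb.put((byte) 0);                  // end-of-component marker
    bb.flip();
    return bb;
}

The "Not enough bytes to read value of component 0" error above is what the server raises when a plain, non-composite name arrives where this layout is expected.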
Re: write count increase after 1.2 update
You *may* be seeing this https://issues.apache.org/jira/browse/CASSANDRA-2503 It was implemented in 1.1.0 but perhaps data in the original cluster is more compacted than the new one. Are the increases for all CF's or just a few? Do you have a work load of infrequent writes to rows followed by wide reads? Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 16/01/2013, at 6:23 AM, Reik Schatz reik.sch...@gmail.com wrote: Hi, we are running a 1.1.6 (datastax) test cluster with 6 nodes. After the recent 1.2 release we have set up a second cluster - also having 6 nodes running 1.2 (datastax). They are now running in parallel. We noticed an increase in the number of writes in our monitoring tool (Datadog). The tool is using the write count statistic of nodetool cfstats. So we ran nodetool cfstats on one node in each cluster. To get an initial write count. Then we ran it again after 60 sec. It looks like the 1.2 received about twice the amount of writes. The way our application is designed is that the writes are idempotent, so we don't see a size increase. Were there any changes in between 1.1.6 and 1.2 that could explain this behavior? I know that 1.2 has the concept of virtual nodes, to spread out the data more evenly. So if the write count value was actually the sum of all writes to all nodes in the cluster, this increase would make sense. Reik ps. the clusters are not 100% identical. i.e. since bloom filters are now off-heap, we changed settings for heap size and memtables. Cluster 1.1.6: heap 8G, memtables 1/3 of heap. Cluster 1.2.0: heap 4G, memtables 2G. Not sure it can have an impact on the problem.
Re: Astyanax returns empty row
If you think you have located a bug in Astyanax please submit it to https://github.com/Netflix/astyanax Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 17/01/2013, at 3:44 AM, Sávio Teles savio.te...@lupa.inf.ufg.br wrote: We have multiple clients reading the same row key. It makes no sense to fail on one machine. When we use Thrift, Cassandra always returns the correct result. 2013/1/16 Sávio Teles savio.te...@lupa.inf.ufg.br I ran the tests with only one machine, so the CL_ONE is not the problem. Am I right? -- Atenciosamente, Sávio S. Teles de Oliveira voice: +55 62 9136 6996 http://br.linkedin.com/in/savioteles Mestrando em Ciências da Computação - UFG Arquiteto de Software Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
Re: Cassandra timeout whereas it is not much busy
Check the disk utilisation using iostat -x 5 If you are on a VM / in the cloud check for CPU steal. Check the logs for messages from the GCInspector, the ParNew events are times the JVM is paused. Look at the times dropped messages are logged and try to correlate them with other server events. If you have a lot of secondary indexes, or a lot of memtables flushing at the same time, you may be blocking behind the global Switch Lock. If you use secondary indexes make sure the memtable_flush_queue_size is set correctly, see the comments in the yaml file. If you have a lot of CF's flushing at the same time, and there are no messages from the MeteredFlusher, it may be that the log segment is too big for the number of CF's you have. When the segment needs to be recycled all dirty CF's are flushed; if you have a lot of cf's this can result in blocking around the switch lock. Try reducing the commitlog_segment_size_in_mb so that fewer CF's are flushed. Hope that helps - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 17/01/2013, at 10:30 AM, Nicolas Lalevée nicolas.lale...@hibnet.org wrote: Hi, I have a strange behavior I am not able to understand. I have 6 nodes with cassandra-1.0.12. Each nodes have 8G of RAM. I have a replication factor of 3. --- my story is maybe too long, trying shorter here, while saving what I wrote in case someone has patience to read my bad english ;) I got under a situation where my cluster was generating a lot of timeouts on our frontend, whereas I could not see any major trouble on the internal stats. Actually cpu, read write counts on the column families were quite low. A mess until I switched from java7 to java6 and forced the used of jamm. After the switch, cpu, read write counts, were going up again, timeouts gone. I have seen this behavior while reducing the xmx too. What could be blocking cassandra from utilizing the while resources of the machine ? Is there is metrics I didn't saw which could explain this ? --- Here is the long story. When I first set my cluster up, I gave blindly 6G of heap to the cassandra nodes, thinking that more a java process has, the smoother it runs, while keeping some RAM to the disk cache. We got some new feature deployed, and things were going into hell, some machine up to 60% of wa. I give credit to cassandra because there was not that much timeout received on the web frontend, it was kind of slow but is was kind of working. With some optimizations, we reduced the pressure of the new feature, but it was still at 40%wa. At that time I didn't have much monitoring, just heap and cpu. I read some article how to tune, and I learned that the disk cache is quite important because cassandra relies on it to be the read cache. So I have tried many xmx, and 3G seems of kind the lowest possible. So on 2 among 6 nodes, I have set 3,3G to xmx. Amazingly, I saw the wa down to 10%. Quite happy with that, I changed the xmx 3,3G on each node. But then things really went to hell, a lot of timeouts on the frontend. It was not working at all. So I rolled back. After some time, probably because of the growing data of the new feature to a nominal size, things went again to very high %wa, and cassandra was not able to keep it up. So we kind of reverted the feature, the column family is still used but only by one thread on the frontend. The wa was reduced to 20%, but things continued to not properly working, from time to time, a bunch of timeout are raised on our frontend.
In the mean time, I took time to do some proper monitoring of cassandra: column family read write counts, latency, memtable size, but also the dropped messages, the pending tasks, the timeouts between nodes. It's just a start but it haves me a first nice view of what is actually going on. I tried again reducing the xmx on one node. Cassandra is not complaining of having not enough heap, memtables are not flushed insanely every second, the number of read and write is reduced compared to the other node, the cpu is lower too, there is not much pending tasks, no message dropped more than 1 or 2 from time to time. Everything indicates that there is probably more room to more work, but the node doesn't take it. Even its read and write latencies are lower than on the other nodes. But if I keep this long enough with this xmx, timeouts start to raise on the frontends. After some individual node experiment, the cluster was starting be be quite sick. Even with 6G, the %wa were reducing, read and write counts too, on kind of every node. And more and more timeout raised on the frontend. The only thing that I could see worrying, is the heap climbing slowly above the 75% threshold and from time to time suddenly dropping from 95% to 70%. I looked at the full gc counter, not much pressure. And
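Both knobs Aaron mentions live in cassandra.yaml; an illustrative fragment (the values shown are just the common defaults, not a recommendation for this cluster):

# Raise this if flushes back up behind secondary index rebuilds:
memtable_flush_queue_size: 4
# Smaller segments mean fewer dirty CF's flushed when a segment is recycled:
commitlog_segment_size_in_mb: 32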
Re: LCS not removing rows with all TTL expired columns
Minor compaction (with Size Tiered) will only purge tombstones if all fragments of a row are contained in the SSTables being compacted. So if you have a long lived row that is present in many size tiers, the columns will not be purged. "(thus compacted) 3 days after all columns for that row had expired" Tombstones have to get onto disk, even if you set gc_grace_seconds to 0. If they didn't, they would not get a chance to delete previous versions of the column which already exist on disk. So when the compaction ran, your ExpiringColumn was turned into a DeletedColumn and placed on disk. I would expect the next round of compaction to remove these columns. There is a new feature in 1.2 that may help you here. It will do a special compaction of individual sstables when they have a certain proportion of dead columns https://issues.apache.org/jira/browse/CASSANDRA-3442 Also interested to know if LCS helps. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com

On 17/01/2013, at 2:55 PM, Bryan Talbot btal...@aeriagames.com wrote: According to the timestamps (see original post) the SSTable was written (thus compacted) 3 days after all columns for that row had expired and 6 days after the row was created; yet all columns are still showing up in the SSTable. Note that a get for that key shows no rows, so that part is working correctly, but the data is lugged around far longer than it should be -- maybe forever. -Bryan

On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh ailin...@gmail.com wrote: To get a column removed you have to meet two requirements: 1. the column should be expired 2. after that, the CF gets compacted. I guess your expired columns are propagated to a high tier SSTable, which gets compacted rarely. So you have to wait until that high tier gets compacted. Andrey

On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.com wrote: On cassandra 1.1.5 with a write heavy workload, we're having problems getting rows compacted away (removed) even though all of their columns have expired TTLs. We've tried size tiered and now leveled and are seeing the same symptom: the data stays around essentially forever. Currently we write all columns with a TTL of 72 hours (259200 seconds) and expect to add 10 GB of data to this CF per day per node. Each node currently has 73 GB for the affected CF and shows no indication that old rows will be removed on their own. Why aren't rows being removed? Below is some data from a sample row which should have been removed several days ago but is still around even though it has been involved in numerous compactions since being expired.
$ ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$ ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
-rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$ ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
{
"34353966623436302d356163652d313165322d396239322d313164363762363136336234": [["app_name","50f21d3d",1357785277207001,"d"], ["client_ip","50f21d3d",1357785277207001,"d"], ["client_req_id","50f21d3d",1357785277207001,"d"], ["mysql_call_cnt","50f21d3d",1357785277207001,"d"], ["mysql_duration_us","50f21d3d",1357785277207001,"d"], ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"], ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"], ["req_duration_us","50f21d3d",1357785277207001,"d"], ["req_finish_time_us","50f21d3d",1357785277207001,"d"], ["req_method","50f21d3d",1357785277207001,"d"], ["req_service","50f21d3d",1357785277207001,"d"], ["req_start_time_us","50f21d3d",1357785277207001,"d"], ["success","50f21d3d",1357785277207001,"d"]]
}

Decoding the column timestamps shows that the columns were written at Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan 2013 02:34:37 GMT. The date of the SSTable shows that it was generated on Jan 16, which is 3 days after all columns TTL-ed out. The schema shows that gc_grace is set to 0 since this data is write-once, read-seldom, and is never updated or deleted.

create column family request_summary
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 0
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
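As a cross-check of the dates in Bryan's dump: column timestamps here are microseconds since the epoch and the TTL is 259200 s, so the written/expired times can be recomputed in a couple of lines (a sketch; note that Date prints in the local timezone, while the thread quotes GMT):

```
import java.util.Date;

public class DecodeColumnTimestamp {
    public static void main(String[] args) {
        long tsMicros = 1357785277207001L; // column timestamp from sstable2json
        long ttlSeconds = 259200L;         // 72 hour TTL from the schema
        System.out.println("written: " + new Date(tsMicros / 1000));
        System.out.println("expired: " + new Date(tsMicros / 1000 + ttlSeconds * 1000));
    }
}
```

This gives Thu, 10 Jan 2013 02:34:37 GMT and Sun, 13 Jan 2013 02:34:37 GMT, matching the thread.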
Re: Cassandra Consistency problem with NTP
One solution is to only read up to (now - 1 second). If this is a public API where you want to guarantee full consistency (i.e., if you have added a message to the queue, it will definitely appear to be there) you can instead delay requests for 1 second before reading up to the moment the request was received. In either of these approaches you can tune the time offset based on how closely synchronized you believe you can keep your clocks. The tradeoff, of course, will be increased latency.

On Wed, Jan 16, 2013 at 5:56 PM, Jason Tang ares.t...@gmail.com wrote: Hi, I am using Cassandra in a message bus solution; the major responsibility of cassandra is recording the incoming requests for later consuming. One strategy is First In First Out (FIFO), so I need to get the stored requests in reversed order. I use NTP to synchronize the system time for the nodes in the cluster (4 nodes). But the local times of the nodes still have some inaccuracy, around 40 ms. The consistency level is write ALL and read ONE, and the replication factor is 3. But here is the problem: Request A comes to node One at local time PM 10:00:01.000. Request B comes to node Two at local time PM 10:00:00.980. The correct order is A -> B, but the timestamp order is B -> A. So is there any way for Cassandra to keep the correct order for read operations? (e.g. logical timestamps?) Or does Cassandra strongly depend on the time synchronization solution? BRs //Tang
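Russell's bounded-read idea can be sketched against the thrift API the way this list usually uses it. The column family name, the long-microsecond column naming, and the helper below are illustrative assumptions, not from the thread:

```
import java.nio.ByteBuffer;
import java.util.List;
import org.apache.cassandra.thrift.*;

public class BoundedQueueRead {
    // Reads queue entries only up to (now - 1s). A writer whose clock lags by
    // less than the offset cannot insert "behind" a position we already read.
    static List<ColumnOrSuperColumn> readUpToSafePoint(
            Cassandra.Client conn, ByteBuffer rowKey) throws Exception {
        long safePointMicros = (System.currentTimeMillis() - 1000L) * 1000L;

        SliceRange range = new SliceRange();
        range.setStart(ByteBuffer.allocate(0));        // from the first column
        range.setFinish(longToBytes(safePointMicros)); // ...up to now - 1s
        range.setReversed(false);
        range.setCount(1000);

        SlicePredicate predicate = new SlicePredicate().setSlice_range(range);
        ColumnParent parent = new ColumnParent("queue"); // hypothetical CF name
        return conn.get_slice(rowKey, parent, predicate, ConsistencyLevel.ONE);
    }

    static ByteBuffer longToBytes(long v) {
        ByteBuffer bb = ByteBuffer.allocate(8);
        bb.putLong(v); // big-endian, matching LongType comparator ordering
        bb.flip();
        return bb;
    }
}
```

The 1-second offset is the tunable Russell describes: set it to a comfortable multiple of the worst clock skew you expect NTP to leave behind.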
Re: Cassandra Consistency problem with NTP
Delayed reads are acceptable, but the problem is still there: Request A comes to node One at local time PM 10:00:01.000. Request B comes to node Two at local time PM 10:00:00.980. The correct order is A -> B. I am not sure how node C will handle the data: although A came before B, B's timestamp is earlier than A's.

2013/1/17 Russell Haering russellhaer...@gmail.com One solution is to only read up to (now - 1 second). If this is a public API where you want to guarantee full consistency (i.e., if you have added a message to the queue, it will definitely appear to be there) you can instead delay requests for 1 second before reading up to the moment the request was received. In either of these approaches you can tune the time offset based on how closely synchronized you believe you can keep your clocks. The tradeoff, of course, will be increased latency.

On Wed, Jan 16, 2013 at 5:56 PM, Jason Tang ares.t...@gmail.com wrote: Hi, I am using Cassandra in a message bus solution; the major responsibility of cassandra is recording the incoming requests for later consuming. One strategy is First In First Out (FIFO), so I need to get the stored requests in reversed order. I use NTP to synchronize the system time for the nodes in the cluster (4 nodes). But the local times of the nodes still have some inaccuracy, around 40 ms. The consistency level is write ALL and read ONE, and the replication factor is 3. But here is the problem: Request A comes to node One at local time PM 10:00:01.000. Request B comes to node Two at local time PM 10:00:00.980. The correct order is A -> B, but the timestamp order is B -> A. So is there any way for Cassandra to keep the correct order for read operations? (e.g. logical timestamps?) Or does Cassandra strongly depend on the time synchronization solution? BRs //Tang
Re: error when creating column family using cql3 and persisting data using thrift
Hi Aaron, I am using the thrift client. Here is the column family creation script:
```
String colFamily = "CREATE COLUMNFAMILY users (key varchar PRIMARY KEY, full_name varchar, birth_date int, state varchar)";
conn.execute_cql3_query(ByteBuffer.wrap(colFamily.getBytes()), Compression.NONE, ConsistencyLevel.ONE);
```
and the thrift operation code:
```
Cassandra.Client conn; // an open connection with the keyspace already set
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap = new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
List<Mutation> insertion_list = new ArrayList<Mutation>();
Mutation mut = new Mutation();
Column column = new Column(ByteBuffer.wrap("full_name".getBytes()));
column.setValue(ByteBuffer.wrap("emp".getBytes()));
mut.setColumn_or_supercolumn(new ColumnOrSuperColumn().setColumn(column));
insertion_list.add(mut);
Map<String, List<Mutation>> columnFamilyValues = new HashMap<String, List<Mutation>>();
columnFamilyValues.put("users", insertion_list);
mutationMap.put(ByteBuffer.wrap(K.getBytes()), columnFamilyValues);
conn.batch_mutate(mutationMap, ConsistencyLevel.ONE);
```
and the error stack trace:
```
InvalidRequestException(why:Not enough bytes to read value of component 0)
	at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20833)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
	at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
```
Thanks Kuldeep Mishra

On Thu, Jan 17, 2013 at 8:40 AM, aaron morton aa...@thelastpickle.com wrote: The thrift request is not sending a composite type where it should. CQL 3 uses composites in a lot of places. What was your table definition? Are you using a high level client or rolling your own? Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com

On 16/01/2013, at 5:32 AM, James Schappet jschap...@gmail.com wrote: I also saw this while testing the https://github.com/boneill42/naughty-or-nice example project. --Jimmy

From: Kuldeep Mishra kuld.cs.mis...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, January 15, 2013 10:29 AM To: user@cassandra.apache.org Subject: error when creating column family using cql3 and persisting data using thrift Hi, I am facing the following problem when creating a column family using cql3 and trying to persist data using thrift 1.2.0 in cassandra-1.2.0. Details:
```
InvalidRequestException(why:Not enough bytes to read value of component 0)
	at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20833)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
	at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
	at com.impetus.client.cassandra.thrift.ThriftClient.onPersist(ThriftClient.java:157)
```
Please help me. -- Thanks and Regards Kuldeep Kumar Mishra +919540965199 -- Thanks and Regards Kuldeep Kumar Mishra +919540965199
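For reference, the error fits Aaron's diagnosis: a CQL 3 table's internal comparator is a CompositeType, so the thrift column name must be composite-encoded rather than a plain UTF-8 string. A minimal sketch of that encoding for a single-component name (the class and helper names are mine, not from the thread):

```
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Encodes one UTF-8 component in CompositeType wire format:
// [2-byte big-endian length][component bytes][end-of-component byte 0].
// With this, ByteBuffer.wrap("full_name".getBytes()) in the code above
// would become CompositeNames.compositeName("full_name").
public final class CompositeNames {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    public static ByteBuffer compositeName(String component) {
        byte[] bytes = component.getBytes(UTF8);
        ByteBuffer bb = ByteBuffer.allocate(2 + bytes.length + 1);
        bb.putShort((short) bytes.length); // component length
        bb.put(bytes);                     // component value
        bb.put((byte) 0);                  // end-of-component marker
        bb.flip();
        return bb;
    }
}
```

"Not enough bytes to read value of component 0" is the server failing to parse a plain string as that length-prefixed format.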
Re: Cassandra Consistency problem with NTP
I'm not sure I fully understand your problem. You seem to be talking of ordering the requests in the order they are generated. But in that case, you will rely on the ordering of columns within whatever row you store requests A and B in, and that order depends on the column names, which in turn are client provided and don't depend at all on the time synchronization of the cluster nodes. And since you are able to say that request A comes before B, I suppose this means said requests are generated from the same source. In that case you just need to make sure that the column names storing each request respect the correct ordering. The column timestamps Cassandra uses are there to decide which update *to the same column* is the more recent one. So they only come into play if your requests A and B update the same column and you're interested in knowing which one of the updates will win when you read. But even if that's your case (which doesn't sound like it at all from your description), the column timestamp is only generated server side if you use CQL. And even in that latter case, it's a convenience and you can force a timestamp client side if you really wish. In other words, Cassandra's dependency on time synchronization is not a strong one even in that case. But again, that doesn't seem at all to be the problem you are trying to solve. -- Sylvain

On Thu, Jan 17, 2013 at 2:56 AM, Jason Tang ares.t...@gmail.com wrote: Hi, I am using Cassandra in a message bus solution; the major responsibility of cassandra is recording the incoming requests for later consuming. One strategy is First In First Out (FIFO), so I need to get the stored requests in reversed order. I use NTP to synchronize the system time for the nodes in the cluster (4 nodes). But the local times of the nodes still have some inaccuracy, around 40 ms. The consistency level is write ALL and read ONE, and the replication factor is 3. But here is the problem: Request A comes to node One at local time PM 10:00:01.000. Request B comes to node Two at local time PM 10:00:00.980. The correct order is A -> B, but the timestamp order is B -> A. So is there any way for Cassandra to keep the correct order for read operations? (e.g. logical timestamps?) Or does Cassandra strongly depend on the time synchronization solution? BRs //Tang
Re: Cassandra Consistency problem with NTP
Yes, Sylvain, you are correct. When I say A comes before B, it means the client will secure the order; actually, B will be sent only after getting the response to request A. And yes, A and B do not update the same record, so it is not the typical Cassandra consistency problem. And yes, the column name is provided by the client; right now I use the local timestamp as the column name, and the local times for A and B are not synchronized well, so I have the problem. So what I want is for Cassandra to provide some information to the client to indicate that A is stored before B, e.g. a globally unique timestamp, or row order.

2013/1/17 Sylvain Lebresne sylv...@datastax.com I'm not sure I fully understand your problem. You seem to be talking of ordering the requests in the order they are generated. But in that case, you will rely on the ordering of columns within whatever row you store requests A and B in, and that order depends on the column names, which in turn are client provided and don't depend at all on the time synchronization of the cluster nodes. And since you are able to say that request A comes before B, I suppose this means said requests are generated from the same source. In that case you just need to make sure that the column names storing each request respect the correct ordering. The column timestamps Cassandra uses are there to decide which update *to the same column* is the more recent one. So they only come into play if your requests A and B update the same column and you're interested in knowing which one of the updates will win when you read. But even if that's your case (which doesn't sound like it at all from your description), the column timestamp is only generated server side if you use CQL. And even in that latter case, it's a convenience and you can force a timestamp client side if you really wish. In other words, Cassandra's dependency on time synchronization is not a strong one even in that case. But again, that doesn't seem at all to be the problem you are trying to solve. -- Sylvain

On Thu, Jan 17, 2013 at 2:56 AM, Jason Tang ares.t...@gmail.com wrote: Hi, I am using Cassandra in a message bus solution; the major responsibility of cassandra is recording the incoming requests for later consuming. One strategy is First In First Out (FIFO), so I need to get the stored requests in reversed order. I use NTP to synchronize the system time for the nodes in the cluster (4 nodes). But the local times of the nodes still have some inaccuracy, around 40 ms. The consistency level is write ALL and read ONE, and the replication factor is 3. But here is the problem: Request A comes to node One at local time PM 10:00:01.000. Request B comes to node Two at local time PM 10:00:00.980. The correct order is A -> B, but the timestamp order is B -> A. So is there any way for Cassandra to keep the correct order for read operations? (e.g. logical timestamps?) Or does Cassandra strongly depend on the time synchronization solution? BRs //Tang
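Since B is only sent after A's response arrives, one way to realize Sylvain's point (column-name ordering is fully client controlled) is to name columns with a client-side sequence number rather than a clock. A minimal sketch, not a production design; the counter resets on restart, so a real version would persist a high-water mark or prefix a coarse epoch:

```
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Generates strictly increasing column names for a LongType comparator.
// Because the client serializes its requests (B waits for A's response),
// the sequence order is exactly the arrival order, independent of NTP skew.
public final class SequenceColumnNames {
    private final AtomicLong sequence = new AtomicLong();

    public ByteBuffer nextName() {
        ByteBuffer bb = ByteBuffer.allocate(8);
        bb.putLong(sequence.incrementAndGet()); // big-endian matches LongType order
        bb.flip();
        return bb;
    }
}
```

With column names like these, a reversed slice over the row returns requests in true FIFO order regardless of which node received them or what its local clock said.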