Re: read path, I have missed something

2013-01-17 Thread santi kumar
Sorry to intrude in this thread, but I would like some clarity on read_repair_chance. Our reads don't need near-real-time data, so all our reads use CL.ONE. In this case, how does read repair happen on the replicas? What should be the ideal value of read_repair_chance in this case? How often

Re: Cassandra Consistency problem with NTP

2013-01-17 Thread Sylvain Lebresne
So what I want is for Cassandra to provide some information to the client to indicate that A is stored before B, e.g. a globally unique timestamp, or row order. The row order is determined by 1) the comparator you use for the column family and 2) the column names you, the client, choose for A and B. So what
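
The point about ordering can be illustrated with a toy model. This is a minimal sketch, assuming a byte-wise (UTF8-style) comparator and hypothetical column names chosen by the client; it is not Cassandra code.

```python
# Toy model: columns are ordered by the comparator applied to the column NAME,
# not by arrival time. The names below are hypothetical.

def stored_order(column_names):
    """Return names in the order a byte-wise (UTF8-like) comparator sorts them."""
    return sorted(column_names, key=lambda name: name.encode("utf-8"))

# The client writes B first and A second, but encodes the intended order
# in a zero-padded prefix of each column name.
written = ["0002-B", "0001-A"]
print(stored_order(written))
```

In this model the order seen on read depends only on the names the client picked, so a client that needs "A before B" has to encode that in the names themselves.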

RE: LCS not removing rows with all TTL expired columns

2013-01-17 Thread Viktor Jevdokimov
@Bryan, To keep data size as low as possible with TTL columns we still use STCS and nightly major compactions. Our experience with LCS was not successful: the data size stays too high, as does the number of compactions. IMO, before 1.2, LCS was good for CFs without TTL or a high delete rate.

Cassandra 1.2 system.peers table

2013-01-17 Thread Nicolai Gylling
Hi, I have a cluster of 3 nodes running Cassandra v1.2 with num_tokens set to 256. It's running on EC2. When I installed the cluster, I brought up one node with seed set to its own IP. The next 2 had the first one as seed. A 'nodetool status' shows all 3 nodes up and running. Replication factor is

Re: write count increase after 1.2 update

2013-01-17 Thread Reik Schatz
Cool feature, didn't know it existed. It turned out, however, that everything works fine! There was a configuration error that duplicated an AWS SNS-SQS subscription, so we got twice the amount of data delivered to our application. Semi-lame post to this mailing list, I guess :( I should have checked

Re: Cassandra 1.2 system.peers table

2013-01-17 Thread Sylvain Lebresne
Now, one of the nodes dies, and when I bring it back up, it doesn't join the cluster again, but becomes its own node/cluster. I can't get it to join the cluster again, even after doing 'removenode' and clearing all data. That obviously should not have happened. That being said we have a few

Re: Cassandra 1.2 system.peers table

2013-01-17 Thread Nicolai Gylling
On Jan 17, 2013, at 11:54 AM, Sylvain Lebresne sylv...@datastax.com wrote: Now, one of the nodes dies, and when I bring it back up, it doesn't join the cluster again, but becomes its own node/cluster. I can't get it to join the cluster again, even after doing 'removenode' and clearing all

Re: Pig / Map Reduce on Cassandra

2013-01-17 Thread cscetbon.ext
What do you mean? It's not needed by Pig or Hive to access Cassandra data. Regards On Jan 16, 2013, at 11:14 PM, Brandon Williams dri...@gmail.com wrote: You won't get CFS, but it's not a hard requirement, either.

Re: Pig / Map Reduce on Cassandra

2013-01-17 Thread James Schappet
CFS is the Cassandra File System: http://www.datastax.com/dev/blog/cassandra-file-system-design But you don't need CFS to connect from Pig to Cassandra. The latest versions of the Cassandra source ship with examples of connecting from Pig to Cassandra: apache-cassandra-1.2.0-src/examples/pig --

Re: Cassandra at Amazon AWS

2013-01-17 Thread Adam Venturella
Jared, how do you guys handle data backups for your ephemeral based cluster? I'm trying to move to ephemeral drives myself, and that was my last sticking point; asking how others in the community deal with backup in case the VM explodes. On Wed, Jan 16, 2013 at 1:21 PM, Jared Biel

Re: Cassandra at Amazon AWS

2013-01-17 Thread William Oberman
I have a peer EBS disk to the ephemeral disk. Then I do nodetool snapshot -> rsync from ephemeral to EBS -> take snapshot of EBS. Syncing nodetool snapshot directly to S3 would involve fewer steps and be cheaper (EBS costs more than S3), but I do post processing on the snapshot for EMR, and it
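
The three-step flow described above can be sketched as a list of command lines. This is only an illustration under stated assumptions: the snapshot tag, both paths, and the volume ID are hypothetical placeholders, the AWS CLI is assumed for the EBS step (the original message does not say which tool was used), and the commands are built but not executed.

```python
# Sketch of the snapshot -> rsync -> EBS snapshot flow as command lines.
# Nothing is executed here; every argument value is a hypothetical placeholder.

def backup_commands(tag, snapshot_dir, ebs_mount, volume_id):
    return [
        # 1. Create a named Cassandra snapshot on the ephemeral disk.
        ["nodetool", "snapshot", "-t", tag],
        # 2. Copy the snapshot files from the ephemeral disk to the EBS disk.
        ["rsync", "-a", snapshot_dir + "/", ebs_mount + "/"],
        # 3. Take a point-in-time snapshot of the EBS volume.
        ["aws", "ec2", "create-snapshot",
         "--volume-id", volume_id,
         "--description", "cassandra " + tag],
    ]

for cmd in backup_commands("nightly", "/mnt/ephemeral/snapshots",
                           "/mnt/ebs/backup", "vol-00000000"):
    print(" ".join(cmd))
```

Keeping the steps as data makes it easy to log or dry-run them before handing each list to something like `subprocess.run`.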

Re: Pig / Map Reduce on Cassandra

2013-01-17 Thread cscetbon.ext
Jimmy, I understand that CFS can replace HDFS for those who use Hadoop. I just want to use pig and hive on cassandra. I know that pig samples are provided and work now with cassandra natively (they are part of the core). However, does it mean that the process will be spread over nodes with

Re: Pig / Map Reduce on Cassandra

2013-01-17 Thread James Schappet
This really depends on how you design your Hadoop cluster. In the testing I have done, the Hadoop and Cassandra nodes were collocated on the same hosts. Remember that Pig code runs inside your Hadoop cluster and connects to Cassandra as the database engine. I have not done any testing with Hive, so

Re: unsubscribe

2013-01-17 Thread Eric Evans
http://goo.gl/CkXv3 On Wed, Jan 16, 2013 at 12:39 PM, Leonid Ilyevsky lilyev...@mooncapital.com wrote: Leonid Ilyevsky Moon Capital Management, LP 499 Park Avenue New York, NY 10022 P: (212) 652-4586 F: (212) 652-4501 E:

Re: Pig / Map Reduce on Cassandra

2013-01-17 Thread cscetbon.ext
Ok, I understand that I need to manage both cassandra and hadoop components and that pig will use hadoop components to launch its tasks which will use Cassandra as the Storage engine. Thanks -- Cyril SCETBON On Jan 17, 2013, at 4:03 PM, James Schappet

Re: Cassandra at Amazon AWS

2013-01-17 Thread Jared Biel
We use a replication factor such that if any one instance dies the cluster would remain alive. If a node dies, we simply replace it and move on. As far as disaster recovery goes, it's easy to store snapshots in S3, although Glacier is looking interesting. Jared Biel System Administrator Bolder Thinking

Re: Cassandra at Amazon AWS

2013-01-17 Thread Andrey Ilinykh
I'd recommend Priam. http://techblog.netflix.com/2012/02/announcing-priam.html Andrey On Thu, Jan 17, 2013 at 5:44 AM, Adam Venturella aventure...@gmail.com wrote: Jared, how do you guys handle data backups for your ephemeral based cluster? I'm trying to move to ephemeral drives myself,

Re: LCS not removing rows with all TTL expired columns

2013-01-17 Thread Bryan Talbot
We are using LCS and the particular row I've referenced has been involved in several compactions after all columns have TTL expired. The most recent one was again this morning and the row is still there -- TTL expired for several days now with gc_grace=0 and several compactions later ... $

Re: Cassandra Consistency problem with NTP

2013-01-17 Thread Edward Capriolo
If you have 40ms NTP drift something is VERY VERY wrong. You should have a local NTP server on the same subnet, do not try to use one on the moon. On Thu, Jan 17, 2013 at 4:42 AM, Sylvain Lebresne sylv...@datastax.com wrote: So what I want is for Cassandra to provide some information to the client to

Re: Pig / Map Reduce on Cassandra

2013-01-17 Thread James Lyons
Silly question -- but do Hive/Pig/Hadoop etc. work with Cassandra 1.1.8? Or only with 1.2? We are using the Astyanax library, which seems to fail horribly on 1.2, so we're still on 1.1.8. But we're just starting out with this and I'm still debating between Cassandra and HBase. So I just want to

Re: BulkOutputFormat

2013-01-17 Thread Michael Kjellman
https://issues.apache.org/jira/browse/CASSANDRA-4813 Fixed in 1.2.0 Best, michael From: chandra Varahala hadoopandcassan...@gmail.com Reply-To: user@cassandra.apache.org

Re: BulkOutputFormat

2013-01-17 Thread chandra Varahala
I am not using reducers, just a map-only job. Is it still the same kind of issue? thanks chandra On Thu, Jan 17, 2013 at 1:50 PM, Michael Kjellman mkjell...@barracuda.com wrote: https://issues.apache.org/jira/browse/CASSANDRA-4813 Fixed in 1.2.0 Best, michael From: chandra Varahala

Re: Starting Cassandra

2013-01-17 Thread Yang Song
Oracle Java + the Cassandra binary should work fine. 2013/1/17 Sloot, Hans-Peter hans-peter.sl...@atos.net: Well, I tried to use the Oracle stuff, but the Cassandra RPMs seem to depend on the OpenJDK ones. From: Michael Kjellman [mailto:mkjell...@barracuda.com] Sent: Tuesday, 15 January 2013

Cassandra Performance Benchmarking.

2013-01-17 Thread Pradeep Kumar Mantha
Hi, I am trying to maximize the number of read queries executed per second. Here is my cluster configuration. Replication: default. 12 data nodes. 16 client nodes, used for querying. Each client node executes 32 threads, and each thread executes 76896 read queries using the cassandra-cli tool.
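
The load described above works out as follows. This is a small arithmetic sketch using only the figures quoted in the message; the helper `queries_per_second` and its elapsed-time argument are hypothetical, since the message gives no timing.

```python
# Figures taken from the message above.
client_nodes = 16
threads_per_node = 32
queries_per_thread = 76896

# Total concurrent client threads across the client nodes.
concurrent_threads = client_nodes * threads_per_node

# Total read queries issued over one full run.
total_queries = concurrent_threads * queries_per_thread

def queries_per_second(elapsed_seconds):
    """Throughput for a run that took elapsed_seconds (a measured value, not given here)."""
    return total_queries / elapsed_seconds

print(concurrent_threads, total_queries)
```

That is 512 concurrent threads and 39,370,752 reads per run, so throughput is that total divided by the measured wall-clock time.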

Re: BulkOutputFormat

2013-01-17 Thread Michael Kjellman
It was primarily a streaming issue not a Hadoop component issue. Seems very similar to not be related IMHO On Jan 17, 2013, at 10:59 AM, chandra Varahala hadoopandcassan...@gmail.com wrote: I am not reducers, just Map only job still same kind issue ? thanks

Re: error when creating column family using cql3 and persisting data using thrift

2013-01-17 Thread aaron morton
and thrift operation code :- Your life will be a lot easier if you use one of the many fine Java Cassandra clients such as https://github.com/Netflix/astyanax or https://github.com/hector-client/hector. They know how to talk to C* Cheers - Aaron Morton Freelance Cassandra

Re: read path, I have missed something

2013-01-17 Thread aaron morton
In this case, how read repair happens in the replicas? By default 90% of the reads will only read from 1 replica, and 10% will read from all. However the client request will *only* wait for one replica to return a value. And it has to be the replica that was asked to return the full data, not
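
The 90%/10% split described above can be pictured with a toy simulation. This is a minimal sketch and not Cassandra's implementation: it assumes a read_repair_chance of 0.1 (matching the 10% figure in the message) and a hypothetical replication factor of 3, and it only counts how many replicas each CL.ONE read is sent to.

```python
import random

def replicas_read(replication_factor, read_repair_chance, rng):
    """Number of replicas one CL.ONE read is sent to in this toy model."""
    if rng.random() < read_repair_chance:
        return replication_factor  # read repair triggered: all replicas are read
    return 1                       # otherwise only one replica is read

def average_replicas_read(replication_factor, read_repair_chance,
                          reads=100_000, seed=0):
    """Average replicas contacted per read over many simulated reads."""
    rng = random.Random(seed)
    total = sum(replicas_read(replication_factor, read_repair_chance, rng)
                for _ in range(reads))
    return total / reads

print(average_replicas_read(3, 0.1))
```

The expected average is 1 + (RF - 1) * read_repair_chance, so raising the chance trades extra replica reads for more frequent background repair, while the client in either case waits for only one response.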

Re: write count increase after 1.2 update

2013-01-17 Thread aaron morton
Semi-lame post to this mailing list i guess :( I should have checked that earlier No problems. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 17/01/2013, at 11:50 PM, Reik Schatz reik.sch...@gmail.com wrote:

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Edward Capriolo
Wow, you managed to do a load test through the cassandra-cli. There should be a merit badge for that. You should use the built-in stress tool or YCSB. The CLI has to do much more string conversion than a normal client would and it is not built for performance. You will definitely get better

Composite Keys Query

2013-01-17 Thread Renato Marroquín Mogrovejo
Hi all, I am using composite keys to fetch only specific composite column names, as follows: create column family video_event with comparator = 'CompositeType(UTF8Type,UTF8Type)' and key_validation_class = 'UTF8Type' and default_validation_class = 'UTF8Type';
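
How a two-part composite comparator orders and slices column names can be sketched with plain tuples. This is a toy model, assuming hypothetical column names for one video_event row; the real CompositeType has its own binary encoding, which is not reproduced here.

```python
from bisect import bisect_left

# Hypothetical columns of one video_event row:
# (first component, second component) -> value
columns = {
    ("start", "2013-01-17T10:00"): "v1",
    ("stop", "2013-01-17T10:05"): "v1",
    ("start", "2013-01-17T09:00"): "v0",
}

def slice_by_first_component(columns, first):
    """Return the sorted column names whose first component equals `first`."""
    names = sorted(columns)                        # tuples sort component by component
    lo = bisect_left(names, (first,))              # a bare prefix sorts before any (first, x)
    hi = bisect_left(names, (first + "\x00",))     # smallest first component greater than `first`
    return names[lo:hi]

print(slice_by_first_component(columns, "start"))
```

Because the names sort component by component, every column sharing a first component is contiguous, which is why a query fixing the first component can be served as a single range slice.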

Re: LCS not removing rows with all TTL expired columns

2013-01-17 Thread Bryan Talbot
I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7, 1.1.8, a trivial schema, and a simple script that just inserts rows. If the TTL is small enough so that all LCS data fits in generation 0 then the rows seem to be removed when the TTL expires, as desired. However, if the insertion

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Pradeep Kumar Mantha
Hi, Thanks. I would like to benchmark Cassandra with our application so that we understand the details of how the actual benchmarking is done. I'm not sure how easy it would be to integrate YCSB with our application, so I am trying different client interfaces to Cassandra. I found for 12 Data

Re: LCS not removing rows with all TTL expired columns

2013-01-17 Thread Derek Williams
When you ran this test, is that the exact schema you used? I'm not seeing where you are setting gc_grace to 0 (although I could just be blind, it happens). On Thu, Jan 17, 2013 at 5:01 PM, Bryan Talbot btal...@aeriagames.com wrote: I'm able to reproduce this behavior on my laptop using 1.1.5,

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Pradeep Kumar Mantha
Thanks, Tyler. I just moved the pool and cf objects, which store the connection pool and CF information, to global scope, and increased the server_list size from 1 to 4 (I think I can increase it to a maximum of 12, since I have 12 data nodes). When I created 8 threads using the Python threading package, I see

Re: LCS not removing rows with all TTL expired columns

2013-01-17 Thread Bryan Talbot
Bleh, I rushed out the email before some meetings and I messed something up. Working on reproducing now with better notes this time. -Bryan On Thu, Jan 17, 2013 at 4:45 PM, Derek Williams de...@fyrie.net wrote: When you ran this test, is that the exact schema you used? I'm not seeing where

Re: Cassandra at Amazon AWS

2013-01-17 Thread Marcelo Elias Del Valle
Everyone, thanks a lot for the answers; they helped me a lot. 2013/1/17 Andrey Ilinykh ailin...@gmail.com I'd recommend Priam. http://techblog.netflix.com/2012/02/announcing-priam.html Andrey On Thu, Jan 17, 2013 at 5:44 AM, Adam Venturella aventure...@gmail.com wrote: Jared, how do you