Re: Replication factor - Consistency Questions
Yes, ALL is not good for HA, and we hit problems when using QUORUM; our current workaround is to switch to Write:QUORUM / Read:QUORUM when we get an UnavailableException.

2012/7/18 Jay Parashar jparas...@itscape.com

Thanks, but a write at ALL will fail if any node is down. I am thinking of QUORUM.

From: Jason Tang [mailto:ares.t...@gmail.com]
Sent: Tuesday, July 17, 2012 8:24 PM
To: user@cassandra.apache.org
Subject: Re: Replication factor - Consistency Questions

Hi,

I have not been using Cassandra for long, and I also have problems with consistency. Here is some thinking. If you have Write:ANY / Read:ONE, you will have consistency problems; if you want repair, check your schema and the "read repair chance" parameter: http://wiki.apache.org/cassandra/StorageConfiguration

If you want consistent results, my suggestion is Write:ALL / Read:ONE, since in Cassandra writes are much faster than reads.

As for the performance impact, you need to test with your own traffic: if your memory cannot cache all your data, or your network is not fast enough, then yes, writing to one more node will have an impact.

BRs

2012/7/18 Jay Parashar jparas...@itscape.com

Hello all, there is a lot of material on replication factor and consistency level, but I am a little confused by what is happening on my setup (Cassandra 1.1.2). I would appreciate any answers.

My setup: a cluster of 2 nodes, evenly balanced, with RF = 2 and consistency level Write = ANY, Read = ONE.

I know my consistency is weak, but since my RF = 2 I thought the data would just be duplicated on both nodes; yet sometimes querying does not give me the correct (or gives partial) results, and at other times it gives the right results. Is read repair going on after the first query? But as RF = 2 and the data is duplicated, why the repair? Note: my query is done a while after the writes, so the data should have been on both nodes. Or is this not the case (flushing not happening, etc.)?

I am thinking of making the write consistency ONE and the read consistency QUORUM, so that R + W > RF (1 + 2 > 2) gives strong consistency. Will that affect performance a lot (generally speaking)?

Thanks in advance. Regards, Jay
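For what it's worth, a minimal sketch of that Write:ONE / Read:QUORUM setup using the Hector client (mentioned elsewhere in this digest); the cluster, host, and keyspace names here are hypothetical:

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class ConsistencyDemo {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");

            // Per-operation consistency: writes at ONE, reads at QUORUM.
            ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
            policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);
            policy.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);

            // With RF = 2: R + W = 2 + 1 = 3 > 2, so a quorum read always
            // overlaps the replica that acknowledged the write.
            Keyspace ks = HFactory.createKeyspace("myks", cluster, policy);
        }
    }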
Re: Cassandra Evaluation/ Benchmarking: Throughput not scaling as expected neither latency showing good numbers
What kind of client are you using in YCSB? If you want to improve latency, try distributing the requests among the nodes instead of stressing a single node, and try host connection pooling instead of creating a connection for each request. Check out high-level clients like Hector or Astyanax if you are not already using them; some clients have ring-aware request handling.

You have a 3-node cluster and are using an RF of three, which means every node will get the data. What CL are you using for writes? Latency increases with stronger CLs.

If you want to increase throughput, try increasing the number of clients. Of course, that doesn't mean throughput will always increase; my observation was that it increases up to a certain number of clients and then decreases again.

Regards, Manoj Mainali

On Wednesday, July 18, 2012, Code Box wrote:

The cassandra stress tool gives me values around 2.5 milliseconds for writing. The problem with the Cassandra stress tool is that it just gives the average latency numbers, and the average latency numbers that I am getting are comparable in some cases. It is the 95th percentile and 99th percentile numbers that are the ones that are bad. So it means that 95% of the requests are really bad and the rest 5% are really good, which makes the average go down. I want the 95th and 99th percentile values to be in single-digit milliseconds, because I have seen people get those numbers.

This is my conclusion so far from all the investigations:

- A three-node cluster with a replication factor of 3 gets me around 10 ms for 100% writes with consistency ONE. The reads are really bad, around 65 ms.
- I thought the network was the issue, so I moved the client onto a local machine. A client on the local machine with a one-node cluster again gives me good average write latencies, but the 99th and 95th percentiles are bad: around 10 ms for writes and 25 ms for reads.
- Network bandwidth between the client and server is 1 Gigabit/second. I was able to generate at most 25K requests, so the client could be the bottleneck. I am using YCSB; maybe I should change to some other client.
- Throughput at the maximum was 35K from a local client and 17K from a remote one.

I can try these things now:

- Use a different client and see what numbers I get for the 99th and 95th percentiles. I am not sure there is any client that reports this level of detail, or whether I have to write my own.
- Tweak some hard disk settings (RAID0 and xfs/ext4) and see if that helps.
- It could be that from Cassandra 0.8 to 1.1 the 95th and 99th percentile latencies have gotten worse; the throughput numbers have also gone down.

Is there any other client I can use besides the cassandra stress tool and YCSB, and are the numbers I have gotten so far good?

--Akshat Vig.

On Tue, Jul 17, 2012 at 9:22 PM, aaron morton aa...@thelastpickle.com wrote:

I would benchmark a default installation, then start tweaking. That way you can see if your changes result in improvements. To simplify things further, try using the tools/stress utility in the Cassandra source distribution first; it's pretty simple to use. Add clients until you see latency increase and tasks start to back up in nodetool tpstats. If it reports dropped messages, it is overloaded. Hope that helps.

- Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 18/07/2012, at 4:48 AM, Code Box wrote: Thanks a lot for your reply guys.
I was trying fsync = batch with window = 0ms to see if disk utilization was maxed out on my drive. I checked the numbers using iostat: they were around 60%, and the CPU usage was also not too high.

Configuration of my setup: I have three m1.xlarge hosts, each having 15 GB RAM and 4 CPUs (8 EC2 Compute Units). I have kept the replication factor equal to 3. The typical write size is 1 KB. I tried adding different client nodes, each with 200 threads, and the throughput got split in two.

If I do it from a single host with fsync set to periodic, a window size of 1000ms, and two nodes, I am getting these numbers:

[OVERALL], Throughput(ops/sec), 4771
[INSERT], AverageLatency(us), 18747
[INSERT], MinLatency(us), 1470
[INSERT], MaxLatency(us), 446413
[INSERT], 95thPercentileLatency(ms), 55
[INSERT], 99thPercentileLatency(ms), 167

[OVERALL], Throughput(ops/sec), 4678
[INSERT], AverageLatency(us), 22015
[INSERT], MinLatency(us), 1439
[INSERT], MaxLatency(us), 466149
[INSERT], 95thPercentileLatency(ms), 62
[INSERT], 99thPercentileLatency(ms), 171

Is there something I am doing wrong in my Cassandra setup? What is the best setup for Cassandra to get high throughput and good write latency numbers?

On Tue, Jul 17, 2012 at 7:02 AM, Sylvain Lebresne sylv...@datastax.com
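A rough sketch of Manoj's pooling and request-distribution suggestions, assuming the Hector client (the host and cluster names are hypothetical):

    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.factory.HFactory;

    public class PooledClusterDemo {
        public static void main(String[] args) {
            // Spread requests across all three nodes instead of
            // hammering a single coordinator.
            CassandraHostConfigurator config =
                new CassandraHostConfigurator("node1:9160,node2:9160,node3:9160");
            config.setMaxActive(50);           // pooled connections per host,
                                               // reused instead of one per request
            config.setAutoDiscoverHosts(true); // pick up ring changes automatically
            Cluster cluster = HFactory.getOrCreateCluster("BenchCluster", config);
        }
    }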
Batch update efficiency with composite key
I have a question about the efficiency of updates to a CF with a composite key. Say I have 100 logical rows to update, and they all belong to the same physical wide row. In my naive understanding (correct me if I am wrong), in order to update a logical row, Cassandra has to retrieve the whole physical row, add columns to it, and put it back. So I put all 100 updates in a batch and send it over. Would Cassandra be smart enough to recognize that they all belong to one physical row, retrieve it once, do all the updates, and put it back once? Is my batch even relevant in this case? What happens if I just send the updates one by one?

I want to understand why I should use batches. I don't really care about one timestamp for all records; I only care about efficiency. So I thought I would at least save on the number of remote calls, but I also wonder what happens on the Cassandra side.
Re: Cassandra Evaluation/ Benchmarking: Throughput not scaling as expected neither latency showing good numbers
On 2012.07.18. 7:13, Code Box wrote:

The cassandra stress tool gives me values around 2.5 milliseconds for writing. The problem with the Cassandra stress tool is that it just gives the average latency numbers, and the average latency numbers that I am getting are comparable in some cases. It is the 95th percentile and 99th percentile numbers that are the ones that are bad. So it means that 95% of the requests are really bad and the rest 5% are really good, which makes the average go down.

No, the opposite is true: 95% of the requests are fast, and 5% are slow. Or in the case of the 99th percentile, 99% are fast and 1% is slow. That is, unless you order your samples in the opposite of the usual direction.
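To make the direction concrete, here is a small self-contained illustration (the sample latencies are made up, not taken from the benchmarks above):

    import java.util.Arrays;

    public class PercentileDemo {
        public static void main(String[] args) {
            // Hypothetical latency samples in milliseconds.
            double[] latenciesMs = {1, 2, 2, 3, 3, 4, 5, 5, 6, 90};
            Arrays.sort(latenciesMs);
            // Nearest-rank 95th percentile: the value at or below which
            // 95% of the samples fall.
            int idx = (int) Math.ceil(0.95 * latenciesMs.length) - 1;
            System.out.println("p95 = " + latenciesMs[idx] + " ms");
            // Prints p95 = 90.0 ms even though 9 of the 10 requests took
            // 6 ms or less: the 95% below the percentile are the fast
            // ones, and the slow ones form the tail above it.
        }
    }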
Re: Batch update efficiency with composite key
Cassandra doesn't do reads before writes; it just places the updates in memtables. In effect, updates are the same as inserts. Batches certainly help with network latency, and with some minor amount of code repetition on the server side.

----- Original Message -----
From: "Leonid Ilyevsky" lilyev...@mooncapital.com
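For reference, batching 100 updates into one mutation with a high-level client might look like this: a sketch assuming the Hector client, with hypothetical CF, key, and column names. All 100 updates ride in a single batch_mutate call, so you pay one network round trip instead of 100.

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class BatchUpdateDemo {
        static void batchUpdate(Keyspace ks) {
            StringSerializer ss = StringSerializer.get();
            Mutator<String> mutator = HFactory.createMutator(ks, ss);
            for (int i = 0; i < 100; i++) {
                // Each addInsertion is buffered client-side; nothing is sent yet.
                mutator.addInsertion("wideRowKey", "MyCF",
                    HFactory.createColumn("col" + i, "value" + i, ss, ss));
            }
            mutator.execute(); // one round trip; the server appends to the memtable
        }
    }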
Re: Composite Column Expiration Behavior
Hi, I don't think composite columns have parent columns. Your point might be true for supercolumns, but each composite column is probably independent.

On Wed, Jul 18, 2012 at 9:14 PM, Thomas Van de Velde thomase...@gmail.com wrote:

Hi there, I am trying to understand the expiration behavior of composite columns. Assume I have two entries that both have the same parent column name, but each one has a different TTL. Would expiration be applied at the parent column level (taking into account the TTLs set per column under the parent, and expiring all of the child columns when the most recent TTL is met), or is each child entry expired independently? Would this be correct?

A:B ttl=5
A:C ttl=10

t+5: Nothing gets expired (because A:C's expiration has not yet been reached)
t+10: Both A:B and A:C are expired

Thanks, Thomas
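A sketch of writing the two composite columns A:B and A:C with independent TTLs, assuming Hector's Composite and CompositeSerializer (the row key and CF name are hypothetical). Each column carries its own TTL and expires on its own:

    import me.prettyprint.cassandra.serializers.CompositeSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.Composite;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class CompositeTtlDemo {
        static void insert(Keyspace ks) {
            StringSerializer ss = StringSerializer.get();
            Mutator<String> m = HFactory.createMutator(ks, ss);

            Composite ab = new Composite();
            ab.addComponent("A", ss);
            ab.addComponent("B", ss);
            m.addInsertion("row1", "MyCF",
                HFactory.createColumn(ab, "v1", 5, new CompositeSerializer(), ss));  // ttl = 5s

            Composite ac = new Composite();
            ac.addComponent("A", ss);
            ac.addComponent("C", ss);
            m.addInsertion("row1", "MyCF",
                HFactory.createColumn(ac, "v2", 10, new CompositeSerializer(), ss)); // ttl = 10s

            m.execute();
            // At t+5 only A:B has expired; A:C survives until t+10.
        }
    }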
Can't change replication factor in Cassandra 1.1.2
Hi folks, I have an interesting problem in Cassandra 1.1.2; a Google search wasn't much help, so I thought I'd ask here. Essentially, I have a problem keyspace in my 2-node cluster that keeps me from changing the replication factor on that specific keyspace. It's probably easier to show what I'm seeing in cassandra-cli:

[default@foobar] update keyspace test1 with strategy_options = {replication_factor:1};
2d5f0d16-bb4b-3d75-a084-911fe39f7629
Waiting for schema agreement...
... schemas agree across the cluster
[default@foobar] update keyspace test1 with strategy_options = {replication_factor:1};
7745dd06-ee5d-3e74-8734-7cdc18871e67
Waiting for schema agreement...
... schemas agree across the cluster

Even though keyspace test1 had a replication_factor of 1 to start with, each of the above UPDATE KEYSPACE commands caused a new UUID to be generated for the schema, which I assume is normal and expected. Then I try it with the problem keyspace:

[default@foobar] update keyspace foobar with strategy_options = {replication_factor:1};
7745dd06-ee5d-3e74-8734-7cdc18871e67
Waiting for schema agreement...
... schemas agree across the cluster

Note that the UUID did not change, and the replication_factor in the underlying database did not change either. The funny thing is that foobar had a replication_factor of 1 yesterday; then I brought my second node online and changed the replication_factor to 2 without incident. I only ran into issues when I tried changing it back to 1. I tried running nodetool cleanup on both nodes, but the problem persists. Any suggestions?

Thanks,
-- Doug
-- http://twitter.com/dmuth
An experiment using Spring Data w/ Cassandra (initially via JPA/Kundera)
This is just an FYI. I experimented w/ Spring Data JPA w/ Cassandra, leveraging Kundera. It sort of worked:
https://github.com/boneill42/spring-data-jpa-cassandra
http://brianoneill.blogspot.com/2012/07/spring-data-w-cassandra-using-jpa.html

I'm now working on a pure Spring Data adapter using Astyanax:
https://github.com/boneill42/spring-data-cassandra

I'll keep you posted. (Thanks to all those who helped out w/ advice.)

-brian
-- Brian O'Neill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
Cassandra startup times
Good evening, I am interested in improving the startup time of our Cassandra cluster. We have a 3-node cluster (replication factor of 3) in which our application requires quorum reads and writes to function. Each machine is well specced, with 24 GB of RAM, 10 cores, JNA enabled, etc. On each server our keyspace files are so far around 90 GB (stored on NFS, although I am not seeing signs that we have much network I/O). This size will grow in future.

Our startup time for one server at the moment is greater than half an hour (45 to 50 minutes even), which is putting a risk factor on the resilience of our service. I have tried versions from 1.0.9 to the latest 1.1.2. I do not see much system utilization while starting either. I came across an article suggesting increased startup speed in 1.2, although when I set it up, it did not seem to be any faster at all (if not slower).

I was observing what was happening during startup and I noticed (via strace) that Cassandra was doing lots of 8-byte reads from:

/var/lib/cassandra/data/XX/YY/XXX-YYY-hc-1871-CompressionInfo.db
/var/lib/cassandra/data/XX/YY/XXX-YYY-hc-1874-CompressionInfo.db

Also... is there some way I can change the 8-byte reads to something greater? 8-byte reads across NFS are terribly inefficient (and I am guessing the cause of our terribly slow startup times).

Regards,
-- -Ben
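In case it is useful context: the generic cure for lots of tiny reads is a buffered wrapper that turns them into a few large ones. The sketch below is just that pattern in plain Java, not Cassandra's actual I/O path, and whether Cassandra's startup reads can be configured this way I don't know:

    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class BufferedReadDemo {
        public static void main(String[] args) throws IOException {
            // The BufferedInputStream fetches 64 KB from the file (or NFS)
            // per underlying read; each readLong() is then served from
            // memory instead of issuing its own 8-byte read.
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(args[0]), 64 * 1024))) {
                long count = 0;
                try {
                    while (true) {
                        in.readLong(); // 8 bytes, but from the buffer
                        count++;
                    }
                } catch (EOFException eof) {
                    // end of file reached
                }
                System.out.println("read " + count + " longs");
            }
        }
    }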