Re: Replication factor - Consistency Questions

2012-07-18 Thread Jason Tang
Yes, ALL is not good for HA, and we also hit problems when using QUORUM. Our
current solution is to switch to Write:QUORUM / Read:QUORUM when we get an
UnavailableException.

2012/7/18 Jay Parashar jparas...@itscape.com

 Thanks, but a write at ALL will fail if any node is down. I am thinking of
 QUORUM.

 *From:* Jason Tang [mailto:ares.t...@gmail.com]
 *Sent:* Tuesday, July 17, 2012 8:24 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Replication factor - Consistency Questions

 Hi,

 I have not been using Cassandra for long, and I have also run into
 consistency problems.

 Here are some thoughts.

 If you use Write:ANY / Read:ONE, you will have consistency problems. If you
 want read repair to happen, check your schema and the Read repair chance
 parameter:

 http://wiki.apache.org/cassandra/StorageConfiguration

 And if you want consistent results, my suggestion is Write:ALL / Read:ONE,
 since in Cassandra writes are much faster than reads.

 Regarding the performance impact, you need to test with your own traffic. If
 your memory cannot cache all of your data, or your network is not fast
 enough, then yes, writing to one more node will have an impact.

 BRs

 2012/7/18 Jay Parashar jparas...@itscape.com

 Hello all,

 There is a lot of material on replication factor and consistency level, but
 I am a little confused by what is happening on my setup (Cassandra 1.1.2).
 I would appreciate any answers.

 My setup: a cluster of 2 nodes, evenly balanced. My RF = 2; consistency
 level: Write = ANY and Read = ONE.

 I know that my consistency is weak, but since my RF = 2 I thought the data
 would simply be duplicated on both nodes. Yet sometimes querying does not
 give me the correct (or gives partial) results, while at other times it
 gives the right results.
 Is read repair going on after the first query? But as RF = 2 the data is
 duplicated, so why the repair?
 Note: my query is done a while after the writes, so the data should have
 been on both nodes. Or is this not the case (flushing not happening, etc.)?

 I am thinking of making the write CL ONE and the read CL QUORUM, so that
 R + W > RF (1 + 2 > 2), to give strong consistency. Will that affect
 performance a lot (generally speaking)?

 Thanks in advance
 Regards

 Jay
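The R + W > RF arithmetic above can be sketched as a quick self-check. This is only an illustration of the quorum rule: the consistency-level names map to replica counts in the standard way and are not tied to any particular client API.

```python
# Illustrative check of the strong-consistency rule R + W > RF.

def replicas_needed(cl, rf):
    """Replicas that must acknowledge for a given consistency level."""
    return {"ONE": 1, "TWO": 2, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

def strongly_consistent(write_cl, read_cl, rf):
    """True when every read overlaps every write: R + W > RF."""
    return replicas_needed(read_cl, rf) + replicas_needed(write_cl, rf) > rf

# RF = 2: Write ONE + Read QUORUM -> 1 + 2 > 2, strong consistency.
print(strongly_consistent("ONE", "QUORUM", 2))   # True
# RF = 2: Write ONE + Read ONE -> 1 + 1 = 2, not strong.
print(strongly_consistent("ONE", "ONE", 2))      # False
```

With RF = 2, QUORUM is already 2 replicas, so Read:QUORUM there behaves like Read:ALL for availability purposes.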

 




Re: Cassandra Evaluation/ Benchmarking: Throughput not scaling as expected neither latency showing good numbers

2012-07-18 Thread Manoj Mainali
What kind of client are you using with YCSB? If you want to improve latency,
try distributing the requests among the nodes instead of stressing a single
node, and try host connection pooling instead of creating a connection for
each request. Check out high-level clients like Hector or Astyanax if you
are not already using them. Some clients have ring-aware request handling.

You have a 3-node cluster and are using an RF of three, which means every
node will get the data. What CL are you using for writes? Latency increases
with stronger CLs.

If you want to increase throughput, try increasing the number of clients. Of
course, that doesn't mean throughput will always increase; my observation
was that it increases up to a certain number of clients and then decreases
again.
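The two suggestions above (spread requests across nodes, and reuse pooled connections instead of opening one per request) can be sketched roughly like this. The `Connection` class and host addresses are hypothetical stand-ins, not any real client's API:

```python
import itertools

class Connection:
    """Hypothetical stand-in for a client connection to one node."""
    def __init__(self, host):
        self.host = host  # a real client would open a socket here

class RoundRobinPool:
    """Cycle requests across hosts, caching one connection per host."""
    def __init__(self, hosts):
        self._cycle = itertools.cycle(hosts)
        self._conns = {}

    def get(self):
        host = next(self._cycle)
        if host not in self._conns:      # connect lazily, then reuse
            self._conns[host] = Connection(host)
        return self._conns[host]

hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # assumed node addresses
pool = RoundRobinPool(hosts)
used = [pool.get().host for _ in range(6)]
# six requests land evenly: each host handles two, over one connection each
```

Real clients such as Hector and Astyanax implement this (plus failover and ring awareness) for you; the sketch only shows why it cuts per-request latency.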

Regards,
Manoj Mainali


On Wednesday, July 18, 2012, Code Box wrote:

 The cassandra stress tool gives me values around 2.5 milliseconds for
 writing. The problem with the Cassandra Stress Tool is that it just gives
 the average latency numbers, and the average latency numbers that I am
 getting are comparable in some cases. It is the 95th percentile and 99th
 percentile numbers that are bad. So it means that 95% of the requests are
 really bad and the rest 5% are really good, which makes the average go
 down. I want the 95% and 99% values to be in single-digit milliseconds,
 because I have seen people getting those numbers.

 These are my conclusions so far from my investigation:

 A three-node cluster with a replication factor of 3 gets me around 10 ms
 for 100% writes with consistency ONE. The reads are really bad: around
 65 ms.

 I thought the network was the issue, so I moved the client onto a local
 machine. The client on the local machine with a one-node cluster again
 gives me good average write latencies, but the 99th and 95th percentiles
 are bad: around 10 ms for writes and 25 ms for reads.

 Network bandwidth between the client and server is 1 gigabit/second. I was
 able to generate at most 25K requests, so the client could be the
 bottleneck. I am using YCSB; maybe I should change to some other client.

 The maximum throughput I got from a client was 35K locally and 17K
 remotely.


 I can try these things now:

 Use a different client and see what numbers I get for the 99th and 95th
 percentiles. I am not sure if there is any client that gives me this level
 of detail, or whether I have to write one of my own.

 Tweak some hard disk settings (RAID 0, xfs / ext4) and see if that helps.

 It could be that from Cassandra 0.8 to 1.1 the 95% and 99% numbers have
 gone down. The throughput numbers have also gone down.

 Is there any other client I can use besides the cassandra stress tool and
 YCSB, and are the numbers I have got so far any good?


 --Akshat Vig.




 On Tue, Jul 17, 2012 at 9:22 PM, aaron morton aa...@thelastpickle.com wrote:

 I would benchmark a default installation, then start tweaking. That way
 you can see if your changes result in improvements.

 To simplify things further try using the tools/stress utility in the
 cassandra source distribution first. It's pretty simple to use.

 Add clients until you see the latency increase and tasks start to back up
 in nodetool tpstats. If you see it report dropped messages, it is
 overloaded.

 Hope that helps.

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 18/07/2012, at 4:48 AM, Code Box wrote:

 Thanks a lot for your replies, guys. I was trying fsync = batch with a 0 ms
 window to see if my drive's disk utilization was maxed out. I checked the
 numbers using iostat: they were around 60%, and the CPU usage was also not
 too high.

 Configuration of my Setup :-

 I have three m1.xlarge hosts, each with 15 GB RAM and 4 CPUs (8 EC2 Compute
 Units). I have kept the replication factor equal to 3. The typical write
 size is 1 KB.

 I tried adding different nodes, each with 200 threads, and the throughput
 got split in two. If I run it from a single host with fsync set to
 periodic, a window size of 1000 ms, and two nodes, I get these numbers:


 [OVERALL], Throughput(ops/sec), 4771
 [INSERT], AverageLatency(us), 18747
 [INSERT], MinLatency(us), 1470
 [INSERT], MaxLatency(us), 446413
 [INSERT], 95thPercentileLatency(ms), 55
 [INSERT], 99thPercentileLatency(ms), 167

 [OVERALL], Throughput(ops/sec), 4678
 [INSERT], AverageLatency(us), 22015
 [INSERT], MinLatency(us), 1439
 [INSERT], MaxLatency(us), 466149
 [INSERT], 95thPercentileLatency(ms), 62
 [INSERT], 99thPercentileLatency(ms), 171

 Is there something I am doing wrong in my Cassandra setup? What is the best
 setup for Cassandra to get high throughput and good write latency numbers?



 On Tue, Jul 17, 2012 at 7:02 AM, Sylvain Lebresne sylv...@datastax.com




Batch update efficiency with composite key

2012-07-18 Thread Leonid Ilyevsky
I have a question about efficiency of updates to a CF with composite key.

Let's say I have 100 logical rows to update, and they all belong to the same 
physical wide row. In my naïve understanding (correct me if I am wrong), in 
order to update a logical row Cassandra has to retrieve the whole physical 
row, add columns to it, and put it back. So I put all my 100 updates in a 
batch and send it over. Would Cassandra be smart enough to recognize that 
they all belong to one physical row, retrieve it once, do all the updates, 
and put it back once? Is my batch even relevant in this case? What happens 
if I just send the updates one by one?

I want to understand why I should use batches. I don't really care about one 
timestamp for all records; I only care about efficiency. So I thought I 
would at least save on the number of remote calls, but I also wonder what 
happens on the Cassandra side.





Re: Cassandra Evaluation/ Benchmarking: Throughput not scaling as expected neither latency showing good numbers

2012-07-18 Thread Hontvári József Levente

On 2012.07.18. 7:13, Code Box wrote:
The cassandra stress tool gives me values around 2.5 milliseconds for 
writing. The problem with the Cassandra Stress Tool is that it just 
gives the average latency numbers, and the average latency numbers that 
I am getting are comparable in some cases. It is the 95th percentile and 
99th percentile numbers that are bad. So it means that 95% of the 
requests are really bad and the rest 5% are really good, which makes 
the average go down.



No, the opposite is true: 95% of the requests are fast and 5% are slow. 
Or, in the case of the 99th percentile, 99% are fast and 1% are slow. 
Unless, that is, you order your samples in the opposite of the usual 
direction.
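That corrected reading (most requests fast, a small slow tail inflating the upper percentiles while barely moving the average) can be illustrated with made-up numbers:

```python
# Made-up latencies: 90 fast requests at 2 ms plus a 10-request slow tail.

def percentile(samples, pct):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    k = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[k]

latencies_ms = [2.0] * 90 + [40, 45, 55, 60, 80, 90, 120, 150, 180, 200]

avg = sum(latencies_ms) / len(latencies_ms)   # 12.0 ms: looks tolerable
p95 = percentile(latencies_ms, 95)            # 80 ms: the tail shows up
p99 = percentile(latencies_ms, 99)            # 180 ms: and dominates here
```

This is why a stress tool that reports only averages can hide exactly the behavior being complained about in this thread.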


Re: Batch update efficiency with composite key

2012-07-18 Thread Dave Brosius
Cassandra doesn't do reads before writes. It just places the updates in 
memtables; in effect, updates are the same as inserts. Batches certainly 
help with network latency, and save some minor amount of code repetition on 
the server side.

- Original Message - From: "Leonid Ilyevsky" lilyev...@mooncapital.com
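A toy model of that write path, assuming nothing beyond what is stated above (updates land in an in-memory memtable keyed by row and column, and no read happens first):

```python
class Memtable:
    """Toy in-memory table: row key -> {column: (value, timestamp)}."""
    def __init__(self):
        self._rows = {}

    def write(self, row_key, column, value, timestamp):
        # No read-before-write: just keep the newest-timestamped value.
        cols = self._rows.setdefault(row_key, {})
        current = cols.get(column)
        if current is None or timestamp >= current[1]:
            cols[column] = (value, timestamp)

    def read(self, row_key, column):
        cell = self._rows.get(row_key, {}).get(column)
        return cell[0] if cell else None

mt = Memtable()
# 100 "logical row" updates into one physical row: each is a cheap in-memory
# insert, whether they arrive one by one or in a single batch.
for i in range(100):
    mt.write("physical-row", ("logical", i), "value-%d" % i, timestamp=i)
```

So the batch saves round trips, not server-side row reassembly; the wide row is never retrieved during the update.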

Re: Composite Column Expiration Behavior

2012-07-18 Thread rohit bhatia
Hi,

I don't think composite columns have parent columns. Your point might be
true for supercolumns, but each composite column is probably independent.
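Under that reading (each composite column expires on its own TTL, with no parent-level grouping) the behavior can be sketched as:

```python
def live_columns(columns, now):
    """columns: {name: (written_at, ttl_seconds)} -> names still live."""
    return {name for name, (written_at, ttl) in columns.items()
            if now < written_at + ttl}

cols = {"A:B": (0, 5), "A:C": (0, 10)}
# t+5:  A:B has already expired on its own; A:C is still live
# t+10: both are gone
```

That is the opposite of the "expire everything when the latest TTL is met" behavior hypothesized in the question below.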

On Wed, Jul 18, 2012 at 9:14 PM, Thomas Van de Velde
thomase...@gmail.com wrote:
 Hi there,

 I am trying to understand the expiration behavior of composite columns.
 Assume I have two entries that both have the same parent column name, but
 each one has a different TTL. Would expiration be applied at the parent
 column level (taking into account the TTLs set per column under the parent
 and expiring all of the child columns when the most recent TTL is met), or
 is each child entry expired independently?

 Would this be correct?

 A:B-ttl=5
 A:C-ttl=10


 t+5: Nothing gets expired (because A:C's expiration has not yet been
 reached)
 t+10: Both A:B and A:C are expired


 Thanks,
 Thomas


Can't change replication factor in Cassandra 1.1.2

2012-07-18 Thread Douglas Muth
Hi folks,

I have an interesting problem in Cassandra 1.1.2, a Google Search
wasn't much help, so I thought I'd ask here.

Essentially, I have a problem keyspace in my 2-node cluster on which I
cannot change the replication factor. It's probably easier to show what I'm
seeing in cassandra-cli:

[default@foobar] update keyspace test1 with strategy_options =
{replication_factor:1};
2d5f0d16-bb4b-3d75-a084-911fe39f7629
Waiting for schema agreement...
... schemas agree across the cluster
[default@foobar] update keyspace test1 with strategy_options =
{replication_factor:1};
7745dd06-ee5d-3e74-8734-7cdc18871e67
Waiting for schema agreement...
... schemas agree across the cluster

Even though keyspace test1 had a replication_factor of 1 to start
with, each of the above UPDATE KEYSPACE commands caused a new UUID to
be generated for the schema, which I assume is normal and expected.

Then I try it with the problem keyspace:

[default@foobar] update keyspace foobar with strategy_options =
{replication_factor:1};
7745dd06-ee5d-3e74-8734-7cdc18871e67
Waiting for schema agreement...
... schemas agree across the cluster

Note that the UUID did not change, and the replication_factor in the
underlying database did not change either.

The funny thing is that foobar had a replication_factor of 1
yesterday, then I brought my second node online and changed the
replication_factor to 2 without incident.  I only ran into issues when
I tried changing it back to 1.

I tried running nodetool cleanup on both nodes, but the problem persists.

Any suggestions?

Thanks,

-- Doug

-- 
http://twitter.com/dmuth


An experiment using Spring Data w/ Cassandra (initially via JPA/Kundera)

2012-07-18 Thread Brian O'Neill
This is just an FYI.

I experimented w/ Spring Data JPA w/ Cassandra leveraging Kundera.

It sort of worked:
https://github.com/boneill42/spring-data-jpa-cassandra
http://brianoneill.blogspot.com/2012/07/spring-data-w-cassandra-using-jpa.html

I'm now working on a pure Spring Data adapter using Astyanax:
https://github.com/boneill42/spring-data-cassandra

I'll keep you posted.

(Thanks to all those that helped out w/ advice)

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Cassandra startup times

2012-07-18 Thread Ben Kaehne
Good evening,

I am interested in improving the startup time of our cassandra cluster.

We have a 3 node cluster (replication factor of 3) in which our application
requires quorum reads and writes to function.

Each machine is well specced with 24gig of ram, 10 cores, jna enabled etc.

On each server our keyspace files are so far around 90 GB (stored on NFS,
although I am not seeing signs that we have much network I/O). This size
will grow in the future.

Our startup time for one server at the moment is greater than half an hour
(45 to 50 minutes even), which is putting a risk factor on the resilience
of our service. I have tried versions from 1.0.9 to the latest 1.1.2.

I do not see too much system utilization while starting either.

I came across an article suggesting improved startup speed in 1.2, although
when I set it up it did not seem to be any faster at all (if not slower).

I was observing what was happening during startup and I noticed (via
strace), cassandra was doing lots of 8 byte reads from:

 
/var/lib/cassandra/data/XX/YY/XXX-YYY-hc-1871-CompressionInfo.db
 
/var/lib/cassandra/data/XX/YY/XXX-YYY-hc-1874-CompressionInfo.db

Also, is there some way I can change the 8-byte reads to something larger?
8-byte reads across NFS are terribly inefficient (and I am guessing the
cause of our terribly slow startup times).
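The cost difference can be illustrated with a small sketch: many 8-byte reads versus one bulk read of the same data. The file of packed 8-byte offsets below is just a stand-in for a CompressionInfo-style file; this says nothing about whether Cassandra itself exposes such a knob.

```python
import os
import struct
import tempfile

# Build a stand-in file of 1000 packed 8-byte big-endian offsets.
path = os.path.join(tempfile.mkdtemp(), "offsets.db")
with open(path, "wb") as f:
    for i in range(1000):
        f.write(struct.pack(">q", i * 65536))

# Tiny-read pattern: 1000 separate 8-byte reads, i.e. 1000 syscalls, each of
# which can become its own round trip over NFS.
values_small = []
with open(path, "rb", buffering=0) as f:  # unbuffered, like the strace shows
    while True:
        chunk = f.read(8)
        if not chunk:
            break
        values_small.append(struct.unpack(">q", chunk)[0])

# Coalesced pattern: one large read, then decode every entry from memory.
with open(path, "rb") as f:
    data = f.read()
values_bulk = [v[0] for v in struct.iter_unpack(">q", data)]

assert values_small == values_bulk  # same data, ~1000x fewer I/O calls
```

Mounting with a larger NFS `rsize`, or anything that lets the reader pull whole chunks into memory first, attacks the same problem from the filesystem side.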

Regards,

-- 
-Ben