RE: Read latency skyrockets after 95-98th percentile

2012-07-31 Thread Viktor Jevdokimov
What is a data load? Does it fits in RAM?

I bet it's due to GC. Since this is a 1 node only, dynamic snitch is out of 
scope.




Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.commailto:viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsiderhttp://twitter.com/#!/adforminsider
What is Adform: watch this short videohttp://vimeo.com/adform/display

[Adform News] http://www.adform.com


Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

From: Andras Szerdahelyi [mailto:andras.szerdahe...@ignitionone.com]
Sent: Tuesday, July 31, 2012 15:53
To: user@cassandra.apache.org
Subject: Read latency skyrockets after 95-98th percentile

hey list,

i've been trying to understand why we are seeing rather nasty read latency 
peaks in as much as 2% of our total read ( not sure what the underlying thrift 
call is, should be get_slice ) requests that we have been **unable to tie to 
garbage collection or blocking I/O.**
This is what i mean by nasty :
Mean: 1.186341, Max: 41.912 stDev: 2.2490502737495333, 75th:1.02424999, 
95th: 2.11034977, 98th: 5.21026034, 99th: 13.248351, 
999th: 41.8945400041

This is measured around the Astyanax / Hector wrappers for running column slice 
queries and iterating over their results. FWIW they perform very similarly.

Now, the mean is awesome - 1.2ms for a read is really really fast, i think 
thrift overhead makes up for probably 1ms alone. What i can't live with is 
pretty much everything thats beyond the 98th percentile. Perhaps my 
expectations are misaligned to begin with and this is great, but the dataset 
under test that yielded this result ( see below ) is extremely small - only a 
1000 items per test made up of a few small dynamic composite columns  and the 
read is being done sequentially, from a single threaded client. I just don't 
understand whats causing the peaks.

Here's the complete run-down on the setup:
The data is dynamic composite columns with string or integer values. The 
DynamicComposites themselves are made up of 3 Components each: 
String,String,Integer and String,Integer,Integer

The query is getting all columns ( in this particular test, that will be a 
single column ) of  one of the composite types ( String, String, Integer or 
String, Integer, Integer ) for a particular string key.
( How this is achieved is more or less at https://gist.github.com/3207894 )


The production cluster is set up as follows:
30-40 read/sec clients from multiple hosts, NetworkTopologyStrategy, 2 replicas 
per DC, read consistency: ONE, 2DCs, 2 nodes per DC )
8GB multi-core physical hosts
most settings on default in cassandra.yaml


In an attempt to isolate the problem, i've done the following ( this is where 
the results above are coming from but we see roughly the same numbers in prod )

- created a short unit test around the ( two ) classes that wrap the Astyanax 
and Hector libraries in our application, in which i'm only testing the read 
performance
- in setUp() i start Cassandra with

CassandraStoreKeeperTest.cassandra = new CassandraDaemon();
CassandraStoreKeeperTest.cassandra.init(null);
CassandraStoreKeeperTest.cassandra.start();

- same cassandra yaml as in prod, file at https://gist.github.com/3207894 ( 
mostly defaults )
- i pass the following flags to the JVM running my unit test:  -server -Xmx2G 
and all the GC flags from cassandra-env.sh, plus the jamm java agent
- i warm up the embedded cassandra with 1k reads and 1k writes
- the test is 1000 single threaded, sequential reads via Hector and Astyanax ( 
no significant differences between the two ) via thrift to the embedded 
single-node Cassandra instance
- i read back the 1000 items saved in the warm-up

The results are
Mean: 1.186341, Max: 41.912 stDev: 2.2490502737495333, 75th:1.02424999, 
95th: 2.11034977, 98th: 5.21026034, 99th: 13.248351, 
999th: 41.8945400041

here's the GC output during the read

[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1: 281480 bytes, 281480 total
: 17322K-308K(19136K), 0.0037375 secs] 28742K-11885K(83008K), 0.0038170 secs] 
[Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1: 261360 bytes, 261360 total
: 17332K-386K(19136K), 0.0034069 secs] 28909K-12098K(83008K), 0.0034849 secs] 
[Times: user=0.00 sys=0.01, real=0.00 

Re: Read latency skyrockets after 95-98th percentile

2012-07-31 Thread Andras Szerdahelyi
Victor, the data is a single composite column String, String, Integer with a 
string value. The key is a UTF8 string
I can reproduce this exact behavior with only 100 of these items on a 2GB heap.

Also, please see -XX:+PrintGCDetails output below, i'm not convinced it's GC. 
With a 100 items no garbage collection is logged at all.

here's the GC output during the read

[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1: 281480 bytes, 281480 total
: 17322K-308K(19136K), 0.0037375 secs] 28742K-11885K(83008K), 0.0038170 secs] 
[Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1: 261360 bytes, 261360 total
: 17332K-386K(19136K), 0.0034069 secs] 28909K-12098K(83008K), 0.0034849 secs] 
[Times: user=0.00 sys=0.01, real=0.00 secs]
[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1: 241336 bytes, 241336 total
: 17410K-383K(19136K), 0.0035252 secs] 29122K-12209K(83008K), 0.0036062 secs] 
[Times: user=0.00 sys=0.00, real=0.00 secs]



thanks,
Andras


On 31 Jul 2012, at 16:04, Viktor Jevdokimov wrote:

What is a data load? Does it fits in RAM?

I bet it’s due to GC. Since this is a 1 node only, dynamic snitch is out of 
scope.




Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.commailto:viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsiderhttp://twitter.com/#!/adforminsider
What is Adform: watch this short videohttp://vimeo.com/adform/display

signature-logo29.pnghttp://www.adform.com


Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

From: Andras Szerdahelyi [mailto:andras.szerdahe...@ignitionone.com]
Sent: Tuesday, July 31, 2012 15:53
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Read latency skyrockets after 95-98th percentile

hey list,

i've been trying to understand why we are seeing rather nasty read latency 
peaks in as much as 2% of our total read ( not sure what the underlying thrift 
call is, should be get_slice ) requests that we have been **unable to tie to 
garbage collection or blocking I/O.**
This is what i mean by nasty :
Mean: 1.186341, Max: 41.912 stDev: 2.2490502737495333, 75th:1.02424999, 
95th: 2.11034977, 98th: 5.21026034, 99th: 13.248351, 
999th: 41.8945400041

This is measured around the Astyanax / Hector wrappers for running column slice 
queries and iterating over their results. FWIW they perform very similarly.

Now, the mean is awesome - 1.2ms for a read is really really fast, i think 
thrift overhead makes up for probably 1ms alone. What i can't live with is 
pretty much everything thats beyond the 98th percentile. Perhaps my 
expectations are misaligned to begin with and this is great, but the dataset 
under test that yielded this result ( see below ) is extremely small - only a 
1000 items per test made up of a few small dynamic composite columns  and the 
read is being done sequentially, from a single threaded client. I just don't 
understand whats causing the peaks.

Here's the complete run-down on the setup:
The data is dynamic composite columns with string or integer values. The 
DynamicComposites themselves are made up of 3 Components each: 
String,String,Integer and String,Integer,Integer

The query is getting all columns ( in this particular test, that will be a 
single column ) of  one of the composite types ( String, String, Integer or 
String, Integer, Integer ) for a particular string key.
( How this is achieved is more or less at https://gist.github.com/3207894 )


The production cluster is set up as follows:
30-40 read/sec clients from multiple hosts, NetworkTopologyStrategy, 2 replicas 
per DC, read consistency: ONE, 2DCs, 2 nodes per DC )
8GB multi-core physical hosts
most settings on default in cassandra.yaml


In an attempt to isolate the problem, i've done the following ( this is where 
the results above are coming from but we see roughly the same numbers in prod )

- created a short unit test around the ( two ) classes that wrap the Astyanax 
and Hector libraries in our application, in which i'm only testing the read 
performance
- in setUp() i start Cassandra with

CassandraStoreKeeperTest.cassandra = new CassandraDaemon();
CassandraStoreKeeperTest.cassandra.init(null);
CassandraStoreKeeperTest.cassandra.start();

- same cassandra yaml as in prod, file at