RE: High performance disk io

2013-05-24 Thread Christopher Wirt
Hi Aaron, 

 

We have a pretty big key space and have found that, to get a decent key cache
hit rate, the cache needs to be quite large.

 

We get 3 SSTables per read at the 99th percentile, 2 at the 98th and 95th, and
1 below that.

 

No problems at the moment with GC, but we're still quite early in our
Cassandra adventure. 

 

 

Thanks for the advice

 

Chris

 


Re: High performance disk io

2013-05-23 Thread aaron morton
>  I am currently trying to really study the effect of the width of a row 
> (being in multiple sstables) vs its 95th percentile read time.
I'd be interested to see your findings. 

I use 3+ SSTables per read (from cfhistograms) as a warning sign to dig 
deeper into the data model. The type of query also impacts the number of 
SSTables per read; queries by column name can short-circuit and may be served 
from (say) 0 or 1 sstables even if the row is spread out. 
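
A minimal sketch of how to check this with standard nodetool (the keyspace and
column family names are placeholders):

  nodetool cfhistograms <keyspace> <columnfamily>
  # the "SSTables" column shows how many sstables were touched per read;
  # a tail at 3+ is the warning sign described above.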

> -We don’t change anything and just keep upping our keycache.
> 

800MB is a very high key cache and may result in poor GC performance which is 
ultimately going to hurt your read latency. Pay attention to what GC is doing, 
both ParNew and CMS and reduce the key cache if needed. When ParNew runs the 
server is stalled. 
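
A rough sketch of how to keep an eye on both, assuming the stock 1.2-era
tooling (the log path is only an example):

  # key cache size, capacity and recent hit rate
  nodetool info
  # GC pauses: uncomment the GC logging options in conf/cassandra-env.sh, e.g.
  #   JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
  #   JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
  # then watch for long ParNew / CMS pauses in the resulting log.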

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com


Re: High performance disk io

2013-05-23 Thread Edward Capriolo
I have used both rotational disks with lots of RAM and SSD devices. An
important thing to consider is that SSD devices are not magic. You have
big-O costs in several places:
1) more data -> larger bloom filters
2) more data -> larger key caches -> JVM overhead
3) more requests -> more young-gen JVM overhead
4) more data -> longer compaction (even with SSD)
5) more writes -> more memtable flushing
Bottom line: more data -> more disk seeks

We have used both mid-level SSDs and the costly Fusion-io cards. Data that
fits in RAM/VFS cache delivers better, more predictable low latency; even
with very fast disks the average, 95th, and 99th percentiles can drift very
far apart. I am currently trying to really study the effect of the width of
a row (being in multiple sstables) on its 95th percentile read time.
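
A quick way to watch those per-CF costs grow is plain nodetool (no arguments
needed; output is broken down per keyspace and column family):

  nodetool cfstats
  # "SSTable count", "Bloom Filter Space Used" and "Space used (live)" all
  # grow with data volume - the costs listed above.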



RE: High performance disk io

2013-05-23 Thread Christopher Wirt
Hi Igor,

 

I was talking about the 99th percentile from the Cassandra histograms when I
said '1 or 2 ms for most CFs'. 

 

But we have measured client side too and generally get a couple of ms added on
top, as one might expect.

 

For anyone interested in disk I/O (my original question): we have tried out
the multiple SSD setup and found it to work well and to reduce the impact of a
repair on node performance.


We ended up going with the single data directory in cassandra.yaml and mounted
one SSD against that, then gave each large column family a dedicated SSD (see
the sketch below).

We're now moving all of our nodes to the same setup.
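
Roughly what that looks like on disk - the device names and keyspace name are
hypothetical, and Cassandra 1.2 lays data out as <data_dir>/<keyspace>/<cf>/:

  # one SSD behind the single data directory from cassandra.yaml
  mount /dev/sdb1 /var/lib/cassandra/data
  # a dedicated SSD per large column family
  mkdir -p /var/lib/cassandra/data/myks/ColFamily1
  mount /dev/sdc1 /var/lib/cassandra/data/myks/ColFamily1
  # cassandra.yaml keeps the single entry:
  #   data_file_directories:
  #       - /var/lib/cassandra/data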

 

 

Chris

 


Re: High performance disk io

2013-05-23 Thread Igor

Hello Christopher,

BTW, are you talking about 99th percentiles on client side, or about 
percentiles from cassandra histograms for CF on cassandra side?


Thanks!






Re: High performance disk io

2013-05-22 Thread Wei Zhu
Without vnodes, during repair -pr it will stream data for all the replicas
and repair all of them, so it will impact RF nodes. In the case of vnodes,
the streaming/compaction should happen on all the physical nodes. I heard
repair is even worse with vnodes. Test it and see how it goes.

-Wei 
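
For reference, a minimal sketch of the kind of test being suggested here (the
keyspace name is a placeholder):

  # primary-range repair on one node while read latencies are being measured
  nodetool repair -pr <keyspace>
  # compare cfhistograms / client-side percentiles before, during and after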

Re: High performance disk io

2013-05-22 Thread Hiller, Dean
If you are only running repair on one node, should it not skip that node? So 
there should be no performance hit, except when doing CL_ALL of course. We had 
to make a change to Cassandra because slow nodes did impact us previously.

Dean


Re: High performance disk io

2013-05-22 Thread Wei Zhu
For us, the biggest killer is repair and compaction following repair. If you 
are running VNodes, you need to test the performance while running repair. 
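
A rough way to watch that impact while a repair is running, using standard
nodetool commands:

  # compactions triggered by the repair
  nodetool compactionstats
  # streams between replicas during the repair
  nodetool netstats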


Re: High performance disk io

2013-05-22 Thread Igor

On 05/22/2013 05:41 PM, Christopher Wirt wrote:

> Hi Igor,
>
> Yea same here, 15ms for the 99th percentile is our max. Currently getting
> one or two ms for most CF. It goes up at peak times which is what we
> want to avoid.

Our 99th percentile also goes up at peak times but stays at an acceptable level.

> We're using Cass 1.2.4 w/vnodes and our own barebones driver on top of
> thrift. Needed to be .NET so Hector and Astyanax were not options.

Astyanax is token-aware, so we avoid extra data hops between cassandra
nodes.

> Do you use SSDs or multiple SSDs in any kind of configuration or RAID?

No, single SSD per host

> Thanks
>
> Chris




RE: High performance disk io

2013-05-22 Thread Christopher Wirt
Hi Dean, 

Adding nodes is the easy way out. We can get three smaller SSDs for the same
price as our current setup. 

How do we optimise performance for this? Is it worth the effort? To RAID or
not to RAID, that is one of my questions.

Currently I'm thinking it must be faster and, given the same price tag, easily
worth the effort.

Cheers,
Chris






RE: High performance disk io

2013-05-22 Thread Christopher Wirt
Hi Igor, 

 

Yea same here, 15ms for 99th percentile is our max. Currently getting one or
two ms for most CF. It goes up at peak times which is what we want to avoid.

 

We're using Cass 1.2.4 w/vnodes and our own barebones driver on top of
thrift. Needed to be .NET so Hector and Astyanax were not options.

 

Do you use SSDs or multiple SSDs in any kind of configuration or RAID?

 

Thanks

 

Chris

 




Re: High performance disk io

2013-05-22 Thread Hiller, Dean
Well, if you just want to lower your I/O util %, you could always just add more 
nodes to the cluster ;).

Dean




Re: High performance disk io

2013-05-22 Thread Igor

Hello

What level of read performance do you expect? We have a limit of 15 ms for the 
99th percentile, with average read latency near 0.9 ms. For some CFs the 99th 
percentile is actually around 2 ms, for others around 10 ms; this depends on 
the data volume you read in each query.


Tuning read performance involved cleaning up the data model, tuning 
cassandra.yaml, switching from Hector to Astyanax, and tuning OS parameters.
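
One example of the kind of OS parameter involved - the device name and value
here are purely illustrative, not Igor's actual settings:

  # lower read-ahead on the SSD so small random reads don't pull in extra pages
  blockdev --setra 128 /dev/sdb
  # check the current value
  blockdev --getra /dev/sdb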





High performance disk io

2013-05-22 Thread Christopher Wirt
Hello,

 

We're looking at deploying a new ring where we want the best possible read
performance.

 

We've set up a cluster with 6 nodes, replication factor 3, 32GB of memory, 8GB
heap, and an 800MB key cache; each node holds 40-50GB of data on a 200GB SSD,
with a 500GB SATA disk for the OS and commitlog.

Three column families

ColFamily1 50% of the load and data

ColFamily2 35% of the load and data

ColFamily3 15% of the load and data

 

At the moment we are still seeing around 20% disk utilisation, and
occasionally as high as 40-50% on some nodes at peak time; we are
conducting some semi-live testing.

CPU looks fine, memory is fine, keycache hit rate is about 80% (could be
better, so maybe we should be increasing the keycache size?)

 

Anyway, we're looking into what we can do to improve this.

 

One conversation we are having at the moment is around the SSD disk setup.

 

We are considering moving to three smaller SSD drives and spreading the data
across those.

 

The possibilities are:

-We have a RAID0 of the smaller SSDs and hope that improves performance. 

Will this actually yield better throughput?

 

-We mount the SSDs to different directories and define multiple data
directories in cassandra.yaml (see the sketch after this list).

Will not having a RAID controller layer improve the throughput?

 

-We mount the SSDs to different column family directories and have a single
data directory declared in cassandra.yaml. 

We think this is quite an attractive idea.

What are the drawbacks? System column families will be on the main SATA?

 

-We don't change anything and just keep upping our keycache.

-Anything you guys can think of.
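
For reference, the two cassandra.yaml variants under discussion would look
roughly like this (the paths are placeholders):

  # multiple data directories, one per SSD mount
  data_file_directories:
      - /data/ssd1
      - /data/ssd2
      - /data/ssd3

  # single data directory (a RAID0 volume, or per-CF mounts underneath it)
  data_file_directories:
      - /var/lib/cassandra/data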

 

Ideas and thoughts welcome. Thanks for your time and expertise. 

 

Chris