Re: NVMe SSD benchmarking with Cassandra

2018-01-17 Thread Matija Gobec
Justin,

NVMe drives have their own I/O queueing mechanism, and there is a huge
performance difference versus the default Linux queue.
Next to a properly configured file system and scheduler, try setting
"scsi_mod.use_blk_mq=1"
on the grub cmdline.
If you are looking for the BFQ scheduler, it's probably built as a module, so
you will need to load it.
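
In case it helps, a rough sketch of those steps (the GRUB file locations and
the device name nvme0n1 are assumptions for a typical Debian/RHEL-style
install, so adjust for your distro):

    # add the parameter to GRUB_CMDLINE_LINUX; takes effect after a reboot
    sudo sed -i 's/^GRUB_CMDLINE_LINUX="/&scsi_mod.use_blk_mq=1 /' /etc/default/grub
    sudo update-grub    # Debian/Ubuntu; on RHEL/CentOS: grub2-mkconfig -o /boot/grub2/grub.cfg

    # load the BFQ module and check which schedulers the drive can use
    sudo modprobe bfq
    cat /sys/block/nvme0n1/queue/scheduler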

Best,
Matija

On Tue, Jan 9, 2018 at 1:17 AM, Nate McCall  wrote:

>> In regards to setting read ahead, how is this set for nvme drives? Also,
>> below is our compression settings for the table… It’s the same as our tests
>> that we are doing against SAS SSDs so I don’t think the compression
>> settings would be the issue…
>
> Check blockdev --report between the old and the new servers to see if
> there is a difference. Are there other deltas in the disk layouts between
> the old and new servers (i.e. LVM, mdadm, etc.)?
>
> You can control read ahead via 'blockdev --setra' or via poking the
> kernel: /sys/block/[YOUR DRIVE]/queue/read_ahead_kb
>
> In both cases, changes are instantaneous so you can do it on a canary and
> monitor for effect.
>
> Also, I'd be curious to know (since you have this benchmark setup) whether
> you still get the degradation you are currently seeing if you set
> concurrent_reads and concurrent_writes back to their defaults.
>
>
> --
> -
> Nate McCall
> Wellington, NZ
> @zznate
>
> CTO
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: NVMe SSD benchmarking with Cassandra

2018-01-08 Thread Nate McCall
> In regards to setting read ahead, how is this set for nvme drives? Also,
> below is our compression settings for the table… It’s the same as our tests
> that we are doing against SAS SSDs so I don’t think the compression
> settings would be the issue…

Check blockdev --report between the old and the new servers to see if there
is a difference. Are there other deltas in the disk layouts between the old
and new servers (i.e. LVM, mdadm, etc.)?

You can control read ahead via 'blockdev --setra' or via poking the kernel:
/sys/block/[YOUR DRIVE]/queue/read_ahead_kb

In both cases, changes are instantaneous so you can do it on a canary and
monitor for effect.
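
For example, on a canary node (nvme0n1 here is just a placeholder device name):

    # RA column shows read ahead in 512-byte sectors
    sudo blockdev --report
    # set read ahead to 8 sectors (4 KB) as an example low value; effective immediately
    sudo blockdev --setra 8 /dev/nvme0n1
    # the equivalent sysfs knob, expressed in kilobytes
    echo 4 | sudo tee /sys/block/nvme0n1/queue/read_ahead_kb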

Also, I'd be curious to know (since you have this benchmark setup) whether you
still get the degradation you are currently seeing if you set concurrent_reads
and concurrent_writes back to their defaults.
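
For reference, the stock cassandra.yaml in the 3.x line ships with 32 for
both, so something like this shows how far the node is from the defaults
(the config path depends on your install):

    # defaults are concurrent_reads: 32 and concurrent_writes: 32 in the shipped yaml
    grep -E '^concurrent_(reads|writes):' /etc/cassandra/cassandra.yaml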


-- 
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


RE: NVMe SSD benchmarking with Cassandra

2018-01-08 Thread Justin Sanciangco
n 34 minutes [INSERT: Count=78400, Max=3844095, Min=205, 
Avg=4716.95, 90=533, 99=875, 99.9=1020415, 99.99=383]
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout 
during write query at consistency ONE (1 replica were required but only 0 
acknowledged the write)

Any insight would be very helpful.

Thank you,
Justin Sanciangco


From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Friday, January 5, 2018 5:50 PM
To: user@cassandra.apache.org
Subject: Re: NVMe SSD benchmarking with Cassandra

Second the note about compression chunk size in particular.
--
Jeff Jirsa


On Jan 5, 2018, at 5:48 PM, Jon Haddad <j...@jonhaddad.com> wrote:
Generally speaking, disable readahead.  After that it's very likely the issue
isn’t in the disk settings you’re using, but is actually in your
Cassandra config or the data model.  How are you measuring things?  Are you
saturating your disks?  What resource is your bottleneck?

*Every* single time I’ve handled a question like this, without exception, it 
ends up being a mix of incorrect compression settings (use 4K at most), some 
crazy readahead setting like 1MB, and terrible JVM settings that are the bulk 
of the problem.

Without knowing how you are testing things or *any* metrics whatsoever, whether
it be C* or OS, it’s going to be hard to help you out.

Jon



On Jan 5, 2018, at 5:41 PM, Justin Sanciangco <jsancian...@blizzard.com> wrote:

Hello,

I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad
performance when my workload exceeds the memory size. What mount settings for
NVMe should be used? Right now the SSD is formatted as XFS and uses the noop
scheduler. Are there any additional mount options that should be used? Any
specific kernel parameters that should be set in order to make the best use of
the PCIe NVMe SSD? Your insight would be much appreciated.

Thank you,
Justin Sanciangco



Re: NVMe SSD benchmarking with Cassandra

2018-01-05 Thread Dikang Gu
Do you have some detailed benchmark metrics? Like the QPS, Avg read/write
latency, P95/P99 read/write latency?
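
Server-side percentiles would help too; for example (keyspace/table names
below are placeholders):

    # read/write latency percentiles as seen by the coordinator
    nodetool proxyhistograms
    # per-table latency histograms on the node under test
    nodetool tablehistograms my_keyspace my_table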

On Fri, Jan 5, 2018 at 5:57 PM, Justin Sanciangco 
wrote:

> I am benchmarking with the YCSB tool doing 1k writes.
>
>
>
> Below are my server specs
>
> 2 sockets
>
> 12 core hyperthreaded processor
>
> 64GB memory
>
>
>
> Cassandra settings
>
> 32GB heap
>
> Concurrent_reads: 128
>
> Concurrent_writes: 256
>
>
>
> From what we are seeing, it looks like the kernel writing to the disk
> causes performance to degrade.
>
> Please let me know
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* Friday, January 5, 2018 5:50 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: NVMe SSD benchmarking with Cassandra
>
>
>
> Second the note about compression chunk size in particular.
>
> --
>
> Jeff Jirsa
>
> On Jan 5, 2018, at 5:48 PM, Jon Haddad  wrote:
>
> Generally speaking, disable readahead.  After that it's very likely the
> issue isn’t in the disk settings you’re using, but is actually
> in your Cassandra config or the data model.  How are you measuring things?
> Are you saturating your disks?  What resource is your bottleneck?
>
>
>
> *Every* single time I’ve handled a question like this, without exception,
> it ends up being a mix of incorrect compression settings (use 4K at most),
> some crazy readahead setting like 1MB, and terrible JVM settings that are
> the bulk of the problem.
>
>
>
> Without knowing how you are testing things or *any* metrics whatsoever,
> whether it be C* or OS, it’s going to be hard to help you out.
>
>
>
> Jon
>
> On Jan 5, 2018, at 5:41 PM, Justin Sanciangco 
> wrote:
>
>
>
> Hello,
>
>
>
> I am currently benchmarking NVMe SSDs with Cassandra and am getting very
> bad performance when my workload exceeds the memory size. What mount
> settings for NVMe should be used? Right now the SSD is formatted as XFS and
> uses the noop scheduler. Are there any additional mount options that should
> be used? Any specific kernel parameters that should be set in order to make
> the best use of the PCIe NVMe SSD? Your insight would be much appreciated.
>
>
>
> Thank you,
>
> Justin Sanciangco


-- 
Dikang


RE: NVMe SSD benchmarking with Cassandra

2018-01-05 Thread Justin Sanciangco
I am benchmarking with the YCSB tool doing 1k writes.
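
For reference, the load phase looks roughly like this (host, record count, and
thread count are placeholders; the default 10 fields x 100 bytes is what gives
the ~1 KB records):

    bin/ycsb load cassandra-cql -P workloads/workloada \
        -p hosts=10.0.0.1 -p recordcount=200000000 \
        -p fieldcount=10 -p fieldlength=100 -threads 64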

Below are my server specs
2 sockets
12 core hyperthreaded processor
64GB memory

Cassandra settings
32GB heap
Concurrent_reads: 128
Concurrent_writes: 256

From what we are seeing, it looks like the kernel writing to the disk causes
performance to degrade.

[inline image attachment omitted]
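
One way to line the latency spikes up with flush/compaction I/O while the test
runs (device name is a placeholder):

    # device-level write throughput, queue depth, and await
    iostat -xm 1 /dev/nvme0n1
    # backed-up flush writers / pending compactions on the node
    nodetool tpstats
    nodetool compactionstats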

Please let me know


From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Friday, January 5, 2018 5:50 PM
To: user@cassandra.apache.org
Subject: Re: NVMe SSD benchmarking with Cassandra

Second the note about compression chunk size in particular.
--
Jeff Jirsa


On Jan 5, 2018, at 5:48 PM, Jon Haddad <j...@jonhaddad.com> wrote:
Generally speaking, disable readahead.  After that it's very likely the issue
isn’t in the disk settings you’re using, but is actually in your
Cassandra config or the data model.  How are you measuring things?  Are you
saturating your disks?  What resource is your bottleneck?

*Every* single time I’ve handled a question like this, without exception, it 
ends up being a mix of incorrect compression settings (use 4K at most), some 
crazy readahead setting like 1MB, and terrible JVM settings that are the bulk 
of the problem.

Without knowing how you are testing things or *any* metrics whatsoever, whether
it be C* or OS, it’s going to be hard to help you out.

Jon



On Jan 5, 2018, at 5:41 PM, Justin Sanciangco <jsancian...@blizzard.com> wrote:

Hello,

I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad
performance when my workload exceeds the memory size. What mount settings for
NVMe should be used? Right now the SSD is formatted as XFS and uses the noop
scheduler. Are there any additional mount options that should be used? Any
specific kernel parameters that should be set in order to make the best use of
the PCIe NVMe SSD? Your insight would be much appreciated.

Thank you,
Justin Sanciangco



Re: NVMe SSD benchmarking with Cassandra

2018-01-05 Thread Jeff Jirsa
Second the note about compression chunk size in particular. 

-- 
Jeff Jirsa


> On Jan 5, 2018, at 5:48 PM, Jon Haddad  wrote:
> 
> Generally speaking, disable readahead.  After that it's very likely the issue
> isn’t in the disk settings you’re using, but is actually in your
> Cassandra config or the data model.  How are you measuring things?  Are you
> saturating your disks?  What resource is your bottleneck?
> 
> *Every* single time I’ve handled a question like this, without exception, it 
> ends up being a mix of incorrect compression settings (use 4K at most), some 
> crazy readahead setting like 1MB, and terrible JVM settings that are the bulk 
> of the problem.  
> 
> Without knowing how you are testing things or *any* metrics whatsoever,
> whether it be C* or OS, it’s going to be hard to help you out.
> 
> Jon
> 
> 
>> On Jan 5, 2018, at 5:41 PM, Justin Sanciangco  
>> wrote:
>> 
>> Hello,
>>  
>> I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad
>> performance when my workload exceeds the memory size. What mount settings
>> for NVMe should be used? Right now the SSD is formatted as XFS and uses the
>> noop scheduler. Are there any additional mount options that should be used?
>> Any specific kernel parameters that should be set in order to make the best
>> use of the PCIe NVMe SSD? Your insight would be much appreciated.
>>  
>> Thank you,
>> Justin Sanciangco
> 


Re: NVMe SSD benchmarking with Cassandra

2018-01-05 Thread Jon Haddad
Oh, I should have added: my compression settings comment only applies to
read-heavy workloads, as reading 64KB off disk in order to return a handful of
bytes is wasteful by orders of magnitude, but it doesn’t really cause any
problems on write-heavy workloads.
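
To illustrate the 4K suggestion, something along these lines shrinks the chunk
size on an existing table (keyspace/table are placeholders; on older versions
the option is spelled chunk_length_kb, and already-written SSTables only pick
up the new size as they get rewritten):

    cqlsh -e "ALTER TABLE my_keyspace.my_table
              WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};"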

> On Jan 5, 2018, at 5:48 PM, Jon Haddad  wrote:
> 
> Generally speaking, disable readahead.  After that it's very likely the issue
> isn’t in the disk settings you’re using, but is actually in your
> Cassandra config or the data model.  How are you measuring things?  Are you
> saturating your disks?  What resource is your bottleneck?
> 
> *Every* single time I’ve handled a question like this, without exception, it 
> ends up being a mix of incorrect compression settings (use 4K at most), some 
> crazy readahead setting like 1MB, and terrible JVM settings that are the bulk 
> of the problem.  
> 
> Without knowing how you are testing things or *any* metrics whatsoever,
> whether it be C* or OS, it’s going to be hard to help you out.
> 
> Jon
> 
> 
>> On Jan 5, 2018, at 5:41 PM, Justin Sanciangco wrote:
>> 
>> Hello,
>>  
>> I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad
>> performance when my workload exceeds the memory size. What mount settings
>> for NVMe should be used? Right now the SSD is formatted as XFS and uses the
>> noop scheduler. Are there any additional mount options that should be used?
>> Any specific kernel parameters that should be set in order to make the best
>> use of the PCIe NVMe SSD? Your insight would be much appreciated.
>>  
>> Thank you,
>> Justin Sanciangco
> 



Re: NVMe SSD benchmarking with Cassandra

2018-01-05 Thread Jon Haddad
Generally speaking, disable readahead.  After that it's very likely the issue
isn’t in the disk settings you’re using, but is actually in your
Cassandra config or the data model.  How are you measuring things?  Are you
saturating your disks?  What resource is your bottleneck?

*Every* single time I’ve handled a question like this, without exception, it 
ends up being a mix of incorrect compression settings (use 4K at most), some 
crazy readahead setting like 1MB, and terrible JVM settings that are the bulk 
of the problem.  

Without knowing how you are testing things or *any* metrics whatsoever, whether
it be C* or OS, it’s going to be hard to help you out.

Jon


> On Jan 5, 2018, at 5:41 PM, Justin Sanciangco  
> wrote:
> 
> Hello,
>  
> I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad
> performance when my workload exceeds the memory size. What mount settings for
> NVMe should be used? Right now the SSD is formatted as XFS and uses the noop
> scheduler. Are there any additional mount options that should be used? Any
> specific kernel parameters that should be set in order to make the best use
> of the PCIe NVMe SSD? Your insight would be much appreciated.
>  
> Thank you,
> Justin Sanciangco



Re: NVMe SSD benchmarking with Cassandra

2018-01-05 Thread Jeff Jirsa
Can you quantify very bad performance? 

-- 
Jeff Jirsa


> On Jan 5, 2018, at 5:41 PM, Justin Sanciangco  
> wrote:
> 
> Hello,
>  
> I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad
> performance when my workload exceeds the memory size. What mount settings for
> NVMe should be used? Right now the SSD is formatted as XFS and uses the noop
> scheduler. Are there any additional mount options that should be used? Any
> specific kernel parameters that should be set in order to make the best use
> of the PCIe NVMe SSD? Your insight would be much appreciated.
>  
> Thank you,
> Justin Sanciangco


NVMe SSD benchmarking with Cassandra

2018-01-05 Thread Justin Sanciangco
Hello,

I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad
performance when my workload exceeds the memory size. What mount settings for
NVMe should be used? Right now the SSD is formatted as XFS and uses the noop
scheduler. Are there any additional mount options that should be used? Any
specific kernel parameters that should be set in order to make the best use of
the PCIe NVMe SSD? Your insight would be much appreciated.
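
For reference, the scheduler and mount options currently in effect can be
confirmed like this (device name and data directory are placeholders):

    # the active scheduler is shown in brackets
    cat /sys/block/nvme0n1/queue/scheduler
    # file system type and mount options for the Cassandra data directory
    findmnt -no FSTYPE,OPTIONS /var/lib/cassandra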

Thank you,
Justin Sanciangco