Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-13 Thread Kevin Olbrich
I read the whole thread, and based on this discussion it looks like the write
cache should always be disabled, since in the worst case performance stays the
same(?).

I will test some WD4002FYYZ drives, whose spec sheets don't mention "media cache".
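For anyone who wants a quick before/after check on a spare drive, something
like this should work (untested sketch; /dev/sdX is a placeholder, and fio
writing to a raw device DESTROYS any data on it):

  hdparm -W 1 /dev/sdX   # volatile write cache on
  fio -name=wc-on -filename=/dev/sdX -ioengine=libaio -direct=1 -sync=1 \
      -bs=4k -iodepth=1 -rw=write -runtime=30
  hdparm -W 0 /dev/sdX   # volatile write cache off
  fio -name=wc-off -filename=/dev/sdX -ioengine=libaio -direct=1 -sync=1 \
      -bs=4k -iodepth=1 -rw=write -runtime=30

Then compare the average latencies fio reports for the two runs.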

Kevin

On Tue, 13 Nov 2018 at 09:27, Виталий Филиппов <vita...@yourcmc.ru> wrote:

> This may be the explanation:
>
>
> https://serverfault.com/questions/857271/better-performance-when-hdd-write-cache-is-disabled-hgst-ultrastar-7k6000-and
>
> Other manufacturers may have started to do the same, I suppose.
> --
> With best regards,
> Vitaliy Filippov


Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-13 Thread Ashley Merrick
Looks like it; the Toshiba drives I use seem to have their own version of that.

That would explain the same kind of results.

On Tue, 13 Nov 2018 at 4:26 PM, Виталий Филиппов  wrote:

> This may be the explanation:
>
>
> https://serverfault.com/questions/857271/better-performance-when-hdd-write-cache-is-disabled-hgst-ultrastar-7k6000-and
>
> Other manufacturers may have started to do the same, I suppose.
> --
> With best regards,
> Vitaliy Filippov


Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-13 Thread Виталий Филиппов
This may be the explanation:

https://serverfault.com/questions/857271/better-performance-when-hdd-write-cache-is-disabled-hgst-ultrastar-7k6000-and

Other manufacturers may have started to do the same, I suppose.
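
To see what a drive currently reports, either of these should work (sketch;
the device name is just an example):

  hdparm -W /dev/sda           # "write-caching = 1 (on)" = volatile cache enabled
  smartctl -g wcache /dev/sda  # same information via smartmontools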
-- 
With best regards,
  Vitaliy Filippov


Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-11 Thread Vitaliy Filippov

> Even weirder then; what drives are in the other cluster?

Desktop Toshiba and Seagate Constellation 7200rpm drives.

As I understand it by now, the main impact is on SSD+HDD clusters. An enabled
HDD write cache causes the kernel to send flush requests to the drive (when
the write cache is disabled it doesn't bother), and that apparently also
causes some extra waits for the SSD journal (which is strange and looks like
a bug to me). I checked latencies in `ceph daemon osd.xx perf dump`, and both
kv_commit_lat and commit_lat decreased ~10 times when I disabled the HDD
write cache (even though both are SSD-related, as I understand it).
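
For reference, a one-liner to pull those two counters out of the dump
(sketch; run on the OSD host, assumes jq is available and that the counters
live under the "bluestore" section as they do here on Mimic):

  ceph daemon osd.0 perf dump | jq '.bluestore | {commit_lat, kv_commit_lat}'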


Maybe your HDDs are connected via some RAID controller, and when you disable
the cache it doesn't really get disabled; the kernel just stops issuing flush
requests and makes some writes unsafe?
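
You can check whether the kernel itself still thinks the device needs flushes
(sketch; the write_cache attribute exists on reasonably recent kernels):

  cat /sys/block/sda/queue/write_cache
  # "write back"    -> kernel issues flush requests for this device
  # "write through" -> it doesn't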


--
With best regards,
  Vitaliy Filippov


Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-11 Thread Ashley Merrick
A mixture of Toshiba drives here, all enterprise-rated, with 128-256 MB cache.

I have tried turning the write cache on and off a few times across the
cluster using hdparm; every time I see a huge change from on (40 ms average)
to off (1-3 ms average).
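
For the record, the toggling itself is just this (sketch; /dev/sd? matches
the SATA disks on one host, and note that hdparm -W does not survive a power
cycle):

  for dev in /dev/sd?; do hdparm -W 0 "$dev"; done   # cache off
  for dev in /dev/sd?; do hdparm -W 1 "$dev"; done   # cache back on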

Vitaliy, what drives are you using? Maybe it's a particular brand / firmware?

On Sun, Nov 11, 2018 at 8:54 PM Marc Roos  wrote:

> WD Red here


Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-11 Thread Marc Roos
 
WD Red here

-Original Message-
From: Ashley Merrick [mailto:singap...@amerrick.co.uk]
Sent: Sunday, 11 November 2018 13:47
To: Vitaliy Filippov
Cc: Marc Roos; ceph-users
Subject: Re: [ceph-users] Disabling write cache on SATA HDDs reduces
write latency 7 times

Even weirder then; what drives are in the other cluster?


Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-11 Thread Ashley Merrick
Even weirder then; what drives are in the other cluster?

On Sun, 11 Nov 2018 at 7:19 PM, Vitaliy Filippov  wrote:

> It seems no, I've just tested it on another small cluster with HDDs only
> -
> no change
>
> > Does it make sense to test disabling this on hdd cluster only?
>
> --
> With best regards,
>Vitaliy Filippov
>


Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-11 Thread Vitaliy Filippov
It seems not; I've just tested it on another small cluster with HDDs only -
no change.



> Does it make sense to test disabling this on hdd cluster only?


--
With best regards,
  Vitaliy Filippov


Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-11 Thread Marc Roos


I just did a very, very short test and don't see any difference with this
cache on or off, so I am leaving it on for now.





-Original Message-
From: Ashley Merrick [mailto:singap...@amerrick.co.uk]
Sent: Sunday, 11 November 2018 11:43
To: Marc Roos
Cc: ceph-users; vitalif
Subject: Re: [ceph-users] Disabling write cache on SATA HDDs reduces
write latency 7 times

Don't have any SSDs in the cluster to test.

Also, without knowing the exact reason why enabling it has such a negative
effect, I wouldn't be sure the same would apply to SSDs.

Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-11 Thread Ashley Merrick
Don't have any SSDs in the cluster to test.

Also, without knowing the exact reason why enabling it has such a negative
effect, I wouldn't be sure the same would apply to SSDs.

On Sun, 11 Nov 2018 at 6:41 PM, Marc Roos  wrote:

> Does it make sense to test disabling this on hdd cluster only?


Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-11 Thread Marc Roos
 

Does it make sense to test disabling this on hdd cluster only?


-Original Message-
From: Ashley Merrick [mailto:singap...@amerrick.co.uk]
Sent: Sunday, 11 November 2018 6:24
To: vita...@yourcmc.ru
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Disabling write cache on SATA HDDs reduces
write latency 7 times

I've just worked out that I had the same issue; I'd been trying to track
down the cause for the past few days!





Re: [ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-10 Thread Ashley Merrick
I've just worked out that I had the same issue; I'd been trying to track
down the cause for the past few days!

However, I am using brand-new enterprise Toshiba drives with 256 MB write
cache, and was seeing I/O wait peaks of 40% even during small write
operations to Ceph, with commit/apply latencies of 40 ms and above.

I just went through and disabled the write cache on each drive, and after a
few tests I get the exact same write performance, but with I/O wait under 1%
and commit/apply latencies of 1-3 ms max.

Something somewhere definitely doesn't like the write cache being enabled on
the disks. This is an EC pool on the latest Mimic release.
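
Since hdparm -W is not persistent, a udev rule along these lines can reapply
it at boot for all rotational disks (untested sketch; the rule file name is
arbitrary and the hdparm path may differ per distro):

  # /etc/udev/rules.d/99-hdd-write-cache.rules
  ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", RUN+="/usr/sbin/hdparm -W 0 /dev/%k"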

On Sun, Nov 11, 2018 at 5:34 AM Vitaliy Filippov  wrote:

> A weird thing happens in my test cluster made from desktop hardware.
>
> The command `for i in /dev/sd?; do hdparm -W 0 $i; done` increases
> single-thread write iops (reduces latency) 7 times!


[ceph-users] Disabling write cache on SATA HDDs reduces write latency 7 times

2018-11-10 Thread Vitaliy Filippov

Hi

A weird thing happens in my test cluster made from desktop hardware.

The command `for i in /dev/sd?; do hdparm -W 0 $i; done` increases  
single-thread write iops (reduces latency) 7 times!


It is a 3-node cluster with Ryzen 2700 CPUs, 3x SATA 7200rpm HDDs + 1x  
SATA desktop SSD for system and ceph-mon + 1x SATA server SSD for  
block.db/wal in each host. Hosts are linked by 10gbit ethernet (not the  
fastest one though, average RTT according to flood-ping is 0.098ms). Ceph  
and OpenNebula are installed on the same hosts, OSDs are prepared with  
ceph-volume and bluestore with default options. SSDs have capacitors  
('power-loss protection'), write cache is turned off for them since the  
very beginning (hdparm -W 0 /dev/sdb). They're quite old, but each of them  
is capable of delivering ~22000 iops in journal mode (fio -sync=1  
-direct=1 -iodepth=1 -bs=4k -rw=write).
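
Spelled out in full, that journal-mode test was along these lines (sketch;
run it against the SSD before putting an OSD on it, since raw writes destroy
data on the device):

  fio -name=journal-test -filename=/dev/sdb -ioengine=libaio -sync=1 \
      -direct=1 -iodepth=1 -bs=4k -rw=write -runtime=30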


However, RBD single-threaded random-write benchmark originally gave awful  
results - when testing with `fio -ioengine=libaio -size=10G -sync=1  
-direct=1 -name=test -bs=4k -iodepth=1 -rw=randwrite -runtime=60  
-filename=./testfile` from inside a VM, the result was only 58 iops  
average (17ms latency). This was not what I expected from the HDD+SSD  
setup.


But today I tried to play with the cache settings for the data disks, and I
was really surprised to discover that just disabling the HDD write cache
(hdparm -W 0 /dev/sdX for all HDD devices) increases single-threaded
performance ~7 times! The result from the same VM (without even rebooting it)
is iops=405, avg lat=2.47ms. That's close to an order of magnitude faster,
and in fact 2.5 ms seems sort of an expected number.


As I understand it, 4k writes are always deferred at the default setting of
prefer_deferred_size_hdd=32768, which means they should only be written to
the journal device before the OSD acks the write operation.
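
To double-check the effective value on a running OSD (sketch; assuming I
remember the full option name, bluestore_prefer_deferred_size_hdd, correctly):

  ceph daemon osd.0 config get bluestore_prefer_deferred_size_hdd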


So my question is WHY? Why does HDD write cache affect commit latency with  
WAL on an SSD?


I would also appreciate it if anybody with a similar setup (HDD+SSD with
desktop SATA controllers or an HBA) could test the same thing...


--
With best regards,
  Vitaliy Filippov