Re: [ceph-users] CephFS performance vs. underlying storage

2019-01-30 Thread Marc Roos

I was wondering the same. From a 'default' setup I get the performance below;
I have no idea whether this is bad, good or normal.

Results per device are latency / IOPS / bandwidth; the bandwidth unit is per
row (kB/s for the 4k random tests, MB/s for the rest):

Test                Unit   CephFS ssd rep. 3     CephFS ssd rep. 1     Samsung MZK7KM480 480GB
                           lat    iops   bw      lat    iops   bw      lat    iops   bw
4k    rand. read    kB/s   2.78   1781   7297    0.54   1809   7412    0.09   10.2k  41600
4k    rand. write   kB/s   1.42   700    2871    0.8    1238   5071    0.05   17.9k  73200
4k    seq. read     MB/s   0.29   3314   13.6    0.29   3325   13.6    0.05   18k    77.6
4k    seq. write    MB/s   0.04   889    3.64    0.56   1761   7.21    0.05   18.3k  75.1
1024k rand. read    MB/s   4.3    231    243     4.27   233    245     2.06   482    506
1024k rand. write   MB/s   0.08   132    139     4.34   229    241     2.16   460    483
1024k seq. read     MB/s   4.23   235    247     4.21   236    248     1.98   502    527
1024k seq. write    MB/s   6.99   142    150     4.34   229    241     2.13   466    489


(4 nodes, CentOS7, luminous) 

PS: I'm not sure why you test with only one node. If you expand to a 2nd node,
you might get an unpleasant surprise with a drop in performance, because you
will be adding network latency, which decreases your IOPS.
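
For reference, below is a minimal Python sketch of the kind of fio matrix that
could produce a table like the one above. The thread does not say which fio
options were actually used, so the block sizes, runtime, queue depth, file
size and target path are all assumptions chosen to mirror the table's columns.

#!/usr/bin/env python3
"""Sketch: run a 4k/1024k random/sequential read/write matrix with fio.

Assumptions: target path, 4G file size, 60s time-based runs, direct I/O,
libaio with queue depth 1. Field names follow fio 3.x JSON output
(older fio reports clat in usec instead of clat_ns).
"""
import json
import subprocess

TARGET = "/mnt/cephfs/fio-testfile"   # hypothetical test file on the mount
TESTS = [
    ("4k", "randread"), ("4k", "randwrite"), ("4k", "read"), ("4k", "write"),
    ("1024k", "randread"), ("1024k", "randwrite"), ("1024k", "read"), ("1024k", "write"),
]

for bs, rw in TESTS:
    out = subprocess.run(
        ["fio", "--name=bench", f"--filename={TARGET}",
         f"--rw={rw}", f"--bs={bs}", "--size=4G",
         "--runtime=60", "--time_based", "--direct=1",
         "--ioengine=libaio", "--iodepth=1",
         "--output-format=json"],
        check=True, capture_output=True, text=True).stdout
    job = json.loads(out)["jobs"][0]
    side = job["read"] if "read" in rw else job["write"]
    # clat_ns mean is nanoseconds; bw is reported by fio in KiB/s
    print(f"{bs:>6} {rw:<9} lat={side['clat_ns']['mean'] / 1e6:.2f} ms "
          f"iops={side['iops']:.0f} bw={side['bw']} KiB/s")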





[ceph-users] CephFS performance vs. underlying storage

2019-01-30 Thread Hector Martin
Hi list,

I'm experimentally running single-host CephFS as a replacement for
"traditional" filesystems.

My setup is 8×8TB HDDs using dm-crypt, with CephFS on a 5+2 EC pool. All
of the components are running on the same host (mon/osd/mds/kernel
CephFS client). I've set the stripe_unit/object_size to a relatively
high 80MB (up from the default 4MB). I figure I want individual reads on
the disks to be several megabytes per object for good sequential
performance, and since this is an EC pool 4MB objects would be split
into 800kB chunks, which is clearly not ideal. With 80MB objects, chunks
are 16MB, which sounds more like a healthy read size for sequential
access (e.g. something like 10 IOPS per disk during seq reads).
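
As a quick arithmetic check of those chunk sizes (assuming the usual EC layout
where each RADOS object of object_size bytes is split into k equal data
chunks, with k = 5 for this 5+2 pool):

# Per-OSD data chunk size for a k+m erasure-coded pool: each object is
# split into k data chunks, so chunk = object_size / k (k = 5 here).
def ec_chunk_mb(object_size_mb: float, k: int = 5) -> float:
    return object_size_mb / k

for object_size_mb in (4, 80):   # default layout vs. the 80 MB layout above
    print(f"{object_size_mb} MB objects -> {ec_chunk_mb(object_size_mb)} MB per data chunk")
# 4 MB objects -> 0.8 MB per data chunk   (the ~800 kB figure above)
# 80 MB objects -> 16.0 MB per data chunk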

With this config, I get about 270MB/s sequential from CephFS. On the
same disks, an ext4 on dm-crypt on dm-raid6 yields ~680MB/s. So it seems
Ceph achieves less than half of the raw performance that the underlying
storage is capable of (with similar RAID redundancy). *
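
As a rough worked check of that ratio (the per-spindle split is only an
approximation: RAID6 over 8 disks leaves 6 data spindles, and a 5+2 EC stripe
carries 5 data chunks):

cephfs_seq = 270.0   # MB/s, measured through CephFS
raw_seq = 680.0      # MB/s, ext4 on dm-crypt on dm-raid6 over the same 8 disks

print(f"CephFS reaches {cephfs_seq / raw_seq:.0%} of the raw sequential rate")  # ~40%
print(f"RAID6:  ~{raw_seq / 6:.0f} MB/s per data spindle")   # 6 data disks in 8-disk RAID6
print(f"CephFS: ~{cephfs_seq / 5:.0f} MB/s per data chunk")  # 5 data chunks per 5+2 stripe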

Obviously there will be some overhead with a stack as deep as Ceph
compared to more traditional setups, but I'm wondering if there are
improvements to be had here. While reading from CephFS I do not have
significant CPU usage, so I don't think I'm CPU limited. Could the issue
perhaps be latency through the stack / lack of read-ahead? Reading two
files in parallel doesn't really get me more than 300MB/s in total, so
parallelism doesn't seem to help much.
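
One quick way to separate per-stream latency from raw disk limits is to check
whether aggregate throughput scales with the number of concurrent sequential
readers. Below is a minimal Python sketch of such a probe; the file path is a
placeholder, and the test file is assumed to be large and not already in the
page cache. If read-ahead is the suspect, the kernel client's rasize mount
option (which, as far as I know, controls the client-side read-ahead window)
may also be worth experimenting with.

#!/usr/bin/env python3
"""Probe: does aggregate sequential read throughput scale with streams?"""
import os
import time
from concurrent.futures import ThreadPoolExecutor

PATH = "/mnt/cephfs/bigfile"   # hypothetical large test file
CHUNK = 4 * 1024 * 1024        # 4 MB per read() call

def read_region(offset: int, length: int) -> int:
    """Sequentially read `length` bytes starting at `offset`."""
    done = 0
    with open(PATH, "rb", buffering=0) as f:
        f.seek(offset)
        while done < length:
            buf = f.read(min(CHUNK, length - done))
            if not buf:
                break
            done += len(buf)
    return done

size = os.path.getsize(PATH)
for streams in (1, 2, 4):
    part = size // streams
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=streams) as pool:
        total = sum(pool.map(lambda i: read_region(i * part, part), range(streams)))
    secs = time.monotonic() - start
    print(f"{streams} stream(s): {total / secs / 1e6:.0f} MB/s")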

I'm curious as to whether there are any knobs I can play with to try to
improve performance, or whether this level of overhead is pretty much
inherent to Ceph. Even though this is an unusual single-host setup, I
imagine proper clusters might also have similar results when comparing
raw storage performance.

* Ceph has a slight disadvantage here because its chunk of the drives is
logically after the traditional RAID, and HDDs get slower towards higher
logical addresses, but this should be on the order of a 15-20% hit at most.

-- 
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com