Re: [ceph-users] Fwd: [ceph bad performance], can't find a bottleneck
Hi Maged,

Not a big difference in both cases. Performance of the 4-node pool with
5x PM863a each is: 4k bs - 33-37k IOPS with krbd at 128 threads, and
42-51k IOPS at 1024 threads (fio numjobs 128/256/512). The same thing
happens when we try to increase the rbd workload: 3 rbd images together
get the same total IOPS. Dead end & limit )

Thank you!

2018-03-12 21:49 GMT+03:00 Maged Mokhtar:
> Hi,
>
> Try increasing the queue depth from the default 128 to 1024:
>
> rbd map image-XX -o queue_depth=1024
>
> Also if you run multiple rbd images/fio tests, do you get higher
> combined performance?
>
> Maged
>
> On 2018-03-12 17:16, Sergey Kotov wrote:
>
> [quoted original message trimmed; it appears in full further down the thread]
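[Editor's note: a quick Little's-law sanity check on the numbers Sergey reports above. The IOPS figures used are midpoints of his reported ranges, so they are illustrative assumptions, not measurements.]

```python
# Little's law: concurrency = IOPS * latency, so the implied average
# per-op latency is concurrency / IOPS. If raising concurrency 8x
# (128 -> 1024 threads) barely moves IOPS, latency must be growing
# almost proportionally, i.e. the extra requests are just queueing.
for threads, iops in [(128, 35_000), (1024, 46_500)]:  # midpoints of reported ranges
    latency_ms = threads / iops * 1000
    print(f"{threads:>4} threads -> ~{iops} IOPS -> ~{latency_ms:.1f} ms avg latency")
```

Roughly +30% IOPS for 6x the latency (~3.7 ms vs ~22 ms): the classic signature of a serialized bottleneck absorbing extra concurrency as queueing delay, rather than of saturated disks.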
Re: [ceph-users] Fwd: [ceph bad performance], can't find a bottleneck
Hi,

Try increasing the queue depth from the default 128 to 1024:

rbd map image-XX -o queue_depth=1024

Also if you run multiple rbd images/fio tests, do you get higher combined
performance?

Maged

On 2018-03-12 17:16, Sergey Kotov wrote:
> [quoted original message trimmed; it appears in full further down the thread]
[ceph-users] Fwd: [ceph bad performance], can't find a bottleneck
Dear moderator, I subscribed to the ceph list today, could you please post
my message?

-- Forwarded message --
From: Sergey Kotov
Date: 2018-03-06 10:52 GMT+03:00
Subject: [ceph bad performance], can't find a bottleneck
To: ceph-users@lists.ceph.com
Cc: Alexey Zhitenev, Anna Anikina

Good day.

Can you please help us find the bottleneck in our ceph installations?
We have 3 SSD-only clusters on different hardware, but the situation is
the same: overall I/O between the client and ceph is lower than 1/6 of
the combined performance of all the SSDs.

For example, one of our clusters has 4 nodes with Toshiba 2TB Enterprise
SSDs, installed on Ubuntu Server 16.04. The servers are connected to 10G
switches. Latency between nodes is about 0.1 ms, and Ethernet utilisation
is low.

# uname -a
Linux storage01 4.4.0-101-generic #124-Ubuntu SMP Fri Nov 10 18:29:59 UTC
2017 x86_64 x86_64 x86_64 GNU/Linux

# ceph osd versions
{
    "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
luminous (stable)": 55
}

When we map an rbd image directly on the storage nodes via krbd,
performance is not good enough. We use fio for testing. Even when we run
a randwrite test with 4k block size in multi-job mode, the drives never
go above 30% utilisation and their latency is fine.

At the same time, iostat shows 100% utilisation on /dev/rbdX.

Also, we can't enable rbd_cache, because we export the mapped images over
scst iscsi.

How can we resolve the issue?
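[Editor's note: to put the "lower than 1/6" claim in perspective, here is a rough ceiling for client write IOPS on this cluster. The OSD count comes from the `ceph osd versions` output above; the per-SSD IOPS figure is a hypothetical placeholder, not a measured number for these drives, and the journal penalty applies only if the OSDs are filestore, as the filestore_* options in the config below suggest.]

```python
# Rough backend write budget for a replicated filestore pool: each
# client 4k write lands on `size` OSDs (osd_pool_default_size = 2 in
# the config below), and filestore writes each copy twice (journal +
# data), so the raw SSD IOPS budget is divided by size * journal_penalty.
osds = 55                       # from `ceph osd versions` above
per_ssd_write_iops = 20_000     # hypothetical per-SSD figure, not measured
size, journal_penalty = 2, 2
ceiling = osds * per_ssd_write_iops // (size * journal_penalty)
print(ceiling)
```

Even this pessimistic bound is ~275k IOPS, an order of magnitude above what the clients observe, which is consistent with the drives sitting at 30% utilisation: the limit is not the SSDs.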
Ceph config:

[global]
fsid = beX482fX-6a91-46dX-ad22-21a8a2696abX
mon_initial_members = storage01, storage02, storage03
mon_host = X.Y.Z.1,X.Y.Z.2,X.Y.Z.3
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = X.Y.Z.0/24
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 1024
osd_journal_size = 10240
osd_mkfs_type = xfs
filestore_op_threads = 16
filestore_wbthrottle_enable = False
throttler_perf_counter = False
osd crush update on start = false

[osd]
osd_scrub_begin_hour = 1
osd_scrub_end_hour = 6
osd_scrub_priority = 1

osd_enable_op_tracker = False
osd_max_backfills = 1
osd heartbeat grace = 20
osd heartbeat interval = 5
osd recovery max active = 1
osd recovery max single start = 1
osd recovery op priority = 1
osd recovery threads = 1
osd backfill scan max = 16
osd backfill scan min = 4
osd max scrubs = 1
osd scrub interval randomize ratio = 1.0
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 0
osd scrub chunk max = 1
osd scrub chunk min = 1
osd deep scrub stride = 1048576
osd scrub load threshold = 5.0
osd scrub sleep = 0.1

[client]
rbd_cache = false

Sample fio tests:

root@storage04:~# fio --name iops --rw randread --bs 4k --filename /dev/rbd2 --numjobs 12 --ioengine=libaio --group_reporting
iops: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
...
fio-2.2.10
Starting 12 processes
^Cbs: 12 (f=12): [r(12)] [1.2% done] [128.4MB/0KB/0KB /s] [32.9K/0/0 iops] [eta 16m:49s]
fio: terminating on signal 2

iops: (groupid=0, jobs=12): err= 0: pid=29812: Sun Feb 11 23:59:19 2018
  read : io=1367.8MB, bw=126212KB/s, iops=31553, runt= 11097msec
    slat (usec): min=1, max=59700, avg=375.92, stdev=495.19
    clat (usec): min=0, max=377, avg= 1.12, stdev= 3.16
     lat (usec): min=1, max=59702, avg=377.61, stdev=495.32
    clat percentiles (usec):
     |  1.00th=[    0],  5.00th=[    0], 10.00th=[    1], 20.00th=[    1],
     | 30.00th=[    1], 40.00th=[    1], 50.00th=[    1], 60.00th=[    1],
     | 70.00th=[    1], 80.00th=[    1], 90.00th=[    1], 95.00th=[    2],
     | 99.00th=[    2], 99.50th=[    2], 99.90th=[   73], 99.95th=[   78],
     | 99.99th=[  115]
    bw (KB  /s): min= 8536, max=11944, per=8.33%, avg=10516.45, stdev=635.32
    lat (usec) : 2=91.74%, 4=7.93%, 10=0.14%, 20=0.09%, 50=0.01%
    lat (usec) : 100=0.07%, 250=0.03%, 500=0.01%
  cpu          : usr=1.32%, sys=3.69%, ctx=329556, majf=0, minf=134
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=350144/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=1367.8MB, aggrb=126212KB/s, minb=126212KB/s, maxb=126212KB/s, mint=11097msec, maxt=11097msec

Disk stats (read/write):
  rbd2: ios=323072/0, merge=0/0, ticks=124268/0, in_queue=124680, util=99.31%

root@storage04:~# fio --name iops --rw randwrite --bs 4k --filename /dev/rbd2 --numjobs 12 --ioengine=libaio --group_reporting
iops: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
...
fio-2.2.10
Starting 12 processes
^Cbs: 12 (f=12): [w(12)] [25.0% done] [0KB/713.5MB/0KB /s] [0/183K/0 iops] [eta 00m:45s]
fio:
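[Editor's note: the randread result above is internally consistent with a pure latency bound. With numjobs=12 and fio's default iodepth=1, each job keeps exactly one request in flight, so Little's law predicts total IOPS = jobs / avg latency. A quick check using the 377.61 usec average latency fio reports:]

```python
# With 12 jobs each keeping one request in flight (iodepth=1),
# Little's law gives IOPS = jobs * iodepth / avg_latency.
avg_lat_s = 377.61e-6        # avg total latency from the fio output (usec -> s)
jobs, iodepth = 12, 1
predicted_iops = jobs * iodepth / avg_lat_s
print(round(predicted_iops))
```

This lands within ~1% of the measured 31553 IOPS: the rbd device is simply serving one ~0.38 ms round trip per job. It also explains the 99.31% util figure from iostat: for a virtual device, %util measures queue occupancy (some request always in flight), not saturation of the backing disks.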