Re: [ceph-users] Slow/Hung IOs

2015-01-09 Thread Craig Lewis

It doesn't seem like the problem here, but I've noticed that slow OSDs have
a large fan-out.  I have less than 100 OSDs, so every OSD talks to every
other OSD in my cluster.

I was getting slow notices from all of my OSDs.  Nothing jumped out, so I
started looking at disk write latency graphs.  I noticed that all the OSDs
in one node had 10x the write latency of the other nodes.  After that, I
graphed the number of slow notices per OSD, and noticed a much higher
number of slow requests on that node.

Long story short, I lost a battery on my write cache.  But it wasn't at all
obvious from the slow request notices, not until I dug deeper.
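
A rough way to reproduce the "slow notices per OSD" count is sketched below in Python. It assumes ceph.log lines shaped like the ones quoted in this thread (timestamp, osd.N, address, sequence number, [WRN]) and that each line has been un-wrapped; the function name and regex are illustrative only and not part of Ceph. Note that the OSD reporting a slow request is the one waiting, so the disk at fault may be one of the OSDs named after "waiting for subops from".

```python
import re
from collections import Counter

# Matches summary lines such as:
# 2015-01-05 13:42:42.937678 osd.18 39.7.48.7:6803/11185 220 : [WRN] 3 slow requests, ...
SUMMARY = re.compile(
    r"^\S+ \S+ osd\.(?P<osd>\d+) (?P<host>[\d.]+):\d+/\d+ \d+ : \[WRN\] "
    r"\d+ slow requests"
)


def slow_notices(lines):
    """Count slow-request summary lines per reporting OSD and per host IP."""
    per_osd, per_host = Counter(), Counter()
    for line in lines:
        match = SUMMARY.match(line)
        if match:
            per_osd[int(match.group("osd"))] += 1
            per_host[match.group("host")] += 1
    return per_osd, per_host
```

Feeding it the output of the grep shown in the original post and comparing the per-host totals is one way to spot a single node that stands out.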



On Mon, Jan 5, 2015 at 4:07 PM, Sanders, Bill 
wrote:

>  Thanks for the reply.
>
> 14 and 18 happened to show up during that run, but it's certainly not only
> those OSDs.  It seems to vary each run.  Just from the runs I've done
> today I've seen the following pairs of OSDs:
>
> ['0,13', '0,18', '0,24', '0,25', '0,32', '0,34', '0,36', '10,22', '11,30',
> '12,28', '13,30', '14,22', '14,24', '14,27', '14,30', '14,31', '14,33',
> '14,34', '14,35', '14,39', '16,20', '16,27', '18,38', '19,30', '19,31',
> '19,39', '20,38', '22,30', '26,37', '26,38', '27,33', '27,34', '27,36',
> '28,32', '28,34', '28,36', '28,37', '3,18', '3,27', '3,29', '3,37', '4,10',
> '4,29', '5,19', '5,37', '6,25', '9,28', '9,29', '9,37']
>
> Which is almost all of the OSDs in the system.
>
> Bill
>
>  --
> *From:* Lincoln Bryant [linco...@uchicago.edu]
> *Sent:* Monday, January 05, 2015 3:40 PM
> *To:* Sanders, Bill
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Slow/Hung IOs
>
>  Hi Bill,
>
>  From your log excerpt, it looks like your slow requests are happening on
> OSDs 14 and 18. Is it always these two OSDs?
>
>  If you don't have a long recovery time (e.g., the cluster is just full
> of test data), maybe you could try setting OSDs 14 and 18 out and
> re-benching?
>
>  Alternatively I suppose you could just use bonnie++ or dd etc to write
> to those OSDs (careful to not clobber any Ceph dirs) and see how the
> performance looks.
>
>  Cheers,
> Lincoln
>
>   On Jan 5, 2015, at 4:36 PM, Sanders, Bill wrote:
>
>   Hi Ceph Users,
>
> We've got a Ceph cluster we've built, and we're experiencing issues with
> slow or hung IO's, even running 'rados bench' on the OSD cluster.  Things
> start out great, ~600 MB/s, then rapidly drops off as the test waits for
> IO's. Nothing seems to be taxed... the system just seems to be waiting.
> Any help trying to figure out what could cause the slow IO's is appreciated.
>
> For example, 'rados -p rbd bench 60 write -t 32' takes over 900s to
> complete:
>
> A typical rados bench:
> Total time run:         957.458274
> Total writes made:      9251
> Write size:             4194304
> Bandwidth (MB/sec):     38.648
>
> Stddev Bandwidth:       157.323
> Max bandwidth (MB/sec): 964
> Min bandwidth (MB/sec): 0
> Average Latency:        3.21126
> Stddev Latency:         51.9546
> Max latency:            910.72
> Min latency:            0.04516
>
>
> According to ceph.log, we're not experiencing any OSD flapping or monitor
> election cycles, just slow requests:
>
> # grep slow /var/log/ceph/ceph.log:
> 2015-01-05 13:42:42.937678 osd.18 39.7.48.7:6803/11185 220 : [WRN] 3 slow
> requests, 1 included below; oldest blocked for > 513.611379 secs
> 2015-01-05 13:42:42.937685 osd.18 39.7.48.7:6803/11185 221 : [WRN] slow
> request 30.136429 seconds old, received at 2015-01-05 13:42:12.801205:
> osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write
> 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops
> from 3,37
> 2015-01-05 13:42:49.938681 osd.18 39.7.48.7:6803/11185 222 : [WRN] 3 slow
> requests, 1 included below; oldest blocked for > 520.612372 secs
> 2015-01-05 13:42:49.938688 osd.18 39.7.48.7:6803/11185 223 : [WRN] slow
> request 480.636547 seconds old, received at 2015-01-05 13:34:49.302080:
> osd_op(client.92008.1:3100010 rb.0.140d.238e1f29.0c77 [write
> 3622400~512] 3.d031a69f ondisk+write e994) v4 currently waiting for subops
> from 26,37
>

Re: [ceph-users] Slow/Hung IOs

2015-01-07 Thread Sanders, Bill

Thanks for your reply, Christian.  Sorry for my delay in responding.

The kernel logs are silent.  Forgot to mention before that ntpd is running and 
the nodes are sync'd.

I'm working on some folks to get an updated kernel, but I'm not holding my
breath.  That said, if I'm seeing this problem by running rados bench on the
storage cluster itself, is it fair to say that the kernel code isn't the issue?

vm/min_free_kbytes is now set to 512M, though that didn't solve the issue.  I 
also set "filestore_max_sync_interval = 30" (and commented out the journal 
line) as you suggested, but that didn't seem to change anything, either.  Not 
sure what you mean about the monitors and SSD's... they currently *are* hosted 
on SSD's, which don't appear to be 

When rados bench starts, atop (holy crap that's a lot of info) shows that the 
HDD's go crazy for a little while (busy >85%).  The SSD's never get that busy 
(certainly <50%).  I attached a few 'snapshots' of atop taken just after the 
test starts (~12s), while it was still running (~30s), and after the test was 
supposed to have ended (~70s), but was essentially waiting for slow-requests.  
The only things red-lining at all were the HDDs.

I wonder how I could test our network.  Are you thinking it's possible we're
losing packets?  I'll ping (har!) our network guy...
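
One cheap first check for packet loss is the kernel's per-interface drop and error counters. The Python sketch below reads them from Linux sysfs; the function name is made up for illustration. It only shows interface-level counters on the node where it runs and says nothing about the InfiniBand fabric itself, so a clean result does not rule out a network problem.

```python
from pathlib import Path

# Standard Linux per-interface statistics files under
# /sys/class/net/<iface>/statistics/
COUNTERS = ("rx_dropped", "tx_dropped", "rx_errors", "tx_errors")


def interface_errors(base="/sys/class/net"):
    """Return {interface: {counter: value}} for the counters that exist."""
    stats = {}
    for iface in sorted(Path(base).iterdir()):
        values = {}
        for name in COUNTERS:
            counter_file = iface / "statistics" / name
            if counter_file.is_file():
                values[name] = int(counter_file.read_text())
        if values:
            stats[iface.name] = values
    return stats
```

Calling it once before and once after a rados bench run and comparing the two results shows whether any counter moved during the test.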

I have to admit that the OSD logs don't mean a whole lot to me.  Are OSD log 
entries like this normal?  This is not from during the test, but just before 
when the system was essentially idle.

2015-01-07 15:38:40.340883 7fa264ff7700  0 -- 39.71.48.8:6800/46686 >> 
39.71.48.6:6806/47930 pipe(0x7fa268c14480 sd=111 :40639 s=2 pgs=559 cs=13 l=0 
c=0x7fa283060080).fault with nothing to send, going to standby
2015-01-07 15:38:53.573890 7fa2b99f6700  0 -- 39.71.48.8:6800/46686 >> 
39.71.48.9:6805/23130 pipe(0x7fa268c55800 sd=127 :6800 s=2 pgs=152 cs=13 l=0 
c=0x7fa268c17e00).fault with nothing to send, going to standby
2015-01-07 15:38:55.881934 7fa281bfd700  0 -- 39.71.48.8:6800/46686 >> 
39.71.48.9:6809/44433 pipe(0x7fa268c12180 sd=65 :41550 s=2 pgs=599 cs=19 l=0 
c=0x7fa28305fc00).fault with nothing to send, going to standby
2015-01-07 15:38:56.360866 7fa29e1f6700  0 -- 39.71.48.8:6800/46686 >> 
39.71.48.6:6820/48681 pipe(0x7fa268c14980 sd=145 :6800 s=2 pgs=500 cs=21 l=0 
c=0x7fa28305fa80).fault with nothing to send, going to standby
2015-01-07 15:38:58.767181 7fa2a85f6700  0 -- 39.71.48.8:6800/46686 >> 
39.71.48.6:6820/48681 pipe(0x7fa268c55d00 sd=52 :6800 s=0 pgs=0 cs=0 l=0 
c=0x7fa268c18b80).accept connect_seq 22 vs existing 21 state standby
2015-01-07 15:38:58.943514 7fa253cf0700  0 -- 39.71.48.8:6800/46686 >> 
39.71.48.9:6805/23130 pipe(0x7fa268c55f80 sd=49 :6800 s=0 pgs=0 cs=0 l=0 
c=0x7fa268c18d00).accept connect_seq 14 vs existing 13 state standby
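
The peer addresses in excerpts like this can be matched against the `cluster network` and `public network` values from the ceph.conf quoted elsewhere in this thread, which tells you whether a given connection is OSD-to-OSD traffic or client traffic. A small Python sketch, with the two network values copied from that ceph.conf and an illustrative function name:

```python
import ipaddress

# Values copied from the ceph.conf quoted in this thread.
NETWORKS = {
    "cluster": ipaddress.ip_network("39.64.0.0/12"),
    "public": ipaddress.ip_network("39.0.0.0/12"),
}


def classify(address):
    """Return 'cluster', 'public', or None for an IPv4 address string."""
    ip = ipaddress.ip_address(address)
    for name, network in NETWORKS.items():
        if ip in network:
            return name
    return None


print(classify("39.71.48.8"))  # address from the excerpt above
print(classify("39.7.48.7"))   # address of osd.18 in ceph.log
```

With these values the 39.71.48.x addresses fall in the cluster network and the 39.7.48.x addresses in the public network.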

For the OSD complaining about slow requests, its logs show something like this
during the test:

2015-01-07 15:47:28.463470 7fc0714f0700  0 -- 39.7.48.7:6812/16907 >>
39.7.48.4:0/3544514455 pipe(0x7fc08f827a80 sd=153 :6812 s=0 pgs=0 cs=0 l=0
c=0x7fc08f882580).accept peer addr is really 39.7.48.4:0/3544514455
(socket is 39.7.48.4:46435/0)
2015-01-07 15:48:04.426399 7fc0e9bfd700  0 log [WRN] : 1 slow requests, 1
included below; oldest blocked for > 30.738429 secs
2015-01-07 15:48:04.426416 7fc0e9bfd700  0 log [WRN] : slow request 30.738429
seconds old, received at 2015-01-07 15:47:33.687935: osd_op(client.92886.0:4711
benchmark_data_tvsaq1_29431_object4710 [write 0~4194304] 3.1639422f
ack+ondisk+write e1464) v4 currently waiting for subops from 22,36
2015-01-07 15:48:34.429979 7fc0e9bfd700  0 log [WRN] : 1 slow requests, 1
included below; oldest blocked for > 60.742016 secs
2015-01-07 15:48:34.429997 7fc0e9bfd700  0 log [WRN] : slow request 60.742016
seconds old, received at 2015-01-07 15:47:33.687935: osd_op(client.92886.0:4711
benchmark_data_tvsaq1_29431_object4710 [write 0~4194304] 3.1639422f
ack+ondisk+write e1464) v4 currently waiting for subops from 22,36
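
To see which replica OSDs a primary is waiting on, the "waiting for subops from" part of these entries can be pulled out with a regular expression. The Python sketch below assumes the entry layout shown in this excerpt; the regex and function name are illustrative. Because mail clients hard-wrap long log lines, it collapses all whitespace before matching.

```python
import re

DETAIL = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, received at \S+ \S+: "
    r"osd_op\(.*?\) v\d+ currently waiting for subops from (?P<subops>[\d,]+)"
)


def parse_slow_requests(text):
    """Return [(age_seconds, [replica OSD ids]), ...] for each entry found."""
    flat = " ".join(text.split())  # undo hard line wraps
    return [
        (float(m.group("age")), [int(x) for x in m.group("subops").split(",")])
        for m in DETAIL.finditer(flat)
    ]
```

Run over a whole OSD log, the list of replica ids shows whether the waits keep pointing at the same few OSDs or are spread across the cluster.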


Re: [ceph-users] Slow/Hung IOs

2015-01-07 Thread Christian Balzer

0x7fa268c18b80).accept connect_seq 22 vs existing 21 state standby
> 2015-01-07 15:38:58.943514 7fa253cf0700  0 -- 39.71.48.8:6800/46686 >>
> 39.71.48.9:6805/23130 pipe(0x7fa268c55f80 sd=49 :6800 s=0 pgs=0 cs=0 l=0
> c=0x7fa268c18d00).accept connect_seq 14 vs existing 13 state standby
> 
Totally normal.

> 
> For the OSD complaining about slow requests its logs show something like
> during the test:
> 
> 2015-01-07 15:47:28.463470 7fc0714f0700  0 -- 39.7.48.7:6812/16907 >>
> 39.7.48.4:0/3544514455 pipe(0x7fc08f827a80 sd=153 :6812 s=0 pgs=0 cs=0
> l=0 c=0x7fc08f882580).accept peer addr is really 39.7.48.4:0/3544514455
> (socket is 39.7.48.4:46435/0)
> 2015-01-07 15:48:04.426399 7fc0e9bfd700  0 log [WRN] : 1 slow requests,
> 1 included below; oldest blocked for > 30.738429 secs
> 2015-01-07 15:48:04.426416 7fc0e9bfd700  0 log [WRN] : slow request
> 30.738429 seconds old, received at 2015-01-07 15:47:33.687935:
> osd_op(client.92886.0:4711 benchmark_data_tvsaq1_29431_object4710 [write
> 0~4194304] 3.1639422f ack+ondisk+write e1464) v4 currently waiting for
> subops from 22,36
> 2015-01-07 15:48:34.429979 7fc0e9bfd700  0 log [WRN] : 1 slow requests,
> 1 included below; oldest blocked for > 60.742016 secs
> 2015-01-07 15:48:34.429997 7fc0e9bfd700  0 log [WRN] : slow request
> 60.742016 seconds old, received at 2015-01-07 15:47:33.687935:
> osd_op(client.92886.0:4711 benchmark_data_tvsaq1_29431_object4710 [write
> 0~4194304] 3.1639422f ack+ondisk+write e1464) v4 currently waiting for
> subops from 22,36
> 
Which is "normal" and unfortunately not particularly informative.

Look at things with:
ceph --admin-daemon /var/run/ceph/ceph-osd.[slowone].asok dump_historic_ops 
when it happens. 
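
The admin socket command prints JSON, so its output can be sorted to bring the slowest operations to the top. The Python sketch below is only a starting point: the key names it looks for ("Ops" or "ops" for the list, "duration" and "description" per entry) are assumptions about the output layout and may differ between Ceph releases, so check them against what your version actually prints.

```python
import json


def slowest_ops(dump, top=5):
    """Return [(duration_seconds, description), ...], slowest first.

    `dump` is the parsed JSON from dump_historic_ops. The key names used
    here are assumptions and may need adjusting for your Ceph version.
    """
    ops = dump.get("Ops") or dump.get("ops") or []
    ranked = sorted(
        ops, key=lambda op: float(op.get("duration", 0)), reverse=True
    )
    return [
        (float(op.get("duration", 0)), op.get("description", "?"))
        for op in ranked[:top]
    ]


def slowest_ops_from_text(json_text, top=5):
    """Same as slowest_ops, starting from the raw command output."""
    return slowest_ops(json.loads(json_text), top)
```

Saving the command output to a file while the slow requests are being reported and passing its contents to `slowest_ops_from_text` is one way to use it.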

Christian


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Slow/Hung IOs

2015-01-06 Thread Gonzalo Aguilar Delgado

Hi, 

I just ran this test and my system does no better, but I use
commodity hardware. The only difference is latency; you should look at
that.


Total time run:         62.412381
Total writes made:      919
Write size:             4194304
Bandwidth (MB/sec):     58.899

Stddev Bandwidth:       27.7791
Max bandwidth (MB/sec): 108
Min bandwidth (MB/sec): 0
Average Latency:        2.15587
Stddev Latency:         1.1327
Max latency:            6.42248
Min latency:            0.444754

I use 2 nodes on a 1GB network, but only get 100 MB/s. I will look into it
and increase the number of nodes.
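
The bandwidth figure in these reports can be re-derived from the other fields, which is a quick sanity check when comparing runs. The Python sketch below assumes rados bench counts 1 MB as 2**20 bytes; that assumption reproduces both results posted in this thread. For comparison, if the "1GB net" is a 1 Gbit/s link, its line rate is 125 MB/s before any protocol overhead.

```python
def bench_bandwidth_mb_s(writes, write_size_bytes, seconds):
    """Average write bandwidth in MB/s, counting 1 MB as 2**20 bytes."""
    return writes * write_size_bytes / 2**20 / seconds


# Figures from the two rados bench reports in this thread.
print(round(bench_bandwidth_mb_s(919, 4194304, 62.412381), 3))    # 58.899
print(round(bench_bandwidth_mb_s(9251, 4194304, 957.458274), 3))  # 38.648
```

The second line shows why the 900-second run looks so bad on paper: the data written during the fast phase is averaged over the whole time spent waiting on blocked requests.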


Best regards,


On Mon, Jan 5, 2015 at 11:36, Sanders, Bill wrote:

Hi Ceph Users,

We've got a Ceph cluster we've built, and we're experiencing issues 
with slow or hung IO's, even running 'rados bench' on the OSD 
cluster.  Things start out great, ~600 MB/s, then rapidly drops off 
as the test waits for IO's. Nothing seems to be taxed... the system 
just seems to be waiting.  Any help trying to figure out what could 
cause the slow IO's is appreciated.


For example, 'rados -p rbd bench 60 write -t 32' takes over 900s to 
complete:


A typical rados bench:
Total time run:         957.458274
Total writes made:      9251
Write size:             4194304
Bandwidth (MB/sec):     38.648

Stddev Bandwidth:       157.323
Max bandwidth (MB/sec): 964
Min bandwidth (MB/sec): 0
Average Latency:        3.21126
Stddev Latency:         51.9546
Max latency:            910.72
Min latency:            0.04516


According to ceph.log, we're not experiencing any OSD flapping or 
monitor election cycles, just slow requests:


# grep slow /var/log/ceph/ceph.log:
2015-01-05 13:42:42.937678 osd.18 39.7.48.7:6803/11185 220 : [WRN] 3 
slow requests, 1 included below; oldest blocked for > 513.611379 secs
2015-01-05 13:42:42.937685 osd.18 39.7.48.7:6803/11185 221 : [WRN] 
slow request 30.136429 seconds old, received at 2015-01-05 
13:42:12.801205: osd_op(client.92008.1:3101508 
rb.0.1437.238e1f29.000f [write 114688~512] 3.841c0edf 
ondisk+write e994) v4 currently waiting for subops from 3,37
2015-01-05 13:42:49.938681 osd.18 39.7.48.7:6803/11185 222 : [WRN] 3 
slow requests, 1 included below; oldest blocked for > 520.612372 secs
2015-01-05 13:42:49.938688 osd.18 39.7.48.7:6803/11185 223 : [WRN] 
slow request 480.636547 seconds old, received at 2015-01-05 
13:34:49.302080: osd_op(client.92008.1:3100010 
rb.0.140d.238e1f29.0c77 [write 3622400~512] 3.d031a69f 
ondisk+write e994) v4 currently waiting for subops from 26,37
2015-01-05 13:43:12.941838 osd.18 39.7.48.7:6803/11185 224 : [WRN] 3 
slow requests, 1 included below; oldest blocked for > 543.615545 secs
2015-01-05 13:43:12.941844 osd.18 39.7.48.7:6803/11185 225 : [WRN] 
slow request 60.140595 seconds old, received at 2015-01-05 
13:42:12.801205: osd_op(client.92008.1:3101508 
rb.0.1437.238e1f29.000f [write 114688~512] 3.841c0edf 
ondisk+write e994) v4 currently waiting for subops from 3,37
2015-01-05 13:44:04.933440 osd.14 39.7.48.7:6818/11640 251 : [WRN] 4 
slow requests, 1 included below; oldest blocked for > 606.941954 secs
2015-01-05 13:44:04.933469 osd.14 39.7.48.7:6818/11640 252 : [WRN] 
slow request 240.101138 seconds old, received at 2015-01-05 
13:40:04.832272: osd_op(client.92008.1:3101102 
rb.0.142b.238e1f29.0010 [write 475136~512] 3.5e623815 
ondisk+write e994) v4 currently waiting for subops from 27,33
2015-01-05 13:44:12.950805 osd.18 39.7.48.7:6803/11185 226 : [WRN] 3 
slow requests, 1 included below; oldest blocked for > 603.624511 secs
2015-01-05 13:44:12.950812 osd.18 39.7.48.7:6803/11185 227 : [WRN] 
slow request 120.149561 seconds old, received at 2015-01-05 
13:42:12.801205: osd_op(client.92008.1:3101508 
rb.0.1437.238e1f29.000f [write 114688~512] 3.841c0edf 
ondisk+write e994) v4 currently waiting for subops from 3,37
2015-01-05 13:46:12.988010 osd.18 39.7.48.7:6803/11185 228 : [WRN] 3 
slow requests, 1 included below; oldest blocked for > 723.661722 secs
2015-01-05 13:46:12.988017 osd.18 39.7.48.7:6803/11185 229 : [WRN] 
slow request 240.186772 seconds old, received at 2015-01-05 
13:42:12.801205: osd_op(client.92008.1:3101508 
rb.0.1437.238e1f29.000f [write 114688~512] 3.841c0edf 
ondisk+write e994) v4 currently waiting for subops from 3,37
2015-01-05 13:46:18.971570 osd.14 39.7.48.7:6818/11640 253 : [WRN] 4 
slow requests, 1 included below; oldest blocked for > 740.980083 secs
2015-01-05 13:46:18.971577 osd.14 39.7.48.7:6818/11640 254 : [WRN] 
slow request 480.063439 seconds old, received at 2015-01-05 
13:38:18.908100: osd_op(client.91911.1:3113675 
rb.0.13f5.238e1f29.0010 [write 475136~512] 3.679a939d 
ondisk+write e994) v4 currently waiting for subops from 27,34
2015-01-05 13:48:05.030581 osd.14 39.7.48.7:6818/11640 255 : [WRN] 4 
slow requests, 1 included below; oldest blocked for > 847.039098 secs
2015-01-05 13:48:05.030587 osd.14 39.7.48.7:6818/11640 256 : [WRN] 
slow request 480.198282 seconds old, receiv

Re: [ceph-users] Slow/Hung IOs

2015-01-06 Thread Lindsay Mathieson

On Tue, 6 Jan 2015 12:07:26 AM Sanders, Bill wrote:
> 14 and 18 happened to show up during that run, but its certainly not only
> those OSD's.  It seems to vary each run.  Just from the runs I've done
> today I've seen the following pairs of OSD's:

Could your OSD nodes be paging? I know from watching atop that the performance
on my nodes goes to the toilet when they start hitting the paging file.
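
Besides watching atop, paging can be confirmed from the cumulative swap counters in /proc/vmstat on Linux. A minimal Python sketch, with illustrative function names: take one sample before the benchmark and one during it, and see whether `pswpin` or `pswpout` moved.

```python
def swap_counters(vmstat_text):
    """Return (pages swapped in, pages swapped out) from /proc/vmstat text."""
    fields = {}
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        fields[name] = value.strip()
    return int(fields.get("pswpin", "0")), int(fields.get("pswpout", "0"))


def is_paging(before, after):
    """True if either swap counter increased between two samples."""
    return after[0] > before[0] or after[1] > before[1]


# Usage on an OSD node:
#   before = swap_counters(open("/proc/vmstat").read())
#   ... run rados bench ...
#   after = swap_counters(open("/proc/vmstat").read())
#   print(is_paging(before, after))
```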
-- 
Lindsay


Re: [ceph-users] Slow/Hung IOs

2015-01-06 Thread Christian Balzer

On Mon, 5 Jan 2015 22:36:29 + Sanders, Bill wrote:

> Hi Ceph Users,
> 
> We've got a Ceph cluster we've built, and we're experiencing issues with
> slow or hung IO's, even running 'rados bench' on the OSD cluster.
> Things start out great, ~600 MB/s, then rapidly drops off as the test
> waits for IO's. Nothing seems to be taxed... the system just seems to be
> waiting.  Any help trying to figure out what could cause the slow IO's
> is appreciated.
> 
I assume nothing in the logs of the respective OSDs either?
Kernel or other logs equally silent?

Watching things with atop (while running the test) not showing anything
particular?

Looking at the myriad of throttles and other data in 
http://ceph.com/docs/next/dev/perf_counters/ 
might be helpful for the affected OSDs.

Having this kind of (consistent?) trouble feels like a networking issue of
sorts, OSDs not able to reach each other or something massively messed up
in the I/O stack.

[snip]

> Our ceph cluster is 4x Dell R720xd nodes:
> 2x1TB spinners configured in RAID for the OS
> 10x4TB spinners for OSD's (XFS)
> 2x400GB SSD's, each with 5x~50GB OSD journals
> 2x Xeon E5-2620 CPU (/proc/cpuinfo reports 24 cores)
> 128GB RAM
> Two networks (public+cluster), both over infiniband
> 
Usual IB kernel tuning done, network stack stuff and vm/min_free_kbytes to
512MB at least?
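
Since the sysctl is expressed in kilobytes, 512MB corresponds to 524288. A tiny Python sketch for checking a node against that target; the function names are illustrative, and the path is the standard Linux procfs location for this setting.

```python
def mb_to_kb(megabytes):
    """Convert megabytes to the kilobyte unit used by vm.min_free_kbytes."""
    return megabytes * 1024


def min_free_ok(current_kb, wanted_mb=512):
    """True if the current setting is at least the wanted size."""
    return current_kb >= mb_to_kb(wanted_mb)


def read_min_free_kbytes(path="/proc/sys/vm/min_free_kbytes"):
    """Read the current value on a Linux node."""
    with open(path) as handle:
        return int(handle.read())


print(mb_to_kb(512))  # 524288
```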
 
> Three monitors are configured on the first three nodes, and use a chunk
> of one of the SSDs for their data, on an XFS partition
> 
Since you see nothing in the logs this is probably not your issue, but monitors
like fast I/O for their leveldb, so SSDs are recommended.

> Software:
> SLES 11SP3, with some in house patching. (3.0.1 kernel, "ceph-client"
> backported from 3.10) Ceph version: ceph-0.80.5-0.9.2, packaged by SUSE
> 
Can't get a 3.16 backport for this?

> ceph.conf:
> fsid = 3e8dbfd8-c3c8-4d30-80e2-cd059619d757
> mon initial members = tvsaq1, tvsaq2, tvsar1
> mon host = 39.7.48.6, 39.7.48.7, 39.7.48.8
> 
> cluster network = 39.64.0.0/12
> public network = 39.0.0.0/12
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> osd journal size = 9000
Not sure how this will affect things given that you have 50GB partitions.

I'd remove that line and replace it with something like:

 filestore_max_sync_interval = 30

(I use 10 with 10GB journals)
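
For context on how the sync interval and journal size relate: the Ceph documentation's rule of thumb for filestore journals is a size of at least twice the expected throughput multiplied by `filestore max sync interval`. The Python sketch below just evaluates that rule; the 100 MB/s figure is a hypothetical per-disk throughput chosen for illustration, not a measurement from this cluster.

```python
def min_journal_mb(throughput_mb_s, sync_interval_s):
    """Rule-of-thumb minimum journal size in MB:
    2 * expected throughput * filestore max sync interval."""
    return 2 * throughput_mb_s * sync_interval_s


# Hypothetical 100 MB/s spinner with the two intervals mentioned above.
print(min_journal_mb(100, 30))  # 6000
print(min_journal_mb(100, 10))  # 2000
```

By this rule both the 50GB partitions with an interval of 30 and 10GB journals with an interval of 10 leave ample headroom, assuming throughput in that range.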

Regards,

Christian

> filestore xattr use omap = true
> osd crush update on start = false
> osd pool default size = 3
> osd pool default min size = 1
> osd pool default pg num = 4096
> osd pool default pgp num = 4096
> 
> mon clock drift allowed = .100
> osd mount options xfs = rw,noatime,inode64
> 
> 
> 
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/

Re: [ceph-users] Slow/Hung IOs

2015-01-05 Thread Sanders, Bill

Thanks for the reply.

14 and 18 happened to show up during that run, but it's certainly not only those
OSDs.  It seems to vary each run.  Just from the runs I've done today I've
seen the following pairs of OSDs:

['0,13', '0,18', '0,24', '0,25', '0,32', '0,34', '0,36', '10,22', '11,30', 
'12,28', '13,30', '14,22', '14,24', '14,27', '14,30', '14,31', '14,33', 
'14,34', '14,35', '14,39', '16,20', '16,27', '18,38', '19,30', '19,31', 
'19,39', '20,38', '22,30', '26,37', '26,38', '27,33', '27,34', '27,36', 
'28,32', '28,34', '28,36', '28,37', '3,18', '3,27', '3,29', '3,37', '4,10', 
'4,29', '5,19', '5,37', '6,25', '9,28', '9,29', '9,37']

Which is almost all of the OSDs in the system.
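
To put a number on "almost all", the pairs can be tallied per OSD. The Python sketch below uses the list exactly as posted above; the function name is illustrative.

```python
from collections import Counter

# Pairs copied from the list above ("waiting for subops from A,B").
PAIRS = [
    '0,13', '0,18', '0,24', '0,25', '0,32', '0,34', '0,36', '10,22', '11,30',
    '12,28', '13,30', '14,22', '14,24', '14,27', '14,30', '14,31', '14,33',
    '14,34', '14,35', '14,39', '16,20', '16,27', '18,38', '19,30', '19,31',
    '19,39', '20,38', '22,30', '26,37', '26,38', '27,33', '27,34', '27,36',
    '28,32', '28,34', '28,36', '28,37', '3,18', '3,27', '3,29', '3,37', '4,10',
    '4,29', '5,19', '5,37', '6,25', '9,28', '9,29', '9,37',
]


def osd_frequency(pairs):
    """Count how many of the listed pairs each OSD id appears in."""
    counts = Counter()
    for pair in pairs:
        for osd in pair.split(","):
            counts[int(osd)] += 1
    return counts


counts = osd_frequency(PAIRS)
print(len(counts), "distinct OSDs")
print(counts.most_common(5))
```

For this list it reports 32 distinct OSD ids. If each of the 10 data spinners on the 4 nodes hosts one OSD, that is 32 of 40, and a few ids appear far more often than the rest, which may be worth a closer look.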

Bill


Re: [ceph-users] Slow/Hung IOs

2015-01-05 Thread Lincoln Bryant

Hi Bill,

From your log excerpt, it looks like your slow requests are happening on OSDs 
14 and 18. Is it always these two OSDs?

If you don't have a long recovery time (e.g., the cluster is just full of test 
data), maybe you could try setting OSDs 14 and 18 out and re-benching?

Alternatively I suppose you could just use bonnie++ or dd etc to write to those 
OSDs (careful to not clobber any Ceph dirs) and see how the performance looks. 

Cheers,
Lincoln

On Jan 5, 2015, at 4:36 PM, Sanders, Bill wrote:

> Hi Ceph Users,
> 
> We've got a Ceph cluster we've built, and we're experiencing issues with slow 
> or hung IO's, even running 'rados bench' on the OSD cluster.  Things start 
> out great, ~600 MB/s, then rapidly drops off as the test waits for IO's. 
> Nothing seems to be taxed... the system just seems to be waiting.  Any help 
> trying to figure out what could cause the slow IO's is appreciated.
> 
> For example, 'rados -p rbd bench 60 write -t 32' takes over 900s to complete:
> 
> A typical rados bench:
> Total time run:         957.458274
> Total writes made:      9251
> Write size:             4194304
> Bandwidth (MB/sec):     38.648
>
> Stddev Bandwidth:       157.323
> Max bandwidth (MB/sec): 964
> Min bandwidth (MB/sec): 0
> Average Latency:        3.21126
> Stddev Latency:         51.9546
> Max latency:            910.72
> Min latency:            0.04516
> 
> 
> According to ceph.log, we're not experiencing any OSD flapping or monitor 
> election cycles, just slow requests:
> 
> # grep slow /var/log/ceph/ceph.log:
> 2015-01-05 13:42:42.937678 osd.18 39.7.48.7:6803/11185 220 : [WRN] 3 slow 
> requests, 1 included below; oldest blocked for > 513.611379 secs
> 2015-01-05 13:42:42.937685 osd.18 39.7.48.7:6803/11185 221 : [WRN] slow 
> request 30.136429 seconds old, received at 2015-01-05 13:42:12.801205: 
> osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
> 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops 
> from 3,37
> 2015-01-05 13:42:49.938681 osd.18 39.7.48.7:6803/11185 222 : [WRN] 3 slow 
> requests, 1 included below; oldest blocked for > 520.612372 secs
> 2015-01-05 13:42:49.938688 osd.18 39.7.48.7:6803/11185 223 : [WRN] slow 
> request 480.636547 seconds old, received at 2015-01-05 13:34:49.302080: 
> osd_op(client.92008.1:3100010 rb.0.140d.238e1f29.0c77 [write 
> 3622400~512] 3.d031a69f ondisk+write e994) v4 currently waiting for subops 
> from 26,37
> 2015-01-05 13:43:12.941838 osd.18 39.7.48.7:6803/11185 224 : [WRN] 3 slow 
> requests, 1 included below; oldest blocked for > 543.615545 secs
> 2015-01-05 13:43:12.941844 osd.18 39.7.48.7:6803/11185 225 : [WRN] slow 
> request 60.140595 seconds old, received at 2015-01-05 13:42:12.801205: 
> osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
> 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops 
> from 3,37
> 2015-01-05 13:44:04.933440 osd.14 39.7.48.7:6818/11640 251 : [WRN] 4 slow 
> requests, 1 included below; oldest blocked for > 606.941954 secs
> 2015-01-05 13:44:04.933469 osd.14 39.7.48.7:6818/11640 252 : [WRN] slow 
> request 240.101138 seconds old, received at 2015-01-05 13:40:04.832272: 
> osd_op(client.92008.1:3101102 rb.0.142b.238e1f29.0010 [write 
> 475136~512] 3.5e623815 ondisk+write e994) v4 currently waiting for subops 
> from 27,33
> 2015-01-05 13:44:12.950805 osd.18 39.7.48.7:6803/11185 226 : [WRN] 3 slow 
> requests, 1 included below; oldest blocked for > 603.624511 secs
> 2015-01-05 13:44:12.950812 osd.18 39.7.48.7:6803/11185 227 : [WRN] slow 
> request 120.149561 seconds old, received at 2015-01-05 13:42:12.801205: 
> osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
> 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops 
> from 3,37
> 2015-01-05 13:46:12.988010 osd.18 39.7.48.7:6803/11185 228 : [WRN] 3 slow 
> requests, 1 included below; oldest blocked for > 723.661722 secs
> 2015-01-05 13:46:12.988017 osd.18 39.7.48.7:6803/11185 229 : [WRN] slow 
> request 240.186772 seconds old, received at 2015-01-05 13:42:12.801205: 
> osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
> 114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops 
> from 3,37
> 2015-01-05 13:46:18.971570 osd.14 39.7.48.7:6818/11640 253 : [WRN] 4 slow 
> requests, 1 included below; oldest blocked for > 740.980083 secs
> 2015-01-05 13:46:18.971577 osd.14 39.7.48.7:6818/11640 254 : [WRN] slow 
> request 480.063439 seconds old, received at 2015-01-05 13:38:18.908100: 
> osd_op(client.91911.1:3113675 rb.0.13f5.238e1f29.0010 [write 
> 475136~512] 3.679a939d ondisk+write e994) v4 currently waiting for subops 
> from 27,34
> 2015-01-05 13:48:05.030581 osd.14 39.7.48.7:6818/11640 255 : [WRN] 4 slow 
> requests, 1 included below; oldest blocked for > 847.039098 secs
> 2015-01-05 13:48:05.030587 osd.14 39.7.48.7:6818/11640 256 : [WRN] slow 
> request 480.198282 seconds old, rec

Re: [ceph-users] Slow/Hung IOs

2015-01-05 Thread Sanders, Bill

Hi,

Yeah, the performance when the system isn't waiting for slow IOs is definitely
acceptable for what I'm doing. It's just the handful of slow IOs messing up
the overall latency.

Bill

From: Gonzalo Aguilar Delgado [gagui...@aguilardelgado.com]
Sent: Monday, January 05, 2015 3:47 PM
To: Sanders, Bill
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Slow/Hung IOs

Hi,

I just ran the same test and my system does no better, though I use commodity 
hardware. The only real difference is latency, so that is where you should look.

Total time run:         62.412381
Total writes made:      919
Write size:             4194304
Bandwidth (MB/sec):     58.899

Stddev Bandwidth:       27.7791
Max bandwidth (MB/sec): 108
Min bandwidth (MB/sec): 0
Average Latency:        2.15587
Stddev Latency:         1.1327
Max latency:            6.42248
Min latency:            0.444754

I use 2 nodes on a 1GB network, but only get 100 MB/s. I will look into it and 
increase the number of nodes.

Best regards,


On Mon, Jan 5, 2015 at 11:36, Sanders, Bill wrote:
Hi Ceph Users,

We've got a Ceph cluster we've built, and we're experiencing issues with slow 
or hung IOs, even running 'rados bench' on the OSD cluster.  Things start out 
great, ~600 MB/s, then rapidly drop off as the test waits for IOs. Nothing 
seems to be taxed... the system just seems to be waiting.  Any help trying to 
figure out what could cause the slow IOs is appreciated.

For example, 'rados -p rbd bench 60 write -t 32' takes over 900s to complete:

A typical rados bench:
Total time run:         957.458274
Total writes made:      9251
Write size:             4194304
Bandwidth (MB/sec):     38.648

Stddev Bandwidth:       157.323
Max bandwidth (MB/sec): 964
Min bandwidth (MB/sec): 0
Average Latency:        3.21126
Stddev Latency:         51.9546
Max latency:            910.72
Min latency:            0.04516


According to ceph.log, we're not experiencing any OSD flapping or monitor 
election cycles, just slow requests:

# grep slow /var/log/ceph/ceph.log:
2015-01-05 13:42:42.937678 osd.18 39.7.48.7:6803/11185 220 : [WRN] 3 slow 
requests, 1 included below; oldest blocked for > 513.611379 secs
2015-01-05 13:42:42.937685 osd.18 39.7.48.7:6803/11185 221 : [WRN] slow request 
30.136429 seconds old, received at 2015-01-05 13:42:12.801205: 
osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 
3,37
2015-01-05 13:42:49.938681 osd.18 39.7.48.7:6803/11185 222 : [WRN] 3 slow 
requests, 1 included below; oldest blocked for > 520.612372 secs
2015-01-05 13:42:49.938688 osd.18 39.7.48.7:6803/11185 223 : [WRN] slow request 
480.636547 seconds old, received at 2015-01-05 13:34:49.302080: 
osd_op(client.92008.1:3100010 rb.0.140d.238e1f29.0c77 [write 
3622400~512] 3.d031a69f ondisk+write e994) v4 currently waiting for subops from 
26,37
2015-01-05 13:43:12.941838 osd.18 39.7.48.7:6803/11185 224 : [WRN] 3 slow 
requests, 1 included below; oldest blocked for > 543.615545 secs
2015-01-05 13:43:12.941844 osd.18 39.7.48.7:6803/11185 225 : [WRN] slow request 
60.140595 seconds old, received at 2015-01-05 13:42:12.801205: 
osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 
3,37
2015-01-05 13:44:04.933440 osd.14 39.7.48.7:6818/11640 251 : [WRN] 4 slow 
requests, 1 included below; oldest blocked for > 606.941954 secs
2015-01-05 13:44:04.933469 osd.14 39.7.48.7:6818/11640 252 : [WRN] slow request 
240.101138 seconds old, received at 2015-01-05 13:40:04.832272: 
osd_op(client.92008.1:3101102 rb.0.142b.238e1f29.0010 [write 
475136~512] 3.5e623815 ondisk+write e994) v4 currently waiting for subops from 
27,33
2015-01-05 13:44:12.950805 osd.18 39.7.48.7:6803/11185 226 : [WRN] 3 slow 
requests, 1 included below; oldest blocked for > 603.624511 secs
2015-01-05 13:44:12.950812 osd.18 39.7.48.7:6803/11185 227 : [WRN] slow request 
120.149561 seconds old, received at 2015-01-05 13:42:12.801205: 
osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 
3,37
2015-01-05 13:46:12.988010 osd.18 39.7.48.7:6803/11185 228 : [WRN] 3 slow 
requests, 1 included below; oldest blocked for > 723.661722 secs
2015-01-05 13:46:12.988017 osd.18 39.7.48.7:6803/11185 229 : [WRN] slow request 
240.186772 seconds old, received at 2015-01-05 13:42:12.801205: 
osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 
3,37
2015-01-05 13:46:18.971570 osd.14 39.7.48.7:6818/11640 253 : [WRN] 4 slow 
requests, 1 included below; oldest blocked for > 740.980083 secs
2015-01-05 13:46:18.971577 osd.14 39.7.48.7:6818/11

[ceph-users] Slow/Hung IOs

2015-01-05 Thread Sanders, Bill

Hi Ceph Users,

We've got a Ceph cluster we've built, and we're experiencing issues with slow 
or hung IOs, even running 'rados bench' on the OSD cluster.  Things start out 
great, ~600 MB/s, then rapidly drop off as the test waits for IOs. Nothing 
seems to be taxed... the system just seems to be waiting.  Any help trying to 
figure out what could cause the slow IOs is appreciated.

For example, 'rados -p rbd bench 60 write -t 32' takes over 900s to complete:

A typical rados bench:
Total time run:         957.458274
Total writes made:      9251
Write size:             4194304
Bandwidth (MB/sec):     38.648

Stddev Bandwidth:       157.323
Max bandwidth (MB/sec): 964
Min bandwidth (MB/sec): 0
Average Latency:        3.21126
Stddev Latency:         51.9546
Max latency:            910.72
Min latency:            0.04516
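
As a sanity check on the summary, the reported bandwidth appears to be simply the number of writes times the write size, divided by the total run time, with "MB" meaning 2^20 bytes. That reading is an assumption inferred from the numbers in this thread, not taken from the rados source. The helper name `bench_bandwidth` is hypothetical. A minimal sketch:

```python
MIB = 1024 * 1024  # assumption: "MB" in the bench summary means 2**20 bytes


def bench_bandwidth(total_writes, write_size_bytes, total_time_s):
    """Average bandwidth in MiB/s over the whole run."""
    return total_writes * write_size_bytes / MIB / total_time_s


# Figures copied from the two 'rados bench' summaries in this thread.
slow_run = bench_bandwidth(9251, 4194304, 957.458274)
commodity_run = bench_bandwidth(919, 4194304, 62.412381)

print(round(slow_run, 3))       # compare with the reported 38.648
print(round(commodity_run, 3))  # compare with the reported 58.899
```

Because the run time is the denominator, a test requested for 60 seconds that takes over 900 seconds to drain its outstanding writes ends up with a low average, even though throughput starts at ~600 MB/s.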


According to ceph.log, we're not experiencing any OSD flapping or monitor 
election cycles, just slow requests:

# grep slow /var/log/ceph/ceph.log:
2015-01-05 13:42:42.937678 osd.18 39.7.48.7:6803/11185 220 : [WRN] 3 slow 
requests, 1 included below; oldest blocked for > 513.611379 secs
2015-01-05 13:42:42.937685 osd.18 39.7.48.7:6803/11185 221 : [WRN] slow request 
30.136429 seconds old, received at 2015-01-05 13:42:12.801205: 
osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 
3,37
2015-01-05 13:42:49.938681 osd.18 39.7.48.7:6803/11185 222 : [WRN] 3 slow 
requests, 1 included below; oldest blocked for > 520.612372 secs
2015-01-05 13:42:49.938688 osd.18 39.7.48.7:6803/11185 223 : [WRN] slow request 
480.636547 seconds old, received at 2015-01-05 13:34:49.302080: 
osd_op(client.92008.1:3100010 rb.0.140d.238e1f29.0c77 [write 
3622400~512] 3.d031a69f ondisk+write e994) v4 currently waiting for subops from 
26,37
2015-01-05 13:43:12.941838 osd.18 39.7.48.7:6803/11185 224 : [WRN] 3 slow 
requests, 1 included below; oldest blocked for > 543.615545 secs
2015-01-05 13:43:12.941844 osd.18 39.7.48.7:6803/11185 225 : [WRN] slow request 
60.140595 seconds old, received at 2015-01-05 13:42:12.801205: 
osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 
3,37
2015-01-05 13:44:04.933440 osd.14 39.7.48.7:6818/11640 251 : [WRN] 4 slow 
requests, 1 included below; oldest blocked for > 606.941954 secs
2015-01-05 13:44:04.933469 osd.14 39.7.48.7:6818/11640 252 : [WRN] slow request 
240.101138 seconds old, received at 2015-01-05 13:40:04.832272: 
osd_op(client.92008.1:3101102 rb.0.142b.238e1f29.0010 [write 
475136~512] 3.5e623815 ondisk+write e994) v4 currently waiting for subops from 
27,33
2015-01-05 13:44:12.950805 osd.18 39.7.48.7:6803/11185 226 : [WRN] 3 slow 
requests, 1 included below; oldest blocked for > 603.624511 secs
2015-01-05 13:44:12.950812 osd.18 39.7.48.7:6803/11185 227 : [WRN] slow request 
120.149561 seconds old, received at 2015-01-05 13:42:12.801205: 
osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 
3,37
2015-01-05 13:46:12.988010 osd.18 39.7.48.7:6803/11185 228 : [WRN] 3 slow 
requests, 1 included below; oldest blocked for > 723.661722 secs
2015-01-05 13:46:12.988017 osd.18 39.7.48.7:6803/11185 229 : [WRN] slow request 
240.186772 seconds old, received at 2015-01-05 13:42:12.801205: 
osd_op(client.92008.1:3101508 rb.0.1437.238e1f29.000f [write 
114688~512] 3.841c0edf ondisk+write e994) v4 currently waiting for subops from 
3,37
2015-01-05 13:46:18.971570 osd.14 39.7.48.7:6818/11640 253 : [WRN] 4 slow 
requests, 1 included below; oldest blocked for > 740.980083 secs
2015-01-05 13:46:18.971577 osd.14 39.7.48.7:6818/11640 254 : [WRN] slow request 
480.063439 seconds old, received at 2015-01-05 13:38:18.908100: 
osd_op(client.91911.1:3113675 rb.0.13f5.238e1f29.0010 [write 
475136~512] 3.679a939d ondisk+write e994) v4 currently waiting for subops from 
27,34
2015-01-05 13:48:05.030581 osd.14 39.7.48.7:6818/11640 255 : [WRN] 4 slow 
requests, 1 included below; oldest blocked for > 847.039098 secs
2015-01-05 13:48:05.030587 osd.14 39.7.48.7:6818/11640 256 : [WRN] slow request 
480.198282 seconds old, received at 2015-01-05 13:40:04.832272: 
osd_op(client.92008.1:3101102 rb.0.142b.238e1f29.0010 [write 
475136~512] 3.5e623815 ondisk+write e994) v4 currently waiting for subops from 
27,33
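
To see whether particular OSDs keep showing up as the pending replicas, the "waiting for subops from" entries can be tallied. The following is a rough sketch, assuming the log lines look exactly like the excerpt above (including the mail-client line wrapping). The function name `count_subop_osds` and the de-duplication rule are illustrative choices, not part of any Ceph tool.

```python
import re
from collections import Counter

# Matches one "slow request" entry once whitespace has been normalized.
# The summary lines ("3 slow requests, 1 included below") do not match.
SLOW_RE = re.compile(
    r"osd\.(?P<primary>\d+) \S+ \d+ : \[WRN\] slow request "
    r"(?P<age>[\d.]+) seconds old, received at "
    r"(?P<received>\d{4}-\d\d-\d\d [\d:.]+): .*?"
    r"waiting for subops from (?P<subops>\d+(?:,\d+)*)"
)


def count_subop_osds(log_text):
    """Count how often each OSD is named as a pending subop target.

    The same request is reported again each time it ages past a threshold,
    so entries are de-duplicated on (reporting OSD, received timestamp).
    """
    flat = " ".join(log_text.split())  # undo line wrapping
    seen = set()
    counts = Counter()
    for m in SLOW_RE.finditer(flat):
        key = (m["primary"], m["received"])
        if key in seen:
            continue
        seen.add(key)
        for osd in m["subops"].split(","):
            counts[int(osd)] += 1
    return counts


# Usage (path as in the grep command above):
# with open("/var/log/ceph/ceph.log") as f:
#     print(count_subop_osds(f.read()).most_common())
```

If one OSD or one host's OSDs dominate the counts, that is a hint about where to look. An even spread points at something shared, such as the network.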


iostat -dxm:
Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda     5662.99    3.63  155.17    3.37   22.94    0.03    296.72      1.05   6.64   1.79  28.32
sdb        0.73    0.04    3.62   70.66    0.04   10.85    300.04      0.06   0.79   0.08   0.61
sdc        0.70    0.00    1.70   65.45    0.01   10.34    315.60      0.06   0.85   0.09   0.57
sdd        0.07    0.95   16.78
