Re: [ceph-users] slow requests break performance

2017-02-03 Thread Christian Balzer
Hello,
On Thu, 02 Feb 2017 10:24:53 +0100 Eugen Block wrote:
> Hi,
>
> thank you very much for your answer! I'm not sure I get all your
> points, but I'll try to dig deeper.
>
I'll repeat myself and say that looking at your nodes with atop during a benchmark and slow request situation

Re: [ceph-users] slow requests break performance

2017-02-01 Thread Christian Balzer
Hello,
On Wed, 01 Feb 2017 15:16:15 +0100 Eugen Block wrote:
> > You've told us absolutely nothing about your cluster
>
> You're right, I'll try to provide as much information as possible.
>
> Please note that we have kind of a "special" layout... The HDDs on
> ndesan01 are in a RAID6,

Re: [ceph-users] slow requests break performance

2017-02-01 Thread Eugen Block
> You've told us absolutely nothing about your cluster
You're right, I'll try to provide as much information as possible.
Please note that we have kind of a "special" layout... The HDDs on ndesan01 are in a RAID6, which is bcached by two SSDs in a RAID1:
---cut here---
ndesan01: # cat

Re: [ceph-users] slow requests break performance

2017-02-01 Thread Christian Balzer
Hello,
On Wed, 01 Feb 2017 11:43:02 +0100 Eugen Block wrote:
> Hi,
>
> I haven't tracked the slow requests yet, but I ran some performance
> tests. Although I'm not an expert, I believe the results are quite
> unsatisfying.
>
You've told us absolutely nothing about your cluster that would

Re: [ceph-users] slow requests break performance

2017-02-01 Thread Eugen Block
Hi, I haven't tracked the slow requests yet, but I ran some performance tests. Although I'm not an expert, I believe the results are quite unsatisfying. I ran a couple of rados bench tests in a pool, with different replication sizes (1 to 3). The tests were executed on one of the ceph

Re: [ceph-users] slow requests break performance

2017-01-12 Thread Brad Hubbard
Check the latency figures in a "perf dump". High numbers in a particular area may help you nail it. I suspect, though, that it may come down to enabling debug logging and tracking a slow request through the logs.
On Thu, Jan 12, 2017 at 8:41 PM, Eugen Block wrote:
> Hi,
>
>>
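As a rough illustration of reading the latency figures from a "perf dump": the sketch below assumes the dump is JSON (for example saved from "ceph daemon osd.N perf dump") in which averaged counters are objects with an "avgcount" and a "sum" field, so the mean is sum / avgcount. The function name and the sample values are made up for the example and are not taken from the cluster discussed in this thread. Not every counter of this shape is a latency, so the ranking is only a starting point.

```python
import json


def average_counters(perf, prefix=""):
    """Return (counter_path, sum / avgcount) pairs, largest average first.

    Assumption: averaged counters are dicts holding both "avgcount" and "sum".
    Counters with avgcount == 0 are skipped to avoid dividing by zero.
    """
    results = []
    for name, value in perf.items():
        path = f"{prefix}.{name}" if prefix else name
        if not isinstance(value, dict):
            continue  # plain gauges such as a PG count carry no average
        if "avgcount" in value and "sum" in value:
            if value["avgcount"]:
                results.append((path, value["sum"] / value["avgcount"]))
        else:
            # Descend into subsystem sections of the dump.
            results.extend(average_counters(value, path))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Made-up sample data, only to show the expected shape of the input.
SAMPLE = json.loads("""
{
  "osd": {
    "numpg": 128,
    "op_latency":   {"avgcount": 200, "sum": 2.0},
    "op_w_latency": {"avgcount": 100, "sum": 50.0},
    "op_r_latency": {"avgcount": 0,   "sum": 0.0}
  },
  "filestore": {
    "journal_latency": {"avgcount": 10, "sum": 1.0}
  }
}
""")

if __name__ == "__main__":
    for path, avg in average_counters(SAMPLE):
        print(f"{path}: {avg:.3f}")
```

Comparing the top entries across OSDs is one way to spot the "particular area" with unusually high numbers.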

Re: [ceph-users] slow requests break performance

2017-01-12 Thread Eugen Block
Hi, Looking at the output of dump_historic_ops and dump_ops_in_flight I waited for new slow request messages and dumped the historic_ops into a file. The reporting OSD shows lots of "waiting for rw locks" messages and a duration of more than 30 secs: "age": 366.044746,

Re: [ceph-users] slow requests break performance

2017-01-11 Thread Brad Hubbard
On Thu, Jan 12, 2017 at 2:19 AM, Eugen Block wrote:
> Hi,
>
> I simply grepped for "slow request" in ceph.log. What exactly do you mean by
> "effective OSD"?
>
> If I have this log line:
> 2017-01-11 [...] osd.16 [...] cluster [WRN] slow request 32.868141 seconds
> old, received at

Re: [ceph-users] slow requests break performance

2017-01-11 Thread Eugen Block
Hi, I simply grepped for "slow request" in ceph.log. What exactly do you mean by "effective OSD"? If I have this log line: 2017-01-11 [...] osd.16 [...] cluster [WRN] slow request 32.868141 seconds old, received at 2017-01-11 [...] ack+ondisk+write+known_if_redirected e12440) currently

Re: [ceph-users] slow requests break performance

2017-01-11 Thread Burkhard Linke
Hi, just for clarity: Did you parse the slow request messages and use the effective OSD in the statistics? Some messages may refer to other OSDs, e.g. "waiting for sub op on OSD X,Y". The reporting OSD is not the root cause in that case, but one of the mentioned OSDs (and I'm currently not
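A minimal sketch of counting by "effective OSD" rather than by reporting OSD. It assumes the log line names the reporter as "osd.N" before the words "slow request", and that a request stuck on replicas ends with "waiting for subops from X,Y"; that exact wording is an assumption, so adjust the patterns to whatever your ceph.log actually contains. The sample lines are invented for the example.

```python
import re
from collections import Counter

# Assumed patterns; the exact wording in ceph.log may differ.
REPORTER_RE = re.compile(r"\bosd\.(\d+)\b.*slow request")
SUBOPS_RE = re.compile(r"waiting for subops from ([\d,]+)")


def count_effective_osds(lines):
    """Count slow requests per OSD that the message actually points at.

    If a line mentions sub-operations on other OSDs, those OSDs are counted
    instead of the reporter; otherwise the reporting OSD is counted.
    """
    counts = Counter()
    for line in lines:
        reporter = REPORTER_RE.search(line)
        if reporter is None:
            continue  # not a slow request message
        subops = SUBOPS_RE.search(line)
        if subops:
            for osd_id in subops.group(1).split(","):
                if osd_id:
                    counts[int(osd_id)] += 1
        else:
            counts[int(reporter.group(1))] += 1
    return counts


# Invented sample lines, shaped loosely like the message quoted in this thread.
SAMPLE_LINES = [
    "osd.16 cluster [WRN] slow request 32.8 seconds old, currently waiting for subops from 4,9",
    "osd.16 cluster [WRN] slow request 31.2 seconds old, currently waiting for rw locks",
    "osd.3 cluster [WRN] slow request 30.5 seconds old, currently waiting for subops from 4",
    "osd.1 cluster [INF] some unrelated message",
]

if __name__ == "__main__":
    for osd_id, hits in count_effective_osds(SAMPLE_LINES).most_common():
        print(f"osd.{osd_id}: {hits}")
```

With the sample above, osd.4 comes out on top although it never reported a slow request itself, which is the distinction being asked about.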

[ceph-users] slow requests break performance

2017-01-11 Thread Eugen Block
Hi list, I'm having trouble with slow requests; they have a noticeable impact on performance. I'd like to find out what the root cause is, though I guess there are a lot of possible causes. I'll just describe what I'm seeing, and hopefully someone can give advice. I just counted the