Hello,
On Thu, 02 Feb 2017 10:24:53 +0100 Eugen Block wrote:
> Hi,
>
> thank you very much for your answer! I'm not sure I get all your
> points, but I'll try to dig deeper.
>
I'll repeat myself and say that looking at your nodes with atop during a
benchmark and slow request situation should [...]
Hi,
thank you very much for your answer! I'm not sure I get all your
points, but I'll try to dig deeper.
> IOPS are more likely to be your bottleneck
Can you give me an example of reasonable numbers for ops/s, especially
for our small cluster with 1 Gb/s? I have no idea what could be
con[...]
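For a rough sense of scale (a back-of-envelope sketch of my own, not numbers from the thread): a single 1 Gb/s link moves at most about 125 MB/s raw, and with replication every client write crosses the network roughly `size` times, so sequential write bandwidth is bounded well below link speed; small (e.g. 4 KB) writes are usually latency-bound rather than bandwidth-bound, so their ops/s ceiling is set by round-trip times, not by this figure.

```shell
# Back-of-envelope sketch. Assumptions (mine, not from the thread):
# 1 Gb/s link ~= 125 MB/s raw, all replication traffic shares that
# one link; real payload is lower due to protocol overhead.
awk -v link_mbs=125 -v repl=3 'BEGIN {
    # Each client write is transmitted roughly repl times, so the
    # aggregate client write bandwidth is about link speed / repl.
    printf "~%.0f MB/s client writes at size=%d\n", link_mbs / repl, repl
}'
```

With `repl=1` the same sketch gives the raw link ceiling; anything measured far below these bounds points at disks or latency rather than the network.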
Hello,
On Wed, 01 Feb 2017 15:16:15 +0100 Eugen Block wrote:
> > You've told us absolutely nothing about your cluster
>
> You're right, I'll try to provide as much information as possible.
>
> Please note that we have kind of a "special" layout... The HDDs on
> ndesan01 are in a RAID6, whi[...]
> You've told us absolutely nothing about your cluster
You're right, I'll try to provide as much information as possible.
Please note that we have kind of a "special" layout... The HDDs on
ndesan01 are in a RAID6, which is bcached by two SSDs in a RAID1:
---cut here---
ndesan01: # cat /proc/m[...]
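For readers following along, the usual way to see such an md layout is `cat /proc/mdstat` (plus `lsblk` for the bcache stacking); a minimal parsing sketch over a made-up excerpt (the device names and array shapes below are illustrative, not ndesan01's actual output):

```shell
# Hypothetical /proc/mdstat excerpt, for illustration only; on the
# real host this would come from: cat /proc/mdstat
sample='md1 : active raid6 sdc1[0] sdd1[1] sde1[2] sdf1[3]
md0 : active raid1 sda2[0] sdb2[1]'
# Column 1 is the array name, column 4 the RAID personality, which
# quickly shows which array plays which role:
printf '%s\n' "$sample" | awk '{print $1, $4}'
```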
Hello,
On Wed, 01 Feb 2017 11:43:02 +0100 Eugen Block wrote:
> Hi,
>
> I haven't tracked the slow requests yet, but I ran some performance
> tests. Although I'm not an expert, I believe the results are quite
> unsatisfying.
>
You've told us absolutely nothing about your cluster that would b[...]
Hi,
I haven't tracked the slow requests yet, but I ran some performance
tests. Although I'm not an expert, I believe the results are quite
unsatisfying.
I ran a couple of rados bench tests in a pool, with different
replication sizes (1 to 3). The tests were executed on one of the ceph
n[...]
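A sketch of the kind of benchmark run described above (the pool name `bench` is my assumption; the bandwidth value below is made up for illustration). The `rados bench` invocations need a live cluster, so they are shown as comments, followed by pulling the summary figure out of saved output:

```shell
# Run on a cluster node; repeat with pool size 1, 2 and 3:
#   ceph osd pool set bench size 3
#   rados bench -p bench 60 write --no-cleanup
#   rados bench -p bench 60 seq
# The headline number sits on the summary line of the saved output
# (the value here is a made-up sample, not a real measurement):
sample='Bandwidth (MB/sec):     38.516'
printf '%s\n' "$sample" | awk '{print $3}'
```

Comparing that figure across replication sizes against the link-speed bound is what separates a network bottleneck from a disk one.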
Check the latency figures in a "perf dump". High numbers in a
particular area may help you nail it.
I suspect though, that it may come down to enabling debug logging and
tracking a slow request through the logs.
On Thu, Jan 12, 2017 at 8:41 PM, Eugen Block wrote:
> Hi,
>
>> Looking at the output
Hi,
> Looking at the output of dump_historic_ops and dump_ops_in_flight
I waited for new slow request messages and dumped the historic_ops
into a file. The reporting OSD shows lots of "waiting for rw locks"
messages and a duration of more than 30 secs:
"age": 366.044746,
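To single out only the really old entries in a saved dump, a filter like the following works (the 30 s threshold matches the default slow-request warning; the second sample line is made up as a counter-example):

```shell
# Dumping ops needs admin-socket access on the OSD node:
#   ceph daemon osd.16 dump_historic_ops
#   ceph daemon osd.16 dump_ops_in_flight
# Keep only "age" fields above 30 seconds from a saved dump:
printf '%s\n' '"age": 366.044746,' '"age": 4.1,' |
awk -F'[:,]' '$1 ~ /age/ && $2 > 30 { printf "%.2f\n", $2 }'
```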
On Thu, Jan 12, 2017 at 2:19 AM, Eugen Block wrote:
> Hi,
>
> I simply grepped for "slow request" in ceph.log. What exactly do you mean by
> "effective OSD"?
>
> If I have this log line:
> 2017-01-11 [...] osd.16 [...] cluster [WRN] slow request 32.868141 seconds
> old, received at 2017-01-11 [...]
Hi,
I simply grepped for "slow request" in ceph.log. What exactly do you
mean by "effective OSD"?
If I have this log line:
2017-01-11 [...] osd.16 [...] cluster [WRN] slow request 32.868141
seconds old, received at 2017-01-11 [...]
ack+ondisk+write+known_if_redirected e12440) currently wa[...]
Hi,
just for clarity:
Did you parse the slow request messages and use the effective OSD in the
statistics? Some messages may refer to other OSDs, e.g. "waiting for sub
op on OSD X,Y". The reporting OSD is not the root cause in that case,
but one of the mentioned OSDs (and I'm currently not a[...]
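A sketch of extracting the effective OSDs from such a line; the sample line and the exact "waiting for subops from" wording below are assumptions on my part (the message text varies between Ceph versions), so adjust the pattern to what your ceph.log actually shows:

```shell
# Made-up slow-request line in the style quoted above:
line='osd.16 cluster [WRN] slow request 32.868141 seconds old, currently waiting for subops from 3,7'
case "$line" in
  *"waiting for subops from"*)
    # Blame the peer OSDs named at the end of the line:
    echo "${line##*waiting for subops from }" ;;
  *)
    # Otherwise fall back to the reporting OSD (first token):
    echo "${line%% *}" ;;
esac
```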
Hi list,
I'm having trouble with slow requests; they have a noticeable impact
on performance. I'd like to find out what the root cause is; I guess
there are a lot of possible causes. I'll just describe what I'm seeing,
and hopefully someone can give advice.
I just counted the occur[...]
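One common way to count such occurrences per reporting OSD (the log lines below are made-up samples; on a real node the input would be `grep 'slow request' /var/log/ceph/ceph.log`, and the field position of the OSD name can differ by version):

```shell
# Count slow-request warnings per reporting OSD; field 3 holds the
# OSD name in these sample lines (adjust for your log format):
printf '%s\n' \
  '2017-01-11 10:00:00 osd.16 cluster [WRN] slow request 32.8 seconds old' \
  '2017-01-11 10:00:05 osd.16 cluster [WRN] slow request 40.1 seconds old' \
  '2017-01-11 10:00:09 osd.3 cluster [WRN] slow request 31.0 seconds old' |
grep 'slow request' | awk '{print $3}' | sort | uniq -c | sort -rn
```

Remember the caveat above: for "waiting for sub op" messages the reporting OSD counted here is not necessarily the culprit.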