Re: [ceph-users] (yet another) multi active mds advise needed

2018-05-19 Thread Webert de Souza Lima
Hi Daniel,

Thanks for clarifying.
I'll have a look at the dirfrag option.
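
A related knob I want to look at is explicit subtree pinning (the
ceph.dir.pin xattr, available since Luminous). A minimal sketch, assuming
the filesystem is mounted at /mnt/cephfs and with made-up domain paths:

# pin the two busiest domain trees to fixed MDS ranks
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/MAIL/domain-a
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/MAIL/domain-b
# a value of -1 hands a subtree back to the automatic balancer
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/MAIL/domain-a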

Regards,
Webert Lima

On Sat, 19 May 2018 at 01:18, Daniel Baumann wrote:

> On 05/19/2018 01:13 AM, Webert de Souza Lima wrote:
> > New question: will it make any difference in the balancing if, instead of
> > having the MAIL directory in the root of cephfs and the domains' subtrees
> > inside it, I discard the parent dir and put all the subtrees right in the
> > cephfs root?
>
> The balancing between the MDSs is influenced by which directories are
> accessed: the currently accessed directory trees are divided between the
> MDSs (also check the dirfrag option in the docs). Assuming you have the
> same access pattern, the "fragmentation" between the MDSs happens at
> these "target directories", so it doesn't matter whether these directories
> are further up or down in the same filesystem tree.
>
> In the multi-MDS scenario where the MDS serving rank 0 fails, the
> effects at the moment of the failure for any cephfs client accessing a
> directory/file are the same (as described in an earlier mail),
> regardless of which level the directory/file is at within the filesystem.
>
> Regards,
> Daniel


Re: [ceph-users] Interpreting reason for blocked request

2018-05-19 Thread Bryan Henderson
>>> 2018-05-03 01:56:35.249122 osd.0 192.168.1.16:6800/348 54 :
>>>   cluster [WRN] slow request 961.557151 seconds old,
>>>   received at 2018-05-03 01:40:33.689191:
>>> pg_query(4.f epoch 490) currently wait for new map
>>>
>
>The OSD is waiting for a new OSD map, which it will get from one of its
>peers or the monitor (by request). This tends to happen if the client sees
>a newer version than the OSD does.

Hmmm.  So the client gets the current OSD map from the Monitor and then
indicates in its request to the OSD which map epoch it is using?  And if the
OSD has an older map, it requests a new one from another OSD or Monitor before
proceeding?  And I suppose if the OSD's current epoch is still older than the
one the client indicated, the OSD keeps retrying until it gets the epoch the
client stated.

If that's so, this situation could happen if for some reason the client got
the idea that there is a newer map than there really is.

What I'm looking at is probably just a Ceph bug, because this small test
cluster got into this state immediately upon startup, before any client had
connected (I assume these blocked requests are from inside the cluster), and
the requests aren't just blocked for a long time; they're blocked
indefinitely.  The only time I've seen it is when I brought the cluster up in
a different order than I usually do.  So I'm just trying to understand the
inner workings in case it keeps happening and I need to debug it.
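
For what it's worth, the cluster-side epochs can be compared directly; a
rough sketch, assuming admin-socket access on the OSD host and using osd.0
purely as an example id:

# epoch of the map the monitors currently serve
ceph osd dump | head -1
# map range this OSD knows about (oldest_map/newest_map) and its state
ceph daemon osd.0 status
# the stuck requests themselves, with the event each is waiting on
ceph daemon osd.0 dump_blocked_ops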

-- 
Bryan Henderson   San Jose, California


Re: [ceph-users] Multi-MDS Failover

2018-05-19 Thread Blair Bethwaite
On 19 May 2018 at 09:20, Scottix  wrote:
> It would be nice to have an option to have all IO blocked if it hits a 
> degraded state until it recovers. Since you are unaware of other MDS state, 
> seems like that would be tough to do.

I agree this would be a nice knob to have from the perspective of
having consistent (and easy to diagnose) client behaviour when such a
situation occurs. However, I don't think this is possible: if a client
is working in a directory served via the rank-0 MDS (whilst rank-1 has
just gone down), it isn't going to know rank-1 is down until the MONs
do. So to get the "all stop" you are talking about, the client would
then have to undo already committed IO(!); the only other option would
be "pinging" all ranks on every metadata change, and that sounds
horrible.

Maybe this is a case where you'd be better off putting NFS in front of
your CephFS?
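
Short of such a knob, the practical mitigation is keeping the failover
window small with standby daemons. A rough sketch of the relevant commands,
where the filesystem name "cephfs" and the rank count are just examples:

# warn whenever fewer standby MDS daemons are available than expected
ceph fs set cephfs standby_count_wanted 1
# don't run more active ranks than you actually need
ceph fs set cephfs max_mds 2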

-- 
Cheers,
~Blairo


Re: [ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-19 Thread Brad Hubbard
On Sat, May 19, 2018 at 5:01 PM, Uwe Sauter  wrote:
>>>>> The mystery is that these blocked requests occur numerously when at
>>>>> least one of the 6 servers is booted with kernel 4.15.17; if all are
>>>>> running 4.13.16 the number of blocked requests is infrequent and low.
>>>>
>>>> Sounds like you need to profile your two kernel versions and work out
>>>> why one is under-performing.
>>>>
>>>
>>> Well, the problem is that I see this behavior only in our production
>>> system (6 hosts and 22 OSDs total). The test system I have is
>>> a bit smaller (only 3 hosts with 12 OSDs on older hardware) and shows no
>>> sign of this possible regression…
>>
>>
>> Are you saying you can't gather performance data from your production
>> system?
>
>
> As far as I can tell the issue only occurs on the production cluster.
> Without a way to reproduce on the test cluster I can't bisect the kernels,
> as the production cluster runs our central infrastructure, and each time
> the active LDAP server is stuck, most of the other services are stuck as
> well… My colleagues won't appreciate that.
>
> What other kind of performance data would you have collected?
>

On systems where this can be reproduced I would use tools like 'perf
top', pcp, collectd and maybe something like the following to capture
data that can be analysed to define the nature of the issue.

// gathers basic CPU/memory/IO snapshots; written for rhel6 and rhel7 so may need modification

# { top -n 5 -b > /tmp/top.out; \
vmstat 1 50 > /tmp/vm.out; \
iostat -tkx -p ALL 1 10 > /tmp/io.out; \
mpstat -A 1 10 > /tmp/mp.out; \
ps auwwx > /tmp/ps1.out; \
ps axHo %cpu,stat,pid,tid,pgid,ppid,comm,wchan > /tmp/ps2.out; \
sar -A 1 50 > /tmp/sar.out; \
free > /tmp/free.out; } ; \
tar -cjvf outputs_$(hostname)_$(date +"%d-%b-%Y_%H%M").tar.bz2 /tmp/*.out

As you've already pointed out, this currently seems to be a kernel
performance issue, but analysis of this sort of data should help you
narrow it down.

Of course, all of this relies on you being able to reproduce the
issue, but maybe you can gather a baseline to begin with so you have
something to compare to when you are in a position to gather perf data
during an issue.

At the same time I'd suggest pursuing this with Proxmox and/or Ubuntu
to see if they have anything to offer.

-- 
Cheers,
Brad


Re: [ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-19 Thread Uwe Sauter



On 19.05.2018 at 01:45, Brad Hubbard wrote:
>>> On Thu, May 17, 2018 at 6:06 PM, Uwe Sauter  wrote:
>>>> Brad,
>>>>
>>>> thanks for the bug report. This is exactly the problem I am having
>>>> (log-wise).
>>>>
>>>>> You don't give any indication what version you are running but see
>>>>> https://tracker.ceph.com/issues/23205
>>>>
>>>> the cluster is a Proxmox installation which is based on an Ubuntu kernel.
>>>>
>>>> # ceph -v
>>>> ceph version 12.2.5 (dfcb7b53b2e4fcd2a5af0240d4975adc711ab96e) luminous
>>>> (stable)
>>>>
>>>> The mystery is that these blocked requests occur numerously when at least
>>>> one of the 6 servers is booted with kernel 4.15.17; if all are running
>>>> 4.13.16 the number of blocked requests is infrequent and low.
>>>
>>> Sounds like you need to profile your two kernel versions and work out
>>> why one is under-performing.
>>
>> Well, the problem is that I see this behavior only in our production
>> system (6 hosts and 22 OSDs total). The test system I have is a bit
>> smaller (only 3 hosts with 12 OSDs on older hardware) and shows no sign
>> of this possible regression…
>
> Are you saying you can't gather performance data from your production system?

As far as I can tell the issue only occurs on the production cluster.
Without a way to reproduce on the test cluster I can't bisect the kernels,
as the production cluster runs our central infrastructure, and each time
the active LDAP server is stuck, most of the other services are stuck as
well… My colleagues won't appreciate that.

What other kind of performance data would you have collected?

Uwe