Re: [ceph-users] How to debug problem in MDS ?

2018-10-18 Thread Yan, Zheng
On Thu, Oct 18, 2018 at 3:35 PM Florent B  wrote:
>
> I'm not familiar with gdb. What do I need to do? Install the "-gdb" version
> of the ceph-mds package? And then what?
> Thank you
>

Install ceph with debug info and install gdb, then run 'gdb attach' with the pid of the ceph-mds process.
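
To get a backtrace of the stuck MDS, something along these lines should
work on Debian (the exact name of the debug-symbol package depends on
which repository your ceph packages come from):

  apt-get install gdb ceph-mds-dbg       # debug symbols; package name may differ per repo
  gdb -p $(pidof ceph-mds)               # attach to the spinning ceph-mds process
  (gdb) set pagination off
  (gdb) thread apply all bt              # backtrace of every thread
  (gdb) detach
  (gdb) quit

The 'thread apply all bt' output usually shows which code path the MDS
is spinning in.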

> On 18/10/2018 03:40, Yan, Zheng wrote:
> > On Thu, Oct 18, 2018 at 3:59 AM Florent B  wrote:
> >> Hi,
> >> I'm not running multiple active MDS daemons (1 active & 7 standby).
> >> I know about debug_mds 20; is that the only logging you need to see the bug?
> >>
> >> On 16/10/2018 18:32, Sergey Malinin wrote:
> >>> Are you running multiple active MDS daemons?
> >>> On the MDS host, issue "ceph daemon mds.X config set debug_mds 20" for maximum
> >>> logging verbosity.
> >>>
>  On 16.10.2018, at 19:23, Florent B  wrote:
> 
>  Hi,
> 
>  A few months ago I sent a message to this list about a problem with a
>  Ceph + Dovecot setup.
> 
>  The bug disappeared and I didn't follow up on the thread.
> 
>  Now the bug has come back (up-to-date Luminous cluster + up-to-date
>  Dovecot + up-to-date Debian Stretch).
> 
>  I know how to reproduce it, but it seems closely tied to my user's
>  Dovecot data (a few GB) and to the file locking method (the bug occurs
>  when I set the locking method to "fcntl" or "flock" in Dovecot, but not
>  with "dotlock").
> 
>  It ends with an unresponsive MDS (a 100% CPU hang; it fails over to
>  another MDS, which then also stays at 100% CPU). I can't even use the
>  admin socket while the MDS is hung.
> 
> > For issues like this, gdb is the most convenient way to debug. After
> > finding where the buggy code is, set debug_mds=20, restart the MDS, and
> > check the log to find out how the bug was triggered.
> >
> > Regards
> > Yan, Zheng
> >
> >
>  I would like to know *exactly* what information you need to
>  investigate this bug (which commands, when, how to report large log
>  files...).
> 
>  Thank you.
> 
>  Florent
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to debug problem in MDS ?

2018-10-17 Thread Yan, Zheng
On Thu, Oct 18, 2018 at 3:59 AM Florent B  wrote:
>
> Hi,
> I'm not running multiple active MDS daemons (1 active & 7 standby).
> I know about debug_mds 20; is that the only logging you need to see the bug?
>
> On 16/10/2018 18:32, Sergey Malinin wrote:
> > Are you running multiple active MDS daemons?
> > On the MDS host, issue "ceph daemon mds.X config set debug_mds 20" for maximum
> > logging verbosity.
> >
> >> On 16.10.2018, at 19:23, Florent B  wrote:
> >>
> >> Hi,
> >>
> >> A few months ago I sent a message to this list about a problem with a
> >> Ceph + Dovecot setup.
> >>
> >> The bug disappeared and I didn't follow up on the thread.
> >>
> >> Now the bug has come back (up-to-date Luminous cluster + up-to-date
> >> Dovecot + up-to-date Debian Stretch).
> >>
> >> I know how to reproduce it, but it seems closely tied to my user's
> >> Dovecot data (a few GB) and to the file locking method (the bug occurs
> >> when I set the locking method to "fcntl" or "flock" in Dovecot, but not
> >> with "dotlock").
> >>
> >> It ends with an unresponsive MDS (a 100% CPU hang; it fails over to
> >> another MDS, which then also stays at 100% CPU). I can't even use the
> >> admin socket while the MDS is hung.
> >>

For issues like this, gdb is the most convenient way to debug. After
finding where the buggy code is, set debug_mds=20, restart the MDS, and
check the log to find out how the bug was triggered.
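
For example, assuming the MDS is managed by systemd and its id is the
one shown by 'ceph fs status', something like this should do it:

  # /etc/ceph/ceph.conf on the MDS host
  [mds]
      debug mds = 20
      debug ms = 1

  systemctl restart ceph-mds@<mds-id>
  # reproduce the hang, then check /var/log/ceph/ceph-mds.<mds-id>.log

Remember to set debug_mds back to the default afterwards; level 20 logs
grow very quickly.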

Regards
Yan, Zheng


> >> I would like to know *exactly* what information you need to
> >> investigate this bug (which commands, when, how to report large log
> >> files...).
> >>
> >> Thank you.
> >>
> >> Florent
> >>
> >>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to debug problem in MDS ?

2018-10-16 Thread Sergey Malinin
Are you running multiple active MDS daemons?
On the MDS host, issue "ceph daemon mds.X config set debug_mds 20" for maximum
logging verbosity.
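
For example (with X being the name of the active MDS, run on the host
where that daemon lives):

  ceph daemon mds.X config set debug_mds 20
  # ... reproduce the problem ...
  ceph daemon mds.X config set debug_mds 1/5    # back to the default

If you prefer not to log in to the MDS host, something like
'ceph tell mds.X injectargs "--debug_mds 20"' from a node with an admin
keyring should have the same effect.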

> On 16.10.2018, at 19:23, Florent B  wrote:
> 
> Hi,
> 
> A few months ago I sent a message to this list about a problem with a
> Ceph + Dovecot setup.
> 
> The bug disappeared and I didn't follow up on the thread.
> 
> Now the bug has come back (up-to-date Luminous cluster + up-to-date
> Dovecot + up-to-date Debian Stretch).
> 
> I know how to reproduce it, but it seems closely tied to my user's
> Dovecot data (a few GB) and to the file locking method (the bug occurs
> when I set the locking method to "fcntl" or "flock" in Dovecot, but not
> with "dotlock").
> 
> It ends with an unresponsive MDS (a 100% CPU hang; it fails over to
> another MDS, which then also stays at 100% CPU). I can't even use the
> admin socket while the MDS is hung.
> 
> I would like to know *exactly* what information you need to
> investigate this bug (which commands, when, how to report large log
> files...).
> 
> Thank you.
> 
> Florent
> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com