Re: Possible netlink autobind regression

2015-09-17 Thread Thomas Graf
On 09/17/15 at 01:15pm, Herbert Xu wrote:
> On Wed, Sep 16, 2015 at 10:02:00PM -0700, Cong Wang wrote:
> >
> > This part doesn't look correct, seems it is checking if this is a kernel
> > netlink socket rather than if it is bound. But I am not sure...
>
> Good point. I've changed it so that bound

Re: Possible netlink autobind regression

2015-09-17 Thread Tejun Heo
Hello, Herbert.

On Thu, Sep 17, 2015 at 01:15:03PM +0800, Herbert Xu wrote:
> netlink: Fix autobind race condition that leads to zero port ID
>
> The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
> Reset portid after netlink_insert failure") introduced a race
> condition where if two

Re: Possible netlink autobind regression

2015-09-16 Thread Herbert Xu
On Wed, Sep 16, 2015 at 10:02:00PM -0700, Cong Wang wrote:
>
> This part doesn't look correct, seems it is checking if this is a kernel
> netlink socket rather than if it is bound. But I am not sure...

Good point. I've changed it so that bound is only set for non-kernel sockets.

---8<---
netlink

Re: Possible netlink autobind regression

2015-09-16 Thread Cong Wang
On Wed, Sep 16, 2015 at 8:41 PM, Herbert Xu wrote:
> On Thu, Sep 17, 2015 at 11:08:45AM +0800, Herbert Xu wrote:
>>
>> Good catch! I think your explanation makes perfect sense. Linus
>> ran into this previously too after suspend-and-resume.
>
> Unfortunately you can't just postpone the setting of

Re: Possible netlink autobind regression

2015-09-16 Thread Herbert Xu
On Thu, Sep 17, 2015 at 11:08:45AM +0800, Herbert Xu wrote:
>
> Good catch! I think your explanation makes perfect sense. Linus
> ran into this previously too after suspend-and-resume.

Unfortunately you can't just postpone the setting of portid because once you pass it onto rhashtable the portid

Re: Possible netlink autobind regression

2015-09-16 Thread Herbert Xu
On Wed, Sep 16, 2015 at 10:29:09PM -0400, Tejun Heo wrote:
>
> Anyways, after the patch, it seems something like the following could
> happen.

Good catch! I think your explanation makes perfect sense. Linus ran into this previously too after suspend-and-resume.

I'll look into it. Thanks!
-- Em

Possible netlink autobind regression

2015-09-16 Thread Tejun Heo
Hello,

We're seeing processes piling up on audit_cmd_mutex and after some digging it turns out sometimes audit_receive() ends up recursing into itself causing an A-A deadlock on audit_cmd_mutex. Here's the backtrace.

PID: 1939995 TASK: 88085bdde360 CPU: 27 COMMAND: "crond"
#0 [8803