Re: [PATCH v2] audit: use proper refcount locking on audit_sock

2016-12-13 Thread Richard Guy Briggs
On 2016-12-13 00:10, Richard Guy Briggs wrote:
> On 2016-12-12 15:18, Paul Moore wrote:
> > On Mon, Dec 12, 2016 at 5:03 AM, Richard Guy Briggs  wrote:
> > > Resetting audit_sock appears to be racy.
> > >
> > > audit_sock was being copied and dereferenced without using a refcount on
> > > the source sock.
> > >
> > > Bump the refcount on the underlying sock when we store a refrence in
> > > audit_sock and release it when we reset audit_sock.  audit_sock
> > > modification needs the audit_cmd_mutex.
> > >
> > > See: https://lkml.org/lkml/2016/11/26/232
> > >
> > > Thanks to Eric Dumazet  and Cong Wang
> > >  on ideas how to fix it.
> > >
> > > Signed-off-by: Richard Guy Briggs 
> > > ---
> > > There has been a lot of change in the audit code that is about to go
> > > upstream to address audit queue issues.  This patch is based on the
> > > source tree: git://git.infradead.org/users/pcmoore/audit#next
> > > ---
> > >  kernel/audit.c |   34 --
> > >  1 files changed, 28 insertions(+), 6 deletions(-)
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index f20eee0..439f7f3 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -452,7 +452,9 @@ static void auditd_reset(void)
> > > struct sk_buff *skb;
> > >
> > > /* break the connection */
> > > +   sock_put(audit_sock);
> > > audit_pid = 0;
> > > +   audit_nlk_portid = 0;
> > > audit_sock = NULL;
> > >
> > > /* flush all of the retry queue to the hold queue */
> > > @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff 
> > > *skb)
> > > if (rc >= 0) {
> > > consume_skb(skb);
> > > rc = 0;
> > > +   } else {
> > > +   if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
> > 
> > I dislike the way you wrote this because instead of simply looking at
> > this to see if it correct I need to sort out all the bits and find out
> > if there are other error codes that could run afoul of this check ...
> > make it simple, e.g. (rc == -ENOMEM || rc == -EPERM || ...).
> > Actually, since EPERM is 1, -EPERM (-1 in two's compliment is
> > 0x) is going to cause this to be true for pretty much any
> > value of rc, yes?
> 
> Yes, you are correct.  We need there a logical or on the results of each
> comparison to the return code rather than bit-wise or-ing the result
> codes together first to save a step.
> 
> > > +   mutex_lock(&audit_cmd_mutex);
> > > +   auditd_reset();
> > > +   mutex_unlock(&audit_cmd_mutex);
> > > +   }
> > 
> > The code in audit#next handles netlink_unicast() errors in
> > kauditd_thread() and you are adding error handling code here in
> > kauditd_send_unicast_skb() ... that's messy.  I don't care too much
> > where the auditd_reset() call is made, but let's only do it in one
> > function; FWIW, I originally put the error handling code in
> > kauditd_thread() because there was other error handling code that
> > needed to done in that scope so it resulted in cleaner code.
> 
> Hmmm, I seem to remember it not returning the return code and I thought
> I had changed it to do so, but I see now that it was already there.
> Agreed, I needlessly duplicated that error handling.
> 
> > Related, I see you are now considering ENOMEM to be a fatal condition,
> > that differs from the AUDITD_BAD macro in kauditd_thread(); this
> > difference needs to be reconciled.
> 
> Also correct about -EPERM now that I check back to the intent of commit
> 32a1dbaece7e ("audit: try harder to send to auditd upon netlink
> failure")
> 
> > Finally, you should update the comment header block for auditd_reset()
> > that it needs to be called with the audit_cmd_mutex held.
> 
> Yup.
> 
> > > @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, 
> > > struct nlmsghdr *nlh)
> > > return -EACCES;
> > > }
> > > if (audit_pid && new_pid &&
> > > -   audit_replace(requesting_pid) != 
> > > -ECONNREFUSED) {
> > > +   (audit_replace(requesting_pid) & 
> > > (-ECONNREFUSED|-EPERM|-ENOMEM))) {
> > 
> > Do we simply want to treat any error here as fatal, and not just
> > ECONN/EPERM/ENOMEM?  If not, let's come up with a single macro to
> > handle the fatal netlink_unicast() return codes so we have some chance
> > to keep things consistent in the future.
> 
> I'll work through this before I post another patch...

Ok, I've gone back to look at the reasoning in commit 133e1e5acd4a
("audit: stop an old auditd being starved out by a new auditd") which
suggests only ECONNREFUSED can cause an audit_pid replace, so I've
returned it to its original state.

I'll post another tested patch, but I'm still not that happy that it
does not proactively reset audit_pid, audit_nlk_portid and audit_sock
when auditd's socket has a

Re: [PATCH v2] audit: use proper refcount locking on audit_sock

2016-12-13 Thread Richard Guy Briggs
On 2016-12-12 15:58, Cong Wang wrote:
> On Mon, Dec 12, 2016 at 2:03 AM, Richard Guy Briggs  wrote:
> > Resetting audit_sock appears to be racy.
> >
> > audit_sock was being copied and dereferenced without using a refcount on
> > the source sock.
> >
> > Bump the refcount on the underlying sock when we store a refrence in
> > audit_sock and release it when we reset audit_sock.  audit_sock
> > modification needs the audit_cmd_mutex.
> >
> > See: https://lkml.org/lkml/2016/11/26/232
> >
> > Thanks to Eric Dumazet  and Cong Wang
> >  on ideas how to fix it.
> >
> > Signed-off-by: Richard Guy Briggs 
> > ---
> > There has been a lot of change in the audit code that is about to go
> > upstream to address audit queue issues.  This patch is based on the
> > source tree: git://git.infradead.org/users/pcmoore/audit#next
> > ---
> >  kernel/audit.c |   34 --
> >  1 files changed, 28 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index f20eee0..439f7f3 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -452,7 +452,9 @@ static void auditd_reset(void)
> > struct sk_buff *skb;
> >
> > /* break the connection */
> > +   sock_put(audit_sock);
> 
> Why audit_sock can't be NULL here?

Fixed.

> > audit_pid = 0;
> > +   audit_nlk_portid = 0;
> > audit_sock = NULL;
> >
> > /* flush all of the retry queue to the hold queue */
> > @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff 
> > *skb)
> > if (rc >= 0) {
> > consume_skb(skb);
> > rc = 0;
> > +   } else {
> > +   if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
> 
> Are these errno's bits??

No, I've fixed this silly error.

> > +   mutex_lock(&audit_cmd_mutex);
> > +   auditd_reset();
> > +   mutex_unlock(&audit_cmd_mutex);
> > +   }
> > }
> >
> > return rc;
> > @@ -579,7 +587,9 @@ static int kauditd_thread(void *dummy)
> >
> > auditd = 0;
> > if (AUDITD_BAD(rc, reschedule)) {
> > +   mutex_lock(&audit_cmd_mutex);
> > auditd_reset();
> > +   mutex_unlock(&audit_cmd_mutex);
> > reschedule = 0;
> > }
> > } else
> > @@ -594,7 +604,9 @@ static int kauditd_thread(void *dummy)
> > auditd = 0;
> > if (AUDITD_BAD(rc, reschedule)) {
> > kauditd_hold_skb(skb);
> > +   mutex_lock(&audit_cmd_mutex);
> > auditd_reset();
> > +   mutex_unlock(&audit_cmd_mutex);
> > reschedule = 0;
> > } else
> > /* temporary problem (we hope), 
> > queue
> > @@ -623,7 +635,9 @@ quick_loop:
> > if (rc) {
> > auditd = 0;
> > if (AUDITD_BAD(rc, reschedule)) {
> > +   
> > mutex_lock(&audit_cmd_mutex);
> > auditd_reset();
> > +   
> > mutex_unlock(&audit_cmd_mutex);
> > reschedule = 0;
> > }
> >
> > @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, 
> > struct nlmsghdr *nlh)
> > return -EACCES;
> > }
> > if (audit_pid && new_pid &&
> > -   audit_replace(requesting_pid) != -ECONNREFUSED) 
> > {
> > +   (audit_replace(requesting_pid) & 
> > (-ECONNREFUSED|-EPERM|-ENOMEM))) {
> > audit_log_config_change("audit_pid", 
> > new_pid, audit_pid, 0);
> > return -EEXIST;
> > }
> > if (audit_enabled != AUDIT_OFF)
> > audit_log_config_change("audit_pid", 
> > new_pid, audit_pid, 1);
> > -   audit_pid = new_pid;
> > -   audit_nlk_portid = NETLINK_CB(skb).portid;
> > -   audit_sock = skb->sk;
> > -   if (!new_pid)
> > +   if (new_pid) {
> > +   if (audit_sock)
> > +   sock_put(audit_sock);
> > +   audit_pid = new_pid;
> > +   

Re: [PATCH v2] audit: use proper refcount locking on audit_sock

2016-12-12 Thread Richard Guy Briggs
On 2016-12-12 15:18, Paul Moore wrote:
> On Mon, Dec 12, 2016 at 5:03 AM, Richard Guy Briggs  wrote:
> > Resetting audit_sock appears to be racy.
> >
> > audit_sock was being copied and dereferenced without using a refcount on
> > the source sock.
> >
> > Bump the refcount on the underlying sock when we store a refrence in
> > audit_sock and release it when we reset audit_sock.  audit_sock
> > modification needs the audit_cmd_mutex.
> >
> > See: https://lkml.org/lkml/2016/11/26/232
> >
> > Thanks to Eric Dumazet  and Cong Wang
> >  on ideas how to fix it.
> >
> > Signed-off-by: Richard Guy Briggs 
> > ---
> > There has been a lot of change in the audit code that is about to go
> > upstream to address audit queue issues.  This patch is based on the
> > source tree: git://git.infradead.org/users/pcmoore/audit#next
> > ---
> >  kernel/audit.c |   34 --
> >  1 files changed, 28 insertions(+), 6 deletions(-)
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index f20eee0..439f7f3 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -452,7 +452,9 @@ static void auditd_reset(void)
> > struct sk_buff *skb;
> >
> > /* break the connection */
> > +   sock_put(audit_sock);
> > audit_pid = 0;
> > +   audit_nlk_portid = 0;
> > audit_sock = NULL;
> >
> > /* flush all of the retry queue to the hold queue */
> > @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff 
> > *skb)
> > if (rc >= 0) {
> > consume_skb(skb);
> > rc = 0;
> > +   } else {
> > +   if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
> 
> I dislike the way you wrote this because instead of simply looking at
> this to see if it correct I need to sort out all the bits and find out
> if there are other error codes that could run afoul of this check ...
> make it simple, e.g. (rc == -ENOMEM || rc == -EPERM || ...).
> Actually, since EPERM is 1, -EPERM (-1 in two's compliment is
> 0x) is going to cause this to be true for pretty much any
> value of rc, yes?

Yes, you are correct.  We need there a logical or on the results of each
comparison to the return code rather than bit-wise or-ing the result
codes together first to save a step.

> > +   mutex_lock(&audit_cmd_mutex);
> > +   auditd_reset();
> > +   mutex_unlock(&audit_cmd_mutex);
> > +   }
> 
> The code in audit#next handles netlink_unicast() errors in
> kauditd_thread() and you are adding error handling code here in
> kauditd_send_unicast_skb() ... that's messy.  I don't care too much
> where the auditd_reset() call is made, but let's only do it in one
> function; FWIW, I originally put the error handling code in
> kauditd_thread() because there was other error handling code that
> needed to done in that scope so it resulted in cleaner code.

Hmmm, I seem to remember it not returning the return code and I thought
I had changed it to do so, but I see now that it was already there.
Agreed, I needlessly duplicated that error handling.

> Related, I see you are now considering ENOMEM to be a fatal condition,
> that differs from the AUDITD_BAD macro in kauditd_thread(); this
> difference needs to be reconciled.

Also correct about -EPERM now that I check back to the intent of commit
32a1dbaece7e ("audit: try harder to send to auditd upon netlink
failure")

> Finally, you should update the comment header block for auditd_reset()
> that it needs to be called with the audit_cmd_mutex held.

Yup.

> > @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, 
> > struct nlmsghdr *nlh)
> > return -EACCES;
> > }
> > if (audit_pid && new_pid &&
> > -   audit_replace(requesting_pid) != -ECONNREFUSED) 
> > {
> > +   (audit_replace(requesting_pid) & 
> > (-ECONNREFUSED|-EPERM|-ENOMEM))) {
> 
> Do we simply want to treat any error here as fatal, and not just
> ECONN/EPERM/ENOMEM?  If not, let's come up with a single macro to
> handle the fatal netlink_unicast() return codes so we have some chance
> to keep things consistent in the future.

I'll work through this before I post another patch...

> paul moore

- RGB

--
Richard Guy Briggs 
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635


Re: [PATCH v2] audit: use proper refcount locking on audit_sock

2016-12-12 Thread Richard Guy Briggs
On 2016-12-12 12:10, Paul Moore wrote:
> On Mon, Dec 12, 2016 at 5:03 AM, Richard Guy Briggs  wrote:
> > Resetting audit_sock appears to be racy.
> >
> > audit_sock was being copied and dereferenced without using a refcount on
> > the source sock.
> >
> > Bump the refcount on the underlying sock when we store a refrence in
> > audit_sock and release it when we reset audit_sock.  audit_sock
> > modification needs the audit_cmd_mutex.
> >
> > See: https://lkml.org/lkml/2016/11/26/232
> >
> > Thanks to Eric Dumazet  and Cong Wang
> >  on ideas how to fix it.
> >
> > Signed-off-by: Richard Guy Briggs 
> > ---
> > There has been a lot of change in the audit code that is about to go
> > upstream to address audit queue issues.  This patch is based on the
> > source tree: git://git.infradead.org/users/pcmoore/audit#next
> > ---
> >  kernel/audit.c |   34 --
> >  1 files changed, 28 insertions(+), 6 deletions(-)
> 
> This is coming in pretty late for the v4.10 merge window, much later
> than I would usually take things, but this is arguably important, and
> (at first glance) relatively low risk - what testing have you done on
> this?

At this point, compile and boot, and I'm able to compile and run the
supplied test code without any issues.

> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index f20eee0..439f7f3 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -452,7 +452,9 @@ static void auditd_reset(void)
> > struct sk_buff *skb;
> >
> > /* break the connection */
> > +   sock_put(audit_sock);
> > audit_pid = 0;
> > +   audit_nlk_portid = 0;
> > audit_sock = NULL;
> >
> > /* flush all of the retry queue to the hold queue */
> > @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff 
> > *skb)
> > if (rc >= 0) {
> > consume_skb(skb);
> > rc = 0;
> > +   } else {
> > +   if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
> > +   mutex_lock(&audit_cmd_mutex);
> > +   auditd_reset();
> > +   mutex_unlock(&audit_cmd_mutex);
> > +   }
> > }
> >
> > return rc;
> > @@ -579,7 +587,9 @@ static int kauditd_thread(void *dummy)
> >
> > auditd = 0;
> > if (AUDITD_BAD(rc, reschedule)) {
> > +   mutex_lock(&audit_cmd_mutex);
> > auditd_reset();
> > +   mutex_unlock(&audit_cmd_mutex);
> > reschedule = 0;
> > }
> > } else
> > @@ -594,7 +604,9 @@ static int kauditd_thread(void *dummy)
> > auditd = 0;
> > if (AUDITD_BAD(rc, reschedule)) {
> > kauditd_hold_skb(skb);
> > +   mutex_lock(&audit_cmd_mutex);
> > auditd_reset();
> > +   mutex_unlock(&audit_cmd_mutex);
> > reschedule = 0;
> > } else
> > /* temporary problem (we hope), 
> > queue
> > @@ -623,7 +635,9 @@ quick_loop:
> > if (rc) {
> > auditd = 0;
> > if (AUDITD_BAD(rc, reschedule)) {
> > +   
> > mutex_lock(&audit_cmd_mutex);
> > auditd_reset();
> > +   
> > mutex_unlock(&audit_cmd_mutex);
> > reschedule = 0;
> > }
> >
> > @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, 
> > struct nlmsghdr *nlh)
> > return -EACCES;
> > }
> > if (audit_pid && new_pid &&
> > -   audit_replace(requesting_pid) != -ECONNREFUSED) 
> > {
> > +   (audit_replace(requesting_pid) & 
> > (-ECONNREFUSED|-EPERM|-ENOMEM))) {
> > audit_log_config_change("audit_pid", 
> > new_pid, audit_pid, 0);
> > return -EEXIST;
> > }
> > if (audit_enabled != AUDIT_OFF)
> > audit_log_config_change("audit_pid", 
> > new_pid, audit_pid, 1);
> > -   audit_pid = new_pid;
> > -   audit_nlk_portid = NETLINK_CB(skb).portid;
> > -   audit_sock = skb->sk;
> > -   if (!new_pid)
> > +  

Re: [PATCH v2] audit: use proper refcount locking on audit_sock

2016-12-12 Thread Cong Wang
On Mon, Dec 12, 2016 at 2:03 AM, Richard Guy Briggs  wrote:
> Resetting audit_sock appears to be racy.
>
> audit_sock was being copied and dereferenced without using a refcount on
> the source sock.
>
> Bump the refcount on the underlying sock when we store a refrence in
> audit_sock and release it when we reset audit_sock.  audit_sock
> modification needs the audit_cmd_mutex.
>
> See: https://lkml.org/lkml/2016/11/26/232
>
> Thanks to Eric Dumazet  and Cong Wang
>  on ideas how to fix it.
>
> Signed-off-by: Richard Guy Briggs 
> ---
> There has been a lot of change in the audit code that is about to go
> upstream to address audit queue issues.  This patch is based on the
> source tree: git://git.infradead.org/users/pcmoore/audit#next
> ---
>  kernel/audit.c |   34 --
>  1 files changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index f20eee0..439f7f3 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -452,7 +452,9 @@ static void auditd_reset(void)
> struct sk_buff *skb;
>
> /* break the connection */
> +   sock_put(audit_sock);


Why audit_sock can't be NULL here?


> audit_pid = 0;
> +   audit_nlk_portid = 0;
> audit_sock = NULL;
>
> /* flush all of the retry queue to the hold queue */
> @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
> if (rc >= 0) {
> consume_skb(skb);
> rc = 0;
> +   } else {
> +   if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {


Are these errno's bits??


> +   mutex_lock(&audit_cmd_mutex);
> +   auditd_reset();
> +   mutex_unlock(&audit_cmd_mutex);
> +   }
> }
>
> return rc;
> @@ -579,7 +587,9 @@ static int kauditd_thread(void *dummy)
>
> auditd = 0;
> if (AUDITD_BAD(rc, reschedule)) {
> +   mutex_lock(&audit_cmd_mutex);
> auditd_reset();
> +   mutex_unlock(&audit_cmd_mutex);
> reschedule = 0;
> }
> } else
> @@ -594,7 +604,9 @@ static int kauditd_thread(void *dummy)
> auditd = 0;
> if (AUDITD_BAD(rc, reschedule)) {
> kauditd_hold_skb(skb);
> +   mutex_lock(&audit_cmd_mutex);
> auditd_reset();
> +   mutex_unlock(&audit_cmd_mutex);
> reschedule = 0;
> } else
> /* temporary problem (we hope), queue
> @@ -623,7 +635,9 @@ quick_loop:
> if (rc) {
> auditd = 0;
> if (AUDITD_BAD(rc, reschedule)) {
> +   mutex_lock(&audit_cmd_mutex);
> auditd_reset();
> +   
> mutex_unlock(&audit_cmd_mutex);
> reschedule = 0;
> }
>
> @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, 
> struct nlmsghdr *nlh)
> return -EACCES;
> }
> if (audit_pid && new_pid &&
> -   audit_replace(requesting_pid) != -ECONNREFUSED) {
> +   (audit_replace(requesting_pid) & 
> (-ECONNREFUSED|-EPERM|-ENOMEM))) {
> audit_log_config_change("audit_pid", new_pid, 
> audit_pid, 0);
> return -EEXIST;
> }
> if (audit_enabled != AUDIT_OFF)
> audit_log_config_change("audit_pid", new_pid, 
> audit_pid, 1);
> -   audit_pid = new_pid;
> -   audit_nlk_portid = NETLINK_CB(skb).portid;
> -   audit_sock = skb->sk;
> -   if (!new_pid)
> +   if (new_pid) {
> +   if (audit_sock)
> +   sock_put(audit_sock);
> +   audit_pid = new_pid;
> +   audit_nlk_portid = NETLINK_CB(skb).portid;
> +   sock_hold(skb->sk);

Why refcnt is still needed here? I need it because I removed the code
in net exit code path.


> +   audit_sock = skb->sk;
> +   } else {
>   

Re: [PATCH v2] audit: use proper refcount locking on audit_sock

2016-12-12 Thread Paul Moore
On Mon, Dec 12, 2016 at 5:03 AM, Richard Guy Briggs  wrote:
> Resetting audit_sock appears to be racy.
>
> audit_sock was being copied and dereferenced without using a refcount on
> the source sock.
>
> Bump the refcount on the underlying sock when we store a refrence in
> audit_sock and release it when we reset audit_sock.  audit_sock
> modification needs the audit_cmd_mutex.
>
> See: https://lkml.org/lkml/2016/11/26/232
>
> Thanks to Eric Dumazet  and Cong Wang
>  on ideas how to fix it.
>
> Signed-off-by: Richard Guy Briggs 
> ---
> There has been a lot of change in the audit code that is about to go
> upstream to address audit queue issues.  This patch is based on the
> source tree: git://git.infradead.org/users/pcmoore/audit#next
> ---
>  kernel/audit.c |   34 --
>  1 files changed, 28 insertions(+), 6 deletions(-)

My previous question about testing still stands, but I took a closer
look and have some additional comments, see below ...

> diff --git a/kernel/audit.c b/kernel/audit.c
> index f20eee0..439f7f3 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -452,7 +452,9 @@ static void auditd_reset(void)
> struct sk_buff *skb;
>
> /* break the connection */
> +   sock_put(audit_sock);
> audit_pid = 0;
> +   audit_nlk_portid = 0;
> audit_sock = NULL;
>
> /* flush all of the retry queue to the hold queue */
> @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
> if (rc >= 0) {
> consume_skb(skb);
> rc = 0;
> +   } else {
> +   if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {

I dislike the way you wrote this because instead of simply looking at
this to see if it correct I need to sort out all the bits and find out
if there are other error codes that could run afoul of this check ...
make it simple, e.g. (rc == -ENOMEM || rc == -EPERM || ...).
Actually, since EPERM is 1, -EPERM (-1 in two's compliment is
0x) is going to cause this to be true for pretty much any
value of rc, yes?

> +   mutex_lock(&audit_cmd_mutex);
> +   auditd_reset();
> +   mutex_unlock(&audit_cmd_mutex);
> +   }

The code in audit#next handles netlink_unicast() errors in
kauditd_thread() and you are adding error handling code here in
kauditd_send_unicast_skb() ... that's messy.  I don't care too much
where the auditd_reset() call is made, but let's only do it in one
function; FWIW, I originally put the error handling code in
kauditd_thread() because there was other error handling code that
needed to done in that scope so it resulted in cleaner code.

Related, I see you are now considering ENOMEM to be a fatal condition,
that differs from the AUDITD_BAD macro in kauditd_thread(); this
difference needs to be reconciled.

Finally, you should update the comment header block for auditd_reset()
that it needs to be called with the audit_cmd_mutex held.

> @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, 
> struct nlmsghdr *nlh)
> return -EACCES;
> }
> if (audit_pid && new_pid &&
> -   audit_replace(requesting_pid) != -ECONNREFUSED) {
> +   (audit_replace(requesting_pid) & 
> (-ECONNREFUSED|-EPERM|-ENOMEM))) {

Do we simply want to treat any error here as fatal, and not just
ECONN/EPERM/ENOMEM?  If not, let's come up with a single macro to
handle the fatal netlink_unicast() return codes so we have some chance
to keep things consistent in the future.

-- 
paul moore
www.paul-moore.com


Re: [PATCH v2] audit: use proper refcount locking on audit_sock

2016-12-12 Thread Paul Moore
On Mon, Dec 12, 2016 at 5:03 AM, Richard Guy Briggs  wrote:
> Resetting audit_sock appears to be racy.
>
> audit_sock was being copied and dereferenced without using a refcount on
> the source sock.
>
> Bump the refcount on the underlying sock when we store a refrence in
> audit_sock and release it when we reset audit_sock.  audit_sock
> modification needs the audit_cmd_mutex.
>
> See: https://lkml.org/lkml/2016/11/26/232
>
> Thanks to Eric Dumazet  and Cong Wang
>  on ideas how to fix it.
>
> Signed-off-by: Richard Guy Briggs 
> ---
> There has been a lot of change in the audit code that is about to go
> upstream to address audit queue issues.  This patch is based on the
> source tree: git://git.infradead.org/users/pcmoore/audit#next
> ---
>  kernel/audit.c |   34 --
>  1 files changed, 28 insertions(+), 6 deletions(-)

This is coming in pretty late for the v4.10 merge window, much later
than I would usually take things, but this is arguably important, and
(at first glance) relatively low risk - what testing have you done on
this?

> diff --git a/kernel/audit.c b/kernel/audit.c
> index f20eee0..439f7f3 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -452,7 +452,9 @@ static void auditd_reset(void)
> struct sk_buff *skb;
>
> /* break the connection */
> +   sock_put(audit_sock);
> audit_pid = 0;
> +   audit_nlk_portid = 0;
> audit_sock = NULL;
>
> /* flush all of the retry queue to the hold queue */
> @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
> if (rc >= 0) {
> consume_skb(skb);
> rc = 0;
> +   } else {
> +   if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
> +   mutex_lock(&audit_cmd_mutex);
> +   auditd_reset();
> +   mutex_unlock(&audit_cmd_mutex);
> +   }
> }
>
> return rc;
> @@ -579,7 +587,9 @@ static int kauditd_thread(void *dummy)
>
> auditd = 0;
> if (AUDITD_BAD(rc, reschedule)) {
> +   mutex_lock(&audit_cmd_mutex);
> auditd_reset();
> +   mutex_unlock(&audit_cmd_mutex);
> reschedule = 0;
> }
> } else
> @@ -594,7 +604,9 @@ static int kauditd_thread(void *dummy)
> auditd = 0;
> if (AUDITD_BAD(rc, reschedule)) {
> kauditd_hold_skb(skb);
> +   mutex_lock(&audit_cmd_mutex);
> auditd_reset();
> +   mutex_unlock(&audit_cmd_mutex);
> reschedule = 0;
> } else
> /* temporary problem (we hope), queue
> @@ -623,7 +635,9 @@ quick_loop:
> if (rc) {
> auditd = 0;
> if (AUDITD_BAD(rc, reschedule)) {
> +   mutex_lock(&audit_cmd_mutex);
> auditd_reset();
> +   
> mutex_unlock(&audit_cmd_mutex);
> reschedule = 0;
> }
>
> @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, 
> struct nlmsghdr *nlh)
> return -EACCES;
> }
> if (audit_pid && new_pid &&
> -   audit_replace(requesting_pid) != -ECONNREFUSED) {
> +   (audit_replace(requesting_pid) & 
> (-ECONNREFUSED|-EPERM|-ENOMEM))) {
> audit_log_config_change("audit_pid", new_pid, 
> audit_pid, 0);
> return -EEXIST;
> }
> if (audit_enabled != AUDIT_OFF)
> audit_log_config_change("audit_pid", new_pid, 
> audit_pid, 1);
> -   audit_pid = new_pid;
> -   audit_nlk_portid = NETLINK_CB(skb).portid;
> -   audit_sock = skb->sk;
> -   if (!new_pid)
> +   if (new_pid) {
> +   if (audit_sock)
> +   sock_put(audit_sock);
> +   audit_pid = new_pid;
> +   audit_nlk_portid = NETLINK_CB(skb).portid;
> +   sock_hold(skb->sk);
> +   audit_sock = skb->sk;

[PATCH v2] audit: use proper refcount locking on audit_sock

2016-12-12 Thread Richard Guy Briggs
Resetting audit_sock appears to be racy.

audit_sock was being copied and dereferenced without using a refcount on
the source sock.

Bump the refcount on the underlying sock when we store a refrence in
audit_sock and release it when we reset audit_sock.  audit_sock
modification needs the audit_cmd_mutex.

See: https://lkml.org/lkml/2016/11/26/232

Thanks to Eric Dumazet  and Cong Wang
 on ideas how to fix it.

Signed-off-by: Richard Guy Briggs 
---
There has been a lot of change in the audit code that is about to go
upstream to address audit queue issues.  This patch is based on the
source tree: git://git.infradead.org/users/pcmoore/audit#next
---
 kernel/audit.c |   34 --
 1 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index f20eee0..439f7f3 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -452,7 +452,9 @@ static void auditd_reset(void)
struct sk_buff *skb;
 
/* break the connection */
+   sock_put(audit_sock);
audit_pid = 0;
+   audit_nlk_portid = 0;
audit_sock = NULL;
 
/* flush all of the retry queue to the hold queue */
@@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
if (rc >= 0) {
consume_skb(skb);
rc = 0;
+   } else {
+   if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
+   mutex_lock(&audit_cmd_mutex);
+   auditd_reset();
+   mutex_unlock(&audit_cmd_mutex);
+   }
}
 
return rc;
@@ -579,7 +587,9 @@ static int kauditd_thread(void *dummy)
 
auditd = 0;
if (AUDITD_BAD(rc, reschedule)) {
+   mutex_lock(&audit_cmd_mutex);
auditd_reset();
+   mutex_unlock(&audit_cmd_mutex);
reschedule = 0;
}
} else
@@ -594,7 +604,9 @@ static int kauditd_thread(void *dummy)
auditd = 0;
if (AUDITD_BAD(rc, reschedule)) {
kauditd_hold_skb(skb);
+   mutex_lock(&audit_cmd_mutex);
auditd_reset();
+   mutex_unlock(&audit_cmd_mutex);
reschedule = 0;
} else
/* temporary problem (we hope), queue
@@ -623,7 +635,9 @@ quick_loop:
if (rc) {
auditd = 0;
if (AUDITD_BAD(rc, reschedule)) {
+   mutex_lock(&audit_cmd_mutex);
auditd_reset();
+   mutex_unlock(&audit_cmd_mutex);
reschedule = 0;
}
 
@@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, 
struct nlmsghdr *nlh)
return -EACCES;
}
if (audit_pid && new_pid &&
-   audit_replace(requesting_pid) != -ECONNREFUSED) {
+   (audit_replace(requesting_pid) & 
(-ECONNREFUSED|-EPERM|-ENOMEM))) {
audit_log_config_change("audit_pid", new_pid, 
audit_pid, 0);
return -EEXIST;
}
if (audit_enabled != AUDIT_OFF)
audit_log_config_change("audit_pid", new_pid, 
audit_pid, 1);
-   audit_pid = new_pid;
-   audit_nlk_portid = NETLINK_CB(skb).portid;
-   audit_sock = skb->sk;
-   if (!new_pid)
+   if (new_pid) {
+   if (audit_sock)
+   sock_put(audit_sock);
+   audit_pid = new_pid;
+   audit_nlk_portid = NETLINK_CB(skb).portid;
+   sock_hold(skb->sk);
+   audit_sock = skb->sk;
+   } else {
auditd_reset();
+   }
wake_up_interruptible(&kauditd_wait);
}
if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
@@ -1283,8 +1302,11 @@ static void __net_exit audit_net_exit(struct net *net)
 {
struct audit_net *aunet = net_generic(net, audit_net_id);
struct sock *sock = aunet->nlsk;
-   if (sock == audit_sock)
+   if