Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Cong Wang
On Mon, Sep 10, 2018 at 5:56 PM Santosh Shilimkar
 wrote:
>
> On 9/10/2018 5:45 PM, Cong Wang wrote:
> > On Mon, Sep 10, 2018 at 5:26 PM Santosh Shilimkar
> >  wrote:
> >> Would you mind posting an updated patch please with call_rcu and
> >> above extended RCU grace period with rcu_read_lock. Thanks !!
> >
> > If you prefer to fix _two_ problems in one patch, sure.
> >
> > For the record, the bug this patch fixes is NOT same with the one
> > in rds_find_bound(), because there is no rds_find_bound() in
> > the backtrace. If you want to see the full backtrace, here it is:
> >
> > https://marc.info/?l=linux-netdev=15364808962=2
> >
> > This is why I believe they are two problems.
> >
> > Whether fixing two problems in one patch or two patches is
> > merely a preference, I leave it up to you.
> >
> I had a partial fix as well in past but since it wasn't covering
> complete window, it was abandoned.
>
> Since we know the other race, it would be best to address it
> together and hence requested a combine patch. Thanks for help !!

No problem! v2 is coming shortly after passing syzbot test. :)


Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Santosh Shilimkar

On 9/10/2018 5:45 PM, Cong Wang wrote:

On Mon, Sep 10, 2018 at 5:26 PM Santosh Shilimkar
 wrote:

Would you mind posting an updated patch please with call_rcu and
above extended RCU grace period with rcu_read_lock. Thanks !!


If you prefer to fix _two_ problems in one patch, sure.

For the record, the bug this patch fixes is NOT same with the one
in rds_find_bound(), because there is no rds_find_bound() in
the backtrace. If you want to see the full backtrace, here it is:

https://marc.info/?l=linux-netdev=15364808962=2

This is why I believe they are two problems.

Whether fixing two problems in one patch or two patches is
merely a preference, I leave it up to you.


I had a partial fix as well in past but since it wasn't covering
complete window, it was abandoned.

Since we know the other race, it would be best to address it
together and hence requested a combine patch. Thanks for help !!

Regards,
Santosh


Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Cong Wang
On Mon, Sep 10, 2018 at 5:26 PM Santosh Shilimkar
 wrote:
> Would you mind posting an updated patch please with call_rcu and
> above extended RCU grace period with rcu_read_lock. Thanks !!

If you prefer to fix _two_ problems in one patch, sure.

For the record, the bug this patch fixes is NOT same with the one
in rds_find_bound(), because there is no rds_find_bound() in
the backtrace. If you want to see the full backtrace, here it is:

https://marc.info/?l=linux-netdev=15364808962=2

This is why I believe they are two problems.

Whether fixing two problems in one patch or two patches is
merely a preference, I leave it up to you.


Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Cong Wang
On Mon, Sep 10, 2018 at 5:24 PM Sowmini Varadhan
 wrote:
>
> On (09/10/18 17:16), Cong Wang wrote:
> > >
> > > On (09/10/18 16:51), Cong Wang wrote:
> > > >
> > > > __rds_create_bind_key(key, addr, port, scope_id);
> > > > -   rs = rhashtable_lookup_fast(_hash_table, key, ht_parms);
> > > > +   rcu_read_lock();
> > > > +   rs = rhashtable_lookup(_hash_table, key, ht_parms);
> > > > if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
> > > > rds_sock_addref(rs);
> > > > else
> > > > rs = NULL;
> > > > +   rcu_read_unlock();
> > >
> > > aiui, the rcu_read lock/unlock is only useful if the write
> > > side doing destructive operations  does something to make sure readers
> > > are done before doing the destructive opertion. AFAIK, that does
> > > not exist for rds socket management today
> >
> > That is exactly why we need it here, right?
>
> Maybe I am confused, what exactly is the patch you are proposing?
>
> Does it have the SOCK_RCU_FREE change?

Yes, that patch is obviously on top of this patch.


> Does it have the rcu_read_lock you have above?
> Where is the call_rcu?

Sure, as it is on top of this patch, the call_rcu() is
here:

void sk_destruct(struct sock *sk)
{
if (sock_flag(sk, SOCK_RCU_FREE))
call_rcu(>sk_rcu, __sk_destruct);
else
__sk_destruct(>sk_rcu);
}


>
> > Hmm, so you are saying synchronize_rcu() is kinda more correct
> > than call_rcu()??
>
>
> I'm not saying that, I'm asking "what exactly is the patch
> you are proposing?" The only one on record is
>http://patchwork.ozlabs.org/patch/968282/
> which does not have either synchronize_rcu or call_rcu.

It is very obviously on top of this patch ($subject).


>
> > I never hear this before, would like to know why.
>
> Please post precise patches first.

Already showed you precisely what is is, just on top
of this one.


Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Santosh Shilimkar

On 9/10/2018 5:16 PM, Cong Wang wrote:

On Mon, Sep 10, 2018 at 5:04 PM Sowmini Varadhan
 wrote:


On (09/10/18 16:51), Cong Wang wrote:


 __rds_create_bind_key(key, addr, port, scope_id);
-   rs = rhashtable_lookup_fast(_hash_table, key, ht_parms);
+   rcu_read_lock();
+   rs = rhashtable_lookup(_hash_table, key, ht_parms);
 if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
 rds_sock_addref(rs);
 else
 rs = NULL;
+   rcu_read_unlock();


aiui, the rcu_read lock/unlock is only useful if the write
side doing destructive operations  does something to make sure readers
are done before doing the destructive opertion. AFAIK, that does
not exist for rds socket management today


That is exactly why we need it here, right?

As you mentioned previously, the sock could be freed after
rhashtable_lookup_fast() but before rds_sock_addref(), extending
the RCU read section after rds_sock_addref() exactly solves
the problem here.


Thats the case really.


Or is there any other destructive problem you didn't mention?
Mind to be specific?






Although sock release path is not a very hot path, but blocking
it isn't a good idea either, especially when you can use call_rcu(),
which has the same effect.

I don't see any reason we should prefer synchronize_rcu() here.


Usually correctness (making sure all readers are done, before nuking a
data structure) is a little bit more important than perforamance, aka
"safety before speed" is what I've always been taught.



Hmm, so you are saying synchronize_rcu() is kinda more correct
than call_rcu()?? I never hear this before, would like to know why.

To my best knowledge, the only difference between them is the context,
one is blocking, the other is non-blocking. Their correctness must be
equivalent.


We have burn our hands with blocking synchronize_rcu() and that was
actually the main reason we moved to rw locks from rcu to localise
the cost. call_rcu()should be just fine as long as it plugs the hole.
I don't want to add blocking in this path since this slows
down the connection shutdown path and at times lead to soft dumps.

Would you mind posting an updated patch please with call_rcu and
above extended RCU grace period with rcu_read_lock. Thanks !!

Regards,
Santosh

Regards,
Santosh


Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Sowmini Varadhan
On (09/10/18 17:16), Cong Wang wrote:
> >
> > On (09/10/18 16:51), Cong Wang wrote:
> > >
> > > __rds_create_bind_key(key, addr, port, scope_id);
> > > -   rs = rhashtable_lookup_fast(_hash_table, key, ht_parms);
> > > +   rcu_read_lock();
> > > +   rs = rhashtable_lookup(_hash_table, key, ht_parms);
> > > if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
> > > rds_sock_addref(rs);
> > > else
> > > rs = NULL;
> > > +   rcu_read_unlock();
> >
> > aiui, the rcu_read lock/unlock is only useful if the write
> > side doing destructive operations  does something to make sure readers
> > are done before doing the destructive opertion. AFAIK, that does
> > not exist for rds socket management today
> 
> That is exactly why we need it here, right?

Maybe I am confused, what exactly is the patch you are proposing?

Does it have the SOCK_RCU_FREE change? 
Does it have the rcu_read_lock you have above? 
Where is the call_rcu?

> Hmm, so you are saying synchronize_rcu() is kinda more correct
> than call_rcu()??  


I'm not saying that, I'm asking "what exactly is the patch
you are proposing?" The only one on record is 
   http://patchwork.ozlabs.org/patch/968282/
which does not have either synchronize_rcu or call_rcu.

> I never hear this before, would like to know why.

Please post precise patches first.

Thanks.



Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Cong Wang
On Mon, Sep 10, 2018 at 5:04 PM Sowmini Varadhan
 wrote:
>
> On (09/10/18 16:51), Cong Wang wrote:
> >
> > __rds_create_bind_key(key, addr, port, scope_id);
> > -   rs = rhashtable_lookup_fast(_hash_table, key, ht_parms);
> > +   rcu_read_lock();
> > +   rs = rhashtable_lookup(_hash_table, key, ht_parms);
> > if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
> > rds_sock_addref(rs);
> > else
> > rs = NULL;
> > +   rcu_read_unlock();
>
> aiui, the rcu_read lock/unlock is only useful if the write
> side doing destructive operations  does something to make sure readers
> are done before doing the destructive opertion. AFAIK, that does
> not exist for rds socket management today

That is exactly why we need it here, right?

As you mentioned previously, the sock could be freed after
rhashtable_lookup_fast() but before rds_sock_addref(), extending
the RCU read section after rds_sock_addref() exactly solves
the problem here.

Or is there any other destructive problem you didn't mention?
Mind to be specific?


>
>
> > Although sock release path is not a very hot path, but blocking
> > it isn't a good idea either, especially when you can use call_rcu(),
> > which has the same effect.
> >
> > I don't see any reason we should prefer synchronize_rcu() here.
>
> Usually correctness (making sure all readers are done, before nuking a
> data structure) is a little bit more important than perforamance, aka
> "safety before speed" is what I've always been taught.
>

Hmm, so you are saying synchronize_rcu() is kinda more correct
than call_rcu()?? I never hear this before, would like to know why.

To my best knowledge, the only difference between them is the context,
one is blocking, the other is non-blocking. Their correctness must be
equivalent.


Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Sowmini Varadhan
On (09/10/18 16:51), Cong Wang wrote:
> 
> __rds_create_bind_key(key, addr, port, scope_id);
> -   rs = rhashtable_lookup_fast(_hash_table, key, ht_parms);
> +   rcu_read_lock();
> +   rs = rhashtable_lookup(_hash_table, key, ht_parms);
> if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
> rds_sock_addref(rs);
> else
> rs = NULL;
> +   rcu_read_unlock();

aiui, the rcu_read lock/unlock is only useful if the write
side doing destructive operations  does something to make sure readers
are done before doing the destructive opertion. AFAIK, that does
not exist for rds socket management today


> Although sock release path is not a very hot path, but blocking
> it isn't a good idea either, especially when you can use call_rcu(),
> which has the same effect.
> 
> I don't see any reason we should prefer synchronize_rcu() here.

Usually correctness (making sure all readers are done, before nuking a
data structure) is a little bit more important than perforamance, aka
"safety before speed" is what I've always been taught.  

Clearly, your mileage varies. As you please.

--Sowmini




Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Cong Wang
On Mon, Sep 10, 2018 at 4:30 PM Sowmini Varadhan
 wrote:
>
> On (09/10/18 15:43), Santosh Shilimkar wrote:
> > On 9/10/2018 3:24 PM, Cong Wang wrote:
> > >When a rds sock is bound, it is inserted into the bind_hash_table
> > >which is protected by RCU. But when releasing rd sock, after it
> > >is removed from this hash table, it is freed immediately without
> > >respecting RCU grace period. This could cause some use-after-free
> > >as reported by syzbot.
> > >
> > Indeed.
> >
> > >Mark the rds sock as SOCK_RCU_FREE before inserting it into the
> > >bind_hash_table, so that it would be always freed after a RCU grace
> > >period.
>
> So I'm not sure I understand.
>
> Yes, Cong's fix may eliminate *some* of the syzbot failures, but the
> basic problem is not solved.
>
> To take one example of possible races (one that was discussed in
>   https://www.spinics.net/lists/netdev/msg475074.html)
> rds_recv_incoming->rds_find_bound is being called in rds_send_worker
> context and the rds_find_bound code is
>
>  63 rs = rhashtable_lookup_fast(_hash_table, , ht_parms);
>  64 if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
>  65 rds_sock_addref(rs);
>  66 else
>  67 rs = NULL;
>  68
>
> After we find an rs at line 63, how can we be sure that the entire
> logic of rds_release does not execute on another cpu, and free the rs,
> before we hit line 64 with the bad rs?

This is a different problem. The RCU grace period should be just
extended:

diff --git a/net/rds/bind.c b/net/rds/bind.c
index 3ab55784b637..fa7592d0760c 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -76,11 +76,13 @@ struct rds_sock *rds_find_bound(const struct
in6_addr *addr, __be16 port,
struct rds_sock *rs;

__rds_create_bind_key(key, addr, port, scope_id);
-   rs = rhashtable_lookup_fast(_hash_table, key, ht_parms);
+   rcu_read_lock();
+   rs = rhashtable_lookup(_hash_table, key, ht_parms);
if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
rds_sock_addref(rs);
else
rs = NULL;
+   rcu_read_unlock();

rdsdebug("returning rs %p for %pI6c:%u\n", rs, addr,
 ntohs(port));

>
> Normally synchronize_rcu() or synchronize_net() in rds_release() would
> ensure this. How do we ensure this with SOCK_RCU_FREE (or is the
> intention to just reduce *some* of the syzbot failures)?

Although sock release path is not a very hot path, but blocking
it isn't a good idea either, especially when you can use call_rcu(),
which has the same effect.

I don't see any reason we should prefer synchronize_rcu() here.


Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Sowmini Varadhan
On (09/10/18 15:43), Santosh Shilimkar wrote:
> On 9/10/2018 3:24 PM, Cong Wang wrote:
> >When a rds sock is bound, it is inserted into the bind_hash_table
> >which is protected by RCU. But when releasing rd sock, after it
> >is removed from this hash table, it is freed immediately without
> >respecting RCU grace period. This could cause some use-after-free
> >as reported by syzbot.
> >
> Indeed.
> 
> >Mark the rds sock as SOCK_RCU_FREE before inserting it into the
> >bind_hash_table, so that it would be always freed after a RCU grace
> >period.

So I'm not sure I understand. 

Yes, Cong's fix may eliminate *some* of the syzbot failures, but the
basic problem is not solved.

To take one example of possible races (one that was discussed in
  https://www.spinics.net/lists/netdev/msg475074.html)
rds_recv_incoming->rds_find_bound is being called in rds_send_worker
context and the rds_find_bound code is

 63 rs = rhashtable_lookup_fast(_hash_table, , ht_parms);
 64 if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
 65 rds_sock_addref(rs);
 66 else 
 67 rs = NULL;
 68 

After we find an rs at line 63, how can we be sure that the entire
logic of rds_release does not execute on another cpu, and free the rs,
before we hit line 64 with the bad rs?

Normally synchronize_rcu() or synchronize_net() in rds_release() would 
ensure this. How do we ensure this with SOCK_RCU_FREE (or is the
intention to just reduce *some* of the syzbot failures)?

--Sowmini




Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Santosh Shilimkar

On 9/10/2018 3:24 PM, Cong Wang wrote:

When a rds sock is bound, it is inserted into the bind_hash_table
which is protected by RCU. But when releasing rd sock, after it
is removed from this hash table, it is freed immediately without
respecting RCU grace period. This could cause some use-after-free
as reported by syzbot.


Indeed.


Mark the rds sock as SOCK_RCU_FREE before inserting it into the
bind_hash_table, so that it would be always freed after a RCU grace
period.

Reported-and-tested-by: syzbot+8967084bcac563795...@syzkaller.appspotmail.com
Cc: Sowmini Varadhan 
Cc: Santosh Shilimkar 
Cc: rds-de...@oss.oracle.com
Signed-off-by: Cong Wang 
---
  net/rds/bind.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/net/rds/bind.c b/net/rds/bind.c
index 3ab55784b637..2281b34415b9 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -235,6 +235,7 @@ int rds_bind(struct socket *sock, struct sockaddr *uaddr, 
int addr_len)
goto out;
}
  
+	sock_set_flag(sk, SOCK_RCU_FREE);

ret = rds_add_bound(rs, binding_addr, , scope_id);
if (ret)
goto out;

I wasn't aware of this "SOCK_RCU_FREE" so really thanks for this patch. 
Have been scratching my head over this for a while thinking about

generic provision at sk level to synchronize. This is much
better than adding the sync at upper layer.

It does have the tax for slowing down RDS for other kernel components
rcu sync but anyway this hole needs to be plugged so am fine to go
ahead with this change. Thanks for the patch.

FWIW,
Acked-by: Santosh Shilimkar 


Re: [Patch net] rds: mark bound socket with SOCK_RCU_FREE

2018-09-10 Thread Sowmini Varadhan
On (09/10/18 15:24), Cong Wang wrote:
> 
> When a rds sock is bound, it is inserted into the bind_hash_table
> which is protected by RCU. But when releasing rd sock, after it
> is removed from this hash table, it is freed immediately without
> respecting RCU grace period. This could cause some use-after-free
> as reported by syzbot.
> 

I have no objection to the change itself, but the syzbot failures
are caused for a very simple reason: we need synchronize_net()
in rds_release before we remove the rds_sock from the bind_hash_table.

I already pointed this out in 
  https://www.spinics.net/lists/netdev/msg475074.html

I think the objection to synchronize_net() is that it can cause
perf issues (I'm told that rds_release() has been known to be held
up by other threads in rcu critical sections?) but I personally
dont see any other alternative to this (other than going back
to rwlock, instead of rcu)

--Sowmini