Re:Re: Re:Re: Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
At 2017-08-10 05:00:19, "Cong Wang"wrote: >On Wed, Aug 9, 2017 at 12:17 AM, Gao Feng wrote: >> Hi Cong, >> >> Actually I have one question about the SOCK_RCU_FREE. >> I don't think it could resolve the issue you raised even though it exists >> really. >> >> I checked the SOCK_RCU_FREE, it just defer the __sk_destruct after one rcu >> period. >> But when it performs, someone still could find this sock by callid during >> the del_chan period and it may still deference the sock which may freed soon. >> >> The right flow should be following. >> del_chan() >> wait a rcu period >> sock_put() It is safe that someone gets the sock because it >> already hold sock refcnt. >> >> When using SOCK_RCU_FREE, its flow would be following. >> wait a rcu period >> del_chan() >> free the sock directly no sock refcnt check again. >> Because the del_chan happens after rcu wait, not before, so it isn't helpful >> with SOCK_RCU_FREE. > > >Yes, good point! With SOCK_RCU_FREE the sock_hold() should >not be needed. For RCU, unpublish should indeed happen before >grace period. Sorry, I couldn't understand why sock_hold() isn't necessary with SOCK_RCU_FREE. When lookup_chan finds the sock, it would return and reference it later. If no refcnt, how to protect the sock ? Best Regards Feng > > >> >> I don't know if I misunderstand the SOCK_RCU_FREE usage. >> >> But it is a good news that the del_chan is only invoked in pptp_release >> actually and it would wait a rcu period before sock_put. >> > >Looking at the code again, the reader lookup_chan() is actually >invoked in BH context, but neither add_chan() nor del_chan() >actually disables BH...
Re:Re: Re: Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
At 2017-08-10 02:18:44, "Cong Wang"wrote: >On Tue, Aug 8, 2017 at 10:13 PM, Gao Feng wrote: >> Maybe I didn't show my explanation clearly. >> I think it won't happen as I mentioned in the last email. >> Because the pptp_release invokes the synchronize_rcu to make sure it, and >> actually there is no one which would invoke del_chan except pptp_release. >> It is guaranteed by that the pptp_release doesn't put the sock refcnt until >> complete all cleanup include marking sk_state as PPPOX_DEAD. >> >> In other words, even though the pptp_release is not the last user of this >> sock, the other one wouldn't invoke del_chan in pptp_sock_destruct. >> Because the condition "!(sk->sk_state & PPPOX_DEAD)" must be false. > >Only if sock->sk is always non-NULL for pptp_release(), which >is what I am not sure. If you look at other ->release(), similar checks >are there too, so not just for pptp. Yes. It seems only if the release() is invoked twice, the sock->sk would be NULL. But I don't find there is any case which could cause it. > >> >> As summary, the del_chan and pppox_unbind_sock in pptp_sock_destruct are >> unnecessary. >> And it even brings confusing. > >Sorry, I can't draw any conclusion for this. Thank you all the same, and I have learn a lot from you :) Wish someone which is familiar with these codes could give more details and explanations. Best Regards Feng
Re: Re:Re: Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
On Wed, Aug 9, 2017 at 12:17 AM, Gao Fengwrote: > Hi Cong, > > Actually I have one question about the SOCK_RCU_FREE. > I don't think it could resolve the issue you raised even though it exists > really. > > I checked the SOCK_RCU_FREE, it just defer the __sk_destruct after one rcu > period. > But when it performs, someone still could find this sock by callid during the > del_chan period and it may still deference the sock which may freed soon. > > The right flow should be following. > del_chan() > wait a rcu period > sock_put() It is safe that someone gets the sock because it > already hold sock refcnt. > > When using SOCK_RCU_FREE, its flow would be following. > wait a rcu period > del_chan() > free the sock directly no sock refcnt check again. > Because the del_chan happens after rcu wait, not before, so it isn't helpful > with SOCK_RCU_FREE. Yes, good point! With SOCK_RCU_FREE the sock_hold() should not be needed. For RCU, unpublish should indeed happen before grace period. > > I don't know if I misunderstand the SOCK_RCU_FREE usage. > > But it is a good news that the del_chan is only invoked in pptp_release > actually and it would wait a rcu period before sock_put. > Looking at the code again, the reader lookup_chan() is actually invoked in BH context, but neither add_chan() nor del_chan() actually disables BH...
Re: Re: Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
On Tue, Aug 8, 2017 at 10:13 PM, Gao Fengwrote: > Maybe I didn't show my explanation clearly. > I think it won't happen as I mentioned in the last email. > Because the pptp_release invokes the synchronize_rcu to make sure it, and > actually there is no one which would invoke del_chan except pptp_release. > It is guaranteed by that the pptp_release doesn't put the sock refcnt until > complete all cleanup include marking sk_state as PPPOX_DEAD. > > In other words, even though the pptp_release is not the last user of this > sock, the other one wouldn't invoke del_chan in pptp_sock_destruct. > Because the condition "!(sk->sk_state & PPPOX_DEAD)" must be false. Only if sock->sk is always non-NULL for pptp_release(), which is what I am not sure. If you look at other ->release(), similar checks are there too, so not just for pptp. > > As summary, the del_chan and pppox_unbind_sock in pptp_sock_destruct are > unnecessary. > And it even brings confusing. Sorry, I can't draw any conclusion for this.
Re:Re:Re: Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
At 2017-08-09 13:13:53, "Gao Feng"wrote: >At 2017-08-09 03:45:53, "Cong Wang" wrote: >>On Mon, Aug 7, 2017 at 6:10 PM, Gao Feng wrote: >>> >>> Sorry, I don't get you clearly. Why the sock_hold() isn't helpful? >> >>I already told you, the dereference happends before sock_hold(). >> >>sock = rcu_dereference(callid_sock[call_id]); >>if (sock) { >>opt = >proto.pptp; >>if (opt->dst_addr.sin_addr.s_addr != s_addr) <=== HERE >>sock = NULL; >>else >>sock_hold(sk_pppox(sock)); >>} >> >>If we don't wait for readers properly, sock could be freed at the >>same time when deference it. > >Maybe I didn't show my explanation clearly. >I think it won't happen as I mentioned in the last email. >Because the pptp_release invokes the synchronize_rcu to make sure it, and >actually there is no one which would invoke del_chan except pptp_release. >It is guaranteed by that the pptp_release doesn't put the sock refcnt until >complete all cleanup include marking sk_state as PPPOX_DEAD. > >In other words, even though the pptp_release is not the last user of this >sock, the other one wouldn't invoke del_chan in pptp_sock_destruct. >Because the condition "!(sk->sk_state & PPPOX_DEAD)" must be false. > >As summary, the del_chan and pppox_unbind_sock in pptp_sock_destruct are >unnecessary. >And it even brings confusing. > >Best Regards >Feng > >> >>> The pptp_release invokes synchronize_rcu after del_chan, it could make sure >>> the others has increased the sock refcnt if necessary >>> and the lookup is over. >>> There is no one could get the sock after synchronize_rcu in pptp_release. >> >> >>If this were true, then this code in pptp_sock_destruct() >>would be unneeded: >> >>if (!(sk->sk_state & PPPOX_DEAD)) { >>del_chan(pppox_sk(sk)); >>pppox_unbind_sock(sk); >>} >> >> >>> >>> >>> But I think about another problem. >>> It seems the pptp_sock_destruct should not invoke del_chan and >>> pppox_unbind_sock. >>> Because when the sock refcnt is 0, the pptp_release must have be invoked >>> already. >>> >> >> >>I don't know. Looks like sock_orphan() is only called >>in pptp_release(), but I am not sure if there is a case >>we call sock destructor before release. >> >>Also note, this socket is very special, it doesn't support >>poll(), sendmsg() or recvmsg().. > > > Hi Cong, Actually I have one question about the SOCK_RCU_FREE. I don't think it could resolve the issue you raised even though it exists really. I checked the SOCK_RCU_FREE, it just defer the __sk_destruct after one rcu period. But when it performs, someone still could find this sock by callid during the del_chan period and it may still deference the sock which may freed soon. The right flow should be following. del_chan() wait a rcu period sock_put() It is safe that someone gets the sock because it already hold sock refcnt. When using SOCK_RCU_FREE, its flow would be following. wait a rcu period del_chan() free the sock directly no sock refcnt check again. Because the del_chan happens after rcu wait, not before, so it isn't helpful with SOCK_RCU_FREE. I don't know if I misunderstand the SOCK_RCU_FREE usage. But it is a good news that the del_chan is only invoked in pptp_release actually and it would wait a rcu period before sock_put. Best Regards Feng
Re:Re: Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
At 2017-08-09 03:45:53, "Cong Wang"wrote: >On Mon, Aug 7, 2017 at 6:10 PM, Gao Feng wrote: >> >> Sorry, I don't get you clearly. Why the sock_hold() isn't helpful? > >I already told you, the dereference happends before sock_hold(). > >sock = rcu_dereference(callid_sock[call_id]); >if (sock) { >opt = >proto.pptp; >if (opt->dst_addr.sin_addr.s_addr != s_addr) <=== HERE >sock = NULL; >else >sock_hold(sk_pppox(sock)); >} > >If we don't wait for readers properly, sock could be freed at the >same time when deference it. Maybe I didn't show my explanation clearly. I think it won't happen as I mentioned in the last email. Because the pptp_release invokes the synchronize_rcu to make sure it, and actually there is no one which would invoke del_chan except pptp_release. It is guaranteed by that the pptp_release doesn't put the sock refcnt until complete all cleanup include marking sk_state as PPPOX_DEAD. In other words, even though the pptp_release is not the last user of this sock, the other one wouldn't invoke del_chan in pptp_sock_destruct. Because the condition "!(sk->sk_state & PPPOX_DEAD)" must be false. As summary, the del_chan and pppox_unbind_sock in pptp_sock_destruct are unnecessary. And it even brings confusing. Best Regards Feng > >> The pptp_release invokes synchronize_rcu after del_chan, it could make sure >> the others has increased the sock refcnt if necessary >> and the lookup is over. >> There is no one could get the sock after synchronize_rcu in pptp_release. > > >If this were true, then this code in pptp_sock_destruct() >would be unneeded: > >if (!(sk->sk_state & PPPOX_DEAD)) { >del_chan(pppox_sk(sk)); >pppox_unbind_sock(sk); >} > > >> >> >> But I think about another problem. >> It seems the pptp_sock_destruct should not invoke del_chan and >> pppox_unbind_sock. >> Because when the sock refcnt is 0, the pptp_release must have be invoked >> already. >> > > >I don't know. Looks like sock_orphan() is only called >in pptp_release(), but I am not sure if there is a case >we call sock destructor before release. > >Also note, this socket is very special, it doesn't support >poll(), sendmsg() or recvmsg()..
Re: Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
On Mon, Aug 7, 2017 at 6:10 PM, Gao Fengwrote: > > Sorry, I don't get you clearly. Why the sock_hold() isn't helpful? I already told you, the dereference happends before sock_hold(). sock = rcu_dereference(callid_sock[call_id]); if (sock) { opt = >proto.pptp; if (opt->dst_addr.sin_addr.s_addr != s_addr) <=== HERE sock = NULL; else sock_hold(sk_pppox(sock)); } If we don't wait for readers properly, sock could be freed at the same time when deference it. > The pptp_release invokes synchronize_rcu after del_chan, it could make sure > the others has increased the sock refcnt if necessary > and the lookup is over. > There is no one could get the sock after synchronize_rcu in pptp_release. If this were true, then this code in pptp_sock_destruct() would be unneeded: if (!(sk->sk_state & PPPOX_DEAD)) { del_chan(pppox_sk(sk)); pppox_unbind_sock(sk); } > > > But I think about another problem. > It seems the pptp_sock_destruct should not invoke del_chan and > pppox_unbind_sock. > Because when the sock refcnt is 0, the pptp_release must have be invoked > already. > I don't know. Looks like sock_orphan() is only called in pptp_release(), but I am not sure if there is a case we call sock destructor before release. Also note, this socket is very special, it doesn't support poll(), sendmsg() or recvmsg()..
Re:Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
At 2017-08-08 01:17:02, "Cong Wang"wrote: >On Sun, Aug 6, 2017 at 6:32 PM, Gao Feng wrote: >> I think the RCU should be supposed to avoid the race between del_chan and >> lookup_chan. > >More precisely, it is callid_sock which is protected by RCU. > >Unless I miss any other code path, pptp_exit_module() is >problematic too, I don't think it can just vfree() the whole thing. > > >> The synchronize_rcu could make sure if there was one which calls lookup_chan >> in this period, it would be finished and the sock refcnt is increased if >> necessary. >> >> So I think it is ok to invoke sock_put directly without SOCK_RCU_FREE, >> because the lookup_chan caller has already hold the sock refcnt, >> > Hi Cong, I just thought about this issue last night, then I get your this email this morning. >If you mean the sock_hold() inside lookup_chan(), no, >it doesn't help because we already dereference the sock >before it. > Sorry, I don't get you clearly. Why the sock_hold() isn't helpful? The pptp_release invokes synchronize_rcu after del_chan, it could make sure the others has increased the sock refcnt if necessary and the lookup is over. There is no one could get the sock after synchronize_rcu in pptp_release. But I think about another problem. It seems the pptp_sock_destruct should not invoke del_chan and pppox_unbind_sock. Because when the sock refcnt is 0, the pptp_release must have be invoked already. There are two cases totally. 1. when pptp_release invokes sock_put, the refcnt is 0. The del_chan and pppox_unbind_sock are invoked. 2. when pptp_release invokes sock_put, the refcnt is not 0. It means someone holds the sock during the period pptp_release invokes del_chan. Then someone invokes sock_put and the sock refcnt reach 0, it would invoke sk_free and invokes pptp_sock_destruct. So it is unnecessary to invoke del_chan and pppox_unbind_sock again. And it would bring a race issue even if the pptp_sock_destruct invoked del_chan. If so, I would send another patch for it. >Also, lookup_chan_dst() does not have a refcnt, I don't >find any code preventing it deref'ing other sock in callid_sock >than the calling one. Sorry, the last email is html format, not text. So I send it with text format again. Best Regards Feng
Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
On Mon, Aug 7, 2017 at 10:17 AM, Cong Wangwrote: > Unless I miss any other code path, pptp_exit_module() is > problematic too, I don't think it can just vfree() the whole thing. > This path should be fine, because: 1. sock holds a refcnt to module via proto->owner 2. gre_del_protocol() already waits for readers
Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
On Sun, Aug 6, 2017 at 6:32 PM, Gao Fengwrote: > I think the RCU should be supposed to avoid the race between del_chan and > lookup_chan. More precisely, it is callid_sock which is protected by RCU. Unless I miss any other code path, pptp_exit_module() is problematic too, I don't think it can just vfree() the whole thing. > The synchronize_rcu could make sure if there was one which calls lookup_chan > in this period, it would be finished and the sock refcnt is increased if > necessary. > > So I think it is ok to invoke sock_put directly without SOCK_RCU_FREE, > because the lookup_chan caller has already hold the sock refcnt, > If you mean the sock_hold() inside lookup_chan(), no, it doesn't help because we already dereference the sock before it. Also, lookup_chan_dst() does not have a refcnt, I don't find any code preventing it deref'ing other sock in callid_sock than the calling one.
Re:Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
At 2017-08-03 01:13:36, "Cong Wang"wrote: >Hi, Gao > >On Tue, Aug 1, 2017 at 1:39 PM, Cong Wang wrote: >> From my understanding, this RCU is supposed to protect the pppox_sock >> pointers in 'callid_sock' which could be NULL'ed in del_chan(). And the >> pppox_sock is freed when the last refcnt is gone, that is, when sock >> dctor is called. pptp_release() is ONLY called when the fd in user-space >> is gone, not necessarily the last refcnt. Hi Cong, I am sorry to reply you so late, because I took a short trip recently, and didn't check my emails. I think the RCU should be supposed to avoid the race between del_chan and lookup_chan. The synchronize_rcu could make sure if there was one which calls lookup_chan in this period, it would be finished and the sock refcnt is increased if necessary. So I think it is ok to invoke sock_put directly without SOCK_RCU_FREE, because the lookup_chan caller has already hold the sock refcnt, Best Regards Feng > >Your commit is probably not the right fix. Can you try the following fix? > >diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c >index 6dde9a0cfe76..e75bb95c107f 100644 >--- a/drivers/net/ppp/pptp.c >+++ b/drivers/net/ppp/pptp.c >@@ -519,7 +519,6 @@ static int pptp_release(struct socket *sock) > >po = pppox_sk(sk); >del_chan(po); >- synchronize_rcu(); > >pppox_unbind_sock(sk); >sk->sk_state = PPPOX_DEAD; >@@ -564,6 +563,7 @@ static int pptp_create(struct net *net, struct >socket *sock, int kern) >sk->sk_family = PF_PPPOX; >sk->sk_protocol= PX_PROTO_PPTP; >sk->sk_destruct= pptp_sock_destruct; >+ sock_set_flag(sk, SOCK_RCU_FREE); > >po = pppox_sk(sk); >opt = >proto.pptp;
Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
Hi, Gao On Tue, Aug 1, 2017 at 1:39 PM, Cong Wangwrote: > From my understanding, this RCU is supposed to protect the pppox_sock > pointers in 'callid_sock' which could be NULL'ed in del_chan(). And the > pppox_sock is freed when the last refcnt is gone, that is, when sock > dctor is called. pptp_release() is ONLY called when the fd in user-space > is gone, not necessarily the last refcnt. Your commit is probably not the right fix. Can you try the following fix? diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c index 6dde9a0cfe76..e75bb95c107f 100644 --- a/drivers/net/ppp/pptp.c +++ b/drivers/net/ppp/pptp.c @@ -519,7 +519,6 @@ static int pptp_release(struct socket *sock) po = pppox_sk(sk); del_chan(po); - synchronize_rcu(); pppox_unbind_sock(sk); sk->sk_state = PPPOX_DEAD; @@ -564,6 +563,7 @@ static int pptp_create(struct net *net, struct socket *sock, int kern) sk->sk_family = PF_PPPOX; sk->sk_protocol= PX_PROTO_PPTP; sk->sk_destruct= pptp_sock_destruct; + sock_set_flag(sk, SOCK_RCU_FREE); po = pppox_sk(sk); opt = >proto.pptp;
Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
On Mon, Jul 31, 2017 at 3:07 AM,wrote: > From: Gao Feng > > The PPTP set the pptp_sock_destruct as the sock's sk_destruct, it would > trigger this bug when __sk_free is invoked in atomic context, because of > the call path pptp_sock_destruct->del_chan->synchronize_rcu. > > Now move the synchronize_rcu to pptp_release from del_chan. This is the > only one case which would free the sock and need the synchronize_rcu. I don't understand the last part. >From my understanding, this RCU is supposed to protect the pppox_sock pointers in 'callid_sock' which could be NULL'ed in del_chan(). And the pppox_sock is freed when the last refcnt is gone, that is, when sock dctor is called. pptp_release() is ONLY called when the fd in user-space is gone, not necessarily the last refcnt.
Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
From: gfree.w...@vip.163.com Date: Mon, 31 Jul 2017 18:07:38 +0800 > From: Gao Feng> > The PPTP set the pptp_sock_destruct as the sock's sk_destruct, it would > trigger this bug when __sk_free is invoked in atomic context, because of > the call path pptp_sock_destruct->del_chan->synchronize_rcu. > > Now move the synchronize_rcu to pptp_release from del_chan. This is the > only one case which would free the sock and need the synchronize_rcu. > > The following is the panic I met with kernel 3.3.8, but this issue should > exist in current kernel too according to the codes. ... > Signed-off-by: Gao Feng Applied, thanks.
[PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan
From: Gao FengThe PPTP set the pptp_sock_destruct as the sock's sk_destruct, it would trigger this bug when __sk_free is invoked in atomic context, because of the call path pptp_sock_destruct->del_chan->synchronize_rcu. Now move the synchronize_rcu to pptp_release from del_chan. This is the only one case which would free the sock and need the synchronize_rcu. The following is the panic I met with kernel 3.3.8, but this issue should exist in current kernel too according to the codes. BUG: scheduling while atomic __schedule_bug+0x5e/0x64 __schedule+0x55/0x580 ? ppp_unregister_channel+0x1cd5/0x1de0 [ppp_generic] ? dev_hard_start_xmit+0x423/0x530 ? sch_direct_xmit+0x73/0x170 __cond_resched+0x16/0x30 _cond_resched+0x22/0x30 wait_for_common+0x18/0x110 ? call_rcu_bh+0x10/0x10 wait_for_completion+0x12/0x20 wait_rcu_gp+0x34/0x40 ? wait_rcu_gp+0x40/0x40 synchronize_sched+0x1e/0x20 0xf8417298 0xf8417484 ? sock_queue_rcv_skb+0x109/0x130 __sk_free+0x16/0x110 ? udp_queue_rcv_skb+0x1f2/0x290 sk_free+0x16/0x20 __udp4_lib_rcv+0x3b8/0x650 Signed-off-by: Gao Feng --- drivers/net/ppp/pptp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c index eac499c..6dde9a0 100644 --- a/drivers/net/ppp/pptp.c +++ b/drivers/net/ppp/pptp.c @@ -131,7 +131,6 @@ static void del_chan(struct pppox_sock *sock) clear_bit(sock->proto.pptp.src_addr.call_id, callid_bitmap); RCU_INIT_POINTER(callid_sock[sock->proto.pptp.src_addr.call_id], NULL); spin_unlock(_lock); - synchronize_rcu(); } static int pptp_xmit(struct ppp_channel *chan, struct sk_buff *skb) @@ -520,6 +519,7 @@ static int pptp_release(struct socket *sock) po = pppox_sk(sk); del_chan(po); + synchronize_rcu(); pppox_unbind_sock(sk); sk->sk_state = PPPOX_DEAD; -- 1.9.1