Re: pipex(4): use reference counters for `ifnet'
On 27/06/20(Sat) 17:58, Vitaliy Makkoveev wrote: > > [...] > > Look at r1.329 of net/if.c. Prior to this change if_detach_queues() was > > used to free all mbufs when an interface was removed. Now lazy freeing > > is used everytime if_get(9) rerturns NULL. > > > > This is possible because we store an index and not a pointer directly in > > the mbuf. > > > > The advantage of storing a session pointer in `ph_cookie' is that no > > lookup is required in pipexintr(), right? Maybe we could save a ID > > instead and do a lookup. How big can be the `pipex_session_list'? > > > > It's unlimited. In pppac(4) case you create the only one interface and > you can share it between the count of sessions you wish. In my practice > I had machines with 800+ active ppp interfaces in 2005. We can have > dosens of cores and hundreds gigs of ram now. How big can be real > count of active ppp interfaces on VPN provider's NAS? With that number of items a linear list might not be the best fit if we decide to stop using pointers. So if we want to use a "lookup" a different data structure might be more appropriate. > I looked at r1.328 of net/if.c. > Imagine we have a lot of connected clients with high traffic. One on them > starts connect/disconnect games. It's not malicious hacker, just mobile > phone in area with low signal. You need: > > 1. block pipexintr() (and netisr too). > 2. block insertion to queues from concurrent threads. > 3. Walk through very loaded queues and compare. And most or may be all >packets are foreign. > 4. Repeat it every time you lost connection. > > Yes, now it's all serealized. And pipexintr() is already blocked while > we do session destruction(). But what is the reason to make your future > life harder? pipex(4) session already has pointer to it's relared > `ifnet'. `ifnet' already has reference counters. Why don't use them? We decided as a team to not use reference counting because they are hard to debug. Using if_get(9)/if_put(9) allows us to check that our changes where correct with static analysis tools. If we start using reference counting for ifps in pipex(4) they we now have an exception in the network stack. That means the techniques applied are not coherent and it makes it harder to work across a huge amount of code. But I don't care, if there's a consensus that it is the way to go, then go for it. > In the way I suggest to use refcounters for pipex(4) session you don't > need to block your packet processing. You don't need to do searchs. You > just need to be sure the `ifnet' you has is still alive. You already has > mechanics for that. Why don't use this? The same can be said with the actual machinery. Using a if_get(9)-like approach doesn't block packet processing, you don't need to to search, there's no reference counting bug that can lead to deadlock, it is consistent with the existing network stack and there are already example of implementation like: if_get() and rtable_get(). > You don't like if_ref() to be > used outside net/if.c? Ok, what's wrong with referenced pointers > obtained by if_get(9)? While session is alive it *uses* it's `ifnet'. It > uses `ifnet' all lifetime *not* only while output. And we should be > shure we didn't destroy `ifnet' while someone uses it. What's wrong with > referenced pointers? If there's a bug, if the reference counting is messed up, it is hard to find where that happened. It is like a use-after-free, where the crash happens is not where the bug is. If you see a leak you don't know where the reference drop is missing. Sure they are many ways to deal with that, but since we did not embrace them. I'm not sure that it makes sense to change direction or introduce a difference now. > The way I wish to go used in file(9). May be it is > totally wrong and we need global in-kernel descriptor table where `fp' > will be referenced by index too :) ? That's not what I'm saying. if_get(9) already exists and is already used in a certain way, I don't see why not embrace that and keep the network stack coherent. But once again, I don't want to block anyone in its development, if you and others agree that's the way to go, then let's go this way.
Re: pipex(4): use reference counters for `ifnet'
On Sat, Jun 27, 2020 at 12:41:29PM +0200, Martin Pieuchot wrote: > On 27/06/20(Sat) 01:02, Vitaliy Makkoveev wrote: > > On Fri, Jun 26, 2020 at 09:15:38PM +0200, Martin Pieuchot wrote: > > > On 26/06/20(Fri) 17:53, Vitaliy Makkoveev wrote: > > > > On Fri, Jun 26, 2020 at 02:29:03PM +0200, Martin Pieuchot wrote: > > > > > On 26/06/20(Fri) 12:35, Vitaliy Makkoveev wrote: > > > > > > On Fri, Jun 26, 2020 at 10:23:42AM +0200, Martin Pieuchot wrote: > > > > > > > On 25/06/20(Thu) 19:56, Vitaliy Makkoveev wrote: > > > > > > > > Updated diff. > > > > > > > > > > > > > > > > OpenBSD uses 16 bit counter for allocate interface indexes. So > > > > > > > > we can't > > > > > > > > store index in session and be sure if_get(9) returned `ifnet' > > > > > > > > is our > > > > > > > > original `ifnet'. > > > > > > > > > > > > > > Why not? The point of if_get(9) is to be sure. If that doesn't > > > > > > > work > > > > > > > for whatever reason then the if_get(9) interface has to be fixed. > > > > > > > Which > > > > > > > case doesn't work for you? Do you have a reproducer? > > > > > > > > > > > > > > How does sessions stay around if their corresponding interface is > > > > > > > destroyed? > > > > > > > > > > > > We have `pipexinq' and `pipexoutq' which can store pointers to > > > > > > session. > > > > > > pipexintr() process these queues. pipexintr() and > > > > > > pipex_destroy_session() are *always* different context. This mean we > > > > > > *can't* free pipex(4) session without be sure there is no reference > > > > > > to > > > > > > this session in `pipexinq' or `pipexoutq'. Elsewhere this will > > > > > > cause use > > > > > > afret free issue. Look please at net/pipex.c:846. The way pppx(4) > > > > > > destroy sessions is wrong. While pppac(4) destroys sessions by > > > > > > pipex_iface_fini() it's also wrong. Because we don't check > > > > > > `pipexinq' > > > > > > and `pipexoutq' state. I'am said it again and again. > > > > > > > > > > I understand. Why is it a problem? Using reference counting the way > > > > > you're suggesting is *one* possible solution to a problem we don't > > > > > fully > > > > > understand. What are we trying to achieve? Which problem are we > > > > > trying > > > > > to solve? > > > > > > > > Sorry, may be I misunderstand something. > > > > > > > > `pipexoutq' case: > > > > > > > > 1. pppac_start() calls pipex_output() > > > > 2. pipex_output() calls pipex_ip_output() > > > > 3. pipex_ip_output() calls pipex_ppp_enqueue() > > > > 4. pipex_ppp_enqueue() calls schednetisr() which is task_add() > > > > > > > > `pipexinq' cases: > > > > > > > > 1.1. ether_input() calls pipex_pppoe_input() > > > > 1.2. gre_input() calls gre_input_1() > > > > gre_input_1() calls pipex_pptp_input() > > > > 1.3. udp_input() calls pipex_l2tp_input() > > > > > > > > 2. pipex_{pppoe,pptp,l2tp}_input() calls pipex_common_input() > > > > 3. pipex_common_input() calls schednetisr() which is task_add() > > > > > > > > task_add(9) just schedules the execution of the work specified by `tq'. > > > > So we can do pipex_destroy_session() * between * schednetisr() and > > > > pipexintr(). And we can do this right * now *, with our current locking. > > > > And this is the problem I'am trying to solve. > > > > > > > > My apologies if I'am wrong above. Please point me where I'am wrong. > > > > > > > > Also before pipex_{pppoe,pptp,l2tp}_input() we call corresponding > > > > pipex_{pptp,l2tp}_lookup_session() to obtain pointer to pipex(4) > > > > session. We should be shure `session' is still walid between > > > > pipex_*_lookup() and pipex_*_input(). It's not required now, but will be > > > > required in future. > > > > > > Why not iterate over the queues and garbage collect the sessions that > > > are about to be removed? That's what the network stack was doing with > > > mbuf queues prior to if_get(9) when interfaces where destroyed. > > > > > > > Do you mean net/if.c:1185 and below? This is the queues associated with > > this `ifp'. But for pipex(4) we should go through all mbufs associated > > with pipex(4). This can be heavy if we have hundreds of sessions. Also > > this would work until session destruction and `pr_input' are serialized. > > > > Point me please the line in source to see if I'am wrong about `ifnet's > > mbuf queues claninig. > > Look at r1.329 of net/if.c. Prior to this change if_detach_queues() was > used to free all mbufs when an interface was removed. Now lazy freeing > is used everytime if_get(9) rerturns NULL. > > This is possible because we store an index and not a pointer directly in > the mbuf. > > The advantage of storing a session pointer in `ph_cookie' is that no > lookup is required in pipexintr(), right? Maybe we could save a ID > instead and do a lookup. How big can be the `pipex_session_list'? > It's unlimited. In pppac(4) case you create the only one interface and you can share it between the count of sessions you wish. In my prac
Re: pipex(4): use reference counters for `ifnet'
On 27/06/20(Sat) 01:02, Vitaliy Makkoveev wrote: > On Fri, Jun 26, 2020 at 09:15:38PM +0200, Martin Pieuchot wrote: > > On 26/06/20(Fri) 17:53, Vitaliy Makkoveev wrote: > > > On Fri, Jun 26, 2020 at 02:29:03PM +0200, Martin Pieuchot wrote: > > > > On 26/06/20(Fri) 12:35, Vitaliy Makkoveev wrote: > > > > > On Fri, Jun 26, 2020 at 10:23:42AM +0200, Martin Pieuchot wrote: > > > > > > On 25/06/20(Thu) 19:56, Vitaliy Makkoveev wrote: > > > > > > > Updated diff. > > > > > > > > > > > > > > OpenBSD uses 16 bit counter for allocate interface indexes. So we > > > > > > > can't > > > > > > > store index in session and be sure if_get(9) returned `ifnet' is > > > > > > > our > > > > > > > original `ifnet'. > > > > > > > > > > > > Why not? The point of if_get(9) is to be sure. If that doesn't > > > > > > work > > > > > > for whatever reason then the if_get(9) interface has to be fixed. > > > > > > Which > > > > > > case doesn't work for you? Do you have a reproducer? > > > > > > > > > > > > How does sessions stay around if their corresponding interface is > > > > > > destroyed? > > > > > > > > > > We have `pipexinq' and `pipexoutq' which can store pointers to > > > > > session. > > > > > pipexintr() process these queues. pipexintr() and > > > > > pipex_destroy_session() are *always* different context. This mean we > > > > > *can't* free pipex(4) session without be sure there is no reference to > > > > > this session in `pipexinq' or `pipexoutq'. Elsewhere this will cause > > > > > use > > > > > afret free issue. Look please at net/pipex.c:846. The way pppx(4) > > > > > destroy sessions is wrong. While pppac(4) destroys sessions by > > > > > pipex_iface_fini() it's also wrong. Because we don't check `pipexinq' > > > > > and `pipexoutq' state. I'am said it again and again. > > > > > > > > I understand. Why is it a problem? Using reference counting the way > > > > you're suggesting is *one* possible solution to a problem we don't fully > > > > understand. What are we trying to achieve? Which problem are we trying > > > > to solve? > > > > > > Sorry, may be I misunderstand something. > > > > > > `pipexoutq' case: > > > > > > 1. pppac_start() calls pipex_output() > > > 2. pipex_output() calls pipex_ip_output() > > > 3. pipex_ip_output() calls pipex_ppp_enqueue() > > > 4. pipex_ppp_enqueue() calls schednetisr() which is task_add() > > > > > > `pipexinq' cases: > > > > > > 1.1. ether_input() calls pipex_pppoe_input() > > > 1.2. gre_input() calls gre_input_1() > > > gre_input_1() calls pipex_pptp_input() > > > 1.3. udp_input() calls pipex_l2tp_input() > > > > > > 2. pipex_{pppoe,pptp,l2tp}_input() calls pipex_common_input() > > > 3. pipex_common_input() calls schednetisr() which is task_add() > > > > > > task_add(9) just schedules the execution of the work specified by `tq'. > > > So we can do pipex_destroy_session() * between * schednetisr() and > > > pipexintr(). And we can do this right * now *, with our current locking. > > > And this is the problem I'am trying to solve. > > > > > > My apologies if I'am wrong above. Please point me where I'am wrong. > > > > > > Also before pipex_{pppoe,pptp,l2tp}_input() we call corresponding > > > pipex_{pptp,l2tp}_lookup_session() to obtain pointer to pipex(4) > > > session. We should be shure `session' is still walid between > > > pipex_*_lookup() and pipex_*_input(). It's not required now, but will be > > > required in future. > > > > Why not iterate over the queues and garbage collect the sessions that > > are about to be removed? That's what the network stack was doing with > > mbuf queues prior to if_get(9) when interfaces where destroyed. > > > > Do you mean net/if.c:1185 and below? This is the queues associated with > this `ifp'. But for pipex(4) we should go through all mbufs associated > with pipex(4). This can be heavy if we have hundreds of sessions. Also > this would work until session destruction and `pr_input' are serialized. > > Point me please the line in source to see if I'am wrong about `ifnet's > mbuf queues claninig. Look at r1.329 of net/if.c. Prior to this change if_detach_queues() was used to free all mbufs when an interface was removed. Now lazy freeing is used everytime if_get(9) rerturns NULL. This is possible because we store an index and not a pointer directly in the mbuf. The advantage of storing a session pointer in `ph_cookie' is that no lookup is required in pipexintr(), right? Maybe we could save a ID instead and do a lookup. How big can be the `pipex_session_list'?
Re: pipex(4): use reference counters for `ifnet'
On Fri, Jun 26, 2020 at 09:15:38PM +0200, Martin Pieuchot wrote: > On 26/06/20(Fri) 17:53, Vitaliy Makkoveev wrote: > > On Fri, Jun 26, 2020 at 02:29:03PM +0200, Martin Pieuchot wrote: > > > On 26/06/20(Fri) 12:35, Vitaliy Makkoveev wrote: > > > > On Fri, Jun 26, 2020 at 10:23:42AM +0200, Martin Pieuchot wrote: > > > > > On 25/06/20(Thu) 19:56, Vitaliy Makkoveev wrote: > > > > > > Updated diff. > > > > > > > > > > > > OpenBSD uses 16 bit counter for allocate interface indexes. So we > > > > > > can't > > > > > > store index in session and be sure if_get(9) returned `ifnet' is our > > > > > > original `ifnet'. > > > > > > > > > > Why not? The point of if_get(9) is to be sure. If that doesn't work > > > > > for whatever reason then the if_get(9) interface has to be fixed. > > > > > Which > > > > > case doesn't work for you? Do you have a reproducer? > > > > > > > > > > How does sessions stay around if their corresponding interface is > > > > > destroyed? > > > > > > > > We have `pipexinq' and `pipexoutq' which can store pointers to session. > > > > pipexintr() process these queues. pipexintr() and > > > > pipex_destroy_session() are *always* different context. This mean we > > > > *can't* free pipex(4) session without be sure there is no reference to > > > > this session in `pipexinq' or `pipexoutq'. Elsewhere this will cause use > > > > afret free issue. Look please at net/pipex.c:846. The way pppx(4) > > > > destroy sessions is wrong. While pppac(4) destroys sessions by > > > > pipex_iface_fini() it's also wrong. Because we don't check `pipexinq' > > > > and `pipexoutq' state. I'am said it again and again. > > > > > > I understand. Why is it a problem? Using reference counting the way > > > you're suggesting is *one* possible solution to a problem we don't fully > > > understand. What are we trying to achieve? Which problem are we trying > > > to solve? > > > > Sorry, may be I misunderstand something. > > > > `pipexoutq' case: > > > > 1. pppac_start() calls pipex_output() > > 2. pipex_output() calls pipex_ip_output() > > 3. pipex_ip_output() calls pipex_ppp_enqueue() > > 4. pipex_ppp_enqueue() calls schednetisr() which is task_add() > > > > `pipexinq' cases: > > > > 1.1. ether_input() calls pipex_pppoe_input() > > 1.2. gre_input() calls gre_input_1() > > gre_input_1() calls pipex_pptp_input() > > 1.3. udp_input() calls pipex_l2tp_input() > > > > 2. pipex_{pppoe,pptp,l2tp}_input() calls pipex_common_input() > > 3. pipex_common_input() calls schednetisr() which is task_add() > > > > task_add(9) just schedules the execution of the work specified by `tq'. > > So we can do pipex_destroy_session() * between * schednetisr() and > > pipexintr(). And we can do this right * now *, with our current locking. > > And this is the problem I'am trying to solve. > > > > My apologies if I'am wrong above. Please point me where I'am wrong. > > > > Also before pipex_{pppoe,pptp,l2tp}_input() we call corresponding > > pipex_{pptp,l2tp}_lookup_session() to obtain pointer to pipex(4) > > session. We should be shure `session' is still walid between > > pipex_*_lookup() and pipex_*_input(). It's not required now, but will be > > required in future. > > Why not iterate over the queues and garbage collect the sessions that > are about to be removed? That's what the network stack was doing with > mbuf queues prior to if_get(9) when interfaces where destroyed. > Do you mean net/if.c:1185 and below? This is the queues associated with this `ifp'. But for pipex(4) we should go through all mbufs associated with pipex(4). This can be heavy if we have hundreds of sessions. Also this would work until session destruction and `pr_input' are serialized. Point me please the line in source to see if I'am wrong about `ifnet's mbuf queues claninig.
Re: pipex(4): use reference counters for `ifnet'
On 26/06/20(Fri) 17:53, Vitaliy Makkoveev wrote: > On Fri, Jun 26, 2020 at 02:29:03PM +0200, Martin Pieuchot wrote: > > On 26/06/20(Fri) 12:35, Vitaliy Makkoveev wrote: > > > On Fri, Jun 26, 2020 at 10:23:42AM +0200, Martin Pieuchot wrote: > > > > On 25/06/20(Thu) 19:56, Vitaliy Makkoveev wrote: > > > > > Updated diff. > > > > > > > > > > OpenBSD uses 16 bit counter for allocate interface indexes. So we > > > > > can't > > > > > store index in session and be sure if_get(9) returned `ifnet' is our > > > > > original `ifnet'. > > > > > > > > Why not? The point of if_get(9) is to be sure. If that doesn't work > > > > for whatever reason then the if_get(9) interface has to be fixed. Which > > > > case doesn't work for you? Do you have a reproducer? > > > > > > > > How does sessions stay around if their corresponding interface is > > > > destroyed? > > > > > > We have `pipexinq' and `pipexoutq' which can store pointers to session. > > > pipexintr() process these queues. pipexintr() and > > > pipex_destroy_session() are *always* different context. This mean we > > > *can't* free pipex(4) session without be sure there is no reference to > > > this session in `pipexinq' or `pipexoutq'. Elsewhere this will cause use > > > afret free issue. Look please at net/pipex.c:846. The way pppx(4) > > > destroy sessions is wrong. While pppac(4) destroys sessions by > > > pipex_iface_fini() it's also wrong. Because we don't check `pipexinq' > > > and `pipexoutq' state. I'am said it again and again. > > > > I understand. Why is it a problem? Using reference counting the way > > you're suggesting is *one* possible solution to a problem we don't fully > > understand. What are we trying to achieve? Which problem are we trying > > to solve? > > Sorry, may be I misunderstand something. > > `pipexoutq' case: > > 1. pppac_start() calls pipex_output() > 2. pipex_output() calls pipex_ip_output() > 3. pipex_ip_output() calls pipex_ppp_enqueue() > 4. pipex_ppp_enqueue() calls schednetisr() which is task_add() > > `pipexinq' cases: > > 1.1. ether_input() calls pipex_pppoe_input() > 1.2. gre_input() calls gre_input_1() > gre_input_1() calls pipex_pptp_input() > 1.3. udp_input() calls pipex_l2tp_input() > > 2. pipex_{pppoe,pptp,l2tp}_input() calls pipex_common_input() > 3. pipex_common_input() calls schednetisr() which is task_add() > > task_add(9) just schedules the execution of the work specified by `tq'. > So we can do pipex_destroy_session() * between * schednetisr() and > pipexintr(). And we can do this right * now *, with our current locking. > And this is the problem I'am trying to solve. > > My apologies if I'am wrong above. Please point me where I'am wrong. > > Also before pipex_{pppoe,pptp,l2tp}_input() we call corresponding > pipex_{pptp,l2tp}_lookup_session() to obtain pointer to pipex(4) > session. We should be shure `session' is still walid between > pipex_*_lookup() and pipex_*_input(). It's not required now, but will be > required in future. Why not iterate over the queues and garbage collect the sessions that are about to be removed? That's what the network stack was doing with mbuf queues prior to if_get(9) when interfaces where destroyed.
Re: pipex(4): use reference counters for `ifnet'
On Fri, Jun 26, 2020 at 02:29:03PM +0200, Martin Pieuchot wrote: > On 26/06/20(Fri) 12:35, Vitaliy Makkoveev wrote: > > On Fri, Jun 26, 2020 at 10:23:42AM +0200, Martin Pieuchot wrote: > > > On 25/06/20(Thu) 19:56, Vitaliy Makkoveev wrote: > > > > Updated diff. > > > > > > > > OpenBSD uses 16 bit counter for allocate interface indexes. So we can't > > > > store index in session and be sure if_get(9) returned `ifnet' is our > > > > original `ifnet'. > > > > > > Why not? The point of if_get(9) is to be sure. If that doesn't work > > > for whatever reason then the if_get(9) interface has to be fixed. Which > > > case doesn't work for you? Do you have a reproducer? > > > > > > How does sessions stay around if their corresponding interface is > > > destroyed? > > > > We have `pipexinq' and `pipexoutq' which can store pointers to session. > > pipexintr() process these queues. pipexintr() and > > pipex_destroy_session() are *always* different context. This mean we > > *can't* free pipex(4) session without be sure there is no reference to > > this session in `pipexinq' or `pipexoutq'. Elsewhere this will cause use > > afret free issue. Look please at net/pipex.c:846. The way pppx(4) > > destroy sessions is wrong. While pppac(4) destroys sessions by > > pipex_iface_fini() it's also wrong. Because we don't check `pipexinq' > > and `pipexoutq' state. I'am said it again and again. > > I understand. Why is it a problem? Using reference counting the way > you're suggesting is *one* possible solution to a problem we don't fully > understand. What are we trying to achieve? Which problem are we trying > to solve? Sorry, may be I misunderstand something. `pipexoutq' case: 1. pppac_start() calls pipex_output() 2. pipex_output() calls pipex_ip_output() 3. pipex_ip_output() calls pipex_ppp_enqueue() 4. pipex_ppp_enqueue() calls schednetisr() which is task_add() `pipexinq' cases: 1.1. ether_input() calls pipex_pppoe_input() 1.2. gre_input() calls gre_input_1() gre_input_1() calls pipex_pptp_input() 1.3. udp_input() calls pipex_l2tp_input() 2. pipex_{pppoe,pptp,l2tp}_input() calls pipex_common_input() 3. pipex_common_input() calls schednetisr() which is task_add() task_add(9) just schedules the execution of the work specified by `tq'. So we can do pipex_destroy_session() * between * schednetisr() and pipexintr(). And we can do this right * now *, with our current locking. And this is the problem I'am trying to solve. My apologies if I'am wrong above. Please point me where I'am wrong. Also before pipex_{pppoe,pptp,l2tp}_input() we call corresponding pipex_{pptp,l2tp}_lookup_session() to obtain pointer to pipex(4) session. We should be shure `session' is still walid between pipex_*_lookup() and pipex_*_input(). It's not required now, but will be required in future. > Are we trying to move pipexintr() out of KERNEL_LOCK()? If so are we > trying to guarantee the memory pointed by `ph_cookie' representing a > session pointer is always valid without the KERNEL_LOCK()? > > If that's what we're trying to do, shouldn't we start with a baby-step > and trade the KERNEL_LOCK() for another lock first? I understand it is > not some sexy-mp-rcu-atomic-crazy-refcount-clever solution, but it is > simpler to do. > > Since all the packets (mbuf) are being processed with the NET_LOCK(). we > should be able to start trading the KERNEL_LOCK() by the NET_LOCK(), no? > Once that works and we've built knowledge about the existing architecture > we might consider some finer grain locking. > > Is there any downside to this suggestion? >
Re: pipex(4): use reference counters for `ifnet'
On Fri, Jun 26, 2020 at 10:23:42AM +0200, Martin Pieuchot wrote: > On 25/06/20(Thu) 19:56, Vitaliy Makkoveev wrote: > > Updated diff. > > > > OpenBSD uses 16 bit counter for allocate interface indexes. So we can't > > store index in session and be sure if_get(9) returned `ifnet' is our > > original `ifnet'. > > Why not? The point of if_get(9) is to be sure. If that doesn't work > for whatever reason then the if_get(9) interface has to be fixed. Which > case doesn't work for you? Do you have a reproducer? > > How does sessions stay around if their corresponding interface is > destroyed? We have `pipexinq' and `pipexoutq' which can store pointers to session. pipexintr() process these queues. pipexintr() and pipex_destroy_session() are *always* different context. This mean we *can't* free pipex(4) session without be sure there is no reference to this session in `pipexinq' or `pipexoutq'. Elsewhere this will cause use afret free issue. Look please at net/pipex.c:846. The way pppx(4) destroy sessions is wrong. While pppac(4) destroys sessions by pipex_iface_fini() it's also wrong. Because we don't check `pipexinq' and `pipexoutq' state. I'am said it again and again. Look at net/pipex.c:777 Static int pipex_ppp_enqueue(struct mbuf *m0, struct pipex_session *session, struct mbuf_queue *mq) { m0->m_pkthdr.ph_cookie = session; /* [1] */ /* XXX need to support other protocols */ m0->m_pkthdr.ph_ppp_proto = PPP_IP; if (mq_enqueue(mq, m0) != 0) /* [2] */ return (1); schednetisr(NETISR_PIPEX); /* [3] */ return (0); } and net/pipex.c:643 Static int pipex_destroy_session(struct pipex_session *session) { struct radix_node *rn; /* remove from radix tree and hash chain */ NET_ASSERT_LOCKED(); if (!in_nullhost(session->ip_address.sin_addr)) { rn = rn_delete(&session->ip_address, &session->ip_netmask, pipex_rd_head4, (struct radix_node *)session); KASSERT(rn != NULL); } pipex_unlink_session(session); pool_put(&pipex_session_pool, session); /* [4] */ return (0); } and net/pipex.c:720 void pipexintr(void) { struct pipex_session *pkt_session; u_int16_t proto; struct mbuf *m; struct mbuf_list ml; /* ppp output */ mq_delist(&pipexoutq, &ml); while ((m = ml_dequeue(&ml)) != NULL) { pkt_session = m->m_pkthdr.ph_cookie; /* [5] */ /* ... */ pipex_ppp_output(m, pkt_session, proto); /* [7] */ } /* ... */ mq_delist(&pipexinq, &ml); while ((m = ml_dequeue(&ml)) != NULL) { pkt_session = m->m_pkthdr.ph_cookie; /* [6] */ /* ... */ pipex_ppp_input(m, pkt_session, 0); /* [8] */ } } and net/pipex.c:808 Static void pipex_timer(void *ignored_arg) { struct pipex_session *session, *session_tmp; timeout_add_sec(&pipex_timer_ch, pipex_prune); NET_LOCK(); /* walk through */ LIST_FOREACH_SAFE(session, &pipex_session_list, session_list, session_tmp) { switch (session->state) { /* ... */ case PIPEX_STATE_CLOSED: /* * mbuf queued in pipexinq or pipexoutq may have a * refererce to this session. */ /* [9] */ if (!mq_empty(&pipexinq) || !mq_empty(&pipexoutq)) continue; pipex_destroy_session(session); break; /* ... */ } 1. You pass pointer to mbuf 2. You enqueue this mbuf to `pipexinq' or `pipexoutq' 3. You call task_add(); Is it * always * garanteed that netisr() will be called here? What will be happened if netisr will be called after [4]? 4. You destroy session. 5. You use memory pointed by `pkt_session' It's netisr context. Are you shure `ph_cookie' is still valid? 6. You use memory pointed by `pkt_session' It's netisr context. Are you shure `ph_cookie' is still valid? Also we are going to move pipexintr() out of KERNEL_LOCK() and NET_LOCK(). This means pipex_destroy_session() and pipexintr() will be executed simultaneously. How to protect `ph_cookie'? I guess to use reference counters for pipex(4) sessions. We will grab extra reference before [1] and release this reference at [7] or [8]. I see absolutele *no* reason to sleep at [9]. So refcnt_finalize() is not applicable here. I want to use reference counters in the way of file(9). This means session can live after userland destroy it because it has reference in `pipexinq' or `pipexoutq'. But userland will not only release it's reference to session. It will destroy `ifnet' too. And you will have session referenced by `pipe
Re: pipex(4): use reference counters for `ifnet'
On 26/06/20(Fri) 12:35, Vitaliy Makkoveev wrote: > On Fri, Jun 26, 2020 at 10:23:42AM +0200, Martin Pieuchot wrote: > > On 25/06/20(Thu) 19:56, Vitaliy Makkoveev wrote: > > > Updated diff. > > > > > > OpenBSD uses 16 bit counter for allocate interface indexes. So we can't > > > store index in session and be sure if_get(9) returned `ifnet' is our > > > original `ifnet'. > > > > Why not? The point of if_get(9) is to be sure. If that doesn't work > > for whatever reason then the if_get(9) interface has to be fixed. Which > > case doesn't work for you? Do you have a reproducer? > > > > How does sessions stay around if their corresponding interface is > > destroyed? > > We have `pipexinq' and `pipexoutq' which can store pointers to session. > pipexintr() process these queues. pipexintr() and > pipex_destroy_session() are *always* different context. This mean we > *can't* free pipex(4) session without be sure there is no reference to > this session in `pipexinq' or `pipexoutq'. Elsewhere this will cause use > afret free issue. Look please at net/pipex.c:846. The way pppx(4) > destroy sessions is wrong. While pppac(4) destroys sessions by > pipex_iface_fini() it's also wrong. Because we don't check `pipexinq' > and `pipexoutq' state. I'am said it again and again. I understand. Why is it a problem? Using reference counting the way you're suggesting is *one* possible solution to a problem we don't fully understand. What are we trying to achieve? Which problem are we trying to solve? Are we trying to move pipexintr() out of KERNEL_LOCK()? If so are we trying to guarantee the memory pointed by `ph_cookie' representing a session pointer is always valid without the KERNEL_LOCK()? If that's what we're trying to do, shouldn't we start with a baby-step and trade the KERNEL_LOCK() for another lock first? I understand it is not some sexy-mp-rcu-atomic-crazy-refcount-clever solution, but it is simpler to do. Since all the packets (mbuf) are being processed with the NET_LOCK(). we should be able to start trading the KERNEL_LOCK() by the NET_LOCK(), no? Once that works and we've built knowledge about the existing architecture we might consider some finer grain locking. Is there any downside to this suggestion?
Re: pipex(4): use reference counters for `ifnet'
On 25/06/20(Thu) 19:56, Vitaliy Makkoveev wrote: > Updated diff. > > OpenBSD uses 16 bit counter for allocate interface indexes. So we can't > store index in session and be sure if_get(9) returned `ifnet' is our > original `ifnet'. Why not? The point of if_get(9) is to be sure. If that doesn't work for whatever reason then the if_get(9) interface has to be fixed. Which case doesn't work for you? Do you have a reproducer? How does sessions stay around if their corresponding interface is destroyed? > Now each pipex(4) session has it's own reference to `ifnet'. Also pppoe > related sessions has the reference to it's ethernet interface. > > All `ifnet's are obtained by if_get(9) and we use `if_index' from stored > reference to `ifnet'. That's not how the API is meant to be used. All if_get(9) must have a corresponding if_put(9) in the same context. We don't store pointers put interface indexes, meaning: > Index: net/pipex.c > === > RCS file: /cvs/src/sys/net/pipex.c,v > retrieving revision 1.116 > diff -u -p -r1.116 pipex.c > --- net/pipex.c 22 Jun 2020 09:38:15 - 1.116 > +++ net/pipex.c 25 Jun 2020 16:41:25 - > @@ -148,7 +148,7 @@ pipex_iface_init(struct pipex_iface_cont > struct pipex_session *session; > > pipex_iface->pipexmode = 0; > - pipex_iface->ifnet_this = ifp; > + pipex_iface->ifnet_this = if_get(ifp->if_index); `ifnet_this' should be replaced by `ifidx' so this becomes: pipex_iface->ifidx = ifp->if_index.
Re: pipex(4): use reference counters for `ifnet'
> On 25 Jun 2020, at 20:13, Christiano F. Haesbaert > wrote: > > You can. > > The idea is that ifindex is always monotonically increased, so to actually > get a new interface you would have to have "overflowed" 65k interfaces, > which is unreal. > > So if your interface is gone, you can be sure if_get will give you NULL. No I can’t :) It’s *not* the operational infinity you talks about. You can run: while true ; do ifconfig tun0 create ; ifconfig tun0 destroy ; done And after couple of minutes you will see is very real to have *counter* be overflowed. Not the interface count. > > On Thu, Jun 25, 2020, 18:55 Vitaliy Makkoveev > wrote: > >> Updated diff. >> >> OpenBSD uses 16 bit counter for allocate interface indexes. So we can't >> store index in session and be sure if_get(9) returned `ifnet' is our >> original `ifnet'. >> >> Now each pipex(4) session has it's own reference to `ifnet'. Also pppoe >> related sessions has the reference to it's ethernet interface. >> >> All `ifnet's are obtained by if_get(9) and we use `if_index' from stored >> reference to `ifnet'. >> >> Index: net/if_pppx.c >> === >> RCS file: /cvs/src/sys/net/if_pppx.c,v >> retrieving revision 1.90 >> diff -u -p -r1.90 if_pppx.c >> --- net/if_pppx.c 24 Jun 2020 08:52:53 - 1.90 >> +++ net/if_pppx.c 25 Jun 2020 16:41:25 - >> @@ -714,15 +714,15 @@ pppx_add_session(struct pppx_dev *pxd, s >>ifp->if_softc = pxi; >>/* ifp->if_rdomain = req->pr_rdomain; */ >> >> - error = pipex_link_session(session, &pxi->pxi_ifcontext); >> - if (error) >> - goto remove; >> - >>/* XXXSMP breaks atomicity */ >>NET_UNLOCK(); >>if_attach(ifp); >>NET_LOCK(); >> >> + error = pipex_link_session(session, &pxi->pxi_ifcontext); >> + if (error) >> + goto detach; >> + >>if_addgroup(ifp, "pppx"); >>if_alloc_sadl(ifp); >> >> @@ -771,7 +771,12 @@ pppx_add_session(struct pppx_dev *pxd, s >> >>return (error); >> >> -remove: >> +detach: >> + /* XXXSMP breaks atomicity */ >> + NET_UNLOCK(); >> + if_detach(ifp); >> + NET_LOCK(); >> + >>if (RBT_REMOVE(pppx_ifs, &pppx_ifs, pxi) == NULL) >>panic("%s: inconsistent RB tree", __func__); >>LIST_REMOVE(pxi, pxi_list); >> @@ -860,13 +865,13 @@ pppx_if_destroy(struct pppx_dev *pxd, st >>CLR(ifp->if_flags, IFF_RUNNING); >> >>pipex_unlink_session(session); >> + pipex_rele_session(session); >> >>/* XXXSMP breaks atomicity */ >>NET_UNLOCK(); >>if_detach(ifp); >>NET_LOCK(); >> >> - pipex_rele_session(session); >>if (RBT_REMOVE(pppx_ifs, &pppx_ifs, pxi) == NULL) >>panic("%s: inconsistent RB tree", __func__); >>LIST_REMOVE(pxi, pxi_list); >> Index: net/pipex.c >> === >> RCS file: /cvs/src/sys/net/pipex.c,v >> retrieving revision 1.116 >> diff -u -p -r1.116 pipex.c >> --- net/pipex.c 22 Jun 2020 09:38:15 - 1.116 >> +++ net/pipex.c 25 Jun 2020 16:41:25 - >> @@ -148,7 +148,7 @@ pipex_iface_init(struct pipex_iface_cont >>struct pipex_session *session; >> >>pipex_iface->pipexmode = 0; >> - pipex_iface->ifnet_this = ifp; >> + pipex_iface->ifnet_this = if_get(ifp->if_index); >> >>if (pipex_rd_head4 == NULL) { >>if (!rn_inithead((void **)&pipex_rd_head4, >> @@ -165,6 +165,7 @@ pipex_iface_init(struct pipex_iface_cont >>session = pool_get(&pipex_session_pool, PR_WAITOK | PR_ZERO); >>session->is_multicast = 1; >>session->pipex_iface = pipex_iface; >> + session->ifnet_this = if_get(ifp->if_index); >>pipex_iface->multicast_session = session; >> } >> >> @@ -194,10 +195,11 @@ pipex_iface_stop(struct pipex_iface_cont >> void >> pipex_iface_fini(struct pipex_iface_context *pipex_iface) >> { >> - pool_put(&pipex_session_pool, pipex_iface->multicast_session); >>NET_LOCK(); >>pipex_iface_stop(pipex_iface); >>NET_UNLOCK(); >> + pipex_rele_session(pipex_iface->multicast_session); >> + if_put(pipex_iface->ifnet_this); >> } >> >> int >> @@ -358,7 +360,7 @@ pipex_init_session(struct pipex_session >>MIN(req->pr_local_address.ss_len, >> sizeof(session->local))); >> #ifdef PIPEX_PPPOE >>if (req->pr_protocol == PIPEX_PROTO_PPPOE) >> - session->proto.pppoe.over_ifidx = over_ifp->if_index; >> + session->proto.pppoe.over_iface = >> if_get(over_ifp->if_index); >> #endif >> #ifdef PIPEX_PPTP >>if (req->pr_protocol == PIPEX_PROTO_PPTP) { >> @@ -421,6 +423,13 @@ pipex_init_session(struct pipex_session >> void >> pipex_rele_session(struct pipex_session *session) >> { >> +#ifdef PIPEX_PPPOE >> + if (session->protocol == PIPEX_
Re: pipex(4): use reference counters for `ifnet'
Updated diff. OpenBSD uses 16 bit counter for allocate interface indexes. So we can't store index in session and be sure if_get(9) returned `ifnet' is our original `ifnet'. Now each pipex(4) session has it's own reference to `ifnet'. Also pppoe related sessions has the reference to it's ethernet interface. All `ifnet's are obtained by if_get(9) and we use `if_index' from stored reference to `ifnet'. Index: net/if_pppx.c === RCS file: /cvs/src/sys/net/if_pppx.c,v retrieving revision 1.90 diff -u -p -r1.90 if_pppx.c --- net/if_pppx.c 24 Jun 2020 08:52:53 - 1.90 +++ net/if_pppx.c 25 Jun 2020 16:41:25 - @@ -714,15 +714,15 @@ pppx_add_session(struct pppx_dev *pxd, s ifp->if_softc = pxi; /* ifp->if_rdomain = req->pr_rdomain; */ - error = pipex_link_session(session, &pxi->pxi_ifcontext); - if (error) - goto remove; - /* XXXSMP breaks atomicity */ NET_UNLOCK(); if_attach(ifp); NET_LOCK(); + error = pipex_link_session(session, &pxi->pxi_ifcontext); + if (error) + goto detach; + if_addgroup(ifp, "pppx"); if_alloc_sadl(ifp); @@ -771,7 +771,12 @@ pppx_add_session(struct pppx_dev *pxd, s return (error); -remove: +detach: + /* XXXSMP breaks atomicity */ + NET_UNLOCK(); + if_detach(ifp); + NET_LOCK(); + if (RBT_REMOVE(pppx_ifs, &pppx_ifs, pxi) == NULL) panic("%s: inconsistent RB tree", __func__); LIST_REMOVE(pxi, pxi_list); @@ -860,13 +865,13 @@ pppx_if_destroy(struct pppx_dev *pxd, st CLR(ifp->if_flags, IFF_RUNNING); pipex_unlink_session(session); + pipex_rele_session(session); /* XXXSMP breaks atomicity */ NET_UNLOCK(); if_detach(ifp); NET_LOCK(); - pipex_rele_session(session); if (RBT_REMOVE(pppx_ifs, &pppx_ifs, pxi) == NULL) panic("%s: inconsistent RB tree", __func__); LIST_REMOVE(pxi, pxi_list); Index: net/pipex.c === RCS file: /cvs/src/sys/net/pipex.c,v retrieving revision 1.116 diff -u -p -r1.116 pipex.c --- net/pipex.c 22 Jun 2020 09:38:15 - 1.116 +++ net/pipex.c 25 Jun 2020 16:41:25 - @@ -148,7 +148,7 @@ pipex_iface_init(struct pipex_iface_cont struct pipex_session *session; pipex_iface->pipexmode = 0; - pipex_iface->ifnet_this = ifp; + pipex_iface->ifnet_this = if_get(ifp->if_index); if (pipex_rd_head4 == NULL) { if (!rn_inithead((void **)&pipex_rd_head4, @@ -165,6 +165,7 @@ pipex_iface_init(struct pipex_iface_cont session = pool_get(&pipex_session_pool, PR_WAITOK | PR_ZERO); session->is_multicast = 1; session->pipex_iface = pipex_iface; + session->ifnet_this = if_get(ifp->if_index); pipex_iface->multicast_session = session; } @@ -194,10 +195,11 @@ pipex_iface_stop(struct pipex_iface_cont void pipex_iface_fini(struct pipex_iface_context *pipex_iface) { - pool_put(&pipex_session_pool, pipex_iface->multicast_session); NET_LOCK(); pipex_iface_stop(pipex_iface); NET_UNLOCK(); + pipex_rele_session(pipex_iface->multicast_session); + if_put(pipex_iface->ifnet_this); } int @@ -358,7 +360,7 @@ pipex_init_session(struct pipex_session MIN(req->pr_local_address.ss_len, sizeof(session->local))); #ifdef PIPEX_PPPOE if (req->pr_protocol == PIPEX_PROTO_PPPOE) - session->proto.pppoe.over_ifidx = over_ifp->if_index; + session->proto.pppoe.over_iface = if_get(over_ifp->if_index); #endif #ifdef PIPEX_PPTP if (req->pr_protocol == PIPEX_PROTO_PPTP) { @@ -421,6 +423,13 @@ pipex_init_session(struct pipex_session void pipex_rele_session(struct pipex_session *session) { +#ifdef PIPEX_PPPOE + if (session->protocol == PIPEX_PROTO_PPPOE) + if_put(session->proto.pppoe.over_iface); +#endif + if (session->ifnet_this) { + if_put(session->ifnet_this); + } if (session->mppe_recv.old_session_keys) pool_put(&mppe_key_pool, session->mppe_recv.old_session_keys); pool_put(&pipex_session_pool, session); @@ -439,6 +448,7 @@ pipex_link_session(struct pipex_session return (EEXIST); session->pipex_iface = iface; + session->ifnet_this = if_get(iface->ifnet_this->if_index); LIST_INSERT_HEAD(&pipex_session_list, session, session_list); chain = PIPEX_ID_HASHTABLE(session->session_id); @@ -655,7 +665,7 @@ pipex_destroy_session(struct pipex_sessi } pipex_unlink_session(session); - pool_put(&pipex_session_pool, session); + pipex_rele_session(session); return (0); } @@ -919,7 +929,7 @@ pipex_ip_output(struct mbuf *m0, struct
Re: pipex(4): use reference counters for `ifnet'
> On 25 Jun 2020, at 16:35, Christiano F. Haesbaert > wrote: > > On Thu, 25 Jun 2020 at 14:06, Vitaliy Makkoveev > wrote: >> >> >> >>> On 25 Jun 2020, at 11:55, Martin Pieuchot wrote: >>> >>> On 24/06/20(Wed) 17:10, Vitaliy Makkoveev wrote: While `mbuf' enqueued to `pipexinq' or `pipexoutq' it has the reference to corresponding pipex(4) session as `ph_cookie'. `ph_cookie' is accessed by pipexintr() and it's always defferent context from context where we destroy session. `ph_cookie' is protected only while we destroy session by pipex_timer() but this protection is not effective. While we destroy session related to pppx(4) `ph_cookie' is not potected. While we destroy session related to pppac(4) by `PIPEXSMODE' ioctl() or by closing pppac(4) device node `ph_cookie' is also not protected. I'am going to use reference counters to protect `ph_cookie' but some additional steps required to be done. >>> >>> Please no. Store an ifidx in session instead of a pointer. Such >>> index are guaranteed to be unique and can be used with if_get(9). >> >> This means I should do if_get(9) before each `ifnet’ usage. Also I >> should do checks and introduce error path. It's ugly. >> > > My 2 cents, > > Not really, this is actually easier to reason with, it produces less > dependencies between data structures and the protocol is kinda clear, > you have to account for the loss of atomicity. > Reference counting is pretty much the worst thing you can do in MP > development. > Very often it is actually desirable to pay for the cost of a lookup > (which is not the case here, since it's a silly O(1)) than to pay the > human/debug cost of interlinking data structures. > Also, in an ideal world if_get() would be RCUed (or SMRed) and would > not require bumping a reference count which implies an atomic > operation, it would also remove the need for if_put() since the epoch > is "until we context switch", as the kernel is not preemptive this is > easy to do without introducing new bugs. > But I can’t store interface index with session instead of pointer. Session can be alive after corresponding `ifnet’ was destroyed and replaced by new `ifnet’. > >> Alternatively, I can pass `ifnet->if_index’ instead of `ifnet’ to >> sessinon being linked and store obtained reference up to session >> destruction. I should add one error path to pipex_link_session(). >> It’s ugly but a little bit. >> >>> >>> We deliberately kept if_ref() private to keep the code base coherent. >> >> Could you explain please why if_ref() and if_get() are incoherent?
Re: pipex(4): use reference counters for `ifnet'
You can. The idea is that ifindex is always monotonically increased, so to actually get a new interface you would have to have "overflowed" 65k interfaces, which is unreal. So if your interface is gone, you can be sure if_get will give you NULL. On Thu, Jun 25, 2020, 18:55 Vitaliy Makkoveev wrote: > Updated diff. > > OpenBSD uses 16 bit counter for allocate interface indexes. So we can't > store index in session and be sure if_get(9) returned `ifnet' is our > original `ifnet'. > > Now each pipex(4) session has it's own reference to `ifnet'. Also pppoe > related sessions has the reference to it's ethernet interface. > > All `ifnet's are obtained by if_get(9) and we use `if_index' from stored > reference to `ifnet'. > > Index: net/if_pppx.c > === > RCS file: /cvs/src/sys/net/if_pppx.c,v > retrieving revision 1.90 > diff -u -p -r1.90 if_pppx.c > --- net/if_pppx.c 24 Jun 2020 08:52:53 - 1.90 > +++ net/if_pppx.c 25 Jun 2020 16:41:25 - > @@ -714,15 +714,15 @@ pppx_add_session(struct pppx_dev *pxd, s > ifp->if_softc = pxi; > /* ifp->if_rdomain = req->pr_rdomain; */ > > - error = pipex_link_session(session, &pxi->pxi_ifcontext); > - if (error) > - goto remove; > - > /* XXXSMP breaks atomicity */ > NET_UNLOCK(); > if_attach(ifp); > NET_LOCK(); > > + error = pipex_link_session(session, &pxi->pxi_ifcontext); > + if (error) > + goto detach; > + > if_addgroup(ifp, "pppx"); > if_alloc_sadl(ifp); > > @@ -771,7 +771,12 @@ pppx_add_session(struct pppx_dev *pxd, s > > return (error); > > -remove: > +detach: > + /* XXXSMP breaks atomicity */ > + NET_UNLOCK(); > + if_detach(ifp); > + NET_LOCK(); > + > if (RBT_REMOVE(pppx_ifs, &pppx_ifs, pxi) == NULL) > panic("%s: inconsistent RB tree", __func__); > LIST_REMOVE(pxi, pxi_list); > @@ -860,13 +865,13 @@ pppx_if_destroy(struct pppx_dev *pxd, st > CLR(ifp->if_flags, IFF_RUNNING); > > pipex_unlink_session(session); > + pipex_rele_session(session); > > /* XXXSMP breaks atomicity */ > NET_UNLOCK(); > if_detach(ifp); > NET_LOCK(); > > - pipex_rele_session(session); > if (RBT_REMOVE(pppx_ifs, &pppx_ifs, pxi) == NULL) > panic("%s: inconsistent RB tree", __func__); > LIST_REMOVE(pxi, pxi_list); > Index: net/pipex.c > === > RCS file: /cvs/src/sys/net/pipex.c,v > retrieving revision 1.116 > diff -u -p -r1.116 pipex.c > --- net/pipex.c 22 Jun 2020 09:38:15 - 1.116 > +++ net/pipex.c 25 Jun 2020 16:41:25 - > @@ -148,7 +148,7 @@ pipex_iface_init(struct pipex_iface_cont > struct pipex_session *session; > > pipex_iface->pipexmode = 0; > - pipex_iface->ifnet_this = ifp; > + pipex_iface->ifnet_this = if_get(ifp->if_index); > > if (pipex_rd_head4 == NULL) { > if (!rn_inithead((void **)&pipex_rd_head4, > @@ -165,6 +165,7 @@ pipex_iface_init(struct pipex_iface_cont > session = pool_get(&pipex_session_pool, PR_WAITOK | PR_ZERO); > session->is_multicast = 1; > session->pipex_iface = pipex_iface; > + session->ifnet_this = if_get(ifp->if_index); > pipex_iface->multicast_session = session; > } > > @@ -194,10 +195,11 @@ pipex_iface_stop(struct pipex_iface_cont > void > pipex_iface_fini(struct pipex_iface_context *pipex_iface) > { > - pool_put(&pipex_session_pool, pipex_iface->multicast_session); > NET_LOCK(); > pipex_iface_stop(pipex_iface); > NET_UNLOCK(); > + pipex_rele_session(pipex_iface->multicast_session); > + if_put(pipex_iface->ifnet_this); > } > > int > @@ -358,7 +360,7 @@ pipex_init_session(struct pipex_session > MIN(req->pr_local_address.ss_len, > sizeof(session->local))); > #ifdef PIPEX_PPPOE > if (req->pr_protocol == PIPEX_PROTO_PPPOE) > - session->proto.pppoe.over_ifidx = over_ifp->if_index; > + session->proto.pppoe.over_iface = > if_get(over_ifp->if_index); > #endif > #ifdef PIPEX_PPTP > if (req->pr_protocol == PIPEX_PROTO_PPTP) { > @@ -421,6 +423,13 @@ pipex_init_session(struct pipex_session > void > pipex_rele_session(struct pipex_session *session) > { > +#ifdef PIPEX_PPPOE > + if (session->protocol == PIPEX_PROTO_PPPOE) > + if_put(session->proto.pppoe.over_iface); > +#endif > + if (session->ifnet_this) { > + if_put(session->ifnet_this); > + } > if (session->mppe_recv.old_session_keys) > pool_put(&mppe_key_pool, > session->mppe_recv.old_session_keys); > pool_put(&pipex_session_pool, session); > @@ -439,6 +448,7 @@ pipex_link_session(struct pipex_session > retu
Re: pipex(4): use reference counters for `ifnet'
> On 25 Jun 2020, at 11:55, Martin Pieuchot wrote: > > On 24/06/20(Wed) 17:10, Vitaliy Makkoveev wrote: >> While `mbuf' enqueued to `pipexinq' or `pipexoutq' it has the reference >> to corresponding pipex(4) session as `ph_cookie'. `ph_cookie' is >> accessed by pipexintr() and it's always defferent context from context >> where we destroy session. `ph_cookie' is protected only while we destroy >> session by pipex_timer() but this protection is not effective. While we >> destroy session related to pppx(4) `ph_cookie' is not potected. While we >> destroy session related to pppac(4) by `PIPEXSMODE' ioctl() or by >> closing pppac(4) device node `ph_cookie' is also not protected. >> >> I'am going to use reference counters to protect `ph_cookie' but some >> additional steps required to be done. > > Please no. Store an ifidx in session instead of a pointer. Such > index are guaranteed to be unique and can be used with if_get(9). This means I should do if_get(9) before each `ifnet’ usage. Also I should do checks and introduce error path. It's ugly. Alternatively, I can pass `ifnet->if_index’ instead of `ifnet’ to sessinon being linked and store obtained reference up to session destruction. I should add one error path to pipex_link_session(). It’s ugly but a little bit. > > We deliberately kept if_ref() private to keep the code base coherent. Could you explain please why if_ref() and if_get() are incoherent?
Re: pipex(4): use reference counters for `ifnet'
On Thu, 25 Jun 2020 at 14:06, Vitaliy Makkoveev wrote: > > > > > On 25 Jun 2020, at 11:55, Martin Pieuchot wrote: > > > > On 24/06/20(Wed) 17:10, Vitaliy Makkoveev wrote: > >> While `mbuf' enqueued to `pipexinq' or `pipexoutq' it has the reference > >> to corresponding pipex(4) session as `ph_cookie'. `ph_cookie' is > >> accessed by pipexintr() and it's always defferent context from context > >> where we destroy session. `ph_cookie' is protected only while we destroy > >> session by pipex_timer() but this protection is not effective. While we > >> destroy session related to pppx(4) `ph_cookie' is not potected. While we > >> destroy session related to pppac(4) by `PIPEXSMODE' ioctl() or by > >> closing pppac(4) device node `ph_cookie' is also not protected. > >> > >> I'am going to use reference counters to protect `ph_cookie' but some > >> additional steps required to be done. > > > > Please no. Store an ifidx in session instead of a pointer. Such > > index are guaranteed to be unique and can be used with if_get(9). > > This means I should do if_get(9) before each `ifnet’ usage. Also I > should do checks and introduce error path. It's ugly. > My 2 cents, Not really, this is actually easier to reason with, it produces less dependencies between data structures and the protocol is kinda clear, you have to account for the loss of atomicity. Reference counting is pretty much the worst thing you can do in MP development. Very often it is actually desirable to pay for the cost of a lookup (which is not the case here, since it's a silly O(1)) than to pay the human/debug cost of interlinking data structures. Also, in an ideal world if_get() would be RCUed (or SMRed) and would not require bumping a reference count which implies an atomic operation, it would also remove the need for if_put() since the epoch is "until we context switch", as the kernel is not preemptive this is easy to do without introducing new bugs. > Alternatively, I can pass `ifnet->if_index’ instead of `ifnet’ to > sessinon being linked and store obtained reference up to session > destruction. I should add one error path to pipex_link_session(). > It’s ugly but a little bit. > > > > > We deliberately kept if_ref() private to keep the code base coherent. > > Could you explain please why if_ref() and if_get() are incoherent?
Re: pipex(4): use reference counters for `ifnet'
On 24/06/20(Wed) 17:10, Vitaliy Makkoveev wrote: > While `mbuf' enqueued to `pipexinq' or `pipexoutq' it has the reference > to corresponding pipex(4) session as `ph_cookie'. `ph_cookie' is > accessed by pipexintr() and it's always defferent context from context > where we destroy session. `ph_cookie' is protected only while we destroy > session by pipex_timer() but this protection is not effective. While we > destroy session related to pppx(4) `ph_cookie' is not potected. While we > destroy session related to pppac(4) by `PIPEXSMODE' ioctl() or by > closing pppac(4) device node `ph_cookie' is also not protected. > > I'am going to use reference counters to protect `ph_cookie' but some > additional steps required to be done. Please no. Store an ifidx in session instead of a pointer. Such index are guaranteed to be unique and can be used with if_get(9). We deliberately kept if_ref() private to keep the code base coherent. > We have `pipex_iface' which holds the reference to external `ifnet'. The > pipex(4) session also has reference to this `ifnet'. With reference > counters pipex(4) session can still be in `pipexinq' or `pipexoutq' but > corresponding `pppx_if' or `pppac_softc' is already destroyed. I want to > be shure while we do if_detach() no one has reference to this `ifnet'. > > Diff below grabs reference to `ifnet' each time it passed to pipex(4). > Also while we release this session we release the reference. > > We attach `ifnet' before we have linked sessions and we detach `ifnet' > after we release all sessions. Now it's safe. > > if_ref() was declared in sys/if.c so I moved if_ref() declaration to > sys/if.h. > > Index: sys/net/if.c > === > RCS file: /cvs/src/sys/net/if.c,v > retrieving revision 1.610 > diff -u -p -r1.610 if.c > --- sys/net/if.c 22 Jun 2020 09:45:13 - 1.610 > +++ sys/net/if.c 24 Jun 2020 13:43:33 - > @@ -192,7 +192,6 @@ void if_qstart_compat(struct ifqueue *); > > void if_ifp_dtor(void *, void *); > void if_map_dtor(void *, void *); > -struct ifnet *if_ref(struct ifnet *); > > /* > * struct if_map > Index: sys/net/if.h > === > RCS file: /cvs/src/sys/net/if.h,v > retrieving revision 1.203 > diff -u -p -r1.203 if.h > --- sys/net/if.h 25 Jul 2019 15:23:39 - 1.203 > +++ sys/net/if.h 24 Jun 2020 13:43:33 - > @@ -548,6 +548,7 @@ int if_delgroup(struct ifnet *, const ch > void if_group_routechange(struct sockaddr *, struct sockaddr *); > struct ifnet *ifunit(const char *); > struct ifnet *if_get(unsigned int); > +struct ifnet *if_ref(struct ifnet *); > void if_put(struct ifnet *); > void ifnewlladdr(struct ifnet *); > void if_congestion(void); > Index: sys/net/if_pppx.c > === > RCS file: /cvs/src/sys/net/if_pppx.c,v > retrieving revision 1.90 > diff -u -p -r1.90 if_pppx.c > --- sys/net/if_pppx.c 24 Jun 2020 08:52:53 - 1.90 > +++ sys/net/if_pppx.c 24 Jun 2020 13:43:33 - > @@ -714,15 +714,15 @@ pppx_add_session(struct pppx_dev *pxd, s > ifp->if_softc = pxi; > /* ifp->if_rdomain = req->pr_rdomain; */ > > - error = pipex_link_session(session, &pxi->pxi_ifcontext); > - if (error) > - goto remove; > - > /* XXXSMP breaks atomicity */ > NET_UNLOCK(); > if_attach(ifp); > NET_LOCK(); > > + error = pipex_link_session(session, &pxi->pxi_ifcontext); > + if (error) > + goto detach; > + > if_addgroup(ifp, "pppx"); > if_alloc_sadl(ifp); > > @@ -771,7 +771,12 @@ pppx_add_session(struct pppx_dev *pxd, s > > return (error); > > -remove: > +detach: > + /* XXXSMP breaks atomicity */ > + NET_UNLOCK(); > + if_detach(ifp); > + NET_LOCK(); > + > if (RBT_REMOVE(pppx_ifs, &pppx_ifs, pxi) == NULL) > panic("%s: inconsistent RB tree", __func__); > LIST_REMOVE(pxi, pxi_list); > @@ -860,13 +865,13 @@ pppx_if_destroy(struct pppx_dev *pxd, st > CLR(ifp->if_flags, IFF_RUNNING); > > pipex_unlink_session(session); > + pipex_rele_session(session); > > /* XXXSMP breaks atomicity */ > NET_UNLOCK(); > if_detach(ifp); > NET_LOCK(); > > - pipex_rele_session(session); > if (RBT_REMOVE(pppx_ifs, &pppx_ifs, pxi) == NULL) > panic("%s: inconsistent RB tree", __func__); > LIST_REMOVE(pxi, pxi_list); > Index: sys/net/pipex.c > === > RCS file: /cvs/src/sys/net/pipex.c,v > retrieving revision 1.116 > diff -u -p -r1.116 pipex.c > --- sys/net/pipex.c 22 Jun 2020 09:38:15 - 1.116 > +++ sys/net/pipex.c 24 Jun 2020 13:43:34 - > @@ -147,6 +147,7 @@ pipex_iface_init(struct pipex_iface_cont > { > struct pipex_session *session; > > + if_
pipex(4): use reference counters for `ifnet'
While `mbuf' enqueued to `pipexinq' or `pipexoutq' it has the reference to corresponding pipex(4) session as `ph_cookie'. `ph_cookie' is accessed by pipexintr() and it's always defferent context from context where we destroy session. `ph_cookie' is protected only while we destroy session by pipex_timer() but this protection is not effective. While we destroy session related to pppx(4) `ph_cookie' is not potected. While we destroy session related to pppac(4) by `PIPEXSMODE' ioctl() or by closing pppac(4) device node `ph_cookie' is also not protected. I'am going to use reference counters to protect `ph_cookie' but some additional steps required to be done. We have `pipex_iface' which holds the reference to external `ifnet'. The pipex(4) session also has reference to this `ifnet'. With reference counters pipex(4) session can still be in `pipexinq' or `pipexoutq' but corresponding `pppx_if' or `pppac_softc' is already destroyed. I want to be shure while we do if_detach() no one has reference to this `ifnet'. Diff below grabs reference to `ifnet' each time it passed to pipex(4). Also while we release this session we release the reference. We attach `ifnet' before we have linked sessions and we detach `ifnet' after we release all sessions. Now it's safe. if_ref() was declared in sys/if.c so I moved if_ref() declaration to sys/if.h. Index: sys/net/if.c === RCS file: /cvs/src/sys/net/if.c,v retrieving revision 1.610 diff -u -p -r1.610 if.c --- sys/net/if.c22 Jun 2020 09:45:13 - 1.610 +++ sys/net/if.c24 Jun 2020 13:43:33 - @@ -192,7 +192,6 @@ voidif_qstart_compat(struct ifqueue *); void if_ifp_dtor(void *, void *); void if_map_dtor(void *, void *); -struct ifnet *if_ref(struct ifnet *); /* * struct if_map Index: sys/net/if.h === RCS file: /cvs/src/sys/net/if.h,v retrieving revision 1.203 diff -u -p -r1.203 if.h --- sys/net/if.h25 Jul 2019 15:23:39 - 1.203 +++ sys/net/if.h24 Jun 2020 13:43:33 - @@ -548,6 +548,7 @@ int if_delgroup(struct ifnet *, const ch void if_group_routechange(struct sockaddr *, struct sockaddr *); struct ifnet *ifunit(const char *); struct ifnet *if_get(unsigned int); +struct ifnet *if_ref(struct ifnet *); void if_put(struct ifnet *); void ifnewlladdr(struct ifnet *); void if_congestion(void); Index: sys/net/if_pppx.c === RCS file: /cvs/src/sys/net/if_pppx.c,v retrieving revision 1.90 diff -u -p -r1.90 if_pppx.c --- sys/net/if_pppx.c 24 Jun 2020 08:52:53 - 1.90 +++ sys/net/if_pppx.c 24 Jun 2020 13:43:33 - @@ -714,15 +714,15 @@ pppx_add_session(struct pppx_dev *pxd, s ifp->if_softc = pxi; /* ifp->if_rdomain = req->pr_rdomain; */ - error = pipex_link_session(session, &pxi->pxi_ifcontext); - if (error) - goto remove; - /* XXXSMP breaks atomicity */ NET_UNLOCK(); if_attach(ifp); NET_LOCK(); + error = pipex_link_session(session, &pxi->pxi_ifcontext); + if (error) + goto detach; + if_addgroup(ifp, "pppx"); if_alloc_sadl(ifp); @@ -771,7 +771,12 @@ pppx_add_session(struct pppx_dev *pxd, s return (error); -remove: +detach: + /* XXXSMP breaks atomicity */ + NET_UNLOCK(); + if_detach(ifp); + NET_LOCK(); + if (RBT_REMOVE(pppx_ifs, &pppx_ifs, pxi) == NULL) panic("%s: inconsistent RB tree", __func__); LIST_REMOVE(pxi, pxi_list); @@ -860,13 +865,13 @@ pppx_if_destroy(struct pppx_dev *pxd, st CLR(ifp->if_flags, IFF_RUNNING); pipex_unlink_session(session); + pipex_rele_session(session); /* XXXSMP breaks atomicity */ NET_UNLOCK(); if_detach(ifp); NET_LOCK(); - pipex_rele_session(session); if (RBT_REMOVE(pppx_ifs, &pppx_ifs, pxi) == NULL) panic("%s: inconsistent RB tree", __func__); LIST_REMOVE(pxi, pxi_list); Index: sys/net/pipex.c === RCS file: /cvs/src/sys/net/pipex.c,v retrieving revision 1.116 diff -u -p -r1.116 pipex.c --- sys/net/pipex.c 22 Jun 2020 09:38:15 - 1.116 +++ sys/net/pipex.c 24 Jun 2020 13:43:34 - @@ -147,6 +147,7 @@ pipex_iface_init(struct pipex_iface_cont { struct pipex_session *session; + if_ref(ifp); pipex_iface->pipexmode = 0; pipex_iface->ifnet_this = ifp; @@ -164,6 +165,7 @@ pipex_iface_init(struct pipex_iface_cont /* virtual pipex_session entry for multicast */ session = pool_get(&pipex_session_pool, PR_WAITOK | PR_ZERO); session->is_multicast = 1; + if_ref(pipex_iface->ifnet_this); session->pipex_iface = pipex_iface; pipex_iface->multicast_sess