Re: [PATCH net] net: lapbether: Prevent racing when checking whether the netif is running
On Tue, Mar 16, 2021 at 8:17 AM Martin Schiller wrote: > > On 2021-03-14 02:59, Xie He wrote: > > Hi Martin, > > > > Could you ack? Thanks! > > Acked-by: Martin Schiller Thank you!!
Re: [PATCH net] net: lapbether: Prevent racing when checking whether the netif is running
On 2021-03-14 02:59, Xie He wrote: Hi Martin, Could you ack? Thanks! Acked-by: Martin Schiller
Re: [PATCH net] net: lapbether: Prevent racing when checking whether the netif is running
Hi Martin, Could you ack? Thanks!
Re: [PATCH net] net: lapbether: Prevent racing when checking whether the netif is running
On Thu, 11 Mar 2021 16:28:47 -0800 Xie He wrote: > On Thu, Mar 11, 2021 at 4:10 PM Jakub Kicinski wrote: > > > > And the "noqueue" queue is there because it's on top of hdlc_fr.c > > somehow or some out of tree driver? Or do you install it manually? > > No, this driver is not related to "hdlc_fr.c" or any out-of-tree > driver. The default qdisc is "noqueue" for this driver because this > driver doesn't set "tx_queue_len". This means the value of > "tx_queue_len" would be 0. In this case, "alloc_netdev_mqs" will > automatically add the "IFF_NO_QUEUE" flag to the device, then > "attach_one_default_qdisc" in "net/sched/sch_generic.c" will attach > the "noqueue" qdisc for devices with the "IFF_NO_QUEUE" flag. I see.
Re: [PATCH net] net: lapbether: Prevent racing when checking whether the netif is running
On Thu, Mar 11, 2021 at 4:10 PM Jakub Kicinski wrote: > > And the "noqueue" queue is there because it's on top of hdlc_fr.c > somehow or some out of tree driver? Or do you install it manually? No, this driver is not related to "hdlc_fr.c" or any out-of-tree driver. The default qdisc is "noqueue" for this driver because this driver doesn't set "tx_queue_len". This means the value of "tx_queue_len" would be 0. In this case, "alloc_netdev_mqs" will automatically add the "IFF_NO_QUEUE" flag to the device, then "attach_one_default_qdisc" in "net/sched/sch_generic.c" will attach the "noqueue" qdisc for devices with the "IFF_NO_QUEUE" flag.
Re: [PATCH net] net: lapbether: Prevent racing when checking whether the netif is running
On Thu, 11 Mar 2021 15:13:01 -0800 Xie He wrote: > On Thu, Mar 11, 2021 at 2:52 PM Jakub Kicinski wrote: > > > > Normally driver's ndo_stop() calls netif_tx_disable() which takes TX > > locks, so unless your driver is lockless (LLTX) there should be no xmit > > calls after that point. > > Do you mean I should call "netif_tx_disable" inside my "ndo_stop" > function as a fix for the racing between "ndo_stop" and > "ndo_start_xmit"? > > I can't call "netif_tx_disable" inside my "ndo_stop" function because > "netif_tx_disable" will call "netif_tx_stop_queue", which causes > another racing problem. Please see my recent commit f7d9d4854519 > ("net: lapbether: Remove netif_start_queue / netif_stop_queue") And the "noqueue" queue is there because it's on top of hdlc_fr.c somehow or some out of tree driver? Or do you install it manually?
Re: [PATCH net] net: lapbether: Prevent racing when checking whether the netif is running
On Thu, Mar 11, 2021 at 2:52 PM Jakub Kicinski wrote: > > Normally driver's ndo_stop() calls netif_tx_disable() which takes TX > locks, so unless your driver is lockless (LLTX) there should be no xmit > calls after that point. Do you mean I should call "netif_tx_disable" inside my "ndo_stop" function as a fix for the racing between "ndo_stop" and "ndo_start_xmit"? I can't call "netif_tx_disable" inside my "ndo_stop" function because "netif_tx_disable" will call "netif_tx_stop_queue", which causes another racing problem. Please see my recent commit f7d9d4854519 ("net: lapbether: Remove netif_start_queue / netif_stop_queue")
Re: [PATCH net] net: lapbether: Prevent racing when checking whether the netif is running
On Thu, 11 Mar 2021 13:12:25 -0800 Xie He wrote: > On Thu, Mar 11, 2021 at 12:43 PM Jakub Kicinski wrote: > > > > Is this a theoretical issues or do you see a path where it triggers? > > > > Who are the callers sending frames to a device which went down? > > This is a theoretical issue. I didn't see this issue in practice. > > When "__dev_queue_xmit" and "sch_direct_xmit" call > "dev_hard_start_xmit", there appears to be no locking mechanism > preventing the netif from going down while "dev_hard_start_xmit" is > doing its work. Normally driver's ndo_stop() calls netif_tx_disable() which takes TX locks, so unless your driver is lockless (LLTX) there should be no xmit calls after that point.
Re: [PATCH net] net: lapbether: Prevent racing when checking whether the netif is running
On Thu, Mar 11, 2021 at 12:43 PM Jakub Kicinski wrote: > > Is this a theoretical issues or do you see a path where it triggers? > > Who are the callers sending frames to a device which went down? This is a theoretical issue. I didn't see this issue in practice. When "__dev_queue_xmit" and "sch_direct_xmit" call "dev_hard_start_xmit", there appears to be no locking mechanism preventing the netif from going down while "dev_hard_start_xmit" is doing its work. David once confirmed in an email that a driver's "ndo_stop" function would indeed race with its "ndo_start_xmit" function: https://lore.kernel.org/netdev/20190520.200922.2277656639346033061.da...@davemloft.net/
Re: [PATCH net] net: lapbether: Prevent racing when checking whether the netif is running
On Wed, 10 Mar 2021 23:23:09 -0800 Xie He wrote: > There are two "netif_running" checks in this driver. One is in > "lapbeth_xmit" and the other is in "lapbeth_rcv". They serve to make > sure that the LAPB APIs called in these functions are called before > "lapb_unregister" is called by the "ndo_stop" function. > > However, these "netif_running" checks are unreliable, because it's > possible that immediately after "netif_running" returns true, "ndo_stop" > is called (which causes "lapb_unregister" to be called). > > This patch adds locking to make sure "lapbeth_xmit" and "lapbeth_rcv" can > reliably check and ensure the netif is running while doing their work. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Signed-off-by: Xie He Is this a theoretical issues or do you see a path where it triggers? Who are the callers sending frames to a device which went down?