On 09/30/2011 02:32 PM, Michal Sojka wrote:
> Dear SocketCAN developers,
>
> recently, we worked with Oliver on evaluation of queuing disciplines for
> use in CAN networks. We will publish our results soon, but before that I
> would like to discuss with you one issue that has annoyed me for a long
> time (and according to the mailing list, I'm not alone). Now we may have a

ACK :)

> solution for it - see the description and the patch below.

finally!!!!!!!11

> This issue is that write()/send() syscalls return ENOBUFS under
> certain conditions. The easiest way to reproduce the problem is to
> send some frames to an unconnected CAN interface (i.e. no frame can
> leave the box, all stay queued somewhere). After attempting to send
> 10+(number of HW TX buffers) frames, the application (e.g. cangen)
> gets ENOBUFS. cangen tries to overcome the problem by calling poll()
> in this case but this doesn't work. poll() never blocks and cangen ends
> up busy waiting. Just run cangen .... -i -p 1000 and top. cangen will
> eat 100% CPU.

In the kernel, the poll callback does the following (citing myself):

---
> The poll callback checks if the used memory is less than half of the
> per-socket snd buffer (IIRC ~60K). See:
>
> datagram_poll (http://lxr.linux.no/linux+v2.6.38/net/core/datagram.c#L737)
> sock_writeable (http://lxr.linux.no/linux+v2.6.38/include/net/sock.h#L1618)
>
> Because the size of a CAN frame (+the skb overhead) is much less than
> the Ethernet frame (+overhead), the default value for the snd buffer is
> too big for CAN.
>
> We get the -ENOBUFS from write() if the tx_queue_len (default 10) is
> exceeded.
>
> http://lxr.linux.no/linux+v2.6.38/drivers/net/can/dev.c#L435
> http://lxr.linux.no/linux+v2.6.38/net/can/af_can.c#L268
---

> Such a problem usually doesn't appear in Ethernet networks.
> Applications block automatically when there is lack of resources for
> sending. We dug into the source code to find out what is different in CAN
> compared to Ethernet and here is what we have found.
>
> So why is ENOBUFS typically returned? In the default configuration,
> CAN interfaces have the pfifo_fast queuing discipline attached. Therefore,
> dev_queue_xmit() calls pfifo_fast_enqueue(), which checks
> dev->tx_queue_len (which is 10 for CAN devices by default). If the
> number of queued frames is greater, it returns
> NET_XMIT_DROP. Then, can_send() calls net_xmit_errno(), which
> translates NET_XMIT_DROP into -ENOBUFS, which is then returned to the
> application.
>
> The difference in Ethernet networks is that the default queue size is
> 1000 and the reason why this limit is not reached is that there is
> another limit, which is lower and causes the application to block.
> This limit is the SO_SNDBUF socket option.

IP over Ethernet first depletes sk->sk_sndbuf, but CAN first reaches the
tx_queue_len limit. Because poll checks sk->sk_sndbuf, it never
really blocks in the CAN use case.

> In case of CAN_RAW sockets, this limit is checked in
> sock_alloc_send_skb() like this:
>
>   if (sk->sk_wmem_alloc < sk->sk_sndbuf)
>       alloc_skb();
>   else
>       sock_wait_for_wmem(); // i.e. block
>
> sk->sk_wmem_alloc is increased by skb->truesize whenever the application
> creates a skb belonging to the socket (i.e. on write) and decreased by
> the same amount whenever the skb is passed to the driver. The value
> of skb->truesize is sizeof(can_frame) + sizeof(skb), which is 200
> in my case (PowerPC).

Can you check on some other arch, 32 and 64 bit, please?

> The default value of sk->sk_sndbuf is 108544, which means that for
> CAN, this limit is reached (and the application blocks) when it has
> 542 CAN frames waiting to be sent to the driver. This is of course more
> than the 10 allowed by dev->tx_queue_len.
>
> Therefore, we propose applying a patch like this:
>
> diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
> index d0f8c7e..4831c53 100644
> --- a/drivers/net/can/dev.c
> +++ b/drivers/net/can/dev.c
> @@ -438,7 +438,7 @@ static void can_setup(struct net_device *dev)
>  	dev->mtu = sizeof(struct can_frame);
>  	dev->hard_header_len = 0;
>  	dev->addr_len = 0;
> -	dev->tx_queue_len = 10;
> +	dev->tx_queue_len = 22;
>
>  	/* New-style flags. */
>  	dev->flags = IFF_NOARP;
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> index 094fc53..4cf10e7 100644
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -190,6 +190,8 @@ static int can_create(struct net *net, struct socket
> *sock, int protocol,
>  	sock_init_data(sock, sk);
>  	sk->sk_destruct = can_sock_destruct;
>
> +	sk->sk_sndbuf = SOCK_MIN_SNDBUF;
> +
>  	if (sk->sk_prot->init)
>  		err = sk->sk_prot->init(sk);
>
> This sets the minimum possible sk_sndbuf, i.e. 2048, which allows
> 11 frames to be queued for a socket before the application blocks. In
> my case, the driver (mpc5200) seems to utilize 3 TX buffers and
> therefore cangen blocks when it tries to send the 15th frame (3 frames
> are buffered in the driver, 11 in the pfifo_fast qdisc). If the application
> does not want to block, it can set the O_NONBLOCK flag on the socket and
> it receives EAGAIN instead of ENOBUFS.

What about dynamically calculating sk->sk_sndbuf so that it provides room
for a fixed number of CAN frames in the socket, e.g. 10 or so? Maybe even
make the number of CAN frames configurable at runtime.

> It is also necessary to slightly increase the default tx_queue_len.
> Increasing it to 22 allows using two applications (or better two
> sockets) without seeing ENOBUFS. The third application/socket then
> gets ENOBUFS just for its first write().

Hmmm...3 applications isn't that much, is it? How many Ethernet
applications are needed to deplete the standard 1000 tx_queue_len?

100k snd_buf / 2k skb+data = 50 frames per sock
1000 tx_queue_len / 50 frames per sock = 20 apps

> The above described situation is not the only way an
> application can get ENOBUFS, but I think that in case of PF_CAN this is
> the most common situation and having a blocking behavior as provided
> by this patch would help the users a lot.

cheers, Marc

--
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
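
As a rough illustration of the numbers in this thread, here is a small user-space C sketch. It models only the simplified check quoted above (a frame is accepted while sk_wmem_alloc < sk_sndbuf) and the idea of sizing the send buffer for a fixed number of CAN frames. It is not kernel code: the helper names frames_before_block(), sndbuf_for_frames() and sockets_within_queue() are made up for this example, and the truesize of 200 bytes is simply the PowerPC value reported above, so other architectures may give different results.

```c
#include <stdio.h>

/*
 * Simplified model of the sock_alloc_send_skb() check quoted in the mail:
 * a frame is accepted while wmem_alloc < sndbuf, and every accepted frame
 * adds truesize bytes to wmem_alloc. Returns how many frames are accepted
 * before the sender would block (assumes nothing is drained meanwhile).
 */
static unsigned int frames_before_block(unsigned int sndbuf,
					unsigned int truesize)
{
	unsigned int wmem_alloc = 0;
	unsigned int frames = 0;

	if (truesize == 0)
		return 0;	/* avoid an endless loop on bogus input */

	while (wmem_alloc < sndbuf) {
		wmem_alloc += truesize;
		frames++;
	}
	return frames;
}

/*
 * Hypothetical helper for the "room for a fixed number of CAN frames" idea:
 * under the model above, a send buffer of nframes * truesize bytes admits
 * exactly nframes frames before blocking.
 */
static unsigned int sndbuf_for_frames(unsigned int nframes,
				      unsigned int truesize)
{
	return nframes * truesize;
}

/*
 * Back-of-the-envelope estimate: how many sockets can fill their send
 * buffers without exceeding tx_queue_len. Ignores HW TX buffers.
 */
static unsigned int sockets_within_queue(unsigned int tx_queue_len,
					 unsigned int frames_per_socket)
{
	if (frames_per_socket == 0)
		return 0;
	return tx_queue_len / frames_per_socket;
}
```

With the values from the thread, frames_before_block(2048, 200) gives the 11 frames mentioned for SOCK_MIN_SNDBUF, sockets_within_queue(22, 11) gives the two sockets the patch aims for, and sockets_within_queue(1000, 50) reproduces the Ethernet estimate of 20 applications. sndbuf_for_frames(10, 200) would be 2000 bytes; whether the kernel would accept a value below SOCK_MIN_SNDBUF is not covered by this sketch.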