On 09/30/2011 02:32 PM, Michal Sojka wrote:
> Dear SocketCAN developers,
> 
> recently, we worked with Oliver on evaluation of queuing disciplines for
> use in CAN networks. We will publish our results soon, but before that I
> would like to discuss with you one issue that has annoyed me for a long time
> (and according to the mailing list, I'm not alone). Now we may have a

ACK :)

> solution for it - see the description and the patch below.

finally!!!!!!!11

> The issue is that the write()/send() syscalls return ENOBUFS under
> certain conditions. The easiest way to reproduce the problem is to
> send some frames to an unconnected CAN interface (i.e. no frame can
> leave the box, all stay queued somewhere). After attempting to send
> 10+(number of HW TX buffers) frames, the application (e.g. cangen)
> gets ENOBUFS. cangen tries to overcome the problem by calling poll()
> in this case, but this doesn't work: poll() never blocks and cangen
> ends up busy waiting. Just run cangen .... -i -p 1000 and top. cangen
> will eat 100% CPU.
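
For reference, the retry pattern described above looks roughly like this
in C. This is only a sketch, not the actual cangen code; the helper
send_with_poll_retry() and its parameters are made up for illustration:

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from cangen): write one frame to fd and,
 * on ENOBUFS, wait for POLLOUT before retrying. Returns the number of
 * poll() calls that were needed, or -1 on any other error, on a short
 * write, or when max_retries is exhausted.
 */
static int send_with_poll_retry(int fd, const void *frame, size_t len,
                                int timeout_ms, int max_retries)
{
    int polls = 0;

    for (;;) {
        ssize_t n = write(fd, frame, len);

        if (n == (ssize_t)len)
            return polls;           /* frame accepted */
        if (n >= 0 || errno != ENOBUFS)
            return -1;              /* short write or unrelated error */
        if (polls >= max_retries)
            return -1;              /* give up instead of spinning forever */

        struct pollfd pfd = { .fd = fd, .events = POLLOUT };

        if (poll(&pfd, 1, timeout_ms) < 0 && errno != EINTR)
            return -1;
        polls++;
    }
}
```

With the behaviour reported here, poll() returns immediately with POLLOUT
set, so a loop like this spins until max_retries is used up instead of
sleeping.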

In the kernel, the poll callback does the following (citing myself):

---

> The poll callback checks whether the used memory is less than half of
> the per-socket send buffer (IIRC ~60K). See:
> 
> datagram_poll (http://lxr.linux.no/linux+v2.6.38/net/core/datagram.c#L737)
> sock_writeable (http://lxr.linux.no/linux+v2.6.38/include/net/sock.h#L1618)
> 
> Because the size of a CAN frame (+ the skb overhead) is much less than
> that of an Ethernet frame (+ overhead), the default value for the send
> buffer is too big for CAN.
> 
> We get -ENOBUFS from write() if the tx_queue_len (default 10) is
> exceeded.
> 
> http://lxr.linux.no/linux+v2.6.38/drivers/net/can/dev.c#L435
> http://lxr.linux.no/linux+v2.6.38/net/can/af_can.c#L268

---

> Such a problem usually doesn't appear in Ethernet networks.
> Applications block automatically when there is a lack of resources for
> sending. We dug into the source code to find out what is different in
> CAN compared to Ethernet, and here is what we found.
> 
> So why is ENOBUFS typically returned? In the default configuration,
> CAN interfaces have the pfifo_fast queuing discipline attached.
> Therefore, dev_queue_xmit() calls pfifo_fast_enqueue(), which checks
> dev->tx_queue_len (which is 10 for CAN devices by default). If the
> number of queued frames has reached this limit, it drops the frame and
> returns NET_XMIT_DROP. Then, can_send() calls net_xmit_errno(), which
> translates NET_XMIT_DROP into -ENOBUFS, which is then returned to the
> application.
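
To make that path easier to follow, here is a small userspace model of
it. It is a sketch, not kernel code: the model_* names are invented, the
NET_XMIT_* values are copied from include/linux/netdevice.h as far as I
remember them, and the qdisc is reduced to a plain counter.

```c
#include <errno.h>

/* Assumed to mirror the values in include/linux/netdevice.h. */
#define MODEL_NET_XMIT_SUCCESS 0x00
#define MODEL_NET_XMIT_DROP    0x01
#define MODEL_NET_XMIT_CN      0x02

/* Hypothetical stand-in for the qdisc: only the queue length matters. */
struct model_qdisc {
    unsigned int qlen;          /* frames currently queued */
    unsigned int tx_queue_len;  /* dev->tx_queue_len, 10 for CAN */
};

/* Models the length check in pfifo_fast_enqueue(). */
static int model_enqueue(struct model_qdisc *q)
{
    if (q->qlen < q->tx_queue_len) {
        q->qlen++;
        return MODEL_NET_XMIT_SUCCESS;
    }
    return MODEL_NET_XMIT_DROP;
}

/* Models can_send(): positive qdisc codes go through net_xmit_errno(). */
static int model_can_send(struct model_qdisc *q)
{
    int err = model_enqueue(q);

    if (err > 0)
        err = (err != MODEL_NET_XMIT_CN) ? -ENOBUFS : 0;
    return err;
}
```

With tx_queue_len = 10 and nothing draining the queue, the first ten
calls to model_can_send() return 0 and the eleventh returns -ENOBUFS.
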
> 
> The difference in Ethernet networks is that the default queue size is
> 1000 and the reason why this limit is not reached is that there is
> another limit, which is lower and causes the application to block.
> This limit is the SO_SNDBUF socket option.

IP over Ethernet first depletes sk->sk_sndbuf, but CAN first reaches the
tx_queue_len limit. Because poll checks sk->sk_sndbuf, it never really
blocks in the CAN use case.
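
A rough userspace model of that check, assuming the "less than half of
the send buffer" rule cited above and the 200 byte truesize reported for
PowerPC. The model_* helpers are invented for illustration and are not
kernel functions:

```c
#include <stdbool.h>

/* Model of the writability test: the socket counts as writable while
 * less than half of the send buffer is in use. */
static bool model_sock_writeable(unsigned int wmem_alloc, unsigned int sndbuf)
{
    return wmem_alloc < (sndbuf >> 1);
}

/* Bytes charged to a single socket if it alone fills the whole qdisc. */
static unsigned int model_max_wmem(unsigned int tx_queue_len,
                                   unsigned int truesize)
{
    return tx_queue_len * truesize;
}
```

model_max_wmem(10, 200) is 2000 bytes, far below half of the 108544 byte
default, so model_sock_writeable() stays true and poll() has no reason to
sleep.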

> In case of CAN_RAW sockets, this limit is checked in
> sock_alloc_send_skb() like this:
> 
> if (sk->sk_wmem_alloc < sk->sk_sndbuf)
>   alloc_skb();
> else
>   sock_wait_for_wmem(); // i.e. block
> 
> sk->sk_wmem_alloc is increased by skb->truesize whenever the application
> creates an skb belonging to the socket (i.e. on write) and decreased by
> the same amount whenever the skb is passed to the driver. The value
> of skb->truesize is sizeof(can_frame) + sizeof(skb), which is 200
> in my case (PowerPC).

Can you check on some other architectures, both 32 and 64 bit, please?

> The default value of sk->sk_sndbuf is 108544, which means that for
> CAN, this limit is reached (and the application blocks) when it has
> 542 CAN frames waiting to be sent to the driver. This is of course more
> than the 10 allowed by dev->tx_queue_len.
> 
> Therefore, we propose applying a patch like this:
> 
> diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
> index d0f8c7e..4831c53 100644
> --- a/drivers/net/can/dev.c
> +++ b/drivers/net/can/dev.c
> @@ -438,7 +438,7 @@ static void can_setup(struct net_device *dev)
>       dev->mtu = sizeof(struct can_frame);
>       dev->hard_header_len = 0;
>       dev->addr_len = 0;
> -     dev->tx_queue_len = 10;
> +     dev->tx_queue_len = 22;
>  
>       /* New-style flags. */
>       dev->flags = IFF_NOARP;
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> index 094fc53..4cf10e7 100644
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -190,6 +190,8 @@ static int can_create(struct net *net, struct socket *sock, int protocol,
>       sock_init_data(sock, sk);
>       sk->sk_destruct = can_sock_destruct;
>  
> +     sk->sk_sndbuf = SOCK_MIN_SNDBUF;
> +
>       if (sk->sk_prot->init)
>               err = sk->sk_prot->init(sk);
>  
> This sets the minimum possible sk_sndbuf, i.e. 2048, which allows
> 11 frames to be queued for a socket before the application blocks. In
> my case, the driver (mpc5200) seems to utilize 3 TX buffers and
> therefore cangen blocks when it tries to send the 15th frame (3 frames
> are buffered in the driver, 11 in the pfifo_fast qdisc). If the
> application does not want to block, it can set the O_NONBLOCK flag on
> the socket and it receives EAGAIN instead of ENOBUFS.
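
The 11 frame figure can be reproduced with a few lines of C. This is a
simplified model, assuming the "sk_wmem_alloc < sk_sndbuf" check quoted
earlier, a counter that starts at 0 and no frame leaving the queue in the
meantime; model_frames_before_block() is a made-up name:

```c
/* Count how many skb allocations pass the "wmem_alloc < sndbuf" check
 * before the next one would block. */
static unsigned int model_frames_before_block(unsigned int sndbuf,
                                              unsigned int truesize)
{
    unsigned int wmem_alloc = 0;
    unsigned int frames = 0;

    if (truesize == 0)
        return 0;               /* avoid looping forever on bad input */

    while (wmem_alloc < sndbuf) {
        wmem_alloc += truesize; /* one more skb charged to the socket */
        frames++;
    }
    return frames;
}
```

model_frames_before_block(2048, 200) gives 11; adding the 3 frames held
by the driver gives 14 accepted frames, so the 15th write blocks. For the
108544 byte default the same loop gives 543, one more than the
rounded-down 542 mentioned above, which does not change the argument.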

What about dynamically calculating sk->sk_sndbuf so that it provides room
for a fixed number of CAN frames in the socket, e.g. 10 or so? Maybe even
make the number of CAN frames configurable at runtime.

> It is also necessary to slightly increase the default tx_queue_len.
> Increasing it to 22 allows using two applications (or better two
> sockets) without seeing ENOBUFS. The third application/socket then
> gets ENOBUFS just for its first write().

Hmmm... 3 applications isn't that many, is it?
How many Ethernet applications are needed to deplete the standard
tx_queue_len of 1000?

100k snd_buf / 2k skb+data = 50 frames per sock
1000 tx_queue_len / 50 frames per sock = 20 apps
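
The same back-of-the-envelope estimate as a C helper, so the Ethernet and
CAN numbers can be compared directly. It is only a sketch: the function
name is invented and the inputs are the rough figures from this thread.

```c
/* Number of sockets, each blocking after frames_per_sock queued frames,
 * that the qdisc can absorb before a further enqueue gets dropped. */
static unsigned int model_socks_to_fill_queue(unsigned int tx_queue_len,
                                              unsigned int frames_per_sock)
{
    if (frames_per_sock == 0)
        return 0;               /* no meaningful answer */
    return tx_queue_len / frames_per_sock;
}
```

model_socks_to_fill_queue(1000, 50) is 20 for Ethernet, while the
proposed CAN values (22, 11) give 2.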

> The situation described above is not the only way an application can
> get ENOBUFS, but I think that in the case of PF_CAN this is the most
> common situation, and having the blocking behavior provided by this
> patch would help the users a lot.

cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
