Ira W. Snyder wrote:
> On Wed, Feb 24, 2010 at 08:08:15PM +0100, Marc Kleine-Budde wrote:
>> Ira W. Snyder wrote:
>>
>> [...]
>>
>>>>>> But we want so solve the TX problem. You only have to stop the
>>>>>> netif_queue if you TX ring is full. Then you have to deal with calling
>>>>>> netif_wake_queue() eventually. But we obviously don't want to poll.
>>>>>>
>>>>>> The solution might be to look at the TX queue in the interrupt handler
>>>>>> and/or the NAPI loop. And now we get back to my first question. If the
>>>>>> controller does a loopback in hardware, i.e. you receive a frame for
>>>>>> each frame send, you don't need to poll the TX queue length.
>>>>>>
>>>>>> If you receive a CAN frame it might be on of your's, which means the TX
>>>>>> queue might have at least room for one frame. In pseudo code it might
>>>>>> look like this:
>>>>>>
>>>>>> xmit()
>>>>>> {
>>>>>> send_frame_to_hardware();
>>>>>> if (tx_ring_full) {
>>>>>> netif_stop_queue();
>>>>>> priv->flags |= TX_RING_FULL;
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> irq() and netif_poll()
>>>>>> {
>>>>>> if (priv->flags & TX_RING_FULL) {
>>>>>> if (room_in_ring()) {
>>>>>> priv->flags &= ~TX_RING_FULL;
>>>>>> netif_wake_queue();
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> Should be implemented race-free, i.e. you might have to use atomic_*
>>>>>> operations and/or don't use a shared flag variable.
>>>>> Cool, that should work. netif_queue_stopped() could be used instead of
>>>>> the TX_RING_FULL flag.
>>>> right...good point
>>>>
>>>> I should have looked at my own driver :)
>>>>
>>>> If room_in_ring() is cheap, it boils down to:
>>>>
>>>>>> irq() and netif_poll()
>>>>>> {
>>>>>> if (room_in_ring()) {
>>>>>> netif_wake_queue();
>>>>>> }
>>>> Otherwise:
>>>>
>>>>>> irq() and netif_poll()
>>>>>> {
>>>>>> if (netif_queue_stopped() && room_in_ring()) {
>>>>>> netif_wake_queue();
>>>>>> }
>>> Hmm, this *almost* works. Running "cangen -g 0 can0" quickly gets the
>>> queue into a hung state, though.
>> what is a "hung state"?
>>
>>> I added printk() to the sites where I stop/wake the queue. It is always
>>> woken up correctly, but cangen always exits with "write: No buffer space
>>> available" after a few stop/wake cycles. Immediately running cangen
>>> again works for a few more stop/wake cycles.
>> let's look the the cangen source if there's propper error handling...
>> http://svn.berlios.de/viewvc/socketcan/trunk/can-utils/cangen.c?revision=787&view=markup
>> nope, there isn't
>>
>> There's the option "-i" to ignore ENOBUFS, which is....we don't want to
>> do that....
>>
>> Better improbe the source: (from the pengutronix' canutils...)
>>
>> again:
>> len = write(s, &frame, sizeof(frame));
>> if (len == -1) {
>> switch (errno) {
>> case ENOBUFS: {
>> int err;
>> struct pollfd fds[] = {
>> {
>> .fd = s,
>> .events = POLLOUT,
>> },
>> };
>>
>> err = poll(fds, 1, 1000);
>> if (err == -1 && errno != -EINTR) {
>> perror("poll()");
>> exit(EXIT_FAILURE);
>> } else if (err == 0) {
>> printf("a timeout occured\n");
>> exit(EXIT_FAILURE);
>> }
>> }
>> case EINTR: /* fallthrough */
>> goto again;
>> default:
>> perror("write");
>> exit(EXIT_FAILURE);
>> }
>> }
>>
>> If you hit a timeout, which means you cannot send packages for 1000 ms,
>> your send queue probably didn't wake up...
>>
>
> Is there any chance you can send me the pengutronix version of
> cangen/canutils? I can't find it on their website.http://www.pengutronix.de/software/socket-can/download/canutils/v4.0/ http://www.pengutronix.de/software/libsocketcan/download/ you need the libsocketcan, too > >>> Here is the output of ifconfig: >>> can0 Link encap:UNSPEC HWaddr >>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 >>> UP RUNNING NOARP MTU:16 Metric:1 >>> RX packets:1902 errors:0 dropped:0 overruns:0 frame:0 >>> TX packets:1902 errors:0 dropped:0 overruns:0 carrier:0 >>> collisions:0 txqueuelen:10 >>> RX bytes:10767 (10.5 KiB) TX bytes:10767 (10.5 KiB) >>> Interrupt:22 >>> >>> can1 Link encap:UNSPEC HWaddr >>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 >>> UP RUNNING NOARP MTU:16 Metric:1 >>> RX packets:1902 errors:0 dropped:0 overruns:0 frame:0 >>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 >>> collisions:0 txqueuelen:10 >>> RX bytes:10767 (10.5 KiB) TX bytes:0 (0.0 b) >>> Interrupt:22 >>> >>> I'm stopping the queue correctly: the xmit() routine never complains that it >>> tried to transmit but had no buffer space. You'll see that we didn't drop >>> any >>> packets in this run. In another run, we did drop packets. It is like the >>> firmware didn't echo the packets back to me. Also note that the TX of >>> can0 always matches the RX of can1. They're cabled together, so this is >>> expected. >> Hmm...it looks like this. >> >> Double check your RX path, so that you don't loose any frames in >> software. Is it possible that the RX and TX collide in your driver >> and/or in hardware? >> > > I'm pretty sure that I do not lose RX frames in my driver. When I do, > stats->rx_dropped is always incremented. I do get the following warning I mean loosing RX frames due to an programming error... > from lockdep, though. Sorry about the length. The best I can get from > the output is that something in the networking open() code is broken (or > needs a special lockdep annotation). Puhh...I havn't read this before. The problem is a from within a HARDIRQ-safe a HARDIRQ-unsafe lock is aquired. the HARDIRQ-safe seems to be the spin lock in your interrupt handler, the unsafe one is in the close path on a socket. > [ 134.540375] janz-cmodio 0000:01:0e.0: PCI->APIC IRQ transform: INT A -> > IRQ 22 > [ 134.547629] janz-cmodio 0000:01:0e.0: setting latency timer to 64 > [ 134.684269] janz-ican3 janz-ican3.0: module 0: registered CAN device > [ 134.807929] janz-ican3 janz-ican3.1: module 1: registered CAN device > [ 175.361588] janz-ican3 janz-ican3.0: ican3_xmit: stopping queue > [ 175.373104] janz-ican3 janz-ican3.0: ican3_recv_skb: waking queue > [ 175.379556] > [ 175.379559] ====================================================== > [ 175.383277] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] > [ 175.383277] 2.6.33-rc7-00000-rc7 #3 > [ 175.383277] ------------------------------------------------------ > [ 175.383277] events/0/5 [HC0[0]:SC1[1]:HE0:SE0] is trying to acquire: > [ 175.383277] (clock-AF_CAN){++.-..}, at: [<c11920e7>] > sock_def_write_space+0x15/0x8d > [ 175.383277] > [ 175.383277] and this task is already holding: > [ 175.383277] (&(&mod->lock)->rlock){-.-...}, at: [<f898df29>] > ican3_xmit+0x48/0x277 [janz_ican3] > [ 175.383277] which would create a new lock dependency: > [ 175.383277] (&(&mod->lock)->rlock){-.-...} -> (clock-AF_CAN){++.-..} > [ 175.383277] > [ 175.383277] but this new dependency connects a HARDIRQ-irq-safe lock: > [ 175.383277] (&(&mod->lock)->rlock){-.-...} > [ 175.383277] ... which became HARDIRQ-irq-safe at: > [ 175.383277] [<c103d47f>] __lock_acquire+0x4ce/0x756 > [ 175.383277] [<c103e209>] lock_acquire+0x45/0x5c > [ 175.383277] [<c12037a7>] _raw_spin_lock_irqsave+0x24/0x34 from your interrupt handler you aquire a spin_lock > [ 175.383277] [<f898d3c9>] ican3_irq+0x41/0x164 [janz_ican3] > [ 175.383277] [<c10463ab>] handle_IRQ_event+0x1d/0x9a > [ 175.383277] [<c1047872>] handle_fasteoi_irq+0x9a/0xa7 > [ 175.383277] > [ 175.383277] to a HARDIRQ-irq-unsafe lock: > [ 175.383277] (clock-AF_CAN){++.-..} > [ 175.383277] ... which became HARDIRQ-irq-unsafe at: > [ 175.383277] ... [<c103d500>] __lock_acquire+0x54f/0x756 > [ 175.383277] [<c103e209>] lock_acquire+0x45/0x5c > [ 175.383277] [<c12038da>] _raw_write_lock_bh+0x20/0x2f > [ 175.383277] [<c11e3be7>] raw_release+0xf9/0x14c > [ 175.383277] [<c118f55c>] sock_release+0x14/0x58 > [ 175.383277] [<c118fc4c>] sock_close+0x1c/0x20 > [ 175.383277] [<c106fc41>] __fput+0xeb/0x187 > [ 175.383277] [<c106fe8d>] fput+0x16/0x18 > [ 175.383277] [<c106d873>] filp_close+0x51/0x5b > [ 175.383277] [<c106e9c3>] sys_close+0x67/0xa0 > [ 175.383277] [<c1002657>] sysenter_do_call+0x12/0x36 soneone calls a "close" ^^^ > [ 175.383277] > [ 175.383277] other info that might help us debug this: > [ 175.383277] > [ 175.383277] 4 locks held by events/0/5: > [ 175.383277] #0: (events){+.+.+.}, at: [<c102ce51>] > worker_thread+0xc9/0x1c9 > [ 175.383277] #1: ((&mod->work)){+.+...}, at: [<c102ce51>] > worker_thread+0xc9/0x1c9 > [ 175.383277] #2: (_xmit_NONE#2){+.-...}, at: [<c11a693a>] > sch_direct_xmit+0x2d/0x113 > [ 175.383277] #3: (&(&mod->lock)->rlock){-.-...}, at: [<f898df29>] > ican3_xmit+0x48/0x277 [janz_ican3] > [ 175.383277] > [ 175.383277] the dependencies between HARDIRQ-irq-safe lock and the holding > lock: > [ 175.383277] -> (&(&mod->lock)->rlock){-.-...} ops: 0 { > [ 175.383277] IN-HARDIRQ-W at: > [ 175.383277] [<c103d47f>] > __lock_acquire+0x4ce/0x756 > [ 175.383277] [<c103e209>] > lock_acquire+0x45/0x5c > [ 175.383277] [<c12037a7>] > _raw_spin_lock_irqsave+0x24/0x34 > [ 175.383277] [<f898d3c9>] > ican3_irq+0x41/0x164 [janz_ican3] > [ 175.383277] [<c10463ab>] > handle_IRQ_event+0x1d/0x9a > [ 175.383277] [<c1047872>] > handle_fasteoi_irq+0x9a/0xa7 > [ 175.383277] IN-SOFTIRQ-W at: > [ 175.383277] [<c103d4a3>] > __lock_acquire+0x4f2/0x756 > [ 175.383277] [<c103e209>] > lock_acquire+0x45/0x5c > [ 175.383277] [<c12037a7>] > _raw_spin_lock_irqsave+0x24/0x34 > [ 175.383277] [<f898d3c9>] > ican3_irq+0x41/0x164 [janz_ican3] > [ 175.383277] [<c10463ab>] > handle_IRQ_event+0x1d/0x9a > [ 175.383277] [<c1047872>] > handle_fasteoi_irq+0x9a/0xa7 > [ 175.383277] INITIAL USE at: > [ 175.383277] [<c103d571>] > __lock_acquire+0x5c0/0x756 > [ 175.383277] [<c103e209>] > lock_acquire+0x45/0x5c > [ 175.383277] [<c12037a7>] > _raw_spin_lock_irqsave+0x24/0x34 > [ 175.383277] [<f898d06e>] > ican3_send_msg+0x1c/0x1e6 [janz_ican3] > [ 175.383277] [<f898e64d>] > ican3_probe+0x421/0xc91 [janz_ican3] > [ 175.383277] [<c1140dcd>] > platform_drv_probe+0xc/0xe > [ 175.383277] [<c114022c>] > driver_probe_device+0x7d/0xf0 > [ 175.383277] [<c1140421>] > __device_attach+0x28/0x30 > [ 175.383277] [<c113f721>] > bus_for_each_drv+0x39/0x62 > [ 175.383277] [<c1140362>] > device_attach+0x45/0x59 > [ 175.383277] [<c113f6d3>] > bus_probe_device+0x1f/0x34 > [ 175.383277] [<c113e618>] > device_add+0x2f5/0x43b > [ 175.383277] [<c11410d1>] > platform_device_add+0xd9/0x119 > [ 175.383277] [<c1141126>] > platform_device_register+0x15/0x18 > [ 175.383277] [<f89a92df>] > cmodio_pci_probe+0x228/0x2cd [janz_cmodio] > [ 175.383277] [<c10f5100>] > local_pci_probe+0xe/0x10 > [ 175.383277] [<c10f52cf>] > pci_device_probe+0x48/0x66 > [ 175.383277] [<c114022c>] > driver_probe_device+0x7d/0xf0 > [ 175.383277] [<c11402e2>] > __driver_attach+0x43/0x5f > [ 175.383277] [<c113f950>] > bus_for_each_dev+0x39/0x5a > [ 175.383277] [<c11400ff>] > driver_attach+0x14/0x16 > [ 175.383277] [<c113fdd7>] > bus_add_driver+0xa2/0x1c8 > [ 175.383277] [<c11405ab>] > driver_register+0x7b/0xd7 > [ 175.383277] [<c10f54c5>] > __pci_register_driver+0x4d/0xa0 > [ 175.383277] [<f89ac017>] 0xf89ac017 > [ 175.383277] [<c100104b>] > do_one_initcall+0x4b/0x135 > [ 175.383277] [<c104475e>] > sys_init_module+0xa9/0x1e3 > [ 175.383277] [<c1002657>] > sysenter_do_call+0x12/0x36 > [ 175.383277] } > [ 175.383277] ... key at: [<f898f61c>] __key.26093+0x0/0xfffff8a1 > [janz_ican3] > [ 175.383277] ... acquired at: > [ 175.383277] [<c103c497>] check_irq_usage+0x50/0xad > [ 175.383277] [<c103caf8>] validate_chain+0x604/0xabd > [ 175.383277] [<c103d69c>] __lock_acquire+0x6eb/0x756 > [ 175.383277] [<c103e209>] lock_acquire+0x45/0x5c > [ 175.383277] [<c1203adb>] _raw_read_lock+0x1b/0x2a > [ 175.383277] [<c11920e7>] sock_def_write_space+0x15/0x8d > [ 175.383277] [<c11918d3>] sock_wfree+0x25/0x41 > [ 175.383277] [<c1193503>] skb_release_head_state+0x43/0x46 > [ 175.383277] [<c11945c0>] skb_release_all+0xb/0x15 > [ 175.383277] [<c1193e41>] __kfree_skb+0xb/0x69 > [ 175.383277] [<c1193ef1>] kfree_skb+0x28/0x2a > [ 175.383277] [<f898e0ef>] ican3_xmit+0x20e/0x277 [janz_ican3] > [ 175.383277] [<c119a597>] dev_hard_start_xmit+0x1c7/0x25d > [ 175.383277] [<c11a695a>] sch_direct_xmit+0x4d/0x113 > [ 175.383277] [<c11a6ab0>] __qdisc_run+0x90/0xa2 > [ 175.383277] [<c119c90d>] net_tx_action+0xa7/0xff > [ 175.383277] [<c102384e>] __do_softirq+0x75/0xf3 > [ 175.383277] > [ 175.383277] > [ 175.383277] the dependencies between the lock to be acquired and > HARDIRQ-irq-unsafe lock: > [ 175.383277] -> (clock-AF_CAN){++.-..} ops: 0 { > [ 175.383277] HARDIRQ-ON-W at: > [ 175.383277] [<c103d500>] > __lock_acquire+0x54f/0x756 > [ 175.383277] [<c103e209>] > lock_acquire+0x45/0x5c > [ 175.383277] [<c12038da>] > _raw_write_lock_bh+0x20/0x2f > [ 175.383277] [<c11e3be7>] > raw_release+0xf9/0x14c > [ 175.383277] [<c118f55c>] > sock_release+0x14/0x58 > [ 175.383277] [<c118fc4c>] > sock_close+0x1c/0x20 > [ 175.383277] [<c106fc41>] > __fput+0xeb/0x187 > [ 175.383277] [<c106fe8d>] > fput+0x16/0x18 > [ 175.383277] [<c106d873>] > filp_close+0x51/0x5b > [ 175.383277] [<c106e9c3>] > sys_close+0x67/0xa0 > [ 175.383277] [<c1002657>] > sysenter_do_call+0x12/0x36 > [ 175.383277] HARDIRQ-ON-R at: > [ 175.383277] [<c103d4d0>] > __lock_acquire+0x51f/0x756 > [ 175.383277] [<c103e209>] > lock_acquire+0x45/0x5c > [ 175.383277] [<c1203adb>] > _raw_read_lock+0x1b/0x2a > [ 175.383277] [<c119207c>] > sock_def_readable+0x15/0x6b > [ 175.383277] [<c119149d>] > sock_queue_rcv_skb+0xfc/0x115 > [ 175.383277] [<c11e3901>] > raw_rcv+0x4a/0x5a > [ 175.383277] [<c11e2491>] > can_rcv_filter+0x7a/0x147 > [ 175.383277] [<c11e2765>] > can_rcv+0xa8/0xed > [ 175.383277] [<c1199aec>] > netif_receive_skb+0x288/0x2bb > [ 175.383277] [<c119c1b5>] > process_backlog+0x63/0x8c > [ 175.383277] [<c119c5e1>] > net_rx_action+0x4b/0x104 > [ 175.383277] [<c102384e>] > __do_softirq+0x75/0xf3 > [ 175.383277] IN-SOFTIRQ-R at: > [ 175.383277] [<c103d4a3>] > __lock_acquire+0x4f2/0x756 > [ 175.383277] [<c103e209>] > lock_acquire+0x45/0x5c > [ 175.383277] [<c1203adb>] > _raw_read_lock+0x1b/0x2a > [ 175.383277] [<c119207c>] > sock_def_readable+0x15/0x6b > [ 175.383277] [<c119149d>] > sock_queue_rcv_skb+0xfc/0x115 > [ 175.383277] [<c11e3901>] > raw_rcv+0x4a/0x5a > [ 175.383277] [<c11e2491>] > can_rcv_filter+0x7a/0x147 > [ 175.383277] [<c11e2765>] > can_rcv+0xa8/0xed > [ 175.383277] [<c1199aec>] > netif_receive_skb+0x288/0x2bb > [ 175.383277] [<c119c1b5>] > process_backlog+0x63/0x8c > [ 175.383277] [<c119c5e1>] > net_rx_action+0x4b/0x104 > [ 175.383277] [<c102384e>] > __do_softirq+0x75/0xf3 > [ 175.383277] INITIAL USE at: > [ 175.383277] [<c103d571>] > __lock_acquire+0x5c0/0x756 > [ 175.383277] [<c103e209>] > lock_acquire+0x45/0x5c > [ 175.383277] [<c1203adb>] > _raw_read_lock+0x1b/0x2a > [ 175.383277] [<c11920e7>] > sock_def_write_space+0x15/0x8d > [ 175.383277] [<c11918d3>] > sock_wfree+0x25/0x41 > [ 175.383277] [<c1193503>] > skb_release_head_state+0x43/0x46 > [ 175.383277] [<c11945c0>] > skb_release_all+0xb/0x15 > [ 175.383277] [<c1193e41>] > __kfree_skb+0xb/0x69 > [ 175.383277] [<c1193ef1>] > kfree_skb+0x28/0x2a > [ 175.383277] [<f898e0ef>] > ican3_xmit+0x20e/0x277 [janz_ican3] > [ 175.383277] [<c119a597>] > dev_hard_start_xmit+0x1c7/0x25d > [ 175.383277] [<c11a695a>] > sch_direct_xmit+0x4d/0x113 > [ 175.383277] [<c119d15d>] > dev_queue_xmit+0x22c/0x397 > [ 175.383277] [<c11e267e>] > can_send+0xd0/0x10f > [ 175.383277] [<c11e3882>] > raw_sendmsg+0xc3/0xf8 > [ 175.383277] [<c118df5f>] > sock_aio_write+0xc5/0xce > [ 175.383277] [<c106edf8>] > do_sync_write+0x88/0xca > [ 175.383277] [<c106f49d>] > vfs_write+0x9d/0x116 > [ 175.383277] [<c106f86a>] > sys_write+0x3b/0x60 > [ 175.383277] [<c1002657>] > sysenter_do_call+0x12/0x36 > [ 175.383277] } > [ 175.383277] ... key at: [<c186ba04>] af_callback_keys+0xe8/0x128 > [ 175.383277] ... acquired at: > [ 175.383277] [<c103c497>] check_irq_usage+0x50/0xad > [ 175.383277] [<c103caf8>] validate_chain+0x604/0xabd > [ 175.383277] [<c103d69c>] __lock_acquire+0x6eb/0x756 > [ 175.383277] [<c103e209>] lock_acquire+0x45/0x5c > [ 175.383277] [<c1203adb>] _raw_read_lock+0x1b/0x2a > [ 175.383277] [<c11920e7>] sock_def_write_space+0x15/0x8d > [ 175.383277] [<c11918d3>] sock_wfree+0x25/0x41 > [ 175.383277] [<c1193503>] skb_release_head_state+0x43/0x46 > [ 175.383277] [<c11945c0>] skb_release_all+0xb/0x15 > [ 175.383277] [<c1193e41>] __kfree_skb+0xb/0x69 > [ 175.383277] [<c1193ef1>] kfree_skb+0x28/0x2a > [ 175.383277] [<f898e0ef>] ican3_xmit+0x20e/0x277 [janz_ican3] > [ 175.383277] [<c119a597>] dev_hard_start_xmit+0x1c7/0x25d > [ 175.383277] [<c11a695a>] sch_direct_xmit+0x4d/0x113 > [ 175.383277] [<c11a6ab0>] __qdisc_run+0x90/0xa2 > [ 175.383277] [<c119c90d>] net_tx_action+0xa7/0xff > [ 175.383277] [<c102384e>] __do_softirq+0x75/0xf3 > [ 175.383277] > > >> Another improvement for the cangen utility is to exit after a certain >> number of messages. So you can match the number of generated frames with >> the rx/tx counters of the CAN cards and interrupts (cat /proc/interrupts) >> > > Yeah, that would be a good improvement too. try cansequence from pengutronix' canutils > >>> can0 Link encap:UNSPEC HWaddr >>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 >>> UP RUNNING NOARP MTU:16 Metric:1 >>> RX packets:461 errors:0 dropped:0 overruns:0 frame:0 >>> TX packets:608 errors:0 dropped:0 overruns:0 carrier:0 >>> collisions:0 txqueuelen:10 >>> RX bytes:2616 (2.5 KiB) TX bytes:3434 (3.3 KiB) >>> Interrupt:22 >>> >>> can1 Link encap:UNSPEC HWaddr >>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 >>> UP RUNNING NOARP MTU:16 Metric:1 >>> RX packets:608 errors:0 dropped:0 overruns:0 frame:0 >>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 >>> collisions:0 txqueuelen:10 >>> RX bytes:3434 (3.3 KiB) TX bytes:0 (0.0 b) >>> Interrupt:22 Marc -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
signature.asc
Description: OpenPGP digital signature
_______________________________________________ Socketcan-core mailing list [email protected] https://lists.berlios.de/mailman/listinfo/socketcan-core
