Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
On Tue, 1 May 2007, Samuel Ortiz wrote: But I will definitely spend some time this week running my IrDA stack with this patch applied. I didn't bother to do that earlier as you first reported some oops with this patch applied. I think, what I reported was not an Oops, but the race that we're also fixing ATM - the one in the state machine. So, unrelated. On Mon, Apr 30, 2007 at 03:24:05PM +0200, Guennadi Liakhovetski wrote: in irttp_dup() (remember spinlock_init()?:-)), otherwise it oopses. good catch, again...Yes, I do remember the irttp_dup bug ;-) I've put a tsap_init() function that does those common initialisation calls, so we have a smaller chance of forgetting the dup() path next time... I will be testing it too, but don't know how much longer and how intensively. Do you think we might get some new problems with this new context? It seems quite safe to me. But more importantly, I thing we do want to call the flow indication routine asynchronously in that specific case. There is one thing that this patch is missing though: we should probably clean the worqueue right before we destroy the TTP instance. The work routine checks for NULL pointer, but still... I thought about it too, but the only thing you can do is flush_scheduled_work(), is this what you mean? The test with your patch stopped inside a ftp transfer - the ftp client was doing a get, and it stopped half-way through. On the client side only the control tcp connection was still open, on the client both ftp-server forked children were dead, no connection open. No errors in logs. Weird. With my ugly patch it ran the weekend through. ...and it just stopped again - this time after just 13 minutes. A couple of comments to your patch: +static void irttp_flow_restart(struct work_struct *work) +{ + struct tsap_cb * self = + container_of(work, struct tsap_cb, irnet_flow_work); + + if (self == NULL) + return; self will not be NULL here. How about + IRDA_ASSERT(work != NULL, return;); + IRDA_ASSERT(self-magic == TTP_TSAP_MAGIC, return;); - /* Check if we can accept more frames from client. - * We don't want to wait until the todo timer to do that, and we - * can't use tasklets (grr...), so we are obliged to give control - * to client. That's ok, this test will be true not too often - * (max once per LAP window) and we are called from places - * where we can spend a bit of time doing stuff. - Jean II */ if ((self-tx_sdu_busy) (skb_queue_len(self-tx_queue) TTP_TX_LOW_THRESHOLD) (!self-close_pend)) Is this test still needed here? You do it again in the work-function, and the conditions can change, so, should we schedule_work unconditionally here? Thanks Guennadi - Guennadi Liakhovetski, Ph.D. DSA Daten- und Systemtechnik GmbH Pascalstr. 28 D-52076 Aachen Germany - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
On Tue, 10 Apr 2007, Samuel Ortiz wrote: Hi Guennadi, The patch below schedules irnet_flow_indication() asynchronously. Could you please give it a try (it builds, but I couldn't test it...) ? : Ok, your patch (still below) works too (now that I fixed that state machine race, btw, we still have to decide on the final form how it goes in the mainline) __after__ you also add the line + INIT_WORK(new-irnet_flow_work, irttp_flow_restart); in irttp_dup() (remember spinlock_init()?:-)), otherwise it oopses. Generally, I like your patch better than mine to ppp_generic.c, where I explicitly check if a recursion is occurring. Still, I am a bit concerned about introducing yet another execution context into irda... We have seen a couple of locking issues there already in the last 2-3 months especially under rt-preempt... Would you be able to run some tests too? I will be testing it too, but don't know how much longer and how intensively. Do you think we might get some new problems with this new context? Thanks Guennadi diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h index a899e58..941f0f1 100644 --- a/include/net/irda/irttp.h +++ b/include/net/irda/irttp.h @@ -128,6 +128,7 @@ struct tsap_cb { struct net_device_stats stats; struct timer_list todo_timer; + struct work_struct irnet_flow_work; /* irttp asynchronous flow restart */ __u32 max_seg_size; /* Max data that fit into an IrLAP frame */ __u8 max_header_size; diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h diff --git a/net/irda/irttp.c b/net/irda/irttp.c index 7069e4a..a0d0f26 100644 --- a/net/irda/irttp.c +++ b/net/irda/irttp.c @@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param, /*** CLIENT CALLS ***/ /** LMP CALLBACKS **/ /* Everything is happily mixed up. Waiting for next clean up - Jean II */ +static void irttp_flow_restart(struct work_struct *work) +{ + struct tsap_cb * self = + container_of(work, struct tsap_cb, irnet_flow_work); + + if (self == NULL) + return; + + /* Check if we can accept more frames from client. */ + if ((self-tx_sdu_busy) + (skb_queue_len(self-tx_queue) TTP_TX_LOW_THRESHOLD) + (!self-close_pend)) { + if (self-notify.flow_indication) + self-notify.flow_indication(self-notify.instance, + self, FLOW_START); + + /* self-tx_sdu_busy is the state of the client. + * We don't really have a race here, but it's always safer + * to update our state after the client - Jean II */ + self-tx_sdu_busy = FALSE; + } +} + /* * Function irttp_open_tsap (stsap, notify) @@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify) self-todo_timer.data = (unsigned long) self; self-todo_timer.function = irttp_todo_expired; + INIT_WORK(self-irnet_flow_work, irttp_flow_restart); + /* Initialize callbacks for IrLMP to use */ irda_notify_init(ttp_notify); ttp_notify.connect_confirm = irttp_connect_confirm; @@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self) self-stats.tx_packets++; } - /* Check if we can accept more frames from client. - * We don't want to wait until the todo timer to do that, and we - * can't use tasklets (grr...), so we are obliged to give control - * to client. That's ok, this test will be true not too often - * (max once per LAP window) and we are called from places - * where we can spend a bit of time doing stuff. - Jean II */ if ((self-tx_sdu_busy) (skb_queue_len(self-tx_queue) TTP_TX_LOW_THRESHOLD) (!self-close_pend)) - { - if (self-notify.flow_indication) - self-notify.flow_indication(self-notify.instance, - self, FLOW_START); - - /* self-tx_sdu_busy is the state of the client. - * We don't really have a race here, but it's always safer - * to update our state after the client - Jean II */ - self-tx_sdu_busy = FALSE; - } + schedule_work(self-irnet_flow_work); /* Reset lock */ self-tx_queue_lock = 0; - Guennadi Liakhovetski, Ph.D. DSA Daten- und Systemtechnik GmbH Pascalstr. 28 D-52076 Aachen Germany - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
On Mon, Apr 30, 2007 at 03:24:05PM +0200, Guennadi Liakhovetski wrote: On Tue, 10 Apr 2007, Samuel Ortiz wrote: Hi Guennadi, The patch below schedules irnet_flow_indication() asynchronously. Could you please give it a try (it builds, but I couldn't test it...) ? : Ok, your patch (still below) works too (now that I fixed that state machine race, btw, we still have to decide on the final form how it goes in the mainline) __after__ you also add the line + INIT_WORK(new-irnet_flow_work, irttp_flow_restart); in irttp_dup() (remember spinlock_init()?:-)), otherwise it oopses. good catch, again...Yes, I do remember the irttp_dup bug ;-) Generally, I like your patch better than mine to ppp_generic.c, where I explicitly check if a recursion is occurring. Still, I am a bit concerned about introducing yet another execution context into irda... We have seen a couple of locking issues there already in the last 2-3 months especially under rt-preempt... Would you be able to run some tests too? I think I can run some tests here as well, but probably not as many as you: I'm not doing IrDA stuff full time while it seems you currently are. But I will definitely spend some time this week running my IrDA stack with this patch applied. I didn't bother to do that earlier as you first reported some oops with this patch applied. I will be testing it too, but don't know how much longer and how intensively. Do you think we might get some new problems with this new context? It seems quite safe to me. But more importantly, I thing we do want to call the flow indication routine asynchronously in that specific case. There is one thing that this patch is missing though: we should probably clean the worqueue right before we destroy the TTP instance. The work routine checks for NULL pointer, but still... Cheers, Samuel. Thanks Guennadi diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h index a899e58..941f0f1 100644 --- a/include/net/irda/irttp.h +++ b/include/net/irda/irttp.h @@ -128,6 +128,7 @@ struct tsap_cb { struct net_device_stats stats; struct timer_list todo_timer; + struct work_struct irnet_flow_work; /* irttp asynchronous flow restart */ __u32 max_seg_size; /* Max data that fit into an IrLAP frame */ __u8 max_header_size; diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h diff --git a/net/irda/irttp.c b/net/irda/irttp.c index 7069e4a..a0d0f26 100644 --- a/net/irda/irttp.c +++ b/net/irda/irttp.c @@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param, /*** CLIENT CALLS ***/ /** LMP CALLBACKS **/ /* Everything is happily mixed up. Waiting for next clean up - Jean II */ +static void irttp_flow_restart(struct work_struct *work) +{ + struct tsap_cb * self = + container_of(work, struct tsap_cb, irnet_flow_work); + + if (self == NULL) + return; + + /* Check if we can accept more frames from client. */ + if ((self-tx_sdu_busy) + (skb_queue_len(self-tx_queue) TTP_TX_LOW_THRESHOLD) + (!self-close_pend)) { + if (self-notify.flow_indication) + self-notify.flow_indication(self-notify.instance, +self, FLOW_START); + + /* self-tx_sdu_busy is the state of the client. +* We don't really have a race here, but it's always safer +* to update our state after the client - Jean II */ + self-tx_sdu_busy = FALSE; + } +} + /* * Function irttp_open_tsap (stsap, notify) @@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify) self-todo_timer.data = (unsigned long) self; self-todo_timer.function = irttp_todo_expired; + INIT_WORK(self-irnet_flow_work, irttp_flow_restart); + /* Initialize callbacks for IrLMP to use */ irda_notify_init(ttp_notify); ttp_notify.connect_confirm = irttp_connect_confirm; @@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self) self-stats.tx_packets++; } - /* Check if we can accept more frames from client. -* We don't want to wait until the todo timer to do that, and we -* can't use tasklets (grr...), so we are obliged to give control -* to client. That's ok, this test will be true not too often -* (max once per LAP window) and we are called from places -* where we can spend a bit of time doing stuff. - Jean II */ if ((self-tx_sdu_busy) (skb_queue_len(self-tx_queue) TTP_TX_LOW_THRESHOLD) (!self-close_pend)) - { - if (self-notify.flow_indication) -
Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
On Tue, 10 Apr 2007, Samuel Ortiz wrote: The patch below schedules irnet_flow_indication() asynchronously. Could you please give it a try (it builds, but I couldn't test it...) ? : I'm afraid, your patch makes it worse - with my patch to ppp_generic it worked 5 hours before Ir stopped, remaining at 4mbaud, and only discovery timer was expiring periodically without actually doing anything; with your patch the same happens in about 5 minutes. I'll double-verify it, and then will try the same work-queue approach, but from ppp_generic, rather than from irttp, combining our 2 patches. Thanks Guennadi - Guennadi Liakhovetski, Ph.D. DSA Daten- und Systemtechnik GmbH Pascalstr. 28 D-52076 Aachen Germany - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
Hi Guennadi, On Sat, Apr 07, 2007 at 03:59:26AM +0300, Samuel Ortiz wrote: IMHO, irnet_flow_indication() should be called asynchronously by irttp_run_tx_queue(), through some bottom-half mechanism. That would fix your locking issues, and that would reduce the time we spend in the IrDA code with this lock taken. I will try to come up with some patches for you later this weekend. The patch below schedules irnet_flow_indication() asynchronously. Could you please give it a try (it builds, but I couldn't test it...) ? : diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h index a899e58..941f0f1 100644 --- a/include/net/irda/irttp.h +++ b/include/net/irda/irttp.h @@ -128,6 +128,7 @@ struct tsap_cb { struct net_device_stats stats; struct timer_list todo_timer; + struct work_struct irnet_flow_work; /* irttp asynchronous flow restart */ __u32 max_seg_size; /* Max data that fit into an IrLAP frame */ __u8 max_header_size; diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h diff --git a/net/irda/irttp.c b/net/irda/irttp.c index 7069e4a..a0d0f26 100644 --- a/net/irda/irttp.c +++ b/net/irda/irttp.c @@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param, /*** CLIENT CALLS ***/ /** LMP CALLBACKS **/ /* Everything is happily mixed up. Waiting for next clean up - Jean II */ +static void irttp_flow_restart(struct work_struct *work) +{ + struct tsap_cb * self = + container_of(work, struct tsap_cb, irnet_flow_work); + + if (self == NULL) + return; + + /* Check if we can accept more frames from client. */ + if ((self-tx_sdu_busy) + (skb_queue_len(self-tx_queue) TTP_TX_LOW_THRESHOLD) + (!self-close_pend)) { + if (self-notify.flow_indication) + self-notify.flow_indication(self-notify.instance, +self, FLOW_START); + + /* self-tx_sdu_busy is the state of the client. +* We don't really have a race here, but it's always safer +* to update our state after the client - Jean II */ + self-tx_sdu_busy = FALSE; + } +} + /* * Function irttp_open_tsap (stsap, notify) @@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify) self-todo_timer.data = (unsigned long) self; self-todo_timer.function = irttp_todo_expired; + INIT_WORK(self-irnet_flow_work, irttp_flow_restart); + /* Initialize callbacks for IrLMP to use */ irda_notify_init(ttp_notify); ttp_notify.connect_confirm = irttp_connect_confirm; @@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self) self-stats.tx_packets++; } - /* Check if we can accept more frames from client. -* We don't want to wait until the todo timer to do that, and we -* can't use tasklets (grr...), so we are obliged to give control -* to client. That's ok, this test will be true not too often -* (max once per LAP window) and we are called from places -* where we can spend a bit of time doing stuff. - Jean II */ if ((self-tx_sdu_busy) (skb_queue_len(self-tx_queue) TTP_TX_LOW_THRESHOLD) (!self-close_pend)) - { - if (self-notify.flow_indication) - self-notify.flow_indication(self-notify.instance, -self, FLOW_START); - - /* self-tx_sdu_busy is the state of the client. -* We don't really have a race here, but it's always safer -* to update our state after the client - Jean II */ - self-tx_sdu_busy = FALSE; - } + schedule_work(self-irnet_flow_work); /* Reset lock */ self-tx_queue_lock = 0; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...
Hi Guennadi, On Thu, Apr 05, 2007 at 12:59:40PM +0200, Guennadi Liakhovetski wrote: Ok, a simple analysis reveals the recursive spinlock: On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote: [bf12b220] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [bf12bf98] (ppp_output_wakeup+0x18/0x1c [ppp_generic]) === r7 = C38F42BC r6 = C38F4200 r5 = C38F4200 r4 = === spin_lock_bh(pch-downl); [bf12bf80] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [bf132c98] (irnet_flow_indication+0x38/0x3c [irnet]) [bf132c60] (irnet_flow_indication+0x0/0x3c [irnet]) from [bf104e4c] (irttp_run_tx_queue+0x1c0/0x1d4 [irda]) [bf104c8c] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [bf104f88] (irttp_data_request+0x128/0x4f8 [irda]) r8 = BF121560 r7 = 0002 r6 = C38F4200 r5 = C21418B8 r4 = C21418B8 [bf104e60] (irttp_data_request+0x0/0x4f8 [irda]) from [bf1321bc] (ppp_irnet_send+0x134/0x238 [irnet]) [bf132088] (ppp_irnet_send+0x0/0x238 [irnet]) from [bf12a600] (ppp_push+0x80/0xb8 [ppp_generic]) r7 = C3A436E0 r6 = r5 = C21418B8 r4 = C1489600 [bf12a580] (ppp_push+0x0/0xb8 [ppp_generic]) from [bf12a8d8] (ppp_xmit_process+0x34/0x50c [ppp_generic]) === r7 = 0021 r6 = C21418B8 r5 = C1489600 r4 = === spin_lock_bh(pch-downl); [bf12a8a4] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [bf12aed8] (ppp_start_xmit+0x128/0x254 [ppp_generic]) [bf12adb0] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [c0186fa4] (dev_hard_start_xmit+0x170/0x268) [c0186e34] (dev_hard_start_xmit+0x0/0x268) from [c01979b8] (__qdisc_run+0x60/0x270) r8 = C1BBC914 r7 = C21418B8 r6 = r5 = C21418B8 r4 = C1BBC800 [c0197958] (__qdisc_run+0x0/0x270) from [c0187250] (dev_queue_xmit+0x1b4/0x25c) Now, does anyone have an idea how best to fix it? 1. Should re-entrance in ppp_channel_push() be prevented, and if yes - at which level? Or 2. Should re-entrance be allowed and only recursive spin_lock_bh() avoided? IMHO, irnet_flow_indication() should be called asynchronously by irttp_run_tx_queue(), through some bottom-half mechanism. That would fix your locking issues, and that would reduce the time we spend in the IrDA code with this lock taken. I will try to come up with some patches for you later this weekend. Cheers, Samuel. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html