Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

2007-05-02 Thread Guennadi Liakhovetski
On Tue, 1 May 2007, Samuel Ortiz wrote:
 But I will definitely spend some time this week running my IrDA stack 
 with this patch applied. I didn't bother to do that earlier as you first 
 reported some oops with this patch applied.

I think what I reported was not an Oops, but the race that we're also 
fixing ATM - the one in the state machine. So, unrelated.

 On Mon, Apr 30, 2007 at 03:24:05PM +0200, Guennadi Liakhovetski wrote:

  in irttp_dup() (remember spinlock_init()?:-)), otherwise it oopses.
 good catch, again...Yes, I do remember the irttp_dup bug ;-)

I've added a tsap_init() function that does those common initialisation 
calls, so there is less chance of forgetting the dup() path next 
time...

  I will be 
  testing it too, but don't know how much longer and how intensively. Do you 
  think we might get some new problems with this new context?
 It seems quite safe to me. But more importantly, I think we do want to call
 the flow indication routine asynchronously in that specific case. 
 There is one thing that this patch is missing though: we should probably
 clean the workqueue right before we destroy the TTP instance. The work routine
 checks for a NULL pointer, but still...

I thought about it too, but the only thing you can do is 
flush_scheduled_work() - is this what you mean?

The test with your patch stopped inside an ftp transfer - the ftp client 
was doing a get, and it stopped half-way through. On the client side only 
the control TCP connection was still open; on the server side both forked 
ftp-server children were dead, no connection open. No errors in the logs. 
Weird. With my ugly patch it ran through the weekend.

...and it just stopped again - this time after just 13 minutes.

A couple of comments to your patch:

   +static void irttp_flow_restart(struct work_struct *work)
   +{
   +	struct tsap_cb *self =
   +		container_of(work, struct tsap_cb, irnet_flow_work);
   +
   +	if (self == NULL)
   +		return;

self will not be NULL here. How about

+	IRDA_ASSERT(work != NULL, return;);
+	IRDA_ASSERT(self->magic == TTP_TSAP_MAGIC, return;);

   -	/* Check if we can accept more frames from client.
   -	 * We don't want to wait until the todo timer to do that, and we
   -	 * can't use tasklets (grr...), so we are obliged to give control
   -	 * to client. That's ok, this test will be true not too often
   -	 * (max once per LAP window) and we are called from places
   -	 * where we can spend a bit of time doing stuff. - Jean II */
 	if ((self->tx_sdu_busy) &&
 	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
 	    (!self->close_pend))

Is this test still needed here? You do it again in the work function, and 
the conditions can change in between, so should we just call 
schedule_work() unconditionally here?

Thanks
Guennadi
-
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

2007-04-30 Thread Guennadi Liakhovetski
On Tue, 10 Apr 2007, Samuel Ortiz wrote:

 Hi Guennadi,
 
 The patch below schedules irnet_flow_indication() asynchronously. Could
 you please give it a try (it builds, but I couldn't test it...) ? :

Ok, your patch (still below) works too (now that I fixed that state 
machine race; btw, we still have to decide on the final form in which it 
goes into mainline) __after__ you also add the line

+	INIT_WORK(&new->irnet_flow_work, irttp_flow_restart);

in irttp_dup() (remember spin_lock_init()? :-)), otherwise it oopses. 
Generally, I like your patch better than mine to ppp_generic.c, where I 
explicitly check whether a recursion is occurring. Still, I am a bit 
concerned about introducing yet another execution context into irda... We 
have seen a couple of locking issues there already in the last 2-3 months, 
especially under rt-preempt... Would you be able to run some tests too? I 
will be testing it too, but I don't know for how much longer and how 
intensively. Do you think we might get some new problems with this new 
context?

Thanks
Guennadi

 
 diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h
 index a899e58..941f0f1 100644
 --- a/include/net/irda/irttp.h
 +++ b/include/net/irda/irttp.h
 @@ -128,6 +128,7 @@ struct tsap_cb {
  
  	struct net_device_stats stats;
  	struct timer_list todo_timer;
 +	struct work_struct irnet_flow_work;   /* irttp asynchronous flow restart */
  
  	__u32 max_seg_size; /* Max data that fit into an IrLAP frame */
  	__u8  max_header_size;
 diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h
 diff --git a/net/irda/irttp.c b/net/irda/irttp.c
 index 7069e4a..a0d0f26 100644
 --- a/net/irda/irttp.c
 +++ b/net/irda/irttp.c
 @@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param,
  /*** CLIENT CALLS ***/
  /** LMP CALLBACKS **/
  /* Everything is happily mixed up. Waiting for next clean up - Jean II */
 +static void irttp_flow_restart(struct work_struct *work)
 +{
 +	struct tsap_cb *self =
 +		container_of(work, struct tsap_cb, irnet_flow_work);
 +
 +	if (self == NULL)
 +		return;
 +
 +	/* Check if we can accept more frames from client. */
 +	if ((self->tx_sdu_busy) &&
 +	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
 +	    (!self->close_pend)) {
 +		if (self->notify.flow_indication)
 +			self->notify.flow_indication(self->notify.instance,
 +						     self, FLOW_START);
 +
 +		/* self->tx_sdu_busy is the state of the client.
 +		 * We don't really have a race here, but it's always safer
 +		 * to update our state after the client - Jean II */
 +		self->tx_sdu_busy = FALSE;
 +	}
 +}
 +
  
  /*
   * Function irttp_open_tsap (stsap, notify)
 @@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify)
  	self->todo_timer.data = (unsigned long) self;
  	self->todo_timer.function = irttp_todo_expired;
  
 +	INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
 +
  	/* Initialize callbacks for IrLMP to use */
  	irda_notify_init(&ttp_notify);
  	ttp_notify.connect_confirm = irttp_connect_confirm;
 @@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self)
  		self->stats.tx_packets++;
  	}
  
 -	/* Check if we can accept more frames from client.
 -	 * We don't want to wait until the todo timer to do that, and we
 -	 * can't use tasklets (grr...), so we are obliged to give control
 -	 * to client. That's ok, this test will be true not too often
 -	 * (max once per LAP window) and we are called from places
 -	 * where we can spend a bit of time doing stuff. - Jean II */
  	if ((self->tx_sdu_busy) &&
  	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
  	    (!self->close_pend))
 -	{
 -		if (self->notify.flow_indication)
 -			self->notify.flow_indication(self->notify.instance,
 -						     self, FLOW_START);
 -
 -		/* self->tx_sdu_busy is the state of the client.
 -		 * We don't really have a race here, but it's always safer
 -		 * to update our state after the client - Jean II */
 -		self->tx_sdu_busy = FALSE;
 -	}
 +	schedule_work(&self->irnet_flow_work);
  
  	/* Reset lock */
  	self->tx_queue_lock = 0;
 
 
 



Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

2007-04-30 Thread Samuel Ortiz
On Mon, Apr 30, 2007 at 03:24:05PM +0200, Guennadi Liakhovetski wrote:
 On Tue, 10 Apr 2007, Samuel Ortiz wrote:
 
  Hi Guennadi,
  
  The patch below schedules irnet_flow_indication() asynchronously. Could
  you please give it a try (it builds, but I couldn't test it...) ? :
 
 Ok, your patch (still below) works too (now that I fixed that state 
 machine race; btw, we still have to decide on the final form in which it 
 goes into mainline) __after__ you also add the line
 
 +	INIT_WORK(&new->irnet_flow_work, irttp_flow_restart);
 
 in irttp_dup() (remember spin_lock_init()? :-)), otherwise it oopses.
good catch, again... Yes, I do remember the irttp_dup bug ;-)

 
 Generally, I like your patch better than mine to ppp_generic.c, where I 
 explicitly check if a recursion is occurring. Still, I am a bit concerned 
 about introducing yet another execution context into irda... We have seen 
 a couple of locking issues there already in the last 2-3 months especially 
 under rt-preempt... Would you be able to run some tests too? 
I think I can run some tests here as well, but probably not as many as you:
I'm not doing IrDA stuff full time, while it seems you currently are.
But I will definitely spend some time this week running my IrDA stack with this
patch applied. I didn't bother to do that earlier, as you had first reported an
oops with this patch applied.

 I will be 
 testing it too, but don't know how much longer and how intensively. Do you 
 think we might get some new problems with this new context?
It seems quite safe to me. But more importantly, I think we do want to call
the flow indication routine asynchronously in that specific case. 
There is one thing that this patch is missing though: we should probably
clean the workqueue right before we destroy the TTP instance. The work routine
checks for a NULL pointer, but still...

Cheers,
Samuel.


 Thanks
 Guennadi
 
  
  diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h
  index a899e58..941f0f1 100644
  --- a/include/net/irda/irttp.h
  +++ b/include/net/irda/irttp.h
  @@ -128,6 +128,7 @@ struct tsap_cb {
   
   	struct net_device_stats stats;
   	struct timer_list todo_timer;
  +	struct work_struct irnet_flow_work;   /* irttp asynchronous flow restart */
   
   	__u32 max_seg_size; /* Max data that fit into an IrLAP frame */
   	__u8  max_header_size;
  diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h
  diff --git a/net/irda/irttp.c b/net/irda/irttp.c
  index 7069e4a..a0d0f26 100644
  --- a/net/irda/irttp.c
  +++ b/net/irda/irttp.c
  @@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param,
   /*** CLIENT CALLS ***/
   /** LMP CALLBACKS **/
   /* Everything is happily mixed up. Waiting for next clean up - Jean II */
  +static void irttp_flow_restart(struct work_struct *work)
  +{
  +	struct tsap_cb *self =
  +		container_of(work, struct tsap_cb, irnet_flow_work);
  +
  +	if (self == NULL)
  +		return;
  +
  +	/* Check if we can accept more frames from client. */
  +	if ((self->tx_sdu_busy) &&
  +	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
  +	    (!self->close_pend)) {
  +		if (self->notify.flow_indication)
  +			self->notify.flow_indication(self->notify.instance,
  +						     self, FLOW_START);
  +
  +		/* self->tx_sdu_busy is the state of the client.
  +		 * We don't really have a race here, but it's always safer
  +		 * to update our state after the client - Jean II */
  +		self->tx_sdu_busy = FALSE;
  +	}
  +}
  +
   
   /*
    * Function irttp_open_tsap (stsap, notify)
  @@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify)
   	self->todo_timer.data = (unsigned long) self;
   	self->todo_timer.function = irttp_todo_expired;
   
  +	INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
  +
   	/* Initialize callbacks for IrLMP to use */
   	irda_notify_init(&ttp_notify);
   	ttp_notify.connect_confirm = irttp_connect_confirm;
  @@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self)
   		self->stats.tx_packets++;
   	}
   
  -	/* Check if we can accept more frames from client.
  -	 * We don't want to wait until the todo timer to do that, and we
  -	 * can't use tasklets (grr...), so we are obliged to give control
  -	 * to client. That's ok, this test will be true not too often
  -	 * (max once per LAP window) and we are called from places
  -	 * where we can spend a bit of time doing stuff. - Jean II */
   	if ((self->tx_sdu_busy) &&
   	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
   	    (!self->close_pend))
  -	{
  -		if (self->notify.flow_indication)
  -			

Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

2007-04-11 Thread Guennadi Liakhovetski
On Tue, 10 Apr 2007, Samuel Ortiz wrote:

 The patch below schedules irnet_flow_indication() asynchronously. Could
 you please give it a try (it builds, but I couldn't test it...) ? :

I'm afraid your patch makes it worse - with my patch to ppp_generic it 
worked for 5 hours before IrDA stopped, remaining at 4 Mbaud, with only the 
discovery timer expiring periodically without actually doing anything; with 
your patch the same happens in about 5 minutes.

I'll double-check it, and then try the same work-queue approach, but from 
ppp_generic rather than from irttp, combining our two patches.

Thanks
Guennadi


Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

2007-04-10 Thread Samuel Ortiz
Hi Guennadi,

On Sat, Apr 07, 2007 at 03:59:26AM +0300, Samuel Ortiz wrote:
 IMHO, irnet_flow_indication() should be called asynchronously by
 irttp_run_tx_queue(), through some bottom-half mechanism. That would fix your
 locking issues, and that would reduce the time we spend in the IrDA code with
 this lock taken.
 
 I will try to come up with some patches for you later this weekend.
The patch below schedules irnet_flow_indication() asynchronously. Could
you please give it a try (it builds, but I couldn't test it...) ? :

diff --git a/include/net/irda/irttp.h b/include/net/irda/irttp.h
index a899e58..941f0f1 100644
--- a/include/net/irda/irttp.h
+++ b/include/net/irda/irttp.h
@@ -128,6 +128,7 @@ struct tsap_cb {
 
 	struct net_device_stats stats;
 	struct timer_list todo_timer;
+	struct work_struct irnet_flow_work;   /* irttp asynchronous flow restart */
 
 	__u32 max_seg_size; /* Max data that fit into an IrLAP frame */
 	__u8  max_header_size;
diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h
diff --git a/net/irda/irttp.c b/net/irda/irttp.c
index 7069e4a..a0d0f26 100644
--- a/net/irda/irttp.c
+++ b/net/irda/irttp.c
@@ -367,6 +367,29 @@ static int irttp_param_max_sdu_size(void *instance, irda_param_t *param,
 /*** CLIENT CALLS ***/
 /** LMP CALLBACKS **/
 /* Everything is happily mixed up. Waiting for next clean up - Jean II */
+static void irttp_flow_restart(struct work_struct *work)
+{
+	struct tsap_cb *self =
+		container_of(work, struct tsap_cb, irnet_flow_work);
+
+	if (self == NULL)
+		return;
+
+	/* Check if we can accept more frames from client. */
+	if ((self->tx_sdu_busy) &&
+	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
+	    (!self->close_pend)) {
+		if (self->notify.flow_indication)
+			self->notify.flow_indication(self->notify.instance,
+						     self, FLOW_START);
+
+		/* self->tx_sdu_busy is the state of the client.
+		 * We don't really have a race here, but it's always safer
+		 * to update our state after the client - Jean II */
+		self->tx_sdu_busy = FALSE;
+	}
+}
+
 
 /*
  * Function irttp_open_tsap (stsap, notify)
@@ -402,6 +425,8 @@ struct tsap_cb *irttp_open_tsap(__u8 stsap_sel, int credit, notify_t *notify)
 	self->todo_timer.data = (unsigned long) self;
 	self->todo_timer.function = irttp_todo_expired;
 
+	INIT_WORK(&self->irnet_flow_work, irttp_flow_restart);
+
 	/* Initialize callbacks for IrLMP to use */
 	irda_notify_init(&ttp_notify);
 	ttp_notify.connect_confirm = irttp_connect_confirm;
@@ -761,25 +786,10 @@ static void irttp_run_tx_queue(struct tsap_cb *self)
 		self->stats.tx_packets++;
 	}
 
-	/* Check if we can accept more frames from client.
-	 * We don't want to wait until the todo timer to do that, and we
-	 * can't use tasklets (grr...), so we are obliged to give control
-	 * to client. That's ok, this test will be true not too often
-	 * (max once per LAP window) and we are called from places
-	 * where we can spend a bit of time doing stuff. - Jean II */
 	if ((self->tx_sdu_busy) &&
	    (skb_queue_len(&self->tx_queue) < TTP_TX_LOW_THRESHOLD) &&
	    (!self->close_pend))
-	{
-		if (self->notify.flow_indication)
-			self->notify.flow_indication(self->notify.instance,
-						     self, FLOW_START);
-
-		/* self->tx_sdu_busy is the state of the client.
-		 * We don't really have a race here, but it's always safer
-		 * to update our state after the client - Jean II */
-		self->tx_sdu_busy = FALSE;
-	}
+	schedule_work(&self->irnet_flow_work);
 
 	/* Reset lock */
 	self->tx_queue_lock = 0;



Re: [irda-users] [BUG] 2.6.20.1-rt8 irnet + pppd recursive spinlock...

2007-04-06 Thread Samuel Ortiz
Hi Guennadi,

On Thu, Apr 05, 2007 at 12:59:40PM +0200, Guennadi Liakhovetski wrote:
 Ok, a simple analysis reveals the recursive spinlock:
 
 On Thu, 5 Apr 2007, Guennadi Liakhovetski wrote:
 
  [<bf12b220>] (ppp_channel_push+0x0/0xc8 [ppp_generic]) from [<bf12bf98>] 
  (ppp_output_wakeup+0x18/0x1c [ppp_generic])
 ===> 
   r7 = C38F42BC  r6 = C38F4200  r5 = C38F4200  r4 = 
 
 ===> spin_lock_bh(&pch->downl);
 
  [<bf12bf80>] (ppp_output_wakeup+0x0/0x1c [ppp_generic]) from [<bf132c98>] 
  (irnet_flow_indication+0x38/0x3c [irnet])
  [<bf132c60>] (irnet_flow_indication+0x0/0x3c [irnet]) from [<bf104e4c>] 
  (irttp_run_tx_queue+0x1c0/0x1d4 [irda])
  [<bf104c8c>] (irttp_run_tx_queue+0x0/0x1d4 [irda]) from [<bf104f88>] 
  (irttp_data_request+0x128/0x4f8 [irda])
   r8 = BF121560  r7 = 0002  r6 = C38F4200  r5 = C21418B8
   r4 = C21418B8 
  [<bf104e60>] (irttp_data_request+0x0/0x4f8 [irda]) from [<bf1321bc>] 
  (ppp_irnet_send+0x134/0x238 [irnet])
  [<bf132088>] (ppp_irnet_send+0x0/0x238 [irnet]) from [<bf12a600>] 
  (ppp_push+0x80/0xb8 [ppp_generic])
   r7 = C3A436E0  r6 =   r5 = C21418B8  r4 = C1489600
  [<bf12a580>] (ppp_push+0x0/0xb8 [ppp_generic]) from [<bf12a8d8>] 
  (ppp_xmit_process+0x34/0x50c [ppp_generic])
 ===> 
   r7 = 0021  r6 = C21418B8  r5 = C1489600  r4 = 
 
 ===> spin_lock_bh(&pch->downl);
 
  [<bf12a8a4>] (ppp_xmit_process+0x0/0x50c [ppp_generic]) from [<bf12aed8>] 
  (ppp_start_xmit+0x128/0x254 [ppp_generic])
  [<bf12adb0>] (ppp_start_xmit+0x0/0x254 [ppp_generic]) from [<c0186fa4>] 
  (dev_hard_start_xmit+0x170/0x268)
  [<c0186e34>] (dev_hard_start_xmit+0x0/0x268) from [<c01979b8>] 
  (__qdisc_run+0x60/0x270)
   r8 = C1BBC914  r7 = C21418B8  r6 =   r5 = C21418B8
   r4 = C1BBC800 
  [<c0197958>] (__qdisc_run+0x0/0x270) from [<c0187250>] 
  (dev_queue_xmit+0x1b4/0x25c)
 
 Now, does anyone have an idea how best to fix it?
 
 1. Should re-entrance in ppp_channel_push() be prevented, and if yes - at 
 which level? Or
 
 2. Should re-entrance be allowed and only recursive spin_lock_bh() 
 avoided?
IMHO, irnet_flow_indication() should be called asynchronously by
irttp_run_tx_queue(), through some bottom-half mechanism. That would fix your
locking issues, and that would reduce the time we spend in the IrDA code with
this lock taken.

I will try to come up with some patches for you later this weekend.

Cheers,
Samuel.
