Re: Gianfar driver failing on MPC8641D based board

2010-03-02 Thread Anton Vorontsov
Hi!

On Sat, Feb 27, 2010 at 11:05:32AM +0530, Kumar Gopalpet-B05799 wrote:
[...]
 Understood, and thanks for the explanation. Am I correct in saying that
 this is due to the out-of-order execution capability on powerpc?

Nope, that was just a logic issue in the driver. 

Though, with the patch, the eieio() is needed so that the compiler (or CPU)
won't reorder the lstatus and tx_skbuff writes.
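
To illustrate the ordering requirement outside the kernel, here is a minimal
userspace sketch. It is not the driver code: it uses C11 release/acquire
atomics as a portable stand-in for eieio(), and `struct slot`, `publish()`
and `observe()` are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one TX slot: a status word plus the skb pointer. */
struct slot {
    unsigned int lstatus;         /* written first (plain store) */
    _Atomic(const void *) skb;    /* published last */
};

/* Writer side: the release store keeps the lstatus write from being
 * reordered after the skb publication, which is the guarantee the
 * patch asks eieio() for. */
void publish(struct slot *s, unsigned int lstatus, const void *skb)
{
    assert(skb != NULL);
    s->lstatus = lstatus;
    atomic_store_explicit(&s->skb, skb, memory_order_release);
}

/* Reader side: returns 1 and copies lstatus once the skb is visible,
 * 0 if nothing has been published yet. */
int observe(struct slot *s, unsigned int *lstatus)
{
    const void *skb = atomic_load_explicit(&s->skb, memory_order_acquire);

    if (skb == NULL)
        return 0;
    *lstatus = s->lstatus;
    return 1;
}
```

A reader that sees a non-NULL skb through observe() is then guaranteed to see
the lstatus value that was written before it.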

 I have one more question: why don't we use atomic_t for num_txbdfree and
 do away with the spin_locks in gfar_clean_tx_ring() and gfar_start_xmit()
 completely? In a non-SMP scenario I feel there is absolutely no need for
 spin_locks, and in the SMP case atomic operations would be much safer on
 powerpc than spin_locks.

 What is your suggestion?

I think that's a good idea.

However, in start_xmit() we'll have to keep the spinlock anyway,
since it also protects against gfar_error(), which can modify
regs->tstat.
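
A rough sketch of what the atomic counter could look like, written as
userspace C11 rather than kernel code. C11 `atomic_int` stands in for the
kernel's atomic_t here, and `struct txq_model`, `txq_reserve()` and
`txq_release()` are hypothetical names, not anything in gianfar.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a TX queue that only tracks its free BD count. */
struct txq_model {
    atomic_int num_txbdfree;
};

/* Transmit side: try to take nr BDs without a lock.
 * Returns false if fewer than nr BDs are free. */
bool txq_reserve(struct txq_model *q, int nr)
{
    int avail = atomic_load(&q->num_txbdfree);

    assert(nr > 0);
    while (avail >= nr) {
        /* On failure, avail is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&q->num_txbdfree, &avail,
                                         avail - nr))
            return true;
    }
    return false;
}

/* Cleanup side: give nr BDs back after they have been reclaimed. */
void txq_release(struct txq_model *q, int nr)
{
    assert(nr > 0);
    atomic_fetch_add(&q->num_txbdfree, nr);
}
```

This only covers the counter. Anything else the lock protects would still
need the lock, which is the point made above.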

Thanks!

-- 
Anton Vorontsov
email: cbouatmai...@gmail.com
irc://irc.freenode.net/bd2
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Gianfar driver failing on MPC8641D based board

2010-03-01 Thread Martyn Welch
Anton Vorontsov wrote:
 diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
 index 8bd3c9f..cccb409 100644
 --- a/drivers/net/gianfar.c
 +++ b/drivers/net/gianfar.c
 @@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
   }
  
   /* setup the TxBD length and buffer pointer for the first BD */
 - tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
   txbdp_start->bufPtr = dma_map_single(&priv->ofdev->dev, skb->data,
   skb_headlen(skb), DMA_TO_DEVICE);
  
 @@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
  
   txbdp_start->lstatus = lstatus;
  
 + eieio(); /* force lstatus write before tx_skbuff */
 +
 + tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
 +
   /* Update the current skb pointer to the next entry we will use
    * (wrapping if necessary) */
   tx_queue->skb_curtx = (tx_queue->skb_curtx + 1) &
   
I can confirm 10/10 successful boots on p2020ds and mpc8641_hpcn.

Martyn


-- 
Martyn Welch (Principal Software Engineer)   |   Registered in England and
GE Intelligent Platforms |   Wales (3828642) at 100
T +44(0)127322748|   Barbirolli Square, Manchester,
E martyn.we...@ge.com|   M2 3AB  VAT:GB 927559189



Re: Gianfar driver failing on MPC8641D based board

2010-02-26 Thread Martyn Welch
Anton Vorontsov wrote:
 On Thu, Feb 25, 2010 at 04:46:54PM +, Martyn Welch wrote:
 [...]
   
 nfs: server 192.168.0.1 not responding, still trying
   
   
 Further testing has shown that this isn't restricted to warm reboots, it
 happens from cold as well. In addition, the exact timing of the failure
 seems to vary, some boots have got further before failing.
 

 Unfortunately I don't have any 8641 boards near me, so I can't
 debug this myself. Though, I tested gianfar on MPC8568E-MDS with
 2.6.33 kernel, and it seems to work just fine.

 I see you use SMP. Can you try to turn it off? If that will fix
 the issue, then it'll be a good data point.

 Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you
 know the results.

 Thanks

I removed the second core from the dts file rather than truly disabling
SMP in the kernel config. Doing this allowed the board to boot reliably.

Martyn



Re: Gianfar driver failing on MPC8641D based board

2010-02-26 Thread Martyn Welch
Anton Vorontsov wrote:
 On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
 [...]
   
 I was able to reproduce it on an 8641D and bisected it down to this:

 ---
 commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
 Author: Anton Vorontsov avoront...@ru.mvista.com
 Date:   Tue Nov 10 14:11:10 2009 +

 gianfar: Revive SKB recycling
 

 Thanks for the bisect. I have a guess why tx hangs in
 SMP case. Could anyone try the patch down below?
   

Yup, no problem. I'm afraid it doesn't resolve the problem for me.

 [...]
   
 ...which probably explains why you weren't seeing it on non-SMP.
 I'd imagine it would show up on any of the e500mc boards too.
 

 Yeah.. Pity, I don't have SMP boards anymore. I'll try
 to get one though.


 diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
 index 8bd3c9f..3ff3bd0 100644
 --- a/drivers/net/gianfar.c
 +++ b/drivers/net/gianfar.c
 @@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
   tx_queue = priv->tx_queue[rx_queue->qindex];
  
   tx_cleaned += gfar_clean_tx_ring(tx_queue);
 + if (!tx_cleaned && !tx_queue->num_txbdfree)
 +         tx_cleaned += 1; /* don't complete napi */
   rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
   budget_per_queue);
   rx_cleaned += rx_cleaned_per_queue;
   




Re: Gianfar driver failing on MPC8641D based board

2010-02-26 Thread Anton Vorontsov
On Fri, Feb 26, 2010 at 12:06:15PM +, Martyn Welch wrote:
 Anton Vorontsov wrote:
  On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
  [...]

  I was able to reproduce it on an 8641D and bisected it down to this:
 
  ---
  commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
  Author: Anton Vorontsov avoront...@ru.mvista.com
  Date:   Tue Nov 10 14:11:10 2009 +
 
  gianfar: Revive SKB recycling
  
 
  Thanks for the bisect. I have a guess why tx hangs in
  SMP case. Could anyone try the patch down below?

 
 Yup, no problem. I'm afraid it doesn't resolve the problem for me.

Hm.. I found a p2020 board and I was able to reproduce the issue.
The patch down below fixed it completely for me... hm.

I'll look further, thanks!

  [...]

  ...which probably explains why you weren't seeing it on non-SMP.
  I'd imagine it would show up on any of the e500mc boards too.
  
 
  Yeah.. Pity, I don't have SMP boards anymore. I'll try
  to get one though.
 
 
  diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
  index 8bd3c9f..3ff3bd0 100644
  --- a/drivers/net/gianfar.c
  +++ b/drivers/net/gianfar.c
  @@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
  tx_queue = priv->tx_queue[rx_queue->qindex];
   
  tx_cleaned += gfar_clean_tx_ring(tx_queue);
  +   if (!tx_cleaned && !tx_queue->num_txbdfree)
  +           tx_cleaned += 1; /* don't complete napi */
  rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
  budget_per_queue);
  rx_cleaned += rx_cleaned_per_queue;




Re: Gianfar driver failing on MPC8641D based board

2010-02-26 Thread Martyn Welch
Martyn Welch wrote:
 Paul Gortmaker wrote:
 On 10-02-26 09:35 AM, Anton Vorontsov wrote:
 On Fri, Feb 26, 2010 at 12:06:15PM +, Martyn Welch wrote:
 Anton Vorontsov wrote:
 On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
 [...]

 I was able to reproduce it on an 8641D and bisected it down to this:

 ---
 commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
 Author: Anton Vorontsov avoront...@ru.mvista.com
 Date:   Tue Nov 10 14:11:10 2009 +

  gianfar: Revive SKB recycling

 Thanks for the bisect. I have a guess why tx hangs in
 SMP case. Could anyone try the patch down below?

 Yup, no problem. I'm afraid it doesn't resolve the problem for me.

 Hm.. I found a p2020 board and I was able to reproduce the issue.
 The patch down below fixed it completely for me... hm.

 Interesting. I just tested the patch on the sbc8641d, and it
 still has the issue with your patch applied.  I'm using NFSroot
 just like Martyn was and it still appears bound up on that
 gianfar tx lock.  I'll see if I can get a SysRq backtrace in
 case that will help you see how it manages to get there...

 I've got a p2020ds here as well, so I'll give NFSroot on that a try with
 your patch.

Out of 10 boot attempts, 7 failed.

Martyn



Re: Gianfar driver failing on MPC8641D based board

2010-02-26 Thread Anton Vorontsov
On Fri, Feb 26, 2010 at 03:34:07PM +, Martyn Welch wrote:
[...]
 Out of 10 boot attempts, 7 failed.

OK, I see why. With ip=on (dhcp boot) it's much harder to trigger
it. With a static IP config I can see the same.


Re: Gianfar driver failing on MPC8641D based board

2010-02-26 Thread Paul Gortmaker
On 10-02-26 09:35 AM, Anton Vorontsov wrote:
 On Fri, Feb 26, 2010 at 12:06:15PM +, Martyn Welch wrote:
 Anton Vorontsov wrote:
 On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
 [...]

 I was able to reproduce it on an 8641D and bisected it down to this:

 ---
 commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
 Author: Anton Vorontsov avoront...@ru.mvista.com
 Date:   Tue Nov 10 14:11:10 2009 +

  gianfar: Revive SKB recycling


 Thanks for the bisect. I have a guess why tx hangs in
 SMP case. Could anyone try the patch down below?


 Yup, no problem. I'm afraid it doesn't resolve the problem for me.
 
 Hm.. I found a p2020 board and I was able to reproduce the issue.
 The patch down below fixed it completely for me... hm.

Interesting. I just tested the patch on the sbc8641d, and it
still has the issue with your patch applied.  I'm using NFSroot
just like Martyn was and it still appears bound up on that
gianfar tx lock.  I'll see if I can get a SysRq backtrace in
case that will help you see how it manages to get there...

Paul.



nfs: server not responding, still trying 

[repeated ~15 times, then...]
  
INFO: task rc.sysinit:837 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rc.sysinitD 0fef73f4 0   837836 0x  
Call Trace: 
[dfb7d9b0] [c000a144] __switch_to+0x8c/0xf8 
[dfb7d9d0] [c03443dc] schedule+0x380/0x954  
[dfb7da50] [c0344a0c] io_schedule+0x5c/0x90 
[dfb7da70] [c0074b0c] sync_page+0x4c/0x74   
[dfb7da80] [c0344f44] __wait_on_bit_lock+0xb0/0x148 
[dfb7dab0] [c0074a8c] __lock_page+0x94/0xa4 
[dfb7dae0] [c0074d5c] find_lock_page+0x8c/0xa4  
[dfb7db00] [c0075674] filemap_fault+0x1ec/0x4fc 
[dfb7db40] [c008d548] __do_fault+0x98/0x53c 
[dfb7dba0] [c0018478] do_page_fault+0x2d0/0x500 
[dfb7dc50] [c00149d4] handle_page_fault+0xc/0x80
--- Exception: 301 at __clear_user+0x14/0x7c
LR = load_elf_binary+0x670/0x1270   
[dfb7dd10] [c00f6ca0] load_elf_binary+0x620/0x1270 (unreliable) 
[dfb7dd90] [c00b1f78] search_binary_handler+0x17c/0x394 
[dfb7dde0] [c00f4f50] load_script+0x274/0x288   
[dfb7de90] [c00b1f78] search_binary_handler+0x17c/0x394 
[dfb7dee0] [c00b3580] do_execve+0x240/0x29c 
[dfb7df20] [c000a46c] sys_execve+0x68/0xa4  
[dfb7df40] [c00145a4] ret_from_syscall+0x0/0x38 



Re: Gianfar driver failing on MPC8641D based board

2010-02-26 Thread Paul Gortmaker
On 10-02-26 11:10 AM, Anton Vorontsov wrote:
 On Fri, Feb 26, 2010 at 03:34:07PM +, Martyn Welch wrote:
 [...]
 Out of 10 boot attempts, 7 failed.
 
 OK, I see why. With ip=on (dhcp boot) it's much harder to trigger
 it. With a static IP config I can see the same.

I'd kind of expected to see us stuck in gianfar on that lock, but
the SysRQ-T doesn't show us hung up anywhere in gianfar itself.
[This was on a base 2.6.33, with just a small sysrq fix patch]

Paul.

--

SysRq : Changing Loglevel
Loglevel set to 9   
nfs: server not responding, still trying  
SysRq : Show State  
  taskPC stack   pid father 
init  D 0ff1c380 0 1  0 0x  
Call Trace: 
[df841a30] [c0009fc4] __switch_to+0x8c/0xf8 
[df841a50] [c0350160] schedule+0x354/0x92c  
[df841ae0] [c0331394] rpc_wait_bit_killable+0x2c/0x54   
[df841af0] [c0350eb0] __wait_on_bit+0x9c/0x108  
[df841b10] [c0350fc0] out_of_line_wait_on_bit+0xa4/0xb4 
[df841b40] [c0331cf0] __rpc_execute+0x16c/0x398 
[df841b90] [c0329abc] rpc_run_task+0x48/0x9c
[df841ba0] [c0329c40] rpc_call_sync+0x54/0x88   
[df841bd0] [c015e780] nfs_proc_lookup+0x94/0xe8 
[df841c20] [c014eb60] nfs_lookup+0x12c/0x230
[df841d50] [c00b9680] do_lookup+0x118/0x288 
[df841d80] [c00bb904] link_path_walk+0x194/0x1118   
[df841df0] [c00bcb08] path_walk+0x8c/0x168  
[df841e20] [c00bcd6c] do_path_lookup+0x74/0x7c  
[df841e40] [c00be148] do_filp_open+0x5d4/0xba4  
[df841f10] [c00abe94] do_sys_open+0xac/0x190
[df841f40] [c001437c] ret_from_syscall+0x0/0x38 
--- Exception: c01 at 0xff1c380 
LR = 0xfec6d98  
kthreadd  S  0 2  0 0x  
Call Trace: 
[df843e50] [c002e788] wake_up_new_task+0x128/0x16c (unreliable) 
[df843f10] [c0009fc4] __switch_to+0x8c/0xf8 
[df843f30] [c0350160] schedule+0x354/0x92c  
[df843fc0] [c004d154] kthreadd+0x130/0x134  
[df843ff0] [c00141a0] kernel_thread+0x4c/0x68   
migration/0   S  0 3  2 0x  
Call Trace: 
[df847de0] [] 0x (unreliable)   
[df847ea0] [c0009fc4] __switch_to+0x8c/0xf8 
[df847ec0] [c0350160] schedule+0x354/0x92c  
[df847f50] [c002d074] migration_thread+0x29c/0x448  
[df847fb0] [c004d020] kthread+0x80/0x84 
[df847ff0] [c00141a0] kernel_thread+0x4c/0x68   
ksoftirqd/0   S  0 4  2 0x  
Call Trace: 
[df84be10] [0800] 0x800 (unreliable)
[df84bed0] [c0009fc4] __switch_to+0x8c/0xf8 
[df84bef0] [c0350160] schedule+0x354/0x92c  
[df84bf80] [c0038454] run_ksoftirqd+0x14c/0x1e0 
[df84bfb0] [c004d020] kthread+0x80/0x84 
[df84bff0] [c00141a0] kernel_thread+0x4c/0x68   
watchdog/0S  0 5  2 0x  
Call Trace: 
[df84dee0] [c0009fc4] __switch_to+0x8c/0xf8 
[df84df00] [c0350160] schedule+0x354/0x92c  
[df84df90] [c006b8e8] watchdog+0x48/0x88
[df84dfb0] [c004d020] kthread+0x80/0x84 
[df84dff0] [c00141a0] kernel_thread+0x4c/0x68   

Re: Gianfar driver failing on MPC8641D based board

2010-02-26 Thread Anton Vorontsov
On Fri, Feb 26, 2010 at 11:27:42AM -0500, Paul Gortmaker wrote:
 On 10-02-26 11:10 AM, Anton Vorontsov wrote:
  On Fri, Feb 26, 2010 at 03:34:07PM +, Martyn Welch wrote:
  [...]
  Out of 10 boot attempts, 7 failed.
  
  OK, I see why. With ip=on (dhcp boot) it's much harder to trigger
  it. With a static IP config I can see the same.
 
 I'd kind of expected to see us stuck in gianfar on that lock, but
 the SysRQ-T doesn't show us hung up anywhere in gianfar itself.
 [This was on a base 2.6.33, with just a small sysrq fix patch]

 [df841a30] [c0009fc4] __switch_to+0x8c/0xf8
 [df841a50] [c0350160] schedule+0x354/0x92c
 [df841ae0] [c0331394] rpc_wait_bit_killable+0x2c/0x54
 [df841af0] [c0350eb0] __wait_on_bit+0x9c/0x108
 [df841b10] [c0350fc0] out_of_line_wait_on_bit+0xa4/0xb4
 [df841b40] [c0331cf0] __rpc_execute+0x16c/0x398
 [df841b90] [c0329abc] rpc_run_task+0x48/0x9c
 [df841ba0] [c0329c40] rpc_call_sync+0x54/0x88
 [df841bd0] [c015e780] nfs_proc_lookup+0x94/0xe8
 [df841c20] [c014eb60] nfs_lookup+0x12c/0x230
 [df841d50] [c00b9680] do_lookup+0x118/0x288
 [df841d80] [c00bb904] link_path_walk+0x194/0x1118
 [df841df0] [c00bcb08] path_walk+0x8c/0x168
 [df841e20] [c00bcd6c] do_path_lookup+0x74/0x7c
 [df841e40] [c00be148] do_filp_open+0x5d4/0xba4
 [df841f10] [c00abe94] do_sys_open+0xac/0x190

Yeah, I don't think this is gianfar-related. It must be something
else triggered by the fact that gianfar no longer sends stuff.

OK, I think I found what's happening in gianfar.

Some background...

start_xmit() prepares a new skb for transmission. Generally it does
three things:

1. Sets up all BDs (marks them ready to send), except the first one.
2. Stores the skb into tx_queue->tx_skbuff so that clean_tx_ring()
   can clean it up later.
3. Sets up the first BD, i.e. marks it ready.

Here is what clean_tx_ring() does:

1. Reads skbs from tx_queue->tx_skbuff.
2. Checks whether the *last* BD is ready. If it's still ready [to send],
   then it hasn't been transmitted yet, so clean_tx_ring() returns.
   Otherwise it actually cleans up the BDs. All is OK.

Now, if there is just one BD, the code flow is:

- start_xmit(): stores the skb into tx_skbuff. Note that the first BD
  (which is also the last one) isn't marked as ready yet.
- clean_tx_ring(): sees that the skb is not null, *and* its lstatus
  says that it is NOT ready (as if the BD had been sent), so it cleans
  it up (bad!)
- start_xmit(): marks the BD as ready [to send], but it's too late.

We can fix this simply by reordering the lstatus/tx_skbuff writes.

It works flawlessly on my p2020, please try it.

Thanks!


diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 8bd3c9f..cccb409 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
        }
 
        /* setup the TxBD length and buffer pointer for the first BD */
-       tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
        txbdp_start->bufPtr = dma_map_single(&priv->ofdev->dev, skb->data,
                        skb_headlen(skb), DMA_TO_DEVICE);
 
@@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
        txbdp_start->lstatus = lstatus;
 
+       eieio(); /* force lstatus write before tx_skbuff */
+
+       tx_queue->tx_skbuff[tx_queue->skb_curtx] = skb;
+
        /* Update the current skb pointer to the next entry we will use
         * (wrapping if necessary) */
        tx_queue->skb_curtx = (tx_queue->skb_curtx + 1) &
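
For readers who want to step through the single-BD race described in this
message, here is a small single-threaded model of it. It is only a sketch
under simplifying assumptions: the "other CPU" is simulated by calling the
cleaner between the two writes, and `BD_READY`, `struct model_queue`,
`model_clean()`, `xmit_old_order()` and `xmit_new_order()` are hypothetical
names, not driver symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BD_READY 0x1u   /* hypothetical "ready to send" flag */

/* One-slot model: a single BD (first == last) and its tx_skbuff entry. */
struct model_queue {
    unsigned int lstatus;
    const void *skb;
};

/* Models one pass of clean_tx_ring(): reclaims the slot if an skb is
 * stored and the BD is not marked ready. Returns true if it reclaimed. */
bool model_clean(struct model_queue *q)
{
    if (q->skb == NULL)
        return false;           /* nothing queued */
    if (q->lstatus & BD_READY)
        return false;           /* not transmitted yet */
    q->skb = NULL;              /* treated as already sent */
    return true;
}

/* Old order: store the skb, then mark the BD ready. The cleaner runs
 * in between and wrongly reclaims the skb. */
bool xmit_old_order(struct model_queue *q, const void *skb)
{
    bool reclaimed_early;

    assert(skb != NULL);
    q->skb = skb;
    reclaimed_early = model_clean(q);   /* simulated other CPU */
    q->lstatus |= BD_READY;
    return reclaimed_early;
}

/* New order: mark the BD ready, then store the skb. The cleaner runs
 * in between and finds no skb, so it leaves the slot alone. */
bool xmit_new_order(struct model_queue *q, const void *skb)
{
    bool reclaimed_early;

    assert(skb != NULL);
    q->lstatus |= BD_READY;
    reclaimed_early = model_clean(q);   /* simulated other CPU */
    q->skb = skb;
    return reclaimed_early;
}
```

With the old order the model ends up with a ready BD and no skb recorded for
it. With the new order the skb stays in place until the BD is no longer ready.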


Re: Gianfar driver failing on MPC8641D based board

2010-02-26 Thread Paul Gortmaker
On 10-02-26 04:38 PM, Anton Vorontsov wrote:

 [...]

I've skipped right to the test part (I'll think about the description
more later) and it passed 5 out of 5 boot tests on NFSroot sbc8641d.
Looks like you've got a solution.

Paul.

 


RE: Gianfar driver failing on MPC8641D based board

2010-02-26 Thread Kumar Gopalpet-B05799
 

-Original Message-
From: Anton Vorontsov [mailto:avoront...@ru.mvista.com] 
Sent: Saturday, February 27, 2010 3:08 AM
To: Paul Gortmaker
Cc: Martyn Welch; net...@vger.kernel.org; 
linux-ker...@vger.kernel.org; linuxppc-dev list; Kumar 
Gopalpet-B05799; da...@davemloft.net
Subject: Re: Gianfar driver failing on MPC8641D based board

[...]

Anton,

Understood, and thanks for the explanation. Am I correct in saying that
this is due to the out-of-order execution capability on powerpc?

I have one more question: why don't we use atomic_t for num_txbdfree and
do away with the spin_locks in gfar_clean_tx_ring() and gfar_start_xmit()
completely? In a non-SMP scenario I feel there is absolutely no need for
spin_locks, and in the SMP case atomic operations would be much safer on
powerpc than spin_locks.

What is your suggestion?


--

Thanks
Sandeep



Re: Gianfar driver failing on MPC8641D based board

2010-02-25 Thread Martyn Welch
Martyn Welch wrote:
 I have recently attempted to boot an 8641D based board from an NFS root.
 The boot process grinds to a halt not long after the first access of the
 NFS root and I receive multiple nfs: server 192.168.0.1 not responding,
 still trying messages. Wireshark suggests that there is no further
 traffic from this board at this point on. The NFS server seems to
 eventually try sending duplicate packets it's already sent, which
 results in nfs: server 192.168.0.1 OK messages, but the not
 responding messages resume with no further traffic from the board.

 I am able to boot to a ramdisk fine and the network seems to work -
 though I haven't really pushed the interface from it.

 I have attempted to git bisect, though I wasn't able to get much further
 than discovering the problem was introduced in the 2.6.33 merge window -
 at which point the gianfar network driver fails to compile (I have tried
 to git bisect skip many, many times to no avail).

 NFS booting fails for this board on today's linux-next, the master branch
 of Kumar's PPC tree and the head of the main tree. I have also been able
 to NFS boot from a random x86 based board that I have, using the head of
 the main tree and the linux-next tree.

 Copying the gianfar drivers from 2.6.32 into the head of the main tree
 restores the correct behaviour and I'm able to NFS boot. I have heard
 from others that the latest drivers work on 83xx and 85xx based boards,
 but it seems to be broken on at least the 8641D.

 I can see there has been a fair amount of work done on the gianfar
 driver, I assume that this is a bug introduced by the multiple queue
 support, but I'm way out of my depth on this.
   
I have just compiled 2.6.33 for the Freescale MPC8641_HPCN demo board
and am still experiencing the problems outlined in my previous
email, though I have noticed that I tend to be able to boot from cold,
but my boot fails on reboot. Hitting the reset button doesn't help; I
need to actually power the machine off and on again for it to work.

As before, I'm way out of my depth in this. Does anyone have any ideas?
Below is a dump of the failed boot process:

U-Boot 2009.01-00181-gc1b7c70 (Jan 30 2009 - 11:17:31)

Freescale PowerPC
CPU:
Core: E600 Core 0, Version: 0.2, (0x80040202)
System: Unknown, Version: 2.0, (0x80900120)
Clocks: CPU:1000 MHz, MPX: 400 MHz, DDR: 200 MHz, LBC:  25 MHz
L2: Enabled
Board: MPC8641HPCN, System ID: 0x10, System Version: 0x10, FPGA Version:
0x22
I2C:   ready
DRAM:  DDR:  1 GB
FLASH:  8 MB
Invalid ID (ff ff ff ff)
   Scanning PCI bus 01
PCI-EXPRESS 1 on bus 00 - 02
PCI-EXPRESS 2 on bus 03 - 03
Video: No radeon video card found!
In:serial
Out:   serial
Err:   serial
SCSI:  AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl IDE mode
flags: ncq ilck pm led clo pmp pio slum part
scanning bus for devices...
Net:   eTSEC1, eTSEC2, eTSEC3, eTSEC4
=  tftp 400 hpcn/uImage-torvalds-linux-2.6
Speed: 1000, full duplex
Using eTSEC1 device
TFTP from server 192.168.0.1; our IP address is 192.168.0.30
Filename 'hpcn/uImage-torvalds-linux-2.6'.
Load address: 0x400
Loading: #
 #
 ###
done
Bytes transferred = 2709050 (29563a hex)
= tftp 500 hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb
Speed: 1000, full duplex
Using eTSEC1 device
TFTP from server 192.168.0.1; our IP address is 192.168.0.30
Filename 'hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb'.
Load address: 0x500
Loading: #
done
Bytes transferred = 11523 (2d03 hex)
= setenv bootargs root=/dev/nfs rw
nfsroot=192.168.0.1:/tftpboot/hpcn/root/ i
= bootm 400 - 500
WARNING: adjusting available memory to 1000
## Booting kernel from Legacy Image at 0400 ...
   Image Name:   Linux-2.6.33-1-gbaac35c
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:2708986 Bytes =  2.6 MB
   Load Address: 
   Entry Point:  
   Verifying Checksum ... OK
## Flattened Device Tree blob at 0500
   Booting using the fdt blob at 0x500
   Uncompressing Kernel Image ... OK
   Loading Device Tree to 007fa000, end 007ffd02 ... OK
Using MPC86xx HPCN machine description
Total memory = 1024MB; using 2048kB for hash table (at cfe0)
Linux version 2.6.33-1-gbaac35c (welc...@es-j7s4d2j) (gcc version
4.1.2) #20
CPU maps initialized for 1 thread per core
bootconsole [udbg0] enabled
setup_arch: bootmem
mpc86xx_hpcn_setup_arch()
Found FSL PCI host bridge at 0xffe08000. Firmware bus number: 0-2
PCI host bridge /p...@ffe08000 (primary) ranges:
 MEM 0x8000..0x9fff - 0x8000
  IO 0xffc0..0xffc0 - 0x
/p...@ffe08000: PCICSRBAR @ 0xfff0
Found FSL PCI host bridge at 0xffe09000. Firmware bus number: 0-0
PCI host bridge 

Re: Gianfar driver failing on MPC8641D based board

2010-02-25 Thread Martyn Welch
Martyn Welch wrote:
 Martyn Welch wrote:
   
 I have recently attempted to boot an 8641D based board from an NFS root.
 The boot process grinds to a halt not long after the first access of the
 NFS root and I receive multiple nfs: server 192.168.0.1 not responding,
 still trying messages. Wireshark suggests that there is no further
 traffic from this board at this point on. The NFS server seems to
 eventually try sending duplicate packets it's already sent, which
 results in nfs: server 192.168.0.1 OK messages, but the not
 responding messages resume with no further traffic from the board.

 I am able to boot to a ramdisk fine and the network seems to work -
 though I haven't really pushed the interface from it.

 I have attempted to git bisect, though I wasn't able to get much further
 than discovering the problem was introduced in the 2.6.33 merge window -
 at which point the gianfar network driver fails to compile (I have tried
 to git bisect skip many, many times to no avail).

 NFS booting fails for this board on todays linux-next, the master branch
 of Kumar's PPC tree and the head of the main tree. I have also been able
 to NFS boot from a random x86 based board that I have, using the head of
 the main tree and the linux-next tree.

 Copying the gianfar drivers from 2.6.32 into the head of the main tree
 restores the correct behaviour and I'm able to NFS boot. I have heard
 from others that the latest drivers work on 83xx and 85xx based boards,
 but it seems to be broken on at least the 8641D.

 I can see there has been a fair amount of work done on the gianfar
 driver, I assume that this is a bug introduced by the multiple queue
 support, but I'm way out of my depth on this.
   
 
 I have just compiled 2.6.33 for the Freescale MPC8641_HPCN demo board
 and am having still experiencing the problems outlined in my previous
 email, though I have noticed that I tend to be able to boot from cold,
 but my boot fails on reboot. Hitting the reset button doesn't help, I
 need to actually power the machine on and off again for it to work.

 As before, I'm way out of my depth in this. Does anyone have any ideas?
 Below is a dump of the failed boot process:

 U-Boot 2009.01-00181-gc1b7c70 (Jan 30 2009 - 11:17:31)

 Freescale PowerPC
 CPU:
 Core: E600 Core 0, Version: 0.2, (0x80040202)
 System: Unknown, Version: 2.0, (0x80900120)
 Clocks: CPU:1000 MHz, MPX: 400 MHz, DDR: 200 MHz, LBC:  25 MHz
 L2: Enabled
 Board: MPC8641HPCN, System ID: 0x10, System Version: 0x10, FPGA Version:
 0x22
 I2C:   ready
 DRAM:  DDR:  1 GB
 FLASH:  8 MB
 Invalid ID (ff ff ff ff)
Scanning PCI bus 01
 PCI-EXPRESS 1 on bus 00 - 02
 PCI-EXPRESS 2 on bus 03 - 03
 Video: No radeon video card found!
 In:    serial
 Out:   serial
 Err:   serial
 SCSI:  AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl IDE mode
 flags: ncq ilck pm led clo pmp pio slum part
 scanning bus for devices...
 Net:   eTSEC1, eTSEC2, eTSEC3, eTSEC4
 =  tftp 400 hpcn/uImage-torvalds-linux-2.6
 Speed: 1000, full duplex
 Using eTSEC1 device
 TFTP from server 192.168.0.1; our IP address is 192.168.0.30
 Filename 'hpcn/uImage-torvalds-linux-2.6'.
 Load address: 0x400
 Loading: #
  #
  ###
 done
 Bytes transferred = 2709050 (29563a hex)
 = tftp 500 hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb
 Speed: 1000, full duplex
 Using eTSEC1 device
 TFTP from server 192.168.0.1; our IP address is 192.168.0.30
 Filename 'hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb'.
 Load address: 0x500
 Loading: #
 done
 Bytes transferred = 11523 (2d03 hex)
 = setenv bootargs root=/dev/nfs rw
 nfsroot=192.168.0.1:/tftpboot/hpcn/root/ i
 = bootm 400 - 500
 WARNING: adjusting available memory to 1000
 ## Booting kernel from Legacy Image at 0400 ...
Image Name:   Linux-2.6.33-1-gbaac35c
Image Type:   PowerPC Linux Kernel Image (gzip compressed)
Data Size:2708986 Bytes =  2.6 MB
Load Address: 
Entry Point:  
Verifying Checksum ... OK
 ## Flattened Device Tree blob at 0500
Booting using the fdt blob at 0x500
Uncompressing Kernel Image ... OK
Loading Device Tree to 007fa000, end 007ffd02 ... OK
 Using MPC86xx HPCN machine description
 Total memory = 1024MB; using 2048kB for hash table (at cfe0)
 Linux version 2.6.33-1-gbaac35c (welc...@es-j7s4d2j) (gcc version
 4.1.2) #20
 CPU maps initialized for 1 thread per core
 bootconsole [udbg0] enabled
 setup_arch: bootmem
 mpc86xx_hpcn_setup_arch()
 Found FSL PCI host bridge at 0xffe08000. Firmware bus number: 0-2
 PCI host bridge /p...@ffe08000 (primary) ranges:
  MEM 0x8000..0x9fff - 0x8000
   IO 0xffc0..0xffc0 - 0x
 /p...@ffe08000: PCICSRBAR @ 

Re: Gianfar driver failing on MPC8641D based board

2010-02-25 Thread Anton Vorontsov
On Thu, Feb 25, 2010 at 04:46:54PM +, Martyn Welch wrote:
[...]
  nfs: server 192.168.0.1 not responding, still trying

 
 Further testing has shown that this isn't restricted to warm reboots, it
 happens from cold as well. In addition, the exact timing of the failure
 seems to vary, some boots have got further before failing.

Unfortunately I don't have any 8641 boards near me, so I can't
debug this myself. Though, I tested gianfar on MPC8568E-MDS with
2.6.33 kernel, and it seems to work just fine.

I see you use SMP. Can you try turning it off? If that fixes
the issue, it'll be a good data point.

Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you
know the results.

Thanks,

-- 
Anton Vorontsov
email: cbouatmai...@gmail.com
irc://irc.freenode.net/bd2


Re: Gianfar driver failing on MPC8641D based board

2010-02-25 Thread Kumar Gala

On Feb 25, 2010, at 10:46 AM, Martyn Welch wrote:

 
 Further testing has shown that this isn't restricted to warm reboots, it
 happens from cold as well. In addition, the exact timing of the failure
 seems to vary, some boots have got further before failing.

what mechanism do you use for warm resets?

- k


Re: Gianfar driver failing on MPC8641D based board

2010-02-25 Thread Paul Gortmaker
On Thu, Feb 25, 2010 at 12:49 PM, Anton Vorontsov
avoront...@ru.mvista.com wrote:
 On Thu, Feb 25, 2010 at 07:51:41PM +0300, Anton Vorontsov wrote:
 On Thu, Feb 25, 2010 at 04:46:54PM +, Martyn Welch wrote:
 [...]
   nfs: server 192.168.0.1 not responding, still trying
  
 
  Further testing has shown that this isn't restricted to warm reboots, it
  happens from cold as well. In addition, the exact timing of the failure
  seems to vary, some boots have got further before failing.

 Unfortunately I don't have any 8641 boards near me, so I can't
 debug this myself. Though, I tested gianfar on MPC8568E-MDS with
 2.6.33 kernel, and it seems to work just fine.

 I see you use SMP. Can you try to turn it off? If that will fix
 the issue, then it'll be a good data point.

 Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you
 know the results.

 Nope, no luck. Can't trigger the issue. :-/
 Tested with NFS boot, TCP and UDP netperf tests.

I was able to reproduce it on an 8641D and bisected it down to this:

---
commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
Author: Anton Vorontsov avoront...@ru.mvista.com
Date:   Tue Nov 10 14:11:10 2009 +

gianfar: Revive SKB recycling

Before calling gfar_clean_tx_ring() the driver grabs an irqsave
spinlock, and then tries to recycle skbs. But since
skb_recycle_check() returns 0 with IRQs disabled, we'll never
recycle any skbs.

It appears that gfar_clean_tx_ring() and gfar_start_xmit() are
mostly idependent and can work in parallel, except when they
modify num_txbdfree.

So we can drop the lock from most sections and thus fix the skb
recycling.
---

...which probably explains why you weren't seeing it on non-SMP.
I'd imagine it would show up on any of the e500mc boards too.

I'd done a rev-list on gianfar.[ch] from 32 to 33-rc1, and then
cherry-picked those onto a 32 baseline to reduce the scale of
the bisection, but I don't think that should impact the final
result I got in any meaningful way.

Paul.


Re: Gianfar driver failing on MPC8641D based board

2010-02-25 Thread Anton Vorontsov
On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
[...]
 I was able to reproduce it on an 8641D and bisected it down to this:
 
 ---
 commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
 Author: Anton Vorontsov avoront...@ru.mvista.com
 Date:   Tue Nov 10 14:11:10 2009 +
 
 gianfar: Revive SKB recycling

Thanks for the bisect. I have a guess why tx hangs in
SMP case. Could anyone try the patch down below?

[...]
 ...which probably explains why you weren't seeing it on non-SMP.
 I'd imagine it would show up on any of the e500mc boards too.

Yeah.. Pity, I don't have SMP boards anymore. I'll try
to get one though.


diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 8bd3c9f..3ff3bd0 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
 			tx_queue = priv->tx_queue[rx_queue->qindex];
 
 			tx_cleaned += gfar_clean_tx_ring(tx_queue);
+			if (!tx_cleaned && !tx_queue->num_txbdfree)
+				tx_cleaned += 1; /* don't complete napi */
 			rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
 							budget_per_queue);
 			rx_cleaned += rx_cleaned_per_queue;
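
The intent of the hunk above can be modelled in plain C. This is only a sketch: poll_may_complete() is a hypothetical helper, and the surrounding rule (any TX work keeps polling alive, otherwise polling ends once RX work is under budget) is an assumption about gfar_poll(), not something shown in this thread.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the poll-completion decision.
 * Returns true when the poll loop would be allowed to complete NAPI.
 */
static bool poll_may_complete(int tx_cleaned, int num_txbdfree,
			      int rx_cleaned, int budget)
{
	/*
	 * The proposed hunk: nothing was reclaimed and no descriptors
	 * are free, so count it as work done to keep polling alive.
	 */
	if (!tx_cleaned && !num_txbdfree)
		tx_cleaned += 1;

	/* Assumed rule: any TX work means "poll again". */
	if (tx_cleaned)
		return false;

	/* Assumed rule: otherwise complete only if RX stayed under budget. */
	return rx_cleaned < budget;
}
```

With a full ring and nothing cleaned, the model refuses to complete, so the poll loop gets another chance to reclaim descriptors instead of going idle with the queue stuck.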


RE: Gianfar driver failing on MPC8641D based board

2010-02-25 Thread Kumar Gopalpet-B05799
 

-Original Message-
From: Anton Vorontsov [mailto:avoront...@ru.mvista.com] 
Sent: Friday, February 26, 2010 8:45 AM
To: Paul Gortmaker
Cc: Martyn Welch; linuxppc-dev list; net...@vger.kernel.org; 
linux-ker...@vger.kernel.org; Kumar Gopalpet-B05799; 
da...@davemloft.net; Kumar Gala
Subject: Re: Gianfar driver failing on MPC8641D based board

On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote:
[...]
 I was able to reproduce it on an 8641D and bisected it down to this:
 
 ---
 commit a3bc1f11e9b867a4f49505ecac486a33af248b2e
 Author: Anton Vorontsov avoront...@ru.mvista.com
 Date:   Tue Nov 10 14:11:10 2009 +
 
 gianfar: Revive SKB recycling

Thanks for the bisect. I have a guess why tx hangs in SMP 
case. Could anyone try the patch down below?

[...]
 ...which probably explains why you weren't seeing it on non-SMP.
 I'd imagine it would show up on any of the e500mc boards too.

Yeah.. Pity, I don't have SMP boards anymore. I'll try to get 
one though.


diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c 
index 8bd3c9f..3ff3bd0 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget)
 			tx_queue = priv->tx_queue[rx_queue->qindex];
 
 			tx_cleaned += gfar_clean_tx_ring(tx_queue);
+			if (!tx_cleaned && !tx_queue->num_txbdfree)
+				tx_cleaned += 1; /* don't complete napi */
 			rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
 							budget_per_queue);
 			rx_cleaned += rx_cleaned_per_queue;


Anton, 

There is one more issue that I have been observing with the patch
"gianfar: Revive SKB recycling".
It shows up in an IPv4 forwarding test with bidirectional flows
(SMP environment). I am using Spirent SmartBits (SmartFlow) for
automated testing, and I frequently see SmartFlow report an Rx packet
count greater than the Tx packet count, which suggests that duplicate
packets are being received.

As a workaround I reverted this patch, and the issue no longer
appeared.

To a certain extent I could get over the problem by using atomic_t for
num_txbdfree (atomic_add and atomic_dec instructions for updating the
num_txbdfree) and completely removing the spin_locks in the tx routines.
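
To make the atomic counter idea concrete, here is a minimal userspace sketch using C11 <stdatomic.h> rather than the kernel's atomic_t API. The struct txq_model type and the txq_* helpers are hypothetical names for illustration and are not taken from gianfar.c; only the num_txbdfree name comes from the driver.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-queue state. */
struct txq_model {
	atomic_int num_txbdfree;	/* free TX buffer descriptors */
};

static void txq_init(struct txq_model *q, int ring_size)
{
	atomic_init(&q->num_txbdfree, ring_size);
}

/*
 * Transmit side: try to claim nr descriptors.
 * Leaves the counter untouched and returns false if too few are free.
 */
static bool txq_reserve(struct txq_model *q, int nr)
{
	int avail = atomic_load(&q->num_txbdfree);

	while (avail >= nr) {
		/* On failure, avail is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&q->num_txbdfree,
						 &avail, avail - nr))
			return true;
	}
	return false;
}

/* Cleanup side: hand nr completed descriptors back to the ring. */
static void txq_release(struct txq_model *q, int nr)
{
	atomic_fetch_add(&q->num_txbdfree, nr);
}

static int txq_free(struct txq_model *q)
{
	return atomic_load(&q->num_txbdfree);
}
```

The compare-and-swap loop keeps the counter from going negative even if several senders race; with a single transmit context a plain atomic subtract would probably be enough. This only covers the counter itself, not any other state the existing locks may protect.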

Also, I feel we might want to make some more changes to the
gfar_clean_tx_ring() and gfar_start_xmit() routines so that they can
operate in parallel.

I am really sorry for not posting this earlier; I have been caught up
with some urgent issues.

--

Thanks
Sandeep


Gianfar driver failing on MPC8641D based board

2010-02-05 Thread Martyn Welch
I have recently attempted to boot an 8641D based board from an NFS root.
The boot process grinds to a halt not long after the first access of the
NFS root and I receive multiple nfs: server 192.168.0.1 not responding,
still trying messages. Wireshark suggests that there is no further
traffic from this board at this point on. The NFS server seems to
eventually try sending duplicate packets it's already sent, which
results in nfs: server 192.168.0.1 OK messages, but the not
responding messages resume with no further traffic from the board.

I am able to boot to a ramdisk fine and the network seems to work -
though I haven't really pushed the interface from it.

I have attempted to git bisect, though I wasn't able to get much further
than discovering the problem was introduced in the 2.6.33 merge window -
at which point the gianfar network driver fails to compile (I have tried
to git bisect skip many, many times to no avail).

NFS booting fails for this board on today's linux-next, the master branch
of Kumar's PPC tree and the head of the main tree. I have also been able
to NFS boot from a random x86 based board that I have, using the head of
the main tree and the linux-next tree.

Copying the gianfar drivers from 2.6.32 into the head of the main tree
restores the correct behaviour and I'm able to NFS boot. I have heard
from others that the latest drivers work on 83xx and 85xx based boards,
but it seems to be broken on at least the 8641D.

I can see there has been a fair amount of work done on the gianfar
driver, I assume that this is a bug introduced by the multiple queue
support, but I'm way out of my depth on this.

I'm also off for the next week - so if I'm quiet, it'll be because of that.

Martyn

-- 
Martyn Welch (Principal Software Engineer)   |   Registered in England and
GE Intelligent Platforms |   Wales (3828642) at 100
T +44(0)127322748|   Barbirolli Square, Manchester,
E martyn.we...@ge.com|   M2 3AB  VAT:GB 927559189
