Re: Gianfar driver failing on MPC8641D based board
Hi! On Sat, Feb 27, 2010 at 11:05:32AM +0530, Kumar Gopalpet-B05799 wrote: [...] Understood, and thanks for the explanation. Am I correct in saying that this is due to the out-of-order execution capability on powerpc ? Nope, that was just a logic issue in the driver. Though, with the patch, the eieio() is needed so that compiler (or CPU) won't reorder lstatus and skbuff writes. I have one more question, why don't we use use atomic_t for num_txbdfree and completely do away with spin_locks in gfar_clean_tx_ring() and gfar_start_xmit(). In an non-SMP, scenario I would feel there is absolutely no requirement of spin_locks and in case of SMP atomic operation would be much more safer on powerpc rather than spin_locks. What is your suggestion ? I think that's a good idea. However, in start_xmit() we'll have to keep the spinlock anyway since it also protects from gfar_error(), which can modify regs-tstat. Thanks! -- Anton Vorontsov email: cbouatmai...@gmail.com irc://irc.freenode.net/bd2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Gianfar driver failing on MPC8641D based board
Anton Vorontsov wrote: diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 8bd3c9f..cccb409 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) } /* setup the TxBD length and buffer pointer for the first BD */ - tx_queue-tx_skbuff[tx_queue-skb_curtx] = skb; txbdp_start-bufPtr = dma_map_single(priv-ofdev-dev, skb-data, skb_headlen(skb), DMA_TO_DEVICE); @@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) txbdp_start-lstatus = lstatus; + eieio(); /* force lstatus write before tx_skbuff */ + + tx_queue-tx_skbuff[tx_queue-skb_curtx] = skb; + /* Update the current skb pointer to the next entry we will use * (wrapping if necessary) */ tx_queue-skb_curtx = (tx_queue-skb_curtx + 1) I can confirm 10/10 successful boots on p2020ds and mpc8641_hpcn. Martyn -- Martyn Welch (Principal Software Engineer) | Registered in England and GE Intelligent Platforms | Wales (3828642) at 100 T +44(0)127322748| Barbirolli Square, Manchester, E martyn.we...@ge.com| M2 3AB VAT:GB 927559189 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Gianfar driver failing on MPC8641D based board
Anton Vorontsov wrote: On Thu, Feb 25, 2010 at 04:46:54PM +, Martyn Welch wrote: [...] nfs: server 192.168.0.1 not responding, still trying Further testing has shown that this isn't restricted to warm reboots, it happens from cold as well. In addition, the exact timing of the failure seems to vary, some boots have got further before failing. Unfortunately I don't have any 8641 boards near me, so I can't debug this myself. Though, I tested gianfar on MPC8568E-MDS with 2.6.33 kernel, and it seems to work just fine. I see you use SMP. Can you try to turn it off? If that will fix the issue, then it'll be a good data point. Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you know the results. Thanks I removed the second core from the dts file rather than truly disabling SMP in the kernel config. Doing this allowed the board to boot reliably. Martyn -- Martyn Welch (Principal Software Engineer) | Registered in England and GE Intelligent Platforms | Wales (3828642) at 100 T +44(0)127322748| Barbirolli Square, Manchester, E martyn.we...@ge.com| M2 3AB VAT:GB 927559189 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Gianfar driver failing on MPC8641D based board
Anton Vorontsov wrote: On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote: [...] I was able to reproduce it on an 8641D and bisected it down to this: --- commit a3bc1f11e9b867a4f49505ecac486a33af248b2e Author: Anton Vorontsov avoront...@ru.mvista.com Date: Tue Nov 10 14:11:10 2009 + gianfar: Revive SKB recycling Thanks for the bisect. I have a guess why tx hangs in SMP case. Could anyone try the patch down below? Yup, no problem. I'm afraid it doesn't resolve the problem for me. [...] ...which probably explains why you weren't seeing it on non-SMP. I'd imagine it would show up on any of the e500mc boards too. Yeah.. Pity, I don't have SMP boards anymore. I'll try to get one though. diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 8bd3c9f..3ff3bd0 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget) tx_queue = priv-tx_queue[rx_queue-qindex]; tx_cleaned += gfar_clean_tx_ring(tx_queue); + if (!tx_cleaned !tx_queue-num_txbdfree) + tx_cleaned += 1; /* don't complete napi */ rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue, budget_per_queue); rx_cleaned += rx_cleaned_per_queue; -- Martyn Welch (Principal Software Engineer) | Registered in England and GE Intelligent Platforms | Wales (3828642) at 100 T +44(0)127322748| Barbirolli Square, Manchester, E martyn.we...@ge.com| M2 3AB VAT:GB 927559189 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Gianfar driver failing on MPC8641D based board
On Fri, Feb 26, 2010 at 12:06:15PM +, Martyn Welch wrote: Anton Vorontsov wrote: On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote: [...] I was able to reproduce it on an 8641D and bisected it down to this: --- commit a3bc1f11e9b867a4f49505ecac486a33af248b2e Author: Anton Vorontsov avoront...@ru.mvista.com Date: Tue Nov 10 14:11:10 2009 + gianfar: Revive SKB recycling Thanks for the bisect. I have a guess why tx hangs in SMP case. Could anyone try the patch down below? Yup, no problem. I'm afraid it doesn't resolve the problem for me. Hm.. I found a p2020 board and I was able to reproduce the issue. The patch down below fixed it completely for me... hm. I'll look further, thanks! [...] ...which probably explains why you weren't seeing it on non-SMP. I'd imagine it would show up on any of the e500mc boards too. Yeah.. Pity, I don't have SMP boards anymore. I'll try to get one though. diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 8bd3c9f..3ff3bd0 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget) tx_queue = priv-tx_queue[rx_queue-qindex]; tx_cleaned += gfar_clean_tx_ring(tx_queue); + if (!tx_cleaned !tx_queue-num_txbdfree) + tx_cleaned += 1; /* don't complete napi */ rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue, budget_per_queue); rx_cleaned += rx_cleaned_per_queue; -- Anton Vorontsov email: cbouatmai...@gmail.com irc://irc.freenode.net/bd2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Gianfar driver failing on MPC8641D based board
Martyn Welch wrote: Paul Gortmaker wrote: On 10-02-26 09:35 AM, Anton Vorontsov wrote: On Fri, Feb 26, 2010 at 12:06:15PM +, Martyn Welch wrote: Anton Vorontsov wrote: On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote: [...] I was able to reproduce it on an 8641D and bisected it down to this: --- commit a3bc1f11e9b867a4f49505ecac486a33af248b2e Author: Anton Vorontsovavoront...@ru.mvista.com Date: Tue Nov 10 14:11:10 2009 + gianfar: Revive SKB recycling Thanks for the bisect. I have a guess why tx hangs in SMP case. Could anyone try the patch down below? Yup, no problem. I'm afraid it doesn't resolve the problem for me. Hm.. I found a p2020 board and I was able to reproduce the issue. The patch down below fixed it completely for me... hm. Interesting. I just tested the patch on the sbc8641d, and it still has the issue with your patch applied. I'm using NFSroot just like Martyn was and it still appears bound up on that gianfar tx lock. I'll see if I can get a SysRq backtrace in case that will help you see how it manages to get there... I've got a p2020ds here as well, so I'll give NFSroot on that a try with your patch. Out of 10 boot attempts, 7 failed. Martyn -- Martyn Welch (Principal Software Engineer) | Registered in England and GE Intelligent Platforms | Wales (3828642) at 100 T +44(0)127322748| Barbirolli Square, Manchester, E martyn.we...@ge.com| M2 3AB VAT:GB 927559189 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Gianfar driver failing on MPC8641D based board
On Fri, Feb 26, 2010 at 03:34:07PM +, Martyn Welch wrote: [...] Out of 10 boot attempts, 7 failed. OK, I see why. With ip=on (dhcp boot) it's much harder to trigger it. With static ip config can I see the same. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Gianfar driver failing on MPC8641D based board
On 10-02-26 09:35 AM, Anton Vorontsov wrote: On Fri, Feb 26, 2010 at 12:06:15PM +, Martyn Welch wrote: Anton Vorontsov wrote: On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote: [...] I was able to reproduce it on an 8641D and bisected it down to this: --- commit a3bc1f11e9b867a4f49505ecac486a33af248b2e Author: Anton Vorontsovavoront...@ru.mvista.com Date: Tue Nov 10 14:11:10 2009 + gianfar: Revive SKB recycling Thanks for the bisect. I have a guess why tx hangs in SMP case. Could anyone try the patch down below? Yup, no problem. I'm afraid it doesn't resolve the problem for me. Hm.. I found a p2020 board and I was able to reproduce the issue. The patch down below fixed it completely for me... hm. Interesting. I just tested the patch on the sbc8641d, and it still has the issue with your patch applied. I'm using NFSroot just like Martyn was and it still appears bound up on that gianfar tx lock. I'll see if I can get a SysRq backtrace in case that will help you see how it manages to get there... Paul. nfs: server not responding, still trying [repeated ~15 times, then...] INFO: task rc.sysinit:837 blocked for more than 120 seconds. echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. rc.sysinitD 0fef73f4 0 837836 0x Call Trace: [dfb7d9b0] [c000a144] __switch_to+0x8c/0xf8 [dfb7d9d0] [c03443dc] schedule+0x380/0x954 [dfb7da50] [c0344a0c] io_schedule+0x5c/0x90 [dfb7da70] [c0074b0c] sync_page+0x4c/0x74 [dfb7da80] [c0344f44] __wait_on_bit_lock+0xb0/0x148 [dfb7dab0] [c0074a8c] __lock_page+0x94/0xa4 [dfb7dae0] [c0074d5c] find_lock_page+0x8c/0xa4 [dfb7db00] [c0075674] filemap_fault+0x1ec/0x4fc [dfb7db40] [c008d548] __do_fault+0x98/0x53c [dfb7dba0] [c0018478] do_page_fault+0x2d0/0x500 [dfb7dc50] [c00149d4] handle_page_fault+0xc/0x80 --- Exception: 301 at __clear_user+0x14/0x7c LR = load_elf_binary+0x670/0x1270 [dfb7dd10] [c00f6ca0] load_elf_binary+0x620/0x1270 (unreliable) [dfb7dd90] [c00b1f78] search_binary_handler+0x17c/0x394 [dfb7dde0] [c00f4f50] load_script+0x274/0x288 [dfb7de90] [c00b1f78] search_binary_handler+0x17c/0x394 [dfb7dee0] [c00b3580] do_execve+0x240/0x29c [dfb7df20] [c000a46c] sys_execve+0x68/0xa4 [dfb7df40] [c00145a4] ret_from_syscall+0x0/0x38 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Gianfar driver failing on MPC8641D based board
On 10-02-26 11:10 AM, Anton Vorontsov wrote: On Fri, Feb 26, 2010 at 03:34:07PM +, Martyn Welch wrote: [...] Out of 10 boot attempts, 7 failed. OK, I see why. With ip=on (dhcp boot) it's much harder to trigger it. With static ip config can I see the same. I'd kind of expected to see us stuck in gianfar on that lock, but the SysRQ-T doesn't show us hung up anywhere in gianfar itself. [This was on a base 2.6.33, with just a small sysrq fix patch] Paul. -- SysRq : Changing Loglevel Loglevel set to 9 nfs: server not responding, still trying SysRq : Show State taskPC stack pid father init D 0ff1c380 0 1 0 0x Call Trace: [df841a30] [c0009fc4] __switch_to+0x8c/0xf8 [df841a50] [c0350160] schedule+0x354/0x92c [df841ae0] [c0331394] rpc_wait_bit_killable+0x2c/0x54 [df841af0] [c0350eb0] __wait_on_bit+0x9c/0x108 [df841b10] [c0350fc0] out_of_line_wait_on_bit+0xa4/0xb4 [df841b40] [c0331cf0] __rpc_execute+0x16c/0x398 [df841b90] [c0329abc] rpc_run_task+0x48/0x9c [df841ba0] [c0329c40] rpc_call_sync+0x54/0x88 [df841bd0] [c015e780] nfs_proc_lookup+0x94/0xe8 [df841c20] [c014eb60] nfs_lookup+0x12c/0x230 [df841d50] [c00b9680] do_lookup+0x118/0x288 [df841d80] [c00bb904] link_path_walk+0x194/0x1118 [df841df0] [c00bcb08] path_walk+0x8c/0x168 [df841e20] [c00bcd6c] do_path_lookup+0x74/0x7c [df841e40] [c00be148] do_filp_open+0x5d4/0xba4 [df841f10] [c00abe94] do_sys_open+0xac/0x190 [df841f40] [c001437c] ret_from_syscall+0x0/0x38 --- Exception: c01 at 0xff1c380 LR = 0xfec6d98 kthreadd S 0 2 0 0x Call Trace: [df843e50] [c002e788] wake_up_new_task+0x128/0x16c (unreliable) [df843f10] [c0009fc4] __switch_to+0x8c/0xf8 [df843f30] [c0350160] schedule+0x354/0x92c [df843fc0] [c004d154] kthreadd+0x130/0x134 [df843ff0] [c00141a0] kernel_thread+0x4c/0x68 migration/0 S 0 3 2 0x Call Trace: [df847de0] [] 0x (unreliable) [df847ea0] [c0009fc4] __switch_to+0x8c/0xf8 [df847ec0] [c0350160] schedule+0x354/0x92c [df847f50] [c002d074] migration_thread+0x29c/0x448 [df847fb0] [c004d020] kthread+0x80/0x84 [df847ff0] [c00141a0] kernel_thread+0x4c/0x68 ksoftirqd/0 S 0 4 2 0x Call Trace: [df84be10] [0800] 0x800 (unreliable) [df84bed0] [c0009fc4] __switch_to+0x8c/0xf8 [df84bef0] [c0350160] schedule+0x354/0x92c [df84bf80] [c0038454] run_ksoftirqd+0x14c/0x1e0 [df84bfb0] [c004d020] kthread+0x80/0x84 [df84bff0] [c00141a0] kernel_thread+0x4c/0x68 watchdog/0S 0 5 2 0x Call Trace: [df84dee0] [c0009fc4] __switch_to+0x8c/0xf8 [df84df00] [c0350160] schedule+0x354/0x92c [df84df90] [c006b8e8] watchdog+0x48/0x88 [df84dfb0] [c004d020] kthread+0x80/0x84 [df84dff0] [c00141a0] kernel_thread+0x4c/0x68
Re: Gianfar driver failing on MPC8641D based board
On Fri, Feb 26, 2010 at 11:27:42AM -0500, Paul Gortmaker wrote: On 10-02-26 11:10 AM, Anton Vorontsov wrote: On Fri, Feb 26, 2010 at 03:34:07PM +, Martyn Welch wrote: [...] Out of 10 boot attempts, 7 failed. OK, I see why. With ip=on (dhcp boot) it's much harder to trigger it. With static ip config can I see the same. I'd kind of expected to see us stuck in gianfar on that lock, but the SysRQ-T doesn't show us hung up anywhere in gianfar itself. [This was on a base 2.6.33, with just a small sysrq fix patch] [df841a30] [c0009fc4] __switch_to+0x8c/0xf8 [df841a50] [c0350160] schedule+0x354/0x92c [df841ae0] [c0331394] rpc_wait_bit_killable+0x2c/0x54 [df841af0] [c0350eb0] __wait_on_bit+0x9c/0x108 [df841b10] [c0350fc0] out_of_line_wait_on_bit+0xa4/0xb4 [df841b40] [c0331cf0] __rpc_execute+0x16c/0x398 [df841b90] [c0329abc] rpc_run_task+0x48/0x9c [df841ba0] [c0329c40] rpc_call_sync+0x54/0x88 [df841bd0] [c015e780] nfs_proc_lookup+0x94/0xe8 [df841c20] [c014eb60] nfs_lookup+0x12c/0x230 [df841d50] [c00b9680] do_lookup+0x118/0x288 [df841d80] [c00bb904] link_path_walk+0x194/0x1118 [df841df0] [c00bcb08] path_walk+0x8c/0x168 [df841e20] [c00bcd6c] do_path_lookup+0x74/0x7c [df841e40] [c00be148] do_filp_open+0x5d4/0xba4 [df841f10] [c00abe94] do_sys_open+0xac/0x190 Yeah, I don't think this is gianfar-related. It must be something else triggered by the fact that gianfar no longer sends stuff. OK, I think I found what's happening in gianfar. Some background... start_xmit() prepares new skb for transmitting, generally it does three things: 1. sets up all BDs (marks them ready to send), except the first one. 2. stores skb into tx_queue-tx_skbuff so that clean_tx_ring() would cleanup it later. 3. sets up the first BD, i.e. marks it ready. Here is what clean_tx_ring() does: 1. reads skbs from tx_queue-tx_skbuff 2. Checks if the *last* BD is ready. If it's still ready [to send] then it it isn't transmitted, so clean_tx_ring() returns. Otherwise it actually cleanups BDs. All is OK. Now, if there is just one BD, code flow: - start_xmit(): stores skb into tx_skbuff. Note that the first BD (which is also the last one) isn't marked as ready, yet. - clean_tx_ring(): sees that skb is not null, *and* its lstatus says that it is NOT ready (like if BD was sent), so it cleans it up (bad!) - start_xmit(): marks BD as ready [to send], but it's too late. We can fix this simply by reordering lstatus/tx_skbuff writes. It works flawlessly on my p2020, please try it. Thanks! diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 8bd3c9f..cccb409 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) } /* setup the TxBD length and buffer pointer for the first BD */ - tx_queue-tx_skbuff[tx_queue-skb_curtx] = skb; txbdp_start-bufPtr = dma_map_single(priv-ofdev-dev, skb-data, skb_headlen(skb), DMA_TO_DEVICE); @@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) txbdp_start-lstatus = lstatus; + eieio(); /* force lstatus write before tx_skbuff */ + + tx_queue-tx_skbuff[tx_queue-skb_curtx] = skb; + /* Update the current skb pointer to the next entry we will use * (wrapping if necessary) */ tx_queue-skb_curtx = (tx_queue-skb_curtx + 1) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Gianfar driver failing on MPC8641D based board
On 10-02-26 04:38 PM, Anton Vorontsov wrote: OK, I think I found what's happening in gianfar. Some background... start_xmit() prepares new skb for transmitting, generally it does three things: 1. sets up all BDs (marks them ready to send), except the first one. 2. stores skb into tx_queue-tx_skbuff so that clean_tx_ring() would cleanup it later. 3. sets up the first BD, i.e. marks it ready. Here is what clean_tx_ring() does: 1. reads skbs from tx_queue-tx_skbuff 2. Checks if the *last* BD is ready. If it's still ready [to send] then it it isn't transmitted, so clean_tx_ring() returns. Otherwise it actually cleanups BDs. All is OK. Now, if there is just one BD, code flow: - start_xmit(): stores skb into tx_skbuff. Note that the first BD (which is also the last one) isn't marked as ready, yet. - clean_tx_ring(): sees that skb is not null, *and* its lstatus says that it is NOT ready (like if BD was sent), so it cleans it up (bad!) - start_xmit(): marks BD as ready [to send], but it's too late. We can fix this simply by reordering lstatus/tx_skbuff writes. It works flawlessly on my p2020, please try it. I've skipped right to the test part (I'll think about the description more later) and it passed 5 out of 5 boot tests on NFSroot sbc8641d. Looks like you've got a solution. Paul. Thanks! diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 8bd3c9f..cccb409 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) } /* setup the TxBD length and buffer pointer for the first BD */ - tx_queue-tx_skbuff[tx_queue-skb_curtx] = skb; txbdp_start-bufPtr = dma_map_single(priv-ofdev-dev, skb-data, skb_headlen(skb), DMA_TO_DEVICE); @@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) txbdp_start-lstatus = lstatus; + eieio(); /* force lstatus write before tx_skbuff */ + + tx_queue-tx_skbuff[tx_queue-skb_curtx] = skb; + /* Update the current skb pointer to the next entry we will use * (wrapping if necessary) */ tx_queue-skb_curtx = (tx_queue-skb_curtx + 1) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: Gianfar driver failing on MPC8641D based board
-Original Message- From: Anton Vorontsov [mailto:avoront...@ru.mvista.com] Sent: Saturday, February 27, 2010 3:08 AM To: Paul Gortmaker Cc: Martyn Welch; net...@vger.kernel.org; linux-ker...@vger.kernel.org; linuxppc-dev list; Kumar Gopalpet-B05799; da...@davemloft.net Subject: Re: Gianfar driver failing on MPC8641D based board On Fri, Feb 26, 2010 at 11:27:42AM -0500, Paul Gortmaker wrote: On 10-02-26 11:10 AM, Anton Vorontsov wrote: On Fri, Feb 26, 2010 at 03:34:07PM +, Martyn Welch wrote: [...] Out of 10 boot attempts, 7 failed. OK, I see why. With ip=on (dhcp boot) it's much harder to trigger it. With static ip config can I see the same. I'd kind of expected to see us stuck in gianfar on that lock, but the SysRQ-T doesn't show us hung up anywhere in gianfar itself. [This was on a base 2.6.33, with just a small sysrq fix patch] [df841a30] [c0009fc4] __switch_to+0x8c/0xf8 [df841a50] [c0350160] schedule+0x354/0x92c [df841ae0] [c0331394] rpc_wait_bit_killable+0x2c/0x54 [df841af0] [c0350eb0] __wait_on_bit+0x9c/0x108 [df841b10] [c0350fc0] out_of_line_wait_on_bit+0xa4/0xb4 [df841b40] [c0331cf0] __rpc_execute+0x16c/0x398 [df841b90] [c0329abc] rpc_run_task+0x48/0x9c [df841ba0] [c0329c40] rpc_call_sync+0x54/0x88 [df841bd0] [c015e780] nfs_proc_lookup+0x94/0xe8 [df841c20] [c014eb60] nfs_lookup+0x12c/0x230 [df841d50] [c00b9680] do_lookup+0x118/0x288 [df841d80] [c00bb904] link_path_walk+0x194/0x1118 [df841df0] [c00bcb08] path_walk+0x8c/0x168 [df841e20] [c00bcd6c] do_path_lookup+0x74/0x7c [df841e40] [c00be148] do_filp_open+0x5d4/0xba4 [df841f10] [c00abe94] do_sys_open+0xac/0x190 Yeah, I don't think this is gianfar-related. It must be something else triggered by the fact that gianfar no longer sends stuff. OK, I think I found what's happening in gianfar. Some background... start_xmit() prepares new skb for transmitting, generally it does three things: 1. sets up all BDs (marks them ready to send), except the first one. 2. stores skb into tx_queue-tx_skbuff so that clean_tx_ring() would cleanup it later. 3. sets up the first BD, i.e. marks it ready. Here is what clean_tx_ring() does: 1. reads skbs from tx_queue-tx_skbuff 2. Checks if the *last* BD is ready. If it's still ready [to send] then it it isn't transmitted, so clean_tx_ring() returns. Otherwise it actually cleanups BDs. All is OK. Now, if there is just one BD, code flow: - start_xmit(): stores skb into tx_skbuff. Note that the first BD (which is also the last one) isn't marked as ready, yet. - clean_tx_ring(): sees that skb is not null, *and* its lstatus says that it is NOT ready (like if BD was sent), so it cleans it up (bad!) - start_xmit(): marks BD as ready [to send], but it's too late. We can fix this simply by reordering lstatus/tx_skbuff writes. It works flawlessly on my p2020, please try it. Anton, Understood, and thanks for the explanation. Am I correct in saying that this is due to the out-of-order execution capability on powerpc ? I have one more question, why don't we use use atomic_t for num_txbdfree and completely do away with spin_locks in gfar_clean_tx_ring() and gfar_start_xmit(). In an non-SMP, scenario I would feel there is absolutely no requirement of spin_locks and in case of SMP atomic operation would be much more safer on powerpc rather than spin_locks. What is your suggestion ? -- Thanks Sandeep Thanks! diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 8bd3c9f..cccb409 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -2021,7 +2021,6 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) } /* setup the TxBD length and buffer pointer for the first BD */ - tx_queue-tx_skbuff[tx_queue-skb_curtx] = skb; txbdp_start-bufPtr = dma_map_single(priv-ofdev-dev, skb-data, skb_headlen(skb), DMA_TO_DEVICE); @@ -2053,6 +2052,10 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) txbdp_start-lstatus = lstatus; + eieio(); /* force lstatus write before tx_skbuff */ + + tx_queue-tx_skbuff[tx_queue-skb_curtx] = skb; + /* Update the current skb pointer to the next entry we will use * (wrapping if necessary) */ tx_queue-skb_curtx = (tx_queue-skb_curtx + 1) ___ Linuxppc-dev mailing list Linuxppc
Re: Gianfar driver failing on MPC8641D based board
Martyn Welch wrote: I have recently attempted to boot an 8641D based board from an NFS root. The boot process grinds to a halt not long after the first access of the NFS root and I receive multiple nfs: server 192.168.0.1 not responding, still trying messages. Wireshark suggests that there is no further traffic from this board at this point on. The NFS server seems to eventually try sending duplicate packets it's already sent, which results in nfs: server 192.168.0.1 OK messages, but the not responding messages resume with no further traffic from the board. I am able to boot to a ramdisk fine and the network seems to work - though I haven't really pushed the interface from it. I have attempted to git bisect, though I wasn't able to get much further than discovering the problem was introduced in the 2.6.33 merge window - at which point the gianfar network driver fails to compile (I have tried to git bisect skip many, many times to no avail). NFS booting fails for this board on todays linux-next, the master branch of Kumar's PPC tree and the head of the main tree. I have also been able to NFS boot from a random x86 based board that I have, using the head of the main tree and the linux-next tree. Copying the gianfar drivers from 2.6.32 into the head of the main tree restores the correct behaviour and I'm able to NFS boot. I have heard from others that the latest drivers work on 83xx and 85xx based boards, but it seems to be broken on at least the 8641D. I can see there has been a fair amount of work done on the gianfar driver, I assume that this is a bug introduced by the multiple queue support, but I'm way out of my depth on this. I have just compiled 2.6.33 for the Freescale MPC8641_HPCN demo board and am having still experiencing the problems outlined in my previous email, though I have noticed that I tend to be able to boot from cold, but my boot fails on reboot. Hitting the reset button doesn't help, I need to actually power the machine on and off again for it to work. As before, I'm way out of my depth in this, any one have any ideas? Below is a dump of the failed boot process: U-Boot 2009.01-00181-gc1b7c70 (Jan 30 2009 - 11:17:31) Freescale PowerPC CPU: Core: E600 Core 0, Version: 0.2, (0x80040202) System: Unknown, Version: 2.0, (0x80900120) Clocks: CPU:1000 MHz, MPX: 400 MHz, DDR: 200 MHz, LBC: 25 MHz L2: Enabled Board: MPC8641HPCN, System ID: 0x10, System Version: 0x10, FPGA Version: 0x22 I2C: ready DRAM: DDR: 1 GB FLASH: 8 MB Invalid ID (ff ff ff ff) Scanning PCI bus 01 PCI-EXPRESS 1 on bus 00 - 02 PCI-EXPRESS 2 on bus 03 - 03 Video: No radeon video card found! In:serial Out: serial Err: serial SCSI: AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl IDE mode flags: ncq ilck pm led clo pmp pio slum part scanning bus for devices... Net: eTSEC1, eTSEC2, eTSEC3, eTSEC4 = tftp 400 hpcn/uImage-torvalds-linux-2.6 Speed: 1000, full duplex Using eTSEC1 device TFTP from server 192.168.0.1; our IP address is 192.168.0.30 Filename 'hpcn/uImage-torvalds-linux-2.6'. Load address: 0x400 Loading: # # ### done Bytes transferred = 2709050 (29563a hex) = tftp 500 hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb Speed: 1000, full duplex Using eTSEC1 device TFTP from server 192.168.0.1; our IP address is 192.168.0.30 Filename 'hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb'. Load address: 0x500 Loading: # done Bytes transferred = 11523 (2d03 hex) = setenv bootargs root=/dev/nfs rw nfsroot=192.168.0.1:/tftpboot/hpcn/root/ i = bootm 400 - 500 WARNING: adjusting available memory to 1000 ## Booting kernel from Legacy Image at 0400 ... Image Name: Linux-2.6.33-1-gbaac35c Image Type: PowerPC Linux Kernel Image (gzip compressed) Data Size:2708986 Bytes = 2.6 MB Load Address: Entry Point: Verifying Checksum ... OK ## Flattened Device Tree blob at 0500 Booting using the fdt blob at 0x500 Uncompressing Kernel Image ... OK Loading Device Tree to 007fa000, end 007ffd02 ... OK Using MPC86xx HPCN machine description Total memory = 1024MB; using 2048kB for hash table (at cfe0) Linux version 2.6.33-1-gbaac35c (welc...@es-j7s4d2j) (gcc version 4.1.2) #20 CPU maps initialized for 1 thread per core bootconsole [udbg0] enabled setup_arch: bootmem mpc86xx_hpcn_setup_arch() Found FSL PCI host bridge at 0xffe08000. Firmware bus number: 0-2 PCI host bridge /p...@ffe08000 (primary) ranges: MEM 0x8000..0x9fff - 0x8000 IO 0xffc0..0xffc0 - 0x /p...@ffe08000: PCICSRBAR @ 0xfff0 Found FSL PCI host bridge at 0xffe09000. Firmware bus number: 0-0 PCI host bridge
Re: Gianfar driver failing on MPC8641D based board
Martyn Welch wrote: Martyn Welch wrote: I have recently attempted to boot an 8641D based board from an NFS root. The boot process grinds to a halt not long after the first access of the NFS root and I receive multiple nfs: server 192.168.0.1 not responding, still trying messages. Wireshark suggests that there is no further traffic from this board at this point on. The NFS server seems to eventually try sending duplicate packets it's already sent, which results in nfs: server 192.168.0.1 OK messages, but the not responding messages resume with no further traffic from the board. I am able to boot to a ramdisk fine and the network seems to work - though I haven't really pushed the interface from it. I have attempted to git bisect, though I wasn't able to get much further than discovering the problem was introduced in the 2.6.33 merge window - at which point the gianfar network driver fails to compile (I have tried to git bisect skip many, many times to no avail). NFS booting fails for this board on todays linux-next, the master branch of Kumar's PPC tree and the head of the main tree. I have also been able to NFS boot from a random x86 based board that I have, using the head of the main tree and the linux-next tree. Copying the gianfar drivers from 2.6.32 into the head of the main tree restores the correct behaviour and I'm able to NFS boot. I have heard from others that the latest drivers work on 83xx and 85xx based boards, but it seems to be broken on at least the 8641D. I can see there has been a fair amount of work done on the gianfar driver, I assume that this is a bug introduced by the multiple queue support, but I'm way out of my depth on this. I have just compiled 2.6.33 for the Freescale MPC8641_HPCN demo board and am having still experiencing the problems outlined in my previous email, though I have noticed that I tend to be able to boot from cold, but my boot fails on reboot. Hitting the reset button doesn't help, I need to actually power the machine on and off again for it to work. As before, I'm way out of my depth in this, any one have any ideas? Below is a dump of the failed boot process: U-Boot 2009.01-00181-gc1b7c70 (Jan 30 2009 - 11:17:31) Freescale PowerPC CPU: Core: E600 Core 0, Version: 0.2, (0x80040202) System: Unknown, Version: 2.0, (0x80900120) Clocks: CPU:1000 MHz, MPX: 400 MHz, DDR: 200 MHz, LBC: 25 MHz L2: Enabled Board: MPC8641HPCN, System ID: 0x10, System Version: 0x10, FPGA Version: 0x22 I2C: ready DRAM: DDR: 1 GB FLASH: 8 MB Invalid ID (ff ff ff ff) Scanning PCI bus 01 PCI-EXPRESS 1 on bus 00 - 02 PCI-EXPRESS 2 on bus 03 - 03 Video: No radeon video card found! In:serial Out: serial Err: serial SCSI: AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl IDE mode flags: ncq ilck pm led clo pmp pio slum part scanning bus for devices... Net: eTSEC1, eTSEC2, eTSEC3, eTSEC4 = tftp 400 hpcn/uImage-torvalds-linux-2.6 Speed: 1000, full duplex Using eTSEC1 device TFTP from server 192.168.0.1; our IP address is 192.168.0.30 Filename 'hpcn/uImage-torvalds-linux-2.6'. Load address: 0x400 Loading: # # ### done Bytes transferred = 2709050 (29563a hex) = tftp 500 hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb Speed: 1000, full duplex Using eTSEC1 device TFTP from server 192.168.0.1; our IP address is 192.168.0.30 Filename 'hpcn/mpc8641_hpcn-torvalds-linux-2.6.dtb'. Load address: 0x500 Loading: # done Bytes transferred = 11523 (2d03 hex) = setenv bootargs root=/dev/nfs rw nfsroot=192.168.0.1:/tftpboot/hpcn/root/ i = bootm 400 - 500 WARNING: adjusting available memory to 1000 ## Booting kernel from Legacy Image at 0400 ... Image Name: Linux-2.6.33-1-gbaac35c Image Type: PowerPC Linux Kernel Image (gzip compressed) Data Size:2708986 Bytes = 2.6 MB Load Address: Entry Point: Verifying Checksum ... OK ## Flattened Device Tree blob at 0500 Booting using the fdt blob at 0x500 Uncompressing Kernel Image ... OK Loading Device Tree to 007fa000, end 007ffd02 ... OK Using MPC86xx HPCN machine description Total memory = 1024MB; using 2048kB for hash table (at cfe0) Linux version 2.6.33-1-gbaac35c (welc...@es-j7s4d2j) (gcc version 4.1.2) #20 CPU maps initialized for 1 thread per core bootconsole [udbg0] enabled setup_arch: bootmem mpc86xx_hpcn_setup_arch() Found FSL PCI host bridge at 0xffe08000. Firmware bus number: 0-2 PCI host bridge /p...@ffe08000 (primary) ranges: MEM 0x8000..0x9fff - 0x8000 IO 0xffc0..0xffc0 - 0x /p...@ffe08000: PCICSRBAR @
Re: Gianfar driver failing on MPC8641D based board
On Thu, Feb 25, 2010 at 04:46:54PM +, Martyn Welch wrote: [...] nfs: server 192.168.0.1 not responding, still trying Further testing has shown that this isn't restricted to warm reboots, it happens from cold as well. In addition, the exact timing of the failure seems to vary, some boots have got further before failing. Unfortunately I don't have any 8641 boards near me, so I can't debug this myself. Though, I tested gianfar on MPC8568E-MDS with 2.6.33 kernel, and it seems to work just fine. I see you use SMP. Can you try to turn it off? If that will fix the issue, then it'll be a good data point. Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you know the results. Thanks, -- Anton Vorontsov email: cbouatmai...@gmail.com irc://irc.freenode.net/bd2 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Gianfar driver failing on MPC8641D based board
On Feb 25, 2010, at 10:46 AM, Martyn Welch wrote: Further testing has shown that this isn't restricted to warm reboots, it happens from cold as well. In addition, the exact timing of the failure seems to vary, some boots have got further before failing. what mechanism do you use for warm resets? - k ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Gianfar driver failing on MPC8641D based board
On Thu, Feb 25, 2010 at 12:49 PM, Anton Vorontsov avoront...@ru.mvista.com wrote: On Thu, Feb 25, 2010 at 07:51:41PM +0300, Anton Vorontsov wrote: On Thu, Feb 25, 2010 at 04:46:54PM +, Martyn Welch wrote: [...] nfs: server 192.168.0.1 not responding, still trying Further testing has shown that this isn't restricted to warm reboots, it happens from cold as well. In addition, the exact timing of the failure seems to vary, some boots have got further before failing. Unfortunately I don't have any 8641 boards near me, so I can't debug this myself. Though, I tested gianfar on MPC8568E-MDS with 2.6.33 kernel, and it seems to work just fine. I see you use SMP. Can you try to turn it off? If that will fix the issue, then it'll be a good data point. Meanwhile, I'll try SMP kernel on MPC8568 (UP), and let you know the results. Nope, no luck. Can't trigger the issue. :-/ Tested with NFS boot, TCP and UDP netperf tests. I was able to reproduce it on an 8641D and bisected it down to this: --- commit a3bc1f11e9b867a4f49505ecac486a33af248b2e Author: Anton Vorontsov avoront...@ru.mvista.com Date: Tue Nov 10 14:11:10 2009 + gianfar: Revive SKB recycling Before calling gfar_clean_tx_ring() the driver grabs an irqsave spinlock, and then tries to recycle skbs. But since skb_recycle_check() returns 0 with IRQs disabled, we'll never recycle any skbs. It appears that gfar_clean_tx_ring() and gfar_start_xmit() are mostly idependent and can work in parallel, except when they modify num_txbdfree. So we can drop the lock from most sections and thus fix the skb recycling. --- ...which probably explains why you weren't seeing it on non-SMP. I'd imagine it would show up on any of the e500mc boards too. I'd done a rev-list on gianfar.[ch] from 32 to 33-rc1, and then cherry-picked those onto a 32 baseline to reduce the scale of the bisection, but I don't think that should impact the final result I got in any meaningful way. Paul. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Gianfar driver failing on MPC8641D based board
On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote: [...] I was able to reproduce it on an 8641D and bisected it down to this: --- commit a3bc1f11e9b867a4f49505ecac486a33af248b2e Author: Anton Vorontsov avoront...@ru.mvista.com Date: Tue Nov 10 14:11:10 2009 + gianfar: Revive SKB recycling Thanks for the bisect. I have a guess why tx hangs in SMP case. Could anyone try the patch down below? [...] ...which probably explains why you weren't seeing it on non-SMP. I'd imagine it would show up on any of the e500mc boards too. Yeah.. Pity, I don't have SMP boards anymore. I'll try to get one though. diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 8bd3c9f..3ff3bd0 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget) tx_queue = priv-tx_queue[rx_queue-qindex]; tx_cleaned += gfar_clean_tx_ring(tx_queue); + if (!tx_cleaned !tx_queue-num_txbdfree) + tx_cleaned += 1; /* don't complete napi */ rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue, budget_per_queue); rx_cleaned += rx_cleaned_per_queue; ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: Gianfar driver failing on MPC8641D based board
-Original Message- From: Anton Vorontsov [mailto:avoront...@ru.mvista.com] Sent: Friday, February 26, 2010 8:45 AM To: Paul Gortmaker Cc: Martyn Welch; linuxppc-dev list; net...@vger.kernel.org; linux-ker...@vger.kernel.org; Kumar Gopalpet-B05799; da...@davemloft.net; Kumar Gala Subject: Re: Gianfar driver failing on MPC8641D based board On Thu, Feb 25, 2010 at 07:53:30PM -0500, Paul Gortmaker wrote: [...] I was able to reproduce it on an 8641D and bisected it down to this: --- commit a3bc1f11e9b867a4f49505ecac486a33af248b2e Author: Anton Vorontsov avoront...@ru.mvista.com Date: Tue Nov 10 14:11:10 2009 + gianfar: Revive SKB recycling Thanks for the bisect. I have a guess why tx hangs in SMP case. Could anyone try the patch down below? [...] ...which probably explains why you weren't seeing it on non-SMP. I'd imagine it would show up on any of the e500mc boards too. Yeah.. Pity, I don't have SMP boards anymore. I'll try to get one though. diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 8bd3c9f..3ff3bd0 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -2614,6 +2614,8 @@ static int gfar_poll(struct napi_struct *napi, int budget) tx_queue = priv-tx_queue[rx_queue-qindex]; tx_cleaned += gfar_clean_tx_ring(tx_queue); + if (!tx_cleaned !tx_queue-num_txbdfree) + tx_cleaned += 1; /* don't complete napi */ rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue, budget_per_queue); rx_cleaned += rx_cleaned_per_queue; Anton, There is also one more issue that I have been observing with the patch gianfar: Revive SKB recycling. The issue is when I do a IPV4 forwarding test scenario with bidirectional flows (SMP environment). I am using Spirent smart bits (smartflow) for automation testing and I frequently observe smart flow reporting Rx packet counte greater than Tx packet count. Duplicate packets might have been received. To just get over the issue I have removed this patch and I didn't see the issue. To a certain extent I could get over the problem by using atomic_t for num_txbdfree (atomic_add and atomic_dec instructions for updating the num_txbdfree) and completely removing the spin_locks in the tx routines. Also, I feel we might want to make some more changes to the gfar_clean_tx_ring( ) and gfar_start_xmit() routines so that they can operate parallely. I am really sorry for not posting it a bit earlier as I am caught up with some urgent issues. -- Thanks Sandeep ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Gianfar driver failing on MPC8641D based board
I have recently attempted to boot an 8641D based board from an NFS root. The boot process grinds to a halt not long after the first access of the NFS root and I receive multiple nfs: server 192.168.0.1 not responding, still trying messages. Wireshark suggests that there is no further traffic from this board at this point on. The NFS server seems to eventually try sending duplicate packets it's already sent, which results in nfs: server 192.168.0.1 OK messages, but the not responding messages resume with no further traffic from the board. I am able to boot to a ramdisk fine and the network seems to work - though I haven't really pushed the interface from it. I have attempted to git bisect, though I wasn't able to get much further than discovering the problem was introduced in the 2.6.33 merge window - at which point the gianfar network driver fails to compile (I have tried to git bisect skip many, many times to no avail). NFS booting fails for this board on todays linux-next, the master branch of Kumar's PPC tree and the head of the main tree. I have also been able to NFS boot from a random x86 based board that I have, using the head of the main tree and the linux-next tree. Copying the gianfar drivers from 2.6.32 into the head of the main tree restores the correct behaviour and I'm able to NFS boot. I have heard from others that the latest drivers work on 83xx and 85xx based boards, but it seems to be broken on at least the 8641D. I can see there has been a fair amount of work done on the gianfar driver, I assume that this is a bug introduced by the multiple queue support, but I'm way out of my depth on this. I'm also off for the next week - so if I'm quiet, it'll be because of that. Martyn -- Martyn Welch (Principal Software Engineer) | Registered in England and GE Intelligent Platforms | Wales (3828642) at 100 T +44(0)127322748| Barbirolli Square, Manchester, E martyn.we...@ge.com| M2 3AB VAT:GB 927559189 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev