Re: msk msk0 watchdog timeout freeze hang lock stop problem
On Thu, Aug 27, 2015 at 11:29:28AM +0200, Johann Hugo wrote:
> It's working for me so far and I haven't seen any watchdog timeouts. With 10.2-RELEASE I got timeouts and lost connectivity in less than a minute.

Ok, great. Committed in r287238.
Thanks again.

> Johann
>
> On Wed, Aug 26, 2015 at 10:28 AM, Yonghyeon PYUN pyu...@gmail.com wrote:
>> On Wed, Aug 26, 2015 at 10:06:29AM +0200, Johann Hugo wrote:
>>> 10.2-RELEASE does not work for me. It works for a very short while and then it stops with msk0 watchdog timeout errors.
>>
>> Thanks a lot for your report. This is the first report for msk(4) watchdog timeouts on 10.2-RELEASE.
>>
>>> I'm not sure what patch Roosevelt was talking about, but the patch in this thread works for me:
>>> https://lists.freebsd.org/pipermail/freebsd-stable/2015-April/082226.html
>>>
>>> I've changed MSK_STAT_ALIGN from 4096 to 8192 in if_mskreg.h and it's been running stable for the last week.
>>
>> I see. I'm under the impression that RX/TX descriptor ring alignment shall trigger the same issue so it would be better to know how the attached patch works on your box.
>>
>> Thanks.
>>
>>> Johann
>>>
>>> On Sun, Aug 16, 2015 at 2:08 PM, Yonghyeon PYUN pyu...@gmail.com wrote:
>>>> On Wed, Aug 12, 2015 at 09:44:06AM -0400, Roosevelt Littleton wrote:
>>>>> Hi, So, I can confirm with the attached patch. I have a working msk0 that hasn't failed for the past month. I consider this problem fixed for me, since I have gone a long time without any problems. Thanks!
>>>>
>>>> I'm not sure which patch you used. Given that users reported 10.2-RELEASE works, it would be great if you revert the local patch and try it again on 10.2-RELEASE.

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: msk msk0 watchdog timeout freeze hang lock stop problem
It's working for me so far and I haven't seen any watchdog timeouts. With 10.2-RELEASE I got timeouts and lost connectivity in less than a minute.

Johann

On Wed, Aug 26, 2015 at 10:28 AM, Yonghyeon PYUN pyu...@gmail.com wrote:
> On Wed, Aug 26, 2015 at 10:06:29AM +0200, Johann Hugo wrote:
>> 10.2-RELEASE does not work for me. It works for a very short while and then it stops with msk0 watchdog timeout errors.
>
> Thanks a lot for your report. This is the first report for msk(4) watchdog timeouts on 10.2-RELEASE.
>
>> I'm not sure what patch Roosevelt was talking about, but the patch in this thread works for me:
>> https://lists.freebsd.org/pipermail/freebsd-stable/2015-April/082226.html
>>
>> I've changed MSK_STAT_ALIGN from 4096 to 8192 in if_mskreg.h and it's been running stable for the last week.
>
> I see. I'm under the impression that RX/TX descriptor ring alignment shall trigger the same issue so it would be better to know how the attached patch works on your box.
>
> Thanks.
>
>> Johann
>>
>> On Sun, Aug 16, 2015 at 2:08 PM, Yonghyeon PYUN pyu...@gmail.com wrote:
>>> On Wed, Aug 12, 2015 at 09:44:06AM -0400, Roosevelt Littleton wrote:
>>>> Hi, So, I can confirm with the attached patch. I have a working msk0 that hasn't failed for the past month. I consider this problem fixed for me, since I have gone a long time without any problems. Thanks!
>>>
>>> I'm not sure which patch you used. Given that users reported 10.2-RELEASE works, it would be great if you revert the local patch and try it again on 10.2-RELEASE.
Re: msk msk0 watchdog timeout freeze hang lock stop problem
10.2-RELEASE does not work for me. It works for a very short while and then it stops with msk0 watchdog timeout errors.

I'm not sure what patch Roosevelt was talking about, but the patch in this thread works for me:
https://lists.freebsd.org/pipermail/freebsd-stable/2015-April/082226.html

I've changed MSK_STAT_ALIGN from 4096 to 8192 in if_mskreg.h and it's been running stable for the last week.

Johann

On Sun, Aug 16, 2015 at 2:08 PM, Yonghyeon PYUN pyu...@gmail.com wrote:
> On Wed, Aug 12, 2015 at 09:44:06AM -0400, Roosevelt Littleton wrote:
>> Hi, So, I can confirm with the attached patch. I have a working msk0 that hasn't failed for the past month. I consider this problem fixed for me, since I have gone a long time without any problems. Thanks!
>
> I'm not sure which patch you used. Given that users reported 10.2-RELEASE works, it would be great if you revert the local patch and try it again on 10.2-RELEASE.
Re: msk msk0 watchdog timeout freeze hang lock stop problem
On Wed, Aug 26, 2015 at 10:06:29AM +0200, Johann Hugo wrote:
> 10.2-RELEASE does not work for me. It works for a very short while and then it stops with msk0 watchdog timeout errors.

Thanks a lot for your report. This is the first report for msk(4) watchdog timeouts on 10.2-RELEASE.

> I'm not sure what patch Roosevelt was talking about, but the patch in this thread works for me:
> https://lists.freebsd.org/pipermail/freebsd-stable/2015-April/082226.html
>
> I've changed MSK_STAT_ALIGN from 4096 to 8192 in if_mskreg.h and it's been running stable for the last week.

I see. I'm under the impression that RX/TX descriptor ring alignment shall trigger the same issue so it would be better to know how the attached patch works on your box.

Thanks.

> Johann
>
> On Sun, Aug 16, 2015 at 2:08 PM, Yonghyeon PYUN pyu...@gmail.com wrote:
>> On Wed, Aug 12, 2015 at 09:44:06AM -0400, Roosevelt Littleton wrote:
>>> Hi, So, I can confirm with the attached patch. I have a working msk0 that hasn't failed for the past month. I consider this problem fixed for me, since I have gone a long time without any problems. Thanks!
>>
>> I'm not sure which patch you used. Given that users reported 10.2-RELEASE works, it would be great if you revert the local patch and try it again on 10.2-RELEASE.

Index: sys/dev/msk/if_mskreg.h
===================================================================
--- sys/dev/msk/if_mskreg.h (revision 281587)
+++ sys/dev/msk/if_mskreg.h (working copy)
@@ -2175,13 +2175,8 @@
 #define MSK_ADDR_LO(x) ((uint64_t) (x) & 0xffffffffUL)
 #define MSK_ADDR_HI(x) ((uint64_t) (x) >> 32)
 
-/*
- * At first I guessed 8 bytes, the size of a single descriptor, would be
- * required alignment constraints. But, it seems that Yukon II have 4096
- * bytes boundary alignment constraints.
- */
-#define MSK_RING_ALIGN 4096
-#define MSK_STAT_ALIGN 4096
+#define MSK_RING_ALIGN 32768
+#define MSK_STAT_ALIGN 32768
 
 /* Rx descriptor data structure */
 struct msk_rx_desc {
Re: msk msk0 watchdog timeout freeze hang lock stop problem
On Wed, Aug 12, 2015 at 09:44:06AM -0400, Roosevelt Littleton wrote:
> Hi, So, I can confirm with the attached patch. I have a working msk0 that hasn't failed for the past month. I consider this problem fixed for me, since I have gone a long time without any problems. Thanks!

I'm not sure which patch you used. Given that users reported 10.2-RELEASE works, it would be great if you revert the local patch and try it again on 10.2-RELEASE.
Re: msk msk0 watchdog timeout freeze hang lock stop problem
On 08/12/2015 04:44 PM, Roosevelt Littleton wrote:
> Hi, So, I can confirm with the attached patch. I have a working msk0 that hasn't failed for the past month. I consider this problem fixed for me, since I have gone a long time without any problems. Thanks!
>
> Roosevelt

Since 10.2-RC1 it works for me, too; now on 10.2-RELEASE. And I still don't use any patches.

-Alnis
Re: msk msk0 watchdog timeout freeze hang lock stop problem
Hi, So, I can confirm with the attached patch. I have a working msk0 that hasn't failed for the past month. I consider this problem fixed for me, since I have gone a long time without any problems. Thanks!

Roosevelt
Re: msk msk0 watchdog timeout freeze hang lock stop problem
On Sat, Jul 25, 2015 at 02:08:10PM +0300, Alnis Morics wrote:
> Just tried 10.2-RC1 amd64 GENERIC, and the problem seems to be gone. I was even able to scp a 500 MB file. Could it be related to this fix in BETA2, as mentioned in the announcement, "The watchdog(4) device has been fixed to print to the correct buffer."?

msk(4) will show watchdog timeouts when it detects that the driver TX path is stuck, but I believe this has nothing to do with watchdog(4). There was no msk(4) code change in 10.2-RC1. If you happen to see the watchdog timeouts again, please try the attached patch and let me know whether it makes any difference for you. I didn't get much feedback on the patch so I'm not sure whether it really fixes the root cause.

> pciconf -lv
> [..]
> mskc0@pci0:9:0:0: class=0x02 card=0xc072144d chip=0x435411ab rev=0x00 hdr=0x00
>     vendor = 'Marvell Technology Group Ltd.'
>     device = '88E8040 PCI-E Fast Ethernet Controller'
>     class = network
>     subclass = ethernet

Index: sys/dev/msk/if_mskreg.h
===================================================================
--- sys/dev/msk/if_mskreg.h (revision 281587)
+++ sys/dev/msk/if_mskreg.h (working copy)
@@ -2175,13 +2175,8 @@
 #define MSK_ADDR_LO(x) ((uint64_t) (x) & 0xffffffffUL)
 #define MSK_ADDR_HI(x) ((uint64_t) (x) >> 32)
 
-/*
- * At first I guessed 8 bytes, the size of a single descriptor, would be
- * required alignment constraints. But, it seems that Yukon II have 4096
- * bytes boundary alignment constraints.
- */
-#define MSK_RING_ALIGN 4096
-#define MSK_STAT_ALIGN 4096
+#define MSK_RING_ALIGN 32768
+#define MSK_STAT_ALIGN 32768
 
 /* Rx descriptor data structure */
 struct msk_rx_desc {
Re: msk msk0 watchdog timeout freeze hang lock stop problem
On 07/26/2015 01:40 PM, Yonghyeon PYUN wrote:
> On Sat, Jul 25, 2015 at 02:08:10PM +0300, Alnis Morics wrote:
>> Just tried 10.2-RC1 amd64 GENERIC, and the problem seems to be gone. I was even able to scp a 500 MB file. Could it be related to this fix in BETA2, as mentioned in the announcement, "The watchdog(4) device has been fixed to print to the correct buffer."?
>
> msk(4) will show watchdog timeouts when it detects that the driver TX path is stuck, but I believe this has nothing to do with watchdog(4). There was no msk(4) code change in 10.2-RC1. If you happen to see the watchdog timeouts again, please try the attached patch and let me know whether it makes any difference for you. I didn't get much feedback on the patch so I'm not sure whether it really fixes the root cause.
>
>> pciconf -lv
>> [..]
>> mskc0@pci0:9:0:0: class=0x02 card=0xc072144d chip=0x435411ab rev=0x00 hdr=0x00
>>     vendor = 'Marvell Technology Group Ltd.'
>>     device = '88E8040 PCI-E Fast Ethernet Controller'
>>     class = network
>>     subclass = ethernet

Thanks, Pyun. If the watchdog timeouts reappear, I'll try the patch and report the results.

-Alnis
Re: msk msk0 watchdog timeout freeze hang lock stop problem
=197
mskc0: msk_handle_events: sd=0xfe011e23b620 sd->msk_control=1610612806 control=1610612806
mskc0: msk_handle_events: Break #5 cons=196 csrread=197
mskc0: msk_handle_events: Break #5 cons=197 csrread=198
...
mskc0: msk_handle_events: Break #5 cons=510 csrread=511
mskc0: msk_handle_events: Break #5 cons=511 csrread=512
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfe011e23c000 sd->msk_control=0 control=0
...
mskc0: msk_handle_events: Break #1 cons=512 csrread=519
mskc0: msk_handle_events: sd=0xfe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=519
mskc0: msk_handle_events: sd=0xfe011e23c000 sd->msk_control=0 control=0
...etc

From: owner-freebsd-sta...@freebsd.org [owner-freebsd-sta...@freebsd.org] on behalf of Yonghyeon PYUN [pyu...@gmail.com]
Sent: 13 April 2015 09:13
To: Gareth Wyn Roberts
Cc: freebsd-stable@freebsd.org
Subject: Re: msk msk0 watchdog timeout freeze hang lock stop problem

On Sun, Apr 12, 2015 at 05:57:34PM +, Gareth Wyn Roberts wrote:
> I've run into problems using the msk device where initially it works well enough to set DHCP etc. but stops/freezes as soon as any appreciable network traffic occurs. There are several threads describing similar symptoms over the past two years or more. I've been following several false leads but have finally found a solution (at least it solves my problem).
>
> I'm running a standard FreeBSD 10.1-RELEASE and the NIC is detected as:
>
> mskc0: Marvell Yukon 88E8057 Gigabit Ethernet mem 0xfa00-0xfa003fff irq 19 at device 0.0 on pci6
> msk0: Marvell Technology Group Ltd. Yukon Ultra 2 Id 0xba Rev 0x00 on mskc0
> msk0: Ethernet address: 00:13:77:e9:df:eb
> miibus0: MII bus on msk0
> e1000phy0: Marvell 88E1149 Gigabit PHY PHY 0 on miibus0
> e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
>
> The network worked when using the i386 release, but failed for the amd64 release (as reported previously) which prompted me to disable 64-bit DMA (the patch for this is attached below). This worked for the first kernel built but mysteriously failed when another unrelated part of the kernel was changed (a usb driver) and the kernel recompiled. So identical msk driver code worked in one kernel but not the second! This suggested that alignment differences between the two kernels were causing the msk driver to fail. Others have reported varying behaviour depending on different circumstances.
>
> It transpires that changing just one value in the if_mskreg.h file solved all my problems. Subsequently I have not been able to make it fail under heavy network traffic in either 32-bit or 64-bit mode. I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and if_mskreg.h revision 264442.

Thanks for letting me know your findings. I really appreciate that.

I recall that the alignment requirement of status LEs (List Elements in Marvell terms) is 2048 and the maximum size of the status LEs is 4096 bytes (the actual alignment seems to be a much lower value like 32 or 64 bytes, but alignment 2048 is chosen to avoid silicon bugs). Later experiments showed some variants of Yukon II require 4096 bytes alignment and I changed the alignment to 4096 in the past.

It seems your finding indicates msk(4) needs 8192 alignment for status LEs. However this does not explain how and why the same code in 8.x/9.x works well. In addition, it's not common to require alignment size greater than PAGE_SIZE on x86 given that the maximum size of DMA buffer is 4096 bytes. I have to check whether there was a change in bus_dma(9) between 8.x/9.x and 10.x but it needs more time due to lack of spare time.

Probably you can verify the DMA address of status LEs meets the following requirements both on i386 and amd64.
 - Alignment is 4096.
 - Number of DMA segment is 1.
 - DMA segment base address plus DMA segment size does not cross a PAGE_SIZE boundary.

> Here's the patch to if_mskreg.h
>
> --- if_mskreg.h-orig 2014-11-11 20:02:58.0
Re: msk msk0 watchdog timeout freeze hang lock stop problem
On Wed, Apr 15, 2015 at 09:52:09PM +, Gareth Wyn Roberts wrote:
> I've inserted code to print some values which show the differences between specifying 4096 or 8192 for MSK_STAT_ALIGN. In both cases the status buffer has length 0x4000 (8x2048=16K) but the alignments are different as expected, respectively start addresses 0x5c3b000 or 0xbdc2c000.
>
> The following values were output from functions msk_status_dma_alloc(), msk_dmamap_cb() and msk_handle_events(). The Break #n refer to breaks in msk_handle_events(). #1 occurs if ((control & HW_OWNER) == 0), #5 is OP_RXSTAT and #6 is OP_TXINDEXLE.
>
> The first output is for MSK_STAT_ALIGN=8192. It continues normally. Although not shown here, it reaches cons=2047 then cons=0 as expected.
>
> The second output is for MSK_STAT_ALIGN=4096. Although there can be isolated occurrences of Break #1 (e.g. cons=196) (are these to be expected?), it continues normally until cons=512. At this point it continually invokes the #1 block because the msk_control from msk_stat_ring[512] is always zero and the network hangs immediately.
>
> This suggests the Yukon Ultra 2 88E8057 can't access the next 4096 memory block, but why not?

Yes, it seems the status LE block is not updated at all for MSK_STAT_ALIGN == 4096 and some elements of the status block look suspicious (the put index increases but the value in the location is 0). I vaguely guess this indicates there are DMA alignment and/or DMA boundary issues.

The maximum number of elements of the status block is 4096 so the maximum size of the status block is 32KB. For i386, msk(4) uses an 8KB status block (1024 elements). For 64bit architectures, the block size is increased to 16KB (2048 elements). Probably the safe alignment value for the status block would be 32K. This looks like an excessive value to me but it shall avoid guessing the DMA boundary issue.

> Please let me know if any further information would be helpful.

Thanks a lot.
I've attached a diff which sets the alignment of TX/RX ring and status block to 32KB. Not sure whether this also addresses other msk(4) related watchdog timeouts.

Index: sys/dev/msk/if_mskreg.h
===================================================================
--- sys/dev/msk/if_mskreg.h (revision 281587)
+++ sys/dev/msk/if_mskreg.h (working copy)
@@ -2175,13 +2175,8 @@
 #define MSK_ADDR_LO(x) ((uint64_t) (x) & 0xffffffffUL)
 #define MSK_ADDR_HI(x) ((uint64_t) (x) >> 32)
 
-/*
- * At first I guessed 8 bytes, the size of a single descriptor, would be
- * required alignment constraints. But, it seems that Yukon II have 4096
- * bytes boundary alignment constraints.
- */
-#define MSK_RING_ALIGN 4096
-#define MSK_STAT_ALIGN 4096
+#define MSK_RING_ALIGN 32768
+#define MSK_STAT_ALIGN 32768
 
 /* Rx descriptor data structure */
 struct msk_rx_desc {
msk msk0 watchdog timeout freeze hang lock stop problem
Hm... I patched if_msk.c with if_msk.c.rev262524.dma.diff (attachment-001.bin) and if_mskreg.h with if_mskreg.h.rev264442.dma.diff (attachment-002.bin), and nothing changed: scp'ing 50 MB soon got stalled and ended up with broken pipe, as it was before.

I have 10.1-RELEASE-p9 amd64

pciconf -lv:
[..]
mskc0@pci0:9:0:0: class=0x02 card=0xc072144d chip=0x435411ab rev=0x00 hdr=0x00
    vendor = 'Marvell Technology Group Ltd.'
    device = '88E8040 PCI-E Fast Ethernet Controller'
    class = network
    subclass = ethernet

Alnis
Re: msk msk0 watchdog timeout freeze hang lock stop problem
On Sun, Apr 12, 2015 at 05:57:34PM +, Gareth Wyn Roberts wrote:
> I've run into problems using the msk device where initially it works well enough to set DHCP etc. but stops/freezes as soon as any appreciable network traffic occurs. There are several threads describing similar symptoms over the past two years or more. I've been following several false leads but have finally found a solution (at least it solves my problem).
>
> I'm running a standard FreeBSD 10.1-RELEASE and the NIC is detected as:
>
> mskc0: Marvell Yukon 88E8057 Gigabit Ethernet mem 0xfa00-0xfa003fff irq 19 at device 0.0 on pci6
> msk0: Marvell Technology Group Ltd. Yukon Ultra 2 Id 0xba Rev 0x00 on mskc0
> msk0: Ethernet address: 00:13:77:e9:df:eb
> miibus0: MII bus on msk0
> e1000phy0: Marvell 88E1149 Gigabit PHY PHY 0 on miibus0
> e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
>
> The network worked when using the i386 release, but failed for the amd64 release (as reported previously) which prompted me to disable 64-bit DMA (the patch for this is attached below). This worked for the first kernel built but mysteriously failed when another unrelated part of the kernel was changed (a usb driver) and the kernel recompiled. So identical msk driver code worked in one kernel but not the second! This suggested that alignment differences between the two kernels were causing the msk driver to fail. Others have reported varying behaviour depending on different circumstances.
>
> It transpires that changing just one value in the if_mskreg.h file solved all my problems. Subsequently I have not been able to make it fail under heavy network traffic in either 32-bit or 64-bit mode. I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and if_mskreg.h revision 264442.

Thanks for letting me know your findings. I really appreciate that.

I recall that the alignment requirement of status LEs (List Elements in Marvell terms) is 2048 and the maximum size of the status LEs is 4096 bytes (the actual alignment seems to be a much lower value like 32 or 64 bytes, but alignment 2048 is chosen to avoid silicon bugs). Later experiments showed some variants of Yukon II require 4096 bytes alignment and I changed the alignment to 4096 in the past.

It seems your finding indicates msk(4) needs 8192 alignment for status LEs. However this does not explain how and why the same code in 8.x/9.x works well. In addition, it's not common to require alignment size greater than PAGE_SIZE on x86 given that the maximum size of DMA buffer is 4096 bytes. I have to check whether there was a change in bus_dma(9) between 8.x/9.x and 10.x but it needs more time due to lack of spare time.

Probably you can verify the DMA address of status LEs meets the following requirements both on i386 and amd64.
 - Alignment is 4096.
 - Number of DMA segment is 1.
 - DMA segment base address plus DMA segment size does not cross a PAGE_SIZE boundary.

> Here's the patch to if_mskreg.h
>
> --- if_mskreg.h-orig 2014-11-11 20:02:58.0 +
> +++ if_mskreg.h 2015-04-12 18:47:20.0 +0100
> @@ -2179,9 +2179,11 @@
>   * At first I guessed 8 bytes, the size of a single descriptor, would be
>   * required alignment constraints. But, it seems that Yukon II have 4096
>   * bytes boundary alignment constraints.
> + * And it seems that the DMA status region for the Yukon Ultra 2 (88E8057)
> + * requires 8192 byte alignment to prevent locking.
>   */
>  #define MSK_RING_ALIGN 4096
> -#define MSK_STAT_ALIGN 4096
> +#define MSK_STAT_ALIGN 8192
>
> The patches to both files which also implement a MSK_64BIT_DMA_DISABLE flag are attached. Perhaps the developers would consider committing these as it may be useful for future debugging.

If you have more than 4GB memory installed and disable 64bit DMA addressing, msk(4) shall use bounce buffers. Passing packets through bounce buffers involves a copy operation and it costs a lot. You can check the hw.busdma sysctl node to see whether there are drivers that use bounce buffers. And if you want to disable 64bit DMA on 64bit architectures, add '#undef MSK_64BIT_DMA' just below the BUS_SPACE_MAXADDR check in if_mskreg.h.
msk msk0 watchdog timeout freeze hang lock stop problem
I've run into problems using the msk device where initially it works well enough to set DHCP etc. but stops/freezes as soon as any appreciable network traffic occurs. There are several threads describing similar symptoms over the past two years or more. I've been following several false leads but have finally found a solution (at least it solves my problem).

I'm running a standard FreeBSD 10.1-RELEASE and the NIC is detected as:

mskc0: Marvell Yukon 88E8057 Gigabit Ethernet mem 0xfa00-0xfa003fff irq 19 at device 0.0 on pci6
msk0: Marvell Technology Group Ltd. Yukon Ultra 2 Id 0xba Rev 0x00 on mskc0
msk0: Ethernet address: 00:13:77:e9:df:eb
miibus0: MII bus on msk0
e1000phy0: Marvell 88E1149 Gigabit PHY PHY 0 on miibus0
e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow

The network worked when using the i386 release, but failed for the amd64 release (as reported previously) which prompted me to disable 64-bit DMA (the patch for this is attached below). This worked for the first kernel built but mysteriously failed when another unrelated part of the kernel was changed (a usb driver) and the kernel recompiled. So identical msk driver code worked in one kernel but not the second! This suggested that alignment differences between the two kernels were causing the msk driver to fail. Others have reported varying behaviour depending on different circumstances.

It transpires that changing just one value in the if_mskreg.h file solved all my problems. Subsequently I have not been able to make it fail under heavy network traffic in either 32-bit or 64-bit mode. I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and if_mskreg.h revision 264442.

Here's the patch to if_mskreg.h

--- if_mskreg.h-orig 2014-11-11 20:02:58.0 +
+++ if_mskreg.h 2015-04-12 18:47:20.0 +0100
@@ -2179,9 +2179,11 @@
  * At first I guessed 8 bytes, the size of a single descriptor, would be
  * required alignment constraints. But, it seems that Yukon II have 4096
  * bytes boundary alignment constraints.
+ * And it seems that the DMA status region for the Yukon Ultra 2 (88E8057)
+ * requires 8192 byte alignment to prevent locking.
  */
 #define MSK_RING_ALIGN 4096
-#define MSK_STAT_ALIGN 4096
+#define MSK_STAT_ALIGN 8192

The patches to both files which also implement a MSK_64BIT_DMA_DISABLE flag are attached. Perhaps the developers would consider committing these as it may be useful for future debugging.

Gareth.

--- if_mskreg.h-orig 2014-11-11 20:02:58.0 +
+++ if_mskreg.h 2015-04-12 18:47:20.0 +0100
@@ -2179,9 +2179,11 @@
  * At first I guessed 8 bytes, the size of a single descriptor, would be
  * required alignment constraints. But, it seems that Yukon II have 4096
  * bytes boundary alignment constraints.
+ * And it seems that the DMA status region for the Yukon Ultra 2 (88E8057)
+ * requires 8192 byte alignment to prevent locking.
  */
 #define MSK_RING_ALIGN 4096
-#define MSK_STAT_ALIGN 4096
+#define MSK_STAT_ALIGN 8192
 
 /* Rx descriptor data structure */
 struct msk_rx_desc {

--- if_msk.c-orig 2014-11-11 20:02:58.0 +
+++ if_msk.c 2015-04-12 02:15:12.551005000 +0100
@@ -2164,8 +2164,8 @@
  error = bus_dma_tag_create(
      bus_get_dma_tag(sc->msk_dev), /* parent */
      MSK_STAT_ALIGN, 0, /* alignment, boundary */
-     BUS_SPACE_MAXADDR, /* lowaddr */
-     BUS_SPACE_MAXADDR, /* highaddr */
+     BUS_DMA_TAG_LOWADDR, /* lowaddr */
+     BUS_DMA_TAG_HIGHADDR, /* highaddr */
      NULL, NULL, /* filter, filterarg */
      stat_sz, /* maxsize */
      1, /* nsegments */
@@ -2235,8 +2235,8 @@
  error = bus_dma_tag_create(
      bus_get_dma_tag(sc_if->msk_if_dev), /* parent */
      1, 0, /* alignment, boundary */
-     BUS_SPACE_MAXADDR, /* lowaddr */
-     BUS_SPACE_MAXADDR, /* highaddr */
+     BUS_DMA_TAG_LOWADDR, /* lowaddr */
+     BUS_DMA_TAG_HIGHADDR, /* highaddr */
      NULL, NULL, /* filter, filterarg */
      BUS_SPACE_MAXSIZE_32BIT, /* maxsize */
      0, /* nsegments */
@@ -2252,8 +2252,8 @@
  /* Create tag for Tx ring. */
  error = bus_dma_tag_create(sc_if->msk_cdata.msk_parent_tag,/* parent */
      MSK_RING_ALIGN, 0, /* alignment, boundary */
-     BUS_SPACE_MAXADDR, /* lowaddr */
-     BUS_SPACE_MAXADDR, /* highaddr */
+     BUS_DMA_TAG_LOWADDR, /* lowaddr */
+     BUS_DMA_TAG_HIGHADDR, /* highaddr */
      NULL, NULL, /* filter, filterarg */
      MSK_TX_RING_SZ, /* maxsize */
      1, /* nsegments */
@@ -2270,8 +2270,8 @@
  /* Create tag for Rx ring. */
  error = bus_dma_tag_create(sc_if->msk_cdata.msk_parent_tag,/* parent */
      MSK_RING_ALIGN, 0, /* alignment, boundary */
-     BUS_SPACE_MAXADDR, /* lowaddr */
-     BUS_SPACE_MAXADDR, /* highaddr */
+     BUS_DMA_TAG_LOWADDR, /* lowaddr */
+     BUS_DMA_TAG_HIGHADDR, /* highaddr */
      NULL,
Re: msk msk0 watchdog timeout freeze hang lock stop problem
Hi!

> I've run into problems using the msk device
[...]
> I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and if_mskreg.h revision 264442.
>
> Here's the patch to if_mskreg.h
[...]

Thanks for the suggested fix. There are five PRs, all describing similar things:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197887
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197002
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=189404
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=186872
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166727

I added a pointer to your posting to each of them; maybe someone can test it?

-- 
p...@opsec.eu +49 171 3101372 5 years to go !