Re: Linux 2.6.19.2: maybe a bug inside the r8169 network driver (was Re: Linux 2.6.19.2: Freeze with CIFS mount)
Just to alert potential readers: the bug is now being discussed here:
http://bugzilla.kernel.org/show_bug.cgi?id=8143

Eric Lacombe
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Linux 2.6.19.2: maybe a bug inside the r8169 network driver (was Re: Linux 2.6.19.2: Freeze with CIFS mount)
Hello,

I've just triggered the bug again, but _now_ without the nvidia proprietary
module. Unfortunately, I hadn't enabled the DEBUG options yet.

Nevertheless, it seems that the bug was triggered while the NAS was waking up
and two user applications wanted to access it during its wake-up. Maybe that
gives you some clues about where the problem could be (it looks like a
deadlock that occurs when (maybe) the r8169 driver is waiting to serve one
application and another one requests the same kind of service, dunno...).

I will give you more information and a trace if I can obtain one.

Thanks
Eric Lacombe

On Monday 05 March 2007 23:16:40 Francois Romieu wrote:
> Eric Lacombe <[EMAIL PROTECTED]> :
> [...]
>
> > Also, if you have some new ideas about the problem or what I could try to
> > trigger it more frequently (I already wake up the NAS as often as I can,
> > but maybe I could write a script to do that), I would be thankful.
>
> You can add more DEBUG options for spinlock and stack usage, disable
> preempt and pray for a trace before the crash. If you do not have a
> second host to add more traffic (wrt to bandwidth and/or pps), try to
> dd your disk and/or your remote storage to /dev/null while watching TV.
>
> You are out of luck if it does not crash more easily.
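Francois's suggestion to dd the remote storage to /dev/null, combined with Eric's observation that the freeze happened while two applications accessed the waking NAS, could be scripted roughly as below. This is a minimal shell sketch, not something posted in the thread: the function name `stress_read` is hypothetical, and it assumes the CIFS share is already mounted and holds a few large files (the `/mnt/nas/...` paths are placeholders).

```sh
#!/bin/sh
# Hypothetical stress helper (not from the thread): read each FILE with
# NJOBS concurrent dd processes, discarding the data, so that files living
# on the CIFS mount generate sustained network traffic.
# Usage: stress_read NJOBS FILE...
stress_read() {
    njobs=$1
    shift
    pids=""
    for f in "$@"; do
        i=0
        while [ "$i" -lt "$njobs" ]; do
            # Each reader runs in the background; the data goes to /dev/null.
            dd if="$f" of=/dev/null bs=1M 2>/dev/null &
            pids="$pids $!"
            i=$((i + 1))
        done
    done
    # Wait for every reader; report failure if any of them failed.
    status=0
    for p in $pids; do
        wait "$p" || status=1
    done
    return "$status"
}

# Only run when called as a script with arguments,
# e.g. ./stress-read.sh 1 /mnt/nas/a.avi /mnt/nas/b.avi
if [ "$#" -ge 2 ]; then
    stress_read "$@"
fi
```

With one reader on each of two files, this roughly mimics two applications hitting the share at the same moment. Running it from a loop with a long `sleep` between iterations would give the NAS time to spin down again, assuming its idle timeout is shorter than the sleep. Files read recently may be served from the client's page cache rather than the network, so different or large files are likely to work better.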
Re: Linux 2.6.19.2: maybe a bug inside the r8169 network driver (was Re: Linux 2.6.19.2: Freeze with CIFS mount)
Hello,

Sorry for the long time without new information about the problem/bug, but
I wasn't able to trigger it until now.

Well, this time my system froze on 2.6.20.1. I tried to use the magic sysrq
(which works fine while my system is running), but nothing happened (unraw,
reboot, etc.).

Also, since I was using the nvidia proprietary module when I reported this
problem, I haven't opened a PR at bugzilla.kernel.org.

For 5 days, my system on a fresh 2.6.20.1 didn't freeze without the nvidia
proprietary driver (just the nv driver from X). So I thought it was OK and I
modprobed the nvidia driver (because with nv I have some minor display
problems). Today, after 7 days of good behavior, my system froze again.
So now I'm checking again whether it can crash without the nvidia driver...
G..

I will also do what you recommend here:
> > 6. You may setup a cron to monitor an ethtool dump of the register of
> >    the 8169 at regular interval. ifconfig and /proc/interrupts could
> >    exhibit some unusual drift as time passes on too.

I've attached the dmesg output as requested. Feel free to ask for more
information if needed. (The cifs log looks like before, so I haven't
attached it.)

Also, if you have some new ideas about the problem or what I could try to
trigger it more frequently (I already wake up the NAS as often as I can,
but maybe I could write a script to do that), I would be thankful.

Best regards,
Eric Lacombe

On Wednesday 14 February 2007 11:49:52 Eric Lacombe wrote:
> On Tuesday 13 February 2007 21:30:47 Francois Romieu wrote:
> > Eric Lacombe <[EMAIL PROTECTED]> :
> > [...]
> > > That problem also reminds me that when I compiled this driver without
> > > "CONFIG_NET_ETHERNET" (in the section "Ethernet (10 or 100Mbit)"),
> > > I got really poor performance with the net device. Maybe it is
> > > related, or not ;)
> > >
> > > Does it give you more ideas?
> > > Maybe it would be worth letting the r8169 maintainer know, but I
> > > don't know who he is.
> >
> > 1. $ ls
> >    arch     crypto         include  kernel       mm              scripts
> >    block    Documentation  init     lib          net             security
> >    COPYING  drivers        ipc      MAINTAINERS  README          sound
> >    CREDITS  fs             Kbuild   Makefile     REPORTING-BUGS  usr
> >
> >    The maintainer of the r8169 driver is listed in the MAINTAINERS file.
>
> I see, thanks ;)
> (I thought the MAINTAINERS file was not fully maintained ;)
>
> > 2. Disabling CONFIG_NET_ETHERNET is a bad idea. Don't do that.
>
> OK, but why is it only in the "ethernet 100" menu?
> Isn't that misleading?
>
> > 3. See tethereal -w or tcpdump on the adequate interface to save a
> >    traffic dump.
>
> Yep, but the problem is that I can't do that from the NAS box. I will try
> to monitor the traffic from the system that freezes... For the moment I
> can't monitor the network traffic from another PC, but I will soon.
>
> > 4. Are you using a binary module for your video adapter ?
>
> Yes, I suppose I have to unload it before doing further tests.
>
> > 5. How does the 2.6.20 version of the r8169 driver behave ?
>
> I haven't installed it yet, but I'll do it this evening.
>
> > 6. You may setup a cron to monitor an ethtool dump of the register of
> >    the 8169 at regular interval. ifconfig and /proc/interrupts could
> >    exhibit some unusual drift as time passes on too.
>
> I will do that. When I can add a third system to monitor the traffic, I
> will make "the system that freezes" keep sending that information to it.
>
> > 7. A dmesg would be welcome.
>
> I can do that this evening.
>
> > 8. Please open a PR at bugzilla.kernel.org.
>
> OK.
>
> > [...]
> > > > There are various ways to analyze system hangs including (at least
> > > > in some cases) getting a system dump which can be used to isolate
> > > > the failing location - hopefully
> > >
> > > I would like to have more detailed help, if possible.
> >
> > CONFIG_MAGIC_SYSRQ is set. Check that the magic sysrq is not disabled at
> > runtime through /etc/sysctl.conf. See Documentation/sysrq.txt for
> > details.
>
> OK.
>
> > Please keep Steve French in the loop.
>
> OK.
>
> Thanks for your response ;)
>
> Eric
Linux version 2.6.20.1 ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1-r3)) #16 SMP PREEMPT Wed Feb 21 01:32:51 CET 2007
Command line: root=/dev/sda3 vga=791
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
 BIOS-e820: 000000000009ec00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ffa0000 (usable)
 BIOS-e820: 000000007ffa0000 - 000000007ffae000 (ACPI data)
 BIOS-e820: 000000007ffae000 - 000000007ffe0000 (ACPI NVS)
 BIOS-e820: 000000007ffe0000 - 0000000080000000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
Entering add_active_range(0, 0, 158) 0 entries of 256 used
Entering add_active_range(0, 256, 524192) 1 entries of 256 used
end_pfn_map
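Francois's suggestion 6 (a cron job that periodically records the 8169 register dump, the ifconfig output and /proc/interrupts) might look like the following shell sketch. The interface name `eth0`, the log path and the `--run` switch are assumptions, not details from the thread; `ethtool -d` prints the NIC register dump and normally needs root.

```sh
#!/bin/sh
# Hypothetical periodic snapshot for suggestion 6: register dump, interface
# counters and interrupt counts, appended to a log file.
# IFACE and LOG defaults are assumptions; override them in the environment.
IFACE=${IFACE:-eth0}
LOG=${LOG:-/var/log/r8169-snapshot.log}

snapshot() {
    iface=$1
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') $iface ==="
    echo "--- ethtool -d $iface ---"
    ethtool -d "$iface" 2>&1 || echo "(ethtool -d failed)"
    echo "--- ifconfig $iface ---"
    ifconfig "$iface" 2>&1 || echo "(ifconfig failed)"
    echo "--- /proc/interrupts ---"
    cat /proc/interrupts 2>&1 || echo "(cannot read /proc/interrupts)"
}

# Only write to the log when invoked with --run (e.g. from cron).
if [ "${1:-}" = "--run" ]; then
    snapshot "$IFACE" >> "$LOG"
    # Flush dirty data so the latest snapshot has a chance to survive a freeze.
    sync
fi
```

A crontab entry such as `*/5 * * * * /usr/local/sbin/r8169-snapshot.sh --run` (script path hypothetical) would take a snapshot every five minutes. Comparing consecutive snapshots of the r8169 line in /proc/interrupts and the ifconfig error counters is one way to look for the drift Francois mentions. The `sync` only makes it more likely that the last snapshot reaches the disk before a hard freeze; nothing guarantees it.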
Re: Linux 2.6.19.2: maybe a bug inside the r8169 network driver (was Re: Linux 2.6.19.2: Freeze with CIFS mount)
On Tuesday 13 February 2007 21:30:47 Francois Romieu wrote:
> Eric Lacombe <[EMAIL PROTECTED]> :
> [...]
> > That problem also reminds me that when I compiled this driver without
> > "CONFIG_NET_ETHERNET" (in the section "Ethernet (10 or 100Mbit)"), I
> > got really poor performance with the net device. Maybe it is related,
> > or not ;)
> >
> > Does it give you more ideas?
> > Maybe it would be worth letting the r8169 maintainer know, but I
> > don't know who he is.
>
> 1. $ ls
>    arch     crypto         include  kernel       mm              scripts
>    block    Documentation  init     lib          net             security
>    COPYING  drivers        ipc      MAINTAINERS  README          sound
>    CREDITS  fs             Kbuild   Makefile     REPORTING-BUGS  usr
>
>    The maintainer of the r8169 driver is listed in the MAINTAINERS file.

I see, thanks ;)
(I thought the MAINTAINERS file was not fully maintained ;)

> 2. Disabling CONFIG_NET_ETHERNET is a bad idea. Don't do that.

OK, but why is it only in the "ethernet 100" menu?
Isn't that misleading?

> 3. See tethereal -w or tcpdump on the adequate interface to save a
>    traffic dump.

Yep, but the problem is that I can't do that from the NAS box. I will try to
monitor the traffic from the system that freezes... For the moment I can't
monitor the network traffic from another PC, but I will soon.

> 4. Are you using a binary module for your video adapter ?

Yes, I suppose I have to unload it before doing further tests.

> 5. How does the 2.6.20 version of the r8169 driver behave ?

I haven't installed it yet, but I'll do it this evening.

> 6. You may setup a cron to monitor an ethtool dump of the register of
>    the 8169 at regular interval. ifconfig and /proc/interrupts could
>    exhibit some unusual drift as time passes on too.

I will do that. When I can add a third system to monitor the traffic, I
will make "the system that freezes" keep sending that information to it.

> 7. A dmesg would be welcome.

I can do that this evening.

> 8. Please open a PR at bugzilla.kernel.org.

OK.

> [...]
> > > There are various ways to analyze system hangs including (at least in
> > > some cases) getting a system dump which can be used to isolate the
> > > failing location - hopefully
> >
> > I would like to have more detailed help, if possible.
>
> CONFIG_MAGIC_SYSRQ is set. Check that the magic sysrq is not disabled at
> runtime through /etc/sysctl.conf. See Documentation/sysrq.txt for details.

OK.

> Please keep Steve French in the loop.

OK.

Thanks for your response ;)

Eric
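Francois's advice to check that magic SysRq is not disabled at runtime comes down to reading /proc/sys/kernel/sysrq and looking for a `kernel.sysrq` line in /etc/sysctl.conf. The following shell sketch is only an illustration: the function name `sysrq_state` is hypothetical, and the reading of values above 1 as a bitmask should be confirmed against Documentation/sysrq.txt for the kernel actually running.

```sh
#!/bin/sh
# Hypothetical helper: describe a kernel.sysrq value.
# 0 disables SysRq, 1 enables every function; on kernels that support it,
# a larger value is a bitmask of allowed functions (see Documentation/sysrq.txt).
sysrq_state() {
    case $1 in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        ''|*[!0-9]*) echo "unknown" ;;
        *) echo "restricted (mask $1)" ;;
    esac
}

# Current runtime value.
if [ -r /proc/sys/kernel/sysrq ]; then
    echo "runtime: $(sysrq_state "$(cat /proc/sys/kernel/sysrq)")"
fi

# A "kernel.sysrq = 0" line here would typically disable SysRq again
# when /etc/sysctl.conf is applied at boot.
grep -s '^[[:space:]]*kernel\.sysrq' /etc/sysctl.conf ||
    echo "no kernel.sysrq entry in /etc/sysctl.conf"
```

Eric later reports that SysRq works while the system is running, so a check like this only rules out a runtime disable; it cannot make SysRq respond once the machine is hard-locked.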
Re: Linux 2.6.19.2: Freeze with CIFS mount
First of all, thank you for your answer.

On Wednesday 31 January 2007 04:51:35 Steven French wrote:
> The cifs entries in the dmesg log do not indicate any errors, much less
> show the cause of this particular problem.

OK.

> The repeated entry:
> CIFS VFS: Send error in SETFSUnixInfo = -5
> is expected on connection to certain older versions of Samba servers (or
> other servers that only partially support the current CIFS Unix
> Extensions). It is harmless.
>
> It would be useful to know (e.g. if it is possible to trace the network
> traffic on the server side on your NAS box) whether any network traffic
> from the client is being sent when (or just before) the hang occurs.

Unfortunately, I can't easily do anything on the NAS box beyond what it was
designed for. (An interesting project exists for the first version of the
Maxtor Shared Storage, openmss, but that one is based on totally different
hardware.)

> It is possible that the restarting of the NAS box allows reconnection of
> the smb/cifs session to proceed which presumably could be hanging or
> looping in the network adapter driver, the tcp stack or cifs on the
> client, but it is hard to tell without more information. I don't know
> much about either of the GigE drivers loaded on your system to determine
> if there is an easy way to tell their state.

The network device I use is "D-Link System Inc DGE-528T Gigabit Ethernet
Adapter (rev 10)", and the driver is the one for the "Realtek 8169 PCI
Gigabit Ethernet adapter" (in 2.6.19.2), which is the only one that
recognizes this device.

That problem also reminds me that when I compiled this driver without
"CONFIG_NET_ETHERNET" (in the section "Ethernet (10 or 100Mbit)"), I got
really poor performance with the net device. Maybe it is related, or not ;)

Does it give you more ideas?
Maybe it would be worth letting the r8169 maintainer know, but I don't know
who he is.

> There are various ways to analyze system hangs including (at least in some
> cases) getting a system dump which can be used to isolate the failing
> location - hopefully

Could you give me some useful URLs?

Thank you again.
Eric

>
> [EMAIL PROTECTED] wrote on 01/30/2007 06:37:48 AM:
> [...]
> Steve French
> Senior Software Engineer
> Linux Technology Center - IBM Austin
> phone: 512-838-2294
> email: sfrench at-sign us dot ibm dot com
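Since the NAS box cannot run a capture, Steve's question (is the client still sending anything just before the hang?) could be approached from the client itself, along the lines of Francois's tcpdump suggestion. The shell sketch below is an illustration under assumptions: the function name, the `DRY_RUN` variable, the interface and the output path are hypothetical, and the filter assumes the share uses the standard SMB ports 139 and 445.

```sh
#!/bin/sh
# Hypothetical client-side capture of SMB/CIFS traffic (TCP ports 139/445),
# run as root on the machine that freezes.
#   -s 0    capture whole packets (older tcpdump defaults to a short snaplen)
#   -U      packet-buffered output: each packet is written as it arrives
#   -C 100  start a new file after about 100 million bytes
#   -W 10   keep at most 10 files, then overwrite the oldest
# Set DRY_RUN to print the command instead of running it.
capture_cifs() {
    iface=$1
    out=$2
    set -- tcpdump -i "$iface" -s 0 -U -C 100 -W 10 -w "$out" \
        'tcp port 445 or tcp port 139'
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# e.g. ./capture-cifs.sh eth0 /var/tmp/cifs.pcap
if [ "$#" -eq 2 ]; then
    capture_cifs "$1" "$2"
fi
```

When rotating, tcpdump appends a number to each file name. Because the capture is written on the machine that freezes, the last packets may never reach the disk; a capture taken on a third host, as Eric plans, would be more trustworthy.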