debugging lack of link events
Dan,

Any suggestions for how to debug the fact that NM isn't receiving link-up/link-down events from my e1000 after suspend/resume? I really don't know enough of the udev/hal/dbus interaction to know where to start looking, nor do I understand the underlying protocols well enough to know how to insert myself and watch all the traffic...

Any (1st grade, even) suggestions would be greatly appreciated. I find it quite hard to live with losing network after a suspend/resume.

Thanks in advance,

-derek

--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
[EMAIL PROTECTED] PGP key available

___
NetworkManager-list mailing list
NetworkManager-list@gnome.org
http://mail.gnome.org/mailman/listinfo/networkmanager-list
Re: debugging lack of link events
On Wed, 2005-08-10 at 11:19 -0400, Derek Atkins wrote:

> Any suggestions for how to debug the fact that NM isn't receiving link-up/link-down events from my e1000 after suspend/resume? I really don't know enough of the udev/hal/dbus interaction to know where to start looking, nor do I understand the underlying protocols well enough to know how to insert myself and watch all the traffic...
>
> Any (1st grade, even) suggestions would be greatly appreciated. I find it quite hard to live with losing network after a suspend/resume.

The link monitoring should be unrelated to HAL/udev/DBUS. It is done via netlink sockets, at a low level. See src/nm-netlink-monitor.c.

Does your wired device work otherwise on return from resume? It could very well be that NM is not detecting it properly and the link detection is just a fallout of that.

Robert Love
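For anyone who wants to watch the link events directly, here is a minimal sketch of a netlink listener in Python. It is not the code in src/nm-netlink-monitor.c; it only assumes the standard Linux rtnetlink interface (the RTMGRP_LINK multicast group and RTM_NEWLINK/RTM_DELLINK messages). The constants are copied from the kernel headers because Python's socket module does not export them, and the names parse_link_messages() and monitor() are made up for this example.

```python
import socket
import struct

# Values from <linux/rtnetlink.h>, <linux/if_link.h> and <linux/if.h>.
RTMGRP_LINK = 0x1      # multicast group that carries link state changes
RTM_NEWLINK = 16
RTM_DELLINK = 17
IFLA_IFNAME = 3
IFF_UP = 0x1
IFF_RUNNING = 0x40     # set while the interface is operational (carrier present)

# Netlink uses host byte order; "=" means native order without padding.
NLMSGHDR = struct.Struct("=IHHII")    # len, type, flags, seq, pid
IFINFOMSG = struct.Struct("=BxHiII")  # family, pad, type, index, flags, change
RTATTR = struct.Struct("=HH")         # len, type


def align4(n):
    """Round n up to the 4-byte alignment netlink uses."""
    return (n + 3) & ~3


def parse_link_messages(data):
    """Yield (msg_type, ifindex, ifname, flags) for each link message in data."""
    offset = 0
    while offset + NLMSGHDR.size <= len(data):
        msg_len, msg_type, _, _, _ = NLMSGHDR.unpack_from(data, offset)
        if msg_len < NLMSGHDR.size or offset + msg_len > len(data):
            break  # truncated or malformed message
        end = offset + msg_len
        body = offset + NLMSGHDR.size
        is_link = msg_type in (RTM_NEWLINK, RTM_DELLINK)
        if is_link and body + IFINFOMSG.size <= end:
            _, _, ifindex, flags, _ = IFINFOMSG.unpack_from(data, body)
            ifname = None
            attr = body + IFINFOMSG.size
            # Walk the attributes that follow ifinfomsg, looking for the name.
            while attr + RTATTR.size <= end:
                rta_len, rta_type = RTATTR.unpack_from(data, attr)
                if rta_len < RTATTR.size or attr + rta_len > end:
                    break
                if rta_type == IFLA_IFNAME:
                    payload = data[attr + RTATTR.size:attr + rta_len]
                    ifname = payload.rstrip(b"\0").decode()
                attr += align4(rta_len)
            yield msg_type, ifindex, ifname, flags
        offset += align4(msg_len)


def monitor():
    """Print link events until interrupted (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK))  # pid 0 lets the kernel pick an address
    while True:
        for _, ifindex, ifname, flags in parse_link_messages(sock.recv(65535)):
            state = "up" if flags & IFF_RUNNING else "down"
            print(f"{ifname or ifindex}: link {state}")
```

Calling monitor() and then unplugging the cable, or suspending and resuming, should show whether the kernel delivers any link messages at all, independent of NM, HAL and dbus.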
Re: debugging lack of link events
On Wed, 2005-08-10 at 11:36 -0400, Derek Atkins wrote:

> Yep. I see kernel-log linkup/linkdown messages just fine after a suspend/resume.. And if I stop NetworkManager I can ifup eth0 and it works just fine.

Is the device still listed in HAL?

> Now that I know where to look and I can see if the netlink socket needs to get reset after a resume..

I have an e1000 and it works fine after resume..

Robert Love
Re: debugging lack of link events
Quoting Robert Love [EMAIL PROTECTED]:

> On Wed, 2005-08-10 at 11:36 -0400, Derek Atkins wrote:
> > Yep. I see kernel-log linkup/linkdown messages just fine after a suspend/resume.. And if I stop NetworkManager I can ifup eth0 and it works just fine.
>
> Is the device still listed in HAL?

How do I tell?

> > Now that I know where to look and I can see if the netlink socket needs to get reset after a resume..
>
> I have an e1000 and it works fine after resume..

What version of NM? I'm using STABLE_0_3 on FC3.

One reason I'm not sure it's NM is that restarting NM doesn't fix the problem. That's why I'm thinking it's HAL or dbus. But even when I restart NM it knows there's an eth0 device and initializes it. But it doesn't notice link after a suspend/resume. This is also true if I unload the driver... :(

> Robert Love

-derek
Re: debugging lack of link events
Quoting Dan Williams [EMAIL PROTECTED]:

> I'm looking at this too, my e1000 driver seems to take a _really_ long time to notify the netlink socket that the device has changed status (like 20s or so?). That's not cool. I'm attempting to distill a testcase that I could send to kernel people, you might be able to use it too to narrow the causes down.

Well, I do see the kernel log message almost immediately. I haven't looked into the kernel code-path between the link-up printk() and the netlink message.

Anyways, I need to leave for dinner -- I should be back in several hours to follow up.

> Dan

-derek
Re: debugging lack of link events
On Wed, 2005-08-10 at 11:45 -0400, Derek Atkins wrote:

> > > Now that I know where to look and I can see if the netlink socket needs to get reset after a resume..
> >
> > I have an e1000 and it works fine after resume..
>
> What version of NM? I'm using STABLE_0_3 on FC3.

In 0.3, I think we're using HAL to do the netlink stuff; we just get notifications from HAL that link status changed. The netlink code is in hal, I think in hald/linux/netdevice.c (or something like that), near the bottom.

Dan
Re: debugging lack of link events
On Wed, 2005-08-10 at 11:45 -0400, Derek Atkins wrote:

> How do I tell?

$ hal-device | grep eth0

> > > Now that I know where to look and I can see if the netlink socket needs to get reset after a resume..
> >
> > I have an e1000 and it works fine after resume..
>
> What version of NM? I'm using STABLE_0_3 on FC3.

Oh, CVS HEAD.

> One reason I'm not sure it's NM is that restarting NM doesn't fix the problem. That's why I'm thinking it's HAL or dbus. But even when I restart NM it knows there's an eth0 device and initializes it. But it doesn't notice link after a suspend/resume. This is also true if I unload the driver... :(

Yah, it might not be NM... but I don't think it's HAL, because those things don't do the link monitoring. If NM can see the device, then all of that is working.

It could be the driver, getting stuck. Although I guess it is possible that NM is not properly handling the link monitoring netlink socket. But, as I said, it works for me with the same NIC.

Dan, anything change in nm-netlink-monitor.c between STABLE_0_3 and HEAD?

Robert Love
Re: debugging lack of link events
On Wed, 2005-08-10 at 11:47 -0400, Derek Atkins wrote:

> Quoting Dan Williams [EMAIL PROTECTED]:
> > I'm looking at this too, my e1000 driver seems to take a _really_ long time to notify the netlink socket that the device has changed status (like 20s or so?). That's not cool. I'm attempting to distill a testcase that I could send to kernel people, you might be able to use it too to narrow the causes down.
>
> Well, I do see the kernel log message almost immediately. I haven't looked into the kernel code-path between the link-up printk() and the netlink message.

Well, the kernel logs most likely come from netif_carrier_on()/netif_carrier_off(), which the driver calls when it knows that the link is either on or off. That should queue up the netlink message, but my guess is that something is getting lost or delayed between the driver telling the kernel that the link has changed, and the kernel writing that data to the userspace netlink socket that HAL listens on.

Dan
Re: debugging lack of link events
On Wed, 2005-08-10 at 11:51 -0400, Robert Love wrote:

> On Wed, 2005-08-10 at 11:45 -0400, Derek Atkins wrote:
>
> Yah, it might not be NM... but I don't think it's HAL, because those things don't do the link monitoring. If NM can see the device, then all of that is working.

Hal 0.4x does netlink monitoring; that got moved to NM when David decided that HAL shouldn't be doing link status stuff.

> It could be the driver, getting stuck. Although I guess it is possible that NM is not properly handling the link monitoring netlink socket. But, as I said, it works for me with the same NIC.

I'm trying to distill a testcase that shows this. I've got one that works with the GObject stuff, I'm bringing that down to a plain file descriptor-based testcase now. So we'll see. But it's an awfully long time between when mii-tool shows the link is gone (which is polling of course) and when the GObject based testcase actually gets notified that the link is gone (which is listening to the netlink socket).

> Dan, anything change in nm-netlink-monitor.c between STABLE_0_3 and HEAD?

nm-netlink-monitor.c came along with the move to more recent HAL, I think for HAL 0.5x and dbus 0.3x. Previously, the netlink code lived in HAL's linux backend.

Dan
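The polling-versus-notification gap described above could be measured with something like the following. This is only a rough sketch, not the testcase mentioned in the thread: it polls /sys/class/net/<iface>/carrier in place of mii-tool (the presence of that sysfs file is an assumption about the running kernel), it treats any message on the RTMGRP_LINK group as the notification without checking which interface it is for, and LagTracker, read_carrier() and measure() are names invented for the example.

```python
import select
import socket
import time

RTMGRP_LINK = 0x1  # from <linux/rtnetlink.h>


class LagTracker:
    """Pair a carrier change seen by polling with the next netlink message."""

    def __init__(self):
        self.polled_at = None

    def polled_change(self, now):
        # Remember only the first change that has not been matched yet.
        if self.polled_at is None:
            self.polled_at = now

    def notified(self, now):
        # Return the lag in seconds, or None if polling has seen no change.
        if self.polled_at is None:
            return None
        lag = now - self.polled_at
        self.polled_at = None
        return lag


def read_carrier(iface):
    """Return the sysfs carrier value ("1" or "0"), or None if unreadable."""
    try:
        with open(f"/sys/class/net/{iface}/carrier") as f:
            return f.read().strip()
    except OSError:
        return None


def measure(iface="eth0", poll_interval=0.1):
    """Print how long netlink lags behind polling, until interrupted."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK))
    tracker = LagTracker()
    last = read_carrier(iface)
    while True:
        # Wake up on a netlink message or after poll_interval, whichever is first.
        readable, _, _ = select.select([sock], [], [], poll_interval)
        now = time.monotonic()
        current = read_carrier(iface)
        if current != last:
            print(f"poll: carrier {last} -> {current}")
            tracker.polled_change(now)
            last = current
        if readable:
            sock.recv(65535)  # drain the message; contents are not inspected
            lag = tracker.notified(now)
            if lag is not None:
                print(f"netlink: notified {lag:.2f}s after the poll saw the change")
```

If the netlink message arrives promptly, the reported lag should be close to zero; a lag on the order of the 20s mentioned earlier would point at the path between the driver and the netlink socket rather than at NM.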