debugging lack of link events

2005-08-10 Thread Derek Atkins
Dan,

Any suggestions for how to debug the fact that NM isn't receiving
link-up/link-down events from my e1000 after suspend/resume?  I really don't
know enough of the udev/hal/dbus interaction to know where to start looking,
nor do I understand the underlying protocols well enough to know how to insert
myself and watch all the traffic...

Any (1st grade, even) suggestions would be greatly appreciated.  I find it quite
hard to live with losing network after a suspend/resume.

Thanks in advance,

-derek

-- 
   Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
   Member, MIT Student Information Processing Board  (SIPB)
   URL: http://web.mit.edu/warlord/  PP-ASEL-IA  N1NWH
   [EMAIL PROTECTED]  PGP key available
___
NetworkManager-list mailing list
NetworkManager-list@gnome.org
http://mail.gnome.org/mailman/listinfo/networkmanager-list


Re: debugging lack of link events

2005-08-10 Thread Robert Love
On Wed, 2005-08-10 at 11:19 -0400, Derek Atkins wrote:

 Any suggestions for how to debug the fact that NM isn't receiving
 link-up/link-down events from my e1000 after suspend/resume?  I really don't
 know enough of the udev/hal/dbus interaction to know where to start looking,
 nor do I understand the underlying protocols well enough to know how to insert
 myself and watch all the traffic...
 
 Any (1st grade, even) suggestions would be greatly appreciated.  I find it
 quite hard to live with losing network after a suspend/resume.

The link monitoring should be unrelated to HAL/udev/DBUS.

It is done via netlink sockets, at a low level.  See
src/nm-netlink-monitor.c.
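
For anyone who wants to watch that traffic directly, here is a minimal
Python sketch of listening for link events on an rtnetlink socket.  It is
not NM's code (nm-netlink-monitor.c is C and is not reproduced here); the
function names parse_link_messages() and watch_links() are made up for
illustration, and treating IFF_RUNNING as "link is up" is an assumption of
this sketch.  The constants are the values from the Linux headers
<linux/netlink.h>, <linux/rtnetlink.h> and <linux/if.h>.  Linux only.

```python
import socket
import struct

NETLINK_ROUTE = 0      # rtnetlink protocol number
RTMGRP_LINK = 0x1      # multicast group carrying interface/link changes
RTM_NEWLINK = 16       # message type: link created or changed
RTM_DELLINK = 17       # message type: link removed
IFLA_IFNAME = 3        # attribute type carrying the interface name
IFF_RUNNING = 0x40     # interface flag used here as a proxy for "link up"

NLMSG_HDR = struct.Struct("=IHHII")   # len, type, flags, seq, pid
IFINFOMSG = struct.Struct("=BxHiII")  # family, pad, type, index, flags, change
RTATTR = struct.Struct("=HH")         # attribute len, attribute type


def _align4(n):
    """Round n up to the 4-byte boundary netlink uses for padding."""
    return (n + 3) & ~3


def parse_link_messages(data):
    """Return a list of (msg_type, ifindex, ifname, running) tuples."""
    events = []
    offset = 0
    while offset + NLMSG_HDR.size <= len(data):
        msg_len, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, offset)
        end = offset + msg_len
        if msg_len < NLMSG_HDR.size or end > len(data):
            break  # truncated or malformed message; stop parsing
        body = offset + NLMSG_HDR.size
        if msg_type in (RTM_NEWLINK, RTM_DELLINK) and body + IFINFOMSG.size <= end:
            _fam, _dev_type, ifindex, flags, _change = IFINFOMSG.unpack_from(data, body)
            ifname = None
            pos = body + IFINFOMSG.size
            # Walk the attributes that follow the ifinfomsg header.
            while pos + RTATTR.size <= end:
                rta_len, rta_type = RTATTR.unpack_from(data, pos)
                if rta_len < RTATTR.size or pos + rta_len > end:
                    break
                if rta_type == IFLA_IFNAME:
                    raw = data[pos + RTATTR.size:pos + rta_len]
                    ifname = raw.rstrip(b"\0").decode()
                pos += _align4(rta_len)
            events.append((msg_type, ifindex, ifname, bool(flags & IFF_RUNNING)))
        offset += _align4(msg_len)
    return events


def watch_links():
    """Print link events as the kernel delivers them (runs until interrupted)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK))  # pid 0 lets the kernel pick one for us
    while True:
        for event in parse_link_messages(sock.recv(65536)):
            print(event)
```

Running watch_links() before and after a suspend/resume while unplugging
and replugging the cable should show whether the kernel is still sending
link messages to userspace at all.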

Does your wired device work otherwise on return from resume?  It could
very well be that NM is not detecting it properly and the link detection
is just a fallout of that.

Robert Love




Re: debugging lack of link events

2005-08-10 Thread Robert Love
On Wed, 2005-08-10 at 11:36 -0400, Derek Atkins wrote:

 Yep.  I see kernel-log linkup/linkdown messages just fine after a
 suspend/resume..  And if I stop NetworkManager I can ifup eth0 and it works
 just fine.

Is the device still listed in HAL?

 Now that I know where to look and I can see if the netlink socket needs to get
 reset after a resume..

I have an e1000 and it works fine after resume..

Robert Love




Re: debugging lack of link events

2005-08-10 Thread Derek Atkins

Quoting Robert Love [EMAIL PROTECTED]:


 On Wed, 2005-08-10 at 11:36 -0400, Derek Atkins wrote:

  Yep.  I see kernel-log linkup/linkdown messages just fine after a
  suspend/resume..  And if I stop NetworkManager I can ifup eth0 and it works
  just fine.

 Is the device still listed in HAL?

How do I tell?

  Now that I know where to look and I can see if the netlink socket needs to
  get reset after a resume..

 I have an e1000 and it works fine after resume..

What version of NM?  I'm using STABLE_0_3 on FC3.

One reason I'm not sure it's NM is that restarting NM doesn't fix the
problem.  That's why I'm thinking it's HAL or dbus.  But even when I
restart NM it knows there's an eth0 device and initializes it.  But it
doesn't notice link after a suspend/resume.  This is also true if I unload
the driver...  :(

 Robert Love


-derek



Re: debugging lack of link events

2005-08-10 Thread Derek Atkins

Quoting Dan Williams [EMAIL PROTECTED]:


 I'm looking at this too, my e1000 driver seems to take a _really_ long
 time to notify the netlink socket that the device has changed status
 (like 20s or so?).  That's not cool.  I'm attempting to distill a
 testcase that I could send to kernel people, you might be able to use it
 too to narrow the causes down.

Well, I do see the kernel log message almost immediately.  I haven't looked
into the kernel code-path between the link-up printk() and the netlink
message.

Anyways, I need to leave for dinner -- I should be back in several hours to
follow up.

 Dan


-derek




Re: debugging lack of link events

2005-08-10 Thread Dan Williams
On Wed, 2005-08-10 at 11:45 -0400, Derek Atkins wrote:
  Now that I know where to look and I can see if the netlink socket needs to
  get reset after a resume..
 
  I have an e1000 and it works fine after resume..
 
 What version of NM?  I'm using STABLE_0_3 on FC3.

In 0.3, I think we're using HAL to do the netlink stuff; we just get
notifications from HAL that link status changed.  The netlink code is in
HAL, I think in hald/linux/netdevice.c (or something like that), near
the bottom.

Dan




Re: debugging lack of link events

2005-08-10 Thread Robert Love
On Wed, 2005-08-10 at 11:45 -0400, Derek Atkins wrote:

 How do I tell?

$ hal-device | grep eth0

  Now that I know where to look and I can see if the netlink socket needs to
  get reset after a resume..
 
  I have an e1000 and it works fine after resume..
 
 What version of NM?  I'm using STABLE_0_3 on FC3.

Oh, CVS HEAD.

 One reason I'm not sure it's NM is that restarting NM doesn't fix the
 problem.  That's why I'm thinking it's HAL or dbus.  But even when I
 restart NM it knows there's an eth0 device and initializes it.  But it
 doesn't notice link after a suspend/resume.  This is also true if I unload
 the driver...  :(

Yah, it might not be NM... but I don't think it's HAL, because those
things don't do the link monitoring.  If NM can see the device, then all
of that is working.

It could be the driver, getting stuck.  Although I guess it is possible
that NM is not properly handling the link monitoring netlink socket.
But, as I said, it works for me with the same NIC.

Dan, anything change in nm-netlink-monitor.c between STABLE_0_3 and
HEAD?

Robert Love




Re: debugging lack of link events

2005-08-10 Thread Dan Williams
On Wed, 2005-08-10 at 11:47 -0400, Derek Atkins wrote:
 Quoting Dan Williams [EMAIL PROTECTED]:
 
  I'm looking at this too, my e1000 driver seems to take a _really_ long
  time to notify the netlink socket that the device has changed status
  (like 20s or so?).  That's not cool.  I'm attempting to distill a
  testcase that I could send to kernel people, you might be able to use it
  too to narrow the causes down.
 
 Well, I do see the kernel log message almost immediately.  I haven't looked
 into the kernel code-path between the link-up printk() and the netlink
 message.

Well, the kernel logs most likely come from
netif_carrier_on()/netif_carrier_off(), which the driver calls when it
knows that the link is either on or off.  That should queue up the
netlink message, but my guess is that something is getting lost or
delayed between the driver telling the kernel that the link has changed,
and the kernel writing that data to the userspace netlink socket that
HAL listens on.

Dan



Re: debugging lack of link events

2005-08-10 Thread Dan Williams
On Wed, 2005-08-10 at 11:51 -0400, Robert Love wrote:
 On Wed, 2005-08-10 at 11:45 -0400, Derek Atkins wrote:
 Yah, it might not be NM... but I don't think it's HAL, because those
 things don't do the link monitoring.  If NM can see the device, then all
 of that is working.

HAL 0.4x does netlink monitoring; that got moved to NM when David
decided that HAL shouldn't be doing link status stuff.

 It could be the driver, getting stuck.  Although I guess it is possible
 that NM is not properly handling the link monitoring netlink socket.
 But, as I said, it works for me with the same NIC.

I'm trying to distill a testcase that shows this.  I've got one that
works with the GObject stuff, I'm bringing that down to a plain file
descriptor-based testcase now.  So we'll see.  But it's an awfully long
time between when mii-tool shows the link is gone (which is polling of
course) and when the GObject based testcase actually gets notified that
the link is gone (which is listening to the netlink socket).
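
To put a number on that gap, the polling side can be sketched in a few
lines of Python.  This is only an illustration and not the testcase
described above: it polls the sysfs carrier file instead of using
mii-tool, and it assumes a kernel that exposes
/sys/class/net/<iface>/carrier.  The function names read_carrier() and
poll_for_change() are invented for this sketch.

```python
import os
import time


def read_carrier(iface, sysfs_root="/sys/class/net"):
    """Return True/False for carrier up/down, or None if it cannot be read."""
    path = os.path.join(sysfs_root, iface, "carrier")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # No such interface, or the interface is administratively down
        # (the kernel refuses the read in that case).
        return None


def poll_for_change(iface, interval=0.1, timeout=30.0, sysfs_root="/sys/class/net"):
    """Poll until the carrier state differs from its initial value.

    Returns (new_state, monotonic_timestamp) when a change is seen, or
    (initial_state, None) if nothing changed before the timeout.
    """
    initial = read_carrier(iface, sysfs_root)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = read_carrier(iface, sysfs_root)
        if state != initial:
            return state, time.monotonic()
        time.sleep(interval)
    return initial, None
```

Recording time.monotonic() in the same process when the netlink
notification arrives and subtracting the timestamp returned by
poll_for_change("eth0") gives the delay, to within the polling interval.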

 Dan, anything change in nm-netlink-monitor.c between STABLE_0_3 and
 HEAD?

nm-netlink-monitor.c came along with the move to more recent HAL, I
think for HAL 0.5x and dbus 0.3x.  Previously, the netlink code lived in
HAL's linux backend.

Dan

