Re: nfe0 problem (obsd 4.1)

2007-06-28 Thread Markus Ritzer

Hi!


I've noticed that once in a while the nfe0 interface will stop
sending and receiving data.  At this point I can not make it work
again.  The only solution I have is to reboot the box.  I have
installed a dc0 card in the box since.  The problem seemed
intermittent and not reliably reproducible.

I had problems like these when I ported OpenBSD to the Xbox
(http://tobias.schroepf.de/doku/doku.php?id=xbox:porting_openbsd_to_the_xbox).


You can find the patches I have made here:
http://tobias.schroepf.de/doku/doku.php?id=xbox:patch_the_openbsd_sources_network


But I don't know if this will solve your problem.



Markus Ritzer



Re: nfe0 problem (obsd 4.1)

2007-06-27 Thread Vijay Sankar
On Wednesday 27 June 2007 10:50, Tony Lambiris wrote:
 You might be interested in some unofficial patches I had created when
 experiencing the same thing. I hadn't officially released these
 because of the awful DELAY() timeout hack taken from the original nfe
 code from DragonFly BSD. Most of the updates were taken from NetBSD.
 Either way, what you would be interested in is the encap_delay stuff,
 specifically the part in nfe.c where it actually assigns the
 variable:

   case PCI_PRODUCT_NVIDIA_CK804_LAN1:
   case PCI_PRODUCT_NVIDIA_CK804_LAN2:
 + sc->sc_encap_delay = 10;
 + break;

 You would obviously have to locate where your interface matches and
 assign it there. For me, my interface is a CK804. Not sure if it was
 LAN1 or LAN2, but I assigned the delay to both anyway.

 These patches seemed to work good for me, didn't experience any
 timeouts, YMMV. Let me know if this works. These will apply cleanly
 against 4.1-RELEASE.

I downloaded your patches and would like to try it out. Thanks very 
much. Because I don't know what I am doing here, I need a bit more 
help. How can I find out whether my interface is also a CK804?

scanpci -v gave me the following:

pci bus 0x cardnum 0x08 function 0x00: vendor 0x10de device 0x0373
 nVidia Corporation MCP55 Ethernet
 CardVendor 0x1043 card 0x8239 (ASUSTeK Computer Inc., Card unknown)
  STATUS    0x00b0  COMMAND 0x0007
  CLASS 0x06 0x80 0x00  REVISION 0xa2
  BIST  0x00  HEADER 0x00  LATENCY 0x00  CACHE 0x00
  BASE0 0xfe02a000  addr 0xfe02a000  MEM
  BASE1 0xb001  addr 0xb000  I/O
  BASE2 0xfe029000  addr 0xfe029000  MEM
  BASE3 0xfe028000  addr 0xfe028000  MEM
  MAX_LAT   0x14  MIN_GNT 0x01  INT_PIN 0x01  INT_LINE 0x0a
  BYTE_0    0x43  BYTE_1  0x10  BYTE_2  0x39  BYTE_3  0x82

pci bus 0x cardnum 0x09 function 0x00: vendor 0x10de device 0x0373
 nVidia Corporation MCP55 Ethernet
 CardVendor 0x1043 card 0x8239 (ASUSTeK Computer Inc., Card unknown)
  STATUS    0x00b0  COMMAND 0x0007
  CLASS 0x06 0x80 0x00  REVISION 0xa2
  BIST  0x00  HEADER 0x00  LATENCY 0x00  CACHE 0x00
  BASE0 0xfe027000  addr 0xfe027000  MEM
  BASE1 0xac01  addr 0xac00  I/O
  BASE2 0xfe026000  addr 0xfe026000  MEM
  BASE3 0xfe025000  addr 0xfe025000  MEM
  MAX_LAT   0x14  MIN_GNT 0x01  INT_PIN 0x01  INT_LINE 0x0a
  BYTE_0    0x43  BYTE_1  0x10  BYTE_2  0x39  BYTE_3  0x82

dmesg shows

nfe0 at pci0 dev 8 function 0 NVIDIA MCP55 LAN rev 0xa2: irq 10, 
address 00:17:31:cb:ee:d1
eephy0 at nfe0 phy 1: Marvell 88E1116 Gigabit PHY, rev. 1

nfe1 at pci0 dev 9 function 0 NVIDIA MCP55 LAN rev 0xa2: irq 10, 
address 00:17:31:cb:dd:7a
eephy1 at nfe1 phy 1: Marvell 88E1116 Gigabit PHY, rev. 1
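
The dmesg attach lines above already name the chipset (MCP55 here, not
CK804). A minimal sketch for pulling that name out of dmesg-style text
follows. The helper name nfe_chips is made up for illustration, and the
pattern assumes the attach-line layout shown above, with or without
double quotes around the product name.

```shell
#!/bin/sh
# Sketch: print "<ifname>: <chip>" for each nfe(4) attach line on stdin.
# nfe_chips is a hypothetical helper, not part of OpenBSD.
nfe_chips() {
    # Match "nfeN at pciN dev N function N <product> rev ..." and keep
    # the interface name and the product string (quotes are optional).
    sed -n 's/^\(nfe[0-9][0-9]*\) at pci[0-9][0-9]* dev [0-9][0-9]* function [0-9][0-9]* "*\([^"]*[^" ]\)"* rev .*$/\1: \2/p'
}

# Typical use on a live system:
#   dmesg | nfe_chips
```

The product string then tells you which PCI_PRODUCT_NVIDIA_* case label
to look for in the driver source.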




 http://lysergik.com/~tony/openbsd/

 On 6/25/07, patrick keshishian [EMAIL PROTECTED] wrote:
  On 6/24/07, Vijay Sankar [EMAIL PROTECTED] wrote:
   On Sunday 24 June 2007 13:50, patrick keshishian wrote:
Hi,
   
I've been noticing some strange problems with the built-in nfe0
interface on my desktop.  Actually I've seen it on two such
computers, but the description below is for my current desktop
PC.
   
The PC is running `cvs up -dP -rOPENBSD_4_1' built. I'm
including netstat, ifconfig output[1] and dmesg below[2].
   
I've noticed that once in a while the nfe0 interface will stop
sending and receiving data.  At this point I can not make it
work again.  The only solution I have is to reboot the box.  I
have installed a dc0 card in the box since.  The problem seemed
intermittent and not reliably reproducible.  But I think I
found a way to reproduce this problem on demand (at least for
the time being).  I have an ssh session to another box, on
which I run '/usr/bin/nm somelib.so'.  After a page or two of
output the terminal hangs.  At this point nfe0 becomes
unresponsive.
   
I switch to the dc0 interface and the terminal finishes the
output. Running the nm command while using the dc0 interface
doesn't cause any problems.
  
   I experienced similar problems last year and can empathize.
  
   The following items improved my situation somewhat:
  
   1) BIOS upgrade
   2) Removing dual boot (I had both OpenBSD and Windows 2003 on one
   machine. There were more errors if I did not power off after
   shutting down Windows 2003 and just did a restart from within
   Windows. If I did not unplug the machine after shutting down
   Windows, most of the time I saw watchdog timeouts but if I
   powered off the host, and then powered it back on, there were
   fewer errors)
 
  Both boxes I have run solely OpenBSD.
 
 
  One thing that I did notice was that after switching to the dc0
  interface for a short while (5 min or so?), I could switch back
  to the nfe0 and it would start responding again. Basically:
 
  # /sbin/ifconfig dc0 delete
  # /sbin/route delete default
  # /sbin/ifconfig nfe0 inet IP netmask netmask up
  # /sbin/route add 

Re: nfe0 problem (obsd 4.1)

2007-06-27 Thread Tony Lambiris

You might be interested in some unofficial patches I had created when
experiencing the same thing. I hadn't officially released these
because of the awful DELAY() timeout hack taken from the original nfe
code from DragonFly BSD. Most of the updates were taken from NetBSD.
Either way, what you would be interested in is the encap_delay stuff,
specifically the part in nfe.c where it actually assigns the variable:

case PCI_PRODUCT_NVIDIA_CK804_LAN1:
case PCI_PRODUCT_NVIDIA_CK804_LAN2:
+   sc->sc_encap_delay = 10;
+   break;

You would obviously have to locate where your interface matches and
assign it there. For me, my interface is a CK804. Not sure if it was
LAN1 or LAN2, but I assigned the delay to both anyway.

These patches seemed to work good for me, didn't experience any
timeouts, YMMV. Let me know if this works. These will apply cleanly
against 4.1-RELEASE.

http://lysergik.com/~tony/openbsd/

On 6/25/07, patrick keshishian [EMAIL PROTECTED] wrote:

On 6/24/07, Vijay Sankar [EMAIL PROTECTED] wrote:
 On Sunday 24 June 2007 13:50, patrick keshishian wrote:
  Hi,
 
  I've been noticing some strange problems with the built-in nfe0
  interface on my desktop.  Actually I've seen it on two such
  computers, but the description below is for my current desktop PC.
 
  The PC is running `cvs up -dP -rOPENBSD_4_1' built. I'm including
  netstat, ifconfig output[1] and dmesg below[2].
 
  I've noticed that once in a while the nfe0 interface will stop
  sending and receiving data.  At this point I can not make it work
  again.  The only solution I have is to reboot the box.  I have
  installed a dc0 card in the box since.  The problem seemed
  intermittent and not reliably reproducible.  But I think I found
  a way to reproduce this problem on demand (at least for the time
  being).  I have an ssh session to another box, on which I run
  '/usr/bin/nm somelib.so'.  After a page or two of output the
  terminal hangs.  At this point nfe0 becomes unresponsive.
 
  I switch to the dc0 interface and the terminal finishes the output.
  Running the nm command while using the dc0 interface doesn't cause
  any problems.

 I experienced similar problems last year and can empathize.

 The following items improved my situation somewhat:

 1) BIOS upgrade
 2) Removing dual boot (I had both OpenBSD and Windows 2003 on one
 machine. There were more errors if I did not power off after shutting
 down Windows 2003 and just did a restart from within Windows. If I did
 not unplug the machine after shutting down Windows, most of the time I
 saw watchdog timeouts but if I powered off the host, and then powered
 it back on, there were fewer errors)

Both boxes I have run solely OpenBSD.


One thing that I did notice was that after switching to the dc0
interface for a short while (5 min or so?), I could switch back
to the nfe0 and it would start responding again. Basically:

# /sbin/ifconfig dc0 delete
# /sbin/route delete default
# /sbin/ifconfig nfe0 inet IP netmask netmask up
# /sbin/route add default gateway

Therefore, a reboot isn't the only way to fix the problem (reset
the interface) as I had previously thought.  I am not sure exactly
what causes the interface to reset: idle time, no carrier, or
something completely random?


Either way, thanks for all the replies!



 I experimented with different combinations and different switches
 (10/100/1000, 10/100, and 10-Base-T). When all the hosts connected to a
 10/100 switch were running at 100 Mb/s then changing nfe0 from
 autoselect to full-duplex using

 ifconfig nfe0 media 100baseTX mediaopt full-duplex

 seemed to eliminate nfe0 hangs as well as timeouts completely. I am not
 sure whether this has any rational basis or is specific to some weird
 situation in my network, but that has been my experience.

 Vijay


 
  Interestingly enough, if I redirect the output of nm to a file
  and subsequently cat the file the nfe0 interface doesn't seem
  to exhibit the same problem.
 
  I am not sure how to diagnose this problem further.  I've enabled
  debug on the nfe0 interface (/sbin/ifconfig nfe0 debug), but don't
  see any output.
 
  Any and all suggestions are welcome.
  --patrick




Re: nfe0 problem (obsd 4.1)

2007-06-27 Thread Tony Lambiris

After applying the patches, open if_nfe.c and, after line 244
(PCI_PRODUCT_NVIDIA_MCP55_LAN2), add:

    sc->sc_encap_delay = 10;

On 6/27/07, Vijay Sankar [EMAIL PROTECTED] wrote:

On Wednesday 27 June 2007 10:50, Tony Lambiris wrote:
 You might be interested in some unofficial patches I had created when
 experiencing the same thing. I hadn't officially released these
 because of the awful DELAY() timeout hack taken from the original nfe
 code from DragonFly BSD. Most of the updates were taken from NetBSD.
 Either way, what you would be interested in is the encap_delay stuff,
 specifically the part in nfe.c where it actually assigns the
 variable:

   case PCI_PRODUCT_NVIDIA_CK804_LAN1:
   case PCI_PRODUCT_NVIDIA_CK804_LAN2:
 + sc->sc_encap_delay = 10;
 + break;

 You would obviously have to locate where your interface matches and
 assign it there. For me, my interface is a CK804. Not sure if it was
 LAN1 or LAN2, but I assigned the delay to both anyway.

 These patches seemed to work good for me, didn't experience any
 timeouts, YMMV. Let me know if this works. These will apply cleanly
 against 4.1-RELEASE.

I downloaded your patches and would like to try it out. Thanks very
much. Because I don't know what I am doing here, I need a bit more
help. How can I find out whether my interface is also a CK804?

scanpci -v gave me the following:

pci bus 0x cardnum 0x08 function 0x00: vendor 0x10de device 0x0373
 nVidia Corporation MCP55 Ethernet
 CardVendor 0x1043 card 0x8239 (ASUSTeK Computer Inc., Card unknown)
  STATUS    0x00b0  COMMAND 0x0007
  CLASS 0x06 0x80 0x00  REVISION 0xa2
  BIST  0x00  HEADER 0x00  LATENCY 0x00  CACHE 0x00
  BASE0 0xfe02a000  addr 0xfe02a000  MEM
  BASE1 0xb001  addr 0xb000  I/O
  BASE2 0xfe029000  addr 0xfe029000  MEM
  BASE3 0xfe028000  addr 0xfe028000  MEM
  MAX_LAT   0x14  MIN_GNT 0x01  INT_PIN 0x01  INT_LINE 0x0a
  BYTE_0    0x43  BYTE_1  0x10  BYTE_2  0x39  BYTE_3  0x82

pci bus 0x cardnum 0x09 function 0x00: vendor 0x10de device 0x0373
 nVidia Corporation MCP55 Ethernet
 CardVendor 0x1043 card 0x8239 (ASUSTeK Computer Inc., Card unknown)
  STATUS    0x00b0  COMMAND 0x0007
  CLASS 0x06 0x80 0x00  REVISION 0xa2
  BIST  0x00  HEADER 0x00  LATENCY 0x00  CACHE 0x00
  BASE0 0xfe027000  addr 0xfe027000  MEM
  BASE1 0xac01  addr 0xac00  I/O
  BASE2 0xfe026000  addr 0xfe026000  MEM
  BASE3 0xfe025000  addr 0xfe025000  MEM
  MAX_LAT   0x14  MIN_GNT 0x01  INT_PIN 0x01  INT_LINE 0x0a
  BYTE_0    0x43  BYTE_1  0x10  BYTE_2  0x39  BYTE_3  0x82

dmesg shows

nfe0 at pci0 dev 8 function 0 NVIDIA MCP55 LAN rev 0xa2: irq 10,
address 00:17:31:cb:ee:d1
eephy0 at nfe0 phy 1: Marvell 88E1116 Gigabit PHY, rev. 1

nfe1 at pci0 dev 9 function 0 NVIDIA MCP55 LAN rev 0xa2: irq 10,
address 00:17:31:cb:dd:7a
eephy1 at nfe1 phy 1: Marvell 88E1116 Gigabit PHY, rev. 1




 http://lysergik.com/~tony/openbsd/

 On 6/25/07, patrick keshishian [EMAIL PROTECTED] wrote:
  On 6/24/07, Vijay Sankar [EMAIL PROTECTED] wrote:
   On Sunday 24 June 2007 13:50, patrick keshishian wrote:
Hi,
   
I've been noticing some strange problems with the built-in nfe0
interface on my desktop.  Actually I've seen it on two such
computers, but the description below is for my current desktop
PC.
   
The PC is running `cvs up -dP -rOPENBSD_4_1' built. I'm
including netstat, ifconfig output[1] and dmesg below[2].
   
I've noticed that once in a while the nfe0 interface will stop
sending and receiving data.  At this point I can not make it
work again.  The only solution I have is to reboot the box.  I
have installed a dc0 card in the box since.  The problem seemed
intermittent and not reliably reproducible.  But I think I
found a way to reproduce this problem on demand (at least for
the time being).  I have an ssh session to another box, on
which I run '/usr/bin/nm somelib.so'.  After a page or two of
output the terminal hangs.  At this point nfe0 becomes
unresponsive.
   
I switch to the dc0 interface and the terminal finishes the
output. Running the nm command while using the dc0 interface
doesn't cause any problems.
  
   I experienced similar problems last year and can empathize.
  
   The following items improved my situation somewhat:
  
   1) BIOS upgrade
   2) Removing dual boot (I had both OpenBSD and Windows 2003 on one
   machine. There were more errors if I did not power off after
   shutting down Windows 2003 and just did a restart from within
   Windows. If I did not unplug the machine after shutting down
   Windows, most of the time I saw watchdog timeouts but if I
   powered off the host, and then powered it back on, there were
   fewer errors)
 
  Both boxes I have run solely OpenBSD.
 
 
  One thing that I did notice was that after switching to the dc0
  interface for a short while (5 min or so?), I could switch 

Re: nfe0 problem (obsd 4.1)

2007-06-25 Thread patrick keshishian

On 6/24/07, Vijay Sankar [EMAIL PROTECTED] wrote:

On Sunday 24 June 2007 13:50, patrick keshishian wrote:
 Hi,

 I've been noticing some strange problems with the built-in nfe0
 interface on my desktop.  Actually I've seen it on two such
 computers, but the description below is for my current desktop PC.

 The PC is running `cvs up -dP -rOPENBSD_4_1' built. I'm including
 netstat, ifconfig output[1] and dmesg below[2].

 I've noticed that once in a while the nfe0 interface will stop
 sending and receiving data.  At this point I can not make it work
 again.  The only solution I have is to reboot the box.  I have
 installed a dc0 card in the box since.  The problem seemed
 intermittent and not reliably reproducible.  But I think I found
 a way to reproduce this problem on demand (at least for the time
 being).  I have an ssh session to another box, on which I run
 '/usr/bin/nm somelib.so'.  After a page or two of output the
 terminal hangs.  At this point nfe0 becomes unresponsive.

 I switch to the dc0 interface and the terminal finishes the output.
 Running the nm command while using the dc0 interface doesn't cause
 any problems.

I experienced similar problems last year and can empathize.

The following items improved my situation somewhat:

1) BIOS upgrade
2) Removing dual boot (I had both OpenBSD and Windows 2003 on one
machine. There were more errors if I did not power off after shutting
down Windows 2003 and just did a restart from within Windows. If I did
not unplug the machine after shutting down Windows, most of the time I
saw watchdog timeouts but if I powered off the host, and then powered
it back on, there were fewer errors)


Both boxes I have run solely OpenBSD.


One thing that I did notice was that after switching to the dc0
interface for a short while (5 min or so?), I could switch back
to the nfe0 and it would start responding again. Basically:

# /sbin/ifconfig dc0 delete
# /sbin/route delete default
# /sbin/ifconfig nfe0 inet IP netmask netmask up
# /sbin/route add default gateway

Therefore, a reboot isn't the only way to fix the problem (reset
the interface) as I had previously thought.  I am not sure exactly
what causes the interface to reset: idle time, no carrier, or
something completely random?
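
The four commands above can be wrapped into one function so the switch
between interfaces is repeatable. This is only a sketch: move_default is
a hypothetical helper name, all five arguments are placeholders you must
supply, and by default it just prints the commands instead of running
them.

```shell
#!/bin/sh
# Sketch: move the address and default route from one interface to another.
# Prints the commands unless RUN=1 is set (which requires root).
# move_default and its argument order are assumptions for illustration.
move_default() {
    old_if=$1 new_if=$2 addr=$3 mask=$4 gw=$5
    for cmd in \
        "/sbin/ifconfig $old_if delete" \
        "/sbin/route delete default" \
        "/sbin/ifconfig $new_if inet $addr netmask $mask up" \
        "/sbin/route add default $gw"
    do
        if [ "${RUN:-0}" = 1 ]; then
            # Unquoted on purpose: split the string into command and arguments.
            $cmd || return 1
        else
            echo "$cmd"
        fi
    done
}

# Dry run:  move_default dc0 nfe0 IP NETMASK GATEWAY
# For real: RUN=1 move_default dc0 nfe0 IP NETMASK GATEWAY
```

Keeping the dry run as the default makes it easy to check the sequence
before touching a box you are logged into over the network.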


Either way, thanks for all the replies!




I experimented with different combinations and different switches
(10/100/1000, 10/100, and 10-Base-T). When all the hosts connected to a
10/100 switch were running at 100 Mb/s then changing nfe0 from
autoselect to full-duplex using

ifconfig nfe0 media 100baseTX mediaopt full-duplex

seemed to eliminate nfe0 hangs as well as timeouts completely. I am not
sure whether this has any rational basis or is specific to some weird
situation in my network, but that has been my experience.

Vijay



 Interestingly enough, if I redirect the output of nm to a file
 and subsequently cat the file the nfe0 interface doesn't seem
 to exhibit the same problem.

 I am not sure how to diagnose this problem further.  I've enabled
 debug on the nfe0 interface (/sbin/ifconfig nfe0 debug), but don't
 see any output.

 Any and all suggestions are welcome.
 --patrick




nfe0 problem (obsd 4.1)

2007-06-24 Thread patrick keshishian

Hi,

I've been noticing some strange problems with the built-in nfe0
interface on my desktop.  Actually I've seen it on two such
computers, but the description below is for my current desktop PC.

The PC is running `cvs up -dP -rOPENBSD_4_1' built. I'm including
netstat, ifconfig output[1] and dmesg below[2].

I've noticed that once in a while the nfe0 interface will stop
sending and receiving data.  At this point I can not make it work
again.  The only solution I have is to reboot the box.  I have
installed a dc0 card in the box since.  The problem seemed
intermittent and not reliably reproducible.  But I think I found
a way to reproduce this problem on demand (at least for the time
being).  I have an ssh session to another box, on which I run
'/usr/bin/nm somelib.so'.  After a page or two of output the
terminal hangs.  At this point nfe0 becomes unresponsive.

I switch to the dc0 interface and the terminal finishes the output.
Running the nm command while using the dc0 interface doesn't cause
any problems.

Interestingly enough, if I redirect the output of nm to a file
and subsequently cat the file the nfe0 interface doesn't seem
to exhibit the same problem.

I am not sure how to diagnose this problem further.  I've enabled
debug on the nfe0 interface (/sbin/ifconfig nfe0 debug), but don't
see any output.

Any and all suggestions are welcome.
--patrick

[1] netstat and ifconfig outputs:
$ /usr/bin/netstat -in
Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
lo0     33224 <Link>                               1     0        1     0     0
lo0     33224 127/8       127.0.0.1                1     0        1     0     0
lo0     33224 ::1/128     ::1                      1     0        1     0     0
lo0     33224 fe80::%lo0/ fe80::1%lo0              1     0        1     0     0
dc0     1500  <Link>      00:02:e3:07:cc:df     1713     0      424     7     0
dc0     1500  fe80::%dc0/ fe80::202:e3ff:fe     1713     0      424     7     0
nfe0    1500  <Link>      00:16:e6:82:17:da     1520   613      878     0     0
nfe0    1500  fe80::%nfe0 fe80::216:e6ff:fe     1520   613      878     0     0
nfe0    1500  xx.yy.ww.zz xx.yy.ww.zz2          1520   613      878     0     0
pflog0  33224 <Link>                               0     0        0     0     0
enc0*   1536  <Link>                               0     0        0     0     0

$ /usr/bin/netstat -rnfinet
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use    Mtu  Interface
default            xx.yy.ww.zz9       UGS         0        0      -   nfe0
xx.yy.ww.zz8/28    link#2             UC          4        0      -   nfe0
xx.yy.ww.zz9       00:20:6f:03:a2:e5  UHLc        1        0      -   nfe0
xx.yy.ww.zz1       link#2             UHLc        0        2      -   nfe0
xx.yy.ww.zz3       00:01:02:c2:a1:b9  UHLc        1      159      -   nfe0
xx.yy.ww.zz0       00:20:e0:68:5d:c8  UHLc        1       11      - L nfe0
127/8              127.0.0.1          UGRS        0        0  33224   lo0
127.0.0.1          127.0.0.1          UH          1        0  33224   lo0
224/4              127.0.0.1          URS         0        0  33224   lo0


$ /sbin/ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33224
   groups: lo
   inet 127.0.0.1 netmask 0xff00
   inet6 ::1 prefixlen 128
   inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
dc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
   lladdr 00:02:e3:07:cc:df
   media: Ethernet autoselect (none)
   status: no carrier
   inet6 fe80::202:e3ff:fe07:ccdf%dc0 prefixlen 64 scopeid 0x1
nfe0: flags=8847<UP,BROADCAST,DEBUG,RUNNING,SIMPLEX,MULTICAST> mtu 1500
   lladdr 00:16:e6:82:17:da
   groups: egress
   media: Ethernet autoselect (100baseTX full-duplex)
   status: active
   inet6 fe80::216:e6ff:fe82:17da%nfe0 prefixlen 64 scopeid 0x2
   inet xx.yy.ww.zz2 netmask 0xfff0 broadcast xx.yy.ww.zz3
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33224
enc0: flags=0 mtu 1536
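
The netstat -in output above shows 613 input errors on nfe0 against 0 on
dc0, so that counter may be worth watching while reproducing the hang. A
minimal sketch, assuming the column layout shown above (Ierrs is the
fourth field from the end of the Link row); if_ierrs is a hypothetical
helper name.

```shell
#!/bin/sh
# Sketch: print the Ierrs counter for one interface from `netstat -in` text.
# if_ierrs is a made-up helper; it reads netstat output on stdin.
if_ierrs() {
    # Use the link-level row. Counting from the end of the line works even
    # when the Address column is empty (as it is for lo0).
    awk -v ifc="$1" '$1 == ifc && $3 ~ /Link/ { print $(NF - 3); exit }'
}

# Typical use, sampling every 5 seconds:
#   while sleep 5; do /usr/bin/netstat -in | if_ierrs nfe0; done
```

A counter that jumps at the moment the ssh session stalls would at least
show that the failure is visible to the driver.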



[2] dmesg
OpenBSD 4.1-stable (GENERIC) #0: Mon May 28 18:06:28 PDT 2007
   [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: AMD Athlon(tm) 64 Processor 3200+ (AuthenticAMD 686-class, 512KB L2 cache) 2.02 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3
cpu0: AMD erratum 89 present, BIOS upgrade may be required
real mem  = 536375296 (523804K)
avail mem = 481710080 (470420K)
using 4278 buffers containing 26943488 bytes (26312K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+ BIOS, date 05/11/06, BIOS32 rev. 0 @ 0xfb5f0, SMBIOS rev. 2.3 @ 0xf0100 (43 entries)
bios0: Gigabyte Technology Co., Ltd. GA-K8N-SLi / GA-K8N-SLi-RH
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
apm0: flags 70102 dobusy 1 doidle 1
pcibios0 at bios0: rev 3.0 @ 0xf/0xdd64
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfdc00/352 (20 entries)
pcibios0: PCI 

Re: nfe0 problem (obsd 4.1)

2007-06-24 Thread Jason McIntyre
On Sun, Jun 24, 2007 at 11:50:28AM -0700, patrick keshishian wrote:
 Hi,
 
 I've been noticing some strange problems with the built-in nfe0
 interface on my desktop.  Actually I've seen it on two such
 computers, but the description below is for my current desktop PC.
 
 The PC is running `cvs up -dP -rOPENBSD_4_1' built. I'm including
 netstat, ifconfig output[1] and dmesg below[2].
 
 I've noticed that once in a while the nfe0 interface will stop
 sending and receiving data.  At this point I can not make it work
 again.  The only solution I have is to reboot the box.  I have
 installed a dc0 card in the box since.  The problem seemed
 intermittent and not reliably reproducible.  But I think I found
 a way to reproduce this problem on demand (at least for the time
 being).  I have an ssh session to another box, on which I run
 '/usr/bin/nm somelib.so'.  After a page or two of output the
 terminal hangs.  At this point nfe0 becomes unresponsive.
 

i used to see these hangs fairly often when doing a cvs up in
/usr/src. for some reason i have not seen them for an age. i am unable
to hang this box using your method, for example.

nfe(4) is not great. i think CAVEATS says it all. buyer beware ;(

jmc



Re: nfe0 problem (obsd 4.1)

2007-06-24 Thread Srebrenko Sehic

On 6/24/07, patrick keshishian [EMAIL PROTECTED] wrote:

Hi,

I've been noticing some strange problems with the built-in nfe0
interface on my desktop.  Actually I've seen it on two such
computers, but the description below is for my current desktop PC.

The PC is running `cvs up -dP -rOPENBSD_4_1' built. I'm including
netstat, ifconfig output[1] and dmesg below[2].

I've noticed that once in a while the nfe0 interface will stop
sending and receiving data.  At this point I can not make it work
again.  The only solution I have is to reboot the box.  I have
installed a dc0 card in the box since.  The problem seemed
intermittent and not reliably reproducible.  But I think I found
a way to reproduce this problem on demand (at least for the time
being).  I have an ssh session to another box, on which I run
'/usr/bin/nm somelib.so'.  After a page or two of output the
terminal hangs.  At this point nfe0 becomes unresponsive.


This is a known problem, but probably unfixable due to lack of
documentation from nvidia.
See http://cvs.openbsd.org/cgi-bin/query-pr-wrapper?full=yes&numbers=5108



Re: nfe0 problem (obsd 4.1)

2007-06-24 Thread Shane Harbour
I have one of the older Sun Ultra 20 systems that also has an nfe(4) in
it.  It does the same thing every time I run cvs or put a load on the
interface.  The only way around it was to install a second NIC.  As
someone else mentioned, until more documentation is available it
probably won't get any better.  Until then it won't bother me to run a
second NIC.


Regards,
Shane

patrick keshishian wrote:

Hi,

I've been noticing some strange problems with the built-in nfe0
interface on my desktop.  Actually I've seen it on two such
computers, but the description below is for my current desktop PC.

The PC is running `cvs up -dP -rOPENBSD_4_1' built. I'm including
netstat, ifconfig output[1] and dmesg below[2].

I've noticed that once in a while the nfe0 interface will stop
sending and receiving data.  At this point I can not make it work
again.  The only solution I have is to reboot the box.  I have
installed a dc0 card in the box since.  The problem seemed
intermittent and not reliably reproducible.  But I think I found
a way to reproduce this problem on demand (at least for the time
being).  I have an ssh session to another box, on which I run
'/usr/bin/nm somelib.so'.  After a page or two of output the
terminal hangs.  At this point nfe0 becomes unresponsive.

I switch to the dc0 interface and the terminal finishes the output.
Running the nm command while using the dc0 interface doesn't cause
any problems.

Interestingly enough, if I redirect the output of nm to a file
and subsequently cat the file the nfe0 interface doesn't seem
to exhibit the same problem.

I am not sure how to diagnose this problem further.  I've enabled
debug on the nfe0 interface (/sbin/ifconfig nfe0 debug), but don't
see any output.

Any and all suggestions are welcome.
--patrick

[1] netstat and ifconfig outputs:
$ /usr/bin/netstat -in
Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
lo0     33224 <Link>                               1     0        1     0     0
lo0     33224 127/8       127.0.0.1                1     0        1     0     0
lo0     33224 ::1/128     ::1                      1     0        1     0     0
lo0     33224 fe80::%lo0/ fe80::1%lo0              1     0        1     0     0
dc0     1500  <Link>      00:02:e3:07:cc:df     1713     0      424     7     0
dc0     1500  fe80::%dc0/ fe80::202:e3ff:fe     1713     0      424     7     0
nfe0    1500  <Link>      00:16:e6:82:17:da     1520   613      878     0     0
nfe0    1500  fe80::%nfe0 fe80::216:e6ff:fe     1520   613      878     0     0
nfe0    1500  xx.yy.ww.zz xx.yy.ww.zz2          1520   613      878     0     0
pflog0  33224 <Link>                               0     0        0     0     0
enc0*   1536  <Link>                               0     0        0     0     0


$ /usr/bin/netstat -rnfinet
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use    Mtu  Interface
default            xx.yy.ww.zz9       UGS         0        0      -   nfe0
xx.yy.ww.zz8/28    link#2             UC          4        0      -   nfe0
xx.yy.ww.zz9       00:20:6f:03:a2:e5  UHLc        1        0      -   nfe0
xx.yy.ww.zz1       link#2             UHLc        0        2      -   nfe0
xx.yy.ww.zz3       00:01:02:c2:a1:b9  UHLc        1      159      -   nfe0
xx.yy.ww.zz0       00:20:e0:68:5d:c8  UHLc        1       11      - L nfe0
127/8              127.0.0.1          UGRS        0        0  33224   lo0
127.0.0.1          127.0.0.1          UH          1        0  33224   lo0
224/4              127.0.0.1          URS         0        0  33224   lo0


$ /sbin/ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33224
   groups: lo
   inet 127.0.0.1 netmask 0xff00
   inet6 ::1 prefixlen 128
   inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
dc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
   lladdr 00:02:e3:07:cc:df
   media: Ethernet autoselect (none)
   status: no carrier
   inet6 fe80::202:e3ff:fe07:ccdf%dc0 prefixlen 64 scopeid 0x1
nfe0: flags=8847<UP,BROADCAST,DEBUG,RUNNING,SIMPLEX,MULTICAST> mtu 1500
   lladdr 00:16:e6:82:17:da
   groups: egress
   media: Ethernet autoselect (100baseTX full-duplex)
   status: active
   inet6 fe80::216:e6ff:fe82:17da%nfe0 prefixlen 64 scopeid 0x2
   inet xx.yy.ww.zz2 netmask 0xfff0 broadcast xx.yy.ww.zz3
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33224
enc0: flags=0 mtu 1536



[2] dmesg
OpenBSD 4.1-stable (GENERIC) #0: Mon May 28 18:06:28 PDT 2007
   [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: AMD Athlon(tm) 64 Processor 3200+ (AuthenticAMD 686-class, 512KB L2 cache) 2.02 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3
cpu0: AMD erratum 89 present, BIOS upgrade may be required
real mem  = 536375296 (523804K)
avail mem = 481710080 (470420K)
using 4278 buffers containing 26943488 bytes (26312K) of 

Re: nfe0 problem (obsd 4.1)

2007-06-24 Thread Vijay Sankar
On Sunday 24 June 2007 13:50, patrick keshishian wrote:
 Hi,

 I've been noticing some strange problems with the built-in nfe0
 interface on my desktop.  Actually I've seen it on two such
 computers, but the description below is for my current desktop PC.

 The PC is running `cvs up -dP -rOPENBSD_4_1' built. I'm including
 netstat, ifconfig output[1] and dmesg below[2].

 I've noticed that once in a while the nfe0 interface will stop
 sending and receiving data.  At this point I can not make it work
 again.  The only solution I have is to reboot the box.  I have
 installed a dc0 card in the box since.  The problem seemed
 intermittent and not reliably reproducible.  But I think I found
 a way to reproduce this problem on demand (at least for the time
 being).  I have an ssh session to another box, on which I run
 '/usr/bin/nm somelib.so'.  After a page or two of output the
 terminal hangs.  At this point nfe0 becomes unresponsive.

 I switch to the dc0 interface and the terminal finishes the output.
 Running the nm command while using the dc0 interface doesn't cause
 any problems.

I experienced similar problems last year and can empathize. 

The following items improved my situation somewhat:

1) BIOS upgrade
2) Removing dual boot (I had both OpenBSD and Windows 2003 on one 
machine. There were more errors if I did not power off after shutting 
down Windows 2003 and just did a restart from within Windows. If I did 
not unplug the machine after shutting down Windows, most of the time I 
saw watchdog timeouts but if I powered off the host, and then powered 
it back on, there were fewer errors)

I experimented with different combinations and different switches 
(10/100/1000, 10/100, and 10-Base-T). When all the hosts connected to a 
10/100 switch were running at 100 Mb/s then changing nfe0 from 
autoselect to full-duplex using 

ifconfig nfe0 media 100baseTX mediaopt full-duplex  

seemed to eliminate nfe0 hangs as well as timeouts completely. I am not 
sure whether this has any rational basis or is specific to some weird 
situation in my network, but that has been my experience.
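
Before forcing the media type it helps to confirm whether the interface
is still autonegotiating. A minimal sketch, assuming the "media:" line
format that ifconfig prints elsewhere in this thread, and assuming that
a forced setting simply omits the word "autoselect"; media_mode is a
hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether an interface is on autoselect or a fixed media type.
# Reads `ifconfig <if>` output on stdin. media_mode is a made-up helper.
media_mode() {
    awk '$1 == "media:" {
        # Example line: "media: Ethernet autoselect (100baseTX full-duplex)"
        if ($3 == "autoselect") print "autoselect"; else print "fixed"
        exit
    }'
}

# Typical use:
#   /sbin/ifconfig nfe0 | media_mode
# To force the setting described above:
#   ifconfig nfe0 media 100baseTX mediaopt full-duplex
```

Note that a forced full-duplex setting only makes sense if the switch
port is configured to match.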

Vijay



 Interestingly enough, if I redirect the output of nm to a file
 and subsequently cat the file the nfe0 interface doesn't seem
 to exhibit the same problem.

 I am not sure how to diagnose this problem further.  I've enabled
 debug on the nfe0 interface (/sbin/ifconfig nfe0 debug), but don't
 see any output.

 Any and all suggestions are welcome.
 --patrick

 [1] netstat and ifconfig outputs:
 $ /usr/bin/netstat -in
 Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
 lo0     33224 <Link>                               1     0        1     0     0
 lo0     33224 127/8       127.0.0.1                1     0        1     0     0
 lo0     33224 ::1/128     ::1                      1     0        1     0     0
 lo0     33224 fe80::%lo0/ fe80::1%lo0              1     0        1     0     0
 dc0     1500  <Link>      00:02:e3:07:cc:df     1713     0      424     7     0
 dc0     1500  fe80::%dc0/ fe80::202:e3ff:fe     1713     0      424     7     0
 nfe0    1500  <Link>      00:16:e6:82:17:da     1520   613      878     0     0
 nfe0    1500  fe80::%nfe0 fe80::216:e6ff:fe     1520   613      878     0     0
 nfe0    1500  xx.yy.ww.zz xx.yy.ww.zz2          1520   613      878     0     0
 pflog0  33224 <Link>                               0     0        0     0     0
 enc0*   1536  <Link>                               0     0        0     0     0

 $ /usr/bin/netstat -rnfinet
 Routing tables

 Internet:
 Destination        Gateway            Flags    Refs      Use    Mtu  Interface
 default            xx.yy.ww.zz9       UGS         0        0      -   nfe0
 xx.yy.ww.zz8/28    link#2             UC          4        0      -   nfe0
 xx.yy.ww.zz9       00:20:6f:03:a2:e5  UHLc        1        0      -   nfe0
 xx.yy.ww.zz1       link#2             UHLc        0        2      -   nfe0
 xx.yy.ww.zz3       00:01:02:c2:a1:b9  UHLc        1      159      -   nfe0
 xx.yy.ww.zz0       00:20:e0:68:5d:c8  UHLc        1       11      - L nfe0
 127/8              127.0.0.1          UGRS        0        0  33224   lo0
 127.0.0.1          127.0.0.1          UH          1        0  33224   lo0
 224/4              127.0.0.1          URS         0        0  33224   lo0


 $ /sbin/ifconfig
 lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33224
 groups: lo
 inet 127.0.0.1 netmask 0xff00
 inet6 ::1 prefixlen 128
 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
 dc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 lladdr 00:02:e3:07:cc:df
 media: Ethernet autoselect (none)
 status: no carrier
 inet6 fe80::202:e3ff:fe07:ccdf%dc0 prefixlen 64 scopeid 0x1
 nfe0: flags=8847<UP,BROADCAST,DEBUG,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 lladdr 00:16:e6:82:17:da
 groups: egress
 media: Ethernet autoselect (100baseTX full-duplex)
 status: active