Re: nve timeout (and down) regression?

2006-03-31 Thread Spil Oss
My home network is so simple that I could just connect the desktop
directly to the server's NIC with a crossover cable (xl 3c905C to nve).
Let's see if the 3Com 16-port switch is the culprit!

Spil.

On 24/03/06, Kevin Oberman [EMAIL PROTECTED] wrote:
  Date: Fri, 24 Mar 2006 22:33:17 +0200
  From: Ion-Mihai Tetcu [EMAIL PROTECTED]
 
  On Thu, 23 Mar 2006 14:34:24 -0800
  Kevin Oberman [EMAIL PROTECTED] wrote:
 
Date: Thu, 23 Mar 2006 21:59:56 +0000 (UTC)
From: Bjoern A. Zeeb [EMAIL PROTECTED]
   
On Thu, 23 Mar 2006, JoaoBR wrote:
   
 On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:

 nve did not work on 6.0R (for me), but a cvsup to stable resolved the
 problem (for me) at the end of December.

 For a month or so, with recent RELENG_6, the problem has come back:
 timeouts and stopped rx/tx.
   
Did you do any more updates in the timeframe from December to about a
month ago?

If the problem was gone and is back now, any (exact) dates to narrow
down the timeframe in which the problem came back would be very helpful.
 
  nve0: NVIDIA nForce MCP9 Networking Adapter port 0xbc00-0xbc07 mem 
  0xfebfa000-0xfebfafff irq 22 at device 10.0 on pci0
  nve0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xfebfa000
  nve0: Ethernet address 00:0a:48:1d:c6:97
  miibus1: MII bus on nve0
  nve0: bpf attached
  nve0: Ethernet address: 00:0a:48:1d:c6:97
  nve0: [MPSAFE]
 
  This happens w/o any real activity on that interface (which goes into
  an Allied Telesyn switch):
  ...
  Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
  Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
  Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
  Mar 24 19:40:14 worf kernel: nve0: device timeout (1)
  Mar 24 19:40:14 worf kernel: nve0: link state changed to DOWN
  Mar 24 19:40:15 worf kernel: nve0: link state changed to UP
  Mar 24 19:40:33 worf kernel: nve0: device timeout (2)
  Mar 24 19:40:33 worf kernel: nve0: link state changed to DOWN
  Mar 24 19:40:34 worf kernel: nve0: link state changed to UP
  Mar 24 19:45:52 worf kernel: nve0: device timeout (1)
  Mar 24 19:45:52 worf kernel: nve0: link state changed to DOWN
  Mar 24 19:45:53 worf kernel: nve0: link state changed to UP
  .
 
 
  FreeBSD worf.tecnik93.com 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Tue Mar 
  21 01:39:15 EET 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC  
  amd64

 Note that we are running i386 on an AMD64 platform.

 I updated my system (which was happy on Feb. 15 code) to March 13 code
 and I am still running fine. No errors at all. Also, another system was
 updated to RELENG_6 yesterday and it is also running clean.

 Again, all systems are identical dual core AMD64 systems running i386
 code. (We would like to run amd64, but OpenOffice.org still does not run
 on it and we need that.)

 Only the system in Iowa with the AT switch is seeing problems.

 Even if there is no traffic, it is possible that something that is
 negotiated by the switch is triggering the problem.
 --
 R. Kevin Oberman, Network Engineer
 Energy Sciences Network (ESnet)
 Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
 E-mail: [EMAIL PROTECTED]  Phone: +1 510 486-8634
 ___
 freebsd-stable@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-stable
 To unsubscribe, send any mail to [EMAIL PROTECTED]



Re: nve timeout (and down) regression?

2006-03-27 Thread JoaoBR
On Saturday 25 March 2006 08:55, Bjoern A. Zeeb wrote:
 On Sat, 25 Mar 2006, David Xu wrote:
  On Saturday 25 March 2006 18:04, JoaoBR wrote:
 
  There appears to be a common factor: the machines with the problem
  are all SMP; UP machines do not show the nve timeout or any other
  problem with it. Incidentally, the same goes for sk: on SMP the system
  crashes, and on UP it's OK.

 For sk, please try the new driver that Pyun has been posting regularly;
 it will be committed once the 5.x and 6.x releases are done.


For some reason I thought it was already committed, so I used it, and
my sk problem is gone; the machines have been running fine again since
Friday.

Why will it go in only after the release?

João

The message was scanned by the e-mail system and can be considered safe.
Service provided by Datacenter Matik  https://datacenter.matik.com.br


Re: nve timeout (and down) regression?

2006-03-25 Thread JoaoBR
On Saturday 25 March 2006 02:29, Ion-Mihai Tetcu wrote:

  I updated my system (which was happy on Feb. 15 code) to March 13 code
  and I am still running fine. No errors at all. Also, another system was
  updated to RELENG_6 yesterday and it is also running clean.
 
  Again, all systems are identical dual core AMD64 systems running i386
  code. (We would like to run amd64, but OpenOffice.org still does not run
  on it and we need that.)

 Both my systems are single core single CPU.


There appears to be a common factor: the machines with the problem are
all SMP; UP machines do not show the nve timeout or any other problem
with it. Incidentally, the same goes for sk: on SMP the system crashes,
and on UP it's OK.

João


Re: nve timeout (and down) regression?

2006-03-25 Thread David Xu
On Saturday 25 March 2006 18:04, JoaoBR wrote:

 There appears to be a common factor: the machines with the problem are
 all SMP; UP machines do not show the nve timeout or any other problem
 with it. Incidentally, the same goes for sk: on SMP the system crashes,
 and on UP it's OK.
 
 João
 
Mine is UP, chipset is NForce3 250GB. Currently it shows the TIMEOUT error,
and the system freezes while resetting the NIC, but it still works.

David Xu


Re: nve timeout (and down) regression?

2006-03-25 Thread David G. Lawrence
 This happens w/o any real activity on that interface (which goes into
 an Allied Telesyn switch):
 ...
 Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
 Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
 Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
 Mar 24 19:40:14 worf kernel: nve0: device timeout (1)

   The problem is the watchdog timeout itself. I've attached an email that
I sent a few months ago which describes the problem, along with a simple
patch which disables the watchdog timer.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.

Date: Wed, 4 Jan 2006 16:21:03 -0800
Subject: Re: nve(4) patch - please test!

 Since I sent the mail below, I have discovered that the new driver
 has a problem when no cable is plugged in, at least on my Asus board.
 
 It doesn't just run into timeouts: during some of these timeouts the
 machine, or at least the keyboard, hangs for about a minute.
 
 Is there anything I can do to help debug this?

   I ran into this problem recently as well and spent some time diagnosing
it. It's not that the cable isn't plugged in; rather, it happens whenever
the traffic levels are low.
   The problem is that the nvidia-supplied portion of the driver is deferring
the release of the completed transmit buffers, and this occasionally
results in if_timer expiring, causing the driver watchdog routine to be
called (device timeout). The watchdog routine resets the card, and the
nvidia-supplied code sits in a high-priority loop waiting for the card
to reset. This can take many seconds, and your system will be hung until
it completes.
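The failure mode described above can be illustrated with a small user-space model. This is a sketch under simplifying assumptions, not code from if_nve.c. One tick is one second, and sending a packet arms an 8-second timer, the value the unpatched driver uses. The vendor code is assumed (hypothetically) to report a finished transmit only when the next packet is handed to it. With frequent traffic the timer keeps being re-armed and never expires. With sparse traffic it runs out, and the watchdog fires even though the hardware is fine. The names struct model, send_packet(), tick() and count_timeouts() are all illustrative.

```c
#include <assert.h>

/*
 * Hypothetical model, not the real if_nve.c logic. One tick = one second.
 * Deferred release (assumed): the vendor code reports the previous
 * transmit as finished only when the next packet is handed to it.
 */
struct model {
    int if_timer;     /* seconds left before the watchdog; 0 = disarmed */
    int outstanding;  /* transmits not yet reported as finished */
    int timeouts;     /* "device timeout" events */
};

static void send_packet(struct model *m)
{
    if (m->outstanding > 0)
        m->outstanding--;   /* deferred release of the previous packet */
    m->outstanding++;
    m->if_timer = 8;        /* arm the watchdog, as the unpatched driver does */
}

static void tick(struct model *m)
{
    if (m->if_timer == 0 || --m->if_timer != 0)
        return;             /* disarmed, or still counting down */
    m->timeouts++;          /* watchdog fires: the card gets reset */
    m->outstanding = 0;     /* the reset discards the pending transmit */
}

/* Simulate `seconds` seconds with one packet every `interval` seconds. */
static int count_timeouts(int interval, int seconds)
{
    struct model m = { 0, 0, 0 };
    assert(interval > 0);
    for (int t = 0; t < seconds; t++) {
        if (t % interval == 0)
            send_packet(&m);
        tick(&m);
    }
    return m.timeouts;
}
```

For example, count_timeouts(2, 60) reports no timeouts, while count_timeouts(20, 60) reports one per packet. This matches the observation that the timeouts show up when the link is nearly idle.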
   I have a work-around patch for the problem that I've attached to this
email. It simply disables the watchdog. A real fix would involve accounting
for the outstanding transmit buffers differently (or perhaps not at all -
e.g. always attempt to call the nvidia-supplied code and if a queue-full
error occurs, then wait for an interrupt before trying to queue more
transmit packets).
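The alternative suggested above uses no timer at all: stop on a queue-full error and resume from the transmit-complete interrupt. It might look roughly like the following hypothetical user-space sketch. Here hw_queue() stands in for the nvidia-supplied transmit call, and TX_RING_SIZE is an arbitrary small value. The oactive flag plays a role similar to the output-active flag FreeBSD drivers use to pause their start routine. None of these names come from if_nve.c.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE 4  /* hypothetical, kept tiny for illustration */

struct txq {
    int  used;     /* descriptors currently owned by the vendor code */
    bool oactive;  /* set on queue-full, cleared by the tx interrupt */
    int  sent;     /* packets successfully handed over */
};

/* Stand-in for the vendor transmit call: fails when the ring is full. */
static bool hw_queue(struct txq *q)
{
    if (q->used == TX_RING_SIZE)
        return false;
    q->used++;
    return true;
}

/* Start routine: hand packets over until the backlog is empty or the
 * vendor code reports queue-full. No watchdog timer is armed. */
static void start(struct txq *q, int *backlog)
{
    while (*backlog > 0 && !q->oactive) {
        if (!hw_queue(q)) {
            q->oactive = true;  /* wait for an interrupt */
            break;
        }
        (*backlog)--;
        q->sent++;
    }
}

/* Transmit-complete interrupt: reclaim descriptors and restart. */
static void tx_intr(struct txq *q, int completed, int *backlog)
{
    q->used -= completed;
    assert(q->used >= 0);
    q->oactive = false;
    start(q, backlog);
}
```

Nothing in this path depends on elapsed time, so in this sketch a quiet link can no longer trigger a spurious reset. The trade-off is that a genuinely wedged card would no longer be detected by a timer.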

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.

Index: if_nve.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/nve/if_nve.c,v
retrieving revision 1.7.2.8
diff -c -r1.7.2.8 if_nve.c
*** if_nve.c	25 Dec 2005 21:57:03 -0000	1.7.2.8
--- if_nve.c	5 Jan 2006 00:12:45 -0000
***************
*** 943,949 ****
  		return;
  	}
  	/* Set watchdog timer. */
! 	ifp->if_timer = 8;
  
  	/* Copy packet to BPF tap */
  	BPF_MTAP(ifp, m0);
--- 943,949 ----
  		return;
  	}
  	/* Set watchdog timer. */
! 	ifp->if_timer = 0;
  
  	/* Copy packet to BPF tap */
  	BPF_MTAP(ifp, m0);
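To see why changing 8 to 0 disables the watchdog, here is a user-space model of the once-per-second check that the FreeBSD 6.x network stack applies to if_timer. As we understand it, that check lives in if_slowtimo(); the model is simplified and has no locking. A non-zero value is decremented each second, and the watchdog runs when it reaches zero. A value of 0 means the timer is not armed at all. The helper names watchdog(), slowtimo() and run() are illustrative only.

```c
#include <assert.h>

static int watchdog_calls;   /* how often the watchdog would have run */

static void watchdog(void)
{
    watchdog_calls++;        /* the driver would log "device timeout" and reset */
}

/* One second of the stack's periodic check for a single interface. */
static void slowtimo(int *if_timer)
{
    if (*if_timer == 0 || --*if_timer)
        return;              /* disarmed, or still counting down */
    watchdog();              /* just reached zero */
}

/* Arm the timer with `arm`, then let `secs` seconds pass with nothing
 * clearing it (no transmit completion observed). */
static int run(int arm, int secs)
{
    int if_timer = arm;
    watchdog_calls = 0;
    for (int s = 0; s < secs; s++)
        slowtimo(&if_timer);
    return watchdog_calls;
}
```

In this model, a driver that stores 0 never gets a watchdog call, which is what the workaround relies on. The side effect is that a card that really hangs is no longer reset automatically.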


Re: nve timeout (and down) regression?

2006-03-25 Thread Bjoern A. Zeeb

On Sat, 25 Mar 2006, David Xu wrote:


On Saturday 25 March 2006 18:04, JoaoBR wrote:


There appears to be a common factor: the machines with the problem are
all SMP; UP machines do not show the nve timeout or any other problem
with it. Incidentally, the same goes for sk: on SMP the system crashes,
and on UP it's OK.


For sk, please try the new driver that Pyun has been posting regularly;
it will be committed once the 5.x and 6.x releases are done.



Mine is UP, chipset is NForce3 250GB. Currently it shows the TIMEOUT error,
and the system freezes while resetting the NIC, but it still works.


Yes, people are seeing this with Nf4 too. Could you give me the full
details as asked earlier in this thread or as questioned in
http://www.freebsd.org/cgi/query-pr.cgi?pr=94524 ?

--
Bjoern A. Zeeb  bzeeb at Zabbadoz dot NeT

Re: nve timeout (and down) regression?

2006-03-24 Thread Ion-Mihai Tetcu
On Thu, 23 Mar 2006 14:34:24 -0800
Kevin Oberman [EMAIL PROTECTED] wrote:

  Date: Thu, 23 Mar 2006 21:59:56 +0000 (UTC)
  From: Bjoern A. Zeeb [EMAIL PROTECTED]
  
  On Thu, 23 Mar 2006, JoaoBR wrote:
  
   On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:
  
   nve did not work on 6.0R (for me), but a cvsup to stable resolved the
   problem (for me) at the end of December.
  
   For a month or so, with recent RELENG_6, the problem has come back:
   timeouts and stopped rx/tx.
  
  Did you do any more updates in the timeframe from December to about a
  month ago?
  
  If the problem was gone and is back now, any (exact) dates to narrow
  down the timeframe in which the problem came back would be very helpful.

nve0: NVIDIA nForce MCP9 Networking Adapter port 0xbc00-0xbc07 mem 
0xfebfa000-0xfebfafff irq 22 at device 10.0 on pci0
nve0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xfebfa000
nve0: Ethernet address 00:0a:48:1d:c6:97
miibus1: MII bus on nve0
nve0: bpf attached
nve0: Ethernet address: 00:0a:48:1d:c6:97
nve0: [MPSAFE]

This happens w/o any real activity on that interface (which goes into
an Allied Telesyn switch):
...
Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
Mar 24 19:40:14 worf kernel: nve0: device timeout (1)
Mar 24 19:40:14 worf kernel: nve0: link state changed to DOWN
Mar 24 19:40:15 worf kernel: nve0: link state changed to UP
Mar 24 19:40:33 worf kernel: nve0: device timeout (2)
Mar 24 19:40:33 worf kernel: nve0: link state changed to DOWN
Mar 24 19:40:34 worf kernel: nve0: link state changed to UP
Mar 24 19:45:52 worf kernel: nve0: device timeout (1)
Mar 24 19:45:52 worf kernel: nve0: link state changed to DOWN
Mar 24 19:45:53 worf kernel: nve0: link state changed to UP
.
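When comparing reports like this one, the spacing between timeouts may be useful, for instance to tell periodic events from sporadic ones. The following is a small hypothetical helper, not part of FreeBSD, that extracts the hh:mm:ss stamps of the "device timeout" lines and returns the gaps between them. It assumes the classic syslog prefix shown above and that all lines fall on the same day, with no midnight rollover.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Collect the time of day of each "device timeout" line and write the
 * gaps (in seconds) between consecutive timeouts into gaps[].
 * Returns the number of gaps written (at most maxgaps).
 */
static int timeout_gaps(const char *const *lines, int nlines,
                        int *gaps, int maxgaps)
{
    int prev = -1, ngaps = 0;
    for (int i = 0; i < nlines; i++) {
        int h, m, s;
        if (strstr(lines[i], "device timeout") == NULL)
            continue;
        /* Skip month and day, then read hh:mm:ss. */
        if (sscanf(lines[i], "%*s %*d %d:%d:%d", &h, &m, &s) != 3)
            continue;
        int t = h * 3600 + m * 60 + s;
        if (prev >= 0 && ngaps < maxgaps)
            gaps[ngaps++] = t - prev;
        prev = t;
    }
    return ngaps;
}
```

Applied to the four timeout lines above, it yields gaps of 20, 19 and 319 seconds, so these timeouts are not strictly periodic.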


FreeBSD worf.tecnik93.com 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Tue Mar 21 
01:39:15 EET 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC  amd64


-- 
IOnut - Unregistered ;) FreeBSD user
  Intellectual Property is   nowhere near as valuable   as Intellect

BOFH excuse #442:
Trojan horse ran out of hay




Re: nve timeout (and down) regression?

2006-03-24 Thread Kevin Oberman
 Date: Fri, 24 Mar 2006 22:33:17 +0200
 From: Ion-Mihai Tetcu [EMAIL PROTECTED]
 
 On Thu, 23 Mar 2006 14:34:24 -0800
 Kevin Oberman [EMAIL PROTECTED] wrote:
 
   Date: Thu, 23 Mar 2006 21:59:56 +0000 (UTC)
   From: Bjoern A. Zeeb [EMAIL PROTECTED]
   
   On Thu, 23 Mar 2006, JoaoBR wrote:
   
On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:
   
nve did not work on 6.0R (for me), but a cvsup to stable resolved the
problem (for me) at the end of December.
   
For a month or so, with recent RELENG_6, the problem has come back:
timeouts and stopped rx/tx.
   
   Did you do any more updates in the timeframe from December to about a
   month ago?
   
   If the problem was gone and is back now, any (exact) dates to narrow
   down the timeframe in which the problem came back would be very helpful.
 
 nve0: NVIDIA nForce MCP9 Networking Adapter port 0xbc00-0xbc07 mem 
 0xfebfa000-0xfebfafff irq 22 at device 10.0 on pci0
 nve0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xfebfa000
 nve0: Ethernet address 00:0a:48:1d:c6:97
 miibus1: MII bus on nve0
 nve0: bpf attached
 nve0: Ethernet address: 00:0a:48:1d:c6:97
 nve0: [MPSAFE]
 
 This happens w/o any real activity on that interface (which goes into
 an Allied Telesyn switch):
 ...
 Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
 Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
 Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
 Mar 24 19:40:14 worf kernel: nve0: device timeout (1)
 Mar 24 19:40:14 worf kernel: nve0: link state changed to DOWN
 Mar 24 19:40:15 worf kernel: nve0: link state changed to UP
 Mar 24 19:40:33 worf kernel: nve0: device timeout (2)
 Mar 24 19:40:33 worf kernel: nve0: link state changed to DOWN
 Mar 24 19:40:34 worf kernel: nve0: link state changed to UP
 Mar 24 19:45:52 worf kernel: nve0: device timeout (1)
 Mar 24 19:45:52 worf kernel: nve0: link state changed to DOWN
 Mar 24 19:45:53 worf kernel: nve0: link state changed to UP
 .
 
 
 FreeBSD worf.tecnik93.com 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Tue Mar 
 21 01:39:15 EET 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC  amd64

Note that we are running i386 on an AMD64 platform.

I updated my system (which was happy on Feb. 15 code) to March 13 code
and I am still running fine. No errors at all. Also, another system was
updated to RELENG_6 yesterday and it is also running clean.

Again, all systems are identical dual core AMD64 systems running i386
code. (We would like to run amd64, but OpenOffice.org still does not run
on it and we need that.)

Only the system in Iowa with the AT switch is seeing problems.

Even if there is no traffic, it is possible that something that is
negotiated by the switch is triggering the problem.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]   Phone: +1 510 486-8634


Re: nve timeout (and down) regression?

2006-03-24 Thread Ion-Mihai Tetcu
On Fri, 24 Mar 2006 12:55:41 -0800
Kevin Oberman [EMAIL PROTECTED] wrote:

  Date: Fri, 24 Mar 2006 22:33:17 +0200
  From: Ion-Mihai Tetcu [EMAIL PROTECTED]
  
  On Thu, 23 Mar 2006 14:34:24 -0800
  Kevin Oberman [EMAIL PROTECTED] wrote:
  
Date: Thu, 23 Mar 2006 21:59:56 +0000 (UTC)
From: Bjoern A. Zeeb [EMAIL PROTECTED]

On Thu, 23 Mar 2006, JoaoBR wrote:

 On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:

 nve did not work on 6.0R (for me), but a cvsup to stable resolved the
 problem (for me) at the end of December.

 For a month or so, with recent RELENG_6, the problem has come back:
 timeouts and stopped rx/tx.

Did you do any more updates in the timeframe from December to about a
month ago?

If the problem was gone and is back now, any (exact) dates to narrow
down the timeframe in which the problem came back would be very helpful.
  
  nve0: NVIDIA nForce MCP9 Networking Adapter port 0xbc00-0xbc07 mem 
  0xfebfa000-0xfebfafff irq 22 at device 10.0 on pci0
  nve0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xfebfa000
  nve0: Ethernet address 00:0a:48:1d:c6:97
  miibus1: MII bus on nve0
  nve0: bpf attached
  nve0: Ethernet address: 00:0a:48:1d:c6:97
  nve0: [MPSAFE]
  
  This happens w/o any real activity on that interface (which goes into
  an Allied Telesyn switch):
  ...
  Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
  Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
  Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
  Mar 24 19:40:14 worf kernel: nve0: device timeout (1)
  Mar 24 19:40:14 worf kernel: nve0: link state changed to DOWN
  Mar 24 19:40:15 worf kernel: nve0: link state changed to UP
  Mar 24 19:40:33 worf kernel: nve0: device timeout (2)
  Mar 24 19:40:33 worf kernel: nve0: link state changed to DOWN
  Mar 24 19:40:34 worf kernel: nve0: link state changed to UP
  Mar 24 19:45:52 worf kernel: nve0: device timeout (1)
  Mar 24 19:45:52 worf kernel: nve0: link state changed to DOWN
  Mar 24 19:45:53 worf kernel: nve0: link state changed to UP
  .
  
  
  FreeBSD worf.tecnik93.com 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Tue Mar 
  21 01:39:15 EET 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC  
  amd64
 
 Note that we are running i386 on an AMD64 platform.

I just enabled nve0 on my desktop (I'm using sk0; it's an Asus
A8N-SLI Deluxe motherboard, with both interfaces connected to the same
8-port Surecom switch, talk about very inexpensive :) and it seems to
work OK.

nve0: NVIDIA nForce MCP9 Networking Adapter port 0xb000-0xb007 mem 
0xca100000-0xca100fff irq 23 at device 10.0 on pci0
nve0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xca100000
nve0: Ethernet address 00:15:f2:39:09:08
miibus1: MII bus on nve0
nve0: bpf attached
nve0: Ethernet address: 00:15:f2:39:09:08
nve0: [MPSAFE]

FreeBSD it.buh.tecnik93.com 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Fri Feb 
24 07:01:54 EET 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/IT6_B_P  i386

 I updated my system (which was happy on Feb. 15 code) to March 13 code
 and I am still running fine. No errors at all. Also, another system was
 updated to RELENG_6 yesterday and it is also running clean.
 
 Again, all systems are identical dual core AMD64 systems running i386
 code. (We would like to run amd64, but OpenOffice.org still does not run
 on it and we need that.)

Both my systems are single core single CPU.

 Only the system in Iowa with the AT switch is seeing problems.
 
 Even if there is no traffic, it is possible that something that is
 negotiated by the switch is triggering the problem.

Possibly, but I think I remember seeing the same without a cable plugged
in; I'll try to remember to test this for a few minutes when I'm
on-site next week.


-- 
IOnut - Unregistered ;) FreeBSD user
  Intellectual Property is   nowhere near as valuable   as Intellect

BOFH excuse #266:
All of the packets are empty




Re: nve timeout (and down) regression?

2006-03-24 Thread David Xu
On Friday 24 March 2006 02:40, JoaoBR wrote:
  The other patch cited in the message has never been made:
  diff -u -r1.7.2.4 if_nve.c
  --- if_nve.c	9 Oct 2005 04:18:17 -0000	1.7.2.4
  +++ if_nve.c	27 Oct 2005 09:58:45 -0000
  @@ -727,7 +727,7 @@
 
  DEBUGOUT(NVE_DEBUG_INIT, "nve: nve_init_rings - entry\n");
 
  -   sc->cur_rx = sc->cur_tx = sc->pending_rxs = sc->pending_txs = 0;
  +   sc->cur_rx = sc->cur_tx = sc->pending_rxs = 0;
 
 I applied this part and my NIC is running. As I said, there are still a
 lot of collisions caused by it, but it is running.
 
 
 João
 

This change makes my NIC stop working, though I still saw timeouts
without this change. I think this varies from hardware revision to
revision and is completely unpredictable.

David Xu


Re: nve timeout (and down) regression?

2006-03-23 Thread Kevin Oberman
I am a bit confused. The first addition of sc->pending_txs = 0; was
MFC'ed back in December by obrien.

Check around line 730 of if_nv.c (or whatever it's called in 6.0):
sc->linkup = 0;
sc->cur_rx = 0;
sc->pending_rxs = 0;
+   sc->pending_txs = 0;
This should mostly eliminate the problem.

The other patch cited in the message has never been made:
diff -u -r1.7.2.4 if_nve.c
--- if_nve.c	9 Oct 2005 04:18:17 -0000	1.7.2.4
+++ if_nve.c	27 Oct 2005 09:58:45 -0000
@@ -727,7 +727,7 @@
 
 	DEBUGOUT(NVE_DEBUG_INIT, "nve: nve_init_rings - entry\n");
 
-	sc->cur_rx = sc->cur_tx = sc->pending_rxs = sc->pending_txs = 0;
+	sc->cur_rx = sc->cur_tx = sc->pending_rxs = 0;
 	/* Initialise RX ring */
 	for (i = 0; i < RX_RING_SIZE; i++) {
 		struct nve_rx_desc *desc = sc->rx_desc + i;


So should sc->pending_txs be reset to zero only in nve_stop, and not
in nve_init_rings?
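The question can be made concrete with a toy model. The softc below is hypothetical and holds only the counters named in the patch. As the question implies, the model assumes that nve_stop() zeroes pending_txs. If the init path always runs stop() first, the extra reset in init_rings() changes nothing, which would be consistent with the NOP remark made elsewhere in this thread. The reset only matters if init_rings() can run while transmits are still in flight: late completions then drive the counter negative. This illustrates the accounting question and does not describe the driver's actual call order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down softc with only the counters in question. */
struct softc {
    int cur_rx, cur_tx, pending_rxs, pending_txs;
};

/* Assumed behaviour of nve_stop(): the pending counters are zeroed. */
static void stop(struct softc *sc)
{
    sc->pending_rxs = 0;
    sc->pending_txs = 0;
}

/* reset_txs = true: original nve_init_rings() line; false: patched line. */
static void init_rings(struct softc *sc, bool reset_txs)
{
    sc->cur_rx = sc->cur_tx = sc->pending_rxs = 0;
    if (reset_txs)
        sc->pending_txs = 0;
}

/* Completions arriving for transmits queued before init_rings() ran. */
static void complete_tx(struct softc *sc, int n)
{
    sc->pending_txs -= n;
}
```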
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]   Phone: +1 510 486-8634


Re: nve timeout (and down) regression?

2006-03-23 Thread JoaoBR
On Thursday 23 March 2006 15:29, Kevin Oberman wrote:
 I am a bit confused. The first addition of sc->pending_txs = 0; was
 MFC'ed back in December by obrien.

 Check around line 730 of if_nv.c (or whatever it's called in 6.0):
 sc->linkup = 0;
 sc->cur_rx = 0;
 sc->pending_rxs = 0;
 +   sc->pending_txs = 0;
 This should mostly eliminate the problem.


This part is actually already in the driver, but nve still times out and
immediately stops rx/tx.

 The other patch cited in the message has never been made:
 diff -u -r1.7.2.4 if_nve.c
 --- if_nve.c	9 Oct 2005 04:18:17 -0000	1.7.2.4
 +++ if_nve.c	27 Oct 2005 09:58:45 -0000
 @@ -727,7 +727,7 @@

 DEBUGOUT(NVE_DEBUG_INIT, "nve: nve_init_rings - entry\n");

 -   sc->cur_rx = sc->cur_tx = sc->pending_rxs = sc->pending_txs = 0;
 +   sc->cur_rx = sc->cur_tx = sc->pending_rxs = 0;


I applied this part and my NIC is running. As I said, there are still a
lot of collisions caused by it, but it is running.


João


 /* Initialise RX ring */
 for (i = 0; i < RX_RING_SIZE; i++) {
 struct nve_rx_desc *desc = sc->rx_desc + i;


 So should sc->pending_txs be reset to zero only in nve_stop, and not
 in nve_init_rings?


Re: nve timeout (and down) regression?

2006-03-23 Thread Bjoern A. Zeeb

On Thu, 23 Mar 2006, JoaoBR wrote:

Hi,


The other patch cited in the message has never been made:
diff -u -r1.7.2.4 if_nve.c
--- if_nve.c	9 Oct 2005 04:18:17 -0000	1.7.2.4
+++ if_nve.c	27 Oct 2005 09:58:45 -0000
@@ -727,7 +727,7 @@

DEBUGOUT(NVE_DEBUG_INIT, "nve: nve_init_rings - entry\n");

-   sc->cur_rx = sc->cur_tx = sc->pending_rxs = sc->pending_txs = 0;
+   sc->cur_rx = sc->cur_tx = sc->pending_rxs = 0;



I applied this part and my NIC is running. As I said, there are still a
lot of collisions caused by it, but it is running.


If you have collisions, you most likely have a duplex mismatch.
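Here is a rough illustration of why collisions point to a duplex mismatch. A half-duplex port treats any signal arriving while it transmits as a collision and aborts. A full-duplex port never checks and simply keeps sending. The toy slot model below counts what each end would see. It has no carrier sense, backoff or retries, so it overstates collisions on a correctly configured half/half link. On a mismatched link, the half-duplex end logs collisions and the full-duplex end sees the aborted frames as CRC errors. The names used (enum duplex, simulate()) are purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum duplex { HALF, FULL };

struct port_stats {
    int collisions;   /* counted by a half-duplex port */
    int crc_errors;   /* truncated frames seen by a full-duplex port */
};

/*
 * tx_a[i] / tx_b[i]: whether end A / end B transmits in slot i.
 * Toy model: one frame per slot, no carrier sense, backoff or retries.
 */
static void simulate(enum duplex a, enum duplex b,
                     const bool *tx_a, const bool *tx_b, size_t n,
                     struct port_stats *sa, struct port_stats *sb)
{
    sa->collisions = sa->crc_errors = 0;
    sb->collisions = sb->crc_errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (!(tx_a[i] && tx_b[i]))
            continue;   /* at most one end talking: no conflict */
        /* A half-duplex end sees the incoming signal as a collision
         * and aborts its own frame; a full-duplex end keeps sending. */
        if (a == HALF)
            sa->collisions++;
        if (b == HALF)
            sb->collisions++;
        /* An aborted frame arrives truncated at a full-duplex peer. */
        if (a == HALF && b == FULL)
            sb->crc_errors++;
        if (b == HALF && a == FULL)
            sa->crc_errors++;
    }
}
```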

If you read the code, and if I remember right, the above change is a NOP.


The timeouts have been there all along and are still there. The
difference with the last commits is that a lot of people couldn't get
the NIC working at all before, and now it works (somewhat), but there
are timeouts from time to time which for some people seem to
auto-recover and for others still get things 'stuck'.

The problem is diagnosing what everyone really has:
- branch running (RELENG_6 or HEAD)
- i386 or amd64
- exact FreeBSD revisions for if_nve.c
- if using patches, which ones
- pciconf -lv | grep -A4 ^nve
- which board
- exact problems
  * is the interface working at all
  * is it just stuck from time to time
  * ...

See http://www.freebsd.org/cgi/query-pr.cgi?pr=94524 for more
questions. You may want to submit a follow-up and add your description
with the answers to these questions there.

--
Bjoern A. Zeeb  bzeeb at Zabbadoz dot NeT


Re: nve timeout (and down) regression?

2006-03-23 Thread JoaoBR
On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:

 If you have collisions, you most likely have a duplex mismatch.

Yep, but I manually set it to match the switch and tried other speeds;
no change.

 If you read the code, and if I remember right, the above change is a NOP.


Anyway, it resolved my case...




 The timeouts have been there all along and are still there. The
 difference with the last commits is that a lot of people couldn't get
 the NIC working at all before, and now it works (somewhat), but there
 are timeouts from time to time which for some people seem to
 auto-recover and for others still get things 'stuck'.


nve did not work on 6.0R (for me), but a cvsup to stable resolved the
problem (for me) at the end of December.

For a month or so, with recent RELENG_6, the problem has come back:
timeouts and stopped rx/tx.


 The problem is diagnosing what everyone really has:
 - branch running (RELENG_6 or HEAD)

RELENG_6, last cvsup this Monday

 - i386 or amd64

amd64

 - exact FreeBSD revisions for if_nve.c
 - if using patches, which ones
 - pciconf -lv | grep -A4 ^nve

nve0: NVIDIA nForce MCP7 Networking Adapter port 0xd400-0xd407 mem 
0xec000000-0xec000fff irq 20 at device 5.0 on pci0
nve0: Ethernet address 00:04:61:98:97:d5
miibus0: MII bus on nve0

[EMAIL PROTECTED]:5:0:  class=0x068000 card=0x100c1695 chip=0x00df10de rev=0xa2 hdr=0x00
    vendor   = 'NVIDIA Corporation'
    device   = 'Network Bus Enumerator'
    class    = bridge


 - which board
  - exact problems 
   * is the interface working at all

After probing the hardware, the system comes up with:
nve0 down
nve0 up
nve0 down


João

   * is it just stuck from time to time
   * ...

 See http://www.freebsd.org/cgi/query-pr.cgi?pr=94524 for more
 questions. You may want to submit a follow-up and add your description
 with the answers to these questions there.

-- 

Best regards

Infomatik Internet Technology
(18)3551.8155  (18)8112.7007
http://info.matik.com.br


Re: nve timeout (and down) regression?

2006-03-23 Thread Bjoern A. Zeeb

On Thu, 23 Mar 2006, JoaoBR wrote:


On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:

nve did not work on 6.0R (for me), but a cvsup to stable resolved the
problem (for me) at the end of December.

For a month or so, with recent RELENG_6, the problem has come back:
timeouts and stopped rx/tx.


Did you do any more updates in the timeframe from December to about a
month ago?

If the problem was gone and is back now, any (exact) dates to narrow
down the timeframe in which the problem came back would be very helpful.
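With several dated cvsup snapshots available, bisection can narrow the timeframe faster than testing every update. The following minimal sketch makes three assumptions: the oldest candidate is known good, the newest is known bad, and once the regression appears every later snapshot is bad. The array bad[] is hypothetical. It stands in for the result of building and booting a given snapshot and watching it for nve timeouts, so each lookup corresponds to one test cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * bad[i] is the (hypothetical) outcome of testing snapshot i, with
 * snapshots ordered oldest to newest. Requires bad[0] == false and
 * bad[n - 1] == true. Returns the index of the first bad snapshot and
 * stores the number of snapshots actually tested in *tests_run.
 */
static size_t first_bad(const bool *bad, size_t n, int *tests_run)
{
    assert(n >= 2 && !bad[0] && bad[n - 1]);
    size_t good = 0, last_bad = n - 1;   /* invariant: good < last_bad */
    *tests_run = 0;
    while (last_bad - good > 1) {
        size_t mid = good + (last_bad - good) / 2;
        ++*tests_run;                    /* one build-and-test cycle */
        if (bad[mid])
            last_bad = mid;
        else
            good = mid;
    }
    return last_bad;
}
```

With ten weekly snapshots whose regression starts at index 3, this finds it after three test cycles. Testing each snapshot in turn could take up to eight.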

--
Bjoern A. Zeeb  bzeeb at Zabbadoz dot NeT


Re: nve timeout (and down) regression?

2006-03-23 Thread JoaoBR
On Thursday 23 March 2006 18:59, Bjoern A. Zeeb wrote:
 On Thu, 23 Mar 2006, JoaoBR wrote:
  On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:
 
  nve did not work on 6.0R (for me), but a cvsup to stable resolved the
  problem (for me) at the end of December.
 
  For a month or so, with recent RELENG_6, the problem has come back:
  timeouts and stopped rx/tx.

 Did you do any more updates in the timeframe from December to about a
 month ago?


Yes, approximately once a week since the 6.0R release.

 If the problem was gone and is back now, any (exact) dates to narrow
 down the timeframe in which the problem came back would be very helpful.

I know, but unfortunately I didn't track it, and what I said is the most
exact information I have.

I just found something interesting: it seems the problem does not occur
with media 100baseTX full-duplex (autoselected or set manually), but only
with 100baseTX without full-duplex (autoselected or set manually).

But I need to double-check whether it is really the same motherboard.

João


Re: nve timeout (and down) regression?

2006-03-23 Thread Kevin Oberman
 Date: Thu, 23 Mar 2006 21:59:56 +0000 (UTC)
 From: Bjoern A. Zeeb [EMAIL PROTECTED]
 
 On Thu, 23 Mar 2006, JoaoBR wrote:
 
  On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:
 
  nve did not work on 6.0R (for me), but a cvsup to stable resolved the
  problem (for me) at the end of December.
 
  For a month or so, with recent RELENG_6, the problem has come back:
  timeouts and stopped rx/tx.
 
 Did you do any more updates in the timeframe from December to about a
 month ago?
 
 If the problem was gone and is back now, any (exact) dates to narrow
 down the timeframe in which the problem came back would be very helpful.

We have several identical systems and most are running fine. Mine is
running RELENG_6 and was updated on 2/15 and I have no problem. Another
system that was just updated last week (I don't have the exact time) is
showing the problem. Another was built 1/21 and runs fine.

Guess I'll try updating my 2/15 system and see if it has problems.

Another thing that might be related is that the system having problems
is plugged into a very inexpensive switch (Allied Telesyn), my system
uses a Netgear FS108 and the third is connected to a Cisco 3548. I know
that this is unlikely, but I thought that it was worth mentioning. All
are claimed to be running 100-FD. Unfortunately, the one causing most of
the problems is about 2000 miles away, so I have only limited access to
that one.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]   Phone: +1 510 486-8634