Re: nve timeout (and down) regression?
My home network is so simple that I could just tie the desktop to the server's NIC with a crossover cable (xl 3c905C to nve). Let's see if the 3Com 16-port switch is the culprit!

Spil.

On 24/03/06, Kevin Oberman [EMAIL PROTECTED] wrote:
> Note that we are running i386 on an AMD64 platform.
>
> I updated my system (which was happy on Feb. 15 code) to March 13 code and I am still running fine. No errors at all. Also, another system was updated to RELENG_6 yesterday and it is also running clean. Again, all systems are identical dual core AMD64 systems running i386 code. (We would like to run amd64, but OpenOffice.org still does not run on it and we need that.)
>
> Only the system in Iowa with the AT switch is seeing problems. Even if there is no traffic, it is possible that something that is negotiated by the switch is triggering the problem.
> --
> R. Kevin Oberman, Network Engineer
> Energy Sciences Network (ESnet)
> Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
> E-mail: [EMAIL PROTECTED]  Phone: +1 510 486-8634

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: nve timeout (and down) regression?
On Saturday 25 March 2006 08:55, Bjoern A. Zeeb wrote:
> On Sat, 25 Mar 2006, David Xu wrote:
>> On Saturday 25 March 2006 18:04, JoaoBR wrote:
>>> It appears that the machines with the problem are all SMP; UP machines do not show the nve timeout or any other problem with it. By the way, it is the same with sk: on SMP the system crashes and on UP it's OK.
>
> For sk please try the new driver Pyun is regularly posting and will commit once the 5/6Rs are done.

For some reason I thought it was already committed, so I used it, and my sk problem is gone; since Friday the machines have been running fine again.

Why will it go in only after the release?

João

This message was scanned by the e-mail system and can be considered safe.
Service provided by Datacenter Matik https://datacenter.matik.com.br
Re: nve timeout (and down) regression?
On Saturday 25 March 2006 02:29, Ion-Mihai Tetcu wrote:
>> I updated my system (which was happy on Feb. 15 code) to March 13 code and I am still running fine. No errors at all. Also, another system was updated to RELENG_6 yesterday and it is also running clean. Again, all systems are identical dual core AMD64 systems running i386 code. (We would like to run amd64, but OpenOffice.org still does not run on it and we need that.)
>
> Both my systems are single core single CPU.

It appears that the machines with the problem are all SMP; UP machines do not show the nve timeout or any other problem with it.

By the way, it is the same with sk: on SMP the system crashes and on UP it's OK.

João
Re: nve timeout (and down) regression?
On Saturday 25 March 2006 18:04, JoaoBR wrote:
> It appears that the machines with the problem are all SMP; UP machines do not show the nve timeout or any other problem with it.
>
> By the way, it is the same with sk: on SMP the system crashes and on UP it's OK.
>
> João

Mine is UP, the chipset is NForce3 250GB; currently it shows the TIMEOUT error and the system freezes while the NIC is being reset, but it still works.

David Xu
Re: nve timeout (and down) regression?
> This happens w/o any real activity on that interface (which goes into an Allied Telesyn switch):
> ...
> Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
> Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
> Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
> Mar 24 19:40:14 worf kernel: nve0: device timeout (1)

The problem is the watchdog timeout itself. I've attached an email that I sent a few months ago which describes the problem, along with a simple patch which disables the watchdog timer.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.

Date: Wed, 4 Jan 2006 16:21:03 -0800
Subject: Re: nve(4) patch - please test!

> Since I sent the mail below I had to discover that the new driver has a problem when no cable is plugged in, at least on my Asus board. It doesn't only run into timeouts; during some of these timeouts the machine, or at least the keyboard, hangs for about a minute.
>
> Is there anything I can do to help debug this?

I ran into this problem recently as well and spent some time diagnosing it. It's not that the cable isn't plugged in - rather it happens whenever the traffic levels are low. The problem is that the nvidia-supplied portion of the driver is deferring the release of the completed transmit buffers, and this occasionally results in if_timer expiring, causing the driver watchdog routine to be called (device timeout). The watchdog routine resets the card and the nvidia-supplied code sits in a high-priority loop waiting for the card to reset. This can take many seconds and your system will be hung until it completes.

I have a work-around patch for the problem that I've attached to this email. It simply disables the watchdog. A real fix would involve accounting for the outstanding transmit buffers differently (or perhaps not at all - e.g. always attempt to call the nvidia-supplied code and if a queue-full error occurs, then wait for an interrupt before trying to queue more transmit packets).

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.

Index: if_nve.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/nve/if_nve.c,v
retrieving revision 1.7.2.8
diff -c -r1.7.2.8 if_nve.c
*** if_nve.c	25 Dec 2005 21:57:03 -0000	1.7.2.8
--- if_nve.c	5 Jan 2006 00:12:45 -0000
***************
*** 943,949 ****
  		return;
  	}
  	/* Set watchdog timer. */
! 	ifp->if_timer = 8;

  	/* Copy packet to BPF tap */
  	BPF_MTAP(ifp, m0);
--- 943,949 ----
  		return;
  	}
  	/* Set watchdog timer. */
! 	ifp->if_timer = 0;

  	/* Copy packet to BPF tap */
  	BPF_MTAP(ifp, m0);
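The watchdog behaviour described above can be sketched with a small user-space model. This is only an illustration under stated assumptions, not driver code: `struct toy_ifnet` and the `toy_*` functions are hypothetical names, the once-per-second countdown is modelled on the classic `if_timer` convention (0 means disarmed), and the model assumes the completion path disarms the timer. It shows why a completion deferred for longer than 8 seconds produces one "device timeout", and why setting the timer to 0 in the transmit path means it never fires.

```c
#include <assert.h>

/* Toy stand-in for the interface state; NOT the kernel's struct ifnet. */
struct toy_ifnet {
	int if_timer;        /* seconds until the watchdog fires; 0 = disarmed */
	int pending_txs;     /* packets handed to the hardware, not yet reclaimed */
	int watchdog_fires;  /* number of "device timeout" events seen */
};

/* Transmit path: queue one packet and arm the watchdog with `timeout`. */
static void toy_start(struct toy_ifnet *ifp, int timeout)
{
	ifp->pending_txs++;
	ifp->if_timer = timeout;   /* 8 before the patch, 0 with the patch */
}

/* Completion path (assumed): buffers reclaimed, watchdog disarmed. */
static void toy_tx_done(struct toy_ifnet *ifp)
{
	ifp->pending_txs = 0;
	ifp->if_timer = 0;
}

/* Once-per-second tick: count down and fire when the timer reaches zero. */
static void toy_slowtimo(struct toy_ifnet *ifp)
{
	if (ifp->if_timer == 0)
		return;                 /* disarmed: nothing to do */
	if (--ifp->if_timer == 0)
		ifp->watchdog_fires++;  /* a real driver would reset the NIC here */
}

/*
 * Send one packet, then let `completion_delay` idle seconds pass before the
 * deferred transmit completion is processed. Returns the number of watchdog
 * fires observed in that window.
 */
static int toy_run(int timeout, int completion_delay)
{
	struct toy_ifnet ifp = { 0, 0, 0 };

	assert(timeout >= 0 && completion_delay >= 0);
	toy_start(&ifp, timeout);
	for (int s = 0; s < completion_delay; s++)
		toy_slowtimo(&ifp);
	toy_tx_done(&ifp);
	return ifp.watchdog_fires;
}
```

Calling `toy_run(8, 3)` models a busy link where the completion arrives in time; `toy_run(8, 20)` models an idle link with a deferred completion; `toy_run(0, 20)` models the work-around. The model says nothing about the long reset loop itself, which is the part that actually hangs the machine.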
Re: nve timeout (and down) regression?
On Sat, 25 Mar 2006, David Xu wrote:
> On Saturday 25 March 2006 18:04, JoaoBR wrote:
>> It appears that the machines with the problem are all SMP; UP machines do not show the nve timeout or any other problem with it.
>>
>> By the way, it is the same with sk: on SMP the system crashes and on UP it's OK.

For sk please try the new driver Pyun is regularly posting and will commit once the 5/6Rs are done.

> Mine is UP, the chipset is NForce3 250GB; currently it shows the TIMEOUT error and the system freezes while the NIC is being reset, but it still works.

Yes, people are seeing this with Nf4 too. Could you give me the full details as asked earlier in this thread or as questioned in http://www.freebsd.org/cgi/query-pr.cgi?pr=94524 ?

--
Bjoern A. Zeeb  bzeeb at Zabbadoz dot NeT
Re: nve timeout (and down) regression?
On Thu, 23 Mar 2006 14:34:24 -0800 Kevin Oberman [EMAIL PROTECTED] wrote:
> Date: Thu, 23 Mar 2006 21:59:56 +0000 (UTC)
> From: Bjoern A. Zeeb [EMAIL PROTECTED]
>
>> On Thu, 23 Mar 2006, JoaoBR wrote:
>>> On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:
>>> nve did not work on 6.0R (for me), but a cvsup to stable at the end of December resolved it (for me). For about a month now, with recent RELENG_6, the problem has been back: timeouts and rx/tx stopping.
>>
>> Did you do more updates in the timeframe from December to about a month ago? If the problem was gone and is back now, any (exact) dates to narrow down the timeframe where the problem came back would be very helpful.

nve0: <NVIDIA nForce MCP9 Networking Adapter> port 0xbc00-0xbc07 mem 0xfebfa000-0xfebfafff irq 22 at device 10.0 on pci0
nve0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xfebfa000
nve0: Ethernet address 00:0a:48:1d:c6:97
miibus1: <MII bus> on nve0
nve0: bpf attached
nve0: Ethernet address: 00:0a:48:1d:c6:97
nve0: [MPSAFE]

This happens w/o any real activity on that interface (which goes into an Allied Telesyn switch):

...
Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
Mar 24 19:40:14 worf kernel: nve0: device timeout (1)
Mar 24 19:40:14 worf kernel: nve0: link state changed to DOWN
Mar 24 19:40:15 worf kernel: nve0: link state changed to UP
Mar 24 19:40:33 worf kernel: nve0: device timeout (2)
Mar 24 19:40:33 worf kernel: nve0: link state changed to DOWN
Mar 24 19:40:34 worf kernel: nve0: link state changed to UP
Mar 24 19:45:52 worf kernel: nve0: device timeout (1)
Mar 24 19:45:52 worf kernel: nve0: link state changed to DOWN
Mar 24 19:45:53 worf kernel: nve0: link state changed to UP
.

FreeBSD worf.tecnik93.com 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Tue Mar 21 01:39:15 EET 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC amd64

--
IOnut - Unregistered ;) FreeBSD user
Intellectual Property is nowhere near as valuable as Intellect
BOFH excuse #442: Trojan horse ran out of hay
Re: nve timeout (and down) regression?
Date: Fri, 24 Mar 2006 22:33:17 +0200
From: Ion-Mihai Tetcu [EMAIL PROTECTED]

> On Thu, 23 Mar 2006 14:34:24 -0800 Kevin Oberman [EMAIL PROTECTED] wrote:
>> Date: Thu, 23 Mar 2006 21:59:56 +0000 (UTC)
>> From: Bjoern A. Zeeb [EMAIL PROTECTED]
>>
>>> On Thu, 23 Mar 2006, JoaoBR wrote:
>>>> On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:
>>>> nve did not work on 6.0R (for me), but a cvsup to stable at the end of December resolved it (for me). For about a month now, with recent RELENG_6, the problem has been back: timeouts and rx/tx stopping.
>>>
>>> Did you do more updates in the timeframe from December to about a month ago? If the problem was gone and is back now, any (exact) dates to narrow down the timeframe where the problem came back would be very helpful.
>
> nve0: <NVIDIA nForce MCP9 Networking Adapter> port 0xbc00-0xbc07 mem 0xfebfa000-0xfebfafff irq 22 at device 10.0 on pci0
> nve0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xfebfa000
> nve0: Ethernet address 00:0a:48:1d:c6:97
> miibus1: <MII bus> on nve0
> nve0: bpf attached
> nve0: Ethernet address: 00:0a:48:1d:c6:97
> nve0: [MPSAFE]
>
> This happens w/o any real activity on that interface (which goes into an Allied Telesyn switch):
>
> ...
> Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
> Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
> Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
> Mar 24 19:40:14 worf kernel: nve0: device timeout (1)
> Mar 24 19:40:14 worf kernel: nve0: link state changed to DOWN
> Mar 24 19:40:15 worf kernel: nve0: link state changed to UP
> Mar 24 19:40:33 worf kernel: nve0: device timeout (2)
> Mar 24 19:40:33 worf kernel: nve0: link state changed to DOWN
> Mar 24 19:40:34 worf kernel: nve0: link state changed to UP
> Mar 24 19:45:52 worf kernel: nve0: device timeout (1)
> Mar 24 19:45:52 worf kernel: nve0: link state changed to DOWN
> Mar 24 19:45:53 worf kernel: nve0: link state changed to UP
> .
>
> FreeBSD worf.tecnik93.com 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Tue Mar 21 01:39:15 EET 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC amd64

Note that we are running i386 on an AMD64 platform.

I updated my system (which was happy on Feb. 15 code) to March 13 code and I am still running fine. No errors at all. Also, another system was updated to RELENG_6 yesterday and it is also running clean. Again, all systems are identical dual core AMD64 systems running i386 code. (We would like to run amd64, but OpenOffice.org still does not run on it and we need that.)

Only the system in Iowa with the AT switch is seeing problems. Even if there is no traffic, it is possible that something that is negotiated by the switch is triggering the problem.

--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]  Phone: +1 510 486-8634
Re: nve timeout (and down) regression?
On Fri, 24 Mar 2006 12:55:41 -0800 Kevin Oberman [EMAIL PROTECTED] wrote:
> Date: Fri, 24 Mar 2006 22:33:17 +0200
> From: Ion-Mihai Tetcu [EMAIL PROTECTED]
>
>> On Thu, 23 Mar 2006 14:34:24 -0800 Kevin Oberman [EMAIL PROTECTED] wrote:
>>> Date: Thu, 23 Mar 2006 21:59:56 +0000 (UTC)
>>> From: Bjoern A. Zeeb [EMAIL PROTECTED]
>>>
>>>> On Thu, 23 Mar 2006, JoaoBR wrote:
>>>>> On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:
>>>>> nve did not work on 6.0R (for me), but a cvsup to stable at the end of December resolved it (for me). For about a month now, with recent RELENG_6, the problem has been back: timeouts and rx/tx stopping.
>>>>
>>>> Did you do more updates in the timeframe from December to about a month ago? If the problem was gone and is back now, any (exact) dates to narrow down the timeframe where the problem came back would be very helpful.
>>
>> nve0: <NVIDIA nForce MCP9 Networking Adapter> port 0xbc00-0xbc07 mem 0xfebfa000-0xfebfafff irq 22 at device 10.0 on pci0
>> nve0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xfebfa000
>> nve0: Ethernet address 00:0a:48:1d:c6:97
>> miibus1: <MII bus> on nve0
>> nve0: bpf attached
>> nve0: Ethernet address: 00:0a:48:1d:c6:97
>> nve0: [MPSAFE]
>>
>> This happens w/o any real activity on that interface (which goes into an Allied Telesyn switch):
>>
>> ...
>> Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
>> Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
>> Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
>> Mar 24 19:40:14 worf kernel: nve0: device timeout (1)
>> Mar 24 19:40:14 worf kernel: nve0: link state changed to DOWN
>> Mar 24 19:40:15 worf kernel: nve0: link state changed to UP
>> Mar 24 19:40:33 worf kernel: nve0: device timeout (2)
>> Mar 24 19:40:33 worf kernel: nve0: link state changed to DOWN
>> Mar 24 19:40:34 worf kernel: nve0: link state changed to UP
>> Mar 24 19:45:52 worf kernel: nve0: device timeout (1)
>> Mar 24 19:45:52 worf kernel: nve0: link state changed to DOWN
>> Mar 24 19:45:53 worf kernel: nve0: link state changed to UP
>> .
>>
>> FreeBSD worf.tecnik93.com 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Tue Mar 21 01:39:15 EET 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC amd64
>
> Note that we are running i386 on an AMD64 platform.

I just enabled nve0 on my desktop (I'm using sk0; it's an Asus A8N-SLI Deluxe motherboard, with both interfaces connected to the same 8-port Surecom switch - talking about very inexpensive :) and it seems to work OK.

nve0: <NVIDIA nForce MCP9 Networking Adapter> port 0xb000-0xb007 mem 0xca100000-0xca100fff irq 23 at device 10.0 on pci0
nve0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xca100000
nve0: Ethernet address 00:15:f2:39:09:08
miibus1: <MII bus> on nve0
nve0: bpf attached
nve0: Ethernet address: 00:15:f2:39:09:08
nve0: [MPSAFE]

FreeBSD it.buh.tecnik93.com 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #0: Fri Feb 24 07:01:54 EET 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/IT6_B_P i386

> I updated my system (which was happy on Feb. 15 code) to March 13 code and I am still running fine. No errors at all. Also, another system was updated to RELENG_6 yesterday and it is also running clean. Again, all systems are identical dual core AMD64 systems running i386 code. (We would like to run amd64, but OpenOffice.org still does not run on it and we need that.)

Both my systems are single core single CPU.

> Only the system in Iowa with the AT switch is seeing problems. Even if there is no traffic, it is possible that something that is negotiated by the switch is triggering the problem.

Possibly, but I think I remember seeing the same w/o the cable plugged in; I'll try to remember to test this for a few minutes when I'm on-site next week.

--
IOnut - Unregistered ;) FreeBSD user
Intellectual Property is nowhere near as valuable as Intellect
BOFH excuse #266: All of the packets are empty
Re: nve timeout (and down) regression?
On Friday 24 March 2006 02:40, JoaoBR wrote:
>> The other patch cited in the message has never been made:
>>
>> diff -u -r1.7.2.4 if_nve.c
>> --- if_nve.c	9 Oct 2005 04:18:17 -0000	1.7.2.4
>> +++ if_nve.c	27 Oct 2005 09:58:45 -0000
>> @@ -727,7 +727,7 @@
>>  	DEBUGOUT(NVE_DEBUG_INIT, "nve: nve_init_rings - entry\n");
>> -	sc->cur_rx = sc->cur_tx = sc->pending_rxs = sc->pending_txs = 0;
>> +	sc->cur_rx = sc->cur_tx = sc->pending_rxs = 0;
>
> and I did this part and my NIC is running; as I said, there are still a lot of collisions caused by it, but it is running
>
> João

This change causes my NIC to stop working, though I still saw timeouts without this change. I think this varies from hardware revision to hardware revision and is completely unpredictable.

David Xu
Re: nve timeout (and down) regression?
I am a bit confused.

The first addition of sc->pending_txs = 0; was MFC'ed back in December by obrien. Check around line 730 of if_nv.c (or whatever it's called in 6.0):

 	sc->linkup = 0;
 	sc->cur_rx = 0;
 	sc->pending_rxs = 0;
+	sc->pending_txs = 0;

This should mostly eliminate the problem.

The other patch cited in the message has never been made:

diff -u -r1.7.2.4 if_nve.c
--- if_nve.c	9 Oct 2005 04:18:17 -0000	1.7.2.4
+++ if_nve.c	27 Oct 2005 09:58:45 -0000
@@ -727,7 +727,7 @@
 	DEBUGOUT(NVE_DEBUG_INIT, "nve: nve_init_rings - entry\n");
-	sc->cur_rx = sc->cur_tx = sc->pending_rxs = sc->pending_txs = 0;
+	sc->cur_rx = sc->cur_tx = sc->pending_rxs = 0;
 	/* Initialise RX ring */
 	for (i = 0; i < RX_RING_SIZE; i++) {
 		struct nve_rx_desc *desc = sc->rx_desc + i;

So sc->pending_txs should be reset to zero only in nve_stop but not in nve_init_rings?

--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]  Phone: +1 510 486-8634
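One way to make the question above concrete is a toy model of the two hunks. This is a sketch under assumptions, not the driver: `struct toy_softc` and the `toy_*` functions are hypothetical stand-ins, and whether the real driver always runs its stop path before ring initialisation is not established in this thread. The model only shows that the two ring-init variants leave the same state when a stop precedes them, and differ when ring init runs with transmits still outstanding.

```c
#include <assert.h>

/* Hypothetical, cut-down stand-in for the driver's softc. */
struct toy_softc {
	int cur_rx, cur_tx;
	int pending_rxs, pending_txs;
};

/* Models the hunk that was already merged: the stop path clears pending_txs. */
static void toy_stop(struct toy_softc *sc)
{
	sc->cur_rx = 0;
	sc->pending_rxs = 0;
	sc->pending_txs = 0;
}

/* Ring init as on the "-" line of the diff: clears all four fields. */
static void toy_init_rings_old(struct toy_softc *sc)
{
	sc->cur_rx = sc->cur_tx = sc->pending_rxs = sc->pending_txs = 0;
}

/* Ring init as on the "+" line of the diff: leaves pending_txs untouched. */
static void toy_init_rings_patched(struct toy_softc *sc)
{
	sc->cur_rx = sc->cur_tx = sc->pending_rxs = 0;
}

/*
 * Returns pending_txs after a re-initialisation that starts with
 * `outstanding` transmits in flight. `stop_first` selects whether the stop
 * path runs before ring init; `patched` selects the ring-init variant.
 */
static int toy_reinit(int outstanding, int stop_first, int patched)
{
	struct toy_softc sc = { 5, 5, 2, outstanding };

	assert(outstanding >= 0);
	if (stop_first)
		toy_stop(&sc);
	if (patched)
		toy_init_rings_patched(&sc);
	else
		toy_init_rings_old(&sc);
	return sc.pending_txs;
}
```

In this model `toy_reinit(3, 1, 0)` and `toy_reinit(3, 1, 1)` give the same result, so the second hunk changes nothing when a stop always comes first; only `toy_reinit(3, 0, 1)` keeps the stale count. Which case applies to a given board is exactly what the reports in this thread disagree on.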
Re: nve timeout (and down) regression?
On Thursday 23 March 2006 15:29, Kevin Oberman wrote:
> I am a bit confused.
>
> The first addition of sc->pending_txs = 0; was MFC'ed back in December by obrien. Check around line 730 of if_nv.c (or whatever it's called in 6.0):
>
>  	sc->linkup = 0;
>  	sc->cur_rx = 0;
>  	sc->pending_rxs = 0;
> +	sc->pending_txs = 0;
>
> This should mostly eliminate the problem.

This part actually is in the driver, but nve still times out and immediately stops rx/tx.

> The other patch cited in the message has never been made:
>
> diff -u -r1.7.2.4 if_nve.c
> --- if_nve.c	9 Oct 2005 04:18:17 -0000	1.7.2.4
> +++ if_nve.c	27 Oct 2005 09:58:45 -0000
> @@ -727,7 +727,7 @@
>  	DEBUGOUT(NVE_DEBUG_INIT, "nve: nve_init_rings - entry\n");
> -	sc->cur_rx = sc->cur_tx = sc->pending_rxs = sc->pending_txs = 0;
> +	sc->cur_rx = sc->cur_tx = sc->pending_rxs = 0;

and I did this part and my NIC is running; as I said, there are still a lot of collisions caused by it, but it is running.

João

>  	/* Initialise RX ring */
>  	for (i = 0; i < RX_RING_SIZE; i++) {
>  		struct nve_rx_desc *desc = sc->rx_desc + i;
>
> So sc->pending_txs should be reset to zero only in nve_stop but not in nve_init_rings?
Re: nve timeout (and down) regression?
On Thu, 23 Mar 2006, JoaoBR wrote:

Hi,

>> The other patch cited in the message has never been made:
>>
>> diff -u -r1.7.2.4 if_nve.c
>> --- if_nve.c	9 Oct 2005 04:18:17 -0000	1.7.2.4
>> +++ if_nve.c	27 Oct 2005 09:58:45 -0000
>> @@ -727,7 +727,7 @@
>>  	DEBUGOUT(NVE_DEBUG_INIT, "nve: nve_init_rings - entry\n");
>> -	sc->cur_rx = sc->cur_tx = sc->pending_rxs = sc->pending_txs = 0;
>> +	sc->cur_rx = sc->cur_tx = sc->pending_rxs = 0;
>
> and I did this part and my NIC is running; as I said, there are still a lot of collisions caused by it, but it is running

If you have collisions you most likely have a duplex mismatch.

If you read the code, and I remember right, the above change is a NOP.

The timeouts have been there and are there. The difference with the last commits is that a lot of people couldn't get the NIC working at all before and now it works (somewhat), but there are timeouts from time to time, which for some people seem to auto-recover and for others still get things 'stuck'.

The problem is to diagnose what everyone really has:

- branch running (RELENG_6 or HEAD)
- i386 or amd64
- exact FreeBSD revisions for if_nve.c
- if using patches, which
- pciconf -lv | grep -A4 ^nve
- which board
- exact problems
  * is the interface working at all
  * is it just stuck from time to time
  * ...

See http://www.freebsd.org/cgi/query-pr.cgi?pr=94524 for more questions. You may want to submit a follow-up and add your description with the answers to these questions there.

--
Bjoern A. Zeeb  bzeeb at Zabbadoz dot NeT
Re: nve timeout (and down) regression?
On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:
> If you have collisions you most likely have a duplex mismatch.

Yep, but I set it manually to match the switch and tried other speeds; no change.

> If you read the code, and I remember right, the above change is a NOP.

Anyway, it resolved my case ...

> The timeouts have been there and are there. The difference with the last commits is that a lot of people couldn't get the NIC working at all before and now it works (somewhat), but there are timeouts from time to time, which for some people seem to auto-recover and for others still get things 'stuck'.

nve did not work on 6.0R (for me), but a cvsup to stable at the end of December resolved it (for me). For about a month now, with recent RELENG_6, the problem has been back: timeouts and rx/tx stopping.

> The problem is to diagnose what everyone really has:
>
> - branch running (RELENG_6 or HEAD)

RELENG_6, last cvsup from this Monday

> - i386 or amd64

amd64

> - exact FreeBSD revisions for if_nve.c
> - if using patches, which
> - pciconf -lv | grep -A4 ^nve

nve0: <NVIDIA nForce MCP7 Networking Adapter> port 0xd400-0xd407 mem 0xec000000-0xec000fff irq 20 at device 5.0 on pci0
nve0: Ethernet address 00:04:61:98:97:d5
miibus0: <MII bus> on nve0

[EMAIL PROTECTED]:5:0: class=0x068000 card=0x100c1695 chip=0x00df10de rev=0xa2 hdr=0x00
    vendor = 'NVIDIA Corporation'
    device = 'Network Bus Enumerator'
    class  = bridge

> - which board
> - exact problems
>   * is the interface working at all

After probing the hardware, the system comes up with:

nve0 down
nve0 up
nve0 down

João

>   * is it just stuck from time to time
>   * ...
>
> See http://www.freebsd.org/cgi/query-pr.cgi?pr=94524 for more questions. You may want to submit a follow-up and add your description with the answers to these questions there.

--
Best regards,
Infomatik Internet Technology
(18)3551.8155 (18)8112.7007
http://info.matik.com.br
Re: nve timeout (and down) regression?
On Thu, 23 Mar 2006, JoaoBR wrote:
> On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:
> nve did not work on 6.0R (for me), but a cvsup to stable at the end of December resolved it (for me). For about a month now, with recent RELENG_6, the problem has been back: timeouts and rx/tx stopping.

Did you do more updates in the timeframe from December to about a month ago?

If the problem was gone and is back now, any (exact) dates to narrow down the timeframe where the problem came back would be very helpful.

--
Bjoern A. Zeeb  bzeeb at Zabbadoz dot NeT
Re: nve timeout (and down) regression?
On Thursday 23 March 2006 18:59, Bjoern A. Zeeb wrote:
> On Thu, 23 Mar 2006, JoaoBR wrote:
>> On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:
>> nve did not work on 6.0R (for me), but a cvsup to stable at the end of December resolved it (for me). For about a month now, with recent RELENG_6, the problem has been back: timeouts and rx/tx stopping.
>
> Did you do more updates in the timeframe from December to about a month ago?

Yes, approximately once a week since the 6.0R release.

> If the problem was gone and is back now, any (exact) dates to narrow down the timeframe where the problem came back would be very helpful.

I know, but unfortunately I didn't track it, and what I said is the most exact I have.

I just found something interesting: it seems the problem does not occur with media 100baseTX full-duplex (autoselect or set) but only with 100baseTX (autoselect or set). I still need to double-check whether it is really the same motherboard, though.

João
Re: nve timeout (and down) regression?
Date: Thu, 23 Mar 2006 21:59:56 +0000 (UTC)
From: Bjoern A. Zeeb [EMAIL PROTECTED]

> On Thu, 23 Mar 2006, JoaoBR wrote:
>> On Thursday 23 March 2006 15:59, Bjoern A. Zeeb wrote:
>> nve did not work on 6.0R (for me), but a cvsup to stable at the end of December resolved it (for me). For about a month now, with recent RELENG_6, the problem has been back: timeouts and rx/tx stopping.
>
> Did you do more updates in the timeframe from December to about a month ago?
>
> If the problem was gone and is back now, any (exact) dates to narrow down the timeframe where the problem came back would be very helpful.

We have several identical systems and most are running fine. Mine is running RELENG_6 and was updated on 2/15 and I have no problem. Another system that was just updated last week (I don't have the exact time) is showing the problem. Another was built 1/21 and runs fine.

Guess I'll try updating my 2/15 system and see if it has problems.

Another thing that might be related is that the system having problems is plugged into a very inexpensive switch (Allied Telesyn), my system uses a Netgear FS108 and the third is connected to a Cisco 3548. I know that this is unlikely, but I thought that it was worth mentioning. All are claimed to be running 100-FD.

Unfortunately, the one causing most of the problems is about 2000 miles away, so I have only limited access to that one.

--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]  Phone: +1 510 486-8634