Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!

2007-01-23 Thread Thibaut VARENE

On 1/22/07, Dale Farnsworth [EMAIL PROTECTED] wrote:

Jarek and Thibaut,

Thank you both very much for your work finding and fixing this bug.
Jarek, can you verify that the following patch fixes the problem you
were seeing?

-Dale


Hi Dale,

The patch seems to work fine. Just thinking out loud (as I really
don't know this part of the kernel), here are a few remarks:

- As Jarek pointed out, you're checking the value of
mp->tx_desc_count twice, which means dereferencing a pointer and accessing
memory twice. I don't know how performance-critical this bit of code is, but
I wonder which is better: holding the lock for a long time, or doing what you
did (I'm being anal and you probably know that better than me :)

- Also, at lines 344-349, cmd_sts (which is read through mp) is
accessed in the test condition (dunno if it's OK to do that outside of the
lock), and on line 346, mp->stats.tx_errors is incremented outside of
the spinlock protection. But then, I don't know what that lock is
meant to protect, just pointing this out :)
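Thibaut's first remark is about reading mp->tx_desc_count twice: once, without the lock, as the loop condition, and again after taking the lock. The hunk that does this is not quoted in this message, so the following is only a user-space sketch of that general "check, lock, re-check" idiom, not Dale's code. Assumptions: a pthread mutex stands in for the IRQ-disabling spinlock, the unlocked read is a relaxed C11 atomic load so that it is well defined in user space, and `struct tx_ring`, `ring_free_one()` and `free_all()` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's private TX state. */
struct tx_ring {
	pthread_mutex_t lock;   /* plays the role of mp->lock */
	atomic_int desc_count;  /* plays the role of mp->tx_desc_count */
};

/* Frees one descriptor if any is outstanding; returns true if one was freed. */
static bool ring_free_one(struct tx_ring *r)
{
	bool freed = false;

	pthread_mutex_lock(&r->lock);
	/* Re-check under the lock: another thread may have emptied the ring
	 * between the unlocked hint and the lock acquisition. */
	if (atomic_load_explicit(&r->desc_count, memory_order_relaxed) > 0) {
		atomic_fetch_sub_explicit(&r->desc_count, 1, memory_order_relaxed);
		freed = true;
	}
	pthread_mutex_unlock(&r->lock);
	return freed;
}

/* Returns the number of descriptors freed by this call. */
static int free_all(struct tx_ring *r)
{
	int released = 0;

	/* Unlocked read: only a cheap hint that there may be work to do. */
	while (atomic_load_explicit(&r->desc_count, memory_order_relaxed) > 0) {
		if (!ring_free_one(r))
			break;  /* someone else emptied the ring first */
		released++;
	}
	return released;
}
```

The unlocked read only decides whether taking the lock is worthwhile; correctness rests entirely on the re-check inside ring_free_one().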

Thanks for your help, I hope the fix will go upstream asap :)

And as for being credited as the author of the patch: since I'm not, I don't really mind 8)

HTH

T-Bone

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!

2007-01-23 Thread Thibaut VARENE

On 1/23/07, Thibaut VARENE [EMAIL PROTECTED] wrote:

- As Jarek pointed out, you're checking twice the value of
mp->tx_desc_count, which means dereferencing a pointer and accessing
memory twice. I don't know how perf-critical this bit of code is, but
I wonder which of keeping the lock for a long time or doing what you
is better (I'm being anal and you probably know that better than me :)


Forget that. That's an IRQ-disabling lock, so holding it for a long time is worse than anything else :)

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/


Re: [PATCH] mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs

2007-01-23 Thread Thibaut VARENE

On 1/23/07, Dale Farnsworth [EMAIL PROTECTED] wrote:

From Dale Farnsworth [EMAIL PROTECTED]

mv643xx_eth: Fix race condition in mv643xx_eth_free_tx_descs

This bug was found and isolated by Thibaut VARENE [EMAIL PROTECTED]
and Jarek Poplawski [EMAIL PROTECTED].  This patch is a modification of their
fixes.  We acquire and release the lock for each descriptor that is freed
to minimize the time the lock is held.

---

 drivers/net/mv643xx_eth.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index c41ae42..b3bf864 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -332,13 +339,13 @@ int mv643xx_eth_free_tx_descs(struct net
 		if (skb)
 			mp->tx_skb[tx_index] = NULL;
 
-		spin_unlock_irqrestore(&mp->lock, flags);
-
 		if (cmd_sts & ETH_ERROR_SUMMARY) {
 			printk("%s: Error in TX\n", dev->name);
 			mp->stats.tx_errors++;
 		}


Note that this printk probably won't show immediately because IRQs are
disabled. But that's maybe not a big deal.
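One common way to keep mp->stats.tx_errors under the lock without calling printk() inside the critical section is to only record the error while holding the lock and report it after the lock is dropped. The sketch below illustrates that idea in user space and is not the driver's code. Assumptions: a pthread mutex stands in for the IRQ-disabling spinlock, fprintf() stands in for printk(), and `struct tx_state`, `free_one_desc()`, `NDESC` and `ERR_SUMMARY` (with an arbitrary bit value) are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NDESC 8                 /* arbitrary ring size for illustration */
#define ERR_SUMMARY (1u << 15)  /* hypothetical error bit; value is arbitrary */

struct tx_state {
	pthread_mutex_t lock;       /* stands in for mp->lock */
	uint32_t cmd_sts[NDESC];    /* stands in for the descriptors' cmd_sts words */
	int used;                   /* stands in for mp->tx_used_desc_q */
	unsigned long tx_errors;    /* stands in for mp->stats.tx_errors */
};

/* Retires the oldest descriptor. All shared state, including the error
 * counter, is touched only under the lock; the message is printed after
 * the lock has been dropped. Returns true if the descriptor had an error. */
static bool free_one_desc(struct tx_state *s, const char *name)
{
	uint32_t sts;
	bool error;

	pthread_mutex_lock(&s->lock);
	sts = s->cmd_sts[s->used];
	s->used = (s->used + 1) % NDESC;
	error = (sts & ERR_SUMMARY) != 0;
	if (error)
		s->tx_errors++;
	pthread_mutex_unlock(&s->lock);

	if (error)
		fprintf(stderr, "%s: Error in TX\n", name);
	return error;
}
```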

HTH

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/


Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!

2007-01-21 Thread Thibaut VARENE

On 1/11/07, Jarek Poplawski [EMAIL PROTECTED] wrote:


PS: alas I didn't even check compiling - I had no time to
find all compile dependencies of this driver
---

Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]
---

diff -Nurp linux-2.6.20-rc4-/drivers/net/mv643xx_eth.c linux-2.6.20-rc4/drivers/net/mv643xx_eth.c
--- linux-2.6.20-rc4-/drivers/net/mv643xx_eth.c 2006-12-18 08:57:52.0 +0100
+++ linux-2.6.20-rc4/drivers/net/mv643xx_eth.c  2007-01-11 08:55:34.0 +0100
@@ -312,8 +312,8 @@ int mv643xx_eth_free_tx_descs(struct net
 	int count;
 	int released = 0;
 
+	spin_lock_irqsave(&mp->lock, flags);
 	while (mp->tx_desc_count > 0) {
-		spin_lock_irqsave(&mp->lock, flags);
 		tx_index = mp->tx_used_desc_q;
 		desc = &mp->p_tx_desc_area[tx_index];
 		cmd_sts = desc->cmd_sts;
@@ -348,8 +348,10 @@ int mv643xx_eth_free_tx_descs(struct net
dev_kfree_skb_irq(skb);


Hmm, I think this is guaranteed not to work. In between those lines
the lock is released, while data in the mp structure is still being
accessed. It seems that this bit of code is indeed not race-safe
though, I'm gonna try to figure something out.


 		released = 1;
+		spin_lock_irqsave(&mp->lock, flags);
 	}
 
+	spin_unlock_irqrestore(&mp->lock, flags);
 	return released;
 }


Ugh, this is really unclean... Taking a lock for nothing like that
has a perf cost.

HTH

T-Bone

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/


Re: [PATCH] Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!

2007-01-21 Thread Thibaut VARENE

On 1/21/07, Thibaut VARENE [EMAIL PROTECTED] wrote:

On 1/11/07, Jarek Poplawski [EMAIL PROTECTED] wrote:

 PS: alas I didn't even check compiling - I had no time to
 find all compile dependencies of this driver
 ---
Hmm, I think this is guaranteed not to work. In between those lines
the lock is released, while data in the mp structure is still being
accessed. It seems that this bit of code is indeed not race-safe
though, I'm gonna try to figure something.


This was indeed the right spot. The attached raw hack seems to fix the
bug (I haven't been able to crash the box so far). I haven't checked whether
the same situation happens elsewhere in the code; I leave that as an
exercise for the maintainers (or until I experience another kind of
crash :)

The patch is a bit ugly (printk with IRQs disabled will not show, etc.)
but at least it does work. I'm sure somebody will figure out something better.

HTH

T-Bone

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
--- linux-2.6.19.orig/drivers/net/mv643xx_eth.c	2007-01-21 13:56:04.450689123 +0100
+++ linux-2.6.19/drivers/net/mv643xx_eth.c	2007-01-21 13:39:58.228404763 +0100
@@ -312,8 +312,8 @@
 	int count;
 	int released = 0;
 
+	spin_lock_irqsave(&mp->lock, flags);
 	while (mp->tx_desc_count > 0) {
-		spin_lock_irqsave(&mp->lock, flags);
 		tx_index = mp->tx_used_desc_q;
 		desc = &mp->p_tx_desc_area[tx_index];
 		cmd_sts = desc->cmd_sts;
@@ -332,8 +332,6 @@
 		if (skb)
 			mp->tx_skb[tx_index] = NULL;
 
-		spin_unlock_irqrestore(&mp->lock, flags);
-
 		if (cmd_sts & ETH_ERROR_SUMMARY) {
 			printk("%s: Error in TX\n", dev->name);
 			mp->stats.tx_errors++;
@@ -349,6 +347,7 @@
 
 		released = 1;
 	}
+	spin_unlock_irqrestore(&mp->lock, flags);
 
 	return released;
 }
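Thibaut's hack above takes mp->lock once before the loop and releases it after, so the ring indices, mp->tx_skb[] and mp->stats.tx_errors are never touched without it. The same shape in user space, as a sketch under these assumptions: a pthread mutex replaces spin_lock_irqsave()/spin_unlock_irqrestore(), free() replaces dev_kfree_skb_irq(), descriptor status handling is left out, and `struct ring`, `ring_free_all()` and `RING_SIZE` are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

#define RING_SIZE 16  /* arbitrary ring size for illustration */

struct ring {
	pthread_mutex_t lock;      /* stands in for mp->lock */
	void *skb[RING_SIZE];      /* stands in for mp->tx_skb[] */
	int used;                  /* stands in for mp->tx_used_desc_q */
	int count;                 /* stands in for mp->tx_desc_count */
};

/* Frees every outstanding buffer while holding the lock for the whole
 * loop. Returns 1 if anything was released, 0 otherwise (like the
 * driver's "released" flag). */
static int ring_free_all(struct ring *r)
{
	int released = 0;

	pthread_mutex_lock(&r->lock);
	while (r->count > 0) {
		int idx = r->used;
		void *buf = r->skb[idx];

		r->skb[idx] = NULL;
		r->used = (idx + 1) % RING_SIZE;
		r->count--;
		free(buf);          /* free(NULL) is a no-op */
		released = 1;
	}
	pthread_mutex_unlock(&r->lock);
	return released;
}
```

The cost is the one raised elsewhere in this thread: in the driver this lock also disables IRQs, so it stays that way for the whole loop; Dale's later patch instead takes and drops the lock once per freed descriptor.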


Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!

2007-01-10 Thread Thibaut VARENE

On 1/9/07, Thibaut VARENE [EMAIL PROTECTED] wrote:

On 1/9/07, Dale Farnsworth [EMAIL PROTECTED] wrote:

 Thank you Thibaut.  Please try the following patch:

 From: Dale Farnsworth [EMAIL PROTECTED]

 Reserve one unused descriptor in the TX ring
 to facilitate testing for when the ring is full.

Dale,

tried it and unfortunately:


Also, I don't know if you read that bit, but every time I reboot the
box immediately after a crash, the NIC gets a bogus MAC address (always
the same one, it seems), and I have to reboot one more time to get back
to the normal MAC address.

Dunno if that hints at anything, though.

HTH

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/


Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!

2007-01-09 Thread Thibaut VARENE

On 1/9/07, Jarek Poplawski [EMAIL PROTECTED] wrote:

On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
...
 I suspected both and changed both the disk and the ram for quality
 parts, that I tested afterwards. Both passed thorough tests.

You wrote about half an hour, so overheating was also
considered, I presume.


Yes, but since it works fine with the other NIC... :)


 Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
 works absolutely fine.

So it looks like the card/driver (or maybe this specimen?).


I'm suspecting the driver, but I'm not a specialist :)
It's true that this particular card specimen could be damaged even
though that seems a bit unlikely.

HTH

T-Bone

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/


Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!

2007-01-09 Thread Thibaut VARENE

On 1/9/07, Jarek Poplawski [EMAIL PROTECTED] wrote:

On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
...
 I suspected both and changed both the disk and the ram for quality
 parts, that I tested afterwards. Both passed thorough tests.

 Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
 works absolutely fine.

If you are not tired, I'd suggest two more tests:


I volunteered to help :)

For the sake of testing up-to-date code, I performed the following
tests with 2.6.20-rc4.

The first test was the usual NFS video playback. The crash dump is
panic-2.6.20-rc4-nfs.txt. The box went down in about 20 minutes.


- as above but with NIC set to 100Mbps also,


Couldn't crash the machine (or at least it didn't happen in the time
frame I was willing to wait while doing ftp downloads, ~20 minutes). One
note, though:

The throughput of the card was terribly sucky when set to 100-FD: I
couldn't get more than 5.5 MB/s doing an ftp get writing to /dev/null (to
rule out disk performance), i.e. about half the max link speed, even though
the /only/ thing I changed in the setup was the link speed (same switch - made
sure it properly detected link speed/duplex, same file server, same
everything else).

When configured in 1000-FD, still writing to /dev/null, I could get
about 60 MB/s. Again about half the link speed, but there I suppose the
remote fileserver couldn't pull data off its disks any faster :)
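A quick sanity check of the "half the link speed" figures above, assuming MB means 10^6 bytes and ignoring Ethernet, IP and TCP framing overhead (which puts the practical payload ceiling a few percent below the raw rate): 100 Mbit/s is 12.5 MB/s and 1000 Mbit/s is 125 MB/s, so 5.5 MB/s is 44% and 60 MB/s is 48% of the raw line rate. The helper name is made up; the code only spells out that arithmetic.

```c
/* Fraction of the raw line rate achieved, given a payload throughput in
 * MB/s (10^6 bytes/s) and a link speed in Mbit/s. Framing overhead is
 * ignored, so the real-world maximum is somewhat below 1.0. */
static double link_utilisation(double mbytes_per_s, double link_mbit_per_s)
{
	double link_mbytes_per_s = link_mbit_per_s / 8.0;  /* 8 bits per byte */

	return mbytes_per_s / link_mbytes_per_s;
}
```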


- long downloading but without nfs e.g. ftp


That was fast and easy. In 1000-FD, I took down the box in 2 s (after
downloading 90 MB). The crash dump is panic-2.6.20-rc4-ftp.txt


(btw. there were some patches after 2.6.19
for rpc memory races).


It seems that's something else. I think I also reproduced the bug
while surfing the internet with firefox, but I didn't have serial line
hooked to capture a dump, unfortunately.


PS: Maintainers were cc-ed, I hope?


Now they are :)

HTH

T-Bone

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
Debian GNU/Linux 4.0 Alucard ttyS0  

Alucard login: [ cut here ] 
kernel BUG at drivers/net/mv643xx_eth.c:1071!   
Oops: Exception in kernel mode, sig: 5 [#1] 
PREEMPT 
Modules linked in: eeprom sbp2 scsi_mod eth1394 uhci_hcd ohci1394 parport_pc pae
NIP: C0210B40 LR: C02126DC CTR: C0212620
REGS: da247ac0 TRAP: 0700   Not tainted  (2.6.20-rc4)   
MSR: 00021032 ME,IR,DR  CR: 28222488  XER:    
TASK = db82a050[1780] 'ncftp' THREAD: da246000  
GPR00:  DA247B70 DB82A050 CFB14260 CFB14000 000B DED5FD72   
GPR08: 0819 0001 1000 081A 48222422 10056CD0 28004422 C03D9BF8  
GPR16:    DA246000 0001 CFB142BC 9032   
GPR24:   C03E CFB14000 C0212620 DEDFD160 CFB14260 DED5FD40  
NIP [C0210B40] eth_alloc_tx_desc_index+0x44/0x50
LR [C02126DC] mv643xx_eth_start_xmit+0xbc/0x3b8 
Call Trace: 
[DA247B70] [DED5FD70] 0xded5fd70 (unreliable)   
[DA247BB0] [C029F258] dev_hard_start_xmit+0x1d4/0x2c8   
[DA247BD0] [C02A1BF4] dev_queue_xmit+0x2bc/0x334
[DA247BF0] [C02BC8A8] ip_output+0x120/0x244 
[DA247C10] [C02BD8DC] ip_queue_xmit+0x17c/0x408 
[DA247C80] [C02CEB1C] tcp_transmit_skb+0x358/0x7bc  
[DA247CC0] [C02CBF80] __tcp_ack_snd_check+0x64/0xbc 
[DA247CD0] [C02CDA94] tcp_rcv_established+0x5d4/0x980   
[DA247D00] [C02D4764] tcp_v4_do_rcv+0xe0/0x3c0  
[DA247D30] [C0294B58] release_sock+0x7c/0xf4
[DA247D50] [C02C5C1C] tcp_recvmsg+0x4c8/0xbcc   
[DA247DB0] [C0294490] sock_common_recvmsg+0x3c/0x60 
[DA247DD0] [C02920E4] sock_aio_read+0x10c/0x114 
[DA247E30] [C006F210] do_sync_read+0xc4/0x138   
[DA247EF0] [C006FECC] vfs_read+0x19c/0x1a4  
[DA247F10] [C00702E4] sys_read+0x4c/0x90
[DA247F40] [C00122EC] ret_from_syscall+0x0/0x38 
--- Exception: c01 at 0xff5ba98 
LR = 0x10032fc0 
Instruction dump

Re: kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!

2007-01-09 Thread Thibaut VARENE

On 1/9/07, Dale Farnsworth [EMAIL PROTECTED] wrote:

On Tue, Jan 09, 2007 at 06:44:49PM +0100, Thibaut VARENE wrote:
 On 1/9/07, Jarek Poplawski [EMAIL PROTECTED] wrote:
 On Tue, Jan 09, 2007 at 11:27:59AM +0100, Thibaut VARENE wrote:
 ...
  I suspected both and changed both the disk and the ram for quality
  parts, that I tested afterwards. Both passed thorough tests.
 
  Finally, using the other NIC on the box (a VIA Rhine II, 100Mbps),
  works absolutely fine.
 
 If you are not tired, I'd suggest two more tests:

 I volunteered to help :)

Thank you Thibaut.  Please try the following patch:

From: Dale Farnsworth [EMAIL PROTECTED]

Reserve one unused descriptor in the TX ring
to facilitate testing for when the ring is full.
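Dale's description refers to a common trick for rings indexed by a producer and a consumer position: if every slot may be filled, "full" and "empty" both look like curr == used, so one slot is left permanently unused and "full" becomes "the next producer index would equal the consumer index". The patch body is not quoted in this message, so this is a generic sketch of that trick rather than the driver's code; `struct desc_ring`, `ring_alloc()`, `ring_release()` and the ring size of 8 are hypothetical.

```c
#include <stdbool.h>

#define DESC_RING_SIZE 8   /* arbitrary size for illustration */

struct desc_ring {
	int curr;   /* next slot the producer will fill */
	int used;   /* oldest slot not yet released */
};

static bool ring_empty(const struct desc_ring *r)
{
	return r->curr == r->used;
}

/* Full when advancing curr would make it equal to used: one slot always
 * stays unused, so "full" and "empty" never look the same. */
static bool ring_full(const struct desc_ring *r)
{
	return (r->curr + 1) % DESC_RING_SIZE == r->used;
}

/* Returns the allocated slot index, or -1 if the ring is full. */
static int ring_alloc(struct desc_ring *r)
{
	int slot;

	if (ring_full(r))
		return -1;
	slot = r->curr;
	r->curr = (slot + 1) % DESC_RING_SIZE;
	return slot;
}

/* Returns the released slot index, or -1 if the ring is empty. */
static int ring_release(struct desc_ring *r)
{
	int slot;

	if (ring_empty(r))
		return -1;
	slot = r->used;
	r->used = (slot + 1) % DESC_RING_SIZE;
	return slot;
}
```

With 8 slots, at most 7 descriptors can be outstanding at once; that one-slot cost buys an unambiguous full test.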


Dale,

tried it and unfortunately:

Alucard login: [ cut here ]
kernel BUG at drivers/net/mv643xx_eth.c:1071!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT
Modules linked in: eeprom sbp2 scsi_mod eth1394 uhci_hcd vt8231 ohci1394 ieee13t
NIP: C0210B40 LR: C02126DC CTR: C0212620
REGS: dd2d7b40 TRAP: 0700   Not tainted  (2.6.20-rc4)
MSR: 00021032 ME,IR,DR  CR: 28242488  XER: 
TASK = da03c640[1775] 'ncftp' THREAD: dd2d6000
GPR00:  DD2D7BF0 DA03C640 CFB16260 CFB16000 000B DF79FDD2 
GPR08: 0BA9 0001 1000 0BAA 28242482 10056CD0 28004422 C03D9BF8
GPR16:    DD2D6000 0001 CFB162BC 9032 
GPR24: 05A8  C03E CFB16000 C0212620 CFCB3260 CFB16260 DF79FDA0
NIP [C0210B40] eth_alloc_tx_desc_index+0x44/0x50
LR [C02126DC] mv643xx_eth_start_xmit+0xbc/0x3b8
Call Trace:
[DD2D7BF0] [DF79FDD0] 0xdf79fdd0 (unreliable)
[DD2D7C30] [C029F258] dev_hard_start_xmit+0x1d4/0x2c8
[DD2D7C50] [C02A1BF4] dev_queue_xmit+0x2bc/0x334
[DD2D7C70] [C02BC8A8] ip_output+0x120/0x244
[DD2D7C90] [C02BD8DC] ip_queue_xmit+0x17c/0x408
[DD2D7D00] [C02CEB1C] tcp_transmit_skb+0x358/0x7bc
[DD2D7D40] [C02C2FC0] tcp_cleanup_rbuf+0xb8/0x158
[DD2D7D50] [C02C5C14] tcp_recvmsg+0x4c0/0xbcc
[DD2D7DB0] [C0294490] sock_common_recvmsg+0x3c/0x60
[DD2D7DD0] [C02920E4] sock_aio_read+0x10c/0x114
[DD2D7E30] [C006F210] do_sync_read+0xc4/0x138
[DD2D7EF0] [C006FECC] vfs_read+0x19c/0x1a4
[DD2D7F10] [C00702E4] sys_read+0x4c/0x90
[DD2D7F40] [C00122EC] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff5ba98
   LR = 0x10032fc0
Instruction dump:
5400fffe 0f00 81030020 81230024 39680001 7c0b53d6 7c0051d6 7d605850
7d694a78 91630020 7d290034 5529d97e 0f09 7d034378 4e800020 2f840001
<0>Kernel panic - not syncing: Fatal exception in interrupt
<0>Rebooting in 180 seconds..<4>atkbd.c: Spurious ACK on isa0060/serio0. Some .
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying access ha.


kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!

2007-01-05 Thread Thibaut VARENE

Hi,

I've been experiencing this bug on my Pegasos II (PPC G4 1GHz, 512M
RAM) box for a while: I can reliably kill my machine in about half an
hour while watching some video read from a remote NFS volume (hence
the mplayer task in the following dump). It was relatively hard to
get proper debug info, as the crash happens while video is playing on
the screen, but it's there anyway :)

This particular dump comes from kernel 2.6.19-ck2 but I reproduced the
bug with vanilla 2.6.19 too, so the bug lives in mainline. I'm not
really familiar with that particular code, but I'd gladly provide as
much debug info as I can.

The box is hooked to a gigabit switch and the NIC is configured as
gigabit too. Interestingly, when I reboot immediately after the crash,
the NIC gets a bogus MAC address, and I have to reboot again to get
back to normal.

HTH

T-Bone

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1069!
Oops: Exception in kernel mode, sig: 5 [#1] 
PREEMPT 
Modules linked in: nfs lockd sunrpc eeprom sbp2 scsi_mod eth1394 uhci_hcd ohci14
NIP: C020F0E0 LR: C0210C54 CTR: C0210B98
REGS: c7f6f670 TRAP: 0700   Not tainted  (2.6.19-ck2)   
MSR: 00021032 ME,IR,DR  CR: 24022488  XER:    
TASK = c49a8d10[2227] 'mplayer' THREAD: c7f6e000
GPR00:  C7F6F720 C49A8D10 DFF41260 DFF41000 000B CE0CF932   
GPR08: 0CEA 0001 1000 0CEB 44022422 1085F9B8 C50B0368 B241  
GPR16: C7F6FD28 B240  DFF412DC C038 9032 0400 C7F6E000  
GPR24:   DFF41000 C7F6E000 C0210B98 CE0EAC80 DFF41260 CE0CF900  
NIP [C020F0E0] eth_alloc_tx_desc_index+0x44/0x50
LR [C0210C54] mv643xx_eth_start_xmit+0xbc/0x3b8 
Call Trace: 
[C7F6F720] [CE0CF930] 0xce0cf930 (unreliable)   
[C7F6F760] [C0299714] dev_hard_start_xmit+0x1d4/0x2c8   
[C7F6F780] [C029C0E0] dev_queue_xmit+0x2bc/0x334
[C7F6F7A0] [C02B6E1C] ip_output+0x124/0x248 
[C7F6F7C0] [C02B7E54] ip_queue_xmit+0x17c/0x404 
[C7F6F830] [C02C91BC] tcp_transmit_skb+0x38c/0x7dc  
[C7F6F860] [C02C65E4] __tcp_ack_snd_check+0x64/0xbc 
[C7F6F870] [C02C8100] tcp_rcv_established+0x5d4/0x980   
[C7F6F8A0] [C02CEDCC] tcp_v4_do_rcv+0xd8/0x3e4  
[C7F6F8D0] [C02D1610] tcp_v4_rcv+0x788/0x98c
[C7F6F900] [C02B2594] ip_local_deliver+0xe4/0x1a4   
[C7F6F920] [C02B2A50] ip_rcv+0x288/0x46c
[C7F6F950] [C0299308] netif_receive_skb+0x214/0x304 
[C7F6F980] [C0211CBC] mv643xx_poll+0x41c/0x48c  
[C7F6F9D0] [C029B550] net_rx_action+0x98/0x200  
[C7F6FA00] [C0026958] __do_softirq+0x80/0xf4
[C7F6FA30] [C0006930] do_softirq+0x58/0x5c  
[C7F6FA40] [C0026408] irq_exit+0x60/0x80
[C7F6FA50] [C00069DC] do_IRQ+0xa8/0xc8  
[C7F6FA60] [C0012498] ret_from_except+0x0/0x14  
--- Exception: 501 at __kmalloc+0x30/0xc0   
LR = rpc_malloc+0x48/0xac [sunrpc]  
[C7F6FB20] [C3D72508] 0xc3d72508 (unreliable)   
[C7F6FB30] [E2A88E18] rpc_malloc+0x48/0xac [sunrpc] 
[C7F6FB40] [E2A835F8] call_allocate+0x88/0x108 [sunrpc] 
[C7F6FB60] [E2A89554] __rpc_execute+0x94/0x248 [sunrpc] 
[C7F6FB80] [E2B0EEB0] nfs_execute_read+0x40/0x64 [nfs]  
[C7F6FBB0] [E2B0F6A4] nfs_pagein_one+0x2a0/0x300 [nfs]  
[C7F6FBF0] [E2B0FA9C] nfs_readpages+0x118/0x1f8 [nfs]   
[C7F6FC40] [C00521DC] __do_page_cache_readahead+0x1e8/0x318 
[C7F6FCD0] [C0052390] blockable_page_cache_readahead+0x84/0x114 
[C7F6FCF0] [C00524A4] make_ahead_window+0x84/0xd4   
[C7F6FD00] [C00525AC] page_cache_readahead+0xb8/0x220   
[C7F6FD20] [C004B00C] do_generic_mapping_read+0x574/0x5e8   
[C7F6FDC0] [C004D624] generic_file_aio_read+0x120/0x274 
[C7F6FE00] [E2B06F00] nfs_file_read

[BUG] in skge.c on 2.6.18-rc5

2006-08-30 Thread Thibaut VARENE

Hi,

The following commit:

commit 239e44e1f05e2163ee066c07a753f9fb445979b2
Author: Edgar E. Iglesias [EMAIL PROTECTED]
Date:   Mon Aug 14 23:00:24 2006 -0700

   [PATCH] skge: remember to run netif_poll_disable()

   Signed-off-by: Edgar E. Iglesias [EMAIL PROTECTED]
   Cc: Stephen Hemminger [EMAIL PROTECTED]
   Cc: Jeff Garzik [EMAIL PROTECTED]
   Signed-off-by: Andrew Morton [EMAIL PROTECTED]
   Signed-off-by: Jeff Garzik [EMAIL PROTECTED]

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index 7de9a07..ad878df 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -2211,6 +2211,7 @@ static int skge_up(struct net_device *de
   skge_write8(hw, Q_ADDR(rxqaddr[port], Q_CSR), CSR_START | CSR_IRQ_CL_F);
   skge_led(skge, LED_MODE_ON);

+   netif_poll_enable(dev);
   return 0;

 free_rx_ring:
@@ -2279,6 +2280,7 @@ static int skge_down(struct net_device *

   skge_led(skge, LED_MODE_OFF);

+   netif_poll_disable(dev);
   skge_tx_clean(skge);
   skge_rx_clean(skge);


panics my 2.6.18-rc5 kernel on my em64t box, apparently on the first
network activity (e.g. 'ping'). Reverting it gets me back to a
functional kernel.

HTH

T-Bone

[EMAIL PROTECTED]:~$ lspci | grep Ethernet
02:00.0 Ethernet controller: Marvell Technology Group Ltd. Unknown device 4364 (rev 12)
05:04.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)


--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/


Re: [BUG] in skge.c on 2.6.18-rc5

2006-08-30 Thread Thibaut VARENE

Replying to myself, as I've been pointed at Stephen's reply (please CC
me, I'm not subscribed):

I'm bringing the interface up with 'dhclient eth0', and yes it's using autoneg.

HTH

T-Bone

On 8/30/06, Thibaut VARENE [EMAIL PROTECTED] wrote:

Hi,

The following commit:

commit 239e44e1f05e2163ee066c07a753f9fb445979b2
Author: Edgar E. Iglesias [EMAIL PROTECTED]
Date:   Mon Aug 14 23:00:24 2006 -0700


--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/


Re: [BUG] in skge.c on 2.6.18-rc5

2006-08-30 Thread Thibaut VARENE

On 8/30/06, Stephen Hemminger [EMAIL PROTECTED] wrote:

On Wed, 30 Aug 2006 19:21:20 +0200
Thibaut VARENE [EMAIL PROTECTED] wrote:

 Replying to myself as I've been pointed at Stephen's reply (please CC
 me, i'm not subscribed):

 I'm bringing the interface up with 'dhclient eth0', and yes it's using autoneg.


Any chance of getting a backtrace; serial port, digital camera, handwritten note?


If you can deal with this extremely blurry shot:
http://www.pateam.org/archive/tmp/IMGP0825.JPG

It begins with mod_timer / neigh_update / read_lock and so on.

Worst case, I'll reproduce the bug again and capture a better backtrace, but
I'd rather avoid that as much as possible, as I use that machine a lot right
now ;P

HTH

T-Bone

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/


Re: [patch 2/9] tulip: NatSemi DP83840A PHY fix

2006-04-27 Thread Thibaut VARENE
On 4/27/06, Jeff Garzik [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] wrote:
  +		if (startup) {
  +			int timeout = 10;	/* max 1 ms */
  			for (i = 0; i < reset_length; i++)
  				iowrite32(get_u16(&reset_sequence[i]) << 16, ioaddr + CSR15);
  +
  +			/* flush posted writes */
  +			ioread32(ioaddr + CSR15);
  +
  +			/* Sect 3.10.3 in DP83840A.pdf (p39) */
  +			udelay(500);
  +
  +			/* Section 4.2 in DP83840A.pdf (p43) */
  +			/* and IEEE 802.3 22.2.4.1.1 Reset */
  +			while (timeout-- &&
  +				(tulip_mdio_read (dev, phy_num, MII_BMCR) & BMCR_RESET))
  +				udelay(100);


 What can we do about this?

 It's a huge delay to be taken inside a spinlock.

This is device setup code. ISTR Grant showing other similar examples
of delays in such code in the kernel. Unless you keep
configuring/deconfiguring the device, and assuming you hit the worst-case
scenario every time, it won't be a problem. But if you're doing that,
you already have a problem elsewhere. Or am I missing something?

 Anybody interested to converting the driver to use schedule_work() or
 similar?

That question was raised months ago without any significant
outcome. Maybe it's time to move on? This code does at least respect the
hardware specs, which isn't the case for the existing code, and it fixes a
bug...
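The reset wait in the quoted patch is a bounded poll: after the fixed udelay(500), it reads BMCR at most 10 times with udelay(100) after each read that still shows the reset bit, so the polling adds at most about 1 ms (hence the "max 1 ms" comment), or roughly 1.5 ms of busy-waiting in the worst case, not counting MDIO access time. The sketch below reproduces only the loop's control flow in user space. Assumptions: `read_bmcr`/`bmcr_reader` is a hypothetical callback standing in for tulip_mdio_read(), the delay is counted rather than performed, `fake_phy` is a made-up simulated PHY, and 0x8000 is BMCR bit 15 (IEEE 802.3 clause 22), the value BMCR_RESET has in linux/mii.h.

```c
#include <stdbool.h>

#define PHY_BMCR_RESET 0x8000  /* BMCR bit 15, same value as BMCR_RESET */

/* Hypothetical callback standing in for tulip_mdio_read(dev, phy, MII_BMCR). */
typedef int (*bmcr_reader)(void *ctx);

/* Mirrors "while (timeout-- && (read() & BMCR_RESET)) udelay(100);".
 * Returns true if the PHY cleared its reset bit in time; *delays_us is
 * the total time the driver loop would have spent in udelay(100). */
static bool wait_phy_reset(bmcr_reader read_bmcr, void *ctx, int *delays_us)
{
	int timeout = 10;   /* max 10 x 100 us = 1 ms */

	*delays_us = 0;
	while (timeout-- && (read_bmcr(ctx) & PHY_BMCR_RESET))
		*delays_us += 100;  /* stands in for udelay(100) */
	return timeout >= 0;    /* -1 only if all 10 reads saw the reset bit */
}

/* Simulated PHY for illustration: reports reset for the first N reads. */
struct fake_phy {
	int reads_until_clear;
	int reads;
};

static int fake_read_bmcr(void *ctx)
{
	struct fake_phy *p = ctx;

	return p->reads++ < p->reads_until_clear ? PHY_BMCR_RESET : 0;
}
```

Jeff's schedule_work() suggestion would instead move such a wait into a workqueue handler, which runs in process context where sleeping is allowed; that alternative is not sketched here.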

HTH

T-Bone

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/