Re: NFS stop/start problems (related to datagram shutdown bug?)

2001-02-06 Thread Byron Stanoszek

> There does seem to be a possible problem with sk_inuse not being
> updated atomically, so a race between an increment and a decrement
> could lose one of them.
> svc_sock_release seems to often be called with no more protection than
> the BKL, and it decrements sk_inuse.
>
> svc_sock_enqueue, on the other hand increments sk_inuse, and is
> protected by sv_lock, but not, I think, by the BKL, as it is called by
> a networking layer callback. So there might be a possibility for a
> race here.
>
> The attached patch might fix it, so if you are having reproducible
> problems, it might be worth applying this patch.
>
> NeilBrown

I applied the patch and the problem seems to have gone away, where it was
fairly reproducible beforehand. It waits a little longer (about 4 seconds)
during the NFS daemon shutdown before [  OK  ] pops up, but it could be my
imagination because I was doing it on the 166 and I was used to the 866's.

But what matters is that I can stop and restart NFS just fine now whereas
before I couldn't. Thanks for the patch.

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: NFS stop/start problems (related to datagram shutdown bug?)

2001-02-06 Thread Trond Myklebust

> " " == Neil Brown <[EMAIL PROTECTED]> writes:

 > The attached patch might fix it, so if you are having
 > reproducible problems, it might be worth applying this patch.

 > Trond: any comments?


 > +
 > +    spin_lock_bh(&serv->sv_lock);
 >      if (!--(svsk->sk_inuse) && svsk->sk_dead) {
 > +        spin_unlock_bh(&serv->sv_lock);
 >          dprintk("svc: releasing dead socket\n");
 >          sock_release(svsk->sk_sock);
 >          kfree(svsk);
 >      }
 > +    else
 > +        spin_unlock_bh(&serv->sv_lock);
 >  }
 
Looks correct, but there's a similar problem in svc_delete_socket()
(see the setting of sk_dead, and subsequent test for sk_inuse).

Cheers,
  Trond



Re: NFS stop/start problems (related to datagram shutdown bug?)

2001-02-05 Thread Neil Brown

On Monday February 5, [EMAIL PROTECTED] wrote:
> On Tue, 6 Feb 2001, Neil Brown wrote:
> 
> > How repeatable is this?  Is the server SMP?
> 
> I've tested this on two UP Athlons and 2 SMP Pentium 3's and the same problem
> occurred. I have not tested it more than once on the same system (I left the
> NFS servers untouched after the reboot).
> 
> The Athlon systems running NFS were 2.4.1-ac3 and the Pentiums were running
> 2.2.19-pre7. All computers exporting the FS had one directory mounted at least
> once.
> 
> In one case, only 1 directory was mounted once and then unmounted before
> shutting off the NFS server. When I realized I forgot to copy a directory over,
> I went to restart NFS on the server and found out I was unable to. Probably
> irrelevant, but this had been after transferring 7 gigs of data over 100 Mbps.
> 
> I still have the 'broken' server running, so if you would like me to run a
> command or two on it I can show you the results.

I don't think that there is much useful that I could look at, thanks.

> 
> > The attached patch might fix it, so if you are having reproducible
> > problems, it might be worth applying this patch.
> 
> I can try it tomorrow and see if it fixes the problem, but since this problem
> also occurred on a UP, using spin locks probably will not correct it. Perhaps
> it's something else.

On second thoughts, this doesn't need to be SMP related.  I don't know
much about "bottom halves" but I gather that they get run after an
interrupt has been handled and interrupts have been re-enabled, but
before the original process is rescheduled.  If this is the case, then
the "_bh" part of the "spin_lock_bh" (which does a local_bh_disable)
could be the bit that is important on a UP system.

NeilBrown


> 
> > [patch snipped]
> 
>  -Byron
> 
> -- 
> Byron Stanoszek Ph: (330) 644-3059
> Systems Programmer  Fax: (330) 644-8110
> Commercial Timesharing Inc. Email: [EMAIL PROTECTED]



Re: NFS stop/start problems (related to datagram shutdown bug?)

2001-02-05 Thread Byron Stanoszek

On Tue, 6 Feb 2001, Neil Brown wrote:

> How repeatable is this?  Is the server SMP?

I've tested this on two UP Athlons and 2 SMP Pentium 3's and the same problem
occurred. I have not tested it more than once on the same system (I left the
NFS servers untouched after the reboot).

The Athlon systems running NFS were 2.4.1-ac3 and the Pentiums were running
2.2.19-pre7. All computers exporting the FS had one directory mounted at least
once.

In one case, only 1 directory was mounted once and then unmounted before
shutting off the NFS server. When I realized I forgot to copy a directory over,
I went to restart NFS on the server and found out I was unable to. Probably
irrelevant, but this had been after transferring 7 gigs of data over 100 Mbps.

I still have the 'broken' server running, so if you would like me to run a
command or two on it I can show you the results.

> The attached patch might fix it, so if you are having reproducible
> problems, it might be worth applying this patch.

I can try it tomorrow and see if it fixes the problem, but since this problem
also occurred on a UP, using spin locks probably will not correct it. Perhaps
it's something else.

> [patch snipped]

 -Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: NFS stop/start problems (related to datagram shutdown bug?)

2001-02-05 Thread Neil Brown

On Monday February 5, [EMAIL PROTECTED] wrote:
> Seems recently, on both redhat 6.1 and 7.0 using kernel 2.4.1-ac3, I
> ran into this problem:
> 
> Stopping NFS says the following in the kernel logs:
> 
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> nfsd: terminating on signal 9
> svc: server socket destroy delayed
> 
> And restarting NFS has the following error message:
> 
> root:~> /etc/rc.d/init.d/nfs start
> Starting NFS services: [  OK  ]
> Starting NFS quotas:   [  OK  ]
> Starting NFS mountd:   [  OK  ]
> Starting NFS daemon: nfssvc: Address already in use
>    [FAILED]

How repeatable is this?  Is the server SMP?

There does seem to be a possible problem with sk_inuse not being
updated atomically, so a race between an increment and a decrement
could lose one of them.
svc_sock_release seems to often be called with no more protection than
the BKL, and it decrements sk_inuse.

svc_sock_enqueue, on the other hand increments sk_inuse, and is
protected by sv_lock, but not, I think, by the BKL, as it is called by
a networking layer callback.  So there might be a possibility for a
race here.
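
To illustrate the kind of lost update described here, the following is a minimal user-space sketch rather than kernel code. It replays the two unlucky interleavings by hand on a plain int. The struct fake_sock type and both function names are hypothetical, and the "releaser" and "enqueuer" labels only mirror the roles of svc_sock_release and svc_sock_enqueue.

```c
#include <assert.h>

/* Hypothetical stand-in for struct svc_sock; only the counter matters. */
struct fake_sock {
    int sk_inuse;
};

/*
 * The releaser loads the counter, the enqueuer increments it in the
 * gap, then the releaser stores its stale result: the increment is lost.
 */
int lost_increment(int start)
{
    struct fake_sock s = { .sk_inuse = start };
    int tmp;

    assert(start > 0);
    tmp = s.sk_inuse;       /* releaser: load */
    s.sk_inuse++;           /* enqueuer: runs between load and store */
    s.sk_inuse = tmp - 1;   /* releaser: store stale value */
    return s.sk_inuse;      /* start - 1, not start */
}

/*
 * Mirror image: the enqueuer is the one caught between load and store,
 * so the decrement is lost and the count stays one too high.
 */
int lost_decrement(int start)
{
    struct fake_sock s = { .sk_inuse = start };
    int tmp;

    assert(start > 0);
    tmp = s.sk_inuse;       /* enqueuer: load */
    s.sk_inuse--;           /* releaser: runs between load and store */
    s.sk_inuse = tmp + 1;   /* enqueuer: store stale value */
    return s.sk_inuse;      /* start + 1, not start */
}
```

In both cases one get and one put should leave the count unchanged. With a starting value of 1, lost_increment returns 0 and lost_decrement returns 2.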

The attached patch might fix it, so if you are having reproducible
problems, it might be worth applying this patch.

Trond: any comments?

NeilBrown

[ a better fix would be to make sk_inuse atomic_t ]

--- net/sunrpc/svcsock.c  2001/02/05 23:45:54 1.1
+++ net/sunrpc/svcsock.c  2001/02/05 23:48:12
@@ -211,16 +211,22 @@
 svc_sock_release(struct svc_rqst *rqstp)
 {
     struct svc_sock *svsk = rqstp->rq_sock;
+    struct svc_serv *serv = svsk->sk_server;
 
     if (!svsk)
         return;
     svc_release_skb(rqstp);
     rqstp->rq_sock = NULL;
+
+    spin_lock_bh(&serv->sv_lock);
     if (!--(svsk->sk_inuse) && svsk->sk_dead) {
+        spin_unlock_bh(&serv->sv_lock);
         dprintk("svc: releasing dead socket\n");
         sock_release(svsk->sk_sock);
         kfree(svsk);
     }
+    else
+        spin_unlock_bh(&serv->sv_lock);
 }
 
 /*
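
For comparison, here is a hedged sketch of what the atomic-counter alternative mentioned above might look like. It uses C11 <stdatomic.h> in user space as a stand-in for the kernel's atomic_t, and struct fake_sock, fake_sock_get and fake_sock_put are hypothetical names, not functions from svcsock.c. The idea is that the decrement and the test for zero happen as one indivisible step, so no increment can slip in between them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct svc_sock. */
struct fake_sock {
    atomic_int  sk_inuse;   /* reference count */
    atomic_bool sk_dead;    /* set once the socket has been deleted */
};

/* Take a reference. */
void fake_sock_get(struct fake_sock *s)
{
    atomic_fetch_add(&s->sk_inuse, 1);
}

/*
 * Drop a reference. Returns true when the caller dropped the last
 * reference to a dead socket and is therefore the one who must free it.
 */
bool fake_sock_put(struct fake_sock *s)
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    int prev = atomic_fetch_sub(&s->sk_inuse, 1);

    assert(prev > 0);       /* dropping an untaken reference is a bug */
    return prev == 1 && atomic_load(&s->sk_dead);
}
```

An atomic counter alone does not make the sk_dead test and the count test a single atomic step, so any code that sets sk_dead and then inspects sk_inuse would still need its own protection.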



Re: NFS stop/start problems (related to datagram shutdown bug?)

2001-02-05 Thread Byron Stanoszek

On Mon, 5 Feb 2001, Alan Cox wrote:

> > Seems recently, on both redhat 6.1 and 7.0 using kernel 2.4.1-ac3, I
> > ran into this problem:
> 
> Ok seen this in older 2.2 but not 2.4
> 
> > nfsd: terminating on signal 9
> > svc: server socket destroy delayed
> > 
> > And restarting NFS has the following error message:
> > Starting NFS mountd:   [  OK  ]
> > Starting NFS daemon: nfssvc: Address already in use
> >    [FAILED]
> 
> A socket got stuck. That's preventing you restarting it. The bug is whatever
> leak caused the svc: server socket destroy delayed case. 
> 
> Just for reference what network card ?

Both machines had a 3c905b-tx-nm card in them.

3c59x.c:LK1.1.12 06 Jan 2000  Donald Becker and others.
http://www.scyld.com/network/vortex.html $Revision: 1.102.2.46 $
See Documentation/networking/vortex.txt
eth0: 3Com PCI 3c905B Cyclone 100baseTx at 0x6100,  00:50:da:cd:c8:b9, IRQ 11
  product code 'XC' rev 00.13 date 12-29-99
  8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
  MII transceiver found at address 24, status 786d.
  Enabling bus-master transmits and whole-frame receives.

-Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]




Re: NFS stop/start problems (related to datagram shutdown bug?)

2001-02-05 Thread Alan Cox

> Seems recently, on both redhat 6.1 and 7.0 using kernel 2.4.1-ac3, I
> ran into this problem:

Ok seen this in older 2.2 but not 2.4

> nfsd: terminating on signal 9
> svc: server socket destroy delayed
> 
> And restarting NFS has the following error message:
> Starting NFS mountd:   [  OK  ]
> Starting NFS daemon: nfssvc: Address already in use
>    [FAILED]

A socket got stuck. That's preventing you restarting it. The bug is whatever
leak caused the svc: server socket destroy delayed case. 

Just for reference what network card ?




NFS stop/start problems (related to datagram shutdown bug?)

2001-02-05 Thread Byron Stanoszek

Seems recently, on both redhat 6.1 and 7.0 using kernel 2.4.1-ac3, I
ran into this problem:

Stopping NFS says the following in the kernel logs:

nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
nfsd: terminating on signal 9
svc: server socket destroy delayed

And restarting NFS has the following error message:

root:~> /etc/rc.d/init.d/nfs start
Starting NFS services: [  OK  ]
Starting NFS quotas:   [  OK  ]
Starting NFS mountd:   [  OK  ]
Starting NFS daemon: nfssvc: Address already in use
   [FAILED]

From that moment forward, the NFS server is completely broken until the system
is rebooted, and other machines respond during a 'mount' by saying,

nfs: server xxx not responding, still trying

When I tried this, the remote computer had unmounted this NFS-served partition
prior to shutting NFS down with '/etc/rc.d/init.d/nfs stop'. I was wondering if
this could be related to that datagram shutdown bug, and maybe if there's a
quick solution in the meantime to kill the socket so that I can restart NFS
without rebooting.

Thanks,
 Byron

-- 
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]



