nfsv3 lockd causes kernel oops

2000-09-19 Thread Trond Myklebust


Hi,

  Ever since the introduction of the new file locking code in 2.4.x,
NFS locking has been seriously broken. A preliminary look seems to
indicate that the problems might lie in the generic locking code: when
running the connectathon tests against another server I get crap like

  panic("Attempting to free lock with active wait queue")

This occurs regularly when running the various child process/parent
process locking tests (essentially those tests which activate the lock
blocking).

  This needs to be fixed before we release the final 2.4.0. I'll see
if I can't find time to look into it myself in the course of the week,
however I'm trying hard not to get too distracted from writing up my
thesis. It would therefore be nice if Willy and/or others familiar
with the file locking could take a look too.

  In the meantime, perhaps Ted could put this issue into his TODO
list?

Cheers,
  Trond


> " " == Christian Brandes <[EMAIL PROTECTED]> writes:

 > Hi!

 > I've been trying to setup an NFSv3 server that supports
 > nfs_lock.

 > So I installed RedHat 7.0beta (6.9.5) and built a 2.4.0-test7
 > kernel.  During boot up I noticed that the launching of
 > nfslockd did not succed, might be that I do not have the right
 > nfs-tools. I use the ones that are provided with the distrib
 > (nfs-utils-0.1.9.1-5).

 > I did try out what happens when I lock a file by nfs. I started
 > my little test program twice on a client machine:

 > #include  include  include  include
 > # include  include 
 > #include  include 

 > int main (void) {
 >   int fd; int rc; printf ("\n"); fd = open("./t1", O_RDWR); rc
 >   = flock(fd, LOCK_EX ); printf ("rc: %d\n", rc); sleep (30);
 >   printf ("exit\n");
 > }

 > And the result was a kernel Oops on the server:

 > locks_insert_block: removing duplicated lock (pid=59714
 > 0-9223372036854775807 type=1) Unable to handle kernel NULL
 > pointer dereference at virtual address 0004
 >  printing eip:
 > c013b88e *pde =  Oops: 0002 CPU: 0 EIP:
 > 0010:[locks_delete_block+14/48] EFLAGS: 00010286 eax: c586f86c
 > ebx:  ecx: c586f878 edx:  esi: c586f878 edi:
 > cfe1a9f0 ebp: cfe1a9f0 esp: c581def8 ds: 0018 es: 0018 ss: 0018
 > Process lockd (pid: 2368, stackpage=c581d000) Stack: c586f86c
 > c013b8e7 c586f86c c0224e60 e942   
 >7fff 0001 cfcbcc08 cfcbc81c c586f800 c013ce4d
 >cfe1a9f0 c586f86c c016d7cd cfe1a9f0 c586f86c c586f800
 >c0171c69 c5983400 c581df58 cfcbc7d0
 > Call Trace: [locks_insert_block+55/112] [tvecs+18936/114884]
 > [posix_block_lock+13/16] [nlmsvc_lock+685/720]
 > [nlm4svc_retrieve_args+169/240] [nlm4svc_proc_lock+149/224]
 > [svc_process+729/1248]
 >[lockd+423/576] [lockd+0/576] [kernel_thread+43/64]
 > Code: 89 5a 04 89 13 89 48 0c 89 48 10 8d 48 04 8b 59 04 8b 50
 > 04

 > I would be glad if you could tell me what went wrong and how I
 > could solve the problem.

 > Thanx a lot

 > Christian Brandes

 > --
 >   C H R I S T I A N B R A N D E S
 >  CAxOPEN product development technology GmbH
 > Sauerwiesen 2, D-67661 Kaiserslautern
 >  E-Mail: [EMAIL PROTECTED]
 > Tel.: +49 6301 70919 0 Fax: +49 6301 70919 59
 > --

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



nfsv3 lockd causes kernel oops

2000-09-19 Thread Trond Myklebust


Hi,

  Ever since the introduction of the new file locking code in 2.4.x,
NFS locking has been seriously broken. A preliminary look seems to
indicate that the problems might lie in the generic locking code: when
running the connectathon tests against another server I get crap like

  panic("Attempting to free lock with active wait queue")

This occurs regularly when running the various child process/parent
process locking tests (essentially those tests which activate the lock
blocking).

  This needs to be fixed before we release the final 2.4.0. I'll see
if I can't find time to look into it myself in the course of the week,
however I'm trying hard not to get too distracted from writing up my
thesis. It would therefore be nice if Willy and/or others familiar
with the file locking could take a look too.

  In the meantime, perhaps Ted could put this issue into his TODO
list?

Cheers,
  Trond


 " " == Christian Brandes [EMAIL PROTECTED] writes:

  Hi!

  I've been trying to setup an NFSv3 server that supports
  nfs_lock.

  So I installed RedHat 7.0beta (6.9.5) and built a 2.4.0-test7
  kernel.  During boot up I noticed that the launching of
  nfslockd did not succed, might be that I do not have the right
  nfs-tools. I use the ones that are provided with the distrib
  (nfs-utils-0.1.9.1-5).

  I did try out what happens when I lock a file by nfs. I started
  my little test program twice on a client machine:

  #include stdio.h include errno.h include stdlib.h include
  #sys/file.h include sys/types.h include sys/stat.h
  #include fcntl.h include unistd.h

  int main (void) {
int fd; int rc; printf ("\n"); fd = open("./t1", O_RDWR); rc
= flock(fd, LOCK_EX ); printf ("rc: %d\n", rc); sleep (30);
printf ("exit\n");
  }

  And the result was a kernel Oops on the server:

  locks_insert_block: removing duplicated lock (pid=59714
  0-9223372036854775807 type=1) Unable to handle kernel NULL
  pointer dereference at virtual address 0004
   printing eip:
  c013b88e *pde =  Oops: 0002 CPU: 0 EIP:
  0010:[locks_delete_block+14/48] EFLAGS: 00010286 eax: c586f86c
  ebx:  ecx: c586f878 edx:  esi: c586f878 edi:
  cfe1a9f0 ebp: cfe1a9f0 esp: c581def8 ds: 0018 es: 0018 ss: 0018
  Process lockd (pid: 2368, stackpage=c581d000) Stack: c586f86c
  c013b8e7 c586f86c c0224e60 e942   
 7fff 0001 cfcbcc08 cfcbc81c c586f800 c013ce4d
 cfe1a9f0 c586f86c c016d7cd cfe1a9f0 c586f86c c586f800
 c0171c69 c5983400 c581df58 cfcbc7d0
  Call Trace: [locks_insert_block+55/112] [tvecs+18936/114884]
  [posix_block_lock+13/16] [nlmsvc_lock+685/720]
  [nlm4svc_retrieve_args+169/240] [nlm4svc_proc_lock+149/224]
  [svc_process+729/1248]
 [lockd+423/576] [lockd+0/576] [kernel_thread+43/64]
  Code: 89 5a 04 89 13 89 48 0c 89 48 10 8d 48 04 8b 59 04 8b 50
  04

  I would be glad if you could tell me what went wrong and how I
  could solve the problem.

  Thanx a lot

  Christian Brandes

  --
C H R I S T I A N B R A N D E S
   CAxOPEN product development technology GmbH
  Sauerwiesen 2, D-67661 Kaiserslautern
   E-Mail: [EMAIL PROTECTED]
  Tel.: +49 6301 70919 0 Fax: +49 6301 70919 59
  --

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/