Re: Filesystem deadlock

1999-02-24 Thread Luoqi Chen
 Luoqi Chen said:
   
  Do you still have that piece of code? Does it handle the case that involves
  more than one process? For example, process 1 mmaps file B and reads file A
  into the mmapped region, while process 2 mmaps file A and reads file B; this
  could also result in a deadlock.
  
 It used to be part of the tree, but I seem to remember that it was removed
 (by those who understand the code :-)) soon after I left.  I will look for
 it, and see if it would help with the problem(s).
 
 -- 
 John  | Never try to teach a pig to sing,
 dy...@iquest.net  | it makes one look stupid
 jdy...@nc.com | and it irritates the pig.
 
I have some thoughts on how to solve this problem. A deadlock can occur
when you read into an mmapped region or write from an mmapped region, so a
solution to this problem must be able to handle both cases.

For the first case (read), as originally suggested by Tor Egge, we could
allow vm_fault's shared lock attempt to succeed even if a process is
already waiting for the exclusive lock. This is unlikely to create any
starvation problem.
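A minimal sketch of the first idea, assuming a toy userspace model of the vnode lock. The names struct vlock, vlock_try_shared() and vlock_release_shared() are hypothetical and are not the FreeBSD lockmgr interface; a false return stands for "the caller would have to sleep".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a shared/exclusive vnode lock (hypothetical, not lockmgr). */
struct vlock {
    int  shared_holders;     /* processes holding the lock shared */
    bool exclusive_held;     /* true while someone holds it exclusively */
    int  exclusive_waiters;  /* processes sleeping for the exclusive lock */
};

/*
 * Normal shared request: yield to queued exclusive waiters so writers
 * are not starved.  With bypass_waiters set (the vm_fault path in the
 * proposal) the request fails only if the lock is actually held
 * exclusively.  Returns false where the caller would sleep.
 */
bool vlock_try_shared(struct vlock *l, bool bypass_waiters)
{
    if (l->exclusive_held)
        return false;
    if (l->exclusive_waiters > 0 && !bypass_waiters)
        return false;
    l->shared_holders++;
    return true;
}

void vlock_release_shared(struct vlock *l)
{
    assert(l->shared_holders > 0);
    l->shared_holders--;
}
```

With read() holding the lock shared and a writer queued for the exclusive lock, an ordinary shared request fails (the fault would sleep behind the writer, which is itself waiting for read() to finish), while the bypassing request succeeds.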

For the second case (write), it's trickier if there are two processes
involved. My solution is not to use an exclusive lock for write, because
in most cases we don't need to lock the vnode exclusively; only disk block
allocation requires it. We could instead upgrade the lock before the block
allocation and downgrade it afterwards, so the process holds only a shared
lock while copying from the mmapped address, and deadlock can be avoided
just as in the first case.
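A sketch of the proposed write path, again as a hypothetical toy model rather than kernel code (enum lockstate, write_blocks() and the other names are made up). The lock is held shared throughout and upgraded only around the simulated block allocation, so every copy from the user buffer happens under a shared lock. The model ignores that a real upgrade has to wait for other shared holders to go away.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock states for a toy model; not kernel lockmgr flags. */
enum lockstate { ST_SHARED, ST_EXCLUSIVE };

struct wstats {
    int upgrades;          /* shared -> exclusive transitions */
    int copies_shared;     /* user copies done holding only a shared lock */
    int copies_exclusive;  /* user copies done holding the exclusive lock */
};

static void lock_upgrade(enum lockstate *s, struct wstats *st)
{
    assert(*s == ST_SHARED);
    *s = ST_EXCLUSIVE;
    st->upgrades++;
}

static void lock_downgrade(enum lockstate *s)
{
    assert(*s == ST_EXCLUSIVE);
    *s = ST_SHARED;
}

/* Stand-in for copying from the user buffer, which may be mmapped and fault. */
static void copy_from_user_buf(enum lockstate s, struct wstats *st)
{
    if (s == ST_SHARED)
        st->copies_shared++;
    else
        st->copies_exclusive++;
}

/* Write nblocks blocks; needs_alloc[i] says whether block i needs a new disk block. */
struct wstats write_blocks(const bool *needs_alloc, int nblocks)
{
    struct wstats st = { 0, 0, 0 };
    enum lockstate s = ST_SHARED;      /* the write starts with a shared lock */

    for (int i = 0; i < nblocks; i++) {
        if (needs_alloc[i]) {
            lock_upgrade(&s, &st);     /* exclusive only for the allocation */
            /* ... allocate the disk block here ... */
            lock_downgrade(&s);
        }
        copy_from_user_buf(s, &st);    /* always under the shared lock */
    }
    return st;
}
```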

Comments?

-lq


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-current in the body of the message



Re: Filesystem deadlock

1999-02-23 Thread Kevin Day
 On Mon, 22 Feb 1999, Alexander N. Kabaev wrote:
 
  The following script reliably causes FreeBSD 4.0-CURRENT (and 3.1-STABLE
  as of today) to lock up. Shortly after this script is started, all disk
  activity stops and any attempt to create a new process causes the system to
  freeze. While in DDB, the ps command shows that all ten fgrep processes are
  sleeping on inode, all xargs are in waitpid and all sh processes are in
  wait.
 
 You forgot about all the processes (just a few, actually) stuck in kmaw
 (kmem_alloc_wait). This is definitely reproducible :( It should be simple for
 someone more knowledgeable to diagnose, as it looks like a straight
 vm/vfs (ufs/ffs) interaction.


This is happening to me too, on a system built from the 19th's SNAP as well
as with today's kernel (except that I don't see anything in 'kmaw'). The
'swapper' process is stuck in 'inode', as is anything else that has tried to
touch the disk. Lots of 'sh's are sitting in 'wait'.

This machine is a heavy NFS client, but I'm not sure that it's related.


Kevin





Re: Filesystem deadlock

1999-02-23 Thread Valentin Nechayev
Alexander N. Kabaev wrote:

ANK The following script reliably causes FreeBSD 4.0-CURRENT (and 3.1-STABLE
ANK as of today) to lock up.

2.2.8 and 3.0-RELEASE are not vulnerable, by the way.

ANK Shortly after this script is started, all disk
ANK activity stops and any attempt to create a new process causes the system to
ANK freeze.

No, creating a new process is still possible, but no file can be opened.
Activity that only touches memory does not hang: top, for example, keeps
redrawing its list of active processes. Also, the command
'( export A=1 B=2 set )' works, so fork() works.

ANK While in DDB, the ps command shows that all ten fgrep processes are
ANK sleeping on inode, all xargs are in waitpid and
ANK all sh processes are in wait.

In the original tests, any process can get stuck in the 'inode' state when it
tries to open a file. For example, type 'ps' at another terminal and you will
see the shell stopped in the 'inode' state ;(

ANK #!/bin/sh
ANK for j in 1 2 3 4 5 6 7 8 9 10; do
ANK   echo -n $i $j
   ~~ ;(
ANK nohup sh -c 'while :; do find /usr -type f | xargs fgrep zukabuka;
ANK done' \
ANK   >/dev/null 2>&1 &
ANK echo
ANK done

-- --
Valentin Nechayev
ne...@lucky.net
II:LDXIII/MCMLXXII.CCC





Re: Filesystem deadlock

1999-02-23 Thread Luoqi Chen
 Luoqi Chen said:
   
  This seems to be the good old vnode deadlock during vm_fault() that has been
  reported a couple of times, and there's still no satisfactory solution to it.
  fgrep does something like this (don't ask me why):
  
  addr = mmap(0, len, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, offset);
  read(fd, addr, count);
  
  The read() syscall first locks the vnode, reads the data from disk, then
  copies the data to the buffer at addr. If addr is not in core, there'll be a
  page fault, and the fault handler vm_fault will try to lock the vnode pager
  backing the page at addr, which is already locked: deadlock. This deadlock
  then propagates all the way back to the root vnode and the whole system
  freezes.
 
 I believe that I had a pseudo-fix to that, and it might have been removed.
 (In non-multithreaded kernels, when having to do things like the above,
  allowing recursive locks under certain circumstances can solve the problem.
  The key is to avoid the case where it covers up real bugs.)
 
 
Do you still have that piece of code? Does it handle the case that involves
more than one process? For example, process 1 mmaps file B and reads file A
into the mmapped region, while process 2 mmaps file A and reads file B; this
could also result in a deadlock.
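The two-process case is a cycle in the wait-for graph. A hypothetical sketch, assuming each process holds exactly one vnode lock (the file it read()s) and waits for one other (the file backing its mmapped buffer); the vnode numbers, struct waitfor and deadlocked() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 8
#define NONE  (-1)

/*
 * Toy wait-for model.  holds[p] is the vnode whose lock process p holds
 * (taken by read()); wants[p] is the vnode it is trying to lock in the
 * page fault (the file backing the mmapped buffer), or NONE.
 */
struct waitfor {
    int nproc;
    int holds[NPROC];
    int wants[NPROC];
};

/* Which process holds vnode v, or NONE. */
static int holder_of(const struct waitfor *w, int v)
{
    for (int p = 0; p < w->nproc; p++)
        if (w->holds[p] == v)
            return p;
    return NONE;
}

/*
 * Follow "p waits for the holder of wants[p]" edges from start.  Seeing a
 * process twice means start is on, or waits on, a cycle: it never wakes.
 */
bool deadlocked(const struct waitfor *w, int start)
{
    assert(start >= 0 && start < w->nproc);
    bool seen[NPROC] = { false };
    int p = start;

    while (p != NONE && w->wants[p] != NONE) {
        if (seen[p])
            return true;
        seen[p] = true;
        p = holder_of(w, w->wants[p]);
    }
    return false;
}
```

With file A as vnode 0 and file B as vnode 1, process 1 holds 0 and wants 1 while process 2 holds 1 and wants 0, so deadlocked() reports a cycle from either side. The single-process fgrep case is the degenerate cycle in which a process wants the vnode it already holds.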

-lq





Re: Filesystem deadlock

1999-02-23 Thread John S. Dyson
Luoqi Chen said:
  
 Do you still have that piece of code? Does it handle the case that involves
 more than one process? For example, process 1 mmaps file B and reads file A
 into the mmapped region, while process 2 mmaps file A and reads file B; this
 could also result in a deadlock.
 
It used to be part of the tree, but I seem to remember that it was removed
(by those who understand the code :-)) soon after I left.  I will look for
it, and see if it would help with the problem(s).

-- 
John  | Never try to teach a pig to sing,
dy...@iquest.net  | it makes one look stupid
jdy...@nc.com | and it irritates the pig.





Filesystem deadlock

1999-02-22 Thread Alexander N. Kabaev
The following script reliably causes FreeBSD 4.0-CURRENT (and 3.1-STABLE
as of today) to lock up. Shortly after this script is started, all disk activity
stops and any attempt to create a new process causes the system to freeze.
While in DDB, the ps command shows that all ten fgrep processes are sleeping on
inode, all xargs are in waitpid and all sh processes are in wait.

Unfortunately, I cannot run a -g (debugging) kernel on my box at this time, so
the amount of useful information I can provide is rather limited :(

#!/bin/sh
for j in 1 2 3 4 5 6 7 8 9 10; do
  echo -n $i $j
nohup sh -c 'while :; do find /usr -type f | xargs fgrep zukabuka;
done' \
  >/dev/null 2>&1 &
echo
done








Re: Filesystem deadlock

1999-02-22 Thread Brian Feldman
On Mon, 22 Feb 1999, Alexander N. Kabaev wrote:

 The following script reliably causes FreeBSD 4.0-CURRENT (and 3.1-STABLE
 as of today) to lock up. Shortly after this script is started, all disk
 activity stops and any attempt to create a new process causes the system to
 freeze. While in DDB, the ps command shows that all ten fgrep processes are
 sleeping on inode, all xargs are in waitpid and all sh processes are in wait.

You forgot about all the processes (just a few, actually) stuck in kmaw
(kmem_alloc_wait). This is definitely reproducible :( It should be simple for
someone more knowledgeable to diagnose, as it looks like a straight
vm/vfs (ufs/ffs) interaction.

 

 Brian Feldman_ __  ___ ___ ___  
 gr...@unixhelp.org   _ __ ___ | _ ) __|   \ 
 http://www.freebsd.org/ _ __ ___  | _ \__ \ |) |
 FreeBSD: The Power to Serve!  _ __ ___  _ |___/___/___/ 






Re: Filesystem deadlock

1999-02-22 Thread Luoqi Chen
 On Mon, 22 Feb 1999, Alexander N. Kabaev wrote:
 
  The following script reliably causes FreeBSD 4.0-CURRENT (and 3.1-STABLE
  as of today) to lock up. Shortly after this script is started, all disk
  activity stops and any attempt to create a new process causes the system to
  freeze. While in DDB, the ps command shows that all ten fgrep processes are
  sleeping on inode, all xargs are in waitpid and all sh processes are in
  wait.
 
 You forgot about all the processes (just a few, actually) stuck in kmaw
 (kmem_alloc_wait). This is definitely reproducible :( It should be simple for
 someone more knowledgeable to diagnose, as it looks like a straight
 vm/vfs (ufs/ffs) interaction.
 
This seems to be the good old vnode deadlock during vm_fault() that has been
reported a couple of times, and there's still no satisfactory solution to it.
fgrep does something like this (don't ask me why):

addr = mmap(0, len, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, offset);
read(fd, addr, count);

The read() syscall first locks the vnode, reads the data from disk, then
copies the data to the buffer at addr. If addr is not in core, there'll be a
page fault, and the fault handler vm_fault will try to lock the vnode pager
backing the page at addr, which is already locked: deadlock. This deadlock
then propagates all the way back to the root vnode and the whole system
freezes.
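For reference, a userspace sketch of the access pattern described above, using only standard POSIX calls. The helper name mmap_then_read() is made up and the offset is fixed at 0. On a kernel without this bug it simply returns the number of bytes read; it only illustrates how read() ends up copying into, and faulting on, a page backed by the very file it is reading.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map the start of the file at path MAP_PRIVATE and writable, then read()
 * from the same descriptor into that mapping.  The mapping is not touched
 * before read(), so the copy inside read() faults on a page of the same
 * file.  Returns the number of bytes read, or -1 on error.
 */
ssize_t mmap_then_read(const char *path, size_t len)
{
    assert(len > 0);                 /* mmap() rejects zero-length mappings */

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Same call shape as in the mail: private, writable mapping of fd. */
    char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The file offset is still 0, so this reads the same bytes it maps. */
    ssize_t n = read(fd, addr, len);

    munmap(addr, len);
    close(fd);
    return n;
}
```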

-lq

  





Re: Filesystem deadlock

1999-02-22 Thread John S. Dyson
Luoqi Chen said:
  
 This seems to be the good old vnode deadlock during vm_fault() that has been
 reported a couple of times, and there's still no satisfactory solution to it.
 fgrep does something like this (don't ask me why):
 
   addr = mmap(0, len, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, offset);
   read(fd, addr, count);
 
 The read() syscall first locks the vnode, reads the data from disk, then
 copies the data to the buffer at addr. If addr is not in core, there'll be a
 page fault, and the fault handler vm_fault will try to lock the vnode pager
 backing the page at addr, which is already locked: deadlock. This deadlock
 then propagates all the way back to the root vnode and the whole system
 freezes.

I believe that I had a pseudo-fix to that, and it might have been removed.
(In non-multithreaded kernels, when having to do things like the above,
 allowing recursive locks under certain circumstances can solve the problem.
 The key is to avoid the case where it covers up real bugs.)
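A hypothetical sketch of the recursive-lock idea, modelling an exclusive lock with an owner and a recursion count. The allow_recurse flag stands in for the "certain circumstances", and all names are illustrative, not kernel interfaces. Recursion only helps when the owner re-enters its own lock, so it does not cover the two-process case raised elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define NOOWNER (-1)

/* Toy exclusive lock with an owner and a recursion count (hypothetical). */
struct rlock {
    int owner;   /* pid of the holder, or NOOWNER */
    int depth;   /* how many times the owner has acquired it */
};

/*
 * Try to acquire for pid.  allow_recurse is meant to be set only on the
 * narrow path discussed here (a fault taken while read() copies data), so
 * accidental re-locking elsewhere still shows up as a hang instead of
 * being covered up.  Returns false where the caller would sleep; for
 * pid == owner without allow_recurse, that sleep would never end.
 */
bool rlock_try(struct rlock *l, int pid, bool allow_recurse)
{
    if (l->owner == NOOWNER) {
        l->owner = pid;
        l->depth = 1;
        return true;
    }
    if (l->owner == pid && allow_recurse) {
        l->depth++;
        return true;
    }
    return false;
}

void rlock_release(struct rlock *l, int pid)
{
    assert(l->owner == pid && l->depth > 0);
    if (--l->depth == 0)
        l->owner = NOOWNER;
}
```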

-- 
John  | Never try to teach a pig to sing,
dy...@iquest.net  | it makes one look stupid
jdy...@nc.com | and it irritates the pig.

