Re: Filesystem deadlock
> Luoqi Chen said:
> > Do you still have that piece of code? Does it handle the case that involves more than one process? For example, process 1 mmaps file B and reads file A into the mmapped region, while process 2 mmaps file A and reads file B; this could also result in a deadlock.
>
> It used to be part of the tree, but I seem to remember that it was removed (by those who understand the code :-)) soon after I left. I will look for it, and see if it would help with the problem(s).
>
> --
> John             | Never try to teach a pig to sing,
> dy...@iquest.net | it makes one look stupid
> jdy...@nc.com    | and it irritates the pig.

I have some thoughts on how to solve this problem. A deadlock can occur when you read into a mmapped region or write from a mmapped region, so a solution must be able to handle both cases.

For the first case (read), as originally suggested by Tor Egge, we could allow vm_fault's shared lock attempt to succeed even if there's already a process waiting for the exclusive lock. This is unlikely to create any starvation problem.

For the second case (write), it's trickier if there are two processes involved. My solution is not to use an exclusive lock for write, because in most cases we don't need to lock the vnode exclusively, except when disk block allocation is required. We could instead perform a lock upgrade before and a downgrade after the block allocation, so the process will only hold a shared lock when copying from the mmapped address, and thus deadlock can be avoided just as in the first case.

Comments?

-lq

To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-current in the body of the message
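The two locking rules proposed above can be sketched as a toy user-space model. This is only an illustration under stated assumptions: `struct vlock` and the `vlock_*` functions are hypothetical names invented here, they are not the FreeBSD lock manager API, and the model tracks lock state without doing any real blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a vnode lock; not kernel code. */
struct vlock {
    int  shared;        /* number of shared holders */
    bool exclusive;     /* true while one holder owns it exclusively */
    int  excl_waiters;  /* processes queued for the exclusive lock */
};

/*
 * Read case: a shared request is refused only while the lock is held
 * exclusively. Queued exclusive waiters are deliberately ignored, so a
 * fault taken by a process that already holds the lock shared can proceed.
 */
bool vlock_try_shared(struct vlock *l)
{
    if (l->exclusive)
        return false;
    l->shared++;
    return true;
}

void vlock_unlock_shared(struct vlock *l)
{
    assert(l->shared > 0);
    l->shared--;
}

/*
 * Write case: the writer holds the lock shared while copying from the
 * mmapped address and upgrades only around block allocation. In this
 * model the upgrade succeeds only for the sole shared holder.
 */
bool vlock_try_upgrade(struct vlock *l)
{
    assert(l->shared > 0 && !l->exclusive);
    if (l->shared != 1)
        return false;
    l->shared = 0;
    l->exclusive = true;
    return true;
}

/* Return to shared mode once block allocation is done. */
void vlock_downgrade(struct vlock *l)
{
    assert(l->exclusive);
    l->exclusive = false;
    l->shared = 1;
}
```

In this model a second shared request succeeds even when `excl_waiters` is nonzero, which is the behaviour suggested for vm_fault; what a real implementation should do when an upgrade fails is left open here, as it is in the proposal.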
Re: Filesystem deadlock
> On Mon, 22 Feb 1999, Alexander N. Kabaev wrote:
> > The following script reliably causes FreeBSD 4.0-CURRENT (and 3.1-STABLE as of today) to lock up. Shortly after this script is started, all disk activity stops and any attempt to create a new process causes the system to freeze. In DDB, the ps command shows that all ten fgrep processes are sleeping on inode, all xargs are in waitpid, and all sh processes are in wait.
>
> You forget about all the processes (just a few, actually) stuck in kmaw (kmem_alloc_wait). This is definitely reproducible :( Should be simple for someone more knowledgeable to diagnose, as it looks to be a straight vm/vfs (ufs/ffs) interaction.

This is happening to me too, with a system that was built from the 19th's SNAP, as well as with today's kernel (except I don't see anything in 'kmaw'). The process 'swapper' is stuck in 'inode', as is anything else that has tried to touch the disk. Lots of 'sh's are sitting in 'wait'.

This machine is a heavy NFS client, but I'm not sure that it's related.

Kevin
Re: Filesystem deadlock
Alexander N. Kabaev wrote:

ANK> The following script reliably causes FreeBSD 4.0-CURRENT (and 3.1-STABLE
ANK> as of today) to lock up.

2.2.8 and 3.0-RELEASE are not vulnerable, by the way.

ANK> Shortly after this script is started, all disk
ANK> activity stops and any attempt to create new process causes system to
ANK> freeze.

No, creating a new process is still possible, but no file can be opened. Activity that stays in memory does not hang: for example, top keeps redrawing its list of active processes. Also, the command '( export A=1 B=2 set )' works, that is, fork() works.

ANK> While in DDB, ps command shows, that all ten fgrep processes are
ANK> sleeping on inode, all xargs are in waitpid and
ANK> all sh processes are in wait.

In the original tests, any process can stop in the 'inode' state when it tries to open a file. For example, try typing 'ps' at another terminal and you can see the shell stopped in the 'inode' state ;(

ANK> #!/bin/sh
ANK> for j in 1 2 3 4 5 6 7 8 9 10; do
ANK> echo -n $i $j
             ~~ ;(
ANK> nohup sh -c 'while :; do find /usr -type f | xargs fgrep zukabuka;
ANK> done' \
ANK> > /dev/null 2>&1 &
ANK> echo
ANK> done

--
Valentin Nechayev
ne...@lucky.net
II:LDXIII/MCMLXXII.CCC
Re: Filesystem deadlock
> Luoqi Chen said:
> > This seems to be the good old vnode deadlock during vm_fault() that has been reported a couple of times, and there's still no satisfactory solution to it. fgrep does something like this (don't ask me why):
> >
> >     addr = mmap(0, len, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, offset);
> >     read(fd, addr, count);
> >
> > The read() syscall first locks the vnode, reads the data from disk, then copies the data to the buffer at addr. If addr is not in core, there will be a page fault, and the fault handler vm_fault will try to lock the vnode pager backing the page at addr, which is already locked: deadlock. This deadlock then propagates all the way back to the root vnode and the whole system freezes.
>
> I believe that I had a pseudo-fix to that, and it might have been removed. (In non-multithreaded kernels, when having to do things like the above, allowing recursive locks under certain circumstances can solve the problem. The key is to avoid the case where it covers up real bugs.)
>
> --
> John             | Never try to teach a pig to sing,
> dy...@iquest.net | it makes one look stupid
> jdy...@nc.com    | and it irritates the pig.

Do you still have that piece of code? Does it handle the case that involves more than one process? For example, process 1 mmaps file B and reads file A into the mmapped region, while process 2 mmaps file A and reads file B; this could also result in a deadlock.

-lq
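The two-process case can be pictured as a cycle in a wait-for graph: process 1 holds the lock on file A and waits for B, while process 2 holds B and waits for A. The following is a minimal sketch, assuming each process is blocked on at most one exclusively held lock; `has_deadlock`, `NO_ONE`, and the array encoding are hypothetical names chosen for this illustration, not anything from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_ONE (-1)  /* marks a process that is not blocked on anyone */

/*
 * waits_for[p] is the index of the process holding the lock that
 * process p is blocked on, or NO_ONE if p is runnable.
 * Returns true if following the chain from some process leads back
 * to that same process, i.e. the wait-for graph contains a cycle.
 */
bool has_deadlock(const int waits_for[], int nproc)
{
    for (int start = 0; start < nproc; start++) {
        int p = start;
        /* A cycle, if any, closes within nproc steps. */
        for (int steps = 0; steps < nproc && p != NO_ONE; steps++) {
            assert(p >= 0 && p < nproc);
            p = waits_for[p];
            if (p == start)
                return true;
        }
    }
    return false;
}
```

With this encoding the single-process fgrep case is the array `{0}` (the process waits on a lock it holds itself), and the two-process case from the question is `{1, 0}`; both are reported as deadlocks, while `{1, NO_ONE}` is not.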
Re: Filesystem deadlock
Luoqi Chen said:
> Do you still have that piece of code? Does it handle the case that involves more than one process? For example, process 1 mmaps file B and reads file A into the mmapped region, while process 2 mmaps file A and reads file B; this could also result in a deadlock.

It used to be part of the tree, but I seem to remember that it was removed (by those who understand the code :-)) soon after I left. I will look for it, and see if it would help with the problem(s).

--
John             | Never try to teach a pig to sing,
dy...@iquest.net | it makes one look stupid
jdy...@nc.com    | and it irritates the pig.
Filesystem deadlock
The following script reliably causes FreeBSD 4.0-CURRENT (and 3.1-STABLE as of today) to lock up. Shortly after this script is started, all disk activity stops and any attempt to create a new process causes the system to freeze. In DDB, the ps command shows that all ten fgrep processes are sleeping on inode, all xargs are in waitpid, and all sh processes are in wait.

Unfortunately, I cannot run a -g kernel on my box at this time, so the amount of useful information I can provide is pretty limited :(

    #!/bin/sh
    for j in 1 2 3 4 5 6 7 8 9 10; do
        echo -n $i $j
        nohup sh -c 'while :; do find /usr -type f | xargs fgrep zukabuka; done' \
            > /dev/null 2>&1 &
        echo
    done
Re: Filesystem deadlock
On Mon, 22 Feb 1999, Alexander N. Kabaev wrote:

> The following script reliably causes FreeBSD 4.0-CURRENT (and 3.1-STABLE as of today) to lock up. Shortly after this script is started, all disk activity stops and any attempt to create a new process causes the system to freeze. In DDB, the ps command shows that all ten fgrep processes are sleeping on inode, all xargs are in waitpid, and all sh processes are in wait.

You forget about all the processes (just a few, actually) stuck in kmaw (kmem_alloc_wait). This is definitely reproducible :( Should be simple for someone more knowledgeable to diagnose, as it looks to be a straight vm/vfs (ufs/ffs) interaction.

> Unfortunately, I cannot run a -g kernel on my box at this time, so the amount of useful information I can provide is pretty limited :(
>
>     #!/bin/sh
>     for j in 1 2 3 4 5 6 7 8 9 10; do
>         echo -n $i $j
>         nohup sh -c 'while :; do find /usr -type f | xargs fgrep zukabuka; done' \
>             > /dev/null 2>&1 &
>         echo
>     done

Brian Feldman                       _ __ ___ ___ ___
gr...@unixhelp.org             _ __ ___ | _ ) __|   \
http://www.freebsd.org/        _ __ ___ | _ \__ \ |) |
FreeBSD: The Power to Serve! _ __ ___ _ |___/___/___/
Re: Filesystem deadlock
> On Mon, 22 Feb 1999, Alexander N. Kabaev wrote:
> > The following script reliably causes FreeBSD 4.0-CURRENT (and 3.1-STABLE as of today) to lock up. Shortly after this script is started, all disk activity stops and any attempt to create a new process causes the system to freeze. In DDB, the ps command shows that all ten fgrep processes are sleeping on inode, all xargs are in waitpid, and all sh processes are in wait.
>
> You forget about all the processes (just a few, actually) stuck in kmaw (kmem_alloc_wait). This is definitely reproducible :( Should be simple for someone more knowledgeable to diagnose, as it looks to be a straight vm/vfs (ufs/ffs) interaction.

This seems to be the good old vnode deadlock during vm_fault() that has been reported a couple of times, and there's still no satisfactory solution to it. fgrep does something like this (don't ask me why):

    addr = mmap(0, len, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, offset);
    read(fd, addr, count);

The read() syscall first locks the vnode, reads the data from disk, then copies the data to the buffer at addr. If addr is not in core, there will be a page fault, and the fault handler vm_fault will try to lock the vnode pager backing the page at addr, which is already locked: deadlock. This deadlock then propagates all the way back to the root vnode and the whole system freezes.

-lq

> > Unfortunately, I cannot run a -g kernel on my box at this time, so the amount of useful information I can provide is pretty limited :(
> >
> >     #!/bin/sh
> >     for j in 1 2 3 4 5 6 7 8 9 10; do
> >         echo -n $i $j
> >         nohup sh -c 'while :; do find /usr -type f | xargs fgrep zukabuka; done' \
> >             > /dev/null 2>&1 &
> >         echo
> >     done
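For anyone who wants to exercise the access pattern described above from user space, here is a small self-contained sketch. It is not fgrep's actual code: the function name `read_into_own_mapping`, the temporary file template, and the 4096-byte size are assumptions made for this example. It maps a file MAP_PRIVATE and then read()s the same file into the untouched mapping, so the copy inside read() may have to fault in the destination page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_LEN 4096  /* hypothetical size chosen for the demo */

/*
 * Create a scratch file, map it privately, and read() the file into
 * its own mapping. Returns the byte count from read(), or -1 on error.
 */
ssize_t read_into_own_mapping(void)
{
    char path[] = "/tmp/mmapread.XXXXXX";  /* assumed scratch location */
    char fill[DEMO_LEN];
    ssize_t n = -1;
    void *addr;
    int fd;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* the file persists until fd is closed */

    memset(fill, 'x', sizeof(fill));
    if (write(fd, fill, sizeof(fill)) != (ssize_t)sizeof(fill))
        goto out;
    if (lseek(fd, 0, SEEK_SET) < 0)
        goto out;

    addr = mmap(NULL, DEMO_LEN, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED)
        goto out;

    /* The mapping has not been touched yet, so this copy may page-fault
     * on a page backed by the very file being read. */
    n = read(fd, addr, DEMO_LEN);
    if (n == DEMO_LEN)
        assert(memcmp(addr, fill, DEMO_LEN) == 0);

    munmap(addr, DEMO_LEN);
out:
    close(fd);
    return n;
}
```

On a kernel without the problem the call simply returns the number of bytes read; on an affected kernel the expectation from the description above is that the process would instead hang in the fault, though this sketch has not been checked against such a system.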
Re: Filesystem deadlock
Luoqi Chen said:
> This seems to be the good old vnode deadlock during vm_fault() that has been reported a couple of times, and there's still no satisfactory solution to it. fgrep does something like this (don't ask me why):
>
>     addr = mmap(0, len, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, offset);
>     read(fd, addr, count);
>
> The read() syscall first locks the vnode, reads the data from disk, then copies the data to the buffer at addr. If addr is not in core, there will be a page fault, and the fault handler vm_fault will try to lock the vnode pager backing the page at addr, which is already locked: deadlock. This deadlock then propagates all the way back to the root vnode and the whole system freezes.

I believe that I had a pseudo-fix to that, and it might have been removed. (In non-multithreaded kernels, when having to do things like the above, allowing recursive locks under certain circumstances can solve the problem. The key is to avoid the case where it covers up real bugs.)

--
John             | Never try to teach a pig to sing,
dy...@iquest.net | it makes one look stupid
jdy...@nc.com    | and it irritates the pig.
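The recursive-lock idea can be illustrated with a toy model. This is a sketch only, not the removed pseudo-fix (which is not shown anywhere in this thread): `struct rlock`, `rlock_try_acquire`, `rlock_release`, and the integer process ids are all hypothetical names for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical toy model of an exclusive lock that permits recursion. */
struct rlock {
    int owner;  /* id of the owning process, or NO_OWNER */
    int depth;  /* how many times the owner has acquired the lock */
};

/*
 * Acquire succeeds if the lock is free, or if the caller already owns
 * it (the recursive case, e.g. a fault taken while inside read()).
 */
bool rlock_try_acquire(struct rlock *l, int pid)
{
    if (l->owner == NO_OWNER) {
        l->owner = pid;
        l->depth = 1;
        return true;
    }
    if (l->owner == pid) {
        l->depth++;
        return true;
    }
    return false;
}

/* Each release undoes one acquire; the lock frees at depth zero. */
void rlock_release(struct rlock *l, int pid)
{
    assert(l->owner == pid && l->depth > 0);
    if (--l->depth == 0)
        l->owner = NO_OWNER;
}
```

A second acquire by the same process succeeds here, which is what lets the single-process case through. A request from a different process is still refused, so recursion alone does not address a cycle spanning two processes, and unconditional recursion is exactly the kind of thing that could hide a real locking bug.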