Hi all,

It seems that unionfs is prone to deadlock when a process reads one file while
one of its threads concurrently mmaps/munmaps another file in the same
directory.

Observations: when the process and the thread work on files in the same
directory, unionfs_read() in process context acquires the mutex on the parent
dentry. Meanwhile the thread mmaps/munmaps its file: sys_mmap()/sys_munmap()
locks mm->mmap_sem in thread context, and unionfs_mmap() then waits for the
parent dentry mutex to be released by the process. When a page fault happens
in process context during the read, do_page_fault() waits for mm->mmap_sem to
be released. So the process waits for the thread to release mm->mmap_sem,
while the thread waits for the process to release the parent dentry mutex.

Steps to reproduce: I was able to reproduce it on a 4-core machine on the
2.6.27 branch, but I suspect it is also reproducible on later releases. I
wrote a simple application (test2.c, attached) for it:
$ gcc -W -Wall -pthread -o test2 test2.c
The deadlock might not happen right away, so run it many times:
$ iteration=1; while true; do ./test2 test.out test2.out && echo "$iteration passed"; ((iteration++)); done
I used two fairly large files: test.out (~1G) and test2.out (~10M).

Here's an excerpt from dmesg:
[...]
[1142663.186925] Show Blocked State
[1142663.187698] task taskaddr stack pid father
[1142663.187698] test2 D f3aba530 0 31221 18005
[1142663.187698] e9d1dbc8 00200082 00000002 e9d1dbb0 e9d1dbb8 00000000
e9d1dc9c e9d1dbb0
[1142663.187698] e9d1dbb8 00000003 000001db c5494a00 f3aba690 00000003
c05a8000 c06cca00
[1142663.187698] c05a9080 f3aba690 c0704f00 00000000 f3aba530 f3aba5d0
f3aba920 e9d1dbbc
[1142663.187698] Call Trace:
[1142663.187698] [<c0404798>] rwsem_down_failed_common+0x88/0x17d
[1142663.187698] [<c02136ff>] ? search_by_key+0x167/0x140c
[1142663.187698] [<c0114ef8>] ? do_page_fault+0x0/0x8f8
[1142663.187698] [<c04048cf>] rwsem_down_read_failed+0x1d/0x26
[1142663.187698] [<c0404913>] call_rwsem_down_read_failed+0x7/0xc
[1142663.187698] [<c0403dbf>] ? down_read+0x6d/0x7e
[1142663.187698] [<c0114fb7>] ? do_page_fault+0xbf/0x8f8
[1142663.187698] [<c0114fb7>] do_page_fault+0xbf/0x8f8
[1142663.187698] [<c01760c7>] ? __lock_acquire+0x2d5/0x939
[1142663.187698] [<c01760c7>] ? __lock_acquire+0x2d5/0x939
[1142663.187698] [<c0114ef8>] ? do_page_fault+0x0/0x8f8
[1142663.187698] [<c0405312>] error_code+0x72/0x78
[1142663.187698] [<c0188606>] ? file_read_actor+0x3c/0xc4
[1142663.187698] [<c018af6f>] generic_file_aio_read+0x3b6/0x647
[1142663.187698] [<c01abb41>] do_sync_read+0xbb/0xeb
[1142663.187698] [<c01361da>] ? autoremove_wake_function+0x0/0x36
[1142663.187698] [<c04037a0>] ? mutex_lock_nested+0x165/0x221
[1142663.187698] [<c01ac5a1>] vfs_read+0x89/0x127
[1142663.187698] [<c01aba86>] ? do_sync_read+0x0/0xeb
[1142663.187698] [<f8e8e8f9>] unionfs_read+0xe7/0x18e [unionfs]
[1142663.187698] [<c01ac5a1>] vfs_read+0x89/0x127
[1142663.187698] [<f8e8e812>] ? unionfs_read+0x0/0x18e [unionfs]
[1142663.187698] [<c01acb16>] ? fget_light+0x40/0xbc
[1142663.187698] [<c01ac728>] sys_read+0x3d/0xa4
[1142663.187698] [<c0103062>] syscall_call+0x7/0xb
[1142663.187698] =======================
[1142663.187698] test2 D f3abb120 0 31222 18005
[1142663.187698] e9d13e78 00200086 00000002 e9d13e60 e9d13e68 00000000
f3abb4e0 e9d13e60
[1142663.187698] e9d13e68 00000001 c01760c7 c5242a00 f3abb280 00000001
c05a8000 c06cca00
[1142663.187698] c05a9080 f3abb280 c0704e00 00000000 f3abb120 f3abb120
00000002 f3abb170
[1142663.187698] Call Trace:
[1142663.187698] [<c01760c7>] ? __lock_acquire+0x2d5/0x939
[1142663.187698] [<c0403728>] mutex_lock_nested+0xed/0x221
[1142663.187698] [<f8e8e446>] ? unionfs_mmap+0x9c/0x299 [unionfs]
[1142663.187698] [<f8e8e446>] ? unionfs_mmap+0x9c/0x299 [unionfs]
[1142663.187698] [<f8e8e446>] unionfs_mmap+0x9c/0x299 [unionfs]
[1142663.187698] [<c019ca44>] mmap_region+0x250/0x5e2
[1142663.187698] [<c01760c7>] ? __lock_acquire+0x2d5/0x939
[1142663.187698] [<c019cfbe>] do_mmap_pgoff+0x1e8/0x2d1
[1142663.187698] [<c0106d4b>] sys_mmap2+0x9d/0xb0
[1142663.187698] [<c0103062>] syscall_call+0x7/0xb
[1142663.187698] =======================
[1142726.316398] Show Locks Held
[1142726.317156]
[1142726.317158] Showing all locks held in the system:
[1142726.317187] 1 lock held by agetty/8788:
[1142726.317191] #0: (&tty->atomic_read_lock){....}, at: [<c02a195f>]
read_chan+0x463/0x6a0
[1142726.317214] 1 lock held by agetty/8789:
[1142726.317218] #0: (&tty->atomic_read_lock){....}, at: [<c02a195f>]
read_chan+0x463/0x6a0
[1142726.317232] 1 lock held by agetty/8791:
[1142726.317236] #0: (&tty->atomic_read_lock){....}, at: [<c02a195f>]
read_chan+0x463/0x6a0
[1142726.317250] 1 lock held by agetty/8792:
[1142726.317255] #0: (&tty->atomic_read_lock){....}, at: [<c02a195f>]
read_chan+0x463/0x6a0
[1142726.317271] 1 lock held by agetty/8794:
[1142726.317275] #0: (&tty->atomic_read_lock){....}, at: [<c02a195f>]
read_chan+0x463/0x6a0
[1142726.317289] 1 lock held by agetty/8796:
[1142726.317293] #0: (&tty->atomic_read_lock){....}, at: [<c02a195f>]
read_chan+0x463/0x6a0
[1142726.317322] 1 lock held by bash/32037:
[1142726.317327] #0: (&tty->atomic_read_lock){....}, at: [<c02a195f>]
read_chan+0x463/0x6a0
[1142726.317344] 1 lock held by bash/2268:
[1142726.317348] #0: (&tty->atomic_read_lock){....}, at: [<c02a195f>]
read_chan+0x463/0x6a0
[1142726.317363] 1 lock held by bash/12432:
[1142726.317368] #0: (&tty->atomic_read_lock){....}, at: [<c02a195f>]
read_chan+0x463/0x6a0
[1142726.317384] 1 lock held by bash/13179:
[1142726.317388] #0: (&tty->atomic_read_lock){....}, at: [<c02a195f>]
read_chan+0x463/0x6a0
[1142726.317403] 1 lock held by bash/26331:
[1142726.317408] #0: (&tty->atomic_read_lock){....}, at: [<c02a195f>]
read_chan+0x463/0x6a0
[1142726.317429] 4 locks held by test2/31221:
[1142726.317434] #0: (&UNIONFS_SB(sb)->rwsem#2/1){....}, at: [<f8e8e852>]
unionfs_read+0x40/0x18e [unionfs]
[1142726.317459] #1: (&info->lock#4/2){....}, at: [<f8e8e8b1>]
unionfs_read+0x9f/0x18e [unionfs]
[1142726.317480] #2: (&info->lock#3/3){....}, at: [<f8e8e8be>]
unionfs_read+0xac/0x18e [unionfs]
[1142726.317501] #3: (&mm->mmap_sem){....}, at: [<c0114fb7>]
do_page_fault+0xbf/0x8f8
[1142726.317522] 3 locks held by test2/31222:
[1142726.317526] #0: (&mm->mmap_sem){....}, at: [<c0106d2c>]
sys_mmap2+0x7e/0xb0
[1142726.317542] #1: (&UNIONFS_SB(sb)->rwsem#2/1){....}, at: [<f8e8e3e7>]
unionfs_mmap+0x3d/0x299 [unionfs]
[1142726.317562] #2: (&info->lock#4/2){....}, at: [<f8e8e446>]
unionfs_mmap+0x9c/0x299 [unionfs]
[1142726.317584] 2 locks held by bash/31255:
[1142726.317588] #0: (sysrq_key_table_lock){....}, at: [<c02af869>]
__handle_sysrq+0x1b/0x111
[1142726.317603] #1: (tasklist_lock){....}, at: [<c0175367>]
debug_show_all_locks+0x36/0x17a
[1142726.317619]
[1142726.317623] =============================================

Currently I don't understand unionfs internals well enough to fix this on my
own, so I'd like to ask whether there is any way to overcome it.
AFAIK the parent is locked for later file revalidation. Is revalidation in
unionfs_release() really needed then? How can we avoid taking the parent
dentry lock in unionfs_mmap()?
Thanks!
--
regards,
Sergey
/* Test app to reproduce unionfs deadlock
 *
 * ./test2 file0 file1
 *
 * to read file0 in process, and mmap/munmap file1 in thread
 */
#include <pthread.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <assert.h>
#include <stdio.h>

/* parenthesized so the macros expand safely inside expressions */
#define SIZE_0 (300*1024*1024)
#define SIZE_1 (10*1024*1024)

/* thread: repeatedly mmap/munmap the second file */
void *thread_cb(void *arg)
{
    char *t_addr;
    int fd, i;

    fd = open((char *)arg, O_RDONLY);
    assert(fd != -1);
    for (i = 0; i < 1000000; ++i) {
        t_addr = mmap(NULL, SIZE_1, PROT_READ, MAP_PRIVATE, fd, 0);
        assert(t_addr != MAP_FAILED);
        munmap(t_addr, SIZE_1);
        if (i % 100000 == 0)
            fprintf(stderr, "%d mmap/munmap iterations passed\n", i);
    }
    close(fd);
    return NULL;
}

int main(int argc, char *argv[])
{
    char *major_file, *thread_file;
    char *addr;
    ssize_t nread;
    int ret, fd;
    pthread_t thread;

    if (argc != 3) {
        fprintf(stderr, "usage: %s file0 file1\n", argv[0]);
        return 1;
    }
    major_file = argv[1];
    thread_file = argv[2];

    /* untouched buffer: read() faults its pages in while copying */
    addr = malloc(SIZE_0);
    assert(addr != NULL);
    fd = open(major_file, O_RDONLY);
    assert(fd != -1);
    ret = pthread_create(&thread, NULL, thread_cb, (void *)thread_file);
    assert(ret == 0);
    fprintf(stderr, "thread created\n");
    nread = read(fd, addr, SIZE_0);
    assert(nread != -1);
    fprintf(stderr, "block read\n");
    ret = pthread_join(thread, NULL);
    assert(ret == 0);
    fprintf(stderr, "thread terminated\n");
    close(fd);
    free(addr);
    return 0;
}
_______________________________________________ unionfs mailing list: http://unionfs.filesystems.org/ [email protected] http://www.fsl.cs.sunysb.edu/mailman/listinfo/unionfs
