Hi all,

It seems that unionfs is prone to deadlock when a process reads one file while
a thread concurrently mmaps/munmaps another file in the same directory.

Observations: when a process and its thread work on files in the same
directory, unionfs_read() in process context acquires the mutex on the parent
dentry. Meanwhile the thread mmaps/munmaps a file: sys_mmap()/sys_munmap()
takes mm->mmap_sem in thread context and then waits for the parent dentry
mutex to be released by the process. When a page fault happens in process
context, do_page_fault() waits for mm->mmap_sem to be released. So the process
waits for the thread to release mm->mmap_sem, while the thread waits for the
process to release the parent dentry mutex.
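
To make the lock ordering above concrete, here is a minimal user-space sketch
of the same inversion. It is only an analogy: the two pthread mutexes and all
function names are hypothetical stand-ins (in the kernel mm->mmap_sem is an
rwsem, not a mutex), and trylock is used instead of a blocking lock so that
the program reports the conflict rather than hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical user-space stand-ins for the two kernel locks in the report. */
static pthread_mutex_t parent_dentry_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mmap_sem = PTHREAD_MUTEX_INITIALIZER;

/* Barriers keep both threads inside their critical sections at the same time. */
static pthread_barrier_t holding_first, tried_second;

/* Take `first`, wait until the peer holds its own first lock, then probe
 * `second` with trylock so the demo reports the conflict instead of hanging. */
static int take_in_order(pthread_mutex_t *first, pthread_mutex_t *second)
{
	int rc;

	pthread_mutex_lock(first);
	pthread_barrier_wait(&holding_first);
	rc = pthread_mutex_trylock(second);
	pthread_barrier_wait(&tried_second);
	if (rc == 0)
		pthread_mutex_unlock(second);
	pthread_mutex_unlock(first);
	return rc;
}

/* Models the reading process: parent dentry first, then mmap_sem (page fault). */
static void *reader(void *result)
{
	*(int *)result = take_in_order(&parent_dentry_lock, &mmap_sem);
	return NULL;
}

/* Models the mmap thread: mmap_sem first, then parent dentry (unionfs_mmap). */
static void *mapper(void *result)
{
	*(int *)result = take_in_order(&mmap_sem, &parent_dentry_lock);
	return NULL;
}

/* Returns 1 if both sides found their second lock busy, i.e. blocking
 * lock calls in the same places would have deadlocked. */
int lock_inversion_demo(void)
{
	pthread_t r, m;
	int reader_rc = 0, mapper_rc = 0, ret;

	pthread_barrier_init(&holding_first, NULL, 2);
	pthread_barrier_init(&tried_second, NULL, 2);
	ret = pthread_create(&r, NULL, reader, &reader_rc);
	assert(ret == 0);
	ret = pthread_create(&m, NULL, mapper, &mapper_rc);
	assert(ret == 0);
	pthread_join(r, NULL);
	pthread_join(m, NULL);
	pthread_barrier_destroy(&holding_first);
	pthread_barrier_destroy(&tried_second);
	return reader_rc == EBUSY && mapper_rc == EBUSY;
}
```

Calling lock_inversion_demo() from a small main() and linking with -lpthread
should show both sides failing with EBUSY, which is the point where the real
code paths block forever.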

Steps to reproduce: I was able to reproduce it on a 4-core machine with the
2.6.27 branch, but I suspect that it is also reproducible on later releases.
I wrote a simple application (attached test2.c) for it:

$ gcc -W -Wall -o test2 test2.c -lpthread

The deadlock might not occur right away, so run the test many times:

$ iteration=1; while true; do ./test2 test.out test2.out && echo "$iteration passed"; ((iteration++)); done

I used 2 quite large files: test.out (~1G) and test2.out (~10M).

Here's an excerpt from dmesg:
[...]
[1142663.186925] Show Blocked State
[1142663.187698]   task          taskaddr stack   pid father
[1142663.187698] test2         D f3aba530     0 31221  18005
[1142663.187698]        e9d1dbc8 00200082 00000002 e9d1dbb0 e9d1dbb8 00000000 
e9d1dc9c e9d1dbb0 
[1142663.187698]        e9d1dbb8 00000003 000001db c5494a00 f3aba690 00000003 
c05a8000 c06cca00 
[1142663.187698]        c05a9080 f3aba690 c0704f00 00000000 f3aba530 f3aba5d0 
f3aba920 e9d1dbbc 
[1142663.187698]  Call Trace:
[1142663.187698]  [<c0404798>] rwsem_down_failed_common+0x88/0x17d
[1142663.187698]  [<c02136ff>] ? search_by_key+0x167/0x140c
[1142663.187698]  [<c0114ef8>] ? do_page_fault+0x0/0x8f8
[1142663.187698]  [<c04048cf>] rwsem_down_read_failed+0x1d/0x26
[1142663.187698]  [<c0404913>] call_rwsem_down_read_failed+0x7/0xc
[1142663.187698]  [<c0403dbf>] ? down_read+0x6d/0x7e
[1142663.187698]  [<c0114fb7>] ? do_page_fault+0xbf/0x8f8
[1142663.187698]  [<c0114fb7>] do_page_fault+0xbf/0x8f8
[1142663.187698]  [<c01760c7>] ? __lock_acquire+0x2d5/0x939
[1142663.187698]  [<c01760c7>] ? __lock_acquire+0x2d5/0x939
[1142663.187698]  [<c0114ef8>] ? do_page_fault+0x0/0x8f8
[1142663.187698]  [<c0405312>] error_code+0x72/0x78
[1142663.187698]  [<c0188606>] ? file_read_actor+0x3c/0xc4
[1142663.187698]  [<c018af6f>] generic_file_aio_read+0x3b6/0x647
[1142663.187698]  [<c01abb41>] do_sync_read+0xbb/0xeb
[1142663.187698]  [<c01361da>] ? autoremove_wake_function+0x0/0x36
[1142663.187698]  [<c04037a0>] ? mutex_lock_nested+0x165/0x221
[1142663.187698]  [<c01ac5a1>] vfs_read+0x89/0x127
[1142663.187698]  [<c01aba86>] ? do_sync_read+0x0/0xeb
[1142663.187698]  [<f8e8e8f9>] unionfs_read+0xe7/0x18e [unionfs]
[1142663.187698]  [<c01ac5a1>] vfs_read+0x89/0x127
[1142663.187698]  [<f8e8e812>] ? unionfs_read+0x0/0x18e [unionfs]
[1142663.187698]  [<c01acb16>] ? fget_light+0x40/0xbc
[1142663.187698]  [<c01ac728>] sys_read+0x3d/0xa4
[1142663.187698]  [<c0103062>] syscall_call+0x7/0xb
[1142663.187698]  =======================
[1142663.187698] test2         D f3abb120     0 31222  18005
[1142663.187698]        e9d13e78 00200086 00000002 e9d13e60 e9d13e68 00000000 
f3abb4e0 e9d13e60 
[1142663.187698]        e9d13e68 00000001 c01760c7 c5242a00 f3abb280 00000001 
c05a8000 c06cca00 
[1142663.187698]        c05a9080 f3abb280 c0704e00 00000000 f3abb120 f3abb120 
00000002 f3abb170 
[1142663.187698]  Call Trace:
[1142663.187698]  [<c01760c7>] ? __lock_acquire+0x2d5/0x939
[1142663.187698]  [<c0403728>] mutex_lock_nested+0xed/0x221
[1142663.187698]  [<f8e8e446>] ? unionfs_mmap+0x9c/0x299 [unionfs]
[1142663.187698]  [<f8e8e446>] ? unionfs_mmap+0x9c/0x299 [unionfs]
[1142663.187698]  [<f8e8e446>] unionfs_mmap+0x9c/0x299 [unionfs]
[1142663.187698]  [<c019ca44>] mmap_region+0x250/0x5e2
[1142663.187698]  [<c01760c7>] ? __lock_acquire+0x2d5/0x939
[1142663.187698]  [<c019cfbe>] do_mmap_pgoff+0x1e8/0x2d1
[1142663.187698]  [<c0106d4b>] sys_mmap2+0x9d/0xb0
[1142663.187698]  [<c0103062>] syscall_call+0x7/0xb
[1142663.187698]  =======================
[1142726.316398] Show Locks Held
[1142726.317156] 
[1142726.317158] Showing all locks held in the system:
[1142726.317187] 1 lock held by agetty/8788:
[1142726.317191]  #0:  (&tty->atomic_read_lock){....}, at: [<c02a195f>] 
read_chan+0x463/0x6a0
[1142726.317214] 1 lock held by agetty/8789:
[1142726.317218]  #0:  (&tty->atomic_read_lock){....}, at: [<c02a195f>] 
read_chan+0x463/0x6a0
[1142726.317232] 1 lock held by agetty/8791:
[1142726.317236]  #0:  (&tty->atomic_read_lock){....}, at: [<c02a195f>] 
read_chan+0x463/0x6a0
[1142726.317250] 1 lock held by agetty/8792:
[1142726.317255]  #0:  (&tty->atomic_read_lock){....}, at: [<c02a195f>] 
read_chan+0x463/0x6a0
[1142726.317271] 1 lock held by agetty/8794:
[1142726.317275]  #0:  (&tty->atomic_read_lock){....}, at: [<c02a195f>] 
read_chan+0x463/0x6a0
[1142726.317289] 1 lock held by agetty/8796:
[1142726.317293]  #0:  (&tty->atomic_read_lock){....}, at: [<c02a195f>] 
read_chan+0x463/0x6a0
[1142726.317322] 1 lock held by bash/32037:
[1142726.317327]  #0:  (&tty->atomic_read_lock){....}, at: [<c02a195f>] 
read_chan+0x463/0x6a0
[1142726.317344] 1 lock held by bash/2268:
[1142726.317348]  #0:  (&tty->atomic_read_lock){....}, at: [<c02a195f>] 
read_chan+0x463/0x6a0
[1142726.317363] 1 lock held by bash/12432:
[1142726.317368]  #0:  (&tty->atomic_read_lock){....}, at: [<c02a195f>] 
read_chan+0x463/0x6a0
[1142726.317384] 1 lock held by bash/13179:
[1142726.317388]  #0:  (&tty->atomic_read_lock){....}, at: [<c02a195f>] 
read_chan+0x463/0x6a0
[1142726.317403] 1 lock held by bash/26331:
[1142726.317408]  #0:  (&tty->atomic_read_lock){....}, at: [<c02a195f>] 
read_chan+0x463/0x6a0
[1142726.317429] 4 locks held by test2/31221:
[1142726.317434]  #0:  (&UNIONFS_SB(sb)->rwsem#2/1){....}, at: [<f8e8e852>] 
unionfs_read+0x40/0x18e [unionfs]
[1142726.317459]  #1:  (&info->lock#4/2){....}, at: [<f8e8e8b1>] 
unionfs_read+0x9f/0x18e [unionfs]
[1142726.317480]  #2:  (&info->lock#3/3){....}, at: [<f8e8e8be>] 
unionfs_read+0xac/0x18e [unionfs]
[1142726.317501]  #3:  (&mm->mmap_sem){....}, at: [<c0114fb7>] 
do_page_fault+0xbf/0x8f8
[1142726.317522] 3 locks held by test2/31222:
[1142726.317526]  #0:  (&mm->mmap_sem){....}, at: [<c0106d2c>] 
sys_mmap2+0x7e/0xb0
[1142726.317542]  #1:  (&UNIONFS_SB(sb)->rwsem#2/1){....}, at: [<f8e8e3e7>] 
unionfs_mmap+0x3d/0x299 [unionfs]
[1142726.317562]  #2:  (&info->lock#4/2){....}, at: [<f8e8e446>] 
unionfs_mmap+0x9c/0x299 [unionfs]
[1142726.317584] 2 locks held by bash/31255:
[1142726.317588]  #0:  (sysrq_key_table_lock){....}, at: [<c02af869>] 
__handle_sysrq+0x1b/0x111
[1142726.317603]  #1:  (tasklist_lock){....}, at: [<c0175367>] 
debug_show_all_locks+0x36/0x17a
[1142726.317619] 
[1142726.317623] =============================================

Currently I don't understand unionfs internals well enough to fix this on my
own, so I'd like to ask whether there is any way to overcome it.
AFAIK the parent is locked for later file revalidation. Is revalidation in
unionfs_release() really needed then? How can we avoid taking the parent
dentry lock in unionfs_mmap()?
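
As a possible user-space mitigation (not a fix for unionfs itself), the page
fault inside read() might be avoided by touching the destination buffer
before the call. The sketch below is untested against the deadlock and rests
on an assumption: that the fault in the trace comes from copying into the
freshly malloc'ed, never-touched buffer. alloc_prefaulted() is a hypothetical
helper, and pages could still be reclaimed later unless they are also pinned,
e.g. with mlock().

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate `size` bytes and write to one byte per page,
 * so every page is already mapped before the buffer is passed to read(). */
char *alloc_prefaulted(size_t size)
{
	long page = sysconf(_SC_PAGESIZE);
	char *buf = malloc(size);
	size_t off;

	if (buf == NULL || page <= 0)
		return buf;
	/* One write per page-sized stride forces the kernel to map each page. */
	for (off = 0; off < size; off += (size_t)page)
		buf[off] = 0;
	/* An unaligned buffer can spill into one more page; touch its last byte. */
	if (size > 0)
		buf[size - 1] = 0;
	return buf;
}
```

In test2.c this would replace the plain malloc(SIZE_0) call. With the buffer
prefaulted, the reproducer would presumably hit the deadlock much less often,
which could also help confirm the analysis above.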

Thanks!

-- 
regards,
Sergey

/* Test app to reproduce unionfs deadlock
 *
 * ./test2 file0 file1
 *
 * to read file0 in process, and mmap/munmap file1 in thread
 *
 * */
#include <pthread.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <assert.h>
#include <stdio.h>

/* Parenthesized so the macros expand safely inside larger expressions */
#define SIZE_0 (300*1024*1024)
#define SIZE_1 (10*1024*1024)

/* Thread body: repeatedly mmap/munmap the file whose name is passed in arg */
void* thread_cb(void *arg){
	char *t_addr;
	int fd, i;
	fd = open( (char*)arg, O_RDONLY );
	assert(fd != -1);
	for(i=0; i < 1000000; ++i){
		/* Each mmap/munmap pair takes mm->mmap_sem in thread context */
		t_addr = mmap(NULL, SIZE_1, PROT_READ, MAP_PRIVATE, fd, 0);
		assert(t_addr != MAP_FAILED);
		munmap(t_addr, SIZE_1);
		if (i%100000 == 0)
			fprintf(stderr, "%d mmap/munmap iterations passed\n", i);
	}
	close(fd);
	return NULL;
}

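int main(int argc, char *argv[]){
	char *major_file, *thread_file;
	char *addr;
	int ret, fd;
	ssize_t nread;
	pthread_t thread;
	if (argc != 3) {
		fprintf(stderr, "usage: %s file0 file1\n", argv[0]);
		return 1;
	}
	major_file = argv[1];
	thread_file = argv[2];
	/* The buffer is not touched before read(), so copying into it can page-fault */
	addr = malloc(SIZE_0);
	assert(addr != NULL);
	fd = open(major_file, O_RDONLY);
	assert(fd != -1);

	ret = pthread_create(&thread, NULL, thread_cb, (void*)thread_file);
	assert(ret == 0);
	fprintf(stderr, "thread created\n" );

	/* Read in process context while the thread keeps mmapping/munmapping */
	nread = read(fd, addr, SIZE_0);
	assert(nread != -1);
	fprintf(stderr, "block read\n");
	ret = pthread_join(thread, NULL);
	assert(ret == 0);
	fprintf(stderr, "thread terminated\n" );
	close(fd);
	free(addr);
	return 0;
}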
