there seems to be a race condition that I hit regularly when running
postmark on a dual 3.06 GHz Xeon blade.

basically, I run a heavy-duty postmark workload with these settings:

set size 512 10240
set number 20000
set transactions 200000
set subdirectories 200
set read 4096
set write 4096
set buffering false

and, while it runs, use unionctl to add a new branch every 60 seconds and
set the old one read-only. (--add new ; --mode old ro)
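For anyone who wants to try reproducing this, here is a rough sketch of a driver for the workload, not the exact script that was used. Assumptions: the union mount point is passed to unionctl as its first argument, the branch directories already exist, postmark is started with the command file as its argument, and the `set location`, `run` and `quit` lines are additions. The paths and the names `postmark_script`, `branch_commands` and `rotate_forever` are hypothetical.

```python
import subprocess
import time

# Settings copied from the report above.
POSTMARK_SETTINGS = [
    "set size 512 10240",
    "set number 20000",
    "set transactions 200000",
    "set subdirectories 200",
    "set read 4096",
    "set write 4096",
    "set buffering false",
]


def postmark_script(location):
    """Return a postmark command file; location/run/quit are additions."""
    lines = list(POSTMARK_SETTINGS)
    lines.append("set location " + location)
    lines += ["run", "quit"]
    return "\n".join(lines) + "\n"


def branch_commands(union, old_branch, new_branch):
    """Build the two unionctl calls for one rotation.

    Only --add and --mode come from the report; the position of the
    mount point argument is an assumption.
    """
    return [
        ["unionctl", union, "--add", new_branch],
        ["unionctl", union, "--mode", old_branch, "ro"],
    ]


def rotate_forever(union, branch_dir, interval=60, dry_run=True):
    """Every `interval` seconds add a fresh branch and demote the old one."""
    n = 0
    while True:
        old = f"{branch_dir}/b{n}"        # hypothetical directory layout
        new = f"{branch_dir}/b{n + 1}"    # must exist before a real run
        for cmd in branch_commands(union, old, new):
            if dry_run:
                print(" ".join(cmd))      # show the command, do not run it
            else:
                subprocess.run(cmd, check=True)
        n += 1
        time.sleep(interval)
```

With `dry_run=True` (the default) the loop only prints the commands it would run, so the sequence can be checked before pointing it at a real mount.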
 
this is using the latest snapshot 

unionfs-20060117-2031

kernel trace (echo t > /proc/sysrq-trigger) gives

unionctl      D C030B520     0  3237   3235                     (NOTLB)
ea885ea8 00000086 c26e5a20 c030b520 f8e17d20 f8e0c7a5 f8e0e060 00000297
       dfc33000 f71e2300 f782e520 00085578 be7b2551 000000ec c26e5b70 f7616818
       ea885ebc c26e5a20 c26e5a20 c024c73d f8e11906 0000015c f761681c f74cfe18
Call Trace:
 [<c024c73d>] rwsem_down_write_failed+0x8d/0x170
 [<f8ddfbd4>] .text.lock.branchman+0x12a/0x1a6 [unionfs]
 [<f8e08334>] unionfs_ioctl+0x2a4/0x3f0 [unionfs]
 [<c015ba6e>] do_ioctl+0x6e/0x80
 [<c015bc55>] vfs_ioctl+0x65/0x1e0
 [<c015be37>] sys_ioctl+0x67/0x90
 [<c0102493>] syscall_call+0x7/0xb

i.e. unionctl is waiting to take the read/write semaphore for write so
that it can do the ioctl, which implies that something else is blocked
while holding it, either for read or for write.

but

postmark      D C030B520     0  3210   2796                     (NOTLB)
f74cfe04 00000086 f782e520 c030b520 ea8a7ea4 c2655480 0000003a 000021d8
       00000000 00000000 00000000 0000025a be7b35ca 000000ec f782e670 f7616818
       f74cfe18 f782e520 0000000c c024c5cd f74cfe28 c010f017 f761681c f761681c
Call Trace:
 [<c024c5cd>] rwsem_down_read_failed+0x8d/0x170
 [<c010f017>] __wake_up_locked+0x27/0x30
 [<f8dc303a>] .text.lock.dentry+0x1f/0x1c5 [unionfs]
 [<c024c5cd>] rwsem_down_read_failed+0x8d/0x170
 [<f8e03282>] unionfs_file_revalidate+0x122/0x2ba0 [unionfs]
 [<f8e09d54>] fist_dprint_internal+0x14/0x80 [unionfs]
 [<f8e084f3>] unionfs_flush+0x73/0xd2f [unionfs]
 [<f8dc3798>] unionfs_write+0x1b8/0x1f0 [unionfs]
 [<c014a2bf>] vfs_write+0xff/0x160
 [<c0149806>] filp_close+0x76/0x90
 [<c0149870>] sys_close+0x50/0x60
 [<c0102493>] syscall_call+0x7/0xb

postmark can't take the semaphore for read, implying that something is
blocked holding it for write.

however, unless something is blowing up in a way that doesn't cause the
kernel to complain, I don't see how any write lock can be taken without
being released (write locking only happens in branchman.c, and the code
there seems pretty simple).
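One case that might be worth ruling out is sketched below as a toy Python model. It is not the kernel's rwsem code, and `ToyRwsem` is a made-up name. The key assumption is that the semaphore is fair, so a reader arriving while a writer is already queued goes to sleep behind it. Under that assumption a single reader that never releases the lock, plus the waiting unionctl, would be enough to leave postmark stuck in the read path without any write lock ever being held.

```python
from collections import deque


class ToyRwsem:
    """Toy model of a fair read/write semaphore (not kernel code)."""

    def __init__(self):
        self.readers = 0        # tasks currently holding the lock for read
        self.writer = None      # task currently holding the lock for write
        self.waiters = deque()  # FIFO of (task, kind) that had to sleep

    def down_read(self, task):
        # A reader sleeps if a writer holds the lock or, by the assumed
        # fairness rule, if anyone is already queued ahead of it.
        if self.writer is None and not self.waiters:
            self.readers += 1
            return "acquired"
        self.waiters.append((task, "read"))
        return "sleeping"

    def down_write(self, task):
        # A writer needs the lock completely idle.
        if self.writer is None and self.readers == 0 and not self.waiters:
            self.writer = task
            return "acquired"
        self.waiters.append((task, "write"))
        return "sleeping"


sem = ToyRwsem()
print(sem.down_read("reader that never calls up_read"))  # acquired
print(sem.down_write("unionctl"))                         # sleeping
print(sem.down_read("postmark"))                          # sleeping
```

If that is what is happening, the thing to look for would be a read lock that is taken and not released (or taken twice on the same path), rather than a stray write lock.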


_______________________________________________
unionfs mailing list
[email protected]
http://www.fsl.cs.sunysb.edu/mailman/listinfo/unionfs