On Thu, 16 Apr 2009, Marc Dionne wrote:

On 04/16/2009 08:25 AM, Felix Frank wrote:
-    if (!avc->states & CPageWrite)

I see a bug there - this line probably wants to be:
   if (!(avc->states & CPageWrite))

So the recursion was avoided by never actually doing anything in StoreAllSegments, since CPageWrite never got set and the condition was always false.
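
The precedence problem can be checked in isolation. The following is a minimal sketch, not OpenAFS code: the flag value 0x200 and the names CPAGEWRITE_DEMO, buggy_check, fixed_check and run_precedence_demo are assumptions made up for illustration. It only relies on the C rule that unary '!' binds tighter than binary '&'.

```c
#include <assert.h>

/* Hypothetical flag value for illustration only; the real CPageWrite
 * definition lives in the OpenAFS headers and may differ. */
#define CPAGEWRITE_DEMO 0x200u

/* Buggy form. C groups `!states & FLAG` as `(!states) & FLAG`; the
 * parentheses are written out here to make that grouping visible.
 * `!states` is 0 or 1, and neither value has bit 0x200 set, so the
 * result is 0 for every input. */
static int buggy_check(unsigned int states)
{
    return ((!states) & CPAGEWRITE_DEMO) != 0;
}

/* Intended form: true exactly when the flag is NOT set in states. */
static int fixed_check(unsigned int states)
{
    return !(states & CPAGEWRITE_DEMO);
}

/* Self-check: aborts if any of the claims in the comments is wrong. */
static void run_precedence_demo(void)
{
    assert(buggy_check(0u) == 0);               /* flag clear: still false */
    assert(buggy_check(CPAGEWRITE_DEMO) == 0);  /* flag set: false */
    assert(fixed_check(0u) == 1);               /* flag clear: true */
    assert(fixed_check(CPAGEWRITE_DEMO) == 0);  /* flag set: false */
}
```

Calling run_precedence_demo() from a test program exercises both forms. The buggy form would only behave differently if the flag happened to be bit 0.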

I guess this explains why mmap has been severely broken since 1.4.8.

With the fix above, my larger mmap test quickly runs into a deadlock again. Looks like write_cache_pages is trying to lock the page that is currently being written:

I think I just reproduced :/

(this is pdflush):
[<ffffffffa0b91d14>] ? crfree+0x38/0x3c [libafs]
[<ffffffff81077f85>] ? getnstimeofday+0x5a/0xae
[<ffffffff810b2b0a>] ? sync_page+0x0/0x45
[<ffffffff8144c905>] schedule+0x9/0x1d
[<ffffffff8144c94c>] io_schedule+0x33/0x44
[<ffffffff810b2b4b>] sync_page+0x41/0x45
[<ffffffff8144cd0e>] __wait_on_bit_lock+0x41/0x8a
[<ffffffff810b2acf>] __lock_page+0x61/0x68
[<ffffffff8107144d>] ? wake_bit_function+0x0/0x2e
[<ffffffff810b863c>] write_cache_pages+0x1dc/0x3b3
[<ffffffff810b804a>] ? __writepage+0x0/0x2f
[<ffffffff810b8832>] generic_writepages+0x1f/0x21
[<ffffffff810b8863>] do_writepages+0x2f/0x37
[<ffffffff810b35e3>] __filemap_fdatawrite_range+0x4b/0x4d
[<ffffffff810b3d90>] filemap_fdatawrite+0x1a/0x1c
[<ffffffffa0b9485c>] osi_VM_StoreAllSegments+0xd7/0x17c [libafs]
[<ffffffffa0b5e000>] afs_StoreAllSegments+0xcb/0x17c7 [libafs]
[<ffffffff810dbc69>] ? __fput+0x17b/0x18a
[<ffffffff81077f85>] ? getnstimeofday+0x5a/0xae
[<ffffffff81077fee>] ? do_gettimeofday+0x15/0x38
[<ffffffffa0b99fdf>] ? afs_icl_Event4+0xfe/0x162 [libafs]
[<ffffffffa0b751ba>] afs_DoPartialWrite+0x55/0x5a [libafs]
[<ffffffffa0b97655>] afs_linux_writepage_sync+0x30f/0x3fc [libafs]
[<ffffffff8122156b>] ? prio_tree_next+0x1c3/0x224
[<ffffffffa0b97838>] afs_linux_writepage+0x8c/0xba [libafs]
[<ffffffff810b805c>] __writepage+0x12/0x2f
[<ffffffff810b8696>] write_cache_pages+0x236/0x3b3
[<ffffffff810b804a>] ? __writepage+0x0/0x2f
[<ffffffff810b8832>] generic_writepages+0x1f/0x21
[<ffffffff810b8863>] do_writepages+0x2f/0x37
[<ffffffff810f403a>] __writeback_single_inode+0x1a1/0x3b9
[<ffffffff81052516>] ? __dequeue_entity+0x2e/0x33
[<ffffffff810f468a>] generic_sync_sb_inodes+0x2a7/0x438
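
The re-entry visible in the trace (writeback locks a page, the page write handler triggers a full flush, and the flush re-enters writeback for the same page) can be modelled in user space. This is only a toy sketch under stated assumptions: every type and function name below is hypothetical, nothing here is kernel or OpenAFS code, and instead of actually blocking, the model counts the attempts that would sleep forever on a lock the caller already holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state; these fields only loosely mirror the real objects. */
struct toy_file {
    bool page_locked;     /* models the page lock held during writeback */
    bool in_page_write;   /* models an anti-recursion flag such as CPageWrite */
    int  self_deadlocks;  /* times we would have slept on our own lock */
};

static void toy_writeback(struct toy_file *f, bool use_guard);

/* Models the flush-everything step reached from the partial-write path. */
static void toy_store_all(struct toy_file *f, bool use_guard)
{
    if (use_guard && f->in_page_write)
        return;                     /* guard: do not re-enter writeback */
    toy_writeback(f, use_guard);
}

/* Models the per-page write handler. */
static void toy_writepage(struct toy_file *f, bool use_guard)
{
    f->in_page_write = true;
    toy_store_all(f, use_guard);
    f->in_page_write = false;
}

/* Models writeback: lock the page, then hand it to the write handler. */
static void toy_writeback(struct toy_file *f, bool use_guard)
{
    if (f->page_locked) {
        /* A real non-recursive lock would block here forever. */
        f->self_deadlocks++;
        return;
    }
    f->page_locked = true;
    toy_writepage(f, use_guard);
    f->page_locked = false;
}

/* Self-check: one would-be deadlock without the guard, none with it. */
static void toy_demo(void)
{
    struct toy_file unguarded = {0};
    struct toy_file guarded = {0};

    toy_writeback(&unguarded, false);
    toy_writeback(&guarded, true);
    assert(unguarded.self_deadlocks == 1);
    assert(guarded.self_deadlocks == 0);
}
```

In this model the guard avoids the self-deadlock only by skipping the nested flush, so whatever that flush was supposed to write still has to reach the cache by some other path.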

What I don't get is why setting CPageWrite prevents
afs_linux_writepage_sync from being called (?), since CPageWrite is
checked inside it, and only after the afs_Trace4(). As far as I
understand it, Iupdatepage with code 99999 should therefore show up
even with working anti-recursion.

You probably didn't wait long enough for the other Iupdatepage to show up. The munmap() doesn't cause a flush to happen immediately - the dirty pages eventually get written by pdflush, but that can be several seconds later. Without the anti-recursion code, close() causes osi_VM_StoreAllSegments to write out the modified mmapped pages right away.

I see, thanks for clearing that up.

Guess we're back to square one then. I posted a hack to RT #124627 yesterday that does prevent the deadlock, but apparently much of the data never gets written to the cache, and mmap_test reports corruption (it gets lots of 0s). So what should be done instead of osi_VM_StoreAllSegments() during partial writes?

Regards
 - Felix
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
