On Mon, 27 Apr 2009, Felix Frank wrote:

re-entry into either writepage or entry in osi_VM_StoreAllSegments for
the same file if it is set.  This looks sound.  The net effect differs
from Chaskiel's suggestion in that it 1) disables
osi_VM_StoreAllSegments on the same file for callers other than
doPartialWrite (probably a good idea), and 2) prevents concurrent
writepage calls within the same file (which might already be the
case).
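The per-file re-entry guard described above can be illustrated in userspace C. This is a minimal sketch, not the actual patch: `struct file_state`, `store_guard_enter()`, `store_guard_leave()` and `writepage_sketch()` are hypothetical names, and the atomic flag stands in for whatever state the real client keeps per file. Backing off on re-entry is exactly the behaviour questioned later in this message, so the sketch only shows the mechanism, not that it is safe for the data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-file state; stands in for the client's real
 * per-file bookkeeping. */
struct file_state {
    atomic_bool storing; /* set while a writepage/store pass runs */
};

/* Claim the file for a store pass. atomic_exchange() returns the old
 * value, so false means we were first and now hold the guard. */
static bool store_guard_enter(struct file_state *f)
{
    return !atomic_exchange(&f->storing, true);
}

static void store_guard_leave(struct file_state *f)
{
    atomic_store(&f->storing, false);
}

/* Simulated writepage. With reenter set, it calls itself once while
 * still holding the guard, modelling the kernel re-entering writepage
 * on the same file (e.g. from dirty-page balancing). Returns 1 if this
 * call performed the store, 0 if it backed off. */
static int writepage_sketch(struct file_state *f, bool reenter)
{
    if (!store_guard_enter(f))
        return 0; /* re-entry detected: skip instead of recursing */

    int inner = reenter ? writepage_sketch(f, false) : 0;
    assert(inner == 0); /* the nested attempt must have been refused */

    /* ... the actual write of dirty data would happen here ... */
    store_guard_leave(f);
    return 1;
}
```

Because the flag is per file, a second writepage on the same file backs off instead of waiting, which corresponds to point 2) above.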

Issues that remain:
- I think Felix still sees some deadlocks and data inconsistencies
with 2.6.18, but I can't reproduce them with 2.6.29 or 2.6.30

There is reproducible data loss during the mmap test; its amount appears to
depend (linearly) on the size of physical memory. Corruption seems to start
just above 1/3 of the memory size.

Deadlocks still appear to occur above 1/2 of the memory size.
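To probe these thresholds more systematically, test file sizes could be derived from the client's physical memory. A minimal sketch, assuming Linux/glibc: `_SC_PHYS_PAGES` is a widely available extension rather than strict POSIX, and the helper names and the particular fractions (chosen to bracket the 1/3 and 1/2 marks above) are illustrative, not part of the original test setup.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Physical memory in bytes, or 0 if unavailable. _SC_PHYS_PAGES is a
 * common glibc/BSD extension, not guaranteed by POSIX. */
static uint64_t phys_mem_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long psize = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || psize <= 0)
        return 0;
    return (uint64_t)pages * (uint64_t)psize;
}

/* mem * num / den, rounded down to a multiple of page. */
static uint64_t size_at_fraction(uint64_t mem, unsigned num, unsigned den,
                                 uint64_t page)
{
    uint64_t s = mem / den * num;
    return s - s % page;
}

/* Fill out[] with test sizes around the reported 1/3 (corruption) and
 * 1/2 (deadlock) marks; returns the number of sizes written. */
static size_t sweep_sizes(uint64_t mem, uint64_t page, uint64_t *out,
                          size_t max)
{
    static const unsigned frac[][2] = {
        {1, 4}, {3, 10}, {1, 3}, {2, 5}, {1, 2}, {3, 5}
    };
    size_t n = 0;
    for (size_t i = 0; i < sizeof frac / sizeof frac[0] && n < max; i++)
        out[n++] = size_at_fraction(mem, frac[i][0], frac[i][1], page);
    return n;
}
```

Each size could then be fed to an mmap test run, one run per size.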

I've done a bunch of tests. The results are too confusing for me to even
bother you with the plots. Bottom line:
- working with mmap on a file that is smaller than the disk cache always works
- as soon as the file size exceeds the cache size, the cache gets junked
  (from earlier observations, I guess the mmap_test prog reads 0s where
   data should be)
- larger physical memory seems to improve the chances of reading sound data,
  but only statistically speaking; the numbers are quite inconsistent in
  that respect.
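For anyone trying to reproduce the zero-filled reads, a check of this kind might look roughly as follows. It is a hypothetical sketch, not the mmap_test program used for these results: `pattern_byte()`, `mmap_write_verify()`, `run_check_in()` and the byte pattern are invented here. It writes a never-zero pattern through a shared mapping, flushes it with msync(), reads the file back with read(), and counts bytes that do not match, which includes any zero bytes. Pointing `dir` at an AFS directory and using a file larger than the client's disk cache would correspond to the failing case described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Expected byte at offset off; never zero, so a zero read back is
 * always an error. */
static unsigned char pattern_byte(size_t off)
{
    return (unsigned char)(off % 251 + 1);
}

/* Write the pattern through a shared mapping of fd, flush it, then
 * re-read with read(2). Returns the number of mismatching bytes, or
 * -1 on error. */
static long mmap_write_verify(int fd, size_t len)
{
    if (ftruncate(fd, (off_t)len) != 0)
        return -1;
    if (len == 0)
        return 0;
    unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;
    for (size_t i = 0; i < len; i++)
        map[i] = pattern_byte(i);
    int rc = msync(map, len, MS_SYNC);
    munmap(map, len);
    if (rc != 0 || lseek(fd, 0, SEEK_SET) != 0)
        return -1;

    long bad = 0;
    unsigned char buf[4096];
    size_t off = 0;
    while (off < len) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0)
            return -1; /* file shorter than expected */
        for (ssize_t j = 0; j < n; j++)
            if (buf[j] != pattern_byte(off + (size_t)j))
                bad++;
        off += (size_t)n;
    }
    return bad;
}

/* Run the check on a fresh temporary file inside dir. */
static long run_check_in(const char *dir, size_t len)
{
    char path[4096];
    int w = snprintf(path, sizeof path, "%s/mmapchk.XXXXXX", dir);
    if (w < 0 || (size_t)w >= sizeof path)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    long r = mmap_write_verify(fd, len);
    close(fd);
    unlink(path);
    return r;
}
```

On a local filesystem the result should be 0; a positive count on AFS would indicate lost data.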

All these tests were done with a 1.4.10 client patched with
http://rt.central.org/rt/Ticket/Attachment/414217/450599/antirec-fix.patch
I doubt that vanilla 1.4.10 would fare any differently (if anything, statistically worse, but what does it matter?). Early attempts with it yielded similar levels of corruption.

The test program will still deadlock, by the way. There appears to be no fixed minimum file size that triggers it, but during the tests the range of "safe" file sizes appeared to grow with the client's physical memory.

Traces of the usual deadlocked suspects are attached. At that point, just
about any process can deadlock, I suppose. Apparently, the system stops balancing dirty pages (which seems plausible to me, but I have no experience with virtual memory implementations whatsoever).

Leaving writepage prematurely seems to be unsound after all. Derrik suggested earlier (RT 120491) that VM handling for Linux is crooked, hence this whole issue.
Is that still true? Or is this not going to be addressed in 1.4.x?

Cheers
 - Felix
pdflush       D ffff8800020c5460     0    80      7            81    24 (L-TLB)
 ffff88003fa59510  0000000000000246  ffff880001e08340  ffff880001612380 
 000000000000000a  ffff88003fa327e0  ffff88003ff20860  0000000000000984 
 ffff88003fa329c8  ffff88003fa20040 
Call Trace:
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8026e733>] do_gettimeofday+0x1f8/0x213
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8023f00c>] lock_timer_base+0x1b/0x3c
 [<ffffffff8021cb5e>] __mod_timer+0xb0/0xbe
 [<ffffffff8026277c>] schedule_timeout+0x8a/0xad
 [<ffffffff80291141>] process_timeout+0x0/0x5
 [<ffffffff802620ed>] io_schedule_timeout+0x4b/0x78
 [<ffffffff8023c55e>] blk_congestion_wait+0x67/0x81
 [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80252c9f>] writeback_inodes+0xa8/0xd8
 [<ffffffff802bde60>] balance_dirty_pages_ratelimited_nr+0x17d/0x1fa
 [<ffffffff8021090a>] generic_file_buffered_write+0x527/0x645
 [<ffffffff8020e3d2>] current_fs_time+0x3b/0x40
 [<ffffffff80315c2f>] avc_has_perm+0x43/0x55
 [<ffffffff8021685c>] __generic_file_aio_write_nolock+0x36c/0x3b8
 [<ffffffff80221a14>] generic_file_aio_write+0x65/0xc1
 [<ffffffff8804b1a2>] :ext3:ext3_file_write+0x16/0x91
 [<ffffffff80218184>] do_sync_write+0xc7/0x104
 [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e
 [<ffffffff802629d6>] mutex_lock+0xd/0x1d
 [<ffffffff80214210>] generic_file_llseek+0x7f/0x8b
 [<ffffffff881a9d76>] :libafs:osi_rdwr+0xeb/0x151
 [<ffffffff8818c573>] :libafs:afs_UFSWrite+0x5d0/0x84b
 [<ffffffff881abc2c>] :libafs:afs_linux_writepage_sync+0x253/0x3da
 [<ffffffff881adbfa>] :libafs:afs_linux_writepage+0x61/0x8a
 [<ffffffff8021ce9c>] mpage_writepages+0x1ab/0x34d
 [<ffffffff881adb99>] :libafs:afs_linux_writepage+0x0/0x8a
 [<ffffffff8025c9fb>] do_writepages+0x29/0x2f
 [<ffffffff80230c5b>] __writeback_single_inode+0x1ae/0x328
 [<ffffffff802b3a73>] delayacct_end+0x5d/0x86
 [<ffffffff8022120d>] sync_sb_inodes+0x1a9/0x267
 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80252c79>] writeback_inodes+0x82/0xd8
 [<ffffffff802bdf62>] background_writeout+0x85/0xb8
 [<ffffffff8025801c>] pdflush+0x0/0x207
 [<ffffffff80258175>] pdflush+0x159/0x207
 [<ffffffff802bdedd>] background_writeout+0x0/0xb8
 [<ffffffff80233483>] kthread+0xfe/0x132
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff80299dd4>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8026df02>] monotonic_clock+0x35/0x7b
 [<ffffffff80233385>] kthread+0x0/0x132
 [<ffffffff8025fb22>] child_rip+0x0/0x12

afsd          D ffff8800020c5460     0  1537      1          1539  1535 (L-TLB)
 ffff880031013880  0000000000000246  000000000002be5d  0000000000000246 
 000000000000000a  ffff88003fa20040  ffff88003fa327e0  0000000000000dbe 
 ffff88003fa20228  ffffffff804e0a80 
Call Trace:
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8026e733>] do_gettimeofday+0x1f8/0x213
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8023f00c>] lock_timer_base+0x1b/0x3c
 [<ffffffff8021cb5e>] __mod_timer+0xb0/0xbe
 [<ffffffff8026277c>] schedule_timeout+0x8a/0xad
 [<ffffffff80291141>] process_timeout+0x0/0x5
 [<ffffffff802620ed>] io_schedule_timeout+0x4b/0x78
 [<ffffffff8023c55e>] blk_congestion_wait+0x67/0x81
 [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80252c9f>] writeback_inodes+0xa8/0xd8
 [<ffffffff802bde60>] balance_dirty_pages_ratelimited_nr+0x17d/0x1fa
 [<ffffffff8021090a>] generic_file_buffered_write+0x527/0x645
 [<ffffffff8022eca0>] __wake_up+0x38/0x4f
 [<ffffffff80207141>] kmem_cache_free+0x80/0xd3
 [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
 [<ffffffff8020e3d2>] current_fs_time+0x3b/0x40
 [<ffffffff88054977>] :ext3:__ext3_journal_stop+0x1f/0x3d
 [<ffffffff8021685c>] __generic_file_aio_write_nolock+0x36c/0x3b8
 [<ffffffff80221a1f>] generic_file_aio_write+0x70/0xc1
 [<ffffffff80221a14>] generic_file_aio_write+0x65/0xc1
 [<ffffffff8804b1a2>] :ext3:ext3_file_write+0x16/0x91
 [<ffffffff80218184>] do_sync_write+0xc7/0x104
 [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e
 [<ffffffff802629d6>] mutex_lock+0xd/0x1d
 [<ffffffff80214210>] generic_file_llseek+0x7f/0x8b
 [<ffffffff881a9d76>] :libafs:osi_rdwr+0xeb/0x151
 [<ffffffff881a971b>] :libafs:afs_osi_Write+0xe2/0x16f
 [<ffffffff881690cf>] :libafs:afs_WriteDCache+0x92/0xa7
 [<ffffffff8816ab2c>] :libafs:afs_WriteThroughDSlots+0x1e2/0x309
 [<ffffffff88168172>] :libafs:afs_Daemon+0x18a/0x474
 [<ffffffff881b281c>] :libafs:afsd_launcher+0x0/0x2c
 [<ffffffff881b2a56>] :libafs:afsd_thread+0x20e/0x6f7
 [<ffffffff8025fb2c>] child_rip+0xa/0x12
 [<ffffffff881b281c>] :libafs:afsd_launcher+0x0/0x2c
 [<ffffffff8026df02>] monotonic_clock+0x35/0x7b
 [<ffffffff881b2848>] :libafs:afsd_thread+0x0/0x6f7
 [<ffffffff8025fb22>] child_rip+0x0/0x12

mmap_test_tem D ffff8800020c5460     0  2042   2041                     (NOTLB)
 ffff88003617dbe8  0000000000000282  0000000000000246  000000000000000a 
 0000000000000009  ffff88003ff20860  ffffffff804e0a80  0000000000000aaf 
 ffff88003ff20a48  ffff88003fa327e0 
Call Trace:
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8026e733>] do_gettimeofday+0x1f8/0x213
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8023f00c>] lock_timer_base+0x1b/0x3c
 [<ffffffff8021cb5e>] __mod_timer+0xb0/0xbe
 [<ffffffff8026277c>] schedule_timeout+0x8a/0xad
 [<ffffffff80291141>] process_timeout+0x0/0x5
 [<ffffffff802620ed>] io_schedule_timeout+0x4b/0x78
 [<ffffffff8023c55e>] blk_congestion_wait+0x67/0x81
 [<ffffffff80299fec>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80252c9f>] writeback_inodes+0xa8/0xd8
 [<ffffffff802bde60>] balance_dirty_pages_ratelimited_nr+0x17d/0x1fa
 [<ffffffff80211ad2>] do_wp_page+0x66f/0x6a3
 [<ffffffff80209ac4>] __handle_mm_fault+0x114b/0x11f6
 [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff802666ef>] do_page_fault+0xf7b/0x12e0
 [<ffffffff8026df02>] monotonic_clock+0x35/0x7b
 [<ffffffff80261e83>] thread_return+0x6c/0x113
 [<ffffffff8025f82b>] error_exit+0x0/0x6e
 [<ffffffff8025f82b>] error_exit+0x0/0x6e
