Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree
On 14:19 Втр 29 Май , [EMAIL PROTECTED] wrote: The patch titled fs: introduce write_begin, write_end, and perform_write aops has been added to the -mm tree. Its filename is fs-introduce-write_begin-write_end-and-perform_write-aops.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: fs: introduce write_begin, write_end, and perform_write aops From: Nick Piggin [EMAIL PROTECTED] These are intended to replace prepare_write and commit_write with more flexible alternatives that are also able to avoid the buffered write deadlock problems efficiently (which prepare_write is unable to do). [EMAIL PROTECTED]: API design contributions, code review and fixes] Signed-off-by: Nick Piggin [EMAIL PROTECTED] Signed-off-by: Mark Fasheh [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] I've finaly find time to review Nick's write_begin/write_end aop patch set. And i have some fixes and questions. My be i've missed somthing and it was already disscussed, but i cant find in LKML. 1) loop dev: loop.c code itself is not perfect. In fact before Nick's patch partial write was't possible. Assumption what write chunks are always page aligned is realy weird ( see index++ line). Fixed by new aop loop fix patch 2)block_write_begin: After we enter to block_write_begin with *pagep == NULL and some page was grabed we remember this page in *pagep And if __block_prepare_write() we have to clear *pagep , as it was before. Because this may confuse caller. for example caller may have folowing code: ret = block_write_begin(..., pagep,...) if (ret *pagep != NULL) { unlock_page(*pagep); page_cache_release(*pagep); } Fixed my new aop block_write_begin fix patch 3) __page_symlink: Nick's patch add folowing code: + err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE, + AOP_FLAG_UNINTERRUPTIBLE, page,fsdata); symlink now consume whole page. I have only one question WHY???. I don't see any advantages, but where are huge list of dissdvantages: a) fs with blksize == 1k and pagesize == 16k after this patch waste more than 10x times disk space for nothing. b) What happends if we want use fs with blksize == 4k on i386 after it was used by ia64 ??? (before this patch it was possible). I dont prepare patch for this because i dont understand issue witch Nick aimed to fix. 4) iov_iter_fault_in_readable: Function prerform check for signgle region, with out respect to segment nature of iovec, For example writev no longer works :) : writev(3, [{\1, 1}, {\2..., 4096}], 2) = -1 EFAULT (Bad address) this hidden bug, and it was invisiable because _fault_in_readable return value was ignored before. Lets iov_iter_fault_in_readable perform checks for all segments. Fixed by :iov_iter_fault_in_readable fix 5) ext3_write_end: Before write_begin/write_end patch set we have folowing locking order: stop_journal(handle); unlock_page(page); But now order is oposite: unlock_page(page); stop_journal(handle); Can we got any race condition now? I'm not sure is it actual problem, may be somebody cant describe this. - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] new aop loop fix
loop.c code itself is not perfect. In fact before Nick's patch partial write was't possible. Assumption what write chunks are always page aligned is realy weird ( see index++ line). Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 4bab9b1..8726da5 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, int len, ret; mutex_lock(mapping-host-i_mutex); - index = pos PAGE_CACHE_SHIFT; offset = pos ((pgoff_t)PAGE_CACHE_SIZE - 1); bv_offs = bvec-bv_offset; len = bvec-bv_len; @@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, struct page *page; void *fsdata; + index = pos PAGE_CACHE_SHIFT; IV = ((sector_t)index (PAGE_CACHE_SHIFT - 9))+(offset 9); size = PAGE_CACHE_SIZE - offset; if (size len) @@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, bv_offs += copied; len -= copied; offset = 0; - index++; pos += copied; } ret = 0; - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] new aop loop fix
On Wed, 13 Jun 2007, Dmitriy Monakhov wrote: loop.c code itself is not perfect. In fact before Nick's patch partial write was't possible. Assumption what write chunks are always page aligned is realy weird ( see index++ line). Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] I'm interested, because I'm trying to chase down an -mm bug which occasionally leaves me with 1k of zeroes where it shouldn't (in a 1k bsize ext2 looped over tmpfs). The length of time for this to happen varies a lot, so bisection has been misleading: maybe the problem lies in Nick's patches, maybe it does not. But I don't understand your fix below at all. _If_ you need to change the handling of index, then you need to change the handling of offset too, don't you? But what's wrong with how inded was handled anyway? Yes, it might be being incremented at the bottom of the loop when we haven't reached the end of this page, but in that case we're not going round the loop again anyway: len will now be 0. So no problem. One of us is missing something: please enlighten me - thanks. Hugh diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 4bab9b1..8726da5 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, int len, ret; mutex_lock(mapping-host-i_mutex); - index = pos PAGE_CACHE_SHIFT; offset = pos ((pgoff_t)PAGE_CACHE_SIZE - 1); bv_offs = bvec-bv_offset; len = bvec-bv_len; @@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, struct page *page; void *fsdata; + index = pos PAGE_CACHE_SHIFT; IV = ((sector_t)index (PAGE_CACHE_SHIFT - 9))+(offset 9); size = PAGE_CACHE_SIZE - offset; if (size len) @@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, bv_offs += copied; len -= copied; offset = 0; - index++; pos += copied; } ret = 0; - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] new aop loop fix
On 14:36 Срд 13 Июн , Hugh Dickins wrote: On Wed, 13 Jun 2007, Dmitriy Monakhov wrote: loop.c code itself is not perfect. In fact before Nick's patch partial write was't possible. Assumption what write chunks are always page aligned is realy weird ( see index++ line). Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] I'm interested, because I'm trying to chase down an -mm bug which occasionally leaves me with 1k of zeroes where it shouldn't (in a 1k bsize ext2 looped over tmpfs). The length of time for this to happen varies a lot, so bisection has been misleading: maybe the problem lies in Nick's patches, maybe it does not. But I don't understand your fix below at all. _If_ you need to change the handling of index, then you need to change the handling of offset too, don't you? But what's wrong with how inded was handled anyway? Yes, it might be being incremented at the bottom of the loop when we haven't reached the end of this page, but in that case we're not going round the loop again anyway: len will now be 0. So no problem. One of us is missing something: please enlighten me - thanks. Yepp. You absolutely right, wrong patch was attached :( Btw: Nick's patches broke LO_CRYPT_XOR mode, but it is ok because this feature was absolete and not used by anyone, am i right here? Hugh diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 4bab9b1..8726da5 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, int len, ret; mutex_lock(mapping-host-i_mutex); - index = pos PAGE_CACHE_SHIFT; offset = pos ((pgoff_t)PAGE_CACHE_SIZE - 1); bv_offs = bvec-bv_offset; len = bvec-bv_len; @@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, struct page *page; void *fsdata; + index = pos PAGE_CACHE_SHIFT; IV = ((sector_t)index (PAGE_CACHE_SHIFT - 9))+(offset 9); size = PAGE_CACHE_SIZE - offset; if (size len) @@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, bv_offs += copied; len -= copied; offset = 0; - index++; pos += copied; } ret = 0; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] new aop loop fix
On Wed, 13 Jun 2007, Dmitriy Monakhov wrote: Btw: Nick's patches broke LO_CRYPT_XOR mode, It would help him if you could describe how. but it is ok because this feature was absolete and not used by anyone, am i right here? I know nothing of this; but so long as its code remains in the driver, I'd say we should be supporting it rather than breaking it. Hugh - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree
On 13:43 Срд 13 Июн , Nick Piggin wrote: On Wed, Jun 13, 2007 at 05:40:05PM +0400, Dmitriy Monakhov wrote: On 14:19 ?? 29 ?? , [EMAIL PROTECTED] wrote: The patch titled fs: introduce write_begin, write_end, and perform_write aops has been added to the -mm tree. Its filename is fs-introduce-write_begin-write_end-and-perform_write-aops.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: fs: introduce write_begin, write_end, and perform_write aops From: Nick Piggin [EMAIL PROTECTED] These are intended to replace prepare_writ eand commit_write with more flexible alternatives that are also able to avoid the buffered write deadlock problems efficiently (which prepare_write is unable to do). [EMAIL PROTECTED]: API design contributions, code review and fixes] Signed-off-by: Nick Piggin [EMAIL PROTECTED] Signed-off-by: Mark Fasheh [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] I've finaly find time to review Nick's write_begin/write_end aop patch set. And i have some fixes and questions. My be i've missed somthing and it was already disscussed, but i cant find in LKML. Thanks, that's really helpful. 1) loop dev: loop.c code itself is not perfect. In fact before Nick's patch partial write was't possible. Assumption what write chunks are always page aligned is realy weird ( see index++ line). Fixed by new aop loop fix patch I think you're right, fix looks good. 2)block_write_begin: After we enter to block_write_begin with *pagep == NULL and some page was grabed we remember this page in *pagep And if __block_prepare_write() we have to clear *pagep , as it was before. Because this may confuse caller. for example caller may have folowing code: ret = block_write_begin(..., pagep,...) if (ret *pagep != NULL) { unlock_page(*pagep); page_cache_release(*pagep); } Fixed my new aop block_write_begin fix patch I don't think the caller can rely on that if it returns failure. But that is more defensive I guess. Maybe setting it to 1 or so would catch abusers. 3) __page_symlink: Nick's patch add folowing code: + err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE, + AOP_FLAG_UNINTERRUPTIBLE, page,fsdata); symlink now consume whole page. I have only one question WHY???. I don't see any advantages, but where are huge list of dissdvantages: a) fs with blksize == 1k and pagesize == 16k after this patch waste more than 10x times disk space for nothing. b) What happends if we want use fs with blksize == 4k on i386 after it was used by ia64 ??? (before this patch it was possible). One more visiable effect caused by wrong symlink size: fsstress cause folowing error: LOG BEGIN EXT3-fs unexpected failure: !buffer_revoked(bh); inconsistent data on disk ext3_forget: aborting transaction: IO failure in __ext3_journal_revoke ext3_abort called. EXT3-fs error (device dm-4): ext3_forget: error -5 when attempting revoke Remounting filesystem read-only Aborting journal on device dm-4. journal commit I/O error journal commit I/O error journal commit I/O error journal commit I/O error journal commit I/O error journal commit I/O error EXT3-fs error (device dm-4) in ext3_free_blocks_sb: Journal has aborted journal commit I/O error journal commit I/O error EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device dm-4) in ext3_truncate: IO failure journal commit I/O error journal commit I/O error journal commit I/O error journal commit I/O error EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device dm-4) in ext3_orphan_del: Journal has aborted EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device dm-4) in ext3_delete_inode: IO failure LOG END After symlink size was fixed to len-1 problem dissappeared. I dont prepare patch for this because i dont understand issue witch Nick aimed to fix. I don't know why myself :P I think it would be just fine to use len-1 as it did previously, so it must have been a typo? 4) iov_iter_fault_in_readable: Function prerform check for signgle region, with out respect to segment nature of iovec, For example writev no longer works :) : writev(3, [{\1, 1}, {\2..., 4096}], 2) = -1 EFAULT (Bad address) this hidden bug, and it was invisiable because _fault_in_readable return value was ignored before. Lets
ext4 patch queue updated
Just updated the ext4 patch queue http://repo.or.cz/w/ext4-patch-queue Changes: Added these three patches from Jose Santos: ext4_set_jbd2_64bit_feature.patch jbd2_config_jbd2_debug_fix.patch jbd2_move_jbd2_debug_to_debugfs.patch Reordered the series, move jbd-stats-through-procfs and ext4_remove_subdirs_limit.patch before delayed allocation patches Run checkpatch.pl http://lwn.net/Articles/237451/ I fixed coding style issues for most patches by hand except the delayed allocation patch ext4-delayed-allocation.patch. That need a bit more work and requires someone family with the code better. Alex, can you help on this? Attached is the checkpatch.pl output, thanks. Mingming printk() should include KERN_ facility level #275: FILE: fs/ext4/writeback.c:73: +#define wb_debug(fmt, a...)printk(fmt, ##a); do not use assignment in condition #338: FILE: fs/ext4/writeback.c:136: + while (!bio (nr_vecs /= 2)) printk() should include KERN_ facility level #826: FILE: fs/ext4/writeback.c:624: + printk(no mem for ext4_wb_pages!\n); #if 0 -- if this code redundant remove it #946: FILE: fs/ext4/writeback.c:744: +#if 0 line over 80 characters #948: FILE: fs/ext4/writeback.c:746: + printk(#%u: wow! short extent %d for flush on #%lu\n, printk() should include KERN_ facility level #948: FILE: fs/ext4/writeback.c:746: + printk(#%u: wow! short extent %d for flush on #%lu\n, line over 80 characters #949: FILE: fs/ext4/writeback.c:747: + (unsigned) current-pid, wc.len, inode-i_ino); line over 80 characters #950: FILE: fs/ext4/writeback.c:748: + printk(#%u: done = %d, nr_to_write %ld, sync = %d\n, printk() should include KERN_ facility level #950: FILE: fs/ext4/writeback.c:748: + printk(#%u: done = %d, nr_to_write %ld, sync = %d\n, line over 80 characters #951: FILE: fs/ext4/writeback.c:749: + (unsigned) current-pid, done, wbc-nr_to_write, printk() should include KERN_ facility level #953: FILE: fs/ext4/writeback.c:751: + printk(#%u: written %d, extents %d\n, line over 80 characters #954: FILE: fs/ext4/writeback.c:752: + (unsigned) current-pid, written, extents); printk() should include KERN_ facility level #955: FILE: fs/ext4/writeback.c:753: + printk(#%u: cur %lu, prev %lu\n, #if 0 -- if this code redundant remove it #985: FILE: fs/ext4/writeback.c:783: +#if 0 line over 80 characters #991: FILE: fs/ext4/writeback.c:789: + atomic_inc(EXT4_SB(inode-i_sb)-s_wb_congested); printk() should include KERN_ facility level #1370: FILE: fs/ext4/writeback.c:1168: + printk(EXT4-fs: writeback: %d blocks %d extents in %d reqs (%d ave)\n, line over 80 characters #1375: FILE: fs/ext4/writeback.c:1173: + printk(EXT4-fs: writeback: %d nr_to_write, %d congestions, %d singles\n, printk() should include KERN_ facility level #1375: FILE: fs/ext4/writeback.c:1173: + printk(EXT4-fs: writeback: %d nr_to_write, %d congestions, %d singles\n, printk() should include KERN_ facility level #1379: FILE: fs/ext4/writeback.c:1177: + printk(EXT4-fs: writeback: %d collisions, %d single-page collisions\n, printk() should include KERN_ facility level #1382: FILE: fs/ext4/writeback.c:1180: + printk(EXT4-fs: writeback: %d allocated, %d dropped\n, Missing Signed-off-by: line(s) Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS.
Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree
On Wed, Jun 13, 2007 at 04:07:01PM -0700, Badari Pulavarty wrote: On Wed, 2007-06-13 at 13:43 +0200, Nick Piggin wrote: .. 5) ext3_write_end: Before write_begin/write_end patch set we have folowing locking order: stop_journal(handle); unlock_page(page); But now order is oposite: unlock_page(page); stop_journal(handle); Can we got any race condition now? I'm not sure is it actual problem, may be somebody cant describe this. Can we just change it to the original order? That would seem to be safest unless one of the ext3 devs explicitly acks it. It would be nice to go back to original order, but its not that simple with current structure of the code. With Nick's patches unlock_page() happens in generic_write_end(). journal_stop() needs to happen after generic_write_end(). :( Well we could use block_write_end? Mingming, can you take a look at the current proposed order ? I ran into bunch of races when I tried to change the order for -writepages() support earlier :( OK, it sounds like we probably want to revert to the original order at least for this patchset. If the new order is proven safe then that could be introduced later to simplify things... Thanks, Nick - To unsubscribe from this list: send the line unsubscribe linux-ext4 in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html