Re: iov_iter_fault_in_readable fix
On 18:31 Чтв 14 Июн , Christoph Hellwig wrote: > On Wed, Jun 13, 2007 at 05:57:59PM +0400, Dmitriy Monakhov wrote: > > Function prerform check for signgle region, with out respect to > > segment nature of iovec, For example writev no longer works :) > > Btw, could someone please start to collect all sniplets like this in > a nice simple regression test suite? If no one wants to start a new > one we should probably just put it into xfsqa (which should be useable > for other filesystems aswell despite the name) I've prepared testcase (testcases/kernel/syscalls/writev/writev06.c) and sent it to ltp mailing list. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] deny partial write for loop dev fd
Partial write can be easily supported by LO_CRYPT_NONE mode, but it is not easy in LO_CRYPT_CRYPTOAPI case, because of its block nature. I don't know who still used cryptoapi, but theoretically it is possible. So let's leave things as they are. Loop device doesn't support partial write before Nick's "write_begin/write_end" patch set, and let's it behave the same way after. Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> --- drivers/block/loop.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 4bab9b1..de122f3 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -244,10 +244,8 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, ret = pagecache_write_end(file, mapping, pos, size, copied, page, fsdata); - if (ret < 0) + if (ret < 0 || ret != copied) goto fail; - if (ret < copied) - copied = ret; if (unlikely(transfer_result)) goto fail; -- 1.5.2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] deny partial write for loop dev fd
Partial write can be easily supported by LO_CRYPT_NONE mode, but it is not easy in LO_CRYPT_CRYPTOAPI case, because of its block nature. I don't know who still used cryptoapi, but theoretically it is possible. So let's leave things as they are. Loop device doesn't support partial write before Nick's write_begin/write_end patch set, and let's it behave the same way after. Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] --- drivers/block/loop.c |4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 4bab9b1..de122f3 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -244,10 +244,8 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, ret = pagecache_write_end(file, mapping, pos, size, copied, page, fsdata); - if (ret 0) + if (ret 0 || ret != copied) goto fail; - if (ret copied) - copied = ret; if (unlikely(transfer_result)) goto fail; -- 1.5.2 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: iov_iter_fault_in_readable fix
On 18:31 Чтв 14 Июн , Christoph Hellwig wrote: On Wed, Jun 13, 2007 at 05:57:59PM +0400, Dmitriy Monakhov wrote: Function prerform check for signgle region, with out respect to segment nature of iovec, For example writev no longer works :) Btw, could someone please start to collect all sniplets like this in a nice simple regression test suite? If no one wants to start a new one we should probably just put it into xfsqa (which should be useable for other filesystems aswell despite the name) I've prepared testcase (testcases/kernel/syscalls/writev/writev06.c) and sent it to ltp mailing list. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ext4:fix unexpected error from ext4_reserve_global
I just cant belive my eyes then i saw this at the first time... simple test: strace dd if=/dev/zero of=/mnt/file open("/dev/zero", O_RDONLY) = 0 close(1)= 0 open("/mnt/test/file", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 1 read(0, "\0\0\0\0\0\0\0\0\0"..., 512) = 512 write(1, "\0\0\0\0\0\0\0\0"..., 512) = 512 read(0, "\0\0\0\0\0\0\0\0\0"..., 512) = 512 write(1, "\0\0\0\0\0\0\0\0"..., 512) = -1 ENOENT (No such fil e or directory) This strange error returned from ext4_reserve_global(). It's just typo because: a) In fact this is 100% ENOSPC situation b) simular function ext4_reserve_local() returns -ENOSPC Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> --- fs/ext4/balloc.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index 4d7bfd2..43ae8f8 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -1920,7 +1920,7 @@ int ext4_reserve_global(struct super_block *sb, int blocks) { struct ext4_sb_info *sbi = EXT4_SB(sb); struct ext4_reservation_slot *rs; - int i, rc = -ENOENT; + int i, rc = -ENOSPC; __u64 free = 0; rs = sbi->s_reservation_slots; -- 1.5.2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ext4:fix unexpected error from ext4_reserve_global
I just cant belive my eyes then i saw this at the first time... simple test: strace dd if=/dev/zero of=/mnt/file open(/dev/zero, O_RDONLY) = 0 close(1)= 0 open(/mnt/test/file, O_WRONLY|O_CREAT|O_TRUNC, 0666) = 1 read(0, \0\0\0\0\0\0\0\0\0..., 512) = 512 write(1, \0\0\0\0\0\0\0\0..., 512) = 512 read(0, \0\0\0\0\0\0\0\0\0..., 512) = 512 write(1, \0\0\0\0\0\0\0\0..., 512) = -1 ENOENT (No such fil e or directory) This strange error returned from ext4_reserve_global(). It's just typo because: a) In fact this is 100% ENOSPC situation b) simular function ext4_reserve_local() returns -ENOSPC Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] --- fs/ext4/balloc.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index 4d7bfd2..43ae8f8 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -1920,7 +1920,7 @@ int ext4_reserve_global(struct super_block *sb, int blocks) { struct ext4_sb_info *sbi = EXT4_SB(sb); struct ext4_reservation_slot *rs; - int i, rc = -ENOENT; + int i, rc = -ENOSPC; __u64 free = 0; rs = sbi-s_reservation_slots; -- 1.5.2 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree
On 13:43 Срд 13 Июн , Nick Piggin wrote: > On Wed, Jun 13, 2007 at 05:40:05PM +0400, Dmitriy Monakhov wrote: > > On 14:19 ?? 29 ?? , [EMAIL PROTECTED] wrote: > > > > > > The patch titled > > > fs: introduce write_begin, write_end, and perform_write aops > > > has been added to the -mm tree. Its filename is > > > fs-introduce-write_begin-write_end-and-perform_write-aops.patch > > > > > > *** Remember to use Documentation/SubmitChecklist when testing your code > > > *** > > > > > > See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to > > > find > > > out what to do about this > > > > > > -- > > > Subject: fs: introduce write_begin, write_end, and perform_write aops > > > From: Nick Piggin <[EMAIL PROTECTED]> > > > > > > These are intended to replace prepare_writ eand commit_write with more > > > flexible alternatives that are also able to avoid the buffered write > > > deadlock problems efficiently (which prepare_write is unable to do). > > > > > > [EMAIL PROTECTED]: API design contributions, code review and fixes] > > > Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> > > > Signed-off-by: Mark Fasheh <[EMAIL PROTECTED]> > > > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> > > I've finaly find time to review Nick's "write_begin/write_end aop" patch > > set. > > And i have some fixes and questions. My be i've missed somthing and it was > > already disscussed, but i cant find in LKML. > > Thanks, that's really helpful. > > > > 1) loop dev: > > loop.c code itself is not perfect. In fact before Nick's patch > > partial write was't possible. Assumption what write chunks are > > always page aligned is realy weird ( see "index++" line). > > Fixed by "new aop loop fix" patch > > I think you're right, fix looks good. > > > > 2)block_write_begin: > > After we enter to block_write_begin with *pagep == NULL and > > some page was grabed we remember this page in *pagep > > And if __block_prepare_write() we have to clear *pagep , as > > it was before. Because this may confuse caller. > > for example caller may have folowing code: > > ret = block_write_begin(..., pagep,...) > > if (ret && *pagep != NULL) { > > unlock_page(*pagep); > > page_cache_release(*pagep); > > } > > Fixed my "new aop block_write_begin fix" patch > > I don't think the caller can rely on that if it returns failure. > But that is more defensive I guess. Maybe setting it to 1 or > so would catch abusers. > > > > 3) __page_symlink: > > Nick's patch add folowing code: > > + err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE, > > + AOP_FLAG_UNINTERRUPTIBLE, ,); > > symlink now consume whole page. I have only one question "WHY???". > > I don't see any advantages, but where are huge list of > > dissdvantages: > > a) fs with blksize == 1k and pagesize == 16k after this patch > >waste more than 10x times disk space for nothing. > > b) What happends if we want use fs with blksize == 4k on i386 > >after it was used by ia64 ??? (before this patch it was > >possible). One more visiable effect caused by wrong symlink size: fsstress cause folowing error: EXT3-fs unexpected failure: !buffer_revoked(bh); inconsistent data on disk ext3_forget: aborting transaction: IO failure in __ext3_journal_revoke ext3_abort called. EXT3-fs error (device dm-4): ext3_forget: error -5 when attempting revoke Remounting filesystem read-only Aborting journal on device dm-4. journal commit I/O error journal commit I/O error journal commit I/O error journal commit I/O error journal commit I/O error journal commit I/O error EXT3-fs error (device dm-4) in ext3_free_blocks_sb: Journal has aborted journal commit I/O error journal commit I/O error EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device dm-4) in ext3_truncate: IO failure journal commit I/O error journal commit I/O error journal commit I/O error journal commit I/O error EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device dm-4) in ext3_orphan_del: Journal has aborted EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device dm-4) in ext3_de
Re: [patch] new aop loop fix
On 14:36 Срд 13 Июн , Hugh Dickins wrote: > On Wed, 13 Jun 2007, Dmitriy Monakhov wrote: > > > loop.c code itself is not perfect. In fact before Nick's patch > > partial write was't possible. Assumption what write chunks are > > always page aligned is realy weird ( see "index++" line). > > > > Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> > > I'm interested, because I'm trying to chase down an -mm bug which > occasionally leaves me with 1k of zeroes where it shouldn't (in a > 1k bsize ext2 looped over tmpfs). The length of time for this to > happen varies a lot, so bisection has been misleading: maybe the > problem lies in Nick's patches, maybe it does not. > > But I don't understand your fix below at all. _If_ you need to > change the handling of index, then you need to change the handling > of offset too, don't you? > > But what's wrong with how inded was handled anyway? Yes, it might > be being incremented at the bottom of the loop when we haven't > reached the end of this page, but in that case we're not going > round the loop again anyway: len will now be 0. So no problem. > > One of us is missing something: please enlighten me - thanks. Yepp. You absolutely right, wrong patch was attached :( Btw: Nick's patches broke LO_CRYPT_XOR mode, but it is ok because this feature was absolete and not used by anyone, am i right here? > > Hugh > > > > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > > index 4bab9b1..8726da5 100644 > > --- a/drivers/block/loop.c > > +++ b/drivers/block/loop.c > > @@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, > > struct bio_vec *bvec, > > int len, ret; > > > > mutex_lock(>host->i_mutex); > > - index = pos >> PAGE_CACHE_SHIFT; > > offset = pos & ((pgoff_t)PAGE_CACHE_SIZE - 1); > > bv_offs = bvec->bv_offset; > > len = bvec->bv_len; > > @@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, > > struct bio_vec *bvec, > > struct page *page; > > void *fsdata; > > > > + index = pos >> PAGE_CACHE_SHIFT; > > IV = ((sector_t)index << (PAGE_CACHE_SHIFT - 9))+(offset >> 9); > > size = PAGE_CACHE_SIZE - offset; > > if (size > len) > > @@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, > > struct bio_vec *bvec, > > bv_offs += copied; > > len -= copied; > > offset = 0; > > - index++; > > pos += copied; > > } > > ret = 0; > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
iov_iter_fault_in_readable fix
Function prerform check for signgle region, with out respect to segment nature of iovec, For example writev no longer works :) /* TESTCASE BEGIN */ #include #include #include #include #include #include #define SIZE (4096 * 2) int main(int argc, char* argv[]) { char* ptr[4]; struct iovec iov[2]; int fd, ret; ptr[0] = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0); ptr[1] = mmap(NULL, SIZE, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0); ptr[2] = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0); ptr[3] = mmap(NULL, SIZE, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0); iov[0].iov_base = ptr[0] + (SIZE -1); iov[0].iov_len = 1; memset(ptr[0], 1, SIZE); iov[1].iov_base = ptr[2]; iov[1].iov_len = SIZE; memset(ptr[2], 2, SIZE); fd = open(argv[1], O_CREAT|O_RDWR|O_TRUNC, 0666); ret = writev(fd, iov, sizeof(iov) / sizeof(struct iovec)); return 0; } /* TESTCASE END*/ We will get folowing result: writev(3, [{"\1", 1}, {"\2"..., 8192}], 2) = -1 EFAULT (Bad address) this is hidden bug, and it was invisiable because _fault_in_readable return value was ignored before. Lets iov_iter_fault_in_readable perform checks for all segments. Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> diff --git a/include/linux/fs.h b/include/linux/fs.h index fef19fc..7e025ea 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -433,7 +433,7 @@ size_t iov_iter_copy_from_user_atomic(struct page *page, size_t iov_iter_copy_from_user(struct page *page, struct iov_iter *i, unsigned long offset, size_t bytes); void iov_iter_advance(struct iov_iter *i, size_t bytes); -int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes); +int iov_iter_fault_in_readable(struct iov_iter *i, size_t *bytes); size_t iov_iter_single_seg_count(struct iov_iter *i); static inline void iov_iter_init(struct iov_iter *i, diff --git a/mm/filemap.c b/mm/filemap.c index 8d59ed9..8600c3e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1817,10 +1817,32 @@ void iov_iter_advance(struct iov_iter *i, size_t bytes) } EXPORT_SYMBOL(iov_iter_advance); -int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes) +int iov_iter_fault_in_readable(struct iov_iter *i, size_t* bytes) { - char __user *buf = i->iov->iov_base + i->iov_offset; - return fault_in_pages_readable(buf, bytes); + size_t len = *bytes; + int ret; + if (likely(i->nr_segs == 1)) { + ret = fault_in_pages_readable(i->iov->iov_base, len); + if (ret) + *bytes = 0; + } else { + const struct iovec *iov = i->iov; + size_t base = i->iov_offset; + *bytes = 0; + while (len) { + int copy = min(len, iov->iov_len - base); + if ((ret = fault_in_pages_readable(iov->iov_base + base, copy))) + break; + *bytes += copy; + len -= copy; + base += copy; + if (iov->iov_len == base) { + iov++; + base = 0; + } + } + } + return ret; } EXPORT_SYMBOL(iov_iter_fault_in_readable); @@ -2110,7 +2132,7 @@ static ssize_t generic_perform_write_2copy(struct file *file, * to check that the address is actually valid, when atomic * usercopies are used, below. */ - if (unlikely(iov_iter_fault_in_readable(i, bytes))) { + if (unlikely(iov_iter_fault_in_readable(i, ) && !bytes)) { status = -EFAULT; break; } @@ -2284,7 +2306,7 @@ again: * to check that the address is actually valid, when atomic * usercopies are used, below. */ - if (unlikely(iov_iter_fault_in_readable(i, bytes))) { + if (unlikely(iov_iter_fault_in_readable(i, )&& !bytes)) { status = -EFAULT; break; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] new aop block_write_begin fix
After we enter to block_write_begin with *pagep == NULL and some page was grabed we remember this page in *pagep And if __block_prepare_write() we have to clear *pagep , as it was before. Because this may confuse caller. for example caller may have folowing code: ret = block_write_begin(..., pagep,...) if (ret && *pagep != NULL) { unlock_page(*pagep); page_cache_release(*pagep); } Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> diff --git a/fs/buffer.c b/fs/buffer.c index 07cd457..df933ba 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -1982,6 +1982,7 @@ int block_write_begin(struct file *file, struct address_space *mapping, if (ownpage) { unlock_page(page); page_cache_release(page); + *pagep = NULL; /* * prepare_write() may have instantiated a few blocks - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] new aop loop fix
loop.c code itself is not perfect. In fact before Nick's patch partial write was't possible. Assumption what write chunks are always page aligned is realy weird ( see "index++" line). Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 4bab9b1..8726da5 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, int len, ret; mutex_lock(>host->i_mutex); - index = pos >> PAGE_CACHE_SHIFT; offset = pos & ((pgoff_t)PAGE_CACHE_SIZE - 1); bv_offs = bvec->bv_offset; len = bvec->bv_len; @@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, struct page *page; void *fsdata; + index = pos >> PAGE_CACHE_SHIFT; IV = ((sector_t)index << (PAGE_CACHE_SHIFT - 9))+(offset >> 9); size = PAGE_CACHE_SIZE - offset; if (size > len) @@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, bv_offs += copied; len -= copied; offset = 0; - index++; pos += copied; } ret = 0; > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree
On 14:19 Втр 29 Май , [EMAIL PROTECTED] wrote: > > The patch titled > fs: introduce write_begin, write_end, and perform_write aops > has been added to the -mm tree. Its filename is > fs-introduce-write_begin-write_end-and-perform_write-aops.patch > > *** Remember to use Documentation/SubmitChecklist when testing your code *** > > See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find > out what to do about this > > -- > Subject: fs: introduce write_begin, write_end, and perform_write aops > From: Nick Piggin <[EMAIL PROTECTED]> > > These are intended to replace prepare_write and commit_write with more > flexible alternatives that are also able to avoid the buffered write > deadlock problems efficiently (which prepare_write is unable to do). > > [EMAIL PROTECTED]: API design contributions, code review and fixes] > Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> > Signed-off-by: Mark Fasheh <[EMAIL PROTECTED]> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> I've finaly find time to review Nick's "write_begin/write_end aop" patch set. And i have some fixes and questions. My be i've missed somthing and it was already disscussed, but i cant find in LKML. 1) loop dev: loop.c code itself is not perfect. In fact before Nick's patch partial write was't possible. Assumption what write chunks are always page aligned is realy weird ( see "index++" line). Fixed by "new aop loop fix" patch 2)block_write_begin: After we enter to block_write_begin with *pagep == NULL and some page was grabed we remember this page in *pagep And if __block_prepare_write() we have to clear *pagep , as it was before. Because this may confuse caller. for example caller may have folowing code: ret = block_write_begin(..., pagep,...) if (ret && *pagep != NULL) { unlock_page(*pagep); page_cache_release(*pagep); } Fixed my "new aop block_write_begin fix" patch 3) __page_symlink: Nick's patch add folowing code: + err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE, + AOP_FLAG_UNINTERRUPTIBLE, ,); symlink now consume whole page. I have only one question "WHY???". I don't see any advantages, but where are huge list of dissdvantages: a) fs with blksize == 1k and pagesize == 16k after this patch waste more than 10x times disk space for nothing. b) What happends if we want use fs with blksize == 4k on i386 after it was used by ia64 ??? (before this patch it was possible). I dont prepare patch for this because i dont understand issue witch Nick aimed to fix. 4) iov_iter_fault_in_readable: Function prerform check for signgle region, with out respect to segment nature of iovec, For example writev no longer works :) : writev(3, [{"\1", 1}, {"\2"..., 4096}], 2) = -1 EFAULT (Bad address) this hidden bug, and it was invisiable because _fault_in_readable return value was ignored before. Lets iov_iter_fault_in_readable perform checks for all segments. Fixed by :"iov_iter_fault_in_readable fix" 5) ext3_write_end: Before write_begin/write_end patch set we have folowing locking order: stop_journal(handle); unlock_page(page); But now order is oposite: unlock_page(page); stop_journal(handle); Can we got any race condition now? I'm not sure is it actual problem, may be somebody cant describe this. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree
On 14:19 Втр 29 Май , [EMAIL PROTECTED] wrote: The patch titled fs: introduce write_begin, write_end, and perform_write aops has been added to the -mm tree. Its filename is fs-introduce-write_begin-write_end-and-perform_write-aops.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: fs: introduce write_begin, write_end, and perform_write aops From: Nick Piggin [EMAIL PROTECTED] These are intended to replace prepare_write and commit_write with more flexible alternatives that are also able to avoid the buffered write deadlock problems efficiently (which prepare_write is unable to do). [EMAIL PROTECTED]: API design contributions, code review and fixes] Signed-off-by: Nick Piggin [EMAIL PROTECTED] Signed-off-by: Mark Fasheh [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] I've finaly find time to review Nick's write_begin/write_end aop patch set. And i have some fixes and questions. My be i've missed somthing and it was already disscussed, but i cant find in LKML. 1) loop dev: loop.c code itself is not perfect. In fact before Nick's patch partial write was't possible. Assumption what write chunks are always page aligned is realy weird ( see index++ line). Fixed by new aop loop fix patch 2)block_write_begin: After we enter to block_write_begin with *pagep == NULL and some page was grabed we remember this page in *pagep And if __block_prepare_write() we have to clear *pagep , as it was before. Because this may confuse caller. for example caller may have folowing code: ret = block_write_begin(..., pagep,...) if (ret *pagep != NULL) { unlock_page(*pagep); page_cache_release(*pagep); } Fixed my new aop block_write_begin fix patch 3) __page_symlink: Nick's patch add folowing code: + err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE, + AOP_FLAG_UNINTERRUPTIBLE, page,fsdata); symlink now consume whole page. I have only one question WHY???. I don't see any advantages, but where are huge list of dissdvantages: a) fs with blksize == 1k and pagesize == 16k after this patch waste more than 10x times disk space for nothing. b) What happends if we want use fs with blksize == 4k on i386 after it was used by ia64 ??? (before this patch it was possible). I dont prepare patch for this because i dont understand issue witch Nick aimed to fix. 4) iov_iter_fault_in_readable: Function prerform check for signgle region, with out respect to segment nature of iovec, For example writev no longer works :) : writev(3, [{\1, 1}, {\2..., 4096}], 2) = -1 EFAULT (Bad address) this hidden bug, and it was invisiable because _fault_in_readable return value was ignored before. Lets iov_iter_fault_in_readable perform checks for all segments. Fixed by :iov_iter_fault_in_readable fix 5) ext3_write_end: Before write_begin/write_end patch set we have folowing locking order: stop_journal(handle); unlock_page(page); But now order is oposite: unlock_page(page); stop_journal(handle); Can we got any race condition now? I'm not sure is it actual problem, may be somebody cant describe this. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] new aop loop fix
loop.c code itself is not perfect. In fact before Nick's patch partial write was't possible. Assumption what write chunks are always page aligned is realy weird ( see index++ line). Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 4bab9b1..8726da5 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, int len, ret; mutex_lock(mapping-host-i_mutex); - index = pos PAGE_CACHE_SHIFT; offset = pos ((pgoff_t)PAGE_CACHE_SIZE - 1); bv_offs = bvec-bv_offset; len = bvec-bv_len; @@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, struct page *page; void *fsdata; + index = pos PAGE_CACHE_SHIFT; IV = ((sector_t)index (PAGE_CACHE_SHIFT - 9))+(offset 9); size = PAGE_CACHE_SIZE - offset; if (size len) @@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, bv_offs += copied; len -= copied; offset = 0; - index++; pos += copied; } ret = 0; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] new aop block_write_begin fix
After we enter to block_write_begin with *pagep == NULL and some page was grabed we remember this page in *pagep And if __block_prepare_write() we have to clear *pagep , as it was before. Because this may confuse caller. for example caller may have folowing code: ret = block_write_begin(..., pagep,...) if (ret *pagep != NULL) { unlock_page(*pagep); page_cache_release(*pagep); } Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] diff --git a/fs/buffer.c b/fs/buffer.c index 07cd457..df933ba 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -1982,6 +1982,7 @@ int block_write_begin(struct file *file, struct address_space *mapping, if (ownpage) { unlock_page(page); page_cache_release(page); + *pagep = NULL; /* * prepare_write() may have instantiated a few blocks - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
iov_iter_fault_in_readable fix
Function prerform check for signgle region, with out respect to segment nature of iovec, For example writev no longer works :) /* TESTCASE BEGIN */ #include stdio.h #include sys/types.h #include sys/stat.h #include fcntl.h #include sys/uio.h #include sys/mman.h #define SIZE (4096 * 2) int main(int argc, char* argv[]) { char* ptr[4]; struct iovec iov[2]; int fd, ret; ptr[0] = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0); ptr[1] = mmap(NULL, SIZE, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0); ptr[2] = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0); ptr[3] = mmap(NULL, SIZE, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0); iov[0].iov_base = ptr[0] + (SIZE -1); iov[0].iov_len = 1; memset(ptr[0], 1, SIZE); iov[1].iov_base = ptr[2]; iov[1].iov_len = SIZE; memset(ptr[2], 2, SIZE); fd = open(argv[1], O_CREAT|O_RDWR|O_TRUNC, 0666); ret = writev(fd, iov, sizeof(iov) / sizeof(struct iovec)); return 0; } /* TESTCASE END*/ We will get folowing result: writev(3, [{\1, 1}, {\2..., 8192}], 2) = -1 EFAULT (Bad address) this is hidden bug, and it was invisiable because _fault_in_readable return value was ignored before. Lets iov_iter_fault_in_readable perform checks for all segments. Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] diff --git a/include/linux/fs.h b/include/linux/fs.h index fef19fc..7e025ea 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -433,7 +433,7 @@ size_t iov_iter_copy_from_user_atomic(struct page *page, size_t iov_iter_copy_from_user(struct page *page, struct iov_iter *i, unsigned long offset, size_t bytes); void iov_iter_advance(struct iov_iter *i, size_t bytes); -int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes); +int iov_iter_fault_in_readable(struct iov_iter *i, size_t *bytes); size_t iov_iter_single_seg_count(struct iov_iter *i); static inline void iov_iter_init(struct iov_iter *i, diff --git a/mm/filemap.c b/mm/filemap.c index 8d59ed9..8600c3e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1817,10 +1817,32 @@ void iov_iter_advance(struct iov_iter *i, size_t bytes) } EXPORT_SYMBOL(iov_iter_advance); -int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes) +int iov_iter_fault_in_readable(struct iov_iter *i, size_t* bytes) { - char __user *buf = i-iov-iov_base + i-iov_offset; - return fault_in_pages_readable(buf, bytes); + size_t len = *bytes; + int ret; + if (likely(i-nr_segs == 1)) { + ret = fault_in_pages_readable(i-iov-iov_base, len); + if (ret) + *bytes = 0; + } else { + const struct iovec *iov = i-iov; + size_t base = i-iov_offset; + *bytes = 0; + while (len) { + int copy = min(len, iov-iov_len - base); + if ((ret = fault_in_pages_readable(iov-iov_base + base, copy))) + break; + *bytes += copy; + len -= copy; + base += copy; + if (iov-iov_len == base) { + iov++; + base = 0; + } + } + } + return ret; } EXPORT_SYMBOL(iov_iter_fault_in_readable); @@ -2110,7 +2132,7 @@ static ssize_t generic_perform_write_2copy(struct file *file, * to check that the address is actually valid, when atomic * usercopies are used, below. */ - if (unlikely(iov_iter_fault_in_readable(i, bytes))) { + if (unlikely(iov_iter_fault_in_readable(i, bytes) !bytes)) { status = -EFAULT; break; } @@ -2284,7 +2306,7 @@ again: * to check that the address is actually valid, when atomic * usercopies are used, below. */ - if (unlikely(iov_iter_fault_in_readable(i, bytes))) { + if (unlikely(iov_iter_fault_in_readable(i, bytes) !bytes)) { status = -EFAULT; break; } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org
Re: [patch] new aop loop fix
On 14:36 Срд 13 Июн , Hugh Dickins wrote: On Wed, 13 Jun 2007, Dmitriy Monakhov wrote: loop.c code itself is not perfect. In fact before Nick's patch partial write was't possible. Assumption what write chunks are always page aligned is realy weird ( see index++ line). Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] I'm interested, because I'm trying to chase down an -mm bug which occasionally leaves me with 1k of zeroes where it shouldn't (in a 1k bsize ext2 looped over tmpfs). The length of time for this to happen varies a lot, so bisection has been misleading: maybe the problem lies in Nick's patches, maybe it does not. But I don't understand your fix below at all. _If_ you need to change the handling of index, then you need to change the handling of offset too, don't you? But what's wrong with how inded was handled anyway? Yes, it might be being incremented at the bottom of the loop when we haven't reached the end of this page, but in that case we're not going round the loop again anyway: len will now be 0. So no problem. One of us is missing something: please enlighten me - thanks. Yepp. You absolutely right, wrong patch was attached :( Btw: Nick's patches broke LO_CRYPT_XOR mode, but it is ok because this feature was absolete and not used by anyone, am i right here? Hugh diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 4bab9b1..8726da5 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, int len, ret; mutex_lock(mapping-host-i_mutex); - index = pos PAGE_CACHE_SHIFT; offset = pos ((pgoff_t)PAGE_CACHE_SIZE - 1); bv_offs = bvec-bv_offset; len = bvec-bv_len; @@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, struct page *page; void *fsdata; + index = pos PAGE_CACHE_SHIFT; IV = ((sector_t)index (PAGE_CACHE_SHIFT - 9))+(offset 9); size = PAGE_CACHE_SIZE - offset; if (size len) @@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec, bv_offs += copied; len -= copied; offset = 0; - index++; pos += copied; } ret = 0; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree
On 13:43 Срд 13 Июн , Nick Piggin wrote: On Wed, Jun 13, 2007 at 05:40:05PM +0400, Dmitriy Monakhov wrote: On 14:19 ?? 29 ?? , [EMAIL PROTECTED] wrote: The patch titled fs: introduce write_begin, write_end, and perform_write aops has been added to the -mm tree. Its filename is fs-introduce-write_begin-write_end-and-perform_write-aops.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: fs: introduce write_begin, write_end, and perform_write aops From: Nick Piggin [EMAIL PROTECTED] These are intended to replace prepare_writ eand commit_write with more flexible alternatives that are also able to avoid the buffered write deadlock problems efficiently (which prepare_write is unable to do). [EMAIL PROTECTED]: API design contributions, code review and fixes] Signed-off-by: Nick Piggin [EMAIL PROTECTED] Signed-off-by: Mark Fasheh [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] I've finaly find time to review Nick's write_begin/write_end aop patch set. And i have some fixes and questions. My be i've missed somthing and it was already disscussed, but i cant find in LKML. Thanks, that's really helpful. 1) loop dev: loop.c code itself is not perfect. In fact before Nick's patch partial write was't possible. Assumption what write chunks are always page aligned is realy weird ( see index++ line). Fixed by new aop loop fix patch I think you're right, fix looks good. 2)block_write_begin: After we enter to block_write_begin with *pagep == NULL and some page was grabed we remember this page in *pagep And if __block_prepare_write() we have to clear *pagep , as it was before. Because this may confuse caller. for example caller may have folowing code: ret = block_write_begin(..., pagep,...) if (ret *pagep != NULL) { unlock_page(*pagep); page_cache_release(*pagep); } Fixed my new aop block_write_begin fix patch I don't think the caller can rely on that if it returns failure. But that is more defensive I guess. Maybe setting it to 1 or so would catch abusers. 3) __page_symlink: Nick's patch add folowing code: + err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE, + AOP_FLAG_UNINTERRUPTIBLE, page,fsdata); symlink now consume whole page. I have only one question WHY???. I don't see any advantages, but where are huge list of dissdvantages: a) fs with blksize == 1k and pagesize == 16k after this patch waste more than 10x times disk space for nothing. b) What happends if we want use fs with blksize == 4k on i386 after it was used by ia64 ??? (before this patch it was possible). One more visiable effect caused by wrong symlink size: fsstress cause folowing error: LOG BEGIN EXT3-fs unexpected failure: !buffer_revoked(bh); inconsistent data on disk ext3_forget: aborting transaction: IO failure in __ext3_journal_revoke ext3_abort called. EXT3-fs error (device dm-4): ext3_forget: error -5 when attempting revoke Remounting filesystem read-only Aborting journal on device dm-4. journal commit I/O error journal commit I/O error journal commit I/O error journal commit I/O error journal commit I/O error journal commit I/O error EXT3-fs error (device dm-4) in ext3_free_blocks_sb: Journal has aborted journal commit I/O error journal commit I/O error EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device dm-4) in ext3_truncate: IO failure journal commit I/O error journal commit I/O error journal commit I/O error journal commit I/O error EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device dm-4) in ext3_orphan_del: Journal has aborted EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has aborted EXT3-fs error (device dm-4) in ext3_delete_inode: IO failure LOG END After symlink size was fixed to len-1 problem dissappeared. I dont prepare patch for this because i dont understand issue witch Nick aimed to fix. I don't know why myself :P I think it would be just fine to use len-1 as it did previously, so it must have been a typo? 4) iov_iter_fault_in_readable: Function prerform check for signgle region, with out respect to segment nature of iovec, For example writev no longer works :) : writev(3, [{\1, 1}, {\2..., 4096}], 2) = -1 EFAULT (Bad address) this hidden bug, and it was invisiable because _fault_in_readable return value was ignored before. Lets
Re: [PATCH][5/5][resend] floppy.c: Fix device_create_file() warning
"Jesper Juhl" <[EMAIL PROTECTED]> writes: > On 20/03/07, Mikael Pettersson <[EMAIL PROTECTED]> wrote: >> On Mon, 19 Mar 2007 18:42:22 +0100, Jesper Juhl wrote: >> > --- a/drivers/block/floppy.c >> > +++ b/drivers/block/floppy.c >> > @@ -4302,7 +4302,12 @@ static int __init floppy_init(void) >> > if (err) >> > goto out_flush_work; >> > >> > - device_create_file(_device[drive].dev,_attr_cmos); >> > + err = device_create_file(_device[drive].dev, >> > _attr_cmos); >> > + if (err) >> > + goto out_flush_work; >> > + >> > /* to be cleaned up... */ >> > disks[drive]->private_data = (void *)(long)drive; >> > disks[drive]->queue = floppy_queue; >> >> The floppy driver's sysfs file just provides some auxiliary >> information to user-space, none of which matters for most of >> its users. It is IMO totally inappropriate to fail floppy >> driver init in this case. >> > > Which is why my original patch just output a warning to let the user > know that creating the file had failed. Ohh please... First of all you have to make you patch correct, and only after this think about tiny shiny error massages. Lets look at original code: err = platform_device_register(_device[drive]); if (err) goto out_flush_work; device_create_file(_device[drive].dev,_attr_cmos); < You propose "goto out_flush_work" here if device_create_file() failed < ,as it was done if platform_device_register() failed. But at the < moment we call device_create_file() platform_device_register() < succeed, this means we have to unregister it first and only after < this jumpto out_flush_work. > > -- > Jesper Juhl <[EMAIL PROTECTED]> > Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html > Plain text mails only, please http://www.expita.com/nomime.html > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][5/5][resend] floppy.c: Fix device_create_file() warning
Jesper Juhl [EMAIL PROTECTED] writes: On 20/03/07, Mikael Pettersson [EMAIL PROTECTED] wrote: On Mon, 19 Mar 2007 18:42:22 +0100, Jesper Juhl wrote: --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -4302,7 +4302,12 @@ static int __init floppy_init(void) if (err) goto out_flush_work; - device_create_file(floppy_device[drive].dev,dev_attr_cmos); + err = device_create_file(floppy_device[drive].dev, dev_attr_cmos); + if (err) + goto out_flush_work; + /* to be cleaned up... */ disks[drive]-private_data = (void *)(long)drive; disks[drive]-queue = floppy_queue; The floppy driver's sysfs file just provides some auxiliary information to user-space, none of which matters for most of its users. It is IMO totally inappropriate to fail floppy driver init in this case. Which is why my original patch just output a warning to let the user know that creating the file had failed. Ohh please... First of all you have to make you patch correct, and only after this think about tiny shiny error massages. Lets look at original code: err = platform_device_register(floppy_device[drive]); if (err) goto out_flush_work; device_create_file(floppy_device[drive].dev,dev_attr_cmos); You propose goto out_flush_work here if device_create_file() failed ,as it was done if platform_device_register() failed. But at the moment we call device_create_file() platform_device_register() succeed, this means we have to unregister it first and only after this jumpto out_flush_work. -- Jesper Juhl [EMAIL PROTECTED] Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html Plain text mails only, please http://www.expita.com/nomime.html - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] fs: remove duplicated iovec checking code v8
Where are several places where the same code used for iovec checks. This patch just move this code to separate helper function, and replace duplicated code with it. IMHO it is better because these are checks that we want for all filesystems/drivers that use vectored I/O. Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> --- fs/ntfs/file.c | 21 +++-- fs/read_write.c| 40 fs/xfs/linux-2.6/xfs_lrw.c | 22 +++--- include/linux/fs.h |3 +++ mm/filemap.c | 43 ++- 5 files changed, 55 insertions(+), 74 deletions(-) diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c index dbbac55..2c672a4 100644 --- a/fs/ntfs/file.c +++ b/fs/ntfs/file.c @@ -2129,28 +2129,13 @@ static ssize_t ntfs_file_aio_write_nolock(struct kiocb *iocb, struct address_space *mapping = file->f_mapping; struct inode *inode = mapping->host; loff_t pos; - unsigned long seg; size_t count; /* after file limit checks */ ssize_t written, err; count = 0; - for (seg = 0; seg < nr_segs; seg++) { - const struct iovec *iv = [seg]; - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - count += iv->iov_len; - if (unlikely((ssize_t)(count|iv->iov_len) < 0)) - return -EINVAL; - if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len)) - continue; - if (!seg) - return -EFAULT; - nr_segs = seg; - count -= iv->iov_len; /* This segment is no good */ - break; - } + err = generic_iovec_checks(iov, _segs, , VERIFY_READ); + if (err) + return err; pos = *ppos; vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE); /* We can write back this queue in page reclaim. */ diff --git a/fs/read_write.c b/fs/read_write.c index 4d03008..22ec324 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -217,6 +217,46 @@ Einval: return -EINVAL; } +/* + * Performs necessary iovec checks before doing a write + * @iov: io vector request + * @nr_segs: number of segments in the iovec + * @count: number of bytes to write + * @access_flags: type of access: %VERIFY_READ or %VERIFY_WRITE + * + * Adjust number of segments and amount of bytes to write (nr_segs should be + * properly initialized first). Returns appropriate error code that caller + * should return or zero in case that write should be allowed. + */ +int generic_iovec_checks(const struct iovec *iov, + unsigned long *nr_segs, size_t *count, + unsigned long access_flags) +{ + unsigned long seg; + size_t cnt = 0; + for (seg = 0; seg < *nr_segs; seg++) { + const struct iovec *iv = [seg]; + + /* +* If any segment has a negative length, or the cumulative +* length ever wraps negative then return -EINVAL. +*/ + cnt += iv->iov_len; + if (unlikely((ssize_t)(cnt|iv->iov_len) < 0)) + return -EINVAL; + if (access_ok(access_flags, iv->iov_base, iv->iov_len)) + continue; + if (seg == 0) + return -EFAULT; + *nr_segs = seg; + cnt -= iv->iov_len; /* This segment is no good */ + break; + } + *count = cnt; + return 0; +} +EXPORT_SYMBOL(generic_iovec_checks); + static void wait_on_retry_sync_kiocb(struct kiocb *iocb) { set_current_state(TASK_UNINTERRUPTIBLE); diff --git a/fs/xfs/linux-2.6/xfs_lrw.c b/fs/xfs/linux-2.6/xfs_lrw.c index ff8d64e..9a11b00 100644 --- a/fs/xfs/linux-2.6/xfs_lrw.c +++ b/fs/xfs/linux-2.6/xfs_lrw.c @@ -639,7 +639,6 @@ xfs_write( xfs_fsize_t isize, new_size; xfs_iocore_t*io; bhv_vnode_t *vp; - unsigned long seg; int iolock; int eventsent = 0; bhv_vrwlock_t locktype; @@ -652,24 +651,9 @@ xfs_write( vp = BHV_TO_VNODE(bdp); xip = XFS_BHVTOI(bdp); - for (seg = 0; seg < segs; seg++) { - const struct iovec *iv = [seg]; - - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - ocount += iv->iov_len; - if (unlikely((ssize_t)(ocount|iv->iov_len) < 0)) - return -EINVAL;
[PATCH 2/2] fs: incorrect direct io error handling v8
If generic_file_direct_write() has fail (ENOSPC condition) inside __generic_file_aio_write_nolock() it may have instantiated a few blocks outside i_size in case of non blockdev files. At least ext2, ext3 and reiserfs interpret i_size and biggest block difference as error. Later fsck will complain about wrong i_size. After fsck will fix error i_size will be increased to the biggest block. This is bad because this blocks contain gurbage from previous write attempt. And result in silence file data corruption. This issue affect fs regardless the values of blocksize or pagesize. We need truncate any block beyond i_size after generic_file_direct_write() has failed. In fact all existing fs witch use __generic_file_aio_write_nolock() always call it under i_mutex for non blockdev files. This patch add correcspnding BUG_ON() in order to explicitly demand/document it. Also fix out of date direct_io locking comments. TEST_CASE: TESTCASE_BEGIN $touch /mnt/test/BIG_FILE ## at this moment /mnt/test/BIG_FILE size and blocks equal to zero open("/mnt/test/BIG_FILE", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3 write(3, ""..., 104857600) = -1 ENOSPC (No space left on device) ## size and block sould't be changed because write op failed. $stat /mnt/test/BIG_FILE File: `/mnt/test/BIG_FILE' Size: 0 Blocks: 110896 IO Block: 1024 regular empty file <<<<<<<<<<<<<<<<<<<<<<<<<<<<<^^^file size is less than biggest block idx Device: fe07h/65031dInode: 14 Links: 1 Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/root) Access: 2007-01-24 20:03:38.0 +0300 Modify: 2007-01-24 20:03:38.0 +0300 Change: 2007-01-24 20:03:39.0 +0300 #fsck.ext3 -f /dev/VG/test e2fsck 1.39 (29-May-2006) Pass 1: Checking inodes, blocks, and sizes Inode 14, i_size is 0, should be 56556544. Fix? yes Pass 2: Checking directory structure #TESTCASE_END Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> --- mm/filemap.c | 24 1 files changed, 20 insertions(+), 4 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index bbef42f..a08900e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1892,8 +1892,10 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov, /* * Sync the fs metadata but not the minor inode changes and * of course not the data as we did direct DMA for the IO. -* i_mutex is held, which protects generic_osync_inode() from -* livelocking. AIO O_DIRECT ops attempt to sync metadata here. +* i_mutex is held in case of DIO_LOCKING, which protects +* generic_osync_inode() from livelocking. If it is not held, then +* the filesystem must prevent this livelock. AIO O_DIRECT ops +* attempt to sync metadata here. */ if ((written >= 0 || written == -EIOCBQUEUED) && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { @@ -2083,6 +2085,9 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov, ssize_t written; ssize_t err; + /* Always called under i_mutex for writes to non blockdev files */ + BUG_ON(!S_ISBLK(inode->i_mode) && + !mutex_is_locked(>i_mutex)); ocount = 0; err = generic_iovec_checks(iov, _segs, , VERIFY_READ); if (err) @@ -2117,6 +2122,17 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov, written = generic_file_direct_write(iocb, iov, _segs, pos, ppos, count, ocount); + /* +* If host is not blockdev generic_file_direct_write() may +* have instantiated a few blocks outside i_size files +* Trim these off again. +*/ + if (unlikely(written < 0) && !S_ISBLK(inode->i_mode)) { + loff_t isize = i_size_read(inode); + if (pos + count > isize) + vmtruncate(inode, isize); + } + if (written < 0 || written == count) goto out; /* @@ -2221,8 +2237,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, EXPORT_SYMBOL(generic_file_aio_write); /* - * Called under i_mutex for writes to S_ISREG files. Returns -EIO if something - * went wrong during pagecache shootdown. + * Called under i_mutex for writes to S_ISREG files in case of DIO_LOCKING. + * Returns -EIO if something went wrong during pagecache shootdown. */ static ssize_t generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *io
[PATCH 2/2] fs: incorrect direct io error handling v8
If generic_file_direct_write() has fail (ENOSPC condition) inside __generic_file_aio_write_nolock() it may have instantiated a few blocks outside i_size in case of non blockdev files. At least ext2, ext3 and reiserfs interpret i_size and biggest block difference as error. Later fsck will complain about wrong i_size. After fsck will fix error i_size will be increased to the biggest block. This is bad because this blocks contain gurbage from previous write attempt. And result in silence file data corruption. This issue affect fs regardless the values of blocksize or pagesize. We need truncate any block beyond i_size after generic_file_direct_write() has failed. In fact all existing fs witch use __generic_file_aio_write_nolock() always call it under i_mutex for non blockdev files. This patch add correcspnding BUG_ON() in order to explicitly demand/document it. Also fix out of date direct_io locking comments. TEST_CASE: TESTCASE_BEGIN $touch /mnt/test/BIG_FILE ## at this moment /mnt/test/BIG_FILE size and blocks equal to zero open(/mnt/test/BIG_FILE, O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3 write(3, ..., 104857600) = -1 ENOSPC (No space left on device) ## size and block sould't be changed because write op failed. $stat /mnt/test/BIG_FILE File: `/mnt/test/BIG_FILE' Size: 0 Blocks: 110896 IO Block: 1024 regular empty file ^^^file size is less than biggest block idx Device: fe07h/65031dInode: 14 Links: 1 Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/root) Access: 2007-01-24 20:03:38.0 +0300 Modify: 2007-01-24 20:03:38.0 +0300 Change: 2007-01-24 20:03:39.0 +0300 #fsck.ext3 -f /dev/VG/test e2fsck 1.39 (29-May-2006) Pass 1: Checking inodes, blocks, and sizes Inode 14, i_size is 0, should be 56556544. Fixy? yes Pass 2: Checking directory structure #TESTCASE_END Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] --- mm/filemap.c | 24 1 files changed, 20 insertions(+), 4 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index bbef42f..a08900e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1892,8 +1892,10 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov, /* * Sync the fs metadata but not the minor inode changes and * of course not the data as we did direct DMA for the IO. -* i_mutex is held, which protects generic_osync_inode() from -* livelocking. AIO O_DIRECT ops attempt to sync metadata here. +* i_mutex is held in case of DIO_LOCKING, which protects +* generic_osync_inode() from livelocking. If it is not held, then +* the filesystem must prevent this livelock. AIO O_DIRECT ops +* attempt to sync metadata here. */ if ((written = 0 || written == -EIOCBQUEUED) ((file-f_flags O_SYNC) || IS_SYNC(inode))) { @@ -2083,6 +2085,9 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov, ssize_t written; ssize_t err; + /* Always called under i_mutex for writes to non blockdev files */ + BUG_ON(!S_ISBLK(inode-i_mode) + !mutex_is_locked(inode-i_mutex)); ocount = 0; err = generic_iovec_checks(iov, nr_segs, ocount, VERIFY_READ); if (err) @@ -2117,6 +2122,17 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov, written = generic_file_direct_write(iocb, iov, nr_segs, pos, ppos, count, ocount); + /* +* If host is not blockdev generic_file_direct_write() may +* have instantiated a few blocks outside i_size files +* Trim these off again. +*/ + if (unlikely(written 0) !S_ISBLK(inode-i_mode)) { + loff_t isize = i_size_read(inode); + if (pos + count isize) + vmtruncate(inode, isize); + } + if (written 0 || written == count) goto out; /* @@ -2221,8 +2237,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, EXPORT_SYMBOL(generic_file_aio_write); /* - * Called under i_mutex for writes to S_ISREG files. Returns -EIO if something - * went wrong during pagecache shootdown. + * Called under i_mutex for writes to S_ISREG files in case of DIO_LOCKING. + * Returns -EIO if something went wrong during pagecache shootdown. */ static ssize_t generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, -- 1.4.4.2 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read
[PATCH 1/2] fs: remove duplicated iovec checking code v8
Where are several places where the same code used for iovec checks. This patch just move this code to separate helper function, and replace duplicated code with it. IMHO it is better because these are checks that we want for all filesystems/drivers that use vectored I/O. Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] --- fs/ntfs/file.c | 21 +++-- fs/read_write.c| 40 fs/xfs/linux-2.6/xfs_lrw.c | 22 +++--- include/linux/fs.h |3 +++ mm/filemap.c | 43 ++- 5 files changed, 55 insertions(+), 74 deletions(-) diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c index dbbac55..2c672a4 100644 --- a/fs/ntfs/file.c +++ b/fs/ntfs/file.c @@ -2129,28 +2129,13 @@ static ssize_t ntfs_file_aio_write_nolock(struct kiocb *iocb, struct address_space *mapping = file-f_mapping; struct inode *inode = mapping-host; loff_t pos; - unsigned long seg; size_t count; /* after file limit checks */ ssize_t written, err; count = 0; - for (seg = 0; seg nr_segs; seg++) { - const struct iovec *iv = iov[seg]; - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - count += iv-iov_len; - if (unlikely((ssize_t)(count|iv-iov_len) 0)) - return -EINVAL; - if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len)) - continue; - if (!seg) - return -EFAULT; - nr_segs = seg; - count -= iv-iov_len; /* This segment is no good */ - break; - } + err = generic_iovec_checks(iov, nr_segs, count, VERIFY_READ); + if (err) + return err; pos = *ppos; vfs_check_frozen(inode-i_sb, SB_FREEZE_WRITE); /* We can write back this queue in page reclaim. */ diff --git a/fs/read_write.c b/fs/read_write.c index 4d03008..22ec324 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -217,6 +217,46 @@ Einval: return -EINVAL; } +/* + * Performs necessary iovec checks before doing a write + * @iov: io vector request + * @nr_segs: number of segments in the iovec + * @count: number of bytes to write + * @access_flags: type of access: %VERIFY_READ or %VERIFY_WRITE + * + * Adjust number of segments and amount of bytes to write (nr_segs should be + * properly initialized first). Returns appropriate error code that caller + * should return or zero in case that write should be allowed. + */ +int generic_iovec_checks(const struct iovec *iov, + unsigned long *nr_segs, size_t *count, + unsigned long access_flags) +{ + unsigned long seg; + size_t cnt = 0; + for (seg = 0; seg *nr_segs; seg++) { + const struct iovec *iv = iov[seg]; + + /* +* If any segment has a negative length, or the cumulative +* length ever wraps negative then return -EINVAL. +*/ + cnt += iv-iov_len; + if (unlikely((ssize_t)(cnt|iv-iov_len) 0)) + return -EINVAL; + if (access_ok(access_flags, iv-iov_base, iv-iov_len)) + continue; + if (seg == 0) + return -EFAULT; + *nr_segs = seg; + cnt -= iv-iov_len; /* This segment is no good */ + break; + } + *count = cnt; + return 0; +} +EXPORT_SYMBOL(generic_iovec_checks); + static void wait_on_retry_sync_kiocb(struct kiocb *iocb) { set_current_state(TASK_UNINTERRUPTIBLE); diff --git a/fs/xfs/linux-2.6/xfs_lrw.c b/fs/xfs/linux-2.6/xfs_lrw.c index ff8d64e..9a11b00 100644 --- a/fs/xfs/linux-2.6/xfs_lrw.c +++ b/fs/xfs/linux-2.6/xfs_lrw.c @@ -639,7 +639,6 @@ xfs_write( xfs_fsize_t isize, new_size; xfs_iocore_t*io; bhv_vnode_t *vp; - unsigned long seg; int iolock; int eventsent = 0; bhv_vrwlock_t locktype; @@ -652,24 +651,9 @@ xfs_write( vp = BHV_TO_VNODE(bdp); xip = XFS_BHVTOI(bdp); - for (seg = 0; seg segs; seg++) { - const struct iovec *iv = iovp[seg]; - - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - ocount += iv-iov_len; - if (unlikely((ssize_t)(ocount|iv-iov_len) 0)) - return -EINVAL; - if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len
Re: [PATCH] splice: partial write handling fix
Jens Axboe <[EMAIL PROTECTED]> writes: > On Wed, Mar 14 2007, Dmitriy Monakhov wrote: >> currently if partial write has happened while ->commit_write() then page >> wasn't marked as accessed and rebalanced. > > The ->commit_write() return values aren't very well designed imho. Is > your fix correct getting the pipe_to_file() return value correct now? I think yes, after my patch this path look like follows: ret = mapping->a_ops->commit_write(file, page, offset, offset+this_len); if (ret) { if(ret == AOP_TRUNCATED_PAGE) { page_cache_release(page); goto find_page; } if (ret < 0) goto out; /* * Partial write has happened, so 'ret' already initialized by * number of bytes written, Where is nothing we have to do here. */ } else ret = this_len; /* * Return the number of bytes written and mark page as * accessed, we are now done! */ mark_page_accessed(page); balance_dirty_pages_ratelimited(mapping); out: page_cache_release(page); unlock_page(page); out_ret: return ret; } But off corse this patch does't fix issues spotted by Nick. > > -- > Jens Axboe > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2/5] fs: introduce new aops and infrastructure
Nick Piggin <[EMAIL PROTECTED]> writes: > Index: linux-2.6/fs/splice.c > === > --- linux-2.6.orig/fs/splice.c > +++ linux-2.6/fs/splice.c > @@ -559,7 +559,7 @@ static int pipe_to_file(struct pipe_inod > struct address_space *mapping = file->f_mapping; > unsigned int offset, this_len; > struct page *page; > - pgoff_t index; > + void *fsdata; > int ret; > > /* > @@ -569,13 +569,13 @@ static int pipe_to_file(struct pipe_inod > if (unlikely(ret)) > return ret; > > - index = sd->pos >> PAGE_CACHE_SHIFT; > offset = sd->pos & ~PAGE_CACHE_MASK; > > this_len = sd->len; > if (this_len + offset > PAGE_CACHE_SIZE) > this_len = PAGE_CACHE_SIZE - offset; > > +#if 0 > /* >* Reuse buf page, if SPLICE_F_MOVE is set and we are doing a full >* page. > @@ -587,86 +587,11 @@ static int pipe_to_file(struct pipe_inod >* locked on successful return. >*/ > if (buf->ops->steal(pipe, buf)) > - goto find_page; > +#endif One more note. It's looks like you just disabled all fancy zero copy logic. Off corse this is just rfc patchset. But i think where is fundamental problem with it: Previous logic was following: 1)splice code responsible for: stealing(if possible) and loking the page 2)prepare_write() code responsible for: do fs speciffic stuff But with new write_begin() logic all steps (grubbing, locking, preparing) happened internaly inside write_begin() witch doesn't even know about what kind of data will be copied between write_begin/write_end. So fancy zero copy logic is impossible :( I think this can be solved somehow, but i dont know yet, how can this be done without implementing it inside begin_write(). > > - page = buf->page; > - if (add_to_page_cache(page, mapping, index, GFP_KERNEL)) { > - unlock_page(page); > - goto find_page; > - } > - > - page_cache_get(page); > - > - if (!(buf->flags & PIPE_BUF_FLAG_LRU)) > - lru_cache_add(page); > - } else { > -find_page: > - page = find_lock_page(mapping, index); > - if (!page) { > - ret = -ENOMEM; > - page = page_cache_alloc_cold(mapping); > - if (unlikely(!page)) > - goto out_ret; > - > - /* > - * This will also lock the page > - */ > - ret = add_to_page_cache_lru(page, mapping, index, > - GFP_KERNEL); > - if (unlikely(ret)) > - goto out; > - } > - > - /* > - * We get here with the page locked. If the page is also > - * uptodate, we don't need to do more. If it isn't, we > - * may need to bring it in if we are not going to overwrite > - * the full page. > - */ > - if (!PageUptodate(page)) { > - if (this_len < PAGE_CACHE_SIZE) { > - ret = mapping->a_ops->readpage(file, page); > - if (unlikely(ret)) > - goto out; > - > - lock_page(page); > - > - if (!PageUptodate(page)) { > - /* > - * Page got invalidated, repeat. > - */ > - if (!page->mapping) { > - unlock_page(page); > - page_cache_release(page); > - goto find_page; > - } > - ret = -EIO; > - goto out; > - } > - } else > - SetPageUptodate(page); > - } > - } > - > - ret = mapping->a_ops->prepare_write(file, page, offset, > offset+this_len); > - if (unlikely(ret)) { > - loff_t isize = i_size_read(mapping->host); > - > - if (ret != AOP_TRUNCATED_PAGE) > - unlock_page(page); > - page_cache_release(page); > - if (ret == AOP_TRUNCATED_PAGE) > - goto find_page; > - > - /* > - * prepare_write() may have instantiated a few blocks > - * outside i_size. Trim these off again. > - */ > - if (sd->pos + this_len > isize) > - vmtruncate(mapping->host, isize); > - > -
Re: [patch 2/5] fs: introduce new aops and infrastructure
Nick Piggin [EMAIL PROTECTED] writes: Index: linux-2.6/fs/splice.c === --- linux-2.6.orig/fs/splice.c +++ linux-2.6/fs/splice.c @@ -559,7 +559,7 @@ static int pipe_to_file(struct pipe_inod struct address_space *mapping = file-f_mapping; unsigned int offset, this_len; struct page *page; - pgoff_t index; + void *fsdata; int ret; /* @@ -569,13 +569,13 @@ static int pipe_to_file(struct pipe_inod if (unlikely(ret)) return ret; - index = sd-pos PAGE_CACHE_SHIFT; offset = sd-pos ~PAGE_CACHE_MASK; this_len = sd-len; if (this_len + offset PAGE_CACHE_SIZE) this_len = PAGE_CACHE_SIZE - offset; +#if 0 /* * Reuse buf page, if SPLICE_F_MOVE is set and we are doing a full * page. @@ -587,86 +587,11 @@ static int pipe_to_file(struct pipe_inod * locked on successful return. */ if (buf-ops-steal(pipe, buf)) - goto find_page; +#endif One more note. It's looks like you just disabled all fancy zero copy logic. Off corse this is just rfc patchset. But i think where is fundamental problem with it: Previous logic was following: 1)splice code responsible for: stealing(if possible) and loking the page 2)prepare_write() code responsible for: do fs speciffic stuff But with new write_begin() logic all steps (grubbing, locking, preparing) happened internaly inside write_begin() witch doesn't even know about what kind of data will be copied between write_begin/write_end. So fancy zero copy logic is impossible :( I think this can be solved somehow, but i dont know yet, how can this be done without implementing it inside begin_write(). - page = buf-page; - if (add_to_page_cache(page, mapping, index, GFP_KERNEL)) { - unlock_page(page); - goto find_page; - } - - page_cache_get(page); - - if (!(buf-flags PIPE_BUF_FLAG_LRU)) - lru_cache_add(page); - } else { -find_page: - page = find_lock_page(mapping, index); - if (!page) { - ret = -ENOMEM; - page = page_cache_alloc_cold(mapping); - if (unlikely(!page)) - goto out_ret; - - /* - * This will also lock the page - */ - ret = add_to_page_cache_lru(page, mapping, index, - GFP_KERNEL); - if (unlikely(ret)) - goto out; - } - - /* - * We get here with the page locked. If the page is also - * uptodate, we don't need to do more. If it isn't, we - * may need to bring it in if we are not going to overwrite - * the full page. - */ - if (!PageUptodate(page)) { - if (this_len PAGE_CACHE_SIZE) { - ret = mapping-a_ops-readpage(file, page); - if (unlikely(ret)) - goto out; - - lock_page(page); - - if (!PageUptodate(page)) { - /* - * Page got invalidated, repeat. - */ - if (!page-mapping) { - unlock_page(page); - page_cache_release(page); - goto find_page; - } - ret = -EIO; - goto out; - } - } else - SetPageUptodate(page); - } - } - - ret = mapping-a_ops-prepare_write(file, page, offset, offset+this_len); - if (unlikely(ret)) { - loff_t isize = i_size_read(mapping-host); - - if (ret != AOP_TRUNCATED_PAGE) - unlock_page(page); - page_cache_release(page); - if (ret == AOP_TRUNCATED_PAGE) - goto find_page; - - /* - * prepare_write() may have instantiated a few blocks - * outside i_size. Trim these off again. - */ - if (sd-pos + this_len isize) - vmtruncate(mapping-host, isize); - - goto out_ret; - } + ret = pagecache_write_begin(file, mapping, sd-pos, sd-len, 0, page, fsdata); + if
Re: [PATCH] splice: partial write handling fix
Jens Axboe [EMAIL PROTECTED] writes: On Wed, Mar 14 2007, Dmitriy Monakhov wrote: currently if partial write has happened while -commit_write() then page wasn't marked as accessed and rebalanced. The -commit_write() return values aren't very well designed imho. Is your fix correct getting the pipe_to_file() return value correct now? I think yes, after my patch this path look like follows: ret = mapping-a_ops-commit_write(file, page, offset, offset+this_len); if (ret) { if(ret == AOP_TRUNCATED_PAGE) { page_cache_release(page); goto find_page; } if (ret 0) goto out; /* * Partial write has happened, so 'ret' already initialized by * number of bytes written, Where is nothing we have to do here. */ } else ret = this_len; /* * Return the number of bytes written and mark page as * accessed, we are now done! */ mark_page_accessed(page); balance_dirty_pages_ratelimited(mapping); out: page_cache_release(page); unlock_page(page); out_ret: return ret; } But off corse this patch does't fix issues spotted by Nick. -- Jens Axboe - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2/5] fs: introduce new aops and infrastructure
Nick Piggin <[EMAIL PROTECTED]> writes: > Introduce write_begin, write_end, and perform_write aops. > > These are intended to replace prepare_write and commit_write with more > flexible alternatives that are also able to avoid the buffered write > deadlock problems efficiently (which prepare_write is unable to do). > > Documentation/filesystems/Locking |9 + > Documentation/filesystems/vfs.txt | 38 ++ > drivers/block/loop.c | 69 --- > fs/buffer.c | 144 > fs/libfs.c| 40 ++ > fs/namei.c| 47 ++- > fs/splice.c | 106 + > include/linux/buffer_head.h |7 + > include/linux/fs.h| 29 > include/linux/pagemap.h |2 > mm/filemap.c | 228 > ++ > 11 files changed, 494 insertions(+), 225 deletions(-) > > Index: linux-2.6/include/linux/fs.h > === > --- linux-2.6.orig/include/linux/fs.h > +++ linux-2.6/include/linux/fs.h > @@ -449,6 +449,17 @@ struct address_space_operations { >*/ > int (*prepare_write)(struct file *, struct page *, unsigned, unsigned); > int (*commit_write)(struct file *, struct page *, unsigned, unsigned); > + > + int (*write_begin)(struct file *, struct address_space *mapping, > + loff_t pos, unsigned len, int intr, > + struct page **pagep, void **fsdata); > + int (*write_end)(struct file *, struct address_space *mapping, > + loff_t pos, unsigned len, unsigned copied, > + struct page *page, void *fsdata); > + > + int (*perform_write)(struct file *, struct address_space *, > + struct iov_iter *, loff_t); > + > /* Unfortunately this kludge is needed for FIBMAP. Don't use it */ > sector_t (*bmap)(struct address_space *, sector_t); > void (*invalidatepage) (struct page *, unsigned long); > @@ -463,6 +474,18 @@ struct address_space_operations { > int (*launder_page) (struct page *); > }; > > +/* > + * pagecache_write_begin/pagecache_write_end must be used by general code > + * to write into the pagecache. > + */ > +int pagecache_write_begin(struct file *, struct address_space *mapping, > + loff_t pos, unsigned len, int intr, > + struct page **pagep, void **fsdata); > + > +int pagecache_write_end(struct file *, struct address_space *mapping, > + loff_t pos, unsigned len, unsigned copied, > + struct page *page, void *fsdata); > + > struct backing_dev_info; > struct address_space { > struct inode*host; /* owner: inode, block_device */ > @@ -1902,6 +1925,12 @@ extern int simple_prepare_write(struct f > unsigned offset, unsigned to); > extern int simple_commit_write(struct file *file, struct page *page, > unsigned offset, unsigned to); > +extern int simple_write_begin(struct file *file, struct address_space > *mapping, > + loff_t pos, unsigned len, int intr, > + struct page **pagep, void **fsdata); > +extern int simple_write_end(struct file *file, struct address_space *mapping, > + loff_t pos, unsigned len, unsigned copied, > + struct page *page, void *fsdata); > > extern struct dentry *simple_lookup(struct inode *, struct dentry *, struct > nameidata *); > extern ssize_t generic_read_dir(struct file *, char __user *, size_t, loff_t > *); > Index: linux-2.6/mm/filemap.c > === > --- linux-2.6.orig/mm/filemap.c > +++ linux-2.6/mm/filemap.c > @@ -2049,6 +2049,88 @@ inline int generic_write_checks(struct f > } > EXPORT_SYMBOL(generic_write_checks); > > +int pagecache_write_begin(struct file *file, struct address_space *mapping, > + loff_t pos, unsigned len, int intr, > + struct page **pagep, void **fsdata) > +{ > + const struct address_space_operations *aops = mapping->a_ops; > + > + if (aops->write_begin) > + return aops->write_begin(file, mapping, pos, len, intr, pagep, > fsdata); > + else { > + int ret; > + pgoff_t index = pos >> PAGE_CACHE_SHIFT; > + unsigned offset = pos & (PAGE_CACHE_SIZE - 1); > + struct inode *inode = mapping->host; > + struct page *page; > +again: > + page = __grab_cache_page(mapping, index); > + *pagep = page; > + if (!page) > + return -ENOMEM; > + > + if (intr && !PageUptodate(page)) { > +
[PATCH] splice: partial write handling fix
currently if partial write has happened while ->commit_write() then page wasn't marked as accessed and rebalanced. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> diff --git a/fs/splice.c b/fs/splice.c index 2fca6eb..bb1bf62 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -682,18 +682,25 @@ find_page: } ret = mapping->a_ops->commit_write(file, page, offset, offset+this_len); - if (!ret) { + if (ret) { + if(ret == AOP_TRUNCATED_PAGE) { + page_cache_release(page); + goto find_page; + } + if (ret < 0) + goto out; /* -* Return the number of bytes written and mark page as -* accessed, we are now done! +* Partial write has happened, so 'ret' already initialized by +* number of bytes written, Where is nothing we have to do here. */ + } else ret = this_len; - mark_page_accessed(page); - balance_dirty_pages_ratelimited(mapping); - } else if (ret == AOP_TRUNCATED_PAGE) { - page_cache_release(page); - goto find_page; - } + /* +* Return the number of bytes written and mark page as +* accessed, we are now done! +*/ + mark_page_accessed(page); + balance_dirty_pages_ratelimited(mapping); out: page_cache_release(page); unlock_page(page); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] splice: partial write handling fix
currently if partial write has happened while -commit_write() then page wasn't marked as accessed and rebalanced. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] diff --git a/fs/splice.c b/fs/splice.c index 2fca6eb..bb1bf62 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -682,18 +682,25 @@ find_page: } ret = mapping-a_ops-commit_write(file, page, offset, offset+this_len); - if (!ret) { + if (ret) { + if(ret == AOP_TRUNCATED_PAGE) { + page_cache_release(page); + goto find_page; + } + if (ret 0) + goto out; /* -* Return the number of bytes written and mark page as -* accessed, we are now done! +* Partial write has happened, so 'ret' already initialized by +* number of bytes written, Where is nothing we have to do here. */ + } else ret = this_len; - mark_page_accessed(page); - balance_dirty_pages_ratelimited(mapping); - } else if (ret == AOP_TRUNCATED_PAGE) { - page_cache_release(page); - goto find_page; - } + /* +* Return the number of bytes written and mark page as +* accessed, we are now done! +*/ + mark_page_accessed(page); + balance_dirty_pages_ratelimited(mapping); out: page_cache_release(page); unlock_page(page); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2/5] fs: introduce new aops and infrastructure
Nick Piggin [EMAIL PROTECTED] writes: Introduce write_begin, write_end, and perform_write aops. These are intended to replace prepare_write and commit_write with more flexible alternatives that are also able to avoid the buffered write deadlock problems efficiently (which prepare_write is unable to do). Documentation/filesystems/Locking |9 + Documentation/filesystems/vfs.txt | 38 ++ drivers/block/loop.c | 69 --- fs/buffer.c | 144 fs/libfs.c| 40 ++ fs/namei.c| 47 ++- fs/splice.c | 106 + include/linux/buffer_head.h |7 + include/linux/fs.h| 29 include/linux/pagemap.h |2 mm/filemap.c | 228 ++ 11 files changed, 494 insertions(+), 225 deletions(-) Index: linux-2.6/include/linux/fs.h === --- linux-2.6.orig/include/linux/fs.h +++ linux-2.6/include/linux/fs.h @@ -449,6 +449,17 @@ struct address_space_operations { */ int (*prepare_write)(struct file *, struct page *, unsigned, unsigned); int (*commit_write)(struct file *, struct page *, unsigned, unsigned); + + int (*write_begin)(struct file *, struct address_space *mapping, + loff_t pos, unsigned len, int intr, + struct page **pagep, void **fsdata); + int (*write_end)(struct file *, struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct page *page, void *fsdata); + + int (*perform_write)(struct file *, struct address_space *, + struct iov_iter *, loff_t); + /* Unfortunately this kludge is needed for FIBMAP. Don't use it */ sector_t (*bmap)(struct address_space *, sector_t); void (*invalidatepage) (struct page *, unsigned long); @@ -463,6 +474,18 @@ struct address_space_operations { int (*launder_page) (struct page *); }; +/* + * pagecache_write_begin/pagecache_write_end must be used by general code + * to write into the pagecache. + */ +int pagecache_write_begin(struct file *, struct address_space *mapping, + loff_t pos, unsigned len, int intr, + struct page **pagep, void **fsdata); + +int pagecache_write_end(struct file *, struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct page *page, void *fsdata); + struct backing_dev_info; struct address_space { struct inode*host; /* owner: inode, block_device */ @@ -1902,6 +1925,12 @@ extern int simple_prepare_write(struct f unsigned offset, unsigned to); extern int simple_commit_write(struct file *file, struct page *page, unsigned offset, unsigned to); +extern int simple_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, int intr, + struct page **pagep, void **fsdata); +extern int simple_write_end(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct page *page, void *fsdata); extern struct dentry *simple_lookup(struct inode *, struct dentry *, struct nameidata *); extern ssize_t generic_read_dir(struct file *, char __user *, size_t, loff_t *); Index: linux-2.6/mm/filemap.c === --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/mm/filemap.c @@ -2049,6 +2049,88 @@ inline int generic_write_checks(struct f } EXPORT_SYMBOL(generic_write_checks); +int pagecache_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, int intr, + struct page **pagep, void **fsdata) +{ + const struct address_space_operations *aops = mapping-a_ops; + + if (aops-write_begin) + return aops-write_begin(file, mapping, pos, len, intr, pagep, fsdata); + else { + int ret; + pgoff_t index = pos PAGE_CACHE_SHIFT; + unsigned offset = pos (PAGE_CACHE_SIZE - 1); + struct inode *inode = mapping-host; + struct page *page; +again: + page = __grab_cache_page(mapping, index); + *pagep = page; + if (!page) + return -ENOMEM; + + if (intr !PageUptodate(page)) { + /* + * There is no way to resolve a short write situation +
[PATCH 2/2] incorrect direct io error handling (v7)
Changes against v6: - Handle direct_io failure inside generic_file_direct_write() as it was recommend by Andrew (during discussion v1), and by Nick (during discussion v6). - change comments, make it more clear. - one more time check what __generic_file_aio_write_nolock() always called under i_mutex for non blkdev files. Tested with: fsstress, manual direct_io tests Log: If generic_file_direct_write() has fail (ENOSPC condition) inside __generic_file_aio_write_nolock() it may have instantiated a few blocks outside i_size. And fsck will complain about wrong i_size (ext2, ext3 and reiserfs interpret i_size and biggest block difference as error), after fsck will fix error i_size will be increased to the biggest block, but this blocks contain gurbage from previous write attempt, this is not information leak, but its silence file data corruption. This issue affect fs regardless the values of blocksize or pagesize, and off corse only for non blkdev files. We need truncate any block beyond i_size after write have failed , do in simular generic_file_buffered_write() error path. We may safely call vmtruncate() here because i_mutex always held for non blkdev files. TEST_CASE: open("/mnt/test/BIG_FILE", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3 write(3, "aaa"..., 104857600) = -1 ENOSPC (No space left on device) #stat /mnt/test/BIG_FILE File: `/mnt/test/BIG_FILE' Size: 0 Blocks: 110896 IO Block: 1024 regular empty file <<--- mm/filemap.c | 28 1 files changed, 24 insertions(+), 4 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 8bd1ea4..95d49fe 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1932,8 +1932,10 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov, /* * Sync the fs metadata but not the minor inode changes and * of course not the data as we did direct DMA for the IO. -* i_mutex is held, which protects generic_osync_inode() from -* livelocking. AIO O_DIRECT ops attempt to sync metadata here. +* i_mutex is held in case of DIO_LOCKING, which protects +* generic_osync_inode() from livelocking. If it is not held, then +* the filesystem must prevent this livelock. AIO O_DIRECT ops +* attempt to sync metadata here. */ if ((written >= 0 || written == -EIOCBQUEUED) && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { @@ -2155,8 +2157,26 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov, loff_t endbyte; ssize_t written_buffered; + /* +* In case of non blockdev we may fail to buffered I/O. +* So i_mutex must be held. +*/ + if (!S_ISBLK(inode->i_mode)) + BUG_ON(!mutex_is_locked(>i_mutex)); + written = generic_file_direct_write(iocb, iov, _segs, pos, ppos, count, ocount); + /* +* If host is not S_ISBLK generic_file_direct_write() may +* have instantiated a few blocks outside i_size files +* Trim these off again. +*/ + if (unlikely(written < 0) && !S_ISBLK(inode->i_mode)) { + loff_t isize = i_size_read(inode); + if (pos + count > isize) + vmtruncate(inode, isize); + } + if (written < 0 || written == count) goto out; /* @@ -2261,8 +2281,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, EXPORT_SYMBOL(generic_file_aio_write); /* - * Called under i_mutex for writes to S_ISREG files. Returns -EIO if something - * went wrong during pagecache shootdown. + * Called under i_mutex for writes to S_ISREG files in case of DIO_LOCKING. + * Returns -EIO if something went wrong during pagecache shootdown. */ static ssize_t generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, -- 1.5.0.1 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
[PATCH 1/2] mm: move common segment checks to separate helper function (v7)
Changes against v6 - remove duplicated code from xfs,ntfs - export generic_segment_checks, because it used by xfs,nfs now. - change arguments initialization pocily according to Nick's comments. Tested with: ltp readv/writev tests Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- fs/ntfs/file.c | 21 ++- fs/xfs/linux-2.6/xfs_lrw.c | 22 ++-- include/linux/fs.h |3 ++ mm/filemap.c | 83 --- 4 files changed, 55 insertions(+), 74 deletions(-) diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c index dbbac55..621de36 100644 --- a/fs/ntfs/file.c +++ b/fs/ntfs/file.c @@ -2129,28 +2129,13 @@ static ssize_t ntfs_file_aio_write_nolock(struct kiocb *iocb, struct address_space *mapping = file->f_mapping; struct inode *inode = mapping->host; loff_t pos; - unsigned long seg; size_t count; /* after file limit checks */ ssize_t written, err; count = 0; - for (seg = 0; seg < nr_segs; seg++) { - const struct iovec *iv = [seg]; - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - count += iv->iov_len; - if (unlikely((ssize_t)(count|iv->iov_len) < 0)) - return -EINVAL; - if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len)) - continue; - if (!seg) - return -EFAULT; - nr_segs = seg; - count -= iv->iov_len; /* This segment is no good */ - break; - } + err = generic_segment_checks(iov, _segs, , VERIFY_READ); + if (err) + return err; pos = *ppos; vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE); /* We can write back this queue in page reclaim. */ diff --git a/fs/xfs/linux-2.6/xfs_lrw.c b/fs/xfs/linux-2.6/xfs_lrw.c index ff8d64e..558076d 100644 --- a/fs/xfs/linux-2.6/xfs_lrw.c +++ b/fs/xfs/linux-2.6/xfs_lrw.c @@ -639,7 +639,6 @@ xfs_write( xfs_fsize_t isize, new_size; xfs_iocore_t*io; bhv_vnode_t *vp; - unsigned long seg; int iolock; int eventsent = 0; bhv_vrwlock_t locktype; @@ -652,24 +651,9 @@ xfs_write( vp = BHV_TO_VNODE(bdp); xip = XFS_BHVTOI(bdp); - for (seg = 0; seg < segs; seg++) { - const struct iovec *iv = [seg]; - - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - ocount += iv->iov_len; - if (unlikely((ssize_t)(ocount|iv->iov_len) < 0)) - return -EINVAL; - if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len)) - continue; - if (seg == 0) - return -EFAULT; - segs = seg; - ocount -= iv->iov_len; /* This segment is no good */ - break; - } + error = generic_segment_checks(iovp, , , VERIFY_READ); + if (error) + return error; count = ocount; pos = *offset; diff --git a/include/linux/fs.h b/include/linux/fs.h index 6a3d22e..3b99450 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1778,6 +1778,9 @@ extern ssize_t generic_file_sendfile(struct file *, loff_t *, size_t, read_actor extern void do_generic_mapping_read(struct address_space *mapping, struct file_ra_state *, struct file *, loff_t *, read_descriptor_t *, read_actor_t); +extern int generic_segment_checks(const struct iovec *iov, + unsigned long *nr_segs, size_t *count, + unsigned long access_flags); /* fs/splice.c */ extern ssize_t generic_file_splice_read(struct file *, loff_t *, diff --git a/mm/filemap.c b/mm/filemap.c index 8e1849a..8bd1ea4 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1159,6 +1159,46 @@ success: return size; } +/* + * Performs necessary checks before doing a write + * @iov: io vector request + * @nr_segs: number of segments in the iovec + * @count: number of bytes to write + * @access_flags: type of access: %VERIFY_READ or %VERIFY_WRITE + * + * Adjust number of segments and amount of bytes to write (nr_segs should be + * properly initialized first). Returns appropriate error code that caller + * should return or zero in case that write should be allowed. + */ +int generic_segment_checks(const struct iovec *iov, + unsigned long *nr_segs, size_t *count, + unsigned long access_flags) +{ + unsigned long
Re: [PATCH 1/2] mm: move common segment checks to separate helper function (v6)
Nick Piggin <[EMAIL PROTECTED]> writes: > On Mon, Mar 12, 2007 at 10:57:53AM +0300, Dmitriy Monakhov wrote: >> I realy don't want to be annoying by sending this patcheset over and over >> again. If anyone think this patch is realy cappy, please comment what >> exectly is bad. Thank you. > > Doesn't seem like a bad idea. > >> >> Changes: >> - patch was split in two patches. >> +/* >> + * Performs necessary checks before doing a write >> + * >> + * Adjust number of segments and amount of bytes to write. >> + * Returns appropriate error code that caller should return or >> + * zero in case that write should be allowed. >> + */ >> +inline int generic_segment_checks(const struct iovec *iov, >> +unsigned long *nr_segs, size_t *count, >> +unsigned long access_flags) > > Make it static and not inline, and the compiler will work it out. Wow i've just carefully checked and found more functions with duplicating code: fs/xfs/linux-2.6/xfs_lrw.c:655 xfs_write() fs/ntfs/file.c:2339 ntfs_file_aio_write_nolock() So i think nobody will object against exporting generic_segment_checks() and removing doplicating code. > > This function name doesn't really imply that it returns you the > nr_segs and count, but that's not a big deal I guess. > > You also don't say that nr_segs should be initialised to the amount > you which to write, while count must be initialised to zero. > >> +{ >> +unsigned long seg; >> +for (seg = 0; seg < *nr_segs; seg++) { >> +const struct iovec *iv = [seg]; >> + >> +/* >> + * If any segment has a negative length, or the cumulative >> + * length ever wraps negative then return -EINVAL. >> + */ >> +*count += iv->iov_len; >> +if (unlikely((ssize_t)(*count|iv->iov_len) < 0)) >> +return -EINVAL; >> +if (access_ok(access_flags, iv->iov_base, iv->iov_len)) >> +continue; > > Why now insert the above test, and put the below statements inside the > branch? OTOH, that makes it less obviously c from the others. Maybe > a subsequent patch. > >> +if (seg == 0) >> +return -EFAULT; >> +*nr_segs = seg; >> +*count -= iv->iov_len; /* This segment is no good */ >> +break; >> +} > > > You could assign to *count here, once, and remove the requirement > that the caller initialised it to zero? > >> +return 0; >> +} >> + >> /** >> * generic_file_aio_read - generic filesystem read routine >> * @iocb: kernel I/O control block >> @@ -1180,24 +1213,9 @@ generic_file_aio_read(struct kiocb *iocb, const >> struct iovec *iov, >> loff_t *ppos = >ki_pos; >> >> count = 0; >> -for (seg = 0; seg < nr_segs; seg++) { >> -const struct iovec *iv = [seg]; >> - >> -/* >> - * If any segment has a negative length, or the cumulative >> - * length ever wraps negative then return -EINVAL. >> - */ >> -count += iv->iov_len; >> -if (unlikely((ssize_t)(count|iv->iov_len) < 0)) >> -return -EINVAL; >> -if (access_ok(VERIFY_WRITE, iv->iov_base, iv->iov_len)) >> -continue; >> -if (seg == 0) >> -return -EFAULT; >> -nr_segs = seg; >> -count -= iv->iov_len; /* This segment is no good */ >> -break; >> -} >> +retval = generic_segment_checks(iov, _segs, , VERIFY_WRITE); >> +if (retval) >> +return retval; >> >> /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */ >> if (filp->f_flags & O_DIRECT) { >> @@ -2094,30 +2112,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, >> const struct iovec *iov, >> size_t ocount; /* original count */ >> size_t count; /* after file limit checks */ >> struct inode*inode = mapping->host; >> -unsigned long seg; >> loff_t pos; >> ssize_t written; >> ssize_t err; >> >> ocount = 0; >> -for (seg = 0; seg < nr_segs; seg++) { >> -const struct iovec *iv = [seg]; >> - >> -/* >> - * If any segment has a negative length, or the cumu
Re: [PATCH 2/2] mm: incorrect direct io error handling (v6)
Nick Piggin <[EMAIL PROTECTED]> writes: > On Mon, Mar 12, 2007 at 11:55:30AM +0300, Dmitriy Monakhov wrote: >> Nick Piggin <[EMAIL PROTECTED]> writes: >> >> > On Mon, Mar 12, 2007 at 10:58:10AM +0300, Dmitriy Monakhov wrote: > >> >> @@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, >> >> const struct iovec *iov, >> >> mutex_lock(>i_mutex); >> >> ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, >> >> >ki_pos); >> >> + /* >> >> + * If __generic_file_aio_write_nolock has failed. >> >> + * This may happen because of: >> >> + * 1) Bad segment found (failed before actual write attempt) >> >> + * 2) Segments are good, but actual write operation failed >> >> + *and may have instantiated a few blocks outside i_size. >> >> + * a) in case of buffered write these blocks was already >> >> + * trimmed by generic_file_buffered_write() >> >> + * b) in case of O_DIRECT these blocks weren't trimmed yet. >> >> + * >> >> + * In case of (2b) these blocks have to be trimmed off again. >> >> + */ >> >> + if (unlikely( ret < 0 && file->f_flags & O_DIRECT)) { >> >> + unsigned long nr_segs_avail = nr_segs; >> >> + size_t count = 0; >> >> + if (!generic_segment_checks(iov, _segs_avail, , >> >> + VERIFY_READ)) { >> >> + /*It is (2b) case, because segments are good*/ >> >> + loff_t isize = i_size_read(inode); >> >> + if (pos + count > isize) >> >> + vmtruncate(inode, isize); >> >> + } >> >> + } >> > >> > OK, but wouldn't this be better to be done in the actual direct IO >> > functions themselves? Thus you could be sure that you have the 2b case, >> > and the code would be less fragile to something changing? >> Ohh, We can't just call vmtruncate() after generic_file_direct_write() >> failure while __generic_file_aio_write_nolock() becase where is no guarantee >> what i_mutex held. In fact all existing fs always invoke >> __generic_file_aio_write_nolock() with i_mutex held in case of S_ISREG files, >> but this was't explicitly demanded and documented. I've proposed to do it in >> previous versions of this patch, because it this just document current state >> of affairs, but David Chinner wasn't agree with it. > > It seemed like it was documented in the comments that you altered in this > patch... > > How would such a filesystem that did not hold i_mutex propose to fix the > problem? > > The burden should be on those filesystems that might not want to hold > i_mutex here, to solve the problem nicely, rather than generic code to take > this ugly code. Ok then what do you think about this version http://lkml.org/lkml/2006/12/18/103 witch was posted almost month ago :) > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] mm: incorrect direct io error handling (v6)
Nick Piggin <[EMAIL PROTECTED]> writes: > On Mon, Mar 12, 2007 at 10:58:10AM +0300, Dmitriy Monakhov wrote: >> I realy don't want to be annoying by sending this patcheset over and over >> again, i just want the issue to be solved. If anyone think this solution >> is realy cappy, please comment what exectly is bad. Thank you. > > If you don't get feedback, then you have to keep posting. If you still > don't get feedback, try cc'ing a few more lists (eg. linux-fsdevel). > >> Changes: >> - patch was split in two patches. >> - comments added. I think now it is clearly describe things. >> - patch prepared against 2.6.20-mm3 >> >> How this patch tested: >> - fsstress test. >> - manual direct_io tests. >> >> LOG: >> - Trim off blocks after generic_file_direct_write() has failed. >> - Update out of date comments about direct_io locking rules. > > It can be nice to expand on what the problem was, and how you fixed it... > but I guess you do quite a good job in the C comments. If generic_file_direct_write() has fail (ENOSPC condition) inside __generic_file_aio_write_nolock() it may have instantiated a few blocks outside i_size. And fsck will complain about wrong i_size (ext2, ext3 and reiserfs interpret i_size and biggest block difference as error), after fsck will fix error i_size will be increased to the biggest block, but this blocks contain gurbage from previous write attempt, this is not information leak, but its silence file data corruption. This issue affect fs regardless the values of blocksize or pagesize. We need truncate any block beyond i_size after write have failed , do in simular generic_file_buffered_write() error path. TEST_CASE: open("/mnt/test/BIG_FILE", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3 write(3, "aaa"..., 104857600) = -1 ENOSPC (No space left on device) #stat /mnt/test/BIG_FILE File: `/mnt/test/BIG_FILE' Size: 0 Blocks: 110896 IO Block: 1024 regular empty file <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<file size is less than biggest block idx Device: fe07h/65031dInode: 14 Links: 1 Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/root) Access: 2007-01-24 20:03:38.0 +0300 Modify: 2007-01-24 20:03:38.0 +0300 Change: 2007-01-24 20:03:39.0 +0300 #fsck.ext3 -f /dev/VG/test e2fsck 1.39 (29-May-2006) Pass 1: Checking inodes, blocks, and sizes Inode 14, i_size is 0, should be 56556544. Fix? yes Pass 2: Checking directory structure > >> >> Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> >> --- >> mm/filemap.c | 32 >> 1 files changed, 28 insertions(+), 4 deletions(-) >> >> diff --git a/mm/filemap.c b/mm/filemap.c >> index 0aadf5f..8959ae3 100644 >> --- a/mm/filemap.c >> +++ b/mm/filemap.c >> @@ -1925,8 +1925,9 @@ generic_file_direct_write(struct kiocb *iocb, const >> struct iovec *iov, >> /* >> * Sync the fs metadata but not the minor inode changes and >> * of course not the data as we did direct DMA for the IO. >> - * i_mutex is held, which protects generic_osync_inode() from >> - * livelocking. AIO O_DIRECT ops attempt to sync metadata here. >> + * i_mutex may not being held, if so some specific locking >> + * ordering must protect generic_osync_inode() from livelocking. >> + * AIO O_DIRECT ops attempt to sync metadata here. >> */ > > This wasn't exactly clear to me. Did you mean: > > "may be held, which protects generic_osync_inode() from livelocking. If it > is not held, then the filesystem must prevent this livelock"? Yep.. my english is not realy good :( > >> if ((written >= 0 || written == -EIOCBQUEUED) && >> ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { >> @@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, >> const struct iovec *iov, >> mutex_lock(>i_mutex); >> ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, >> >ki_pos); >> +/* >> + * If __generic_file_aio_write_nolock has failed. >> + * This may happen because of: >> + * 1) Bad segment found (failed before actual write attempt) >> + * 2) Segments are good, but actual write operation failed >> + *and may have instantiated a few blocks outside i_size. >> + * a) in case of buffered write these blocks was already >> + * trimmed by generic_file_buffered_write() >> + * b) in case of O_DIRECT t
[PATCH 1/2] mm: move common segment checks to separate helper function (v6)
I realy don't want to be annoying by sending this patcheset over and over again. If anyone think this patch is realy cappy, please comment what exectly is bad. Thank you. Changes: - patch was split in two patches. - comments added. I think now it is clearly describe things. - make generic_segment_checks() inline - patch prepared against 2.6.20-mm3 How is tested: - LTP test, and other readv/writev op tests. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- mm/filemap.c | 76 + 1 files changed, 39 insertions(+), 37 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 8e1849a..0aadf5f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1159,6 +1159,39 @@ success: return size; } +/* + * Performs necessary checks before doing a write + * + * Adjust number of segments and amount of bytes to write. + * Returns appropriate error code that caller should return or + * zero in case that write should be allowed. + */ +inline int generic_segment_checks(const struct iovec *iov, + unsigned long *nr_segs, size_t *count, + unsigned long access_flags) +{ + unsigned long seg; + for (seg = 0; seg < *nr_segs; seg++) { + const struct iovec *iv = [seg]; + + /* +* If any segment has a negative length, or the cumulative +* length ever wraps negative then return -EINVAL. +*/ + *count += iv->iov_len; + if (unlikely((ssize_t)(*count|iv->iov_len) < 0)) + return -EINVAL; + if (access_ok(access_flags, iv->iov_base, iv->iov_len)) + continue; + if (seg == 0) + return -EFAULT; + *nr_segs = seg; + *count -= iv->iov_len; /* This segment is no good */ + break; + } + return 0; +} + /** * generic_file_aio_read - generic filesystem read routine * @iocb: kernel I/O control block @@ -1180,24 +1213,9 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov, loff_t *ppos = >ki_pos; count = 0; - for (seg = 0; seg < nr_segs; seg++) { - const struct iovec *iv = [seg]; - - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - count += iv->iov_len; - if (unlikely((ssize_t)(count|iv->iov_len) < 0)) - return -EINVAL; - if (access_ok(VERIFY_WRITE, iv->iov_base, iv->iov_len)) - continue; - if (seg == 0) - return -EFAULT; - nr_segs = seg; - count -= iv->iov_len; /* This segment is no good */ - break; - } + retval = generic_segment_checks(iov, _segs, , VERIFY_WRITE); + if (retval) + return retval; /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */ if (filp->f_flags & O_DIRECT) { @@ -2094,30 +2112,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov, size_t ocount; /* original count */ size_t count; /* after file limit checks */ struct inode*inode = mapping->host; - unsigned long seg; loff_t pos; ssize_t written; ssize_t err; ocount = 0; - for (seg = 0; seg < nr_segs; seg++) { - const struct iovec *iv = [seg]; - - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - ocount += iv->iov_len; - if (unlikely((ssize_t)(ocount|iv->iov_len) < 0)) - return -EINVAL; - if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len)) - continue; - if (seg == 0) - return -EFAULT; - nr_segs = seg; - ocount -= iv->iov_len; /* This segment is no good */ - break; - } + err = generic_segment_checks(iov, _segs, , VERIFY_READ); + if (err) + return err; count = ocount; pos = *ppos; -- 1.5.0.1 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] mm: incorrect direct io error handling (v6)
I realy don't want to be annoying by sending this patcheset over and over again, i just want the issue to be solved. If anyone think this solution is realy cappy, please comment what exectly is bad. Thank you. Changes: - patch was split in two patches. - comments added. I think now it is clearly describe things. - patch prepared against 2.6.20-mm3 How this patch tested: - fsstress test. - manual direct_io tests. LOG: - Trim off blocks after generic_file_direct_write() has failed. - Update out of date comments about direct_io locking rules. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- mm/filemap.c | 32 1 files changed, 28 insertions(+), 4 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 0aadf5f..8959ae3 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1925,8 +1925,9 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov, /* * Sync the fs metadata but not the minor inode changes and * of course not the data as we did direct DMA for the IO. -* i_mutex is held, which protects generic_osync_inode() from -* livelocking. AIO O_DIRECT ops attempt to sync metadata here. +* i_mutex may not being held, if so some specific locking +* ordering must protect generic_osync_inode() from livelocking. +* AIO O_DIRECT ops attempt to sync metadata here. */ if ((written >= 0 || written == -EIOCBQUEUED) && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { @@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, mutex_lock(>i_mutex); ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, >ki_pos); + /* +* If __generic_file_aio_write_nolock has failed. +* This may happen because of: +* 1) Bad segment found (failed before actual write attempt) +* 2) Segments are good, but actual write operation failed +*and may have instantiated a few blocks outside i_size. +* a) in case of buffered write these blocks was already +* trimmed by generic_file_buffered_write() +* b) in case of O_DIRECT these blocks weren't trimmed yet. +* +* In case of (2b) these blocks have to be trimmed off again. +*/ + if (unlikely( ret < 0 && file->f_flags & O_DIRECT)) { + unsigned long nr_segs_avail = nr_segs; + size_t count = 0; + if (!generic_segment_checks(iov, _segs_avail, , + VERIFY_READ)) { + /*It is (2b) case, because segments are good*/ + loff_t isize = i_size_read(inode); + if (pos + count > isize) + vmtruncate(inode, isize); + } + } mutex_unlock(>i_mutex); if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { @@ -2254,8 +2278,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, EXPORT_SYMBOL(generic_file_aio_write); /* - * Called under i_mutex for writes to S_ISREG files. Returns -EIO if something - * went wrong during pagecache shootdown. + * May be called without i_mutex for writes to S_ISREG files. + * Returns -EIO if something went wrong during pagecache shootdown. */ static ssize_t generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, -- 1.5.0.1 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ext3: dirindex error pointer issues (b)
Dmitriy Monakhov <[EMAIL PROTECTED]> writes: > - ext3_dx_find_entry() exit with out setting proper error pointer > - do_split() exit with out setting proper error pointer >it is realy painful because many callers contain folowing code: >de = do_split(handle,dir, , frame, , ); >if (!(de)) > return retval; ><<< WOW retval wasn't changed by do_split(), so caller failed ><<< but return SUCCESS :) > - Rearrange do_split() error path. Current error path is realy ugly, all >this up and down jump stuff doesn't make code easy to understand. Ohh my first patch change error message semantics in do_split(). Initially when ext3_append() failed we just exit without printing error. In fact ext3_append() may fail, it is legal and it's happens qite often (ENOSPC for example). This cause annoying fake error message. So restore this semantic as it was before patch. Andrew please apply incremental patch what fix it: Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c index 1a52586..7edb617 100644 --- a/fs/ext3/namei.c +++ b/fs/ext3/namei.c @@ -1197,8 +1197,8 @@ journal_error: brelse(*bh); brelse(bh2); *bh = NULL; -errout: ext3_std_error(dir->i_sb, err); +errout: *error = err; return NULL; } diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index f0a6c26..02a75db 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1195,8 +1195,8 @@ journal_error: brelse(*bh); brelse(bh2); *bh = NULL; -errout: ext4_std_error(dir->i_sb, err); +errout: *error = err; return NULL; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ext3: dirindex error pointer issues (b)
Dmitriy Monakhov [EMAIL PROTECTED] writes: - ext3_dx_find_entry() exit with out setting proper error pointer - do_split() exit with out setting proper error pointer it is realy painful because many callers contain folowing code: de = do_split(handle,dir, bh, frame, hinfo, retval); if (!(de)) return retval; WOW retval wasn't changed by do_split(), so caller failed but return SUCCESS :) - Rearrange do_split() error path. Current error path is realy ugly, all this up and down jump stuff doesn't make code easy to understand. Ohh my first patch change error message semantics in do_split(). Initially when ext3_append() failed we just exit without printing error. In fact ext3_append() may fail, it is legal and it's happens qite often (ENOSPC for example). This cause annoying fake error message. So restore this semantic as it was before patch. Andrew please apply incremental patch what fix it: Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c index 1a52586..7edb617 100644 --- a/fs/ext3/namei.c +++ b/fs/ext3/namei.c @@ -1197,8 +1197,8 @@ journal_error: brelse(*bh); brelse(bh2); *bh = NULL; -errout: ext3_std_error(dir-i_sb, err); +errout: *error = err; return NULL; } diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index f0a6c26..02a75db 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1195,8 +1195,8 @@ journal_error: brelse(*bh); brelse(bh2); *bh = NULL; -errout: ext4_std_error(dir-i_sb, err); +errout: *error = err; return NULL; } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] mm: move common segment checks to separate helper function (v6)
I realy don't want to be annoying by sending this patcheset over and over again. If anyone think this patch is realy cappy, please comment what exectly is bad. Thank you. Changes: - patch was split in two patches. - comments added. I think now it is clearly describe things. - make generic_segment_checks() inline - patch prepared against 2.6.20-mm3 How is tested: - LTP test, and other readv/writev op tests. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- mm/filemap.c | 76 + 1 files changed, 39 insertions(+), 37 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 8e1849a..0aadf5f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1159,6 +1159,39 @@ success: return size; } +/* + * Performs necessary checks before doing a write + * + * Adjust number of segments and amount of bytes to write. + * Returns appropriate error code that caller should return or + * zero in case that write should be allowed. + */ +inline int generic_segment_checks(const struct iovec *iov, + unsigned long *nr_segs, size_t *count, + unsigned long access_flags) +{ + unsigned long seg; + for (seg = 0; seg *nr_segs; seg++) { + const struct iovec *iv = iov[seg]; + + /* +* If any segment has a negative length, or the cumulative +* length ever wraps negative then return -EINVAL. +*/ + *count += iv-iov_len; + if (unlikely((ssize_t)(*count|iv-iov_len) 0)) + return -EINVAL; + if (access_ok(access_flags, iv-iov_base, iv-iov_len)) + continue; + if (seg == 0) + return -EFAULT; + *nr_segs = seg; + *count -= iv-iov_len; /* This segment is no good */ + break; + } + return 0; +} + /** * generic_file_aio_read - generic filesystem read routine * @iocb: kernel I/O control block @@ -1180,24 +1213,9 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov, loff_t *ppos = iocb-ki_pos; count = 0; - for (seg = 0; seg nr_segs; seg++) { - const struct iovec *iv = iov[seg]; - - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - count += iv-iov_len; - if (unlikely((ssize_t)(count|iv-iov_len) 0)) - return -EINVAL; - if (access_ok(VERIFY_WRITE, iv-iov_base, iv-iov_len)) - continue; - if (seg == 0) - return -EFAULT; - nr_segs = seg; - count -= iv-iov_len; /* This segment is no good */ - break; - } + retval = generic_segment_checks(iov, nr_segs, count, VERIFY_WRITE); + if (retval) + return retval; /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */ if (filp-f_flags O_DIRECT) { @@ -2094,30 +2112,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov, size_t ocount; /* original count */ size_t count; /* after file limit checks */ struct inode*inode = mapping-host; - unsigned long seg; loff_t pos; ssize_t written; ssize_t err; ocount = 0; - for (seg = 0; seg nr_segs; seg++) { - const struct iovec *iv = iov[seg]; - - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - ocount += iv-iov_len; - if (unlikely((ssize_t)(ocount|iv-iov_len) 0)) - return -EINVAL; - if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len)) - continue; - if (seg == 0) - return -EFAULT; - nr_segs = seg; - ocount -= iv-iov_len; /* This segment is no good */ - break; - } + err = generic_segment_checks(iov, nr_segs, ocount, VERIFY_READ); + if (err) + return err; count = ocount; pos = *ppos; -- 1.5.0.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] mm: incorrect direct io error handling (v6)
I realy don't want to be annoying by sending this patcheset over and over again, i just want the issue to be solved. If anyone think this solution is realy cappy, please comment what exectly is bad. Thank you. Changes: - patch was split in two patches. - comments added. I think now it is clearly describe things. - patch prepared against 2.6.20-mm3 How this patch tested: - fsstress test. - manual direct_io tests. LOG: - Trim off blocks after generic_file_direct_write() has failed. - Update out of date comments about direct_io locking rules. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- mm/filemap.c | 32 1 files changed, 28 insertions(+), 4 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 0aadf5f..8959ae3 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1925,8 +1925,9 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov, /* * Sync the fs metadata but not the minor inode changes and * of course not the data as we did direct DMA for the IO. -* i_mutex is held, which protects generic_osync_inode() from -* livelocking. AIO O_DIRECT ops attempt to sync metadata here. +* i_mutex may not being held, if so some specific locking +* ordering must protect generic_osync_inode() from livelocking. +* AIO O_DIRECT ops attempt to sync metadata here. */ if ((written = 0 || written == -EIOCBQUEUED) ((file-f_flags O_SYNC) || IS_SYNC(inode))) { @@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, mutex_lock(inode-i_mutex); ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, iocb-ki_pos); + /* +* If __generic_file_aio_write_nolock has failed. +* This may happen because of: +* 1) Bad segment found (failed before actual write attempt) +* 2) Segments are good, but actual write operation failed +*and may have instantiated a few blocks outside i_size. +* a) in case of buffered write these blocks was already +* trimmed by generic_file_buffered_write() +* b) in case of O_DIRECT these blocks weren't trimmed yet. +* +* In case of (2b) these blocks have to be trimmed off again. +*/ + if (unlikely( ret 0 file-f_flags O_DIRECT)) { + unsigned long nr_segs_avail = nr_segs; + size_t count = 0; + if (!generic_segment_checks(iov, nr_segs_avail, count, + VERIFY_READ)) { + /*It is (2b) case, because segments are good*/ + loff_t isize = i_size_read(inode); + if (pos + count isize) + vmtruncate(inode, isize); + } + } mutex_unlock(inode-i_mutex); if (ret 0 ((file-f_flags O_SYNC) || IS_SYNC(inode))) { @@ -2254,8 +2278,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, EXPORT_SYMBOL(generic_file_aio_write); /* - * Called under i_mutex for writes to S_ISREG files. Returns -EIO if something - * went wrong during pagecache shootdown. + * May be called without i_mutex for writes to S_ISREG files. + * Returns -EIO if something went wrong during pagecache shootdown. */ static ssize_t generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, -- 1.5.0.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] mm: incorrect direct io error handling (v6)
Nick Piggin [EMAIL PROTECTED] writes: On Mon, Mar 12, 2007 at 10:58:10AM +0300, Dmitriy Monakhov wrote: I realy don't want to be annoying by sending this patcheset over and over again, i just want the issue to be solved. If anyone think this solution is realy cappy, please comment what exectly is bad. Thank you. If you don't get feedback, then you have to keep posting. If you still don't get feedback, try cc'ing a few more lists (eg. linux-fsdevel). Changes: - patch was split in two patches. - comments added. I think now it is clearly describe things. - patch prepared against 2.6.20-mm3 How this patch tested: - fsstress test. - manual direct_io tests. LOG: - Trim off blocks after generic_file_direct_write() has failed. - Update out of date comments about direct_io locking rules. It can be nice to expand on what the problem was, and how you fixed it... but I guess you do quite a good job in the C comments. If generic_file_direct_write() has fail (ENOSPC condition) inside __generic_file_aio_write_nolock() it may have instantiated a few blocks outside i_size. And fsck will complain about wrong i_size (ext2, ext3 and reiserfs interpret i_size and biggest block difference as error), after fsck will fix error i_size will be increased to the biggest block, but this blocks contain gurbage from previous write attempt, this is not information leak, but its silence file data corruption. This issue affect fs regardless the values of blocksize or pagesize. We need truncate any block beyond i_size after write have failed , do in simular generic_file_buffered_write() error path. TEST_CASE: open(/mnt/test/BIG_FILE, O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3 write(3, aaa..., 104857600) = -1 ENOSPC (No space left on device) #stat /mnt/test/BIG_FILE File: `/mnt/test/BIG_FILE' Size: 0 Blocks: 110896 IO Block: 1024 regular empty file file size is less than biggest block idx Device: fe07h/65031dInode: 14 Links: 1 Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/root) Access: 2007-01-24 20:03:38.0 +0300 Modify: 2007-01-24 20:03:38.0 +0300 Change: 2007-01-24 20:03:39.0 +0300 #fsck.ext3 -f /dev/VG/test e2fsck 1.39 (29-May-2006) Pass 1: Checking inodes, blocks, and sizes Inode 14, i_size is 0, should be 56556544. Fixy? yes Pass 2: Checking directory structure Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- mm/filemap.c | 32 1 files changed, 28 insertions(+), 4 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 0aadf5f..8959ae3 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1925,8 +1925,9 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov, /* * Sync the fs metadata but not the minor inode changes and * of course not the data as we did direct DMA for the IO. - * i_mutex is held, which protects generic_osync_inode() from - * livelocking. AIO O_DIRECT ops attempt to sync metadata here. + * i_mutex may not being held, if so some specific locking + * ordering must protect generic_osync_inode() from livelocking. + * AIO O_DIRECT ops attempt to sync metadata here. */ This wasn't exactly clear to me. Did you mean: may be held, which protects generic_osync_inode() from livelocking. If it is not held, then the filesystem must prevent this livelock? Yep.. my english is not realy good :( if ((written = 0 || written == -EIOCBQUEUED) ((file-f_flags O_SYNC) || IS_SYNC(inode))) { @@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, mutex_lock(inode-i_mutex); ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, iocb-ki_pos); +/* + * If __generic_file_aio_write_nolock has failed. + * This may happen because of: + * 1) Bad segment found (failed before actual write attempt) + * 2) Segments are good, but actual write operation failed + *and may have instantiated a few blocks outside i_size. + * a) in case of buffered write these blocks was already + * trimmed by generic_file_buffered_write() + * b) in case of O_DIRECT these blocks weren't trimmed yet. + * + * In case of (2b) these blocks have to be trimmed off again. + */ +if (unlikely( ret 0 file-f_flags O_DIRECT)) { +unsigned long nr_segs_avail = nr_segs; +size_t count = 0; +if (!generic_segment_checks(iov, nr_segs_avail, count, +VERIFY_READ)) { +/*It is (2b) case, because segments are good*/ +loff_t isize = i_size_read(inode); +if (pos + count isize) +vmtruncate(inode, isize); +} +} OK, but wouldn't this be better to be done in the actual
Re: [PATCH 2/2] mm: incorrect direct io error handling (v6)
Nick Piggin [EMAIL PROTECTED] writes: On Mon, Mar 12, 2007 at 11:55:30AM +0300, Dmitriy Monakhov wrote: Nick Piggin [EMAIL PROTECTED] writes: On Mon, Mar 12, 2007 at 10:58:10AM +0300, Dmitriy Monakhov wrote: @@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, mutex_lock(inode-i_mutex); ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, iocb-ki_pos); + /* + * If __generic_file_aio_write_nolock has failed. + * This may happen because of: + * 1) Bad segment found (failed before actual write attempt) + * 2) Segments are good, but actual write operation failed + *and may have instantiated a few blocks outside i_size. + * a) in case of buffered write these blocks was already + * trimmed by generic_file_buffered_write() + * b) in case of O_DIRECT these blocks weren't trimmed yet. + * + * In case of (2b) these blocks have to be trimmed off again. + */ + if (unlikely( ret 0 file-f_flags O_DIRECT)) { + unsigned long nr_segs_avail = nr_segs; + size_t count = 0; + if (!generic_segment_checks(iov, nr_segs_avail, count, + VERIFY_READ)) { + /*It is (2b) case, because segments are good*/ + loff_t isize = i_size_read(inode); + if (pos + count isize) + vmtruncate(inode, isize); + } + } OK, but wouldn't this be better to be done in the actual direct IO functions themselves? Thus you could be sure that you have the 2b case, and the code would be less fragile to something changing? Ohh, We can't just call vmtruncate() after generic_file_direct_write() failure while __generic_file_aio_write_nolock() becase where is no guarantee what i_mutex held. In fact all existing fs always invoke __generic_file_aio_write_nolock() with i_mutex held in case of S_ISREG files, but this was't explicitly demanded and documented. I've proposed to do it in previous versions of this patch, because it this just document current state of affairs, but David Chinner wasn't agree with it. It seemed like it was documented in the comments that you altered in this patch... How would such a filesystem that did not hold i_mutex propose to fix the problem? The burden should be on those filesystems that might not want to hold i_mutex here, to solve the problem nicely, rather than generic code to take this ugly code. Ok then what do you think about this version http://lkml.org/lkml/2006/12/18/103 witch was posted almost month ago :) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] mm: move common segment checks to separate helper function (v6)
Nick Piggin [EMAIL PROTECTED] writes: On Mon, Mar 12, 2007 at 10:57:53AM +0300, Dmitriy Monakhov wrote: I realy don't want to be annoying by sending this patcheset over and over again. If anyone think this patch is realy cappy, please comment what exectly is bad. Thank you. Doesn't seem like a bad idea. Changes: - patch was split in two patches. +/* + * Performs necessary checks before doing a write + * + * Adjust number of segments and amount of bytes to write. + * Returns appropriate error code that caller should return or + * zero in case that write should be allowed. + */ +inline int generic_segment_checks(const struct iovec *iov, +unsigned long *nr_segs, size_t *count, +unsigned long access_flags) Make it static and not inline, and the compiler will work it out. Wow i've just carefully checked and found more functions with duplicating code: fs/xfs/linux-2.6/xfs_lrw.c:655 xfs_write() fs/ntfs/file.c:2339 ntfs_file_aio_write_nolock() So i think nobody will object against exporting generic_segment_checks() and removing doplicating code. This function name doesn't really imply that it returns you the nr_segs and count, but that's not a big deal I guess. You also don't say that nr_segs should be initialised to the amount you which to write, while count must be initialised to zero. +{ +unsigned long seg; +for (seg = 0; seg *nr_segs; seg++) { +const struct iovec *iv = iov[seg]; + +/* + * If any segment has a negative length, or the cumulative + * length ever wraps negative then return -EINVAL. + */ +*count += iv-iov_len; +if (unlikely((ssize_t)(*count|iv-iov_len) 0)) +return -EINVAL; +if (access_ok(access_flags, iv-iov_base, iv-iov_len)) +continue; Why now insert the above test, and put the below statements inside the branch? OTOH, that makes it less obviously cp from the others. Maybe a subsequent patch. +if (seg == 0) +return -EFAULT; +*nr_segs = seg; +*count -= iv-iov_len; /* This segment is no good */ +break; +} You could assign to *count here, once, and remove the requirement that the caller initialised it to zero? +return 0; +} + /** * generic_file_aio_read - generic filesystem read routine * @iocb: kernel I/O control block @@ -1180,24 +1213,9 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov, loff_t *ppos = iocb-ki_pos; count = 0; -for (seg = 0; seg nr_segs; seg++) { -const struct iovec *iv = iov[seg]; - -/* - * If any segment has a negative length, or the cumulative - * length ever wraps negative then return -EINVAL. - */ -count += iv-iov_len; -if (unlikely((ssize_t)(count|iv-iov_len) 0)) -return -EINVAL; -if (access_ok(VERIFY_WRITE, iv-iov_base, iv-iov_len)) -continue; -if (seg == 0) -return -EFAULT; -nr_segs = seg; -count -= iv-iov_len; /* This segment is no good */ -break; -} +retval = generic_segment_checks(iov, nr_segs, count, VERIFY_WRITE); +if (retval) +return retval; /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */ if (filp-f_flags O_DIRECT) { @@ -2094,30 +2112,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov, size_t ocount; /* original count */ size_t count; /* after file limit checks */ struct inode*inode = mapping-host; -unsigned long seg; loff_t pos; ssize_t written; ssize_t err; ocount = 0; -for (seg = 0; seg nr_segs; seg++) { -const struct iovec *iv = iov[seg]; - -/* - * If any segment has a negative length, or the cumulative - * length ever wraps negative then return -EINVAL. - */ -ocount += iv-iov_len; -if (unlikely((ssize_t)(ocount|iv-iov_len) 0)) -return -EINVAL; -if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len)) -continue; -if (seg == 0) -return -EFAULT; -nr_segs = seg; -ocount -= iv-iov_len; /* This segment is no good */ -break; -} +err = generic_segment_checks(iov, nr_segs, ocount, VERIFY_READ); +if (err) +return err; count = ocount; pos = *ppos; -- 1.5.0.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http
[PATCH 2/2] incorrect direct io error handling (v7)
Changes against v6: - Handle direct_io failure inside generic_file_direct_write() as it was recommend by Andrew (during discussion v1), and by Nick (during discussion v6). - change comments, make it more clear. - one more time check what __generic_file_aio_write_nolock() always called under i_mutex for non blkdev files. Tested with: fsstress, manual direct_io tests Log: If generic_file_direct_write() has fail (ENOSPC condition) inside __generic_file_aio_write_nolock() it may have instantiated a few blocks outside i_size. And fsck will complain about wrong i_size (ext2, ext3 and reiserfs interpret i_size and biggest block difference as error), after fsck will fix error i_size will be increased to the biggest block, but this blocks contain gurbage from previous write attempt, this is not information leak, but its silence file data corruption. This issue affect fs regardless the values of blocksize or pagesize, and off corse only for non blkdev files. We need truncate any block beyond i_size after write have failed , do in simular generic_file_buffered_write() error path. We may safely call vmtruncate() here because i_mutex always held for non blkdev files. TEST_CASE: open(/mnt/test/BIG_FILE, O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3 write(3, aaa..., 104857600) = -1 ENOSPC (No space left on device) #stat /mnt/test/BIG_FILE File: `/mnt/test/BIG_FILE' Size: 0 Blocks: 110896 IO Block: 1024 regular empty file file size is less than biggest block idx Device: fe07h/65031dInode: 14 Links: 1 Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/root) Access: 2007-01-24 20:03:38.0 +0300 Modify: 2007-01-24 20:03:38.0 +0300 Change: 2007-01-24 20:03:39.0 +0300 #fsck.ext3 -f /dev/VG/test e2fsck 1.39 (29-May-2006) Pass 1: Checking inodes, blocks, and sizes Inode 14, i_size is 0, should be 56556544. Fixy? yes Pass 2: Checking directory structure Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- mm/filemap.c | 28 1 files changed, 24 insertions(+), 4 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 8bd1ea4..95d49fe 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1932,8 +1932,10 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov, /* * Sync the fs metadata but not the minor inode changes and * of course not the data as we did direct DMA for the IO. -* i_mutex is held, which protects generic_osync_inode() from -* livelocking. AIO O_DIRECT ops attempt to sync metadata here. +* i_mutex is held in case of DIO_LOCKING, which protects +* generic_osync_inode() from livelocking. If it is not held, then +* the filesystem must prevent this livelock. AIO O_DIRECT ops +* attempt to sync metadata here. */ if ((written = 0 || written == -EIOCBQUEUED) ((file-f_flags O_SYNC) || IS_SYNC(inode))) { @@ -2155,8 +2157,26 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov, loff_t endbyte; ssize_t written_buffered; + /* +* In case of non blockdev we may fail to buffered I/O. +* So i_mutex must be held. +*/ + if (!S_ISBLK(inode-i_mode)) + BUG_ON(!mutex_is_locked(inode-i_mutex)); + written = generic_file_direct_write(iocb, iov, nr_segs, pos, ppos, count, ocount); + /* +* If host is not S_ISBLK generic_file_direct_write() may +* have instantiated a few blocks outside i_size files +* Trim these off again. +*/ + if (unlikely(written 0) !S_ISBLK(inode-i_mode)) { + loff_t isize = i_size_read(inode); + if (pos + count isize) + vmtruncate(inode, isize); + } + if (written 0 || written == count) goto out; /* @@ -2261,8 +2281,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov, EXPORT_SYMBOL(generic_file_aio_write); /* - * Called under i_mutex for writes to S_ISREG files. Returns -EIO if something - * went wrong during pagecache shootdown. + * Called under i_mutex for writes to S_ISREG files in case of DIO_LOCKING. + * Returns -EIO if something went wrong during pagecache shootdown. */ static ssize_t generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, -- 1.5.0.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] mm: move common segment checks to separate helper function (v7)
Changes against v6 - remove duplicated code from xfs,ntfs - export generic_segment_checks, because it used by xfs,nfs now. - change arguments initialization pocily according to Nick's comments. Tested with: ltp readv/writev tests Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- fs/ntfs/file.c | 21 ++- fs/xfs/linux-2.6/xfs_lrw.c | 22 ++-- include/linux/fs.h |3 ++ mm/filemap.c | 83 --- 4 files changed, 55 insertions(+), 74 deletions(-) diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c index dbbac55..621de36 100644 --- a/fs/ntfs/file.c +++ b/fs/ntfs/file.c @@ -2129,28 +2129,13 @@ static ssize_t ntfs_file_aio_write_nolock(struct kiocb *iocb, struct address_space *mapping = file-f_mapping; struct inode *inode = mapping-host; loff_t pos; - unsigned long seg; size_t count; /* after file limit checks */ ssize_t written, err; count = 0; - for (seg = 0; seg nr_segs; seg++) { - const struct iovec *iv = iov[seg]; - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - count += iv-iov_len; - if (unlikely((ssize_t)(count|iv-iov_len) 0)) - return -EINVAL; - if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len)) - continue; - if (!seg) - return -EFAULT; - nr_segs = seg; - count -= iv-iov_len; /* This segment is no good */ - break; - } + err = generic_segment_checks(iov, nr_segs, count, VERIFY_READ); + if (err) + return err; pos = *ppos; vfs_check_frozen(inode-i_sb, SB_FREEZE_WRITE); /* We can write back this queue in page reclaim. */ diff --git a/fs/xfs/linux-2.6/xfs_lrw.c b/fs/xfs/linux-2.6/xfs_lrw.c index ff8d64e..558076d 100644 --- a/fs/xfs/linux-2.6/xfs_lrw.c +++ b/fs/xfs/linux-2.6/xfs_lrw.c @@ -639,7 +639,6 @@ xfs_write( xfs_fsize_t isize, new_size; xfs_iocore_t*io; bhv_vnode_t *vp; - unsigned long seg; int iolock; int eventsent = 0; bhv_vrwlock_t locktype; @@ -652,24 +651,9 @@ xfs_write( vp = BHV_TO_VNODE(bdp); xip = XFS_BHVTOI(bdp); - for (seg = 0; seg segs; seg++) { - const struct iovec *iv = iovp[seg]; - - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - ocount += iv-iov_len; - if (unlikely((ssize_t)(ocount|iv-iov_len) 0)) - return -EINVAL; - if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len)) - continue; - if (seg == 0) - return -EFAULT; - segs = seg; - ocount -= iv-iov_len; /* This segment is no good */ - break; - } + error = generic_segment_checks(iovp, segs, ocount, VERIFY_READ); + if (error) + return error; count = ocount; pos = *offset; diff --git a/include/linux/fs.h b/include/linux/fs.h index 6a3d22e..3b99450 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1778,6 +1778,9 @@ extern ssize_t generic_file_sendfile(struct file *, loff_t *, size_t, read_actor extern void do_generic_mapping_read(struct address_space *mapping, struct file_ra_state *, struct file *, loff_t *, read_descriptor_t *, read_actor_t); +extern int generic_segment_checks(const struct iovec *iov, + unsigned long *nr_segs, size_t *count, + unsigned long access_flags); /* fs/splice.c */ extern ssize_t generic_file_splice_read(struct file *, loff_t *, diff --git a/mm/filemap.c b/mm/filemap.c index 8e1849a..8bd1ea4 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1159,6 +1159,46 @@ success: return size; } +/* + * Performs necessary checks before doing a write + * @iov: io vector request + * @nr_segs: number of segments in the iovec + * @count: number of bytes to write + * @access_flags: type of access: %VERIFY_READ or %VERIFY_WRITE + * + * Adjust number of segments and amount of bytes to write (nr_segs should be + * properly initialized first). Returns appropriate error code that caller + * should return or zero in case that write should be allowed. + */ +int generic_segment_checks(const struct iovec *iov, + unsigned long *nr_segs, size_t *count, + unsigned long access_flags) +{ + unsigned
[PATCH] driver core: handles kobject_uevent failure while device_add
Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- drivers/base/core.c | 10 -- 1 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 142c222..da73012 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -630,8 +630,11 @@ int device_add(struct device *dev) goto PMError; if ((error = bus_add_device(dev))) goto BusError; - if (!dev->uevent_suppress) - kobject_uevent(>kobj, KOBJ_ADD); + if (!dev->uevent_suppress) { + error = kobject_uevent(>kobj, KOBJ_ADD); + if (error) + goto UeventError; + } bus_attach_device(dev); if (parent) klist_add_tail(>knode_parent, >klist_children); @@ -651,6 +654,9 @@ int device_add(struct device *dev) kfree(class_name); put_device(dev); return error; + + UeventError: + bus_remove_device(dev); BusError: device_pm_remove(dev); PMError: -- 1.5.0.1 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] driver core: handle sysfs_op failure while device_add
depends on: "[PATCH] driver core: fix device_add error path" - rearrange error path sequence, in order to make it more correct. In fact if initial sequance was doA(); doB(); doC(); we should undo it with folowing sequance undoC(); undoB(); undoA(); - handle sysfs_ops failure This patch kills "ignoring return value of 'sysfs_create_link'" warning message. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- drivers/base/core.c | 47 +++ 1 files changed, 31 insertions(+), 16 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 245d703..bcfcfad 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -602,22 +602,34 @@ int device_add(struct device *dev) } if (dev->class) { - sysfs_create_link(>kobj, >class->subsys.kset.kobj, - "subsystem"); + error = sysfs_create_link(>kobj, + >class->subsys.kset.kobj, "subsystem"); + if (error) + goto ueventattrError; /* If this is not a "fake" compatible device, then create the * symlink from the class to the device. */ - if (dev->kobj.parent != >class->subsys.kset.kobj) - sysfs_create_link(>class->subsys.kset.kobj, - >kobj, dev->bus_id); + if (dev->kobj.parent != >class->subsys.kset.kobj) { + error = sysfs_create_link( + >class->subsys.kset.kobj, + >kobj, dev->bus_id); + if (error) + goto busLinkErr; + } if (parent) { - sysfs_create_link(>kobj, >parent->kobj, - "device"); + error = sysfs_create_link(>kobj, + >parent->kobj, "device"); + if (error) + goto deviceLinkErr; #ifdef CONFIG_SYSFS_DEPRECATED class_name = make_class_name(dev->class->name, >kobj); - if (class_name) - sysfs_create_link(>parent->kobj, - >kobj, class_name); + if (class_name) { + error = sysfs_create_link( + >parent->kobj, + >kobj, class_name); + if (error) + goto classLinkErr; + } #endif } } @@ -673,12 +685,6 @@ int device_add(struct device *dev) } if (dev->class) { - sysfs_remove_link(>kobj, "subsystem"); - /* If this is not a "fake" compatible device, remove the -* symlink from the class to the device. */ - if (dev->kobj.parent != >class->subsys.kset.kobj) - sysfs_remove_link(>class->subsys.kset.kobj, - dev->bus_id); if (parent) { #ifdef CONFIG_SYSFS_DEPRECATED char *class_name = make_class_name(dev->class->name, @@ -688,8 +694,17 @@ int device_add(struct device *dev) class_name); kfree(class_name); #endif +classLinkErr: sysfs_remove_link(>kobj, "device"); } +deviceLinkErr: + /* If this is not a "fake" compatible device, remove the +* symlink from the class to the device. */ + if (dev->kobj.parent != >class->subsys.kset.kobj) + sysfs_remove_link(>class->subsys.kset.kobj, + dev->bus_id); +busLinkErr: + sysfs_remove_link(>kobj, "subsystem"); } ueventattrError: device_remove_file(dev, >uevent_attr); -- 1.5.0.1 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] driver core: fix device_add error path
Dmitriy Monakhov <[EMAIL PROTECTED]> writes: > Greg Kroah-Hartman <[EMAIL PROTECTED]> writes: > >> From: James Simmons <[EMAIL PROTECTED]> >> >> When a device fails to register the class symlinks where not cleaned up. >> This left a symlink in the /sys/class/"device"/ directory that pointed >> to no where. This caused the sysfs_follow_link Oops I reported earlier. >> This patch cleanups up the symlink. Please apply. Thank you. >> >> Signed-Off: James Simmons <[EMAIL PROTECTED]> >> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]> >> --- >> drivers/base/core.c | 31 ++- >> 1 files changed, 30 insertions(+), 1 deletions(-) >> >> diff --git a/drivers/base/core.c b/drivers/base/core.c >> index d04fd33..cf2a398 100644 >> --- a/drivers/base/core.c >> +++ b/drivers/base/core.c >> @@ -637,12 +637,41 @@ int device_add(struct device *dev) >> BUS_NOTIFY_DEL_DEVICE, dev); >> device_remove_groups(dev); >> GroupError: >> -device_remove_attrs(dev); >> +device_remove_attrs(dev); >> AttrsError: >> if (dev->devt_attr) { >> device_remove_file(dev, dev->devt_attr); >> kfree(dev->devt_attr); >> } >> + >> +if (dev->class) { >> +sysfs_remove_link(>kobj, "subsystem"); >> +/* If this is not a "fake" compatible device, remove the >> + * symlink from the class to the device. */ >> +if (dev->kobj.parent != >class->subsys.kset.kobj) >> +sysfs_remove_link(>class->subsys.kset.kobj, >> + dev->bus_id); >> +#ifdef CONFIG_SYSFS_DEPRECATED >> +if (parent) { >> +char *class_name = make_class_name(dev->class->name, >> + >kobj); >> +if (class_name) >> +sysfs_remove_link(>parent->kobj, >> + class_name); >> +kfree(class_name); >> +sysfs_remove_link(>kobj, "device"); >> +} >> +#endif >> + > <<<<< block begin >> +down(>class->sem); >> +/* notify any interfaces that the device is now gone */ >> +list_for_each_entry(class_intf, >class->interfaces, node) >> +if (class_intf->remove_dev) >> +class_intf->remove_dev(dev, class_intf); >> +/* remove the device from the class list */ >> +list_del_init(>node); >> +up(>class->sem); > <<<<<< block end > May be i've missed something, but i'm confuesd a litle bit. > For example if error happens while device_pm_add() we jump to label "PMError" > and code from block above will be executed (device will be remove from list), > but this device wasn't added to this list yet! I've check it one more time, code it really broken!, and i think i understand how this can happen it look like full code chunck was copy-pasted from device_del(), but in case of device_add() error path, device was't added to dev->class->devices list yet. Folowing patch fix this copy-paste error: [PATCH] driver core: fix device_add error path - At the moment we jump here device was't added to dev->class->devices list yet. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- drivers/base/core.c |9 - 1 files changed, 0 insertions(+), 9 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 142c222..7d2459b 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -684,15 +684,6 @@ int device_add(struct device *dev) #endif sysfs_remove_link(>kobj, "device"); } - - down(>class->sem); - /* notify any interfaces that the device is now gone */ - list_for_each_entry(class_intf, >class->interfaces, node) - if (class_intf->remove_dev) - class_intf->remove_dev(dev, class_intf); - /* remove the device from the class list */ - list_del_init(>node); - up(>class->sem); } ueventattrError: device_remove_file(dev, >uevent_attr); -- 1.5.0.1 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] driver core: fix device_add error path
Dmitriy Monakhov [EMAIL PROTECTED] writes: Greg Kroah-Hartman [EMAIL PROTECTED] writes: From: James Simmons [EMAIL PROTECTED] When a device fails to register the class symlinks where not cleaned up. This left a symlink in the /sys/class/device/ directory that pointed to no where. This caused the sysfs_follow_link Oops I reported earlier. This patch cleanups up the symlink. Please apply. Thank you. Signed-Off: James Simmons [EMAIL PROTECTED] Signed-off-by: Greg Kroah-Hartman [EMAIL PROTECTED] --- drivers/base/core.c | 31 ++- 1 files changed, 30 insertions(+), 1 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index d04fd33..cf2a398 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -637,12 +637,41 @@ int device_add(struct device *dev) BUS_NOTIFY_DEL_DEVICE, dev); device_remove_groups(dev); GroupError: -device_remove_attrs(dev); +device_remove_attrs(dev); AttrsError: if (dev-devt_attr) { device_remove_file(dev, dev-devt_attr); kfree(dev-devt_attr); } + +if (dev-class) { +sysfs_remove_link(dev-kobj, subsystem); +/* If this is not a fake compatible device, remove the + * symlink from the class to the device. */ +if (dev-kobj.parent != dev-class-subsys.kset.kobj) +sysfs_remove_link(dev-class-subsys.kset.kobj, + dev-bus_id); +#ifdef CONFIG_SYSFS_DEPRECATED +if (parent) { +char *class_name = make_class_name(dev-class-name, + dev-kobj); +if (class_name) +sysfs_remove_link(dev-parent-kobj, + class_name); +kfree(class_name); +sysfs_remove_link(dev-kobj, device); +} +#endif + block begin +down(dev-class-sem); +/* notify any interfaces that the device is now gone */ +list_for_each_entry(class_intf, dev-class-interfaces, node) +if (class_intf-remove_dev) +class_intf-remove_dev(dev, class_intf); +/* remove the device from the class list */ +list_del_init(dev-node); +up(dev-class-sem); block end May be i've missed something, but i'm confuesd a litle bit. For example if error happens while device_pm_add() we jump to label PMError and code from block above will be executed (device will be remove from list), but this device wasn't added to this list yet! I've check it one more time, code it really broken!, and i think i understand how this can happen it look like full code chunck was copy-pasted from device_del(), but in case of device_add() error path, device was't added to dev-class-devices list yet. Folowing patch fix this copy-paste error: [PATCH] driver core: fix device_add error path - At the moment we jump here device was't added to dev-class-devices list yet. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/base/core.c |9 - 1 files changed, 0 insertions(+), 9 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 142c222..7d2459b 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -684,15 +684,6 @@ int device_add(struct device *dev) #endif sysfs_remove_link(dev-kobj, device); } - - down(dev-class-sem); - /* notify any interfaces that the device is now gone */ - list_for_each_entry(class_intf, dev-class-interfaces, node) - if (class_intf-remove_dev) - class_intf-remove_dev(dev, class_intf); - /* remove the device from the class list */ - list_del_init(dev-node); - up(dev-class-sem); } ueventattrError: device_remove_file(dev, dev-uevent_attr); -- 1.5.0.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] driver core: handles kobject_uevent failure while device_add
Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/base/core.c | 10 -- 1 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 142c222..da73012 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -630,8 +630,11 @@ int device_add(struct device *dev) goto PMError; if ((error = bus_add_device(dev))) goto BusError; - if (!dev-uevent_suppress) - kobject_uevent(dev-kobj, KOBJ_ADD); + if (!dev-uevent_suppress) { + error = kobject_uevent(dev-kobj, KOBJ_ADD); + if (error) + goto UeventError; + } bus_attach_device(dev); if (parent) klist_add_tail(dev-knode_parent, parent-klist_children); @@ -651,6 +654,9 @@ int device_add(struct device *dev) kfree(class_name); put_device(dev); return error; + + UeventError: + bus_remove_device(dev); BusError: device_pm_remove(dev); PMError: -- 1.5.0.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] driver core: handle sysfs_op failure while device_add
depends on: [PATCH] driver core: fix device_add error path - rearrange error path sequence, in order to make it more correct. In fact if initial sequance was doA(); doB(); doC(); we should undo it with folowing sequance undoC(); undoB(); undoA(); - handle sysfs_ops failure This patch kills ignoring return value of 'sysfs_create_link' warning message. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/base/core.c | 47 +++ 1 files changed, 31 insertions(+), 16 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 245d703..bcfcfad 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -602,22 +602,34 @@ int device_add(struct device *dev) } if (dev-class) { - sysfs_create_link(dev-kobj, dev-class-subsys.kset.kobj, - subsystem); + error = sysfs_create_link(dev-kobj, + dev-class-subsys.kset.kobj, subsystem); + if (error) + goto ueventattrError; /* If this is not a fake compatible device, then create the * symlink from the class to the device. */ - if (dev-kobj.parent != dev-class-subsys.kset.kobj) - sysfs_create_link(dev-class-subsys.kset.kobj, - dev-kobj, dev-bus_id); + if (dev-kobj.parent != dev-class-subsys.kset.kobj) { + error = sysfs_create_link( + dev-class-subsys.kset.kobj, + dev-kobj, dev-bus_id); + if (error) + goto busLinkErr; + } if (parent) { - sysfs_create_link(dev-kobj, dev-parent-kobj, - device); + error = sysfs_create_link(dev-kobj, + dev-parent-kobj, device); + if (error) + goto deviceLinkErr; #ifdef CONFIG_SYSFS_DEPRECATED class_name = make_class_name(dev-class-name, dev-kobj); - if (class_name) - sysfs_create_link(dev-parent-kobj, - dev-kobj, class_name); + if (class_name) { + error = sysfs_create_link( + dev-parent-kobj, + dev-kobj, class_name); + if (error) + goto classLinkErr; + } #endif } } @@ -673,12 +685,6 @@ int device_add(struct device *dev) } if (dev-class) { - sysfs_remove_link(dev-kobj, subsystem); - /* If this is not a fake compatible device, remove the -* symlink from the class to the device. */ - if (dev-kobj.parent != dev-class-subsys.kset.kobj) - sysfs_remove_link(dev-class-subsys.kset.kobj, - dev-bus_id); if (parent) { #ifdef CONFIG_SYSFS_DEPRECATED char *class_name = make_class_name(dev-class-name, @@ -688,8 +694,17 @@ int device_add(struct device *dev) class_name); kfree(class_name); #endif +classLinkErr: sysfs_remove_link(dev-kobj, device); } +deviceLinkErr: + /* If this is not a fake compatible device, remove the +* symlink from the class to the device. */ + if (dev-kobj.parent != dev-class-subsys.kset.kobj) + sysfs_remove_link(dev-class-subsys.kset.kobj, + dev-bus_id); +busLinkErr: + sysfs_remove_link(dev-kobj, subsystem); } ueventattrError: device_remove_file(dev, dev-uevent_attr); -- 1.5.0.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 8/9] Driver core: fix error by cleanup up symlinks properly
Greg Kroah-Hartman <[EMAIL PROTECTED]> writes: > From: James Simmons <[EMAIL PROTECTED]> > > When a device fails to register the class symlinks where not cleaned up. > This left a symlink in the /sys/class/"device"/ directory that pointed > to no where. This caused the sysfs_follow_link Oops I reported earlier. > This patch cleanups up the symlink. Please apply. Thank you. > > Signed-Off: James Simmons <[EMAIL PROTECTED]> > Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]> > --- > drivers/base/core.c | 31 ++- > 1 files changed, 30 insertions(+), 1 deletions(-) > > diff --git a/drivers/base/core.c b/drivers/base/core.c > index d04fd33..cf2a398 100644 > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -637,12 +637,41 @@ int device_add(struct device *dev) >BUS_NOTIFY_DEL_DEVICE, dev); > device_remove_groups(dev); > GroupError: > - device_remove_attrs(dev); > + device_remove_attrs(dev); > AttrsError: > if (dev->devt_attr) { > device_remove_file(dev, dev->devt_attr); > kfree(dev->devt_attr); > } > + > + if (dev->class) { > + sysfs_remove_link(>kobj, "subsystem"); > + /* If this is not a "fake" compatible device, remove the > + * symlink from the class to the device. */ > + if (dev->kobj.parent != >class->subsys.kset.kobj) > + sysfs_remove_link(>class->subsys.kset.kobj, > + dev->bus_id); > +#ifdef CONFIG_SYSFS_DEPRECATED > + if (parent) { > + char *class_name = make_class_name(dev->class->name, > +>kobj); > + if (class_name) > + sysfs_remove_link(>parent->kobj, > + class_name); > + kfree(class_name); > + sysfs_remove_link(>kobj, "device"); > + } > +#endif > + < block begin > + down(>class->sem); > + /* notify any interfaces that the device is now gone */ > + list_for_each_entry(class_intf, >class->interfaces, node) > + if (class_intf->remove_dev) > + class_intf->remove_dev(dev, class_intf); > + /* remove the device from the class list */ > + list_del_init(>node); > + up(>class->sem); << block end May be i've missed something, but i'm confuesd a litle bit. For example if error happens while device_pm_add() we jump to label "PMError" and code from block above will be executed (device will be remove from list), but this device wasn't added to this list yet! > + } > ueventattrError: > device_remove_file(dev, >uevent_attr); > attrError: > -- > 1.5.0.1 > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] kobject: kobject_shadow_add cleanup
- correct function name in comments - parrent assignment does metter only inside "if" block, so move it inside this block. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> diff --git a/lib/kobject.c b/lib/kobject.c index b94f208..e4b477d 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -201,7 +201,7 @@ static void unlink(struct kobject * kobj) } /** - * kobject_add - add an object to the hierarchy. + * kobject_shadow_add - add an object to the hierarchy. * @kobj: object. * @shadow_parent: sysfs directory to add to. */ @@ -234,8 +234,8 @@ int kobject_shadow_add(struct kobject * kobj, struct dentry *shadow_parent) list_add_tail(>entry,>kset->list); spin_unlock(>kset->list_lock); + kobj->parent = parent; } - kobj->parent = parent; error = create_dir(kobj, shadow_parent); if (error) { - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] kobject: kobject_shadow_add cleanup
- correct function name in comments - parrent assignment does metter only inside if block, so move it inside this block. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] diff --git a/lib/kobject.c b/lib/kobject.c index b94f208..e4b477d 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -201,7 +201,7 @@ static void unlink(struct kobject * kobj) } /** - * kobject_add - add an object to the hierarchy. + * kobject_shadow_add - add an object to the hierarchy. * @kobj: object. * @shadow_parent: sysfs directory to add to. */ @@ -234,8 +234,8 @@ int kobject_shadow_add(struct kobject * kobj, struct dentry *shadow_parent) list_add_tail(kobj-entry,kobj-kset-list); spin_unlock(kobj-kset-list_lock); + kobj-parent = parent; } - kobj-parent = parent; error = create_dir(kobj, shadow_parent); if (error) { - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 8/9] Driver core: fix error by cleanup up symlinks properly
Greg Kroah-Hartman [EMAIL PROTECTED] writes: From: James Simmons [EMAIL PROTECTED] When a device fails to register the class symlinks where not cleaned up. This left a symlink in the /sys/class/device/ directory that pointed to no where. This caused the sysfs_follow_link Oops I reported earlier. This patch cleanups up the symlink. Please apply. Thank you. Signed-Off: James Simmons [EMAIL PROTECTED] Signed-off-by: Greg Kroah-Hartman [EMAIL PROTECTED] --- drivers/base/core.c | 31 ++- 1 files changed, 30 insertions(+), 1 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index d04fd33..cf2a398 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -637,12 +637,41 @@ int device_add(struct device *dev) BUS_NOTIFY_DEL_DEVICE, dev); device_remove_groups(dev); GroupError: - device_remove_attrs(dev); + device_remove_attrs(dev); AttrsError: if (dev-devt_attr) { device_remove_file(dev, dev-devt_attr); kfree(dev-devt_attr); } + + if (dev-class) { + sysfs_remove_link(dev-kobj, subsystem); + /* If this is not a fake compatible device, remove the + * symlink from the class to the device. */ + if (dev-kobj.parent != dev-class-subsys.kset.kobj) + sysfs_remove_link(dev-class-subsys.kset.kobj, + dev-bus_id); +#ifdef CONFIG_SYSFS_DEPRECATED + if (parent) { + char *class_name = make_class_name(dev-class-name, +dev-kobj); + if (class_name) + sysfs_remove_link(dev-parent-kobj, + class_name); + kfree(class_name); + sysfs_remove_link(dev-kobj, device); + } +#endif + block begin + down(dev-class-sem); + /* notify any interfaces that the device is now gone */ + list_for_each_entry(class_intf, dev-class-interfaces, node) + if (class_intf-remove_dev) + class_intf-remove_dev(dev, class_intf); + /* remove the device from the class list */ + list_del_init(dev-node); + up(dev-class-sem); block end May be i've missed something, but i'm confuesd a litle bit. For example if error happens while device_pm_add() we jump to label PMError and code from block above will be executed (device will be remove from list), but this device wasn't added to this list yet! + } ueventattrError: device_remove_file(dev, dev-uevent_attr); attrError: -- 1.5.0.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Patch 2/6] integrity: fs hook placement
Mimi Zohar <[EMAIL PROTECTED]> writes: > This patch places calls to the new integrity hooks in the appropriate > places in the fs directory. It is not meant in any way to be viewed > as a complete set, but used as a basis for an initial discussion. > > Index: linux-2.6.21-rc3-mm2/fs/ext3/xattr_security.c > === > --- linux-2.6.21-rc3-mm2.orig/fs/ext3/xattr_security.c > +++ linux-2.6.21-rc3-mm2/fs/ext3/xattr_security.c > @@ -10,6 +10,7 @@ > #include > #include > #include > +#include > #include "xattr.h" > > static size_t > @@ -58,12 +59,19 @@ ext3_init_security(handle_t *handle, str > > err = security_inode_init_security(inode, dir, , , ); > if (err) { > + /* Even if creation of the security xattr fails, must > + * indicate this is a new inode. */ > + integrity_inode_init_integrity(inode, dir, NULL, NULL, NULL); > if (err == -EOPNOTSUPP) > return 0; > return err; > } block begin > err = ext3_xattr_set_handle(handle, inode, EXT3_XATTR_INDEX_SECURITY, > name, value, len, 0); > + > + integrity_inode_init_integrity(inode, dir, , , ); > + err = ext3_xattr_set_handle(handle, inode, EXT3_XATTR_INDEX_SECURITY, > + name, value, len, 0); block end May be i've missed something, but i don't get last block. Why you call ext3_xattr_set_handle() twise?, or you just mistyped and it has to look like this: <<< kfree(name); > kfree(value); > return err; > Index: linux-2.6.21-rc3-mm2/include/linux/fs.h > === > --- linux-2.6.21-rc3-mm2.orig/include/linux/fs.h > +++ linux-2.6.21-rc3-mm2/include/linux/fs.h > @@ -591,6 +591,9 @@ struct inode { > #ifdef CONFIG_SECURITY > void*i_security; > #endif > +#ifdef CONFIG_INTEGRITY > + void*i_integrity; > +#endif > void*i_private; /* fs or device private pointer */ > }; > > Index: linux-2.6.21-rc3-mm2/fs/dcache.c > === > --- linux-2.6.21-rc3-mm2.orig/fs/dcache.c > +++ linux-2.6.21-rc3-mm2/fs/dcache.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -986,6 +987,7 @@ void d_instantiate(struct dentry *entry, > entry->d_inode = inode; > fsnotify_d_instantiate(entry, inode); > spin_unlock(_lock); > + integrity_d_instantiate(entry, inode); > security_d_instantiate(entry, inode); > } > > @@ -1050,6 +1052,7 @@ struct dentry *d_instantiate_unique(stru > spin_unlock(_lock); > > if (!result) { > + integrity_d_instantiate(entry, inode); > security_d_instantiate(entry, inode); > return NULL; > } > @@ -1187,6 +1190,7 @@ struct dentry *d_splice_alias(struct ino > BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED)); > fsnotify_d_instantiate(new, inode); > spin_unlock(_lock); > + integrity_d_instantiate(new, inode); > security_d_instantiate(new, inode); > d_rehash(dentry); > d_move(new, dentry); > @@ -1197,6 +1201,7 @@ struct dentry *d_splice_alias(struct ino > dentry->d_inode = inode; > fsnotify_d_instantiate(dentry, inode); > spin_unlock(_lock); > + integrity_d_instantiate(dentry, inode); > security_d_instantiate(dentry, inode); > d_rehash(dentry); > } > @@ -1748,6 +1753,7 @@ found: > spin_unlock(_lock); > out_nolock: > if (actual == dentry) { > + integrity_d_instantiate(dentry, inode); > security_d_instantiate(dentry, inode); > return NULL; > } > Index: linux-2.6.21-rc3-mm2/fs/file_table.c > === > --- linux-2.6.21-rc3-mm2.orig/fs/file_table.c > +++ linux-2.6.21-rc3-mm2/fs/file_table.c > @@ -13,6 +13,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -169,6 +170,7 @@ void fastcall __fput(struct file *file) > if (file->f_op && file->f_op->release) > file->f_op->release(inode, file); > security_file_free(file); > + integrity_file_free(file); > if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL)) > cdev_put(inode->i_cdev); > fops_put(file->f_op); > @@ -240,6 +242,7 @@ void put_filp(struct file *file) > { > if (atomic_dec_and_test(>f_count)) { > security_file_free(file); > + integrity_file_free(file); >
Re: [RFC][Patch 2/6] integrity: fs hook placement
Mimi Zohar [EMAIL PROTECTED] writes: This patch places calls to the new integrity hooks in the appropriate places in the fs directory. It is not meant in any way to be viewed as a complete set, but used as a basis for an initial discussion. Index: linux-2.6.21-rc3-mm2/fs/ext3/xattr_security.c === --- linux-2.6.21-rc3-mm2.orig/fs/ext3/xattr_security.c +++ linux-2.6.21-rc3-mm2/fs/ext3/xattr_security.c @@ -10,6 +10,7 @@ #include linux/ext3_jbd.h #include linux/ext3_fs.h #include linux/security.h +#include linux/integrity.h #include xattr.h static size_t @@ -58,12 +59,19 @@ ext3_init_security(handle_t *handle, str err = security_inode_init_security(inode, dir, name, value, len); if (err) { + /* Even if creation of the security xattr fails, must + * indicate this is a new inode. */ + integrity_inode_init_integrity(inode, dir, NULL, NULL, NULL); if (err == -EOPNOTSUPP) return 0; return err; } block begin err = ext3_xattr_set_handle(handle, inode, EXT3_XATTR_INDEX_SECURITY, name, value, len, 0); + + integrity_inode_init_integrity(inode, dir, name, value, len); + err = ext3_xattr_set_handle(handle, inode, EXT3_XATTR_INDEX_SECURITY, + name, value, len, 0); block end May be i've missed something, but i don't get last block. Why you call ext3_xattr_set_handle() twise?, or you just mistyped and it has to look like this: block_begin + integrity_inode_init_integrity(inode, dir, name, value, len); err = ext3_xattr_set_handle(handle, inode, EXT3_XATTR_INDEX_SECURITY, name, value, len, 0); block_end kfree(name); kfree(value); return err; Index: linux-2.6.21-rc3-mm2/include/linux/fs.h === --- linux-2.6.21-rc3-mm2.orig/include/linux/fs.h +++ linux-2.6.21-rc3-mm2/include/linux/fs.h @@ -591,6 +591,9 @@ struct inode { #ifdef CONFIG_SECURITY void*i_security; #endif +#ifdef CONFIG_INTEGRITY + void*i_integrity; +#endif void*i_private; /* fs or device private pointer */ }; Index: linux-2.6.21-rc3-mm2/fs/dcache.c === --- linux-2.6.21-rc3-mm2.orig/fs/dcache.c +++ linux-2.6.21-rc3-mm2/fs/dcache.c @@ -29,6 +29,7 @@ #include linux/file.h #include asm/uaccess.h #include linux/security.h +#include linux/integrity.h #include linux/seqlock.h #include linux/swap.h #include linux/bootmem.h @@ -986,6 +987,7 @@ void d_instantiate(struct dentry *entry, entry-d_inode = inode; fsnotify_d_instantiate(entry, inode); spin_unlock(dcache_lock); + integrity_d_instantiate(entry, inode); security_d_instantiate(entry, inode); } @@ -1050,6 +1052,7 @@ struct dentry *d_instantiate_unique(stru spin_unlock(dcache_lock); if (!result) { + integrity_d_instantiate(entry, inode); security_d_instantiate(entry, inode); return NULL; } @@ -1187,6 +1190,7 @@ struct dentry *d_splice_alias(struct ino BUG_ON(!(new-d_flags DCACHE_DISCONNECTED)); fsnotify_d_instantiate(new, inode); spin_unlock(dcache_lock); + integrity_d_instantiate(new, inode); security_d_instantiate(new, inode); d_rehash(dentry); d_move(new, dentry); @@ -1197,6 +1201,7 @@ struct dentry *d_splice_alias(struct ino dentry-d_inode = inode; fsnotify_d_instantiate(dentry, inode); spin_unlock(dcache_lock); + integrity_d_instantiate(dentry, inode); security_d_instantiate(dentry, inode); d_rehash(dentry); } @@ -1748,6 +1753,7 @@ found: spin_unlock(dcache_lock); out_nolock: if (actual == dentry) { + integrity_d_instantiate(dentry, inode); security_d_instantiate(dentry, inode); return NULL; } Index: linux-2.6.21-rc3-mm2/fs/file_table.c === --- linux-2.6.21-rc3-mm2.orig/fs/file_table.c +++ linux-2.6.21-rc3-mm2/fs/file_table.c @@ -13,6 +13,7 @@ #include linux/smp_lock.h #include linux/fs.h #include linux/security.h +#include linux/integrity.h #include linux/eventpoll.h #include linux/rcupdate.h #include linux/mount.h @@ -169,6 +170,7 @@ void fastcall __fput(struct file *file) if (file-f_op file-f_op-release) file-f_op-release(inode, file);
Re: linux-2.6.21-rc2 compile warnings
"young dave" <[EMAIL PROTECTED]> writes: > Hi, > As I compile the linux-2.6.21-rc2 (gcc version 3.4.6), there's some > warning messages: you should probably check -mm tree may be some issues already fixed where. > > drivers/video/Kconfig:1622:warning: 'select' used by config symbol > 'FB_PS3' refer to undefined symbol 'PS3_PS3AV' > > fs/partitions/check.c: In function `add_partition': > fs/partitions/check.c:388: warning: ignoring return value of > `kobject_add', declared with attribute warn_unused_result > fs/partitions/check.c:391: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > fs/partitions/check.c:399: warning: ignoring return value of > `sysfs_create_file', declared with attribute warn_unused_result > > drivers/ata/libata-sff.c: In function `ata_pci_init_one': > drivers/ata/libata-sff.c:820: warning: ignoring return value of > `pci_request_region', declared with attribute warn_unused_result > drivers/ata/libata-sff.c:822: warning: ignoring return value of > `pci_request_region', declared with attribute warn_unused_result > drivers/ata/libata-sff.c:824: warning: ignoring return value of > `pci_request_region', declared with attribute warn_unused_result > > drivers/base/core.c: In function `device_add': > drivers/base/core.c:580: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > drivers/base/core.c:585: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > drivers/base/core.c:589: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > drivers/base/core.c:594: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > drivers/base/core.c: In function `device_rename': > drivers/base/core.c:1039: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > drivers/base/core.c:1049: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > > drivers/block/floppy.c: In function `floppy_init': > drivers/block/floppy.c:4337: warning: ignoring return value of I've already sent patch, but i wasn't added to -mm yet ( at least it not rejecdted yet :) ) > `device_create_file', declared with attribute warn_unused_result > drivers/cpufreq/cpufreq.c: In function `cpufreq_add_dev': > drivers/cpufreq/cpufreq.c:826: warning: ignoring return value of > `sysfs_create_file', declared with attribute warn_unused_result > drivers/cpufreq/cpufreq.c:830: warning: ignoring return value of > `sysfs_create_file', declared with attribute warn_unused_result > drivers/cpufreq/cpufreq.c:832: warning: ignoring return value of > `sysfs_create_file', declared with attribute warn_unused_result > > drivers/ide/pci/cs5530.c: In function `init_chipset_cs5530': > drivers/ide/pci/cs5530.c:244: warning: ignoring return value of > `pci_set_mwi', declared with attribute warn_unused_result > > drivers/ide/pci/sc1200.c: In function `sc1200_resume': > drivers/ide/pci/sc1200.c:397: warning: ignoring return value of > `pci_enable_device', declared with attribute warn_unused_result > > drivers/ide/setup-pci.c: In function `ide_scan_pcibus': > drivers/ide/setup-pci.c:866: warning: ignoring return value of > `__pci_register_driver', declared with attribute warn_unused_result > > drivers/input/mousedev.c: In function `mousedev_connect': > drivers/input/mousedev.c:664: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > > drivers/input/joydev.c: In function `joydev_connect': > drivers/input/joydev.c:542: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > > > drivers/input/evdev.c: In function `evdev_connect': > drivers/input/evdev.c:653: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > drivers/message/i2o/device.c: In function `i2o_device_add': > drivers/message/i2o/device.c:247: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > drivers/message/i2o/device.c:254: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > drivers/message/i2o/device.c:260: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > drivers/message/i2o/device.c:267: warning: ignoring return value of > `sysfs_create_link', declared with attribute warn_unused_result > CC drivers/message/i2o/debug. > > drivers/net/sk98lin/skge.c: In function `skge_resume': > drivers/net/sk98lin/skge.c:5128: warning: ignoring return value of > `pci_enable_device', declared with attribute warn_unused_result > drivers/net/tulip/xircom_tulip_cb.c: In function `outl_CSR6': > drivers/net/tulip/xircom_tulip_cb.c:348: warning: `save_flags' is > deprecated (declared at
Re: linux-2.6.21-rc2 compile warnings
young dave [EMAIL PROTECTED] writes: Hi, As I compile the linux-2.6.21-rc2 (gcc version 3.4.6), there's some warning messages: you should probably check -mm tree may be some issues already fixed where. drivers/video/Kconfig:1622:warning: 'select' used by config symbol 'FB_PS3' refer to undefined symbol 'PS3_PS3AV' fs/partitions/check.c: In function `add_partition': fs/partitions/check.c:388: warning: ignoring return value of `kobject_add', declared with attribute warn_unused_result fs/partitions/check.c:391: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result fs/partitions/check.c:399: warning: ignoring return value of `sysfs_create_file', declared with attribute warn_unused_result drivers/ata/libata-sff.c: In function `ata_pci_init_one': drivers/ata/libata-sff.c:820: warning: ignoring return value of `pci_request_region', declared with attribute warn_unused_result drivers/ata/libata-sff.c:822: warning: ignoring return value of `pci_request_region', declared with attribute warn_unused_result drivers/ata/libata-sff.c:824: warning: ignoring return value of `pci_request_region', declared with attribute warn_unused_result drivers/base/core.c: In function `device_add': drivers/base/core.c:580: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/base/core.c:585: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/base/core.c:589: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/base/core.c:594: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/base/core.c: In function `device_rename': drivers/base/core.c:1039: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/base/core.c:1049: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/block/floppy.c: In function `floppy_init': drivers/block/floppy.c:4337: warning: ignoring return value of I've already sent patch, but i wasn't added to -mm yet ( at least it not rejecdted yet :) ) `device_create_file', declared with attribute warn_unused_result drivers/cpufreq/cpufreq.c: In function `cpufreq_add_dev': drivers/cpufreq/cpufreq.c:826: warning: ignoring return value of `sysfs_create_file', declared with attribute warn_unused_result drivers/cpufreq/cpufreq.c:830: warning: ignoring return value of `sysfs_create_file', declared with attribute warn_unused_result drivers/cpufreq/cpufreq.c:832: warning: ignoring return value of `sysfs_create_file', declared with attribute warn_unused_result drivers/ide/pci/cs5530.c: In function `init_chipset_cs5530': drivers/ide/pci/cs5530.c:244: warning: ignoring return value of `pci_set_mwi', declared with attribute warn_unused_result drivers/ide/pci/sc1200.c: In function `sc1200_resume': drivers/ide/pci/sc1200.c:397: warning: ignoring return value of `pci_enable_device', declared with attribute warn_unused_result drivers/ide/setup-pci.c: In function `ide_scan_pcibus': drivers/ide/setup-pci.c:866: warning: ignoring return value of `__pci_register_driver', declared with attribute warn_unused_result drivers/input/mousedev.c: In function `mousedev_connect': drivers/input/mousedev.c:664: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/input/joydev.c: In function `joydev_connect': drivers/input/joydev.c:542: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/input/evdev.c: In function `evdev_connect': drivers/input/evdev.c:653: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/message/i2o/device.c: In function `i2o_device_add': drivers/message/i2o/device.c:247: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/message/i2o/device.c:254: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/message/i2o/device.c:260: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result drivers/message/i2o/device.c:267: warning: ignoring return value of `sysfs_create_link', declared with attribute warn_unused_result CC drivers/message/i2o/debug. drivers/net/sk98lin/skge.c: In function `skge_resume': drivers/net/sk98lin/skge.c:5128: warning: ignoring return value of `pci_enable_device', declared with attribute warn_unused_result drivers/net/tulip/xircom_tulip_cb.c: In function `outl_CSR6': drivers/net/tulip/xircom_tulip_cb.c:348: warning: `save_flags' is deprecated (declared at include/linux/interrupt.h:214) drivers/net/tulip/xircom_tulip_cb.c:349: warning: `cli' is deprecated
Re: [PATCH] ext3: dirindex error pointer issues
Andreas Dilger <[EMAIL PROTECTED]> writes: > On Mar 04, 2007 17:18 +0300, Dmitriy Monakhov wrote: >> - ext3_dx_find_entry() exit with out setting proper error pointer >> - do_split() exit with out setting proper error pointer >>it is realy painful because many callers contain folowing code: >>de = do_split(handle,dir, , frame, , ); >>if (!(de)) >> return retval; >><<< WOW retval wasn't changed by do_split(), so caller failed >><<< but return SUCCESS :) >> - Rearrange do_split() error path. Current error path is realy ugly, all >>this up and down jump stuff doesn't make code easy to understand. >> >> Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> >> --- >> fs/ext3/namei.c | 26 +++--- >> fs/ext4/namei.c | 26 +++--- >> 2 files changed, 30 insertions(+), 22 deletions(-) >> >> diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c >> index 49159f1..1a52586 100644 >> --- a/fs/ext3/namei.c >> +++ b/fs/ext3/namei.c >> @@ -969,6 +969,7 @@ static struct buffer_head * ext3_dx_find_entry(struct >> dentry *dentry, >>(block<>+((char *)de - bh->b_data))) { >> brelse (bh); >> +*err = ERR_BAD_DX_DIR; >> goto errout; >> } >> *res_dir = de; > > I have no objection to this change (by principle of least surprise) but > I don't know if it fixes a real problem. The one caller of this function > treats ERR_BAD_DX_DIR the same as bh == NULL. Execly ERR_BAD_DX_DIR treats the same as bh == NULL and let's look at callers code: struct buffer_head * ext3_find_entry(..) { ... bh = ext3_dx_find_entry(dentry, res_dir, ); /* * On success, or if the error was file not found, * return. Otherwise, fall back to doing a search the * old fashioned way. */ if (bh || (err != ERR_BAD_DX_DIR)) <<<<< after ext3_dx_find_entry() has failed , but don't set err pointer <<<<< we get (bh == NULL, err == 0) , and then just return NULL. return bh; ... } > >> @@ -1134,9 +1135,9 @@ static struct ext3_dir_entry_2 *do_split(handle_t >> *handle, struct inode *dir, >> char *data1 = (*bh)->b_data, *data2; >> unsigned split; >> struct ext3_dir_entry_2 *de = NULL, *de2; >> -int err; >> +int err = 0; >> >> -bh2 = ext3_append (handle, dir, , error); >> +bh2 = ext3_append (handle, dir, , ); > > Why don't we just remove "err" entirely and use *error to avoid any risk > of setting err and not returning it in error? Also reduces stack usage > that tiny little bit. I've chosen this approuch becouse of: 1) this approuch was used in block allocation code. 2) this approuch prevent folowing errors: *error = do_some_function(.) /* some code*/ if(error) we do this checking as we do it thousands times before, but forget what error it pointer here. And compiler can't warn us here because it is absolutely legal operation. At least it is better to rename error to errorp. Btw: I've thought about adding assertations to error path witch may check what errp pointer was properly initialized after error happends. folowing code is draft only: --- a/fs/ext3/namei.c +++ b/fs/ext3/namei.c @@ -63,6 +63,7 @@ static struct buffer_head *ext3_append(handle_t *handle, EXT3_I(inode)->i_disksize = inode->i_size; ext3_journal_get_write_access(handle,bh); } + assert(bh || *err); return bh; } @@ -433,6 +434,7 @@ fail2: frame--; } fail: + assert(*err != 0); return NULL; } > > > In ext3_dx_add_entry() it appears like we should "goto journal_error" > to report an error after the failed call to do_split(), instead of just > "goto cleanup" so that we report an error in the filesystem. Not 100% > sure of this. do_split() already invoked ext3_std_error() on it's "journal_error" path. > > Cheers, Andreas > -- > Andreas Dilger > Principal Software Engineer > Cluster File Systems, Inc. > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ext3: dirindex error pointer issues
- ext3_dx_find_entry() exit with out setting proper error pointer - do_split() exit with out setting proper error pointer it is realy painful because many callers contain folowing code: de = do_split(handle,dir, , frame, , ); if (!(de)) return retval; <<< WOW retval wasn't changed by do_split(), so caller failed <<< but return SUCCESS :) - Rearrange do_split() error path. Current error path is realy ugly, all this up and down jump stuff doesn't make code easy to understand. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- fs/ext3/namei.c | 26 +++--- fs/ext4/namei.c | 26 +++--- 2 files changed, 30 insertions(+), 22 deletions(-) diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c index 49159f1..1a52586 100644 --- a/fs/ext3/namei.c +++ b/fs/ext3/namei.c @@ -969,6 +969,7 @@ static struct buffer_head * ext3_dx_find_entry(struct dentry *dentry, (blockb_data, *data2; unsigned split; struct ext3_dir_entry_2 *de = NULL, *de2; - int err; + int err = 0; - bh2 = ext3_append (handle, dir, , error); + bh2 = ext3_append (handle, dir, , ); if (!(bh2)) { brelse(*bh); *bh = NULL; @@ -1145,14 +1146,9 @@ static struct ext3_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, BUFFER_TRACE(*bh, "get_write_access"); err = ext3_journal_get_write_access(handle, *bh); - if (err) { - journal_error: - brelse(*bh); - brelse(bh2); - *bh = NULL; - ext3_std_error(dir->i_sb, err); - goto errout; - } + if (err) + goto journal_error; + BUFFER_TRACE(frame->bh, "get_write_access"); err = ext3_journal_get_write_access(handle, frame->bh); if (err) @@ -1195,8 +1191,16 @@ static struct ext3_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, goto journal_error; brelse (bh2); dxtrace(dx_show_index ("frame", frame->entries)); -errout: return de; + +journal_error: + brelse(*bh); + brelse(bh2); + *bh = NULL; +errout: + ext3_std_error(dir->i_sb, err); + *error = err; + return NULL; } #endif diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index e7e1d79..682a3d7 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -967,6 +967,7 @@ static struct buffer_head * ext4_dx_find_entry(struct dentry *dentry, (block b_data, *data2; unsigned split; struct ext4_dir_entry_2 *de = NULL, *de2; - int err; + int err = 0; - bh2 = ext4_append (handle, dir, , error); + bh2 = ext4_append (handle, dir, , ); if (!(bh2)) { brelse(*bh); *bh = NULL; @@ -1143,14 +1144,9 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, BUFFER_TRACE(*bh, "get_write_access"); err = ext4_journal_get_write_access(handle, *bh); - if (err) { - journal_error: - brelse(*bh); - brelse(bh2); - *bh = NULL; - ext4_std_error(dir->i_sb, err); - goto errout; - } + if (err) + goto journal_error; + BUFFER_TRACE(frame->bh, "get_write_access"); err = ext4_journal_get_write_access(handle, frame->bh); if (err) @@ -1193,8 +1189,16 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, goto journal_error; brelse (bh2); dxtrace(dx_show_index ("frame", frame->entries)); -errout: return de; + +journal_error: + brelse(*bh); + brelse(bh2); + *bh = NULL; +errout: + ext4_std_error(dir->i_sb, err); + *error = err; + return NULL; } #endif -- 1.5.0.1 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] udf: possible null pointer dereference while load_partition
sb_read may return NULL, let's explicitly check it. Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> diff --git a/fs/udf/super.c b/fs/udf/super.c index f4b3265..951b85d 100644 --- a/fs/udf/super.c +++ b/fs/udf/super.c @@ -1407,6 +1407,8 @@ udf_load_partition(struct super_block *sb, kernel_lb_addr *fileset) pos = udf_block_map(UDF_SB_VAT(sb), 0); bh = sb_bread(sb, pos); + if (!bh) + return 1; UDF_SB_TYPEVIRT(sb,i).s_start_offset = le16_to_cpu(((struct virtualAllocationTable20 *)bh->b_data + udf_ext0_offset(UDF_SB_VAT(sb)))->lengthHeader) + udf_ext0_offset(UDF_SB_VAT(sb));
[patch] udf: possible null pointer dereference while load_partition
sb_read may return NULL, let's explicitly check it. Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] diff --git a/fs/udf/super.c b/fs/udf/super.c index f4b3265..951b85d 100644 --- a/fs/udf/super.c +++ b/fs/udf/super.c @@ -1407,6 +1407,8 @@ udf_load_partition(struct super_block *sb, kernel_lb_addr *fileset) pos = udf_block_map(UDF_SB_VAT(sb), 0); bh = sb_bread(sb, pos); + if (!bh) + return 1; UDF_SB_TYPEVIRT(sb,i).s_start_offset = le16_to_cpu(((struct virtualAllocationTable20 *)bh-b_data + udf_ext0_offset(UDF_SB_VAT(sb)))-lengthHeader) + udf_ext0_offset(UDF_SB_VAT(sb));
[PATCH] ext3: dirindex error pointer issues
- ext3_dx_find_entry() exit with out setting proper error pointer - do_split() exit with out setting proper error pointer it is realy painful because many callers contain folowing code: de = do_split(handle,dir, bh, frame, hinfo, retval); if (!(de)) return retval; WOW retval wasn't changed by do_split(), so caller failed but return SUCCESS :) - Rearrange do_split() error path. Current error path is realy ugly, all this up and down jump stuff doesn't make code easy to understand. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- fs/ext3/namei.c | 26 +++--- fs/ext4/namei.c | 26 +++--- 2 files changed, 30 insertions(+), 22 deletions(-) diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c index 49159f1..1a52586 100644 --- a/fs/ext3/namei.c +++ b/fs/ext3/namei.c @@ -969,6 +969,7 @@ static struct buffer_head * ext3_dx_find_entry(struct dentry *dentry, (blockEXT3_BLOCK_SIZE_BITS(sb)) +((char *)de - bh-b_data))) { brelse (bh); + *err = ERR_BAD_DX_DIR; goto errout; } *res_dir = de; @@ -1134,9 +1135,9 @@ static struct ext3_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, char *data1 = (*bh)-b_data, *data2; unsigned split; struct ext3_dir_entry_2 *de = NULL, *de2; - int err; + int err = 0; - bh2 = ext3_append (handle, dir, newblock, error); + bh2 = ext3_append (handle, dir, newblock, err); if (!(bh2)) { brelse(*bh); *bh = NULL; @@ -1145,14 +1146,9 @@ static struct ext3_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, BUFFER_TRACE(*bh, get_write_access); err = ext3_journal_get_write_access(handle, *bh); - if (err) { - journal_error: - brelse(*bh); - brelse(bh2); - *bh = NULL; - ext3_std_error(dir-i_sb, err); - goto errout; - } + if (err) + goto journal_error; + BUFFER_TRACE(frame-bh, get_write_access); err = ext3_journal_get_write_access(handle, frame-bh); if (err) @@ -1195,8 +1191,16 @@ static struct ext3_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, goto journal_error; brelse (bh2); dxtrace(dx_show_index (frame, frame-entries)); -errout: return de; + +journal_error: + brelse(*bh); + brelse(bh2); + *bh = NULL; +errout: + ext3_std_error(dir-i_sb, err); + *error = err; + return NULL; } #endif diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index e7e1d79..682a3d7 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -967,6 +967,7 @@ static struct buffer_head * ext4_dx_find_entry(struct dentry *dentry, (blockEXT4_BLOCK_SIZE_BITS(sb)) +((char *)de - bh-b_data))) { brelse (bh); + *err = ERR_BAD_DX_DIR; goto errout; } *res_dir = de; @@ -1132,9 +1133,9 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, char *data1 = (*bh)-b_data, *data2; unsigned split; struct ext4_dir_entry_2 *de = NULL, *de2; - int err; + int err = 0; - bh2 = ext4_append (handle, dir, newblock, error); + bh2 = ext4_append (handle, dir, newblock, err); if (!(bh2)) { brelse(*bh); *bh = NULL; @@ -1143,14 +1144,9 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, BUFFER_TRACE(*bh, get_write_access); err = ext4_journal_get_write_access(handle, *bh); - if (err) { - journal_error: - brelse(*bh); - brelse(bh2); - *bh = NULL; - ext4_std_error(dir-i_sb, err); - goto errout; - } + if (err) + goto journal_error; + BUFFER_TRACE(frame-bh, get_write_access); err = ext4_journal_get_write_access(handle, frame-bh); if (err) @@ -1193,8 +1189,16 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, goto journal_error; brelse (bh2); dxtrace(dx_show_index (frame, frame-entries)); -errout: return de; + +journal_error: + brelse(*bh); + brelse(bh2); + *bh = NULL; +errout: + ext4_std_error(dir-i_sb, err); + *error = err; + return NULL; } #endif -- 1.5.0.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to
Re: [PATCH] ext3: dirindex error pointer issues
Andreas Dilger [EMAIL PROTECTED] writes: On Mar 04, 2007 17:18 +0300, Dmitriy Monakhov wrote: - ext3_dx_find_entry() exit with out setting proper error pointer - do_split() exit with out setting proper error pointer it is realy painful because many callers contain folowing code: de = do_split(handle,dir, bh, frame, hinfo, retval); if (!(de)) return retval; WOW retval wasn't changed by do_split(), so caller failed but return SUCCESS :) - Rearrange do_split() error path. Current error path is realy ugly, all this up and down jump stuff doesn't make code easy to understand. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- fs/ext3/namei.c | 26 +++--- fs/ext4/namei.c | 26 +++--- 2 files changed, 30 insertions(+), 22 deletions(-) diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c index 49159f1..1a52586 100644 --- a/fs/ext3/namei.c +++ b/fs/ext3/namei.c @@ -969,6 +969,7 @@ static struct buffer_head * ext3_dx_find_entry(struct dentry *dentry, (blockEXT3_BLOCK_SIZE_BITS(sb)) +((char *)de - bh-b_data))) { brelse (bh); +*err = ERR_BAD_DX_DIR; goto errout; } *res_dir = de; I have no objection to this change (by principle of least surprise) but I don't know if it fixes a real problem. The one caller of this function treats ERR_BAD_DX_DIR the same as bh == NULL. Execly ERR_BAD_DX_DIR treats the same as bh == NULL and let's look at callers code: struct buffer_head * ext3_find_entry(..) { ... bh = ext3_dx_find_entry(dentry, res_dir, err); /* * On success, or if the error was file not found, * return. Otherwise, fall back to doing a search the * old fashioned way. */ if (bh || (err != ERR_BAD_DX_DIR)) after ext3_dx_find_entry() has failed , but don't set err pointer we get (bh == NULL, err == 0) , and then just return NULL. return bh; ... } @@ -1134,9 +1135,9 @@ static struct ext3_dir_entry_2 *do_split(handle_t *handle, struct inode *dir, char *data1 = (*bh)-b_data, *data2; unsigned split; struct ext3_dir_entry_2 *de = NULL, *de2; -int err; +int err = 0; -bh2 = ext3_append (handle, dir, newblock, error); +bh2 = ext3_append (handle, dir, newblock, err); Why don't we just remove err entirely and use *error to avoid any risk of setting err and not returning it in error? Also reduces stack usage that tiny little bit. I've chosen this approuch becouse of: 1) this approuch was used in block allocation code. 2) this approuch prevent folowing errors: *error = do_some_function(.) /* some code*/ if(error) we do this checking as we do it thousands times before, but forget what error it pointer here. And compiler can't warn us here because it is absolutely legal operation. At least it is better to rename error to errorp. Btw: I've thought about adding assertations to error path witch may check what errp pointer was properly initialized after error happends. folowing code is draft only: --- a/fs/ext3/namei.c +++ b/fs/ext3/namei.c @@ -63,6 +63,7 @@ static struct buffer_head *ext3_append(handle_t *handle, EXT3_I(inode)-i_disksize = inode-i_size; ext3_journal_get_write_access(handle,bh); } + assert(bh || *err); return bh; } @@ -433,6 +434,7 @@ fail2: frame--; } fail: + assert(*err != 0); return NULL; } In ext3_dx_add_entry() it appears like we should goto journal_error to report an error after the failed call to do_split(), instead of just goto cleanup so that we report an error in the filesystem. Not 100% sure of this. do_split() already invoked ext3_std_error() on it's journal_error path. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] kobject: new_device->kref wasn't putted in kobject_move()
Dmitriy Monakhov <[EMAIL PROTECTED]> writes: > Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> > > diff --git a/lib/kobject.c b/lib/kobject.c > index b94f208..b11f7b2 100644 > --- a/lib/kobject.c > +++ b/lib/kobject.c > @@ -432,6 +432,7 @@ int kobject_move(struct kobject *kobj, struct kobject > *new_parent) > kobject_put(old_parent); > kobject_uevent_env(kobj, KOBJ_MOVE, envp); > out: > + kobject_put(new_parent); > kobject_put(kobj); > kfree(devpath_string); > kfree(devpath); OOps i'm realy sorry, by occasion patch was incomplete :( The updated patch version following: [PATCH] kobject: new_device->kref wasn't putted after error in kobject_move() If error happen we jump to "out" label, in this case new_device not yet became the parent but it wasn't putted. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> diff --git a/lib/kobject.c b/lib/kobject.c index b94f208..a6c9a06 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -429,9 +429,11 @@ int kobject_move(struct kobject *kobj, struct kobject *new_parent) goto out; old_parent = kobj->parent; kobj->parent = new_parent; + new_parent = NULL; kobject_put(old_parent); kobject_uevent_env(kobj, KOBJ_MOVE, envp); out: + kobject_put(new_parent); kobject_put(kobj); kfree(devpath_string); kfree(devpath); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] kobject: new_device->kref wasn't putted in kobject_move()
Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> diff --git a/lib/kobject.c b/lib/kobject.c index b94f208..b11f7b2 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -432,6 +432,7 @@ int kobject_move(struct kobject *kobj, struct kobject *new_parent) kobject_put(old_parent); kobject_uevent_env(kobj, KOBJ_MOVE, envp); out: + kobject_put(new_parent); kobject_put(kobj); kfree(devpath_string); kfree(devpath); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] kobject: new_device-kref wasn't putted in kobject_move()
Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] diff --git a/lib/kobject.c b/lib/kobject.c index b94f208..b11f7b2 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -432,6 +432,7 @@ int kobject_move(struct kobject *kobj, struct kobject *new_parent) kobject_put(old_parent); kobject_uevent_env(kobj, KOBJ_MOVE, envp); out: + kobject_put(new_parent); kobject_put(kobj); kfree(devpath_string); kfree(devpath); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] kobject: new_device-kref wasn't putted in kobject_move()
Dmitriy Monakhov [EMAIL PROTECTED] writes: Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] diff --git a/lib/kobject.c b/lib/kobject.c index b94f208..b11f7b2 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -432,6 +432,7 @@ int kobject_move(struct kobject *kobj, struct kobject *new_parent) kobject_put(old_parent); kobject_uevent_env(kobj, KOBJ_MOVE, envp); out: + kobject_put(new_parent); kobject_put(kobj); kfree(devpath_string); kfree(devpath); OOps i'm realy sorry, by occasion patch was incomplete :( The updated patch version following: [PATCH] kobject: new_device-kref wasn't putted after error in kobject_move() If error happen we jump to out label, in this case new_device not yet became the parent but it wasn't putted. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] diff --git a/lib/kobject.c b/lib/kobject.c index b94f208..a6c9a06 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -429,9 +429,11 @@ int kobject_move(struct kobject *kobj, struct kobject *new_parent) goto out; old_parent = kobj-parent; kobj-parent = new_parent; + new_parent = NULL; kobject_put(old_parent); kobject_uevent_env(kobj, KOBJ_MOVE, envp); out: + kobject_put(new_parent); kobject_put(kobj); kfree(devpath_string); kfree(devpath); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] floppy: handle device_create_file() failure while init
This patch kills the "ignoring return value of 'device_create_file'" warning message. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- drivers/block/floppy.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index 567c630..dfdabc3 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -4333,7 +4333,10 @@ static int __init floppy_init(void) if (err) goto out_flush_work; - device_create_file(_device[drive].dev,_attr_cmos); + err = device_create_file(_device[drive].dev,_attr_cmos); + if (err) + goto out_unreg_platform_dev; + /* to be cleaned up... */ disks[drive]->private_data = (void *)(long)drive; disks[drive]->queue = floppy_queue; @@ -4344,6 +4347,8 @@ static int __init floppy_init(void) return 0; +out_unreg_platform_dev: + platform_device_unregister(_device[drive]); out_flush_work: flush_scheduled_work(); if (usage_count) -- 1.4.4.2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] floppy: handle device_create_file() failure while init
This patch kills the ignoring return value of 'device_create_file' warning message. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/block/floppy.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index 567c630..dfdabc3 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -4333,7 +4333,10 @@ static int __init floppy_init(void) if (err) goto out_flush_work; - device_create_file(floppy_device[drive].dev,dev_attr_cmos); + err = device_create_file(floppy_device[drive].dev,dev_attr_cmos); + if (err) + goto out_unreg_platform_dev; + /* to be cleaned up... */ disks[drive]-private_data = (void *)(long)drive; disks[drive]-queue = floppy_queue; @@ -4344,6 +4347,8 @@ static int __init floppy_init(void) return 0; +out_unreg_platform_dev: + platform_device_unregister(floppy_device[drive]); out_flush_work: flush_scheduled_work(); if (usage_count) -- 1.4.4.2 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ecryptfs: check xattr operation support fix
- ecryptfs_write_inode_size_to_metadata() error code was ignored. - i_op->setxattr() must be supported by lower fs because used below. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- fs/ecryptfs/inode.c |6 +++--- fs/ecryptfs/mmap.c |3 ++- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index 27fd14a..9ccefad 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -168,9 +168,9 @@ static int grow_file(struct dentry *ecryptfs_dentry, struct file *lower_file, goto out; } i_size_write(inode, 0); - ecryptfs_write_inode_size_to_metadata(lower_file, lower_inode, inode, - ecryptfs_dentry, - ECRYPTFS_LOWER_I_MUTEX_NOT_HELD); + rc = ecryptfs_write_inode_size_to_metadata(lower_file, lower_inode, + inode, ecryptfs_dentry, + ECRYPTFS_LOWER_I_MUTEX_NOT_HELD); ecryptfs_inode_to_private(inode)->crypt_stat.flags |= ECRYPTFS_NEW_FILE; out: return rc; diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c index 1e5d2ba..416985f 100644 --- a/fs/ecryptfs/mmap.c +++ b/fs/ecryptfs/mmap.c @@ -491,7 +491,8 @@ static int ecryptfs_write_inode_size_to_xattr(struct inode *lower_inode, goto out; } lower_dentry = ecryptfs_dentry_to_lower(ecryptfs_dentry); - if (!lower_dentry->d_inode->i_op->getxattr) { + if (!lower_dentry->d_inode->i_op->getxattr || + !lower_dentry->d_inode->i_op->setxattr) { printk(KERN_WARNING "No support for setting xattr in lower filesystem\n"); rc = -ENOSYS; -- 1.5.0.1 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH][RFC] ext3: Handle ext[34]_journal_stop() failure
Where are many places where _journal_stop() return code wasn't checked. Off cause _journal_stop() failed very rarely (and usually with fatal consequences), but this does'n meen it should not be checked. For example most retry loops looks like follows: ext3_journal_stop(handle); if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, )) goto retry; It is realy insane do "retry" if ext3_journal_stop() has failed, because this may increase damage in case of real problem. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- fs/ext3/acl.c | 12 fs/ext3/inode.c | 22 +- fs/ext3/ioctl.c | 12 fs/ext3/namei.c | 48 fs/ext3/xattr.c |4 ++-- fs/ext4/acl.c | 12 fs/ext4/inode.c | 22 +- fs/ext4/ioctl.c | 12 fs/ext4/namei.c | 48 fs/ext4/xattr.c |4 ++-- 10 files changed, 134 insertions(+), 62 deletions(-) diff --git a/fs/ext3/acl.c b/fs/ext3/acl.c index 1e5038d..bbf4d8a 100644 --- a/fs/ext3/acl.c +++ b/fs/ext3/acl.c @@ -375,7 +375,7 @@ int ext3_acl_chmod(struct inode *inode) { struct posix_acl *acl, *clone; -int error; + int error, error2; if (S_ISLNK(inode->i_mode)) return -EOPNOTSUPP; @@ -402,7 +402,9 @@ ext3_acl_chmod(struct inode *inode) goto out; } error = ext3_set_acl(handle, inode, ACL_TYPE_ACCESS, clone); - ext3_journal_stop(handle); + error2 = ext3_journal_stop(handle); + if (error2 && (!error || error == -ENOSPC)) + error = error2; if (error == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, )) goto retry; @@ -485,7 +487,7 @@ ext3_xattr_set_acl(struct inode *inode, int type, const void *value, { handle_t *handle; struct posix_acl *acl; - int error, retries = 0; + int error, error2, retries = 0; if (!test_opt(inode->i_sb, POSIX_ACL)) return -EOPNOTSUPP; @@ -509,7 +511,9 @@ retry: if (IS_ERR(handle)) return PTR_ERR(handle); error = ext3_set_acl(handle, inode, type, acl); - ext3_journal_stop(handle); + error2 = ext3_journal_stop(handle); + if (error2 && (!error || error == -ENOSPC)) + error = error2; if (error == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, )) goto retry; diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index 816f0c5..68d1c71 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -962,7 +962,9 @@ static int ext3_get_block(struct inode *inode, sector_t iblock, * Huge direct-io writes can hold off commits for long * periods of time. Let this commit run. */ - ext3_journal_stop(handle); + ret = ext3_journal_stop(handle); + if (ret) + goto get_block; handle = ext3_journal_start(inode, DIO_CREDITS); if (IS_ERR(handle)) ret = PTR_ERR(handle); @@ -2998,7 +3000,13 @@ int ext3_setattr(struct dentry *dentry, struct iattr *attr) if (attr->ia_valid & ATTR_GID) inode->i_gid = attr->ia_gid; error = ext3_mark_inode_dirty(handle, inode); - ext3_journal_stop(handle); + if (error) { + ext3_journal_stop(handle); + goto err_out; + } + error = ext3_journal_stop(handle); + if (error) + goto err_out; } if (S_ISREG(inode->i_mode) && @@ -3016,7 +3024,9 @@ int ext3_setattr(struct dentry *dentry, struct iattr *attr) rc = ext3_mark_inode_dirty(handle, inode); if (!error) error = rc; - ext3_journal_stop(handle); + rc = ext3_journal_stop(handle); + if (!error) + error = rc; } rc = inode_setattr(inode, attr); @@ -3231,7 +3241,7 @@ int ext3_change_inode_journal_flag(struct inode *inode, int val) { journal_t *journal; handle_t *handle; - int err; + int err, err2; /* * We have to be very careful here: changing a data block's @@ -3274,7 +3284,9 @@ int ext3_change_inode_journal_flag(struct inode *inode, int val) err = ext3_mark_inode_dirty(handle, inode); handle->h_sync = 1; - ext3_journal_stop(handle); + err2 = ext3_journal_stop(handle); + if (!err) + err = err2; ext3_std_error(inode->i_sb, err); return err; diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c index
[PATCH] mm: be sure to trim blocks after direct_io has failed
This is updated version of patch aimed to fix direct_io error handling issue i've previously sent 2 wheeks ago. If you don't like anything in this patch plese let me know. Changes: - comments added. I think now it is clearly describe things. - patch prepared against 2.6.20-mm2 How this patch tested: - LTP test, and other readv/writev op tests. - fsstress test. - manual direct_io tests. Log: - Move common segment checks to separate helper function. - Trim off blocks after generic_file_direct_write() has failed. - Update out of date comments about direct_io locking rules. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- mm/filemap.c | 107 +++-- 1 files changed, 66 insertions(+), 41 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index a9284c2..c81184c 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1147,6 +1147,38 @@ success: return size; } +/* + * Performs necessary checks before doing a write + * + * Adjust number of segments and amount of bytes to write. + * Returns appropriate error code that caller should return or + * zero in case that write should be allowed. + */ +int generic_segment_checks(const struct iovec *iov, unsigned long *nr_segs, + size_t *count, unsigned long access_flags) +{ + unsigned long seg; + for (seg = 0; seg < *nr_segs; seg++) { + const struct iovec *iv = [seg]; + + /* +* If any segment has a negative length, or the cumulative +* length ever wraps negative then return -EINVAL. +*/ + *count += iv->iov_len; + if (unlikely((ssize_t)(*count|iv->iov_len) < 0)) + return -EINVAL; + if (access_ok(access_flags, iv->iov_base, iv->iov_len)) + continue; + if (seg == 0) + return -EFAULT; + *nr_segs = seg; + *count -= iv->iov_len; /* This segment is no good */ + break; + } + return 0; +} + /** * generic_file_aio_read - generic filesystem read routine * @iocb: kernel I/O control block @@ -1168,24 +1200,9 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov, loff_t *ppos = >ki_pos; count = 0; - for (seg = 0; seg < nr_segs; seg++) { - const struct iovec *iv = [seg]; - - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - count += iv->iov_len; - if (unlikely((ssize_t)(count|iv->iov_len) < 0)) - return -EINVAL; - if (access_ok(VERIFY_WRITE, iv->iov_base, iv->iov_len)) - continue; - if (seg == 0) - return -EFAULT; - nr_segs = seg; - count -= iv->iov_len; /* This segment is no good */ - break; - } + retval = generic_segment_checks(iov, _segs, , VERIFY_WRITE); + if (retval) + return retval; /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */ if (filp->f_flags & O_DIRECT) { @@ -2080,8 +2097,9 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov, /* * Sync the fs metadata but not the minor inode changes and * of course not the data as we did direct DMA for the IO. -* i_mutex is held, which protects generic_osync_inode() from -* livelocking. AIO O_DIRECT ops attempt to sync metadata here. +* i_mutex may not being held, if so some specific locking +* ordering must protect generic_osync_inode() from livelocking. +* AIO O_DIRECT ops attempt to sync metadata here. */ if ((written >= 0 || written == -EIOCBQUEUED) && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) { @@ -2271,30 +2289,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov, size_t ocount; /* original count */ size_t count; /* after file limit checks */ struct inode*inode = mapping->host; - unsigned long seg; loff_t pos; ssize_t written; ssize_t err; ocount = 0; - for (seg = 0; seg < nr_segs; seg++) { - const struct iovec *iv = [seg]; - - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - ocount += iv->iov_len; - if (unlikely((ssize_t)(ocount|iv->iov_len) < 0)) - return -EINVAL; - if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len)) - continue; - if (seg == 0) -
[PATCH] mm: be sure to trim blocks after direct_io has failed
This is updated version of patch aimed to fix direct_io error handling issue i've previously sent 2 wheeks ago. If you don't like anything in this patch plese let me know. Changes: - comments added. I think now it is clearly describe things. - patch prepared against 2.6.20-mm2 How this patch tested: - LTP test, and other readv/writev op tests. - fsstress test. - manual direct_io tests. Log: - Move common segment checks to separate helper function. - Trim off blocks after generic_file_direct_write() has failed. - Update out of date comments about direct_io locking rules. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- mm/filemap.c | 107 +++-- 1 files changed, 66 insertions(+), 41 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index a9284c2..c81184c 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1147,6 +1147,38 @@ success: return size; } +/* + * Performs necessary checks before doing a write + * + * Adjust number of segments and amount of bytes to write. + * Returns appropriate error code that caller should return or + * zero in case that write should be allowed. + */ +int generic_segment_checks(const struct iovec *iov, unsigned long *nr_segs, + size_t *count, unsigned long access_flags) +{ + unsigned long seg; + for (seg = 0; seg *nr_segs; seg++) { + const struct iovec *iv = iov[seg]; + + /* +* If any segment has a negative length, or the cumulative +* length ever wraps negative then return -EINVAL. +*/ + *count += iv-iov_len; + if (unlikely((ssize_t)(*count|iv-iov_len) 0)) + return -EINVAL; + if (access_ok(access_flags, iv-iov_base, iv-iov_len)) + continue; + if (seg == 0) + return -EFAULT; + *nr_segs = seg; + *count -= iv-iov_len; /* This segment is no good */ + break; + } + return 0; +} + /** * generic_file_aio_read - generic filesystem read routine * @iocb: kernel I/O control block @@ -1168,24 +1200,9 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov, loff_t *ppos = iocb-ki_pos; count = 0; - for (seg = 0; seg nr_segs; seg++) { - const struct iovec *iv = iov[seg]; - - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - count += iv-iov_len; - if (unlikely((ssize_t)(count|iv-iov_len) 0)) - return -EINVAL; - if (access_ok(VERIFY_WRITE, iv-iov_base, iv-iov_len)) - continue; - if (seg == 0) - return -EFAULT; - nr_segs = seg; - count -= iv-iov_len; /* This segment is no good */ - break; - } + retval = generic_segment_checks(iov, nr_segs, count, VERIFY_WRITE); + if (retval) + return retval; /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */ if (filp-f_flags O_DIRECT) { @@ -2080,8 +2097,9 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov, /* * Sync the fs metadata but not the minor inode changes and * of course not the data as we did direct DMA for the IO. -* i_mutex is held, which protects generic_osync_inode() from -* livelocking. AIO O_DIRECT ops attempt to sync metadata here. +* i_mutex may not being held, if so some specific locking +* ordering must protect generic_osync_inode() from livelocking. +* AIO O_DIRECT ops attempt to sync metadata here. */ if ((written = 0 || written == -EIOCBQUEUED) ((file-f_flags O_SYNC) || IS_SYNC(inode))) { @@ -2271,30 +2289,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov, size_t ocount; /* original count */ size_t count; /* after file limit checks */ struct inode*inode = mapping-host; - unsigned long seg; loff_t pos; ssize_t written; ssize_t err; ocount = 0; - for (seg = 0; seg nr_segs; seg++) { - const struct iovec *iv = iov[seg]; - - /* -* If any segment has a negative length, or the cumulative -* length ever wraps negative then return -EINVAL. -*/ - ocount += iv-iov_len; - if (unlikely((ssize_t)(ocount|iv-iov_len) 0)) - return -EINVAL; - if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len)) - continue; - if (seg == 0) -
[PATCH] ecryptfs: check xattr operation support fix
- ecryptfs_write_inode_size_to_metadata() error code was ignored. - i_op-setxattr() must be supported by lower fs because used below. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- fs/ecryptfs/inode.c |6 +++--- fs/ecryptfs/mmap.c |3 ++- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index 27fd14a..9ccefad 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -168,9 +168,9 @@ static int grow_file(struct dentry *ecryptfs_dentry, struct file *lower_file, goto out; } i_size_write(inode, 0); - ecryptfs_write_inode_size_to_metadata(lower_file, lower_inode, inode, - ecryptfs_dentry, - ECRYPTFS_LOWER_I_MUTEX_NOT_HELD); + rc = ecryptfs_write_inode_size_to_metadata(lower_file, lower_inode, + inode, ecryptfs_dentry, + ECRYPTFS_LOWER_I_MUTEX_NOT_HELD); ecryptfs_inode_to_private(inode)-crypt_stat.flags |= ECRYPTFS_NEW_FILE; out: return rc; diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c index 1e5d2ba..416985f 100644 --- a/fs/ecryptfs/mmap.c +++ b/fs/ecryptfs/mmap.c @@ -491,7 +491,8 @@ static int ecryptfs_write_inode_size_to_xattr(struct inode *lower_inode, goto out; } lower_dentry = ecryptfs_dentry_to_lower(ecryptfs_dentry); - if (!lower_dentry-d_inode-i_op-getxattr) { + if (!lower_dentry-d_inode-i_op-getxattr || + !lower_dentry-d_inode-i_op-setxattr) { printk(KERN_WARNING No support for setting xattr in lower filesystem\n); rc = -ENOSYS; -- 1.5.0.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH][RFC] ext3: Handle ext[34]_journal_stop() failure
Where are many places where _journal_stop() return code wasn't checked. Off cause _journal_stop() failed very rarely (and usually with fatal consequences), but this does'n meen it should not be checked. For example most retry loops looks like follows: ext3_journal_stop(handle); if (err == -ENOSPC ext3_should_retry_alloc(dir-i_sb, retries)) goto retry; It is realy insane do retry if ext3_journal_stop() has failed, because this may increase damage in case of real problem. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- fs/ext3/acl.c | 12 fs/ext3/inode.c | 22 +- fs/ext3/ioctl.c | 12 fs/ext3/namei.c | 48 fs/ext3/xattr.c |4 ++-- fs/ext4/acl.c | 12 fs/ext4/inode.c | 22 +- fs/ext4/ioctl.c | 12 fs/ext4/namei.c | 48 fs/ext4/xattr.c |4 ++-- 10 files changed, 134 insertions(+), 62 deletions(-) diff --git a/fs/ext3/acl.c b/fs/ext3/acl.c index 1e5038d..bbf4d8a 100644 --- a/fs/ext3/acl.c +++ b/fs/ext3/acl.c @@ -375,7 +375,7 @@ int ext3_acl_chmod(struct inode *inode) { struct posix_acl *acl, *clone; -int error; + int error, error2; if (S_ISLNK(inode-i_mode)) return -EOPNOTSUPP; @@ -402,7 +402,9 @@ ext3_acl_chmod(struct inode *inode) goto out; } error = ext3_set_acl(handle, inode, ACL_TYPE_ACCESS, clone); - ext3_journal_stop(handle); + error2 = ext3_journal_stop(handle); + if (error2 (!error || error == -ENOSPC)) + error = error2; if (error == -ENOSPC ext3_should_retry_alloc(inode-i_sb, retries)) goto retry; @@ -485,7 +487,7 @@ ext3_xattr_set_acl(struct inode *inode, int type, const void *value, { handle_t *handle; struct posix_acl *acl; - int error, retries = 0; + int error, error2, retries = 0; if (!test_opt(inode-i_sb, POSIX_ACL)) return -EOPNOTSUPP; @@ -509,7 +511,9 @@ retry: if (IS_ERR(handle)) return PTR_ERR(handle); error = ext3_set_acl(handle, inode, type, acl); - ext3_journal_stop(handle); + error2 = ext3_journal_stop(handle); + if (error2 (!error || error == -ENOSPC)) + error = error2; if (error == -ENOSPC ext3_should_retry_alloc(inode-i_sb, retries)) goto retry; diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c index 816f0c5..68d1c71 100644 --- a/fs/ext3/inode.c +++ b/fs/ext3/inode.c @@ -962,7 +962,9 @@ static int ext3_get_block(struct inode *inode, sector_t iblock, * Huge direct-io writes can hold off commits for long * periods of time. Let this commit run. */ - ext3_journal_stop(handle); + ret = ext3_journal_stop(handle); + if (ret) + goto get_block; handle = ext3_journal_start(inode, DIO_CREDITS); if (IS_ERR(handle)) ret = PTR_ERR(handle); @@ -2998,7 +3000,13 @@ int ext3_setattr(struct dentry *dentry, struct iattr *attr) if (attr-ia_valid ATTR_GID) inode-i_gid = attr-ia_gid; error = ext3_mark_inode_dirty(handle, inode); - ext3_journal_stop(handle); + if (error) { + ext3_journal_stop(handle); + goto err_out; + } + error = ext3_journal_stop(handle); + if (error) + goto err_out; } if (S_ISREG(inode-i_mode) @@ -3016,7 +3024,9 @@ int ext3_setattr(struct dentry *dentry, struct iattr *attr) rc = ext3_mark_inode_dirty(handle, inode); if (!error) error = rc; - ext3_journal_stop(handle); + rc = ext3_journal_stop(handle); + if (!error) + error = rc; } rc = inode_setattr(inode, attr); @@ -3231,7 +3241,7 @@ int ext3_change_inode_journal_flag(struct inode *inode, int val) { journal_t *journal; handle_t *handle; - int err; + int err, err2; /* * We have to be very careful here: changing a data block's @@ -3274,7 +3284,9 @@ int ext3_change_inode_journal_flag(struct inode *inode, int val) err = ext3_mark_inode_dirty(handle, inode); handle-h_sync = 1; - ext3_journal_stop(handle); + err2 = ext3_journal_stop(handle); + if (!err) + err = err2; ext3_std_error(inode-i_sb, err); return err; diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c index
[PATCH] ecryptfs: lower root result must be adirectory
patch against lastest mm tree. - Currently after path_lookup succeed we dot't have any guarantie what it is DIR. This must be explicitly demanded. - path_lookup can't return negative dentry, So inode check is useless. Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c index 812427e..fc4a3a2 100644 --- a/fs/ecryptfs/main.c +++ b/fs/ecryptfs/main.c @@ -484,18 +484,12 @@ static int ecryptfs_read_super(struct super_block *sb, const char *dev_name) struct vfsmount *lower_mnt; memset(, 0, sizeof(struct nameidata)); - rc = path_lookup(dev_name, LOOKUP_FOLLOW, ); + rc = path_lookup(dev_name, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, ); if (rc) { ecryptfs_printk(KERN_WARNING, "path_lookup() failed\n"); goto out; } lower_root = nd.dentry; - if (!lower_root->d_inode) { - ecryptfs_printk(KERN_WARNING, - "No directory to interpose on\n"); - rc = -ENOENT; - goto out_free; - } lower_mnt = nd.mnt; ecryptfs_set_superblock_lower(sb, lower_root->d_sb); sb->s_maxbytes = lower_root->d_sb->s_maxbytes;
[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() [resend]
Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- drivers/net/sk98lin/skge.c | 20 +++- 1 files changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c index e94ab25..eea753a 100644 --- a/drivers/net/sk98lin/skge.c +++ b/drivers/net/sk98lin/skge.c @@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *pdev) pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + ret = pci_enable_device(pdev); + if (ret) { + printk(KERN_WARNING "sk98lin: unable to enable device %s " + "in resume\n", dev->name); + goto err_out; + } pci_set_master(pdev); if (pAC->GIni.GIMacsFound == 2) ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev); @@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *pdev) ret = request_irq(dev->irq, SkGeIsrOnePort, IRQF_SHARED, "sk98lin", dev); if (ret) { printk(KERN_WARNING "sk98lin: unable to acquire IRQ %d\n", dev->irq); - pAC->AllocFlag &= ~SK_ALLOC_IRQ; - dev->irq = 0; - pci_disable_device(pdev); - return -EBUSY; + ret = -EBUSY; + goto err_out_disable_pdev; } netif_device_attach(dev); @@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *pdev) } return 0; + +err_out_disable_pdev: + pci_disable_device(pdev); +err_out: + pAC->AllocFlag &= ~SK_ALLOC_IRQ; + dev->irq = 0; + return ret; } #else #define skge_suspend NULL -- 1.4.4.4 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ecryptfs: lower root result must be adirectory
patch against lastest mm tree. - Currently after path_lookup succeed we dot't have any guarantie what it is DIR. This must be explicitly demanded. - path_lookup can't return negative dentry, So inode check is useless. Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c index 812427e..fc4a3a2 100644 --- a/fs/ecryptfs/main.c +++ b/fs/ecryptfs/main.c @@ -484,18 +484,12 @@ static int ecryptfs_read_super(struct super_block *sb, const char *dev_name) struct vfsmount *lower_mnt; memset(nd, 0, sizeof(struct nameidata)); - rc = path_lookup(dev_name, LOOKUP_FOLLOW, nd); + rc = path_lookup(dev_name, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, nd); if (rc) { ecryptfs_printk(KERN_WARNING, path_lookup() failed\n); goto out; } lower_root = nd.dentry; - if (!lower_root-d_inode) { - ecryptfs_printk(KERN_WARNING, - No directory to interpose on\n); - rc = -ENOENT; - goto out_free; - } lower_mnt = nd.mnt; ecryptfs_set_superblock_lower(sb, lower_root-d_sb); sb-s_maxbytes = lower_root-d_sb-s_maxbytes;
[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() [resend]
Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/net/sk98lin/skge.c | 20 +++- 1 files changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c index e94ab25..eea753a 100644 --- a/drivers/net/sk98lin/skge.c +++ b/drivers/net/sk98lin/skge.c @@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *pdev) pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + ret = pci_enable_device(pdev); + if (ret) { + printk(KERN_WARNING sk98lin: unable to enable device %s + in resume\n, dev-name); + goto err_out; + } pci_set_master(pdev); if (pAC-GIni.GIMacsFound == 2) ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, dev); @@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *pdev) ret = request_irq(dev-irq, SkGeIsrOnePort, IRQF_SHARED, sk98lin, dev); if (ret) { printk(KERN_WARNING sk98lin: unable to acquire IRQ %d\n, dev-irq); - pAC-AllocFlag = ~SK_ALLOC_IRQ; - dev-irq = 0; - pci_disable_device(pdev); - return -EBUSY; + ret = -EBUSY; + goto err_out_disable_pdev; } netif_device_attach(dev); @@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *pdev) } return 0; + +err_out_disable_pdev: + pci_disable_device(pdev); +err_out: + pAC-AllocFlag = ~SK_ALLOC_IRQ; + dev-irq = 0; + return ret; } #else #define skge_suspend NULL -- 1.4.4.4 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/3] pcmcia: Handle request_irq() failure while opening device
Monakhov Dmitriy <[EMAIL PROTECTED]> writes: > Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> > --- > drivers/net/pcmcia/axnet_cs.c |8 +++- > drivers/net/pcmcia/pcnet_cs.c |8 +++- > 2 files changed, 14 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/pcmcia/axnet_cs.c b/drivers/net/pcmcia/axnet_cs.c > index 6139048..9b57bab 100644 > --- a/drivers/net/pcmcia/axnet_cs.c > +++ b/drivers/net/pcmcia/axnet_cs.c > @@ -523,6 +523,7 @@ static int axnet_open(struct net_device *dev) > { > axnet_dev_t *info = PRIV(dev); > struct pcmcia_device *link = info->p_dev; > +int err = 0; > > DEBUG(2, "axnet_open('%s')\n", dev->name); > > @@ -531,7 +532,12 @@ static int axnet_open(struct net_device *dev) > > link->open++; > > -request_irq(dev->irq, ei_irq_wrapper, IRQF_SHARED, "axnet_cs", dev); > +err = request_irq(dev->irq, ei_irq_wrapper, IRQF_SHARED, "axnet_cs", > dev); > +if (err) { > + printk("axnet_cs: request_irq() failed %s\n", dev->name); > + link->open--; > + return -EBUSY; As David Miller noted it is better to return exact error code from request_irq(), truly most net_device->open() functions return error code without changing. The updated patch version following: Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- drivers/net/pcmcia/axnet_cs.c |8 +++- drivers/net/pcmcia/pcnet_cs.c |8 +++- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/net/pcmcia/axnet_cs.c b/drivers/net/pcmcia/axnet_cs.c index 6139048..9b57bab 100644 --- a/drivers/net/pcmcia/axnet_cs.c +++ b/drivers/net/pcmcia/axnet_cs.c @@ -523,6 +523,7 @@ static int axnet_open(struct net_device *dev) { axnet_dev_t *info = PRIV(dev); struct pcmcia_device *link = info->p_dev; +int err = 0; DEBUG(2, "axnet_open('%s')\n", dev->name); @@ -531,7 +532,12 @@ static int axnet_open(struct net_device *dev) link->open++; -request_irq(dev->irq, ei_irq_wrapper, IRQF_SHARED, "axnet_cs", dev); +err = request_irq(dev->irq, ei_irq_wrapper, IRQF_SHARED, "axnet_cs", dev); +if (err) { + printk("axnet_cs: request_irq() failed %s\n", dev->name); + link->open--; + return err; +} info->link_status = 0x00; init_timer(>watchdog); diff --git a/drivers/net/pcmcia/pcnet_cs.c b/drivers/net/pcmcia/pcnet_cs.c index d88e9b2..13923ea 100644 --- a/drivers/net/pcmcia/pcnet_cs.c +++ b/drivers/net/pcmcia/pcnet_cs.c @@ -962,6 +962,7 @@ static int pcnet_open(struct net_device *dev) { pcnet_dev_t *info = PRIV(dev); struct pcmcia_device *link = info->p_dev; +int err = 0; DEBUG(2, "pcnet_open('%s')\n", dev->name); @@ -971,7 +972,12 @@ static int pcnet_open(struct net_device *dev) link->open++; set_misc_reg(dev); -request_irq(dev->irq, ei_irq_wrapper, IRQF_SHARED, dev_info, dev); +err = request_irq(dev->irq, ei_irq_wrapper, IRQF_SHARED, dev_info, dev); +if (err) { + printk("pcnet_open: request_irq() failed %s\n", dev->name); + link->open--; + return err; +} info->phy_id = info->eth_phy; info->link_status = 0x00; -- 1.5.0.1 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/3] pcmcia: Handle request_irq() failure while opening device
Monakhov Dmitriy [EMAIL PROTECTED] writes: Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/net/pcmcia/axnet_cs.c |8 +++- drivers/net/pcmcia/pcnet_cs.c |8 +++- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/net/pcmcia/axnet_cs.c b/drivers/net/pcmcia/axnet_cs.c index 6139048..9b57bab 100644 --- a/drivers/net/pcmcia/axnet_cs.c +++ b/drivers/net/pcmcia/axnet_cs.c @@ -523,6 +523,7 @@ static int axnet_open(struct net_device *dev) { axnet_dev_t *info = PRIV(dev); struct pcmcia_device *link = info-p_dev; +int err = 0; DEBUG(2, axnet_open('%s')\n, dev-name); @@ -531,7 +532,12 @@ static int axnet_open(struct net_device *dev) link-open++; -request_irq(dev-irq, ei_irq_wrapper, IRQF_SHARED, axnet_cs, dev); +err = request_irq(dev-irq, ei_irq_wrapper, IRQF_SHARED, axnet_cs, dev); +if (err) { + printk(axnet_cs: request_irq() failed %s\n, dev-name); + link-open--; + return -EBUSY; As David Miller noted it is better to return exact error code from request_irq(), truly most net_device-open() functions return error code without changing. The updated patch version following: Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/net/pcmcia/axnet_cs.c |8 +++- drivers/net/pcmcia/pcnet_cs.c |8 +++- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/net/pcmcia/axnet_cs.c b/drivers/net/pcmcia/axnet_cs.c index 6139048..9b57bab 100644 --- a/drivers/net/pcmcia/axnet_cs.c +++ b/drivers/net/pcmcia/axnet_cs.c @@ -523,6 +523,7 @@ static int axnet_open(struct net_device *dev) { axnet_dev_t *info = PRIV(dev); struct pcmcia_device *link = info-p_dev; +int err = 0; DEBUG(2, axnet_open('%s')\n, dev-name); @@ -531,7 +532,12 @@ static int axnet_open(struct net_device *dev) link-open++; -request_irq(dev-irq, ei_irq_wrapper, IRQF_SHARED, axnet_cs, dev); +err = request_irq(dev-irq, ei_irq_wrapper, IRQF_SHARED, axnet_cs, dev); +if (err) { + printk(axnet_cs: request_irq() failed %s\n, dev-name); + link-open--; + return err; +} info-link_status = 0x00; init_timer(info-watchdog); diff --git a/drivers/net/pcmcia/pcnet_cs.c b/drivers/net/pcmcia/pcnet_cs.c index d88e9b2..13923ea 100644 --- a/drivers/net/pcmcia/pcnet_cs.c +++ b/drivers/net/pcmcia/pcnet_cs.c @@ -962,6 +962,7 @@ static int pcnet_open(struct net_device *dev) { pcnet_dev_t *info = PRIV(dev); struct pcmcia_device *link = info-p_dev; +int err = 0; DEBUG(2, pcnet_open('%s')\n, dev-name); @@ -971,7 +972,12 @@ static int pcnet_open(struct net_device *dev) link-open++; set_misc_reg(dev); -request_irq(dev-irq, ei_irq_wrapper, IRQF_SHARED, dev_info, dev); +err = request_irq(dev-irq, ei_irq_wrapper, IRQF_SHARED, dev_info, dev); +if (err) { + printk(pcnet_open: request_irq() failed %s\n, dev-name); + link-open--; + return err; +} info-phy_id = info-eth_phy; info-link_status = 0x00; -- 1.5.0.1 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] libata: handle ata_pci_device_do_resume() failure while resuming (v2)
Randy Dunlap <[EMAIL PROTECTED]> writes: > On Sat, 24 Feb 2007 00:43:18 +0300 Dmitriy Monakhov wrote: > >> Since commit:553c4aa630af7bc885e056d0436e4eb7f238579b >> ata_pci_device_do_resume() can return error code, all callers was updated >> except this one. >> >> Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> >> --- >> drivers/ata/sata_inic162x.c |5 - >> 1 files changed, 4 insertions(+), 1 deletions(-) >> >> diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c >> index 31b636f..7933043 100644 >> --- a/drivers/ata/sata_inic162x.c >> +++ b/drivers/ata/sata_inic162x.c >> @@ -639,7 +639,10 @@ static int inic_pci_device_resume(struct pci_dev *pdev) >> void __iomem *mmio_base = host->iomap[MMIO_BAR]; >> int rc; >> >> -ata_pci_device_do_resume(pdev); >> +rc = ata_pci_device_do_resume(pdev); >> +if (rc) { >> +return rc; > > Either (a) don't use the braces when they are not needed, or > (b) is it possible to add a meaningful message there, or is that > done elsewhere? True. Thanks for correcting me here. It is't realy necessary print err message because ata_pci_device_do_resume() already done this. So just remove braces. Since commit:553c4aa630af7bc885e056d0436e4eb7f238579b ata_pci_device_do_resume() can return error code, all callers was updated except this one. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c index 31b636f..18ac665 100644 --- a/drivers/ata/sata_inic162x.c +++ b/drivers/ata/sata_inic162x.c @@ -639,7 +639,9 @@ static int inic_pci_device_resume(struct pci_dev *pdev) void __iomem *mmio_base = host->iomap[MMIO_BAR]; int rc; - ata_pci_device_do_resume(pdev); + rc = ata_pci_device_do_resume(pdev); + if (rc) + return rc; if (pdev->dev.power.power_state.event == PM_EVENT_SUSPEND) { printk("XXX\n");
[PATCH] libata: handle ata_pci_device_do_resume() failure while resuming (v2)
Randy Dunlap [EMAIL PROTECTED] writes: On Sat, 24 Feb 2007 00:43:18 +0300 Dmitriy Monakhov wrote: Since commit:553c4aa630af7bc885e056d0436e4eb7f238579b ata_pci_device_do_resume() can return error code, all callers was updated except this one. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/ata/sata_inic162x.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c index 31b636f..7933043 100644 --- a/drivers/ata/sata_inic162x.c +++ b/drivers/ata/sata_inic162x.c @@ -639,7 +639,10 @@ static int inic_pci_device_resume(struct pci_dev *pdev) void __iomem *mmio_base = host-iomap[MMIO_BAR]; int rc; -ata_pci_device_do_resume(pdev); +rc = ata_pci_device_do_resume(pdev); +if (rc) { +return rc; Either (a) don't use the braces when they are not needed, or (b) is it possible to add a meaningful message there, or is that done elsewhere? True. Thanks for correcting me here. It is't realy necessary print err message because ata_pci_device_do_resume() already done this. So just remove braces. Since commit:553c4aa630af7bc885e056d0436e4eb7f238579b ata_pci_device_do_resume() can return error code, all callers was updated except this one. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c index 31b636f..18ac665 100644 --- a/drivers/ata/sata_inic162x.c +++ b/drivers/ata/sata_inic162x.c @@ -639,7 +639,9 @@ static int inic_pci_device_resume(struct pci_dev *pdev) void __iomem *mmio_base = host-iomap[MMIO_BAR]; int rc; - ata_pci_device_do_resume(pdev); + rc = ata_pci_device_do_resume(pdev); + if (rc) + return rc; if (pdev-dev.power.power_state.event == PM_EVENT_SUSPEND) { printk(XXX\n);
Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume()
Dmitriy Monakhov <[EMAIL PROTECTED]> writes: > This thread looks dead but issue was't fixed. > > Jiri Kosina <[EMAIL PROTECTED]> writes: >>> > - pci_enable_device(pdev); >>> > + ret = pci_enable_device(pdev); >>> > + if (ret) { >>> > + printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during >>> > resume\n", >>> > + dev->name); >>> > + unregister_netdev(dev); >>> This looks rather wrong - skge_exit() will run unregister_netdev() again. >> >> You are of course right (the problem was also spotted by Russell King). >> This I believe is the correct one for the sk98lin case. >> >> [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device() >> >> add check of return value to _resume() function of sk98lin driver. >> >> Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]> >> >> --- >> >> --- a/drivers/net/sk98lin/skge.c >> +++ b/drivers/net/sk98lin/skge.c >> @@ -5070,7 +5070,12 @@ static int skge_resume(struct pci_dev *p >> >> pci_set_power_state(pdev, PCI_D0); >> pci_restore_state(pdev); >> -pci_enable_device(pdev); >> +ret = pci_enable_device(pdev); >> +if (ret) { >> +printk(KERN_WARNING "sk98lin: unable to enable device %s in >> resume\n", >> +dev->name); >> +goto out_err; >> +} > [snip] >> +out_err: >> +pAC->AllocFlag &= ~SK_ALLOC_IRQ; >> +dev->irq = 0; >> +pci_disable_device(pdev); > <<<<< Ok what happend if we jump here right after pci_disable_device() has OOPs.Of course i mean pci_enable_device() here^ I'm sorry about this. (Option complete_word not always good, some times brain work required too :) ) So correct comment looks like: <<<<< Ok what happend if we jump here right after pci_enable_device() has failed, but pci_disable_device() was called anyway, this is wrong and may be fatal because pdev->enable_cnt may becomes negative. >> + >> +return ret; >> + >> } >> #else >> #define skge_suspend NULL > This is reworked Jiri's patch: > > [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() > > Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> > --- > drivers/net/sk98lin/skge.c | 20 +++- > 1 files changed, 15 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c > index e94ab25..eea753a 100644 > --- a/drivers/net/sk98lin/skge.c > +++ b/drivers/net/sk98lin/skge.c > @@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *pdev) > > pci_set_power_state(pdev, PCI_D0); > pci_restore_state(pdev); > - pci_enable_device(pdev); > + ret = pci_enable_device(pdev); > + if (ret) { > + printk(KERN_WARNING "sk98lin: unable to enable device %s " > + "in resume\n", dev->name); > + goto err_out; > + } > pci_set_master(pdev); > if (pAC->GIni.GIMacsFound == 2) > ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", > dev); > @@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *pdev) > ret = request_irq(dev->irq, SkGeIsrOnePort, IRQF_SHARED, > "sk98lin", dev); > if (ret) { > printk(KERN_WARNING "sk98lin: unable to acquire IRQ %d\n", > dev->irq); > - pAC->AllocFlag &= ~SK_ALLOC_IRQ; > - dev->irq = 0; > - pci_disable_device(pdev); > - return -EBUSY; > + ret = -EBUSY; > + goto err_out_disable_pdev; > } > > netif_device_attach(dev); > @@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *pdev) > } > > return 0; > + > +err_out_disable_pdev: > + pci_disable_device(pdev); > +err_out: > + pAC->AllocFlag &= ~SK_ALLOC_IRQ; > + dev->irq = 0; > + return ret; > } > #else > #define skge_suspend NULL > -- > 1.4.4.4 > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume()
Dmitriy Monakhov [EMAIL PROTECTED] writes: This thread looks dead but issue was't fixed. Jiri Kosina [EMAIL PROTECTED] writes: - pci_enable_device(pdev); + ret = pci_enable_device(pdev); + if (ret) { + printk(KERN_ERR sk98lin: Cannot enable PCI device %s during resume\n, + dev-name); + unregister_netdev(dev); This looks rather wrong - skge_exit() will run unregister_netdev() again. You are of course right (the problem was also spotted by Russell King). This I believe is the correct one for the sk98lin case. [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device() add check of return value to _resume() function of sk98lin driver. Signed-off-by: Jiri Kosina [EMAIL PROTECTED] --- --- a/drivers/net/sk98lin/skge.c +++ b/drivers/net/sk98lin/skge.c @@ -5070,7 +5070,12 @@ static int skge_resume(struct pci_dev *p pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); -pci_enable_device(pdev); +ret = pci_enable_device(pdev); +if (ret) { +printk(KERN_WARNING sk98lin: unable to enable device %s in resume\n, +dev-name); +goto out_err; +} [snip] +out_err: +pAC-AllocFlag = ~SK_ALLOC_IRQ; +dev-irq = 0; +pci_disable_device(pdev); Ok what happend if we jump here right after pci_disable_device() has OOPs.Of course i mean pci_enable_device() here^ I'm sorry about this. (Option complete_word not always good, some times brain work required too :) ) So correct comment looks like: Ok what happend if we jump here right after pci_enable_device() has failed, but pci_disable_device() was called anyway, this is wrong and may be fatal because pdev-enable_cnt may becomes negative. + +return ret; + } #else #define skge_suspend NULL This is reworked Jiri's patch: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/net/sk98lin/skge.c | 20 +++- 1 files changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c index e94ab25..eea753a 100644 --- a/drivers/net/sk98lin/skge.c +++ b/drivers/net/sk98lin/skge.c @@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *pdev) pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + ret = pci_enable_device(pdev); + if (ret) { + printk(KERN_WARNING sk98lin: unable to enable device %s + in resume\n, dev-name); + goto err_out; + } pci_set_master(pdev); if (pAC-GIni.GIMacsFound == 2) ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, dev); @@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *pdev) ret = request_irq(dev-irq, SkGeIsrOnePort, IRQF_SHARED, sk98lin, dev); if (ret) { printk(KERN_WARNING sk98lin: unable to acquire IRQ %d\n, dev-irq); - pAC-AllocFlag = ~SK_ALLOC_IRQ; - dev-irq = 0; - pci_disable_device(pdev); - return -EBUSY; + ret = -EBUSY; + goto err_out_disable_pdev; } netif_device_attach(dev); @@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *pdev) } return 0; + +err_out_disable_pdev: + pci_disable_device(pdev); +err_out: + pAC-AllocFlag = ~SK_ALLOC_IRQ; + dev-irq = 0; + return ret; } #else #define skge_suspend NULL -- 1.4.4.4 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume()
This thread looks dead but issue was't fixed. Jiri Kosina <[EMAIL PROTECTED]> writes: >> > - pci_enable_device(pdev); >> > + ret = pci_enable_device(pdev); >> > + if (ret) { >> > + printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during >> > resume\n", >> > + dev->name); >> > + unregister_netdev(dev); >> This looks rather wrong - skge_exit() will run unregister_netdev() again. > > You are of course right (the problem was also spotted by Russell King). > This I believe is the correct one for the sk98lin case. > > [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device() > > add check of return value to _resume() function of sk98lin driver. > > Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]> > > --- > > --- a/drivers/net/sk98lin/skge.c > +++ b/drivers/net/sk98lin/skge.c > @@ -5070,7 +5070,12 @@ static int skge_resume(struct pci_dev *p > > pci_set_power_state(pdev, PCI_D0); > pci_restore_state(pdev); > - pci_enable_device(pdev); > + ret = pci_enable_device(pdev); > + if (ret) { > + printk(KERN_WARNING "sk98lin: unable to enable device %s in > resume\n", > + dev->name); > + goto out_err; > + } [snip] > +out_err: > + pAC->AllocFlag &= ~SK_ALLOC_IRQ; > + dev->irq = 0; > + pci_disable_device(pdev); < Ok what happend if we jump here right after pci_disable_device() has failed, but pci_disable_device() was called anyway, this is wrong and may be fatal because pdev->enable_cnt may becomes negative. > + > + return ret; > + > } > #else > #define skge_suspend NULL This is reworked Jiri's patch: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- drivers/net/sk98lin/skge.c | 20 +++- 1 files changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c index e94ab25..eea753a 100644 --- a/drivers/net/sk98lin/skge.c +++ b/drivers/net/sk98lin/skge.c @@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *pdev) pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + ret = pci_enable_device(pdev); + if (ret) { + printk(KERN_WARNING "sk98lin: unable to enable device %s " + "in resume\n", dev->name); + goto err_out; + } pci_set_master(pdev); if (pAC->GIni.GIMacsFound == 2) ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev); @@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *pdev) ret = request_irq(dev->irq, SkGeIsrOnePort, IRQF_SHARED, "sk98lin", dev); if (ret) { printk(KERN_WARNING "sk98lin: unable to acquire IRQ %d\n", dev->irq); - pAC->AllocFlag &= ~SK_ALLOC_IRQ; - dev->irq = 0; - pci_disable_device(pdev); - return -EBUSY; + ret = -EBUSY; + goto err_out_disable_pdev; } netif_device_attach(dev); @@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *pdev) } return 0; + +err_out_disable_pdev: + pci_disable_device(pdev); +err_out: + pAC->AllocFlag &= ~SK_ALLOC_IRQ; + dev->irq = 0; + return ret; } #else #define skge_suspend NULL -- 1.4.4.4 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] libata: handle ata_pci_device_do_resume() failure while resuming
Since commit:553c4aa630af7bc885e056d0436e4eb7f238579b ata_pci_device_do_resume() can return error code, all callers was updated except this one. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- drivers/ata/sata_inic162x.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c index 31b636f..7933043 100644 --- a/drivers/ata/sata_inic162x.c +++ b/drivers/ata/sata_inic162x.c @@ -639,7 +639,10 @@ static int inic_pci_device_resume(struct pci_dev *pdev) void __iomem *mmio_base = host->iomap[MMIO_BAR]; int rc; - ata_pci_device_do_resume(pdev); + rc = ata_pci_device_do_resume(pdev); + if (rc) { + return rc; + } if (pdev->dev.power.power_state.event == PM_EVENT_SUSPEND) { printk("XXX\n"); -- 1.4.4.4 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] MPT fusion: handle mpt_resume() failure while resuming
Since mpt-fusion-handle-pci-layer-error-on-resume.patch function mpt_resume() can return error code. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- drivers/message/fusion/mptscsih.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c index decadbe..96be1d6 100644 --- a/drivers/message/fusion/mptscsih.c +++ b/drivers/message/fusion/mptscsih.c @@ -1191,8 +1191,11 @@ mptscsih_resume(struct pci_dev *pdev) MPT_ADAPTER *ioc = pci_get_drvdata(pdev); struct Scsi_Host*host = ioc->sh; MPT_SCSI_HOST *hd; + int ret; - mpt_resume(pdev); + ret = mpt_resume(pdev); + if (ret) + return ret; if(!host) return 0; -- 1.4.4.4 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] 3c59x: Handle pci_enable_device() failure while resuming
Handle pci_enable_device() failure while resuming, we can safely exit here. Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]> --- drivers/net/3c59x.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c index 2b750bd..ea4a78f 100644 --- a/drivers/net/3c59x.c +++ b/drivers/net/3c59x.c @@ -821,11 +821,17 @@ static int vortex_resume(struct pci_dev *pdev) { struct net_device *dev = pci_get_drvdata(pdev); struct vortex_private *vp = netdev_priv(dev); + int err; if (dev && vp) { pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + err = pci_enable_device(pdev); + if (err) { + printk(KERN_WARNING "%s: Could not enable device \n", + dev->name); + return err; + } pci_set_master(pdev); if (request_irq(dev->irq, vp->full_bus_master_rx ? _interrupt : _interrupt, IRQF_SHARED, dev->name, dev)) { -- 1.4.4.4 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ecryptfs: handles AOP_TRUNCATED_PAGE better
- In fact we don't have to fail if AOP_TRUNCATED_PAGE was returned from prepare_write or commit_write. It is beter to retry attempt where it is possible. - Rearange ecryptfs_get_lower_page() error handling logic, make it more clean. Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]> --- fs/ecryptfs/mmap.c | 28 1 files changed, 16 insertions(+), 12 deletions(-) diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c index 842497d..b8fc1ef 100644 --- a/fs/ecryptfs/mmap.c +++ b/fs/ecryptfs/mmap.c @@ -447,6 +447,7 @@ static int ecryptfs_write_inode_size_to_header(struct file *lower_file, const struct address_space_operations *lower_a_ops; u64 file_size; +retry: header_page = grab_cache_page(lower_inode->i_mapping, 0); if (!header_page) { ecryptfs_printk(KERN_ERR, "grab_cache_page for " @@ -457,9 +458,10 @@ static int ecryptfs_write_inode_size_to_header(struct file *lower_file, lower_a_ops = lower_inode->i_mapping->a_ops; rc = lower_a_ops->prepare_write(lower_file, header_page, 0, 8); if (rc) { - if (rc == AOP_TRUNCATED_PAGE) + if (rc == AOP_TRUNCATED_PAGE) { ecryptfs_release_lower_page(header_page, 0); - else + goto retry; + } else ecryptfs_release_lower_page(header_page, 1); goto out; } @@ -474,9 +476,10 @@ static int ecryptfs_write_inode_size_to_header(struct file *lower_file, if (rc < 0) ecryptfs_printk(KERN_ERR, "Error commiting header page " "write\n"); - if (rc == AOP_TRUNCATED_PAGE) + if (rc == AOP_TRUNCATED_PAGE) { ecryptfs_release_lower_page(header_page, 0); - else + goto retry; + } else ecryptfs_release_lower_page(header_page, 1); lower_inode->i_mtime = lower_inode->i_ctime = CURRENT_TIME; mark_inode_dirty_sync(inode); @@ -565,6 +568,7 @@ int ecryptfs_get_lower_page(struct page **lower_page, struct inode *lower_inode, { int rc = 0; +retry: *lower_page = grab_cache_page(lower_inode->i_mapping, lower_page_index); if (!(*lower_page)) { rc = -EINVAL; @@ -578,18 +582,18 @@ int ecryptfs_get_lower_page(struct page **lower_page, struct inode *lower_inode, byte_offset, region_bytes); if (rc) { - ecryptfs_printk(KERN_ERR, "prepare_write for " + if (rc == AOP_TRUNCATED_PAGE) { + ecryptfs_release_lower_page(*lower_page, 0); + goto retry; + } else { + ecryptfs_printk(KERN_ERR, "prepare_write for " "lower_page_index = [0x%.16x] failed; rc = " "[%d]\n", lower_page_index, rc); - } -out: - if (rc && (*lower_page)) { - if (rc == AOP_TRUNCATED_PAGE) - ecryptfs_release_lower_page(*lower_page, 0); - else ecryptfs_release_lower_page(*lower_page, 1); - (*lower_page) = NULL; + (*lower_page) = NULL; + } } +out: return rc; } -- 1.4.4.4 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] MPT fusion: handle mpt_resume() failure while resuming
Since mpt-fusion-handle-pci-layer-error-on-resume.patch function mpt_resume() can return error code. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/message/fusion/mptscsih.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c index decadbe..96be1d6 100644 --- a/drivers/message/fusion/mptscsih.c +++ b/drivers/message/fusion/mptscsih.c @@ -1191,8 +1191,11 @@ mptscsih_resume(struct pci_dev *pdev) MPT_ADAPTER *ioc = pci_get_drvdata(pdev); struct Scsi_Host*host = ioc-sh; MPT_SCSI_HOST *hd; + int ret; - mpt_resume(pdev); + ret = mpt_resume(pdev); + if (ret) + return ret; if(!host) return 0; -- 1.4.4.4 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] libata: handle ata_pci_device_do_resume() failure while resuming
Since commit:553c4aa630af7bc885e056d0436e4eb7f238579b ata_pci_device_do_resume() can return error code, all callers was updated except this one. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/ata/sata_inic162x.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c index 31b636f..7933043 100644 --- a/drivers/ata/sata_inic162x.c +++ b/drivers/ata/sata_inic162x.c @@ -639,7 +639,10 @@ static int inic_pci_device_resume(struct pci_dev *pdev) void __iomem *mmio_base = host-iomap[MMIO_BAR]; int rc; - ata_pci_device_do_resume(pdev); + rc = ata_pci_device_do_resume(pdev); + if (rc) { + return rc; + } if (pdev-dev.power.power_state.event == PM_EVENT_SUSPEND) { printk(XXX\n); -- 1.4.4.4 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ecryptfs: handles AOP_TRUNCATED_PAGE better
- In fact we don't have to fail if AOP_TRUNCATED_PAGE was returned from prepare_write or commit_write. It is beter to retry attempt where it is possible. - Rearange ecryptfs_get_lower_page() error handling logic, make it more clean. Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED] --- fs/ecryptfs/mmap.c | 28 1 files changed, 16 insertions(+), 12 deletions(-) diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c index 842497d..b8fc1ef 100644 --- a/fs/ecryptfs/mmap.c +++ b/fs/ecryptfs/mmap.c @@ -447,6 +447,7 @@ static int ecryptfs_write_inode_size_to_header(struct file *lower_file, const struct address_space_operations *lower_a_ops; u64 file_size; +retry: header_page = grab_cache_page(lower_inode-i_mapping, 0); if (!header_page) { ecryptfs_printk(KERN_ERR, grab_cache_page for @@ -457,9 +458,10 @@ static int ecryptfs_write_inode_size_to_header(struct file *lower_file, lower_a_ops = lower_inode-i_mapping-a_ops; rc = lower_a_ops-prepare_write(lower_file, header_page, 0, 8); if (rc) { - if (rc == AOP_TRUNCATED_PAGE) + if (rc == AOP_TRUNCATED_PAGE) { ecryptfs_release_lower_page(header_page, 0); - else + goto retry; + } else ecryptfs_release_lower_page(header_page, 1); goto out; } @@ -474,9 +476,10 @@ static int ecryptfs_write_inode_size_to_header(struct file *lower_file, if (rc 0) ecryptfs_printk(KERN_ERR, Error commiting header page write\n); - if (rc == AOP_TRUNCATED_PAGE) + if (rc == AOP_TRUNCATED_PAGE) { ecryptfs_release_lower_page(header_page, 0); - else + goto retry; + } else ecryptfs_release_lower_page(header_page, 1); lower_inode-i_mtime = lower_inode-i_ctime = CURRENT_TIME; mark_inode_dirty_sync(inode); @@ -565,6 +568,7 @@ int ecryptfs_get_lower_page(struct page **lower_page, struct inode *lower_inode, { int rc = 0; +retry: *lower_page = grab_cache_page(lower_inode-i_mapping, lower_page_index); if (!(*lower_page)) { rc = -EINVAL; @@ -578,18 +582,18 @@ int ecryptfs_get_lower_page(struct page **lower_page, struct inode *lower_inode, byte_offset, region_bytes); if (rc) { - ecryptfs_printk(KERN_ERR, prepare_write for + if (rc == AOP_TRUNCATED_PAGE) { + ecryptfs_release_lower_page(*lower_page, 0); + goto retry; + } else { + ecryptfs_printk(KERN_ERR, prepare_write for lower_page_index = [0x%.16x] failed; rc = [%d]\n, lower_page_index, rc); - } -out: - if (rc (*lower_page)) { - if (rc == AOP_TRUNCATED_PAGE) - ecryptfs_release_lower_page(*lower_page, 0); - else ecryptfs_release_lower_page(*lower_page, 1); - (*lower_page) = NULL; + (*lower_page) = NULL; + } } +out: return rc; } -- 1.4.4.4 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] 3c59x: Handle pci_enable_device() failure while resuming
Handle pci_enable_device() failure while resuming, we can safely exit here. Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/net/3c59x.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c index 2b750bd..ea4a78f 100644 --- a/drivers/net/3c59x.c +++ b/drivers/net/3c59x.c @@ -821,11 +821,17 @@ static int vortex_resume(struct pci_dev *pdev) { struct net_device *dev = pci_get_drvdata(pdev); struct vortex_private *vp = netdev_priv(dev); + int err; if (dev vp) { pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + err = pci_enable_device(pdev); + if (err) { + printk(KERN_WARNING %s: Could not enable device \n, + dev-name); + return err; + } pci_set_master(pdev); if (request_irq(dev-irq, vp-full_bus_master_rx ? boomerang_interrupt : vortex_interrupt, IRQF_SHARED, dev-name, dev)) { -- 1.4.4.4 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume()
This thread looks dead but issue was't fixed. Jiri Kosina [EMAIL PROTECTED] writes: - pci_enable_device(pdev); + ret = pci_enable_device(pdev); + if (ret) { + printk(KERN_ERR sk98lin: Cannot enable PCI device %s during resume\n, + dev-name); + unregister_netdev(dev); This looks rather wrong - skge_exit() will run unregister_netdev() again. You are of course right (the problem was also spotted by Russell King). This I believe is the correct one for the sk98lin case. [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device() add check of return value to _resume() function of sk98lin driver. Signed-off-by: Jiri Kosina [EMAIL PROTECTED] --- --- a/drivers/net/sk98lin/skge.c +++ b/drivers/net/sk98lin/skge.c @@ -5070,7 +5070,12 @@ static int skge_resume(struct pci_dev *p pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + ret = pci_enable_device(pdev); + if (ret) { + printk(KERN_WARNING sk98lin: unable to enable device %s in resume\n, + dev-name); + goto out_err; + } [snip] +out_err: + pAC-AllocFlag = ~SK_ALLOC_IRQ; + dev-irq = 0; + pci_disable_device(pdev); Ok what happend if we jump here right after pci_disable_device() has failed, but pci_disable_device() was called anyway, this is wrong and may be fatal because pdev-enable_cnt may becomes negative. + + return ret; + } #else #define skge_suspend NULL This is reworked Jiri's patch: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED] --- drivers/net/sk98lin/skge.c | 20 +++- 1 files changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c index e94ab25..eea753a 100644 --- a/drivers/net/sk98lin/skge.c +++ b/drivers/net/sk98lin/skge.c @@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *pdev) pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + ret = pci_enable_device(pdev); + if (ret) { + printk(KERN_WARNING sk98lin: unable to enable device %s + in resume\n, dev-name); + goto err_out; + } pci_set_master(pdev); if (pAC-GIni.GIMacsFound == 2) ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, dev); @@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *pdev) ret = request_irq(dev-irq, SkGeIsrOnePort, IRQF_SHARED, sk98lin, dev); if (ret) { printk(KERN_WARNING sk98lin: unable to acquire IRQ %d\n, dev-irq); - pAC-AllocFlag = ~SK_ALLOC_IRQ; - dev-irq = 0; - pci_disable_device(pdev); - return -EBUSY; + ret = -EBUSY; + goto err_out_disable_pdev; } netif_device_attach(dev); @@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *pdev) } return 0; + +err_out_disable_pdev: + pci_disable_device(pdev); +err_out: + pAC-AllocFlag = ~SK_ALLOC_IRQ; + dev-irq = 0; + return ret; } #else #define skge_suspend NULL -- 1.4.4.4 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/