from:"Dmitriy Monakhov"

Re: iov_iter_fault_in_readable fix

2007-06-16 Thread Dmitriy Monakhov

On 18:31 Чтв 14 Июн , Christoph Hellwig wrote:
> On Wed, Jun 13, 2007 at 05:57:59PM +0400, Dmitriy Monakhov wrote:
> > Function prerform check for signgle region, with out respect to
> > segment nature of iovec, For example writev no longer works :)
> 
> Btw, could someone please start to collect all sniplets like this in
> a nice simple regression test suite?  If no one wants to start a new
> one we should probably just put it into xfsqa (which should be useable
> for other filesystems aswell despite the name)
I've prepared testcase (testcases/kernel/syscalls/writev/writev06.c) 
and sent it to ltp mailing list.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] deny partial write for loop dev fd

2007-06-16 Thread Dmitriy Monakhov

Partial write can be easily supported by LO_CRYPT_NONE mode, but
it is not easy in LO_CRYPT_CRYPTOAPI case, because of its block nature.
I don't know who still used cryptoapi, but theoretically it is possible.
So let's leave things as they are. Loop device doesn't support partial
write before Nick's "write_begin/write_end" patch set, and let's it 
behave the same way after.

Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>
---
 drivers/block/loop.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 4bab9b1..de122f3 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -244,10 +244,8 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
bio_vec *bvec,
 
ret = pagecache_write_end(file, mapping, pos, size, copied,
page, fsdata);
-   if (ret < 0)
+   if (ret < 0 || ret != copied)
goto fail;
-   if (ret < copied)
-   copied = ret;
 
if (unlikely(transfer_result))
goto fail;
-- 
1.5.2


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] deny partial write for loop dev fd

2007-06-16 Thread Dmitriy Monakhov

Partial write can be easily supported by LO_CRYPT_NONE mode, but
it is not easy in LO_CRYPT_CRYPTOAPI case, because of its block nature.
I don't know who still used cryptoapi, but theoretically it is possible.
So let's leave things as they are. Loop device doesn't support partial
write before Nick's write_begin/write_end patch set, and let's it 
behave the same way after.

Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]
---
 drivers/block/loop.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 4bab9b1..de122f3 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -244,10 +244,8 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
bio_vec *bvec,
 
ret = pagecache_write_end(file, mapping, pos, size, copied,
page, fsdata);
-   if (ret  0)
+   if (ret  0 || ret != copied)
goto fail;
-   if (ret  copied)
-   copied = ret;
 
if (unlikely(transfer_result))
goto fail;
-- 
1.5.2


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: iov_iter_fault_in_readable fix

2007-06-16 Thread Dmitriy Monakhov

On 18:31 Чтв 14 Июн , Christoph Hellwig wrote:
 On Wed, Jun 13, 2007 at 05:57:59PM +0400, Dmitriy Monakhov wrote:
  Function prerform check for signgle region, with out respect to
  segment nature of iovec, For example writev no longer works :)
 
 Btw, could someone please start to collect all sniplets like this in
 a nice simple regression test suite?  If no one wants to start a new
 one we should probably just put it into xfsqa (which should be useable
 for other filesystems aswell despite the name)
I've prepared testcase (testcases/kernel/syscalls/writev/writev06.c) 
and sent it to ltp mailing list.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ext4:fix unexpected error from ext4_reserve_global

2007-06-14 Thread Dmitriy Monakhov

I just cant belive my eyes then i saw this at the first time...
simple test: strace dd if=/dev/zero of=/mnt/file

open("/dev/zero", O_RDONLY) = 0
close(1)= 0
open("/mnt/test/file", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 1
read(0, "\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0"..., 512) = 512
read(0, "\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0"..., 512) = -1 ENOENT (No such fil
e or directory)

This strange error returned from ext4_reserve_global().
It's just typo because:
a) In fact this is 100% ENOSPC situation
b) simular function ext4_reserve_local() returns -ENOSPC

Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>
---
 fs/ext4/balloc.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 4d7bfd2..43ae8f8 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -1920,7 +1920,7 @@ int ext4_reserve_global(struct super_block *sb, int 
blocks)
 {
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct ext4_reservation_slot *rs;
-   int i, rc = -ENOENT;
+   int i, rc = -ENOSPC;
__u64 free = 0;
 
rs = sbi->s_reservation_slots;
-- 
1.5.2


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ext4:fix unexpected error from ext4_reserve_global

2007-06-14 Thread Dmitriy Monakhov

I just cant belive my eyes then i saw this at the first time...
simple test: strace dd if=/dev/zero of=/mnt/file

open(/dev/zero, O_RDONLY) = 0
close(1)= 0
open(/mnt/test/file, O_WRONLY|O_CREAT|O_TRUNC, 0666) = 1
read(0, \0\0\0\0\0\0\0\0\0..., 512) = 512
write(1, \0\0\0\0\0\0\0\0..., 512) = 512
read(0, \0\0\0\0\0\0\0\0\0..., 512) = 512
write(1, \0\0\0\0\0\0\0\0..., 512) = -1 ENOENT (No such fil
e or directory)

This strange error returned from ext4_reserve_global().
It's just typo because:
a) In fact this is 100% ENOSPC situation
b) simular function ext4_reserve_local() returns -ENOSPC

Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]
---
 fs/ext4/balloc.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 4d7bfd2..43ae8f8 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -1920,7 +1920,7 @@ int ext4_reserve_global(struct super_block *sb, int 
blocks)
 {
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct ext4_reservation_slot *rs;
-   int i, rc = -ENOENT;
+   int i, rc = -ENOSPC;
__u64 free = 0;
 
rs = sbi-s_reservation_slots;
-- 
1.5.2


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

2007-06-13 Thread Dmitriy Monakhov

On 13:43 Срд 13 Июн , Nick Piggin wrote:
> On Wed, Jun 13, 2007 at 05:40:05PM +0400, Dmitriy Monakhov wrote:
> > On 14:19 ?? 29 ?? , [EMAIL PROTECTED] wrote:
> > > 
> > > The patch titled
> > >  fs: introduce write_begin, write_end, and perform_write aops
> > > has been added to the -mm tree.  Its filename is
> > >  fs-introduce-write_begin-write_end-and-perform_write-aops.patch
> > > 
> > > *** Remember to use Documentation/SubmitChecklist when testing your code 
> > > ***
> > > 
> > > See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to 
> > > find
> > > out what to do about this
> > > 
> > > --
> > > Subject: fs: introduce write_begin, write_end, and perform_write aops
> > > From: Nick Piggin <[EMAIL PROTECTED]>
> > > 
> > > These are intended to replace prepare_writ eand commit_write with more
> > > flexible alternatives that are also able to avoid the buffered write
> > > deadlock problems efficiently (which prepare_write is unable to do).
> > > 
> > > [EMAIL PROTECTED]: API design contributions, code review and fixes]
> > > Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
> > > Signed-off-by: Mark Fasheh <[EMAIL PROTECTED]>
> > > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> > I've finaly find time to review Nick's "write_begin/write_end aop" patch 
> > set.
> > And i have some fixes and questions. My be i've missed somthing and it was 
> > already disscussed, but i cant find in LKML.
> 
> Thanks, that's really helpful.
> 
>  
> > 1) loop dev:
> > loop.c code itself is not perfect. In fact before Nick's patch
> > partial write was't possible. Assumption what write chunks are
> > always page aligned is realy weird ( see "index++" line).
> > Fixed by "new aop loop fix" patch
> 
> I think you're right, fix looks good.
> 
>  
> > 2)block_write_begin:
> > After we enter to block_write_begin with *pagep == NULL and
> > some page was grabed we remember this page in *pagep
> > And if __block_prepare_write() we have to clear *pagep , as 
> > it was before. Because this may confuse caller.
> > for example caller may have folowing code:
> > ret = block_write_begin(..., pagep,...)
> > if (ret && *pagep != NULL) {
> > unlock_page(*pagep);
> > page_cache_release(*pagep);
> > }
> > Fixed my "new aop block_write_begin fix" patch
> 
> I don't think the caller can rely on that if it returns failure.
> But that is more defensive I guess. Maybe setting it to 1 or
> so would catch abusers.
> 
>  
> > 3) __page_symlink:
> > Nick's patch add folowing code:
> > + err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE,
> > + AOP_FLAG_UNINTERRUPTIBLE, ,);
> > symlink now consume whole page. I have only one question "WHY???".
> > I don't see any advantages, but where are huge list of
> > dissdvantages:
> > a) fs with blksize == 1k and pagesize == 16k after this patch
> >waste more than 10x times disk space for nothing.
> > b) What happends if we want use fs with blksize == 4k on i386
> >after it was used by ia64 ??? (before this patch it was
> >possible).
One more visiable effect caused by wrong symlink size:
fsstress cause folowing error:

EXT3-fs unexpected failure: !buffer_revoked(bh); 
inconsistent data on disk
ext3_forget: aborting transaction: IO failure in __ext3_journal_revoke
ext3_abort called.

EXT3-fs error (device dm-4): ext3_forget: error -5 when attempting
revoke
Remounting filesystem read-only 
Aborting journal on device dm-4.
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
EXT3-fs error (device dm-4) in ext3_free_blocks_sb: Journal has aborted

journal commit I/O error
journal commit I/O error
EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has
aborted
EXT3-fs error (device dm-4) in ext3_truncate: IO failure
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has
aborted
EXT3-fs error (device dm-4) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has
aborted
EXT3-fs error (device dm-4) in ext3_de

Re: [patch] new aop loop fix

2007-06-13 Thread Dmitriy Monakhov

On 14:36 Срд 13 Июн , Hugh Dickins wrote:
> On Wed, 13 Jun 2007, Dmitriy Monakhov wrote:
> 
> > loop.c code itself is not perfect. In fact before Nick's patch
> > partial write was't possible. Assumption what write chunks are
> > always page aligned is realy weird ( see "index++" line).
> > 
> > Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>
> 
> I'm interested, because I'm trying to chase down an -mm bug which
> occasionally leaves me with 1k of zeroes where it shouldn't (in a
> 1k bsize ext2 looped over tmpfs).  The length of time for this to
> happen varies a lot, so bisection has been misleading: maybe the
> problem lies in Nick's patches, maybe it does not.
> 
> But I don't understand your fix below at all.  _If_ you need to
> change the handling of index, then you need to change the handling
> of offset too, don't you?
> 
> But what's wrong with how inded was handled anyway?  Yes, it might
> be being incremented at the bottom of the loop when we haven't
> reached the end of this page, but in that case we're not going
> round the loop again anyway: len will now be 0.  So no problem.
> 
> One of us is missing something: please enlighten me - thanks.
Yepp. You absolutely right, wrong patch was attached :(
Btw: Nick's patches broke  LO_CRYPT_XOR mode, but it is ok because
this feature was absolete and not used by anyone, am i right here?
> 
> Hugh
> 
> > 
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index 4bab9b1..8726da5 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, 
> > struct bio_vec *bvec,
> > int len, ret;
> >  
> > mutex_lock(>host->i_mutex);
> > -   index = pos >> PAGE_CACHE_SHIFT;
> > offset = pos & ((pgoff_t)PAGE_CACHE_SIZE - 1);
> > bv_offs = bvec->bv_offset;
> > len = bvec->bv_len;
> > @@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, 
> > struct bio_vec *bvec,
> > struct page *page;
> > void *fsdata;
> >  
> > +   index = pos >> PAGE_CACHE_SHIFT;
> > IV = ((sector_t)index << (PAGE_CACHE_SHIFT - 9))+(offset >> 9);
> > size = PAGE_CACHE_SIZE - offset;
> > if (size > len)
> > @@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, 
> > struct bio_vec *bvec,
> > bv_offs += copied;
> > len -= copied;
> > offset = 0;
> > -   index++;
> > pos += copied;
> > }
> > ret = 0;
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

iov_iter_fault_in_readable fix

2007-06-13 Thread Dmitriy Monakhov

Function prerform check for signgle region, with out respect to
segment nature of iovec, For example writev no longer works :)

/* TESTCASE BEGIN */
#include 
#include 
#include 
#include 
#include 
#include 
#define SIZE  (4096 * 2)
int main(int argc, char* argv[])
{   
char* ptr[4];
struct iovec iov[2];
int fd, ret;
ptr[0] = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
ptr[1] = mmap(NULL, SIZE, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
ptr[2] = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
ptr[3] = mmap(NULL, SIZE, PROT_NONE, 
MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);

iov[0].iov_base = ptr[0] + (SIZE -1);
iov[0].iov_len = 1;
memset(ptr[0], 1, SIZE);

iov[1].iov_base = ptr[2];
iov[1].iov_len = SIZE;
memset(ptr[2], 2, SIZE);

fd = open(argv[1], O_CREAT|O_RDWR|O_TRUNC, 0666);
ret = writev(fd, iov, sizeof(iov) / sizeof(struct iovec));
return 0;
}   
/* TESTCASE END*/
We will get folowing result:
writev(3, [{"\1", 1}, {"\2"..., 8192}], 2) = -1 EFAULT (Bad 
address)

this is hidden bug, and it was invisiable because _fault_in_readable
return value was ignored before. Lets iov_iter_fault_in_readable
perform checks for all segments.

Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>

diff --git a/include/linux/fs.h b/include/linux/fs.h
index fef19fc..7e025ea 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -433,7 +433,7 @@ size_t iov_iter_copy_from_user_atomic(struct page *page,
 size_t iov_iter_copy_from_user(struct page *page,
struct iov_iter *i, unsigned long offset, size_t bytes);
 void iov_iter_advance(struct iov_iter *i, size_t bytes);
-int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes);
+int iov_iter_fault_in_readable(struct iov_iter *i, size_t *bytes);
 size_t iov_iter_single_seg_count(struct iov_iter *i);
 
 static inline void iov_iter_init(struct iov_iter *i,
diff --git a/mm/filemap.c b/mm/filemap.c
index 8d59ed9..8600c3e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1817,10 +1817,32 @@ void iov_iter_advance(struct iov_iter *i, size_t bytes)
 }
 EXPORT_SYMBOL(iov_iter_advance);
 
-int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes)
+int iov_iter_fault_in_readable(struct iov_iter *i, size_t* bytes)
 {
-   char __user *buf = i->iov->iov_base + i->iov_offset;
-   return fault_in_pages_readable(buf, bytes);
+   size_t len = *bytes;
+   int ret;
+   if (likely(i->nr_segs == 1)) {
+   ret = fault_in_pages_readable(i->iov->iov_base, len);
+   if (ret)
+   *bytes = 0;
+   } else {
+   const struct iovec *iov = i->iov;
+   size_t base = i->iov_offset;
+   *bytes = 0;
+   while (len) {
+   int copy = min(len, iov->iov_len - base);
+   if ((ret = fault_in_pages_readable(iov->iov_base + 
base, copy)))
+   break;
+   *bytes += copy;
+   len -= copy;
+   base += copy;
+   if (iov->iov_len == base) {
+   iov++;
+   base = 0;
+   }
+   }
+   }
+   return ret; 
 }
 EXPORT_SYMBOL(iov_iter_fault_in_readable);
 
@@ -2110,7 +2132,7 @@ static ssize_t generic_perform_write_2copy(struct file 
*file,
 * to check that the address is actually valid, when atomic
 * usercopies are used, below.
 */
-   if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
+   if (unlikely(iov_iter_fault_in_readable(i, ) && !bytes)) {
status = -EFAULT;
break;
}
@@ -2284,7 +2306,7 @@ again:
 * to check that the address is actually valid, when atomic
 * usercopies are used, below.
 */
-   if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
+   if (unlikely(iov_iter_fault_in_readable(i, )&& !bytes)) {
status = -EFAULT;
break;
}


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] new aop block_write_begin fix

2007-06-13 Thread Dmitriy Monakhov

After we enter to block_write_begin with *pagep == NULL and
some page was grabed we remember this page in *pagep
And if __block_prepare_write() we have to clear *pagep , as 
it was before. Because this may confuse caller.
for example caller may have folowing code:
ret = block_write_begin(..., pagep,...)
if (ret && *pagep != NULL) {
unlock_page(*pagep);
page_cache_release(*pagep);
}
Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>

diff --git a/fs/buffer.c b/fs/buffer.c
index 07cd457..df933ba 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1982,6 +1982,7 @@ int block_write_begin(struct file *file, struct 
address_space *mapping,
if (ownpage) {
unlock_page(page);
page_cache_release(page);
+   *pagep = NULL;
 
/*
 * prepare_write() may have instantiated a few blocks 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] new aop loop fix

2007-06-13 Thread Dmitriy Monakhov

loop.c code itself is not perfect. In fact before Nick's patch
partial write was't possible. Assumption what write chunks are
always page aligned is realy weird ( see "index++" line).

Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 4bab9b1..8726da5 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
bio_vec *bvec,
int len, ret;
 
mutex_lock(>host->i_mutex);
-   index = pos >> PAGE_CACHE_SHIFT;
offset = pos & ((pgoff_t)PAGE_CACHE_SIZE - 1);
bv_offs = bvec->bv_offset;
len = bvec->bv_len;
@@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
bio_vec *bvec,
struct page *page;
void *fsdata;
 
+   index = pos >> PAGE_CACHE_SHIFT;
IV = ((sector_t)index << (PAGE_CACHE_SHIFT - 9))+(offset >> 9);
size = PAGE_CACHE_SIZE - offset;
if (size > len)
@@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
bio_vec *bvec,
bv_offs += copied;
len -= copied;
offset = 0;
-   index++;
pos += copied;
}
ret = 0;

> 
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

2007-06-13 Thread Dmitriy Monakhov

On 14:19 Втр 29 Май , [EMAIL PROTECTED] wrote:
> 
> The patch titled
>  fs: introduce write_begin, write_end, and perform_write aops
> has been added to the -mm tree.  Its filename is
>  fs-introduce-write_begin-write_end-and-perform_write-aops.patch
> 
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
> 
> See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
> out what to do about this
> 
> --
> Subject: fs: introduce write_begin, write_end, and perform_write aops
> From: Nick Piggin <[EMAIL PROTECTED]>
> 
> These are intended to replace prepare_write and commit_write with more
> flexible alternatives that are also able to avoid the buffered write
> deadlock problems efficiently (which prepare_write is unable to do).
> 
> [EMAIL PROTECTED]: API design contributions, code review and fixes]
> Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>
> Signed-off-by: Mark Fasheh <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
I've finaly find time to review Nick's "write_begin/write_end aop" patch set.
And i have some fixes and questions. My be i've missed somthing and it was 
already disscussed, but i cant find in LKML.

1) loop dev:
loop.c code itself is not perfect. In fact before Nick's patch
partial write was't possible. Assumption what write chunks are
always page aligned is realy weird ( see "index++" line).
Fixed by "new aop loop fix" patch

2)block_write_begin:
After we enter to block_write_begin with *pagep == NULL and
some page was grabed we remember this page in *pagep
And if __block_prepare_write() we have to clear *pagep , as 
it was before. Because this may confuse caller.
for example caller may have folowing code:
ret = block_write_begin(..., pagep,...)
if (ret && *pagep != NULL) {
unlock_page(*pagep);
page_cache_release(*pagep);
}
Fixed my "new aop block_write_begin fix" patch

3) __page_symlink:
Nick's patch add folowing code:
+ err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE,
+ AOP_FLAG_UNINTERRUPTIBLE, ,);
symlink now consume whole page. I have only one question "WHY???".
I don't see any advantages, but where are huge list of
dissdvantages:
a) fs with blksize == 1k and pagesize == 16k after this patch
   waste more than 10x times disk space for nothing.
b) What happends if we want use fs with blksize == 4k on i386
   after it was used by ia64 ??? (before this patch it was
   possible).

I dont prepare patch for this because i dont understand issue
witch Nick aimed to fix.

4) iov_iter_fault_in_readable:
Function prerform check for signgle region, with out respect to
segment nature of iovec, For example writev no longer works :) :
writev(3, [{"\1", 1}, {"\2"..., 4096}], 2) = -1 EFAULT (Bad address)
this hidden bug, and it was invisiable because _fault_in_readable
return value was ignored before. Lets iov_iter_fault_in_readable
perform checks for all segments.
Fixed by :"iov_iter_fault_in_readable fix"

5) ext3_write_end:
Before  write_begin/write_end patch set we have folowing locking
order:
stop_journal(handle);
unlock_page(page);
But now order is oposite:
unlock_page(page);
stop_journal(handle);
Can we got any race condition now? I'm not sure is it actual problem,
may be somebody cant describe this.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

2007-06-13 Thread Dmitriy Monakhov

On 14:19 Втр 29 Май , [EMAIL PROTECTED] wrote:
 
 The patch titled
  fs: introduce write_begin, write_end, and perform_write aops
 has been added to the -mm tree.  Its filename is
  fs-introduce-write_begin-write_end-and-perform_write-aops.patch
 
 *** Remember to use Documentation/SubmitChecklist when testing your code ***
 
 See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
 out what to do about this
 
 --
 Subject: fs: introduce write_begin, write_end, and perform_write aops
 From: Nick Piggin [EMAIL PROTECTED]
 
 These are intended to replace prepare_write and commit_write with more
 flexible alternatives that are also able to avoid the buffered write
 deadlock problems efficiently (which prepare_write is unable to do).
 
 [EMAIL PROTECTED]: API design contributions, code review and fixes]
 Signed-off-by: Nick Piggin [EMAIL PROTECTED]
 Signed-off-by: Mark Fasheh [EMAIL PROTECTED]
 Signed-off-by: Andrew Morton [EMAIL PROTECTED]
I've finaly find time to review Nick's write_begin/write_end aop patch set.
And i have some fixes and questions. My be i've missed somthing and it was 
already disscussed, but i cant find in LKML.

1) loop dev:
loop.c code itself is not perfect. In fact before Nick's patch
partial write was't possible. Assumption what write chunks are
always page aligned is realy weird ( see index++ line).
Fixed by new aop loop fix patch

2)block_write_begin:
After we enter to block_write_begin with *pagep == NULL and
some page was grabed we remember this page in *pagep
And if __block_prepare_write() we have to clear *pagep , as 
it was before. Because this may confuse caller.
for example caller may have folowing code:
ret = block_write_begin(..., pagep,...)
if (ret  *pagep != NULL) {
unlock_page(*pagep);
page_cache_release(*pagep);
}
Fixed my new aop block_write_begin fix patch

3) __page_symlink:
Nick's patch add folowing code:
+ err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE,
+ AOP_FLAG_UNINTERRUPTIBLE, page,fsdata);
symlink now consume whole page. I have only one question WHY???.
I don't see any advantages, but where are huge list of
dissdvantages:
a) fs with blksize == 1k and pagesize == 16k after this patch
   waste more than 10x times disk space for nothing.
b) What happends if we want use fs with blksize == 4k on i386
   after it was used by ia64 ??? (before this patch it was
   possible).

I dont prepare patch for this because i dont understand issue
witch Nick aimed to fix.

4) iov_iter_fault_in_readable:
Function prerform check for signgle region, with out respect to
segment nature of iovec, For example writev no longer works :) :
writev(3, [{\1, 1}, {\2..., 4096}], 2) = -1 EFAULT (Bad address)
this hidden bug, and it was invisiable because _fault_in_readable
return value was ignored before. Lets iov_iter_fault_in_readable
perform checks for all segments.
Fixed by :iov_iter_fault_in_readable fix

5) ext3_write_end:
Before  write_begin/write_end patch set we have folowing locking
order:
stop_journal(handle);
unlock_page(page);
But now order is oposite:
unlock_page(page);
stop_journal(handle);
Can we got any race condition now? I'm not sure is it actual problem,
may be somebody cant describe this.




-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] new aop loop fix

2007-06-13 Thread Dmitriy Monakhov

loop.c code itself is not perfect. In fact before Nick's patch
partial write was't possible. Assumption what write chunks are
always page aligned is realy weird ( see index++ line).

Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 4bab9b1..8726da5 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
bio_vec *bvec,
int len, ret;
 
mutex_lock(mapping-host-i_mutex);
-   index = pos  PAGE_CACHE_SHIFT;
offset = pos  ((pgoff_t)PAGE_CACHE_SIZE - 1);
bv_offs = bvec-bv_offset;
len = bvec-bv_len;
@@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
bio_vec *bvec,
struct page *page;
void *fsdata;
 
+   index = pos  PAGE_CACHE_SHIFT;
IV = ((sector_t)index  (PAGE_CACHE_SHIFT - 9))+(offset  9);
size = PAGE_CACHE_SIZE - offset;
if (size  len)
@@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, struct 
bio_vec *bvec,
bv_offs += copied;
len -= copied;
offset = 0;
-   index++;
pos += copied;
}
ret = 0;

 
 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] new aop block_write_begin fix

2007-06-13 Thread Dmitriy Monakhov

After we enter to block_write_begin with *pagep == NULL and
some page was grabed we remember this page in *pagep
And if __block_prepare_write() we have to clear *pagep , as 
it was before. Because this may confuse caller.
for example caller may have folowing code:
ret = block_write_begin(..., pagep,...)
if (ret  *pagep != NULL) {
unlock_page(*pagep);
page_cache_release(*pagep);
}
Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]

diff --git a/fs/buffer.c b/fs/buffer.c
index 07cd457..df933ba 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1982,6 +1982,7 @@ int block_write_begin(struct file *file, struct 
address_space *mapping,
if (ownpage) {
unlock_page(page);
page_cache_release(page);
+   *pagep = NULL;
 
/*
 * prepare_write() may have instantiated a few blocks 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

iov_iter_fault_in_readable fix

2007-06-13 Thread Dmitriy Monakhov

Function prerform check for signgle region, with out respect to
segment nature of iovec, For example writev no longer works :)

/* TESTCASE BEGIN */
#include stdio.h
#include sys/types.h
#include sys/stat.h
#include fcntl.h
#include sys/uio.h
#include sys/mman.h
#define SIZE  (4096 * 2)
int main(int argc, char* argv[])
{   
char* ptr[4];
struct iovec iov[2];
int fd, ret;
ptr[0] = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
ptr[1] = mmap(NULL, SIZE, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
ptr[2] = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
ptr[3] = mmap(NULL, SIZE, PROT_NONE, 
MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);

iov[0].iov_base = ptr[0] + (SIZE -1);
iov[0].iov_len = 1;
memset(ptr[0], 1, SIZE);

iov[1].iov_base = ptr[2];
iov[1].iov_len = SIZE;
memset(ptr[2], 2, SIZE);

fd = open(argv[1], O_CREAT|O_RDWR|O_TRUNC, 0666);
ret = writev(fd, iov, sizeof(iov) / sizeof(struct iovec));
return 0;
}   
/* TESTCASE END*/
We will get folowing result:
writev(3, [{\1, 1}, {\2..., 8192}], 2) = -1 EFAULT (Bad 
address)

this is hidden bug, and it was invisiable because _fault_in_readable
return value was ignored before. Lets iov_iter_fault_in_readable
perform checks for all segments.

Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]

diff --git a/include/linux/fs.h b/include/linux/fs.h
index fef19fc..7e025ea 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -433,7 +433,7 @@ size_t iov_iter_copy_from_user_atomic(struct page *page,
 size_t iov_iter_copy_from_user(struct page *page,
struct iov_iter *i, unsigned long offset, size_t bytes);
 void iov_iter_advance(struct iov_iter *i, size_t bytes);
-int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes);
+int iov_iter_fault_in_readable(struct iov_iter *i, size_t *bytes);
 size_t iov_iter_single_seg_count(struct iov_iter *i);
 
 static inline void iov_iter_init(struct iov_iter *i,
diff --git a/mm/filemap.c b/mm/filemap.c
index 8d59ed9..8600c3e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1817,10 +1817,32 @@ void iov_iter_advance(struct iov_iter *i, size_t bytes)
 }
 EXPORT_SYMBOL(iov_iter_advance);
 
-int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes)
+int iov_iter_fault_in_readable(struct iov_iter *i, size_t* bytes)
 {
-   char __user *buf = i-iov-iov_base + i-iov_offset;
-   return fault_in_pages_readable(buf, bytes);
+   size_t len = *bytes;
+   int ret;
+   if (likely(i-nr_segs == 1)) {
+   ret = fault_in_pages_readable(i-iov-iov_base, len);
+   if (ret)
+   *bytes = 0;
+   } else {
+   const struct iovec *iov = i-iov;
+   size_t base = i-iov_offset;
+   *bytes = 0;
+   while (len) {
+   int copy = min(len, iov-iov_len - base);
+   if ((ret = fault_in_pages_readable(iov-iov_base + 
base, copy)))
+   break;
+   *bytes += copy;
+   len -= copy;
+   base += copy;
+   if (iov-iov_len == base) {
+   iov++;
+   base = 0;
+   }
+   }
+   }
+   return ret; 
 }
 EXPORT_SYMBOL(iov_iter_fault_in_readable);
 
@@ -2110,7 +2132,7 @@ static ssize_t generic_perform_write_2copy(struct file 
*file,
 * to check that the address is actually valid, when atomic
 * usercopies are used, below.
 */
-   if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
+   if (unlikely(iov_iter_fault_in_readable(i, bytes)  !bytes)) {
status = -EFAULT;
break;
}
@@ -2284,7 +2306,7 @@ again:
 * to check that the address is actually valid, when atomic
 * usercopies are used, below.
 */
-   if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
+   if (unlikely(iov_iter_fault_in_readable(i, bytes) !bytes)) {
status = -EFAULT;
break;
}


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org

Re: [patch] new aop loop fix

2007-06-13 Thread Dmitriy Monakhov

On 14:36 Срд 13 Июн , Hugh Dickins wrote:
 On Wed, 13 Jun 2007, Dmitriy Monakhov wrote:
 
  loop.c code itself is not perfect. In fact before Nick's patch
  partial write was't possible. Assumption what write chunks are
  always page aligned is realy weird ( see index++ line).
  
  Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]
 
 I'm interested, because I'm trying to chase down an -mm bug which
 occasionally leaves me with 1k of zeroes where it shouldn't (in a
 1k bsize ext2 looped over tmpfs).  The length of time for this to
 happen varies a lot, so bisection has been misleading: maybe the
 problem lies in Nick's patches, maybe it does not.
 
 But I don't understand your fix below at all.  _If_ you need to
 change the handling of index, then you need to change the handling
 of offset too, don't you?
 
 But what's wrong with how inded was handled anyway?  Yes, it might
 be being incremented at the bottom of the loop when we haven't
 reached the end of this page, but in that case we're not going
 round the loop again anyway: len will now be 0.  So no problem.
 
 One of us is missing something: please enlighten me - thanks.
Yepp. You absolutely right, wrong patch was attached :(
Btw: Nick's patches broke  LO_CRYPT_XOR mode, but it is ok because
this feature was absolete and not used by anyone, am i right here?
 
 Hugh
 
  
  diff --git a/drivers/block/loop.c b/drivers/block/loop.c
  index 4bab9b1..8726da5 100644
  --- a/drivers/block/loop.c
  +++ b/drivers/block/loop.c
  @@ -215,7 +215,6 @@ static int do_lo_send_aops(struct loop_device *lo, 
  struct bio_vec *bvec,
  int len, ret;
   
  mutex_lock(mapping-host-i_mutex);
  -   index = pos  PAGE_CACHE_SHIFT;
  offset = pos  ((pgoff_t)PAGE_CACHE_SIZE - 1);
  bv_offs = bvec-bv_offset;
  len = bvec-bv_len;
  @@ -226,6 +225,7 @@ static int do_lo_send_aops(struct loop_device *lo, 
  struct bio_vec *bvec,
  struct page *page;
  void *fsdata;
   
  +   index = pos  PAGE_CACHE_SHIFT;
  IV = ((sector_t)index  (PAGE_CACHE_SHIFT - 9))+(offset  9);
  size = PAGE_CACHE_SIZE - offset;
  if (size  len)
  @@ -255,7 +255,6 @@ static int do_lo_send_aops(struct loop_device *lo, 
  struct bio_vec *bvec,
  bv_offs += copied;
  len -= copied;
  offset = 0;
  -   index++;
  pos += copied;
  }
  ret = 0;
 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: + fs-introduce-write_begin-write_end-and-perform_write-aops.patch added to -mm tree

2007-06-13 Thread Dmitriy Monakhov

On 13:43 Срд 13 Июн , Nick Piggin wrote:
 On Wed, Jun 13, 2007 at 05:40:05PM +0400, Dmitriy Monakhov wrote:
  On 14:19 ?? 29 ?? , [EMAIL PROTECTED] wrote:
   
   The patch titled
fs: introduce write_begin, write_end, and perform_write aops
   has been added to the -mm tree.  Its filename is
fs-introduce-write_begin-write_end-and-perform_write-aops.patch
   
   *** Remember to use Documentation/SubmitChecklist when testing your code 
   ***
   
   See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to 
   find
   out what to do about this
   
   --
   Subject: fs: introduce write_begin, write_end, and perform_write aops
   From: Nick Piggin [EMAIL PROTECTED]
   
   These are intended to replace prepare_writ eand commit_write with more
   flexible alternatives that are also able to avoid the buffered write
   deadlock problems efficiently (which prepare_write is unable to do).
   
   [EMAIL PROTECTED]: API design contributions, code review and fixes]
   Signed-off-by: Nick Piggin [EMAIL PROTECTED]
   Signed-off-by: Mark Fasheh [EMAIL PROTECTED]
   Signed-off-by: Andrew Morton [EMAIL PROTECTED]
  I've finaly find time to review Nick's write_begin/write_end aop patch 
  set.
  And i have some fixes and questions. My be i've missed somthing and it was 
  already disscussed, but i cant find in LKML.
 
 Thanks, that's really helpful.
 
  
  1) loop dev:
  loop.c code itself is not perfect. In fact before Nick's patch
  partial write was't possible. Assumption what write chunks are
  always page aligned is realy weird ( see index++ line).
  Fixed by new aop loop fix patch
 
 I think you're right, fix looks good.
 
  
  2)block_write_begin:
  After we enter to block_write_begin with *pagep == NULL and
  some page was grabed we remember this page in *pagep
  And if __block_prepare_write() we have to clear *pagep , as 
  it was before. Because this may confuse caller.
  for example caller may have folowing code:
  ret = block_write_begin(..., pagep,...)
  if (ret  *pagep != NULL) {
  unlock_page(*pagep);
  page_cache_release(*pagep);
  }
  Fixed my new aop block_write_begin fix patch
 
 I don't think the caller can rely on that if it returns failure.
 But that is more defensive I guess. Maybe setting it to 1 or
 so would catch abusers.
 
  
  3) __page_symlink:
  Nick's patch add folowing code:
  + err = pagecache_write_begin(NULL, mapping, 0,PAGE_CACHE_SIZE,
  + AOP_FLAG_UNINTERRUPTIBLE, page,fsdata);
  symlink now consume whole page. I have only one question WHY???.
  I don't see any advantages, but where are huge list of
  dissdvantages:
  a) fs with blksize == 1k and pagesize == 16k after this patch
 waste more than 10x times disk space for nothing.
  b) What happends if we want use fs with blksize == 4k on i386
 after it was used by ia64 ??? (before this patch it was
 possible).
One more visiable effect caused by wrong symlink size:
fsstress cause folowing error:
LOG BEGIN
EXT3-fs unexpected failure: !buffer_revoked(bh); 
inconsistent data on disk
ext3_forget: aborting transaction: IO failure in __ext3_journal_revoke
ext3_abort called.

EXT3-fs error (device dm-4): ext3_forget: error -5 when attempting
revoke
Remounting filesystem read-only 
Aborting journal on device dm-4.
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
EXT3-fs error (device dm-4) in ext3_free_blocks_sb: Journal has aborted

journal commit I/O error
journal commit I/O error
EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has
aborted
EXT3-fs error (device dm-4) in ext3_truncate: IO failure
journal commit I/O error
journal commit I/O error
journal commit I/O error
journal commit I/O error
EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has
aborted
EXT3-fs error (device dm-4) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device dm-4) in ext3_reserve_inode_write: Journal has
aborted
EXT3-fs error (device dm-4) in ext3_delete_inode: IO failure
LOG END

After symlink size was fixed to len-1 problem dissappeared. 
  
  I dont prepare patch for this because i dont understand issue
  witch Nick aimed to fix.
 
 I don't know why myself :P I think it would be just fine to use
 len-1 as it did previously, so it must have been a typo?
 
 
  4) iov_iter_fault_in_readable:
  Function prerform check for signgle region, with out respect to
  segment nature of iovec, For example writev no longer works :) :
  writev(3, [{\1, 1}, {\2..., 4096}], 2) = -1 EFAULT (Bad address)
  this hidden bug, and it was invisiable because _fault_in_readable
  return value was ignored before. Lets

Re: [PATCH][5/5][resend] floppy.c: Fix device_create_file() warning

2007-03-21 Thread Dmitriy Monakhov

"Jesper Juhl" <[EMAIL PROTECTED]> writes:

> On 20/03/07, Mikael Pettersson <[EMAIL PROTECTED]> wrote:
>> On Mon, 19 Mar 2007 18:42:22 +0100, Jesper Juhl wrote:
>> > --- a/drivers/block/floppy.c
>> > +++ b/drivers/block/floppy.c
>> > @@ -4302,7 +4302,12 @@ static int __init floppy_init(void)
>> >   if (err)
>> >   goto out_flush_work;
>> >
>> > - device_create_file(_device[drive].dev,_attr_cmos);
>> > + err = device_create_file(_device[drive].dev, 
>> > _attr_cmos);
>> > + if (err)
>> > + goto out_flush_work;
>> > +
>> >   /* to be cleaned up... */
>> >   disks[drive]->private_data = (void *)(long)drive;
>> >   disks[drive]->queue = floppy_queue;
>>
>> The floppy driver's sysfs file just provides some auxiliary
>> information to user-space, none of which matters for most of
>> its users. It is IMO totally inappropriate to fail floppy
>> driver init in this case.
>>
>
> Which is why my original patch just output a warning to let the user
> know that creating the file had failed.
Ohh please... First of all you have to make you patch correct, and only
after this think about tiny shiny error massages.
Lets look at original code:
  err = platform_device_register(_device[drive]);
  if (err)
goto out_flush_work;

  device_create_file(_device[drive].dev,_attr_cmos);
< You propose  "goto out_flush_work" here if device_create_file() failed
< ,as it was done if platform_device_register() failed. But at the 
< moment we call device_create_file() platform_device_register()
< succeed, this means we have to unregister it first and only after
< this jumpto out_flush_work.
>
> -- 
> Jesper Juhl <[EMAIL PROTECTED]>
> Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
> Plain text mails only, please  http://www.expita.com/nomime.html
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH][5/5][resend] floppy.c: Fix device_create_file() warning

2007-03-21 Thread Dmitriy Monakhov

Jesper Juhl [EMAIL PROTECTED] writes:

 On 20/03/07, Mikael Pettersson [EMAIL PROTECTED] wrote:
 On Mon, 19 Mar 2007 18:42:22 +0100, Jesper Juhl wrote:
  --- a/drivers/block/floppy.c
  +++ b/drivers/block/floppy.c
  @@ -4302,7 +4302,12 @@ static int __init floppy_init(void)
if (err)
goto out_flush_work;
 
  - device_create_file(floppy_device[drive].dev,dev_attr_cmos);
  + err = device_create_file(floppy_device[drive].dev, 
  dev_attr_cmos);
  + if (err)
  + goto out_flush_work;
  +
/* to be cleaned up... */
disks[drive]-private_data = (void *)(long)drive;
disks[drive]-queue = floppy_queue;

 The floppy driver's sysfs file just provides some auxiliary
 information to user-space, none of which matters for most of
 its users. It is IMO totally inappropriate to fail floppy
 driver init in this case.


 Which is why my original patch just output a warning to let the user
 know that creating the file had failed.
Ohh please... First of all you have to make you patch correct, and only
after this think about tiny shiny error massages.
Lets look at original code:
  err = platform_device_register(floppy_device[drive]);
  if (err)
goto out_flush_work;

  device_create_file(floppy_device[drive].dev,dev_attr_cmos);
 You propose  goto out_flush_work here if device_create_file() failed
 ,as it was done if platform_device_register() failed. But at the 
 moment we call device_create_file() platform_device_register()
 succeed, this means we have to unregister it first and only after
 this jumpto out_flush_work.

 -- 
 Jesper Juhl [EMAIL PROTECTED]
 Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
 Plain text mails only, please  http://www.expita.com/nomime.html
 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] fs: remove duplicated iovec checking code v8

2007-03-19 Thread Dmitriy Monakhov


Where are several places where the same code used for iovec checks.
This patch just move this code to separate helper function, and replace
duplicated code with it. IMHO it is better because these are checks that
we want for all filesystems/drivers that use vectored I/O.

Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>
---
 fs/ntfs/file.c |   21 +++--
 fs/read_write.c|   40 
 fs/xfs/linux-2.6/xfs_lrw.c |   22 +++---
 include/linux/fs.h |3 +++
 mm/filemap.c   |   43 ++-
 5 files changed, 55 insertions(+), 74 deletions(-)

diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index dbbac55..2c672a4 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2129,28 +2129,13 @@ static ssize_t ntfs_file_aio_write_nolock(struct kiocb 
*iocb,
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
loff_t pos;
-   unsigned long seg;
size_t count;   /* after file limit checks */
ssize_t written, err;
 
count = 0;
-   for (seg = 0; seg < nr_segs; seg++) {
-   const struct iovec *iv = [seg];
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   count += iv->iov_len;
-   if (unlikely((ssize_t)(count|iv->iov_len) < 0))
-   return -EINVAL;
-   if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len))
-   continue;
-   if (!seg)
-   return -EFAULT;
-   nr_segs = seg;
-   count -= iv->iov_len;   /* This segment is no good */
-   break;
-   }
+   err = generic_iovec_checks(iov, _segs, , VERIFY_READ);
+   if (err)
+   return err;
pos = *ppos;
vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
/* We can write back this queue in page reclaim. */
diff --git a/fs/read_write.c b/fs/read_write.c
index 4d03008..22ec324 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -217,6 +217,46 @@ Einval:
return -EINVAL;
 }
 
+/*
+ * Performs necessary iovec checks before doing a write
+ * @iov:   io vector request
+ * @nr_segs:   number of segments in the iovec
+ * @count: number of bytes to write
+ * @access_flags: type of access: %VERIFY_READ or %VERIFY_WRITE
+ *
+ * Adjust number of segments and amount of bytes to write (nr_segs should be
+ * properly initialized first). Returns appropriate error code that caller
+ * should return or zero in case that write should be allowed.
+ */
+int generic_iovec_checks(const struct iovec *iov,
+   unsigned long *nr_segs, size_t *count,
+   unsigned long access_flags)
+{
+   unsigned long   seg;
+   size_t cnt = 0;
+   for (seg = 0; seg < *nr_segs; seg++) {
+   const struct iovec *iv = [seg];
+
+   /*
+* If any segment has a negative length, or the cumulative
+* length ever wraps negative then return -EINVAL.
+*/
+   cnt += iv->iov_len;
+   if (unlikely((ssize_t)(cnt|iv->iov_len) < 0))
+   return -EINVAL;
+   if (access_ok(access_flags, iv->iov_base, iv->iov_len))
+   continue;
+   if (seg == 0)
+   return -EFAULT;
+   *nr_segs = seg;
+   cnt -= iv->iov_len; /* This segment is no good */
+   break;
+   }
+   *count = cnt;
+   return 0;
+}
+EXPORT_SYMBOL(generic_iovec_checks);
+
 static void wait_on_retry_sync_kiocb(struct kiocb *iocb)
 {
set_current_state(TASK_UNINTERRUPTIBLE);
diff --git a/fs/xfs/linux-2.6/xfs_lrw.c b/fs/xfs/linux-2.6/xfs_lrw.c
index ff8d64e..9a11b00 100644
--- a/fs/xfs/linux-2.6/xfs_lrw.c
+++ b/fs/xfs/linux-2.6/xfs_lrw.c
@@ -639,7 +639,6 @@ xfs_write(
xfs_fsize_t isize, new_size;
xfs_iocore_t*io;
bhv_vnode_t *vp;
-   unsigned long   seg;
int iolock;
int eventsent = 0;
bhv_vrwlock_t   locktype;
@@ -652,24 +651,9 @@ xfs_write(
vp = BHV_TO_VNODE(bdp);
xip = XFS_BHVTOI(bdp);
 
-   for (seg = 0; seg < segs; seg++) {
-   const struct iovec *iv = [seg];
-
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   ocount += iv->iov_len;
-   if (unlikely((ssize_t)(ocount|iv->iov_len) < 0))
-   return -EINVAL;

[PATCH 2/2] fs: incorrect direct io error handling v8

2007-03-19 Thread Dmitriy Monakhov


If generic_file_direct_write() has fail (ENOSPC condition) inside
__generic_file_aio_write_nolock() it may have instantiated a few blocks outside
i_size in case of non blockdev files. At least ext2, ext3 and reiserfs interpret
i_size and biggest block difference as error. Later fsck will complain
about wrong i_size. After fsck will fix error i_size will be increased
to the biggest block. This is bad because this blocks contain gurbage
from previous write attempt. And result in silence file data corruption.
This issue affect fs regardless the values of blocksize or pagesize.
We need truncate any block beyond i_size after generic_file_direct_write()
has failed. In fact all existing fs witch use __generic_file_aio_write_nolock()
always call it under i_mutex for non blockdev files. This patch add 
correcspnding BUG_ON() in order to explicitly demand/document it.

Also fix out of date direct_io locking comments.

TEST_CASE:
TESTCASE_BEGIN
   $touch /mnt/test/BIG_FILE
   ## at this moment /mnt/test/BIG_FILE size and blocks equal to zero
   open("/mnt/test/BIG_FILE", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3
   write(3, ""..., 104857600) = -1 ENOSPC (No space left on device)
   ## size and block sould't be changed because write op failed.
   $stat /mnt/test/BIG_FILE
 File: `/mnt/test/BIG_FILE'
 Size: 0 Blocks: 110896 IO Block: 1024   regular empty file
   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<^^^file size is less than biggest block idx
   
   Device: fe07h/65031dInode: 14  Links: 1
   Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
   Access: 2007-01-24 20:03:38.0 +0300
   Modify: 2007-01-24 20:03:38.0 +0300
   Change: 2007-01-24 20:03:39.0 +0300
   
   #fsck.ext3 -f /dev/VG/test 
   e2fsck 1.39 (29-May-2006)
   Pass 1: Checking inodes, blocks, and sizes
   Inode 14, i_size is 0, should be 56556544.  Fix? yes
   Pass 2: Checking directory structure
   
#TESTCASE_END

Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>
---
 mm/filemap.c |   24 
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index bbef42f..a08900e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1892,8 +1892,10 @@ generic_file_direct_write(struct kiocb *iocb, const 
struct iovec *iov,
/*
 * Sync the fs metadata but not the minor inode changes and
 * of course not the data as we did direct DMA for the IO.
-* i_mutex is held, which protects generic_osync_inode() from
-* livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
+* i_mutex is held in case of DIO_LOCKING, which protects
+* generic_osync_inode() from livelocking. If it is not held, then
+* the filesystem must prevent this livelock. AIO O_DIRECT ops
+* attempt to sync metadata here.
 */
if ((written >= 0 || written == -EIOCBQUEUED) &&
((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
@@ -2083,6 +2085,9 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const 
struct iovec *iov,
ssize_t written;
ssize_t err;
 
+   /* Always called under i_mutex for writes to non blockdev files */
+   BUG_ON(!S_ISBLK(inode->i_mode) &&
+   !mutex_is_locked(>i_mutex));
ocount = 0;
err = generic_iovec_checks(iov, _segs, , VERIFY_READ);
if (err)
@@ -2117,6 +2122,17 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, 
const struct iovec *iov,
 
written = generic_file_direct_write(iocb, iov, _segs, pos,
ppos, count, ocount);
+   /*
+* If host is not blockdev generic_file_direct_write() may
+* have instantiated a few blocks outside i_size  files
+* Trim these off again.
+*/
+   if (unlikely(written < 0) && !S_ISBLK(inode->i_mode)) {
+   loff_t isize = i_size_read(inode);
+   if (pos + count > isize)
+   vmtruncate(inode, isize);
+   }
+
if (written < 0 || written == count)
goto out;
/*
@@ -2221,8 +2237,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const 
struct iovec *iov,
 EXPORT_SYMBOL(generic_file_aio_write);
 
 /*
- * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if 
something
- * went wrong during pagecache shootdown.
+ * Called under i_mutex for writes to S_ISREG files in case of DIO_LOCKING.
+ * Returns -EIO if something went wrong during pagecache shootdown.
  */
 static ssize_t
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *io

[PATCH 2/2] fs: incorrect direct io error handling v8

2007-03-19 Thread Dmitriy Monakhov


If generic_file_direct_write() has fail (ENOSPC condition) inside
__generic_file_aio_write_nolock() it may have instantiated a few blocks outside
i_size in case of non blockdev files. At least ext2, ext3 and reiserfs interpret
i_size and biggest block difference as error. Later fsck will complain
about wrong i_size. After fsck will fix error i_size will be increased
to the biggest block. This is bad because this blocks contain gurbage
from previous write attempt. And result in silence file data corruption.
This issue affect fs regardless the values of blocksize or pagesize.
We need truncate any block beyond i_size after generic_file_direct_write()
has failed. In fact all existing fs witch use __generic_file_aio_write_nolock()
always call it under i_mutex for non blockdev files. This patch add 
correcspnding BUG_ON() in order to explicitly demand/document it.

Also fix out of date direct_io locking comments.

TEST_CASE:
TESTCASE_BEGIN
   $touch /mnt/test/BIG_FILE
   ## at this moment /mnt/test/BIG_FILE size and blocks equal to zero
   open(/mnt/test/BIG_FILE, O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3
   write(3, ..., 104857600) = -1 ENOSPC (No space left on device)
   ## size and block sould't be changed because write op failed.
   $stat /mnt/test/BIG_FILE
 File: `/mnt/test/BIG_FILE'
 Size: 0 Blocks: 110896 IO Block: 1024   regular empty file
   ^^^file size is less than biggest block idx
   
   Device: fe07h/65031dInode: 14  Links: 1
   Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
   Access: 2007-01-24 20:03:38.0 +0300
   Modify: 2007-01-24 20:03:38.0 +0300
   Change: 2007-01-24 20:03:39.0 +0300
   
   #fsck.ext3 -f /dev/VG/test 
   e2fsck 1.39 (29-May-2006)
   Pass 1: Checking inodes, blocks, and sizes
   Inode 14, i_size is 0, should be 56556544.  Fixy? yes
   Pass 2: Checking directory structure
   
#TESTCASE_END

Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]
---
 mm/filemap.c |   24 
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index bbef42f..a08900e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1892,8 +1892,10 @@ generic_file_direct_write(struct kiocb *iocb, const 
struct iovec *iov,
/*
 * Sync the fs metadata but not the minor inode changes and
 * of course not the data as we did direct DMA for the IO.
-* i_mutex is held, which protects generic_osync_inode() from
-* livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
+* i_mutex is held in case of DIO_LOCKING, which protects
+* generic_osync_inode() from livelocking. If it is not held, then
+* the filesystem must prevent this livelock. AIO O_DIRECT ops
+* attempt to sync metadata here.
 */
if ((written = 0 || written == -EIOCBQUEUED) 
((file-f_flags  O_SYNC) || IS_SYNC(inode))) {
@@ -2083,6 +2085,9 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, const 
struct iovec *iov,
ssize_t written;
ssize_t err;
 
+   /* Always called under i_mutex for writes to non blockdev files */
+   BUG_ON(!S_ISBLK(inode-i_mode) 
+   !mutex_is_locked(inode-i_mutex));
ocount = 0;
err = generic_iovec_checks(iov, nr_segs, ocount, VERIFY_READ);
if (err)
@@ -2117,6 +2122,17 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, 
const struct iovec *iov,
 
written = generic_file_direct_write(iocb, iov, nr_segs, pos,
ppos, count, ocount);
+   /*
+* If host is not blockdev generic_file_direct_write() may
+* have instantiated a few blocks outside i_size  files
+* Trim these off again.
+*/
+   if (unlikely(written  0)  !S_ISBLK(inode-i_mode)) {
+   loff_t isize = i_size_read(inode);
+   if (pos + count  isize)
+   vmtruncate(inode, isize);
+   }
+
if (written  0 || written == count)
goto out;
/*
@@ -2221,8 +2237,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const 
struct iovec *iov,
 EXPORT_SYMBOL(generic_file_aio_write);
 
 /*
- * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if 
something
- * went wrong during pagecache shootdown.
+ * Called under i_mutex for writes to S_ISREG files in case of DIO_LOCKING.
+ * Returns -EIO if something went wrong during pagecache shootdown.
  */
 static ssize_t
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
-- 
1.4.4.2


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read

[PATCH 1/2] fs: remove duplicated iovec checking code v8

2007-03-19 Thread Dmitriy Monakhov


Where are several places where the same code used for iovec checks.
This patch just move this code to separate helper function, and replace
duplicated code with it. IMHO it is better because these are checks that
we want for all filesystems/drivers that use vectored I/O.

Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]
---
 fs/ntfs/file.c |   21 +++--
 fs/read_write.c|   40 
 fs/xfs/linux-2.6/xfs_lrw.c |   22 +++---
 include/linux/fs.h |3 +++
 mm/filemap.c   |   43 ++-
 5 files changed, 55 insertions(+), 74 deletions(-)

diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index dbbac55..2c672a4 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2129,28 +2129,13 @@ static ssize_t ntfs_file_aio_write_nolock(struct kiocb 
*iocb,
struct address_space *mapping = file-f_mapping;
struct inode *inode = mapping-host;
loff_t pos;
-   unsigned long seg;
size_t count;   /* after file limit checks */
ssize_t written, err;
 
count = 0;
-   for (seg = 0; seg  nr_segs; seg++) {
-   const struct iovec *iv = iov[seg];
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   count += iv-iov_len;
-   if (unlikely((ssize_t)(count|iv-iov_len)  0))
-   return -EINVAL;
-   if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len))
-   continue;
-   if (!seg)
-   return -EFAULT;
-   nr_segs = seg;
-   count -= iv-iov_len;   /* This segment is no good */
-   break;
-   }
+   err = generic_iovec_checks(iov, nr_segs, count, VERIFY_READ);
+   if (err)
+   return err;
pos = *ppos;
vfs_check_frozen(inode-i_sb, SB_FREEZE_WRITE);
/* We can write back this queue in page reclaim. */
diff --git a/fs/read_write.c b/fs/read_write.c
index 4d03008..22ec324 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -217,6 +217,46 @@ Einval:
return -EINVAL;
 }
 
+/*
+ * Performs necessary iovec checks before doing a write
+ * @iov:   io vector request
+ * @nr_segs:   number of segments in the iovec
+ * @count: number of bytes to write
+ * @access_flags: type of access: %VERIFY_READ or %VERIFY_WRITE
+ *
+ * Adjust number of segments and amount of bytes to write (nr_segs should be
+ * properly initialized first). Returns appropriate error code that caller
+ * should return or zero in case that write should be allowed.
+ */
+int generic_iovec_checks(const struct iovec *iov,
+   unsigned long *nr_segs, size_t *count,
+   unsigned long access_flags)
+{
+   unsigned long   seg;
+   size_t cnt = 0;
+   for (seg = 0; seg  *nr_segs; seg++) {
+   const struct iovec *iv = iov[seg];
+
+   /*
+* If any segment has a negative length, or the cumulative
+* length ever wraps negative then return -EINVAL.
+*/
+   cnt += iv-iov_len;
+   if (unlikely((ssize_t)(cnt|iv-iov_len)  0))
+   return -EINVAL;
+   if (access_ok(access_flags, iv-iov_base, iv-iov_len))
+   continue;
+   if (seg == 0)
+   return -EFAULT;
+   *nr_segs = seg;
+   cnt -= iv-iov_len; /* This segment is no good */
+   break;
+   }
+   *count = cnt;
+   return 0;
+}
+EXPORT_SYMBOL(generic_iovec_checks);
+
 static void wait_on_retry_sync_kiocb(struct kiocb *iocb)
 {
set_current_state(TASK_UNINTERRUPTIBLE);
diff --git a/fs/xfs/linux-2.6/xfs_lrw.c b/fs/xfs/linux-2.6/xfs_lrw.c
index ff8d64e..9a11b00 100644
--- a/fs/xfs/linux-2.6/xfs_lrw.c
+++ b/fs/xfs/linux-2.6/xfs_lrw.c
@@ -639,7 +639,6 @@ xfs_write(
xfs_fsize_t isize, new_size;
xfs_iocore_t*io;
bhv_vnode_t *vp;
-   unsigned long   seg;
int iolock;
int eventsent = 0;
bhv_vrwlock_t   locktype;
@@ -652,24 +651,9 @@ xfs_write(
vp = BHV_TO_VNODE(bdp);
xip = XFS_BHVTOI(bdp);
 
-   for (seg = 0; seg  segs; seg++) {
-   const struct iovec *iv = iovp[seg];
-
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   ocount += iv-iov_len;
-   if (unlikely((ssize_t)(ocount|iv-iov_len)  0))
-   return -EINVAL;
-   if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len

Re: [PATCH] splice: partial write handling fix

2007-03-15 Thread Dmitriy Monakhov

Jens Axboe <[EMAIL PROTECTED]> writes:

> On Wed, Mar 14 2007, Dmitriy Monakhov wrote:
>> currently if partial write has happened while ->commit_write() then page
>> wasn't marked as accessed and rebalanced. 
>
> The ->commit_write() return values aren't very well designed imho. Is
> your fix correct getting the pipe_to_file() return value correct now?
I think yes, after my patch this path look like follows:

ret = mapping->a_ops->commit_write(file, page, offset, offset+this_len);
if (ret) {
if(ret == AOP_TRUNCATED_PAGE) {
page_cache_release(page);
goto find_page;
}
if (ret < 0)
goto out;
/*
 * Partial write has happened, so 'ret' already initialized by
 * number of bytes written, Where is nothing we have to do here.
 */
} else
ret = this_len;
/*
 * Return the number of bytes written and mark page as
 * accessed, we are now done!
 */
mark_page_accessed(page);
balance_dirty_pages_ratelimited(mapping);
out:
page_cache_release(page);
unlock_page(page);
out_ret:
return ret;
}

But off corse this patch does't fix issues spotted by Nick. 
>
> -- 
> Jens Axboe
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 2/5] fs: introduce new aops and infrastructure

2007-03-15 Thread Dmitriy Monakhov

Nick Piggin <[EMAIL PROTECTED]> writes:

> Index: linux-2.6/fs/splice.c
> ===
> --- linux-2.6.orig/fs/splice.c
> +++ linux-2.6/fs/splice.c
> @@ -559,7 +559,7 @@ static int pipe_to_file(struct pipe_inod
>   struct address_space *mapping = file->f_mapping;
>   unsigned int offset, this_len;
>   struct page *page;
> - pgoff_t index;
> + void *fsdata;
>   int ret;
>  
>   /*
> @@ -569,13 +569,13 @@ static int pipe_to_file(struct pipe_inod
>   if (unlikely(ret))
>   return ret;
>  
> - index = sd->pos >> PAGE_CACHE_SHIFT;
>   offset = sd->pos & ~PAGE_CACHE_MASK;
>  
>   this_len = sd->len;
>   if (this_len + offset > PAGE_CACHE_SIZE)
>   this_len = PAGE_CACHE_SIZE - offset;
>  
> +#if 0
>   /*
>* Reuse buf page, if SPLICE_F_MOVE is set and we are doing a full
>* page.
> @@ -587,86 +587,11 @@ static int pipe_to_file(struct pipe_inod
>* locked on successful return.
>*/
>   if (buf->ops->steal(pipe, buf))
> - goto find_page;
> +#endif
One more note. It's looks like you just disabled all fancy zero copy logic.
Off corse this is just rfc patchset.
But i think where is fundamental problem with it:
 Previous logic was following:
  1)splice code responsible for: stealing(if possible) and loking the page
  2)prepare_write() code responsible for: do fs speciffic stuff

 But with new write_begin() logic  all steps (grubbing, locking, preparing)
 happened internaly inside write_begin() witch doesn't even know about what
 kind of data will be copied between write_begin/write_end.
 So fancy zero copy logic is impossible :(
I think this can be solved somehow, but i dont know yet, how can this be done
without implementing it inside begin_write().

>  
> - page = buf->page;
> - if (add_to_page_cache(page, mapping, index, GFP_KERNEL)) {
> - unlock_page(page);
> - goto find_page;
> - }
> -
> - page_cache_get(page);
> -
> - if (!(buf->flags & PIPE_BUF_FLAG_LRU))
> - lru_cache_add(page);
> - } else {
> -find_page:
> - page = find_lock_page(mapping, index);
> - if (!page) {
> - ret = -ENOMEM;
> - page = page_cache_alloc_cold(mapping);
> - if (unlikely(!page))
> - goto out_ret;
> -
> - /*
> -  * This will also lock the page
> -  */
> - ret = add_to_page_cache_lru(page, mapping, index,
> - GFP_KERNEL);
> - if (unlikely(ret))
> - goto out;
> - }
> -
> - /*
> -  * We get here with the page locked. If the page is also
> -  * uptodate, we don't need to do more. If it isn't, we
> -  * may need to bring it in if we are not going to overwrite
> -  * the full page.
> -  */
> - if (!PageUptodate(page)) {
> - if (this_len < PAGE_CACHE_SIZE) {
> - ret = mapping->a_ops->readpage(file, page);
> - if (unlikely(ret))
> - goto out;
> -
> - lock_page(page);
> -
> - if (!PageUptodate(page)) {
> - /*
> -  * Page got invalidated, repeat.
> -  */
> - if (!page->mapping) {
> - unlock_page(page);
> - page_cache_release(page);
> - goto find_page;
> - }
> - ret = -EIO;
> - goto out;
> - }
> - } else
> - SetPageUptodate(page);
> - }
> - }
> -
> - ret = mapping->a_ops->prepare_write(file, page, offset, 
> offset+this_len);
> - if (unlikely(ret)) {
> - loff_t isize = i_size_read(mapping->host);
> -
> - if (ret != AOP_TRUNCATED_PAGE)
> - unlock_page(page);
> - page_cache_release(page);
> - if (ret == AOP_TRUNCATED_PAGE)
> - goto find_page;
> -
> - /*
> -  * prepare_write() may have instantiated a few blocks
> -  * outside i_size.  Trim these off again.
> -  */
> - if (sd->pos + this_len > isize)
> - vmtruncate(mapping->host, isize);
> -
> -

Re: [patch 2/5] fs: introduce new aops and infrastructure

2007-03-15 Thread Dmitriy Monakhov

Nick Piggin [EMAIL PROTECTED] writes:

 Index: linux-2.6/fs/splice.c
 ===
 --- linux-2.6.orig/fs/splice.c
 +++ linux-2.6/fs/splice.c
 @@ -559,7 +559,7 @@ static int pipe_to_file(struct pipe_inod
   struct address_space *mapping = file-f_mapping;
   unsigned int offset, this_len;
   struct page *page;
 - pgoff_t index;
 + void *fsdata;
   int ret;
  
   /*
 @@ -569,13 +569,13 @@ static int pipe_to_file(struct pipe_inod
   if (unlikely(ret))
   return ret;
  
 - index = sd-pos  PAGE_CACHE_SHIFT;
   offset = sd-pos  ~PAGE_CACHE_MASK;
  
   this_len = sd-len;
   if (this_len + offset  PAGE_CACHE_SIZE)
   this_len = PAGE_CACHE_SIZE - offset;
  
 +#if 0
   /*
* Reuse buf page, if SPLICE_F_MOVE is set and we are doing a full
* page.
 @@ -587,86 +587,11 @@ static int pipe_to_file(struct pipe_inod
* locked on successful return.
*/
   if (buf-ops-steal(pipe, buf))
 - goto find_page;
 +#endif
One more note. It's looks like you just disabled all fancy zero copy logic.
Off corse this is just rfc patchset.
But i think where is fundamental problem with it:
 Previous logic was following:
  1)splice code responsible for: stealing(if possible) and loking the page
  2)prepare_write() code responsible for: do fs speciffic stuff

 But with new write_begin() logic  all steps (grubbing, locking, preparing)
 happened internaly inside write_begin() witch doesn't even know about what
 kind of data will be copied between write_begin/write_end.
 So fancy zero copy logic is impossible :(
I think this can be solved somehow, but i dont know yet, how can this be done
without implementing it inside begin_write().

  
 - page = buf-page;
 - if (add_to_page_cache(page, mapping, index, GFP_KERNEL)) {
 - unlock_page(page);
 - goto find_page;
 - }
 -
 - page_cache_get(page);
 -
 - if (!(buf-flags  PIPE_BUF_FLAG_LRU))
 - lru_cache_add(page);
 - } else {
 -find_page:
 - page = find_lock_page(mapping, index);
 - if (!page) {
 - ret = -ENOMEM;
 - page = page_cache_alloc_cold(mapping);
 - if (unlikely(!page))
 - goto out_ret;
 -
 - /*
 -  * This will also lock the page
 -  */
 - ret = add_to_page_cache_lru(page, mapping, index,
 - GFP_KERNEL);
 - if (unlikely(ret))
 - goto out;
 - }
 -
 - /*
 -  * We get here with the page locked. If the page is also
 -  * uptodate, we don't need to do more. If it isn't, we
 -  * may need to bring it in if we are not going to overwrite
 -  * the full page.
 -  */
 - if (!PageUptodate(page)) {
 - if (this_len  PAGE_CACHE_SIZE) {
 - ret = mapping-a_ops-readpage(file, page);
 - if (unlikely(ret))
 - goto out;
 -
 - lock_page(page);
 -
 - if (!PageUptodate(page)) {
 - /*
 -  * Page got invalidated, repeat.
 -  */
 - if (!page-mapping) {
 - unlock_page(page);
 - page_cache_release(page);
 - goto find_page;
 - }
 - ret = -EIO;
 - goto out;
 - }
 - } else
 - SetPageUptodate(page);
 - }
 - }
 -
 - ret = mapping-a_ops-prepare_write(file, page, offset, 
 offset+this_len);
 - if (unlikely(ret)) {
 - loff_t isize = i_size_read(mapping-host);
 -
 - if (ret != AOP_TRUNCATED_PAGE)
 - unlock_page(page);
 - page_cache_release(page);
 - if (ret == AOP_TRUNCATED_PAGE)
 - goto find_page;
 -
 - /*
 -  * prepare_write() may have instantiated a few blocks
 -  * outside i_size.  Trim these off again.
 -  */
 - if (sd-pos + this_len  isize)
 - vmtruncate(mapping-host, isize);
 -
 - goto out_ret;
 - }
 + ret = pagecache_write_begin(file, mapping, sd-pos, sd-len, 0, page, 
 fsdata);
 + if

Re: [PATCH] splice: partial write handling fix

2007-03-15 Thread Dmitriy Monakhov

Jens Axboe [EMAIL PROTECTED] writes:

 On Wed, Mar 14 2007, Dmitriy Monakhov wrote:
 currently if partial write has happened while -commit_write() then page
 wasn't marked as accessed and rebalanced. 

 The -commit_write() return values aren't very well designed imho. Is
 your fix correct getting the pipe_to_file() return value correct now?
I think yes, after my patch this path look like follows:

ret = mapping-a_ops-commit_write(file, page, offset, offset+this_len);
if (ret) {
if(ret == AOP_TRUNCATED_PAGE) {
page_cache_release(page);
goto find_page;
}
if (ret  0)
goto out;
/*
 * Partial write has happened, so 'ret' already initialized by
 * number of bytes written, Where is nothing we have to do here.
 */
} else
ret = this_len;
/*
 * Return the number of bytes written and mark page as
 * accessed, we are now done!
 */
mark_page_accessed(page);
balance_dirty_pages_ratelimited(mapping);
out:
page_cache_release(page);
unlock_page(page);
out_ret:
return ret;
}

But off corse this patch does't fix issues spotted by Nick. 

 -- 
 Jens Axboe

 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 2/5] fs: introduce new aops and infrastructure

2007-03-14 Thread Dmitriy Monakhov

Nick Piggin <[EMAIL PROTECTED]> writes:

> Introduce write_begin, write_end, and perform_write aops.
>
> These are intended to replace prepare_write and commit_write with more
> flexible alternatives that are also able to avoid the buffered write
> deadlock problems efficiently (which prepare_write is unable to do).
>
>  Documentation/filesystems/Locking |9 +
>  Documentation/filesystems/vfs.txt |   38 ++
>  drivers/block/loop.c  |   69 ---
>  fs/buffer.c   |  144 
>  fs/libfs.c|   40 ++
>  fs/namei.c|   47 ++-
>  fs/splice.c   |  106 +
>  include/linux/buffer_head.h   |7 +
>  include/linux/fs.h|   29 
>  include/linux/pagemap.h   |2 
>  mm/filemap.c  |  228 
> ++
>  11 files changed, 494 insertions(+), 225 deletions(-)
>
> Index: linux-2.6/include/linux/fs.h
> ===
> --- linux-2.6.orig/include/linux/fs.h
> +++ linux-2.6/include/linux/fs.h
> @@ -449,6 +449,17 @@ struct address_space_operations {
>*/
>   int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
>   int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
> +
> + int (*write_begin)(struct file *, struct address_space *mapping,
> + loff_t pos, unsigned len, int intr,
> + struct page **pagep, void **fsdata);
> + int (*write_end)(struct file *, struct address_space *mapping,
> + loff_t pos, unsigned len, unsigned copied,
> + struct page *page, void *fsdata);
> +
> + int (*perform_write)(struct file *, struct address_space *,
> + struct iov_iter *, loff_t);
> +
>   /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
>   sector_t (*bmap)(struct address_space *, sector_t);
>   void (*invalidatepage) (struct page *, unsigned long);
> @@ -463,6 +474,18 @@ struct address_space_operations {
>   int (*launder_page) (struct page *);
>  };
>  
> +/*
> + * pagecache_write_begin/pagecache_write_end must be used by general code
> + * to write into the pagecache.
> + */
> +int pagecache_write_begin(struct file *, struct address_space *mapping,
> + loff_t pos, unsigned len, int intr,
> + struct page **pagep, void **fsdata);
> +
> +int pagecache_write_end(struct file *, struct address_space *mapping,
> + loff_t pos, unsigned len, unsigned copied,
> + struct page *page, void *fsdata);
> +
>  struct backing_dev_info;
>  struct address_space {
>   struct inode*host;  /* owner: inode, block_device */
> @@ -1902,6 +1925,12 @@ extern int simple_prepare_write(struct f
>   unsigned offset, unsigned to);
>  extern int simple_commit_write(struct file *file, struct page *page,
>   unsigned offset, unsigned to);
> +extern int simple_write_begin(struct file *file, struct address_space 
> *mapping,
> + loff_t pos, unsigned len, int intr,
> + struct page **pagep, void **fsdata);
> +extern int simple_write_end(struct file *file, struct address_space *mapping,
> + loff_t pos, unsigned len, unsigned copied,
> + struct page *page, void *fsdata);
>  
>  extern struct dentry *simple_lookup(struct inode *, struct dentry *, struct 
> nameidata *);
>  extern ssize_t generic_read_dir(struct file *, char __user *, size_t, loff_t 
> *);
> Index: linux-2.6/mm/filemap.c
> ===
> --- linux-2.6.orig/mm/filemap.c
> +++ linux-2.6/mm/filemap.c
> @@ -2049,6 +2049,88 @@ inline int generic_write_checks(struct f
>  }
>  EXPORT_SYMBOL(generic_write_checks);
>  
> +int pagecache_write_begin(struct file *file, struct address_space *mapping,
> + loff_t pos, unsigned len, int intr,
> + struct page **pagep, void **fsdata)
> +{
> + const struct address_space_operations *aops = mapping->a_ops;
> +
> + if (aops->write_begin)
> + return aops->write_begin(file, mapping, pos, len, intr, pagep, 
> fsdata);
> + else {
> + int ret;
> + pgoff_t index = pos >> PAGE_CACHE_SHIFT;
> + unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
> + struct inode *inode = mapping->host;
> + struct page *page;
> +again:
> + page = __grab_cache_page(mapping, index);
> + *pagep = page;
> + if (!page)
> + return -ENOMEM;
> +
> + if (intr && !PageUptodate(page)) {
> +

[PATCH] splice: partial write handling fix

2007-03-14 Thread Dmitriy Monakhov

currently if partial write has happened while ->commit_write() then page
wasn't marked as accessed and rebalanced. 

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>

diff --git a/fs/splice.c b/fs/splice.c
index 2fca6eb..bb1bf62 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -682,18 +682,25 @@ find_page:
}
 
ret = mapping->a_ops->commit_write(file, page, offset, offset+this_len);
-   if (!ret) {
+   if (ret) {
+   if(ret == AOP_TRUNCATED_PAGE) {
+   page_cache_release(page);
+   goto find_page;
+   }
+   if (ret < 0)
+   goto out;
/*
-* Return the number of bytes written and mark page as
-* accessed, we are now done!
+* Partial write has happened, so 'ret' already initialized by
+* number of bytes written, Where is nothing we have to do here.
 */
+   } else
ret = this_len;
-   mark_page_accessed(page);
-   balance_dirty_pages_ratelimited(mapping);
-   } else if (ret == AOP_TRUNCATED_PAGE) {
-   page_cache_release(page);
-   goto find_page;
-   }
+   /*
+* Return the number of bytes written and mark page as
+* accessed, we are now done!
+*/
+   mark_page_accessed(page);
+   balance_dirty_pages_ratelimited(mapping);
 out:
page_cache_release(page);
unlock_page(page);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] splice: partial write handling fix

2007-03-14 Thread Dmitriy Monakhov

currently if partial write has happened while -commit_write() then page
wasn't marked as accessed and rebalanced. 

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]

diff --git a/fs/splice.c b/fs/splice.c
index 2fca6eb..bb1bf62 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -682,18 +682,25 @@ find_page:
}
 
ret = mapping-a_ops-commit_write(file, page, offset, offset+this_len);
-   if (!ret) {
+   if (ret) {
+   if(ret == AOP_TRUNCATED_PAGE) {
+   page_cache_release(page);
+   goto find_page;
+   }
+   if (ret  0)
+   goto out;
/*
-* Return the number of bytes written and mark page as
-* accessed, we are now done!
+* Partial write has happened, so 'ret' already initialized by
+* number of bytes written, Where is nothing we have to do here.
 */
+   } else
ret = this_len;
-   mark_page_accessed(page);
-   balance_dirty_pages_ratelimited(mapping);
-   } else if (ret == AOP_TRUNCATED_PAGE) {
-   page_cache_release(page);
-   goto find_page;
-   }
+   /*
+* Return the number of bytes written and mark page as
+* accessed, we are now done!
+*/
+   mark_page_accessed(page);
+   balance_dirty_pages_ratelimited(mapping);
 out:
page_cache_release(page);
unlock_page(page);

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 2/5] fs: introduce new aops and infrastructure

2007-03-14 Thread Dmitriy Monakhov

Nick Piggin [EMAIL PROTECTED] writes:

 Introduce write_begin, write_end, and perform_write aops.

 These are intended to replace prepare_write and commit_write with more
 flexible alternatives that are also able to avoid the buffered write
 deadlock problems efficiently (which prepare_write is unable to do).

  Documentation/filesystems/Locking |9 +
  Documentation/filesystems/vfs.txt |   38 ++
  drivers/block/loop.c  |   69 ---
  fs/buffer.c   |  144 
  fs/libfs.c|   40 ++
  fs/namei.c|   47 ++-
  fs/splice.c   |  106 +
  include/linux/buffer_head.h   |7 +
  include/linux/fs.h|   29 
  include/linux/pagemap.h   |2 
  mm/filemap.c  |  228 
 ++
  11 files changed, 494 insertions(+), 225 deletions(-)

 Index: linux-2.6/include/linux/fs.h
 ===
 --- linux-2.6.orig/include/linux/fs.h
 +++ linux-2.6/include/linux/fs.h
 @@ -449,6 +449,17 @@ struct address_space_operations {
*/
   int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
   int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
 +
 + int (*write_begin)(struct file *, struct address_space *mapping,
 + loff_t pos, unsigned len, int intr,
 + struct page **pagep, void **fsdata);
 + int (*write_end)(struct file *, struct address_space *mapping,
 + loff_t pos, unsigned len, unsigned copied,
 + struct page *page, void *fsdata);
 +
 + int (*perform_write)(struct file *, struct address_space *,
 + struct iov_iter *, loff_t);
 +
   /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
   sector_t (*bmap)(struct address_space *, sector_t);
   void (*invalidatepage) (struct page *, unsigned long);
 @@ -463,6 +474,18 @@ struct address_space_operations {
   int (*launder_page) (struct page *);
  };
  
 +/*
 + * pagecache_write_begin/pagecache_write_end must be used by general code
 + * to write into the pagecache.
 + */
 +int pagecache_write_begin(struct file *, struct address_space *mapping,
 + loff_t pos, unsigned len, int intr,
 + struct page **pagep, void **fsdata);
 +
 +int pagecache_write_end(struct file *, struct address_space *mapping,
 + loff_t pos, unsigned len, unsigned copied,
 + struct page *page, void *fsdata);
 +
  struct backing_dev_info;
  struct address_space {
   struct inode*host;  /* owner: inode, block_device */
 @@ -1902,6 +1925,12 @@ extern int simple_prepare_write(struct f
   unsigned offset, unsigned to);
  extern int simple_commit_write(struct file *file, struct page *page,
   unsigned offset, unsigned to);
 +extern int simple_write_begin(struct file *file, struct address_space 
 *mapping,
 + loff_t pos, unsigned len, int intr,
 + struct page **pagep, void **fsdata);
 +extern int simple_write_end(struct file *file, struct address_space *mapping,
 + loff_t pos, unsigned len, unsigned copied,
 + struct page *page, void *fsdata);
  
  extern struct dentry *simple_lookup(struct inode *, struct dentry *, struct 
 nameidata *);
  extern ssize_t generic_read_dir(struct file *, char __user *, size_t, loff_t 
 *);
 Index: linux-2.6/mm/filemap.c
 ===
 --- linux-2.6.orig/mm/filemap.c
 +++ linux-2.6/mm/filemap.c
 @@ -2049,6 +2049,88 @@ inline int generic_write_checks(struct f
  }
  EXPORT_SYMBOL(generic_write_checks);
  
 +int pagecache_write_begin(struct file *file, struct address_space *mapping,
 + loff_t pos, unsigned len, int intr,
 + struct page **pagep, void **fsdata)
 +{
 + const struct address_space_operations *aops = mapping-a_ops;
 +
 + if (aops-write_begin)
 + return aops-write_begin(file, mapping, pos, len, intr, pagep, 
 fsdata);
 + else {
 + int ret;
 + pgoff_t index = pos  PAGE_CACHE_SHIFT;
 + unsigned offset = pos  (PAGE_CACHE_SIZE - 1);
 + struct inode *inode = mapping-host;
 + struct page *page;
 +again:
 + page = __grab_cache_page(mapping, index);
 + *pagep = page;
 + if (!page)
 + return -ENOMEM;
 +
 + if (intr  !PageUptodate(page)) {
 + /*
 +  * There is no way to resolve a short write situation
 +

[PATCH 2/2] incorrect direct io error handling (v7)

2007-03-12 Thread Dmitriy Monakhov

Changes against v6:
 - Handle direct_io failure inside generic_file_direct_write() as it was
   recommend by Andrew (during discussion v1), and by Nick (during 
   discussion v6).
 - change comments, make it more clear.
 - one more time check what __generic_file_aio_write_nolock() always called
   under i_mutex for non blkdev files.
Tested with: fsstress, manual direct_io tests

Log:
If generic_file_direct_write() has fail (ENOSPC condition) inside 
__generic_file_aio_write_nolock() it may have instantiated
a few blocks outside i_size. And fsck will complain about wrong i_size
(ext2, ext3 and reiserfs interpret i_size and biggest block difference as 
error),
after fsck will fix error i_size will be increased to the biggest block,
but this blocks contain gurbage from previous write attempt, this is not 
information leak, but its silence file data corruption. This issue affect 
fs regardless the values of blocksize or pagesize, and off corse only for non 
blkdev files.
We need truncate any block beyond i_size after write have failed , do in simular
generic_file_buffered_write() error path. We may safely call vmtruncate() here
because i_mutex always held for non blkdev files.
 
TEST_CASE:
open("/mnt/test/BIG_FILE", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3
write(3, "aaa"..., 104857600) = -1 ENOSPC (No space left on device)

#stat /mnt/test/BIG_FILE
  File: `/mnt/test/BIG_FILE'
  Size: 0   Blocks: 110896 IO Block: 1024   regular empty file
<<
---
 mm/filemap.c |   28 
 1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 8bd1ea4..95d49fe 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1932,8 +1932,10 @@ generic_file_direct_write(struct kiocb *iocb, const 
struct iovec *iov,
/*
 * Sync the fs metadata but not the minor inode changes and
 * of course not the data as we did direct DMA for the IO.
-* i_mutex is held, which protects generic_osync_inode() from
-* livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
+* i_mutex is held in case of DIO_LOCKING, which protects 
+* generic_osync_inode() from livelocking. If it is not held, then 
+* the filesystem must prevent this livelock. AIO O_DIRECT ops 
+* attempt to sync metadata here.
 */
if ((written >= 0 || written == -EIOCBQUEUED) &&
((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
@@ -2155,8 +2157,26 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, 
const struct iovec *iov,
loff_t endbyte;
ssize_t written_buffered;
 
+   /*
+* In case of non blockdev we may fail to buffered I/O.
+* So i_mutex must be held.
+*/
+   if (!S_ISBLK(inode->i_mode))
+   BUG_ON(!mutex_is_locked(>i_mutex));
+
written = generic_file_direct_write(iocb, iov, _segs, pos,
ppos, count, ocount);
+   /*
+* If host is not S_ISBLK generic_file_direct_write() may 
+* have instantiated a few blocks outside i_size  files
+* Trim these off again.
+*/
+   if (unlikely(written < 0) && !S_ISBLK(inode->i_mode)) {
+   loff_t isize = i_size_read(inode);
+   if (pos + count > isize)
+   vmtruncate(inode, isize);
+   }
+
if (written < 0 || written == count)
goto out;
/*
@@ -2261,8 +2281,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const 
struct iovec *iov,
 EXPORT_SYMBOL(generic_file_aio_write);
 
 /*
- * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if 
something
- * went wrong during pagecache shootdown.
+ * Called under i_mutex for writes to S_ISREG files in case of DIO_LOCKING.
+ * Returns -EIO if something went wrong during pagecache shootdown.
  */
 static ssize_t
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
-- 
1.5.0.1


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at

[PATCH 1/2] mm: move common segment checks to separate helper function (v7)

2007-03-12 Thread Dmitriy Monakhov

Changes against v6
 - remove duplicated code from xfs,ntfs
 - export generic_segment_checks, because it used by xfs,nfs now.
 - change arguments initialization pocily according to Nick's comments.

Tested with: ltp readv/writev tests

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 fs/ntfs/file.c |   21 ++-
 fs/xfs/linux-2.6/xfs_lrw.c |   22 ++--
 include/linux/fs.h |3 ++
 mm/filemap.c   |   83 ---
 4 files changed, 55 insertions(+), 74 deletions(-)

diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index dbbac55..621de36 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2129,28 +2129,13 @@ static ssize_t ntfs_file_aio_write_nolock(struct kiocb 
*iocb,
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
loff_t pos;
-   unsigned long seg;
size_t count;   /* after file limit checks */
ssize_t written, err;
 
count = 0;
-   for (seg = 0; seg < nr_segs; seg++) {
-   const struct iovec *iv = [seg];
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   count += iv->iov_len;
-   if (unlikely((ssize_t)(count|iv->iov_len) < 0))
-   return -EINVAL;
-   if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len))
-   continue;
-   if (!seg)
-   return -EFAULT;
-   nr_segs = seg;
-   count -= iv->iov_len;   /* This segment is no good */
-   break;
-   }
+   err = generic_segment_checks(iov, _segs, , VERIFY_READ);
+   if (err)
+   return err;
pos = *ppos;
vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
/* We can write back this queue in page reclaim. */
diff --git a/fs/xfs/linux-2.6/xfs_lrw.c b/fs/xfs/linux-2.6/xfs_lrw.c
index ff8d64e..558076d 100644
--- a/fs/xfs/linux-2.6/xfs_lrw.c
+++ b/fs/xfs/linux-2.6/xfs_lrw.c
@@ -639,7 +639,6 @@ xfs_write(
xfs_fsize_t isize, new_size;
xfs_iocore_t*io;
bhv_vnode_t *vp;
-   unsigned long   seg;
int iolock;
int eventsent = 0;
bhv_vrwlock_t   locktype;
@@ -652,24 +651,9 @@ xfs_write(
vp = BHV_TO_VNODE(bdp);
xip = XFS_BHVTOI(bdp);
 
-   for (seg = 0; seg < segs; seg++) {
-   const struct iovec *iv = [seg];
-
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   ocount += iv->iov_len;
-   if (unlikely((ssize_t)(ocount|iv->iov_len) < 0))
-   return -EINVAL;
-   if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len))
-   continue;
-   if (seg == 0)
-   return -EFAULT;
-   segs = seg;
-   ocount -= iv->iov_len;  /* This segment is no good */
-   break;
-   }
+   error = generic_segment_checks(iovp, , , VERIFY_READ);
+   if (error)
+   return error;
 
count = ocount;
pos = *offset;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6a3d22e..3b99450 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1778,6 +1778,9 @@ extern ssize_t generic_file_sendfile(struct file *, 
loff_t *, size_t, read_actor
 extern void do_generic_mapping_read(struct address_space *mapping,
struct file_ra_state *, struct file *,
loff_t *, read_descriptor_t *, 
read_actor_t);
+extern int generic_segment_checks(const struct iovec *iov,
+   unsigned long *nr_segs, size_t *count,
+   unsigned long access_flags);
 
 /* fs/splice.c */
 extern ssize_t generic_file_splice_read(struct file *, loff_t *,
diff --git a/mm/filemap.c b/mm/filemap.c
index 8e1849a..8bd1ea4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1159,6 +1159,46 @@ success:
return size;
 }
 
+/*
+ * Performs necessary checks before doing a write
+ * @iov:   io vector request
+ * @nr_segs:   number of segments in the iovec
+ * @count: number of bytes to write
+ * @access_flags: type of access: %VERIFY_READ or %VERIFY_WRITE
+ *
+ * Adjust number of segments and amount of bytes to write (nr_segs should be
+ * properly initialized first). Returns appropriate error code that caller 
+ * should return or zero in case that write should be allowed.
+ */
+int generic_segment_checks(const struct iovec *iov,
+   unsigned long *nr_segs, size_t *count,
+   unsigned long access_flags)
+{
+   unsigned long

Re: [PATCH 1/2] mm: move common segment checks to separate helper function (v6)

2007-03-12 Thread Dmitriy Monakhov

Nick Piggin <[EMAIL PROTECTED]> writes:

> On Mon, Mar 12, 2007 at 10:57:53AM +0300, Dmitriy Monakhov wrote:
>> I realy don't want to be annoying by sending this patcheset over and over
>> again. If anyone think this patch is realy cappy, please comment what 
>> exectly is bad. Thank you.
>
> Doesn't seem like a bad idea.
>
>> 
>> Changes:
>>   - patch was split in two patches.

>> +/*
>> + * Performs necessary checks before doing a write
>> + *
>> + * Adjust number of segments and amount of bytes to write.
>> + * Returns appropriate error code that caller should return or
>> + * zero in case that write should be allowed.
>> + */
>> +inline int generic_segment_checks(const struct iovec *iov,
>> +unsigned long *nr_segs, size_t *count,
>> +unsigned long access_flags)
>
> Make it static and not inline, and the compiler will work it out.
Wow i've just carefully checked and found more functions with duplicating code:
fs/xfs/linux-2.6/xfs_lrw.c:655 xfs_write()
fs/ntfs/file.c:2339 ntfs_file_aio_write_nolock()
So i think nobody will object against exporting generic_segment_checks()
and removing doplicating code.
>
> This function name doesn't really imply that it returns you the
> nr_segs and count, but that's not a big deal I guess.
>
> You also don't say that nr_segs should be initialised to the amount
> you which to write, while count must be initialised to zero.
>
>> +{
>> +unsigned long   seg;
>> +for (seg = 0; seg < *nr_segs; seg++) {
>> +const struct iovec *iv = [seg];
>> +
>> +/*
>> + * If any segment has a negative length, or the cumulative
>> + * length ever wraps negative then return -EINVAL.
>> + */
>> +*count += iv->iov_len;
>> +if (unlikely((ssize_t)(*count|iv->iov_len) < 0))
>> +return -EINVAL;
>> +if (access_ok(access_flags, iv->iov_base, iv->iov_len))
>> +continue;
>
> Why now insert the above test, and put the below statements inside the
> branch? OTOH, that makes it less obviously c from the others. Maybe
> a subsequent patch.
>
>> +if (seg == 0)
>> +return -EFAULT;
>> +*nr_segs = seg;
>> +*count -= iv->iov_len;  /* This segment is no good */
>> +break;
>> +}
>
>
> You could assign to *count here, once, and remove the requirement
> that the caller initialised it to zero?
>
>> +return 0;
>> +}
>> +
>>  /**
>>   * generic_file_aio_read - generic filesystem read routine
>>   * @iocb:   kernel I/O control block
>> @@ -1180,24 +1213,9 @@ generic_file_aio_read(struct kiocb *iocb, const 
>> struct iovec *iov,
>>  loff_t *ppos = >ki_pos;
>>  
>>  count = 0;
>> -for (seg = 0; seg < nr_segs; seg++) {
>> -const struct iovec *iv = [seg];
>> -
>> -/*
>> - * If any segment has a negative length, or the cumulative
>> - * length ever wraps negative then return -EINVAL.
>> - */
>> -count += iv->iov_len;
>> -if (unlikely((ssize_t)(count|iv->iov_len) < 0))
>> -return -EINVAL;
>> -if (access_ok(VERIFY_WRITE, iv->iov_base, iv->iov_len))
>> -continue;
>> -if (seg == 0)
>> -return -EFAULT;
>> -nr_segs = seg;
>> -count -= iv->iov_len;   /* This segment is no good */
>> -break;
>> -}
>> +retval = generic_segment_checks(iov, _segs, , VERIFY_WRITE);
>> +if (retval)
>> +return retval;
>>  
>>  /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
>>  if (filp->f_flags & O_DIRECT) {
>> @@ -2094,30 +2112,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, 
>> const struct iovec *iov,
>>  size_t ocount;  /* original count */
>>  size_t count;   /* after file limit checks */
>>  struct inode*inode = mapping->host;
>> -unsigned long   seg;
>>  loff_t  pos;
>>  ssize_t written;
>>  ssize_t err;
>>  
>>  ocount = 0;
>> -for (seg = 0; seg < nr_segs; seg++) {
>> -const struct iovec *iv = [seg];
>> -
>> -/*
>> - * If any segment has a negative length, or the cumu

Re: [PATCH 2/2] mm: incorrect direct io error handling (v6)

2007-03-12 Thread Dmitriy Monakhov

Nick Piggin <[EMAIL PROTECTED]> writes:

> On Mon, Mar 12, 2007 at 11:55:30AM +0300, Dmitriy Monakhov wrote:
>> Nick Piggin <[EMAIL PROTECTED]> writes:
>> 
>> > On Mon, Mar 12, 2007 at 10:58:10AM +0300, Dmitriy Monakhov wrote:
>
>> >> @@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, 
>> >> const struct iovec *iov,
>> >>   mutex_lock(>i_mutex);
>> >>   ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
>> >>   >ki_pos);
>> >> + /* 
>> >> +  * If __generic_file_aio_write_nolock has failed.
>> >> +  * This may happen because of:
>> >> +  * 1) Bad segment found (failed before actual write attempt)
>> >> +  * 2) Segments are good, but actual write operation failed
>> >> +  *and may have instantiated a few blocks outside i_size.
>> >> +  *   a) in case of buffered write these blocks was already
>> >> +  *  trimmed by generic_file_buffered_write()
>> >> +  *   b) in case of O_DIRECT these blocks weren't trimmed yet.
>> >> +  *
>> >> +  * In case of (2b) these blocks have to be trimmed off again.
>> >> +  */
>> >> + if (unlikely( ret < 0 && file->f_flags & O_DIRECT)) {
>> >> + unsigned long nr_segs_avail = nr_segs;
>> >> + size_t count = 0;
>> >> + if (!generic_segment_checks(iov, _segs_avail, ,
>> >> + VERIFY_READ)) {
>> >> + /*It is (2b) case, because segments are good*/
>> >> + loff_t isize = i_size_read(inode);
>> >> + if (pos + count > isize)
>> >> + vmtruncate(inode, isize);
>> >> + }
>> >> + }
>> >
>> > OK, but wouldn't this be better to be done in the actual direct IO
>> > functions themselves? Thus you could be sure that you have the 2b case,
>> > and the code would be less fragile to something changing?
>> Ohh, We can't just call vmtruncate() after generic_file_direct_write()
>> failure while __generic_file_aio_write_nolock() becase where is no guarantee
>> what i_mutex held. In fact all existing fs always invoke 
>> __generic_file_aio_write_nolock() with i_mutex held in case of S_ISREG files,
>> but this was't explicitly demanded and documented. I've proposed to do it in
>> previous versions of this patch, because it this just document current state
>> of affairs, but David Chinner wasn't agree with it.
>
> It seemed like it was documented in the comments that you altered in this
> patch...
>
> How would such a filesystem that did not hold i_mutex propose to fix the
> problem?
>
> The burden should be on those filesystems that might not want to hold
> i_mutex here, to solve the problem nicely, rather than generic code to take
> this ugly code.
Ok then what do you think about this version http://lkml.org/lkml/2006/12/18/103
witch was posted almost  month ago :)
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] mm: incorrect direct io error handling (v6)

2007-03-12 Thread Dmitriy Monakhov

Nick Piggin <[EMAIL PROTECTED]> writes:

> On Mon, Mar 12, 2007 at 10:58:10AM +0300, Dmitriy Monakhov wrote:
>> I realy don't want to be annoying by sending this patcheset over and over
>> again, i just want the issue to be solved. If anyone think this solution
>> is realy cappy, please comment what exectly is bad. Thank you.
>
> If you don't get feedback, then you have to keep posting. If you still
> don't get feedback, try cc'ing a few more lists (eg. linux-fsdevel).
>
>> Changes:
>>   - patch was split in two patches.
>>   - comments added. I think now it is clearly describe things.
>>   - patch prepared against 2.6.20-mm3
>> 
>> How this patch tested:
>>   - fsstress test.
>>   - manual direct_io tests.
>> 
>> LOG:
>>  - Trim off blocks after generic_file_direct_write() has failed.
>>  - Update out of date comments about direct_io locking rules.
>
> It can be nice to expand on what the problem was, and how you fixed it...
> but I guess you do quite a good job in the C comments.
If generic_file_direct_write() has fail (ENOSPC condition) inside 
__generic_file_aio_write_nolock() it may have instantiated
a few blocks outside i_size. And fsck will complain about wrong i_size
(ext2, ext3 and reiserfs interpret i_size and biggest block difference as 
error),
after fsck will fix error i_size will be increased to the biggest block,
but this blocks contain gurbage from previous write attempt, this is not 
information leak, but its silence file data corruption. This issue affect 
fs regardless the values of blocksize or pagesize.
We need truncate any block beyond i_size after write have failed , do in simular
generic_file_buffered_write() error path.
TEST_CASE:
open("/mnt/test/BIG_FILE", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3
write(3, "aaa"..., 104857600) = -1 ENOSPC (No space left on device)

#stat /mnt/test/BIG_FILE
  File: `/mnt/test/BIG_FILE'
  Size: 0   Blocks: 110896 IO Block: 1024   regular empty file
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<file size is less than biggest block idx

Device: fe07h/65031dInode: 14  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2007-01-24 20:03:38.0 +0300
Modify: 2007-01-24 20:03:38.0 +0300
Change: 2007-01-24 20:03:39.0 +0300

#fsck.ext3 -f /dev/VG/test 
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Inode 14, i_size is 0, should be 56556544.  Fix? yes
Pass 2: Checking directory structure

>
>> 
>> Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
>> ---
>>  mm/filemap.c |   32 
>>  1 files changed, 28 insertions(+), 4 deletions(-)
>> 
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 0aadf5f..8959ae3 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -1925,8 +1925,9 @@ generic_file_direct_write(struct kiocb *iocb, const 
>> struct iovec *iov,
>>  /*
>>   * Sync the fs metadata but not the minor inode changes and
>>   * of course not the data as we did direct DMA for the IO.
>> - * i_mutex is held, which protects generic_osync_inode() from
>> - * livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
>> + * i_mutex may not being held, if so some specific locking
>> + * ordering must protect generic_osync_inode() from livelocking.
>> + * AIO O_DIRECT ops attempt to sync metadata here.
>>   */
>
> This wasn't exactly clear to me. Did you mean:
>
>  "may be held, which protects generic_osync_inode() from livelocking. If it
>   is not held, then the filesystem must prevent this livelock"?
Yep..  my english is not realy good :(
>
>>  if ((written >= 0 || written == -EIOCBQUEUED) &&
>>  ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
>> @@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, 
>> const struct iovec *iov,
>>  mutex_lock(>i_mutex);
>>  ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
>>  >ki_pos);
>> +/* 
>> + * If __generic_file_aio_write_nolock has failed.
>> + * This may happen because of:
>> + * 1) Bad segment found (failed before actual write attempt)
>> + * 2) Segments are good, but actual write operation failed
>> + *and may have instantiated a few blocks outside i_size.
>> + *   a) in case of buffered write these blocks was already
>> + *  trimmed by generic_file_buffered_write()
>> + *   b) in case of O_DIRECT t

[PATCH 1/2] mm: move common segment checks to separate helper function (v6)

2007-03-12 Thread Dmitriy Monakhov

I realy don't want to be annoying by sending this patcheset over and over
again. If anyone think this patch is realy cappy, please comment what 
exectly is bad. Thank you.

Changes:
  - patch was split in two patches.
  - comments added. I think now it is clearly describe things.
  - make generic_segment_checks() inline
  - patch prepared against 2.6.20-mm3

How is tested:
 - LTP test, and other readv/writev op tests.

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 mm/filemap.c |   76 +
 1 files changed, 39 insertions(+), 37 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 8e1849a..0aadf5f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1159,6 +1159,39 @@ success:
return size;
 }
 
+/*
+ * Performs necessary checks before doing a write
+ *
+ * Adjust number of segments and amount of bytes to write.
+ * Returns appropriate error code that caller should return or
+ * zero in case that write should be allowed.
+ */
+inline int generic_segment_checks(const struct iovec *iov,
+   unsigned long *nr_segs, size_t *count,
+   unsigned long access_flags)
+{
+   unsigned long   seg;
+   for (seg = 0; seg < *nr_segs; seg++) {
+   const struct iovec *iv = [seg];
+
+   /*
+* If any segment has a negative length, or the cumulative
+* length ever wraps negative then return -EINVAL.
+*/
+   *count += iv->iov_len;
+   if (unlikely((ssize_t)(*count|iv->iov_len) < 0))
+   return -EINVAL;
+   if (access_ok(access_flags, iv->iov_base, iv->iov_len))
+   continue;
+   if (seg == 0)
+   return -EFAULT;
+   *nr_segs = seg;
+   *count -= iv->iov_len;  /* This segment is no good */
+   break;
+   }
+   return 0;
+}
+
 /**
  * generic_file_aio_read - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -1180,24 +1213,9 @@ generic_file_aio_read(struct kiocb *iocb, const struct 
iovec *iov,
loff_t *ppos = >ki_pos;
 
count = 0;
-   for (seg = 0; seg < nr_segs; seg++) {
-   const struct iovec *iv = [seg];
-
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   count += iv->iov_len;
-   if (unlikely((ssize_t)(count|iv->iov_len) < 0))
-   return -EINVAL;
-   if (access_ok(VERIFY_WRITE, iv->iov_base, iv->iov_len))
-   continue;
-   if (seg == 0)
-   return -EFAULT;
-   nr_segs = seg;
-   count -= iv->iov_len;   /* This segment is no good */
-   break;
-   }
+   retval = generic_segment_checks(iov, _segs, , VERIFY_WRITE);
+   if (retval)
+   return retval;
 
/* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
if (filp->f_flags & O_DIRECT) {
@@ -2094,30 +2112,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, 
const struct iovec *iov,
size_t ocount;  /* original count */
size_t count;   /* after file limit checks */
struct inode*inode = mapping->host;
-   unsigned long   seg;
loff_t  pos;
ssize_t written;
ssize_t err;
 
ocount = 0;
-   for (seg = 0; seg < nr_segs; seg++) {
-   const struct iovec *iv = [seg];
-
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   ocount += iv->iov_len;
-   if (unlikely((ssize_t)(ocount|iv->iov_len) < 0))
-   return -EINVAL;
-   if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len))
-   continue;
-   if (seg == 0)
-   return -EFAULT;
-   nr_segs = seg;
-   ocount -= iv->iov_len;  /* This segment is no good */
-   break;
-   }
+   err = generic_segment_checks(iov, _segs, , VERIFY_READ);
+   if (err)
+   return err;
 
count = ocount;
pos = *ppos;
-- 
1.5.0.1


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] mm: incorrect direct io error handling (v6)

2007-03-12 Thread Dmitriy Monakhov

I realy don't want to be annoying by sending this patcheset over and over
again, i just want the issue to be solved. If anyone think this solution
is realy cappy, please comment what exectly is bad. Thank you.

Changes:
  - patch was split in two patches.
  - comments added. I think now it is clearly describe things.
  - patch prepared against 2.6.20-mm3

How this patch tested:
  - fsstress test.
  - manual direct_io tests.

LOG:
 - Trim off blocks after generic_file_direct_write() has failed.
 - Update out of date comments about direct_io locking rules.

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 mm/filemap.c |   32 
 1 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 0aadf5f..8959ae3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1925,8 +1925,9 @@ generic_file_direct_write(struct kiocb *iocb, const 
struct iovec *iov,
/*
 * Sync the fs metadata but not the minor inode changes and
 * of course not the data as we did direct DMA for the IO.
-* i_mutex is held, which protects generic_osync_inode() from
-* livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
+* i_mutex may not being held, if so some specific locking
+* ordering must protect generic_osync_inode() from livelocking.
+* AIO O_DIRECT ops attempt to sync metadata here.
 */
if ((written >= 0 || written == -EIOCBQUEUED) &&
((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
@@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const 
struct iovec *iov,
mutex_lock(>i_mutex);
ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
>ki_pos);
+   /* 
+* If __generic_file_aio_write_nolock has failed.
+* This may happen because of:
+* 1) Bad segment found (failed before actual write attempt)
+* 2) Segments are good, but actual write operation failed
+*and may have instantiated a few blocks outside i_size.
+*   a) in case of buffered write these blocks was already
+*  trimmed by generic_file_buffered_write()
+*   b) in case of O_DIRECT these blocks weren't trimmed yet.
+*
+* In case of (2b) these blocks have to be trimmed off again.
+*/
+   if (unlikely( ret < 0 && file->f_flags & O_DIRECT)) {
+   unsigned long nr_segs_avail = nr_segs;
+   size_t count = 0;
+   if (!generic_segment_checks(iov, _segs_avail, ,
+   VERIFY_READ)) {
+   /*It is (2b) case, because segments are good*/
+   loff_t isize = i_size_read(inode);
+   if (pos + count > isize)
+   vmtruncate(inode, isize);
+   }
+   }
mutex_unlock(>i_mutex);
 
if (ret > 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
@@ -2254,8 +2278,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const 
struct iovec *iov,
 EXPORT_SYMBOL(generic_file_aio_write);
 
 /*
- * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if 
something
- * went wrong during pagecache shootdown.
+ * May be called without i_mutex for writes to S_ISREG files.
+ * Returns -EIO if something went wrong during pagecache shootdown.
  */
 static ssize_t
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
-- 
1.5.0.1


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ext3: dirindex error pointer issues (b)

2007-03-12 Thread Dmitriy Monakhov

Dmitriy Monakhov <[EMAIL PROTECTED]> writes:

>  - ext3_dx_find_entry() exit with out setting proper error pointer
>  - do_split() exit with out setting proper error pointer
>it is realy painful because many callers contain folowing code:
>de = do_split(handle,dir, , frame, , );
>if (!(de))
> return retval;
><<< WOW retval wasn't changed by do_split(), so caller failed
><<< but return SUCCESS :)
>  - Rearrange do_split() error path. Current error path is realy ugly, all
>this up and down jump stuff doesn't make code easy to understand.
Ohh my first patch change error message semantics in do_split(). Initially when 
ext3_append() failed we just exit without printing error. In fact ext3_append()
may fail, it is legal and it's happens qite often (ENOSPC for example). This 
cause annoying fake error message. So restore this semantic as it was before
patch.
Andrew please apply incremental patch what fix it:

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>

diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index 1a52586..7edb617 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -1197,8 +1197,8 @@ journal_error:
brelse(*bh);
brelse(bh2);
*bh = NULL;
-errout:
ext3_std_error(dir->i_sb, err);
+errout:
*error = err;
return NULL;
 }
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index f0a6c26..02a75db 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1195,8 +1195,8 @@ journal_error:
brelse(*bh);
brelse(bh2);
*bh = NULL;
-errout:
ext4_std_error(dir->i_sb, err);
+errout:
*error = err;
return NULL;
 }

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ext3: dirindex error pointer issues (b)

2007-03-12 Thread Dmitriy Monakhov

Dmitriy Monakhov [EMAIL PROTECTED] writes:

  - ext3_dx_find_entry() exit with out setting proper error pointer
  - do_split() exit with out setting proper error pointer
it is realy painful because many callers contain folowing code:
de = do_split(handle,dir, bh, frame, hinfo, retval);
if (!(de))
 return retval;
 WOW retval wasn't changed by do_split(), so caller failed
 but return SUCCESS :)
  - Rearrange do_split() error path. Current error path is realy ugly, all
this up and down jump stuff doesn't make code easy to understand.
Ohh my first patch change error message semantics in do_split(). Initially when 
ext3_append() failed we just exit without printing error. In fact ext3_append()
may fail, it is legal and it's happens qite often (ENOSPC for example). This 
cause annoying fake error message. So restore this semantic as it was before
patch.
Andrew please apply incremental patch what fix it:

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]

diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index 1a52586..7edb617 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -1197,8 +1197,8 @@ journal_error:
brelse(*bh);
brelse(bh2);
*bh = NULL;
-errout:
ext3_std_error(dir-i_sb, err);
+errout:
*error = err;
return NULL;
 }
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index f0a6c26..02a75db 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1195,8 +1195,8 @@ journal_error:
brelse(*bh);
brelse(bh2);
*bh = NULL;
-errout:
ext4_std_error(dir-i_sb, err);
+errout:
*error = err;
return NULL;
 }

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] mm: move common segment checks to separate helper function (v6)

2007-03-12 Thread Dmitriy Monakhov

I realy don't want to be annoying by sending this patcheset over and over
again. If anyone think this patch is realy cappy, please comment what 
exectly is bad. Thank you.

Changes:
  - patch was split in two patches.
  - comments added. I think now it is clearly describe things.
  - make generic_segment_checks() inline
  - patch prepared against 2.6.20-mm3

How is tested:
 - LTP test, and other readv/writev op tests.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 mm/filemap.c |   76 +
 1 files changed, 39 insertions(+), 37 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 8e1849a..0aadf5f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1159,6 +1159,39 @@ success:
return size;
 }
 
+/*
+ * Performs necessary checks before doing a write
+ *
+ * Adjust number of segments and amount of bytes to write.
+ * Returns appropriate error code that caller should return or
+ * zero in case that write should be allowed.
+ */
+inline int generic_segment_checks(const struct iovec *iov,
+   unsigned long *nr_segs, size_t *count,
+   unsigned long access_flags)
+{
+   unsigned long   seg;
+   for (seg = 0; seg  *nr_segs; seg++) {
+   const struct iovec *iv = iov[seg];
+
+   /*
+* If any segment has a negative length, or the cumulative
+* length ever wraps negative then return -EINVAL.
+*/
+   *count += iv-iov_len;
+   if (unlikely((ssize_t)(*count|iv-iov_len)  0))
+   return -EINVAL;
+   if (access_ok(access_flags, iv-iov_base, iv-iov_len))
+   continue;
+   if (seg == 0)
+   return -EFAULT;
+   *nr_segs = seg;
+   *count -= iv-iov_len;  /* This segment is no good */
+   break;
+   }
+   return 0;
+}
+
 /**
  * generic_file_aio_read - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -1180,24 +1213,9 @@ generic_file_aio_read(struct kiocb *iocb, const struct 
iovec *iov,
loff_t *ppos = iocb-ki_pos;
 
count = 0;
-   for (seg = 0; seg  nr_segs; seg++) {
-   const struct iovec *iv = iov[seg];
-
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   count += iv-iov_len;
-   if (unlikely((ssize_t)(count|iv-iov_len)  0))
-   return -EINVAL;
-   if (access_ok(VERIFY_WRITE, iv-iov_base, iv-iov_len))
-   continue;
-   if (seg == 0)
-   return -EFAULT;
-   nr_segs = seg;
-   count -= iv-iov_len;   /* This segment is no good */
-   break;
-   }
+   retval = generic_segment_checks(iov, nr_segs, count, VERIFY_WRITE);
+   if (retval)
+   return retval;
 
/* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
if (filp-f_flags  O_DIRECT) {
@@ -2094,30 +2112,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, 
const struct iovec *iov,
size_t ocount;  /* original count */
size_t count;   /* after file limit checks */
struct inode*inode = mapping-host;
-   unsigned long   seg;
loff_t  pos;
ssize_t written;
ssize_t err;
 
ocount = 0;
-   for (seg = 0; seg  nr_segs; seg++) {
-   const struct iovec *iv = iov[seg];
-
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   ocount += iv-iov_len;
-   if (unlikely((ssize_t)(ocount|iv-iov_len)  0))
-   return -EINVAL;
-   if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len))
-   continue;
-   if (seg == 0)
-   return -EFAULT;
-   nr_segs = seg;
-   ocount -= iv-iov_len;  /* This segment is no good */
-   break;
-   }
+   err = generic_segment_checks(iov, nr_segs, ocount, VERIFY_READ);
+   if (err)
+   return err;
 
count = ocount;
pos = *ppos;
-- 
1.5.0.1


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] mm: incorrect direct io error handling (v6)

2007-03-12 Thread Dmitriy Monakhov

I realy don't want to be annoying by sending this patcheset over and over
again, i just want the issue to be solved. If anyone think this solution
is realy cappy, please comment what exectly is bad. Thank you.

Changes:
  - patch was split in two patches.
  - comments added. I think now it is clearly describe things.
  - patch prepared against 2.6.20-mm3

How this patch tested:
  - fsstress test.
  - manual direct_io tests.

LOG:
 - Trim off blocks after generic_file_direct_write() has failed.
 - Update out of date comments about direct_io locking rules.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 mm/filemap.c |   32 
 1 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 0aadf5f..8959ae3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1925,8 +1925,9 @@ generic_file_direct_write(struct kiocb *iocb, const 
struct iovec *iov,
/*
 * Sync the fs metadata but not the minor inode changes and
 * of course not the data as we did direct DMA for the IO.
-* i_mutex is held, which protects generic_osync_inode() from
-* livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
+* i_mutex may not being held, if so some specific locking
+* ordering must protect generic_osync_inode() from livelocking.
+* AIO O_DIRECT ops attempt to sync metadata here.
 */
if ((written = 0 || written == -EIOCBQUEUED) 
((file-f_flags  O_SYNC) || IS_SYNC(inode))) {
@@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const 
struct iovec *iov,
mutex_lock(inode-i_mutex);
ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
iocb-ki_pos);
+   /* 
+* If __generic_file_aio_write_nolock has failed.
+* This may happen because of:
+* 1) Bad segment found (failed before actual write attempt)
+* 2) Segments are good, but actual write operation failed
+*and may have instantiated a few blocks outside i_size.
+*   a) in case of buffered write these blocks was already
+*  trimmed by generic_file_buffered_write()
+*   b) in case of O_DIRECT these blocks weren't trimmed yet.
+*
+* In case of (2b) these blocks have to be trimmed off again.
+*/
+   if (unlikely( ret  0  file-f_flags  O_DIRECT)) {
+   unsigned long nr_segs_avail = nr_segs;
+   size_t count = 0;
+   if (!generic_segment_checks(iov, nr_segs_avail, count,
+   VERIFY_READ)) {
+   /*It is (2b) case, because segments are good*/
+   loff_t isize = i_size_read(inode);
+   if (pos + count  isize)
+   vmtruncate(inode, isize);
+   }
+   }
mutex_unlock(inode-i_mutex);
 
if (ret  0  ((file-f_flags  O_SYNC) || IS_SYNC(inode))) {
@@ -2254,8 +2278,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const 
struct iovec *iov,
 EXPORT_SYMBOL(generic_file_aio_write);
 
 /*
- * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if 
something
- * went wrong during pagecache shootdown.
+ * May be called without i_mutex for writes to S_ISREG files.
+ * Returns -EIO if something went wrong during pagecache shootdown.
  */
 static ssize_t
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
-- 
1.5.0.1


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] mm: incorrect direct io error handling (v6)

2007-03-12 Thread Dmitriy Monakhov

Nick Piggin [EMAIL PROTECTED] writes:

 On Mon, Mar 12, 2007 at 10:58:10AM +0300, Dmitriy Monakhov wrote:
 I realy don't want to be annoying by sending this patcheset over and over
 again, i just want the issue to be solved. If anyone think this solution
 is realy cappy, please comment what exectly is bad. Thank you.

 If you don't get feedback, then you have to keep posting. If you still
 don't get feedback, try cc'ing a few more lists (eg. linux-fsdevel).

 Changes:
   - patch was split in two patches.
   - comments added. I think now it is clearly describe things.
   - patch prepared against 2.6.20-mm3
 
 How this patch tested:
   - fsstress test.
   - manual direct_io tests.
 
 LOG:
  - Trim off blocks after generic_file_direct_write() has failed.
  - Update out of date comments about direct_io locking rules.

 It can be nice to expand on what the problem was, and how you fixed it...
 but I guess you do quite a good job in the C comments.
If generic_file_direct_write() has fail (ENOSPC condition) inside 
__generic_file_aio_write_nolock() it may have instantiated
a few blocks outside i_size. And fsck will complain about wrong i_size
(ext2, ext3 and reiserfs interpret i_size and biggest block difference as 
error),
after fsck will fix error i_size will be increased to the biggest block,
but this blocks contain gurbage from previous write attempt, this is not 
information leak, but its silence file data corruption. This issue affect 
fs regardless the values of blocksize or pagesize.
We need truncate any block beyond i_size after write have failed , do in simular
generic_file_buffered_write() error path.
TEST_CASE:
open(/mnt/test/BIG_FILE, O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3
write(3, aaa..., 104857600) = -1 ENOSPC (No space left on device)

#stat /mnt/test/BIG_FILE
  File: `/mnt/test/BIG_FILE'
  Size: 0   Blocks: 110896 IO Block: 1024   regular empty file
file size is less than biggest block idx

Device: fe07h/65031dInode: 14  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2007-01-24 20:03:38.0 +0300
Modify: 2007-01-24 20:03:38.0 +0300
Change: 2007-01-24 20:03:39.0 +0300

#fsck.ext3 -f /dev/VG/test 
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Inode 14, i_size is 0, should be 56556544.  Fixy? yes
Pass 2: Checking directory structure


 
 Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
 ---
  mm/filemap.c |   32 
  1 files changed, 28 insertions(+), 4 deletions(-)
 
 diff --git a/mm/filemap.c b/mm/filemap.c
 index 0aadf5f..8959ae3 100644
 --- a/mm/filemap.c
 +++ b/mm/filemap.c
 @@ -1925,8 +1925,9 @@ generic_file_direct_write(struct kiocb *iocb, const 
 struct iovec *iov,
  /*
   * Sync the fs metadata but not the minor inode changes and
   * of course not the data as we did direct DMA for the IO.
 - * i_mutex is held, which protects generic_osync_inode() from
 - * livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
 + * i_mutex may not being held, if so some specific locking
 + * ordering must protect generic_osync_inode() from livelocking.
 + * AIO O_DIRECT ops attempt to sync metadata here.
   */

 This wasn't exactly clear to me. Did you mean:

  may be held, which protects generic_osync_inode() from livelocking. If it
   is not held, then the filesystem must prevent this livelock?
Yep..  my english is not realy good :(

  if ((written = 0 || written == -EIOCBQUEUED) 
  ((file-f_flags  O_SYNC) || IS_SYNC(inode))) {
 @@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, 
 const struct iovec *iov,
  mutex_lock(inode-i_mutex);
  ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
  iocb-ki_pos);
 +/* 
 + * If __generic_file_aio_write_nolock has failed.
 + * This may happen because of:
 + * 1) Bad segment found (failed before actual write attempt)
 + * 2) Segments are good, but actual write operation failed
 + *and may have instantiated a few blocks outside i_size.
 + *   a) in case of buffered write these blocks was already
 + *  trimmed by generic_file_buffered_write()
 + *   b) in case of O_DIRECT these blocks weren't trimmed yet.
 + *
 + * In case of (2b) these blocks have to be trimmed off again.
 + */
 +if (unlikely( ret  0  file-f_flags  O_DIRECT)) {
 +unsigned long nr_segs_avail = nr_segs;
 +size_t count = 0;
 +if (!generic_segment_checks(iov, nr_segs_avail, count,
 +VERIFY_READ)) {
 +/*It is (2b) case, because segments are good*/
 +loff_t isize = i_size_read(inode);
 +if (pos + count  isize)
 +vmtruncate(inode, isize);
 +}
 +}

 OK, but wouldn't this be better to be done in the actual

Re: [PATCH 2/2] mm: incorrect direct io error handling (v6)

2007-03-12 Thread Dmitriy Monakhov

Nick Piggin [EMAIL PROTECTED] writes:

 On Mon, Mar 12, 2007 at 11:55:30AM +0300, Dmitriy Monakhov wrote:
 Nick Piggin [EMAIL PROTECTED] writes:
 
  On Mon, Mar 12, 2007 at 10:58:10AM +0300, Dmitriy Monakhov wrote:

  @@ -2240,6 +2241,29 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, 
  const struct iovec *iov,
mutex_lock(inode-i_mutex);
ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
iocb-ki_pos);
  + /* 
  +  * If __generic_file_aio_write_nolock has failed.
  +  * This may happen because of:
  +  * 1) Bad segment found (failed before actual write attempt)
  +  * 2) Segments are good, but actual write operation failed
  +  *and may have instantiated a few blocks outside i_size.
  +  *   a) in case of buffered write these blocks was already
  +  *  trimmed by generic_file_buffered_write()
  +  *   b) in case of O_DIRECT these blocks weren't trimmed yet.
  +  *
  +  * In case of (2b) these blocks have to be trimmed off again.
  +  */
  + if (unlikely( ret  0  file-f_flags  O_DIRECT)) {
  + unsigned long nr_segs_avail = nr_segs;
  + size_t count = 0;
  + if (!generic_segment_checks(iov, nr_segs_avail, count,
  + VERIFY_READ)) {
  + /*It is (2b) case, because segments are good*/
  + loff_t isize = i_size_read(inode);
  + if (pos + count  isize)
  + vmtruncate(inode, isize);
  + }
  + }
 
  OK, but wouldn't this be better to be done in the actual direct IO
  functions themselves? Thus you could be sure that you have the 2b case,
  and the code would be less fragile to something changing?
 Ohh, We can't just call vmtruncate() after generic_file_direct_write()
 failure while __generic_file_aio_write_nolock() becase where is no guarantee
 what i_mutex held. In fact all existing fs always invoke 
 __generic_file_aio_write_nolock() with i_mutex held in case of S_ISREG files,
 but this was't explicitly demanded and documented. I've proposed to do it in
 previous versions of this patch, because it this just document current state
 of affairs, but David Chinner wasn't agree with it.

 It seemed like it was documented in the comments that you altered in this
 patch...

 How would such a filesystem that did not hold i_mutex propose to fix the
 problem?

 The burden should be on those filesystems that might not want to hold
 i_mutex here, to solve the problem nicely, rather than generic code to take
 this ugly code.
Ok then what do you think about this version http://lkml.org/lkml/2006/12/18/103
witch was posted almost  month ago :)

 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] mm: move common segment checks to separate helper function (v6)

2007-03-12 Thread Dmitriy Monakhov

Nick Piggin [EMAIL PROTECTED] writes:

 On Mon, Mar 12, 2007 at 10:57:53AM +0300, Dmitriy Monakhov wrote:
 I realy don't want to be annoying by sending this patcheset over and over
 again. If anyone think this patch is realy cappy, please comment what 
 exectly is bad. Thank you.

 Doesn't seem like a bad idea.

 
 Changes:
   - patch was split in two patches.

 +/*
 + * Performs necessary checks before doing a write
 + *
 + * Adjust number of segments and amount of bytes to write.
 + * Returns appropriate error code that caller should return or
 + * zero in case that write should be allowed.
 + */
 +inline int generic_segment_checks(const struct iovec *iov,
 +unsigned long *nr_segs, size_t *count,
 +unsigned long access_flags)

 Make it static and not inline, and the compiler will work it out.
Wow i've just carefully checked and found more functions with duplicating code:
fs/xfs/linux-2.6/xfs_lrw.c:655 xfs_write()
fs/ntfs/file.c:2339 ntfs_file_aio_write_nolock()
So i think nobody will object against exporting generic_segment_checks()
and removing doplicating code.

 This function name doesn't really imply that it returns you the
 nr_segs and count, but that's not a big deal I guess.

 You also don't say that nr_segs should be initialised to the amount
 you which to write, while count must be initialised to zero.

 +{
 +unsigned long   seg;
 +for (seg = 0; seg  *nr_segs; seg++) {
 +const struct iovec *iv = iov[seg];
 +
 +/*
 + * If any segment has a negative length, or the cumulative
 + * length ever wraps negative then return -EINVAL.
 + */
 +*count += iv-iov_len;
 +if (unlikely((ssize_t)(*count|iv-iov_len)  0))
 +return -EINVAL;
 +if (access_ok(access_flags, iv-iov_base, iv-iov_len))
 +continue;

 Why now insert the above test, and put the below statements inside the
 branch? OTOH, that makes it less obviously cp from the others. Maybe
 a subsequent patch.

 +if (seg == 0)
 +return -EFAULT;
 +*nr_segs = seg;
 +*count -= iv-iov_len;  /* This segment is no good */
 +break;
 +}


 You could assign to *count here, once, and remove the requirement
 that the caller initialised it to zero?

 +return 0;
 +}
 +
  /**
   * generic_file_aio_read - generic filesystem read routine
   * @iocb:   kernel I/O control block
 @@ -1180,24 +1213,9 @@ generic_file_aio_read(struct kiocb *iocb, const 
 struct iovec *iov,
  loff_t *ppos = iocb-ki_pos;
  
  count = 0;
 -for (seg = 0; seg  nr_segs; seg++) {
 -const struct iovec *iv = iov[seg];
 -
 -/*
 - * If any segment has a negative length, or the cumulative
 - * length ever wraps negative then return -EINVAL.
 - */
 -count += iv-iov_len;
 -if (unlikely((ssize_t)(count|iv-iov_len)  0))
 -return -EINVAL;
 -if (access_ok(VERIFY_WRITE, iv-iov_base, iv-iov_len))
 -continue;
 -if (seg == 0)
 -return -EFAULT;
 -nr_segs = seg;
 -count -= iv-iov_len;   /* This segment is no good */
 -break;
 -}
 +retval = generic_segment_checks(iov, nr_segs, count, VERIFY_WRITE);
 +if (retval)
 +return retval;
  
  /* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
  if (filp-f_flags  O_DIRECT) {
 @@ -2094,30 +2112,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, 
 const struct iovec *iov,
  size_t ocount;  /* original count */
  size_t count;   /* after file limit checks */
  struct inode*inode = mapping-host;
 -unsigned long   seg;
  loff_t  pos;
  ssize_t written;
  ssize_t err;
  
  ocount = 0;
 -for (seg = 0; seg  nr_segs; seg++) {
 -const struct iovec *iv = iov[seg];
 -
 -/*
 - * If any segment has a negative length, or the cumulative
 - * length ever wraps negative then return -EINVAL.
 - */
 -ocount += iv-iov_len;
 -if (unlikely((ssize_t)(ocount|iv-iov_len)  0))
 -return -EINVAL;
 -if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len))
 -continue;
 -if (seg == 0)
 -return -EFAULT;
 -nr_segs = seg;
 -ocount -= iv-iov_len;  /* This segment is no good */
 -break;
 -}
 +err = generic_segment_checks(iov, nr_segs, ocount, VERIFY_READ);
 +if (err)
 +return err;
  
  count = ocount;
  pos = *ppos;
 -- 
 1.5.0.1
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http

[PATCH 2/2] incorrect direct io error handling (v7)

2007-03-12 Thread Dmitriy Monakhov

Changes against v6:
 - Handle direct_io failure inside generic_file_direct_write() as it was
   recommend by Andrew (during discussion v1), and by Nick (during 
   discussion v6).
 - change comments, make it more clear.
 - one more time check what __generic_file_aio_write_nolock() always called
   under i_mutex for non blkdev files.
Tested with: fsstress, manual direct_io tests

Log:
If generic_file_direct_write() has fail (ENOSPC condition) inside 
__generic_file_aio_write_nolock() it may have instantiated
a few blocks outside i_size. And fsck will complain about wrong i_size
(ext2, ext3 and reiserfs interpret i_size and biggest block difference as 
error),
after fsck will fix error i_size will be increased to the biggest block,
but this blocks contain gurbage from previous write attempt, this is not 
information leak, but its silence file data corruption. This issue affect 
fs regardless the values of blocksize or pagesize, and off corse only for non 
blkdev files.
We need truncate any block beyond i_size after write have failed , do in simular
generic_file_buffered_write() error path. We may safely call vmtruncate() here
because i_mutex always held for non blkdev files.
 
TEST_CASE:
open(/mnt/test/BIG_FILE, O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3
write(3, aaa..., 104857600) = -1 ENOSPC (No space left on device)

#stat /mnt/test/BIG_FILE
  File: `/mnt/test/BIG_FILE'
  Size: 0   Blocks: 110896 IO Block: 1024   regular empty file
file size is less than biggest block idx

Device: fe07h/65031dInode: 14  Links: 1
Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
Access: 2007-01-24 20:03:38.0 +0300
Modify: 2007-01-24 20:03:38.0 +0300
Change: 2007-01-24 20:03:39.0 +0300

#fsck.ext3 -f /dev/VG/test 
e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Inode 14, i_size is 0, should be 56556544.  Fixy? yes
Pass 2: Checking directory structure

 

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 mm/filemap.c |   28 
 1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 8bd1ea4..95d49fe 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1932,8 +1932,10 @@ generic_file_direct_write(struct kiocb *iocb, const 
struct iovec *iov,
/*
 * Sync the fs metadata but not the minor inode changes and
 * of course not the data as we did direct DMA for the IO.
-* i_mutex is held, which protects generic_osync_inode() from
-* livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
+* i_mutex is held in case of DIO_LOCKING, which protects 
+* generic_osync_inode() from livelocking. If it is not held, then 
+* the filesystem must prevent this livelock. AIO O_DIRECT ops 
+* attempt to sync metadata here.
 */
if ((written = 0 || written == -EIOCBQUEUED) 
((file-f_flags  O_SYNC) || IS_SYNC(inode))) {
@@ -2155,8 +2157,26 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, 
const struct iovec *iov,
loff_t endbyte;
ssize_t written_buffered;
 
+   /*
+* In case of non blockdev we may fail to buffered I/O.
+* So i_mutex must be held.
+*/
+   if (!S_ISBLK(inode-i_mode))
+   BUG_ON(!mutex_is_locked(inode-i_mutex));
+
written = generic_file_direct_write(iocb, iov, nr_segs, pos,
ppos, count, ocount);
+   /*
+* If host is not S_ISBLK generic_file_direct_write() may 
+* have instantiated a few blocks outside i_size  files
+* Trim these off again.
+*/
+   if (unlikely(written  0)  !S_ISBLK(inode-i_mode)) {
+   loff_t isize = i_size_read(inode);
+   if (pos + count  isize)
+   vmtruncate(inode, isize);
+   }
+
if (written  0 || written == count)
goto out;
/*
@@ -2261,8 +2281,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const 
struct iovec *iov,
 EXPORT_SYMBOL(generic_file_aio_write);
 
 /*
- * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if 
something
- * went wrong during pagecache shootdown.
+ * Called under i_mutex for writes to S_ISREG files in case of DIO_LOCKING.
+ * Returns -EIO if something went wrong during pagecache shootdown.
  */
 static ssize_t
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
-- 
1.5.0.1


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] mm: move common segment checks to separate helper function (v7)

2007-03-12 Thread Dmitriy Monakhov

Changes against v6
 - remove duplicated code from xfs,ntfs
 - export generic_segment_checks, because it used by xfs,nfs now.
 - change arguments initialization pocily according to Nick's comments.

Tested with: ltp readv/writev tests

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 fs/ntfs/file.c |   21 ++-
 fs/xfs/linux-2.6/xfs_lrw.c |   22 ++--
 include/linux/fs.h |3 ++
 mm/filemap.c   |   83 ---
 4 files changed, 55 insertions(+), 74 deletions(-)

diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index dbbac55..621de36 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2129,28 +2129,13 @@ static ssize_t ntfs_file_aio_write_nolock(struct kiocb 
*iocb,
struct address_space *mapping = file-f_mapping;
struct inode *inode = mapping-host;
loff_t pos;
-   unsigned long seg;
size_t count;   /* after file limit checks */
ssize_t written, err;
 
count = 0;
-   for (seg = 0; seg  nr_segs; seg++) {
-   const struct iovec *iv = iov[seg];
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   count += iv-iov_len;
-   if (unlikely((ssize_t)(count|iv-iov_len)  0))
-   return -EINVAL;
-   if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len))
-   continue;
-   if (!seg)
-   return -EFAULT;
-   nr_segs = seg;
-   count -= iv-iov_len;   /* This segment is no good */
-   break;
-   }
+   err = generic_segment_checks(iov, nr_segs, count, VERIFY_READ);
+   if (err)
+   return err;
pos = *ppos;
vfs_check_frozen(inode-i_sb, SB_FREEZE_WRITE);
/* We can write back this queue in page reclaim. */
diff --git a/fs/xfs/linux-2.6/xfs_lrw.c b/fs/xfs/linux-2.6/xfs_lrw.c
index ff8d64e..558076d 100644
--- a/fs/xfs/linux-2.6/xfs_lrw.c
+++ b/fs/xfs/linux-2.6/xfs_lrw.c
@@ -639,7 +639,6 @@ xfs_write(
xfs_fsize_t isize, new_size;
xfs_iocore_t*io;
bhv_vnode_t *vp;
-   unsigned long   seg;
int iolock;
int eventsent = 0;
bhv_vrwlock_t   locktype;
@@ -652,24 +651,9 @@ xfs_write(
vp = BHV_TO_VNODE(bdp);
xip = XFS_BHVTOI(bdp);
 
-   for (seg = 0; seg  segs; seg++) {
-   const struct iovec *iv = iovp[seg];
-
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   ocount += iv-iov_len;
-   if (unlikely((ssize_t)(ocount|iv-iov_len)  0))
-   return -EINVAL;
-   if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len))
-   continue;
-   if (seg == 0)
-   return -EFAULT;
-   segs = seg;
-   ocount -= iv-iov_len;  /* This segment is no good */
-   break;
-   }
+   error = generic_segment_checks(iovp, segs, ocount, VERIFY_READ);
+   if (error)
+   return error;
 
count = ocount;
pos = *offset;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6a3d22e..3b99450 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1778,6 +1778,9 @@ extern ssize_t generic_file_sendfile(struct file *, 
loff_t *, size_t, read_actor
 extern void do_generic_mapping_read(struct address_space *mapping,
struct file_ra_state *, struct file *,
loff_t *, read_descriptor_t *, 
read_actor_t);
+extern int generic_segment_checks(const struct iovec *iov,
+   unsigned long *nr_segs, size_t *count,
+   unsigned long access_flags);
 
 /* fs/splice.c */
 extern ssize_t generic_file_splice_read(struct file *, loff_t *,
diff --git a/mm/filemap.c b/mm/filemap.c
index 8e1849a..8bd1ea4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1159,6 +1159,46 @@ success:
return size;
 }
 
+/*
+ * Performs necessary checks before doing a write
+ * @iov:   io vector request
+ * @nr_segs:   number of segments in the iovec
+ * @count: number of bytes to write
+ * @access_flags: type of access: %VERIFY_READ or %VERIFY_WRITE
+ *
+ * Adjust number of segments and amount of bytes to write (nr_segs should be
+ * properly initialized first). Returns appropriate error code that caller 
+ * should return or zero in case that write should be allowed.
+ */
+int generic_segment_checks(const struct iovec *iov,
+   unsigned long *nr_segs, size_t *count,
+   unsigned long access_flags)
+{
+   unsigned

[PATCH] driver core: handles kobject_uevent failure while device_add

2007-03-11 Thread Dmitriy Monakhov

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 drivers/base/core.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 142c222..da73012 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -630,8 +630,11 @@ int device_add(struct device *dev)
goto PMError;
if ((error = bus_add_device(dev)))
goto BusError;
-   if (!dev->uevent_suppress)
-   kobject_uevent(>kobj, KOBJ_ADD);
+   if (!dev->uevent_suppress) {
+   error = kobject_uevent(>kobj, KOBJ_ADD);
+   if (error)
+   goto UeventError;
+   }
bus_attach_device(dev);
if (parent)
klist_add_tail(>knode_parent, >klist_children);
@@ -651,6 +654,9 @@ int device_add(struct device *dev)
kfree(class_name);
put_device(dev);
return error;
+
+ UeventError:
+   bus_remove_device(dev);
  BusError:
device_pm_remove(dev);
  PMError:
-- 
1.5.0.1


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] driver core: handle sysfs_op failure while device_add

2007-03-11 Thread Dmitriy Monakhov

depends on: "[PATCH] driver core: fix device_add error path"

 - rearrange error path sequence, in order to make it more correct.
   In fact if initial sequance was doA(); doB(); doC();
   we should undo it with folowing sequance undoC(); undoB(); undoA();
 - handle sysfs_ops failure

This patch kills "ignoring return value of 'sysfs_create_link'"
warning message.

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 drivers/base/core.c |   47 +++
 1 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 245d703..bcfcfad 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -602,22 +602,34 @@ int device_add(struct device *dev)
}
 
if (dev->class) {
-   sysfs_create_link(>kobj, >class->subsys.kset.kobj,
- "subsystem");
+   error = sysfs_create_link(>kobj,
+   >class->subsys.kset.kobj, "subsystem");
+   if (error)
+   goto ueventattrError;
/* If this is not a "fake" compatible device, then create the
 * symlink from the class to the device. */
-   if (dev->kobj.parent != >class->subsys.kset.kobj)
-   sysfs_create_link(>class->subsys.kset.kobj,
- >kobj, dev->bus_id);
+   if (dev->kobj.parent != >class->subsys.kset.kobj) {
+   error = sysfs_create_link(
+   >class->subsys.kset.kobj,
+   >kobj, dev->bus_id);
+   if (error)
+   goto busLinkErr;
+   }
if (parent) {
-   sysfs_create_link(>kobj, >parent->kobj,
-   "device");
+   error = sysfs_create_link(>kobj,
+   >parent->kobj, "device");
+   if (error)
+   goto deviceLinkErr;
 #ifdef CONFIG_SYSFS_DEPRECATED
class_name = make_class_name(dev->class->name,
>kobj);
-   if (class_name)
-   sysfs_create_link(>parent->kobj,
- >kobj, class_name);
+   if (class_name) {
+   error = sysfs_create_link(
+   >parent->kobj,
+   >kobj, class_name);
+   if (error)
+   goto classLinkErr;
+   }
 #endif
}
}
@@ -673,12 +685,6 @@ int device_add(struct device *dev)
}
 
if (dev->class) {
-   sysfs_remove_link(>kobj, "subsystem");
-   /* If this is not a "fake" compatible device, remove the
-* symlink from the class to the device. */
-   if (dev->kobj.parent != >class->subsys.kset.kobj)
-   sysfs_remove_link(>class->subsys.kset.kobj,
- dev->bus_id);
if (parent) {
 #ifdef CONFIG_SYSFS_DEPRECATED
char *class_name = make_class_name(dev->class->name,
@@ -688,8 +694,17 @@ int device_add(struct device *dev)
  class_name);
kfree(class_name);
 #endif
+classLinkErr:
sysfs_remove_link(>kobj, "device");
}
+deviceLinkErr:
+   /* If this is not a "fake" compatible device, remove the
+* symlink from the class to the device. */
+   if (dev->kobj.parent != >class->subsys.kset.kobj)
+   sysfs_remove_link(>class->subsys.kset.kobj,
+ dev->bus_id);
+busLinkErr:
+   sysfs_remove_link(>kobj, "subsystem");
}
  ueventattrError:
device_remove_file(dev, >uevent_attr);
-- 
1.5.0.1

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] driver core: fix device_add error path

2007-03-11 Thread Dmitriy Monakhov

Dmitriy Monakhov <[EMAIL PROTECTED]> writes:

> Greg Kroah-Hartman <[EMAIL PROTECTED]> writes:
>
>> From: James Simmons <[EMAIL PROTECTED]>
>>
>> When a device fails to register the class symlinks where not cleaned up.
>> This left a symlink in the /sys/class/"device"/ directory that pointed
>> to no where. This caused the sysfs_follow_link Oops I reported earlier.
>> This patch cleanups up the symlink. Please apply. Thank you.
>>
>> Signed-Off: James Simmons <[EMAIL PROTECTED]>
>> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
>> ---
>>  drivers/base/core.c |   31 ++-
>>  1 files changed, 30 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index d04fd33..cf2a398 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -637,12 +637,41 @@ int device_add(struct device *dev)
>>   BUS_NOTIFY_DEL_DEVICE, dev);
>>  device_remove_groups(dev);
>>   GroupError:
>> -device_remove_attrs(dev);
>> +device_remove_attrs(dev);
>>   AttrsError:
>>  if (dev->devt_attr) {
>>  device_remove_file(dev, dev->devt_attr);
>>  kfree(dev->devt_attr);
>>  }
>> +
>> +if (dev->class) {
>> +sysfs_remove_link(>kobj, "subsystem");
>> +/* If this is not a "fake" compatible device, remove the
>> + * symlink from the class to the device. */
>> +if (dev->kobj.parent != >class->subsys.kset.kobj)
>> +sysfs_remove_link(>class->subsys.kset.kobj,
>> +  dev->bus_id);
>> +#ifdef CONFIG_SYSFS_DEPRECATED
>> +if (parent) {
>> +char *class_name = make_class_name(dev->class->name,
>> +   >kobj);
>> +if (class_name)
>> +sysfs_remove_link(>parent->kobj,
>> +  class_name);
>> +kfree(class_name);
>> +sysfs_remove_link(>kobj, "device");
>> +}
>> +#endif
>> +
> <<<<< block begin
>> +down(>class->sem);
>> +/* notify any interfaces that the device is now gone */
>> +list_for_each_entry(class_intf, >class->interfaces, node)
>> +if (class_intf->remove_dev)
>> +class_intf->remove_dev(dev, class_intf);
>> +/* remove the device from the class list */
>> +list_del_init(>node);
>> +up(>class->sem);
> <<<<<< block end 
> May be i've missed something, but i'm confuesd a litle bit.
> For example if error happens while device_pm_add() we jump to label "PMError"
> and code from block above will be executed (device will be remove from list),
> but this device wasn't added to this list yet!
I've check it one more time, code it really broken!, and i think i understand 
how
this can happen 
it look like full code chunck was copy-pasted from device_del(), but in case of 
device_add() error path, device was't added to dev->class->devices list yet.
Folowing patch fix this copy-paste error:

 [PATCH] driver core: fix device_add error path

 - At the moment we jump here device was't added to
   dev->class->devices list yet.

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 drivers/base/core.c |9 -
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 142c222..7d2459b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -684,15 +684,6 @@ int device_add(struct device *dev)
 #endif
sysfs_remove_link(>kobj, "device");
}
-
-   down(>class->sem);
-   /* notify any interfaces that the device is now gone */
-   list_for_each_entry(class_intf, >class->interfaces, node)
-   if (class_intf->remove_dev)
-   class_intf->remove_dev(dev, class_intf);
-   /* remove the device from the class list */
-   list_del_init(>node);
-   up(>class->sem);
}
  ueventattrError:
device_remove_file(dev, >uevent_attr);
-- 
1.5.0.1


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] driver core: fix device_add error path

2007-03-11 Thread Dmitriy Monakhov

Dmitriy Monakhov [EMAIL PROTECTED] writes:

 Greg Kroah-Hartman [EMAIL PROTECTED] writes:

 From: James Simmons [EMAIL PROTECTED]

 When a device fails to register the class symlinks where not cleaned up.
 This left a symlink in the /sys/class/device/ directory that pointed
 to no where. This caused the sysfs_follow_link Oops I reported earlier.
 This patch cleanups up the symlink. Please apply. Thank you.

 Signed-Off: James Simmons [EMAIL PROTECTED]
 Signed-off-by: Greg Kroah-Hartman [EMAIL PROTECTED]
 ---
  drivers/base/core.c |   31 ++-
  1 files changed, 30 insertions(+), 1 deletions(-)

 diff --git a/drivers/base/core.c b/drivers/base/core.c
 index d04fd33..cf2a398 100644
 --- a/drivers/base/core.c
 +++ b/drivers/base/core.c
 @@ -637,12 +637,41 @@ int device_add(struct device *dev)
   BUS_NOTIFY_DEL_DEVICE, dev);
  device_remove_groups(dev);
   GroupError:
 -device_remove_attrs(dev);
 +device_remove_attrs(dev);
   AttrsError:
  if (dev-devt_attr) {
  device_remove_file(dev, dev-devt_attr);
  kfree(dev-devt_attr);
  }
 +
 +if (dev-class) {
 +sysfs_remove_link(dev-kobj, subsystem);
 +/* If this is not a fake compatible device, remove the
 + * symlink from the class to the device. */
 +if (dev-kobj.parent != dev-class-subsys.kset.kobj)
 +sysfs_remove_link(dev-class-subsys.kset.kobj,
 +  dev-bus_id);
 +#ifdef CONFIG_SYSFS_DEPRECATED
 +if (parent) {
 +char *class_name = make_class_name(dev-class-name,
 +   dev-kobj);
 +if (class_name)
 +sysfs_remove_link(dev-parent-kobj,
 +  class_name);
 +kfree(class_name);
 +sysfs_remove_link(dev-kobj, device);
 +}
 +#endif
 +
  block begin
 +down(dev-class-sem);
 +/* notify any interfaces that the device is now gone */
 +list_for_each_entry(class_intf, dev-class-interfaces, node)
 +if (class_intf-remove_dev)
 +class_intf-remove_dev(dev, class_intf);
 +/* remove the device from the class list */
 +list_del_init(dev-node);
 +up(dev-class-sem);
  block end 
 May be i've missed something, but i'm confuesd a litle bit.
 For example if error happens while device_pm_add() we jump to label PMError
 and code from block above will be executed (device will be remove from list),
 but this device wasn't added to this list yet!
I've check it one more time, code it really broken!, and i think i understand 
how
this can happen 
it look like full code chunck was copy-pasted from device_del(), but in case of 
device_add() error path, device was't added to dev-class-devices list yet.
Folowing patch fix this copy-paste error:

 [PATCH] driver core: fix device_add error path

 - At the moment we jump here device was't added to
   dev-class-devices list yet.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 drivers/base/core.c |9 -
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 142c222..7d2459b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -684,15 +684,6 @@ int device_add(struct device *dev)
 #endif
sysfs_remove_link(dev-kobj, device);
}
-
-   down(dev-class-sem);
-   /* notify any interfaces that the device is now gone */
-   list_for_each_entry(class_intf, dev-class-interfaces, node)
-   if (class_intf-remove_dev)
-   class_intf-remove_dev(dev, class_intf);
-   /* remove the device from the class list */
-   list_del_init(dev-node);
-   up(dev-class-sem);
}
  ueventattrError:
device_remove_file(dev, dev-uevent_attr);
-- 
1.5.0.1


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] driver core: handles kobject_uevent failure while device_add

2007-03-11 Thread Dmitriy Monakhov

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 drivers/base/core.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 142c222..da73012 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -630,8 +630,11 @@ int device_add(struct device *dev)
goto PMError;
if ((error = bus_add_device(dev)))
goto BusError;
-   if (!dev-uevent_suppress)
-   kobject_uevent(dev-kobj, KOBJ_ADD);
+   if (!dev-uevent_suppress) {
+   error = kobject_uevent(dev-kobj, KOBJ_ADD);
+   if (error)
+   goto UeventError;
+   }
bus_attach_device(dev);
if (parent)
klist_add_tail(dev-knode_parent, parent-klist_children);
@@ -651,6 +654,9 @@ int device_add(struct device *dev)
kfree(class_name);
put_device(dev);
return error;
+
+ UeventError:
+   bus_remove_device(dev);
  BusError:
device_pm_remove(dev);
  PMError:
-- 
1.5.0.1


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] driver core: handle sysfs_op failure while device_add

2007-03-11 Thread Dmitriy Monakhov

depends on: [PATCH] driver core: fix device_add error path

 - rearrange error path sequence, in order to make it more correct.
   In fact if initial sequance was doA(); doB(); doC();
   we should undo it with folowing sequance undoC(); undoB(); undoA();
 - handle sysfs_ops failure

This patch kills ignoring return value of 'sysfs_create_link'
warning message.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 drivers/base/core.c |   47 +++
 1 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 245d703..bcfcfad 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -602,22 +602,34 @@ int device_add(struct device *dev)
}
 
if (dev-class) {
-   sysfs_create_link(dev-kobj, dev-class-subsys.kset.kobj,
- subsystem);
+   error = sysfs_create_link(dev-kobj,
+   dev-class-subsys.kset.kobj, subsystem);
+   if (error)
+   goto ueventattrError;
/* If this is not a fake compatible device, then create the
 * symlink from the class to the device. */
-   if (dev-kobj.parent != dev-class-subsys.kset.kobj)
-   sysfs_create_link(dev-class-subsys.kset.kobj,
- dev-kobj, dev-bus_id);
+   if (dev-kobj.parent != dev-class-subsys.kset.kobj) {
+   error = sysfs_create_link(
+   dev-class-subsys.kset.kobj,
+   dev-kobj, dev-bus_id);
+   if (error)
+   goto busLinkErr;
+   }
if (parent) {
-   sysfs_create_link(dev-kobj, dev-parent-kobj,
-   device);
+   error = sysfs_create_link(dev-kobj,
+   dev-parent-kobj, device);
+   if (error)
+   goto deviceLinkErr;
 #ifdef CONFIG_SYSFS_DEPRECATED
class_name = make_class_name(dev-class-name,
dev-kobj);
-   if (class_name)
-   sysfs_create_link(dev-parent-kobj,
- dev-kobj, class_name);
+   if (class_name) {
+   error = sysfs_create_link(
+   dev-parent-kobj,
+   dev-kobj, class_name);
+   if (error)
+   goto classLinkErr;
+   }
 #endif
}
}
@@ -673,12 +685,6 @@ int device_add(struct device *dev)
}
 
if (dev-class) {
-   sysfs_remove_link(dev-kobj, subsystem);
-   /* If this is not a fake compatible device, remove the
-* symlink from the class to the device. */
-   if (dev-kobj.parent != dev-class-subsys.kset.kobj)
-   sysfs_remove_link(dev-class-subsys.kset.kobj,
- dev-bus_id);
if (parent) {
 #ifdef CONFIG_SYSFS_DEPRECATED
char *class_name = make_class_name(dev-class-name,
@@ -688,8 +694,17 @@ int device_add(struct device *dev)
  class_name);
kfree(class_name);
 #endif
+classLinkErr:
sysfs_remove_link(dev-kobj, device);
}
+deviceLinkErr:
+   /* If this is not a fake compatible device, remove the
+* symlink from the class to the device. */
+   if (dev-kobj.parent != dev-class-subsys.kset.kobj)
+   sysfs_remove_link(dev-class-subsys.kset.kobj,
+ dev-bus_id);
+busLinkErr:
+   sysfs_remove_link(dev-kobj, subsystem);
}
  ueventattrError:
device_remove_file(dev, dev-uevent_attr);
-- 
1.5.0.1

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 8/9] Driver core: fix error by cleanup up symlinks properly

2007-03-10 Thread Dmitriy Monakhov

Greg Kroah-Hartman <[EMAIL PROTECTED]> writes:

> From: James Simmons <[EMAIL PROTECTED]>
>
> When a device fails to register the class symlinks where not cleaned up.
> This left a symlink in the /sys/class/"device"/ directory that pointed
> to no where. This caused the sysfs_follow_link Oops I reported earlier.
> This patch cleanups up the symlink. Please apply. Thank you.
>
> Signed-Off: James Simmons <[EMAIL PROTECTED]>
> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
> ---
>  drivers/base/core.c |   31 ++-
>  1 files changed, 30 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index d04fd33..cf2a398 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -637,12 +637,41 @@ int device_add(struct device *dev)
>BUS_NOTIFY_DEL_DEVICE, dev);
>   device_remove_groups(dev);
>   GroupError:
> - device_remove_attrs(dev);
> + device_remove_attrs(dev);
>   AttrsError:
>   if (dev->devt_attr) {
>   device_remove_file(dev, dev->devt_attr);
>   kfree(dev->devt_attr);
>   }
> +
> + if (dev->class) {
> + sysfs_remove_link(>kobj, "subsystem");
> + /* If this is not a "fake" compatible device, remove the
> +  * symlink from the class to the device. */
> + if (dev->kobj.parent != >class->subsys.kset.kobj)
> + sysfs_remove_link(>class->subsys.kset.kobj,
> +   dev->bus_id);
> +#ifdef CONFIG_SYSFS_DEPRECATED
> + if (parent) {
> + char *class_name = make_class_name(dev->class->name,
> +>kobj);
> + if (class_name)
> + sysfs_remove_link(>parent->kobj,
> +   class_name);
> + kfree(class_name);
> + sysfs_remove_link(>kobj, "device");
> + }
> +#endif
> +
< block begin
> + down(>class->sem);
> + /* notify any interfaces that the device is now gone */
> + list_for_each_entry(class_intf, >class->interfaces, node)
> + if (class_intf->remove_dev)
> + class_intf->remove_dev(dev, class_intf);
> + /* remove the device from the class list */
> + list_del_init(>node);
> + up(>class->sem);
<< block end 
May be i've missed something, but i'm confuesd a litle bit.
For example if error happens while device_pm_add() we jump to label "PMError"
and code from block above will be executed (device will be remove from list),
but this device wasn't added to this list yet!
> + }
>   ueventattrError:
>   device_remove_file(dev, >uevent_attr);
>   attrError:
> -- 
> 1.5.0.1
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] kobject: kobject_shadow_add cleanup

2007-03-10 Thread Dmitriy Monakhov

 - correct function name in comments
 - parrent assignment does metter only inside "if" block, 
   so move it inside this block.
Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>

diff --git a/lib/kobject.c b/lib/kobject.c
index b94f208..e4b477d 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -201,7 +201,7 @@ static void unlink(struct kobject * kobj)
 }
 
 /**
- * kobject_add - add an object to the hierarchy.
+ * kobject_shadow_add - add an object to the hierarchy.
  * @kobj:  object.
  * @shadow_parent: sysfs directory to add to.
  */
@@ -234,8 +234,8 @@ int kobject_shadow_add(struct kobject * kobj, struct dentry 
*shadow_parent)
 
list_add_tail(>entry,>kset->list);
spin_unlock(>kset->list_lock);
+   kobj->parent = parent;
}
-   kobj->parent = parent;
 
error = create_dir(kobj, shadow_parent);
if (error) {

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] kobject: kobject_shadow_add cleanup

2007-03-10 Thread Dmitriy Monakhov

 - correct function name in comments
 - parrent assignment does metter only inside if block, 
   so move it inside this block.
Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]

diff --git a/lib/kobject.c b/lib/kobject.c
index b94f208..e4b477d 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -201,7 +201,7 @@ static void unlink(struct kobject * kobj)
 }
 
 /**
- * kobject_add - add an object to the hierarchy.
+ * kobject_shadow_add - add an object to the hierarchy.
  * @kobj:  object.
  * @shadow_parent: sysfs directory to add to.
  */
@@ -234,8 +234,8 @@ int kobject_shadow_add(struct kobject * kobj, struct dentry 
*shadow_parent)
 
list_add_tail(kobj-entry,kobj-kset-list);
spin_unlock(kobj-kset-list_lock);
+   kobj-parent = parent;
}
-   kobj-parent = parent;
 
error = create_dir(kobj, shadow_parent);
if (error) {

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 8/9] Driver core: fix error by cleanup up symlinks properly

2007-03-10 Thread Dmitriy Monakhov

Greg Kroah-Hartman [EMAIL PROTECTED] writes:

 From: James Simmons [EMAIL PROTECTED]

 When a device fails to register the class symlinks where not cleaned up.
 This left a symlink in the /sys/class/device/ directory that pointed
 to no where. This caused the sysfs_follow_link Oops I reported earlier.
 This patch cleanups up the symlink. Please apply. Thank you.

 Signed-Off: James Simmons [EMAIL PROTECTED]
 Signed-off-by: Greg Kroah-Hartman [EMAIL PROTECTED]
 ---
  drivers/base/core.c |   31 ++-
  1 files changed, 30 insertions(+), 1 deletions(-)

 diff --git a/drivers/base/core.c b/drivers/base/core.c
 index d04fd33..cf2a398 100644
 --- a/drivers/base/core.c
 +++ b/drivers/base/core.c
 @@ -637,12 +637,41 @@ int device_add(struct device *dev)
BUS_NOTIFY_DEL_DEVICE, dev);
   device_remove_groups(dev);
   GroupError:
 - device_remove_attrs(dev);
 + device_remove_attrs(dev);
   AttrsError:
   if (dev-devt_attr) {
   device_remove_file(dev, dev-devt_attr);
   kfree(dev-devt_attr);
   }
 +
 + if (dev-class) {
 + sysfs_remove_link(dev-kobj, subsystem);
 + /* If this is not a fake compatible device, remove the
 +  * symlink from the class to the device. */
 + if (dev-kobj.parent != dev-class-subsys.kset.kobj)
 + sysfs_remove_link(dev-class-subsys.kset.kobj,
 +   dev-bus_id);
 +#ifdef CONFIG_SYSFS_DEPRECATED
 + if (parent) {
 + char *class_name = make_class_name(dev-class-name,
 +dev-kobj);
 + if (class_name)
 + sysfs_remove_link(dev-parent-kobj,
 +   class_name);
 + kfree(class_name);
 + sysfs_remove_link(dev-kobj, device);
 + }
 +#endif
 +
 block begin
 + down(dev-class-sem);
 + /* notify any interfaces that the device is now gone */
 + list_for_each_entry(class_intf, dev-class-interfaces, node)
 + if (class_intf-remove_dev)
 + class_intf-remove_dev(dev, class_intf);
 + /* remove the device from the class list */
 + list_del_init(dev-node);
 + up(dev-class-sem);
 block end 
May be i've missed something, but i'm confuesd a litle bit.
For example if error happens while device_pm_add() we jump to label PMError
and code from block above will be executed (device will be remove from list),
but this device wasn't added to this list yet!
 + }
   ueventattrError:
   device_remove_file(dev, dev-uevent_attr);
   attrError:
 -- 
 1.5.0.1

 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][Patch 2/6] integrity: fs hook placement

2007-03-08 Thread Dmitriy Monakhov

Mimi Zohar <[EMAIL PROTECTED]> writes:

> This patch places calls to the new integrity hooks in the appropriate
> places in the fs directory. It is not meant in any way to be viewed
> as a complete set, but used as a basis for an initial discussion.
>
> Index: linux-2.6.21-rc3-mm2/fs/ext3/xattr_security.c
> ===
> --- linux-2.6.21-rc3-mm2.orig/fs/ext3/xattr_security.c
> +++ linux-2.6.21-rc3-mm2/fs/ext3/xattr_security.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "xattr.h"
>  
>  static size_t
> @@ -58,12 +59,19 @@ ext3_init_security(handle_t *handle, str
>  
>   err = security_inode_init_security(inode, dir, , , );
>   if (err) {
> + /* Even if creation of the security xattr fails, must
> +  * indicate this is a new inode. */
> + integrity_inode_init_integrity(inode, dir, NULL, NULL, NULL);
>   if (err == -EOPNOTSUPP)
>   return 0;
>   return err;
>   }
 block begin
>   err = ext3_xattr_set_handle(handle, inode, EXT3_XATTR_INDEX_SECURITY,
>   name, value, len, 0);
> +
> + integrity_inode_init_integrity(inode, dir, , , );
> + err = ext3_xattr_set_handle(handle, inode, EXT3_XATTR_INDEX_SECURITY,
> + name, value, len, 0);
 block end
May be i've missed something, but i don't get last block.
Why you call ext3_xattr_set_handle() twise?, 
or you just mistyped and  it has to look like this:
<<<   kfree(name);
>   kfree(value);
>   return err;
> Index: linux-2.6.21-rc3-mm2/include/linux/fs.h
> ===
> --- linux-2.6.21-rc3-mm2.orig/include/linux/fs.h
> +++ linux-2.6.21-rc3-mm2/include/linux/fs.h
> @@ -591,6 +591,9 @@ struct inode {
>  #ifdef CONFIG_SECURITY
>   void*i_security;
>  #endif
> +#ifdef CONFIG_INTEGRITY
> + void*i_integrity;
> +#endif
>   void*i_private; /* fs or device private pointer */
>  };
>  
> Index: linux-2.6.21-rc3-mm2/fs/dcache.c
> ===
> --- linux-2.6.21-rc3-mm2.orig/fs/dcache.c
> +++ linux-2.6.21-rc3-mm2/fs/dcache.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -986,6 +987,7 @@ void d_instantiate(struct dentry *entry,
>   entry->d_inode = inode;
>   fsnotify_d_instantiate(entry, inode);
>   spin_unlock(_lock);
> + integrity_d_instantiate(entry, inode);
>   security_d_instantiate(entry, inode);
>  }
>  
> @@ -1050,6 +1052,7 @@ struct dentry *d_instantiate_unique(stru
>   spin_unlock(_lock);
>  
>   if (!result) {
> + integrity_d_instantiate(entry, inode);
>   security_d_instantiate(entry, inode);
>   return NULL;
>   }
> @@ -1187,6 +1190,7 @@ struct dentry *d_splice_alias(struct ino
>   BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
>   fsnotify_d_instantiate(new, inode);
>   spin_unlock(_lock);
> + integrity_d_instantiate(new, inode);
>   security_d_instantiate(new, inode);
>   d_rehash(dentry);
>   d_move(new, dentry);
> @@ -1197,6 +1201,7 @@ struct dentry *d_splice_alias(struct ino
>   dentry->d_inode = inode;
>   fsnotify_d_instantiate(dentry, inode);
>   spin_unlock(_lock);
> + integrity_d_instantiate(dentry, inode);
>   security_d_instantiate(dentry, inode);
>   d_rehash(dentry);
>   }
> @@ -1748,6 +1753,7 @@ found:
>   spin_unlock(_lock);
>  out_nolock:
>   if (actual == dentry) {
> + integrity_d_instantiate(dentry, inode);
>   security_d_instantiate(dentry, inode);
>   return NULL;
>   }
> Index: linux-2.6.21-rc3-mm2/fs/file_table.c
> ===
> --- linux-2.6.21-rc3-mm2.orig/fs/file_table.c
> +++ linux-2.6.21-rc3-mm2/fs/file_table.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -169,6 +170,7 @@ void fastcall __fput(struct file *file)
>   if (file->f_op && file->f_op->release)
>   file->f_op->release(inode, file);
>   security_file_free(file);
> + integrity_file_free(file);
>   if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL))
>   cdev_put(inode->i_cdev);
>   fops_put(file->f_op);
> @@ -240,6 +242,7 @@ void put_filp(struct file *file)
>  {
>   if (atomic_dec_and_test(>f_count)) {
>   security_file_free(file);
> + integrity_file_free(file);
>

Re: [RFC][Patch 2/6] integrity: fs hook placement

2007-03-08 Thread Dmitriy Monakhov

Mimi Zohar [EMAIL PROTECTED] writes:

 This patch places calls to the new integrity hooks in the appropriate
 places in the fs directory. It is not meant in any way to be viewed
 as a complete set, but used as a basis for an initial discussion.

 Index: linux-2.6.21-rc3-mm2/fs/ext3/xattr_security.c
 ===
 --- linux-2.6.21-rc3-mm2.orig/fs/ext3/xattr_security.c
 +++ linux-2.6.21-rc3-mm2/fs/ext3/xattr_security.c
 @@ -10,6 +10,7 @@
  #include linux/ext3_jbd.h
  #include linux/ext3_fs.h
  #include linux/security.h
 +#include linux/integrity.h
  #include xattr.h
  
  static size_t
 @@ -58,12 +59,19 @@ ext3_init_security(handle_t *handle, str
  
   err = security_inode_init_security(inode, dir, name, value, len);
   if (err) {
 + /* Even if creation of the security xattr fails, must
 +  * indicate this is a new inode. */
 + integrity_inode_init_integrity(inode, dir, NULL, NULL, NULL);
   if (err == -EOPNOTSUPP)
   return 0;
   return err;
   }
 block begin
   err = ext3_xattr_set_handle(handle, inode, EXT3_XATTR_INDEX_SECURITY,
   name, value, len, 0);
 +
 + integrity_inode_init_integrity(inode, dir, name, value, len);
 + err = ext3_xattr_set_handle(handle, inode, EXT3_XATTR_INDEX_SECURITY,
 + name, value, len, 0);
 block end
May be i've missed something, but i don't get last block.
Why you call ext3_xattr_set_handle() twise?, 
or you just mistyped and  it has to look like this:
block_begin

+   integrity_inode_init_integrity(inode, dir, name, value, len);
err = ext3_xattr_set_handle(handle, inode, EXT3_XATTR_INDEX_SECURITY,
name, value, len, 0);
block_end
   kfree(name);
   kfree(value);
   return err;
 Index: linux-2.6.21-rc3-mm2/include/linux/fs.h
 ===
 --- linux-2.6.21-rc3-mm2.orig/include/linux/fs.h
 +++ linux-2.6.21-rc3-mm2/include/linux/fs.h
 @@ -591,6 +591,9 @@ struct inode {
  #ifdef CONFIG_SECURITY
   void*i_security;
  #endif
 +#ifdef CONFIG_INTEGRITY
 + void*i_integrity;
 +#endif
   void*i_private; /* fs or device private pointer */
  };
  
 Index: linux-2.6.21-rc3-mm2/fs/dcache.c
 ===
 --- linux-2.6.21-rc3-mm2.orig/fs/dcache.c
 +++ linux-2.6.21-rc3-mm2/fs/dcache.c
 @@ -29,6 +29,7 @@
  #include linux/file.h
  #include asm/uaccess.h
  #include linux/security.h
 +#include linux/integrity.h
  #include linux/seqlock.h
  #include linux/swap.h
  #include linux/bootmem.h
 @@ -986,6 +987,7 @@ void d_instantiate(struct dentry *entry,
   entry-d_inode = inode;
   fsnotify_d_instantiate(entry, inode);
   spin_unlock(dcache_lock);
 + integrity_d_instantiate(entry, inode);
   security_d_instantiate(entry, inode);
  }
  
 @@ -1050,6 +1052,7 @@ struct dentry *d_instantiate_unique(stru
   spin_unlock(dcache_lock);
  
   if (!result) {
 + integrity_d_instantiate(entry, inode);
   security_d_instantiate(entry, inode);
   return NULL;
   }
 @@ -1187,6 +1190,7 @@ struct dentry *d_splice_alias(struct ino
   BUG_ON(!(new-d_flags  DCACHE_DISCONNECTED));
   fsnotify_d_instantiate(new, inode);
   spin_unlock(dcache_lock);
 + integrity_d_instantiate(new, inode);
   security_d_instantiate(new, inode);
   d_rehash(dentry);
   d_move(new, dentry);
 @@ -1197,6 +1201,7 @@ struct dentry *d_splice_alias(struct ino
   dentry-d_inode = inode;
   fsnotify_d_instantiate(dentry, inode);
   spin_unlock(dcache_lock);
 + integrity_d_instantiate(dentry, inode);
   security_d_instantiate(dentry, inode);
   d_rehash(dentry);
   }
 @@ -1748,6 +1753,7 @@ found:
   spin_unlock(dcache_lock);
  out_nolock:
   if (actual == dentry) {
 + integrity_d_instantiate(dentry, inode);
   security_d_instantiate(dentry, inode);
   return NULL;
   }
 Index: linux-2.6.21-rc3-mm2/fs/file_table.c
 ===
 --- linux-2.6.21-rc3-mm2.orig/fs/file_table.c
 +++ linux-2.6.21-rc3-mm2/fs/file_table.c
 @@ -13,6 +13,7 @@
  #include linux/smp_lock.h
  #include linux/fs.h
  #include linux/security.h
 +#include linux/integrity.h
  #include linux/eventpoll.h
  #include linux/rcupdate.h
  #include linux/mount.h
 @@ -169,6 +170,7 @@ void fastcall __fput(struct file *file)
   if (file-f_op  file-f_op-release)
   file-f_op-release(inode, file);

Re: linux-2.6.21-rc2 compile warnings

2007-03-05 Thread Dmitriy Monakhov

"young dave" <[EMAIL PROTECTED]> writes:

> Hi,
> As I compile the linux-2.6.21-rc2  (gcc version 3.4.6), there's some
> warning  messages:
you should probably check -mm tree may be some issues already fixed where.
>
> drivers/video/Kconfig:1622:warning: 'select' used by config symbol
> 'FB_PS3' refer to undefined symbol 'PS3_PS3AV'
>
> fs/partitions/check.c: In function `add_partition':
> fs/partitions/check.c:388: warning: ignoring return value of
> `kobject_add', declared with attribute warn_unused_result
> fs/partitions/check.c:391: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
> fs/partitions/check.c:399: warning: ignoring return value of
> `sysfs_create_file', declared with attribute warn_unused_result
>
> drivers/ata/libata-sff.c: In function `ata_pci_init_one':
> drivers/ata/libata-sff.c:820: warning: ignoring return value of
> `pci_request_region', declared with attribute warn_unused_result
> drivers/ata/libata-sff.c:822: warning: ignoring return value of
> `pci_request_region', declared with attribute warn_unused_result
> drivers/ata/libata-sff.c:824: warning: ignoring return value of
> `pci_request_region', declared with attribute warn_unused_result
>
> drivers/base/core.c: In function `device_add':
> drivers/base/core.c:580: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
> drivers/base/core.c:585: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
> drivers/base/core.c:589: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
> drivers/base/core.c:594: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
> drivers/base/core.c: In function `device_rename':
> drivers/base/core.c:1039: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
> drivers/base/core.c:1049: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
>
> drivers/block/floppy.c: In function `floppy_init':
> drivers/block/floppy.c:4337: warning: ignoring return value of
I've already sent patch, but i wasn't added to -mm yet ( at least
it not rejecdted yet :) ) 
> `device_create_file', declared with attribute warn_unused_result
> drivers/cpufreq/cpufreq.c: In function `cpufreq_add_dev':
> drivers/cpufreq/cpufreq.c:826: warning: ignoring return value of
> `sysfs_create_file', declared with attribute warn_unused_result
> drivers/cpufreq/cpufreq.c:830: warning: ignoring return value of
> `sysfs_create_file', declared with attribute warn_unused_result
> drivers/cpufreq/cpufreq.c:832: warning: ignoring return value of
> `sysfs_create_file', declared with attribute warn_unused_result
>
> drivers/ide/pci/cs5530.c: In function `init_chipset_cs5530':
> drivers/ide/pci/cs5530.c:244: warning: ignoring return value of
> `pci_set_mwi', declared with attribute warn_unused_result
>
> drivers/ide/pci/sc1200.c: In function `sc1200_resume':
> drivers/ide/pci/sc1200.c:397: warning: ignoring return value of
> `pci_enable_device', declared with attribute warn_unused_result
>
> drivers/ide/setup-pci.c: In function `ide_scan_pcibus':
> drivers/ide/setup-pci.c:866: warning: ignoring return value of
> `__pci_register_driver', declared with attribute warn_unused_result
>
> drivers/input/mousedev.c: In function `mousedev_connect':
> drivers/input/mousedev.c:664: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
>
> drivers/input/joydev.c: In function `joydev_connect':
> drivers/input/joydev.c:542: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
>
>
> drivers/input/evdev.c: In function `evdev_connect':
> drivers/input/evdev.c:653: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
> drivers/message/i2o/device.c: In function `i2o_device_add':
> drivers/message/i2o/device.c:247: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
> drivers/message/i2o/device.c:254: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
> drivers/message/i2o/device.c:260: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
> drivers/message/i2o/device.c:267: warning: ignoring return value of
> `sysfs_create_link', declared with attribute warn_unused_result
>  CC  drivers/message/i2o/debug.
>
> drivers/net/sk98lin/skge.c: In function `skge_resume':
> drivers/net/sk98lin/skge.c:5128: warning: ignoring return value of
> `pci_enable_device', declared with attribute warn_unused_result
> drivers/net/tulip/xircom_tulip_cb.c: In function `outl_CSR6':
> drivers/net/tulip/xircom_tulip_cb.c:348: warning: `save_flags' is
> deprecated (declared at

Re: linux-2.6.21-rc2 compile warnings

2007-03-05 Thread Dmitriy Monakhov

young dave [EMAIL PROTECTED] writes:

 Hi,
 As I compile the linux-2.6.21-rc2  (gcc version 3.4.6), there's some
 warning  messages:
you should probably check -mm tree may be some issues already fixed where.

 drivers/video/Kconfig:1622:warning: 'select' used by config symbol
 'FB_PS3' refer to undefined symbol 'PS3_PS3AV'

 fs/partitions/check.c: In function `add_partition':
 fs/partitions/check.c:388: warning: ignoring return value of
 `kobject_add', declared with attribute warn_unused_result
 fs/partitions/check.c:391: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result
 fs/partitions/check.c:399: warning: ignoring return value of
 `sysfs_create_file', declared with attribute warn_unused_result

 drivers/ata/libata-sff.c: In function `ata_pci_init_one':
 drivers/ata/libata-sff.c:820: warning: ignoring return value of
 `pci_request_region', declared with attribute warn_unused_result
 drivers/ata/libata-sff.c:822: warning: ignoring return value of
 `pci_request_region', declared with attribute warn_unused_result
 drivers/ata/libata-sff.c:824: warning: ignoring return value of
 `pci_request_region', declared with attribute warn_unused_result

 drivers/base/core.c: In function `device_add':
 drivers/base/core.c:580: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result
 drivers/base/core.c:585: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result
 drivers/base/core.c:589: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result
 drivers/base/core.c:594: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result
 drivers/base/core.c: In function `device_rename':
 drivers/base/core.c:1039: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result
 drivers/base/core.c:1049: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result

 drivers/block/floppy.c: In function `floppy_init':
 drivers/block/floppy.c:4337: warning: ignoring return value of
I've already sent patch, but i wasn't added to -mm yet ( at least
it not rejecdted yet :) ) 
 `device_create_file', declared with attribute warn_unused_result
 drivers/cpufreq/cpufreq.c: In function `cpufreq_add_dev':
 drivers/cpufreq/cpufreq.c:826: warning: ignoring return value of
 `sysfs_create_file', declared with attribute warn_unused_result
 drivers/cpufreq/cpufreq.c:830: warning: ignoring return value of
 `sysfs_create_file', declared with attribute warn_unused_result
 drivers/cpufreq/cpufreq.c:832: warning: ignoring return value of
 `sysfs_create_file', declared with attribute warn_unused_result

 drivers/ide/pci/cs5530.c: In function `init_chipset_cs5530':
 drivers/ide/pci/cs5530.c:244: warning: ignoring return value of
 `pci_set_mwi', declared with attribute warn_unused_result

 drivers/ide/pci/sc1200.c: In function `sc1200_resume':
 drivers/ide/pci/sc1200.c:397: warning: ignoring return value of
 `pci_enable_device', declared with attribute warn_unused_result

 drivers/ide/setup-pci.c: In function `ide_scan_pcibus':
 drivers/ide/setup-pci.c:866: warning: ignoring return value of
 `__pci_register_driver', declared with attribute warn_unused_result

 drivers/input/mousedev.c: In function `mousedev_connect':
 drivers/input/mousedev.c:664: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result

 drivers/input/joydev.c: In function `joydev_connect':
 drivers/input/joydev.c:542: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result


 drivers/input/evdev.c: In function `evdev_connect':
 drivers/input/evdev.c:653: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result
 drivers/message/i2o/device.c: In function `i2o_device_add':
 drivers/message/i2o/device.c:247: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result
 drivers/message/i2o/device.c:254: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result
 drivers/message/i2o/device.c:260: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result
 drivers/message/i2o/device.c:267: warning: ignoring return value of
 `sysfs_create_link', declared with attribute warn_unused_result
  CC  drivers/message/i2o/debug.

 drivers/net/sk98lin/skge.c: In function `skge_resume':
 drivers/net/sk98lin/skge.c:5128: warning: ignoring return value of
 `pci_enable_device', declared with attribute warn_unused_result
 drivers/net/tulip/xircom_tulip_cb.c: In function `outl_CSR6':
 drivers/net/tulip/xircom_tulip_cb.c:348: warning: `save_flags' is
 deprecated (declared at include/linux/interrupt.h:214)
 drivers/net/tulip/xircom_tulip_cb.c:349: warning: `cli' is deprecated

Re: [PATCH] ext3: dirindex error pointer issues

2007-03-04 Thread Dmitriy Monakhov

Andreas Dilger <[EMAIL PROTECTED]> writes:

> On Mar 04, 2007  17:18 +0300, Dmitriy Monakhov wrote:
>>  - ext3_dx_find_entry() exit with out setting proper error pointer
>>  - do_split() exit with out setting proper error pointer
>>it is realy painful because many callers contain folowing code:
>>de = do_split(handle,dir, , frame, , );
>>if (!(de))
>> return retval;
>><<< WOW retval wasn't changed by do_split(), so caller failed
>><<< but return SUCCESS :)
>>  - Rearrange do_split() error path. Current error path is realy ugly, all
>>this up and down jump stuff doesn't make code easy to understand.
>> 
>> Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
>> ---
>>  fs/ext3/namei.c |   26 +++---
>>  fs/ext4/namei.c |   26 +++---
>>  2 files changed, 30 insertions(+), 22 deletions(-)
>> 
>> diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
>> index 49159f1..1a52586 100644
>> --- a/fs/ext3/namei.c
>> +++ b/fs/ext3/namei.c
>> @@ -969,6 +969,7 @@ static struct buffer_head * ext3_dx_find_entry(struct 
>> dentry *dentry,
>>(block<>+((char *)de - bh->b_data))) {
>>  brelse (bh);
>> +*err = ERR_BAD_DX_DIR;
>>  goto errout;
>>  }
>>  *res_dir = de;
>
> I have no objection to this change (by principle of least surprise) but
> I don't know if it fixes a real problem.  The one caller of this function
> treats ERR_BAD_DX_DIR the same as bh == NULL.
  Execly  ERR_BAD_DX_DIR treats  the same as bh == NULL and let's look at
  callers code: 
  struct buffer_head * ext3_find_entry(..)
  {
  ...
  bh = ext3_dx_find_entry(dentry, res_dir, );
/*
 * On success, or if the error was file not found,
 * return.  Otherwise, fall back to doing a search the
 * old fashioned way.
 */
if (bh || (err != ERR_BAD_DX_DIR))
  <<<<< after ext3_dx_find_entry() has failed , but don't set err pointer 
  <<<<< we get  (bh == NULL, err == 0) , and then just return NULL.
return bh;
 ...
 }

>
>> @@ -1134,9 +1135,9 @@ static struct ext3_dir_entry_2 *do_split(handle_t 
>> *handle, struct inode *dir,
>>  char *data1 = (*bh)->b_data, *data2;
>>  unsigned split;
>>  struct ext3_dir_entry_2 *de = NULL, *de2;
>> -int err;
>> +int err = 0;
>>  
>> -bh2 = ext3_append (handle, dir, , error);
>> +bh2 = ext3_append (handle, dir, , );
>
> Why don't we just remove "err" entirely and use *error to avoid any risk
> of setting err and not returning it in error?  Also reduces stack usage
> that tiny little bit.
I've chosen this approuch becouse of:
1) this approuch was used in block allocation code.
2) this approuch prevent  folowing errors:
  *error = do_some_function(.)
   /* some code*/
  if(error) 
 we do this checking as we do it thousands times before, but forget
 what error it pointer here. And compiler can't warn us here because 
 it is absolutely legal operation. At least it is better to rename 
 error to errorp.

Btw: I've thought about adding assertations to error path witch may check
what errp pointer was properly initialized after error happends.
folowing code is draft only:
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -63,6 +63,7 @@ static struct buffer_head *ext3_append(handle_t *handle,
EXT3_I(inode)->i_disksize = inode->i_size;
ext3_journal_get_write_access(handle,bh);
}
+   assert(bh || *err);
return bh;
 }
 
@@ -433,6 +434,7 @@ fail2:
frame--;
}
 fail:
+   assert(*err != 0);
return NULL;
 }

>
>
> In ext3_dx_add_entry() it appears like we should "goto journal_error"
> to report an error after the failed call to do_split(), instead of just
> "goto cleanup" so that we report an error in the filesystem.  Not 100%
> sure of this.
do_split() already invoked ext3_std_error() on it's "journal_error" path.

>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ext3: dirindex error pointer issues

2007-03-04 Thread Dmitriy Monakhov

 - ext3_dx_find_entry() exit with out setting proper error pointer
 - do_split() exit with out setting proper error pointer
   it is realy painful because many callers contain folowing code:
   de = do_split(handle,dir, , frame, , );
   if (!(de))
return retval;
   <<< WOW retval wasn't changed by do_split(), so caller failed
   <<< but return SUCCESS :)
 - Rearrange do_split() error path. Current error path is realy ugly, all
   this up and down jump stuff doesn't make code easy to understand.

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 fs/ext3/namei.c |   26 +++---
 fs/ext4/namei.c |   26 +++---
 2 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index 49159f1..1a52586 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -969,6 +969,7 @@ static struct buffer_head * ext3_dx_find_entry(struct 
dentry *dentry,
  (blockb_data, *data2;
unsigned split;
struct ext3_dir_entry_2 *de = NULL, *de2;
-   int err;
+   int err = 0;
 
-   bh2 = ext3_append (handle, dir, , error);
+   bh2 = ext3_append (handle, dir, , );
if (!(bh2)) {
brelse(*bh);
*bh = NULL;
@@ -1145,14 +1146,9 @@ static struct ext3_dir_entry_2 *do_split(handle_t 
*handle, struct inode *dir,
 
BUFFER_TRACE(*bh, "get_write_access");
err = ext3_journal_get_write_access(handle, *bh);
-   if (err) {
-   journal_error:
-   brelse(*bh);
-   brelse(bh2);
-   *bh = NULL;
-   ext3_std_error(dir->i_sb, err);
-   goto errout;
-   }
+   if (err)
+   goto journal_error;
+
BUFFER_TRACE(frame->bh, "get_write_access");
err = ext3_journal_get_write_access(handle, frame->bh);
if (err)
@@ -1195,8 +1191,16 @@ static struct ext3_dir_entry_2 *do_split(handle_t 
*handle, struct inode *dir,
goto journal_error;
brelse (bh2);
dxtrace(dx_show_index ("frame", frame->entries));
-errout:
return de;
+
+journal_error:
+   brelse(*bh);
+   brelse(bh2);
+   *bh = NULL;
+errout:
+   ext3_std_error(dir->i_sb, err);
+   *error = err;
+   return NULL;
 }
 #endif
 
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index e7e1d79..682a3d7 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -967,6 +967,7 @@ static struct buffer_head * ext4_dx_find_entry(struct 
dentry *dentry,
  (blockb_data, *data2;
unsigned split;
struct ext4_dir_entry_2 *de = NULL, *de2;
-   int err;
+   int err = 0;
 
-   bh2 = ext4_append (handle, dir, , error);
+   bh2 = ext4_append (handle, dir, , );
if (!(bh2)) {
brelse(*bh);
*bh = NULL;
@@ -1143,14 +1144,9 @@ static struct ext4_dir_entry_2 *do_split(handle_t 
*handle, struct inode *dir,
 
BUFFER_TRACE(*bh, "get_write_access");
err = ext4_journal_get_write_access(handle, *bh);
-   if (err) {
-   journal_error:
-   brelse(*bh);
-   brelse(bh2);
-   *bh = NULL;
-   ext4_std_error(dir->i_sb, err);
-   goto errout;
-   }
+   if (err)
+   goto journal_error;
+
BUFFER_TRACE(frame->bh, "get_write_access");
err = ext4_journal_get_write_access(handle, frame->bh);
if (err)
@@ -1193,8 +1189,16 @@ static struct ext4_dir_entry_2 *do_split(handle_t 
*handle, struct inode *dir,
goto journal_error;
brelse (bh2);
dxtrace(dx_show_index ("frame", frame->entries));
-errout:
return de;
+
+journal_error:
+   brelse(*bh);
+   brelse(bh2);
+   *bh = NULL;
+errout:
+   ext4_std_error(dir->i_sb, err);
+   *error = err;
+   return NULL;
 }
 #endif
 
-- 
1.5.0.1


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] udf: possible null pointer dereference while load_partition

2007-03-04 Thread Dmitriy Monakhov


sb_read may return NULL, let's explicitly check it.
Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>

diff --git a/fs/udf/super.c b/fs/udf/super.c
index f4b3265..951b85d 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -1407,6 +1407,8 @@ udf_load_partition(struct super_block *sb, kernel_lb_addr 
*fileset)
 
pos = udf_block_map(UDF_SB_VAT(sb), 0);
bh = sb_bread(sb, pos);
+   if (!bh)
+   return 1;
UDF_SB_TYPEVIRT(sb,i).s_start_offset =
le16_to_cpu(((struct 
virtualAllocationTable20 *)bh->b_data + 
udf_ext0_offset(UDF_SB_VAT(sb)))->lengthHeader) +

udf_ext0_offset(UDF_SB_VAT(sb));

[patch] udf: possible null pointer dereference while load_partition

2007-03-04 Thread Dmitriy Monakhov


sb_read may return NULL, let's explicitly check it.
Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]

diff --git a/fs/udf/super.c b/fs/udf/super.c
index f4b3265..951b85d 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -1407,6 +1407,8 @@ udf_load_partition(struct super_block *sb, kernel_lb_addr 
*fileset)
 
pos = udf_block_map(UDF_SB_VAT(sb), 0);
bh = sb_bread(sb, pos);
+   if (!bh)
+   return 1;
UDF_SB_TYPEVIRT(sb,i).s_start_offset =
le16_to_cpu(((struct 
virtualAllocationTable20 *)bh-b_data + 
udf_ext0_offset(UDF_SB_VAT(sb)))-lengthHeader) +

udf_ext0_offset(UDF_SB_VAT(sb));

[PATCH] ext3: dirindex error pointer issues

2007-03-04 Thread Dmitriy Monakhov

 - ext3_dx_find_entry() exit with out setting proper error pointer
 - do_split() exit with out setting proper error pointer
   it is realy painful because many callers contain folowing code:
   de = do_split(handle,dir, bh, frame, hinfo, retval);
   if (!(de))
return retval;
WOW retval wasn't changed by do_split(), so caller failed
but return SUCCESS :)
 - Rearrange do_split() error path. Current error path is realy ugly, all
   this up and down jump stuff doesn't make code easy to understand.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 fs/ext3/namei.c |   26 +++---
 fs/ext4/namei.c |   26 +++---
 2 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index 49159f1..1a52586 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -969,6 +969,7 @@ static struct buffer_head * ext3_dx_find_entry(struct 
dentry *dentry,
  (blockEXT3_BLOCK_SIZE_BITS(sb))
  +((char *)de - bh-b_data))) {
brelse (bh);
+   *err = ERR_BAD_DX_DIR;
goto errout;
}
*res_dir = de;
@@ -1134,9 +1135,9 @@ static struct ext3_dir_entry_2 *do_split(handle_t 
*handle, struct inode *dir,
char *data1 = (*bh)-b_data, *data2;
unsigned split;
struct ext3_dir_entry_2 *de = NULL, *de2;
-   int err;
+   int err = 0;
 
-   bh2 = ext3_append (handle, dir, newblock, error);
+   bh2 = ext3_append (handle, dir, newblock, err);
if (!(bh2)) {
brelse(*bh);
*bh = NULL;
@@ -1145,14 +1146,9 @@ static struct ext3_dir_entry_2 *do_split(handle_t 
*handle, struct inode *dir,
 
BUFFER_TRACE(*bh, get_write_access);
err = ext3_journal_get_write_access(handle, *bh);
-   if (err) {
-   journal_error:
-   brelse(*bh);
-   brelse(bh2);
-   *bh = NULL;
-   ext3_std_error(dir-i_sb, err);
-   goto errout;
-   }
+   if (err)
+   goto journal_error;
+
BUFFER_TRACE(frame-bh, get_write_access);
err = ext3_journal_get_write_access(handle, frame-bh);
if (err)
@@ -1195,8 +1191,16 @@ static struct ext3_dir_entry_2 *do_split(handle_t 
*handle, struct inode *dir,
goto journal_error;
brelse (bh2);
dxtrace(dx_show_index (frame, frame-entries));
-errout:
return de;
+
+journal_error:
+   brelse(*bh);
+   brelse(bh2);
+   *bh = NULL;
+errout:
+   ext3_std_error(dir-i_sb, err);
+   *error = err;
+   return NULL;
 }
 #endif
 
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index e7e1d79..682a3d7 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -967,6 +967,7 @@ static struct buffer_head * ext4_dx_find_entry(struct 
dentry *dentry,
  (blockEXT4_BLOCK_SIZE_BITS(sb))
  +((char *)de - bh-b_data))) {
brelse (bh);
+   *err = ERR_BAD_DX_DIR;
goto errout;
}
*res_dir = de;
@@ -1132,9 +1133,9 @@ static struct ext4_dir_entry_2 *do_split(handle_t 
*handle, struct inode *dir,
char *data1 = (*bh)-b_data, *data2;
unsigned split;
struct ext4_dir_entry_2 *de = NULL, *de2;
-   int err;
+   int err = 0;
 
-   bh2 = ext4_append (handle, dir, newblock, error);
+   bh2 = ext4_append (handle, dir, newblock, err);
if (!(bh2)) {
brelse(*bh);
*bh = NULL;
@@ -1143,14 +1144,9 @@ static struct ext4_dir_entry_2 *do_split(handle_t 
*handle, struct inode *dir,
 
BUFFER_TRACE(*bh, get_write_access);
err = ext4_journal_get_write_access(handle, *bh);
-   if (err) {
-   journal_error:
-   brelse(*bh);
-   brelse(bh2);
-   *bh = NULL;
-   ext4_std_error(dir-i_sb, err);
-   goto errout;
-   }
+   if (err)
+   goto journal_error;
+
BUFFER_TRACE(frame-bh, get_write_access);
err = ext4_journal_get_write_access(handle, frame-bh);
if (err)
@@ -1193,8 +1189,16 @@ static struct ext4_dir_entry_2 *do_split(handle_t 
*handle, struct inode *dir,
goto journal_error;
brelse (bh2);
dxtrace(dx_show_index (frame, frame-entries));
-errout:
return de;
+
+journal_error:
+   brelse(*bh);
+   brelse(bh2);
+   *bh = NULL;
+errout:
+   ext4_std_error(dir-i_sb, err);
+   *error = err;
+   return NULL;
 }
 #endif
 
-- 
1.5.0.1


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to

Re: [PATCH] ext3: dirindex error pointer issues

2007-03-04 Thread Dmitriy Monakhov

Andreas Dilger [EMAIL PROTECTED] writes:

 On Mar 04, 2007  17:18 +0300, Dmitriy Monakhov wrote:
  - ext3_dx_find_entry() exit with out setting proper error pointer
  - do_split() exit with out setting proper error pointer
it is realy painful because many callers contain folowing code:
de = do_split(handle,dir, bh, frame, hinfo, retval);
if (!(de))
 return retval;
 WOW retval wasn't changed by do_split(), so caller failed
 but return SUCCESS :)
  - Rearrange do_split() error path. Current error path is realy ugly, all
this up and down jump stuff doesn't make code easy to understand.
 
 Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
 ---
  fs/ext3/namei.c |   26 +++---
  fs/ext4/namei.c |   26 +++---
  2 files changed, 30 insertions(+), 22 deletions(-)
 
 diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
 index 49159f1..1a52586 100644
 --- a/fs/ext3/namei.c
 +++ b/fs/ext3/namei.c
 @@ -969,6 +969,7 @@ static struct buffer_head * ext3_dx_find_entry(struct 
 dentry *dentry,
(blockEXT3_BLOCK_SIZE_BITS(sb))
+((char *)de - bh-b_data))) {
  brelse (bh);
 +*err = ERR_BAD_DX_DIR;
  goto errout;
  }
  *res_dir = de;

 I have no objection to this change (by principle of least surprise) but
 I don't know if it fixes a real problem.  The one caller of this function
 treats ERR_BAD_DX_DIR the same as bh == NULL.
  Execly  ERR_BAD_DX_DIR treats  the same as bh == NULL and let's look at
  callers code: 
  struct buffer_head * ext3_find_entry(..)
  {
  ...
  bh = ext3_dx_find_entry(dentry, res_dir, err);
/*
 * On success, or if the error was file not found,
 * return.  Otherwise, fall back to doing a search the
 * old fashioned way.
 */
if (bh || (err != ERR_BAD_DX_DIR))
   after ext3_dx_find_entry() has failed , but don't set err pointer 
   we get  (bh == NULL, err == 0) , and then just return NULL.
return bh;
 ...
 }


 @@ -1134,9 +1135,9 @@ static struct ext3_dir_entry_2 *do_split(handle_t 
 *handle, struct inode *dir,
  char *data1 = (*bh)-b_data, *data2;
  unsigned split;
  struct ext3_dir_entry_2 *de = NULL, *de2;
 -int err;
 +int err = 0;
  
 -bh2 = ext3_append (handle, dir, newblock, error);
 +bh2 = ext3_append (handle, dir, newblock, err);

 Why don't we just remove err entirely and use *error to avoid any risk
 of setting err and not returning it in error?  Also reduces stack usage
 that tiny little bit.
I've chosen this approuch becouse of:
1) this approuch was used in block allocation code.
2) this approuch prevent  folowing errors:
  *error = do_some_function(.)
   /* some code*/
  if(error) 
 we do this checking as we do it thousands times before, but forget
 what error it pointer here. And compiler can't warn us here because 
 it is absolutely legal operation. At least it is better to rename 
 error to errorp.

Btw: I've thought about adding assertations to error path witch may check
what errp pointer was properly initialized after error happends.
folowing code is draft only:
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -63,6 +63,7 @@ static struct buffer_head *ext3_append(handle_t *handle,
EXT3_I(inode)-i_disksize = inode-i_size;
ext3_journal_get_write_access(handle,bh);
}
+   assert(bh || *err);
return bh;
 }
 
@@ -433,6 +434,7 @@ fail2:
frame--;
}
 fail:
+   assert(*err != 0);
return NULL;
 }



 In ext3_dx_add_entry() it appears like we should goto journal_error
 to report an error after the failed call to do_split(), instead of just
 goto cleanup so that we report an error in the filesystem.  Not 100%
 sure of this.
do_split() already invoked ext3_std_error() on it's journal_error path.


 Cheers, Andreas
 --
 Andreas Dilger
 Principal Software Engineer
 Cluster File Systems, Inc.

 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] kobject: new_device->kref wasn't putted in kobject_move()

2007-03-03 Thread Dmitriy Monakhov

Dmitriy Monakhov <[EMAIL PROTECTED]> writes:

> Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
>
> diff --git a/lib/kobject.c b/lib/kobject.c
> index b94f208..b11f7b2 100644
> --- a/lib/kobject.c
> +++ b/lib/kobject.c
> @@ -432,6 +432,7 @@ int kobject_move(struct kobject *kobj, struct kobject 
> *new_parent)
>   kobject_put(old_parent);
>   kobject_uevent_env(kobj, KOBJ_MOVE, envp);
>  out:
> + kobject_put(new_parent);
>   kobject_put(kobj);
>   kfree(devpath_string);
>   kfree(devpath);
OOps i'm realy sorry, by occasion patch was incomplete :(
The updated patch version following:

  [PATCH] kobject: new_device->kref wasn't putted after error in kobject_move()
  If error happen we jump to "out" label, in this case new_device not yet 
  became the parent but it wasn't putted.
 
  Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>

diff --git a/lib/kobject.c b/lib/kobject.c
index b94f208..a6c9a06 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -429,9 +429,11 @@ int kobject_move(struct kobject *kobj, struct kobject 
*new_parent)
goto out;
old_parent = kobj->parent;
kobj->parent = new_parent;
+   new_parent = NULL;
kobject_put(old_parent);
kobject_uevent_env(kobj, KOBJ_MOVE, envp);
 out:
+   kobject_put(new_parent);
kobject_put(kobj);
kfree(devpath_string);
kfree(devpath);


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] kobject: new_device->kref wasn't putted in kobject_move()

2007-03-03 Thread Dmitriy Monakhov

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>

diff --git a/lib/kobject.c b/lib/kobject.c
index b94f208..b11f7b2 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -432,6 +432,7 @@ int kobject_move(struct kobject *kobj, struct kobject 
*new_parent)
kobject_put(old_parent);
kobject_uevent_env(kobj, KOBJ_MOVE, envp);
 out:
+   kobject_put(new_parent);
kobject_put(kobj);
kfree(devpath_string);
kfree(devpath);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] kobject: new_device-kref wasn't putted in kobject_move()

2007-03-03 Thread Dmitriy Monakhov

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]

diff --git a/lib/kobject.c b/lib/kobject.c
index b94f208..b11f7b2 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -432,6 +432,7 @@ int kobject_move(struct kobject *kobj, struct kobject 
*new_parent)
kobject_put(old_parent);
kobject_uevent_env(kobj, KOBJ_MOVE, envp);
 out:
+   kobject_put(new_parent);
kobject_put(kobj);
kfree(devpath_string);
kfree(devpath);

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] kobject: new_device-kref wasn't putted in kobject_move()

2007-03-03 Thread Dmitriy Monakhov

Dmitriy Monakhov [EMAIL PROTECTED] writes:

 Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]

 diff --git a/lib/kobject.c b/lib/kobject.c
 index b94f208..b11f7b2 100644
 --- a/lib/kobject.c
 +++ b/lib/kobject.c
 @@ -432,6 +432,7 @@ int kobject_move(struct kobject *kobj, struct kobject 
 *new_parent)
   kobject_put(old_parent);
   kobject_uevent_env(kobj, KOBJ_MOVE, envp);
  out:
 + kobject_put(new_parent);
   kobject_put(kobj);
   kfree(devpath_string);
   kfree(devpath);
OOps i'm realy sorry, by occasion patch was incomplete :(
The updated patch version following:

  [PATCH] kobject: new_device-kref wasn't putted after error in kobject_move()
  If error happen we jump to out label, in this case new_device not yet 
  became the parent but it wasn't putted.
 
  Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]

diff --git a/lib/kobject.c b/lib/kobject.c
index b94f208..a6c9a06 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -429,9 +429,11 @@ int kobject_move(struct kobject *kobj, struct kobject 
*new_parent)
goto out;
old_parent = kobj-parent;
kobj-parent = new_parent;
+   new_parent = NULL;
kobject_put(old_parent);
kobject_uevent_env(kobj, KOBJ_MOVE, envp);
 out:
+   kobject_put(new_parent);
kobject_put(kobj);
kfree(devpath_string);
kfree(devpath);


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] floppy: handle device_create_file() failure while init

2007-03-02 Thread Dmitriy Monakhov

This patch kills the "ignoring return value of 'device_create_file'" 
warning message.
Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 drivers/block/floppy.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 567c630..dfdabc3 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4333,7 +4333,10 @@ static int __init floppy_init(void)
if (err)
goto out_flush_work;
 
-   device_create_file(_device[drive].dev,_attr_cmos);
+   err = 
device_create_file(_device[drive].dev,_attr_cmos);
+   if (err)
+   goto out_unreg_platform_dev;
+
/* to be cleaned up... */
disks[drive]->private_data = (void *)(long)drive;
disks[drive]->queue = floppy_queue;
@@ -4344,6 +4347,8 @@ static int __init floppy_init(void)
 
return 0;
 
+out_unreg_platform_dev:
+   platform_device_unregister(_device[drive]);
 out_flush_work:
flush_scheduled_work();
if (usage_count)
-- 
1.4.4.2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] floppy: handle device_create_file() failure while init

2007-03-02 Thread Dmitriy Monakhov

This patch kills the ignoring return value of 'device_create_file' 
warning message.
Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 drivers/block/floppy.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 567c630..dfdabc3 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4333,7 +4333,10 @@ static int __init floppy_init(void)
if (err)
goto out_flush_work;
 
-   device_create_file(floppy_device[drive].dev,dev_attr_cmos);
+   err = 
device_create_file(floppy_device[drive].dev,dev_attr_cmos);
+   if (err)
+   goto out_unreg_platform_dev;
+
/* to be cleaned up... */
disks[drive]-private_data = (void *)(long)drive;
disks[drive]-queue = floppy_queue;
@@ -4344,6 +4347,8 @@ static int __init floppy_init(void)
 
return 0;
 
+out_unreg_platform_dev:
+   platform_device_unregister(floppy_device[drive]);
 out_flush_work:
flush_scheduled_work();
if (usage_count)
-- 
1.4.4.2

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ecryptfs: check xattr operation support fix

2007-02-28 Thread Dmitriy Monakhov


  - ecryptfs_write_inode_size_to_metadata() error code was ignored.
  - i_op->setxattr() must be supported by lower fs because used below.

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 fs/ecryptfs/inode.c |6 +++---
 fs/ecryptfs/mmap.c  |3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 27fd14a..9ccefad 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -168,9 +168,9 @@ static int grow_file(struct dentry *ecryptfs_dentry, struct 
file *lower_file,
goto out;
}
i_size_write(inode, 0);
-   ecryptfs_write_inode_size_to_metadata(lower_file, lower_inode, inode,
- ecryptfs_dentry,
- ECRYPTFS_LOWER_I_MUTEX_NOT_HELD);
+   rc = ecryptfs_write_inode_size_to_metadata(lower_file, lower_inode,
+   inode, ecryptfs_dentry,
+   ECRYPTFS_LOWER_I_MUTEX_NOT_HELD);
ecryptfs_inode_to_private(inode)->crypt_stat.flags |= ECRYPTFS_NEW_FILE;
 out:
return rc;
diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
index 1e5d2ba..416985f 100644
--- a/fs/ecryptfs/mmap.c
+++ b/fs/ecryptfs/mmap.c
@@ -491,7 +491,8 @@ static int ecryptfs_write_inode_size_to_xattr(struct inode 
*lower_inode,
goto out;
}
lower_dentry = ecryptfs_dentry_to_lower(ecryptfs_dentry);
-   if (!lower_dentry->d_inode->i_op->getxattr) {
+   if (!lower_dentry->d_inode->i_op->getxattr ||
+   !lower_dentry->d_inode->i_op->setxattr) {
printk(KERN_WARNING
   "No support for setting xattr in lower filesystem\n");
rc = -ENOSYS;
-- 
1.5.0.1


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH][RFC] ext3: Handle ext[34]_journal_stop() failure

2007-02-28 Thread Dmitriy Monakhov


Where are many places where _journal_stop() return code wasn't
checked. Off cause _journal_stop() failed very rarely (and usually
with fatal consequences), but this does'n meen it should not be checked.
For example most retry loops looks like follows:
ext3_journal_stop(handle);
if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, ))
goto retry;
It is realy insane do "retry" if ext3_journal_stop() has failed, because
this may increase damage in case of real problem.

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 fs/ext3/acl.c   |   12 
 fs/ext3/inode.c |   22 +-
 fs/ext3/ioctl.c |   12 
 fs/ext3/namei.c |   48 
 fs/ext3/xattr.c |4 ++--
 fs/ext4/acl.c   |   12 
 fs/ext4/inode.c |   22 +-
 fs/ext4/ioctl.c |   12 
 fs/ext4/namei.c |   48 
 fs/ext4/xattr.c |4 ++--
 10 files changed, 134 insertions(+), 62 deletions(-)

diff --git a/fs/ext3/acl.c b/fs/ext3/acl.c
index 1e5038d..bbf4d8a 100644
--- a/fs/ext3/acl.c
+++ b/fs/ext3/acl.c
@@ -375,7 +375,7 @@ int
 ext3_acl_chmod(struct inode *inode)
 {
struct posix_acl *acl, *clone;
-int error;
+   int error, error2;
 
if (S_ISLNK(inode->i_mode))
return -EOPNOTSUPP;
@@ -402,7 +402,9 @@ ext3_acl_chmod(struct inode *inode)
goto out;
}
error = ext3_set_acl(handle, inode, ACL_TYPE_ACCESS, clone);
-   ext3_journal_stop(handle);
+   error2 = ext3_journal_stop(handle);
+   if (error2 && (!error || error == -ENOSPC))
+   error = error2;
if (error == -ENOSPC &&
ext3_should_retry_alloc(inode->i_sb, ))
goto retry;
@@ -485,7 +487,7 @@ ext3_xattr_set_acl(struct inode *inode, int type, const 
void *value,
 {
handle_t *handle;
struct posix_acl *acl;
-   int error, retries = 0;
+   int error, error2, retries = 0;
 
if (!test_opt(inode->i_sb, POSIX_ACL))
return -EOPNOTSUPP;
@@ -509,7 +511,9 @@ retry:
if (IS_ERR(handle))
return PTR_ERR(handle);
error = ext3_set_acl(handle, inode, type, acl);
-   ext3_journal_stop(handle);
+   error2 = ext3_journal_stop(handle);
+   if (error2 && (!error || error == -ENOSPC))
+   error = error2;
if (error == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, ))
goto retry;
 
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 816f0c5..68d1c71 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -962,7 +962,9 @@ static int ext3_get_block(struct inode *inode, sector_t 
iblock,
 * Huge direct-io writes can hold off commits for long
 * periods of time.  Let this commit run.
 */
-   ext3_journal_stop(handle);
+   ret = ext3_journal_stop(handle);
+   if (ret)
+   goto get_block;
handle = ext3_journal_start(inode, DIO_CREDITS);
if (IS_ERR(handle))
ret = PTR_ERR(handle);
@@ -2998,7 +3000,13 @@ int ext3_setattr(struct dentry *dentry, struct iattr 
*attr)
if (attr->ia_valid & ATTR_GID)
inode->i_gid = attr->ia_gid;
error = ext3_mark_inode_dirty(handle, inode);
-   ext3_journal_stop(handle);
+   if (error) {
+   ext3_journal_stop(handle);
+   goto err_out;
+   }
+   error = ext3_journal_stop(handle);
+   if (error)
+   goto err_out;
}
 
if (S_ISREG(inode->i_mode) &&
@@ -3016,7 +3024,9 @@ int ext3_setattr(struct dentry *dentry, struct iattr 
*attr)
rc = ext3_mark_inode_dirty(handle, inode);
if (!error)
error = rc;
-   ext3_journal_stop(handle);
+   rc = ext3_journal_stop(handle);
+   if (!error)
+   error = rc;
}
 
rc = inode_setattr(inode, attr);
@@ -3231,7 +3241,7 @@ int ext3_change_inode_journal_flag(struct inode *inode, 
int val)
 {
journal_t *journal;
handle_t *handle;
-   int err;
+   int err, err2;
 
/*
 * We have to be very careful here: changing a data block's
@@ -3274,7 +3284,9 @@ int ext3_change_inode_journal_flag(struct inode *inode, 
int val)
 
err = ext3_mark_inode_dirty(handle, inode);
handle->h_sync = 1;
-   ext3_journal_stop(handle);
+   err2 = ext3_journal_stop(handle);
+   if (!err)
+   err = err2;
ext3_std_error(inode->i_sb, err);
 
return err;
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index

[PATCH] mm: be sure to trim blocks after direct_io has failed

2007-02-28 Thread Dmitriy Monakhov

This is updated version of patch aimed to fix direct_io error handling issue
i've previously sent 2 wheeks ago. If you don't like anything in this patch
plese let me know.

Changes:
  - comments added. I think now it is clearly describe things.
  - patch prepared against 2.6.20-mm2

How this patch tested:
  - LTP test, and other readv/writev op tests.
  - fsstress test.
  - manual direct_io tests.

Log:
 - Move common segment checks to separate helper function.
 - Trim off blocks after generic_file_direct_write() has failed.
 - Update out of date comments about direct_io locking rules.

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 mm/filemap.c |  107 +++--
 1 files changed, 66 insertions(+), 41 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index a9284c2..c81184c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1147,6 +1147,38 @@ success:
return size;
 }
 
+/*
+ * Performs necessary checks before doing a write
+ *
+ * Adjust number of segments and amount of bytes to write.
+ * Returns appropriate error code that caller should return or
+ * zero in case that write should be allowed.
+ */
+int generic_segment_checks(const struct iovec *iov, unsigned long *nr_segs,
+   size_t *count, unsigned long access_flags)
+{
+   unsigned long   seg;
+   for (seg = 0; seg < *nr_segs; seg++) {
+   const struct iovec *iv = [seg];
+
+   /*
+* If any segment has a negative length, or the cumulative
+* length ever wraps negative then return -EINVAL.
+*/
+   *count += iv->iov_len;
+   if (unlikely((ssize_t)(*count|iv->iov_len) < 0))
+   return -EINVAL;
+   if (access_ok(access_flags, iv->iov_base, iv->iov_len))
+   continue;
+   if (seg == 0)
+   return -EFAULT;
+   *nr_segs = seg;
+   *count -= iv->iov_len;  /* This segment is no good */
+   break;
+   }
+   return 0;
+}
+
 /**
  * generic_file_aio_read - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -1168,24 +1200,9 @@ generic_file_aio_read(struct kiocb *iocb, const struct 
iovec *iov,
loff_t *ppos = >ki_pos;
 
count = 0;
-   for (seg = 0; seg < nr_segs; seg++) {
-   const struct iovec *iv = [seg];
-
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   count += iv->iov_len;
-   if (unlikely((ssize_t)(count|iv->iov_len) < 0))
-   return -EINVAL;
-   if (access_ok(VERIFY_WRITE, iv->iov_base, iv->iov_len))
-   continue;
-   if (seg == 0)
-   return -EFAULT;
-   nr_segs = seg;
-   count -= iv->iov_len;   /* This segment is no good */
-   break;
-   }
+   retval = generic_segment_checks(iov, _segs, , VERIFY_WRITE);
+   if (retval)
+   return retval;
 
/* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
if (filp->f_flags & O_DIRECT) {
@@ -2080,8 +2097,9 @@ generic_file_direct_write(struct kiocb *iocb, const 
struct iovec *iov,
/*
 * Sync the fs metadata but not the minor inode changes and
 * of course not the data as we did direct DMA for the IO.
-* i_mutex is held, which protects generic_osync_inode() from
-* livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
+* i_mutex may not being held, if so some specific locking
+* ordering must protect generic_osync_inode() from livelocking.
+* AIO O_DIRECT ops attempt to sync metadata here.
 */
if ((written >= 0 || written == -EIOCBQUEUED) &&
((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
@@ -2271,30 +2289,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, 
const struct iovec *iov,
size_t ocount;  /* original count */
size_t count;   /* after file limit checks */
struct inode*inode = mapping->host;
-   unsigned long   seg;
loff_t  pos;
ssize_t written;
ssize_t err;
 
ocount = 0;
-   for (seg = 0; seg < nr_segs; seg++) {
-   const struct iovec *iv = [seg];
-
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   ocount += iv->iov_len;
-   if (unlikely((ssize_t)(ocount|iv->iov_len) < 0))
-   return -EINVAL;
-   if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len))
-   continue;
-   if (seg == 0)
-

[PATCH] mm: be sure to trim blocks after direct_io has failed

2007-02-28 Thread Dmitriy Monakhov

This is updated version of patch aimed to fix direct_io error handling issue
i've previously sent 2 wheeks ago. If you don't like anything in this patch
plese let me know.

Changes:
  - comments added. I think now it is clearly describe things.
  - patch prepared against 2.6.20-mm2

How this patch tested:
  - LTP test, and other readv/writev op tests.
  - fsstress test.
  - manual direct_io tests.

Log:
 - Move common segment checks to separate helper function.
 - Trim off blocks after generic_file_direct_write() has failed.
 - Update out of date comments about direct_io locking rules.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 mm/filemap.c |  107 +++--
 1 files changed, 66 insertions(+), 41 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index a9284c2..c81184c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1147,6 +1147,38 @@ success:
return size;
 }
 
+/*
+ * Performs necessary checks before doing a write
+ *
+ * Adjust number of segments and amount of bytes to write.
+ * Returns appropriate error code that caller should return or
+ * zero in case that write should be allowed.
+ */
+int generic_segment_checks(const struct iovec *iov, unsigned long *nr_segs,
+   size_t *count, unsigned long access_flags)
+{
+   unsigned long   seg;
+   for (seg = 0; seg  *nr_segs; seg++) {
+   const struct iovec *iv = iov[seg];
+
+   /*
+* If any segment has a negative length, or the cumulative
+* length ever wraps negative then return -EINVAL.
+*/
+   *count += iv-iov_len;
+   if (unlikely((ssize_t)(*count|iv-iov_len)  0))
+   return -EINVAL;
+   if (access_ok(access_flags, iv-iov_base, iv-iov_len))
+   continue;
+   if (seg == 0)
+   return -EFAULT;
+   *nr_segs = seg;
+   *count -= iv-iov_len;  /* This segment is no good */
+   break;
+   }
+   return 0;
+}
+
 /**
  * generic_file_aio_read - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -1168,24 +1200,9 @@ generic_file_aio_read(struct kiocb *iocb, const struct 
iovec *iov,
loff_t *ppos = iocb-ki_pos;
 
count = 0;
-   for (seg = 0; seg  nr_segs; seg++) {
-   const struct iovec *iv = iov[seg];
-
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   count += iv-iov_len;
-   if (unlikely((ssize_t)(count|iv-iov_len)  0))
-   return -EINVAL;
-   if (access_ok(VERIFY_WRITE, iv-iov_base, iv-iov_len))
-   continue;
-   if (seg == 0)
-   return -EFAULT;
-   nr_segs = seg;
-   count -= iv-iov_len;   /* This segment is no good */
-   break;
-   }
+   retval = generic_segment_checks(iov, nr_segs, count, VERIFY_WRITE);
+   if (retval)
+   return retval;
 
/* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
if (filp-f_flags  O_DIRECT) {
@@ -2080,8 +2097,9 @@ generic_file_direct_write(struct kiocb *iocb, const 
struct iovec *iov,
/*
 * Sync the fs metadata but not the minor inode changes and
 * of course not the data as we did direct DMA for the IO.
-* i_mutex is held, which protects generic_osync_inode() from
-* livelocking.  AIO O_DIRECT ops attempt to sync metadata here.
+* i_mutex may not being held, if so some specific locking
+* ordering must protect generic_osync_inode() from livelocking.
+* AIO O_DIRECT ops attempt to sync metadata here.
 */
if ((written = 0 || written == -EIOCBQUEUED) 
((file-f_flags  O_SYNC) || IS_SYNC(inode))) {
@@ -2271,30 +2289,14 @@ __generic_file_aio_write_nolock(struct kiocb *iocb, 
const struct iovec *iov,
size_t ocount;  /* original count */
size_t count;   /* after file limit checks */
struct inode*inode = mapping-host;
-   unsigned long   seg;
loff_t  pos;
ssize_t written;
ssize_t err;
 
ocount = 0;
-   for (seg = 0; seg  nr_segs; seg++) {
-   const struct iovec *iv = iov[seg];
-
-   /*
-* If any segment has a negative length, or the cumulative
-* length ever wraps negative then return -EINVAL.
-*/
-   ocount += iv-iov_len;
-   if (unlikely((ssize_t)(ocount|iv-iov_len)  0))
-   return -EINVAL;
-   if (access_ok(VERIFY_READ, iv-iov_base, iv-iov_len))
-   continue;
-   if (seg == 0)
-

[PATCH] ecryptfs: check xattr operation support fix

2007-02-28 Thread Dmitriy Monakhov


  - ecryptfs_write_inode_size_to_metadata() error code was ignored.
  - i_op-setxattr() must be supported by lower fs because used below.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 fs/ecryptfs/inode.c |6 +++---
 fs/ecryptfs/mmap.c  |3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 27fd14a..9ccefad 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -168,9 +168,9 @@ static int grow_file(struct dentry *ecryptfs_dentry, struct 
file *lower_file,
goto out;
}
i_size_write(inode, 0);
-   ecryptfs_write_inode_size_to_metadata(lower_file, lower_inode, inode,
- ecryptfs_dentry,
- ECRYPTFS_LOWER_I_MUTEX_NOT_HELD);
+   rc = ecryptfs_write_inode_size_to_metadata(lower_file, lower_inode,
+   inode, ecryptfs_dentry,
+   ECRYPTFS_LOWER_I_MUTEX_NOT_HELD);
ecryptfs_inode_to_private(inode)-crypt_stat.flags |= ECRYPTFS_NEW_FILE;
 out:
return rc;
diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
index 1e5d2ba..416985f 100644
--- a/fs/ecryptfs/mmap.c
+++ b/fs/ecryptfs/mmap.c
@@ -491,7 +491,8 @@ static int ecryptfs_write_inode_size_to_xattr(struct inode 
*lower_inode,
goto out;
}
lower_dentry = ecryptfs_dentry_to_lower(ecryptfs_dentry);
-   if (!lower_dentry-d_inode-i_op-getxattr) {
+   if (!lower_dentry-d_inode-i_op-getxattr ||
+   !lower_dentry-d_inode-i_op-setxattr) {
printk(KERN_WARNING
   No support for setting xattr in lower filesystem\n);
rc = -ENOSYS;
-- 
1.5.0.1


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH][RFC] ext3: Handle ext[34]_journal_stop() failure

2007-02-28 Thread Dmitriy Monakhov


Where are many places where _journal_stop() return code wasn't
checked. Off cause _journal_stop() failed very rarely (and usually
with fatal consequences), but this does'n meen it should not be checked.
For example most retry loops looks like follows:
ext3_journal_stop(handle);
if (err == -ENOSPC  ext3_should_retry_alloc(dir-i_sb, retries))
goto retry;
It is realy insane do retry if ext3_journal_stop() has failed, because
this may increase damage in case of real problem.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 fs/ext3/acl.c   |   12 
 fs/ext3/inode.c |   22 +-
 fs/ext3/ioctl.c |   12 
 fs/ext3/namei.c |   48 
 fs/ext3/xattr.c |4 ++--
 fs/ext4/acl.c   |   12 
 fs/ext4/inode.c |   22 +-
 fs/ext4/ioctl.c |   12 
 fs/ext4/namei.c |   48 
 fs/ext4/xattr.c |4 ++--
 10 files changed, 134 insertions(+), 62 deletions(-)

diff --git a/fs/ext3/acl.c b/fs/ext3/acl.c
index 1e5038d..bbf4d8a 100644
--- a/fs/ext3/acl.c
+++ b/fs/ext3/acl.c
@@ -375,7 +375,7 @@ int
 ext3_acl_chmod(struct inode *inode)
 {
struct posix_acl *acl, *clone;
-int error;
+   int error, error2;
 
if (S_ISLNK(inode-i_mode))
return -EOPNOTSUPP;
@@ -402,7 +402,9 @@ ext3_acl_chmod(struct inode *inode)
goto out;
}
error = ext3_set_acl(handle, inode, ACL_TYPE_ACCESS, clone);
-   ext3_journal_stop(handle);
+   error2 = ext3_journal_stop(handle);
+   if (error2  (!error || error == -ENOSPC))
+   error = error2;
if (error == -ENOSPC 
ext3_should_retry_alloc(inode-i_sb, retries))
goto retry;
@@ -485,7 +487,7 @@ ext3_xattr_set_acl(struct inode *inode, int type, const 
void *value,
 {
handle_t *handle;
struct posix_acl *acl;
-   int error, retries = 0;
+   int error, error2, retries = 0;
 
if (!test_opt(inode-i_sb, POSIX_ACL))
return -EOPNOTSUPP;
@@ -509,7 +511,9 @@ retry:
if (IS_ERR(handle))
return PTR_ERR(handle);
error = ext3_set_acl(handle, inode, type, acl);
-   ext3_journal_stop(handle);
+   error2 = ext3_journal_stop(handle);
+   if (error2  (!error || error == -ENOSPC))
+   error = error2;
if (error == -ENOSPC  ext3_should_retry_alloc(inode-i_sb, retries))
goto retry;
 
diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 816f0c5..68d1c71 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -962,7 +962,9 @@ static int ext3_get_block(struct inode *inode, sector_t 
iblock,
 * Huge direct-io writes can hold off commits for long
 * periods of time.  Let this commit run.
 */
-   ext3_journal_stop(handle);
+   ret = ext3_journal_stop(handle);
+   if (ret)
+   goto get_block;
handle = ext3_journal_start(inode, DIO_CREDITS);
if (IS_ERR(handle))
ret = PTR_ERR(handle);
@@ -2998,7 +3000,13 @@ int ext3_setattr(struct dentry *dentry, struct iattr 
*attr)
if (attr-ia_valid  ATTR_GID)
inode-i_gid = attr-ia_gid;
error = ext3_mark_inode_dirty(handle, inode);
-   ext3_journal_stop(handle);
+   if (error) {
+   ext3_journal_stop(handle);
+   goto err_out;
+   }
+   error = ext3_journal_stop(handle);
+   if (error)
+   goto err_out;
}
 
if (S_ISREG(inode-i_mode) 
@@ -3016,7 +3024,9 @@ int ext3_setattr(struct dentry *dentry, struct iattr 
*attr)
rc = ext3_mark_inode_dirty(handle, inode);
if (!error)
error = rc;
-   ext3_journal_stop(handle);
+   rc = ext3_journal_stop(handle);
+   if (!error)
+   error = rc;
}
 
rc = inode_setattr(inode, attr);
@@ -3231,7 +3241,7 @@ int ext3_change_inode_journal_flag(struct inode *inode, 
int val)
 {
journal_t *journal;
handle_t *handle;
-   int err;
+   int err, err2;
 
/*
 * We have to be very careful here: changing a data block's
@@ -3274,7 +3284,9 @@ int ext3_change_inode_journal_flag(struct inode *inode, 
int val)
 
err = ext3_mark_inode_dirty(handle, inode);
handle-h_sync = 1;
-   ext3_journal_stop(handle);
+   err2 = ext3_journal_stop(handle);
+   if (!err)
+   err = err2;
ext3_std_error(inode-i_sb, err);
 
return err;
diff --git a/fs/ext3/ioctl.c b/fs/ext3/ioctl.c
index

[PATCH] ecryptfs: lower root result must be adirectory

2007-02-27 Thread Dmitriy Monakhov

patch against lastest mm tree.

- Currently after path_lookup succeed we dot't have any guarantie what
  it is DIR. This must be explicitly demanded.
- path_lookup can't return negative dentry, So inode check is useless.

Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>

diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index 812427e..fc4a3a2 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -484,18 +484,12 @@ static int ecryptfs_read_super(struct super_block *sb, 
const char *dev_name)
struct vfsmount *lower_mnt;
 
memset(, 0, sizeof(struct nameidata));
-   rc = path_lookup(dev_name, LOOKUP_FOLLOW, );
+   rc = path_lookup(dev_name, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, );
if (rc) {
ecryptfs_printk(KERN_WARNING, "path_lookup() failed\n");
goto out;
}
lower_root = nd.dentry;
-   if (!lower_root->d_inode) {
-   ecryptfs_printk(KERN_WARNING,
-   "No directory to interpose on\n");
-   rc = -ENOENT;
-   goto out_free;
-   }
lower_mnt = nd.mnt;
ecryptfs_set_superblock_lower(sb, lower_root->d_sb);
sb->s_maxbytes = lower_root->d_sb->s_maxbytes;

[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() [resend]

2007-02-27 Thread Dmitriy Monakhov


Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 drivers/net/sk98lin/skge.c |   20 +++-
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index e94ab25..eea753a 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *pdev)
 
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   ret = pci_enable_device(pdev);
+   if (ret) {
+   printk(KERN_WARNING "sk98lin: unable to enable device %s "
+   "in resume\n", dev->name);
+   goto err_out;
+   }   
pci_set_master(pdev);
if (pAC->GIni.GIMacsFound == 2)
ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", 
dev);
@@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *pdev)
ret = request_irq(dev->irq, SkGeIsrOnePort, IRQF_SHARED, 
"sk98lin", dev);
if (ret) {
printk(KERN_WARNING "sk98lin: unable to acquire IRQ %d\n", 
dev->irq);
-   pAC->AllocFlag &= ~SK_ALLOC_IRQ;
-   dev->irq = 0;
-   pci_disable_device(pdev);
-   return -EBUSY;
+   ret = -EBUSY;
+   goto err_out_disable_pdev;
}
 
netif_device_attach(dev);
@@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *pdev)
}
 
return 0;
+
+err_out_disable_pdev:
+   pci_disable_device(pdev);
+err_out:
+   pAC->AllocFlag &= ~SK_ALLOC_IRQ;
+   dev->irq = 0;
+   return ret;
 }
 #else
 #define skge_suspend NULL
-- 
1.4.4.4

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ecryptfs: lower root result must be adirectory

2007-02-27 Thread Dmitriy Monakhov

patch against lastest mm tree.

- Currently after path_lookup succeed we dot't have any guarantie what
  it is DIR. This must be explicitly demanded.
- path_lookup can't return negative dentry, So inode check is useless.

Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]

diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index 812427e..fc4a3a2 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -484,18 +484,12 @@ static int ecryptfs_read_super(struct super_block *sb, 
const char *dev_name)
struct vfsmount *lower_mnt;
 
memset(nd, 0, sizeof(struct nameidata));
-   rc = path_lookup(dev_name, LOOKUP_FOLLOW, nd);
+   rc = path_lookup(dev_name, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, nd);
if (rc) {
ecryptfs_printk(KERN_WARNING, path_lookup() failed\n);
goto out;
}
lower_root = nd.dentry;
-   if (!lower_root-d_inode) {
-   ecryptfs_printk(KERN_WARNING,
-   No directory to interpose on\n);
-   rc = -ENOENT;
-   goto out_free;
-   }
lower_mnt = nd.mnt;
ecryptfs_set_superblock_lower(sb, lower_root-d_sb);
sb-s_maxbytes = lower_root-d_sb-s_maxbytes;

[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() [resend]

2007-02-27 Thread Dmitriy Monakhov


Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 drivers/net/sk98lin/skge.c |   20 +++-
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index e94ab25..eea753a 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *pdev)
 
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   ret = pci_enable_device(pdev);
+   if (ret) {
+   printk(KERN_WARNING sk98lin: unable to enable device %s 
+   in resume\n, dev-name);
+   goto err_out;
+   }   
pci_set_master(pdev);
if (pAC-GIni.GIMacsFound == 2)
ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, 
dev);
@@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *pdev)
ret = request_irq(dev-irq, SkGeIsrOnePort, IRQF_SHARED, 
sk98lin, dev);
if (ret) {
printk(KERN_WARNING sk98lin: unable to acquire IRQ %d\n, 
dev-irq);
-   pAC-AllocFlag = ~SK_ALLOC_IRQ;
-   dev-irq = 0;
-   pci_disable_device(pdev);
-   return -EBUSY;
+   ret = -EBUSY;
+   goto err_out_disable_pdev;
}
 
netif_device_attach(dev);
@@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *pdev)
}
 
return 0;
+
+err_out_disable_pdev:
+   pci_disable_device(pdev);
+err_out:
+   pAC-AllocFlag = ~SK_ALLOC_IRQ;
+   dev-irq = 0;
+   return ret;
 }
 #else
 #define skge_suspend NULL
-- 
1.4.4.4

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/3] pcmcia: Handle request_irq() failure while opening device

2007-02-26 Thread Dmitriy Monakhov

Monakhov Dmitriy <[EMAIL PROTECTED]> writes:

> Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
> ---
>  drivers/net/pcmcia/axnet_cs.c |8 +++-
>  drivers/net/pcmcia/pcnet_cs.c |8 +++-
>  2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/pcmcia/axnet_cs.c b/drivers/net/pcmcia/axnet_cs.c
> index 6139048..9b57bab 100644
> --- a/drivers/net/pcmcia/axnet_cs.c
> +++ b/drivers/net/pcmcia/axnet_cs.c
> @@ -523,6 +523,7 @@ static int axnet_open(struct net_device *dev)
>  {
>  axnet_dev_t *info = PRIV(dev);
>  struct pcmcia_device *link = info->p_dev;
> +int err = 0;
>  
>  DEBUG(2, "axnet_open('%s')\n", dev->name);
>  
> @@ -531,7 +532,12 @@ static int axnet_open(struct net_device *dev)
>  
>  link->open++;
>  
> -request_irq(dev->irq, ei_irq_wrapper, IRQF_SHARED, "axnet_cs", dev);
> +err = request_irq(dev->irq, ei_irq_wrapper, IRQF_SHARED, "axnet_cs", 
> dev);
> +if (err) {
> + printk("axnet_cs: request_irq() failed  %s\n", dev->name);
> + link->open--;
> + return -EBUSY;
As David Miller noted it is better to return exact error code from 
request_irq(),
truly most net_device->open() functions return error code without changing.  
The updated patch version following:

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 drivers/net/pcmcia/axnet_cs.c |8 +++-
 drivers/net/pcmcia/pcnet_cs.c |8 +++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/pcmcia/axnet_cs.c b/drivers/net/pcmcia/axnet_cs.c
index 6139048..9b57bab 100644
--- a/drivers/net/pcmcia/axnet_cs.c
+++ b/drivers/net/pcmcia/axnet_cs.c
@@ -523,6 +523,7 @@ static int axnet_open(struct net_device *dev)
 {
 axnet_dev_t *info = PRIV(dev);
 struct pcmcia_device *link = info->p_dev;
+int err = 0;
 
 DEBUG(2, "axnet_open('%s')\n", dev->name);
 
@@ -531,7 +532,12 @@ static int axnet_open(struct net_device *dev)
 
 link->open++;
 
-request_irq(dev->irq, ei_irq_wrapper, IRQF_SHARED, "axnet_cs", dev);
+err = request_irq(dev->irq, ei_irq_wrapper, IRQF_SHARED, "axnet_cs", dev);
+if (err) {
+   printk("axnet_cs: request_irq() failed  %s\n", dev->name);
+   link->open--;
+   return err;
+}
 
 info->link_status = 0x00;
 init_timer(>watchdog);
diff --git a/drivers/net/pcmcia/pcnet_cs.c b/drivers/net/pcmcia/pcnet_cs.c
index d88e9b2..13923ea 100644
--- a/drivers/net/pcmcia/pcnet_cs.c
+++ b/drivers/net/pcmcia/pcnet_cs.c
@@ -962,6 +962,7 @@ static int pcnet_open(struct net_device *dev)
 {
 pcnet_dev_t *info = PRIV(dev);
 struct pcmcia_device *link = info->p_dev;
+int err = 0;
 
 DEBUG(2, "pcnet_open('%s')\n", dev->name);
 
@@ -971,7 +972,12 @@ static int pcnet_open(struct net_device *dev)
 link->open++;
 
 set_misc_reg(dev);
-request_irq(dev->irq, ei_irq_wrapper, IRQF_SHARED, dev_info, dev);
+err = request_irq(dev->irq, ei_irq_wrapper, IRQF_SHARED, dev_info, dev);
+if (err) {
+   printk("pcnet_open: request_irq() failed  %s\n", dev->name);
+   link->open--;
+   return err;
+}
 
 info->phy_id = info->eth_phy;
 info->link_status = 0x00;
-- 
1.5.0.1

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/3] pcmcia: Handle request_irq() failure while opening device

2007-02-26 Thread Dmitriy Monakhov

Monakhov Dmitriy [EMAIL PROTECTED] writes:

 Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
 ---
  drivers/net/pcmcia/axnet_cs.c |8 +++-
  drivers/net/pcmcia/pcnet_cs.c |8 +++-
  2 files changed, 14 insertions(+), 2 deletions(-)

 diff --git a/drivers/net/pcmcia/axnet_cs.c b/drivers/net/pcmcia/axnet_cs.c
 index 6139048..9b57bab 100644
 --- a/drivers/net/pcmcia/axnet_cs.c
 +++ b/drivers/net/pcmcia/axnet_cs.c
 @@ -523,6 +523,7 @@ static int axnet_open(struct net_device *dev)
  {
  axnet_dev_t *info = PRIV(dev);
  struct pcmcia_device *link = info-p_dev;
 +int err = 0;
  
  DEBUG(2, axnet_open('%s')\n, dev-name);
  
 @@ -531,7 +532,12 @@ static int axnet_open(struct net_device *dev)
  
  link-open++;
  
 -request_irq(dev-irq, ei_irq_wrapper, IRQF_SHARED, axnet_cs, dev);
 +err = request_irq(dev-irq, ei_irq_wrapper, IRQF_SHARED, axnet_cs, 
 dev);
 +if (err) {
 + printk(axnet_cs: request_irq() failed  %s\n, dev-name);
 + link-open--;
 + return -EBUSY;
As David Miller noted it is better to return exact error code from 
request_irq(),
truly most net_device-open() functions return error code without changing.  
The updated patch version following:

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 drivers/net/pcmcia/axnet_cs.c |8 +++-
 drivers/net/pcmcia/pcnet_cs.c |8 +++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/pcmcia/axnet_cs.c b/drivers/net/pcmcia/axnet_cs.c
index 6139048..9b57bab 100644
--- a/drivers/net/pcmcia/axnet_cs.c
+++ b/drivers/net/pcmcia/axnet_cs.c
@@ -523,6 +523,7 @@ static int axnet_open(struct net_device *dev)
 {
 axnet_dev_t *info = PRIV(dev);
 struct pcmcia_device *link = info-p_dev;
+int err = 0;
 
 DEBUG(2, axnet_open('%s')\n, dev-name);
 
@@ -531,7 +532,12 @@ static int axnet_open(struct net_device *dev)
 
 link-open++;
 
-request_irq(dev-irq, ei_irq_wrapper, IRQF_SHARED, axnet_cs, dev);
+err = request_irq(dev-irq, ei_irq_wrapper, IRQF_SHARED, axnet_cs, dev);
+if (err) {
+   printk(axnet_cs: request_irq() failed  %s\n, dev-name);
+   link-open--;
+   return err;
+}
 
 info-link_status = 0x00;
 init_timer(info-watchdog);
diff --git a/drivers/net/pcmcia/pcnet_cs.c b/drivers/net/pcmcia/pcnet_cs.c
index d88e9b2..13923ea 100644
--- a/drivers/net/pcmcia/pcnet_cs.c
+++ b/drivers/net/pcmcia/pcnet_cs.c
@@ -962,6 +962,7 @@ static int pcnet_open(struct net_device *dev)
 {
 pcnet_dev_t *info = PRIV(dev);
 struct pcmcia_device *link = info-p_dev;
+int err = 0;
 
 DEBUG(2, pcnet_open('%s')\n, dev-name);
 
@@ -971,7 +972,12 @@ static int pcnet_open(struct net_device *dev)
 link-open++;
 
 set_misc_reg(dev);
-request_irq(dev-irq, ei_irq_wrapper, IRQF_SHARED, dev_info, dev);
+err = request_irq(dev-irq, ei_irq_wrapper, IRQF_SHARED, dev_info, dev);
+if (err) {
+   printk(pcnet_open: request_irq() failed  %s\n, dev-name);
+   link-open--;
+   return err;
+}
 
 info-phy_id = info-eth_phy;
 info-link_status = 0x00;
-- 
1.5.0.1

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] libata: handle ata_pci_device_do_resume() failure while resuming (v2)

2007-02-25 Thread Dmitriy Monakhov

Randy Dunlap <[EMAIL PROTECTED]> writes:

> On Sat, 24 Feb 2007 00:43:18 +0300 Dmitriy Monakhov wrote:
>
>> Since commit:553c4aa630af7bc885e056d0436e4eb7f238579b
>> ata_pci_device_do_resume() can return error code, all callers was updated
>> except this one.
>> 
>> Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
>> ---
>>  drivers/ata/sata_inic162x.c |5 -
>>  1 files changed, 4 insertions(+), 1 deletions(-)
>> 
>> diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c
>> index 31b636f..7933043 100644
>> --- a/drivers/ata/sata_inic162x.c
>> +++ b/drivers/ata/sata_inic162x.c
>> @@ -639,7 +639,10 @@ static int inic_pci_device_resume(struct pci_dev *pdev)
>>  void __iomem *mmio_base = host->iomap[MMIO_BAR];
>>  int rc;
>>  
>> -ata_pci_device_do_resume(pdev);
>> +rc = ata_pci_device_do_resume(pdev);
>> +if (rc) {
>> +return rc;
>
> Either (a) don't use the braces when they are not needed, or
> (b) is it possible to add a meaningful message there, or is that
> done elsewhere?
True. Thanks for correcting me here.
It is't realy necessary print err message because ata_pci_device_do_resume()
already done this. So just remove braces.


 Since commit:553c4aa630af7bc885e056d0436e4eb7f238579b
 ata_pci_device_do_resume() can return error code, all callers was updated
 except this one.

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>

diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c
index 31b636f..18ac665 100644
--- a/drivers/ata/sata_inic162x.c
+++ b/drivers/ata/sata_inic162x.c
@@ -639,7 +639,9 @@ static int inic_pci_device_resume(struct pci_dev *pdev)
void __iomem *mmio_base = host->iomap[MMIO_BAR];
int rc;
 
-   ata_pci_device_do_resume(pdev);
+   rc = ata_pci_device_do_resume(pdev);
+   if (rc)
+   return rc;
 
if (pdev->dev.power.power_state.event == PM_EVENT_SUSPEND) {
printk("XXX\n");

[PATCH] libata: handle ata_pci_device_do_resume() failure while resuming (v2)

2007-02-25 Thread Dmitriy Monakhov

Randy Dunlap [EMAIL PROTECTED] writes:

 On Sat, 24 Feb 2007 00:43:18 +0300 Dmitriy Monakhov wrote:

 Since commit:553c4aa630af7bc885e056d0436e4eb7f238579b
 ata_pci_device_do_resume() can return error code, all callers was updated
 except this one.
 
 Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
 ---
  drivers/ata/sata_inic162x.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c
 index 31b636f..7933043 100644
 --- a/drivers/ata/sata_inic162x.c
 +++ b/drivers/ata/sata_inic162x.c
 @@ -639,7 +639,10 @@ static int inic_pci_device_resume(struct pci_dev *pdev)
  void __iomem *mmio_base = host-iomap[MMIO_BAR];
  int rc;
  
 -ata_pci_device_do_resume(pdev);
 +rc = ata_pci_device_do_resume(pdev);
 +if (rc) {
 +return rc;

 Either (a) don't use the braces when they are not needed, or
 (b) is it possible to add a meaningful message there, or is that
 done elsewhere?
True. Thanks for correcting me here.
It is't realy necessary print err message because ata_pci_device_do_resume()
already done this. So just remove braces.


 Since commit:553c4aa630af7bc885e056d0436e4eb7f238579b
 ata_pci_device_do_resume() can return error code, all callers was updated
 except this one.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]

diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c
index 31b636f..18ac665 100644
--- a/drivers/ata/sata_inic162x.c
+++ b/drivers/ata/sata_inic162x.c
@@ -639,7 +639,9 @@ static int inic_pci_device_resume(struct pci_dev *pdev)
void __iomem *mmio_base = host-iomap[MMIO_BAR];
int rc;
 
-   ata_pci_device_do_resume(pdev);
+   rc = ata_pci_device_do_resume(pdev);
+   if (rc)
+   return rc;
 
if (pdev-dev.power.power_state.event == PM_EVENT_SUSPEND) {
printk(XXX\n);

Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume()

2007-02-24 Thread Dmitriy Monakhov

Dmitriy Monakhov <[EMAIL PROTECTED]> writes:

> This thread looks dead but issue was't fixed.
>
> Jiri Kosina <[EMAIL PROTECTED]> writes:
>>> > - pci_enable_device(pdev);
>>> > + ret = pci_enable_device(pdev);
>>> > + if (ret) {
>>> > + printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during 
>>> > resume\n", 
>>> > + dev->name);
>>> > + unregister_netdev(dev);
>>> This looks rather wrong - skge_exit() will run unregister_netdev() again.
>>
>> You are of course right (the problem was also spotted by Russell King). 
>> This I believe is the correct one for the sk98lin case.
>>
>> [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()
>>
>> add check of return value to _resume() function of sk98lin driver.
>>
>> Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>
>>
>> ---
>>
>> --- a/drivers/net/sk98lin/skge.c
>> +++ b/drivers/net/sk98lin/skge.c
>> @@ -5070,7 +5070,12 @@ static int skge_resume(struct pci_dev *p
>>  
>>  pci_set_power_state(pdev, PCI_D0);
>>  pci_restore_state(pdev);
>> -pci_enable_device(pdev);
>> +ret = pci_enable_device(pdev);
>> +if (ret) {
>> +printk(KERN_WARNING "sk98lin: unable to enable device %s in 
>> resume\n",
>> +dev->name);
>> +goto out_err;
>> +}   
> [snip]
>> +out_err:
>> +pAC->AllocFlag &= ~SK_ALLOC_IRQ;
>> +dev->irq = 0;
>> +pci_disable_device(pdev);
> <<<<< Ok what happend if we jump here right after pci_disable_device() has 
 OOPs.Of course i mean pci_enable_device() here^
 I'm sorry about this. (Option complete_word not always good, some times 
 brain work required too :) )
 So correct comment looks like:
  <<<<< Ok what happend if we jump here right after pci_enable_device() has
failed, but pci_disable_device() was called anyway, this is wrong and 
may be fatal because pdev->enable_cnt may becomes negative.  
>> +
>> +return ret;
>> +
>>  }
>>  #else
>>  #define skge_suspend NULL
> This is reworked Jiri's patch:
>
> [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume()
>
> Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
> ---
>  drivers/net/sk98lin/skge.c |   20 +++-
>  1 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
> index e94ab25..eea753a 100644
> --- a/drivers/net/sk98lin/skge.c
> +++ b/drivers/net/sk98lin/skge.c
> @@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *pdev)
>  
>   pci_set_power_state(pdev, PCI_D0);
>   pci_restore_state(pdev);
> - pci_enable_device(pdev);
> + ret = pci_enable_device(pdev);
> + if (ret) {
> + printk(KERN_WARNING "sk98lin: unable to enable device %s "
> + "in resume\n", dev->name);
> + goto err_out;
> + }   
>   pci_set_master(pdev);
>   if (pAC->GIni.GIMacsFound == 2)
>   ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", 
> dev);
> @@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *pdev)
>   ret = request_irq(dev->irq, SkGeIsrOnePort, IRQF_SHARED, 
> "sk98lin", dev);
>   if (ret) {
>   printk(KERN_WARNING "sk98lin: unable to acquire IRQ %d\n", 
> dev->irq);
> - pAC->AllocFlag &= ~SK_ALLOC_IRQ;
> - dev->irq = 0;
> - pci_disable_device(pdev);
> - return -EBUSY;
> + ret = -EBUSY;
> + goto err_out_disable_pdev;
>   }
>  
>   netif_device_attach(dev);
> @@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *pdev)
>   }
>  
>   return 0;
> +
> +err_out_disable_pdev:
> + pci_disable_device(pdev);
> +err_out:
> + pAC->AllocFlag &= ~SK_ALLOC_IRQ;
> + dev->irq = 0;
> + return ret;
>  }
>  #else
>  #define skge_suspend NULL
> -- 
> 1.4.4.4
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume()

2007-02-24 Thread Dmitriy Monakhov

Dmitriy Monakhov [EMAIL PROTECTED] writes:

 This thread looks dead but issue was't fixed.

 Jiri Kosina [EMAIL PROTECTED] writes:
  - pci_enable_device(pdev);
  + ret = pci_enable_device(pdev);
  + if (ret) {
  + printk(KERN_ERR sk98lin: Cannot enable PCI device %s during 
  resume\n, 
  + dev-name);
  + unregister_netdev(dev);
 This looks rather wrong - skge_exit() will run unregister_netdev() again.

 You are of course right (the problem was also spotted by Russell King). 
 This I believe is the correct one for the sk98lin case.

 [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()

 add check of return value to _resume() function of sk98lin driver.

 Signed-off-by: Jiri Kosina [EMAIL PROTECTED]

 ---

 --- a/drivers/net/sk98lin/skge.c
 +++ b/drivers/net/sk98lin/skge.c
 @@ -5070,7 +5070,12 @@ static int skge_resume(struct pci_dev *p
  
  pci_set_power_state(pdev, PCI_D0);
  pci_restore_state(pdev);
 -pci_enable_device(pdev);
 +ret = pci_enable_device(pdev);
 +if (ret) {
 +printk(KERN_WARNING sk98lin: unable to enable device %s in 
 resume\n,
 +dev-name);
 +goto out_err;
 +}   
 [snip]
 +out_err:
 +pAC-AllocFlag = ~SK_ALLOC_IRQ;
 +dev-irq = 0;
 +pci_disable_device(pdev);
  Ok what happend if we jump here right after pci_disable_device() has 
 OOPs.Of course i mean pci_enable_device() here^
 I'm sorry about this. (Option complete_word not always good, some times 
 brain work required too :) )
 So correct comment looks like:
   Ok what happend if we jump here right after pci_enable_device() has
failed, but pci_disable_device() was called anyway, this is wrong and 
may be fatal because pdev-enable_cnt may becomes negative.  
 +
 +return ret;
 +
  }
  #else
  #define skge_suspend NULL
 This is reworked Jiri's patch:

 [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume()

 Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
 ---
  drivers/net/sk98lin/skge.c |   20 +++-
  1 files changed, 15 insertions(+), 5 deletions(-)

 diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
 index e94ab25..eea753a 100644
 --- a/drivers/net/sk98lin/skge.c
 +++ b/drivers/net/sk98lin/skge.c
 @@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *pdev)
  
   pci_set_power_state(pdev, PCI_D0);
   pci_restore_state(pdev);
 - pci_enable_device(pdev);
 + ret = pci_enable_device(pdev);
 + if (ret) {
 + printk(KERN_WARNING sk98lin: unable to enable device %s 
 + in resume\n, dev-name);
 + goto err_out;
 + }   
   pci_set_master(pdev);
   if (pAC-GIni.GIMacsFound == 2)
   ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, 
 dev);
 @@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *pdev)
   ret = request_irq(dev-irq, SkGeIsrOnePort, IRQF_SHARED, 
 sk98lin, dev);
   if (ret) {
   printk(KERN_WARNING sk98lin: unable to acquire IRQ %d\n, 
 dev-irq);
 - pAC-AllocFlag = ~SK_ALLOC_IRQ;
 - dev-irq = 0;
 - pci_disable_device(pdev);
 - return -EBUSY;
 + ret = -EBUSY;
 + goto err_out_disable_pdev;
   }
  
   netif_device_attach(dev);
 @@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *pdev)
   }
  
   return 0;
 +
 +err_out_disable_pdev:
 + pci_disable_device(pdev);
 +err_out:
 + pAC-AllocFlag = ~SK_ALLOC_IRQ;
 + dev-irq = 0;
 + return ret;
  }
  #else
  #define skge_suspend NULL
 -- 
 1.4.4.4



 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume()

2007-02-23 Thread Dmitriy Monakhov

This thread looks dead but issue was't fixed.

Jiri Kosina <[EMAIL PROTECTED]> writes:
>> > -  pci_enable_device(pdev);
>> > +  ret = pci_enable_device(pdev);
>> > +  if (ret) {
>> > +  printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during 
>> > resume\n", 
>> > +  dev->name);
>> > +  unregister_netdev(dev);
>> This looks rather wrong - skge_exit() will run unregister_netdev() again.
>
> You are of course right (the problem was also spotted by Russell King). 
> This I believe is the correct one for the sk98lin case.
>
> [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()
>
> add check of return value to _resume() function of sk98lin driver.
>
> Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>
>
> ---
>
> --- a/drivers/net/sk98lin/skge.c
> +++ b/drivers/net/sk98lin/skge.c
> @@ -5070,7 +5070,12 @@ static int skge_resume(struct pci_dev *p
>  
>   pci_set_power_state(pdev, PCI_D0);
>   pci_restore_state(pdev);
> - pci_enable_device(pdev);
> + ret = pci_enable_device(pdev);
> + if (ret) {
> + printk(KERN_WARNING "sk98lin: unable to enable device %s in 
> resume\n",
> + dev->name);
> + goto out_err;
> + }   
[snip]
> +out_err:
> + pAC->AllocFlag &= ~SK_ALLOC_IRQ;
> + dev->irq = 0;
> + pci_disable_device(pdev);
< Ok what happend if we jump here right after pci_disable_device() has 
  failed, but pci_disable_device() was called anyway, this is wrong and 
  may be fatal because pdev->enable_cnt may becomes negative.  
> +
> + return ret;
> +
>  }
>  #else
>  #define skge_suspend NULL
This is reworked Jiri's patch:

[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume()

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 drivers/net/sk98lin/skge.c |   20 +++-
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index e94ab25..eea753a 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *pdev)
 
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   ret = pci_enable_device(pdev);
+   if (ret) {
+   printk(KERN_WARNING "sk98lin: unable to enable device %s "
+   "in resume\n", dev->name);
+   goto err_out;
+   }   
pci_set_master(pdev);
if (pAC->GIni.GIMacsFound == 2)
ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", 
dev);
@@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *pdev)
ret = request_irq(dev->irq, SkGeIsrOnePort, IRQF_SHARED, 
"sk98lin", dev);
if (ret) {
printk(KERN_WARNING "sk98lin: unable to acquire IRQ %d\n", 
dev->irq);
-   pAC->AllocFlag &= ~SK_ALLOC_IRQ;
-   dev->irq = 0;
-   pci_disable_device(pdev);
-   return -EBUSY;
+   ret = -EBUSY;
+   goto err_out_disable_pdev;
}
 
netif_device_attach(dev);
@@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *pdev)
}
 
return 0;
+
+err_out_disable_pdev:
+   pci_disable_device(pdev);
+err_out:
+   pAC->AllocFlag &= ~SK_ALLOC_IRQ;
+   dev->irq = 0;
+   return ret;
 }
 #else
 #define skge_suspend NULL
-- 
1.4.4.4



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] libata: handle ata_pci_device_do_resume() failure while resuming

2007-02-23 Thread Dmitriy Monakhov

Since commit:553c4aa630af7bc885e056d0436e4eb7f238579b
ata_pci_device_do_resume() can return error code, all callers was updated
except this one.

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 drivers/ata/sata_inic162x.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c
index 31b636f..7933043 100644
--- a/drivers/ata/sata_inic162x.c
+++ b/drivers/ata/sata_inic162x.c
@@ -639,7 +639,10 @@ static int inic_pci_device_resume(struct pci_dev *pdev)
void __iomem *mmio_base = host->iomap[MMIO_BAR];
int rc;
 
-   ata_pci_device_do_resume(pdev);
+   rc = ata_pci_device_do_resume(pdev);
+   if (rc) {
+   return rc;
+   }
 
if (pdev->dev.power.power_state.event == PM_EVENT_SUSPEND) {
printk("XXX\n");
-- 
1.4.4.4


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] MPT fusion: handle mpt_resume() failure while resuming

2007-02-23 Thread Dmitriy Monakhov

Since mpt-fusion-handle-pci-layer-error-on-resume.patch
function mpt_resume() can return error code.

Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 drivers/message/fusion/mptscsih.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/message/fusion/mptscsih.c 
b/drivers/message/fusion/mptscsih.c
index decadbe..96be1d6 100644
--- a/drivers/message/fusion/mptscsih.c
+++ b/drivers/message/fusion/mptscsih.c
@@ -1191,8 +1191,11 @@ mptscsih_resume(struct pci_dev *pdev)
MPT_ADAPTER *ioc = pci_get_drvdata(pdev);
struct Scsi_Host*host = ioc->sh;
MPT_SCSI_HOST   *hd;
+   int ret;
 
-   mpt_resume(pdev);
+   ret = mpt_resume(pdev);
+   if (ret)
+   return  ret;
 
if(!host)
return 0;
-- 
1.4.4.4

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] 3c59x: Handle pci_enable_device() failure while resuming

2007-02-23 Thread Dmitriy Monakhov

Handle pci_enable_device() failure while resuming, we can safely exit here.
Signed-off-by: Monakhov Dmitriy <[EMAIL PROTECTED]>
---
 drivers/net/3c59x.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index 2b750bd..ea4a78f 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -821,11 +821,17 @@ static int vortex_resume(struct pci_dev *pdev)
 {
struct net_device *dev = pci_get_drvdata(pdev);
struct vortex_private *vp = netdev_priv(dev);
+   int err;
 
if (dev && vp) {
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   err = pci_enable_device(pdev);
+   if (err) {
+   printk(KERN_WARNING "%s: Could not enable device \n",
+   dev->name);
+   return err;
+   }
pci_set_master(pdev);
if (request_irq(dev->irq, vp->full_bus_master_rx ?
_interrupt : _interrupt, 
IRQF_SHARED, dev->name, dev)) {
-- 
1.4.4.4


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ecryptfs: handles AOP_TRUNCATED_PAGE better

2007-02-23 Thread Dmitriy Monakhov

- In fact we don't have to fail if AOP_TRUNCATED_PAGE was returned from 
  prepare_write or commit_write. It is beter to retry attempt where it
  is possible.

- Rearange ecryptfs_get_lower_page() error handling logic, make it more clean.

Signed-off-by: Dmitriy Monakhov <[EMAIL PROTECTED]>
---
 fs/ecryptfs/mmap.c |   28 
 1 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
index 842497d..b8fc1ef 100644
--- a/fs/ecryptfs/mmap.c
+++ b/fs/ecryptfs/mmap.c
@@ -447,6 +447,7 @@ static int ecryptfs_write_inode_size_to_header(struct file 
*lower_file,
const struct address_space_operations *lower_a_ops;
u64 file_size;
 
+retry:
header_page = grab_cache_page(lower_inode->i_mapping, 0);
if (!header_page) {
ecryptfs_printk(KERN_ERR, "grab_cache_page for "
@@ -457,9 +458,10 @@ static int ecryptfs_write_inode_size_to_header(struct file 
*lower_file,
lower_a_ops = lower_inode->i_mapping->a_ops;
rc = lower_a_ops->prepare_write(lower_file, header_page, 0, 8);
if (rc) {
-   if (rc == AOP_TRUNCATED_PAGE)
+   if (rc == AOP_TRUNCATED_PAGE) {
ecryptfs_release_lower_page(header_page, 0);
-   else
+   goto retry;
+   } else
ecryptfs_release_lower_page(header_page, 1);
goto out;
}
@@ -474,9 +476,10 @@ static int ecryptfs_write_inode_size_to_header(struct file 
*lower_file,
if (rc < 0)
ecryptfs_printk(KERN_ERR, "Error commiting header page "
"write\n");
-   if (rc == AOP_TRUNCATED_PAGE)
+   if (rc == AOP_TRUNCATED_PAGE) {
ecryptfs_release_lower_page(header_page, 0);
-   else
+   goto retry;
+   } else
ecryptfs_release_lower_page(header_page, 1);
lower_inode->i_mtime = lower_inode->i_ctime = CURRENT_TIME;
mark_inode_dirty_sync(inode);
@@ -565,6 +568,7 @@ int ecryptfs_get_lower_page(struct page **lower_page, 
struct inode *lower_inode,
 {
int rc = 0;
 
+retry:
*lower_page = grab_cache_page(lower_inode->i_mapping, lower_page_index);
if (!(*lower_page)) {
rc = -EINVAL;
@@ -578,18 +582,18 @@ int ecryptfs_get_lower_page(struct page **lower_page, 
struct inode *lower_inode,
  byte_offset,
  region_bytes);
if (rc) {
-   ecryptfs_printk(KERN_ERR, "prepare_write for "
+   if (rc == AOP_TRUNCATED_PAGE) {
+   ecryptfs_release_lower_page(*lower_page, 0);
+   goto retry;
+   } else {
+   ecryptfs_printk(KERN_ERR, "prepare_write for "
"lower_page_index = [0x%.16x] failed; rc = "
"[%d]\n", lower_page_index, rc);
-   }
-out:
-   if (rc && (*lower_page)) {
-   if (rc == AOP_TRUNCATED_PAGE)
-   ecryptfs_release_lower_page(*lower_page, 0);
-   else
ecryptfs_release_lower_page(*lower_page, 1);
-   (*lower_page) = NULL;
+   (*lower_page) = NULL;
+   }
}
+out:
return rc;
 }
 
-- 
1.4.4.4

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] MPT fusion: handle mpt_resume() failure while resuming

2007-02-23 Thread Dmitriy Monakhov

Since mpt-fusion-handle-pci-layer-error-on-resume.patch
function mpt_resume() can return error code.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 drivers/message/fusion/mptscsih.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/message/fusion/mptscsih.c 
b/drivers/message/fusion/mptscsih.c
index decadbe..96be1d6 100644
--- a/drivers/message/fusion/mptscsih.c
+++ b/drivers/message/fusion/mptscsih.c
@@ -1191,8 +1191,11 @@ mptscsih_resume(struct pci_dev *pdev)
MPT_ADAPTER *ioc = pci_get_drvdata(pdev);
struct Scsi_Host*host = ioc-sh;
MPT_SCSI_HOST   *hd;
+   int ret;
 
-   mpt_resume(pdev);
+   ret = mpt_resume(pdev);
+   if (ret)
+   return  ret;
 
if(!host)
return 0;
-- 
1.4.4.4

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] libata: handle ata_pci_device_do_resume() failure while resuming

2007-02-23 Thread Dmitriy Monakhov

Since commit:553c4aa630af7bc885e056d0436e4eb7f238579b
ata_pci_device_do_resume() can return error code, all callers was updated
except this one.

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 drivers/ata/sata_inic162x.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c
index 31b636f..7933043 100644
--- a/drivers/ata/sata_inic162x.c
+++ b/drivers/ata/sata_inic162x.c
@@ -639,7 +639,10 @@ static int inic_pci_device_resume(struct pci_dev *pdev)
void __iomem *mmio_base = host-iomap[MMIO_BAR];
int rc;
 
-   ata_pci_device_do_resume(pdev);
+   rc = ata_pci_device_do_resume(pdev);
+   if (rc) {
+   return rc;
+   }
 
if (pdev-dev.power.power_state.event == PM_EVENT_SUSPEND) {
printk(XXX\n);
-- 
1.4.4.4


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ecryptfs: handles AOP_TRUNCATED_PAGE better

2007-02-23 Thread Dmitriy Monakhov

- In fact we don't have to fail if AOP_TRUNCATED_PAGE was returned from 
  prepare_write or commit_write. It is beter to retry attempt where it
  is possible.

- Rearange ecryptfs_get_lower_page() error handling logic, make it more clean.

Signed-off-by: Dmitriy Monakhov [EMAIL PROTECTED]
---
 fs/ecryptfs/mmap.c |   28 
 1 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
index 842497d..b8fc1ef 100644
--- a/fs/ecryptfs/mmap.c
+++ b/fs/ecryptfs/mmap.c
@@ -447,6 +447,7 @@ static int ecryptfs_write_inode_size_to_header(struct file 
*lower_file,
const struct address_space_operations *lower_a_ops;
u64 file_size;
 
+retry:
header_page = grab_cache_page(lower_inode-i_mapping, 0);
if (!header_page) {
ecryptfs_printk(KERN_ERR, grab_cache_page for 
@@ -457,9 +458,10 @@ static int ecryptfs_write_inode_size_to_header(struct file 
*lower_file,
lower_a_ops = lower_inode-i_mapping-a_ops;
rc = lower_a_ops-prepare_write(lower_file, header_page, 0, 8);
if (rc) {
-   if (rc == AOP_TRUNCATED_PAGE)
+   if (rc == AOP_TRUNCATED_PAGE) {
ecryptfs_release_lower_page(header_page, 0);
-   else
+   goto retry;
+   } else
ecryptfs_release_lower_page(header_page, 1);
goto out;
}
@@ -474,9 +476,10 @@ static int ecryptfs_write_inode_size_to_header(struct file 
*lower_file,
if (rc  0)
ecryptfs_printk(KERN_ERR, Error commiting header page 
write\n);
-   if (rc == AOP_TRUNCATED_PAGE)
+   if (rc == AOP_TRUNCATED_PAGE) {
ecryptfs_release_lower_page(header_page, 0);
-   else
+   goto retry;
+   } else
ecryptfs_release_lower_page(header_page, 1);
lower_inode-i_mtime = lower_inode-i_ctime = CURRENT_TIME;
mark_inode_dirty_sync(inode);
@@ -565,6 +568,7 @@ int ecryptfs_get_lower_page(struct page **lower_page, 
struct inode *lower_inode,
 {
int rc = 0;
 
+retry:
*lower_page = grab_cache_page(lower_inode-i_mapping, lower_page_index);
if (!(*lower_page)) {
rc = -EINVAL;
@@ -578,18 +582,18 @@ int ecryptfs_get_lower_page(struct page **lower_page, 
struct inode *lower_inode,
  byte_offset,
  region_bytes);
if (rc) {
-   ecryptfs_printk(KERN_ERR, prepare_write for 
+   if (rc == AOP_TRUNCATED_PAGE) {
+   ecryptfs_release_lower_page(*lower_page, 0);
+   goto retry;
+   } else {
+   ecryptfs_printk(KERN_ERR, prepare_write for 
lower_page_index = [0x%.16x] failed; rc = 
[%d]\n, lower_page_index, rc);
-   }
-out:
-   if (rc  (*lower_page)) {
-   if (rc == AOP_TRUNCATED_PAGE)
-   ecryptfs_release_lower_page(*lower_page, 0);
-   else
ecryptfs_release_lower_page(*lower_page, 1);
-   (*lower_page) = NULL;
+   (*lower_page) = NULL;
+   }
}
+out:
return rc;
 }
 
-- 
1.4.4.4

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] 3c59x: Handle pci_enable_device() failure while resuming

2007-02-23 Thread Dmitriy Monakhov

Handle pci_enable_device() failure while resuming, we can safely exit here.
Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 drivers/net/3c59x.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index 2b750bd..ea4a78f 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -821,11 +821,17 @@ static int vortex_resume(struct pci_dev *pdev)
 {
struct net_device *dev = pci_get_drvdata(pdev);
struct vortex_private *vp = netdev_priv(dev);
+   int err;
 
if (dev  vp) {
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   err = pci_enable_device(pdev);
+   if (err) {
+   printk(KERN_WARNING %s: Could not enable device \n,
+   dev-name);
+   return err;
+   }
pci_set_master(pdev);
if (request_irq(dev-irq, vp-full_bus_master_rx ?
boomerang_interrupt : vortex_interrupt, 
IRQF_SHARED, dev-name, dev)) {
-- 
1.4.4.4


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume()

2007-02-23 Thread Dmitriy Monakhov

This thread looks dead but issue was't fixed.

Jiri Kosina [EMAIL PROTECTED] writes:
  -  pci_enable_device(pdev);
  +  ret = pci_enable_device(pdev);
  +  if (ret) {
  +  printk(KERN_ERR sk98lin: Cannot enable PCI device %s during 
  resume\n, 
  +  dev-name);
  +  unregister_netdev(dev);
 This looks rather wrong - skge_exit() will run unregister_netdev() again.

 You are of course right (the problem was also spotted by Russell King). 
 This I believe is the correct one for the sk98lin case.

 [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()

 add check of return value to _resume() function of sk98lin driver.

 Signed-off-by: Jiri Kosina [EMAIL PROTECTED]

 ---

 --- a/drivers/net/sk98lin/skge.c
 +++ b/drivers/net/sk98lin/skge.c
 @@ -5070,7 +5070,12 @@ static int skge_resume(struct pci_dev *p
  
   pci_set_power_state(pdev, PCI_D0);
   pci_restore_state(pdev);
 - pci_enable_device(pdev);
 + ret = pci_enable_device(pdev);
 + if (ret) {
 + printk(KERN_WARNING sk98lin: unable to enable device %s in 
 resume\n,
 + dev-name);
 + goto out_err;
 + }   
[snip]
 +out_err:
 + pAC-AllocFlag = ~SK_ALLOC_IRQ;
 + dev-irq = 0;
 + pci_disable_device(pdev);
 Ok what happend if we jump here right after pci_disable_device() has 
  failed, but pci_disable_device() was called anyway, this is wrong and 
  may be fatal because pdev-enable_cnt may becomes negative.  
 +
 + return ret;
 +
  }
  #else
  #define skge_suspend NULL
This is reworked Jiri's patch:

[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume()

Signed-off-by: Monakhov Dmitriy [EMAIL PROTECTED]
---
 drivers/net/sk98lin/skge.c |   20 +++-
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index e94ab25..eea753a 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5125,7 +5125,12 @@ static int skge_resume(struct pci_dev *pdev)
 
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   ret = pci_enable_device(pdev);
+   if (ret) {
+   printk(KERN_WARNING sk98lin: unable to enable device %s 
+   in resume\n, dev-name);
+   goto err_out;
+   }   
pci_set_master(pdev);
if (pAC-GIni.GIMacsFound == 2)
ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, 
dev);
@@ -5133,10 +5138,8 @@ static int skge_resume(struct pci_dev *pdev)
ret = request_irq(dev-irq, SkGeIsrOnePort, IRQF_SHARED, 
sk98lin, dev);
if (ret) {
printk(KERN_WARNING sk98lin: unable to acquire IRQ %d\n, 
dev-irq);
-   pAC-AllocFlag = ~SK_ALLOC_IRQ;
-   dev-irq = 0;
-   pci_disable_device(pdev);
-   return -EBUSY;
+   ret = -EBUSY;
+   goto err_out_disable_pdev;
}
 
netif_device_attach(dev);
@@ -5153,6 +5156,13 @@ static int skge_resume(struct pci_dev *pdev)
}
 
return 0;
+
+err_out_disable_pdev:
+   pci_disable_device(pdev);
+err_out:
+   pAC-AllocFlag = ~SK_ALLOC_IRQ;
+   dev-irq = 0;
+   return ret;
 }
 #else
 #define skge_suspend NULL
-- 
1.4.4.4



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 >

1 - 100 of 194 matches

Mail list logo