Re: PROBLEM: 2.4.36.1 hangs.

2008-02-26 Thread Pascal Hambourg

dann frazier a écrit :


Correcting the le16_to_cpu placement as Glen described
fixes the issue for me.


One of my boxes has at least six directories triggering the issue (I 
must be very unlucky), and Glen's patch fixes it all. Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: 2.4.36.1 hangs.

2008-02-26 Thread Willy Tarreau
On Tue, Feb 26, 2008 at 03:13:46AM -0700, dann frazier wrote:
> I'm now able to reliably reproduce it by creating/removing a chroot
> (pbuilder create on a Debian system, though I'm sure a simpler test
> exists). Correcting the le16_to_cpu placement as Glen described
> fixes the issue for me.

OK that's excellent news.

> > BTW, I notice that 2.6 also has one extra chunk that 2.4 does not
> > have :
> > 
> > if (unlikely(need_revalidate)) {
> > +   if (offset) {
> > offset = ext2_validate_entry(kaddr, offset, 
> > chunk_mask);
> > +   filp->f_pos = (n< > offset;
> > +   }
> > +   filp->f_version = inode->i_version;
> > need_revalidate = 0;
> > }
> > 
> > I have no idea whether this part is needed, we'd better ask Theo or Al
> > for some advices, as I'm not tempted by merging an uncertain patch when
> > it comes to filesystems.
> 
> Looks like a test case may be available:
>   
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2d7f2ea9c989853310c7f6e8be52cc090cc8e66b

Dann, would you care to ping Al about the opportunity to merge this entire
patch, as well as Masoud for his test-case ? The patch looks like all what
remains different between 2.4 and 2.6, so I would bet we need it too.

Thanks,
Willy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: 2.4.36.1 hangs.

2008-02-26 Thread dann frazier
On Tue, Feb 26, 2008 at 09:16:25AM +0100, Willy Tarreau wrote:
> On Mon, Feb 25, 2008 at 09:36:12PM -1000, Glen Nakamura wrote:
> > Aloha,
> > 
> > The "ext2_readdir() filp->f_pos fix" patch looks weird...
> > Perhaps the "filp->f_pos += le16_to_cpu(de->rec_len);" line should be
> > outside of the if statement like the indentation implies?
> 
> good catch! At least it's what is done in 2.6.

Yes, that certainly looks like a bug.

> > As it is, filp->f_pos gets corrupted if de->inode is ever zero...
> > This could possibly explain why I had a few strange directory
> > entries until I checked the filesystem with:
> > e2fsck -D -F -f /dev/{ext2 partition}
> > 
> > - glen
> > 
> > Here is an updated (untested) patch:
> 
> unfortunately, neither Dann nor me could reproduce the issue, so
> we'll wait for some victims^Wvolunteers to give it a try.

I'm now able to reliably reproduce it by creating/removing a chroot
(pbuilder create on a Debian system, though I'm sure a simpler test
exists). Correcting the le16_to_cpu placement as Glen described
fixes the issue for me.

> BTW, I notice that 2.6 also has one extra chunk that 2.4 does not
> have :
> 
> if (unlikely(need_revalidate)) {
> +   if (offset) {
> offset = ext2_validate_entry(kaddr, offset, 
> chunk_mask);
> +   filp->f_pos = (n< +   }
> +   filp->f_version = inode->i_version;
> need_revalidate = 0;
> }
> 
> I have no idea whether this part is needed, we'd better ask Theo or Al
> for some advices, as I'm not tempted by merging an uncertain patch when
> it comes to filesystems.

Looks like a test case may be available:
  
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2d7f2ea9c989853310c7f6e8be52cc090cc8e66b

-- 
dann frazier

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: 2.4.36.1 hangs.

2008-02-26 Thread Willy Tarreau
On Mon, Feb 25, 2008 at 09:36:12PM -1000, Glen Nakamura wrote:
> Aloha,
> 
> The "ext2_readdir() filp->f_pos fix" patch looks weird...
> Perhaps the "filp->f_pos += le16_to_cpu(de->rec_len);" line should be
> outside of the if statement like the indentation implies?

good catch! At least it's what is done in 2.6.

> As it is, filp->f_pos gets corrupted if de->inode is ever zero...
> This could possibly explain why I had a few strange directory
> entries until I checked the filesystem with:
> e2fsck -D -F -f /dev/{ext2 partition}
> 
> - glen
> 
> Here is an updated (untested) patch:

unfortunately, neither Dann nor me could reproduce the issue, so
we'll wait for some victims^Wvolunteers to give it a try.

BTW, I notice that 2.6 also has one extra chunk that 2.4 does not
have :

if (unlikely(need_revalidate)) {
+   if (offset) {
offset = ext2_validate_entry(kaddr, offset, 
chunk_mask);
+   filp->f_pos = (ni_version;
need_revalidate = 0;
}

I have no idea whether this part is needed, we'd better ask Theo or Al
for some advices, as I'm not tempted by merging an uncertain patch when
it comes to filesystems.

Regards,
Willy

> --- linux-2.4.36.orig/fs/ext2/dir.c
> +++ linux-2.4.36/fs/ext2/dir.c
> @@ -240,7 +240,7 @@ ext2_readdir (struct file * filp, void *
>   loff_t pos = filp->f_pos;
>   struct inode *inode = filp->f_dentry->d_inode;
>   struct super_block *sb = inode->i_sb;
> - unsigned offset = pos & ~PAGE_CACHE_MASK;
> + unsigned int offset = pos & ~PAGE_CACHE_MASK;
>   unsigned long n = pos >> PAGE_CACHE_SHIFT;
>   unsigned long npages = dir_pages(inode);
>   unsigned chunk_mask = ~(ext2_chunk_size(inode)-1);
> @@ -258,8 +258,13 @@ ext2_readdir (struct file * filp, void *
>   ext2_dirent *de;
>   struct page *page = ext2_get_page(inode, n);
>  
> - if (IS_ERR(page))
> + if (IS_ERR(page)) {
> + ext2_error(sb, __FUNCTION__,
> +"bad page in #%lu",
> +inode->i_ino);
> + filp->f_pos += PAGE_CACHE_SIZE - offset;
>   continue;
> + }
>   kaddr = page_address(page);
>   if (need_revalidate) {
>   offset = ext2_validate_entry(kaddr, offset, chunk_mask);
> @@ -267,7 +272,7 @@ ext2_readdir (struct file * filp, void *
>   }
>   de = (ext2_dirent *)(kaddr+offset);
>   limit = kaddr + PAGE_CACHE_SIZE - EXT2_DIR_REC_LEN(1);
> - for ( ;(char*)de <= limit; de = ext2_next_entry(de))
> + for ( ;(char*)de <= limit; de = ext2_next_entry(de)) {
>   if (de->inode) {
>   int over;
>   unsigned char d_type = DT_UNKNOWN;
> @@ -284,11 +289,12 @@ ext2_readdir (struct file * filp, void *
>   goto done;
>   }
>   }
> + filp->f_pos += le16_to_cpu(de->rec_len);
> + }
>   ext2_put_page(page);
>   }
>  
>  done:
> - filp->f_pos = (n << PAGE_CACHE_SHIFT) | offset;
>   filp->f_version = inode->i_version;
>   UPDATE_ATIME(inode);
>   return 0;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: 2.4.36.1 hangs.

2008-02-25 Thread Glen Nakamura
Aloha,

The "ext2_readdir() filp->f_pos fix" patch looks weird...
Perhaps the "filp->f_pos += le16_to_cpu(de->rec_len);" line should be
outside of the if statement like the indentation implies?
As it is, filp->f_pos gets corrupted if de->inode is ever zero...
This could possibly explain why I had a few strange directory
entries until I checked the filesystem with:
e2fsck -D -F -f /dev/{ext2 partition}

- glen

Here is an updated (untested) patch:

--- linux-2.4.36.orig/fs/ext2/dir.c
+++ linux-2.4.36/fs/ext2/dir.c
@@ -240,7 +240,7 @@ ext2_readdir (struct file * filp, void *
loff_t pos = filp->f_pos;
struct inode *inode = filp->f_dentry->d_inode;
struct super_block *sb = inode->i_sb;
-   unsigned offset = pos & ~PAGE_CACHE_MASK;
+   unsigned int offset = pos & ~PAGE_CACHE_MASK;
unsigned long n = pos >> PAGE_CACHE_SHIFT;
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(ext2_chunk_size(inode)-1);
@@ -258,8 +258,13 @@ ext2_readdir (struct file * filp, void *
ext2_dirent *de;
struct page *page = ext2_get_page(inode, n);
 
-   if (IS_ERR(page))
+   if (IS_ERR(page)) {
+   ext2_error(sb, __FUNCTION__,
+  "bad page in #%lu",
+  inode->i_ino);
+   filp->f_pos += PAGE_CACHE_SIZE - offset;
continue;
+   }
kaddr = page_address(page);
if (need_revalidate) {
offset = ext2_validate_entry(kaddr, offset, chunk_mask);
@@ -267,7 +272,7 @@ ext2_readdir (struct file * filp, void *
}
de = (ext2_dirent *)(kaddr+offset);
limit = kaddr + PAGE_CACHE_SIZE - EXT2_DIR_REC_LEN(1);
-   for ( ;(char*)de <= limit; de = ext2_next_entry(de))
+   for ( ;(char*)de <= limit; de = ext2_next_entry(de)) {
if (de->inode) {
int over;
unsigned char d_type = DT_UNKNOWN;
@@ -284,11 +289,12 @@ ext2_readdir (struct file * filp, void *
goto done;
}
}
+   filp->f_pos += le16_to_cpu(de->rec_len);
+   }
ext2_put_page(page);
}
 
 done:
-   filp->f_pos = (n << PAGE_CACHE_SHIFT) | offset;
filp->f_version = inode->i_version;
UPDATE_ATIME(inode);
return 0;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: 2.4.36.1 hangs.

2008-02-24 Thread Pascal Hambourg

Hello,

Willy Tarreau wrote :


On Fri, Feb 22, 2008 at 03:08:30PM +0100, Unknown wrote:


2) 2.4.36.1 hangs during compilation of lighttpd-1.4.18.
  It is probably during ext2_readdir() but I cannot confirm that..
  Currently it only happens during compilation of lighttpd-1.4.18, always
  in the same place. [...]


I experience the same problem when executing "ls /etc".


OK, could you please try to revert the attached patch (patch -Rp1) to see if
this fixes the problem for you ? Please keep Dann and me in CC as it's not
easy to spot 2.4-related threads on LKML these days!



commit c30306fb287323591c854a0982d9fa5351859b45
Author: dann frazier <[EMAIL PROTECTED]>
Date:   Mon Jan 21 17:13:06 2008 -0700

ext2_readdir() filp->f_pos fix


The patch didn't revert cleanly because of the two subsequent 
ext2-related patches, so I had to revert the three of them. It fixed the 
problem. Then I applied only the "ext2_readdir() filp->f_pos fix" patch 
again and the problem came back. HTH.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: 2.4.36.1 hangs.

2008-02-24 Thread Willy Tarreau
On Sun, Feb 24, 2008 at 07:12:04PM +0100, Pascal Hambourg wrote:
> Hello,
> 
> Willy Tarreau wrote :
> >
> >On Fri, Feb 22, 2008 at 03:08:30PM +0100, Unknown wrote:
> >
> >>2) 2.4.36.1 hangs during compilation of lighttpd-1.4.18.
> >>  It is probably during ext2_readdir() but I cannot confirm that..
> >>  Currently it only happens during compilation of lighttpd-1.4.18, always
> >>  in the same place. [...]
> 
> I experience the same problem when executing "ls /etc".
> 
> >OK, could you please try to revert the attached patch (patch -Rp1) to see 
> >if
> >this fixes the problem for you ? Please keep Dann and me in CC as it's not
> >easy to spot 2.4-related threads on LKML these days!
> >
> >
> >
> >commit c30306fb287323591c854a0982d9fa5351859b45
> >Author: dann frazier <[EMAIL PROTECTED]>
> >Date:   Mon Jan 21 17:13:06 2008 -0700
> >
> >ext2_readdir() filp->f_pos fix
> 
> The patch didn't revert cleanly because of the two subsequent 
> ext2-related patches, so I had to revert the three of them. It fixed the 
> problem. Then I applied only the "ext2_readdir() filp->f_pos fix" patch 
> again and the problem came back. HTH.

OK thanks very much Pascal.

I'm reverting the patch for the moment and will release 2.4.36.2 without
it.

Dann, it looks like the backport of the fix causes more trouble than it
attempts to fix :-/

(Un)fortunately I had no problem here, so I think it's not that easy to
reproduce the issue. If the fix is too hard to get right and the risk
of vulnerability very low, don't you think we should simply leave it
unfixed, at least until we really understand the nature of the problem ?

Thanks,
Willy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: 2.4.36.1 hangs.

2008-02-23 Thread Willy Tarreau
Hello,

first, thanks for your detailed report.

On Fri, Feb 22, 2008 at 03:08:30PM +0100, Unknown wrote:
> 1) 2.4.36.1 hangs (probably during ext2_readdir())
> 
> 2) 2.4.36.1 hangs during compilation of lighttpd-1.4.18.
>It is probably during ext2_readdir() but I cannot confirm that..
>Currently it only happens during compilation of lighttpd-1.4.18, always
>in the same place.
>Kernel 2.4.36 w/ same options (make oldconfig) works fine.
> 
> Last few lines of 'strace -f make' just before hang:
> [pid 12817] open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
> [pid 12817] fstat64(3, {st_mode=S_IFDIR|0777, st_size=8192, ...}) = 0
> [pid 12817] fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
> [pid 12817] getdents64(3, /* 114 entries */, 4096) = 4088
> [pid 12817] getdents64(3, /* 63 entries */, 4096) = 2424
> [pid 12817] getdents64(3,

OK, could you please try to revert the attached patch (patch -Rp1) to see if
this fixes the problem for you ? Please keep Dann and me in CC as it's not
easy to spot 2.4-related threads on LKML these days!

Thanks,
Willy



commit c30306fb287323591c854a0982d9fa5351859b45
Author: dann frazier <[EMAIL PROTECTED]>
Date:   Mon Jan 21 17:13:06 2008 -0700

ext2_readdir() filp->f_pos fix

This is a 2.4 backport of a linux-2.6 change by Jan Blunck
(old-2.6-bkcvs commit 2196b4744393d4f6c06fc4d63b98556d05b90933)

Commit log from 2.6 follows.

  [PATCH] ext2_readdir() filp->f_pos fix

  If the whole directory is read, ext2_readdir() sets the f_pos to a 
multiple
  of the page size (because of the conditions of the outer for loop).  This
  sets the wrong f_pos for directory inodes on ext2 partitions with a block
  size differing from the page size.

Signed-off-by: dann frazier <[EMAIL PROTECTED]>

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 58b76dd..b158e60 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -240,7 +240,7 @@ ext2_readdir (struct file * filp, void * dirent, filldir_t 
filldir)
loff_t pos = filp->f_pos;
struct inode *inode = filp->f_dentry->d_inode;
struct super_block *sb = inode->i_sb;
-   unsigned offset = pos & ~PAGE_CACHE_MASK;
+   unsigned int offset = pos & ~PAGE_CACHE_MASK;
unsigned long n = pos >> PAGE_CACHE_SHIFT;
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(ext2_chunk_size(inode)-1);
@@ -258,8 +258,13 @@ ext2_readdir (struct file * filp, void * dirent, filldir_t 
filldir)
ext2_dirent *de;
struct page *page = ext2_get_page(inode, n);
 
-   if (IS_ERR(page))
+   if (IS_ERR(page)) {
+   ext2_error(sb, __FUNCTION__,
+  "bad page in #%lu",
+  inode->i_ino);
+   filp->f_pos += PAGE_CACHE_SIZE - offset;
continue;
+   }
kaddr = page_address(page);
if (need_revalidate) {
offset = ext2_validate_entry(kaddr, offset, chunk_mask);
@@ -283,12 +288,12 @@ ext2_readdir (struct file * filp, void * dirent, 
filldir_t filldir)
ext2_put_page(page);
goto done;
}
+   filp->f_pos += le16_to_cpu(de->rec_len);
}
ext2_put_page(page);
}
 
 done:
-   filp->f_pos = (n << PAGE_CACHE_SHIFT) | offset;
filp->f_version = inode->i_version;
UPDATE_ATIME(inode);
return 0;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/