Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-07-01 Thread Josef Bacik
On Tue, Jun 04, 2013 at 06:17:54PM -0400, Zach Brown wrote:
 Hi gang,
 
 I finally sat down to fix that readdir hang that has been in the back
 of my mind for a while.  I *hope* that the fix is pretty simple: just
 don't manufacture a fake f_pos, I *think* we can abuse f_version as an
 indicator that we shouldn't return entries.  Does this look reasonable?
 
 We still have the problem that we can generate valid large f_pos values
 that can confuse 32bit userspace, but that's a different problem.  I
 think we'll want filldir generation of EOVERFLOW like what exists for
 large inodes. 
 
 The rest of the patches are cleanups that I saw when absorbing the
 code.  It's all lightly tested with xfstests but it wouldn't surprise
 me if I missed something so review is appreciated.
 
 Thanks!
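
The EOVERFLOW idea in the quoted cover letter can be sketched in userspace C.
This is only an illustration under assumptions: narrow_dir_offset() is a
hypothetical helper, not a kernel function, and it assumes the 32bit consumer
stores directory offsets in a signed 32-bit field.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy a 64-bit directory offset into a 32-bit
 * field, refusing with -EOVERFLOW when the value would not fit instead
 * of silently truncating it.
 */
int narrow_dir_offset(int64_t off, int32_t *out)
{
	assert(out != NULL);
	if (off < INT32_MIN || off > INT32_MAX)
		return -EOVERFLOW;	/* caller sees an error, not a bogus offset */
	*out = (int32_t)off;
	return 0;
}
```

A caller would check the return value before trusting *out; on failure *out is
left untouched.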
 

One of these patches is making new entries not show up in readdir.  This was
discovered while running stress.sh overnight, it complained about files not
matching but when they were checked the files matched.  Dropping the entire
series made stress.sh run fine.  So I'm dropping these for the next merge window
but I'll dig into it and try and figure out what was causing the problem.
Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-07-01 Thread Chris Mason
Quoting Josef Bacik (2013-07-01 08:54:35)
 On Tue, Jun 04, 2013 at 06:17:54PM -0400, Zach Brown wrote:
  Hi gang,
  
  I finally sat down to fix that readdir hang that has been in the back
  of my mind for a while.  I *hope* that the fix is pretty simple: just
  don't manufacture a fake f_pos, I *think* we can abuse f_version as an
  indicator that we shouldn't return entries.  Does this look reasonable?
  
  We still have the problem that we can generate valid large f_pos values
  that can confuse 32bit userspace, but that's a different problem.  I
  think we'll want filldir generation of EOVERFLOW like what exists for
  large inodes. 
  
  The rest of the patches are cleanups that I saw when absorbing the
  code.  It's all lightly tested with xfstests but it wouldn't surprise
  me if I missed something so review is appreciated.
  
  Thanks!
  
 
 One of these patches is making new entries not show up in readdir.  This was
 discovered while running stress.sh overnight, it complained about files not
 matching but when they were checked the files matched.  Dropping the entire
 series made stress.sh run fine.  So I'm dropping these for the next merge
 window but I'll dig into it and try and figure out what was causing the
 problem.

Unfortunately I've only triggered this on flash, and the run takes about
two hours to trigger.  Trying now with some extra printks to see if I
can nail it down.

-chris



Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-07-01 Thread Zach Brown
  code.  It's all lightly tested with xfstests but it wouldn't surprise
  me if I missed something so review is appreciated.

*mmm, hmmm*

 One of these patches is making new entries not show up in readdir.  This was
 discovered while running stress.sh overnight, it complained about files not
 matching but when they were checked the files matched.  Dropping the entire
 series made stress.sh run fine.  So I'm dropping these for the next merge
 window but I'll dig into it and try and figure out what was causing the
 problem.

Nerts.  It's got to be the delayed inode stuff.

Maybe it's some unlink/recreate pattern?  Is this a thing that stress.sh
does?  (Where's stress.sh live?)

- z 


Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-12 Thread Chris Mason
Quoting Zach Brown (2013-06-10 18:39:58)
 On Tue, Jun 04, 2013 at 04:26:57PM -0700, Zach Brown wrote:
  On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
   Quoting Zach Brown (2013-06-04 18:17:54)
    Hi gang,
    
    I finally sat down to fix that readdir hang that has been in the back
    of my mind for a while.  I *hope* that the fix is pretty simple: just
    don't manufacture a fake f_pos, I *think* we can abuse f_version as an
    indicator that we shouldn't return entries.  Does this look reasonable?
   
   I like it, and it doesn't look too far away from how others are abusing
   f_version.  Have you tried with NFS?  I don't think it'll hurt, but NFS
   loves to surprise me.
  
  Mm, no, I hadn't.  I'll give it a go tomorrow.  What could go wrong? :)
 
 Or a week later.  Pretty close!
 
 I couldn't get NFS to break.  Clients see new entries created directly
 in the exported btrfs and on either of noac and actimeo=1 client mounts.
 For whatever that's worth.

Great.

 
 But I did find that I'd broken the case of trying to re-enable readdir
 results by seeking past the last entry (which happens to be the current
 f_pos now that we're using f_version).
 
 Here's the incremental fix against what Josef has in -next.  I'm cool
 with either squashing or just committing it.

Let's squash it in; Josef loves to rebase.

-chris


Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-10 Thread Zach Brown
On Tue, Jun 04, 2013 at 04:26:57PM -0700, Zach Brown wrote:
 On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
  Quoting Zach Brown (2013-06-04 18:17:54)
   Hi gang,
   
   I finally sat down to fix that readdir hang that has been in the back
   of my mind for a while.  I *hope* that the fix is pretty simple: just
   don't manufacture a fake f_pos, I *think* we can abuse f_version as an
   indicator that we shouldn't return entries.  Does this look reasonable?
  
  I like it, and it doesn't look too far away from how others are abusing
  f_version.  Have you tried with NFS?  I don't think it'll hurt, but NFS
  loves to surprise me.
 
 Mm, no, I hadn't.  I'll give it a go tomorrow.  What could go wrong? :)

Or a week later.  Pretty close!

I couldn't get NFS to break.  Clients see new entries created directly
in the exported btrfs and on either of noac and actimeo=1 client mounts.
For whatever that's worth.

But I did find that I'd broken the case of trying to re-enable readdir
results by seeking past the last entry (which happens to be the current
f_pos now that we're using f_version).

Here's the incremental fix against what Josef has in -next.  I'm cool
with either squashing or just committing it.

- z

Subject: [PATCH] btrfs: reset f_version when seeking to pos

Commit 63e3dfe (btrfs: fix readdir hang with offsets past INT_MAX)
switched to using f_version to stop readdir results instead of setting a
large f_pos.

It inadvertently changed behaviour in the case where an app specifically
seeks to one past the last valid dent->d_off it has seen.  Previously
f_pos would have changed from the fake f_pos to this new f_pos which
would let readdir return new entries.

But now that it's using f_version it might not have seen new entries.
generic_file_llseek() won't clear f_version if the desired pos happens
to be the current f_pos.

So we add a little wrapper to notice this case and clear f_version so
that entries can be seen in this case.

Signed-off-by: Zach Brown <z...@redhat.com>
---
 fs/btrfs/inode.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1059c90..590c274 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4997,6 +4997,23 @@ unsigned char btrfs_filetype_table[] = {
  * which prevents readdir results until seek resets f_pos and f_version.
  */
 #define BTRFS_READDIR_EOF ~0ULL
+static loff_t btrfs_dir_llseek(struct file *file, loff_t offset, int whence)
+{
+	struct inode *inode = file->f_mapping->host;
+	loff_t ret;
+
+	/*
+	 * f_version isn't reset if a seek is attempted to the current pos.  A
+	 * caller can be trying to see more entries by seeking past the last
+	 * entry to the current pos after creating a new entry.
+	 */
+	mutex_lock(&inode->i_mutex);
+	ret = generic_file_llseek(file, offset, whence);
+	if (ret == offset && file->f_version == BTRFS_READDIR_EOF)
+		file->f_version = 0;
+	mutex_unlock(&inode->i_mutex);
+	return ret;
+}
 
 static int btrfs_real_readdir(struct file *filp, void *dirent,
 			      filldir_t filldir)
@@ -8642,7 +8659,7 @@ static const struct inode_operations btrfs_dir_ro_inode_operations = {
 };
 
 static const struct file_operations btrfs_dir_file_operations = {
-	.llseek		= generic_file_llseek,
+	.llseek		= btrfs_dir_llseek,
 	.read		= generic_read_dir,
 	.readdir	= btrfs_real_readdir,
 	.unlocked_ioctl	= btrfs_ioctl,
-- 
1.7.11.7
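
The case this patch handles can be illustrated with a small userspace model.
It is a sketch, not kernel code: struct fake_file and the model_* functions
are invented names, and the model assumes (as the commit message says) that
generic_file_llseek() only clears f_version when the position changes.

```c
#include <assert.h>
#include <stdint.h>

#define READDIR_EOF (~0ULL)	/* mirrors BTRFS_READDIR_EOF in the patch */

/* Hypothetical stand-in for the two struct file fields involved. */
struct fake_file {
	long long f_pos;
	uint64_t f_version;
};

/* Simplified SEEK_SET model: f_version is reset only if pos changes. */
long long model_generic_llseek(struct fake_file *f, long long offset)
{
	assert(offset >= 0);
	if (offset != f->f_pos) {
		f->f_pos = offset;
		f->f_version = 0;
	}
	return offset;
}

/* Model of the wrapper: also clear the marker on a seek to the current pos. */
long long model_dir_llseek(struct fake_file *f, long long offset)
{
	long long ret = model_generic_llseek(f, offset);

	if (ret == offset && f->f_version == READDIR_EOF)
		f->f_version = 0;
	return ret;
}

/* Model of the readdir gate: no entries while the EOF marker is set. */
int model_readdir_allowed(const struct fake_file *f)
{
	return f->f_version != READDIR_EOF;
}
```

With f_pos at 5 and the marker set, model_generic_llseek(&f, 5) leaves readdir
blocked, while model_dir_llseek(&f, 5) lets it return entries again.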



Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-06 Thread David Sterba
On Thu, Jun 06, 2013 at 09:35:07AM +0800, Miao Xie wrote:
 On Wed, 5 Jun 2013 15:36:36 +0200, David Sterba wrote:
  On Wed, Jun 05, 2013 at 10:34:08AM +0800, Miao Xie wrote:
  On Tue, 4 Jun 2013 16:26:57 -0700, Zach Brown wrote:
  On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
  Quoting Zach Brown (2013-06-04 18:17:54)
  Hi gang,
 
  I finally sat down to fix that readdir hang that has been in the back
  of my mind for a while.  I *hope* that the fix is pretty simple: just
  don't manufacture a fake f_pos, I *think* we can abuse f_version as an
  indicator that we shouldn't return entries.  Does this look reasonable?
 
  I like it, and it doesn't look too far away from how others are abusing
  f_version.  Have you tried with NFS?  I don't think it'll hurt, but NFS
  loves to surprise me.
 
  Mm, no, I hadn't.  I'll give it a go tomorrow.  What could go wrong? :)
 
  If we can not use f_version, we can use private_data. I think this variant
  is safe.
  
  private_data is used within the ioctl user transactions, so a
  readdir(mountpoint) with a user transaction running can break it.
 
 Don't worry, we can allocate a structure to keep both the transaction handle
 and the readdir information, just like ext3/ext4. It is a flexible approach,
 and we can extend the structure to keep more information if needed in the
 future.
 
 Besides the above method, we can also abuse the low bits of private_data to
 indicate that we shouldn't return entries.

Allocating a full structure for private_data sounds better than directly
modifying the pointer value itself.


david
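
For illustration, the structure-based approach discussed above might look
roughly like this in plain C. Everything here is an assumption made for the
sketch: struct dir_private, its field names, and the helper functions are
hypothetical, and calloc() stands in for a kernel allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-open-file state; field names are illustrative only. */
struct dir_private {
	void *trans_handle;	/* user transaction handle, or NULL */
	bool readdir_eof;	/* set once readdir has returned everything */
};

/* Allocate zeroed state: no transaction, readdir not yet at EOF. */
struct dir_private *dir_private_alloc(void)
{
	return calloc(1, sizeof(struct dir_private));
}

/* readdir reached the end: stop returning entries until a seek. */
void dir_private_mark_eof(struct dir_private *priv)
{
	assert(priv != NULL);
	priv->readdir_eof = true;
}

/* A seek re-enables readdir without touching the transaction handle. */
void dir_private_seek(struct dir_private *priv)
{
	assert(priv != NULL);
	priv->readdir_eof = false;
}
```

The point of the sketch is that the readdir flag and the transaction handle
live side by side, so neither user of private_data can clobber the other.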


Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-06 Thread Chris Mason
Quoting David Sterba (2013-06-06 09:55:50)
 On Thu, Jun 06, 2013 at 09:35:07AM +0800, Miao Xie wrote:
  On Wed, 5 Jun 2013 15:36:36 +0200, David Sterba wrote:
   On Wed, Jun 05, 2013 at 10:34:08AM +0800, Miao Xie wrote:
   On Tue, 4 Jun 2013 16:26:57 -0700, Zach Brown wrote:
   On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
   Quoting Zach Brown (2013-06-04 18:17:54)
   Hi gang,
  
   I finally sat down to fix that readdir hang that has been in the back
   of my mind for a while.  I *hope* that the fix is pretty simple: just
   don't manufacture a fake f_pos, I *think* we can abuse f_version as an
   indicator that we shouldn't return entries.  Does this look reasonable?
  
   I like it, and it doesn't look too far away from how others are abusing
   f_version.  Have you tried with NFS?  I don't think it'll hurt, but NFS
   loves to surprise me.
  
   Mm, no, I hadn't.  I'll give it a go tomorrow.  What could go wrong? :)
  
   If we can not use f_version, we can use private_data. I think this
   variant is safe.
   
   private_data is used within the ioctl user transactions, so a
   readdir(mountpoint) with a user transaction running can break it.
  
  Don't worry, we can allocate a structure to keep both the transaction handle
  and the readdir information, just like ext3/ext4. It is a flexible approach,
  and we can extend the structure to keep more information if needed in the
  future.
  
  Besides the above method, we can also abuse the low bits of private_data to
  indicate that we shouldn't return entries.
 
 Allocating a full structure for private_data sounds better than directly
 modifying the pointer value itself.

I'd actually rather tag the pointers than go through kmalloc, we just
need one bit (maybe that really just shows how badly we've corrupted
poor Miao).  But, we're not there yet, I think Zach's initial patch will
work fine.

-chris
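
A minimal sketch of the pointer-tagging idea, in userspace C. The names
READDIR_EOF_TAG, tag_set(), tag_is_set() and tag_strip() are made up for this
example, and it assumes the stored pointer is at least 2-byte aligned so its
low bit is free; converting pointers through uintptr_t this way is
implementation-defined but works on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define READDIR_EOF_TAG ((uintptr_t)1)	/* hypothetical flag bit */

/* Set the flag in the low bit of an untagged, suitably aligned pointer. */
void *tag_set(void *p)
{
	assert(((uintptr_t)p & READDIR_EOF_TAG) == 0);
	return (void *)((uintptr_t)p | READDIR_EOF_TAG);
}

/* Test whether the flag bit is set. */
int tag_is_set(const void *p)
{
	return ((uintptr_t)p & READDIR_EOF_TAG) != 0;
}

/* Recover the real pointer; must be done before every dereference. */
void *tag_strip(void *p)
{
	return (void *)((uintptr_t)p & ~READDIR_EOF_TAG);
}
```

The cost of avoiding an allocation is that every user of the field has to
remember to call tag_strip() first; a tagged NULL is simply the value 1.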



Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-05 Thread David Sterba
On Wed, Jun 05, 2013 at 10:34:08AM +0800, Miao Xie wrote:
 On Tue, 4 Jun 2013 16:26:57 -0700, Zach Brown wrote:
  On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
  Quoting Zach Brown (2013-06-04 18:17:54)
  Hi gang,
 
  I finally sat down to fix that readdir hang that has been in the back
  of my mind for a while.  I *hope* that the fix is pretty simple: just
  don't manufacture a fake f_pos, I *think* we can abuse f_version as an
  indicator that we shouldn't return entries.  Does this look reasonable?
 
  I like it, and it doesn't look too far away from how others are abusing
  f_version.  Have you tried with NFS?  I don't think it'll hurt, but NFS
  loves to surprise me.
  
  Mm, no, I hadn't.  I'll give it a go tomorrow.  What could go wrong? :)
 
 If we can not use f_version, we can use private_data. I think this variant is
 safe.

private_data is used within the ioctl user transactions, so a
readdir(mountpoint) with a user transaction running can break it.

david


Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-05 Thread Miao Xie
On Wed, 5 Jun 2013 15:36:36 +0200, David Sterba wrote:
 On Wed, Jun 05, 2013 at 10:34:08AM +0800, Miao Xie wrote:
 On Tue, 4 Jun 2013 16:26:57 -0700, Zach Brown wrote:
 On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
 Quoting Zach Brown (2013-06-04 18:17:54)
 Hi gang,

 I finally sat down to fix that readdir hang that has been in the back
 of my mind for a while.  I *hope* that the fix is pretty simple: just
 don't manufacture a fake f_pos, I *think* we can abuse f_version as an
 indicator that we shouldn't return entries.  Does this look reasonable?

 I like it, and it doesn't look too far away from how others are abusing
 f_version.  Have you tried with NFS?  I don't think it'll hurt, but NFS
 loves to surprise me.

 Mm, no, I hadn't.  I'll give it a go tomorrow.  What could go wrong? :)

 If we can not use f_version, we can use private_data. I think this variant is
 safe.
 
 private_data is used within the ioctl user transactions, so a
 readdir(mountpoint) with a user transaction running can break it.

Don't worry, we can allocate a structure to keep both the transaction handle
and the readdir information, just like ext3/ext4. It is a flexible approach,
and we can extend the structure to keep more information if needed in the
future.

Besides the above method, we can also abuse the low bits of private_data to
indicate that we shouldn't return entries.

Thanks
Miao

 
 david


Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-04 Thread Chris Mason
Quoting Zach Brown (2013-06-04 18:17:54)
 Hi gang,
 
 I finally sat down to fix that readdir hang that has been in the back
 of my mind for a while.  I *hope* that the fix is pretty simple: just
 don't manufacture a fake f_pos, I *think* we can abuse f_version as an
 indicator that we shouldn't return entries.  Does this look reasonable?

I like it, and it doesn't look too far away from how others are abusing
f_version.  Have you tried with NFS?  I don't think it'll hurt, but NFS
loves to surprise me.

-chris


Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-04 Thread Zach Brown
On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
 Quoting Zach Brown (2013-06-04 18:17:54)
  Hi gang,
  
  I finally sat down to fix that readdir hang that has been in the back
  of my mind for a while.  I *hope* that the fix is pretty simple: just
  don't manufacture a fake f_pos, I *think* we can abuse f_version as an
  indicator that we shouldn't return entries.  Does this look reasonable?
 
 I like it, and it doesn't look too far away from how others are abusing
 f_version.  Have you tried with NFS?  I don't think it'll hurt, but NFS
 loves to surprise me.

Mm, no, I hadn't.  I'll give it a go tomorrow.  What could go wrong? :)

- z


Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups

2013-06-04 Thread Miao Xie
On Tue, 4 Jun 2013 16:26:57 -0700, Zach Brown wrote:
 On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
 Quoting Zach Brown (2013-06-04 18:17:54)
 Hi gang,

 I finally sat down to fix that readdir hang that has been in the back
 of my mind for a while.  I *hope* that the fix is pretty simple: just
 don't manufacture a fake f_pos, I *think* we can abuse f_version as an
 indicator that we shouldn't return entries.  Does this look reasonable?

 I like it, and it doesn't look too far away from how others are abusing
 f_version.  Have you tried with NFS?  I don't think it'll hurt, but NFS
 loves to surprise me.
 
 Mm, no, I hadn't.  I'll give it a go tomorrow.  What could go wrong? :)

If we can not use f_version, we can use private_data. I think this variant is
safe.

Miao

 
 - z