Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups
On Tue, Jun 04, 2013 at 06:17:54PM -0400, Zach Brown wrote:
> Hi gang,
>
> I finally sat down to fix that readdir hang that has been in
> the back of my mind for a while. I *hope* that the fix is
> pretty simple: just don't manufacture a fake f_pos, I *think*
> we can abuse f_version as an indicator that we shouldn't
> return entries. Does this look reasonable?
>
> We still have the problem that we can generate valid large
> f_pos values that can confuse 32bit userspace, but that's a
> different problem. I think we'll want filldir generation of
> EOVERFLOW like what exists for large inodes.
>
> The rest of the patches are cleanups that I saw when absorbing
> the code. It's all lightly tested with xfstests but it wouldn't
> surprise me if I missed something so review is appreciated.
>
> Thanks!

One of these patches is making new entries not show up in
readdir. This was discovered while running stress.sh overnight,
it complained about files not matching but when they were
checked the files matched. Dropping the entire series made
stress.sh run fine. So I'm dropping these for the next merge
window but I'll dig into it and try and figure out what was
causing the problem.

Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
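For readers following the EOVERFLOW remark quoted above: the idea is to notice when a 64-bit directory offset would not survive narrowing to the type a 32-bit caller sees. The following is only a userspace sketch of that kind of check. The names `compat_off_t` and `model_fill_off` are made up for illustration and are not kernel interfaces, and the real fix would live in the filldir path.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the offset type a 32-bit caller receives. */
typedef int32_t compat_off_t;

/*
 * Store a 64-bit directory offset into the narrower type.
 * Returns 0 on success, or -EOVERFLOW if the value does not fit,
 * leaving *out untouched in that case.
 */
static int model_fill_off(int64_t off, compat_off_t *out)
{
	if (off < INT32_MIN || off > INT32_MAX)
		return -EOVERFLOW;
	*out = (compat_off_t)off;
	return 0;
}
```

Returning an error instead of silently truncating means the caller never sees an offset that would seek somewhere else when handed back.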
Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups
Quoting Josef Bacik (2013-07-01 08:54:35)
> On Tue, Jun 04, 2013 at 06:17:54PM -0400, Zach Brown wrote:
> > Hi gang,
> >
> > I finally sat down to fix that readdir hang that has been in
> > the back of my mind for a while. I *hope* that the fix is
> > pretty simple: just don't manufacture a fake f_pos, I *think*
> > we can abuse f_version as an indicator that we shouldn't
> > return entries. Does this look reasonable?
> >
> > We still have the problem that we can generate valid large
> > f_pos values that can confuse 32bit userspace, but that's a
> > different problem. I think we'll want filldir generation of
> > EOVERFLOW like what exists for large inodes.
> >
> > The rest of the patches are cleanups that I saw when absorbing
> > the code. It's all lightly tested with xfstests but it wouldn't
> > surprise me if I missed something so review is appreciated.
> >
> > Thanks!
>
> One of these patches is making new entries not show up in
> readdir. This was discovered while running stress.sh overnight,
> it complained about files not matching but when they were
> checked the files matched. Dropping the entire series made
> stress.sh run fine. So I'm dropping these for the next merge
> window but I'll dig into it and try and figure out what was
> causing the problem.

Unfortunately I've only triggered this on flash, and the run takes
about two hours to trigger. Trying now with some extra printks to
see if I can nail it down

-chris
Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups
> > code. It's all lightly tested with xfstests but it wouldn't
> > surprise me if I missed something so review is appreciated.

*mmm, hmmm*

> One of these patches is making new entries not show up in
> readdir. This was discovered while running stress.sh overnight,
> it complained about files not matching but when they were
> checked the files matched. Dropping the entire series made
> stress.sh run fine. So I'm dropping these for the next merge
> window but I'll dig into it and try and figure out what was
> causing the problem.

Nerts.

It's got to be the delayed inode stuff. Maybe it's some
unlink/recreate pattern? Is this a thing that stress.sh does?
(Where's stress.sh live?)

- z
Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups
Quoting Zach Brown (2013-06-10 18:39:58)
> On Tue, Jun 04, 2013 at 04:26:57PM -0700, Zach Brown wrote:
> > On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
> > > Quoting Zach Brown (2013-06-04 18:17:54)
> > > > Hi gang,
> > > >
> > > > I finally sat down to fix that readdir hang that has been in
> > > > the back of my mind for a while. I *hope* that the fix is
> > > > pretty simple: just don't manufacture a fake f_pos, I *think*
> > > > we can abuse f_version as an indicator that we shouldn't
> > > > return entries. Does this look reasonable?
> > >
> > > I like it, and it doesn't look too far away from how others
> > > are abusing f_version. Have you tried with NFS? I don't think
> > > it'll hurt, but NFS loves to surprise me.
> >
> > Mm, no, I hadn't. I'll give it a go tomorrow. What could go
> > wrong? :)
>
> Or a week later. Pretty close!
>
> I couldn't get NFS to break. Clients see new entries created
> directly in the exported btrfs and on either of noac and
> actimeo=1 client mounts. For whatever that's worth.

Great.

> But I did find that I'd broken the case of trying to re-enable
> readdir results by seeking past the last entry (which happens to
> be the current f_pos now that we're using f_version).
>
> Here's the incremental fix against what Josef has in -next. I'm
> cool with either squashing or just committing it.

Let's squash it in, Josef loves to rebase.

-chris
Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups
On Tue, Jun 04, 2013 at 04:26:57PM -0700, Zach Brown wrote:
> On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
> > Quoting Zach Brown (2013-06-04 18:17:54)
> > > Hi gang,
> > >
> > > I finally sat down to fix that readdir hang that has been in
> > > the back of my mind for a while. I *hope* that the fix is
> > > pretty simple: just don't manufacture a fake f_pos, I *think*
> > > we can abuse f_version as an indicator that we shouldn't
> > > return entries. Does this look reasonable?
> >
> > I like it, and it doesn't look too far away from how others
> > are abusing f_version. Have you tried with NFS? I don't think
> > it'll hurt, but NFS loves to surprise me.
>
> Mm, no, I hadn't. I'll give it a go tomorrow. What could go
> wrong? :)

Or a week later. Pretty close!

I couldn't get NFS to break. Clients see new entries created
directly in the exported btrfs and on either of noac and
actimeo=1 client mounts. For whatever that's worth.

But I did find that I'd broken the case of trying to re-enable
readdir results by seeking past the last entry (which happens to
be the current f_pos now that we're using f_version).

Here's the incremental fix against what Josef has in -next. I'm
cool with either squashing or just committing it.

- z

Subject: [PATCH] btrfs: reset f_version when seeking to pos

Commit 63e3dfe (btrfs: fix readdir hang with offsets past INT_MAX)
switched to using f_version to stop readdir results instead of
setting a large f_pos. It inadvertently changed behaviour in the
case where an app specifically seeks to one past the last valid
dent->d_off it has seen. Previously f_pos would have changed from
the fake f_pos to this new f_pos which would let readdir return new
entries. But now that it's using f_version it might not have seen
new entries.

generic_file_llseek() won't clear f_version if the desired pos
happens to be the current f_pos. So we add a little wrapper to
notice this case and clear f_version so that entries can be seen in
this case.

Signed-off-by: Zach Brown <z...@redhat.com>
---
 fs/btrfs/inode.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1059c90..590c274 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4997,6 +4997,23 @@ unsigned char btrfs_filetype_table[] = {
  * which prevents readdir results until seek resets f_pos and f_version.
  */
 #define BTRFS_READDIR_EOF ~0ULL
+static loff_t btrfs_dir_llseek(struct file *file, loff_t offset, int whence)
+{
+	struct inode *inode = file->f_mapping->host;
+	loff_t ret;
+
+	/*
+	 * f_version isn't reset if a seek is attempted to the current pos.  A
+	 * caller can be trying to see more entries by seeking past the last
+	 * entry to the current pos after creating a new entry.
+	 */
+	mutex_lock(&inode->i_mutex);
+	ret = generic_file_llseek(file, offset, whence);
+	if (ret == offset && file->f_version == BTRFS_READDIR_EOF)
+		file->f_version = 0;
+	mutex_unlock(&inode->i_mutex);
+	return ret;
+}

 static int btrfs_real_readdir(struct file *filp, void *dirent,
 			      filldir_t filldir)
@@ -8642,7 +8659,7 @@ static const struct inode_operations btrfs_dir_ro_inode_operations = {
 };

 static const struct file_operations btrfs_dir_file_operations = {
-	.llseek		= generic_file_llseek,
+	.llseek		= btrfs_dir_llseek,
 	.read		= generic_read_dir,
 	.readdir	= btrfs_real_readdir,
 	.unlocked_ioctl	= btrfs_ioctl,
--
1.7.11.7
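To make the corner case in the commit message easier to see outside the kernel, here is a small userspace model. It is a sketch only: `struct model_file`, `model_generic_llseek` and `model_dir_llseek` are invented names, only absolute seeks are modelled, and the i_mutex locking from the patch is left out. It assumes, as the commit message states, that the generic seek clears f_version only when the position actually changes.

```c
#include <stdint.h>

/* Same sentinel value as BTRFS_READDIR_EOF in the patch. */
#define MODEL_READDIR_EOF (~0ULL)

/* Hypothetical model of the two struct file fields involved. */
struct model_file {
	int64_t  f_pos;
	uint64_t f_version;
};

/*
 * Model of an absolute generic seek: f_version is reset only
 * when the requested offset differs from the current f_pos.
 */
static int64_t model_generic_llseek(struct model_file *file, int64_t offset)
{
	if (offset != file->f_pos) {
		file->f_pos = offset;
		file->f_version = 0;
	}
	return offset;
}

/*
 * Model of the wrapper: after the generic seek, also clear the
 * EOF marker when the seek landed on the offset asked for, so a
 * seek to the current position re-enables readdir results.
 */
static int64_t model_dir_llseek(struct model_file *file, int64_t offset)
{
	int64_t ret = model_generic_llseek(file, offset);

	if (ret == offset && file->f_version == MODEL_READDIR_EOF)
		file->f_version = 0;
	return ret;
}
```

With f_pos already at 100 and f_version holding the EOF marker, `model_generic_llseek(&f, 100)` leaves the marker set, while `model_dir_llseek(&f, 100)` clears it.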
Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups
On Thu, Jun 06, 2013 at 09:35:07AM +0800, Miao Xie wrote:
> On Wed, 5 Jun 2013 15:36:36 +0200, David Sterba wrote:
> > On Wed, Jun 05, 2013 at 10:34:08AM +0800, Miao Xie wrote:
> > > On Tue, 4 Jun 2013 16:26:57 -0700, Zach Brown wrote:
> > > > On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
> > > > > Quoting Zach Brown (2013-06-04 18:17:54)
> > > > > > Hi gang,
> > > > > >
> > > > > > I finally sat down to fix that readdir hang that has been in
> > > > > > the back of my mind for a while. I *hope* that the fix is
> > > > > > pretty simple: just don't manufacture a fake f_pos, I *think*
> > > > > > we can abuse f_version as an indicator that we shouldn't
> > > > > > return entries. Does this look reasonable?
> > > > >
> > > > > I like it, and it doesn't look too far away from how others
> > > > > are abusing f_version. Have you tried with NFS? I don't think
> > > > > it'll hurt, but NFS loves to surprise me.
> > > >
> > > > Mm, no, I hadn't. I'll give it a go tomorrow. What could go
> > > > wrong? :)
> > >
> > > If we can not use f_version, we can use private_data. I think
> > > this variant is safe.
> >
> > private_data is used within the ioctl user transactions, so a
> > readdir(mountpoint) with a user transaction running can break it.
>
> Don't worry, we can allocate a structure to keep both the
> transaction handle and the readdir information, just like
> ext3/ext4. It is a flexible way and we can extend the structure
> to keep more information if needed in the future.
>
> Besides the above method, we can also abuse the low bits of
> private_data to indicate that we shouldn't return entries.

Allocating a full structure for private_data sounds better than
directly modifying the pointer value itself.

david
Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups
Quoting David Sterba (2013-06-06 09:55:50)
> On Thu, Jun 06, 2013 at 09:35:07AM +0800, Miao Xie wrote:
> > On Wed, 5 Jun 2013 15:36:36 +0200, David Sterba wrote:
> > > On Wed, Jun 05, 2013 at 10:34:08AM +0800, Miao Xie wrote:
> > > > On Tue, 4 Jun 2013 16:26:57 -0700, Zach Brown wrote:
> > > > > On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
> > > > > > Quoting Zach Brown (2013-06-04 18:17:54)
> > > > > > > Hi gang,
> > > > > > >
> > > > > > > I finally sat down to fix that readdir hang that has been in
> > > > > > > the back of my mind for a while. I *hope* that the fix is
> > > > > > > pretty simple: just don't manufacture a fake f_pos, I *think*
> > > > > > > we can abuse f_version as an indicator that we shouldn't
> > > > > > > return entries. Does this look reasonable?
> > > > > >
> > > > > > I like it, and it doesn't look too far away from how others
> > > > > > are abusing f_version. Have you tried with NFS? I don't think
> > > > > > it'll hurt, but NFS loves to surprise me.
> > > > >
> > > > > Mm, no, I hadn't. I'll give it a go tomorrow. What could go
> > > > > wrong? :)
> > > >
> > > > If we can not use f_version, we can use private_data. I think
> > > > this variant is safe.
> > >
> > > private_data is used within the ioctl user transactions, so a
> > > readdir(mountpoint) with a user transaction running can break it.
> >
> > Don't worry, we can allocate a structure to keep both the
> > transaction handle and the readdir information, just like
> > ext3/ext4. It is a flexible way and we can extend the structure
> > to keep more information if needed in the future.
> >
> > Besides the above method, we can also abuse the low bits of
> > private_data to indicate that we shouldn't return entries.
>
> Allocating a full structure for private_data sounds better than
> directly modifying the pointer value itself.

I'd actually rather tag the pointers than go through kmalloc, we
just need one bit (maybe that really just shows how badly we've
corrupted poor Miao).

But, we're not there yet, I think Zach's initial patch will work
fine.

-chris
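For anyone unfamiliar with the pointer tagging mentioned above, here is roughly what stealing one low bit looks like. This is a generic userspace sketch, not code from any patch in this thread. The helper names are hypothetical, and it assumes the stored pointer is at least 2-byte aligned so that bit 0 is otherwise always zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the pointer value is used as the "no more entries" flag. */
#define MODEL_EOF_TAG ((uintptr_t)1)

/* Return the pointer with the flag bit set. */
static void *model_tag_eof(void *ptr)
{
	return (void *)((uintptr_t)ptr | MODEL_EOF_TAG);
}

/* Test whether the flag bit is set. */
static bool model_is_eof(const void *ptr)
{
	return ((uintptr_t)ptr & MODEL_EOF_TAG) != 0;
}

/* Strip the flag so the value can be used as a real pointer again. */
static void *model_untag(void *ptr)
{
	return (void *)((uintptr_t)ptr & ~MODEL_EOF_TAG);
}
```

The cost of this approach is that every user of the field has to untag before dereferencing, which is the concern raised about modifying the pointer value directly.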
Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups
On Wed, Jun 05, 2013 at 10:34:08AM +0800, Miao Xie wrote:
> On Tue, 4 Jun 2013 16:26:57 -0700, Zach Brown wrote:
> > On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
> > > Quoting Zach Brown (2013-06-04 18:17:54)
> > > > Hi gang,
> > > >
> > > > I finally sat down to fix that readdir hang that has been in
> > > > the back of my mind for a while. I *hope* that the fix is
> > > > pretty simple: just don't manufacture a fake f_pos, I *think*
> > > > we can abuse f_version as an indicator that we shouldn't
> > > > return entries. Does this look reasonable?
> > >
> > > I like it, and it doesn't look too far away from how others
> > > are abusing f_version. Have you tried with NFS? I don't think
> > > it'll hurt, but NFS loves to surprise me.
> >
> > Mm, no, I hadn't. I'll give it a go tomorrow. What could go
> > wrong? :)
>
> If we can not use f_version, we can use private_data. I think
> this variant is safe.

private_data is used within the ioctl user transactions, so a
readdir(mountpoint) with a user transaction running can break it.

david
Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups
On Wed, 5 Jun 2013 15:36:36 +0200, David Sterba wrote:
> On Wed, Jun 05, 2013 at 10:34:08AM +0800, Miao Xie wrote:
> > On Tue, 4 Jun 2013 16:26:57 -0700, Zach Brown wrote:
> > > On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
> > > > Quoting Zach Brown (2013-06-04 18:17:54)
> > > > > Hi gang,
> > > > >
> > > > > I finally sat down to fix that readdir hang that has been in
> > > > > the back of my mind for a while. I *hope* that the fix is
> > > > > pretty simple: just don't manufacture a fake f_pos, I *think*
> > > > > we can abuse f_version as an indicator that we shouldn't
> > > > > return entries. Does this look reasonable?
> > > >
> > > > I like it, and it doesn't look too far away from how others
> > > > are abusing f_version. Have you tried with NFS? I don't think
> > > > it'll hurt, but NFS loves to surprise me.
> > >
> > > Mm, no, I hadn't. I'll give it a go tomorrow. What could go
> > > wrong? :)
> >
> > If we can not use f_version, we can use private_data. I think
> > this variant is safe.
>
> private_data is used within the ioctl user transactions, so a
> readdir(mountpoint) with a user transaction running can break it.

Don't worry, we can allocate a structure to keep both the
transaction handle and the readdir information, just like
ext3/ext4. It is a flexible way and we can extend the structure
to keep more information if needed in the future.

Besides the above method, we can also abuse the low bits of
private_data to indicate that we shouldn't return entries.

Thanks
Miao

> david
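The structure-based alternative described above might look something like the following userspace sketch. It is illustrative only: `struct model_dir_private` and `model_dir_private_get` are hypothetical names, `calloc` stands in for a kernel allocator, and the transaction handle is reduced to an opaque pointer.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-open state hung off private_data. */
struct model_dir_private {
	void *trans;       /* placeholder for a user transaction handle */
	bool  readdir_eof; /* set once readdir should stop returning entries */
};

/*
 * Return the per-open state, allocating it zeroed on first use.
 * Returns NULL if the allocation fails.
 */
static struct model_dir_private *model_dir_private_get(void **private_data)
{
	if (*private_data == NULL)
		*private_data = calloc(1, sizeof(struct model_dir_private));
	return *private_data;
}
```

Keeping both pieces of state in one allocation avoids the conflict between readdir and user transactions over a single pointer, at the price of an allocation per open directory.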
Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups
Quoting Zach Brown (2013-06-04 18:17:54)
> Hi gang,
>
> I finally sat down to fix that readdir hang that has been in
> the back of my mind for a while. I *hope* that the fix is
> pretty simple: just don't manufacture a fake f_pos, I *think*
> we can abuse f_version as an indicator that we shouldn't
> return entries. Does this look reasonable?

I like it, and it doesn't look too far away from how others
are abusing f_version. Have you tried with NFS? I don't think
it'll hurt, but NFS loves to surprise me.

-chris
Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups
On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
> Quoting Zach Brown (2013-06-04 18:17:54)
> > Hi gang,
> >
> > I finally sat down to fix that readdir hang that has been in
> > the back of my mind for a while. I *hope* that the fix is
> > pretty simple: just don't manufacture a fake f_pos, I *think*
> > we can abuse f_version as an indicator that we shouldn't
> > return entries. Does this look reasonable?
>
> I like it, and it doesn't look too far away from how others
> are abusing f_version. Have you tried with NFS? I don't think
> it'll hurt, but NFS loves to surprise me.

Mm, no, I hadn't. I'll give it a go tomorrow. What could go
wrong? :)

- z
Re: [PATCH 0/6] fix INT_MAX readdir hang, plus cleanups
On Tue, 4 Jun 2013 16:26:57 -0700, Zach Brown wrote:
> On Tue, Jun 04, 2013 at 07:16:53PM -0400, Chris Mason wrote:
> > Quoting Zach Brown (2013-06-04 18:17:54)
> > > Hi gang,
> > >
> > > I finally sat down to fix that readdir hang that has been in
> > > the back of my mind for a while. I *hope* that the fix is
> > > pretty simple: just don't manufacture a fake f_pos, I *think*
> > > we can abuse f_version as an indicator that we shouldn't
> > > return entries. Does this look reasonable?
> >
> > I like it, and it doesn't look too far away from how others
> > are abusing f_version. Have you tried with NFS? I don't think
> > it'll hurt, but NFS loves to surprise me.
>
> Mm, no, I hadn't. I'll give it a go tomorrow. What could go
> wrong? :)

If we can not use f_version, we can use private_data. I think
this variant is safe.

Miao

> - z