Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

2013-09-10 Thread Josef Bacik
On Tue, Sep 10, 2013 at 08:36:55AM -0700, Mark Fasheh wrote:
 On Mon, Aug 12, 2013 at 04:47:52AM -0700, Christoph Hellwig wrote:
  On Thu, Aug 08, 2013 at 11:44:54AM -0400, Josef Bacik wrote:
   On Thu, Aug 08, 2013 at 06:48:05AM -0700, Christoph Hellwig wrote:
On Thu, Aug 08, 2013 at 09:02:07AM -0400, Josef Bacik wrote:
 This won't work, try having 1 subvolumes with dirty inodes and do 
 sync then
 go skiing, you'll have time :).  Thanks,

Why would the dirty inodes make any difference?  If you share the bdi
between the subvolumes the sync workflow should be exactly the same
still.

   
   If we could dis-entangle vfsmounts from sb's and have it so you could have
   multiple vfsmounts with just one sb that would solve at least the 
   in-kernel
   confusion, but I think we still have the userspace confusion.  Thanks,
  
  I think it would mostly solve userspace confusion, as userspace only
  sees mounts and the device names.
  
  But please fix this up properly instead of propagating the effects of
  the nasty btrfs hack that should never have been merged in that form
  further up the stack.
 
 Can one of you explain how this solves the problem that userspace is getting
 different devices for the same inode?
 
 Seriously, I've been looking into it and I'm a bit lost. I followed the
 converstaion until here but I don't see how any of the proposed changes
 actually *fix* anything? Also, what is the relationship between vfsmounts
 and sb today? Wouldn't a bind mount produce the situation of more than 1
 vfsmount per sb that is described above?
 
 Sincerely, someone who would like to fix this ABI breakage that has been
 going on for years.

And let me restate the problem so we're all on the same page.

Btrfs has subvolumes, completely separate trees within the file system.  These
trees get their own object numbering, which in turn is how we do our inode
numbers.  So if you have multiple subvolumes, they will likely have the same
inode numbers within the same file system.  This screws up things like rsync
which say hey look, these two inodes are the same, lets skip them.  So we have
an anonymous dev so we can make them look different.

Now if we were to make each subvol its own vfsmount (essentially a bind mount)
and remove the anonymous device that wouldn't fix the problem _at all_.  The
file system would appear to be the same to rsync and it wouldn't back stuff up.
So we still need some way of telling userspace that this object is different.

I'm not convinced vfsmounts is the way to do this, it doesn't do anything other
than add a whole lot of complexity to our mounting/subvolume mechanism that is
already relatively complex.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

2013-09-10 Thread Mark Fasheh
On Mon, Aug 12, 2013 at 04:47:52AM -0700, Christoph Hellwig wrote:
 On Thu, Aug 08, 2013 at 11:44:54AM -0400, Josef Bacik wrote:
  On Thu, Aug 08, 2013 at 06:48:05AM -0700, Christoph Hellwig wrote:
   On Thu, Aug 08, 2013 at 09:02:07AM -0400, Josef Bacik wrote:
This won't work, try having 1 subvolumes with dirty inodes and do 
sync then
go skiing, you'll have time :).  Thanks,
   
   Why would the dirty inodes make any difference?  If you share the bdi
   between the subvolumes the sync workflow should be exactly the same
   still.
   
  
  If we could dis-entangle vfsmounts from sb's and have it so you could have
  multiple vfsmounts with just one sb that would solve at least the in-kernel
  confusion, but I think we still have the userspace confusion.  Thanks,
 
 I think it would mostly solve userspace confusion, as userspace only
 sees mounts and the device names.
 
 But please fix this up properly instead of propagating the effects of
 the nasty btrfs hack that should never have been merged in that form
 further up the stack.

Can one of you explain how this solves the problem that userspace is getting
different devices for the same inode?

Seriously, I've been looking into it and I'm a bit lost. I followed the
converstaion until here but I don't see how any of the proposed changes
actually *fix* anything? Also, what is the relationship between vfsmounts
and sb today? Wouldn't a bind mount produce the situation of more than 1
vfsmount per sb that is described above?

Sincerely, someone who would like to fix this ABI breakage that has been
going on for years.
--Mark

--
Mark Fasheh
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

2013-09-10 Thread Jeff Mahoney
On 9/10/13 11:56 AM, Josef Bacik wrote:
 On Tue, Sep 10, 2013 at 08:36:55AM -0700, Mark Fasheh wrote:
 On Mon, Aug 12, 2013 at 04:47:52AM -0700, Christoph Hellwig wrote:
 On Thu, Aug 08, 2013 at 11:44:54AM -0400, Josef Bacik wrote:
 On Thu, Aug 08, 2013 at 06:48:05AM -0700, Christoph Hellwig wrote:
 On Thu, Aug 08, 2013 at 09:02:07AM -0400, Josef Bacik wrote:
 This won't work, try having 1 subvolumes with dirty inodes and do 
 sync then
 go skiing, you'll have time :).  Thanks,

 Why would the dirty inodes make any difference?  If you share the bdi
 between the subvolumes the sync workflow should be exactly the same
 still.


 If we could dis-entangle vfsmounts from sb's and have it so you could have
 multiple vfsmounts with just one sb that would solve at least the in-kernel
 confusion, but I think we still have the userspace confusion.  Thanks,

 I think it would mostly solve userspace confusion, as userspace only
 sees mounts and the device names.

 But please fix this up properly instead of propagating the effects of
 the nasty btrfs hack that should never have been merged in that form
 further up the stack.

 Can one of you explain how this solves the problem that userspace is getting
 different devices for the same inode?

 Seriously, I've been looking into it and I'm a bit lost. I followed the
 converstaion until here but I don't see how any of the proposed changes
 actually *fix* anything? Also, what is the relationship between vfsmounts
 and sb today? Wouldn't a bind mount produce the situation of more than 1
 vfsmount per sb that is described above?

 Sincerely, someone who would like to fix this ABI breakage that has been
 going on for years.
 
 And let me restate the problem so we're all on the same page.
 
 Btrfs has subvolumes, completely separate trees within the file system.  These
 trees get their own object numbering, which in turn is how we do our inode
 numbers.  So if you have multiple subvolumes, they will likely have the same
 inode numbers within the same file system.  This screws up things like rsync
 which say hey look, these two inodes are the same, lets skip them.  So we 
 have
 an anonymous dev so we can make them look different.
 
 Now if we were to make each subvol its own vfsmount (essentially a bind mount)
 and remove the anonymous device that wouldn't fix the problem _at all_.  The
 file system would appear to be the same to rsync and it wouldn't back stuff 
 up.
 So we still need some way of telling userspace that this object is different.
 
 I'm not convinced vfsmounts is the way to do this, it doesn't do anything 
 other
 than add a whole lot of complexity to our mounting/subvolume mechanism that is
 already relatively complex.  Thanks,

Agreed. It's hugely wasteful as well. We can have thousands of
subvolumes even on modest systems like workstations when automated
snapshots are involved. Using a vfsmount for each subvolume would make
/proc/mounts pretty useless. Having a separate superblock for each one,
at 1k a pop, would waste a ton of memory considering that they'll be
identical except for the dev_t.

The only way vfsmounts would work is if we added a dev_t there, which
would usually be set to -mnt_sb-s_dev except for the btrfs case. That
still doesn't solve the polluted /proc/mounts, though.

-Jeff

-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature


Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

2013-08-12 Thread Christoph Hellwig
On Thu, Aug 08, 2013 at 11:44:54AM -0400, Josef Bacik wrote:
 On Thu, Aug 08, 2013 at 06:48:05AM -0700, Christoph Hellwig wrote:
  On Thu, Aug 08, 2013 at 09:02:07AM -0400, Josef Bacik wrote:
   This won't work, try having 1 subvolumes with dirty inodes and do 
   sync then
   go skiing, you'll have time :).  Thanks,
  
  Why would the dirty inodes make any difference?  If you share the bdi
  between the subvolumes the sync workflow should be exactly the same
  still.
  
 
 If we could dis-entangle vfsmounts from sb's and have it so you could have
 multiple vfsmounts with just one sb that would solve at least the in-kernel
 confusion, but I think we still have the userspace confusion.  Thanks,

I think it would mostly solve userspace confusion, as userspace only
sees mounts and the device names.

But please fix this up properly instead of propagating the effects of
the nasty btrfs hack that should never have been merged in that form
further up the stack.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

2013-08-08 Thread Christoph Hellwig
On Wed, Aug 07, 2013 at 04:51:46PM -0400, Josef Bacik wrote:
 Not possible, this will break other things as subvolumes have their own inode
 space, it will confuse applications that get multiples of an inode number for
 different devices with the same st_dev.  Each subvolume has it's own anonymous
 dev to segregate things.  Thanks,

Yes, it's the same old issue of btrfs volumes misbehaving, and the
solution is still the same as 5 years ago: make sure each subvolume
has it's own sb, vfsmount and gets automounted, similar to what nfs4
does for this case.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

2013-08-08 Thread Josef Bacik
On Thu, Aug 08, 2013 at 05:13:49AM -0700, Christoph Hellwig wrote:
 On Wed, Aug 07, 2013 at 04:51:46PM -0400, Josef Bacik wrote:
  Not possible, this will break other things as subvolumes have their own 
  inode
  space, it will confuse applications that get multiples of an inode number 
  for
  different devices with the same st_dev.  Each subvolume has it's own 
  anonymous
  dev to segregate things.  Thanks,
 
 Yes, it's the same old issue of btrfs volumes misbehaving, and the
 solution is still the same as 5 years ago: make sure each subvolume
 has it's own sb, vfsmount and gets automounted, similar to what nfs4
 does for this case.

This won't work, try having 1 subvolumes with dirty inodes and do sync then
go skiing, you'll have time :).  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

2013-08-08 Thread Christoph Hellwig
On Thu, Aug 08, 2013 at 09:02:07AM -0400, Josef Bacik wrote:
 This won't work, try having 1 subvolumes with dirty inodes and do sync 
 then
 go skiing, you'll have time :).  Thanks,

Why would the dirty inodes make any difference?  If you share the bdi
between the subvolumes the sync workflow should be exactly the same
still.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

2013-08-08 Thread Josef Bacik
On Thu, Aug 08, 2013 at 06:48:05AM -0700, Christoph Hellwig wrote:
 On Thu, Aug 08, 2013 at 09:02:07AM -0400, Josef Bacik wrote:
  This won't work, try having 1 subvolumes with dirty inodes and do sync 
  then
  go skiing, you'll have time :).  Thanks,
 
 Why would the dirty inodes make any difference?  If you share the bdi
 between the subvolumes the sync workflow should be exactly the same
 still.
 

The inodes are in the per-sb list, so we may start all the writing but we don't
wait all at once, so in the case of btrfs we will write all the dirty inodes,
and then wait on the ones in whatever sb we have, and then sync, which will
commit the transaction.  Then we go to the next sb and wait on those inodes
which will dirty metadata which means we'll have another transaction and we'll
commit the transaction and so on and so forth.  This means we write the
superblock 1 times for one sync when we could have just done it once.

Now we could probably get around this by having -sync_fs wait itself for all of
the inodes to complete and then commit the transaction once, but we're still
going to get called the 9 times for the same damned file system that has
already had everything done.

And this is just one example, IIRC there were a few other issues that popped up
because we assume sb == completely separate file system, freeze I think is one
of those things.  I'm sure there were other ones but the last time I tried to do
this was 2010/2011 and many brain cells have died since then.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

2013-08-08 Thread Josef Bacik
On Thu, Aug 08, 2013 at 06:48:05AM -0700, Christoph Hellwig wrote:
 On Thu, Aug 08, 2013 at 09:02:07AM -0400, Josef Bacik wrote:
  This won't work, try having 1 subvolumes with dirty inodes and do sync 
  then
  go skiing, you'll have time :).  Thanks,
 
 Why would the dirty inodes make any difference?  If you share the bdi
 between the subvolumes the sync workflow should be exactly the same
 still.
 

If we could dis-entangle vfsmounts from sb's and have it so you could have
multiple vfsmounts with just one sb that would solve at least the in-kernel
confusion, but I think we still have the userspace confusion.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

2013-08-07 Thread Christoph Hellwig
On Wed, Aug 07, 2013 at 12:57:18PM -0700, Mark Fasheh wrote:
 stat(2) on btrfs returns a custom device, but proc uses s_dev from the super
 block. This causes problems (abi breakage) because software (and users) are
 not expecting the kernel to return different devices from these calls.

So fix stat on btrfs to return the proper device instead.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RESEND] vfs: allow /proc/PID/maps to get device from stat

2013-08-07 Thread Josef Bacik
On Wed, Aug 07, 2013 at 01:18:26PM -0700, Christoph Hellwig wrote:
 On Wed, Aug 07, 2013 at 12:57:18PM -0700, Mark Fasheh wrote:
  stat(2) on btrfs returns a custom device, but proc uses s_dev from the super
  block. This causes problems (abi breakage) because software (and users) are
  not expecting the kernel to return different devices from these calls.
 
 So fix stat on btrfs to return the proper device instead.
 

Not possible, this will break other things as subvolumes have their own inode
space, it will confuse applications that get multiples of an inode number for
different devices with the same st_dev.  Each subvolume has it's own anonymous
dev to segregate things.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html