Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl

2016-09-11 Thread Darrick J. Wong
On Sat, Sep 10, 2016 at 10:00:29AM +1000, Dave Chinner wrote:
> On Thu, Sep 08, 2016 at 11:07:16PM -0700, Darrick J. Wong wrote:
> > On Fri, Sep 09, 2016 at 09:38:06AM +1000, Dave Chinner wrote:
> > > On Tue, Aug 30, 2016 at 12:09:49PM -0700, Darrick J. Wong wrote:
> > > > > I recall for FIEMAP that some filesystems may not have files aligned
> > > > > to sector offsets, and we just used byte offsets.  Storage like
> > > > > NVDIMMs are cacheline granular, so I don't think it makes sense to
> > > > > tie this to old disk sector sizes.  Alternately, the units could be
> > > > > in terms of fs blocks as returned by statvfs.st_bsize, but mixing
> > > > > units for fmv_block, fmv_offset, fmv_length is uneeded complexity.
> > > > 
> > > > Ugh.  I'd rather just change the units to bytes rather than force all
> > > > the users to multiply things. :)
> > > 
> > > Yup, units need to be either in disk addresses (i.e. 512 byte units)
> > > or bytes. If people can't handle disk addresses (seems to be the
> > > case), the bytes it should be.
> > 
> > 
> > 
> > > > I'd much rather just add more special owner codes for any other
> > > > filesystem that has distinguishable metadata types that are not
> > > > covered by the existing OWN_ codes.  We /do/ have 2^64 possible
> > > > values, so it's not like we're going to run out.
> > > 
> > > This is diagnositc information as much as anything, just like
> > > fiemap is diagnostic information. So if we have specific type
> > > information, it needs to be reported accurately to be useful.
> > > 
> > > Hence I really don't care if the users and developers of other fs
> > > types don't understand what the special owner codes that a specific
> > > filesystem returns mean. i.e. it's not useful user information -
> > > only a tool that groks the specific filesystem is going to be able
> > > to anything useful with special owner codes. So, IMO, there's little
> > > point trying to make them generic or to even trying to define and
> > > explain them in the man page
> > 
> >  I'm ok with describing generally what each special owner code
> > means.  Maybe the manpage could be more explicit about "None of these
> > codes are useful unless you're a low level filesystem tool"?
> 
> You can add that, but it doesn't address the underlying problem.
> i.e.  that we can add/change the codes, their name, meaning, etc,
> and now there's a third party man page that is incorrect and out of
> date. It's the same problem with documenting filesystem specific
> mount options in mount(8). Better, IMO, is to simple say "refer to
> filesystem specific documentation for a description of these special
> values". e.g. refer them to the XFS Filesystem Structure
> document where this is all spelled out in enough detail to be useful
> for someone thinking that they might want to use them

We could simply put a manpage in the xfsprogs source documenting the XFS
owner codes and let other implementers make their own manpage with a
discussion of the owner codes (and whatever other quirks they have).
Sort of fragments things, but that's probably unavoidable. :)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl

2016-09-09 Thread Dave Chinner
On Thu, Sep 08, 2016 at 11:07:16PM -0700, Darrick J. Wong wrote:
> On Fri, Sep 09, 2016 at 09:38:06AM +1000, Dave Chinner wrote:
> > On Tue, Aug 30, 2016 at 12:09:49PM -0700, Darrick J. Wong wrote:
> > > > I recall for FIEMAP that some filesystems may not have files aligned
> > > > to sector offsets, and we just used byte offsets.  Storage like
> > > > NVDIMMs are cacheline granular, so I don't think it makes sense to
> > > > tie this to old disk sector sizes.  Alternately, the units could be
> > > > in terms of fs blocks as returned by statvfs.st_bsize, but mixing
> > > > units for fmv_block, fmv_offset, fmv_length is uneeded complexity.
> > > 
> > > Ugh.  I'd rather just change the units to bytes rather than force all
> > > the users to multiply things. :)
> > 
> > Yup, units need to be either in disk addresses (i.e. 512 byte units)
> > or bytes. If people can't handle disk addresses (seems to be the
> > case), the bytes it should be.
> 
> 
> 
> > > I'd much rather just add more special owner codes for any other
> > > filesystem that has distinguishable metadata types that are not
> > > covered by the existing OWN_ codes.  We /do/ have 2^64 possible
> > > values, so it's not like we're going to run out.
> > 
> > This is diagnositc information as much as anything, just like
> > fiemap is diagnostic information. So if we have specific type
> > information, it needs to be reported accurately to be useful.
> > 
> > Hence I really don't care if the users and developers of other fs
> > types don't understand what the special owner codes that a specific
> > filesystem returns mean. i.e. it's not useful user information -
> > only a tool that groks the specific filesystem is going to be able
> > to anything useful with special owner codes. So, IMO, there's little
> > point trying to make them generic or to even trying to define and
> > explain them in the man page
> 
>  I'm ok with describing generally what each special owner code
> means.  Maybe the manpage could be more explicit about "None of these
> codes are useful unless you're a low level filesystem tool"?

You can add that, but it doesn't address the underlying problem.
i.e.  that we can add/change the codes, their name, meaning, etc,
and now there's a third party man page that is incorrect and out of
date. It's the same problem with documenting filesystem specific
mount options in mount(8). Better, IMO, is to simple say "refer to
filesystem specific documentation for a description of these special
values". e.g. refer them to the XFS Filesystem Structure
document where this is all spelled out in enough detail to be useful
for someone thinking that they might want to use them

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl

2016-09-09 Thread Darrick J. Wong
On Fri, Sep 09, 2016 at 09:38:06AM +1000, Dave Chinner wrote:
> On Tue, Aug 30, 2016 at 12:09:49PM -0700, Darrick J. Wong wrote:
> > > I recall for FIEMAP that some filesystems may not have files aligned
> > > to sector offsets, and we just used byte offsets.  Storage like
> > > NVDIMMs are cacheline granular, so I don't think it makes sense to
> > > tie this to old disk sector sizes.  Alternately, the units could be
> > > in terms of fs blocks as returned by statvfs.st_bsize, but mixing
> > > units for fmv_block, fmv_offset, fmv_length is uneeded complexity.
> > 
> > Ugh.  I'd rather just change the units to bytes rather than force all
> > the users to multiply things. :)
> 
> Yup, units need to be either in disk addresses (i.e. 512 byte units)
> or bytes. If people can't handle disk addresses (seems to be the
> case), the bytes it should be.



> > I'd much rather just add more special owner codes for any other
> > filesystem that has distinguishable metadata types that are not
> > covered by the existing OWN_ codes.  We /do/ have 2^64 possible
> > values, so it's not like we're going to run out.
> 
> This is diagnositc information as much as anything, just like
> fiemap is diagnostic information. So if we have specific type
> information, it needs to be reported accurately to be useful.
> 
> Hence I really don't care if the users and developers of other fs
> types don't understand what the special owner codes that a specific
> filesystem returns mean. i.e. it's not useful user information -
> only a tool that groks the specific filesystem is going to be able
> to anything useful with special owner codes. So, IMO, there's little
> point trying to make them generic or to even trying to define and
> explain them in the man page

 I'm ok with describing generally what each special owner code
means.  Maybe the manpage could be more explicit about "None of these
codes are useful unless you're a low level filesystem tool"?

> > > It seems like there are several fields in the structure that are used for
> > > only input or only output?  Does it make more sense to have one structure
> > > used only for the input request, and then the array of values returned be
> > > in a different structure?  I'm not necessarily requesting that it be 
> > > changed,
> > > but it definitely is something I noticed a few times while reading this 
> > > doc.
> > 
> > I've been thinking about rearranging this a bit, since the flags
> > handling is very awkward with the current array structure.  Each
> > rmap has its own flags; we may someday want to pass operation flags
> > into the ioctl; and we currently have one operation flag to pass back
> > to userspace.  Each of those flags can be a separate field.  I think
> > people will get confused about FMV_OF_* and FMV_HOF_* being referenced
> > in oflags, and iflags has no meaning for returned records.
> 
> Yup, that's what I initially noticed when I glanced at this. The XFS
> getbmap interface is just plain nasty, and we shouldn't be copying
> that API pattern if we can help it.

Lol ok. :)

> > So, this instead?
> > 
> > struct getfsmap_rec {
> > u32 device; /* device id */
> > u32 flags;  /* mapping flags */
> > u64 block;  /* physical addr, bytes */
> > u64 owner;  /* inode or special owner code */
> > u64 offset; /* file offset of mapping, bytes */
> > u64 length; /* length of segment, bytes */
> > u64 reserved;   /* will be set to zero */
> > }; /* 48 bytes */
> > 
> > struct getfsmap_head {
> > u32 iflags; /* none defined yet */
> > u32 oflags; /* FMV_HOF_DEV_T */
> > u32 count;  /* # entries in recs array */
> > u32 entries;/* # entries filled in (output) */
> > u64 reserved[2];/* must be zero */
> > 
> > struct getfsmap_rec keys[2]; /* low and high keys for the mapping 
> > search */
> > struct getfsmap_rec recs[0];
> > }; /* 32 bytes + 2*48 = 128 bytes */
> > 
> > #define XFS_IOC_GETFSMAP_IOWR('X', 59, struct getfsmap_head)
> > 
> > This also means that userspace can set up for the next ioctl
> > invocation with memcpy(>keys[0], >recs[head->entries - 1]).
> > 
> > Yes, I think I like this better.  Everyone else, please chime in. :)
> 
> That's pretty much the structure I was going to suggest - it matches
> the fiemap pattern. i.e control parameters are separated from record
> data. I'd dump a bit more reserved space in the structure, though;
> we've got heaps of flag space for future expansion, but if we need
> to pass new parameters into/out of the kernel we'll quickly use the
> reserved space.

I padded struct fsmap with enough reserved space to make it an even 64 bytes,
and padded struct fsmap_head so that the space before keys is 64 bytes in
length.  See v3 patch of the ioctl manpage.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com
--
To unsubscribe 

Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl

2016-09-08 Thread Dave Chinner
On Tue, Aug 30, 2016 at 12:09:49PM -0700, Darrick J. Wong wrote:
> > I recall for FIEMAP that some filesystems may not have files aligned
> > to sector offsets, and we just used byte offsets.  Storage like
> > NVDIMMs are cacheline granular, so I don't think it makes sense to
> > tie this to old disk sector sizes.  Alternately, the units could be
> > in terms of fs blocks as returned by statvfs.st_bsize, but mixing
> > units for fmv_block, fmv_offset, fmv_length is uneeded complexity.
> 
> Ugh.  I'd rather just change the units to bytes rather than force all
> the users to multiply things. :)

Yup, units need to be either in disk addresses (i.e. 512 byte units)
or bytes. If people can't handle disk addresses (seems to be the
case), the bytes it should be.

> I'd much rather just add more special owner codes for any other
> filesystem that has distinguishable metadata types that are not
> covered by the existing OWN_ codes.  We /do/ have 2^64 possible
> values, so it's not like we're going to run out.

This is diagnositc information as much as anything, just like
fiemap is diagnostic information. So if we have specific type
information, it needs to be reported accurately to be useful.

Hence I really don't care if the users and developers of other fs
types don't understand what the special owner codes that a specific
filesystem returns mean. i.e. it's not useful user information -
only a tool that groks the specific filesystem is going to be able
to anything useful with special owner codes. So, IMO, there's little
point trying to make them generic or to even trying to define and
explain them in the man page

> > It seems like there are several fields in the structure that are used for
> > only input or only output?  Does it make more sense to have one structure
> > used only for the input request, and then the array of values returned be
> > in a different structure?  I'm not necessarily requesting that it be 
> > changed,
> > but it definitely is something I noticed a few times while reading this doc.
> 
> I've been thinking about rearranging this a bit, since the flags
> handling is very awkward with the current array structure.  Each
> rmap has its own flags; we may someday want to pass operation flags
> into the ioctl; and we currently have one operation flag to pass back
> to userspace.  Each of those flags can be a separate field.  I think
> people will get confused about FMV_OF_* and FMV_HOF_* being referenced
> in oflags, and iflags has no meaning for returned records.

Yup, that's what I initially noticed when I glanced at this. The XFS
getbmap interface is just plain nasty, and we shouldn't be copying
that API pattern if we can help it.

> So, this instead?
> 
> struct getfsmap_rec {
>   u32 device; /* device id */
>   u32 flags;  /* mapping flags */
>   u64 block;  /* physical addr, bytes */
>   u64 owner;  /* inode or special owner code */
>   u64 offset; /* file offset of mapping, bytes */
>   u64 length; /* length of segment, bytes */
>   u64 reserved;   /* will be set to zero */
> }; /* 48 bytes */
> 
> struct getfsmap_head {
>   u32 iflags; /* none defined yet */
>   u32 oflags; /* FMV_HOF_DEV_T */
>   u32 count;  /* # entries in recs array */
>   u32 entries;/* # entries filled in (output) */
>   u64 reserved[2];/* must be zero */
> 
>   struct getfsmap_rec keys[2]; /* low and high keys for the mapping 
> search */
>   struct getfsmap_rec recs[0];
> }; /* 32 bytes + 2*48 = 128 bytes */
> 
> #define XFS_IOC_GETFSMAP  _IOWR('X', 59, struct getfsmap_head)
> 
> This also means that userspace can set up for the next ioctl
> invocation with memcpy(>keys[0], >recs[head->entries - 1]).
> 
> Yes, I think I like this better.  Everyone else, please chime in. :)

That's pretty much the structure I was going to suggest - it matches
the fiemap pattern. i.e control parameters are separated from record
data. I'd dump a bit more reserved space in the structure, though;
we've got heaps of flag space for future expansion, but if we need
to pass new parameters into/out of the kernel we'll quickly use the
reserved space.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] ioctl_xfs_ioc_getfsmap.2: document XFS_IOC_GETFSMAP ioctl

2016-08-30 Thread Darrick J. Wong
[add a few more relevant lists to cc]

On Mon, Aug 29, 2016 at 03:34:11PM -0600, Andreas Dilger wrote:
> On Aug 25, 2016, at 5:26 PM, Darrick J. Wong  wrote:
> > 
> > Document the new XFS_IOC_GETFSMAP ioctl that returns the physical
> > layout of a (disk-based) filesystem.
> > 
> > Signed-off-by: Darrick J. Wong 
> > ---
> > man2/ioctl_xfs_ioc_getfsmap.2 |  294 
> > +
> > 1 file changed, 294 insertions(+)
> > create mode 100644 man2/ioctl_xfs_ioc_getfsmap.2
> > 
> > 
> > diff --git a/man2/ioctl_xfs_ioc_getfsmap.2 b/man2/ioctl_xfs_ioc_getfsmap.2
> > new file mode 100644
> > index 000..0d9ed47
> > --- /dev/null
> > +++ b/man2/ioctl_xfs_ioc_getfsmap.2
> > @@ -0,0 +1,294 @@
> > +.\" Copyright (c) 2016, Oracle.  All rights reserved.
> > +.\"
> > +.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
> > +.\" This is free documentation; you can redistribute it and/or
> > +.\" modify it under the terms of the GNU General Public License as
> > +.\" published by the Free Software Foundation; either version 2 of
> > +.\" the License, or (at your option) any later version.
> > +.\"
> > +.\" The GNU General Public License's references to "object code"
> > +.\" and "executables" are to be interpreted as the output of any
> > +.\" document formatting or typesetting system, including
> > +.\" intermediate and printed output.
> > +.\"
> > +.\" This manual is distributed in the hope that it will be useful,
> > +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +.\" GNU General Public License for more details.
> > +.\"
> > +.\" You should have received a copy of the GNU General Public
> > +.\" License along with this manual; if not, see
> > +.\" .
> > +.\" %%%LICENSE_END
> > +.TH IOCTL-XFS_IOC_GETFSMAP 2 2016-07-20 "Linux" "Linux Programmer's Manual"
> > +.SH NAME
> > +ioctl_xfs_ioc_getfsmap \- retrieve the physical layout of the filesystem
> > +.SH SYNOPSIS
> > +.br
> > +.B #include 
> > +.br
> > +.B #include 
> > +.sp
> > +.BI "int ioctl(int " fd ", XFS_IOC_GETFSMAP, struct getfsmap * " arg );
> > +.SH DESCRIPTION
> > +This
> > +.BR ioctl (2)
> > +retrieves physical extent mappings for a filesystem.
> > +This information can be used to discover which files are mapped to a 
> > physical
> > +block, examine free space, or find known bad blocks, among other things.
> > +
> > +The sole argument to this ioctl should be an array of the following
> > +structure:
> > +.in +4n
> > +.nf
> > +
> > +struct getfsmap {
> > +   __u32   fmv_device; /* device id */
> > +   __u32   fmv_unused1;/* future use, must be zero */
> > +   __u64   fmv_block;  /* starting block */
> > +   __u64   fmv_owner;  /* owner id */
> > +   __u64   fmv_offset; /* file offset of segment */
> > +   __u64   fmv_length; /* length of segment, blocks */
> > +   __u32   fmv_oflags; /* mapping flags */
> > +   __u32   fmv_iflags; /* control flags (1st structure) */
> > +   __u32   fmv_count;  /* # of entries in array incl. input */
> > +   __u32   fmv_entries;/* # of entries filled in (output). */
> > +   __u64   fmv_unused2;/* future use, must be zero */
> > +};
> > +
> > +.fi
> > +.in
> > +The array must contain at least two elements.
> > +The first two array elements specify the lowest and highest reverse-mapping
> > +keys, respectively, for which userspace would like physical mapping
> > +information.
> > +A reverse mapping key consists of the tuple (device, block, owner, offset).
> > +The owner and offset fields are part of the key because some filesystems
> > +support sharing physical blocks between multiple files and
> > +therefore may return multiple mappings for a given physical block.
> > +
> > +.SS Fields of struct getfsmap
> > +.PP
> > +The
> > +.I fmv_device
> > +field contains a 32-bit cookie to uniquely identify the underlying storage
> > +device.
> > +If the
> > +.B FMV_HOF_DEV_T
> > +flag is set in the header's
> > +.I fmv_oflags
> > +field, this field contains a dev_t from which major and minor numbers can
> > +be extracted.
> > +If the flag is not set, this field contains a value that must be unique
> > +for each storage device.
> > +
> > +.PP
> > +The
> > +.I fmv_unused1
> > +field must be zero in the first two array elements.
> > +
> > +.PP
> > +The
> > +.I fmv_block
> > +field contains the 512-byte sector address of the extent.
> 
> Why would you use 512-byte sectors in a new interface?

I started designing XFS GETFSMAP with the intent of making it feel
familiar to anyone who'd already used the XFS GETBMAP interface.
Hence you pass in an array of struct getfsmap[N] where the start of
the array are key fields and the rest are filled out by the kernel,
and the units are 512-byte blocks.  As a result, some things