Re: [PATCH v2 2/5] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP

2017-08-04 Thread Dave Chinner
On Fri, Aug 04, 2017 at 04:43:50PM -0700, Dan Williams wrote:
> On Fri, Aug 4, 2017 at 4:31 PM, Dave Chinner  wrote:
> > On Thu, Aug 03, 2017 at 07:28:17PM -0700, Dan Williams wrote:
> >> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> >> index fe0f8f7f4bb7..46d8eb9e19fc 100644
> >> --- a/fs/xfs/xfs_bmap_util.c
> >> +++ b/fs/xfs/xfs_bmap_util.c
> >> @@ -1393,6 +1393,107 @@ xfs_zero_file_space(
> >>
> >>  }
> >>
> >> +/* Return 1 if hole detected, 0 if not, and < 0 if fail to determine */
> >> +STATIC int
> >> +xfs_file_has_holes(
> >> + struct xfs_inode*ip)
> >> +{
> >
> > Why do we need this function?
> >
> > We've just run xfs_alloc_file_space() across the entire range we
> > are sealing, so we've already guaranteed that it won't have holes
> > in it.
> 
> I'm sure this is due to my ignorance of the scope of XFS_IOLOCK_EXCL
> vs XFS_ILOCK_EXCL. I had assumed that since we drop and retake
> XFS_ILOCK_EXCL that we need to re-validate the block map before
> setting S_IOMAP_IMMUTABLE.

THe ILOCK is there to protect the inode metadata when there is
concurrent access through the IO/MMAP lock paths.  However, if we
hold the IOLOCK_EXCL and the MMAPLOCK_EXCL, then nothing can get
through the IO interfaces to modify the data in the file.  This is
required because APIs that directly modify the extent map (e.g.
fallocate, truncate, etc) have to lock out the IO path to ensure
there are no IOs in flight across the range we are manipulating.

Holding these locks also locks out other APIs that modify the extent
map and so effectively nothing else can be accessing or modifying
the extent map while a fallocate or truncate operation is in
progress.

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 2/5] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP

2017-08-04 Thread Dan Williams
On Fri, Aug 4, 2017 at 4:31 PM, Dave Chinner  wrote:
> On Thu, Aug 03, 2017 at 07:28:17PM -0700, Dan Williams wrote:
>> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
>> index fe0f8f7f4bb7..46d8eb9e19fc 100644
>> --- a/fs/xfs/xfs_bmap_util.c
>> +++ b/fs/xfs/xfs_bmap_util.c
>> @@ -1393,6 +1393,107 @@ xfs_zero_file_space(
>>
>>  }
>>
>> +/* Return 1 if hole detected, 0 if not, and < 0 if fail to determine */
>> +STATIC int
>> +xfs_file_has_holes(
>> + struct xfs_inode*ip)
>> +{
>
> Why do we need this function?
>
> We've just run xfs_alloc_file_space() across the entire range we
> are sealing, so we've already guaranteed that it won't have holes
> in it.

I'm sure this is due to my ignorance of the scope of XFS_IOLOCK_EXCL
vs XFS_ILOCK_EXCL. I had assumed that since we drop and retake
XFS_ILOCK_EXCL that we need to re-validate the block map before
setting S_IOMAP_IMMUTABLE.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 2/5] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP

2017-08-04 Thread Dave Chinner
On Thu, Aug 03, 2017 at 07:28:17PM -0700, Dan Williams wrote:
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index fe0f8f7f4bb7..46d8eb9e19fc 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1393,6 +1393,107 @@ xfs_zero_file_space(
>  
>  }
>  
> +/* Return 1 if hole detected, 0 if not, and < 0 if fail to determine */
> +STATIC int
> +xfs_file_has_holes(
> + struct xfs_inode*ip)
> +{

Why do we need this function?

We've just run xfs_alloc_file_space() across the entire range we
are sealing, so we've already guaranteed that it won't have holes
in it.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v2 2/5] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP

2017-08-04 Thread Dan Williams
On Fri, Aug 4, 2017 at 12:46 PM, Darrick J. Wong
 wrote:
> On Thu, Aug 03, 2017 at 07:28:17PM -0700, Dan Williams wrote:
>> >From falloc.h:
>>
>> FALLOC_FL_SEAL_BLOCK_MAP is used to seal (make immutable) all of the
>> file logical-to-physical extent offset mappings in the file. The
>> purpose is to allow an application to assume that there are no holes
>> or shared extents in the file and that the metadata needed to find
>> all the physical extents of the file is stable and can never be
>> dirtied.
>>
>> For now this patch only permits setting the in-memory state of
>> S_IOMAP_IMMMUTABLE. Support for clearing and persisting the state is
>> saved for later patches.
>>
>> The implementation is careful to not allow the immutable state to change
>> while any process might have any established mappings. It reuses the
>> existing xfs_reflink_unshare() and xfs_alloc_file_space() to unshare
>> extents and fill all holes in the file. It then holds XFS_ILOCK_EXCL
>> while it validates the file is in the proper state and sets
>> S_IOMAP_IMMUTABLE.
>>
>> Cc: Jan Kara 
>> Cc: Jeff Moyer 
>> Cc: Christoph Hellwig 
>> Cc: Ross Zwisler 
>> Cc: Alexander Viro 
>> Suggested-by: Dave Chinner 
>> Suggested-by: "Darrick J. Wong" 
>> Signed-off-by: Dan Williams 
>> ---
>>  fs/open.c   |   11 +
>>  fs/xfs/xfs_bmap_util.c  |  101 
>> +++
>>  fs/xfs/xfs_bmap_util.h  |2 +
>>  fs/xfs/xfs_file.c   |   14 --
>>  include/linux/falloc.h  |3 +
>>  include/uapi/linux/falloc.h |   19 
>>  6 files changed, 145 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/open.c b/fs/open.c
>> index 7395860d7164..e3aae59785ae 100644
>> --- a/fs/open.c
>> +++ b/fs/open.c
>> @@ -273,6 +273,17 @@ int vfs_fallocate(struct file *file, int mode, loff_t 
>> offset, loff_t len)
>>   (mode & ~(FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE)))
>>   return -EINVAL;
>>
>> + /*
>> +  * Seal block map operation should only be used exclusively, and
>> +  * with the IMMUTABLE capability.
>> +  */
>> + if (mode & FALLOC_FL_SEAL_BLOCK_MAP) {
>> + if (!capable(CAP_LINUX_IMMUTABLE))
>> + return -EPERM;
>> + if (mode & ~FALLOC_FL_SEAL_BLOCK_MAP)
>> + return -EINVAL;
>> + }
>> +
>>   if (!(file->f_mode & FMODE_WRITE))
>>   return -EBADF;
>>
>> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
>> index fe0f8f7f4bb7..46d8eb9e19fc 100644
>> --- a/fs/xfs/xfs_bmap_util.c
>> +++ b/fs/xfs/xfs_bmap_util.c
>> @@ -1393,6 +1393,107 @@ xfs_zero_file_space(
>>
>>  }
>>
>> +/* Return 1 if hole detected, 0 if not, and < 0 if fail to determine */
>> +STATIC int
>> +xfs_file_has_holes(
>> + struct xfs_inode*ip)
>> +{
>> + struct xfs_mount*mp = ip->i_mount;
>> + struct xfs_bmbt_irec*map;
>> + const int   map_size = 10;  /* constrain memory overhead */
>> + int i, nmaps;
>> + int error = 0;
>> + xfs_fileoff_t   lblkno = 0;
>> + xfs_filblks_t   maxlblkcnt;
>> +
>> + map = kmem_alloc(map_size * sizeof(*map), KM_SLEEP);
>
> Sleeping with an inode fully locked and (eventually) a running
> transaction?  Yikes.
>
> Just allocate one xfs_bmbt_irec on the stack and pass in nmaps=1.
>
> This method might fit better in libxfs/xfs_bmap.c where it'll be
> able to scan the extent list more quickly with the iext helpers.

Ok, I'll take a look.

>
>> +
>> + maxlblkcnt = XFS_B_TO_FSB(mp, i_size_read(VFS_I(ip)));
>> + do {
>> + nmaps = map_size;
>> + error = xfs_bmapi_read(ip, lblkno, maxlblkcnt - lblkno,
>> +map, , 0);
>> + if (error)
>> + break;
>> +
>> + ASSERT(nmaps <= map_size);
>> + for (i = 0; i < nmaps; i++) {
>> + lblkno += map[i].br_blockcount;
>> + if (map[i].br_startblock == HOLESTARTBLOCK) {
>
> I think we also need to check for unwritten extents here, because a
> write to an unwritten block requires some zeroing and a mapping metdata
> update.

Will do.

>
>> + error = 1;
>> + break;
>> + }
>> + }
>> + } while (nmaps > 0 && error == 0);
>> +
>> + kmem_free(map);
>> + return error;
>> +}
>> +
>> +int
>> +xfs_seal_file_space(
>> + struct xfs_inode*ip,
>> + xfs_off_t   offset,
>> + xfs_off_t   len)
>> +{
>> + struct inode*inode = VFS_I(ip);
>> + struct address_space*mapping = inode->i_mapping;
>> + 

Re: [PATCH v2 2/5] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP

2017-08-04 Thread Darrick J. Wong
On Thu, Aug 03, 2017 at 07:28:17PM -0700, Dan Williams wrote:
> >From falloc.h:
> 
> FALLOC_FL_SEAL_BLOCK_MAP is used to seal (make immutable) all of the
> file logical-to-physical extent offset mappings in the file. The
> purpose is to allow an application to assume that there are no holes
> or shared extents in the file and that the metadata needed to find
> all the physical extents of the file is stable and can never be
> dirtied.
> 
> For now this patch only permits setting the in-memory state of
> S_IOMAP_IMMMUTABLE. Support for clearing and persisting the state is
> saved for later patches.
> 
> The implementation is careful to not allow the immutable state to change
> while any process might have any established mappings. It reuses the
> existing xfs_reflink_unshare() and xfs_alloc_file_space() to unshare
> extents and fill all holes in the file. It then holds XFS_ILOCK_EXCL
> while it validates the file is in the proper state and sets
> S_IOMAP_IMMUTABLE.
> 
> Cc: Jan Kara 
> Cc: Jeff Moyer 
> Cc: Christoph Hellwig 
> Cc: Ross Zwisler 
> Cc: Alexander Viro 
> Suggested-by: Dave Chinner 
> Suggested-by: "Darrick J. Wong" 
> Signed-off-by: Dan Williams 
> ---
>  fs/open.c   |   11 +
>  fs/xfs/xfs_bmap_util.c  |  101 
> +++
>  fs/xfs/xfs_bmap_util.h  |2 +
>  fs/xfs/xfs_file.c   |   14 --
>  include/linux/falloc.h  |3 +
>  include/uapi/linux/falloc.h |   19 
>  6 files changed, 145 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index 7395860d7164..e3aae59785ae 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -273,6 +273,17 @@ int vfs_fallocate(struct file *file, int mode, loff_t 
> offset, loff_t len)
>   (mode & ~(FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE)))
>   return -EINVAL;
>  
> + /*
> +  * Seal block map operation should only be used exclusively, and
> +  * with the IMMUTABLE capability.
> +  */
> + if (mode & FALLOC_FL_SEAL_BLOCK_MAP) {
> + if (!capable(CAP_LINUX_IMMUTABLE))
> + return -EPERM;
> + if (mode & ~FALLOC_FL_SEAL_BLOCK_MAP)
> + return -EINVAL;
> + }
> +
>   if (!(file->f_mode & FMODE_WRITE))
>   return -EBADF;
>  
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index fe0f8f7f4bb7..46d8eb9e19fc 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1393,6 +1393,107 @@ xfs_zero_file_space(
>  
>  }
>  
> +/* Return 1 if hole detected, 0 if not, and < 0 if fail to determine */
> +STATIC int
> +xfs_file_has_holes(
> + struct xfs_inode*ip)
> +{
> + struct xfs_mount*mp = ip->i_mount;
> + struct xfs_bmbt_irec*map;
> + const int   map_size = 10;  /* constrain memory overhead */
> + int i, nmaps;
> + int error = 0;
> + xfs_fileoff_t   lblkno = 0;
> + xfs_filblks_t   maxlblkcnt;
> +
> + map = kmem_alloc(map_size * sizeof(*map), KM_SLEEP);

Sleeping with an inode fully locked and (eventually) a running
transaction?  Yikes.

Just allocate one xfs_bmbt_irec on the stack and pass in nmaps=1.

This method might fit better in libxfs/xfs_bmap.c where it'll be
able to scan the extent list more quickly with the iext helpers.

> +
> + maxlblkcnt = XFS_B_TO_FSB(mp, i_size_read(VFS_I(ip)));
> + do {
> + nmaps = map_size;
> + error = xfs_bmapi_read(ip, lblkno, maxlblkcnt - lblkno,
> +map, , 0);
> + if (error)
> + break;
> +
> + ASSERT(nmaps <= map_size);
> + for (i = 0; i < nmaps; i++) {
> + lblkno += map[i].br_blockcount;
> + if (map[i].br_startblock == HOLESTARTBLOCK) {

I think we also need to check for unwritten extents here, because a
write to an unwritten block requires some zeroing and a mapping metdata
update.

> + error = 1;
> + break;
> + }
> + }
> + } while (nmaps > 0 && error == 0);
> +
> + kmem_free(map);
> + return error;
> +}
> +
> +int
> +xfs_seal_file_space(
> + struct xfs_inode*ip,
> + xfs_off_t   offset,
> + xfs_off_t   len)
> +{
> + struct inode*inode = VFS_I(ip);
> + struct address_space*mapping = inode->i_mapping;
> + int error;
> +
> + ASSERT(xfs_isilocked(ip, XFS_MMAPLOCK_EXCL));

The IOLOCK must be held here too.

> +
> + if (offset)
> + return -EINVAL;
> +
> + error = xfs_reflink_unshare(ip, offset, len);
> +