Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-26 Thread Dan Williams
On Wed, Jul 26, 2017 at 1:11 PM, Christoph Hellwig  wrote:
> On Wed, Jul 26, 2017 at 12:16:11PM -0700, Dan Williams wrote:
>> Silently turn on DAX if HMAT says it's ok?
>
> Yes, absolutely.  I want my system to do the right thing by default,
> and if HMAT says bypassing the page cache is a clear advantage it
> should be the default.
>
>> I think we would instead
>> want a "-o autodax" for that case and then "-o dax" and "-o nodax" for
>> the force cases.
>
> Why?
>

I'm worried about the case where HMAT says pmem >= dram performance,
but dax semantics like disabling delayed allocation and
dirty-cacheline tracking end up hurting performance, but I guess we
can handle that on a case by case basis with targeted kernel
optimizations.

>> I think it's easier to administer than the dax mount option. If
>> someone wants dax on only in a sub-tree they can set the flag on that
>> parent directory and have a policy in dax filesystems that children
>> inherit the dax policy from the parent. That seems a better
>> administrative model than trying to get it all right globally at mount
>> time.
>
> And why exactly? If DAX is faster for file a in directory X it will
> be just as fast for a file b in directory Y.

So I want the inode setting for the pmem < dram performance case where
I know that access patterns of the application using file b in
directory Y can still yield better performance without page cache. For
example, the working set is larger than dram capacity.


Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-26 Thread Christoph Hellwig
On Wed, Jul 26, 2017 at 12:16:11PM -0700, Dan Williams wrote:
> Silently turn on DAX if HMAT says it's ok?

Yes, absolutely.  I want my system to do the right thing by default,
and if HMAT says bypassing the page cache is a clear advantage it
should be the default.

> I think we would instead
> want a "-o autodax" for that case and then "-o dax" and "-o nodax" for
> the force cases.

Why?

> I think it's easier to administer than the dax mount option. If
> someone wants dax on only in a sub-tree they can set the flag on that
> parent directory and have a policy in dax filesystems that children
> inherit the dax policy from the parent. That seems a better
> administrative model than trying to get it all right globally at mount
> time.

And why exactly? If DAX is faster for file a in directory X it will
be just as fast for a file b in directory Y.


Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-26 Thread Dan Williams
On Wed, Jul 26, 2017 at 10:20 AM, Christoph Hellwig  wrote:
> On Wed, Jul 26, 2017 at 10:11:08AM -0700, Dan Williams wrote:
>> Until HMAT came along we had no data in the kernel on how to pick a sane
>> default, but we could now very easily make a "if pmem performance <
>> dram, disable dax by default" policy in the kernel.
>
> I'd rather do it the other way around - if HMAT is present and
> pmem performance >= dram use dax.  Else require the explicit -o dax
> for now to enable it.  If an explicit -o nodax is specified disable
> DAX even if HMAT says it is faster.

Silently turn on DAX if HMAT says it's ok? I think we would instead
want a "-o autodax" for that case and then "-o dax" and "-o nodax" for
the force cases.
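
A minimal sketch of the option scheme proposed here, assuming a hypothetical parser: `autodax` defers to the HMAT comparison, `dax` and `nodax` force the choice, and with no option DAX stays off, so it is never turned on silently. `enum dax_opt`, `parse_dax_opt()` and `dax_enabled()` are illustrative names, not an existing kernel interface, and `hmat_says_ok` stands in for whatever HMAT-derived test the kernel would actually use.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical tri-state for the proposed mount options. */
enum dax_opt {
	DAX_OPT_UNSET,   /* no DAX-related option given */
	DAX_OPT_AUTO,    /* "-o autodax": let HMAT data decide */
	DAX_OPT_FORCE,   /* "-o dax": always use DAX */
	DAX_OPT_NEVER,   /* "-o nodax": never use DAX */
	DAX_OPT_INVALID, /* not a DAX-related option */
};

/* Map one mount option string to the hypothetical tri-state. */
static enum dax_opt parse_dax_opt(const char *opt)
{
	if (strcmp(opt, "autodax") == 0)
		return DAX_OPT_AUTO;
	if (strcmp(opt, "dax") == 0)
		return DAX_OPT_FORCE;
	if (strcmp(opt, "nodax") == 0)
		return DAX_OPT_NEVER;
	return DAX_OPT_INVALID;
}

/*
 * hmat_says_ok stands in for "HMAT reports pmem performance >= DRAM".
 * Under this proposal DAX is never enabled silently: without an
 * explicit option it stays off.
 */
static bool dax_enabled(enum dax_opt opt, bool hmat_says_ok)
{
	switch (opt) {
	case DAX_OPT_FORCE:
		return true;
	case DAX_OPT_AUTO:
		return hmat_says_ok;
	default:
		return false;
	}
}
```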

>> The question for this patch is do we want to add yet another
>> filesystem that adds "-o dax" or require use of per-inode flags to
>> enable dax.
>
> Please stick to the mount option.  After spending a lot of time with
> DAX and various memory technologies I'm pretty confident that the inode
> flag is the wrong thing to do.

I think it's easier to administer than the dax mount option. If
someone wants dax on only in a sub-tree they can set the flag on that
parent directory and have a policy in dax filesystems that children
inherit the dax policy from the parent. That seems a better
administrative model than trying to get it all right globally at mount
time.
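
A rough model of the inheritance policy described above, assuming a hypothetical `SKETCH_DAX_FL` bit and a simplified `struct sketch_inode` (neither is an existing kernel definition): a new inode copies the DAX flag from its parent directory at creation time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-inode flag; the value is chosen only for this sketch. */
#define SKETCH_DAX_FL 0x1u

struct sketch_inode {
	unsigned int flags;
	bool is_dir;
};

/*
 * On create, a new inode (file or directory) inherits the DAX flag
 * from its parent directory, so setting it on one directory applies
 * the policy to everything created beneath it afterwards.
 */
static void sketch_inherit_dax(const struct sketch_inode *parent,
			       struct sketch_inode *child)
{
	if (parent != NULL && parent->is_dir &&
	    (parent->flags & SKETCH_DAX_FL))
		child->flags |= SKETCH_DAX_FL;
}

/* Only regular files would actually take the DAX I/O path. */
static bool sketch_uses_dax(const struct sketch_inode *inode)
{
	return !inode->is_dir && (inode->flags & SKETCH_DAX_FL);
}
```

In this model the flag only propagates to inodes created after it is set on the directory; files that already exist keep their own setting.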


Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-26 Thread Christoph Hellwig
On Wed, Jul 26, 2017 at 10:11:08AM -0700, Dan Williams wrote:
> Until HMAT came along we had no data in the kernel on how to pick a sane
> default, but we could now very easily make a "if pmem performance <
> dram, disable dax by default" policy in the kernel.

I'd rather do it the other way around - if HMAT is present and
pmem performance >= dram use dax.  Else require the explicit -o dax
for now to enable it.  If an explicit -o nodax is specified disable
DAX even if HMAT says it is faster.
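
The decision order described here can be sketched as below. This is an illustration under assumptions, not kernel code: `struct perf_attrs`, its fields and units are hypothetical, and "pmem performance >= dram" is read as "read latency no worse and read bandwidth no lower than DRAM", which is only one possible interpretation of the HMAT data.

```c
#include <stdbool.h>

/* Hypothetical mount-time DAX choice under this proposal. */
enum dax_mount {
	DAX_MOUNT_DEFAULT, /* neither -o dax nor -o nodax */
	DAX_MOUNT_DAX,     /* explicit -o dax */
	DAX_MOUNT_NODAX,   /* explicit -o nodax */
};

/* Hypothetical summary of HMAT data for one memory target. */
struct perf_attrs {
	unsigned int read_latency_ns;
	unsigned int read_bandwidth_mbps;
};

/* Assumed reading of "pmem >= dram": no slower and no lower bandwidth. */
static bool pmem_at_least_dram(const struct perf_attrs *pmem,
			       const struct perf_attrs *dram)
{
	return pmem->read_latency_ns <= dram->read_latency_ns &&
	       pmem->read_bandwidth_mbps >= dram->read_bandwidth_mbps;
}

static bool use_dax(enum dax_mount opt, bool hmat_present,
		    const struct perf_attrs *pmem,
		    const struct perf_attrs *dram)
{
	if (opt == DAX_MOUNT_NODAX)
		return false; /* explicit nodax wins even if HMAT says faster */
	if (opt == DAX_MOUNT_DAX)
		return true;  /* explicit opt-in */
	/* No option: enable only when HMAT exists and reports pmem >= DRAM. */
	return hmat_present && pmem_at_least_dram(pmem, dram);
}
```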

> The question for this patch is do we want to add yet another
> filesystem that adds "-o dax" or require use of per-inode flags to
> enable dax.

Please stick to the mount option.  After spending a lot of time with
DAX and various memory technologies I'm pretty confident that the inode
flag is the wrong thing to do.


Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-26 Thread Dan Williams
On Wed, Jul 26, 2017 at 10:01 AM, Christoph Hellwig  wrote:
> On Wed, Jul 26, 2017 at 09:53:07AM -0700, Dan Williams wrote:
>> It allows for opt-in for applications, or administrators of those
>> applications, that know the type of access.
>
> That's BS.  We need to provide the best possible way to access the
> media to an application.  And whether that's DAX or the page cache
> is an implementation detail that should not matter to the application.
>
> Which doesn't mean there shouldn't be ways to override the default
> that the kernel chose based on hardware details, but it's certainly
> not something for the application to hardcode, but something for
> the administrator to decide.

Until HMAT came along we had no data in the kernel on how to pick a sane
default, but we could now very easily make a "if pmem performance <
dram, disable dax by default" policy in the kernel.

The question for this patch is do we want to add yet another
filesystem that adds "-o dax" or require use of per-inode flags to
enable dax.


Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-26 Thread Christoph Hellwig
On Wed, Jul 26, 2017 at 09:53:07AM -0700, Dan Williams wrote:
> It allows for opt-in for applications, or administrators of those
> applications, that know the type of access.

That's BS.  We need to provide the best possible way to access the
media to an application.  And whether that's DAX or the page cache
is an implementation detail that should not matter to the application.

Which doesn't mean there shouldn't be ways to override the default
that the kernel chose based on hardware details, but it's certainly
not something for the application to hardcode, but something for
the administrator to decide.


Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-26 Thread Dan Williams
On Wed, Jul 26, 2017 at 12:26 AM, Christoph Hellwig  wrote:
> On Tue, Jul 25, 2017 at 05:15:10PM -0700, Dan Williams wrote:
>> We're in the process of walking back and potentially deprecating the
>> use of the dax mount option for xfs and ext4 since dax can have
>> negative performance implications if page cache memory happens to be
>> faster than pmem. It should be limited to applications that
>> specifically want the semantic, not globally enabled for the entire
>> mount. xfs has gone ahead and added the XFS_DIFLAG2_DAX inode flag for
>> per-inode enabling of dax.
>>
>> I'm wondering if any new filesystem that adds dax support at this
>> point should do so with inode flags and not a mount option?
>
> That tradeoff is not one that the application should make, but one that
> should depend on the storage medium. To make things worse it might also
> depend on the type of access. E.g. with certain media it makes a lot of
> sense to cache writes in the page cache, but generally not reads.
> I've been spending some time to analyze how that could be done, but
> I've not made real progress on it.
>
> XFS_DIFLAG2_DAX is unfortunately totally unhelpful with that.

It allows for opt-in for applications, or administrators of those
applications, that know the type of access. There's also the new HMAT
(heterogeneous memory attributes table) in ACPI that can indicate the
relative performance of pmem to system-ram if userspace needs data to
make a decision. It would be interesting to have an automatic policy
in the kernel, but we also need a mechanism for explicit
configurations.


Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-26 Thread Christoph Hellwig
On Tue, Jul 25, 2017 at 05:15:10PM -0700, Dan Williams wrote:
> We're in the process of walking back and potentially deprecating the
> use of the dax mount option for xfs and ext4 since dax can have
> negative performance implications if page cache memory happens to be
> faster than pmem. It should be limited to applications that
> specifically want the semantic, not globally enabled for the entire
> mount. xfs has gone ahead and added the XFS_DIFLAG2_DAX inode flag for
> per-inode enabling of dax.
> 
> I'm wondering if any new filesystem that adds dax support at this
> point should do so with inode flags and not a mount option?

That tradeoff is not one that the application should make, but one that
should depend on the storage medium.  To make things worse it might also
depend on the type of access.  E.g. with certain media it makes a lot of
sense to cache writes in the page cache, but generally not reads.
I've been spending some time to analyze how that could be done, but
I've not made real progress on it.

XFS_DIFLAG2_DAX is unfortunately totally unhelpful with that.


RE: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-26 Thread sunqiuyang
Hi, 

Considering the current interfaces of F2FS and EXT4, my thought is that we can 
define a generic user-modifiable flag FS_DAX_FL, which can be included in the 
i_flags field of [f2fs | ext4]_inode_info. Thus, DAX can be enabled in either 
of the two ways below: 

1) mount the FS with a "dax" option, so that all files created will have the 
flag S_DAX set in the VFS inode, and the flag FS_DAX_FL set in [f2fs | 
ext4]_inode_info, by default.

2) mount the FS without "dax", and enable DAX per-inode from 
f2fs_ioctl_setflags() => f2fs_set_inode_flags()
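
A userspace model of the two paths described above, assuming hypothetical bit values for `FS_DAX_FL` and `S_DAX` (named `SKETCH_*` here to make that explicit) and simplified inode and superblock structs. `sketch_set_inode_flags()` plays the role of `f2fs_set_inode_flags()`, mirroring the on-disk flag into the VFS flag.

```c
#include <stdbool.h>

/* Hypothetical bit values, for this sketch only. */
#define SKETCH_FS_DAX_FL 0x1u /* on-disk, user-modifiable flag */
#define SKETCH_S_DAX     0x2u /* in-core VFS inode flag */

struct sketch_sb {
	bool mount_dax; /* mounted with "-o dax" */
};

struct sketch_inode {
	unsigned int i_flags;   /* filesystem-specific flags (FS_DAX_FL) */
	unsigned int vfs_flags; /* VFS inode flags (S_DAX) */
};

/* Mirror the on-disk flag into the VFS flag. */
static void sketch_set_inode_flags(struct sketch_inode *inode)
{
	if (inode->i_flags & SKETCH_FS_DAX_FL)
		inode->vfs_flags |= SKETCH_S_DAX;
	else
		inode->vfs_flags &= ~SKETCH_S_DAX;
}

/* Way 1: with "-o dax", every newly created file gets both flags. */
static void sketch_new_inode(const struct sketch_sb *sb,
			     struct sketch_inode *inode)
{
	inode->i_flags = sb->mount_dax ? SKETCH_FS_DAX_FL : 0;
	inode->vfs_flags = 0;
	sketch_set_inode_flags(inode);
}

/* Way 2: without "-o dax", a setflags ioctl enables DAX for one inode. */
static void sketch_ioctl_setflags(struct sketch_inode *inode,
				  unsigned int flags)
{
	inode->i_flags = flags;
	sketch_set_inode_flags(inode);
}
```

The model ignores what should happen to an inode that already has pages in the page cache when DAX is switched on, which a real implementation would have to handle.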
 
Thanks,


From: Jaegeuk Kim [jaeg...@kernel.org]
Sent: Wednesday, July 26, 2017 10:16
To: Dan Williams
Cc: sunqiuyang; Linux Kernel Mailing List; linux-fsdevel; 
linux-f2fs-de...@lists.sourceforge.net; linux-nvd...@lists.01.org
Subject: Re: [PATCH v8 1/1] f2fs: dax: implement direct access

Hi Dan,

On 07/25, Dan Williams wrote:
> [ adding linux-nvdimm ]
>
> On Thu, Jul 20, 2017 at 5:10 AM, sunqiuyang <sunqiuy...@huawei.com> wrote:
> > From: Qiuyang Sun <sunqiuy...@huawei.com>
> >
> > This patch implements Direct Access (DAX) in F2FS, including:
> >  - a mount option to choose whether to enable DAX or not
>
> We're in the process of walking back and potentially deprecating the
> use of the dax mount option for xfs and ext4 since dax can have
> negative performance implications if page cache memory happens to be
> faster than pmem. It should be limited to applications that
> specifically want the semantic, not globally enabled for the entire
> mount. xfs has gone ahead and added the XFS_DIFLAG2_DAX inode flag for
> per-inode enabling of dax.

Thank you so much for pointing this out. So, is there a plan to define a
generic inode flag to enable dax via inode_set_flag? Or, does each filesystem
need to handle it individually, like xfs?

>
> I'm wondering if any new filesystem that adds dax support at this
> point should do so with inode flags and not a mount option?

Anyway, in that case, I have to postpone merging this patch for a while.

Thanks,


Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-25 Thread Jaegeuk Kim
Hi Dan,

On 07/25, Dan Williams wrote:
> [ adding linux-nvdimm ]
> 
> On Thu, Jul 20, 2017 at 5:10 AM, sunqiuyang  wrote:
> > From: Qiuyang Sun 
> >
> > This patch implements Direct Access (DAX) in F2FS, including:
> >  - a mount option to choose whether to enable DAX or not
> 
> We're in the process of walking back and potentially deprecating the
> use of the dax mount option for xfs and ext4 since dax can have
> negative performance implications if page cache memory happens to be
> faster than pmem. It should be limited to applications that
> specifically want the semantic, not globally enabled for the entire
> mount. xfs has gone ahead and added the XFS_DIFLAG2_DAX inode flag for
> per-inode enabling of dax.

Thank you so much for pointing this out. So, is there a plan to define a
generic inode flag to enable dax via inode_set_flag? Or, does each filesystem
need to handle it individually, like xfs?

> 
> I'm wondering if any new filesystem that adds dax support at this
> point should do so with inode flags and not a mount option?

Anyway, in that case, I have to postpone merging this patch for a while.

Thanks,


Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-25 Thread Dan Williams
[ adding linux-nvdimm ]

On Thu, Jul 20, 2017 at 5:10 AM, sunqiuyang  wrote:
> From: Qiuyang Sun 
>
> This patch implements Direct Access (DAX) in F2FS, including:
>  - a mount option to choose whether to enable DAX or not

We're in the process of walking back and potentially deprecating the
use of the dax mount option for xfs and ext4 since dax can have
negative performance implications if page cache memory happens to be
faster than pmem. It should be limited to applications that
specifically want the semantic, not globally enabled for the entire
mount. xfs has gone ahead and added the XFS_DIFLAG2_DAX inode flag for
per-inode enabling of dax.

I'm wondering if any new filesystem that adds dax support at this
point should do so with inode flags and not a mount option?


Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-24 Thread Sun Qiuyang

Hi Jaegeuk,

Below is the error message I got from this testcase:
---
write (Invalid argument) len 1024 dio [dax to nondax | both nodax]
read (Bad address) len [4096 | 16777216 | 67108864] dio dax to nondax
---
The write error is expected, as F2FS does not support unaligned direct 
IO (1024 B).
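
A sketch of the kind of alignment check that yields the `Invalid argument` result, assuming F2FS rejects direct I/O whose file offset, length or user buffer address is not a multiple of the filesystem block size (taken as 4096 bytes here). `check_dio_alignment()` is a hypothetical helper, not the F2FS function.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Reject direct I/O unless offset, length and buffer address are all
 * multiples of blocksize. blocksize is assumed to be a power of two,
 * so "x & (blocksize - 1)" is the remainder of x / blocksize.
 */
static int check_dio_alignment(uint64_t offset, uint64_t len,
			       uint64_t buf_addr, unsigned int blocksize)
{
	uint64_t mask = (uint64_t)blocksize - 1;

	if ((offset | len | buf_addr) & mask)
		return -EINVAL; /* userspace sees "Invalid argument" */
	return 0;
}
```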


The read error is more complex. In the test script, when we mmap the src 
file (dax), the flags (VM_MIXEDMAP | VM_HUGEPAGE) are added into
vma->vm_flags. Later on, when we write to the dest file (nondax) and 
then read from it by direct IO, we will fail to get pages from this 
"special" vma.


Functions involved:
f2fs_direct_IO
blockdev_direct_IO
__blockdev_direct_IO
do_direct_IO
dio_get_page
dio_refill_pages
iov_iter_get_pages
get_user_pages_unlocked
__get_user_pages_fast
__get_user_pages_unlocked
__get_user_pages_locked
__get_user_pages
follow_page_mask
follow_p4d_mask
follow_pud_mask
follow_pmd_mask
follow_page_pte
vm_normal_page
follow_pfn_pte

In my test environment HAVE_PTE_SPECIAL is true, and vm_normal_page() 
returns NULL due to VM_MIXEDMAP in vm_flags. Then follow_page_pte() 
continues to call follow_pfn_pte(), which returns -EFAULT. This is how we
eventually get the "Bad address" error.
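
A deliberately simplified model of the failure path described above, following the author's description rather than the actual `mm/gup.c` code: direct I/O needs a referenced `struct page`, and when `vm_normal_page()` finds none for a special PTE in a `VM_MIXEDMAP` VMA, the lookup fails with `-EFAULT`, which userspace reports as `Bad address`. The `model_*` functions are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Models vm_normal_page() as described in the report: with
 * HAVE_PTE_SPECIAL, a special PTE in a VM_MIXEDMAP VMA has no
 * "normal" struct page behind it.
 */
static bool model_vm_normal_page(bool pte_special, bool vm_mixedmap)
{
	return !(pte_special && vm_mixedmap);
}

/*
 * Direct I/O pins pages, so it needs a page reference; without a
 * normal page the pfn-only fallback cannot provide one.
 */
static int model_dio_get_page(bool pte_special, bool vm_mixedmap)
{
	if (model_vm_normal_page(pte_special, vm_mixedmap))
		return 0;   /* page found and pinned */
	return -EFAULT;     /* surfaces as "Bad address" */
}
```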


This error also occurs in EXT4-DAX for similar reasons.

Thanks,



Hi Qiuyang,

This fails xfstests/generic/413.

Thanks,

On 07/20, sunqiuyang wrote:

From: Qiuyang Sun 

This patch implements Direct Access (DAX) in F2FS, including:
 - a mount option to choose whether to enable DAX or not
 - read/write and mmap of regular files in the DAX way
 - zero-out of unaligned partial blocks in the DAX way
 - garbage collection of DAX files, by mapping both old and new physical
   addresses of a data page into memory and copying data between them directly
 - incompatibility of DAX with inline data, atomic or volatile write,
   collapse|insert_range, etc.
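
The GC item can be pictured with a small userspace model, assuming a hypothetical byte array standing in for directly mapped pmem and a single block address per file block: the data moves device-to-device with `memcpy()`, without passing through a page-cache page. All `sketch_*` names are illustrative.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_BLKSZ   4096u
#define SKETCH_NBLOCKS 8u

/* Stand-in for a pmem device that is directly addressable, as with DAX. */
static uint8_t sketch_pmem[SKETCH_NBLOCKS][SKETCH_BLKSZ];

/*
 * Move one data block during GC: with both addresses directly mapped,
 * copy old -> new, then point the file's block address at the new
 * location. Flushing CPU caches for persistence and invalidating the
 * old block would also be needed in practice; both are omitted here.
 */
static void sketch_dax_move_block(uint32_t *file_blkaddr, uint32_t new_addr)
{
	uint32_t old_addr = *file_blkaddr;

	if (old_addr == new_addr)
		return; /* nothing to move; also avoids overlapping memcpy */
	memcpy(sketch_pmem[new_addr], sketch_pmem[old_addr], SKETCH_BLKSZ);
	*file_blkaddr = new_addr;
}
```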

Signed-off-by: Qiuyang Sun 
---
Changelog v7 -> v8:
 - Introduce the macro f2fs_dax_file() to determine whether a file is DAX,
   both when CONFIG_FS_DAX is set and when it is not
 - Return -ENOTSUPP when an operation does not support DAX
 - In f2fs_iomap_begin(), convert the inline data of an inode (if any)
   before mapping blocks
 - Minor cleanups
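
The `f2fs_dax_file()` changelog item follows a common kernel pattern, sketched here with hypothetical stand-ins (`struct sketch_inode`, `SKETCH_S_DAX`, `sketch_dax_file()`) rather than the patch's actual definitions: when `CONFIG_FS_DAX` is not set, the macro is a constant `false`, so DAX-only branches compile away.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct inode and the S_DAX flag. */
#define SKETCH_S_DAX 0x1u
struct sketch_inode {
	unsigned int i_flags;
};

#ifdef CONFIG_FS_DAX
/* DAX support built in: ask the inode. */
#define sketch_dax_file(inode) (((inode)->i_flags & SKETCH_S_DAX) != 0)
#else
/* DAX compiled out: constant false, argument still evaluated once. */
#define sketch_dax_file(inode) ((void)(inode), false)
#endif
```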
---
 Documentation/filesystems/f2fs.txt |   2 +
 fs/f2fs/data.c                     | 132 +-
 fs/f2fs/f2fs.h                     |  15 +++
 fs/f2fs/file.c                     | 183 -
 fs/f2fs/gc.c                       | 103 -
 fs/f2fs/inline.c                   |   3 +
 fs/f2fs/inode.c                    |   8 +-
 fs/f2fs/namei.c                    |   5 +
 fs/f2fs/super.c                    |  15 +++
 9 files changed, 454 insertions(+), 12 deletions(-)

diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index 273ccb2..c86c421 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -164,6 +164,8 @@ io_bits=%u Set the bit size of write IO requests. It should be set
                        with "mode=lfs".
 usrquota               Enable plain user disk quota accounting.
 grpquota               Enable plain group disk quota accounting.
+dax                    Use direct access (no page cache). See
+                       Documentation/filesystems/dax.txt.

 

 DEBUGFS ENTRIES
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 87c1f41..4eb4b76 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -910,6 +910,15 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
err = -EIO;
goto sync_out;
}
+   /*
+* If newly allocated blocks are to be zeroed out later,
+* a single f2fs_map_blocks must not contain both old
+* and new blocks at the same time.
+*/
+   if (flag == F2FS_GET_BLOCK_ZERO
+   && (map->m_flags & F2FS_MAP_MAPPED)
+   && !(map->m_flags & F2FS_MAP_NEW))
+   goto sync_out;
if (flag == F2FS_GET_BLOCK_PRE_AIO) {
if (blkaddr == NULL_ADDR) {
prealloc++;
@@ -938,6 +947,8 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
blkaddr != NEW_ADDR)
goto sync_out;
}
+   } else if (flag == F2FS_GET_BLOCK_ZERO && map->m_flags & F2FS_MAP_NEW) {
+   goto sync_out;
}

if (flag == F2FS_GET_BLOCK_PRE_AIO)
@@ -996,6 +1007,12 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks
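
The rule added to f2fs_map_blocks() in the first code hunk above (for F2FS_GET_BLOCK_ZERO, one mapping must not mix already-mapped and newly allocated blocks) can be modeled as splitting a range into runs of same-state blocks. This is a hypothetical sketch of that invariant only, not of f2fs_map_blocks() itself; `sketch_zero_extent_len()` is an illustrative name.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * is_new[i] is true if block i was newly allocated by this call and
 * false if it was already mapped. Returns how many blocks starting at
 * start can go into one mapping without mixing old and new blocks, so
 * that zeroing the new blocks never touches existing data.
 */
static size_t sketch_zero_extent_len(const bool *is_new, size_t nblocks,
				     size_t start)
{
	size_t len = 0;

	if (start >= nblocks)
		return 0;
	while (start + len < nblocks && is_new[start + len] == is_new[start])
		len++;
	return len;
}
```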

Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-24 Thread Sun Qiuyang

Hi Jaegeuk,

Below is the error message I got from this testcase:
---
write (Invalid argument) len 1024 dio [dax to nondax | both nodax]
read (Bad address) len [4096 | 16777216 | 67108864] dio dax to nondax
---
The write error is expected, as F2FS does not support unaligned direct 
IO (1024 B).


The read error is more complex. In the test script, when we mmap the src 
file (dax), the flags (VM_MIXEDMAP | VM_HUGEPAGE) are added to
vma->vm_flags. Later, when we write to the dest file (nondax) and 
then read from it by direct IO, we fail to get pages from this 
"special" vma.


Functions involved:
f2fs_direct_IO
blockdev_direct_IO
__blockdev_direct_IO
do_direct_IO
dio_get_page
dio_refill_pages
iov_iter_get_pages
get_user_pages_unlocked
__get_user_pages_fast
__get_user_pages_unlocked
__get_user_pages_locked
__get_user_pages
follow_page_mask
follow_p4d_mask
follow_pud_mask
follow_pmd_mask
follow_page_pte
vm_normal_page
follow_pfn_pte

In my test environment HAVE_PTE_SPECIAL is true, and vm_normal_page() 
returns NULL due to VM_MIXEDMAP in vm_flags. follow_page_pte() then 
calls follow_pfn_pte(), which returns -EFAULT. This is how we finally 
get the "Bad address" error.

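The failing path above can be summarised as a small decision table. The C sketch below is a toy model, not kernel code: struct toy_pte and toy_follow_page_pte() are hypothetical, and the model assumes a pinning lookup (FOLL_GET), as used for direct I/O buffers. A special PTE in a VM_MIXEDMAP vma corresponds to normal_page = false with no devmap page, which ends in -EFAULT, i.e. "Bad address".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one user PTE (not kernel types). */
struct toy_pte {
	bool normal_page; /* vm_normal_page() returned a struct page */
	bool devmap;      /* pte_devmap(): ZONE_DEVICE page with a pagemap */
	bool zero_pfn;    /* PTE maps the shared zero page */
};

/*
 * Toy model of the follow_page_pte() outcome for a pinning lookup
 * (FOLL_GET). Returns 0 if a page could be pinned, or -EFAULT otherwise.
 */
static int toy_follow_page_pte(const struct toy_pte *pte)
{
	assert(pte != NULL);

	if (pte->normal_page)
		return 0;	/* ordinary page: take a reference */
	if (pte->devmap)
		return 0;	/* device page via its pagemap (assumed present) */
	if (pte->zero_pfn)
		return 0;	/* shared zero page */
	/* follow_pfn_pte(): raw PFN with no struct page, nothing to pin */
	return -EFAULT;
}
```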

This error also occurs in EXT4-DAX for similar reasons.

Thanks,



Hi Qiuyang,

This fails xfstests/generic/413.

Thanks,

On 07/20, sunqiuyang wrote:

From: Qiuyang Sun 

This patch implements Direct Access (DAX) in F2FS, including:
 - a mount option to choose whether to enable DAX or not
 - read/write and mmap of regular files in the DAX way
 - zero-out of unaligned partial blocks in the DAX way
 - garbage collection of DAX files, by mapping both old and new physical
   addresses of a data page into memory and copying data between them directly
 - incompatibility of DAX with inline data, atomic or volatile write,
   collapse|insert_range, etc.

Signed-off-by: Qiuyang Sun 
---
Changelog v7 -> v8:
 - Introduce the macro f2fs_dax_file() to determine whether a file is DAX for cases
   when CONFIG_FS_DAX is set or not
 - Return -ENOTSUPP when an operation does not support DAX
 - In f2fs_iomap_begin(), convert the inline data of an inode (if any)
   before mapping blocks
 - Minor cleanups
---
 Documentation/filesystems/f2fs.txt |   2 +
 fs/f2fs/data.c | 132 +-
 fs/f2fs/f2fs.h |  15 +++
 fs/f2fs/file.c | 183 -
 fs/f2fs/gc.c   | 103 -
 fs/f2fs/inline.c   |   3 +
 fs/f2fs/inode.c|   8 +-
 fs/f2fs/namei.c|   5 +
 fs/f2fs/super.c|  15 +++
 9 files changed, 454 insertions(+), 12 deletions(-)

diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index 273ccb2..c86c421 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -164,6 +164,8 @@ io_bits=%u Set the bit size of write IO requests. It should be set
            with "mode=lfs".
 usrquota   Enable plain user disk quota accounting.
 grpquota   Enable plain group disk quota accounting.
+dax        Use direct access (no page cache). See
+           Documentation/filesystems/dax.txt.

 

 DEBUGFS ENTRIES
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 87c1f41..4eb4b76 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -910,6 +910,15 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
 				err = -EIO;
 				goto sync_out;
 			}
+			/*
+			 * If newly allocated blocks are to be zeroed out later,
+			 * a single f2fs_map_blocks must not contain both old
+			 * and new blocks at the same time.
+			 */
+			if (flag == F2FS_GET_BLOCK_ZERO
+					&& (map->m_flags & F2FS_MAP_MAPPED)
+					&& !(map->m_flags & F2FS_MAP_NEW))
+				goto sync_out;
 			if (flag == F2FS_GET_BLOCK_PRE_AIO) {
 				if (blkaddr == NULL_ADDR) {
 					prealloc++;
@@ -938,6 +947,8 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
 						blkaddr != NEW_ADDR)
 				goto sync_out;
 		}
+	} else if (flag == F2FS_GET_BLOCK_ZERO && map->m_flags & F2FS_MAP_NEW) {
+		goto sync_out;
 	}

 	if (flag == F2FS_GET_BLOCK_PRE_AIO)
@@ -996,6 +1007,12 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
 	goto next_dnode;

 sync_out:
+

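The comment in the F2FS_GET_BLOCK_ZERO hunk says one mapping must never mix old and new blocks, so that only newly allocated blocks get zeroed. A toy C model of that splitting rule, assuming a simple per-block state array; enum toy_blk, struct toy_extent and toy_map_for_zero() are hypothetical and ignore the physical-contiguity condition that also ends a real f2fs_map_blocks() extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block state: already mapped, or allocated by this call. */
enum toy_blk { TOY_OLD, TOY_NEW };

struct toy_extent {
	size_t start;        /* first logical block of the extent */
	size_t len;          /* number of blocks in the extent */
	bool needs_zeroing;  /* true only when every block is newly allocated */
};

/*
 * Return the longest run beginning at 'start' whose blocks all share the
 * same state, so that one extent never mixes old and new blocks.
 */
static struct toy_extent toy_map_for_zero(const enum toy_blk *blks, size_t nr,
					  size_t start)
{
	struct toy_extent ext = { start, 0, false };

	assert(blks != NULL && start < nr);
	ext.needs_zeroing = (blks[start] == TOY_NEW);
	while (start + ext.len < nr && blks[start + ext.len] == blks[start])
		ext.len++;
	return ext;
}
```

A caller would loop, advancing start by ext.len, and zero only the extents with needs_zeroing set; in the patch that zeroing happens at sync_out via sb_issue_zeroout().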
Re: [PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-21 Thread Jaegeuk Kim
Hi Qiuyang,

This fails xfstests/generic/413.

Thanks,

On 07/20, sunqiuyang wrote:
> From: Qiuyang Sun 
> 
> This patch implements Direct Access (DAX) in F2FS, including:
>  - a mount option to choose whether to enable DAX or not
>  - read/write and mmap of regular files in the DAX way
>  - zero-out of unaligned partial blocks in the DAX way
>  - garbage collection of DAX files, by mapping both old and new physical
>    addresses of a data page into memory and copying data between them directly
>  - incompatibility of DAX with inline data, atomic or volatile write, 
>    collapse|insert_range, etc.
> 
> Signed-off-by: Qiuyang Sun 
> ---
> Changelog v7 -> v8:
>  - Introduce the macro f2fs_dax_file() to determine whether a file is DAX for cases
>    when CONFIG_FS_DAX is set or not
>  - Return -ENOTSUPP when an operation does not support DAX
>  - In f2fs_iomap_begin(), convert the inline data of an inode (if any) 
>    before mapping blocks
>  - Minor cleanups
> ---
>  Documentation/filesystems/f2fs.txt |   2 +
>  fs/f2fs/data.c | 132 +-
>  fs/f2fs/f2fs.h |  15 +++
>  fs/f2fs/file.c | 183 -
>  fs/f2fs/gc.c   | 103 -
>  fs/f2fs/inline.c   |   3 +
>  fs/f2fs/inode.c|   8 +-
>  fs/f2fs/namei.c|   5 +
>  fs/f2fs/super.c|  15 +++
>  9 files changed, 454 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
> index 273ccb2..c86c421 100644
> --- a/Documentation/filesystems/f2fs.txt
> +++ b/Documentation/filesystems/f2fs.txt
> @@ -164,6 +164,8 @@ io_bits=%u Set the bit size of write IO requests. It should be set
>             with "mode=lfs".
>  usrquota   Enable plain user disk quota accounting.
>  grpquota   Enable plain group disk quota accounting.
> +dax        Use direct access (no page cache). See
> +           Documentation/filesystems/dax.txt.
>  
>  
> 
>  DEBUGFS ENTRIES
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 87c1f41..4eb4b76 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -910,6 +910,15 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
>  				err = -EIO;
>  				goto sync_out;
>  			}
> +			/*
> +			 * If newly allocated blocks are to be zeroed out later,
> +			 * a single f2fs_map_blocks must not contain both old
> +			 * and new blocks at the same time.
> +			 */
> +			if (flag == F2FS_GET_BLOCK_ZERO
> +					&& (map->m_flags & F2FS_MAP_MAPPED)
> +					&& !(map->m_flags & F2FS_MAP_NEW))
> +				goto sync_out;
>  			if (flag == F2FS_GET_BLOCK_PRE_AIO) {
>  				if (blkaddr == NULL_ADDR) {
>  					prealloc++;
> @@ -938,6 +947,8 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
>  						blkaddr != NEW_ADDR)
>  				goto sync_out;
>  		}
> +	} else if (flag == F2FS_GET_BLOCK_ZERO && map->m_flags & F2FS_MAP_NEW) {
> +		goto sync_out;
>  	}
>  
>  	if (flag == F2FS_GET_BLOCK_PRE_AIO)
> @@ -996,6 +1007,12 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
>  	goto next_dnode;
>  
>  sync_out:
> +	if (flag == F2FS_GET_BLOCK_ZERO && map->m_flags & F2FS_MAP_NEW) {
> +		clean_bdev_aliases(inode->i_sb->s_bdev,
> +					map->m_pblk, map->m_len);
> +		err = sb_issue_zeroout(inode->i_sb, map->m_pblk,
> +					map->m_len, GFP_NOFS);
> +	}
>  	f2fs_put_dnode(&dn);
>  unlock_out:
>  	if (create) {
> @@ -1808,16 +1825,19 @@ static int f2fs_write_data_pages(struct address_space *mapping,
>  	return 0;
>  }
>  
> -static void f2fs_write_failed(struct address_space *mapping, loff_t to)
> +static void f2fs_write_failed(struct address_space *mapping, loff_t to,
> +				bool lock)
>  {
>  	struct inode *inode = mapping->host;
>  	loff_t i_size = i_size_read(inode);
>  
>  	if (to > i_size) {
> -		down_write(&F2FS_I(inode)->i_mmap_sem);
> +		if (lock)
> +			down_write(&F2FS_I(inode)->i_mmap_sem);
>  		truncate_pagecache(inode, i_size);
>  		truncate_blocks(inode, i_size, true);
> -		up_write(&F2FS_I(inode)->i_mmap_sem);
> +		if

[PATCH v8 1/1] f2fs: dax: implement direct access

2017-07-20 Thread sunqiuyang
From: Qiuyang Sun 

This patch implements Direct Access (DAX) in F2FS, including:
 - a mount option to choose whether to enable DAX or not
 - read/write and mmap of regular files in the DAX way
 - zero-out of unaligned partial blocks in the DAX way
 - garbage collection of DAX files, by mapping both old and new physical
   addresses of a data page into memory and copying data between them directly
 - incompatibility of DAX with inline data, atomic or volatile write, 
   collapse|insert_range, etc.
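The partial-block bookkeeping behind the "zero-out of unaligned partial blocks" item is simple arithmetic. A minimal sketch, assuming a power-of-two block size; toy_partial_blocks() is a hypothetical helper that only computes which byte ranges fall into partial blocks and does not show how the patch actually zeroes them on DAX.

```c
#include <assert.h>
#include <stdint.h>

/* A byte range [start, end); empty when start == end. */
struct toy_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Split the byte range [start, end) into the partial block at the head and
 * the partial block at the tail. Only these need zeroing in place; whole
 * blocks between them can be deallocated instead.
 */
static void toy_partial_blocks(uint64_t start, uint64_t end, uint32_t bsize,
			       struct toy_range *head, struct toy_range *tail)
{
	uint64_t mask;
	uint64_t head_end = start;
	uint64_t tail_start;

	assert(bsize != 0 && (bsize & (bsize - 1)) == 0 && start <= end);
	mask = (uint64_t)bsize - 1;

	if (start & mask) {
		head_end = (start | mask) + 1;	/* next block boundary */
		if (head_end > end)
			head_end = end;		/* range ends inside this block */
	}
	head->start = start;
	head->end = head_end;

	tail_start = end & ~mask;		/* block boundary at or below end */
	if (tail_start < head_end)
		tail_start = head_end;		/* already covered by the head */
	tail->start = tail_start;
	tail->end = end;
}
```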

Signed-off-by: Qiuyang Sun 
---
Changelog v7 -> v8:
 - Introduce the macro f2fs_dax_file() to determine whether a file is DAX for cases
   when CONFIG_FS_DAX is set or not
 - Return -ENOTSUPP when an operation does not support DAX
 - In f2fs_iomap_begin(), convert the inline data of an inode (if any) 
   before mapping blocks
 - Minor cleanups
---
 Documentation/filesystems/f2fs.txt |   2 +
 fs/f2fs/data.c | 132 +-
 fs/f2fs/f2fs.h |  15 +++
 fs/f2fs/file.c | 183 -
 fs/f2fs/gc.c   | 103 -
 fs/f2fs/inline.c   |   3 +
 fs/f2fs/inode.c|   8 +-
 fs/f2fs/namei.c|   5 +
 fs/f2fs/super.c|  15 +++
 9 files changed, 454 insertions(+), 12 deletions(-)

diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index 273ccb2..c86c421 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -164,6 +164,8 @@ io_bits=%u Set the bit size of write IO requests. It should be set
            with "mode=lfs".
 usrquota   Enable plain user disk quota accounting.
 grpquota   Enable plain group disk quota accounting.
+dax        Use direct access (no page cache). See
+           Documentation/filesystems/dax.txt.
 
 

 DEBUGFS ENTRIES
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 87c1f41..4eb4b76 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -910,6 +910,15 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
 				err = -EIO;
 				goto sync_out;
 			}
+			/*
+			 * If newly allocated blocks are to be zeroed out later,
+			 * a single f2fs_map_blocks must not contain both old
+			 * and new blocks at the same time.
+			 */
+			if (flag == F2FS_GET_BLOCK_ZERO
+					&& (map->m_flags & F2FS_MAP_MAPPED)
+					&& !(map->m_flags & F2FS_MAP_NEW))
+				goto sync_out;
 			if (flag == F2FS_GET_BLOCK_PRE_AIO) {
 				if (blkaddr == NULL_ADDR) {
 					prealloc++;
@@ -938,6 +947,8 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
 						blkaddr != NEW_ADDR)
 				goto sync_out;
 		}
+	} else if (flag == F2FS_GET_BLOCK_ZERO && map->m_flags & F2FS_MAP_NEW) {
+		goto sync_out;
 	}
 
 	if (flag == F2FS_GET_BLOCK_PRE_AIO)
@@ -996,6 +1007,12 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
 	goto next_dnode;
 
 sync_out:
+	if (flag == F2FS_GET_BLOCK_ZERO && map->m_flags & F2FS_MAP_NEW) {
+		clean_bdev_aliases(inode->i_sb->s_bdev,
+					map->m_pblk, map->m_len);
+		err = sb_issue_zeroout(inode->i_sb, map->m_pblk,
+					map->m_len, GFP_NOFS);
+	}
 	f2fs_put_dnode(&dn);
 unlock_out:
 	if (create) {
@@ -1808,16 +1825,19 @@ static int f2fs_write_data_pages(struct address_space *mapping,
 	return 0;
 }
 
-static void f2fs_write_failed(struct address_space *mapping, loff_t to)
+static void f2fs_write_failed(struct address_space *mapping, loff_t to,
+				bool lock)
 {
 	struct inode *inode = mapping->host;
 	loff_t i_size = i_size_read(inode);
 
 	if (to > i_size) {
-		down_write(&F2FS_I(inode)->i_mmap_sem);
+		if (lock)
+			down_write(&F2FS_I(inode)->i_mmap_sem);
 		truncate_pagecache(inode, i_size);
 		truncate_blocks(inode, i_size, true);
-		up_write(&F2FS_I(inode)->i_mmap_sem);
+		if (lock)
+			up_write(&F2FS_I(inode)->i_mmap_sem);
 	}
 }
 
@@ -2000,7 +2020,7 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
 
 fail:
 	f2fs_put_page(page, 1);
-