Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-30 Thread Christian Stroetmann

On the 28th of August 2014 at 09:17, Dave Chinner wrote:

On Wed, Aug 27, 2014 at 02:30:55PM -0700, Andrew Morton wrote:

On Wed, 27 Aug 2014 16:22:20 -0500 (CDT) Christoph Lameter  
wrote:


Some explanation of why one would use ext4 instead of, say,
suitably-modified ramfs/tmpfs/rd/etc?

The NVDIMM contents survive reboot and therefore ramfs and friends won't
work with it.

See "suitably modified".  Presumably this type of memory would need to
come from a particular page allocator zone.  ramfs would be unwieldy
due to its use of dentry/inode caches, but rd/etc should be feasible.


sigh

Hello Dave and the others

Thank you very much for your patience and for the following summary.


That's where we started about two years ago with that horrible
pramfs trainwreck.

To start with: brd is a block device, not a filesystem. We still
need the filesystem on top of a persistent ram disk to make it
useful to applications. We can do this with ext4/XFS right now, and
that is the fundamental basis on which DAX is built.

For the sake of the discussion, however, let's walk through what is
required to make an "existing" ramfs persistent. Persistence means we
can't just wipe it and start again if it gets corrupted, and
rebooting is not a fix for problems.  Hence we need to be able to
identify it, check it, repair it, ensure metadata operations are
persistent across machine crashes, etc, so all sorts of
management tools are required by a persistent ramfs.

But most important of all: the persistent storage format needs to be
forwards and backwards compatible across kernel versions.  Hence we
can't encode any structure the kernel uses internally into the
persistent storage because they aren't stable structures.  That
means we need to marshall objects between the persistence domain and
the volatile domain in an orderly fashion.


Two little questions:
1. If we omitted the compatibility across kernel versions, if only for
theoretical reasons, would it then make sense at all to encode a
structure that the kernel uses internally, and what advantages could be
gained this way?
2. Have the said structures used by the kernel changed so many times?


We can avoid using the dentry/inode *caches* by freeing those
volatile objects the moment reference counts drop to zero rather than
putting them on LRUs. However, we can't store them in persistent
storage and we can't avoid using them to interface with the VFS, so
it makes little sense to burn CPU continually marshalling such
structures in and out of volatile memory if we have free RAM to do
so. So even with a "persistent ramfs" caching the working set of
volatile VFS objects makes sense from a performance point of view.


I am sorry to say so, but I am confused again and do not understand this
argument, because we are already talking about NVDIMMs here. So, if we
have those volatile VFS objects already in NVDIMMs, so to say, then we
have them in persistent storage and in DRAM at the same time.



Then you've got crash recovery management: NVDIMMs are not
synchronous: they can still lose data while it is being written on
power loss. And we can't update persistent memory piecemeal as the
VFS code modifies metadata - there needs to be synchronisation
points, otherwise we will always have inconsistent metadata state in
persistent memory.

Persistent memory also can't do atomic writes across multiple,
disjoint CPU cachelines or NVDIMMs, and this is what is needed for
synchronisation points for multi-object metadata modification
operations to be consistent after a crash.  There is some work in
the NVMe working groups to define this, but so far there hasn't been
any useful outcome, and then we will have to wait for CPUs to
implement those interfaces.

Hence the metadata that indexes the persistent RAM needs to use COW
techniques, use a log structure or use WAL (journalling).  Hence
that "persistent ramfs" is now looking much more like a database or
traditional filesystem.

Further, it's going to need to scale to very large amounts of
storage.  We're talking about machines with *tens of TB* of NVDIMM
capacity in the immediate future and so free space management and
concurrency of allocation and freeing of used space is going to be
fundamental to the performance of the persistent NVRAM filesystem.
So, you end up with block/allocation groups to subdivide the space.
Looking a lot like ext4 or XFS at this point.

And now you have to scale to indexing tens of millions of
everything. At least tens of millions - hundreds of millions to
billions is more likely, because storing tens of terabytes of small
files is going to require indexing billions of files. And because
there is no performance penalty for doing this, people will use the
filesystem as a great big database. So now you have to have
scalable POSIX-compatible directory structures, scalable freespace
indexing, dynamic, scalable inode allocation and freeing, etc. Oh,
and it also needs to be highly concurrent to handle machines 

Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-28 Thread Zwisler, Ross
On Thu, 2014-08-28 at 11:08 +0300, Boaz Harrosh wrote:
> On 08/27/2014 06:45 AM, Matthew Wilcox wrote:
> > One of the primary uses for NV-DIMMs is to expose them as a block device
> > and use a filesystem to store files on the NV-DIMM.  While that works,
> > it currently wastes memory and CPU time buffering the files in the page
> > cache.  We have support in ext2 for bypassing the page cache, but it
> > has some races which are unfixable in the current design.  This series
> > of patches rewrites the underlying support, and adds support for direct
> > access to ext4.
> > 
> > Note that patch 6/21 has been included in
> > https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/log/?h=for-next-candidate
> > 
> 
> Matthew hi
> 
> Could you please push this to the regular or a new public tree?
> 
> (Old versions are at: https://github.com/01org/prd)
> 
> Thanks
> Boaz

Hi Boaz,

I've pushed the updated tree to https://github.com/01org/prd in the master
branch.  All the older versions of the code that we've had while rebasing are
still available in their own branches.

Thanks,
- Ross



Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-28 Thread Matthew Wilcox
On Wed, Aug 27, 2014 at 06:30:27PM -0700, Andy Lutomirski wrote:
> 4) No page faults ever once a page is writable (I hope -- I'm not sure
> whether this series actually achieves that goal).

I can't think of a circumstance in which you'd end up taking a page fault
after a writable mapping is established.

The next part to this series (that I'm working on now) is PMD support.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-28 Thread Matthew Wilcox
On Wed, Aug 27, 2014 at 02:46:22PM -0700, Andrew Morton wrote:
> > > Sat down to read all this but I'm finding it rather unwieldy - it's
> > > just a great blob of code.  Is there some overall
> > > what-it-does-and-how-it-does-it roadmap?
> > 
> > The overall goal is to map persistent memory / NV-DIMMs directly to
> > userspace.  We have that functionality in the XIP code, but the way
> > it's structured is unsuitable for filesystems like ext4 & XFS, and
> > it has some pretty ugly races.
> 
> When thinking about looking at the patchset I wonder things like how
> does mmap work, in what situations does a page get COWed, how do we
> handle partial pages at EOF, etc.  I guess that's all part of the
> filemap_xip legacy, the details of which I've totally forgotten.

mmap works by installing a PTE that points to the storage.  This implies
that the NV-DIMM has to be the kind that always has everything mapped
(there are other types that require commands to be sent to move windows
around that point into the storage ... DAX is not for these types
of DIMMs).

We use a VM_MIXEDMAP vma.  The PTEs pointing to PFNs will just get
copied across on fork.  Read-faults on holes are covered by a read-only
page cache page.  On a write to a hole, any page cache page covering it
will be unmapped and evicted from the page cache.  The mapping for the
faulting task will be replaced with a mapping to the newly established
block, but other mappings will take a fresh fault on their next reference.
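
For reference, the write-fault path just described boils down to
something like the following sketch (not the actual patch code:
locking, hole handling and error paths are elided, and dax_get_pfn()
is an invented helper standing in for the bdev_direct_access() call):

/*
 * Sketch only: translate the faulting file offset to a block, ask the
 * block device for the pfn backing that block, and map it straight
 * into the task's page tables.
 */
static int dax_fault_sketch(struct vm_area_struct *vma,
                            struct vm_fault *vmf, get_block_t get_block)
{
        struct inode *inode = vma->vm_file->f_mapping->host;
        struct buffer_head bh = { .b_size = PAGE_SIZE };
        sector_t block = (sector_t)vmf->pgoff <<
                                (PAGE_SHIFT - inode->i_blkbits);
        unsigned long pfn;

        if (get_block(inode, block, &bh, 0) < 0 || !buffer_mapped(&bh))
                return VM_FAULT_SIGBUS;         /* holes elided here */

        if (dax_get_pfn(inode->i_sb->s_bdev, &bh, &pfn) < 0)
                return VM_FAULT_SIGBUS;

        /* VM_MIXEDMAP vma: point the pte straight at the NV-DIMM page */
        if (vm_insert_mixed(vma, (unsigned long)vmf->virtual_address, pfn))
                return VM_FAULT_SIGBUS;
        return VM_FAULT_NOPAGE;
}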

Partial pages are mmapable, just as they are with page-cache based
files.  You can even store beyond EOF, just as with page-cache files.
Those stores are, of course, going to end up in persistent memory, but they
might well end up being zeroed if the file is extended ... again, this
is no different to page-cache based files.

> > > Performance testing results?
> > 
> > I haven't been running any performance tests.  What sort of performance
> > tests would be interesting for you to see?
> 
> fs benchmarks?  `dd' would be a good start ;)
> 
> I assume (because I wasn't told!) that there are two objectives here:
> 
> 1) reduce memory consumption by not maintaining pagecache and
> 2) reduce CPU cost by avoiding the double-copies.
> 
> These things are pretty easily quantified.  And really they must be
> quantified as part of the developer testing, because if you find
> they've worsened then holy cow, what went wrong.

It's really a functionality argument; the users we anticipate for NV-DIMMs
really want to directly map them into memory and do a lot of work through
loads and stores with the kernel not being involved at all, so we don't
actually have any performance targets for things like read/write.
That said, when running xfstests and comparing results between ext4
with and without DAX, I do see many of the tests completing quicker
with DAX than without (others "run for thirty seconds" so there's no
time difference between with/without).

> None of the patch titles identify the subsystem(s) which they're
> hitting.  eg, "Introduce IS_DAX(inode)" is an ext2 patch, but nobody
> would know that from browsing the titles.

I actually see that one as being a VFS patch ... ext2 changing is just
a side-effect.  I can re-split that patch if desired.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-28 Thread Boaz Harrosh
On 08/27/2014 06:45 AM, Matthew Wilcox wrote:
> One of the primary uses for NV-DIMMs is to expose them as a block device
> and use a filesystem to store files on the NV-DIMM.  While that works,
> it currently wastes memory and CPU time buffering the files in the page
> cache.  We have support in ext2 for bypassing the page cache, but it
> has some races which are unfixable in the current design.  This series
> of patches rewrites the underlying support, and adds support for direct
> access to ext4.
> 
> Note that patch 6/21 has been included in
> https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/log/?h=for-next-candidate
> 

Matthew hi

Could you please push this to the regular or a new public tree?

(Old versions are at: https://github.com/01org/prd)

Thanks
Boaz

> This iteration of the patchset rebases to 3.17-rc2, changes the page fault
> locking, fixes a couple of bugs and makes a few other minor changes.
> 
>  - Move the calculation of the maximum size available at the requested
>location from the ->direct_access implementations to bdev_direct_access()
>  - Fix a comment typo (Ross Zwisler)
>  - Check that the requested length is positive in bdev_direct_access().  If
>it is not, assume that it's an errno, and just return it.
>  - Fix some whitespace issues flagged by checkpatch
>  - Added the Acked-by responses from Kirill that I forgot in the last round
>  - Added myself to MAINTAINERS for DAX
>  - Fixed compilation with !CONFIG_DAX (Vishal Verma)
>  - Revert the locking in the page fault handler back to an earlier version.
>If we hit the race that we were trying to protect against, we will leave
>blocks allocated past the end of the file.  They will be removed on file
>removal, the next truncate, or fsck.
> 
> 
> Matthew Wilcox (20):
>   axonram: Fix bug in direct_access
>   Change direct_access calling convention
>   Fix XIP fault vs truncate race
>   Allow page fault handlers to perform the COW
>   Introduce IS_DAX(inode)
>   Add copy_to_iter(), copy_from_iter() and iov_iter_zero()
>   Replace XIP read and write with DAX I/O
>   Replace ext2_clear_xip_target with dax_clear_blocks
>   Replace the XIP page fault handler with the DAX page fault handler
>   Replace xip_truncate_page with dax_truncate_page
>   Replace XIP documentation with DAX documentation
>   Remove get_xip_mem
>   ext2: Remove ext2_xip_verify_sb()
>   ext2: Remove ext2_use_xip
>   ext2: Remove xip.c and xip.h
>   Remove CONFIG_EXT2_FS_XIP and rename CONFIG_FS_XIP to CONFIG_FS_DAX
>   ext2: Remove ext2_aops_xip
>   Get rid of most mentions of XIP in ext2
>   xip: Add xip_zero_page_range
>   brd: Rename XIP to DAX
> 
> Ross Zwisler (1):
>   ext4: Add DAX functionality
> 
>  Documentation/filesystems/Locking  |   3 -
>  Documentation/filesystems/dax.txt  |  91 +++
>  Documentation/filesystems/ext4.txt |   2 +
>  Documentation/filesystems/xip.txt  |  68 -
>  MAINTAINERS|   6 +
>  arch/powerpc/sysdev/axonram.c  |  19 +-
>  drivers/block/Kconfig  |  13 +-
>  drivers/block/brd.c|  26 +-
>  drivers/s390/block/dcssblk.c   |  21 +-
>  fs/Kconfig |  21 +-
>  fs/Makefile|   1 +
>  fs/block_dev.c |  40 +++
>  fs/dax.c   | 497 +
>  fs/exofs/inode.c   |   1 -
>  fs/ext2/Kconfig|  11 -
>  fs/ext2/Makefile   |   1 -
>  fs/ext2/ext2.h |  10 +-
>  fs/ext2/file.c |  45 +++-
>  fs/ext2/inode.c|  38 +--
>  fs/ext2/namei.c|  13 +-
>  fs/ext2/super.c|  53 ++--
>  fs/ext2/xip.c  |  91 ---
>  fs/ext2/xip.h  |  26 --
>  fs/ext4/ext4.h |   6 +
>  fs/ext4/file.c |  49 +++-
>  fs/ext4/indirect.c |  18 +-
>  fs/ext4/inode.c|  51 ++--
>  fs/ext4/namei.c|  10 +-
>  fs/ext4/super.c|  39 ++-
>  fs/open.c  |   5 +-
>  include/linux/blkdev.h |   6 +-
>  include/linux/fs.h |  49 +++-
>  include/linux/mm.h |   1 +
>  include/linux/uio.h|   3 +
>  mm/Makefile|   1 -
>  mm/fadvise.c   |   6 +-
>  mm/filemap.c   |   6 +-
>  mm/filemap_xip.c   | 483 ---
>  mm/iov_iter.c  | 237 --
>  mm/madvise.c   |   2 +-
>  mm/memory.c|  33 ++-
>  41 files changed, 1229 insertions(+), 873 deletions(-)
>  create mode 100644 Documentation/filesystems/dax.txt
>  delete mode 100644 Documentation/filesystems/xip.txt
>  create mode 100644 fs/dax.c
>  delete mode 100644 fs/ext2/xip.c
>  delete mode 100644 fs/ext2/xip.h
>  delete mode 100644 mm/filemap_xip.c

Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-28 Thread Dave Chinner
On Wed, Aug 27, 2014 at 02:30:55PM -0700, Andrew Morton wrote:
> On Wed, 27 Aug 2014 16:22:20 -0500 (CDT) Christoph Lameter  
> wrote:
> 
> > > Some explanation of why one would use ext4 instead of, say,
> > > suitably-modified ramfs/tmpfs/rd/etc?
> > 
> > The NVDIMM contents survive reboot and therefore ramfs and friends won't
> > work with it.
> 
> See "suitably modified".  Presumably this type of memory would need to
> come from a particular page allocator zone.  ramfs would be unwieldy
> due to its use of dentry/inode caches, but rd/etc should be feasible.

sigh

That's where we started about two years ago with that horrible
pramfs trainwreck.

To start with: brd is a block device, not a filesystem. We still
need the filesystem on top of a persistent ram disk to make it
useful to applications. We can do this with ext4/XFS right now, and
that is the fundamental basis on which DAX is built.

For the sake of the discussion, however, let's walk through what is
required to make an "existing" ramfs persistent. Persistence means we
can't just wipe it and start again if it gets corrupted, and
rebooting is not a fix for problems.  Hence we need to be able to
identify it, check it, repair it, ensure metadata operations are
persistent across machine crashes, etc, so all sorts of
management tools are required by a persistent ramfs.

But most important of all: the persistent storage format needs to be
forwards and backwards compatible across kernel versions.  Hence we
can't encode any structure the kernel uses internally into the
persistent storage because they aren't stable structures.  That
means we need to marshall objects between the persistence domain and
the volatile domain in an orderly fashion.
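
To make the marshalling point concrete, here is the shape of the thing
(an invented example, not code from any real filesystem): the on-disk
side uses fixed-width, fixed-endian fields with an explicit version,
and the volatile VFS inode is translated to and from it:

/* Invented on-disk inode: stays parseable across kernel versions
 * because none of the VFS's internal structures leak into it. */
struct ondisk_inode {
        __le32  version;        /* explicit format version for compat */
        __le32  mode;
        __le64  size;
        __le64  mtime_sec;
        /* ... */
};

static void sketch_inode_to_disk(struct ondisk_inode *raw,
                                 struct inode *inode)
{
        raw->version   = cpu_to_le32(1);
        raw->mode      = cpu_to_le32(inode->i_mode);
        raw->size      = cpu_to_le64(i_size_read(inode));
        raw->mtime_sec = cpu_to_le64(inode->i_mtime.tv_sec);
}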

We can avoid using the dentry/inode *caches* by freeing those
volatile objects the moment reference counts drop to zero rather than
putting them on LRUs. However, we can't store them in persistent
storage and we can't avoid using them to interface with the VFS, so
it makes little sense to burn CPU continually marshalling such
structures in and out of volatile memory if we have free RAM to do
so. So even with a "persistent ramfs" caching the working set of
volatile VFS objects makes sense from a performance point of view.

Then you've got crash recovery management: NVDIMMs are not
synchronous: they can still lose data while it is being written on
power loss. And we can't update persistent memory piecemeal as the
VFS code modifies metadata - there needs to be synchronisation
points, otherwise we will always have inconsistent metadata state in
persistent memory.

Persistent memory also can't do atomic writes across multiple,
disjoint CPU cachelines or NVDIMMs, and this is what is needed for
synchronisation points for multi-object metadata modification
operations to be consistent after a crash.  There is some work in
the NVMe working groups to define this, but so far there hasn't been
any useful outcome, and then we will have to wait for CPUs to
implement those interfaces.
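
Absent that hardware support, a synchronisation point can only be
approximated with cacheline flushes and fences, along the lines of
this x86-flavoured sketch (illustrative only; it assumes 64-byte
cachelines, and whether such a sequence actually guarantees
durability is exactly what is still up in the air):

static void pmem_flush_range(void *addr, size_t len)
{
        char *p = (char *)((unsigned long)addr & ~63UL);

        /* push each dirty cacheline towards the NVDIMM ... */
        for (; p < (char *)addr + len; p += 64)
                asm volatile("clflush %0" : "+m" (*p));
        /* ... and order the flushes before any following commit write */
        asm volatile("mfence" ::: "memory");
}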

Hence the metadata that indexes the persistent RAM needs to use COW
techniques, use a log structure or use WAL (journalling).  Hence
that "persistent ramfs" is now looking much more like a database or
traditional filesystem.
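
For a flavour of what the WAL option means here (again, invented for
illustration): each metadata update is first appended to the log as a
self-describing, checksummed record, the record is flushed, and only
then is the primary metadata modified, so recovery can replay complete
records and discard torn ones:

struct wal_record {
        __le64  txid;           /* monotonic transaction id */
        __le64  target;         /* offset of the metadata being updated */
        __le32  len;            /* payload length */
        __le32  csum;           /* crc32c over the record; detects torn writes */
        __u8    payload[];      /* new metadata contents */
};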

Further, it's going to need to scale to very large amounts of
storage.  We're talking about machines with *tens of TB* of NVDIMM
capacity in the immediate future and so free space management and
concurrency of allocation and freeing of used space is going to be
fundamental to the performance of the persistent NVRAM filesystem.
So, you end up with block/allocation groups to subdivide the space.
Looking a lot like ext4 or XFS at this point.

And now you have to scale to indexing tens of millions of
everything. At least tens of millions - hundreds of millions to
billions is more likely, because storing tens of terabytes of small
files is going to require indexing billions of files. And because
there is no performance penalty for doing this, people will use the
filesystem as a great big database. So now you have to have
scalable POSIX-compatible directory structures, scalable freespace
indexing, dynamic, scalable inode allocation and freeing, etc. Oh,
and it also needs to be highly concurrent to handle machines with
hundreds of CPU cores.

Funnily enough, we already have a couple of persistent storage
implementations that solve these problems to varying degrees. ext4
is one of them, if you ignore the scalability and concurrency
requirements. XFS is the other. And both will run unmodified on
a persistent RAM block device, which we *already have*.

And so back to DAX. What users actually want from their high speed
persistent RAM storage is direct, CPU-addressable access to that
persistent storage. They don't want to have to care about how to
find an object in the persistent storage - that's what filesystems
are for - they just want to be able to read and write to it
directly. That's what DAX does - it provides existing filesystems
a method 

Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread Andy Lutomirski
On 08/27/2014 02:46 PM, Andrew Morton wrote:
> I assume (because I wasn't told!) that there are two objectives here:
> 
> 1) reduce memory consumption by not maintaining pagecache and
> 2) reduce CPU cost by avoiding the double-copies.
> 
> These things are pretty easily quantified.  And really they must be
> quantified as part of the developer testing, because if you find
> they've worsened then holy cow, what went wrong.
> 

There are two more huge ones:

3) Writes via mmap are immediately durable (or at least they're durable
after a *very* lightweight flush).

4) No page faults ever once a page is writable (I hope -- I'm not sure
whether this series actually achieves that goal).

A note on #3: there is ongoing work to enable write-through memory for
things like this.  Once that's done, then writes via mmap might actually
be synchronously durable, depending on chipset details.
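
In userspace terms, #3 amounts to something like this sketch (x86
assumed, 64-byte cachelines assumed, error handling omitted, and the
mount path is made up):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
        int fd = open("/mnt/pmem/somefile", O_RDWR);
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);

        strcpy(p, "hello");                     /* plain store to the NV-DIMM */
        asm volatile("clflush %0" : "+m" (*p)); /* evict the dirty cacheline */
        asm volatile("mfence" ::: "memory");    /* the "lightweight flush" */
        return 0;
}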

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread One Thousand Gnomes
On Wed, 27 Aug 2014 14:30:55 -0700
Andrew Morton  wrote:

> On Wed, 27 Aug 2014 16:22:20 -0500 (CDT) Christoph Lameter  
> wrote:
> 
> > > Some explanation of why one would use ext4 instead of, say,
> > > suitably-modified ramfs/tmpfs/rd/etc?
> > 
> > The NVDIMM contents survive reboot and therefore ramfs and friends won't
> > work with it.
> 
> See "suitably modified".  Presumably this type of memory would need to
> come from a particular page allocator zone.  ramfs would be unwieldy
> due to its use of dentry/inode caches, but rd/etc should be feasible.

If you took one of the existing ramfs types you would then need to

- make it persistent in its storage, and put all the objects in the store
- add journalling for failures mid transaction. Your DIMM may retain its
  bits, but if your CPU resets mid fs operation it's got to be recovered
- write an fsck tool for it
- validate it

at which point it's probably turned into ext4 8)

It's persistent but that doesn't solve the 'my box crashed' problem. 

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread Andrew Morton
On Wed, 27 Aug 2014 17:12:50 -0400 Matthew Wilcox  wrote:

> On Wed, Aug 27, 2014 at 01:06:13PM -0700, Andrew Morton wrote:
> > On Tue, 26 Aug 2014 23:45:20 -0400 Matthew Wilcox 
> >  wrote:
> > 
> > > One of the primary uses for NV-DIMMs is to expose them as a block device
> > > and use a filesystem to store files on the NV-DIMM.  While that works,
> > > it currently wastes memory and CPU time buffering the files in the page
> > > cache.  We have support in ext2 for bypassing the page cache, but it
> > > has some races which are unfixable in the current design.  This series
> > > of patches rewrites the underlying support, and adds support for direct
> > > access to ext4.
> > 
> > Sat down to read all this but I'm finding it rather unwieldy - it's
> > just a great blob of code.  Is there some overall
> > what-it-does-and-how-it-does-it roadmap?
> 
> The overall goal is to map persistent memory / NV-DIMMs directly to
> userspace.  We have that functionality in the XIP code, but the way
> it's structured is unsuitable for filesystems like ext4 & XFS, and
> it has some pretty ugly races.

When thinking about looking at the patchset I wonder things like how
does mmap work, in what situations does a page get COWed, how do we
handle partial pages at EOF, etc.  I guess that's all part of the
filemap_xip legacy, the details of which I've totally forgotten.

> Patches 1 & 3 are simply bug-fixes.  They should go in regardless of
> the merits of anything else in this series.
> 
> Patch 2 changes the API for the direct_access block_device_operation so
> it can report more than a single page at a time.  As the series evolved,
> this work also included moving support for partitioning into the VFS
> where it belongs, handling various error cases in the VFS and so on.
> 
> Patch 4 is an optimisation.  It's poor form to make userspace take two
> faults for the same dereference.
> 
> Patch 5 gives us a VFS flag for the DAX property, which lets us get rid of
> the get_xip_mem() method later on.
> 
> Patch 6 is also prep work; Al Viro liked it enough that it's now in
> his tree.
> 
> The new DAX code is then dribbled in over patches 7-11, split up by
> functional area.  At each stage, the ext2-xip code is converted over to
> the new DAX code.
> 
> Patches 12-18 delete the remnants of the old XIP code, and fix the things
> in ext2 that Jan didn't like when he reviewed them for ext4 :-)
> 
> Patches 19 & 20 are the work to make ext4 use DAX.
> 
> Patch 21 is some final cleanup of references to the old XIP code, renaming
> it all to DAX.

hrm.

> > Some explanation of why one would use ext4 instead of, say,
> > suitably-modified ramfs/tmpfs/rd/etc?
> 
> ramfs and tmpfs really rely on the page cache.  They're not exactly
> built for permanence either.  brd also relies on the page cache, and
> there's a clear desire to use a filesystem instead of a block device
> for all the usual reasons of access permissions, grow/shrink, etc.
> 
> Some people might want to use XFS instead of ext4.  We're starting with
> ext4, but we've been keeping an eye on what other filesystems might want
> to use.  btrfs isn't going to use the DAX code, but some of the other
> pieces will probably come in handy.
> 
> There are also at least three people working on their own filesystems
> specially designed for persistent memory.  I wish them all the best
> ... but I'd like to get this infrastructure into place.

This is the sort of thing which first-timers (this one at least) like
to see in [0/n].

> > Performance testing results?
> 
> I haven't been running any performance tests.  What sort of performance
> tests would be interesting for you to see?

fs benchmarks?  `dd' would be a good start ;)

I assume (because I wasn't told!) that there are two objectives here:

1) reduce memory consumption by not maintaining pagecache and
2) reduce CPU cost by avoiding the double-copies.

These things are pretty easily quantified.  And really they must be
quantified as part of the developer testing, because if you find
they've worsened then holy cow, what went wrong.

> > Carsten Otte wrote filemap_xip.c and may be a useful reviewer of this
> > work.
> 
> I cc'd him on some earlier versions and didn't hear anything back.  It felt
> rude to keep plying him with 20+ patches every month.

OK.

> > All the patch subjects violate Documentation/SubmittingPatches
> > section 15 ;)
> 
> errr ... which bit?  I used git format-patch to create them.

None of the patch titles identify the subsystem(s) which they're
hitting.  eg, "Introduce IS_DAX(inode)" is an ext2 patch, but nobody
would know that from browsing the titles.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread Andrew Morton
On Wed, 27 Aug 2014 16:22:20 -0500 (CDT) Christoph Lameter  
wrote:

> > Some explanation of why one would use ext4 instead of, say,
> > suitably-modified ramfs/tmpfs/rd/etc?
> 
> > The NVDIMM contents survive reboot and therefore ramfs and friends won't
> work with it.

See "suitably modified".  Presumably this type of memory would need to
come from a particular page allocator zone.  ramfs would be unwieldy
due to its use of dentry/inode caches, but rd/etc should be feasible.

I dunno, I'm not proposing implementations - I'm asking obvious
questions.  Stuff which should have been addressed in the changelogs
before one even starts to read the code...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread Christoph Lameter
On Wed, 27 Aug 2014, Andrew Morton wrote:

> Sat down to read all this but I'm finding it rather unwieldy - it's
> just a great blob of code.  Is there some overall
> what-it-does-and-how-it-does-it roadmap?

Matthew gave a talk about DAX at the kernel summit. It's a great feature
because this is another piece of the bare metal hardware technology that
he is improving.

> Some explanation of why one would use ext4 instead of, say,
> suitably-modified ramfs/tmpfs/rd/etc?

The NVDIMM contents survive reboot and therefore ramfs and friends wont
work with it.

> Performance testing results?

This is obviously avoiding kernel buffering and therefore decreasing
kernel overhead for non-volatile memory. It avoids useless duplication of
data from the non-volatile memory into regular RAM and allows direct
access to non-volatile memory from user space in a controlled fashion.

I think this should be a priority item.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread Matthew Wilcox
On Wed, Aug 27, 2014 at 01:06:13PM -0700, Andrew Morton wrote:
> On Tue, 26 Aug 2014 23:45:20 -0400 Matthew Wilcox 
>  wrote:
> 
> > One of the primary uses for NV-DIMMs is to expose them as a block device
> > and use a filesystem to store files on the NV-DIMM.  While that works,
> > it currently wastes memory and CPU time buffering the files in the page
> > cache.  We have support in ext2 for bypassing the page cache, but it
> > has some races which are unfixable in the current design.  This series
> > of patches rewrites the underlying support, and adds support for direct
> > access to ext4.
> 
> Sat down to read all this but I'm finding it rather unwieldy - it's
> just a great blob of code.  Is there some overall
> what-it-does-and-how-it-does-it roadmap?

The overall goal is to map persistent memory / NV-DIMMs directly to
userspace.  We have that functionality in the XIP code, but the way
it's structured is unsuitable for filesystems like ext4 & XFS, and
it has some pretty ugly races.

Patches 1 & 3 are simply bug-fixes.  They should go in regardless of
the merits of anything else in this series.

Patch 2 changes the API for the direct_access block_device_operation so
it can report more than a single page at a time.  As the series evolved,
this work also included moving support for partitioning into the VFS
where it belongs, handling various error cases in the VFS and so on.
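
In other words, the method goes from filling in a single page to
something of this shape (a sketch of the convention; the exact v10
declaration may differ in detail):

/* Returns how many bytes are directly addressable at the requested
 * sector (or a negative errno), and fills in the kernel virtual
 * address and pfn of that location. */
long (*direct_access)(struct block_device *bdev, sector_t sector,
                      void **addr, unsigned long *pfn);

/* VFS-level wrapper that validates the request against device and
 * partition bounds before calling into the driver. */
long bdev_direct_access(struct block_device *bdev, sector_t sector,
                        void **addr, unsigned long *pfn, long size);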

Patch 4 is an optimisation.  It's poor form to make userspace take two
faults for the same dereference.

Patch 5 gives us a VFS flag for the DAX property, which lets us get rid of
the get_xip_mem() method later on.
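
The flag itself is conceptually tiny -- an inode flag plus a test
macro (the flag value here is illustrative):

#define S_DAX           8192    /* Direct Access, bypassing the page cache */
#define IS_DAX(inode)   ((inode)->i_flags & S_DAX)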

Patch 6 is also prep work; Al Viro liked it enough that it's now in
his tree.

The new DAX code is then dribbled in over patches 7-11, split up by
functional area.  At each stage, the ext2-xip code is converted over to
the new DAX code.

Patches 12-18 delete the remnants of the old XIP code, and fix the things
in ext2 that Jan didn't like when he reviewed them for ext4 :-)

Patches 19 & 20 are the work to make ext4 use DAX.

Patch 21 is some final cleanup of references to the old XIP code, renaming
it all to DAX.

> Some explanation of why one would use ext4 instead of, say,
> suitably-modified ramfs/tmpfs/rd/etc?

ramfs and tmpfs really rely on the page cache.  They're not exactly
built for permanence either.  brd also relies on the page cache, and
there's a clear desire to use a filesystem instead of a block device
for all the usual reasons of access permissions, grow/shrink, etc.

Some people might want to use XFS instead of ext4.  We're starting with
ext4, but we've been keeping an eye on what other filesystems might want
to use.  btrfs isn't going to use the DAX code, but some of the other
pieces will probably come in handy.

There are also at least three people working on their own filesystems
specially designed for persistent memory.  I wish them all the best
... but I'd like to get this infrastructure into place.

> Performance testing results?

I haven't been running any performance tests.  What sort of performance
tests would be interesting for you to see?

> Carsten Otte wrote filemap_xip.c and may be a useful reviewer of this
> work.

I cc'd him on some earlier versions and didn't hear anything back.  It felt
rude to keep plying him with 20+ patches every month.

> All the patch subjects violate Documentation/SubmittingPatches
> section 15 ;)

errr ... which bit?  I used git format-patch to create them.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread Andrew Morton
On Tue, 26 Aug 2014 23:45:20 -0400 Matthew Wilcox  
wrote:

> One of the primary uses for NV-DIMMs is to expose them as a block device
> and use a filesystem to store files on the NV-DIMM.  While that works,
> it currently wastes memory and CPU time buffering the files in the page
> cache.  We have support in ext2 for bypassing the page cache, but it
> has some races which are unfixable in the current design.  This series
> of patches rewrites the underlying support, and adds support for direct
> access to ext4.

Sat down to read all this but I'm finding it rather unwieldy - it's
just a great blob of code.  Is there some overall
what-it-does-and-how-it-does-it roadmap?

Some explanation of why one would use ext4 instead of, say,
suitably-modified ramfs/tmpfs/rd/etc?

Performance testing results?

Carsten Otte wrote filemap_xip.c and may be a useful reviewer of this
work.

All the patch subjects violate Documentation/SubmittingPatches
section 15 ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread Andrew Morton
On Tue, 26 Aug 2014 23:45:20 -0400 Matthew Wilcox matthew.r.wil...@intel.com 
wrote:

 One of the primary uses for NV-DIMMs is to expose them as a block device
 and use a filesystem to store files on the NV-DIMM.  While that works,
 it currently wastes memory and CPU time buffering the files in the page
 cache.  We have support in ext2 for bypassing the page cache, but it
 has some races which are unfixable in the current design.  This series
 of patches rewrite the underlying support, and add support for direct
 access to ext4.

Sat down to read all this but I'm finding it rather unwieldy - it's
just a great blob of code.  Is there some overall
what-it-does-and-how-it-does-it roadmap?

Some explanation of why one would use ext4 instead of, say,
suitably-modified ramfs/tmpfs/rd/etc?

Performance testing results?

Carsten Otte wrote filemap_xip.c and may be a useful reviewer of this
work.

All the patch subjects violate Documentation/SubmittingPatches
section 15 ;)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread Matthew Wilcox
On Wed, Aug 27, 2014 at 01:06:13PM -0700, Andrew Morton wrote:
 On Tue, 26 Aug 2014 23:45:20 -0400 Matthew Wilcox 
 matthew.r.wil...@intel.com wrote:
 
  One of the primary uses for NV-DIMMs is to expose them as a block device
  and use a filesystem to store files on the NV-DIMM.  While that works,
  it currently wastes memory and CPU time buffering the files in the page
  cache.  We have support in ext2 for bypassing the page cache, but it
  has some races which are unfixable in the current design.  This series
  of patches rewrite the underlying support, and add support for direct
  access to ext4.
 
 Sat down to read all this but I'm finding it rather unwieldy - it's
 just a great blob of code.  Is there some overall
 what-it-does-and-how-it-does-it roadmap?

The overall goal is to map persistent memory / NV-DIMMs directly to
userspace.  We have that functionality in the XIP code, but the way
it's structured is unsuitable for filesystems like ext4 & XFS, and
it has some pretty ugly races.

Patches 1 & 3 are simply bug-fixes.  They should go in regardless of
the merits of anything else in this series.

Patch 2 changes the API for the direct_access block_device_operation so
it can report more than a single page at a time.  As the series evolved,
this work also included moving support for partitioning into the VFS
where it belongs, handling various error cases in the VFS and so on.
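For concreteness, the revised convention plausibly looks like this (a
sketch reconstructed from the description above, not necessarily the
exact signature in the series):

    /*
     * Sketch of the revised block_device_operations method.  A positive
     * return value is the number of bytes directly addressable at
     * @sector, which may be far more than one page; a negative value is
     * an errno.  @addr and @pfn receive the kernel virtual address and
     * page frame number of the start of the range.
     */
    long (*direct_access)(struct block_device *bdev, sector_t sector,
                          void **addr, unsigned long *pfn, long size);

Filesystems reach this through the bdev_direct_access() wrapper named in
the cover letter, which centralises the partition offset and error
handling.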

Patch 4 is an optimisation.  It's poor form to make userspace take two
faults for the same dereference.

Patch 5 gives us a VFS flag for the DAX property, which lets us get rid of
the get_xip_mem() method later on.
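A plausible shape for that flag (a sketch following the patch titles;
the flag value here is illustrative, not taken from the series):

    /*
     * Sketch: S_DAX is a per-inode state bit; IS_DAX() is the predicate
     * the VFS and filesystems test instead of calling get_xip_mem().
     */
    #define S_DAX		8192	/* Direct Access, bypassing the page cache */
    #define IS_DAX(inode)	((inode)->i_flags & S_DAX)

The read/write and fault paths can then branch on IS_DAX(inode) without
any filesystem-specific hook.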

Patch 6 is also prep work; Al Viro liked it enough that it's now in
his tree.

The new DAX code is then dribbled in over patches 7-11, split up by
functional area.  At each stage, the ext2-xip code is converted over to
the new DAX code.

Patches 12-18 delete the remnants of the old XIP code, and fix the things
in ext2 that Jan didn't like when he reviewed them for ext4 :-)

Patches 19 & 20 are the work to make ext4 use DAX.

Patch 21 is some final cleanup of references to the old XIP code, renaming
it all to DAX.

> Some explanation of why one would use ext4 instead of, say,
> suitably-modified ramfs/tmpfs/rd/etc?

ramfs and tmpfs really rely on the page cache.  They're not exactly
built for permanence either.  brd also relies on the page cache, and
there's a clear desire to use a filesystem instead of a block device
for all the usual reasons of access permissions, grow/shrink, etc.

Some people might want to use XFS instead of ext4.  We're starting with
ext4, but we've been keeping an eye on what other filesystems might want
to use.  btrfs isn't going to use the DAX code, but some of the other
pieces will probably come in handy.

There are also at least three people working on their own filesystems
specially designed for persistent memory.  I wish them all the best
... but I'd like to get this infrastructure into place.

> Performance testing results?

I haven't been running any performance tests.  What sort of performance
tests would be interesting for you to see?

> Carsten Otte wrote filemap_xip.c and may be a useful reviewer of this
> work.

I cc'd him on some earlier versions and didn't hear anything back.  It felt
rude to keep plying him with 20+ patches every month.

> All the patch subjects violate Documentation/SubmittingPatches
> section 15 ;)

errr ... which bit?  I used git format-patch to create them.


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread Christoph Lameter
On Wed, 27 Aug 2014, Andrew Morton wrote:

> Sat down to read all this but I'm finding it rather unwieldy - it's
> just a great blob of code.  Is there some overall
> what-it-does-and-how-it-does-it roadmap?

Matthew gave a talk about DAX at the kernel summit. It's a great feature
because it is another piece of the bare-metal hardware technology that
he is improving.

> Some explanation of why one would use ext4 instead of, say,
> suitably-modified ramfs/tmpfs/rd/etc?

The NVDIMM contents survive reboot and therefore ramfs and friends won't
work with it.

> Performance testing results?

This obviously avoids kernel buffering and therefore decreases kernel
overhead for non-volatile memory. It avoids useless duplication of data
from the non-volatile memory into regular RAM, and allows direct access
to non-volatile memory from user space in a controlled fashion.

I think this should be a priority item.


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread Andrew Morton
On Wed, 27 Aug 2014 16:22:20 -0500 (CDT) Christoph Lameter c...@linux.com 
wrote:

> > Some explanation of why one would use ext4 instead of, say,
> > suitably-modified ramfs/tmpfs/rd/etc?
>
> The NVDIMM contents survive reboot and therefore ramfs and friends won't
> work with it.

See "suitably modified".  Presumably this type of memory would need to
come from a particular page allocator zone.  ramfs would be unwieldy
due to its use of dentry/inode caches, but rd/etc should be feasible.

I dunno, I'm not proposing implementations - I'm asking obvious
questions.  Stuff which should have been addressed in the changelogs
before one even starts to read the code...


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread Andrew Morton
On Wed, 27 Aug 2014 17:12:50 -0400 Matthew Wilcox wi...@linux.intel.com wrote:

> On Wed, Aug 27, 2014 at 01:06:13PM -0700, Andrew Morton wrote:
> > On Tue, 26 Aug 2014 23:45:20 -0400 Matthew Wilcox
> > matthew.r.wil...@intel.com wrote:
> >
> > > One of the primary uses for NV-DIMMs is to expose them as a block device
> > > and use a filesystem to store files on the NV-DIMM.  While that works,
> > > it currently wastes memory and CPU time buffering the files in the page
> > > cache.  We have support in ext2 for bypassing the page cache, but it
> > > has some races which are unfixable in the current design.  This series
> > > of patches rewrites the underlying support, and adds support for direct
> > > access to ext4.
> >
> > Sat down to read all this but I'm finding it rather unwieldy - it's
> > just a great blob of code.  Is there some overall
> > what-it-does-and-how-it-does-it roadmap?
>
> The overall goal is to map persistent memory / NV-DIMMs directly to
> userspace.  We have that functionality in the XIP code, but the way
> it's structured is unsuitable for filesystems like ext4 & XFS, and
> it has some pretty ugly races.

When thinking about looking at the patchset I wonder things like how
does mmap work, in what situations does a page get COWed, how do we
handle partial pages at EOF, etc.  I guess that's all part of the
filemap_xip legacy, the details of which I've totally forgotten.

> Patches 1 & 3 are simply bug-fixes.  They should go in regardless of
> the merits of anything else in this series.
>
> Patch 2 changes the API for the direct_access block_device_operation so
> it can report more than a single page at a time.  As the series evolved,
> this work also included moving support for partitioning into the VFS
> where it belongs, handling various error cases in the VFS and so on.
>
> Patch 4 is an optimisation.  It's poor form to make userspace take two
> faults for the same dereference.
>
> Patch 5 gives us a VFS flag for the DAX property, which lets us get rid of
> the get_xip_mem() method later on.
>
> Patch 6 is also prep work; Al Viro liked it enough that it's now in
> his tree.
>
> The new DAX code is then dribbled in over patches 7-11, split up by
> functional area.  At each stage, the ext2-xip code is converted over to
> the new DAX code.
>
> Patches 12-18 delete the remnants of the old XIP code, and fix the things
> in ext2 that Jan didn't like when he reviewed them for ext4 :-)
>
> Patches 19 & 20 are the work to make ext4 use DAX.
>
> Patch 21 is some final cleanup of references to the old XIP code, renaming
> it all to DAX.

hrm.

> > Some explanation of why one would use ext4 instead of, say,
> > suitably-modified ramfs/tmpfs/rd/etc?
>
> ramfs and tmpfs really rely on the page cache.  They're not exactly
> built for permanence either.  brd also relies on the page cache, and
> there's a clear desire to use a filesystem instead of a block device
> for all the usual reasons of access permissions, grow/shrink, etc.
>
> Some people might want to use XFS instead of ext4.  We're starting with
> ext4, but we've been keeping an eye on what other filesystems might want
> to use.  btrfs isn't going to use the DAX code, but some of the other
> pieces will probably come in handy.
>
> There are also at least three people working on their own filesystems
> specially designed for persistent memory.  I wish them all the best
> ... but I'd like to get this infrastructure into place.

This is the sort of thing which first-timers (this one at least) like
to see in [0/n].

> > Performance testing results?
>
> I haven't been running any performance tests.  What sort of performance
> tests would be interesting for you to see?

fs benchmarks?  `dd' would be a good start ;)

I assume (because I wasn't told!) that there are two objectives here:

1) reduce memory consumption by not maintaining pagecache and
2) reduce CPU cost by avoiding the double-copies.

These things are pretty easily quantified.  And really they must be
quantified as part of the developer testing, because if you find
they've worsened then holy cow, what went wrong.
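For instance, a crude dd-style throughput check could be as simple as
the following harness (hypothetical, not from this series; run it once
against a file on a DAX mount and once against the same filesystem on a
normal block device):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    #define BUF_SIZE (1024 * 1024)		/* 1 MiB per write() */
    #define TOTAL    (1024L * BUF_SIZE)		/* 1 GiB in total */

    int main(int argc, char **argv)
    {
    	char *buf = malloc(BUF_SIZE);
    	struct timespec t0, t1;
    	long written = 0;
    	int fd;

    	if (argc < 2 || !buf)
    		return 1;
    	fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    	if (fd < 0)
    		return 1;
    	memset(buf, 0x5a, BUF_SIZE);

    	clock_gettime(CLOCK_MONOTONIC, &t0);
    	while (written < TOTAL) {
    		ssize_t n = write(fd, buf, BUF_SIZE);
    		if (n <= 0)
    			return 1;
    		written += n;
    	}
    	fsync(fd);
    	clock_gettime(CLOCK_MONOTONIC, &t1);

    	/* report sustained write bandwidth */
    	double secs = (t1.tv_sec - t0.tv_sec) +
    		      (t1.tv_nsec - t0.tv_nsec) / 1e9;
    	printf("wrote %ld MiB in %.2fs (%.1f MB/s)\n",
    	       written >> 20, secs, written / secs / 1e6);
    	return 0;
    }

With DAX the write path should skip the page cache, so objectives 1 and
2 show up directly as lower memory usage and higher MB/s.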

> > Carsten Otte wrote filemap_xip.c and may be a useful reviewer of this
> > work.
>
> I cc'd him on some earlier versions and didn't hear anything back.  It felt
> rude to keep plying him with 20+ patches every month.

OK.

> > All the patch subjects violate Documentation/SubmittingPatches
> > section 15 ;)
>
> errr ... which bit?  I used git format-patch to create them.

None of the patch titles identify the subsystem(s) which they're
hitting.  eg, "Introduce IS_DAX(inode)" is an ext2 patch, but nobody
would know that from browsing the titles.


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread One Thousand Gnomes
On Wed, 27 Aug 2014 14:30:55 -0700
Andrew Morton a...@linux-foundation.org wrote:

> On Wed, 27 Aug 2014 16:22:20 -0500 (CDT) Christoph Lameter c...@linux.com
> wrote:
>
> > > Some explanation of why one would use ext4 instead of, say,
> > > suitably-modified ramfs/tmpfs/rd/etc?
> >
> > The NVDIMM contents survive reboot and therefore ramfs and friends won't
> > work with it.
>
> See "suitably modified".  Presumably this type of memory would need to
> come from a particular page allocator zone.  ramfs would be unwieldy
> due to its use of dentry/inode caches, but rd/etc should be feasible.

If you took one of the existing ramfs types you would then need to

- make it persistent in its storage, and put all the objects in the store
- add journalling for failures mid-transaction. Your DIMM may retain its
  bits, but if your CPU resets mid fs operation it's got to be recovered
- write an fsck tool for it
- validate it

at which point it's probably turned into ext4 8)

It's persistent but that doesn't solve the 'my box crashed' problem. 

Alan


Re: [PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-27 Thread Andy Lutomirski
On 08/27/2014 02:46 PM, Andrew Morton wrote:
> I assume (because I wasn't told!) that there are two objectives here:
>
> 1) reduce memory consumption by not maintaining pagecache and
> 2) reduce CPU cost by avoiding the double-copies.
>
> These things are pretty easily quantified.  And really they must be
> quantified as part of the developer testing, because if you find
> they've worsened then holy cow, what went wrong.
>

There are two more huge ones:

3) Writes via mmap are immediately durable (or at least they're durable
after a *very* lightweight flush).

4) No page faults ever once a page is writable (I hope -- I'm not sure
whether this series actually achieves that goal).

A note on #3: there is ongoing work to enable write-through memory for
things like this.  Once that's done, then writes via mmap might actually
be synchronously durable, depending on chipset details.
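To make #3 concrete: on x86 the "very lightweight flush" amounts to
flushing the dirtied cachelines and issuing a fence.  A minimal
user-space sketch (assuming the buffer is mmap()ed from a DAX file
backed by a real NV-DIMM; pmem_flush() is a hypothetical helper, not
part of this series):

    #include <stddef.h>
    #include <stdint.h>
    #include <emmintrin.h>	/* _mm_clflush(), _mm_sfence() */

    #define CACHELINE 64

    /*
     * Flush every cacheline covering [addr, addr + len) out of the CPU
     * caches, then fence.  Whether the stores are durable at that point
     * still depends on the platform draining its memory controller
     * queues on power failure - the "chipset details" mentioned above.
     */
    static void pmem_flush(const void *addr, size_t len)
    {
    	uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
    	uintptr_t end = (uintptr_t)addr + len;

    	for (; p < end; p += CACHELINE)
    		_mm_clflush((const void *)p);
    	_mm_sfence();
    }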

--Andy


[PATCH v10 00/21] Support ext4 on NV-DIMMs

2014-08-26 Thread Matthew Wilcox
One of the primary uses for NV-DIMMs is to expose them as a block device
and use a filesystem to store files on the NV-DIMM.  While that works,
it currently wastes memory and CPU time buffering the files in the page
cache.  We have support in ext2 for bypassing the page cache, but it
has some races which are unfixable in the current design.  This series
of patches rewrites the underlying support, and adds support for direct
access to ext4.

Note that patch 6/21 has been included in
https://git.kernel.org/cgit/linux/kernel/git/viro/vfs.git/log/?h=for-next-candidate

This iteration of the patchset rebases to 3.17-rc2, changes the page fault
locking, fixes a couple of bugs and makes a few other minor changes.

 - Move the calculation of the maximum size available at the requested
   location from the ->direct_access implementations to bdev_direct_access()
 - Fix a comment typo (Ross Zwisler)
 - Check that the requested length is positive in bdev_direct_access().  If
   it is not, assume that it's an errno, and just return it (see the sketch
   after this list).
 - Fix some whitespace issues flagged by checkpatch
 - Added the Acked-by responses from Kirill that I forgot in the last round
 - Added myself to MAINTAINERS for DAX
 - Fixed compilation with !CONFIG_DAX (Vishal Verma)
 - Revert the locking in the page fault handler back to an earlier version.
   If we hit the race that we were trying to protect against, we will leave
   blocks allocated past the end of the file.  They will be removed on file
   removal, the next truncate, or fsck.
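For the bdev_direct_access() bullet above, a sketch of how that check
might look (my reconstruction from this changelog; the alignment and
partition handling shown are assumptions, not necessarily the exact
patch):

    /*
     * Sketch of bdev_direct_access(): translate a request on a block
     * device into a call to the driver's ->direct_access() method.  A
     * negative @size is assumed to be an errno the caller is passing
     * through, and is returned unchanged.
     */
    long bdev_direct_access(struct block_device *bdev, sector_t sector,
    			void **addr, unsigned long *pfn, long size)
    {
    	const struct block_device_operations *ops = bdev->bd_disk->fops;
    	long avail;

    	if (size < 0)
    		return size;		/* propagate an earlier errno */
    	if (!ops->direct_access)
    		return -EOPNOTSUPP;
    	if (sector % (PAGE_SIZE / 512))
    		return -EINVAL;		/* must start on a page boundary */

    	sector += get_start_sect(bdev);	/* apply the partition offset */
    	avail = ops->direct_access(bdev, sector, addr, pfn, size);
    	return min(avail, size);
    }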


Matthew Wilcox (20):
  axonram: Fix bug in direct_access
  Change direct_access calling convention
  Fix XIP fault vs truncate race
  Allow page fault handlers to perform the COW
  Introduce IS_DAX(inode)
  Add copy_to_iter(), copy_from_iter() and iov_iter_zero()
  Replace XIP read and write with DAX I/O
  Replace ext2_clear_xip_target with dax_clear_blocks
  Replace the XIP page fault handler with the DAX page fault handler
  Replace xip_truncate_page with dax_truncate_page
  Replace XIP documentation with DAX documentation
  Remove get_xip_mem
  ext2: Remove ext2_xip_verify_sb()
  ext2: Remove ext2_use_xip
  ext2: Remove xip.c and xip.h
  Remove CONFIG_EXT2_FS_XIP and rename CONFIG_FS_XIP to CONFIG_FS_DAX
  ext2: Remove ext2_aops_xip
  Get rid of most mentions of XIP in ext2
  xip: Add xip_zero_page_range
  brd: Rename XIP to DAX

Ross Zwisler (1):
  ext4: Add DAX functionality

 Documentation/filesystems/Locking  |   3 -
 Documentation/filesystems/dax.txt  |  91 +++
 Documentation/filesystems/ext4.txt |   2 +
 Documentation/filesystems/xip.txt  |  68 -
 MAINTAINERS                        |   6 +
 arch/powerpc/sysdev/axonram.c      |  19 +-
 drivers/block/Kconfig              |  13 +-
 drivers/block/brd.c                |  26 +-
 drivers/s390/block/dcssblk.c       |  21 +-
 fs/Kconfig                         |  21 +-
 fs/Makefile                        |   1 +
 fs/block_dev.c                     |  40 +++
 fs/dax.c                           | 497 +
 fs/exofs/inode.c                   |   1 -
 fs/ext2/Kconfig                    |  11 -
 fs/ext2/Makefile                   |   1 -
 fs/ext2/ext2.h                     |  10 +-
 fs/ext2/file.c                     |  45 +++-
 fs/ext2/inode.c                    |  38 +--
 fs/ext2/namei.c                    |  13 +-
 fs/ext2/super.c                    |  53 ++--
 fs/ext2/xip.c                      |  91 ---
 fs/ext2/xip.h                      |  26 --
 fs/ext4/ext4.h                     |   6 +
 fs/ext4/file.c                     |  49 +++-
 fs/ext4/indirect.c                 |  18 +-
 fs/ext4/inode.c                    |  51 ++--
 fs/ext4/namei.c                    |  10 +-
 fs/ext4/super.c                    |  39 ++-
 fs/open.c                          |   5 +-
 include/linux/blkdev.h             |   6 +-
 include/linux/fs.h                 |  49 +++-
 include/linux/mm.h                 |   1 +
 include/linux/uio.h                |   3 +
 mm/Makefile                        |   1 -
 mm/fadvise.c                       |   6 +-
 mm/filemap.c                       |   6 +-
 mm/filemap_xip.c                   | 483 ---
 mm/iov_iter.c                      | 237 --
 mm/madvise.c                       |   2 +-
 mm/memory.c                        |  33 ++-
 41 files changed, 1229 insertions(+), 873 deletions(-)
 create mode 100644 Documentation/filesystems/dax.txt
 delete mode 100644 Documentation/filesystems/xip.txt
 create mode 100644 fs/dax.c
 delete mode 100644 fs/ext2/xip.c
 delete mode 100644 fs/ext2/xip.h
 delete mode 100644 mm/filemap_xip.c

-- 
2.0.0
