Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-19 Thread Jan Kara
On Wed 18-01-17 21:56:58, Verma, Vishal L wrote: > On Wed, 2017-01-18 at 13:32 -0800, Dan Williams wrote: > > On Wed, Jan 18, 2017 at 1:02 PM, Darrick J. Wong > > wrote: > > > On Wed, Jan 18, 2017 at 03:39:17PM -0500, Jeff Moyer wrote: > > > > Jan Kara

[PATCH v2 1/2] xfs: test per-inode DAX flag by IO

2017-01-19 Thread Xiong Zhou
In a DAX mountpoint, do IO between files with and without the DAX per-inode flag. We do mmap and O_DIRECT read/write IO in this case. Then test again on the same device without the dax mount option. Add helper _require_scratch_dax to make sure we can test the DAX feature on SCRATCH_DEV. Add mmap dio test

[PATCH v2 0/2] mmap dio and DAX

2017-01-19 Thread Xiong Zhou
v2 : Merge helper function changes into the first patch; Rewrite _require_dax, check options for sure; Print msg in t_mmap_dio.c to show which test is going wrong; Empty mount options and check after mount to ensure we won't mount with the wrong option; Remove unnecessary leading underscore and

[PATCH v3 11/12] mm: enable section-unaligned devm_memremap_pages()

2017-01-19 Thread Dan Williams
Teach devm_memremap_pages() about the new sub-section capabilities of arch_{add,remove}_memory(). Cc: Michal Hocko Cc: Toshi Kani Cc: Andrew Morton Cc: Logan Gunthorpe Cc: Stephen Bates

[PATCH v3 02/12] mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups

2017-01-19 Thread Dan Williams
devm_memremap_pages() records mapped ranges in pgmap_radix with an entry per section's worth of memory (128MB). The key for each of those entries is a section number. This leads to false positives when devm_memremap_pages() is passed a section-unaligned range as lookups in the misalignment fail

[PATCH v3 07/12] mm: fix register_new_memory() zone type detection

2017-01-19 Thread Dan Williams
In preparation for sub-section memory hotplug support, remove a dependency on ->section_mem_map being populated. In SPARSEMEM_VMEMMAP=y configurations pfn_to_page() does not use ->section_mem_map. The sub-section hotplug support relies on this fact and skips initializing it. Without

[PATCH v3 04/12] mm: introduce common definitions for the size and mask of a section

2017-01-19 Thread Dan Williams
Up-level the local section size and mask from kernel/memremap.c to global definitions. These will be used by the new sub-section hotplug support. Cc: Michal Hocko Cc: Vlastimil Babka Cc: Johannes Weiner Cc: Logan Gunthorpe

[PATCH v3 05/12] mm: cleanup sparse_init_one_section() return value

2017-01-19 Thread Dan Williams
We mark and check that the section is present under a spin_lock() in sparse_add_one_section(), so the lock ensures it will not change between those 2 events. Also, we do not check the -EBUSY return value in sparse_init(). Just make sparse_init_one_section() return void and clean up the error

[PATCH v3 00/12] mm: sub-section memory hotplug support

2017-01-19 Thread Dan Williams
Changes since v2 [1]: 1/ Fixed a bug inserting multi-order entries into pgmap_radix. The insert index needs to be 'order' aligned. 2/ Fixed a __meminit section mismatch warning for section_activate() 3/ Forward ported to v4.10-rc4 [1]: https://lwn.net/Articles/708627/ --- The initial
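A minimal sketch of what "order aligned" means for the insert index in item 1/, written as user-space C rather than the kernel's radix-tree code. The helper name order_align_index() is hypothetical and is not taken from the patch; it only illustrates rounding an index down to a multiple of 2^order.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch): round 'index' down to the
 * nearest multiple of 2^order, which is the alignment a multi-order
 * entry covering 2^order slots would start at.
 */
unsigned long order_align_index(unsigned long index, unsigned int order)
{
	/* The shift is only defined for order smaller than the bit width. */
	assert(order < sizeof(unsigned long) * 8);
	return index & ~((1UL << order) - 1);
}
```

For example, with order 2 an index of 13 is rounded down to 12, while an order of 0 leaves any index unchanged.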

[PATCH v3 08/12] mm: convert kmalloc_section_memmap() to populate_section_memmap()

2017-01-19 Thread Dan Williams
Allow sub-section sized ranges to be added to the memmap. populate_section_memmap() takes an explicit pfn range rather than assuming a full section, and those parameters are plumbed all the way through to vmemmap_populate(). There should be no sub-section in current code. New warnings are added to

[PATCH v3 01/12] mm: fix type width of section to/from pfn conversion macros

2017-01-19 Thread Dan Williams
section_nr_to_pfn() will silently accept an argument that is too small to contain a pfn. Cast the argument to an unsigned long, similar to PFN_PHYS(). Fix up pfn_to_section_nr() in the same way. This was discovered in __add_pages() when converting it to use a signed integer for the loop
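A minimal user-space sketch of the width problem described in this patch, not the kernel macros themselves. It assumes a 64-bit unsigned long and a PFN_SECTION_SHIFT of 15 (the x86_64 default of SECTION_SIZE_BITS 27 minus PAGE_SHIFT 12); the _narrow and _wide macro names and the two wrapper functions are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: LP64 target, so widening to unsigned long gives 64 bits. */
static_assert(sizeof(unsigned long) == 8, "sketch assumes 64-bit long");

/* Assumed x86_64 defaults: SECTION_SIZE_BITS (27) - PAGE_SHIFT (12). */
#define PFN_SECTION_SHIFT 15

/* Hypothetical name: the shift is evaluated at the argument's own width. */
#define section_nr_to_pfn_narrow(sec) ((sec) << PFN_SECTION_SHIFT)

/* Hypothetical name: widen first, in the style of PFN_PHYS(). */
#define section_nr_to_pfn_wide(sec) \
	((unsigned long)(sec) << PFN_SECTION_SHIFT)

/* A 32-bit section number shifted in 32 bits silently loses high bits. */
unsigned long pfn_from_narrow(uint32_t section_nr)
{
	return section_nr_to_pfn_narrow(section_nr);
}

/* The same section number widened before the shift keeps every bit. */
unsigned long pfn_from_wide(uint32_t section_nr)
{
	return section_nr_to_pfn_wide(section_nr);
}
```

Under these assumptions, section number 1 << 17 wraps to pfn 0 in the narrow form, while the widened form yields 1UL << 32.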

[PATCH v3 09/12] mm: prepare for hot-{add, remove} of sub-section ranges

2017-01-19 Thread Dan Williams
Prepare the memory hot-{add,remove} paths for handling sub-section ranges by plumbing the starting page frame and number of pages being handled through arch_{add,remove}_memory() to sparse_{add,remove}_one_section(). This is simply plumbing, small cleanups, and some identifier renames. No

[PATCH v3 03/12] mm: introduce struct mem_section_usage to track partial population of a section

2017-01-19 Thread Dan Williams
'struct mem_section_usage' combines the existing 'pageblock_flags' bitmap with a new 'map_active' bitmap. The new bitmap enables the memory hot{plug,remove} implementation to act on incremental sub-divisions of a section. The primary impetus for this functionality is to support platforms that mix

Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-19 Thread Dan Williams
On Thu, Jan 19, 2017 at 10:59 AM, Vishal Verma wrote: > On 01/19, Jan Kara wrote: >> On Wed 18-01-17 21:56:58, Verma, Vishal L wrote: >> > On Wed, 2017-01-18 at 13:32 -0800, Dan Williams wrote: >> > > On Wed, Jan 18, 2017 at 1:02 PM, Darrick J. Wong >> > >

Re: [PATCH v2 1/2] xfs: test per-inode DAX flag by IO

2017-01-19 Thread Ross Zwisler
On Thu, Jan 19, 2017 at 06:13:57PM +0800, Xiong Zhou wrote: > In a DAX mountpoint, do IO between files with and > without DAX per-inode flag. We do mmap and O_DIRECT > read/write IO in this case. Then test again in the > same device without dax mountoption. > > Add helper _require_scratch_dax to

Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-19 Thread Vishal Verma
On 01/18, Jan Kara wrote: > On Tue 17-01-17 15:37:05, Vishal Verma wrote: > > I do mean that in the filesystem, for every IO, the badblocks will be > > checked. Currently, the pmem driver does this, and the hope is that the > > filesystem can do a better job at it. The driver unconditionally

Re: [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-19 Thread Verma, Vishal L
On Tue, 2017-01-17 at 18:01 -0800, Andiry Xu wrote: > On Tue, Jan 17, 2017 at 4:16 PM, Andreas Dilger > wrote: > > On Jan 17, 2017, at 3:15 PM, Andiry Xu wrote: > > > On Tue, Jan 17, 2017 at 1:35 PM, Vishal Verma wrote: > > >

Re: [PATCH v3 06/12] mm: track active portions of a section at boot

2017-01-19 Thread Andrew Morton
On Thu, 19 Jan 2017 14:07:13 -0800 Dan Williams wrote: > Prepare for hot{plug,remove} of sub-ranges of a section by tracking a > section active bitmask, each bit representing 2MB (SECTION_SIZE (128M) / > map_active bitmask length (64)). > > ---
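The arithmetic in the quoted description (a 128M section divided across a 64-bit mask gives 2MB per bit) can be sketched as follows. This is an illustration in user-space C, assuming 4K pages; map_active_mask() and the constants below are hypothetical names and are not the patch's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4K pages and 128M sections, as in the quoted description. */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 27
#define PAGES_PER_SECTION (1UL << (SECTION_SIZE_BITS - PAGE_SHIFT)) /* 32768 */
#define MAP_ACTIVE_BITS   64
#define PAGES_PER_BIT     (PAGES_PER_SECTION / MAP_ACTIVE_BITS)     /* 512 = 2MB */

/*
 * Hypothetical helper: return the bits of a 64-bit section mask touched by
 * the range [pfn, pfn + nr_pages). The range must stay within one section.
 */
uint64_t map_active_mask(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long start = pfn % PAGES_PER_SECTION; /* offset in section */

	assert(nr_pages > 0 && start + nr_pages <= PAGES_PER_SECTION);

	unsigned int first = start / PAGES_PER_BIT;
	unsigned int last = (start + nr_pages - 1) / PAGES_PER_BIT;
	unsigned int count = last - first + 1;

	/* Shifting a 64-bit value by 64 is undefined, so special-case it. */
	uint64_t bits = (count == MAP_ACTIVE_BITS)
		? UINT64_MAX
		: ((UINT64_C(1) << count) - 1);

	return bits << first;
}
```

A range that straddles a 2MB boundary, even by a single page on each side, sets both neighbouring bits, and a full section sets all 64.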

Re: [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-19 Thread Verma, Vishal L
On Tue, 2017-01-17 at 17:58 -0800, Andiry Xu wrote: > On Tue, Jan 17, 2017 at 3:51 PM, Vishal Verma wrote: > > On 01/17, Andiry Xu wrote: > > > > > > > > > > > > > > > > The pmem_do_bvec() read logic is like this: > > > > > > > > > > pmem_do_bvec() > > > > >

[PATCH] ndctl: add a BTT check utility

2017-01-19 Thread Vishal Verma
Add the check-namespace command to ndctl. This will check the BTT metadata layout for the given namespace, and if requested, correct any errors found. Not all metadata corruption is detectable or fixable. Signed-off-by: Vishal Verma --- Documentation/Makefile.am

[PATCH 05/13] x86, dax: replace clear_pmem() with open coded memset + dax_ops->flush

2017-01-19 Thread Dan Williams
The clear_pmem() helper simply combines a memset() plus a cache flush. Now that the flush routine is optionally provided by the dax device driver we can avoid unnecessary cache management on dax devices fronting volatile memory. With clear_pmem() gone we can follow on with a patch to make pmem

[PATCH 03/13] x86, dax, pmem: introduce 'copy_from_iter' dax operation

2017-01-19 Thread Dan Williams
The direct-I/O write path for a pmem device must ensure that data is flushed to a power-fail safe zone when the operation is complete. However, other dax capable block devices, like brd, do not have this requirement. Introduce a 'copy_from_iter' dax operation so that pmem can inject cache

[PATCH 02/13] block, dax: introduce dax_operations

2017-01-19 Thread Dan Williams
Prepare for the removal of memcpy_to_pmem() and copy_from_iter_pmem() by introducing dax_ops. This allows for driver specific overrides for the routines that transfer data to a dax capable block device. Cc: Jan Kara Cc: Jens Axboe Cc: Jeff

[PATCH 08/13] x86, libnvdimm, dax: stop abusing __copy_user_nocache

2017-01-19 Thread Dan Williams
The pmem and nd_blk drivers both have need to copy data through the cpu cache to persistent memory. To date they have been abusing __copy_user_nocache through the memcpy_to_pmem abstraction, but this has several problems: * __copy_user_nocache does not guarantee that it will always avoid the

[PATCH 04/13] dax, pmem: introduce an optional 'flush' dax operation

2017-01-19 Thread Dan Williams
Filesystem-DAX flushes caches whenever it writes to the address returned through dax_map_atomic() and when writing back dirty radix entries. That flushing is only required in the pmem case, so add a dax operation to allow pmem to take this extra action, but skip it for other dax capable

[PATCH 07/13] x86, libnvdimm, pmem: move arch_invalidate_pmem() to libnvdimm

2017-01-19 Thread Dan Williams
Kill this globally defined wrapper and move to libnvdimm so that we can ultimately remove the public pmem api. Cc: Jan Kara Cc: Jeff Moyer Cc: Ingo Molnar Cc: Christoph Hellwig Cc: "H. Peter Anvin"

[PATCH 10/13] libnvdimm, pmem: fix persistence warning

2017-01-19 Thread Dan Williams
The pmem driver assumes if platform firmware describes the memory devices associated with a persistent memory range and CONFIG_ARCH_HAS_PMEM_API=y that it has all the mechanism necessary to flush data to a power-fail safe zone. We warn if the firmware does not describe memory devices, but we also

mmap dio write failure

2017-01-19 Thread Xiong Zhou
Hi, At first, I am not sure whether this is an issue. mmap a file in a DAX mountpoint, open another file in a non-DAX mountpoint with O_DIRECT, write the mapped area to the other file. This write succeeds on a pmem ramdisk (memmap=2G!20G like). This write fails (Bad address) on nvdimm pmem devices.

Re: [Lsf-pc] [LSF/MM TOPIC] Badblocks checking/representation in filesystems

2017-01-19 Thread Jeff Moyer
Hi, Slava, Slava Dubeyko writes: >>The data is lost, that's why you're getting an ECC. It's tantamount >>to -EIO for a disk block access. > > I see the three possible cases here: > (1) bad block has been discovered (no remap, no recovering) -> data is >> lost; -EIO

Re: mmap dio write failure

2017-01-19 Thread Dan Williams
On Thu, Jan 19, 2017 at 8:40 PM, Xiong Zhou wrote: > Hi, > > At first, I am not sure whether this is an issue. > > mmap a file in a DAX mountpoint, open another file > in a non-DAX mountpoint with O_DIRECT, write the > mapped area to the other file. > > This write Success on

[PATCH v4 0/2] mmap dio and DAX

2017-01-19 Thread Xiong Zhou
common/rc : requires SCRATCH_DEV support DAX src/t_mmap_dio.c : intro mmap and O_DIRECT rw through files tests/generic/405 : IO between DAX/non-DAX mountpoints tests/xfs/138 : IO between DAX/non-DAX xfs files(per-inode flag) v2 : Merge helper function changes into the first patch;

[PATCH v4 2/2] generic: test mmap io through DAX and non-DAX

2017-01-19 Thread Xiong Zhou
Mount TEST_DEV as non-DAX, SCRATCH_DEV as DAX, then do some IO between them. In this case we use mmap and dio/buffered IO read/write test programme. Signed-off-by: Xiong Zhou --- tests/generic/405 | 119 ++