[PATCH v6 04/17] ext2: remove support for DAX PMD faults

2016-10-12 Thread Ross Zwisler
DAX PMD support was added via the following commit:

commit e7b1ea2ad658 ("ext2: huge page fault support")

I believe this path to be untested as ext2 doesn't reliably provide block
allocations that are aligned to 2MiB.  In my testing I've been unable to
get ext2 to actually fault in a PMD.  It always fails with a "pfn
unaligned" message because the sector returned by ext2_get_block() isn't
aligned.
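
(Illustration only, not part of the patch: the check that trips is the PMD
alignment test in the old dax_pmd_fault() in fs/dax.c; the fallback path is
paraphrased here.)

	/* PG_PMD_COLOUR is the low-bit mask of a PMD-sized run of pages */
	if (pfn_t_to_pfn(dax.pfn) & PG_PMD_COLOUR) {
		dax_pmd_dbg(&bh, address, "pfn unaligned");
		return VM_FAULT_FALLBACK;	/* paraphrased fallback */
	}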

I've tried various settings for the "stride" and "stripe_width" extended
options to mkfs.ext2, without any luck.

Since we can't reliably get PMDs, remove support so that we don't have an
untested code path that we may someday traverse when we happen to get an
aligned block allocation.  This should also make 4k DAX faults in ext2 a
bit faster since they will no longer have to call the PMD fault handler
only to get a response of VM_FAULT_FALLBACK.

Signed-off-by: Ross Zwisler 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
---
 fs/ext2/file.c | 29 ++---
 1 file changed, 6 insertions(+), 23 deletions(-)

diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 0ca363d..0f257f8 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -107,27 +107,6 @@ static int ext2_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
return ret;
 }
 
-static int ext2_dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
-   pmd_t *pmd, unsigned int flags)
-{
-   struct inode *inode = file_inode(vma->vm_file);
-   struct ext2_inode_info *ei = EXT2_I(inode);
-   int ret;
-
-   if (flags & FAULT_FLAG_WRITE) {
-   sb_start_pagefault(inode->i_sb);
-   file_update_time(vma->vm_file);
-   }
-   down_read(&ei->dax_sem);
-
-   ret = dax_pmd_fault(vma, addr, pmd, flags, ext2_get_block);
-
-   up_read(&ei->dax_sem);
-   if (flags & FAULT_FLAG_WRITE)
-   sb_end_pagefault(inode->i_sb);
-   return ret;
-}
-
 static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
struct vm_fault *vmf)
 {
@@ -154,7 +133,11 @@ static int ext2_dax_pfn_mkwrite(struct vm_area_struct *vma,
 
 static const struct vm_operations_struct ext2_dax_vm_ops = {
.fault  = ext2_dax_fault,
-   .pmd_fault  = ext2_dax_pmd_fault,
+   /*
+* .pmd_fault is not supported for DAX because allocation in ext2
+* cannot be reliably aligned to huge page sizes and so pmd faults
+* will always fail and fall back to regular faults.
+*/
.page_mkwrite   = ext2_dax_fault,
.pfn_mkwrite= ext2_dax_pfn_mkwrite,
 };
@@ -166,7 +149,7 @@ static int ext2_file_mmap(struct file *file, struct vm_area_struct *vma)
 
file_accessed(file);
vma->vm_ops = &ext2_dax_vm_ops;
-   vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
+   vma->vm_flags |= VM_MIXEDMAP;
return 0;
 }
 #else
-- 
2.9.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v6 01/17] ext4: allow DAX writeback for hole punch

2016-10-12 Thread Ross Zwisler
Currently when doing a DAX hole punch with ext4 we fail to do a writeback.
This is because the logic around filemap_write_and_wait_range() in
ext4_punch_hole() only looks for dirty page cache pages in the radix tree,
not for dirty DAX exceptional entries.

Signed-off-by: Ross Zwisler 
Reviewed-by: Jan Kara 
Cc: 
---
 fs/ext4/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3131747..0900cb4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3890,7 +3890,7 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
 }
 
 /*
- * ext4_punch_hole: punches a hole in a file by releaseing the blocks
+ * ext4_punch_hole: punches a hole in a file by releasing the blocks
  * associated with the given offset and length
  *
  * @inode:  File inode
@@ -3919,7 +3919,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
 * Write out all dirty pages to avoid race conditions
 * Then release them.
 */
-   if (mapping->nrpages && mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
+   if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
ret = filemap_write_and_wait_range(mapping, offset,
   offset + length - 1);
if (ret)
-- 
2.9.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v6 13/17] dax: move RADIX_DAX_* defines to dax.h

2016-10-12 Thread Ross Zwisler
The RADIX_DAX_* defines currently mostly live in fs/dax.c, with just
RADIX_DAX_ENTRY_LOCK being in include/linux/dax.h so it can be used in
mm/filemap.c.  When we add PMD support, though, mm/filemap.c will also need
access to the RADIX_DAX_PTE type so it can properly construct a 4k sized
empty entry.

Instead of shifting the defines between dax.c and dax.h as they are
individually used in other code, just move them wholesale to dax.h so
they'll be available when we need them.
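
(Illustration only, not part of the patch: a worked example of how these
defines pack a DAX radix tree entry, assuming RADIX_TREE_EXCEPTIONAL_SHIFT
== 2 and RADIX_TREE_EXCEPTIONAL_ENTRY == 2 -- check radix-tree.h in your
tree before relying on the exact numbers.)

	/*
	 * RADIX_DAX_SHIFT == 5 and RADIX_DAX_PTE == 1 << 3, so for
	 * sector 2048:
	 *
	 *   RADIX_DAX_ENTRY(2048, false)
	 *	== (2048UL << 5) | RADIX_DAX_PTE | RADIX_TREE_EXCEPTIONAL_ENTRY
	 *	== 0x10000 | 0x8 | 0x2 == 0x1000a
	 *
	 *   RADIX_DAX_SECTOR((void *)0x1000a) == 0x1000a >> 5 == 2048
	 *   RADIX_DAX_TYPE((void *)0x1000a)   == RADIX_DAX_PTE
	 */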

Signed-off-by: Ross Zwisler 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
---
 fs/dax.c| 14 --
 include/linux/dax.h | 15 ++-
 2 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 6edd89b..c45cc4d 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -34,20 +34,6 @@
 #include 
 #include "internal.h"
 
-/*
- * We use lowest available bit in exceptional entry for locking, other two
- * bits to determine entry type. In total 3 special bits.
- */
-#define RADIX_DAX_SHIFT (RADIX_TREE_EXCEPTIONAL_SHIFT + 3)
-#define RADIX_DAX_PTE (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 1))
-#define RADIX_DAX_PMD (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 2))
-#define RADIX_DAX_TYPE_MASK (RADIX_DAX_PTE | RADIX_DAX_PMD)
-#define RADIX_DAX_TYPE(entry) ((unsigned long)entry & RADIX_DAX_TYPE_MASK)
-#define RADIX_DAX_SECTOR(entry) (((unsigned long)entry >> RADIX_DAX_SHIFT))
-#define RADIX_DAX_ENTRY(sector, pmd) ((void *)((unsigned long)sector << \
-   RADIX_DAX_SHIFT | (pmd ? RADIX_DAX_PMD : RADIX_DAX_PTE) | \
-   RADIX_TREE_EXCEPTIONAL_ENTRY))
-
 /* We choose 4096 entries - same as per-zone page wait tables */
 #define DAX_WAIT_TABLE_BITS 12
 #define DAX_WAIT_TABLE_ENTRIES (1 << DAX_WAIT_TABLE_BITS)
diff --git a/include/linux/dax.h b/include/linux/dax.h
index a3dfee4..e9ea78c 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -8,8 +8,21 @@
 
 struct iomap_ops;
 
-/* We use lowest available exceptional entry bit for locking */
+/*
+ * We use lowest available bit in exceptional entry for locking, other two
+ * bits to determine entry type. In total 3 special bits.
+ */
+#define RADIX_DAX_SHIFT (RADIX_TREE_EXCEPTIONAL_SHIFT + 3)
 #define RADIX_DAX_ENTRY_LOCK (1 << RADIX_TREE_EXCEPTIONAL_SHIFT)
+#define RADIX_DAX_PTE (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 1))
+#define RADIX_DAX_PMD (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 2))
+#define RADIX_DAX_TYPE_MASK (RADIX_DAX_PTE | RADIX_DAX_PMD)
+#define RADIX_DAX_TYPE(entry) ((unsigned long)entry & RADIX_DAX_TYPE_MASK)
+#define RADIX_DAX_SECTOR(entry) (((unsigned long)entry >> RADIX_DAX_SHIFT))
+#define RADIX_DAX_ENTRY(sector, pmd) ((void *)((unsigned long)sector << \
+   RADIX_DAX_SHIFT | (pmd ? RADIX_DAX_PMD : RADIX_DAX_PTE) | \
+   RADIX_TREE_EXCEPTIONAL_ENTRY))
+
 
 ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
struct iomap_ops *ops);
-- 
2.9.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v6 12/17] dax: dax_iomap_fault() needs to call iomap_end()

2016-10-12 Thread Ross Zwisler
Currently iomap_end() doesn't do anything for DAX page faults for either ext2
or XFS.  ext2_iomap_end() just checks for a write underrun, and
xfs_file_iomap_end() checks to see if it needs to finish a delayed
allocation.  However, in the future iomap_end() calls might be needed to
make sure we have balanced allocations, locks, etc.  So, add calls to
iomap_end() with appropriate error handling to dax_iomap_fault().
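
(Sketch only, not the exact fault-path code: the pattern being enforced is
the usual iomap_begin()/iomap_end() pairing, reporting 0 bytes 'written'
when the intervening work failed and keeping the earlier error.)

	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
	if (error)
		goto out;

	/* ... fault handling that uses 'iomap', possibly setting 'error' ... */

	if (ops->iomap_end) {
		int ret = ops->iomap_end(inode, pos, PAGE_SIZE,
					 error ? 0 : PAGE_SIZE, flags, &iomap);
		if (!error)
			error = ret;
	}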

Signed-off-by: Ross Zwisler 
Suggested-by: Jan Kara 
Reviewed-by: Jan Kara 
---
 fs/dax.c | 37 +
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 7737954..6edd89b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1165,6 +1165,7 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
struct iomap iomap = { 0 };
unsigned flags = 0;
int error, major = 0;
+   int locked_status = 0;
void *entry;
 
/*
@@ -1194,7 +1195,7 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
goto unlock_entry;
if (WARN_ON_ONCE(iomap.offset + iomap.length < pos + PAGE_SIZE)) {
error = -EIO;   /* fs corruption? */
-   goto unlock_entry;
+   goto finish_iomap;
}
 
sector = dax_iomap_sector(&iomap, pos);
@@ -1216,13 +1217,15 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
}
 
if (error)
-   goto unlock_entry;
+   goto finish_iomap;
if (!radix_tree_exceptional_entry(entry)) {
vmf->page = entry;
-   return VM_FAULT_LOCKED;
+   locked_status = VM_FAULT_LOCKED;
+   } else {
+   vmf->entry = entry;
+   locked_status = VM_FAULT_DAX_LOCKED;
}
-   vmf->entry = entry;
-   return VM_FAULT_DAX_LOCKED;
+   goto finish_iomap;
}
 
switch (iomap.type) {
@@ -1237,8 +1240,10 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
break;
case IOMAP_UNWRITTEN:
case IOMAP_HOLE:
-   if (!(vmf->flags & FAULT_FLAG_WRITE))
-   return dax_load_hole(mapping, entry, vmf);
+   if (!(vmf->flags & FAULT_FLAG_WRITE)) {
+   locked_status = dax_load_hole(mapping, entry, vmf);
+   break;
+   }
/*FALLTHRU*/
default:
WARN_ON_ONCE(1);
@@ -1246,14 +1251,30 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
break;
}
 
+ finish_iomap:
+   if (ops->iomap_end) {
+   if (error) {
+   /* keep previous error */
+   ops->iomap_end(inode, pos, PAGE_SIZE, 0, flags,
+   &iomap);
+   } else {
+   error = ops->iomap_end(inode, pos, PAGE_SIZE,
+   PAGE_SIZE, flags, &iomap);
+   }
+   }
  unlock_entry:
-   put_locked_mapping_entry(mapping, vmf->pgoff, entry);
+   if (!locked_status || error)
+   put_locked_mapping_entry(mapping, vmf->pgoff, entry);
  out:
if (error == -ENOMEM)
return VM_FAULT_OOM | major;
/* -EBUSY is fine, somebody else faulted on the same PTE */
if (error < 0 && error != -EBUSY)
return VM_FAULT_SIGBUS | major;
+   if (locked_status) {
+   WARN_ON_ONCE(error); /* -EBUSY from ops->iomap_end? */
+   return locked_status;
+   }
return VM_FAULT_NOPAGE | major;
 }
 EXPORT_SYMBOL_GPL(dax_iomap_fault);
-- 
2.9.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v6 02/17] ext4: tell DAX the size of allocation holes

2016-10-12 Thread Ross Zwisler
When DAX calls _ext4_get_block() and the file offset points to a hole we
currently don't set bh->b_size.  This is currently worked around via
buffer_size_valid() in fs/dax.c.

_ext4_get_block() has the hole size information from ext4_map_blocks(), so
populate bh->b_size so we can remove buffer_size_valid() in a later patch.

Signed-off-by: Ross Zwisler 
Reviewed-by: Jan Kara 
---
 fs/ext4/inode.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0900cb4..9075fac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -759,6 +759,9 @@ static int _ext4_get_block(struct inode *inode, sector_t iblock,
ext4_update_bh_state(bh, map.m_flags);
bh->b_size = inode->i_sb->s_blocksize * map.m_len;
ret = 0;
+   } else if (ret == 0) {
+   /* hole case, need to fill in bh->b_size */
+   bh->b_size = inode->i_sb->s_blocksize * map.m_len;
}
return ret;
 }
-- 
2.9.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v6 17/17] dax: remove "depends on BROKEN" from FS_DAX_PMD

2016-10-12 Thread Ross Zwisler
Now that DAX PMD faults are once again working and are now participating in
DAX's radix tree locking scheme, allow their config option to be enabled.

Signed-off-by: Ross Zwisler 
---
 fs/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 2bc7ad7..b6f0fce 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -55,7 +55,6 @@ config FS_DAX_PMD
depends on FS_DAX
depends on ZONE_DEVICE
depends on TRANSPARENT_HUGEPAGE
-   depends on BROKEN
 
 endif # BLOCK
 
-- 
2.9.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v6 08/17] dax: coordinate locking for offsets in PMD range

2016-10-12 Thread Ross Zwisler
DAX radix tree locking currently locks entries based on the unique
combination of the 'mapping' pointer and the pgoff_t 'index' for the entry.
This works for PTEs, but as we move to PMDs we will need to have all the
offsets within the range covered by the PMD to map to the same bit lock.
To accomplish this, for ranges covered by a PMD entry we will instead lock
based on the page offset of the beginning of the PMD entry.  The 'mapping'
pointer is still used in the same way.
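
(Worked example, not part of the patch: with 4k pages and 2MiB PMDs,
PMD_SHIFT - PAGE_SHIFT == 9, so a PMD entry spans 512 consecutive indices
and the wait-queue key is the index rounded down to that boundary.)

	/* same mask as used in dax_entry_waitqueue() below */
	index &= ~((1UL << (PMD_SHIFT - PAGE_SHIFT)) - 1);	/* & ~0x1ff */

	/*
	 * e.g. faults on indices 0x200..0x3ff of one PMD entry all key on
	 * 0x200, so a waiter on index 0x203 and a waker on index 0x3f0
	 * agree on the same wait queue and the same exceptional_entry_key.
	 */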

Signed-off-by: Ross Zwisler 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
---
 fs/dax.c| 65 +
 include/linux/dax.h |  2 +-
 mm/filemap.c|  2 +-
 3 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 152a6e1..e103053 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -64,14 +64,6 @@ static int __init init_dax_wait_table(void)
 }
 fs_initcall(init_dax_wait_table);
 
-static wait_queue_head_t *dax_entry_waitqueue(struct address_space *mapping,
- pgoff_t index)
-{
-   unsigned long hash = hash_long((unsigned long)mapping ^ index,
-  DAX_WAIT_TABLE_BITS);
-   return wait_table + hash;
-}
-
 static long dax_map_atomic(struct block_device *bdev, struct blk_dax_ctl *dax)
 {
struct request_queue *q = bdev->bd_queue;
@@ -285,7 +277,7 @@ EXPORT_SYMBOL_GPL(dax_do_io);
  */
 struct exceptional_entry_key {
struct address_space *mapping;
-   unsigned long index;
+   pgoff_t entry_start;
 };
 
 struct wait_exceptional_entry_queue {
@@ -293,6 +285,26 @@ struct wait_exceptional_entry_queue {
struct exceptional_entry_key key;
 };
 
+static wait_queue_head_t *dax_entry_waitqueue(struct address_space *mapping,
+   pgoff_t index, void *entry, struct exceptional_entry_key *key)
+{
+   unsigned long hash;
+
+   /*
+* If 'entry' is a PMD, align the 'index' that we use for the wait
+* queue to the start of that PMD.  This ensures that all offsets in
+* the range covered by the PMD map to the same bit lock.
+*/
+   if (RADIX_DAX_TYPE(entry) == RADIX_DAX_PMD)
+   index &= ~((1UL << (PMD_SHIFT - PAGE_SHIFT)) - 1);
+
+   key->mapping = mapping;
+   key->entry_start = index;
+
+   hash = hash_long((unsigned long)mapping ^ index, DAX_WAIT_TABLE_BITS);
+   return wait_table + hash;
+}
+
 static int wake_exceptional_entry_func(wait_queue_t *wait, unsigned int mode,
   int sync, void *keyp)
 {
@@ -301,7 +313,7 @@ static int wake_exceptional_entry_func(wait_queue_t *wait, unsigned int mode,
container_of(wait, struct wait_exceptional_entry_queue, wait);
 
if (key->mapping != ewait->key.mapping ||
-   key->index != ewait->key.index)
+   key->entry_start != ewait->key.entry_start)
return 0;
return autoremove_wake_function(wait, mode, sync, NULL);
 }
@@ -359,12 +371,10 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
 {
void *entry, **slot;
struct wait_exceptional_entry_queue ewait;
-   wait_queue_head_t *wq = dax_entry_waitqueue(mapping, index);
+   wait_queue_head_t *wq;
 
init_wait(&ewait.wait);
ewait.wait.func = wake_exceptional_entry_func;
-   ewait.key.mapping = mapping;
-   ewait.key.index = index;
 
for (;;) {
entry = __radix_tree_lookup(&mapping->page_tree, index, NULL,
@@ -375,6 +385,8 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
*slotp = slot;
return entry;
}
+
+   wq = dax_entry_waitqueue(mapping, index, entry, &ewait.key);
prepare_to_wait_exclusive(wq, &ewait.wait,
  TASK_UNINTERRUPTIBLE);
spin_unlock_irq(&mapping->tree_lock);
@@ -447,10 +459,20 @@ restart:
return entry;
 }
 
+/*
+ * We do not necessarily hold the mapping->tree_lock when we call this
+ * function so it is possible that 'entry' is no longer a valid item in the
+ * radix tree.  This is okay, though, because all we really need to do is to
+ * find the correct waitqueue where tasks might be sleeping waiting for that
+ * old 'entry' and wake them.
+ */
 void dax_wake_mapping_entry_waiter(struct address_space *mapping,
-  pgoff_t index, bool wake_all)
+   pgoff_t index, void *entry, bool wake_all)
 {
-   wait_queue_head_t *wq = dax_entry_waitqueue(mapping, index);
+   struct exceptional_entry_key key;
+   wait_queue_head_t *wq;
+
+   wq = dax_entry_waitqueue(mapping, index, entry, &key);
 
/*
 * Checking for locked entry and prepare_to_wait_exclusive() happens
@@ -458,13 +480,8 @@ void dax_wake_mapping_entry_waiter(struct 

[PATCH v6 10/17] dax: correct dax iomap code namespace

2016-10-12 Thread Ross Zwisler
The recently added DAX functions that use the new struct iomap data
structure were named iomap_dax_rw(), iomap_dax_fault() and
iomap_dax_actor().  These are actually defined in fs/dax.c, though, so
should be part of the "dax" namespace and not the "iomap" namespace.
Rename them to dax_iomap_rw(), dax_iomap_fault() and dax_iomap_actor()
respectively.

Signed-off-by: Ross Zwisler 
Suggested-by: Dave Chinner 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
---
 fs/dax.c| 16 
 fs/ext2/file.c  |  6 +++---
 fs/xfs/xfs_file.c   |  8 
 include/linux/dax.h |  4 ++--
 4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 3d0b103..fdbd7a1 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1031,7 +1031,7 @@ EXPORT_SYMBOL_GPL(dax_truncate_page);
 
 #ifdef CONFIG_FS_IOMAP
 static loff_t
-iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
+dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iomap *iomap)
 {
struct iov_iter *iter = data;
@@ -1088,7 +1088,7 @@ iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 }
 
 /**
- * iomap_dax_rw - Perform I/O to a DAX file
+ * dax_iomap_rw - Perform I/O to a DAX file
  * @iocb:  The control block for this I/O
  * @iter:  The addresses to do I/O from or to
  * @ops:   iomap ops passed from the file system
@@ -1098,7 +1098,7 @@ iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
  * and evicting any page cache pages in the region under I/O.
  */
 ssize_t
-iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter,
+dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
struct iomap_ops *ops)
 {
struct address_space *mapping = iocb->ki_filp->f_mapping;
@@ -1128,7 +1128,7 @@ iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter,
 
while (iov_iter_count(iter)) {
ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops,
-   iter, iomap_dax_actor);
+   iter, dax_iomap_actor);
if (ret <= 0)
break;
pos += ret;
@@ -1138,10 +1138,10 @@ iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter,
iocb->ki_pos += done;
return done ? done : ret;
 }
-EXPORT_SYMBOL_GPL(iomap_dax_rw);
+EXPORT_SYMBOL_GPL(dax_iomap_rw);
 
 /**
- * iomap_dax_fault - handle a page fault on a DAX file
+ * dax_iomap_fault - handle a page fault on a DAX file
  * @vma: The virtual memory area where the fault occurred
  * @vmf: The description of the fault
  * @ops: iomap ops passed from the file system
@@ -1150,7 +1150,7 @@ EXPORT_SYMBOL_GPL(iomap_dax_rw);
  * or mkwrite handler for DAX files. Assumes the caller has done all the
  * necessary locking for the page fault to proceed successfully.
  */
-int iomap_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
+int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
struct iomap_ops *ops)
 {
struct address_space *mapping = vma->vm_file->f_mapping;
@@ -1252,5 +1252,5 @@ int iomap_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
return VM_FAULT_SIGBUS | major;
return VM_FAULT_NOPAGE | major;
 }
-EXPORT_SYMBOL_GPL(iomap_dax_fault);
+EXPORT_SYMBOL_GPL(dax_iomap_fault);
 #endif /* CONFIG_FS_IOMAP */
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 0f257f8..32a4913 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -38,7 +38,7 @@ static ssize_t ext2_dax_read_iter(struct kiocb *iocb, struct iov_iter *to)
return 0; /* skip atime */
 
inode_lock_shared(inode);
-   ret = iomap_dax_rw(iocb, to, &ext2_iomap_ops);
+   ret = dax_iomap_rw(iocb, to, &ext2_iomap_ops);
inode_unlock_shared(inode);
 
file_accessed(iocb->ki_filp);
@@ -62,7 +62,7 @@ static ssize_t ext2_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (ret)
goto out_unlock;
 
-   ret = iomap_dax_rw(iocb, from, &ext2_iomap_ops);
+   ret = dax_iomap_rw(iocb, from, &ext2_iomap_ops);
if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
i_size_write(inode, iocb->ki_pos);
mark_inode_dirty(inode);
@@ -99,7 +99,7 @@ static int ext2_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
}
down_read(&ei->dax_sem);
 
-   ret = iomap_dax_fault(vma, vmf, &ext2_iomap_ops);
+   ret = dax_iomap_fault(vma, vmf, &ext2_iomap_ops);
 
up_read(&ei->dax_sem);
if (vmf->flags & FAULT_FLAG_WRITE)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index de7c53c..8f12152 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -344,7 +344,7 @@ xfs_file_dax_read(
return 0; /* skip atime */
 
xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
-   ret = 

[PATCH v6 03/17] dax: remove buffer_size_valid()

2016-10-12 Thread Ross Zwisler
Now that ext4 properly sets bh.b_size when we call get_block() for a hole,
rely on that value and remove the buffer_size_valid() sanity check.

Signed-off-by: Ross Zwisler 
Reviewed-by: Jan Kara 
Reviewed-by: Christoph Hellwig 
---
 fs/dax.c | 22 +-
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index cc025f8..9b9be8a 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -123,19 +123,6 @@ static bool buffer_written(struct buffer_head *bh)
return buffer_mapped(bh) && !buffer_unwritten(bh);
 }
 
-/*
- * When ext4 encounters a hole, it returns without modifying the buffer_head
- * which means that we can't trust b_size.  To cope with this, we set b_state
- * to 0 before calling get_block and, if any bit is set, we know we can trust
- * b_size.  Unfortunate, really, since ext4 knows precisely how long a hole is
- * and would save us time calling get_block repeatedly.
- */
-static bool buffer_size_valid(struct buffer_head *bh)
-{
-   return bh->b_state != 0;
-}
-
-
 static sector_t to_sector(const struct buffer_head *bh,
const struct inode *inode)
 {
@@ -177,8 +164,6 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
rc = get_block(inode, block, bh, rw == WRITE);
if (rc)
break;
-   if (!buffer_size_valid(bh))
-   bh->b_size = 1 << blkbits;
bh_max = pos - first + bh->b_size;
bdev = bh->b_bdev;
/*
@@ -1012,12 +997,7 @@ int dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 
bdev = bh.b_bdev;
 
-   /*
-* If the filesystem isn't willing to tell us the length of a hole,
-* just fall back to PTEs.  Calling get_block 512 times in a loop
-* would be silly.
-*/
-   if (!buffer_size_valid(&bh) || bh.b_size < PMD_SIZE) {
+   if (bh.b_size < PMD_SIZE) {
dax_pmd_dbg(&bh, address, "allocated block too small");
return VM_FAULT_FALLBACK;
}
-- 
2.9.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v6 16/17] xfs: use struct iomap based DAX PMD fault path

2016-10-12 Thread Ross Zwisler
Switch xfs_filemap_pmd_fault() from using dax_pmd_fault() to the new and
improved dax_iomap_pmd_fault().  Also, now that it has no more users,
remove xfs_get_blocks_dax_fault().

Signed-off-by: Ross Zwisler 
Reviewed-by: Jan Kara 
---
 fs/xfs/xfs_aops.c | 26 +-
 fs/xfs/xfs_aops.h |  3 ---
 fs/xfs/xfs_file.c |  2 +-
 3 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0e2a931..1c73d0a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1298,8 +1298,7 @@ __xfs_get_blocks(
sector_tiblock,
struct buffer_head  *bh_result,
int create,
-   booldirect,
-   booldax_fault)
+   booldirect)
 {
struct xfs_inode*ip = XFS_I(inode);
struct xfs_mount*mp = ip->i_mount;
@@ -1420,13 +1419,8 @@ __xfs_get_blocks(
if (ISUNWRITTEN(&imap))
set_buffer_unwritten(bh_result);
/* direct IO needs special help */
-   if (create) {
-   if (dax_fault)
-   ASSERT(!ISUNWRITTEN(&imap));
-   else
-   xfs_map_direct(inode, bh_result, &imap, offset,
-   is_cow);
-   }
+   if (create)
+   xfs_map_direct(inode, bh_result, &imap, offset, is_cow);
}
 
/*
@@ -1466,7 +1460,7 @@ xfs_get_blocks(
struct buffer_head  *bh_result,
int create)
 {
-   return __xfs_get_blocks(inode, iblock, bh_result, create, false, false);
+   return __xfs_get_blocks(inode, iblock, bh_result, create, false);
 }
 
 int
@@ -1476,17 +1470,7 @@ xfs_get_blocks_direct(
struct buffer_head  *bh_result,
int create)
 {
-   return __xfs_get_blocks(inode, iblock, bh_result, create, true, false);
-}
-
-int
-xfs_get_blocks_dax_fault(
-   struct inode*inode,
-   sector_tiblock,
-   struct buffer_head  *bh_result,
-   int create)
-{
-   return __xfs_get_blocks(inode, iblock, bh_result, create, true, true);
+   return __xfs_get_blocks(inode, iblock, bh_result, create, true);
 }
 
 /*
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index b3c6634..34dc00d 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -59,9 +59,6 @@ int   xfs_get_blocks(struct inode *inode, sector_t offset,
   struct buffer_head *map_bh, int create);
 intxfs_get_blocks_direct(struct inode *inode, sector_t offset,
  struct buffer_head *map_bh, int create);
-intxfs_get_blocks_dax_fault(struct inode *inode, sector_t offset,
-struct buffer_head *map_bh, int create);
-
 intxfs_end_io_direct_write(struct kiocb *iocb, loff_t offset,
ssize_t size, void *private);
 intxfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 8f12152..7b13dda 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1750,7 +1750,7 @@ xfs_filemap_pmd_fault(
}
 
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
-   ret = dax_pmd_fault(vma, addr, pmd, flags, xfs_get_blocks_dax_fault);
+   ret = dax_iomap_pmd_fault(vma, addr, pmd, flags, &xfs_iomap_ops);
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 
if (flags & FAULT_FLAG_WRITE)
-- 
2.9.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v6 15/17] dax: add struct iomap based DAX PMD support

2016-10-12 Thread Ross Zwisler
DAX PMDs have been disabled since Jan Kara introduced DAX radix tree based
locking.  This patch allows DAX PMDs to participate in the DAX radix tree
based locking scheme so that they can be re-enabled using the new struct
iomap based fault handlers.

There are currently three types of DAX 4k entries: 4k zero pages, 4k DAX
mappings that have an associated block allocation, and 4k DAX empty
entries.  The empty entries exist to provide locking for the duration of a
given page fault.

This patch adds three equivalent 2MiB DAX entries: Huge Zero Page (HZP)
entries, PMD DAX entries that have associated block allocations, and 2 MiB
DAX empty entries.

Unlike the 4k case where we insert a struct page* into the radix tree for
4k zero pages, for HZP we insert a DAX exceptional entry with the new
RADIX_DAX_HZP flag set.  This is because we use a single 2 MiB zero page in
every 2MiB hole mapping, and it doesn't make sense to have that same struct
page* with multiple entries in multiple trees.  This would cause contention
on the single page lock for the one Huge Zero Page, and it would break the
page->index and page->mapping associations that are assumed to be valid in
many other places in the kernel.
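
(Rough sketch, not part of the patch -- what ends up in the radix tree for
the two kinds of read faults on a hole; the flag names are the ones
introduced by this series and the exact bit values live in dax.h.)

	/*
	 * 4k hole: the slot holds the struct page * of a zeroed page-cache
	 * page inserted by dax_load_hole(), and that page is what gets
	 * locked and unlocked.
	 *
	 * 2MiB hole: the slot holds an exceptional entry, roughly
	 *   RADIX_TREE_EXCEPTIONAL_ENTRY | RADIX_DAX_PMD | RADIX_DAX_HZP |
	 *   RADIX_DAX_ENTRY_LOCK
	 * so no struct page * is stored and page->mapping / page->index of
	 * the single huge zero page are never touched.
	 */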

One difficult use case is when one thread is trying to use 4k entries in
radix tree for a given offset, and another thread is using 2 MiB entries
for that same offset.  The current code handles this by making the 2 MiB
user fall back to 4k entries for most cases.  This was done because it is
the simplest solution, and because the use of 2MiB pages is already
opportunistic.

If we were to try to upgrade from 4k pages to 2MiB pages for a given range,
we run into the problem of how we lock out 4k page faults for the entire
2MiB range while we clean out the radix tree so we can insert the 2MiB
entry.  We can solve this problem if we need to, but I think that the cases
where both 2MiB entries and 4K entries are being used for the same range
will be rare enough and the gain small enough that it probably won't be
worth the complexity.

Signed-off-by: Ross Zwisler 
---
 fs/dax.c| 377 ++--
 include/linux/dax.h |  55 ++--
 mm/filemap.c|   3 +-
 3 files changed, 385 insertions(+), 50 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 0582c7c..39b41ea 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -76,6 +76,26 @@ static void dax_unmap_atomic(struct block_device *bdev,
blk_queue_exit(bdev->bd_queue);
 }
 
+static int dax_is_pmd_entry(void *entry)
+{
+   return (unsigned long)entry & RADIX_DAX_PMD;
+}
+
+static int dax_is_pte_entry(void *entry)
+{
+   return !((unsigned long)entry & RADIX_DAX_PMD);
+}
+
+static int dax_is_zero_entry(void *entry)
+{
+   return (unsigned long)entry & RADIX_DAX_HZP;
+}
+
+static int dax_is_empty_entry(void *entry)
+{
+   return (unsigned long)entry & RADIX_DAX_EMPTY;
+}
+
 struct page *read_dax_sector(struct block_device *bdev, sector_t n)
 {
struct page *page = alloc_pages(GFP_KERNEL, 0);
@@ -281,7 +301,7 @@ static wait_queue_head_t *dax_entry_waitqueue(struct address_space *mapping,
 * queue to the start of that PMD.  This ensures that all offsets in
 * the range covered by the PMD map to the same bit lock.
 */
-   if (RADIX_DAX_TYPE(entry) == RADIX_DAX_PMD)
+   if (dax_is_pmd_entry(entry))
index &= ~((1UL << (PMD_SHIFT - PAGE_SHIFT)) - 1);
 
key->mapping = mapping;
@@ -413,36 +433,115 @@ static void put_unlocked_mapping_entry(struct address_space *mapping,
  * radix tree entry locked. If the radix tree doesn't contain given index,
  * create empty exceptional entry for the index and return with it locked.
  *
+ * When requesting an entry with size RADIX_DAX_PMD, grab_mapping_entry() will
+ * either return that locked entry or will return an error.  This error will
+ * happen if there are any 4k entries (either zero pages or DAX entries)
+ * within the 2MiB range that we are requesting.
+ *
+ * We always favor 4k entries over 2MiB entries. There isn't a flow where we
+ * evict 4k entries in order to 'upgrade' them to a 2MiB entry.  A 2MiB
+ * insertion will fail if it finds any 4k entries already in the tree, and a
+ * 4k insertion will cause an existing 2MiB entry to be unmapped and
+ * downgraded to 4k entries.  This happens for both 2MiB huge zero pages as
+ * well as 2MiB empty entries.
+ *
+ * The exception to this downgrade path is for 2MiB DAX PMD entries that have
+ * real storage backing them.  We will leave these real 2MiB DAX entries in
+ * the tree, and PTE writes will simply dirty the entire 2MiB DAX entry.
+ *
  * Note: Unlike filemap_fault() we don't honor FAULT_FLAG_RETRY flags. For
  * persistent memory the benefit is doubtful. We can add that later if we can
  * show it helps.
  */
-static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index)
+static void *grab_mapping_entry(struct 

[PATCH v6 07/17] dax: consistent variable naming for DAX entries

2016-10-12 Thread Ross Zwisler
No functional change.

Consistently use the variable name 'entry' instead of 'ret' for DAX radix
tree entries.  This was already happening in most of the code, so update
get_unlocked_mapping_entry(), grab_mapping_entry() and
dax_unlock_mapping_entry().

Signed-off-by: Ross Zwisler 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
---
 fs/dax.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 98189ac..152a6e1 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -357,7 +357,7 @@ static inline void *unlock_slot(struct address_space *mapping, void **slot)
 static void *get_unlocked_mapping_entry(struct address_space *mapping,
pgoff_t index, void ***slotp)
 {
-   void *ret, **slot;
+   void *entry, **slot;
struct wait_exceptional_entry_queue ewait;
wait_queue_head_t *wq = dax_entry_waitqueue(mapping, index);
 
@@ -367,13 +367,13 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
ewait.key.index = index;
 
for (;;) {
-   ret = __radix_tree_lookup(&mapping->page_tree, index, NULL,
+   entry = __radix_tree_lookup(&mapping->page_tree, index, NULL,
  &slot);
-   if (!ret || !radix_tree_exceptional_entry(ret) ||
+   if (!entry || !radix_tree_exceptional_entry(entry) ||
!slot_locked(mapping, slot)) {
if (slotp)
*slotp = slot;
-   return ret;
+   return entry;
}
prepare_to_wait_exclusive(wq, &ewait.wait,
  TASK_UNINTERRUPTIBLE);
@@ -396,13 +396,13 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
  */
 static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index)
 {
-   void *ret, **slot;
+   void *entry, **slot;
 
 restart:
spin_lock_irq(&mapping->tree_lock);
-   ret = get_unlocked_mapping_entry(mapping, index, &slot);
+   entry = get_unlocked_mapping_entry(mapping, index, &slot);
/* No entry for given index? Make sure radix tree is big enough. */
-   if (!ret) {
+   if (!entry) {
int err;
 
spin_unlock_irq(&mapping->tree_lock);
@@ -410,10 +410,10 @@ restart:
mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM);
if (err)
return ERR_PTR(err);
-   ret = (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY |
+   entry = (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY |
   RADIX_DAX_ENTRY_LOCK);
spin_lock_irq(&mapping->tree_lock);
-   err = radix_tree_insert(&mapping->page_tree, index, ret);
+   err = radix_tree_insert(&mapping->page_tree, index, entry);
radix_tree_preload_end();
if (err) {
spin_unlock_irq(&mapping->tree_lock);
@@ -425,11 +425,11 @@ restart:
/* Good, we have inserted empty locked entry into the tree. */
mapping->nrexceptional++;
spin_unlock_irq(&mapping->tree_lock);
-   return ret;
+   return entry;
}
/* Normal page in radix tree? */
-   if (!radix_tree_exceptional_entry(ret)) {
-   struct page *page = ret;
+   if (!radix_tree_exceptional_entry(entry)) {
+   struct page *page = entry;
 
get_page(page);
spin_unlock_irq(&mapping->tree_lock);
@@ -442,9 +442,9 @@ restart:
}
return page;
}
-   ret = lock_slot(mapping, slot);
+   entry = lock_slot(mapping, slot);
spin_unlock_irq(&mapping->tree_lock);
-   return ret;
+   return entry;
 }
 
 void dax_wake_mapping_entry_waiter(struct address_space *mapping,
@@ -469,11 +469,11 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,
 
 void dax_unlock_mapping_entry(struct address_space *mapping, pgoff_t index)
 {
-   void *ret, **slot;
+   void *entry, **slot;
 
spin_lock_irq(&mapping->tree_lock);
-   ret = __radix_tree_lookup(&mapping->page_tree, index, NULL, &slot);
-   if (WARN_ON_ONCE(!ret || !radix_tree_exceptional_entry(ret) ||
+   entry = __radix_tree_lookup(&mapping->page_tree, index, NULL, &slot);
+   if (WARN_ON_ONCE(!entry || !radix_tree_exceptional_entry(entry) ||
 !slot_locked(mapping, slot))) {
spin_unlock_irq(&mapping->tree_lock);
return;
-- 
2.9.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v6 05/17] dax: make 'wait_table' global variable static

2016-10-12 Thread Ross Zwisler
The global 'wait_table' variable is only used within fs/dax.c, and
generates the following sparse warning:

fs/dax.c:39:19: warning: symbol 'wait_table' was not declared. Should it be static?

Make it static so it has scope local to fs/dax.c, and to make sparse happy.

Signed-off-by: Ross Zwisler 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
---
 fs/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 9b9be8a..ac28cdf 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -52,7 +52,7 @@
 #define DAX_WAIT_TABLE_BITS 12
 #define DAX_WAIT_TABLE_ENTRIES (1 << DAX_WAIT_TABLE_BITS)
 
-wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES];
+static wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES];
 
 static int __init init_dax_wait_table(void)
 {
-- 
2.9.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v6 06/17] dax: remove the last BUG_ON() from fs/dax.c

2016-10-12 Thread Ross Zwisler
Don't take down the kernel if we get an invalid 'from' and 'length'
argument pair.  Just warn once and return an error.

Signed-off-by: Ross Zwisler 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
---
 fs/dax.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index ac28cdf..98189ac 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1194,7 +1194,8 @@ int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
/* Block boundary? Nothing to do */
if (!length)
return 0;
-   BUG_ON((offset + length) > PAGE_SIZE);
+   if (WARN_ON_ONCE((offset + length) > PAGE_SIZE))
+   return -EINVAL;
 
memset(&bh, 0, sizeof(bh));
bh.b_bdev = inode->i_sb->s_bdev;
-- 
2.9.0

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v6 09/17] dax: remove dax_pmd_fault()

2016-10-12 Thread Ross Zwisler
dax_pmd_fault() is the old struct buffer_head + get_block_t based 2 MiB DAX
fault handler.  This fault handler has been disabled for several kernel
releases, and support for PMDs will be reintroduced using the struct iomap
interface instead.

Signed-off-by: Ross Zwisler 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
---
 fs/dax.c| 213 
 include/linux/dax.h |   6 +-
 2 files changed, 1 insertion(+), 218 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index e103053..3d0b103 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -915,219 +915,6 @@ int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
 }
 EXPORT_SYMBOL_GPL(dax_fault);
 
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE)
-/*
- * The 'colour' (ie low bits) within a PMD of a page offset.  This comes up
- * more often than one might expect in the below function.
- */
-#define PG_PMD_COLOUR  ((PMD_SIZE >> PAGE_SHIFT) - 1)
-
-static void __dax_dbg(struct buffer_head *bh, unsigned long address,
-   const char *reason, const char *fn)
-{
-   if (bh) {
-   char bname[BDEVNAME_SIZE];
-   bdevname(bh->b_bdev, bname);
-   pr_debug("%s: %s addr: %lx dev %s state %lx start %lld "
-   "length %zd fallback: %s\n", fn, current->comm,
-   address, bname, bh->b_state, (u64)bh->b_blocknr,
-   bh->b_size, reason);
-   } else {
-   pr_debug("%s: %s addr: %lx fallback: %s\n", fn,
-   current->comm, address, reason);
-   }
-}
-
-#define dax_pmd_dbg(bh, address, reason)   __dax_dbg(bh, address, reason, "dax_pmd")
-
-/**
- * dax_pmd_fault - handle a PMD fault on a DAX file
- * @vma: The virtual memory area where the fault occurred
- * @vmf: The description of the fault
- * @get_block: The filesystem method used to translate file offsets to blocks
- *
- * When a page fault occurs, filesystems may call this helper in their
- * pmd_fault handler for DAX files.
- */
-int dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
-   pmd_t *pmd, unsigned int flags, get_block_t get_block)
-{
-   struct file *file = vma->vm_file;
-   struct address_space *mapping = file->f_mapping;
-   struct inode *inode = mapping->host;
-   struct buffer_head bh;
-   unsigned blkbits = inode->i_blkbits;
-   unsigned long pmd_addr = address & PMD_MASK;
-   bool write = flags & FAULT_FLAG_WRITE;
-   struct block_device *bdev;
-   pgoff_t size, pgoff;
-   sector_t block;
-   int result = 0;
-   bool alloc = false;
-
-   /* dax pmd mappings require pfn_t_devmap() */
-   if (!IS_ENABLED(CONFIG_FS_DAX_PMD))
-   return VM_FAULT_FALLBACK;
-
-   /* Fall back to PTEs if we're going to COW */
-   if (write && !(vma->vm_flags & VM_SHARED)) {
-   split_huge_pmd(vma, pmd, address);
-   dax_pmd_dbg(NULL, address, "cow write");
-   return VM_FAULT_FALLBACK;
-   }
-   /* If the PMD would extend outside the VMA */
-   if (pmd_addr < vma->vm_start) {
-   dax_pmd_dbg(NULL, address, "vma start unaligned");
-   return VM_FAULT_FALLBACK;
-   }
-   if ((pmd_addr + PMD_SIZE) > vma->vm_end) {
-   dax_pmd_dbg(NULL, address, "vma end unaligned");
-   return VM_FAULT_FALLBACK;
-   }
-
-   pgoff = linear_page_index(vma, pmd_addr);
-   size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
-   if (pgoff >= size)
-   return VM_FAULT_SIGBUS;
-   /* If the PMD would cover blocks out of the file */
-   if ((pgoff | PG_PMD_COLOUR) >= size) {
-   dax_pmd_dbg(NULL, address,
-   "offset + huge page size > file size");
-   return VM_FAULT_FALLBACK;
-   }
-
-   memset(&bh, 0, sizeof(bh));
-   bh.b_bdev = inode->i_sb->s_bdev;
-   block = (sector_t)pgoff << (PAGE_SHIFT - blkbits);
-
-   bh.b_size = PMD_SIZE;
-
-   if (get_block(inode, block, &bh, 0) != 0)
-   return VM_FAULT_SIGBUS;
-
-   if (!buffer_mapped(&bh) && write) {
-   if (get_block(inode, block, &bh, 1) != 0)
-   return VM_FAULT_SIGBUS;
-   alloc = true;
-   WARN_ON_ONCE(buffer_unwritten(&bh) || buffer_new(&bh));
-   }
-
-   bdev = bh.b_bdev;
-
-   if (bh.b_size < PMD_SIZE) {
-   dax_pmd_dbg(&bh, address, "allocated block too small");
-   return VM_FAULT_FALLBACK;
-   }
-
-   /*
-* If we allocated new storage, make sure no process has any
-* zero pages covering this hole
-*/
-   if (alloc) {
-   loff_t lstart = pgoff << PAGE_SHIFT;
-   loff_t lend = lstart + PMD_SIZE - 1; /* inclusive */
-
-   truncate_pagecache_range(inode, lstart, lend);

Re: [Xen-devel] [RFC KERNEL PATCH 0/2] Add Dom0 NVDIMM support for Xen

2016-10-12 Thread Haozhong Zhang

On 10/12/16 05:32 -0600, Jan Beulich wrote:

On 12.10.16 at 12:33,  wrote:

The layout is shown as the following diagram.

+---+---+---+--+--+
| whatever used | Partition | Super | Reserved | /dev/pmem0p1 |
|  by kernel|   Table   | Block | for Xen  |  |
+---+---+---+--+--+
\_ ___/
  V
 /dev/pmem0


I have to admit that I dislike this, for not being OS-agnostic.
Neither should there be any Xen-specific region, nor should the
"whatever used by kernel" one be restricted to just Linux. What
I could see is an OS-reserved area ahead of the partition table,
the exact usage of which depends on which OS is currently
running (and in the Xen case this might be both Xen _and_ the
Dom0 kernel, arbitrated by a tbd protocol). After all, when
running under Xen, the Dom0 may not have a need for as much
control data as it has when running on bare hardware, for it
controlling less (if any) of the actual memory ranges when Xen
is present.



Isn't this OS-reserved area still not OS-agnostic, as it requires OS
to know where the reserved area is?  Or do you mean it's not if it's
defined by a protocol that is accepted by all OSes?

Let me list another two methods just coming to my mind.

1. The first method extends the usage of the super block used by
  current Linux kernel to reserve space on pmem.

  Current Linux kernel places a super block of the following
  structure near the beginning of a pmem namespace.

   struct nd_pfn_sb {
   u8 signature[PFN_SIG_LEN];
   u8 uuid[16];
   u8 parent_uuid[16];
   __le32 flags;
   __le16 version_major;
   __le16 version_minor;
   __le64 dataoff; /* relative to namespace_base + start_pad */
   __le64 npfns;
   __le32 mode;
   /* minor-version-1 additions for section alignment */
   __le32 start_pad;
   __le32 end_trunc;
   /* minor-version-2 record the base alignment of the mapping */
   __le32 align;
   u8 padding[4000];
   __le64 checksum;
   }

   Two interesting fields here are 'dataoff' and 'mode':
   - 'dataoff' indicates the offset where the data area starts,
 ie. IIUC, the part that can be accessed via /dev/pmemN or
 /dev/daxN.
   - 'mode' indicates whether Linux puts struct page for this
 namespace in the ram (= PFN_MODE_RAM) or on the device (=
 PFN_MODE_PMEM).

   Currently for Linux, only 'mode' is customizable, while 'dataoff'
   is not. If mode == PFN_MODE_RAM, no reservation for struct page is
   made on the device, and dataoff starts almost immediately after
   the super block except a small reserved area in between for other
   structures and alignment. If mode == PFN_MODE_PMEM, the size of
   the reservation is decided by kernel, i.e. 64 bytes per struct
   page.

   I propose to make the size of the reserved area customizable,
   e.g. via ioctl and ndctl.
   - If mode == PFN_MODE_PMEM and
 * if the given reserved size is large enough to hold what an OS
   (not limited to Linux) wants to put in, then the OS just
   starts use it as desired;
 * if the given reserved size is not enough, then the OS reports
   error and may take other fallback actions.
   - If mode == PFN_MODE_RAM and
 * if the reserved size is zero, then it's the current way that
   Linux uses the device;
 * if the reserved size is non-zero, I would like to reserve this
   case for hypervisor (right now, namely Xen hypervisor)
   usage. That is, the OS should not use the reserved area. For
   Xen, we could add a function in xen driver in kernel to report
   the reserved area to hypervisor.

  I guess this might be the OS-agnostic way Jan expects, but Dan may
  object to it.


2. Lay another pseudo device on the block device (e.g. /dev/pmemN)
  provided by the NVDIMM driver.

  This pseudo device can reserve the size according to user's
  requirement. The reservation information can be persistently
  recorded in a super block before the reserved area.

  This pseudo device also implements another pseudo block device to
  allow the non-reserved area be accessed as a block device (we can
  even implement it as DAX-capable).

  pseudo block device
/-^---\
+--+---+---+---+
|  whatever used   | Super |  reserved by  |   |
| by NVDIMM driver | Block | pseudo device |   |
+--+---+---+---+
\_ ___/
  V
 


Re: [PATCH v5 15/17] dax: add struct iomap based DAX PMD support

2016-10-12 Thread Jan Kara
On Tue 11-10-16 16:51:30, Ross Zwisler wrote:
> On Tue, Oct 11, 2016 at 10:31:52AM +0200, Jan Kara wrote:
> > On Fri 07-10-16 15:09:02, Ross Zwisler wrote:
> > > diff --git a/fs/dax.c b/fs/dax.c
> > > index ac3cd05..e51d51f 100644
> > > --- a/fs/dax.c
> > > +++ b/fs/dax.c
> > > @@ -281,7 +281,7 @@ static wait_queue_head_t *dax_entry_waitqueue(struct address_space *mapping,
> > >* queue to the start of that PMD.  This ensures that all offsets in
> > >* the range covered by the PMD map to the same bit lock.
> > >*/
> > > - if (RADIX_DAX_TYPE(entry) == RADIX_DAX_PMD)
> > > + if ((unsigned long)entry & RADIX_DAX_PMD)
> > >   index &= ~((1UL << (PMD_SHIFT - PAGE_SHIFT)) - 1);
> > 
> > I agree with Christoph - helper for masking type bits would make this
> > nicer.
> 
> Fixed via a dax_flag_test() helper as I outlined in the mail to Christoph.  It
> seems clean to me, but if you or Christoph feel strongly that it would be
> cleaner as a local 'flags' variable, I'll make the change.

One idea I had is that you could have helpers like:

dax_is_pmd_entry()
dax_is_pte_entry()
dax_is_empty_entry()
dax_is_hole_entry()

And then you would use these helpers - all the flags would be hidden in the
helpers so even if we decide to change the flagging scheme to compress
things or so, it should be pretty local change.

> > > - entry = (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY |
> > > -RADIX_DAX_ENTRY_LOCK);
> > > +
> > > + /*
> > > +  * Besides huge zero pages the only other thing that gets
> > > +  * downgraded are empty entries which don't need to be
> > > +  * unmapped.
> > > +  */
> > > + if (pmd_downgrade && ((unsigned long)entry & RADIX_DAX_HZP))
> > > + unmap_mapping_range(mapping,
> > > + (index << PAGE_SHIFT) & PMD_MASK, PMD_SIZE, 0);
> > > +
> > >   spin_lock_irq(&mapping->tree_lock);
> > > - err = radix_tree_insert(&mapping->page_tree, index, entry);
> > > +
> > > + if (pmd_downgrade) {
> > > + radix_tree_delete(>page_tree, index);
> > > + mapping->nrexceptional--;
> > > + dax_wake_mapping_entry_waiter(mapping, index, entry,
> > > + false);
> > 
> > You need to set 'wake_all' argument here to true. Otherwise there could be
> > waiters waiting for non-existent entry forever...
> 
> Interesting.   Fixed, but let me make sure I understand.  So is the issue that
> you could have say 2 tasks waiting on a PMD index that has been rounded down
> to the PMD index via dax_entry_waitqueue()?
> 
> The person holding the lock on the entry would remove the PMD, insert a PTE
> and wake just one of the PMD aligned waiters.  That waiter would wake up, do
> something PTE based (since the PMD space is now polluted with PTEs), and then
> wake any waiters on it's PTE index.  Meanwhile, the second waiter could sleep
> forever on the PMD aligned index.  Is this correct?

Yes.

> So, perhaps more succinctly:
> 
> Thread 1  Thread 2Thread 3
>   
> index 0x202, hold PMD lock 0x200
>   index 0x203, sleep on 0x200
>   index 0x204, sleep on 0x200
> downgrade, removing 0x200
> wake one waiter on 0x200
> insert PTE @ 0x202
>   wake up, grab index 0x203
>   ...
>   wake one waiter on index 0x203
> 
>   ... sleeps forever
> Right?
 
Exactly.

> > > @@ -608,22 +683,28 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
> > >   error = radix_tree_preload(vmf->gfp_mask & ~__GFP_HIGHMEM);
> > >   if (error)
> > >   return ERR_PTR(error);
> > > + } else if (((unsigned long)entry & RADIX_DAX_HZP) &&
> > > + !(flags & RADIX_DAX_HZP)) {
> > > + /* replacing huge zero page with PMD block mapping */
> > > + unmap_mapping_range(mapping,
> > > + (vmf->pgoff << PAGE_SHIFT) & PMD_MASK, PMD_SIZE, 0);
> > >   }
> > >  
> > >   spin_lock_irq(&mapping->tree_lock);
> > > - new_entry = (void *)((unsigned long)RADIX_DAX_ENTRY(sector, false) |
> > > -RADIX_DAX_ENTRY_LOCK);
> > > + new_entry = dax_radix_entry(sector, flags);
> > > +
> > 
> > You've lost the RADIX_DAX_ENTRY_LOCK flag here?
> 
> Oh, nope, that's embedded in the dax_radix_entry() helper:
> 
> /* entries begin locked */
> static inline void *dax_radix_entry(sector_t sector, unsigned long flags)
> {
>   return (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY | flags |
>   ((unsigned long)sector << RADIX_DAX_SHIFT) |
>   RADIX_DAX_ENTRY_LOCK);
> }
> 
> I'll s/dax_radix_entry/dax_radix_locked_entry/ or something to make this
> clearer to the reader.

Yep, that would be better. Thanks!

> > >   if 

Re: [Xen-devel] [RFC KERNEL PATCH 0/2] Add Dom0 NVDIMM support for Xen

2016-10-12 Thread Jan Beulich
>>> On 11.10.16 at 17:53,  wrote:
> On Tue, Oct 11, 2016 at 6:08 AM, Jan Beulich  wrote:
> Andrew Cooper  10/10/16 6:44 PM >>>
>>>On 10/10/16 01:35, Haozhong Zhang wrote:
 Xen hypervisor needs assistance from Dom0 Linux kernel for following tasks:
 1) Reserve an area on NVDIMM devices for Xen hypervisor to place
memory management data structures, i.e. frame table and M2P table.
 2) Report SPA ranges of NVDIMM devices and the reserved area to Xen
hypervisor.
>>>
>>>However, I can't see any justification for 1).  Dom0 should not be
>>>involved in Xen's management of its own frame table and m2p.  The mfns
>>>making up the pmem/pblk regions should be treated just like any other
>>>MMIO regions, and be handed wholesale to dom0 by default.
>>
>> That precludes the use as RAM extension, and I thought earlier rounds of
>> discussion had got everyone in agreement that at least for the pmem case
>> we will need some control data in Xen.
> 
> The missing piece for me is why this reservation for control data
> needs to be done in the libnvdimm core?  I would expect that any dax
> capable file could be mapped and made available to a guest.  This
> includes /dev/ramX devices that are dax capable, but are external to
> the libnvdimm sub-system.

Despite me being the only one on the To list, I don't think the question
was really meant to be directed to me.

Jan

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[patch] libnvdimm, namespace: potential NULL deref on allocation error

2016-10-12 Thread Dan Carpenter
If the kcalloc() fails then "devs" can be NULL and we dereference it
checking "devs[i]".

Fixes: 1b40e09a1232 ('libnvdimm: blk labels and namespace instantiation')
Signed-off-by: Dan Carpenter 

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 3509cff..abe5c6b 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -2176,12 +2176,14 @@ static struct device **scan_labels(struct nd_region *nd_region)
return devs;
 
  err:
-   for (i = 0; devs[i]; i++)
-   if (is_nd_blk(&nd_region->dev))
-   namespace_blk_release(devs[i]);
-   else
-   namespace_pmem_release(devs[i]);
-   kfree(devs);
+   if (devs) {
+   for (i = 0; devs[i]; i++)
+   if (is_nd_blk(&nd_region->dev))
+   namespace_blk_release(devs[i]);
+   else
+   namespace_pmem_release(devs[i]);
+   kfree(devs);
+   }
return NULL;
 }
 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm