On Wed, 2015-11-18 at 16:09 +0100, Jan Kara wrote:
> Hum, I don't get this. truncate_inode_pages_final() gets called when inode
> has no more users. So there are no mappings of the inode. So how could
> truncate_pagecache() possibly make a difference?

True.  I confirmed with more focused testing that the change to
truncate_inode_pages_final() is not necessary.  Once
invalidate_inodes() has done unmap_mapping_range() we are protected,
because any attempt to re-establish a mapping after the block device
has been torn down will fail in get_block() and blk_queue_enter().

Here's a revised patch.  Note that the call to truncate_pagecache() is
replaced with a call to unmap_mapping_range(), since it is fine to
access zero pages that might still be in the page cache.

8<----
Subject: mm, dax: unmap dax mappings at bdev or fs shutdown

From: Dan Williams <[email protected]>

Currently dax mappings leak past / survive block_device shutdown.  While
page cache pages are permitted to be read/written after the block_device
is torn down, this is not acceptable in the dax case as all media access
must end when the device is disabled.  The pfn backing a dax mapping is
permitted to be invalidated after bdev shutdown, and this is indeed the
case with brd.

When a dax capable block_device driver calls del_gendisk() in its
shutdown path, del_gendisk() needs to ensure that all DAX pfns are
unmapped.  This is different from the pagecache-backed case, where the
disk is protected by the queue being torn down, which ends I/O to the
device.  Since dax bypasses the page cache we need to unconditionally
unmap the inode.

Cc: <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ross Zwisler <[email protected]>
[honza: drop changes to truncate_inode_pages_final]
Signed-off-by: Dan Williams <[email protected]>
---
 fs/inode.c |   27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/fs/inode.c b/fs/inode.c
index 1be5f9003eb3..dcb31d2c15e6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -579,6 +579,18 @@ static void dispose_list(struct list_head *head)
        }
 }
 
+static void unmap_list(struct list_head *head)
+{
+       struct inode *inode, *_i;
+
+       list_for_each_entry_safe(inode, _i, head, i_lru) {
+               list_del_init(&inode->i_lru);
+               unmap_mapping_range(&inode->i_data, 0, 0, 1);
+               iput(inode);
+               cond_resched();
+       }
+}
+
 /**
  * evict_inodes        - evict all evictable inodes for a superblock
  * @sb:                superblock to operate on
@@ -642,6 +654,7 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
        int busy = 0;
        struct inode *inode, *next;
        LIST_HEAD(dispose);
+       LIST_HEAD(unmap);
 
        spin_lock(&sb->s_inode_list_lock);
        list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
@@ -655,6 +668,19 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
                        busy = 1;
                        continue;
                }
+               if (IS_DAX(inode) && atomic_read(&inode->i_count)) {
+                       /*
+                        * dax mappings can't live past this invalidation event
+                        * as there is no page cache present to allow the data
+                        * to remain accessible.
+                        */
+                       __iget(inode);
+                       inode_lru_list_del(inode);
+                       spin_unlock(&inode->i_lock);
+                       list_add(&inode->i_lru, &unmap);
+                       busy = 1;
+                       continue;
+               }
                if (atomic_read(&inode->i_count)) {
                        spin_unlock(&inode->i_lock);
                        busy = 1;
@@ -669,6 +695,7 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
        spin_unlock(&sb->s_inode_list_lock);
 
        dispose_list(&dispose);
+       unmap_list(&unmap);
 
        return busy;
 }
