[GIT PULL] libnvdimm for v5.9
Hi Linus, please pull from: git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/ tags/libnvdimm-for-5.9 ...to receive a new feature in libnvdimm - 'Runtime Firmware Activation', and a few small cleanups and fixes in libnvdimm and DAX. You'd normally receive this pull request from Dan Williams, but he's busy watching a newborn (Congrats Dan!), so I'm watching libnvdimm this cycle. I'd originally intended to make separate topic-based pull requests - one for libnvdimm, and one for DAX, but some of the DAX material fell out since it wasn't quite ready. I ended up merging the two branches into one, and hence a couple of 'internal' merges - I hope these are ok. If you prefer that I should've handled this differently please let me know! I was also expecting a potential conflict - I was assuming Greg had pulled in one of Dan's patches[1] through driver-core, but I don't see it in his tree, and a test merge with the current master went through cleanly. There were a small handful of late fixes, but everything has had at least a week of -next soak time without any reported issues. We've also received a positive build notification from 0-day. [1]: https://lore.kernel.org/linux-nvdimm/20200721104442.gf1676...@kroah.com/ --- The following changes since commit 48778464bb7d346b47157d21ffde2af6b2d39110: Linux 5.8-rc2 (2020-06-21 15:45:29 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/ tags/libnvdimm-for-5.9 for you to fetch changes up to 7f674025d9f7321dea11b802cc0ab3f09cbe51c5: libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr (2020-08-03 18:54:13 -0600) libnvdimm for 5.9 - Add 'Runtime Firmware Activation' support for NVDIMMs that advertise the relevant capability - Misc libnvdimm and DAX cleanups Coly Li (1): dax: print error message by pr_info() in __generic_fsdax_supported() Dan Williams (12): libnvdimm: Validate command family indices ACPI: NFIT: Move bus_dsm_mask out of generic nvdimm_bus_descriptor ACPI: NFIT: Define runtime firmware activation commands tools/testing/nvdimm: Cleanup dimm index passing tools/testing/nvdimm: Add command debug messages tools/testing/nvdimm: Prepare nfit_ctl_test() for ND_CMD_CALL emulation tools/testing/nvdimm: Emulate firmware activation commands driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW} libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO() PM, libnvdimm: Add runtime firmware activation support ACPI: NFIT: Add runtime firmware activate support ACPI: NFIT: Fix ARS zero-sized allocation Hao Li (1): dax: Fix incorrect argument passed to xas_set_err() Ira Weiny (2): fs/dax: Remove unused size parameter drivers/dax: Expand lock scope to cover the use of addresses Jane Chu (3): libnvdimm/security: fix a typo libnvdimm/security: the 'security' attr never show 'overwrite' state libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr Vishal Verma (2): Merge branch 'for-5.9/dax' into libnvdimm-for-next Merge branch 'for-5.9/firmware-activate' into libnvdimm-for-next Documentation/ABI/testing/sysfs-bus-nfit | 19 + Documentation/ABI/testing/sysfs-bus-nvdimm | 2 + .../driver-api/nvdimm/firmware-activate.rst| 86 + drivers/acpi/nfit/core.c | 157 ++--- drivers/acpi/nfit/intel.c | 386 + drivers/acpi/nfit/intel.h | 61 drivers/acpi/nfit/nfit.h | 38 +- drivers/dax/super.c| 13 +- drivers/nvdimm/bus.c | 16 + drivers/nvdimm/core.c | 149 drivers/nvdimm/dimm_devs.c | 123 ++- drivers/nvdimm/namespace_devs.c| 2 +- drivers/nvdimm/nd-core.h | 1 + drivers/nvdimm/pfn_devs.c | 2 +- drivers/nvdimm/region_devs.c | 2 +- drivers/nvdimm/security.c | 13 +- fs/dax.c | 15 +- include/linux/device.h | 4 + include/linux/libnvdimm.h | 52 ++- include/linux/suspend.h| 6 + include/linux/sysfs.h | 7 + include/uapi/linux/ndctl.h | 5 + kernel/power/hibernate.c | 97 ++ tools/testing/nvdimm/test/nfit.c | 367 24 files changed, 1486 insertions(+), 137 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-bus-nvdimm create mode
Quotation sheet for EFB-CS 20200609
___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [RFC PATCH 0/8] fsdax: introduce FS query interface to support reflink
On Mon, Aug 10, 2020 at 12:16:57PM +0100, Matthew Wilcox wrote: > On Mon, Aug 10, 2020 at 04:15:50PM +0800, Ruan Shiyang wrote: > > > > > > On 2020/8/7 下午9:38, Matthew Wilcox wrote: > > > On Fri, Aug 07, 2020 at 09:13:28PM +0800, Shiyang Ruan wrote: > > > > This patchset is a try to resolve the problem of tracking shared page > > > > for fsdax. > > > > > > > > Instead of per-page tracking method, this patchset introduces a query > > > > interface: get_shared_files(), which is implemented by each FS, to > > > > obtain the owners of a shared page. It returns an owner list of this > > > > shared page. Then, the memory-failure() iterates the list to be able > > > > to notify each process using files that sharing this page. > > > > > > > > The design of the tracking method is as follow: > > > > 1. dax_assocaite_entry() associates the owner's info to this page > > > > > > I think that's the first problem with this design. dax_associate_entry is > > > a horrendous idea which needs to be ripped out, not made more important. > > > It's all part of the general problem of trying to do something on a > > > per-page basis instead of per-extent basis. > > > > > > > The memory-failure needs to track owners info from a dax page, so I should > > associate the owner with this page. In this version, I associate the block > > device to the dax page, so that the memory-failure is able to iterate the > > owners by the query interface provided by filesystem. > > No, it doesn't need to track owner info from a DAX page. What it needs to > do is ask the filesystem. Just to add to this: the owner tracking that is current done deep inside the DAX code needs to be moved out to the owner of the dax device. That may be the dax device itself, or it may be a filesystem like ext4 or XFS. Initially, nothing will be able to share pages and page owner tracking should be done on the page itself as the DAX code currently does. Hence when a page error occurs, the device owner is called with the page and error information, and the dax device or filesystem can then look up the page owner (via mapping/index pointers) and run the Die Userspace Die functions instead of running this all internally in the DAX code. Once the implementation is abstracted and controlled by the device owner, then we can start working to change the XFS implementation to support shared pages Cheers, Dave. -- Dave Chinner da...@fromorbit.com ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [PATCH] powerpc/papr_scm: Make access mode of 'perf_stats' attribute file to '0400'
Vaibhav Jain writes: > The newly introduced 'perf_stats' attribute uses the default access > mode of 0444 letting non-root users access performance stats of an > nvdimm and potentially force the kernel into issuing large number of > expensive HCALLs. Since the information exposed by this attribute > cannot be cached hence its better to ward of access to this attribute > from non-root users. > > Hence this patch updates the access-mode of 'perf_stats' sysfs > attribute file to 0400 to make it only readable to root-users. Or should we ratelimit it? Fixes: ?? > Reported-by: Aneesh Kumar K.V > Signed-off-by: Vaibhav Jain cheers ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [RFC PATCH 0/8] fsdax: introduce FS query interface to support reflink
On Mon, Aug 10, 2020 at 04:15:50PM +0800, Ruan Shiyang wrote: > > > On 2020/8/7 下午9:38, Matthew Wilcox wrote: > > On Fri, Aug 07, 2020 at 09:13:28PM +0800, Shiyang Ruan wrote: > > > This patchset is a try to resolve the problem of tracking shared page > > > for fsdax. > > > > > > Instead of per-page tracking method, this patchset introduces a query > > > interface: get_shared_files(), which is implemented by each FS, to > > > obtain the owners of a shared page. It returns an owner list of this > > > shared page. Then, the memory-failure() iterates the list to be able > > > to notify each process using files that sharing this page. > > > > > > The design of the tracking method is as follow: > > > 1. dax_assocaite_entry() associates the owner's info to this page > > > > I think that's the first problem with this design. dax_associate_entry is > > a horrendous idea which needs to be ripped out, not made more important. > > It's all part of the general problem of trying to do something on a > > per-page basis instead of per-extent basis. > > > > The memory-failure needs to track owners info from a dax page, so I should > associate the owner with this page. In this version, I associate the block > device to the dax page, so that the memory-failure is able to iterate the > owners by the query interface provided by filesystem. No, it doesn't need to track owner info from a DAX page. What it needs to do is ask the filesystem. ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [RFC PATCH 0/8] fsdax: introduce FS query interface to support reflink
On 2020/8/7 下午9:38, Matthew Wilcox wrote: On Fri, Aug 07, 2020 at 09:13:28PM +0800, Shiyang Ruan wrote: This patchset is a try to resolve the problem of tracking shared page for fsdax. Instead of per-page tracking method, this patchset introduces a query interface: get_shared_files(), which is implemented by each FS, to obtain the owners of a shared page. It returns an owner list of this shared page. Then, the memory-failure() iterates the list to be able to notify each process using files that sharing this page. The design of the tracking method is as follow: 1. dax_assocaite_entry() associates the owner's info to this page I think that's the first problem with this design. dax_associate_entry is a horrendous idea which needs to be ripped out, not made more important. It's all part of the general problem of trying to do something on a per-page basis instead of per-extent basis. The memory-failure needs to track owners info from a dax page, so I should associate the owner with this page. In this version, I associate the block device to the dax page, so that the memory-failure is able to iterate the owners by the query interface provided by filesystem. -- Thanks, Ruan Shiyang. ___ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-le...@lists.01.org
Re: [RFC PATCH 1/8] fs: introduce get_shared_files() for dax
On 2020/8/8 上午12:15, Darrick J. Wong wrote: On Fri, Aug 07, 2020 at 09:13:29PM +0800, Shiyang Ruan wrote: Under the mode of both dax and reflink on, one page may be shared by multiple files and offsets. In order to track them in memory-failure or other cases, we introduce this function by finding out who is sharing this block(the page) in a filesystem. It returns a list that contains all the owners, and the offset in each owner. For XFS, rmapbt is used to find out the owners of one block. So, it should be turned on when we want to use dax feature together. Signed-off-by: Shiyang Ruan --- fs/xfs/xfs_super.c | 67 + include/linux/dax.h | 7 + include/linux/fs.h | 2 ++ 3 files changed, 76 insertions(+) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 379cbff438bc..b71392219c91 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -35,6 +35,9 @@ #include "xfs_refcount_item.h" #include "xfs_bmap_item.h" #include "xfs_reflink.h" +#include "xfs_alloc.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" #include #include @@ -1097,6 +1100,69 @@ xfs_fs_free_cached_objects( return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan); } +static int _get_shared_files_fn( Needs an xfs_ prefix... + struct xfs_btree_cur*cur, + struct xfs_rmap_irec*rec, + void*priv) +{ + struct list_head*list = priv; + struct xfs_inode*ip; + struct shared_files *sfp; + + /* Get files that incore, filter out others that are not in use. */ + xfs_iget(cur->bc_mp, cur->bc_tp, rec->rm_owner, XFS_IGET_INCORE, 0, ); No error checking at all? What if rm_owner refers to metadata? Yes, we need to check whether the page contains metadata. I remembered that. I wrote the check code but removed it in this patch, because I didn't find a way to associate this block device with the dax page that contains metadata. We can call dax_associate_entry() to create the association if the page's owner is a file, but it's not work for metadata. I should have explained here. + if (ip && !ip->i_vnode.i_mapping) + return 0; When is the xfs_inode released? We don't iput it here, and there's no way for dax_unlock_page (afaict the only consumer) to do it, so we leak the reference. Do you mean xfs_irele() ? Sorry, I didn't realize that. All we need is get the ->mapping form a given inode number. So, I think we can call xfs_irele() when exiting this function. + + sfp = kmalloc(sizeof(*sfp), GFP_KERNEL); If there are millions of open files reflinked to this range of pmem this is going to allocate a /lot/ of memory. + sfp->mapping = ip->i_vnode.i_mapping; sfp->mapping = VFS_I(ip)->i_mapping; + sfp->index = rec->rm_offset; + list_add_tail(>list, list); Why do we leave ->cookie uninitialized? What does it even do? It's my fault. ->cookie should have been added in the next patch. It stores each owner's dax entry when calling dax_lock_page() in memory-failure. + + return 0; +} + +static int +xfs_fs_get_shared_files( + struct super_block *sb, + pgoff_t offset, Which device does this offset refer to? XFS supports multiple storage devices. Also, uh, is this really a pgoff_t? If yes, you can't use it with XFS_B_TO_FSB below without first converting it to a loff_t. The offset here is assigned as iomap->addr, which is obtained from iomap_begin(). So that we can easily looking up for the owners of this offset. I don't quite understand what you said about supporting multiple storage devices. What are these devices? Do you mean NVDIMM, HDD, SSD and others? + struct list_head*list) +{ + struct xfs_mount*mp = XFS_M(sb); + struct xfs_trans*tp = NULL; + struct xfs_btree_cur*cur = NULL; + struct xfs_rmap_irecrmap_low = { 0 }, rmap_high = { 0 }; No need to memset(0) rmap_low later, or zero rmap_high just to memset it later. + struct xfs_buf *agf_bp = NULL; + xfs_agblock_t bno = XFS_B_TO_FSB(mp, offset); "FSB" refers to xfs_fsblock_t. You just ripped the upper 32 bits off the fsblock number. I think I misused these types. Sorry for that. + xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, bno); + int error = 0; + + error = xfs_trans_alloc_empty(mp, ); + if (error) + return error; + + error = xfs_alloc_read_agf(mp, tp, agno, 0, _bp); + if (error) + return error; + + cur = xfs_rmapbt_init_cursor(mp, tp, agf_bp, agno); + + memset(>bc_rec, 0, sizeof(cur->bc_rec)); Not necessary, bc_rec is zero in a freshly created cursor. + /* Construct the range for one rmap search */ + memset(_low, 0, sizeof(rmap_low)); +