On Mon, 19 Jul 2021 10:01:53 +0800 Jingqi Liu <jingqi....@intel.com> wrote:
> Linux kernel version 5.1 brings in support for the volatile-use of
> persistent memory as a hotplugged memory region (KMEM DAX).
> When this feature is enabled, persistent memory can be seen as a
> separate memory-only NUMA node(s). This newly-added memory can be
> selected by its unique NUMA node.
>
> Add a 'target-node' option for the 'nvdimm' device to indicate this NUMA
> node. It can be extended to a new node after all existing NUMA nodes.
>
> The 'node' option of the 'pc-dimm' device adds the DIMM to an
> existing NUMA node, so 'node' must be one of the available NUMA nodes.
> For KMEM DAX mode, persistent memory can instead live in a new, separate
> memory-only NUMA node that is created dynamically.
> Users therefore use 'target-node' to control whether persistent memory
> is added to an existing NUMA node or to a new NUMA node.
>
> An example configuration is as follows.
>
> Using the following QEMU command:
> -object memory-backend-file,id=nvmem1,share=on,mem-path=/dev/dax0.0,size=3G,align=2M
> -device nvdimm,id=nvdimm1,memdev=nvmem1,label-size=128K,target-node=2
>
> To list DAX devices:
> # daxctl list -u
> {
>   "chardev":"dax0.0",
>   "size":"3.00 GiB (3.22 GB)",
>   "target_node":2,
>   "mode":"devdax"
> }
>
> To create a namespace in Device-DAX mode as standard memory:
> $ ndctl create-namespace --mode=devdax --map=mem
> To reconfigure the DAX device from devdax mode to system-ram mode:
> $ daxctl reconfigure-device dax0.0 --mode=system-ram
>
> There are two existing NUMA nodes in the guest. After these operations,
> persistent memory is configured as a separate node 2 and
> can be used as volatile memory. This NUMA node is dynamically
> created according to 'target-node'.

Well, I've looked at the spec and the series pointed at in the v1 thread, and
I don't really see a good reason to add a duplicate 'target-node' property to
NVDIMM that for all practical purposes serves the same purpose as the already
existing 'node' property.

The only thing it does on top of the existing 'node' property is facilitate
implicit creation of NUMA nodes on top of the user-configured ones.

What I really dislike is adding an implicit path that creates NUMA nodes from
a random place. It just creates a mess and doesn't really work well, because
you will have to plumb into other code to account for the implicit nodes for
things to work properly. (The first thing that comes to mind is that HMAT
configuration won't accept these implicit nodes, see the HMAT sketch below;
there might be other places that will not work as expected.)

So I suggest abandoning this approach and using the already existing numa CLI
options to do what you need. What you are trying to achieve can be done
without this series, as QEMU allows creating memory-only nodes and even empty
ones (for future hotplug) just fine. The only thing is that one shall specify
the complete planned NUMA topology on the command line.
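To make the HMAT point concrete: hmat-lb entries are keyed to node ids that
must already have been declared with '-numa node', so a node created
implicitly by a device property would have nothing to attach them to. A
rough, untested sketch of how such a topology is normally spelled out (the
latency/bandwidth values are made up, the node layout mirrors the example
further down):

  -machine q35,nvdimm=on,hmat=on \
  -m 4G,slots=4,maxmem=12G \
  -smp 4,cores=2 \
  -object memory-backend-ram,size=4G,id=ram-node0 \
  -numa node,nodeid=0,memdev=ram-node0 \
  -numa cpu,node-id=0,socket-id=0 \
  -numa cpu,node-id=0,socket-id=1 \
  -numa node,nodeid=1,initiator=0 \
  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=10 \
  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G \
  -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=20 \
  -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5G

Note that with hmat=on the memory-only node 1 has to name an initiator
explicitly; that is exactly the kind of per-node configuration an implicitly
created node could not receive.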
Here is an example that works for me:

  -machine q35,nvdimm=on \
  -m 4G,slots=4,maxmem=12G \
  -smp 4,cores=2 \
  -object memory-backend-ram,size=4G,policy=bind,host-nodes=0,id=ram-node0 \
  -numa node,nodeid=0,memdev=ram-node0
  # explicitly assign all CPUs
  -numa cpu,node-id=0,socket-id=0
  -numa cpu,node-id=0,socket-id=1
  # and create a cpu-less node for your nvdimm
  -numa node,nodeid=1

With that you can hotplug an nvdimm into it with the 'node=1' property set,
or provide it at startup, like this:

  -object memory-backend-file,id=mem1,share=on,mem-path=nvdimmfile,size=3G,align=2M \
  -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K,node=1

After boot, numactl -H will show:

  available: 1 nodes (0)
  node 0 cpus: 0 1 2 3
  node 0 size: 3924 MB
  node 0 free: 3657 MB
  node distances:
  node   0
    0:  10

and after initializing the nvdimm as a dax device and reconfiguring it to
system memory, it will show up as a 'new' memory-only node:

  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3
  node 0 size: 3924 MB
  node 0 free: 3641 MB
  node 1 cpus:
  node 1 size: 896 MB
  node 1 free: 896 MB
  node distances:
  node   0   1
    0:  10  20
    1:  20  10

> Signed-off-by: Jingqi Liu <jingqi....@intel.com>
[...]
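PS: for the hotplug variant mentioned above, the same device can be added at
runtime from the HMP monitor, roughly like this (untested sketch, reusing the
same backend and device properties as the cold-plug example):

  (qemu) object_add memory-backend-file,id=mem1,share=on,mem-path=nvdimmfile,size=3G,align=2M
  (qemu) device_add nvdimm,id=nvdimm1,memdev=mem1,label-size=128K,node=1

The guest-side conversion is then the same as in your commit message
(ndctl create-namespace --mode=devdax --map=mem followed by
daxctl reconfigure-device --mode=system-ram), and the memory ends up in the
pre-declared node 1.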