On 04/12/2013 08:58, Wanlong Gao wrote:
> As you know, QEMU can't direct its memory allocation now, which may cause
> a performance regression from cross-node memory accesses in the guest.
> Worse, if PCI passthrough is used, the directly attached device uses DMA
> transfers between the device and the QEMU process, so all pages of the
> guest will be pinned by get_user_pages():
>
> KVM_ASSIGN_PCI_DEVICE ioctl
>   kvm_vm_ioctl_assign_device()
>     => kvm_assign_device()
>       => kvm_iommu_map_memslots()
>         => kvm_iommu_map_pages()
>           => kvm_pin_pages()
>
> So, with a directly attached device, the page count of every guest page is
> incremented by one and page migration will not work. AutoNUMA won't work
> either.
>
> So, we should set the memory allocation policy of the guest nodes before
> the pages are actually mapped.
>
> With this patch set, we can set the guest nodes' memory policy as
> follows:
>
> -numa node,nodeid=0,cpus=0, \
> -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
> -numa node,nodeid=1,cpus=1 \
> -numa mem,size=1024M,policy=interleave,host-nodes=1
>
> This supports a format like
> "policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N".
>
> It also adds a QMP command, "query-numa", to show NUMA info through
> this API, and converts the "info numa" monitor command to use the
> "query-numa" QMP command.
>
> This version temporarily removes the "set-mem-policy" QMP and HMP
> commands, as Marcelo and Paolo suggested.
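The mechanism described above boils down to applying a per-node policy to a range of guest RAM with mbind(2) before any page in it is faulted in or pinned. What follows is a minimal C sketch under stated assumptions, not the patch set's actual code: the `sketch_*` names are hypothetical, the host-nodes parser only handles a single `N` or `N-M` range, mapping `relative=true` to `MPOL_F_RELATIVE_NODES` is an assumption, and the mode constants are copied from `<linux/mempolicy.h>` so that no libnuma is needed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values mirrored from <linux/mempolicy.h> (assumption: copied here so the
 * sketch needs neither libnuma nor kernel headers). */
#define SKETCH_MPOL_DEFAULT          0
#define SKETCH_MPOL_PREFERRED        1
#define SKETCH_MPOL_BIND             2
#define SKETCH_MPOL_INTERLEAVE       3
#define SKETCH_MPOL_F_RELATIVE_NODES (1 << 14)

/* Hypothetical limit, chosen to match MAX_NODES=128 from the changelog. */
#define SKETCH_MAX_NODES 128
#define SKETCH_BITS_PER_ULONG (8 * sizeof(unsigned long))
#define SKETCH_NODE_WORDS \
    ((SKETCH_MAX_NODES + SKETCH_BITS_PER_ULONG - 1) / SKETCH_BITS_PER_ULONG)

/* Map a policy= keyword to a kernel mode; returns -1 if it is unknown. */
int sketch_parse_policy(const char *name)
{
    static const struct { const char *name; int mode; } table[] = {
        { "default",    SKETCH_MPOL_DEFAULT },
        { "preferred",  SKETCH_MPOL_PREFERRED },
        { "membind",    SKETCH_MPOL_BIND },
        { "interleave", SKETCH_MPOL_INTERLEAVE },
    };
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(name, table[i].name) == 0) {
            return table[i].mode;
        }
    }
    return -1;
}

/* Parse "N" or "N-M" into a host node bitmap; returns 0 or -1. */
int sketch_parse_host_nodes(const char *s, unsigned long mask[SKETCH_NODE_WORDS])
{
    char *end;
    errno = 0;
    unsigned long first = strtoul(s, &end, 10);
    if (end == s || errno != 0) {
        return -1;
    }
    unsigned long last = first;
    if (*end == '-') {
        const char *p = end + 1;
        last = strtoul(p, &end, 10);
        if (end == p || errno != 0) {
            return -1;
        }
    }
    if (*end != '\0' || first > last || last >= SKETCH_MAX_NODES) {
        return -1;
    }
    memset(mask, 0, SKETCH_NODE_WORDS * sizeof(unsigned long));
    for (unsigned long n = first; n <= last; n++) {
        mask[n / SKETCH_BITS_PER_ULONG] |= 1UL << (n % SKETCH_BITS_PER_ULONG);
    }
    return 0;
}

/* Bind [addr, addr + len) before the range is touched, so the first fault,
 * and any later pinning by get_user_pages(), lands on the chosen nodes.
 * With flags == 0, pages that are already resident are not moved. */
int sketch_apply_policy(void *addr, size_t len, int mode, bool relative,
                        const unsigned long mask[SKETCH_NODE_WORDS])
{
    /* mbind(2) requires a page-aligned start address. */
    assert((uintptr_t)addr % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);

    const unsigned long *nodes = NULL;
    unsigned long maxnode = 0;          /* MPOL_DEFAULT takes an empty set */
    if (mode != SKETCH_MPOL_DEFAULT) {
        nodes = mask;
        /* The kernel reads only maxnode - 1 bits, so pass one extra. */
        maxnode = SKETCH_MAX_NODES + 1;
        if (relative) {
            mode |= SKETCH_MPOL_F_RELATIVE_NODES;
        }
    }
    if (syscall(SYS_mbind, addr, (unsigned long)len, (unsigned long)mode,
                nodes, maxnode, 0UL) != 0) {
        return -errno;
    }
    return 0;
}
```

In this sketch, a caller would mmap() the guest RAM block, parse `policy=` and `host-nodes=`, and call `sketch_apply_policy()` on the block before the guest (or a device assignment) touches it; the effect can then be observed on the host with `numactl -H`. Calling mbind(2) directly through syscall() rather than linking libnuma is only one possible choice.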
>
> A simple test follows:
> =====================================================
> Before:
> # numactl -H && /qemu/x86_64-softmmu/qemu-system-x86_64 -m 4096 -smp 2 -numa
> node,nodeid=0,cpus=0,mem=2048 -numa node,nodeid=1,cpus=1,mem=2048 -hda
> 6u4ga2.qcow2 -enable-kvm -device
> pci-assign,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x7 & sleep 40 && numactl
> -H
> [1] 13320
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 4653 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 4764 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 4317 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 876 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
>
> After:
> # numactl -H && /qemu/x86_64-softmmu/qemu-system-x86_64 -m 4096 -smp 4 -numa
> node,nodeid=0,cpus=0,cpus=2 -numa mem,size=2048M,policy=membind,host-nodes=0
> -numa node,nodeid=0,cpus=1,cpus=3 -numa
> mem,size=2048M,policy=membind,host-nodes=1 -hda 6u4ga2.qcow2 -enable-kvm
> -device pci-assign,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x7 & sleep 40 &&
> numactl -H
> [1] 10862
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 4718 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 4799 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 2544 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 2725 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
> ===================================================
>
> V1->V2:
>   change to use QemuOpts in numa options (Paolo)
>   handle Error in mpol parser (Paolo)
>   change qmp command format to be like mem-policy=membind,mem-hostnode=0-1
>   (Paolo)
> V2->V3:
>   also handle Error in cpus parser (5/10)
>   split out common parser from cpus and hostnode parser (Bandan 6/10)
> V3->V4:
>   rebase to request for comments
> V4->V5:
>   use OptsVisitor and split -numa option (Paolo)
>     - s/set-mpol/set-mem-policy (Andreas)
>     - s/mem-policy/policy
>     - s/mem-hostnode/host-nodes
>   fix hmp command process after error (Luiz)
>   add qmp command query-numa and convert info numa to it (Luiz)
> V5->V6:
>   remove tabs in json file (Laszlo, Paolo)
>   add back "-numa node,mem=xxx" as legacy (Paolo)
>   change cpus and host-nodes to array (Laszlo, Eric)
>   change "nodeid" to "uint16"
>   add NumaMemPolicy enum type (Eric)
>   rebased on Laszlo's "OptsVisitor: support / flatten integer ranges for
>   repeating options" patch set, thanks for Laszlo's help
> V6->V7:
>   change UInt16 to uint16 (Laszlo)
>   fix a typo in adding qmp command set-mem-policy
> V7->V8:
>   rebase to current master with Laszlo's V2 of OptsVisitor patch set
>   fix an adding white space line error
> V8->V9:
>   rebase to current master
>   check if total numa memory size is equal to ram_size (Paolo)
>   add comments to the OptsVisitor stuff in qapi-schema.json (Eric, Laszlo)
>   replace the use of numa_num_configured_nodes() (Andrew)
>   avoid abusing the fact i==nodeid (Andrew)
> V9->V10:
>   rebase to current master
>   remove libnuma (Andrew)
>   MAX_NODES=64 -> MAX_NODES=128 since libnuma selected 128 (Andrew)
>   use MAX_NODES instead of MAX_CPUMASK_BITS for host_mem bitmap (Andrew)
>   remove a useless clear_bit() operation (Andrew)
> V10->V11:
>   rebase to current master
>   fix "maxnode" argument of mbind(2)
> V11->V12:
>   rebase to current master
>   split patch 02/11 of V11 (Eduardo)
>   add some max value check (Eduardo)
>   split MAX_NODES change patch (Eduardo)
> V12->V13:
>   rebase to current master
>   thanks for Luiz's review (Luiz)
>   doc hmp command set-mem-policy (Luiz)
>   rename: NUMAInfo -> NUMANode (Luiz)
> V13->V14:
>   remove "set-mem-policy" qmp and hmp commands (Marcelo, Paolo)
> V14->V15:
>   rebase to the current master
> V15->V16:
>   rebase to current master
>   add more test log
> V16->V17:
>   use MemoryRegion to set policy instead of using "pc.ram" (Paolo)
>
> Wanlong Gao (11):
>   NUMA: move numa related code to new file numa.c
>   NUMA: check if the total numa memory size is equal to ram_size
>   NUMA: Add numa_info structure to contain numa nodes info
>   NUMA: convert -numa option to use OptsVisitor
>   NUMA: introduce NumaMemOptions
>   NUMA: add "-numa mem," options
>   NUMA: expand MAX_NODES from 64 to 128
>   NUMA: parse guest numa nodes memory policy
>   NUMA: set guest numa nodes memory policy
>   NUMA: add qmp command query-numa
>   NUMA: convert hmp command info_numa to use qmp command query_numa
>
>  Makefile.target         |   2 +-
>  cpus.c                  |  14 --
>  hmp.c                   |  57 +++++++
>  hmp.h                   |   1 +
>  hw/i386/pc.c            |  21 ++-
>  include/exec/memory.h   |  15 ++
>  include/sysemu/cpus.h   |   1 -
>  include/sysemu/sysemu.h |  18 ++-
>  monitor.c               |  21 +--
>  numa.c                  | 408 ++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json        | 112 +++++++++++++
>  qemu-options.hx         |   6 +-
>  qmp-commands.hx         |  49 ++++++
>  vl.c                    | 160 +++----------------
>  14 files changed, 698 insertions(+), 187 deletions(-)
>  create mode 100644 numa.c
>
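The V8->V9 changelog entry adds a check that the per-node memory sizes add up to ram_size. A minimal C sketch of such a check follows; the function name is hypothetical, sizes are assumed to be in bytes, and the real code in numa.c may differ, for example in how it reports the mismatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true iff the per-node sizes add up exactly to
 * ram_size. The running total never exceeds ram_size, so the
 * subtraction below cannot underflow and the sum cannot overflow. */
bool sketch_numa_mem_matches_ram(const uint64_t *node_mem, size_t nb_nodes,
                                 uint64_t ram_size)
{
    assert(node_mem != NULL || nb_nodes == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < nb_nodes; i++) {
        if (node_mem[i] > ram_size - total) {
            return false;   /* the nodes already claim more than ram_size */
        }
        total += node_mem[i];
    }
    return total == ram_size;
}
```

For the "After" run in the test log (`-m 4096` with two `size=2048M` nodes), `{2048ULL << 20, 2048ULL << 20}` checked against `4096ULL << 20` passes, whereas the two `size=1024M` nodes from the first example would only pass with a 2048M ram_size.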
I think patches 1-4 and 7 are fine. For the rest, I'd rather wait for Igor's memory hotplug patches and try to integrate with them.

Paolo