On Tue, Jan 9, 2024 at 11:58 AM Gregory Price
<gregory.pr...@memverge.com> wrote:
>
> On Tue, Jan 09, 2024 at 11:33:04AM -0800, Hao Xiang wrote:
> > On Mon, Jan 8, 2024 at 5:13 PM Gregory Price <gregory.pr...@memverge.com> wrote:
> >
> > Sounds like the technical details are explained on the other thread.
> > From what I understand now, if we don't go through a complex CXL
> > setup, it wouldn't go through the emulation path.
> >
> > Here is our exact setup. Guest runs Linux kernel 6.6rc2
> >
> > taskset --cpu-list 0-47,96-143 \
> > numactl -N 0 -m 0 ${QEMU} \
> > -M q35,cxl=on,hmat=on \
> > -m 64G \
> > -smp 8,sockets=1,cores=8,threads=1 \
> > -object memory-backend-ram,id=ram0,size=45G \
> > -numa node,memdev=ram0,cpus=0-7,nodeid=0 \
> > -msg timestamp=on -L /usr/share/seabios \
> > -enable-kvm \
> > -object memory-backend-ram,id=vmem0,size=19G,host-nodes=${HOST_CXL_NODE},policy=bind \
> > -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> > -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> > -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0 \
> > -numa node,memdev=vmem0,nodeid=1 \
> > -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=19G,cxl-fmw.0.interleave-granularity=8k
>
> :] you did what i thought you did
>
> -numa node,memdev=vmem0,nodeid=1
>
> """
> Another possibility: You mapped this memory-backend into another numa
> node explicitly and never onlined the memory via cxlcli.  I've done
> this, and it works, but it's a "hidden feature" that probably should
> not exist / be supported.
> """
>
> You're mapping vmem0 into an explicit numa node *and* into the type3
> device.  You don't need to do both - and technically this shouldn't be
> allowed.
>
> With this configuration, if you go through the cxl-cli setup process
> for the CXL device, you'll find that you can create *another* node
> (node 2 in this case) that maps to the same memory you mapped to node 1.
>
>
> You can drop the cxl device objects in here and the memory will still
> come up the way you want it to.
>
> If you drop this line:
>
> -numa node,memdev=vmem0,nodeid=1

We tried this as well, and it works after going through the cxl-cli
process and creating the devdax device. The problem is that without the
"nodeid=1" configuration, we cannot attach the explicit per-NUMA-node
latency/bandwidth configuration ("-numa hmat-lb"). I glanced at the
code in hw/core/numa.c, and parse_numa_hmat_lb() looks like the function
that passes the lb information to the VM's HMAT.
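
For reference, here is a rough sketch of the kind of "-numa hmat-lb"
options I mean. The helper name hmat_lb_args and the latency/bandwidth
numbers are placeholders I made up for illustration, not measurements.
The option keys follow the QEMU documentation as far as I know, and
the example there also sets initiator= on the memory-only "-numa node"
line.

```shell
#!/bin/sh
# Hypothetical helper: print the two -numa hmat-lb options (latency and
# bandwidth) for one initiator/target node pair.
# latency is in nanoseconds; bandwidth takes a size suffix such as M or G.
hmat_lb_args() {
    initiator=$1
    target=$2
    latency_ns=$3
    bandwidth=$4
    printf '%s\n' \
        "-numa hmat-lb,initiator=${initiator},target=${target},hierarchy=memory,data-type=access-latency,latency=${latency_ns}" \
        "-numa hmat-lb,initiator=${initiator},target=${target},hierarchy=memory,data-type=access-bandwidth,bandwidth=${bandwidth}"
}

# Node 0 (CPUs + DRAM) accessing its own memory: placeholder values.
hmat_lb_args 0 0 100 200G
# Node 0 accessing the CXL-backed node 1: placeholder values.
hmat_lb_args 0 1 300 50G
```

Every one of these options names a target node ID, which is why they
only make sense if the node already exists on the QEMU command line.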

From what I understand so far:
1. The guest kernel dynamically creates a NUMA node after a CXL devdax
   device is created, so we don't know the NUMA node until after VM boot.
2. QEMU can only statically pass the lb information to the VM at boot
   time.

How do we connect these two things?

I assume the same issue applies to a physical server with CXL.
Were you able to see a host kernel get the correct lb information
for a CXL devdax device?
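
In case it helps, this is roughly how the lb information can be checked
from inside the guest (or on a host) once the nodes are online. It is
only a sketch: it assumes the kernel exposes the HMAT-derived
attributes under /sys/devices/system/node/nodeN/access0/initiators/,
and the function name show_node_perf is my own. Latencies should be in
nanoseconds and bandwidths in MiB/s, if I read the kernel docs right.

```shell
#!/bin/sh
# Print the access0 performance attributes for every NUMA node that has them.
# The optional first argument overrides the sysfs node directory, which is
# handy for trying the script against a fake directory tree.
show_node_perf() {
    root=${1:-/sys/devices/system/node}
    for node in "$root"/node[0-9]*; do
        # Skip the unexpanded glob when no node directories exist.
        [ -d "$node" ] || continue
        attrs="$node/access0/initiators"
        if [ ! -d "$attrs" ]; then
            echo "${node##*/}: no access0 attributes"
            continue
        fi
        for f in read_latency write_latency read_bandwidth write_bandwidth; do
            if [ -r "$attrs/$f" ]; then
                echo "${node##*/} $f=$(cat "$attrs/$f")"
            fi
        done
    done
}

show_node_perf "$@"
```

A dynamically created node that prints "no access0 attributes" would
be consistent with the lb information never reaching it.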

>
> You have to use the CXL driver to instantiate the dax device and the
> numa node, and at *that* point you will see the read/write functions
> being called.
>
> ~Gregory
