Re: [PATCH v5 09/13] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

2024-04-26 Thread Gregory Price
On Fri, Apr 26, 2024 at 04:55:55PM +0100, Jonathan Cameron wrote:
> On Wed, 24 Apr 2024 10:33:33 -0700
> Ira Weiny  wrote:
> 
> > Markus Armbruster wrote:
> > > nifan@gmail.com writes:
> > >   
> > > > From: Fan Ni 
> > > >
> > > > Since fabric manager emulation is not supported yet, the change 
> > > > implements
> > > > the functions to add/release dynamic capacity extents as QMP 
> > > > interfaces.  
> > > 
> > > Will fabric manager emulation obsolete these commands?  
> > 
> > I don't think so.  In the development of the kernel, I see these being
> > valuable to do CI and regression testing without the complexity of an FM.
> 
> Fully agree - I also long term see these as the drivers for one
> possible virtualization stack for DCD devices (whether it turns
> out to be the way forwards for that is going to take a while to
> resolve!)
> 
> It doesn't make much sense to add a fabric manager into that flow
> or to expose an appropriate (maybe MCTP) interface from QEMU just
> to poke the emulated device.
> 
> Jonathan
> 

FWIW, it's useful for modeling the Orchestrator/Fabric Manager interaction,
since you can basically build a little emulated MHD FM-LD on top of this.

You just put a tiny software layer in front that converts what would be
MCTP (or whatever) commands into QMP commands forwarded to the
appropriate socket.

When a real device comes around, you just point it at the real thing
instead of that small software layer.

But for the actual fabric manager, it's less useful. (Also, if you're
confused, it's because "fabric manager" is such an overloaded term
*laughcry*)
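
To make the shim idea concrete, here is a minimal, self-contained C sketch of
that forwarding layer: it connects to a QEMU QMP unix socket, negotiates
capabilities, and forwards a command. The socket path and the use of
query-status as the forwarded command are illustrative assumptions, not part
of the DCD series; a real FM-LD shim would translate incoming FM-API/MCTP
requests into the DCD QMP commands added by this series.

/*
 * Minimal sketch of an FM-LD shim talking to QEMU over QMP.
 * The socket path and the forwarded command are assumptions for
 * illustration only.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

static int send_str(int fd, const char *s)
{
    return write(fd, s, strlen(s)) < 0 ? -1 : 0;
}

static int qmp_connect(const char *path)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    char buf[4096];
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        read(fd, buf, sizeof(buf)) <= 0 ||                /* QMP greeting */
        send_str(fd, "{ \"execute\": \"qmp_capabilities\" }\n") < 0 ||
        read(fd, buf, sizeof(buf)) <= 0) {                /* response */
        close(fd);
        return -1;
    }
    return fd;
}

int main(void)
{
    /* Start QEMU with e.g. -qmp unix:/tmp/qmp-head0.sock,server=on,wait=off */
    int fd = qmp_connect("/tmp/qmp-head0.sock");
    char buf[4096];
    ssize_t n;

    if (fd < 0) {
        return 1;
    }
    /*
     * A real shim would build the add/release dynamic-capacity commands
     * from the incoming FM request; query-status just shows the plumbing.
     */
    send_str(fd, "{ \"execute\": \"query-status\" }\n");
    n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("%s", buf);
    }
    close(fd);
    return 0;
}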

~Gregory



Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu

2024-04-22 Thread Gregory Price
On Mon, Apr 22, 2024 at 01:04:48PM +0100, Jonathan Cameron wrote:
> On Sat, 20 Apr 2024 16:35:46 -0400
> Gregory Price  wrote:
> 
> > On Fri, Apr 19, 2024 at 11:43:14AM -0700, fan wrote:
> > > On Fri, Apr 19, 2024 at 02:24:36PM -0400, Gregory Price wrote:  
> > > > 
> > > > Added review to all patches; I will hopefully be able to add a Tested-by
> > > > tag early next week, along with a v1 RFC for MHD bit-tracking.
> > > > 
> > > > We've been testing v5/v6 for a bit, so I expect as soon as we get the
> > > > MHD code ported over to v7 I'll ship a Tested-by tag pretty quick.
> > > > 
> > > > The super-set release will complicate a few things but this doesn't
> > > > look like a blocker on our end, just a change to how we track bits in a
> > > > shared bit/bytemap.
> > > >   
> > > 
> > > Hi Gregory,
> > > Thanks for reviewing the patches so quickly. 
> > > 
> > > No pressure, but look forward to your MHD work. :)
> > > 
> > > Fan  
> > 
> > Starting to get into versioning hell a bit, since the Niagara work was
> > based off of Jonathan's branch and the mhd-dcd work needs some of the
> > extensions from that branch - while this branch is based on master.
> > 
> > Probably we'll need to wait for a new cxl dated branch to try and suss
> > out the pain points before we push an RFC.  I would not want to have
> > conflicting commits for something like this, for example:
> > 
> > https://lore.kernel.org/qemu-devel/20230901012914.226527-2-gregory.pr...@memverge.com/
> > 
> > We get merge conflicts here because this is behind that patch. So
> > pushing up an RFC in this state would be mostly useless to everyone.
> 
> Subtle hint noted ;) 
>

Gentle nudge/poke/prod :P

Got your updates, thank you!  We should have something cleaned up today 
hopefully.

> I'll build a fresh tree - any remaining rebases until QEMU 9.0 should be
> straight forward anyway.   My ideal is that the NUMA GP series lands early
> in 9.1 cycle and this can go in parallel.  I'd really like to
> get this in early if possible so we can start clearing some of the other
> stuff that ended up built on top of it!
> 
> Jonathan
> 
> > 
> > ~Gregory
> 



Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu

2024-04-20 Thread Gregory Price
On Fri, Apr 19, 2024 at 11:43:14AM -0700, fan wrote:
> On Fri, Apr 19, 2024 at 02:24:36PM -0400, Gregory Price wrote:
> > 
> > Added review to all patches; I will hopefully be able to add a Tested-by
> > tag early next week, along with a v1 RFC for MHD bit-tracking.
> > 
> > We've been testing v5/v6 for a bit, so I expect as soon as we get the
> > MHD code ported over to v7 I'll ship a Tested-by tag pretty quick.
> > 
> > The super-set release will complicate a few things but this doesn't
> > look like a blocker on our end, just a change to how we track bits in a
> > shared bit/bytemap.
> > 
> 
> Hi Gregory,
> Thanks for reviewing the patches so quickly. 
> 
> No pressure, but look forward to your MHD work. :)
> 
> Fan

Starting to get into versioning hell a bit, since the Niagara work was
based off of Jonathan's branch and the mhd-dcd work needs some of the
extensions from that branch - while this branch is based on master.

Probably we'll need to wait for a new cxl dated branch to try and suss
out the pain points before we push an RFC.  I would not want to have
conflicting commits for something like this, for example:

https://lore.kernel.org/qemu-devel/20230901012914.226527-2-gregory.pr...@memverge.com/

We get merge conflicts here because this is behind that patch. So
pushing up an RFC in this state would be mostly useless to everyone.

~Gregory



Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:10:51PM -0700, nifan@gmail.com wrote:
> A git tree of this series can be found here (with one extra commit on top
> for printing out accepted/pending extent list): 
> https://github.com/moking/qemu/tree/dcd-v7
> 
> v6->v7:
> 
> 1. Fixed the dvsec range register issue mentioned in the cover letter in 
> v6.
>Only relevant bits are set to mark the device ready (Patch 6). (Jonathan)
> 2. Moved the if statement in cxl_setup_memory from Patch 6 to Patch 4. 
> (Jonathan)
> 3. Used MIN instead of if statement to get record_count in Patch 7. (Jonathan)
> 4. Added "Reviewed-by" tag to Patch 7.
> 5. Modified cxl_dc_extent_release_dry_run so the updated extent list can be
>reused in cmd_dcd_release_dyn_cap to simplify the process in Patch 8. 
> (Jørgen) 
> 6. Added comments to indicate further "TODO" items in cmd_dcd_add_dyn_cap_rsp.
> (Jonathan)
> 7. Avoided irrelevant code reformat in Patch 8. (Jonathan)
> 8. Modified QMP interfaces for adding/releasing DC extents to allow passing
>tags, selection policy, flags in the interface. (Jonathan, Gregory)
> 9. Redesigned the pending list so extents in the same requests are grouped
> together. A new data structure is introduced to represent "extent group"
> in pending list.  (Jonathan)
> 10. Added support in QMP interface for "More" flag. 
> 11. Check "Forced removal" flag for release request and not let it pass 
> through.
> 12. Removed the dynamic capacity log type from CxlEventLog definition in 
> cxl.json
>to avoid the side effect it may introduce to inject error to DC event log.
>(Jonathan)
> 13. Hard coded the event log type to dynamic capacity event log in QMP
> interfaces. (Jonathan)
> 14. Adding space in between "-1]". (Jonathan)
> 15. Some minor comment fixes.
> 
> The code is tested with similar setup and has passed similar tests as listed
> in the cover letter of v5[1] and v6[2].
> Also, the code is tested with the latest DCD kernel patchset[3].
> 
> [1] Qemu DCD patchset v5: 
> https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan@gmail.com/T/#t
> [2] Qemu DCD patchset v6: 
> https://lore.kernel.org/linux-cxl/20240325190339.696686-1-nifan@gmail.com/T/#t
> [3] DCD kernel patches: 
> https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623...@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3
>

Added review to all patches; I will hopefully be able to add a Tested-by
tag early next week, along with a v1 RFC for MHD bit-tracking.

We've been testing v5/v6 for a bit, so I expect as soon as we get the
MHD code ported over to v7 I'll ship a Tested-by tag pretty quick.

The super-set release will complicate a few things but this doesn't
look like a blocker on our end, just a change to how we track bits in a
shared bit/bytemap.

> 
> Fan Ni (12):
>   hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output
> payload of identify memory device command
>   hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative
> and mailbox command support
>   include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for
> type3 memory devices
>   hw/mem/cxl_type3: Add support to create DC regions to type3 memory
> devices
>   hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr
> size instead of mr as argument
>   hw/mem/cxl_type3: Add host backend and address space handling for DC
> regions
>   hw/mem/cxl_type3: Add DC extent list representative and get DC extent
> list mailbox support
>   hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release
> dynamic capacity response
>   hw/cxl/events: Add qmp interfaces to add/release dynamic capacity
> extents
>   hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
>   hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
>   hw/mem/cxl_type3: Allow to release extent superset in QMP interface
> 
>  hw/cxl/cxl-mailbox-utils.c  | 620 ++-
>  hw/mem/cxl_type3.c  | 633 +---
>  hw/mem/cxl_type3_stubs.c|  20 ++
>  include/hw/cxl/cxl_device.h |  81 -
>  include/hw/cxl/cxl_events.h |  18 +
>  qapi/cxl.json   |  69 
>  6 files changed, 1396 insertions(+), 45 deletions(-)
> 
> -- 
> 2.43.0
> 



Re: [PATCH v7 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:11:03PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> Before the change, the QMP interface used for add/release DC extents
> only allows releasing an extent whose DPA range is contained by a single
> accepted extent in the device.
> 
> With the change, we relax the constraints.  As long as the DPA range of
> the extent is covered by accepted extents, we allow the release.
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Fan Ni 
> ---
>  hw/mem/cxl_type3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Reviewed-by: Gregory Price 

> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index a3e1a5de25..9e725647f1 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -1941,7 +1941,7 @@ static void 
> qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
> "cannot release extent with pending DPA range");
>  return;
>  }
> -if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
> +if (!ct3_test_region_block_backed(dcd, dpa, len)) {
>  error_setg(errp,
> "cannot release extent with non-existing DPA range");
>  return;
> -- 
> 2.43.0
> 



Re: [PATCH v7 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:11:02PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> With the change, we extend the extent release mailbox command processing
> to allow more flexible release. As long as the DPA range of the extent to
> release is covered by accepted extent(s) in the device, the release can be
> performed.
> 
> Signed-off-by: Fan Ni 
> ---
>  hw/cxl/cxl-mailbox-utils.c | 21 -
>  1 file changed, 8 insertions(+), 13 deletions(-)
>

Hmmm.  This will complicate MHD accounting, but it looks ok to me as-is.

Reviewed-by: Gregory Price 

> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 57f1ce9cce..89f0ab8116 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -1704,6 +1704,13 @@ static CXLRetCode 
> cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
>  dpa = in->updated_entries[i].start_dpa;
>  len = in->updated_entries[i].len;
>  
> +/* Check if the DPA range is not fully backed with valid extents */
> +if (!ct3_test_region_block_backed(ct3d, dpa, len)) {
> +ret = CXL_MBOX_INVALID_PA;
> +goto free_and_exit;
> +}
> +
> +/* After this point, extent overflow is the only error can happen */
>  while (len > 0) {
>  QTAILQ_FOREACH(ent, updated_list, node) {
>  range_init_nofail(&range, ent->start_dpa, ent->len);
> @@ -1718,14 +1725,7 @@ static CXLRetCode 
> cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
>  if (range_contains(&range, dpa + len - 1)) {
>  len2 = ent_start_dpa + ent_len - dpa - len;
>  } else {
> -/*
> - * TODO: we reject the attempt to remove an extent
> - * that overlaps with multiple extents in the device
> - * for now. We will allow it once superset release
> - * support is added.
> - */
> -ret = CXL_MBOX_INVALID_PA;
> -goto free_and_exit;
> +dpa = ent_start_dpa + ent_len;
>  }
>  len_done = ent_len - len1 - len2;
>  
> @@ -1752,14 +1752,9 @@ static CXLRetCode 
> cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
>  }
>  
>  len -= len_done;
> -/* len == 0 here until superset release is added */
>  break;
>  }
>  }
> -if (len) {
> -ret = CXL_MBOX_INVALID_PA;
> -goto free_and_exit;
> -}
>  }
>  }
>  free_and_exit:
> -- 
> 2.43.0
> 



Re: [PATCH v7 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:10:59PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> Per CXL spec 3.1, two mailbox commands are implemented:
> Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
> Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.
> 
> For processing the above two commands, we use a two-pass approach.
> Pass 1: Check whether the input payload is valid or not; if not, skip
> Pass 2 and return mailbox process error.
> Pass 2: Do the real work--add or release extents, respectively.
> 
> Signed-off-by: Fan Ni 
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 394 
>  hw/mem/cxl_type3.c  |  11 +
>  include/hw/cxl/cxl_device.h |   4 +
>  3 files changed, 409 insertions(+)
> 

Reviewed-by: Gregory Price 



Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:11:00PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With the change, we allow releasing an extent only when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
...
> 
> Signed-off-by: Fan Ni 
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  62 +--
>  hw/mem/cxl_type3.c  | 311 +++-
>  hw/mem/cxl_type3_stubs.c|  20 +++
>  include/hw/cxl/cxl_device.h |  22 +++
>  include/hw/cxl/cxl_events.h |  18 +++
>  qapi/cxl.json   |  69 
>  6 files changed, 489 insertions(+), 13 deletions(-)
> 

Reviewed-by: Gregory Price 



Re: [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:10:57PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> Add (file/memory backed) host backend for DCD. All the dynamic capacity
> regions will share a single, large enough host backend. Set up address
> space for DC regions to support read/write operations to dynamic capacity
> for DCD.
> 
> With the change, the following support is added:
> 1. Add a new property to type3 device "volatile-dc-memdev" to point to host
>memory backend for dynamic capacity. Currently, all DC regions share one
>host backend;
> 2. Add namespace for dynamic capacity for read/write support;
> 3. Create cdat entries for each dynamic capacity region.
> 
> Signed-off-by: Fan Ni 
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  16 ++--
>  hw/mem/cxl_type3.c  | 172 +---
>  include/hw/cxl/cxl_device.h |   8 ++
>  3 files changed, 160 insertions(+), 36 deletions(-)
> 

A couple general comments in line for discussion, but patch looks good
otherwise. Notes are mostly on improvements we could make that should
not block this patch.

Reviewed-by: Gregory Price 

>  
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index a1fe268560..ac87398089 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -45,7 +45,8 @@ enum {
>  
>  static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
>int dsmad_handle, uint64_t size,
> -  bool is_pmem, uint64_t dpa_base)
> +  bool is_pmem, bool is_dynamic,
> +  uint64_t dpa_base)

We should probably change the is_* fields into a flags field and do some
error checking on the combination of flags.

>  {
>  CDATDsmas *dsmas;
>  CDATDslbis *dslbis0;
> @@ -61,7 +62,8 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader 
> **cdat_table,
>  .length = sizeof(*dsmas),
>  },
>  .DSMADhandle = dsmad_handle,
> -.flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
> +.flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
> + (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),

For example, as noted elsewhere in the code, is_pmem+is_dynamic is not
presently supported, so this shouldn't even be allowed in this function.
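
As a rough sketch of that suggestion (the CT3_MR_FLAG_* names here are made
up for illustration, not identifiers from the series), the validation could
look something like:

#include <stdbool.h>

enum {
    CT3_MR_FLAG_PMEM    = 1 << 0,   /* illustrative names only */
    CT3_MR_FLAG_DYNAMIC = 1 << 1,
};

static bool ct3_mr_flags_valid(unsigned flags)
{
    /* pmem + dynamic capacity is not a supported combination today */
    return (flags & (CT3_MR_FLAG_PMEM | CT3_MR_FLAG_DYNAMIC)) !=
           (CT3_MR_FLAG_PMEM | CT3_MR_FLAG_DYNAMIC);
}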

> +if (dc_mr) {
> +int i;
> +uint64_t region_base = vmr_size + pmr_size;
> +
> +/*
> + * TODO: we assume the dynamic capacity to be volatile for now.
> + * Non-volatile dynamic capacity will be added if needed in the
> + * future.
> + */

Probably don't need to mark this TODO, can just leave it as a note.

Non-volatile dynamic capacity will coincide with shared memory, so it'll
end up handled.  So this isn't really a TODO for this current work, and
should read more like:

"Dynamic Capacity is always volatile, until shared memory is
implemented"

> +} else if (ct3d->hostpmem) {
>  range1_size_hi = ct3d->hostpmem->size >> 32;
>  range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
>   (ct3d->hostpmem->size & 0xF000);
> +} else {
> +/*
> + * For DCD with no static memory, set memory active, memory class 
> bits.
> + * No range is set.
> + */
> +range1_size_lo = (2 << 5) | (2 << 2) | 0x3;

We should probably add defs for these fields at some point. Can be
tabled for later work though.

> +/*
> + * TODO: set dc as volatile for now, non-volatile support can be 
> added
> + * in the future if needed.
> + */
> +memory_region_set_nonvolatile(dc_mr, false);

Again can probably drop the TODO and just leave a statement.

~Gregory



Re: [PATCH v7 10/12] hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:11:01PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> All DPA ranges in the DC regions are invalid to access until an extent
> covering the range has been successfully accepted by the host. A bitmap
> is added to each region to record whether a DC block in the region has
> been backed by a DC extent. Each bit in the bitmap represents a DC block.
> When a DC extent is accepted, all the bits representing the blocks in the
> extent are set, which will be cleared when the extent is released.
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Fan Ni 
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  3 ++
>  hw/mem/cxl_type3.c  | 76 +
>  include/hw/cxl/cxl_device.h |  7 
>  3 files changed, 86 insertions(+)
> 

Reviewed-by: Gregory Price 




Re: [PATCH v7 07/12] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:10:58PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> Add dynamic capacity extent list representative to the definition of
> CXLType3Dev and implement get DC extent list mailbox command per
> CXL.spec.3.1:.8.2.9.9.9.2.
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Fan Ni 
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 73 -
>  hw/mem/cxl_type3.c  |  1 +
>  include/hw/cxl/cxl_device.h | 22 +++
>  3 files changed, 95 insertions(+), 1 deletion(-)
> 

Reviewed-by: Gregory Price 



Re: [PATCH v7 04/12] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:10:55PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> With the change, when setting up memory for type3 memory device, we can
> create DC regions.
> A property 'num-dc-regions' is added to ct3_props to allow users to pass the
> number of DC regions to create. To make it easier, other region parameters
> like region base, length, and block size are hard coded. If needed,
> these parameters can be added easily.
> 
> With the change, we can create DC regions with proper kernel side
> support like below:
> 
> region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> echo $region > /sys/bus/cxl/devices/decoder0.0/create_dc_region
> echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> 
> echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
> echo 0x4000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
> 
> echo 0x4000 > /sys/bus/cxl/devices/$region/size
> echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> echo 1 > /sys/bus/cxl/devices/$region/commit
> echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Fan Ni 
> ---
>  hw/mem/cxl_type3.c | 49 ++
>  1 file changed, 49 insertions(+)
> 

Reviewed-by: Gregory Price 



Re: [PATCH v7 03/12] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:10:54PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> Rename mem_size as static_mem_size for type3 memdev to cover static RAM and
> pmem capacity, preparing for the introduction of dynamic capacity to support
> dynamic capacity devices.
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Fan Ni 
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 4 ++--
>  hw/mem/cxl_type3.c  | 8 
>  include/hw/cxl/cxl_device.h | 2 +-
>  3 files changed, 7 insertions(+), 7 deletions(-)
> 

Reviewed-by: Gregory Price 




Re: [PATCH v7 02/12] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:10:53PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> Per cxl spec r3.1, add dynamic capacity region representative based on
> Table 8-165 and extend the cxl type3 device definition to include DC region
> information. Also, based on info in 8.2.9.9.9.1, add 'Get Dynamic Capacity
> Configuration' mailbox support.
> 
> Note: we store region decode length as byte-wise length on the device, which
> should be divided by 256 * MiB before being returned to the host
> for "Get Dynamic Capacity Configuration" mailbox command per
> specification.
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Fan Ni 
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 96 +
>  include/hw/cxl/cxl_device.h | 16 +++
>  2 files changed, 112 insertions(+)
> 

Reviewed-by: Gregory Price 



Re: [PATCH v7 01/12] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:10:52PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> Based on CXL spec r3.1 Table 8-127 (Identify Memory Device Output
> Payload), dynamic capacity event log size should be part of
> output of the Identify command.
> Add dc_event_log_size to the output payload for the host to get the info.
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Fan Ni 
> ---
>  hw/cxl/cxl-mailbox-utils.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 

Reviewed-by: Gregory Price 



Re: [PATCH v7 05/12] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument

2024-04-19 Thread Gregory Price
On Thu, Apr 18, 2024 at 04:10:56PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> The function ct3_build_cdat_entries_for_mr only uses size of the passed
> memory region argument, refactor the function definition to make the passed
> arguments more specific.
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Fan Ni 
> ---
>  hw/mem/cxl_type3.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 5ceed0ab4c..a1fe268560 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -44,7 +44,7 @@ enum {
>  };

Reviewed-by: Gregory Price 



Re: [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions

2024-04-18 Thread Gregory Price
On Wed, Apr 17, 2024 at 12:59:51PM +0100, Jonathan Cameron wrote:
> On Tue, 16 Apr 2024 09:37:09 -0700
> fan  wrote:
> 
> > 
> > Currently, we do not allow releasing an extent when it is still pending,
> > which aligns with the case you mentioned above "not release for reuse", I
> > think.
> > Can the second add mean a retry instead of reuse? 
> No - or at least the device should not be doing that.  The FM might try
> again, but only once it knows try 1 failed. For reasons of this aligning
> with MHD case where you definitely can't offer it to more than one host,
> I think we should not do it.  Whether we should put any effort into blocking
> it is a different question.  User error :)
> 
> Note, the host must not remove a log entry until it has dealt with it
> (sent a response) so there is no obvious reason to bother with a retry.
> Maybe a booting host would reject all offered extents (because it's not ready
> for them yet), but then I'd want the FM to explicitly decide to tell the
> device to offer it again.
> 

This might be the only time a forced-removal makes sense, considering
that removal of a pending add could be potentially catastrophic, but if
the FM knows the host is down and never coming back up, an
allocation stuck in pending would not be recoverable unless you
force-removed it.

> Whilst this is a custom interface, the equivalent FM API does say.
> 
> "The command, with selection policy Enable Shared Access, shall also fail 
> with Invalid
> Input under the following conditions:
> • When the specified region is not Sharable
> • When the tagged capacity is already mapped to any Host ID via a non-Sharable
> region
> • When the tagged capacity cannot be added to the requested region due to
> device-imposed restrictions
> • When the same tagged capacity is currently accessible by the same LD"
> 
> Little fuzzy because of the whole pending vs 'mapped / accessible' wording but
> I think intent is you can't send again until first one is dealt with.
> 
> Jonathan
> 
> > 
> > Fan
> > 
> > > 
> > >   
> > > > From the kernel side, if the first one is accepted, the second one will
> > > > get rejected, and there is no issue there.
> > > > If the first is reject for some reason, the second one can get
> > > > accepted or rejected and do not need to worry about the first one.
> > > > 
> > > > 
> > > > Fan
> > > >   
> > >   
> 



Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

2024-04-16 Thread Gregory Price
On Tue, Apr 16, 2024 at 03:58:22PM +0100, Jonathan Cameron wrote:
> On Mon, 15 Apr 2024 13:06:04 -0700
> fan  wrote:
> 
> > From ce75be83e915fbc4dd6e489f976665b81174002b Mon Sep 17 00:00:00 2001
> > From: Fan Ni 
> > Date: Tue, 20 Feb 2024 09:48:31 -0800
> > Subject: [PATCH 09/13] hw/cxl/events: Add qmp interfaces to add/release
> >  dynamic capacity extents
> > 
> > +
> > +if (num_extents > 1) {
> > +error_setg(errp,
> > +   "TODO: remove the check once kernel support More flag");
> Not our problem :)  For now we can just test the kernel by passing in single
> extents via separate commands.
> 
> I don't want to carry unnecessary limitations in qemu.
> 

Probably worth popping in to say that some out-of-band discussions around
the `More` bit suggest it may be a while before it is supported.

Allowing QEMU to send More-bit messages to the kernel would be extremely
helpful for validating that the kernel won't blow up if/when a real
device implements it.  So yes, please allow it!

~Gregory



Re: [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions

2024-04-16 Thread Gregory Price
On Tue, Apr 16, 2024 at 04:00:56PM +0100, Jonathan Cameron wrote:
> On Mon, 15 Apr 2024 10:37:00 -0700
> fan  wrote:
> 
> > On Fri, Apr 12, 2024 at 06:54:42PM -0400, Gregory Price wrote:
> > > On Mon, Mar 25, 2024 at 12:02:28PM -0700, nifan@gmail.com wrote:  
> > > > From: Fan Ni 
> > > > 
> > > > All dpa ranges in the DC regions are invalid to access until an extent
> > > > covering the range has been added. Add a bitmap for each region to
> > > > record whether a DC block in the region has been backed by DC extent.
> > > > For the bitmap, a bit in the bitmap represents a DC block. When a DC
> > > > extent is added, all the bits of the blocks in the extent will be set,
> > > > which will be cleared when the extent is released.
> > > > 
> > > > Reviewed-by: Jonathan Cameron 
> > > > Signed-off-by: Fan Ni 
> > > > ---
> > > >  hw/cxl/cxl-mailbox-utils.c  |  6 +++
> > > >  hw/mem/cxl_type3.c  | 76 +
> > > >  include/hw/cxl/cxl_device.h |  7 
> > > >  3 files changed, 89 insertions(+)
> > > > 
> > > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > > index 7094e007b9..a0d2239176 100644
> > > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > > @@ -1620,6 +1620,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const 
> > > > struct cxl_cmd *cmd,
> > > >  
> > > >  cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 
> > > > 0);
> > > >  ct3d->dc.total_extent_count += 1;
> > > > +ct3_set_region_block_backed(ct3d, dpa, len);
> > > >  
> > > >  ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
> > > >  cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);  
> > > 
> > > while looking at the MHD code, we had decided to "reserve" the blocks in
> > > the bitmap in the call to `qmp_cxl_process_dynamic_capacity` in order to
> > > prevent a potential double-allocation (basically we need to sanity check
> > > that two hosts aren't reserving the region PRIOR to the host being
> > > notified).
> > > 
> > > I did not see any checks in the `qmp_cxl_process_dynamic_capacity` path
> > > to prevent pending extents from being double-allocated.  Is this an
> > > explicit choice?
> > > 
> > > I can see, for example, why you may want to allow the following in the
> > > pending list: [Add X, Remove X, Add X].  I just want to know if this is
> > > intentional or not. If not, you may consider adding a pending check
> > > during the sanity check phase of `qmp_cxl_process_dynamic_capacity`
> > > 
> > > ~Gregory  
> > 
> > First, for remove request, pending list is not involved. See cxl r3.1,
> > 9.13.3.3. Pending basically means "pending to add". 
> > So for the above example, in the pending list, you can see [Add x, add x] 
> > if the
> > event is not processed in time.
> > Second, from the spec, I cannot find any text saying we cannot issue
> > another add extent X if it is still pending.
> 
> I think there is text saying that the capacity is not released for reuse
> by the device until it receives a response from the host.   Whilst
> it's not explicit on offers to the same host, I'm not sure that matters.
> So I don't think it is supposed to queue multiple extents...
> 
> 

It definitely should not release capacity until it receives a response,
because the host could tell the device to kick rocks (which would be
reasonable under a variety of circumstances).

~Gregory



Re: [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions

2024-04-12 Thread Gregory Price
On Mon, Mar 25, 2024 at 12:02:28PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> All dpa ranges in the DC regions are invalid to access until an extent
> covering the range has been added. Add a bitmap for each region to
> record whether a DC block in the region has been backed by DC extent.
> For the bitmap, a bit in the bitmap represents a DC block. When a DC
> extent is added, all the bits of the blocks in the extent will be set,
> which will be cleared when the extent is released.
> 
> Reviewed-by: Jonathan Cameron 
> Signed-off-by: Fan Ni 
> ---
>  hw/cxl/cxl-mailbox-utils.c  |  6 +++
>  hw/mem/cxl_type3.c  | 76 +
>  include/hw/cxl/cxl_device.h |  7 
>  3 files changed, 89 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 7094e007b9..a0d2239176 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -1620,6 +1620,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct 
> cxl_cmd *cmd,
>  
>  cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
>  ct3d->dc.total_extent_count += 1;
> +ct3_set_region_block_backed(ct3d, dpa, len);
>  
>  ent = QTAILQ_FIRST(&ct3d->dc.extents_pending);
>  cxl_remove_extent_from_extent_list(&ct3d->dc.extents_pending, ent);

while looking at the MHD code, we had decided to "reserve" the blocks in
the bitmap in the call to `qmp_cxl_process_dynamic_capacity` in order to
prevent a potential double-allocation (basically we need to sanity check
that two hosts aren't reserving the region PRIOR to the host being
notified).

I did not see any checks in the `qmp_cxl_process_dynamic_capacity` path
to prevent pending extents from being double-allocated.  Is this an
explicit choice?

I can see, for example, why you may want to allow the following in the
pending list: [Add X, Remove X, Add X].  I just want to know if this is
intentional or not. If not, you may consider adding a pending check
during the sanity check phase of `qmp_cxl_process_dynamic_capacity`
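
For reference, here is a self-contained sketch of the kind of reservation I
mean. Block size, array sizes and all names are assumptions for illustration,
not the series' code: one bit per DC block, set when the extent is offered,
so a second overlapping add request fails the sanity check.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE  (2ULL * 1024 * 1024)    /* assumed DC block size */
#define NR_BLOCKS   1024

static uint64_t blk_bitmap[NR_BLOCKS / 64];

static bool any_blocks_set(uint64_t start, uint64_t nr)
{
    for (uint64_t b = start; b < start + nr; b++) {
        if (blk_bitmap[b / 64] & (1ULL << (b % 64))) {
            return true;
        }
    }
    return false;
}

static bool reserve_extent(uint64_t offset, uint64_t len)
{
    uint64_t start = offset / BLOCK_SIZE, nr = len / BLOCK_SIZE;

    if (any_blocks_set(start, nr)) {
        return false;        /* overlaps an accepted or pending extent */
    }
    for (uint64_t b = start; b < start + nr; b++) {
        blk_bitmap[b / 64] |= 1ULL << (b % 64);
    }
    return true;
}

int main(void)
{
    /* second request overlaps the first and is rejected */
    printf("first add:  %d\n", reserve_extent(0, 4 * BLOCK_SIZE));
    printf("second add: %d\n", reserve_extent(2 * BLOCK_SIZE, 4 * BLOCK_SIZE));
    return 0;
}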

~Gregory



Re: [PATCH 9/9] accel/tcg: Improve can_do_io management

2024-04-09 Thread Gregory Price
On Sat, Apr 06, 2024 at 12:32:48PM -1000, Richard Henderson wrote:
> We already attempted to set and clear can_do_io before the first
> and last insns, but only used the initial value of max_insns and
> the call to translator_io_start to find those insns.
> 
> Now that we track insn_start in DisasContextBase, and now that
> we have emit_before_op, we can wait until we have finished
> translation to identify the true first and last insns and emit
> the sets of can_do_io at that time.
> 
> This fixes case of a translation block which crossed a page boundary,
> and for which the second page turned out to be mmio.

I love when I get to say this: I knew it! :D

https://lore.kernel.org/qemu-devel/zbvvb4j+ahkln...@memverge.com/

Great fix, much appreciate the effort!

Reviewed-by: Gregory Price 

> In this case we
> truncate the block, and the previous logic for can_do_io could leave
> a block with a single insn with can_do_io set to false, which would
> fail an assertion in cpu_io_recompile.
> 
> Reported-by: Jørgen Hansen 
> Signed-off-by: Richard Henderson 
> ---
>  include/exec/translator.h |  1 -
>  accel/tcg/translator.c| 45 ---
>  2 files changed, 23 insertions(+), 23 deletions(-)
> 
> diff --git a/include/exec/translator.h b/include/exec/translator.h
> index ceaeca8c91..2c4fb818e7 100644
> --- a/include/exec/translator.h
> +++ b/include/exec/translator.h
> @@ -87,7 +87,6 @@ typedef struct DisasContextBase {
>  int num_insns;
>  int max_insns;
>  bool singlestep_enabled;
> -int8_t saved_can_do_io;
>  bool plugin_enabled;
>  struct TCGOp *insn_start;
>  void *host_addr[2];
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index ae61c154c2..9de0bc34c8 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -18,20 +18,14 @@
>  
>  static void set_can_do_io(DisasContextBase *db, bool val)
>  {
> -if (db->saved_can_do_io != val) {
> -db->saved_can_do_io = val;
> -
> -QEMU_BUILD_BUG_ON(sizeof_field(CPUState, neg.can_do_io) != 1);
> -tcg_gen_st8_i32(tcg_constant_i32(val), tcg_env,
> -offsetof(ArchCPU, parent_obj.neg.can_do_io) -
> -offsetof(ArchCPU, env));
> -}
> +QEMU_BUILD_BUG_ON(sizeof_field(CPUState, neg.can_do_io) != 1);
> +tcg_gen_st8_i32(tcg_constant_i32(val), tcg_env,
> +offsetof(ArchCPU, parent_obj.neg.can_do_io) -
> +offsetof(ArchCPU, env));
>  }
>  
>  bool translator_io_start(DisasContextBase *db)
>  {
> -set_can_do_io(db, true);
> -
>  /*
>   * Ensure that this instruction will be the last in the TB.
>   * The target may override this to something more forceful.
> @@ -84,13 +78,6 @@ static TCGOp *gen_tb_start(DisasContextBase *db, uint32_t 
> cflags)
>   - offsetof(ArchCPU, env));
>  }
>  
> -/*
> - * cpu->neg.can_do_io is set automatically here at the beginning of
> - * each translation block.  The cost is minimal, plus it would be
> - * very easy to forget doing it in the translator.
> - */
> -set_can_do_io(db, db->max_insns == 1);
> -
>  return icount_start_insn;
>  }
>  
> @@ -129,6 +116,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, 
> int *max_insns,
>  {
>  uint32_t cflags = tb_cflags(tb);
>  TCGOp *icount_start_insn;
> +TCGOp *first_insn_start = NULL;
>  bool plugin_enabled;
>  
>  /* Initialize DisasContext */
> @@ -139,7 +127,6 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, 
> int *max_insns,
>  db->num_insns = 0;
>  db->max_insns = *max_insns;
>  db->singlestep_enabled = cflags & CF_SINGLE_STEP;
> -db->saved_can_do_io = -1;
>  db->insn_start = NULL;
>  db->host_addr[0] = host_pc;
>  db->host_addr[1] = NULL;
> @@ -159,6 +146,9 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, 
> int *max_insns,
>  *max_insns = ++db->num_insns;
>  ops->insn_start(db, cpu);
>  db->insn_start = tcg_last_op();
> +if (first_insn_start == NULL) {
> +first_insn_start = db->insn_start;
> +}
>  tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
>  
>  if (plugin_enabled) {
> @@ -171,10 +161,6 @@ void translator_loop(CPUState *cpu, TranslationBlock 
> *tb, int *max_insns,
>   * done next -- either exiting this loop or locate the start of
>   * the next instruction.
>   */
> -if (db->num_insns == db->max_insns) {
> 

Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

2024-04-05 Thread Gregory Price
On Fri, Apr 05, 2024 at 06:44:52PM +0100, Jonathan Cameron wrote:
> On Fri, 5 Apr 2024 12:07:45 -0400
> Gregory Price  wrote:
> 
> > 3. (C) Upon Device receiving Release Dynamic Capacity Request
> >a. check for a pending release request. If exists, error.
> 
> Not sure that's necessary - can queue as long as the head
> can track if the bits are in a pending release state.
> 

Yeah, probably it's fine to just queue the event and let everything
downstream handle it.

> >b. check that the bits in the MHD bitmap are actually set
> Good.
> > 
> >function: qmp_cxl_process_dynamic_capacity
> > 
> > 4. (D) Upon Device receiving Release Dynamic Capacity Response
> >a. clear the bits in the mhd bitmap
> >b. remove the pending request from the pending list
> > 
> >function: cmd_dcd_release_dyn_cap
> > 
> > Something to note: The MHD bitmap is essentially the same as the
> > existing DCD extent bitmap - except that it is located in a shared
> > region of memory (mmap file, shm, whatever - pick one).
> 
> I think you will ideally also have a per head one to track head access
> to the things offered by the mhd.
> 

Generally I try not to duplicate state; it reduces consistency problems.

You do still need a shared memory state and a per-head state to capture
per-head data, but the allocation bitmap is really device-global state.

Either way you have a race condition when checking the bitmap during a
memory access in the process of adding/releasing capacity - but that's
more an indication of bad host behavior than it is of a bug in the
implementation of the emulated device. Probably we don't need to
read-lock the bitmap (for access validation), only write-lock.

My preference, for what it's worth, would be to have a single bitmap
and have it be anonymous memory for Single-head and file-backed
for Multi-head.  I'll have to work out the locking mechanism.

~Gregory



Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

2024-04-05 Thread Gregory Price
On Fri, Apr 05, 2024 at 01:27:19PM +0100, Jonathan Cameron wrote:
> On Wed, 3 Apr 2024 14:16:25 -0400
> Gregory Price  wrote:
> 
> A few follow up comments.
> 
> > 
> > > +error_setg(errp, "no valid extents to send to process");
> > > +return;
> > > +}
> > > +  
> > 
> > I'm looking at adding the MHD extensions around this point, e.g.:
> > 
> > /* If MHD cannot allocate requested extents, the cmd fails */
> > if (type == DC_EVENT_ADD_CAPACITY && dcd->mhd_dcd_extents_allocate &&
> > num_extents != dcd->mhd_dcd_extents_allocate(...))
> > return;
> > 
> > where mhd_dcd_extents_allocate checks the MHD block bitmap and tags
> > for correctness (shared // no double-allocations, etc). On success,
> > it guarantees proper ownership.
> > 
> > the release path would then be done in the release response path from
> > the host, as opposed to the release event injection.
> 
> I think it would be polite to check if the QMP command on release
> for whether it is asking something plausible - makes for an easier
> to user QMP interface.  I guess it's not strictly required though.
> What races are there on release?

The only real critical section, barring force-release being supported,
is when you clear the bits in the device allowing new requests to swipe
those blocks. The appropriate place appears to be after the host kernel
has responded to the release extent request.

Also need to handle the case of multiple add-requests contending for the
same region, but that's just an "oops failed to get all the bits, roll
back" scenario - easy to handle.

Could go coarse-grained to just lock access to the bitmap entirely while
operating on it, or be fancy and use atomics to go lockless. The latter
code already exists in the Niagara model for reference.

> We aren't supporting force release
> for now, and for anything else, it's host specific (unlike add where
> the extra rules kick in).   As such I 'think' a check at command
> time will be valid as long as the host hasn't done an async
> release of capacity between that and the event record.  That
> is a race we always have and the host should at most log it and
> not release capacity twice.
>

Borrowing from Ira's flow chart, here are the pieces I believe are
needed to implement MHD support for DCD.

Orchestrator  FM Device   Host KernelHost User

| |   ||  |
|-- Add ->|-- Add --->A--- Add --->|  |
|  Capacity   |  Extent   |   Extent   |  |
| |   ||  |
| |<--Accept--B<--Accept --|  |
| |   Extent  |   Extent   |  |
| |   ||  |
| | ... snip ...   |  |
| |   ||  |
|-- Remove -->|--Release->C---Release->|  |
|  Capacity   |  Extent   |   Extent   |  |
| |   ||  |
| |<-Release--D<--Release--|  |
| |  Extent   |   Extent   |  |
| |   ||  |

1. (A) Upon Device Receiving Add Capacity Request
   a. the device sanity checks the request against local mappings
   b. the mhd hook is called to sanity check against global mappings
   c. the mhd bitmap is updated, marking the capacity owned by that head

   function: qmp_cxl_process_dynamic_capacity

2. (B) Upon Device Receiving Add Dynamic Capacity Response
   a. accepted extents are compared to the original request
   b. not accepted extents are cleared from the bitmap (local and MHD)
   (Note: My understanding is that for now each request = 1 extent)

   function: cmd_dcd_add_dyn_cap_rsp

3. (C) Upon Device receiving Release Dynamic Capacity Request
   a. check for a pending release request. If exists, error.
   b. check that the bits in the MHD bitmap are actually set

   function: qmp_cxl_process_dynamic_capacity

4. (D) Upon Device receiving Release Dynamic Capacity Response
   a. clear the bits in the mhd bitmap
   b. remove the pending request from the pending list

   function: cmd_dcd_release_dyn_cap

Something to note: The MHD bitmap is essentially the same as the
existing DCD extent bitmap - except that it is located in a shared
region of memory (mmap file, shm, whatever - pick one).

Maybe it's worth abstracting out the bitmap twiddling to make that
backable by a file mmap'd SHARED and use atomics to twiddle the bits?

That would be about 90% of the way to MH-DCD.

Maybe flock() could be used for coarse locking on a shared bitmap
in the short term?  This mitigates your concern of using shm.h as
the coordination piece, though I'm not sure how portable flock() is.
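
A self-contained sketch of what that could look like follows. The file path,
sizes and names are assumptions, not the Niagara or series code: the
device-global DCD block bitmap sits in a file-backed MAP_SHARED mapping so
every QEMU instance (head) sees the same state, bits are claimed with atomics
and rolled back on conflict, and flock() on the same fd would be the
coarse-grained variant.

#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define NR_BLOCKS 1024

static uint64_t *map_shared_bitmap(const char *path)
{
    size_t sz = NR_BLOCKS / 8;
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    uint64_t *map;

    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, sz) < 0) {
        close(fd);
        return NULL;
    }
    map = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping outlives the fd */
    return map == MAP_FAILED ? NULL : map;
}

/* Claim blocks [start, start + nr); roll back and fail on any conflict. */
static bool claim_blocks(uint64_t *bm, uint64_t start, uint64_t nr)
{
    for (uint64_t b = start; b < start + nr; b++) {
        uint64_t bit = 1ULL << (b % 64);
        uint64_t old = __atomic_fetch_or(&bm[b / 64], bit, __ATOMIC_SEQ_CST);

        if (old & bit) {            /* another head already owns this block */
            for (uint64_t r = start; r < b; r++) {
                __atomic_fetch_and(&bm[r / 64], ~(1ULL << (r % 64)),
                                   __ATOMIC_SEQ_CST);
            }
            return false;
        }
    }
    return true;
}

int main(void)
{
    uint64_t *bm = map_shared_bitmap("/tmp/mhd-dcd-bitmap");

    if (!bm) {
        return 1;
    }
    printf("claim 0-7:  %d\n", claim_blocks(bm, 0, 8));
    printf("claim 4-11: %d\n", claim_blocks(bm, 4, 8));   /* conflicts */
    return 0;
}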

~Gregory



Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

2024-04-03 Thread Gregory Price
On Mon, Mar 25, 2024 at 12:02:27PM -0700, nifan@gmail.com wrote:
> From: Fan Ni 
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
... snip 
> +
> +/*
> + * The main function to process dynamic capacity event. Currently DC extents
> + * add/release requests are processed.
> + */
> +static void qmp_cxl_process_dynamic_capacity(const char *path, CxlEventLog 
> log,
> + CXLDCEventType type, uint16_t 
> hid,
> + uint8_t rid,
> + CXLDCExtentRecordList *records,
> + Error **errp)
> +{
... snip 
> +/* Sanity check and count the extents */
> +list = records;
> +while (list) {
> +offset = list->value->offset;
> +len = list->value->len;
> +dpa = offset + dcd->dc.regions[rid].base;
> +
> +if (len == 0) {
> +error_setg(errp, "extent with 0 length is not allowed");
> +return;
> +}
> +
> +if (offset % block_size || len % block_size) {
> +error_setg(errp, "dpa or len is not aligned to region block 
> size");
> +return;
> +}
> +
> +if (offset + len > dcd->dc.regions[rid].len) {
> +error_setg(errp, "extent range is beyond the region end");
> +return;
> +}
> +
> +/* No duplicate or overlapped extents are allowed */
> +if (test_any_bits_set(blk_bitmap, offset / block_size,
> +  len / block_size)) {
> +error_setg(errp, "duplicate or overlapped extents are detected");
> +return;
> +}
> +bitmap_set(blk_bitmap, offset / block_size, len / block_size);
> +
> +num_extents++;

I think num_extents is always equal to the length of the list, otherwise
this code will return with an error.

Nitpick:
This can be moved to the bottom w/ `list = list->next` to express that a
little more clearly.

> +if (type == DC_EVENT_RELEASE_CAPACITY) {
> +if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents_pending,
> +   dpa, len)) {
> +error_setg(errp,
> +   "cannot release extent with pending DPA range");
> +return;
> +}
> +if (!cxl_extents_contains_dpa_range(&dcd->dc.extents,
> +dpa, len)) {
> +error_setg(errp,
> +   "cannot release extent with non-existing DPA 
> range");
> +return;
> +}
> +}
> +list = list->next;
> +}
> +
> +if (num_extents == 0) {

Since num_extents is always the length of the list, this is equivalent to
`if (!records)` prior to the while loop. Makes it a little more clear that:

1. There must be at least 1 extent
2. All extents must be valid for the command to be serviced.
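
Roughly, the restructuring both comments describe could look like this
sketch; Record and the field names are stubs, not the actual QEMU/QAPI types.

#include <stdbool.h>
#include <stdint.h>

typedef struct Record {
    uint64_t offset;
    uint64_t len;
    struct Record *next;
} Record;

static bool validate_extents(Record *records, unsigned *num_extents)
{
    *num_extents = 0;

    if (!records) {                 /* replaces the num_extents == 0 check */
        return false;
    }
    for (Record *r = records; r; r = r->next) {
        if (r->len == 0) {          /* ...the other sanity checks go here... */
            return false;
        }
        (*num_extents)++;           /* counted right where the list advances */
    }
    return true;
}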

> +error_setg(errp, "no valid extents to send to process");
> +return;
> +}
> +

I'm looking at adding the MHD extensions around this point, e.g.:

/* If MHD cannot allocate requested extents, the cmd fails */
if (type == DC_EVENT_ADD_CAPACITY && dcd->mhd_dcd_extents_allocate &&
num_extents != dcd->mhd_dcd_extents_allocate(...))
return;

where mhd_dcd_extents_allocate checks the MHD block bitmap and tags
for correctness (shared // no double-allocations, etc). On success,
it guarantees proper ownership.

the release path would then be done in the release response path from
the host, as opposed to the release event injection.

Do you see any issues with that flow?
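
For reference, a sketch of the hook shape being discussed; the callback names
follow this thread, but the signatures and the ops-struct packaging are
assumptions rather than the real interface, and MHDExtent is a stub standing
in for the series' extent type.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct MHDExtent {
    uint64_t start_dpa;
    uint64_t len;
} MHDExtent;

typedef struct MHDDCDHooks {
    /* Add request: validate against the shared bitmap and claim the blocks. */
    bool (*mhd_dcd_extents_allocate)(void *shared_state, int head_id,
                                     const MHDExtent *ext, size_t count);
    /* Add response: drop the claim on any extents the host did not accept. */
    void (*mhd_dcd_extents_rollback)(void *shared_state, int head_id,
                                     const MHDExtent *ext, size_t count);
    /* Release response: clear the bits once the host acks the release. */
    void (*mhd_dcd_extents_release)(void *shared_state, int head_id,
                                    const MHDExtent *ext, size_t count);
} MHDDCDHooks;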

> +/* Create extent list for event being passed to host */
> +i = 0;
> +list = records;
> +extents = g_new0(CXLDCExtentRaw, num_extents);
> +while (list) {
> +offset = list->value->offset;
> +len = list->value->len;
> +dpa = dcd->dc.regions[rid].base + offset;
> +
> +extents[i].start_dpa = dpa;
> +extents[i].len = len;
> +memset(extents[i].tag, 0, 0x10);
> +extents[i].shared_seq = 0;
> +list = list->next;
> +i++;
> +}
> +
> +/*
> + * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
> + *
> + * All Dynamic Capacity event records shall set the Event Record Severity
> + * field in the Common Event Record Format to Informational Event. All
> + * Dynamic Capacity related events shall be logged in the Dynamic 
> Capacity
> + * Event Log.
> + */
> +cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
> +cxl_device_get_timestamp(&dcd->cxl_dstate));
> +
> +dCap.type = type;
> +/* FIXME: 

[RFC PATCH INCOMPLETE] cxl: Multi-headed Single Logical Device (MHSLD)

2024-03-08 Thread Gregory Price
Implement the scaffolding for an MHSLD with a simple multi-headed
command set. This device inherits the cxl-type3 device and will
provide the hooks needed to serialize multi-system devices.

This device requires Linux, as it uses shared memory (shmem) to
implement the shared state.  This also limits the emulation of
multiple guests to a single host.

To instantiate:

-device 
cxl-mhsld,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,sn=6,mhd-head=0,mhd-shmid=15

The MH-SLD shared memory region must be initialized via the 'init_mhsld'
tool located in the cxl/mhsld directory.

usage: init_mhsld <heads> <shmid>
heads : number of heads on the device
shmid : shmid produced by ipcmk

Example:
shmid=$(ipcmk -M 131072)
./init_mhsld 4 $shmid


What we should discuss:
* What state needs to be in MHSLDSharedState
  - for DCD
  - for other CXL extensions? (RAS?)

* What hooks we need to add to cxl-type3 for MHDs
  - Example: cvc->mhd_access_valid was added for Niagara
  - Additional calls will need to be added for things like DCD

* Shared state design and serialization
  - right now we use the same shared-memory solution, which makes
serialization a little easier since we can use a mutex in shmem
  - This limits emulation of MHSLD devices to QEMU guests located
on the same host, but this seems fine for functional testing.

* Shared state initialization
  - right now I use the external tool, but if we want to limit
the ability to test things like restarting hosts, we could
instead add an explicit `initialize_state=true` option to the
device and ditch the extra program.

Signed-off-by: Gregory Price 
---
 hw/cxl/Kconfig|   1 +
 hw/cxl/meson.build|   1 +
 hw/cxl/mhsld/.gitignore   |   1 +
 hw/cxl/mhsld/Kconfig  |   4 +
 hw/cxl/mhsld/init_mhsld.c |  76 
 hw/cxl/mhsld/meson.build  |   3 +
 hw/cxl/mhsld/mhsld.c  | 177 ++
 hw/cxl/mhsld/mhsld.h  |  64 ++
 8 files changed, 327 insertions(+)
 create mode 100644 hw/cxl/mhsld/.gitignore
 create mode 100644 hw/cxl/mhsld/Kconfig
 create mode 100644 hw/cxl/mhsld/init_mhsld.c
 create mode 100644 hw/cxl/mhsld/meson.build
 create mode 100644 hw/cxl/mhsld/mhsld.c
 create mode 100644 hw/cxl/mhsld/mhsld.h

diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
index e603839a62..919e59b598 100644
--- a/hw/cxl/Kconfig
+++ b/hw/cxl/Kconfig
@@ -1,3 +1,4 @@
+source mhsld/Kconfig
 source vendor/Kconfig
 
 config CXL
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index e8c8c1355a..394750dd19 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -16,4 +16,5 @@ system_ss.add(when: 'CONFIG_I2C_MCTP_CXL', if_true: 
files('i2c_mctp_cxl.c'))
 
 system_ss.add(when: 'CONFIG_ALL', if_true: files('cxl-host-stubs.c'))
 
+subdir('mhsld')
 subdir('vendor')
diff --git a/hw/cxl/mhsld/.gitignore b/hw/cxl/mhsld/.gitignore
new file mode 100644
index 00..f9f6a37cc0
--- /dev/null
+++ b/hw/cxl/mhsld/.gitignore
@@ -0,0 +1 @@
+init_mhsld
diff --git a/hw/cxl/mhsld/Kconfig b/hw/cxl/mhsld/Kconfig
new file mode 100644
index 00..dc2be15140
--- /dev/null
+++ b/hw/cxl/mhsld/Kconfig
@@ -0,0 +1,4 @@
+config CXL_MHSLD
+bool
+depends on CXL_MEM_DEVICE
+default y
diff --git a/hw/cxl/mhsld/init_mhsld.c b/hw/cxl/mhsld/init_mhsld.c
new file mode 100644
index 00..d9e0cd54e0
--- /dev/null
+++ b/hw/cxl/mhsld/init_mhsld.c
@@ -0,0 +1,76 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2024 MemVerge Inc.
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <stddef.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/ipc.h>
+#include <sys/shm.h>
+
+struct mhsld_state {
+uint8_t nr_heads;
+uint8_t nr_lds;
+uint8_t ldmap[65536];
+};
+
+int main(int argc, char *argv[])
+{
+int shmid = 0;
+uint32_t sections = 0;
+uint32_t section_size = 0;
+uint32_t heads = 0;
+struct mhsld_state *mhsld_state = NULL;
+size_t state_size;
+uint8_t i;
+
+if (argc != 3) {
+printf("usage: init_mhsld  \n"
+"\theads : number of heads on the device\n"
+"\tshmid : /tmp/mytoken.tmp\n\n"
+"It is recommended your shared memory region is at least 
128kb\n");
+return -1;
+}
+
+/* must have at least 1 head */
+heads = (uint32_t)atoi(argv[1]);
+if (heads == 0 || heads > 32) {
+printf("bad heads argument (1-32)\n");
+return -1;
+}
+
+shmid = (uint32_t)atoi(argv[2]);
+if (shmid == 0) {
+printf("bad shmid argument\n");
+return -1;
+}
+
+mhsld_state = shmat(shmid, NULL, 0);
+if (mhsld_state == (void *)-1) {
+printf("Unable to attach to shared memory\n");
+return -1;
+}
+
+/* Initialize the mhsld_state */
+state_size = sizeof(struct mhsld_state) + (sizeof(uint32_t) * sections);
+memset(mhsld_state, 0, state_size);
+mh

Re: [PATCH 1/2] hmat acpi: Do not add Memory Proximity Domain Attributes Structure targetting non existent memory.

2024-02-29 Thread Gregory Price
On Thu, Feb 29, 2024 at 04:25:44PM +, Jonathan Cameron wrote:
> If qemu is started with a proximity node containing CPUs alone,
> it will provide one of these structures to say memory in this
> node is directly connected to itself.
> 
> This description is arguably pointless even if there is memory
> in the node.  If there is no memory present, and hence no SRAT
> entry it breaks Linux HMAT passing and the table is rejected.
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/acpi/numa/hmat.c#L444
> 

Nit: This link becomes out of date pretty much immediately, consider
using a versioned link.

> Signed-off-by: Jonathan Cameron 
> ---
>  hw/acpi/hmat.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> index 3042d223c8..723ae28d32 100644
> --- a/hw/acpi/hmat.c
> +++ b/hw/acpi/hmat.c
> @@ -204,6 +204,13 @@ static void hmat_build_table_structs(GArray *table_data, 
> NumaState *numa_state)
>  build_append_int_noprefix(table_data, 0, 4); /* Reserved */
>  
>  for (i = 0; i < numa_state->num_nodes; i++) {
> +/*
> + * Linux rejects whole HMAT table if a node with no memory
> + * has one of these structures listing it as a target.
> + */
> +if (!numa_state->nodes[i].node_mem) {
> +continue;
> +}
>  flags = 0;
>  
>  if (numa_state->nodes[i].initiator < MAX_NODES) {
> -- 
> 2.39.2
> 
> 



Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-02 Thread Gregory Price
On Fri, Feb 02, 2024 at 04:33:20PM +, Peter Maydell wrote:
> On Fri, 2 Feb 2024 at 16:26, Jonathan Cameron
>  wrote:
> > #7  0x55ab1929 in bql_lock_impl (file=0x56049122 
> > "../../accel/tcg/cputlb.c", line=2033) at ../../system/cpus.c:524
> > #8  bql_lock_impl (file=file@entry=0x56049122 
> > "../../accel/tcg/cputlb.c", line=line@entry=2033) at ../../system/cpus.c:520
> > #9  0x55c9f7d6 in do_ld_mmio_beN (cpu=0x578e0cb0, 
> > full=0x7ffe88012950, ret_be=ret_be@entry=0, addr=19595792376, 
> > size=size@entry=8, mmu_idx=4, type=MMU_DATA_LOAD, ra=0) at 
> > ../../accel/tcg/cputlb.c:2033
> > #10 0x55ca0fbd in do_ld_8 (cpu=cpu@entry=0x578e0cb0, 
> > p=p@entry=0x74efd1d0, mmu_idx=, 
> > type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at 
> > ../../accel/tcg/cputlb.c:2356
> > #11 0x55ca341f in do_ld8_mmu (cpu=cpu@entry=0x578e0cb0, 
> > addr=addr@entry=19595792376, oi=oi@entry=52, ra=0, ra@entry=52, 
> > access_type=access_type@entry=MMU_DATA_LOAD) at 
> > ../../accel/tcg/cputlb.c:2439
> > #12 0x55ca5f59 in cpu_ldq_mmu (ra=52, oi=52, addr=19595792376, 
> > env=0x578e3470) at ../../accel/tcg/ldst_common.c.inc:169
> > #13 cpu_ldq_le_mmuidx_ra (env=0x578e3470, addr=19595792376, 
> > mmu_idx=, ra=ra@entry=0) at 
> > ../../accel/tcg/ldst_common.c.inc:301
> > #14 0x55b4b5fc in ptw_ldq (ra=0, in=0x74efd320) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:98
> > #15 ptw_ldq (ra=0, in=0x74efd320) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:93
> > #16 mmu_translate (env=env@entry=0x578e3470, in=0x74efd3e0, 
> > out=0x74efd3b0, err=err@entry=0x74efd3c0, ra=ra@entry=0) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:174
> > #17 0x55b4c4b3 in get_physical_address (ra=0, err=0x74efd3c0, 
> > out=0x74efd3b0, mmu_idx=0, access_type=MMU_DATA_LOAD, 
> > addr=18446741874686299840, env=0x578e3470) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:580
> > #18 x86_cpu_tlb_fill (cs=0x578e0cb0, addr=18446741874686299840, 
> > size=, access_type=MMU_DATA_LOAD, mmu_idx=0, 
> > probe=, retaddr=0) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:606
> > #19 0x55ca0ee9 in tlb_fill (retaddr=0, mmu_idx=0, 
> > access_type=MMU_DATA_LOAD, size=, addr=18446741874686299840, 
> > cpu=0x74efd540) at ../../accel/tcg/cputlb.c:1315
> > #20 mmu_lookup1 (cpu=cpu@entry=0x578e0cb0, 
> > data=data@entry=0x74efd540, mmu_idx=0, 
> > access_type=access_type@entry=MMU_DATA_LOAD, ra=ra@entry=0) at 
> > ../../accel/tcg/cputlb.c:1713
> 
> Here we are trying to take an interrupt. This isn't related to the
> other can_do_io stuff, it's happening because do_ld_mmio_beN assumes
> it's called with the BQL not held, but in fact there are some
> situations where we call into the memory subsystem and we do
> already have the BQL.
> 
> -- PMM

It's bugs all the way down as usual!
https://xkcd.com/1416/

I'll dig in a little next week to see if there's an easy fix. We can see
the return address is already 0 going into mmu_translate, so it does
look unrelated to the patch I threw out - but it probably still has to do
with the page tables living in IO (MMIO) memory.

~Gregory



Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Gregory Price
On Thu, Feb 01, 2024 at 06:04:26PM +, Peter Maydell wrote:
> On Thu, 1 Feb 2024 at 17:25, Alex Bennée  wrote:
> >
> > Jonathan Cameron  writes:
> > >> > #21 0x55ca3e5d in do_st8_mmu (cpu=0x578e0cb0, addr=23937, 
> > >> > val=18386491784638059520, oi=6, ra=140736029817822) at 
> > >> > ../../accel/tcg/cputlb.c:2853
> > >> > #22 0x7fffa9107c63 in code_gen_buffer ()
> > >>
> > >> No thats different - we are actually writing to the MMIO region here.
> > >> But the fact we hit cpu_abort because we can't find the TB we are
> > >> executing is a little problematic.
> > >>
> > >> Does ra properly point to the code buffer here?
> > >
> > > Err.  How would I tell?
> >
> > (gdb) p/x 140736029817822
> > $1 = 0x7fffa9107bde
> >
> > seems off because code_gen_buffer starts at 0x7fffa9107c63
> 
> The code_gen_buffer doesn't *start* at 0x7fffa9107c63 --
> that is our return address into it, which is to say it's just
> after the call insn to the do_st8_mmu helper. The 'ra' argument
> to the helper function is going to be a number slightly lower
> than that, because it points within the main lump of generated
> code for the TB, whereas the helper call is done as part of
> an out-of-line lump of common code at the end of the TB.
> 
> The 'ra' here is fine -- the problem is that we don't
> pass it all the way down the callstack and instead end
> up using 0 as a 'ra' within the ptw code.
> 
> -- PMM

Is there any particular reason not to, as below?
~Gregory


diff --git a/target/i386/tcg/sysemu/excp_helper.c 
b/target/i386/tcg/sysemu/excp_helper.c
index 5b86f439ad..2f581b9bfb 100644
--- a/target/i386/tcg/sysemu/excp_helper.c
+++ b/target/i386/tcg/sysemu/excp_helper.c
@@ -59,14 +59,14 @@ typedef struct PTETranslate {
 hwaddr gaddr;
 } PTETranslate;

-static bool ptw_translate(PTETranslate *inout, hwaddr addr)
+static bool ptw_translate(PTETranslate *inout, hwaddr addr, uint64_t ra)
 {
 CPUTLBEntryFull *full;
 int flags;

 inout->gaddr = addr;
 flags = probe_access_full(inout->env, addr, 0, MMU_DATA_STORE,
-  inout->ptw_idx, true, &inout->haddr, &full, 0);
+  inout->ptw_idx, true, &inout->haddr, &full, ra);

 if (unlikely(flags & TLB_INVALID_MASK)) {
 TranslateFault *err = inout->err;
@@ -82,20 +82,20 @@ static bool ptw_translate(PTETranslate *inout, hwaddr addr)
 return true;
 }

-static inline uint32_t ptw_ldl(const PTETranslate *in)
+static inline uint32_t ptw_ldl(const PTETranslate *in, uint64_t ra)
 {
 if (likely(in->haddr)) {
 return ldl_p(in->haddr);
 }
-return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, 0);
+return cpu_ldl_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra);
 }

-static inline uint64_t ptw_ldq(const PTETranslate *in)
+static inline uint64_t ptw_ldq(const PTETranslate *in, uint64_t ra)
 {
 if (likely(in->haddr)) {
 return ldq_p(in->haddr);
 }
-return cpu_ldq_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, 0);
+return cpu_ldq_mmuidx_ra(in->env, in->gaddr, in->ptw_idx, ra);
 }

 /*
@@ -132,7 +132,8 @@ static inline bool ptw_setl(const PTETranslate *in, 
uint32_t old, uint32_t set)
 }

 static bool mmu_translate(CPUX86State *env, const TranslateParams *in,
-  TranslateResult *out, TranslateFault *err)
+  TranslateResult *out, TranslateFault *err,
+  uint64_t ra)
 {
 const int32_t a20_mask = x86_get_a20_mask(env);
 const target_ulong addr = in->addr;
@@ -166,11 +167,11 @@ static bool mmu_translate(CPUX86State *env, const 
TranslateParams *in,
  */
 pte_addr = ((in->cr3 & ~0xfff) +
 (((addr >> 48) & 0x1ff) << 3)) & a20_mask;
-if (!ptw_translate(&pte_trans, pte_addr)) {
+if (!ptw_translate(&pte_trans, pte_addr, ra)) {
 return false;
 }
 restart_5:
-pte = ptw_ldq(&pte_trans);
+pte = ptw_ldq(&pte_trans, ra);
 if (!(pte & PG_PRESENT_MASK)) {
 goto do_fault;
 }
@@ -191,11 +192,11 @@ static bool mmu_translate(CPUX86State *env, const 
TranslateParams *in,
  */
 pte_addr = ((pte & PG_ADDRESS_MASK) +
 (((addr >> 39) & 0x1ff) << 3)) & a20_mask;
-if (!ptw_translate(&pte_trans, pte_addr)) {
+if (!ptw_translate(&pte_trans, pte_addr, ra)) {
 return false;
 }
 restart_4:
-pte = ptw_ldq(&pte_trans);
+pte = ptw_ldq(&pte_trans, ra);
 if (!(pte & PG_PRESENT_MASK)) {
 goto do_fault;
 }
@@ -212,11 +213,11 @@ static bool mmu_translate(CPUX86State *env, const 
TranslateParams *in,
  */
 pte_addr = ((pte & PG_ADDRESS_MASK) +
 (((addr >> 30) & 0x1ff) << 3)) & a20_mask;
-if (!ptw_translate(&pte_trans, 

Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Gregory Price
On Thu, Feb 01, 2024 at 05:07:31PM +, Peter Maydell wrote:
> On Thu, 1 Feb 2024 at 17:04, Gregory Price  wrote:
> >
> > On Thu, Feb 01, 2024 at 04:45:30PM +, Alex Bennée wrote:
> 
> > > No thats different - we are actually writing to the MMIO region here.
> > > But the fact we hit cpu_abort because we can't find the TB we are
> > > executing is a little problematic.
> > >
> > > Does ra properly point to the code buffer here?
> > >
> >
> > What if the code block is ALSO in CXL (MMIO)? :D
> 
> In that case the TB is supposed to be a single insn,
> so the insn will by definition be the last one in its
> TB, and IO should be OK for it -- so can_do_io ought
> to be true and we shouldn't get into the io_recompile.
> 
> -- PMM

We saw a bug early on in CXL emulation with instructions hosted in CXL
memory that split a page boundary (e.g. the two-byte jump 0xEB 0xFE).  I'm
wondering whether a code block that splits a page boundary hits a similar
corner case here.

~Gregory



Re: Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

2024-02-01 Thread Gregory Price
On Thu, Feb 01, 2024 at 04:45:30PM +, Alex Bennée wrote:
> Jonathan Cameron  writes:
> 
> > On Thu, 1 Feb 2024 16:00:56 +
> > Peter Maydell  wrote:
> >
> >> On Thu, 1 Feb 2024 at 15:17, Alex Bennée  wrote:
> >> >
> >> > Peter Maydell  writes:  
> >> > > So, that looks like:
> >> > >  * we call cpu_tb_exec(), which executes some generated code
> >> > >  * that generated code calls the lookup_tb_ptr helper to see
> >> > >if we have a generated TB already for the address we're going
> >> > >to execute next
> >> > >  * lookup_tb_ptr probes the TLB to see if we know the host RAM
> >> > >address for the guest address
> >> > >  * this results in a TLB walk for an instruction fetch
> >> > >  * the page table descriptor load is to IO memory
> >> > >  * io_prepare assumes it needs to do a TLB recompile, because
> >> > >can_do_io is clear
> >> > >
> >> > > I am not surprised that the corner case of "the guest put its
> >> > > page tables in an MMIO device" has not yet come up :-)
> >> > >
> >> > > I'm really not sure how the icount handling should interact
> >> > > with that...  
> >> >
> >> > Its not just icount - we need to handle it for all modes now. That said
> >> > seeing as we are at the end of a block shouldn't can_do_io be set?  
> >> 
> >> The lookup_tb_ptr helper gets called from tcg_gen_goto_tb(),
> >> which happens earlier than the tb_stop callback (it can
> >> happen in the trans function for branch etc insns, for
> >> example).
> >> 
> >> I think it should be OK to clear can_do_io at the start
> >> of the lookup_tb_ptr helper, something like:
> >> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> >> index 977576ca143..7818537f318 100644
> >> --- a/accel/tcg/cpu-exec.c
> >> +++ b/accel/tcg/cpu-exec.c
> >> @@ -396,6 +396,15 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
> >>  uint64_t cs_base;
> >>  uint32_t flags, cflags;
> >> 
> >> +/*
> >> + * By definition we've just finished a TB, so I/O is OK.
> >> + * Avoid the possibility of calling cpu_io_recompile() if
> >> + * a page table walk triggered by tb_lookup() calling
> >> + * probe_access_internal() happens to touch an MMIO device.
> >> + * The next TB, if we chain to it, will clear the flag again.
> >> + */
> >> +cpu->neg.can_do_io = true;
> >> +
> >>  cpu_get_tb_cpu_state(env, , _base, );
> >> 
> >>  cflags = curr_cflags(cpu);
> >> 
> >> -- PMM
> >
> > No joy.  Seems like a very similar backtrace.
> >
> > Thread 5 "qemu-system-x86" received signal SIGABRT, Aborted.
> > [Switching to Thread 0x74efe6c0 (LWP 23937)]
> > __pthread_kill_implementation (no_tid=0, signo=6, threadid=) 
> > at ./nptl/pthread_kill.c:44
> > 44  ./nptl/pthread_kill.c: No such file or directory.
> > (gdb) bt
> > #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid= > out>) at ./nptl/pthread_kill.c:44
> > #1  __pthread_kill_internal (signo=6, threadid=) at 
> > ./nptl/pthread_kill.c:78
> > #2  __GI___pthread_kill (threadid=, signo=signo@entry=6) at 
> > ./nptl/pthread_kill.c:89
> > #3  0x777c43b6 in __GI_raise (sig=sig@entry=6) at 
> > ../sysdeps/posix/raise.c:26
> > #4  0x777aa87c in __GI_abort () at ./stdlib/abort.c:79
> > #5  0x55c4d19e in cpu_abort (cpu=cpu@entry=0x578e0cb0, 
> > fmt=fmt@entry=0x56048ee8 "cpu_io_recompile: could not find TB for 
> > pc=%p") at ../../cpu-target.c:373
> > #6  0x55c9cb25 in cpu_io_recompile (cpu=cpu@entry=0x578e0cb0, 
> > retaddr=retaddr@entry=0) at ../../accel/tcg/translate-all.c:611
> > #7  0x55c9f744 in io_prepare (retaddr=0, addr=19595790664, 
> > attrs=..., xlat=, cpu=0x578e0cb0, out_offset= > pointer>) at ../../accel/tcg/cputlb.c:1339
> > #8  do_ld_mmio_beN (cpu=0x578e0cb0, full=0x7ffe88012890, 
> > ret_be=ret_be@entry=0, addr=19595790664, size=size@entry=8, mmu_idx=4, 
> > type=MMU_DATA_LOAD, ra=0) at ../../accel/tcg/cputlb.c:2030
> > #9  0x55ca0ecd in do_ld_8 (cpu=cpu@entry=0x578e0cb0, 
> > p=p@entry=0x74efcdd0, mmu_idx=, 
> > type=type@entry=MMU_DATA_LOAD, memop=, ra=ra@entry=0) at 
> > ../../accel/tcg/cputlb.c:2356
> > #10 0x55ca332f in do_ld8_mmu (cpu=cpu@entry=0x578e0cb0, 
> > addr=addr@entry=19595790664, oi=oi@entry=52, ra=ra@entry=0, 
> > access_type=access_type@entry=MMU_DATA_LOAD) at 
> > ../../accel/tcg/cputlb.c:2439
> > #11 0x55ca5e69 in cpu_ldq_mmu (ra=0, oi=52, addr=19595790664, 
> > env=0x578e3470) at ../../accel/tcg/ldst_common.c.inc:169
> > #12 cpu_ldq_le_mmuidx_ra (env=0x578e3470, addr=19595790664, 
> > mmu_idx=, ra=ra@entry=0) at 
> > ../../accel/tcg/ldst_common.c.inc:301
> > #13 0x55b4b5de in ptw_ldq (in=0x74efcf10) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:98
> > #14 ptw_ldq (in=0x74efcf10) at 
> > ../../target/i386/tcg/sysemu/excp_helper.c:93
> > #15 mmu_translate (env=env@entry=0x578e3470, in=0x74efcfd0, 
> > out=0x74efcfa0, err=err@entry=0x74efcfb0) at 
> > 

Re: [External] Re: [QEMU-devel][RFC PATCH 1/1] backends/hostmem: qapi/qom: Add an ObjectOption for memory-backend-* called HostMemType and its arg 'cxlram'

2024-01-09 Thread Gregory Price
On Tue, Jan 09, 2024 at 01:27:28PM -0800, Hao Xiang wrote:
> On Tue, Jan 9, 2024 at 11:58 AM Gregory Price
>  wrote:
> >
> > If you drop this line:
> >
> > -numa node,memdev=vmem0,nodeid=1
> 
> We tried this as well and it works after going through the cxlcli
> process and created the devdax device. The problem is that without the
> "nodeid=1" configuration, we cannot connect with the explicit per numa
> node latency/bandwidth configuration "-numa hmat-lb". I glanced at the
> code in hw/numa.c, parse_numa_hmat_lb() looks like the one passing the
> lb information to VM's hmat.
>

Yeah, this is what Jonathan was saying - right now there isn't a good
way (in QEMU) to pass the hmat/cdat stuff down through the device.
Needs to be plumbed out.

In the meantime: You should just straight up drop the cxl device from
your QEMU config.  It doesn't actually get you anything.

> From what I understand so far, the guest kernel will dynamically
> create a numa node after a cxl devdax device is created. That means we
> don't know the numa node until after VM boot. 2. QEMU can only
> statically parse the lb information to the VM at boot time. How do we
> connect these two things?

During boot, the kernel discovers all the memory regions exposed by the
firmware. In this QEMU configuration you have defined:

region 0: CPU + DRAM node
region 1: DRAM only node
region 2: CXL Fixed Memory Window (the last line of the cxl stuff)

The kernel reads this information on boot and reserves 1 numa node for
each of these regions.

The kernel then automatically brings up regions 0 and 1 in nodes 0 and 1
respectively.

Node 2 sits dormant until you go through the cxl-cli startup sequence.
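
For reference, that bring-up is roughly the following - illustrative only,
the decoder/dax names and exact flags depend on your ndctl/daxctl version
and topology:

cxl create-region -m -d decoder0.0 -w 1 -t ram mem0
daxctl reconfigure-device --mode=system-ram dax0.0

after which the dormant node comes online with the region's memory.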


What you're asking for is for the QEMU team to plumb hmat/cdat
information down through the type3 device.  I *think* that is presently
possible with a custom CDAT file - but Jonathan probably has more
details on that.  You'll have to go digging for answers on that one.
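
If memory serves, the type3 device takes a cdat=<file> property pointing at
a raw CDAT table blob, something along the lines of (untested, from memory;
the path is just a placeholder):

-device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0,cdat=<path-to-cdat-blob>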


Now - even if you did that - the current state of the cxl-type3 device
is *not what you want* because your memory accesses will be routed
through the read/write functions in the emulated device.

What Jonathan and I discussed on the other thread is how you might go
about slimming this down to allow pass-through of the memory without the
need for all the fluff.  This is a non-trivial refactor of the existing
device, so I would not expect it any time soon.

At the end of the day, the quickest way to get there from here is to just
drop the CXL-related lines from your QEMU config and keep everything else.
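
i.e. a rough sketch of the trimmed-down command line, based on the config
you posted (the hmat-lb values are placeholders you would tune):

-M q35,hmat=on \
-m 64G \
-smp 8,sockets=1,cores=8,threads=1 \
-enable-kvm \
-object memory-backend-ram,id=ram0,size=45G \
-numa node,memdev=ram0,cpus=0-7,nodeid=0 \
-object memory-backend-ram,id=vmem0,size=19G,host-nodes=${HOST_CXL_NODE},policy=bind \
-numa node,memdev=vmem0,initiator=0,nodeid=1 \
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=10 \
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=20

with the pxb-cxl/cxl-rp/cxl-type3 devices and the cxl-fmw machine option dropped.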

> 
> Assuming that the same issue applies to a physical server with CXL.
> Were you able to see a host kernel getting the correct lb information
> for a CXL devdax device?
> 

Yes, if you bring up a CXL device via cxl-cli on real hardware, the
subsequent numa node ends up in the "lower tier" of the memory-tiering
infrastructure.

~Gregory



Re: [External] Re: [QEMU-devel][RFC PATCH 1/1] backends/hostmem: qapi/qom: Add an ObjectOption for memory-backend-* called HostMemType and its arg 'cxlram'

2024-01-09 Thread Gregory Price
On Tue, Jan 09, 2024 at 11:33:04AM -0800, Hao Xiang wrote:
> On Mon, Jan 8, 2024 at 5:13 PM Gregory Price  
> wrote:
> 
> Sounds like the technical details are explained on the other thread.
> From what I understand now, if we don't go through a complex CXL
> setup, it wouldn't go through the emulation path.
> 
> Here is our exact setup. Guest runs Linux kernel 6.6rc2
> 
> taskset --cpu-list 0-47,96-143 \
> numactl -N 0 -m 0 ${QEMU} \
> -M q35,cxl=on,hmat=on \
> -m 64G \
> -smp 8,sockets=1,cores=8,threads=1 \
> -object memory-backend-ram,id=ram0,size=45G \
> -numa node,memdev=ram0,cpus=0-7,nodeid=0 \
> -msg timestamp=on -L /usr/share/seabios \
> -enable-kvm \
> -object 
> memory-backend-ram,id=vmem0,size=19G,host-nodes=${HOST_CXL_NODE},policy=bind
> \
> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0 \
> -numa node,memdev=vmem0,nodeid=1 \
> -M 
> cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=19G,cxl-fmw.0.interleave-granularity=8k

:] you did what i thought you did

-numa node,memdev=vmem0,nodeid=1

"""
Another possiblity: You mapped this memory-backend into another numa
node explicitly and never onlined the memory via cxlcli.  I've done
this, and it works, but it's a "hidden feature" that probably should
not exist / be supported.
"""

You're mapping vmem0 into an explicit numa node *and* into the type3
device.  You don't need to do both - and technically this shouldn't be
allowed.

With this configuration, if you go through the cxl-cli setup process
for the CXL device, you'll find that you can create *another* node
(node 2 in this case) that maps to the same memory you already mapped to node 1.


You can drop the CXL device objects here and the memory will still
come up the way you want it to.

If you drop this line:

-numa node,memdev=vmem0,nodeid=1

You have to use the CXL driver to instantiate the dax device and the
numa node, and at *that* point you will see the read/write functions
being called.

~Gregory



Re: [External] Re: [QEMU-devel][RFC PATCH 1/1] backends/hostmem: qapi/qom: Add an ObjectOption for memory-backend-* called HostMemType and its arg 'cxlram'

2024-01-08 Thread Gregory Price
On Mon, Jan 08, 2024 at 05:05:38PM -0800, Hao Xiang wrote:
> On Mon, Jan 8, 2024 at 2:47 PM Hao Xiang  wrote:
> >
> > On Mon, Jan 8, 2024 at 9:15 AM Gregory Price  
> > wrote:
> > >
> > > On Fri, Jan 05, 2024 at 09:59:19PM -0800, Hao Xiang wrote:
> > > > On Wed, Jan 3, 2024 at 1:56 PM Gregory Price 
> > > >  wrote:
> > > > >
> > > > > For a variety of performance reasons, this will not work the way you
> > > > > want it to.  You are essentially telling QEMU to map the vmem0 into a
> > > > > virtual cxl device, and now any memory accesses to that memory region
> > > > > will end up going through the cxl-type3 device logic - which is an IO
> > > > > path from the perspective of QEMU.
> > > >
> > > > I didn't understand exactly how the virtual cxl-type3 device works. I
> > > > thought it would go with the same "guest virtual address ->  guest
> > > > physical address -> host physical address" translation totally done by
> > > > CPU. But if it is going through an emulation path handled by virtual
> > > > cxl-type3, I agree the performance would be bad. Do you know why
> > > > accessing memory on a virtual cxl-type3 device can't go with the
> > > > nested page table translation?
> > > >
> > >
> > > Because a byte-access on CXL memory can have checks on it that must be
> > > emulated by the virtual device, and because there are caching
> > > implications that have to be emulated as well.
> >
> > Interesting. Now that I see the cxl_type3_read/cxl_type3_write. If the
> > CXL memory data path goes through them, the performance would be
> > pretty problematic. We have actually run Intel's Memory Latency
> > Checker benchmark from inside a guest VM with both system-DRAM and
> > virtual CXL-type3 configured. The idle latency on the virtual CXL
> > memory is 2X of system DRAM, which is on-par with the benchmark
> > running from a physical host. I need to debug this more to understand
> > why the latency is actually much better than I would expect now.
> 
> So we double checked on benchmark testing. What we see is that running
> Intel Memory Latency Checker from a guest VM with virtual CXL memory
> VS from a physical host with CXL1.1 memory expander has the same
> latency.
> 
> From guest VM: local socket system-DRAM latency is 117.0ns, local
> socket CXL-DRAM latency is 269.4ns
> From physical host: local socket system-DRAM latency is 113.6ns ,
> local socket CXL-DRAM latency is 267.5ns
> 
> I also set debugger breakpoints on cxl_type3_read/cxl_type3_write
> while running the benchmark testing but those two functions are not
> ever hit. We used the virtual CXL configuration while launching QEMU
> but the CXL memory is present as a separate NUMA node and we are not
> creating devdax devices. Does that make any difference?
> 

Could you possibly share your full QEMU configuration and what OS/kernel
you are running inside the guest?

The only thing I'm surprised by is that the NUMA node appears without
the driver being required to bring it up.  It's possible I missed a QEMU
update that allows this.

~Gregory



Re: [External] Re: [QEMU-devel][RFC PATCH 1/1] backends/hostmem: qapi/qom: Add an ObjectOption for memory-backend-* called HostMemType and its arg 'cxlram'

2024-01-08 Thread Gregory Price
On Fri, Jan 05, 2024 at 09:59:19PM -0800, Hao Xiang wrote:
> On Wed, Jan 3, 2024 at 1:56 PM Gregory Price  
> wrote:
> >
> > For a variety of performance reasons, this will not work the way you
> > want it to.  You are essentially telling QEMU to map the vmem0 into a
> > virtual cxl device, and now any memory accesses to that memory region
> > will end up going through the cxl-type3 device logic - which is an IO
> > path from the perspective of QEMU.
> 
> I didn't understand exactly how the virtual cxl-type3 device works. I
> thought it would go with the same "guest virtual address ->  guest
> physical address -> host physical address" translation totally done by
> CPU. But if it is going through an emulation path handled by virtual
> cxl-type3, I agree the performance would be bad. Do you know why
> accessing memory on a virtual cxl-type3 device can't go with the
> nested page table translation?
>

Because a byte-access on CXL memory can have checks on it that must be
emulated by the virtual device, and because there are caching
implications that have to be emulated as well.

The cxl device you are using is an emulated CXL device - not a
virtualization interface.  Nuanced difference:  the emulated device has
to emulate *everything* that CXL device does.

What you want is passthrough / managed access to a real device -
virtualization.  This is not the way to accomplish that.  A better way
to accomplish that is to simply pass the memory through as a static numa
node as I described.

> 
> When we had a discussion with Intel, they told us to not use the KVM
> option in QEMU while using virtual cxl type3 device. That's probably
> related to the issue you described here? We enabled KVM though but
> haven't seen the crash yet.
>

The crash really only happens, IIRC, if code ends up hosted in that
memory.  I forget the exact scenario, but the working theory is it has
to do with the way instruction caches are managed with KVM and this
device.

> >
> > You're better off just using the `host-nodes` field of host-memory
> > and passing bandwidth/latency attributes though via `-numa hmat-lb`
> 
> We tried this but it doesn't work from end to end right now. I
> described the issue in another fork of this thread.
>
> >
> > In that scenario, the guest software doesn't even need to know CXL
> > exists at all, it can just read the attributes of the numa node
> > that QEMU created for it.
> 
> We thought about this before. But the current kernel implementation
> requires a devdax device to be probed and recognized as a slow tier
> (by reading the memory attributes). I don't think this can be done via
> the path you described. Have you tried this before?
>

Right, because the memory tiering component lumps the nodes together.

Better idea:  Fix the memory tiering component

I cc'd you on another patch line that is discussing something relevant
to this.

https://lore.kernel.org/linux-mm/87fs00njft@yhuang6-desk2.ccr.corp.intel.com/T/#m32d58f8cc607aec942995994a41b17ff711519c8

The point is: There's no need for this to be a dax device at all, there
is no need for the guest to even know what is providing the memory, or
for the guest to have any management access to the memory.  It just
wants the memory and the ability to tier it.

So we should fix the memory tiering component to work with this
workflow.

~Gregory



Re: [QEMU-devel][RFC PATCH 1/1] backends/hostmem: qapi/qom: Add an ObjectOption for memory-backend-* called HostMemType and its arg 'cxlram'

2024-01-03 Thread Gregory Price
On Sun, Dec 31, 2023 at 11:53:15PM -0800, Ho-Ren (Jack) Chuang wrote:
> Introduce a new configuration option 'host-mem-type=' in the
> '-object memory-backend-ram', allowing users to specify
> from which type of memory to allocate.
> 
> Users can specify 'cxlram' as an argument, and QEMU will then
> automatically locate CXL RAM NUMA nodes and use them as the backend memory.
> For example:
>   -object memory-backend-ram,id=vmem0,size=19G,host-mem-type=cxlram \

Stupid questions:

Why not just use `host-nodes` and pass in the numa node you want to
allocate from? Why should QEMU be made "CXL-aware" in the sense that
QEMU is responsible for figuring out what host node has CXL memory?

This feels like an "upper level software" operation (orchestration), rather
than something qemu should internally understand.
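
e.g. (illustrative) let the orchestrator figure out which host node is the
CXL node and simply do:

-object memory-backend-ram,id=vmem0,size=19G,host-nodes=2,policy=bind \
-numa node,memdev=vmem0,initiator=0,nodeid=1

where host node 2 is whatever node the host's CXL driver created - no
cxl-type3 device involved.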

>   -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
>   -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
>   -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0 \

For a variety of performance reasons, this will not work the way you
want it to.  You are essentially telling QEMU to map the vmem0 into a
virtual cxl device, and now any memory accesses to that memory region
will end up going through the cxl-type3 device logic - which is an IO
path from the perspective of QEMU.

You may want to double check that your tests aren't using swap space in
the guest, because I found in my testing that linux would prefer to use
swap rather than attempt to use virtual cxl-type3 device memory because
of how god-awful slow it is (even if it is backed by DRAM).


Additionally, this configuration will not (should not) presently work
with VMX enabled.  Unless I missed some other update, QEMU w/ CXL memory
presently crashes when VMX is enabled for not-yet-debugged reasons.

Another possibility: You mapped this memory-backend into another numa
node explicitly and never onlined the memory via cxl-cli.  I've done
this, and it works, but it's a "hidden feature" that probably should
not exist / be supported.



If I understand the goal here, it's to pass CXL-hosted DRAM through to
the guest in a way that the system can manage it according to its
performance attributes.

You're better off just using the `host-nodes` field of host-memory
and passing bandwidth/latency attributes though via `-numa hmat-lb`

In that scenario, the guest software doesn't even need to know CXL
exists at all, it can just read the attributes of the numa node
that QEMU created for it.

In the future, to deal with proper dynamic capacity, we may need to
consider a different backend object altogether that allows sparse
allocations, along with a virtual CXL device that pre-allocates the CFMW
and can at least be registered to manage it.  I'm not quite sure what
that looks like just yet.

For example: 1-socket, 4 CPU QEMU instance w/ 4GB on a cpu-node and 4GB
on a cpuless node.

qemu-system-x86_64 \
-nographic \
-accel kvm \
-machine type=q35,hmat=on \
-drive file=./myvm.qcow2,format=qcow2,index=0,media=disk,id=hd \
-m 8G,slots=4,maxmem=16G \
-smp cpus=4 \
-object memory-backend-ram,size=4G,id=ram-node0,numa=X \ <-- extend here
-object memory-backend-ram,size=4G,id=ram-node1,numa=Y \ <-- extend here
-numa node,nodeid=0,cpus=0-4,memdev=ram-node0 \  <-- cpu node
-numa node,initiator=0,nodeid=1,memdev=ram-node1 \   <-- cpuless node
-netdev bridge,id=hn0,br=virbr0 \
-device virtio-net-pci,netdev=hn0,id=nic1,mac=52:54:00:12:34:77 \
-numa 
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=10
 \
-numa 
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=10485760
 \
-numa 
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=20
 \
-numa 
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=5242880

[root@fedora ~]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 3965 MB
node 0 free: 3611 MB
node 1 cpus:
node 1 size: 3986 MB
node 1 free: 3960 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

[root@fedora initiators]# cd /sys/devices/system/node/node1/access0/initiators
node0  read_bandwidth  read_latency  write_bandwidth  write_latency
[root@fedora initiators]# cat read_bandwidth
5
[root@fedora initiators]# cat read_latency
20


~Gregory



Re: [PATCH 00/19] QEMU: CXL mailbox rework and features

2023-09-28 Thread Gregory Price
On Mon, Sep 25, 2023 at 05:11:05PM +0100, Jonathan Cameron wrote:
> I've been carrying most of this series on our CXL staging tree
> https://gitlab.com/jic23/qemu for some time and a lot of more recent
> work around Multi Head Devices and Dynamic Capacity that we need for
> Linux kernel enabling are backed up behind it. Hence I reorganized my
> queue to try and land this before other less 'central' features such
> as CXL PMUs and arm/virt support.
... 
> 
> Jonathan Cameron (15):
>   hw/cxl/mbox: Pull the payload out of struct cxl_cmd and make instances
> constant
>   hw/cxl/mbox: Split mailbox command payload into separate input and
> output
>   hw/cxl/mbox: Pull the CCI definition out of the CXLDeviceState
>   hw/cxl/mbox: Generalize the CCI command processing
>   hw/pci-bridge/cxl_upstream: Move defintion of device to header.

To save some list spam: I can't say I've reviewed and tested the entire
set, but this patch series is needed to help model the Niagara work, so
please add my tags as appropriate to the patches quoted above.

Reviewed-by: Gregory Price 
Tested-by: Gregory Price 

~Gregory



[PATCH v2] cxl/vendor: update niagara to only build on linux, add KConfig options

2023-09-22 Thread Gregory Price
Niagara uses the SysV shared memory (shm) headers, which presently limits
its compatibility to Linux hosts.  Change the build to only build it on Linux.

Add Kconfig file for skhynix directory, and make niagara depend on
CXL_MEM_DEVICE and LINUX.  Add an explicit flag for niagara.

Signed-off-by: Gregory Price 
---
 hw/cxl/Kconfig| 2 ++
 hw/cxl/vendor/Kconfig | 1 +
 hw/cxl/vendor/skhynix/Kconfig | 4 
 hw/cxl/vendor/skhynix/meson.build | 2 +-
 4 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 hw/cxl/vendor/Kconfig
 create mode 100644 hw/cxl/vendor/skhynix/Kconfig

diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
index dd6c54b54d..88022008c7 100644
--- a/hw/cxl/Kconfig
+++ b/hw/cxl/Kconfig
@@ -1,3 +1,5 @@
+source vendor/Kconfig
+
 config CXL
 bool
 default y if PCI_EXPRESS
diff --git a/hw/cxl/vendor/Kconfig b/hw/cxl/vendor/Kconfig
new file mode 100644
index 00..aa23bb051b
--- /dev/null
+++ b/hw/cxl/vendor/Kconfig
@@ -0,0 +1 @@
+source skhynix/Kconfig
diff --git a/hw/cxl/vendor/skhynix/Kconfig b/hw/cxl/vendor/skhynix/Kconfig
new file mode 100644
index 00..20942cffc2
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/Kconfig
@@ -0,0 +1,4 @@
+config CXL_SKHYNIX_NIAGARA
+bool
+depends on CXL_MEM_DEVICE && LINUX
+default y if CXL_VENDOR
diff --git a/hw/cxl/vendor/skhynix/meson.build 
b/hw/cxl/vendor/skhynix/meson.build
index 4e57db65f1..e3cb00e848 100644
--- a/hw/cxl/vendor/skhynix/meson.build
+++ b/hw/cxl/vendor/skhynix/meson.build
@@ -1 +1 @@
-system_ss.add(when: 'CONFIG_CXL_VENDOR', if_true: files('skhynix_niagara.c',))
+system_ss.add(when: 'CONFIG_CXL_SKHYNIX_NIAGARA', if_true: 
files('skhynix_niagara.c',))
-- 
2.39.1




[PATCH] cxl/vendor: update niagara to only build on linux, add KConfig options

2023-09-20 Thread Gregory Price
Niagara uses the SysV shared memory (shm) headers, which presently limits
its compatibility to Linux hosts.  Change the build to only build it on Linux.

Add Kconfig file for skhynix directory, and make niagara depend on
CXL_MEM_DEVICE.  Add an explicit flag for niagara.

Signed-off-by: Gregory Price 
---
 hw/cxl/Kconfig| 2 ++
 hw/cxl/vendor/Kconfig | 1 +
 hw/cxl/vendor/skhynix/Kconfig | 4 
 hw/cxl/vendor/skhynix/meson.build | 4 +++-
 4 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 hw/cxl/vendor/Kconfig
 create mode 100644 hw/cxl/vendor/skhynix/Kconfig

diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
index dd6c54b54d..88022008c7 100644
--- a/hw/cxl/Kconfig
+++ b/hw/cxl/Kconfig
@@ -1,3 +1,5 @@
+source vendor/Kconfig
+
 config CXL
 bool
 default y if PCI_EXPRESS
diff --git a/hw/cxl/vendor/Kconfig b/hw/cxl/vendor/Kconfig
new file mode 100644
index 00..aa23bb051b
--- /dev/null
+++ b/hw/cxl/vendor/Kconfig
@@ -0,0 +1 @@
+source skhynix/Kconfig
diff --git a/hw/cxl/vendor/skhynix/Kconfig b/hw/cxl/vendor/skhynix/Kconfig
new file mode 100644
index 00..382fa0cd6c
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/Kconfig
@@ -0,0 +1,4 @@
+config CXL_SKHYNIX_NIAGARA
+bool
+depends on CXL_MEM_DEVICE
+default y if CXL_VENDOR
diff --git a/hw/cxl/vendor/skhynix/meson.build 
b/hw/cxl/vendor/skhynix/meson.build
index 4e57db65f1..6f194aa517 100644
--- a/hw/cxl/vendor/skhynix/meson.build
+++ b/hw/cxl/vendor/skhynix/meson.build
@@ -1 +1,3 @@
-system_ss.add(when: 'CONFIG_CXL_VENDOR', if_true: files('skhynix_niagara.c',))
+if targetos == 'linux'
+system_ss.add(when: 'CONFIG_CXL_SKHYNIX_NIAGARA', if_true: 
files('skhynix_niagara.c',))
+endif
-- 
2.39.1




Re: [PATCH v4 1/1] cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

2023-09-20 Thread Gregory Price
On Wed, Sep 20, 2023 at 01:46:18PM +0100, Jonathan Cameron wrote:
> On Mon, 18 Sep 2023 13:36:56 -0400
> Gregory Price  wrote:
> 
> > Create a new device to emulate the SK hynix Niagara MHSLD platform.
> > 
> 
> Hi Gregory,
> 
> Seems this doesn't drop in directly on top of the rest of v3.
> The mhd_access_valid check has moved from being a class thing to
> an instance thing.
> 
> So for now I've reverted that bit in my local tree and will probably
> push out later.  Whilst here, some trivial formatting stuff inline
> that I was carrying.
> 
> Thanks,
> 
> Jonathan
> 

BAH! It seems I've edited an old version of the commit!

Disregard *facepalm*, I will send a new one today.

~Gregory



[PATCH v4 1/1] cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

2023-09-18 Thread Gregory Price
Create a new device to emulate the SK hynix Niagara MHSLD platform.

This device has custom CCI commands that allow for applying isolation
to each memory block between hosts. This enables an early form of
dynamic capacity, whereby the NUMA node maps the entire region, but
the host is responsible for asking the device which memory blocks
are allocated to it, and therefore may be onlined.

To instantiate:

-device 
cxl-skh-niagara,cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,sn=6,mhd-head=0,mhd-shmid=15

The linux kernel will require raw CXL commands enabled to allow for
passing through of Niagara CXL commands via the CCI mailbox.

The Niagara MH-SLD has a shared memory region that must be initialized
using the 'init_niagara' tool located in the vendor subdirectory

usage: init_niagara <heads> <sections> <section_size> <shmid>
heads : number of heads on the device
sections  : number of sections
section_size  : size of a section in 128mb increments
shmid : shmid produced by ipcmk

Example:
shmid1=$(ipcmk -M 131072 | grep -Eo '[0-9]+$')
./init_niagara 4 32 1 $shmid1

Signed-off-by: Gregory Price 
Signed-off-by: Junhee Ryu 
Signed-off-by: Kwangjin Ko 
---
 hw/cxl/Kconfig  |   6 +
 hw/cxl/meson.build  |   2 +
 hw/cxl/vendor/Kconfig   |   1 +
 hw/cxl/vendor/meson.build   |   1 +
 hw/cxl/vendor/skhynix/.gitignore|   1 +
 hw/cxl/vendor/skhynix/Kconfig   |   4 +
 hw/cxl/vendor/skhynix/init_niagara.c|  99 +
 hw/cxl/vendor/skhynix/meson.build   |   3 +
 hw/cxl/vendor/skhynix/skhynix_niagara.c | 516 
 hw/cxl/vendor/skhynix/skhynix_niagara.h | 162 
 10 files changed, 795 insertions(+)
 create mode 100644 hw/cxl/vendor/Kconfig
 create mode 100644 hw/cxl/vendor/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/.gitignore
 create mode 100644 hw/cxl/vendor/skhynix/Kconfig
 create mode 100644 hw/cxl/vendor/skhynix/init_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.h

diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
index c9b2e46bac..88022008c7 100644
--- a/hw/cxl/Kconfig
+++ b/hw/cxl/Kconfig
@@ -1,6 +1,12 @@
+source vendor/Kconfig
+
 config CXL
 bool
 default y if PCI_EXPRESS
 
+config CXL_VENDOR
+bool
+default y
+
 config I2C_MCTP_CXL
 bool
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index 1393821fc4..e8c8c1355a 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -15,3 +15,5 @@ system_ss.add(when: 'CONFIG_CXL',
 system_ss.add(when: 'CONFIG_I2C_MCTP_CXL', if_true: files('i2c_mctp_cxl.c'))
 
 system_ss.add(when: 'CONFIG_ALL', if_true: files('cxl-host-stubs.c'))
+
+subdir('vendor')
diff --git a/hw/cxl/vendor/Kconfig b/hw/cxl/vendor/Kconfig
new file mode 100644
index 00..aa23bb051b
--- /dev/null
+++ b/hw/cxl/vendor/Kconfig
@@ -0,0 +1 @@
+source skhynix/Kconfig
diff --git a/hw/cxl/vendor/meson.build b/hw/cxl/vendor/meson.build
new file mode 100644
index 00..12db8991f1
--- /dev/null
+++ b/hw/cxl/vendor/meson.build
@@ -0,0 +1 @@
+subdir('skhynix')
diff --git a/hw/cxl/vendor/skhynix/.gitignore b/hw/cxl/vendor/skhynix/.gitignore
new file mode 100644
index 00..6d96de38ea
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/.gitignore
@@ -0,0 +1 @@
+init_niagara
diff --git a/hw/cxl/vendor/skhynix/Kconfig b/hw/cxl/vendor/skhynix/Kconfig
new file mode 100644
index 00..382fa0cd6c
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/Kconfig
@@ -0,0 +1,4 @@
+config CXL_SKHYNIX_NIAGARA
+bool
+depends on CXL_MEM_DEVICE
+default y if CXL_VENDOR
diff --git a/hw/cxl/vendor/skhynix/init_niagara.c 
b/hw/cxl/vendor/skhynix/init_niagara.c
new file mode 100644
index 00..2c189dc33c
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/init_niagara.c
@@ -0,0 +1,99 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2023 MemVerge Inc.
+ * Copyright (c) 2023 SK hynix Inc.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct niagara_state {
+uint8_t nr_heads;
+uint8_t nr_lds;
+uint8_t ldmap[65536];
+uint32_t total_sections;
+uint32_t free_sections;
+uint32_t section_size;
+uint32_t sections[];
+};
+
+int main(int argc, char *argv[])
+{
+int shmid = 0;
+uint32_t sections = 0;
+uint32_t section_size = 0;
+uint32_t heads = 0;
+struct niagara_state *niagara_state = NULL;
+size_t state_size;
+uint8_t i;
+
+if (argc != 5) {
+printf("usage: init_niagara
\n"
+"\theads : number of heads on the device\n"
+"\tsections  : number of sections\n"
+"\tsection_size  : size of a section in 128mb increments\n"
+"\tshmid : /tmp/mytoken.tmp\n\n"
+   

[PATCH v4 0/1] Niagara MHSLD

2023-09-18 Thread Gregory Price
v4 update: Kconfig and meson fixes

Since Niagara uses the SysV shared memory (shm) headers, it presently can
only be built for Linux.  Also adds missing Kconfig files and options
to turn it off, and turns it off by default if VENDOR or
CXL_MEM_DEVICE are turned off.

Gregory Price (1):
  cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

 hw/cxl/Kconfig  |   6 +
 hw/cxl/meson.build  |   2 +
 hw/cxl/vendor/Kconfig   |   1 +
 hw/cxl/vendor/meson.build   |   1 +
 hw/cxl/vendor/skhynix/.gitignore|   1 +
 hw/cxl/vendor/skhynix/Kconfig   |   4 +
 hw/cxl/vendor/skhynix/init_niagara.c|  99 +
 hw/cxl/vendor/skhynix/meson.build   |   3 +
 hw/cxl/vendor/skhynix/skhynix_niagara.c | 516 
 hw/cxl/vendor/skhynix/skhynix_niagara.h | 162 
 10 files changed, 795 insertions(+)
 create mode 100644 hw/cxl/vendor/Kconfig
 create mode 100644 hw/cxl/vendor/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/.gitignore
 create mode 100644 hw/cxl/vendor/skhynix/Kconfig
 create mode 100644 hw/cxl/vendor/skhynix/init_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.h

-- 
2.39.1




[PATCH v3 5/6] cxl/mailbox, type3: Implement MHD get info command callback

2023-09-06 Thread Gregory Price
For multi-headed type 3 devices, this command reports logical device
mappings for each head.  Implement a callback which can be
initialized by MHD devices to field these commands.

Reports "unsupported" if the command is called but the callback is
not implemented.

Signed-off-by: Gregory Price 
---
 hw/cxl/cxl-mailbox-utils.c  | 21 +
 hw/mem/cxl_type3.c  |  1 +
 include/hw/cxl/cxl_device.h |  6 ++
 3 files changed, 28 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index b64bbdf45d..3177a59de3 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -91,6 +91,8 @@ enum {
 #define GET_PHYSICAL_PORT_STATE 0x1
 TUNNEL = 0x53,
 #define MANAGEMENT_COMMAND 0x0
+MHD = 0x55,
+#define GET_MHD_INFO 0x0
 };
 
 /* CCI Message Format CXL r3.0 Figure 7-19 */
@@ -184,6 +186,23 @@ static CXLRetCode cmd_tunnel_management_cmd(const struct 
cxl_cmd *cmd,
 return CXL_MBOX_INVALID_INPUT;
 }
 
+/*
+ * CXL r3.0 section 7.6.7.5.1 - Get Multi-Headed Info (Opcode 5500h)
+ */
+static CXLRetCode cmd_mhd_get_info(const struct cxl_cmd *cmd,
+   uint8_t *payload_in, size_t len_in,
+   uint8_t *payload_out, size_t *len_out,
+   CXLCCI *cci)
+{
+CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
+if (cvc->mhd_get_info) {
+return cvc->mhd_get_info(cmd, payload_in, len_in, payload_out,
+ len_out, cci);
+}
+return CXL_MBOX_UNSUPPORTED;
+}
+
 static CXLRetCode cmd_events_get_records(const struct cxl_cmd *cmd,
  uint8_t *payload_in, size_t len_in,
  uint8_t *payload_out, size_t *len_out,
@@ -1598,6 +1617,8 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_media_inject_poison, 8, 0 },
 [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
 cmd_media_clear_poison, 72, 0 },
+[MHD][GET_MHD_INFO] = {"GET_MULTI_HEADED_INFO",
+cmd_mhd_get_info, 2, 0},
 };
 
 static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 1fb3ffeca8..307d7c1fd8 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -2120,6 +2120,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
 cvc->get_lsa = get_lsa;
 cvc->set_lsa = set_lsa;
 cvc->set_cacheline = set_cacheline;
+cvc->mhd_get_info = NULL;
 cvc->mhd_access_valid = NULL;
 }
 
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 37893f8626..4a5d4bd98b 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -508,6 +508,12 @@ struct CXLType3Class {
 bool (*set_cacheline)(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t 
*data);
 
 /* Multi-headed Device */
+CXLRetCode (*mhd_get_info)(const struct cxl_cmd *cmd,
+   uint8_t *payload_in,
+   size_t len_in,
+   uint8_t *payload_out,
+   size_t *len_out,
+   CXLCCI *cci);
 bool (*mhd_access_valid)(PCIDevice *d, uint64_t addr, unsigned int size);
 };
 
-- 
2.39.1




[PATCH v3 4/6] cxl/type3: add an optional mhd validation function for memory accesses

2023-09-06 Thread Gregory Price
When memory accesses are made, some MHSLDs validate that the address
is within the scope of allocated sections.  To do this, the base device
must call an optional function set by inheriting devices.

Signed-off-by: Gregory Price 
---
 hw/mem/cxl_type3.c  | 15 +++
 include/hw/cxl/cxl_device.h |  3 +++
 2 files changed, 18 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 6e3309dc11..1fb3ffeca8 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1034,6 +1034,7 @@ void ct3_realize(PCIDevice *pci_dev, Error **errp)
 goto err_release_cdat;
 }
 }
+
 return;
 
 err_release_cdat:
@@ -1249,6 +1250,7 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr 
host_addr, uint64_t *data,
unsigned size, MemTxAttrs attrs)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(d);
+CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
 uint64_t dpa_offset = 0;
 AddressSpace *as = NULL;
 int res;
@@ -1259,6 +1261,11 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr 
host_addr, uint64_t *data,
 return MEMTX_ERROR;
 }
 
+if (cvc->mhd_access_valid &&
+!cvc->mhd_access_valid(d, dpa_offset, size)) {
+return MEMTX_ERROR;
+}
+
 if (sanitize_running(&ct3d->cci)) {
 qemu_guest_getrandom_nofail(data, size);
 return MEMTX_OK;
@@ -1270,6 +1277,7 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr 
host_addr, uint64_t data,
 unsigned size, MemTxAttrs attrs)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(d);
+CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
 uint64_t dpa_offset = 0;
 AddressSpace *as = NULL;
 int res;
@@ -1279,6 +1287,12 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr 
host_addr, uint64_t data,
 if (res) {
 return MEMTX_ERROR;
 }
+
+if (cvc->mhd_access_valid &&
+!cvc->mhd_access_valid(d, dpa_offset, size)) {
+return MEMTX_ERROR;
+}
+
 if (sanitize_running(&ct3d->cci)) {
 return MEMTX_OK;
 }
@@ -2106,6 +2120,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
 cvc->get_lsa = get_lsa;
 cvc->set_lsa = set_lsa;
 cvc->set_cacheline = set_cacheline;
+cvc->mhd_access_valid = NULL;
 }
 
 static const TypeInfo ct3d_info = {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 9c37a54699..37893f8626 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -506,6 +506,9 @@ struct CXLType3Class {
 void (*set_lsa)(CXLType3Dev *ct3d, const void *buf, uint64_t size,
 uint64_t offset);
 bool (*set_cacheline)(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t 
*data);
+
+/* Multi-headed Device */
+bool (*mhd_access_valid)(PCIDevice *d, uint64_t addr, unsigned int size);
 };
 
 struct CSWMBCCIDev {
-- 
2.39.1




[PATCH v3 6/6] cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

2023-09-06 Thread Gregory Price
Create a new device to emulate the SK hynix Niagara MHSLD platform.

This device has custom CCI commands that allow for applying isolation
to each memory block between hosts. This enables an early form of
dynamic capacity, whereby the NUMA node maps the entire region, but
the host is responsible for asking the device which memory blocks
are allocated to it, and therefore may be onlined.

To instantiate:

-device 
cxl-skh-niagara,cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,sn=6,mhd-head=0,mhd-shmid=15

The linux kernel will require raw CXL commands enabled to allow for
passing through of Niagara CXL commands via the CCI mailbox.

The Niagara MH-SLD has a shared memory region that must be initialized
using the 'init_niagara' tool located in the vendor subdirectory

usage: init_niagara <heads> <sections> <section_size> <shmid>
heads : number of heads on the device
sections  : number of sections
section_size  : size of a section in 128mb increments
shmid : shmid produced by ipcmk

Example:
shmid1=$(ipcmk -M 131072 | grep -Eo '[0-9]+$')
./init_niagara 4 32 1 $shmid1

Signed-off-by: Gregory Price 
---
 hw/cxl/Kconfig  |   4 +
 hw/cxl/meson.build  |   2 +
 hw/cxl/vendor/meson.build   |   1 +
 hw/cxl/vendor/skhynix/.gitignore|   1 +
 hw/cxl/vendor/skhynix/init_niagara.c|  99 +
 hw/cxl/vendor/skhynix/meson.build   |   1 +
 hw/cxl/vendor/skhynix/skhynix_niagara.c | 514 
 hw/cxl/vendor/skhynix/skhynix_niagara.h | 161 
 8 files changed, 783 insertions(+)
 create mode 100644 hw/cxl/vendor/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/.gitignore
 create mode 100644 hw/cxl/vendor/skhynix/init_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.h

diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
index c9b2e46bac..dd6c54b54d 100644
--- a/hw/cxl/Kconfig
+++ b/hw/cxl/Kconfig
@@ -2,5 +2,9 @@ config CXL
 bool
 default y if PCI_EXPRESS
 
+config CXL_VENDOR
+bool
+default y
+
 config I2C_MCTP_CXL
 bool
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index 1393821fc4..e8c8c1355a 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -15,3 +15,5 @@ system_ss.add(when: 'CONFIG_CXL',
 system_ss.add(when: 'CONFIG_I2C_MCTP_CXL', if_true: files('i2c_mctp_cxl.c'))
 
 system_ss.add(when: 'CONFIG_ALL', if_true: files('cxl-host-stubs.c'))
+
+subdir('vendor')
diff --git a/hw/cxl/vendor/meson.build b/hw/cxl/vendor/meson.build
new file mode 100644
index 00..12db8991f1
--- /dev/null
+++ b/hw/cxl/vendor/meson.build
@@ -0,0 +1 @@
+subdir('skhynix')
diff --git a/hw/cxl/vendor/skhynix/.gitignore b/hw/cxl/vendor/skhynix/.gitignore
new file mode 100644
index 00..6d96de38ea
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/.gitignore
@@ -0,0 +1 @@
+init_niagara
diff --git a/hw/cxl/vendor/skhynix/init_niagara.c 
b/hw/cxl/vendor/skhynix/init_niagara.c
new file mode 100644
index 00..2c189dc33c
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/init_niagara.c
@@ -0,0 +1,99 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2023 MemVerge Inc.
+ * Copyright (c) 2023 SK hynix Inc.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct niagara_state {
+uint8_t nr_heads;
+uint8_t nr_lds;
+uint8_t ldmap[65536];
+uint32_t total_sections;
+uint32_t free_sections;
+uint32_t section_size;
+uint32_t sections[];
+};
+
+int main(int argc, char *argv[])
+{
+int shmid = 0;
+uint32_t sections = 0;
+uint32_t section_size = 0;
+uint32_t heads = 0;
+struct niagara_state *niagara_state = NULL;
+size_t state_size;
+uint8_t i;
+
+if (argc != 5) {
+printf("usage: init_niagara
\n"
+"\theads : number of heads on the device\n"
+"\tsections  : number of sections\n"
+"\tsection_size  : size of a section in 128mb increments\n"
+"\tshmid : /tmp/mytoken.tmp\n\n"
+"It is recommended your shared memory region is at least 
128kb\n");
+return -1;
+}
+
+/* must have at least 1 head */
+heads = (uint32_t)atoi(argv[1]);
+if (heads == 0 || heads > 32) {
+printf("bad heads argument (1-32)\n");
+return -1;
+}
+
+/* Get number of sections */
+sections = (uint32_t)atoi(argv[2]);
+if (sections == 0) {
+printf("bad sections argument\n");
+return -1;
+}
+
+section_size = (uint32_t)atoi(argv[3]);
+if (section_size == 0) {
+printf("bad section size argument\n");
+return -1;
+}
+
+shmid = (uint32_t)atoi(argv[4]);
+if (shmid == 0) {
+printf(&q

[PATCH v3 3/6] cxl/type3: Expose ct3 functions so that inheriters can call them

2023-09-06 Thread Gregory Price
For devices built on top of ct3, we need the init, realize, and
exit functions exposed to correctly start up and tear down.

Signed-off-by: Gregory Price 
---
 hw/mem/cxl_type3.c  | 6 +++---
 include/hw/cxl/cxl_device.h | 4 
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 80d596ee10..6e3309dc11 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -950,7 +950,7 @@ static DOEProtocol doe_spdm_prot[] = {
 { }
 };
 
-static void ct3_realize(PCIDevice *pci_dev, Error **errp)
+void ct3_realize(PCIDevice *pci_dev, Error **errp)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
 CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
@@ -1054,7 +1054,7 @@ err_address_space_free:
 return;
 }
 
-static void ct3_exit(PCIDevice *pci_dev)
+void ct3_exit(PCIDevice *pci_dev)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
 CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
@@ -1285,7 +1285,7 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr 
host_addr, uint64_t data,
 return address_space_write(as, dpa_offset, attrs, &data, size);
 }
 
-static void ct3d_reset(DeviceState *dev)
+void ct3d_reset(DeviceState *dev)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(dev);
 uint32_t *reg_state = ct3d->cxl_cstate.crb.cache_mem_registers;
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index e824c5ade8..9c37a54699 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -524,6 +524,10 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, 
uint64_t *data,
 MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
 unsigned size, MemTxAttrs attrs);
 
+void ct3_realize(PCIDevice *pci_dev, Error **errp);
+void ct3_exit(PCIDevice *pci_dev);
+void ct3d_reset(DeviceState *d);
+
 uint64_t cxl_device_get_timestamp(CXLDeviceState *cxlds);
 
 void cxl_event_init(CXLDeviceState *cxlds, int start_msg_num);
-- 
2.39.1




[PATCH v3 1/6] cxl/mailbox: move mailbox effect definitions to a header

2023-09-06 Thread Gregory Price
Preparation for allowing devices to define their own CCI commands

Signed-off-by: Gregory Price 
---
 hw/cxl/cxl-mailbox-utils.c   | 30 +-
 include/hw/cxl/cxl_mailbox.h | 18 ++
 2 files changed, 31 insertions(+), 17 deletions(-)
 create mode 100644 include/hw/cxl/cxl_mailbox.h

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 4e8651ebe2..b64bbdf45d 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -12,6 +12,7 @@
 #include "hw/pci/msix.h"
 #include "hw/cxl/cxl.h"
 #include "hw/cxl/cxl_events.h"
+#include "hw/cxl/cxl_mailbox.h"
 #include "hw/pci/pci.h"
 #include "hw/pci-bridge/cxl_upstream_port.h"
 #include "qemu/cutils.h"
@@ -1561,28 +1562,21 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct 
cxl_cmd *cmd,
 return CXL_MBOX_SUCCESS;
 }
 
-#define IMMEDIATE_CONFIG_CHANGE (1 << 1)
-#define IMMEDIATE_DATA_CHANGE (1 << 2)
-#define IMMEDIATE_POLICY_CHANGE (1 << 3)
-#define IMMEDIATE_LOG_CHANGE (1 << 4)
-#define SECURITY_STATE_CHANGE (1 << 5)
-#define BACKGROUND_OPERATION (1 << 6)
-
 static const struct cxl_cmd cxl_cmd_set[256][256] = {
 [EVENTS][GET_RECORDS] = { "EVENTS_GET_RECORDS",
 cmd_events_get_records, 1, 0 },
 [EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
-cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
+cmd_events_clear_records, ~0, CXL_MBOX_IMMEDIATE_LOG_CHANGE },
 [EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
   cmd_events_get_interrupt_policy, 0, 0 },
 [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
   cmd_events_set_interrupt_policy,
-  ~0, IMMEDIATE_CONFIG_CHANGE },
+  ~0, CXL_MBOX_IMMEDIATE_CONFIG_CHANGE },
 [FIRMWARE_UPDATE][GET_INFO] = { "FIRMWARE_UPDATE_GET_INFO",
 cmd_firmware_update_get_info, 0, 0 },
 [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8,
- IMMEDIATE_POLICY_CHANGE },
+ CXL_MBOX_IMMEDIATE_POLICY_CHANGE },
 [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 
0 },
 [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
 [IDENTIFY][MEMORY_DEVICE] = { "IDENTIFY_MEMORY_DEVICE",
@@ -1591,9 +1585,11 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_ccls_get_partition_info, 0, 0 },
 [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 8, 0 },
 [CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa,
-~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
+~0, CXL_MBOX_IMMEDIATE_CONFIG_CHANGE | CXL_MBOX_IMMEDIATE_DATA_CHANGE 
},
 [SANITIZE][OVERWRITE] = { "SANITIZE_OVERWRITE", cmd_sanitize_overwrite, 0,
-IMMEDIATE_DATA_CHANGE | SECURITY_STATE_CHANGE | BACKGROUND_OPERATION },
+(CXL_MBOX_IMMEDIATE_DATA_CHANGE |
+ CXL_MBOX_SECURITY_STATE_CHANGE |
+ CXL_MBOX_BACKGROUND_OPERATION)},
 [PERSISTENT_MEM][GET_SECURITY_STATE] = { "GET_SECURITY_STATE",
 cmd_get_security_state, 0, 0 },
 [MEDIA_AND_POISON][GET_POISON_LIST] = { "MEDIA_AND_POISON_GET_POISON_LIST",
@@ -1612,10 +1608,10 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = 
{
 8, 0 },
 [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
 "ADD_DCD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
-~0, IMMEDIATE_DATA_CHANGE },
+~0, CXL_MBOX_IMMEDIATE_DATA_CHANGE },
 [DCD_CONFIG][RELEASE_DYN_CAP] = {
 "RELEASE_DCD_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
-~0, IMMEDIATE_DATA_CHANGE },
+~0, CXL_MBOX_IMMEDIATE_DATA_CHANGE },
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
@@ -1628,7 +1624,7 @@ static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
  */
 [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8,
- IMMEDIATE_POLICY_CHANGE },
+ CXL_MBOX_IMMEDIATE_POLICY_CHANGE },
 [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0,
   0 },
 [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
@@ -1670,7 +1666,7 @@ int cxl_process_cci_message(CXLCCI *cci, uint8_t set, 
uint8_t cmd,
 }
 
 /* Only one bg command at a time */
-if ((cxl_cmd->effect & BACKGROUND_OPERATION) &&
+if ((cxl_cmd->effect & CXL_MB

[PATCH v3 2/6] cxl/type3: Cleanup multiple CXL_TYPE3() calls in read/write functions

2023-09-06 Thread Gregory Price
Call CXL_TYPE3 once at top of function to avoid multiple invocations.

Signed-off-by: Gregory Price 
---
 hw/mem/cxl_type3.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index fd9d134d46..80d596ee10 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1248,17 +1248,18 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev 
*ct3d,
 MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
unsigned size, MemTxAttrs attrs)
 {
+CXLType3Dev *ct3d = CXL_TYPE3(d);
 uint64_t dpa_offset = 0;
 AddressSpace *as = NULL;
 int res;
 
-res = cxl_type3_hpa_to_as_and_dpa(CXL_TYPE3(d), host_addr, size,
+res = cxl_type3_hpa_to_as_and_dpa(ct3d, host_addr, size,
   &as, &dpa_offset);
 if (res) {
 return MEMTX_ERROR;
 }
 
-if (sanitize_running(&CXL_TYPE3(d)->cci)) {
+if (sanitize_running(&ct3d->cci)) {
 qemu_guest_getrandom_nofail(data, size);
 return MEMTX_OK;
 }
@@ -1268,16 +1269,17 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr 
host_addr, uint64_t *data,
 MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
 unsigned size, MemTxAttrs attrs)
 {
+CXLType3Dev *ct3d = CXL_TYPE3(d);
 uint64_t dpa_offset = 0;
 AddressSpace *as = NULL;
 int res;
 
-res = cxl_type3_hpa_to_as_and_dpa(CXL_TYPE3(d), host_addr, size,
+res = cxl_type3_hpa_to_as_and_dpa(ct3d, host_addr, size,
   , _offset);
 if (res) {
 return MEMTX_ERROR;
 }
-if (sanitize_running(_TYPE3(d)->cci)) {
+if (sanitize_running(>cci)) {
 return MEMTX_OK;
 }
 return address_space_write(as, dpa_offset, attrs, , size);
-- 
2.39.1




[PATCH v3 0/6] CXL: SK hynix Niagara MHSLD Device

2023-09-06 Thread Gregory Price
v3:
- 6 patch series, first 5 are pull-aheads that can be merged separately
- cci: added MHD back into main mailbox, but implemented a callback
   pattern.  type-3 devices leave the callback null by default and
   report unsupported if nothing implements it.
- cleanup and formatting

v2:
- 5 patch series, first 4 are pull-aheads that can be merged separately
- cci: rebased on 8-30 branch from jic23, dropped cci patches
- mailbox: dropped MHD commands, integrated into niagara (for now)
- mailbox: refactor CCI defines to avoid redefinition in niagara
- type3: cleanup duplicate typecasting
- type3: expose ct3 functions so inheriting devices may access them
- type3: add optional mhd validation function for memory access
- niagara: refactor to make niagara inherit type3 and override behavior
- niagara: refactor command definitions and types into header to make
   understanding the device a bit easier for users
- style and formatting

This patch set includes an emulation of the SK hynix Niagara MHSLD 
platform with custom CCI commands that allow for isolation of memory
blocks between attached hosts.

This device allows hosts to request memory blocks directly from the
device, rather than requiring the full DCD command set.  As a matter
of simplicity, this is beneficial for testing and applications of
dynamic memory pooling on top of the 1.1 and 2.0 specifications.

Note that these CCI commands are not serviceable without a proper
driver or the kernel allowing raw CXL commands to be passed through
the mailbox driver, so users should enable
`CONFIG_CXL_MEM_RAW_COMMANDS=y` on the kernel of their QEMU instance
if they wish to test it.

Signed-off-by: Gregory Price 


Gregory Price (6):
  cxl/mailbox: move mailbox effect definitions to a header
  cxl/type3: Cleanup multiple CXL_TYPE3() calls in read/write functions
  cxl/type3: Expose ct3 functions so that inheritors can call them
  cxl/type3: add an optional mhd validation function for memory accesses
  cxl/mailbox,type3: Implement MHD get info command callback
  cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

 hw/cxl/Kconfig  |   4 +
 hw/cxl/cxl-mailbox-utils.c  |  51 ++-
 hw/cxl/meson.build  |   2 +
 hw/cxl/vendor/meson.build   |   1 +
 hw/cxl/vendor/skhynix/.gitignore|   1 +
 hw/cxl/vendor/skhynix/init_niagara.c|  99 +
 hw/cxl/vendor/skhynix/meson.build   |   1 +
 hw/cxl/vendor/skhynix/skhynix_niagara.c | 514 
 hw/cxl/vendor/skhynix/skhynix_niagara.h | 161 
 hw/mem/cxl_type3.c  |  32 +-
 include/hw/cxl/cxl_device.h |  13 +
 include/hw/cxl/cxl_mailbox.h|  18 +
 12 files changed, 873 insertions(+), 24 deletions(-)
 create mode 100644 hw/cxl/vendor/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/.gitignore
 create mode 100644 hw/cxl/vendor/skhynix/init_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.h
 create mode 100644 include/hw/cxl/cxl_mailbox.h

-- 
2.39.1




Re: [PATCH 5/5] cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

2023-09-06 Thread Gregory Price
On Wed, Sep 06, 2023 at 02:04:45PM +0100, Jonathan Cameron wrote:
> On Thu, 31 Aug 2023 21:29:14 -0400
> Gregory Price  wrote:
> 
> Hi Gregory,
> 
> Some comments inline, but I'm happy to add this to my staging tree in the 
> meantime
> as it stands (might be a few days until I push a new branch though).
> 

I'm going to do a quick v3 today with the feedback and some cleanup
in spots i noticed.

> > Signed-off-by: Gregory Price 
> > Signed-off-by: Junhee Ryu 
> > Signed-off-by: Kwangjin Ko 
> 
> The SoB chain needs cleaning up.  Is this a co-developed situation?
> If it is use the rules in the kernel documentation as I don't think those
> are yet clearly stated in QEMU docs (and they are confusing so I won't try
> to restate them here).
> 

TL;DR: They gave me the command list, I wrote the model.  We got
approval to release the model, but I wasn't sure how to capture the
copyright/SoB list.  I suppose the copyright covers SKh, but since I
authored the model, it only requires my SoB?

After reading, I'm still not sure how to capture this lol.

Should I just switch the skh folks to Co-developed-by?

> 
> > diff --git a/hw/cxl/vendor/skhynix/meson.build 
> > b/hw/cxl/vendor/skhynix/meson.build
> > new file mode 100644
> > index 00..4e57db65f1
> > --- /dev/null
> > +++ b/hw/cxl/vendor/skhynix/meson.build
> > @@ -0,0 +1 @@
> > +system_ss.add(when: 'CONFIG_CXL_VENDOR', if_true: 
> > files('skhynix_niagara.c',))
> > diff --git a/hw/cxl/vendor/skhynix/skhynix_niagara.c 
> > b/hw/cxl/vendor/skhynix/skhynix_niagara.c
> > new file mode 100644
> > index 00..88e53cc6cc
> > --- /dev/null
> > +++ b/hw/cxl/vendor/skhynix/skhynix_niagara.c
> > @@ -0,0 +1,516 @@
> > +/*
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + *
> > + * Copyright (c) 2023 MemVerge Inc.
> > + * Copyright (c) 2023 SK hynix Inc.
> > + */
> > +
> > +#include 
> 
> This will need some osdep.h magic.  There is some there
> already but it will need relaxing (unless you want to run only on sparc ;)
> and we may need to make this device linux host only.
> 
> 

Good point, I had not considered osdep issues.  Do you know of any
examples of linux-only devices I can use to do a quick patch-up? I 
can come back around on this issue later.

> 
> > +
> > +enum {
> > +NIAGARA_MHD = 0x55,
> > +#define GET_MHD_INFO 0x0
> 
> Is this standard as it's in the normal space?
> If it is then I'd like the implementation pushed down to the
> type3 implementation (with some callbacks or similar.)
> 

:thinking_face:

maybe a similar pattern to the callback from before? I suppose I could
push this down into type3 and add an mhd callback in the class and have
niagara fill that in with the callback.

That *feels* right, so i'll go ahead with it.
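
Rough sketch of what I mean (untested; the mhd_get_info hook name and its
signature are placeholders -- niagara would fill it in from its class_init,
and a plain type3 device just reports unsupported):

/* cxl_device.h: optional hook added to the type3 class */
struct CXLType3Class {
    /* ... existing members such as get_lsa_size() elided ... */
    CXLRetCode (*mhd_get_info)(CXLType3Dev *ct3d,
                               uint8_t start_ld, uint8_t ldmap_len,
                               uint8_t *payload_out, size_t *len_out);
};

/* cxl-mailbox-utils.c: generic handler lives in the standard command set */
static CXLRetCode cmd_mhd_get_info(const struct cxl_cmd *cmd,
                                   uint8_t *payload_in, size_t len_in,
                                   uint8_t *payload_out, size_t *len_out,
                                   CXLCCI *cci)
{
    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
    CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);

    if (!cvc->mhd_get_info) {
        /* plain type3 device: nothing implements the hook */
        return CXL_MBOX_UNSUPPORTED;
    }
    return cvc->mhd_get_info(ct3d, payload_in[0], payload_in[1],
                             payload_out, len_out);
}

/* skhynix_niagara.c: the vendor device supplies the implementation */
static void cxl_niagara_class_init(ObjectClass *oc, void *data)
{
    CXLType3Class *cvc = CXL_TYPE3_CLASS(oc);

    cvc->mhd_get_info = cxl_niagara_mhd_get_info;
}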



If I misunderstood anything, let me know

~Gregory



Re: [PATCH 0/5 v2] CXL: SK hynix Niagara MHSLD Device

2023-09-05 Thread Gregory Price
On Tue, Sep 05, 2023 at 11:04:56AM +0200, Philippe Mathieu-Daudé wrote:
> On 1/9/23 03:29, Gregory Price wrote:
> > v2:
> > - 5 patch series, first 4 are pull-aheads that can be merged separately
> > 
> > This patch set includes an emulation of the SK hynix Niagara MHSLD
> > platform with custom CCI commands that allow for isolation of memory
> > blocks between attached hosts.
> > 
> 
> Being at commit 17780edd81 I can't apply this series:
> 
> Applying: cxl/mailbox: move mailbox effect definitions to a header
> error: patch failed: hw/cxl/cxl-mailbox-utils.c:12
> error: hw/cxl/cxl-mailbox-utils.c: patch does not apply
> Patch failed at 0001 cxl/mailbox: move mailbox effect definitions to a
> header
> 
> On what is it based?

This is based on jonathan's branch here:
https://gitlab.com/jic23/qemu/-/tree/cxl-2023-08-30

I try to keep things on top of his working branch to avoid any rebasing
issues.  I should have included that with the cover letter, my bad.

~Gregory



Re: [PATCH 3/5] cxl/type3: Expose ct3 functions so that inheritors can call them

2023-09-05 Thread Gregory Price
On Tue, Sep 05, 2023 at 10:59:15AM +0200, Philippe Mathieu-Daudé wrote:
> Hi Gregory,
> 
> On 1/9/23 03:29, Gregory Price wrote:
> > For devices built on top of ct3, we need the init, realize, and
> > exit functions exposed to correctly start up and tear down.
> 
> You shouldn't need this. Your device class can inherit from
> the CT3 base class by setting its .parent to TYPE_CXL_TYPE3:
> 

realized this after the fact when testing and patching, forgot to remove
it, good eye.  Still learning QEMU internals.

> 
> You can see some documentation about QOM here:
> https://qemu-project.gitlab.io/qemu/devel/qom.html
> But still you'll have to look at examples in the tree.
> 
> 
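
Something along these lines then, if I'm reading the QOM docs right (rough
sketch; the niagara state/class names are placeholders, and the parent
realize is chained instead of exporting ct3_realize()):

static void cxl_niagara_realize(PCIDevice *pci_dev, Error **errp)
{
    CXLNiagaraClass *nc = CXL_NIAGARA_GET_CLASS(pci_dev);
    Error *local_err = NULL;

    /* run the cxl-type3 realize first, then the niagara-specific setup */
    nc->parent_realize(pci_dev, &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        return;
    }
    /* ... attach the shared MHD state, register vendor CCI commands ... */
}

static void cxl_niagara_class_init(ObjectClass *oc, void *data)
{
    PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
    CXLNiagaraClass *nc = CXL_NIAGARA_CLASS(oc);

    nc->parent_realize = pc->realize;   /* saved cxl-type3 realize */
    pc->realize = cxl_niagara_realize;
}

static const TypeInfo cxl_niagara_info = {
    .name          = TYPE_CXL_NIAGARA,
    .parent        = TYPE_CXL_TYPE3,    /* inherit instead of re-exporting */
    .instance_size = sizeof(CXLNiagaraState),
    .class_size    = sizeof(CXLNiagaraClass),
    .class_init    = cxl_niagara_class_init,
};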

Thanks!
Gregory



Re: [PATCH 4/5] cxl/type3: add an optional mhd validation function for memory accesses

2023-09-05 Thread Gregory Price
On Mon, Sep 04, 2023 at 06:02:14PM +0100, Jonathan Cameron wrote:
> On Thu, 31 Aug 2023 21:29:13 -0400
> Gregory Price  wrote:
> 
> > When memory accesses are made, some MHSLD's would validate the address
> > is within the scope of allocated sections.  To do this, the base device
> > must call an optional function set by inherited devices.
> > 
> > Signed-off-by: Gregory Price 
> 
> This sort of callback addition can be done via class initialization.
> E.g. get_lsa_size()
> https://elixir.bootlin.com/qemu/latest/source/hw/mem/cxl_type3.c#L1494
> as the callback is the same for all instances of the class which
> in next patch is CXLNiagraClass where you already set the
> PCIClass callbacks in cxl_niagara_class_init()
> 
> You can then use something like:
> CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
> cvc->mhd_access_valid(ct3d, dpa_offset, size);
> 
> Jonathan
> 

Will make this change along with a few cleanups suggested elsewhere and
a few boneheaded mistakes.
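
Roughly what I'll end up with, mirroring the get_lsa_size() pattern you
point at (untested sketch; the helper name is mine):

/* cxl_device.h: class member instead of a per-instance function pointer */
bool (*mhd_access_valid)(CXLType3Dev *ct3d, uint64_t dpa_offset,
                         unsigned int size);

/* cxl_type3.c: called from cxl_type3_read()/write() after the HPA->DPA
 * translation; plain type3 devices leave the hook NULL, so every access
 * passes */
static bool cxl_type3_mhd_access_ok(CXLType3Dev *ct3d, uint64_t dpa_offset,
                                    unsigned int size)
{
    CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);

    return !cvc->mhd_access_valid ||
           cvc->mhd_access_valid(ct3d, dpa_offset, size);
}

with niagara setting cvc->mhd_access_valid from cxl_niagara_class_init().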

~Gregory



[PATCH 4/5] cxl/type3: add an optional mhd validation function for memory accesses

2023-08-31 Thread Gregory Price
When memory accesses are made, some MHSLD's would validate the address
is within the scope of allocated sections.  To do this, the base device
must call an optional function set by inherited devices.

Signed-off-by: Gregory Price 
---
 hw/mem/cxl_type3.c  | 15 +++
 include/hw/cxl/cxl_device.h |  3 +++
 2 files changed, 18 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index a8d4a12f3e..8e1565f2fc 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1034,6 +1034,10 @@ void ct3_realize(PCIDevice *pci_dev, Error **errp)
 goto err_release_cdat;
 }
 }
+
+/* Devices which inherit ct3d should initialize these after ct3_realize */
+ct3d->mhd_access_valid = NULL;
+
 return;
 
 err_release_cdat:
@@ -1259,6 +1263,11 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr 
host_addr, uint64_t *data,
 return MEMTX_ERROR;
 }
 
+if (ct3d->mhd_access_valid &&
+!ct3d->mhd_access_valid(d, dpa_offset, size)) {
+return MEMTX_ERROR;
+}
+
 if (sanitize_running(>cci)) {
 qemu_guest_getrandom_nofail(data, size);
 return MEMTX_OK;
@@ -1279,6 +1288,12 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr 
host_addr, uint64_t data,
 if (res) {
 return MEMTX_ERROR;
 }
+
+if (ct3d->mhd_access_valid &&
+!ct3d->mhd_access_valid(d, dpa_offset, size)) {
+return MEMTX_ERROR;
+}
+
 if (sanitize_running(>cci)) {
 return MEMTX_OK;
 }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 4ad38b689c..b1b39a9aa0 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -489,6 +489,9 @@ struct CXLType3Dev {
 uint8_t num_regions; /* 0-8 regions */
 CXLDCDRegion regions[DCD_MAX_REGION_NUM];
 } dc;
+
+/* Multi-headed Device */
+bool (*mhd_access_valid)(PCIDevice *d, uint64_t addr, unsigned int size);
 };
 
 #define TYPE_CXL_TYPE3 "cxl-type3"
-- 
2.39.1




[PATCH 1/5] cxl/mailbox: move mailbox effect definitions to a header

2023-08-31 Thread Gregory Price
Preparation for allowing devices to define their own CCI commands

Signed-off-by: Gregory Price 
---
 hw/cxl/cxl-mailbox-utils.c   | 35 +++
 include/hw/cxl/cxl_mailbox.h | 18 ++
 2 files changed, 37 insertions(+), 16 deletions(-)
 create mode 100644 include/hw/cxl/cxl_mailbox.h

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 4e8651ebe2..edf39a3efb 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -12,6 +12,7 @@
 #include "hw/pci/msix.h"
 #include "hw/cxl/cxl.h"
 #include "hw/cxl/cxl_events.h"
+#include "hw/cxl/cxl_mailbox.h"
 #include "hw/pci/pci.h"
 #include "hw/pci-bridge/cxl_upstream_port.h"
 #include "qemu/cutils.h"
@@ -1561,28 +1562,28 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct 
cxl_cmd *cmd,
 return CXL_MBOX_SUCCESS;
 }
 
-#define IMMEDIATE_CONFIG_CHANGE (1 << 1)
-#define IMMEDIATE_DATA_CHANGE (1 << 2)
-#define IMMEDIATE_POLICY_CHANGE (1 << 3)
-#define IMMEDIATE_LOG_CHANGE (1 << 4)
-#define SECURITY_STATE_CHANGE (1 << 5)
-#define BACKGROUND_OPERATION (1 << 6)
+#define CXL_MBOX_IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define CXL_MBOX_IMMEDIATE_DATA_CHANGE (1 << 2)
+#define CXL_MBOX_IMMEDIATE_POLICY_CHANGE (1 << 3)
+#define CXL_MBOX_IMMEDIATE_LOG_CHANGE (1 << 4)
+#define CXL_MBOX_SECURITY_STATE_CHANGE (1 << 5)
+#define CXL_MBOX_BACKGROUND_OPERATION (1 << 6)
 
 static const struct cxl_cmd cxl_cmd_set[256][256] = {
 [EVENTS][GET_RECORDS] = { "EVENTS_GET_RECORDS",
 cmd_events_get_records, 1, 0 },
 [EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
-cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
+cmd_events_clear_records, ~0, CXL_MBOX_IMMEDIATE_LOG_CHANGE },
 [EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
   cmd_events_get_interrupt_policy, 0, 0 },
 [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
   cmd_events_set_interrupt_policy,
-  ~0, IMMEDIATE_CONFIG_CHANGE },
+  ~0, CXL_MBOX_IMMEDIATE_CONFIG_CHANGE },
 [FIRMWARE_UPDATE][GET_INFO] = { "FIRMWARE_UPDATE_GET_INFO",
 cmd_firmware_update_get_info, 0, 0 },
 [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8,
- IMMEDIATE_POLICY_CHANGE },
+ CXL_MBOX_IMMEDIATE_POLICY_CHANGE },
 [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 
0 },
 [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
 [IDENTIFY][MEMORY_DEVICE] = { "IDENTIFY_MEMORY_DEVICE",
@@ -1591,9 +1592,11 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_ccls_get_partition_info, 0, 0 },
 [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 8, 0 },
 [CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa,
-~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
+~0, CXL_MBOX_IMMEDIATE_CONFIG_CHANGE | CXL_MBOX_IMMEDIATE_DATA_CHANGE 
},
 [SANITIZE][OVERWRITE] = { "SANITIZE_OVERWRITE", cmd_sanitize_overwrite, 0,
-IMMEDIATE_DATA_CHANGE | SECURITY_STATE_CHANGE | BACKGROUND_OPERATION },
+(CXL_MBOX_IMMEDIATE_DATA_CHANGE |
+ CXL_MBOX_SECURITY_STATE_CHANGE |
+ CXL_MBOX_BACKGROUND_OPERATION)},
 [PERSISTENT_MEM][GET_SECURITY_STATE] = { "GET_SECURITY_STATE",
 cmd_get_security_state, 0, 0 },
 [MEDIA_AND_POISON][GET_POISON_LIST] = { "MEDIA_AND_POISON_GET_POISON_LIST",
@@ -1612,10 +1615,10 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = 
{
 8, 0 },
 [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
 "ADD_DCD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
-~0, IMMEDIATE_DATA_CHANGE },
+~0, CXL_MBOX_IMMEDIATE_DATA_CHANGE },
 [DCD_CONFIG][RELEASE_DYN_CAP] = {
 "RELEASE_DCD_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
-~0, IMMEDIATE_DATA_CHANGE },
+~0, CXL_MBOX_IMMEDIATE_DATA_CHANGE },
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
@@ -1628,7 +1631,7 @@ static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
  */
 [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8,
- IMMEDIATE_POLICY_CHANGE },
+ CXL_MBOX_IMMEDIATE_POLICY_CHANGE },
 [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0,
   0 },
 [LOGS]

[PATCH 0/5 v2] CXL: SK hynix Niagara MHSLD Device

2023-08-31 Thread Gregory Price
v2:
- 5 patch series, first 4 are pull-aheads that can be merged separately
- cci: rebased on 8-30 branch from jic23, dropped cci patches
- mailbox: dropped MHD commands, integrated into niagara (for now)
- mailbox: refactor CCI defines to avoid redefinition in niagara
- type3: cleanup duplicate typecasting
- type3: expose ct3 functions so inheriting devices may access them
- type3: add optional mhd validation function for memory access
- niagara: refactor to make niagara inherit type3 and override behavior
- niagara: refactor command definitions and types into header to make
   understanding the device a bit easier for users
- style and formatting

This patch set includes an emulation of the SK hynix Niagara MHSLD
platform with custom CCI commands that allow for isolation of memory
blocks between attached hosts.

This device allows hosts to request memory blocks directly from the device,
rather than requiring the full DCD command set.  As a matter of simplicity,
this is beneficial for testing and applications of dynamic memory
pooling on top of the 1.1 and 2.0 specifications.

Note that these CCI commands are not serviceable without a proper driver or
the kernel allowing raw CXL commands to be passed through the mailbox
driver, so users should enable `CONFIG_CXL_MEM_RAW_COMMANDS=y` on the
kernel of their QEMU instance if they wish to test it.

Signed-off-by: Gregory Price 

Gregory Price (5):
  cxl/mailbox: move mailbox effect definitions to a header
  cxl/type3: Cleanup multiple CXL_TYPE3() calls in read/write functions
  cxl/type3: Expose ct3 functions so that inheritors can call them
  cxl/type3: add an optional mhd validation function for memory accesses
  cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

 hw/cxl/Kconfig  |   4 +
 hw/cxl/cxl-mailbox-utils.c  |  35 +-
 hw/cxl/meson.build  |   2 +
 hw/cxl/vendor/meson.build   |   1 +
 hw/cxl/vendor/skhynix/.gitignore|   1 +
 hw/cxl/vendor/skhynix/init_niagara.c|  99 +
 hw/cxl/vendor/skhynix/meson.build   |   1 +
 hw/cxl/vendor/skhynix/skhynix_niagara.c | 516 
 hw/cxl/vendor/skhynix/skhynix_niagara.h | 169 
 hw/mem/cxl_type3.c  |  33 +-
 include/hw/cxl/cxl_device.h |   8 +
 include/hw/cxl/cxl_mailbox.h|  18 +
 12 files changed, 863 insertions(+), 24 deletions(-)
 create mode 100644 hw/cxl/vendor/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/.gitignore
 create mode 100644 hw/cxl/vendor/skhynix/init_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.h
 create mode 100644 include/hw/cxl/cxl_mailbox.h

-- 
2.39.1




[PATCH 2/5] cxl/type3: Cleanup multiple CXL_TYPE3() calls in read/write functions

2023-08-31 Thread Gregory Price
Call CXL_TYPE3 once at top of function to avoid multiple invocations.

Signed-off-by: Gregory Price 
---
 hw/mem/cxl_type3.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index fd9d134d46..80d596ee10 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1248,17 +1248,18 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev 
*ct3d,
 MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
unsigned size, MemTxAttrs attrs)
 {
+CXLType3Dev *ct3d = CXL_TYPE3(d);
 uint64_t dpa_offset = 0;
 AddressSpace *as = NULL;
 int res;
 
-res = cxl_type3_hpa_to_as_and_dpa(CXL_TYPE3(d), host_addr, size,
+res = cxl_type3_hpa_to_as_and_dpa(ct3d, host_addr, size,
   , _offset);
 if (res) {
 return MEMTX_ERROR;
 }
 
-if (sanitize_running(_TYPE3(d)->cci)) {
+if (sanitize_running(>cci)) {
 qemu_guest_getrandom_nofail(data, size);
 return MEMTX_OK;
 }
@@ -1268,16 +1269,17 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr 
host_addr, uint64_t *data,
 MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
 unsigned size, MemTxAttrs attrs)
 {
+CXLType3Dev *ct3d = CXL_TYPE3(d);
 uint64_t dpa_offset = 0;
 AddressSpace *as = NULL;
 int res;
 
-res = cxl_type3_hpa_to_as_and_dpa(CXL_TYPE3(d), host_addr, size,
+res = cxl_type3_hpa_to_as_and_dpa(ct3d, host_addr, size,
   , _offset);
 if (res) {
 return MEMTX_ERROR;
 }
-if (sanitize_running(_TYPE3(d)->cci)) {
+if (sanitize_running(>cci)) {
 return MEMTX_OK;
 }
 return address_space_write(as, dpa_offset, attrs, , size);
-- 
2.39.1




[PATCH 3/5] cxl/type3: Expose ct3 functions so that inheritors can call them

2023-08-31 Thread Gregory Price
For devices built on top of ct3, we need the init, realize, and
exit functions exposed to correctly start up and tear down.

Signed-off-by: Gregory Price 
---
 hw/mem/cxl_type3.c  | 8 
 include/hw/cxl/cxl_device.h | 5 +
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 80d596ee10..a8d4a12f3e 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -950,7 +950,7 @@ static DOEProtocol doe_spdm_prot[] = {
 { }
 };
 
-static void ct3_realize(PCIDevice *pci_dev, Error **errp)
+void ct3_realize(PCIDevice *pci_dev, Error **errp)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
 CXLComponentState *cxl_cstate = >cxl_cstate;
@@ -1054,7 +1054,7 @@ err_address_space_free:
 return;
 }
 
-static void ct3_exit(PCIDevice *pci_dev)
+void ct3_exit(PCIDevice *pci_dev)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
 CXLComponentState *cxl_cstate = >cxl_cstate;
@@ -1285,7 +1285,7 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr 
host_addr, uint64_t data,
 return address_space_write(as, dpa_offset, attrs, , size);
 }
 
-static void ct3d_reset(DeviceState *dev)
+void ct3d_reset(DeviceState *dev)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(dev);
 uint32_t *reg_state = ct3d->cxl_cstate.crb.cache_mem_registers;
@@ -2081,7 +2081,7 @@ void qmp_cxl_release_dynamic_capacity(const char *path,
  errp);
 }
 
-static void ct3_class_init(ObjectClass *oc, void *data)
+void ct3_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index e824c5ade8..4ad38b689c 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -524,6 +524,11 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, 
uint64_t *data,
 MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
 unsigned size, MemTxAttrs attrs);
 
+void ct3_realize(PCIDevice *pci_dev, Error **errp);
+void ct3_exit(PCIDevice *pci_dev);
+void ct3d_reset(DeviceState *d);
+void ct3_class_init(ObjectClass *oc, void *data);
+
 uint64_t cxl_device_get_timestamp(CXLDeviceState *cxlds);
 
 void cxl_event_init(CXLDeviceState *cxlds, int start_msg_num);
-- 
2.39.1




[PATCH 5/5] cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

2023-08-31 Thread Gregory Price
Create a new device to emulate the SK hynix Niagara MHSLD platform.

This device has custom CCI commands that allow for applying isolation
to each memory block between hosts. This enables an early form of
dynamic capacity, whereby the NUMA node maps the entire region, but
the host is responsible for asking the device which memory blocks
are allocated to it, and therefore may be onlined.

To instantiate:

-device 
cxl-skh-niagara,cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,sn=6,mhd-head=0,mhd-shmid=0

The linux kernel will require raw CXL commands enabled to allow for
passing through of Niagara CXL commands via the CCI mailbox.

The Niagara MH-SLD has a shared memory region that must be initialized
using the 'init_niagara' tool located in the vendor subdirectory

usage: init_niagara <heads> <sections> <section_size> <shmid>
heads : number of heads on the device
sections  : number of sections
section_size  : size of a section in 128mb increments
shmid : shmid produced by ipcmk

Example:
$shmid1=ipcmk -M 131072
./init_niagara 4 32 1 $shmid1

Signed-off-by: Gregory Price 
Signed-off-by: Junhee Ryu 
Signed-off-by: Kwangjin Ko 
---
 hw/cxl/Kconfig  |   4 +
 hw/cxl/meson.build  |   2 +
 hw/cxl/vendor/meson.build   |   1 +
 hw/cxl/vendor/skhynix/.gitignore|   1 +
 hw/cxl/vendor/skhynix/init_niagara.c|  99 +
 hw/cxl/vendor/skhynix/meson.build   |   1 +
 hw/cxl/vendor/skhynix/skhynix_niagara.c | 516 
 hw/cxl/vendor/skhynix/skhynix_niagara.h | 169 
 8 files changed, 793 insertions(+)
 create mode 100644 hw/cxl/vendor/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/.gitignore
 create mode 100644 hw/cxl/vendor/skhynix/init_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.h

diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
index c9b2e46bac..dd6c54b54d 100644
--- a/hw/cxl/Kconfig
+++ b/hw/cxl/Kconfig
@@ -2,5 +2,9 @@ config CXL
 bool
 default y if PCI_EXPRESS
 
+config CXL_VENDOR
+bool
+default y
+
 config I2C_MCTP_CXL
 bool
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index 1393821fc4..e8c8c1355a 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -15,3 +15,5 @@ system_ss.add(when: 'CONFIG_CXL',
 system_ss.add(when: 'CONFIG_I2C_MCTP_CXL', if_true: files('i2c_mctp_cxl.c'))
 
 system_ss.add(when: 'CONFIG_ALL', if_true: files('cxl-host-stubs.c'))
+
+subdir('vendor')
diff --git a/hw/cxl/vendor/meson.build b/hw/cxl/vendor/meson.build
new file mode 100644
index 00..12db8991f1
--- /dev/null
+++ b/hw/cxl/vendor/meson.build
@@ -0,0 +1 @@
+subdir('skhynix')
diff --git a/hw/cxl/vendor/skhynix/.gitignore b/hw/cxl/vendor/skhynix/.gitignore
new file mode 100644
index 00..6d96de38ea
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/.gitignore
@@ -0,0 +1 @@
+init_niagara
diff --git a/hw/cxl/vendor/skhynix/init_niagara.c 
b/hw/cxl/vendor/skhynix/init_niagara.c
new file mode 100644
index 00..2c189dc33c
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/init_niagara.c
@@ -0,0 +1,99 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2023 MemVerge Inc.
+ * Copyright (c) 2023 SK hynix Inc.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct niagara_state {
+uint8_t nr_heads;
+uint8_t nr_lds;
+uint8_t ldmap[65536];
+uint32_t total_sections;
+uint32_t free_sections;
+uint32_t section_size;
+uint32_t sections[];
+};
+
+int main(int argc, char *argv[])
+{
+int shmid = 0;
+uint32_t sections = 0;
+uint32_t section_size = 0;
+uint32_t heads = 0;
+struct niagara_state *niagara_state = NULL;
+size_t state_size;
+uint8_t i;
+
+if (argc != 5) {
+printf("usage: init_niagara <heads> <sections> <section_size> <shmid>\n"
+"\theads : number of heads on the device\n"
+"\tsections  : number of sections\n"
+"\tsection_size  : size of a section in 128mb increments\n"
+"\tshmid : /tmp/mytoken.tmp\n\n"
+"It is recommended your shared memory region is at least 
128kb\n");
+return -1;
+}
+
+/* must have at least 1 head */
+heads = (uint32_t)atoi(argv[1]);
+if (heads == 0 || heads > 32) {
+printf("bad heads argument (1-32)\n");
+return -1;
+}
+
+/* Get number of sections */
+sections = (uint32_t)atoi(argv[2]);
+if (sections == 0) {
+printf("bad sections argument\n");
+return -1;
+}
+
+section_size = (uint32_t)atoi(argv[3]);
+if (sections == 0) {
+printf("bad section size argument\n");
+return -1;
+}
+
+shmid = (uint32_t)atoi

Re: [PATCH 3/4] cxl/type3: minimum MHD cci support

2023-08-31 Thread Gregory Price
On Mon, Aug 07, 2023 at 03:56:09PM +0100, Jonathan Cameron wrote:
> On Fri, 21 Jul 2023 12:35:08 -0400
> Gregory Price  wrote:
> 
> > Implement the MHD GET_INFO cci command and add a shared memory
> > region to the type3 device to host the information.
> > 
> > Add a helper program to initialize this shared memory region.
> > 
> > Add a function pointer to type3 devices for future work that
> > will allow an mhd device to provide a hook to validate whether
> > a memory access is valid or not.
> > 
> > For now, limit the number of LD's to the number of heads. Later,
> > this limitation will need to be lifted for MH-MLDs.
> > 
> > Intended use case:
> > 
> > 1. Create the shared memory region
> > 2. Format the shared memory region
> > 3. Launch QEMU with `is_mhd=true,mhd_head=N,mhd_shmid=$shmid`
> 
> I'd definitely like some feedback from experienced QEMU folk on
> how to model this sort of cross QEMU instance sharing.
> 
> My instinct is a socket would be more flexible as it lets us
> potentially emulate the machines on multiple hosts (assuming they
> can see some shared backend storage).
> 

I'd considered a socket, but after mulling it over, I realized the
operations I was trying to implement amounted to an atomic compare
and swap.  I wanted to avoid locks and serialization if at all
possible to avoid introducing that into the hot-path of memory
accesses.  Maybe there is a known pattern for this kind of
multi-instance QEMU coherency stuff, but that seemed the simplest path.

This obviously has the downside of limiting emulation of MHD system to
a local machine, but the use of common files/memory to back CXL memory
already does that (i think).

¯\_(ツ)_/¯ feedback welcome.  I'll leave it for now unless there are
strong opinions.
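
For reference, the shared-state updates I have in mind really are just
single-word CAS loops on the shm segment, roughly like this (illustrative
only -- the struct matches what init_niagara lays out, but the claim helper
and the owner-bitmask encoding of sections[] are mine):

#include <stdbool.h>
#include <stdint.h>

struct niagara_state {
    uint8_t nr_heads;
    uint8_t nr_lds;
    uint8_t ldmap[65536];
    uint32_t total_sections;
    uint32_t free_sections;
    uint32_t section_size;
    uint32_t sections[];    /* one ownership word per section */
};

/* claim section 'idx' for 'head' iff it is currently unowned; lock-free,
 * so two QEMU instances mapping the same shm segment never serialize */
static bool niagara_claim_section(struct niagara_state *ns, uint32_t idx,
                                  uint32_t head)
{
    uint32_t expected = 0;
    uint32_t desired = 1u << head;

    if (!__atomic_compare_exchange_n(&ns->sections[idx], &expected, desired,
                                     false, __ATOMIC_SEQ_CST,
                                     __ATOMIC_SEQ_CST)) {
        return false;   /* somebody else owns it */
    }
    __atomic_fetch_sub(&ns->free_sections, 1, __ATOMIC_SEQ_CST);
    return true;
}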

> else not needed if we returned in previous leg.
> 
>   if (ct3d->is_mhd && ...
> 
> > +   (!ct3d->mhd_shmid || (ct3d->mhd_head == ~(0 {
> 
> How does mhd_head equal that? Default is 0. I'm not sure there is a reason
> to require it.
> 

Good catch, the default should be ~(0).  Updated in the field
initialization section.

> > +/* For now, limit the number of heads to the number of LD's (SLD) */
> 
> Feels backwards.  Number of heads never going to be bigger than numbre of
> LDs.  Other way around is possible of course.
> 
> > +if (ct3d->mhd_state->nr_heads <= ct3d->mhd_head) {
> 
> mhd head needs to be out of range?  Confused.
> 

Woops, yes this should be mhd_head >= nr_heads. Wording is backwards and
so is the code.  fixed.

> > +if (ct3d->mhd_state) {
> > +shmdt(ct3d->mhd_state);
> > +}
> 
> Reverse order of realize - so I think this wants to be earlier.
> 
> >  }

Just to be clear you *don't* want the reverse order, correct?  If so
i'll swap it.

> >  
> > +if (ct3d->is_mhd && ct3d->mhd_access_valid) {
> > +if (!ct3d->mhd_access_valid(ct3d, dpa_offset, size))
> > +return MEMTX_ERROR;
> 
> Brackets for inner block.
> Arguably could just add the ct3d->is_mhd check in the place where
> mhd_access_valid is set and hence only need to check that here.
> Maybe that makes it slightly harder to follow though.
> 
> Also could just unset is_mhd if mhd_access_valid not present..
>

I'm going to change the next patch to fully inherit CXLType3Dev into
Niagara as you suggested.  I will have to see what falls out when i do
that, but my feeling was being more explicit when checking pointers is
better.  If you prefer to drop the is_mhd check i'm ok with that
simply because mhd_access_valid should always be NULL when is_mhd is
false.  Either way the result is the same.

~Gregory



Re: [Qemu PATCH v2 5/9] hw/mem/cxl_type3: Add host backend and address space handling for DC regions

2023-08-04 Thread Gregory Price
On Fri, Aug 04, 2023 at 05:36:23PM +0100, Jonathan Cameron wrote:
> On Tue, 25 Jul 2023 18:39:56 +
> Fan Ni  wrote:
> 
> > From: Fan Ni 
> > 
> > Add (file/memory backed) host backend, all the dynamic capacity regions
> > will share a single, large enough host backend. Set up address space for
> > DC regions to support read/write operations to dynamic capacity for DCD.
> > 
> > With the change, following supports are added:
> > 1. add a new property to type3 device "nonvolatile-dc-memdev" to point to 
> > host
> >memory backend for dynamic capacity;
> > 2. add namespace for dynamic capacity for read/write support;
> > 3. create cdat entries for each dynamic capacity region;
> > 4. fix dvsec range registers to include DC regions.
> > 
> > Signed-off-by: Fan Ni 
> Hi Fan,
> 
> I'm not sure if we want to do all regions backed by one memory backend
> or one backend each.  It will become complex when some are shared
> (e.g. what Gregory is working on).

I thought about this briefly when i implemented the original volatile
support due to the potential for partitioning. We landed on, iirc, 
2 backends (1 for volatile, 1 for non-volatile).

The reality, though, is the driver (presently) does not have a good way
to create more than 1 dax per memdev, and in practice with real devices
we see that this just tends to be the case: 1 dax per device.  So unless
that's going to change, ever having more than 1 backend will just be
unused complexity.

To me, this is a good example of "maybe piling everything into the core
ct3d is going to get ugly fast".  Maybe it would be better to do
something similar to the CCI interface and allow for overriding the
other functions as well.

just a thought.  I apologize for not engaging with the DCD patch set,
conferences have been keeping me busier than expected.  I plan on
putting it through the grinder this month.

> 
> A few questions inline.  In particular there are subtle changes to
> existing handling that are either bug fixes (in which case they need
> to be sent first) or bugs / have no effect and shouldn't be in here.
> 
> 
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  |  19 +++-
> >  hw/mem/cxl_type3.c  | 203 +---
> >  include/hw/cxl/cxl_device.h |   4 +
> >  3 files changed, 185 insertions(+), 41 deletions(-)
> > 



Re: [PATCH 2/4] cxl/mailbox: interface to add CCI commands to an existing CCI

2023-08-04 Thread Gregory Price
On Fri, Aug 04, 2023 at 04:14:14PM +0100, Jonathan Cameron wrote:
> On Fri, 21 Jul 2023 12:35:06 -0400
> Gregory Price  wrote:
> 
> > This enables wrapper devices to customize the base device's CCI
> > (for example, with custom commands outside the specification)
> > without the need to change the base device.
> > 
> > This also enables the base device to dispatch those commands without
> > requiring additional driver support.
> > 
> > Signed-off-by: Gregory Price 
> > ---
> >  hw/cxl/cxl-mailbox-utils.c  | 19 +++
> >  include/hw/cxl/cxl_device.h |  2 ++
> >  2 files changed, 21 insertions(+)
> > 
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index ddee3f1718..cad0cd0adb 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -1383,6 +1383,25 @@ static void cxl_copy_cci_commands(CXLCCI *cci, const 
> > struct cxl_cmd (*cxl_cmds)[
> >  }
> >  }
> >  
> > +void cxl_add_cci_commands(CXLCCI *cci, const struct cxl_cmd 
> > (*cxl_cmd_set)[256], size_t payload_max)
> > +{
> > +cci->payload_max = payload_max > cci->payload_max ? payload_max : 
> > cci->payload_max;
> > +for (int set = 0; set < 256; set++) {
> > +for (int cmd = 0; cmd < 256; cmd++) {
> > +if (cxl_cmd_set[set][cmd].handler) {
> > +const struct cxl_cmd *c = _cmd_set[set][cmd];
> > +cci->cxl_cmd_set[set][cmd] = *c;
> Don't interleave definitions and code.
> 
> > +struct cel_log *log =
> > +>cel_log[cci->cel_size];
> > +
> > +log->opcode = (set << 8) | cmd;
> > +log->effect = c->effect;
> > +cci->cel_size++;
> 
> So my gut feeling on this is based on the large amount of overlapping code.  
> I might queue it
> as it stands, but I'd like to see this refactored.
> 
> 1) Single copy routine used in all places that copie in any new entries to 
> cci->cxl_cmd_set[][]
> 2) A single cel_log builder function to be called in normal path and after an 
> update. Just rebuild
> the whole thing rather than trying to append to it I think.
> 
> Something like (so far untested but I'll poke it with Fan's code in a few 
> mins)
> 
> However this is all proving rather costly in space so maybe we need a better
> representation for the sparse nature of cxl comamnds - a job for another day.

I'd certainly considered the issue of space, but it seemed better to
blow up the size in one commit and then come back around and change the
structure out from under the work this unblocks.  What we save in space
we sacrifice in complexity, but the structure seems simple enough that a
change shouldn't take a ton of scrutiny to get right.

One downside of the approach here is what happens when there's an
overlap and custom devices build up over time.  As in - if i steal the
0xFF command group for my custom emulated MHMLD DCD Everything Super Device,
what happens if the spec finally comes around to defining 0xFF as a real
command set?

tl;dr: Should the copy function error on overlap detections?

Quick read-back through the spec, I don't see explicit carve-outs for
reserved command regions for custom sets, might be worth a discussion.

For now it shouldn't be an issue.
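
If we do end up wanting the copy helper to catch collisions, it could be as
simple as the below (sketch on top of cxl_add_cci_commands() from this
patch; the int return and the warn_report() are just for illustration):

int cxl_add_cci_commands(CXLCCI *cci, const struct cxl_cmd (*cxl_cmd_set)[256],
                         size_t payload_max)
{
    cci->payload_max = MAX(payload_max, cci->payload_max);
    for (int set = 0; set < 256; set++) {
        for (int cmd = 0; cmd < 256; cmd++) {
            const struct cxl_cmd *c = &cxl_cmd_set[set][cmd];

            if (!c->handler) {
                continue;
            }
            /* refuse to silently shadow a command the CCI already exposes */
            if (cci->cxl_cmd_set[set][cmd].handler) {
                warn_report("CXL CCI: duplicate command opcode %02x%02x",
                            set, cmd);
                return -1;
            }
            cci->cxl_cmd_set[set][cmd] = *c;
        }
    }
    /* CEL rebuild would follow here, as in the refactored helper */
    return 0;
}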

> 
> 
> From 8ab48adfb2b481be0702b84a0d172a4f142b0df6 Mon Sep 17 00:00:00 2001
> From: Gregory Price 
> Date: Fri, 21 Jul 2023 12:35:06 -0400
> Subject: [PATCH] cxl/mailbox: interface to add CCI commands to an existing CCI
> 
> This enables wrapper devices to customize the base device's CCI
> (for example, with custom commands outside the specification)
> without the need to change the base device.
> 
> The also enabled the base device to dispatch those commands without
> requiring additional driver support.
> 
> Signed-off-by: Gregory Price 
> Link: 
> https://lore.kernel.org/r/20230721163505.1910-3-gregory.pr...@memverge.com
> Signed-off-by: Jonathan Cameron 
> 
> --
> Heavily edited by Jonathan Cameron to increase code reuse
> ---
>  include/hw/cxl/cxl_device.h |  2 ++
>  hw/cxl/cxl-mailbox-utils.c  | 21 +++--
>  2 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 0c9254dff9..2a3050fbad 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -297,6 +297,8 @@ void cxl_initialize_mailbox_t3(CXLCCI *cci, DeviceState 
> *d, size_t payload_max);
>  void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf,
>   

[PATCH 4/4] cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

2023-07-21 Thread Gregory Price
Create a new device to emulate the SK hynix Niagara MHSLD platform.

This device has custom CCI commands that allow for applying isolation
to each memory block between hosts. This enables an early form of
dynamic capacity, whereby the NUMA node maps the entire region, but
the host is responsible for asking the device which memory blocks
are allocated to it, and therefore may be onlined.

To instantiate, wrap a cxl-type3 mhd in a cxl-skh-niagara like so:

-device 
cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,sn=6,is_mhd=true,mhd_head=0,mhd_shmid=15
-device cxl-skh-niagara,target=cxl-mem0

The linux kernel will require raw CXL commands enabled to allow for
passing through of Niagara CXL commands via the CCI mailbox.

The Niagara MH-SLD has a slightly different shared memory region
than the base MHD, so an additional tool ('init_niagara') is located
in the vendor subdirectory.  Utilize this in place of cxl_mhd_init.

usage: init_niagara <heads> <sections> <section_size> <shmid>
heads : number of heads on the device
sections  : number of sections
section_size  : size of a section in 128mb increments
shmid : shmid produced by ipcmk

Example:
$shmid1=ipcmk -M 131072
./init_niagara 4 32 1 $shmid1

Signed-off-by: Gregory Price 
Signed-off-by: Junhee Ryu 
Signed-off-by: Kwangjin Ko 
---
 hw/cxl/Kconfig  |   4 +
 hw/cxl/meson.build  |   2 +
 hw/cxl/vendor/meson.build   |   1 +
 hw/cxl/vendor/skhynix/.gitignore|   1 +
 hw/cxl/vendor/skhynix/init_niagara.c|  99 +
 hw/cxl/vendor/skhynix/meson.build   |   1 +
 hw/cxl/vendor/skhynix/skhynix_niagara.c | 521 
 7 files changed, 629 insertions(+)
 create mode 100644 hw/cxl/vendor/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/.gitignore
 create mode 100644 hw/cxl/vendor/skhynix/init_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.c

diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
index c9b2e46bac..dd6c54b54d 100644
--- a/hw/cxl/Kconfig
+++ b/hw/cxl/Kconfig
@@ -2,5 +2,9 @@ config CXL
 bool
 default y if PCI_EXPRESS
 
+config CXL_VENDOR
+bool
+default y
+
 config I2C_MCTP_CXL
 bool
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index 1393821fc4..e8c8c1355a 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -15,3 +15,5 @@ system_ss.add(when: 'CONFIG_CXL',
 system_ss.add(when: 'CONFIG_I2C_MCTP_CXL', if_true: files('i2c_mctp_cxl.c'))
 
 system_ss.add(when: 'CONFIG_ALL', if_true: files('cxl-host-stubs.c'))
+
+subdir('vendor')
diff --git a/hw/cxl/vendor/meson.build b/hw/cxl/vendor/meson.build
new file mode 100644
index 00..12db8991f1
--- /dev/null
+++ b/hw/cxl/vendor/meson.build
@@ -0,0 +1 @@
+subdir('skhynix')
diff --git a/hw/cxl/vendor/skhynix/.gitignore b/hw/cxl/vendor/skhynix/.gitignore
new file mode 100644
index 00..6d96de38ea
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/.gitignore
@@ -0,0 +1 @@
+init_niagara
diff --git a/hw/cxl/vendor/skhynix/init_niagara.c 
b/hw/cxl/vendor/skhynix/init_niagara.c
new file mode 100644
index 00..28612339e0
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/init_niagara.c
@@ -0,0 +1,99 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2023 MemVerge Inc.
+ * Copyright (c) 2023 SK hynix Inc.
+ *
+ * Reference list:
+ * From www.computeexpresslink.org
+ * Compute Express Link (CXL) Specification revision 3.0 Version 1.0
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct niagara_state {
+uint8_t nr_heads;
+uint8_t nr_lds;
+uint8_t ldmap[65536];
+uint32_t total_sections;
+uint32_t free_sections;
+uint32_t section_size;
+uint32_t sections[];
+};
+
+int main(int argc, char *argv[]) {
+int shmid = 0;
+uint32_t sections = 0;
+uint32_t section_size = 0;
+uint32_t heads = 0;
+struct niagara_state* niagara_state = 0;
+uint8_t i;
+
+if (argc != 5) {
+printf("usage: init_niagara <heads> <sections> <section_size> <shmid>\n"
+"\theads : number of heads on the device\n"
+"\tsections  : number of sections\n"
+"\tsection_size  : size of a section in 128mb increments\n"
+"\tshmid : /tmp/mytoken.tmp\n\n"
+"It is recommended your shared memory region is at least 
128kb\n");
+return -1;
+}
+
+// must have at least 1 head
+heads = (uint32_t)atoi(argv[1]);
+if (heads == 0 || heads > 32) {
+printf("bad heads argument (1-32)\n");
+return -1;
+}
+
+// Get number of sections
+sections = (uint32_t)atoi(argv[2]);
+if (sections == 0) {
+printf("bad sections argument\n");
+return -1;
+}
+
+section_size = (uint32_t)atoi(argv[3]);
+if (

[PATCH 3/4] cxl/type3: minimum MHD cci support

2023-07-21 Thread Gregory Price
Implement the MHD GET_INFO cci command and add a shared memory
region to the type3 device to host the information.

Add a helper program to initialize this shared memory region.

Add a function pointer to type3 devices for future work that
will allow an mhd device to provide a hook to validate whether
a memory access is valid or not.

For now, limit the number of LD's to the number of heads. Later,
this limitation will need to be lifted for MH-MLDs.

Intended use case:

1. Create the shared memory region
2. Format the shared memory region
3. Launch QEMU with `is_mhd=true,mhd_head=N,mhd_shmid=$shmid`

shmid=`ipcmk -M 4096 | grep -o -E '[0-9]+' | head -1`
cxl_mhd_init 4 $shmid
qemu-system-x86_64 \
  -nographic \
  -accel kvm \
  -drive file=./mhd.qcow2,format=qcow2,index=0,media=disk,id=hd \
  -m 4G,slots=4,maxmem=8G \
  -smp 4 \
  -machine type=q35,cxl=on,hmat=on \
  -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
  -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
  -object memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true \
  -device 
cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,sn=6,is_mhd=true,mhd_head=0,mhd_shmid=$shmid
 \
  -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G

Signed-off-by: Gregory Price 
---
 hw/cxl/cxl-mailbox-utils.c  | 53 +
 hw/mem/cxl_type3.c  | 67 +
 include/hw/cxl/cxl_device.h | 14 
 tools/cxl/cxl_mhd_init.c| 63 ++
 tools/cxl/meson.build   |  3 ++
 tools/meson.build   |  1 +
 6 files changed, 201 insertions(+)
 create mode 100644 tools/cxl/cxl_mhd_init.c
 create mode 100644 tools/cxl/meson.build

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index cad0cd0adb..57b8da4376 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -84,6 +84,8 @@ enum {
 #define GET_PHYSICAL_PORT_STATE 0x1
 TUNNEL = 0x53,
 #define MANAGEMENT_COMMAND 0x0
+MHD = 0x55,
+#define GET_MHD_INFO 0x0
 };
 
 /* CCI Message Format CXL r3.0 Figure 7-19 */
@@ -1155,6 +1157,56 @@ static CXLRetCode cmd_media_clear_poison(const struct 
cxl_cmd *cmd,
 return CXL_MBOX_SUCCESS;
 }
 
+static CXLRetCode cmd_mhd_get_info(const struct cxl_cmd *cmd,
+   uint8_t *payload_in,
+   size_t len_in,
+   uint8_t *payload_out,
+   size_t *len_out,
+   CXLCCI *cci)
+{
+CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+struct {
+uint8_t start_ld;
+uint8_t ldmap_len;
+} QEMU_PACKED *input = (void *)payload_in;
+
+struct {
+uint8_t nr_lds;
+uint8_t nr_heads;
+uint16_t resv1;
+uint8_t start_ld;
+uint8_t ldmap_len;
+uint16_t resv2;
+uint8_t ldmap[];
+} QEMU_PACKED *output = (void *)payload_out;
+
+uint8_t start_ld = input->start_ld;
+uint8_t ldmap_len = input->ldmap_len;
+uint8_t i;
+
+if (!ct3d->is_mhd) {
+return CXL_MBOX_UNSUPPORTED;
+}
+
+if (start_ld >= ct3d->mhd_state->nr_lds) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+output->nr_lds = ct3d->mhd_state->nr_lds;
+output->nr_heads = ct3d->mhd_state->nr_heads;
+output->resv1 = 0;
+output->start_ld = start_ld;
+output->resv2 = 0;
+
+for (i = 0; i < ldmap_len && (start_ld + i) < output->nr_lds; i++) {
+output->ldmap[i] = ct3d->mhd_state->ldmap[start_ld + i];
+}
+output->ldmap_len = i;
+
+*len_out = sizeof(*output) + output->ldmap_len;
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1195,6 +1247,7 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_media_inject_poison, 8, 0 },
 [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
 cmd_media_clear_poison, 72, 0 },
+[MHD][GET_MHD_INFO] = {"GET_MULTI_HEADED_INFO", cmd_mhd_get_info, 2, 0},
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index efb7dece80..c8eb3aa67d 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -18,6 +18,7 @@
 #include "hw/cxl/cxl.h"
 #include "hw/pci/msix.h"
 #include "hw/pci/spdm.h"
+#include 
 
 #define DWORD_BYTE 4
 
@@ -794,6 +795,48 @@ static DOEProtocol doe_spdm_prot[] = {
 { }
 };
 
+static bool cxl_setup_mhd(CXLType3Dev *ct3d, Error **errp)
+{
+if (!ct3d->is_mhd) {
+ct3d->mhd_access_valid = NULL;
+return true;
+} else if (ct3d->is_mhd &&
+   (!ct3d->mhd_shmid || (ct3d->mhd_head == ~(0 {
+error

[PATCH 1/4] cxl/mailbox: change CCI cmd set structure to be a member, not a reference

2023-07-21 Thread Gregory Price
This allows devices to have fully customized CCIs, along with complex
devices where wrapper devices can override or add additional CCI
commands without having to replicate full command structures or
pollute a base device with every command that might ever be used.

Signed-off-by: Gregory Price 
---
 hw/cxl/cxl-mailbox-utils.c  | 18 ++
 include/hw/cxl/cxl_device.h |  2 +-
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 2819914e8d..ddee3f1718 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1373,9 +1373,19 @@ void cxl_init_cci(CXLCCI *cci, size_t payload_max)
  bg_timercb, cci);
 }
 
+static void cxl_copy_cci_commands(CXLCCI *cci, const struct cxl_cmd 
(*cxl_cmds)[256]) {
+for (int set = 0; set < 256; set++) {
+for (int cmd = 0; cmd < 256; cmd++) {
+if (cxl_cmds[set][cmd].handler) {
+cci->cxl_cmd_set[set][cmd] = cxl_cmds[set][cmd];
+}
+}
+}
+}
+
 void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf, DeviceState 
*d, size_t payload_max)
 {
-cci->cxl_cmd_set = cxl_cmd_set_sw;
+cxl_copy_cci_commands(cci, cxl_cmd_set_sw);
 cci->d = d;
 cci->intf = intf;
 cxl_init_cci(cci, payload_max);
@@ -1383,7 +1393,7 @@ void cxl_initialize_mailbox_swcci(CXLCCI *cci, 
DeviceState *intf, DeviceState *d
 
 void cxl_initialize_mailbox_t3(CXLCCI *cci, DeviceState *d, size_t payload_max)
 {
-cci->cxl_cmd_set = cxl_cmd_set;
+cxl_copy_cci_commands(cci, cxl_cmd_set);
 cci->d = d;
  
 /* No separation for PCI MB as protocol handled in PCI device */
@@ -1398,7 +1408,7 @@ static const struct cxl_cmd cxl_cmd_set_t3_mctp[256][256] 
= {
 void cxl_initialize_t3_mctpcci(CXLCCI *cci, DeviceState *d, DeviceState *intf,
size_t payload_max)
 {
-cci->cxl_cmd_set = cxl_cmd_set_t3_mctp;
+cxl_copy_cci_commands(cci, cxl_cmd_set_t3_mctp);
 cci->d = d;
 cci->intf = intf;
 cxl_init_cci(cci, payload_max);
@@ -1414,7 +1424,7 @@ static const struct cxl_cmd 
cxl_cmd_set_usp_mctp[256][256] = {
 
 void cxl_initialize_usp_mctpcci(CXLCCI *cci, DeviceState *d, DeviceState 
*intf, size_t payload_max)
 {
-cci->cxl_cmd_set = cxl_cmd_set_usp_mctp;
+cxl_copy_cci_commands(cci, cxl_cmd_set_usp_mctp);
 cci->d = d;
 cci->intf = intf;
 cxl_init_cci(cci, payload_max);
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index c68981b618..9a3c8b2dfa 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -163,7 +163,7 @@ typedef struct CXLEventLog {
 } CXLEventLog;
 
 typedef struct CXLCCI {
-const struct cxl_cmd (*cxl_cmd_set)[256];
+struct cxl_cmd cxl_cmd_set[256][256];
 struct cel_log {
 uint16_t opcode;
 uint16_t effect;
-- 
2.39.1




[PATCH 2/4] cxl/mailbox: interface to add CCI commands to an existing CCI

2023-07-21 Thread Gregory Price
This enables wrapper devices to customize the base device's CCI
(for example, with custom commands outside the specification)
without the need to change the base device.

This also enables the base device to dispatch those commands without
requiring additional driver support.

Signed-off-by: Gregory Price 
---
 hw/cxl/cxl-mailbox-utils.c  | 19 +++
 include/hw/cxl/cxl_device.h |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index ddee3f1718..cad0cd0adb 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1383,6 +1383,25 @@ static void cxl_copy_cci_commands(CXLCCI *cci, const 
struct cxl_cmd (*cxl_cmds)[
 }
 }
 
+void cxl_add_cci_commands(CXLCCI *cci, const struct cxl_cmd 
(*cxl_cmd_set)[256], size_t payload_max)
+{
+cci->payload_max = payload_max > cci->payload_max ? payload_max : 
cci->payload_max;
+for (int set = 0; set < 256; set++) {
+for (int cmd = 0; cmd < 256; cmd++) {
+if (cxl_cmd_set[set][cmd].handler) {
+const struct cxl_cmd *c = _cmd_set[set][cmd];
+cci->cxl_cmd_set[set][cmd] = *c;
+struct cel_log *log =
+>cel_log[cci->cel_size];
+
+log->opcode = (set << 8) | cmd;
+log->effect = c->effect;
+cci->cel_size++;
+}
+}
+}
+}
+
 void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf, DeviceState 
*d, size_t payload_max)
 {
 cxl_copy_cci_commands(cci, cxl_cmd_set_sw);
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 9a3c8b2dfa..abc8405cc5 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -297,6 +297,8 @@ void cxl_initialize_mailbox_t3(CXLCCI *cci, DeviceState *d, 
size_t payload_max);
 void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf,
   DeviceState *d, size_t payload_max);
 void cxl_init_cci(CXLCCI *cci, size_t payload_max);
+void cxl_add_cci_commands(CXLCCI *cci, const struct cxl_cmd 
(*cxl_cmd_set)[256],
+  size_t payload_max);
 int cxl_process_cci_message(CXLCCI *cci, uint8_t set, uint8_t cmd,
 size_t len_in, uint8_t *pl_in,
 size_t *len_out, uint8_t *pl_out,
-- 
2.39.1




[PATCH 0/4] CXL: SK hynix Niagara MHSLD Device

2023-07-21 Thread Gregory Price
Base repo: https://gitlab.com/jic23/qemu
Base branch: cxl-2023-07-17

This patch set includes an emulation of the SK hynix Niagara MHSLD
platform with custom CCI commands that allow for isolation of memory
blocks between attached hosts.

There are 4 total patches in this set:
1 & 2: Modifications to the CCI interface to allow the addition of
   custom CCI commands to an existing CCI command set.
3: Minimum MHD cci support for a type-3 device
4: The SK hynix Niagara Device

The changes to the core device and cci interface are very limited,
and this provides a clean way for developers to add custom CCI commands
to a device while retaining the possibility to upstream the model.

This device allows hosts to request memory blocks directly from the
device, rather than requiring the full DCD command set.  As a matter of
simplicity, this is beneficial for testing and applications of
dynamic memory pooling on top of the 1.1 and 2.0 specifications.

Note that these CCI commands are not serviceable without the kernel
allowing raw CXL commands to be passed through the mailbox driver,
so users should enable `CONFIG_CXL_MEM_RAW_COMMANDS=y` on the kernel
of their QEMU instance.

Signed-off-by: Gregory Price 

Gregory Price (4):
  cxl/mailbox: change CCI cmd set structure to be a member, not a reference
  cxl/mailbox: interface to add CCI commands to an existing CCI
  cxl/type3: minimum MHD cci support
  cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

 hw/cxl/Kconfig  |   4 +
 hw/cxl/cxl-mailbox-utils.c  |  90 +++-
 hw/cxl/meson.build  |   2 +
 hw/cxl/vendor/meson.build   |   1 +
 hw/cxl/vendor/skhynix/.gitignore|   1 +
 hw/cxl/vendor/skhynix/init_niagara.c|  99 +
 hw/cxl/vendor/skhynix/meson.build   |   1 +
 hw/cxl/vendor/skhynix/skhynix_niagara.c | 521 
 hw/mem/cxl_type3.c  |  67 +++
 include/hw/cxl/cxl_device.h |  18 +-
 tools/cxl/cxl_mhd_init.c|  63 +++
 tools/cxl/meson.build   |   3 +
 tools/meson.build   |   1 +
 13 files changed, 866 insertions(+), 5 deletions(-)
 create mode 100644 hw/cxl/vendor/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/.gitignore
 create mode 100644 hw/cxl/vendor/skhynix/init_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.c
 create mode 100644 tools/cxl/cxl_mhd_init.c
 create mode 100644 tools/cxl/meson.build

-- 
2.39.1




[RFC 2/2] cxl/mailbox: interface to add CCI commands to an existing CCI

2023-07-20 Thread Gregory Price
This enables wrapper devices to customize the base device's CCI
(for example, with custom commands outside the specification)
without the need to change the base device.

This also enables the base device to dispatch those commands without
requiring additional driver support.

Signed-off-by: Gregory Price 

---
 hw/cxl/cxl-mailbox-utils.c |  19 +++
 include/hw/cxl/cxl_device.h|   2 ++
 2 files changed, 21 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index ddee3f1718..cad0cd0adb 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1383,6 +1383,25 @@ static void cxl_copy_cci_commands(CXLCCI *cci, const 
struct cxl_cmd (*cxl_cmds)[
 }
 }
 
+void cxl_add_cci_commands(CXLCCI *cci, const struct cxl_cmd 
(*cxl_cmd_set)[256], size_t payload_max)
+{
+cci->payload_max = payload_max > cci->payload_max ? payload_max : 
cci->payload_max;
+for (int set = 0; set < 256; set++) {
+for (int cmd = 0; cmd < 256; cmd++) {
+if (cxl_cmd_set[set][cmd].handler) {
+const struct cxl_cmd *c = _cmd_set[set][cmd];
+cci->cxl_cmd_set[set][cmd] = *c;
+struct cel_log *log =
+>cel_log[cci->cel_size];
+
+log->opcode = (set << 8) | cmd;
+log->effect = c->effect;
+cci->cel_size++;
+}
+}
+}
+}
+
 void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf, DeviceState 
*d, size_t payload_max)
 {
 cxl_copy_cci_commands(cci, cxl_cmd_set_sw);

diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 9a3c8b2dfa..abc8405cc5 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -297,6 +297,8 @@ void cxl_initialize_mailbox_t3(CXLCCI *cci, DeviceState *d, 
size_t payload_max);
 void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf,
   DeviceState *d, size_t payload_max);
 void cxl_init_cci(CXLCCI *cci, size_t payload_max);
+void cxl_add_cci_commands(CXLCCI *cci, const struct cxl_cmd 
(*cxl_cmd_set)[256],
+  size_t payload_max);
 int cxl_process_cci_message(CXLCCI *cci, uint8_t set, uint8_t cmd,
 size_t len_in, uint8_t *pl_in,
 size_t *len_out, uint8_t *pl_out,
-- 
2.39.1




[RFC 1/2] cxl/mailbox: change CCI cmd set structure to be a member, not a reference

2023-07-20 Thread Gregory Price
This allows devices to have fully customized CCIs, along with complex
devices where wrapper devices can override or add additional CCI
commands without having to replicate full command structures or
pollute a base device with every command that might ever be used.

Signed-off-by: Gregory Price 

---
 hw/cxl/cxl-mailbox-utils.c |  18 ++
 include/hw/cxl/cxl_device.h|   2 +-
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 2819914e8d..ddee3f1718 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1373,9 +1373,19 @@ void cxl_init_cci(CXLCCI *cci, size_t payload_max)
  bg_timercb, cci);
 }
 
+static void cxl_copy_cci_commands(CXLCCI *cci, const struct cxl_cmd 
(*cxl_cmds)[256]) {
+for (int set = 0; set < 256; set++) {
+for (int cmd = 0; cmd < 256; cmd++) {
+if (cxl_cmds[set][cmd].handler) {
+cci->cxl_cmd_set[set][cmd] = cxl_cmds[set][cmd];
+}
+}
+}
+}
+
 void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf, DeviceState 
*d, size_t payload_max)
 {
-cci->cxl_cmd_set = cxl_cmd_set_sw;
+cxl_copy_cci_commands(cci, cxl_cmd_set_sw);
 cci->d = d;
 cci->intf = intf;
 cxl_init_cci(cci, payload_max);
@@ -1383,7 +1393,7 @@ void cxl_initialize_mailbox_swcci(CXLCCI *cci, 
DeviceState *intf, DeviceState *d
 
 void cxl_initialize_mailbox_t3(CXLCCI *cci, DeviceState *d, size_t payload_max)
 {
-cci->cxl_cmd_set = cxl_cmd_set;
+cxl_copy_cci_commands(cci, cxl_cmd_set);
 cci->d = d;
  
 /* No separation for PCI MB as protocol handled in PCI device */
@@ -1398,7 +1408,7 @@ static const struct cxl_cmd cxl_cmd_set_t3_mctp[256][256] 
= {
 void cxl_initialize_t3_mctpcci(CXLCCI *cci, DeviceState *d, DeviceState *intf,
size_t payload_max)
 {
-cci->cxl_cmd_set = cxl_cmd_set_t3_mctp;
+cxl_copy_cci_commands(cci, cxl_cmd_set_t3_mctp);
 cci->d = d;
 cci->intf = intf;
 cxl_init_cci(cci, payload_max);
@@ -1414,7 +1424,7 @@ static const struct cxl_cmd 
cxl_cmd_set_usp_mctp[256][256] = {
 
 void cxl_initialize_usp_mctpcci(CXLCCI *cci, DeviceState *d, DeviceState 
*intf, size_t payload_max)
 {
-cci->cxl_cmd_set = cxl_cmd_set_usp_mctp;
+cxl_copy_cci_commands(cci, cxl_cmd_set_usp_mctp);
 cci->d = d;
 cci->intf = intf;
 cxl_init_cci(cci, payload_max);

diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index c68981b618..9a3c8b2dfa 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -163,7 +163,7 @@ typedef struct CXLEventLog {
 } CXLEventLog;
 
 typedef struct CXLCCI {
-const struct cxl_cmd (*cxl_cmd_set)[256];
+struct cxl_cmd cxl_cmd_set[256][256];
 struct cel_log {
 uint16_t opcode;
 uint16_t effect;
-- 
2.39.1




[RFC 0/2] Modify CCI cmd sets to be mutable

2023-07-20 Thread Gregory Price
This is on top of the proposed CCI changes by Jonathan.

Base repo: https://gitlab.com/jic23/qemu
Base branch: cxl-2023-07-17


A proposal to make the CCI cmd sets full members of their device,
and copy the const struct entries instead of referencing them.
This would allow child devices to inherit the parent device default
behavior, but with the flexibility to override without changing the
core device.

This also enables the base device to receive the commands via the
same /dev/cxl/ device, simplifying testing.


An example of how one might override/add commands (paraphrased):


instantiating:

-device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0
-device cxl-my-cmds,target=cxl-mem0


simple code:

static const struct cxl_cmd cxl_cmd_set_my_cmds[256][256] = {
[MY_CMDS][GET_INFO] = { "SPECIAL_GET_INFO",
  cmd_my_cmds_get_info, 0, 0 },
};

static void cxl_my_cmds_realize(DeviceState *d, Error **errp)
{
CXL_MyCmds_State *s = CXL_MyCmds(d);

if (object_dynamic_cast(OBJECT(s->target), TYPE_CXL_TYPE3)) {
CXLType3Dev *ct3d = CXL_TYPE3(s->target);

s->type = cxl_type3;
s->cci = &ct3d->cci;

cxl_add_cci_commands(&ct3d->cci, cxl_cmd_set_my_cmds, 512);
return;
}
error_setg(errp, "Unhandled target type for CXL MHDSLD");
}

#define TYPE_CXL_MyCmds "cxl-my-cmds"
static const TypeInfo cxl_my_cmds_info = {
.name = TYPE_CXL_MyCmds,
.parent = TYPE_CXL_TYPE3,
.class_size = sizeof(struct CXL_MyCmdsClass),
.class_init = cxl_my_cmds_class_init,
.instance_size = sizeof(CXL_MyCmds_State),
.interfaces = (InterfaceInfo[]) {
{ INTERFACE_CXL_DEVICE },
{ INTERFACE_PCIE_DEVICE },
    {}
},
};

Signed-off-by: Gregory Price 

---
Gregory Price (2):
  cxl/mailbox: change CCI cmd set structure to be a member, not a
reference
  cxl/mailbox: interface to add CCI commands to an existing CCI

 hw/cxl/cxl-mailbox-utils.c  | 37 +
 include/hw/cxl/cxl_device.h |  4 +++-
 2 files changed, 36 insertions(+), 5 deletions(-)

-- 
2.39.1




Re: [RFC PATCH 10/17] misc/i2c_mctp_cxl: Initial device emulation

2023-07-20 Thread Gregory Price
On Thu, Jul 20, 2023 at 01:18:33PM +0100, Jonathan Cameron wrote:
> On Wed, 19 Jul 2023 14:49:07 -0400
> Gregory Price  wrote:
> 
> > 
> > Maybe a dangerous suggestion.  Right now the CCI's are static:
> > 
> > static const struct cxl_cmd cxl_cmd_set[256][256]
> 
> That's defined by the ID space for the commands.  There can't be more than
> that many currently..
> 

Meant commands beyond what is defined in the spec, not beyond the 256
limits.

> > 
> > how difficult might it be to allow these tables to be dynamic instead?
> > Then we could add an interface like this:
> > 
> > void cxl_add_cmd_set(CXLCCI *cci, CXLCCI *cmd_set, payload_max) {
> > copy(cci, cmd_set);
> > }
> > 
> > This would enable not just adding sub-components piece-meal, but also if
> > someone wants to model a real device with custom CCI commands, they can
> > simply define a CCI set and pass it in via
> > 
> > cxl_add_cmd_set(&ct3d->cci, my_cmd_set, payload_max);
> 
> Ok.  I'm potentially fine with people adding an interface for this, but
> only if they plan to also upstream the QEMU emulation of their actual
> device.
> 

Working on it :]

Hoping to show off a fully functional MHSLD with some custom commands
soon, and I think I'm happy with the abstraction that fell out on top of
this CCI work.  Previously it required duplicating the type3 device or
hacking directly on it, which is a maintenance nightmare / not
upstreamable.

> 
> I'd look to just inherit from a cxl type 3, like Ira did in the PoC for
> type 2 support.   We can then easily add a path to replace the commands
> set with whatever anyone wants.  I'm not sure we want the command line
> to be used to configure such a device as it'll both get very complex and
> prove increasingly hard to test more than a small subset of options.
> 
> https://lore.kernel.org/all/20230517-rfc-type2-dev-v1-0-6eb2e4709...@intel.com/
> 
> Jonathan
> 

I made an attempt at this at first, but due to the custom commands, I
think everyone would (rightly) scoff at the idea of adding
non-specification defined stuff into the core type 3 device.  Once I
shifted to modifying the CCI and overriding entries, the entire vendor
device ended up as mostly the CCI command definitions, which is exactly
how I envisioned doing it in the first place.

I'll post some additional patches to my MHD RFC; the changes were pretty
minor.

Hopefully will be able to tack on a concrete MHSLD following that..

~Gregory



Re: [RFC PATCH 10/17] misc/i2c_mctp_cxl: Initial device emulation

2023-07-19 Thread Gregory Price
On Wed, Jul 19, 2023 at 09:19:47AM +0100, Jonathan Cameron wrote:
> On Tue, 18 Jul 2023 17:30:57 -0400
> Gregory Price  wrote:
> 
> > On Mon, Jul 17, 2023 at 06:16:39PM +0100, Jonathan Cameron wrote:
> > > @@ -397,8 +401,9 @@ struct CXLType3Dev {
> > >  AddressSpace hostpmem_as;
> > >  CXLComponentState cxl_cstate;
> > >  CXLDeviceState cxl_dstate;
> > > -CXLCCI cci;
> > > -
> > > +CXLCCI cci; /* Primary PCI mailbox CCI */
> > > +CXLCCI oob_mctp_cci; /* Initialized only if targetted */
> > > +  
> > 
> > I've been humming and hawing over this on the MHD stuff because I wanted
> > to figure out how to "add a CCI command" to a type-3 device without
> > either having a billion definitions for CCI command sets - or doing
> > something like this.
> > 
> > I don't hate this design pattern, I just want to ask whether your
> > intent is to end up with CXLType3Dev hosting many CXLCCI's based on what
> > wrapper types you have. 
> > 
> > Example: a type-3 device with mctp pass through and the MHD command set
> > 
> > CXLType3Dev {
> > ...
> > CXLCCI cci;
> > CXLCCI oob_mctp_cci;
> > CXLCCI mhd_cci;
> > ...
> > }
> 
> Yes - that's what I was thinking.  In some cases a CCI may be accessed by
> tunneling on a different CCI on the same device as well as the option
> of tunneling to different devices.
> 
> So far the set that we'll end up with isn't too large. And if some aren't
> used for a given instantiation that's fine if it keeps the code simple.
> We may end up with other MCTP buses and to keep things consistent each one
> will need it's own target CXLCCI. If we need to rethink and make it dynamic
> to some degree we can look at it later.
> 

Maybe a dangerous suggestion.  Right now the CCI's are static:

static const struct cxl_cmd cxl_cmd_set[256][256]

how difficult might it be to allow these tables to be dynamic instead?
Then we could add an interface like this:

void cxl_add_cmd_set(CXLCCI *cci, CXLCCI *cmd_set, payload_max) {
copy(cci, cmd_set);
}

This would enable not just adding sub-components piece-meal, but also if
someone wants to model a real device with custom CCI commands, they can
simply define a CCI set and pass it in via

cxl_add_cmd_set(&ct3d->cci, my_cmd_set, payload_max);

Which lets the existing /dev/cxl/memN device dispatch those commands,
and makes modeling real devices an easier endeavor.

Only downside is that this may require changing the command structure to
include a callback type and pointer per cci function. The upside is this
would also allow commands to be written somewhat agnostic to the device
they're being inherited by and allow for device nesting like...

-device cxl-type3, id=ct3d
-device cxl-mhd, target=ct3d
-device my_vendor_cxl_type3, target=ct3d
etc etc

otherwise we're probably going to end up with a cxl-type3 -device line
300 characters long.

Maybe that's over-generalizing a bit much n.n;

~Gregory



[RFC] cxl/type3: minimum MHD cci support

2023-07-19 Thread Gregory Price
Implement the MHD GET_INFO cci command and add a shared memory
region to the type3 device to host the information.

Add a helper program to initialize this shared memory region.

For now, limit the number of LD's to the number of heads. Later,
this limitation will need to be lifted for MH-MLDs.

Intended use case:

1. Create the shared memory region
2. Format the shared memory region
3. Launch QEMU with `is_mhd=true,mhd_head=N,mhd_shmid=$shmid`

shmid=`ipcmk -M 4096 | grep -o -E '[0-9]+' | head -1`
cxl_mhd_init 4 $shmid
qemu-system-x86_64 \
  -nographic \
  -accel kvm \
  -drive file=./mhd.qcow2,format=qcow2,index=0,media=disk,id=hd \
  -m 4G,slots=4,maxmem=8G \
  -smp 4 \
  -machine type=q35,cxl=on,hmat=on \
  -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
  -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
  -object memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true \
  -device 
cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,sn=6,is_mhd=true,mhd_head=0,mhd_shmid=$shmid
 \
  -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G
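
For reference, a rough sketch of what a cxl_mhd_init-style helper could
look like (the actual tool in the diffstat is not reproduced here): it
attaches to the segment created by ipcmk and fills in the fields that the
GET_MHD_INFO handler further down reads back.  The struct layout and names
are assumptions for illustration only.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/shm.h>

struct mhd_state {
    uint8_t nr_heads;
    uint8_t nr_lds;
    uint8_t ldmap[];
};

int main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <heads> <shmid>\n", argv[0]);
        return 1;
    }
    int heads = atoi(argv[1]);
    int shmid = atoi(argv[2]);

    struct mhd_state *s = shmat(shmid, NULL, 0);
    if (s == (void *)-1) {
        perror("shmat");
        return 1;
    }

    /* MH-SLD for now: one LD per head, head_id == ld_id */
    s->nr_heads = heads;
    s->nr_lds = heads;
    for (int i = 0; i < heads; i++) {
        s->ldmap[i] = i;
    }

    shmdt(s);
    return 0;
}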




Comments:

base repo: https://gitlab.com/jic23/qemu
base branch: cxl-2023-07-17

Originally I wanted to add this as a separate CCI, but I realized this
wouldn't work as intended because that would require a separate pci/cxl
device in /dev/ to tunnel messages though.  This is not how this will
present on real devices, so I went away from that.

Next I wanted to simply *dynamically* add the command to the existing
CCI in the type3 device, but these are statically defined in
cxl-mailbox.

I settled for simply adding the cci command to the type 3 device by
default, and checking for whether `is_mhd` is set in the command.

Ultimately, for MHD, they are likely to have a bunch of vendor specific
commands associated with them *and* a bunch of vendor specific state. It
would be nice to be able to have something like "cci_add_command_set()" in
the cci-mailbox, and an interface to override certain type3 functions
such as read/write (but this is an exercise for a later patch set).

---
 hw/cxl/cxl-mailbox-utils.c  | 53 +++
 hw/mem/cxl_type3.c  | 50 +
 include/hw/cxl/cxl_device.h | 12 +++
 tools/cxl/cxl_mhd_init.c| 63 +
 tools/cxl/meson.build   |  3 ++
 tools/meson.build   |  1 +
 6 files changed, 182 insertions(+)
 create mode 100644 tools/cxl/cxl_mhd_init.c
 create mode 100644 tools/cxl/meson.build

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 2819914e8d..9ef4d7f5e0 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -84,6 +84,8 @@ enum {
 #define GET_PHYSICAL_PORT_STATE 0x1
 TUNNEL = 0x53,
 #define MANAGEMENT_COMMAND 0x0
+MHD = 0x55,
+#define GET_MHD_INFO 0x0
 };
 
 /* CCI Message Format CXL r3.0 Figure 7-19 */
@@ -1155,6 +1157,56 @@ static CXLRetCode cmd_media_clear_poison(const struct 
cxl_cmd *cmd,
 return CXL_MBOX_SUCCESS;
 }
 
+static CXLRetCode cmd_mhd_get_info(const struct cxl_cmd *cmd,
+   uint8_t *payload_in,
+   size_t len_in,
+   uint8_t *payload_out,
+   size_t *len_out,
+   CXLCCI *cci)
+{
+CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+struct {
+uint8_t start_ld;
+uint8_t ldmap_len;
+} QEMU_PACKED *input = (void *)payload_in;
+
+struct {
+uint8_t nr_lds;
+uint8_t nr_heads;
+uint16_t resv1;
+uint8_t start_ld;
+uint8_t ldmap_len;
+uint16_t resv2;
+uint8_t ldmap[];
+} QEMU_PACKED *output = (void *)payload_out;
+
+uint8_t start_ld = input->start_ld;
+uint8_t ldmap_len = input->ldmap_len;
+uint8_t i;
+
+if (!ct3d->is_mhd) {
+return CXL_MBOX_UNSUPPORTED;
+}
+
+if (start_ld >= ct3d->mhd_state->nr_lds) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+output->nr_lds = ct3d->mhd_state->nr_lds;
+output->nr_heads = ct3d->mhd_state->nr_heads;
+output->resv1 = 0;
+output->start_ld = start_ld;
+output->resv2 = 0;
+
+for (i = 0; i < ldmap_len && (start_ld + i) < output->nr_lds; i++) {
+output->ldmap[i] = ct3d->mhd_state->ldmap[start_ld + i];
+}
+output->ldmap_len = i;
+
+*len_out = sizeof(*output) + output->ldmap_len;
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1195,6 +1247,7 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_media_inject_poison, 8, 0 },
 [MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
 cmd_media_clear_poison, 72, 0 },
+[MHD][GET_MHD_INFO] = {"GET_MULTI_HEADED_INFO", cmd_mhd_get_info, 2, 0},
 };
 
 static const struct cxl_cmd 

Re: [RFC PATCH 10/17] misc/i2c_mctp_cxl: Initial device emulation

2023-07-18 Thread Gregory Price
On Mon, Jul 17, 2023 at 06:16:39PM +0100, Jonathan Cameron wrote:
> @@ -397,8 +401,9 @@ struct CXLType3Dev {
>  AddressSpace hostpmem_as;
>  CXLComponentState cxl_cstate;
>  CXLDeviceState cxl_dstate;
> -CXLCCI cci;
> -
> +CXLCCI cci; /* Primary PCI mailbox CCI */
> +CXLCCI oob_mctp_cci; /* Initialized only if targetted */
> +

I've been humming and hawing over this on the MHD stuff because I wanted
to figure out how to "add a CCI command" to a type-3 device without
either having a billion definitions for CCI command sets - or doing
something like this.

I don't hate this design pattern, I just want to ask whether your
intent is to end up with CXLType3Dev hosting many CXLCCI's based on what
wrapper types you have. 

Example: a type-3 device with mctp pass through and the MHD command set

CXLType3Dev {
...
CXLCCI cci;
CXLCCI oob_mctp_cci;
CXLCCI mhd_cci;
...
}

Instantiate:
-device cxl-type3,bus=swport0,memdev=cxl-mem1,id=cxl-pmem1,lsa=cxl-lsa1,sn=3 
-device i2c_mctp_cxl,bus=aspeed.i2c.bus.0,address=5,target=cxl-pmem1
-device cxl-mhd,target=cxl-pmem1,...whatever else...

where the MHD code is contained within its own type/file, and the type3
device hosts the CCI for it.  Similar to how you've implemented the MTCP
stuff here.

The reason I ask is because certain CCI's don't necessarily get
associated with "a bus" so much as "a device".  the MHD example - it's
still part of "the device", but it's optional.   So does it make sense
to create this wrapper without a bus association, or to just pile it on
top CXLType3Dev and have to duplicate the code across any other
multi-headed devices that folks may conjur up?

~Gregory



Re: A confusion about CXL in arm virt machine

2023-06-16 Thread Gregory Price
On Fri, Jun 16, 2023 at 03:43:31PM +0800, Yuquan Wang wrote:
> Hi, Gregory
> 
> There is one confusion about CXL in QEMU I hope to consult. 
> If you have some time to look at this email, I would have better 
> understanding of CXL 
> emulation in QEMU.
> 
> On docs/system/devices/cxl.rst ,  Gregory wrote:
> A very simple setup with just one directly attached CXL Type 3 Volatile 
> Memory device::
> qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu 
> max \
> ..
> 
> As the current master branch of QEMU has not yet complemented the CXL 
> option/expansion 
> in arm virt machine, how this example command lines worked? Or here used 
> another branch 
> rather than master?
> 
> Many thanks
> Yuquan

As of today, the qemu/qemu.git master branch does have the required
patch for volatile region support: adacc814f541af9281c922e750d8ba4b90c1a73e

however, the last time I tested it on x86, the master branch was
incapable of enabling these regions with the latest kernel (6.3.x)
despite that kernel having sufficient support to do so.  I have not dug
into what the discrepancy between master and Jonathan's working branch
is just yet.

Last I tested, the cxl-2023-05-25 branch of Jonathan's fork is working on x86:

https://gitlab.com/jic23/qemu/-/tree/cxl-2023-05-25

I have not worked with the ARM machine, but Jonathan may be able to
comment on the state of ARM support for this code.

~Gregory



Re: [RFC] cxl: Multi-headed device design

2023-05-29 Thread Gregory Price
On Wed, May 17, 2023 at 03:18:59PM +0100, Jonathan Cameron wrote:
> > 
> > i.e. an SLD does not require an FM-Owned LD for management, but an MHD,
> > MLD, and DCD all do (at least in theory).
> 
> DCD 'might' though I don't think anything in the spec rules that you 'must'
> control the SLD/MLD via the FM-API, it's just a spec provided option.
> From our point of view we don't want to get more creative so lets assume
> it does.
> 
> I can't immediately think of reason for a single head SLD to have an FM owned
> LD, though it may well have an MCTP CCI for querying stuff about it from an 
> FM.
> 

Before I go running off into the woods, it seems like it would be simple
enough to make an FM-LD "device" which links an mhXXX device
and implements its own Mailbox CCI.

Maybe not "realistic", but to my mind this appears as a separate
character device in /dev/cxl/*. Maybe the realism here doesn't matter,
since we're just implementing for the sake of testing.  This is just a
straightforward way to pipe a DCD request into the device and trigger
DCD event log entries.

As commented earlier, this is done as a QEMU fed event.  If that's
sufficient, a hack like this feels like it would be at least mildly
cleaner and easier to test against.


Example: consider a user wanting to issue a DCD command to add capacity.

Real world: this would be some out of band communication, and eventually
this results in a DCD command to the device that results in a
capacity-event showing up in the log. Maybe it happens over TCP and
drills down to a Redfish event that talks to the BMC that issues a
command over etc etc MCTP emulations, etc.

With a simplistic /dev/cxl/memX-fmld device a user can simply issue these
commands without all that, and the effect is the same.

On the QEMU side you get something like:

-device cxl-type3,volatile-memdev=mem0,bus=rp0,mhd-head=0,id=mem0,mhd-main=true
-device cxl-mhsld,type3=mem0,bus=rp0,headid=0,id=mhsld1,shmid=X
-device cxl-fmld,mhsld=mdsld1,bus=rp1,id=mem0-fmld,shmid=Y

on the Linux side you get:
/dev/cxl/mem0
/dev/cxl/mem0-fmld

in this example, the shmid for mhsld is a shared memory region created
with ipcmk that implements the shared state (basically section bitmap
tracking and the actual plumbing for DCD, etc). This limits the emulation
of the mhsld to a single host for now, but that seems sufficient.

The shmid for cxl-fmld implements any shared state for the fmld,
including a mutex, that allows all hosts attached to the mhsld to have
access to the fmld.  This may or may not be realistic, but it would
allow all head-attached hosts to send DCD commands over their own local
fabric, rather than going out of band.

This gets us to the point where, at a minimum, each host can issue its
own DCD commands to add capacity to itself.  That's step 1.

Step 2 is allow Host A to issue a DCD command to add capacity to Host B.

I suppose this could be done via a background thread that waits on a
message to show up in the shared memory region?

Being somewhat unfamiliar with QEMU, is it kosher to start background
threads that just wait on events like this, or is that generally frowned
upon?  If done this way, it would simplify the creation and startup
sequence at least.
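
(Just to make the idea concrete, a rough sketch of what I'm picturing,
using qemu_thread_create(); the mhd_shutdown/mbox_pending fields, the
mhd_thread member, and the mhd_handle_remote_request() hook are all made
up for illustration, and whether a polling worker like this is acceptable
is exactly the question above:)

/*
 * Rough sketch only: a worker thread that watches a hypothetical mailbox
 * word in the MHD shared-state region and injects the DCD event when a
 * peer head posts a request.
 */
static void *mhd_shm_listener(void *opaque)
{
    CXLType3Dev *ct3d = opaque;

    while (!qatomic_read(&ct3d->mhd_shutdown)) {
        if (qatomic_read(&ct3d->mhd_state->mbox_pending)) {
            mhd_handle_remote_request(ct3d);               /* placeholder */
            qatomic_set(&ct3d->mhd_state->mbox_pending, 0);
        }
        g_usleep(10 * 1000);   /* 10ms poll; an eventfd would be nicer */
    }
    return NULL;
}

/* ...and in realize(): */
qemu_thread_create(&ct3d->mhd_thread, "cxl-mhd-listener",
                   mhd_shm_listener, ct3d, QEMU_THREAD_JOINABLE);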

~Gregory



Re: [RFC] cxl: Multi-headed device design

2023-05-16 Thread Gregory Price
On Mon, May 15, 2023 at 05:18:07PM +0100, Jonathan Cameron wrote:
> On Tue, 21 Mar 2023 21:50:33 -0400
> Gregory Price  wrote:
> 
> > 
> > Ambiguity #1:
> > 
> > * An SLD contains 1 Logical Device.
> > * An MH-SLD presents multiple SLDs, one per head.
> > 
> > Ergo an MH-SLD contains multiple LDs which makes it an MLD according to the
> > definition of LD, but not according to the definition of MLD, or MH-MLD.
> 
> I'd go with 'sort of'.  SLD is a presentation of a device to a host.
> It can be a normal single headed MLD that has been plugged directly into a 
> host.
> 
> So for extra fun points you can have one MH-MLD that has some ports connected
> to switches and other directly to hosts. Thus it can present as SLD on some
> upstream ports and as MLD on others.
>

I suppose this section of the email was really just to point out that
what constitutes a "multi-headed", "logical", and "multi-logical"
device is rather confusing from just reading the spec.  Since writing
this, I've kind of settled on:

MH-* - anything with multiple heads, regardless of how it works
SLD - one LD per head, but LD does not imply any particular command set
MLD - multiple LD's per head, but those LD's may only attach to one head
DCD - anything can technically be a DCD if it implements the commands

Trying to figure out, from the spec, "what commands an MH-SLD" should
implement to be "spec compliant" was my frustration.  It's somewhat
clear now that the answer is "Technically nothing... unless it's an MLD".

> > I want to make very close note of this.  SLD's are managed like SLDs
> > SLDs, MLDs are managed like MLDs.  MH-SLDs, according to this, should be
> > managed like SLDs from the perspective of each host.
> 
> True, but an MH-MLD device connected directly to a host will also 
> be managed (at some level anyway) as an SLD on that particular port.
>

The ambiguous part is... what commands relate specifically to an SLD?
The spec isn't really written that way, and the answer is that an SLD is
more of a lack of other functionality (specifically MLD functionality),
rather than its own set of functionality.

i.e. an SLD does not require an FM-Owned LD for management, but an MHD,
MLD, and DCD all do (at least in theory).

> > 
> > 2.5.1 continues on to describe "LD Management in MH-MLDs" but just ignores
> > that MH-SLDs (may) exist.  That's frustrating to say the least, but I
> > suppose we can gather from context that MH-SLD's *MAY NOT* have LD
> > management controls.
> 
> Hmm. In theory you could have an MH-SLD that used a config from flash or 
> similar
> but that would be odd.  We need some level of dynamic control to make these
> devices useful.  Doesn't mean the spec should exclude dumb devices, but
> we shouldn't concentrate on them for emulation.
> 
> One possible usecase would be a device that always shares all it's memory on
> all ports. Yuk.
> 

I can say that the earliest forms of MH-SLD, and certainly pre-DCD, are
likely to present all memory on all ports, and potentially provide some
custom commands to help hosts enforce exclusivity.

It's beyond the spec, but this can actually be emulated today with the
MH-SLD setup I describe below.  Certainly I expected a yuk factor in
proposing it, but I think the reality is on the path to 3.0 and DCD
devices we should at least entertain that someone will probably do this
with real hardware.

> > For the simplest MH-SLD device, these fields would be immutable, and
> > there would be a single LD for each head, where head_id == ld_id.
> 
> Agreed.
> 
> > 
> > So summarizing, what I took away from this was the following:
> > 
> > In the simplest form of MH-SLD, there's is neither a switch, nor is
> > there LD management.  So, presumably, we don't HAVE to implement the
> > MHD commands to say we "have MH-SLD support".
> 
> Whilst theoretically possible - I don' think such a device is interesting.
> Minimum I'd want to see is something with multiple upstream SLD ports
> and a management LD with appropriate interface to poke it.
> 
>
> The MLD side of things is interesting only once we support MLDs in general
> in QEMU CXL emulation and even then they are near invisible to a host
> and are more interesting for emulating fabric management.
> 
> What you may want to do is take Fan's work on DCD and look at doing
> a simple MH-SLD device that uses same cheat of just using QMP commands
> to do the configuration.  That's an intermediate step to us getting
> the FM-API and similar commands implemented.
> 

I actually think it's a good step to go from MH-SLD to MH-SLD+DCD while
not having to worry about the complexity of MLD and switches.

(I

Re: [PATCH v5 2/3] hw/mem: Use memory_region_size() in cxl_type3

2023-05-16 Thread Gregory Price
On Fri, Apr 21, 2023 at 05:08:26PM +0100, Jonathan Cameron wrote:
> Accessors prefered over direct use of int128_get64() as they
> clamp out of range values.  None are expected here but
> cleaner to always use the accessor than mix and match.
> 
> Signed-off-by: Jonathan Cameron 
> 
> ---
> v5: New patch to tidy up existing instance before adding more of
> them.
>   - Use memory_region_size() to access the size of memory regions.
> We may eventually need to allow for larger addresses but it
> is unlikely to be a problem any time soon.
> ---
>  hw/mem/cxl_type3.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 1bd5963a3f..2db756851c 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -52,7 +52,7 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader 
> **cdat_table,
>  .DSMADhandle = dsmad_handle,
>  .flags = CDAT_DSMAS_FLAG_NV,
>  .DPA_base = 0,
> -.DPA_length = int128_get64(mr->size),
> +.DPA_length = memory_region_size(mr),
>  };
>  
>  /* For now, no memory side cache, plausiblish numbers */
> @@ -133,7 +133,7 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader 
> **cdat_table,
>  /* Reserved - the non volatile from DSMAS matters */
>  .EFI_memory_type_attr = 2,
>  .DPA_offset = 0,
> -.DPA_length = int128_get64(mr->size),
> +.DPA_length = memory_region_size(mr),
>  };
>  
>  /* Header always at start of structure */
> @@ -698,7 +698,7 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr 
> host_addr, uint64_t *data,
>  return MEMTX_ERROR;
>  }
>  
> -if (dpa_offset > int128_get64(mr->size)) {
> +if (dpa_offset > memory_region_size(mr)) {
>  return MEMTX_ERROR;
>  }
>  
> @@ -721,7 +721,7 @@ MemTxResult cxl_type3_write(PCIDevice *d, hwaddr 
> host_addr, uint64_t data,
>  return MEMTX_OK;
>  }
>  
> -if (dpa_offset > int128_get64(mr->size)) {
> +    if (dpa_offset > memory_region_size(mr)) {
>  return MEMTX_OK;
>  }
>  return address_space_write(&ct3d->hostmem_as, dpa_offset, attrs,
> -- 
> 2.37.2
> 

lgtm

Reviewed-by: Gregory Price 



[RFC] cxl: Multi-headed device design

2023-03-22 Thread Gregory Price
Originally I was planning to kick this off with a patch set, but I've
decided my current prototype does not fit the extensibility requirements
to go from SLD to MH-SLD to MH-MLD.


So instead I'd like to kick off by just discussing the data structures
and laugh/cry a bit about some of the frustrating ambiguities for MH-SLDs
when it comes to the specification.

I apologize for the sheer length of this email, but it really is just
that complex.


=
 What does the specification say about Multi-headed Devices? 
=

Defining each relevant component according to the specification:

>
> VCS - Virtual CXL Switch
> * Includes entities within the physical switch belonging to a
>   single VH. It is identified using the VCS ID.
> 
> 
> VH - Virtual Hierarchy.
> * Everything from the CXL RP down.
> 
> 
> LD - Logical Device
> * Entity that represents a CXL Endpoint that is bound to a VCS.
>   An SLD device contains one LD.  An MLD contains multiple LDs.
> 
> 
> SLD - Single Logical Device
> * That's it, that's the definition.
> 
> 
> MLD - Multi Logical Device
> * Multi-Logical Device. CXL component that contains multiple LDs,
>   out of which one LD is reserved for configuration via the FM API,
>   and each remaining LD is suitable for assignment to a different
>   host. Currently MLDs are architected only for Type 3 LDs.
> 
> 
> MH-SLD - Mutli-Headed SLD
> * CXL component that contains multiple CXL ports, each presenting an
>   SLD. The ports must correctly operate when connected to any
>   combination of common or different hosts.
> 
> 
> MH-MLD - Multi-Headed MLD
> * CXL component that contains multiple CXL ports, each presenting an MLD
>   or SLD. The ports must correctly operate when connected to any
>   combination of common or different hosts. The FM-API is used to
>   configure each LD as well as the overall MH-MLD.
> 
>   MH-MLDs are considered a specialized type of MLD and, as such, are
>   subject to all functional and behavioral requirements of MLDs.
> 

Ambiguity #1:

* An SLD contains 1 Logical Device.
* An MH-SLD presents multiple SLDs, one per head.

Ergo an MH-SLD contains multiple LDs which makes it an MLD according to the
definition of LD, but not according to the definition of MLD, or MH-MLD.

Now is the winter of my discontent.

The Specification says this about MH-SLD's in other sections

> 2.4.3 Pooled and Shared FAM
> 
> LD-FAM includes several device variants.
> 
> A multi-headed Single Logical Device (MH-SLD) exposes multiple LDs, each with
> a dedicated link.
> 
>
> 2.5 Multi-Headed Device
> 
> There are two types of Multi-Headed Devices that are distinguished by how
> they present themselves on each head:
> *  MH-SLD, which present SLDs on all heads
> *  MH-MLD, which may present MLDs on any of their heads
>
>
> Management of heads in Multi-Headed Devices follows the model defined for
> the device presented by that head:
> *  Heads that present SLDs may support the port management and control
> features that are available for SLDs
> *  Heads that present MLDs may support the port management and control
>features that are available for MLDs
>

I want to make very close note of this.  SLDs are managed like SLDs,
MLDs are managed like MLDs.  MH-SLDs, according to this, should be
managed like SLDs from the perspective of each host.

That's pretty straight forward.

>
> Management of memory resources in Multi-Headed Devices follows the model
> defined for MLD components because both MH-SLDs and MH-MLDs must support
> the isolation of memory resources, state, context, and management on a
> per-LD basis.  LDs within the device are mapped to a single head.
> 
> *  In MH-SLDs, there is a 1:1 mapping between heads and LDs.
> *  In MH-MLDs, multiple LDs are mapped to at most one head.
> 
> 
> Multi-Headed Devices expose a dedicated Component Command Interface (CCI),
> the LD Pool CCI, for management of all LDs within the device. The LD Pool
> CCI may be exposed as an MCTP-based CCI or can be accessed via the Tunnel
> Management Command command through a head’s Mailbox CCI, as detailed in
> Section 7.6.7.3.1.

2.5.1 continues on to describe "LD Management in MH-MLDs" but just ignores
that MH-SLDs (may) exist.  That's frustrating to say the least, but I
suppose we can gather from context that MH-SLD's *MAY NOT* have LD
management controls.

Lets see if that assumption holds.

> 7.6.7.3 MLD Port Command Set
>
> 7.6.7.3.1 Tunnel Management Command (Opcode 5300h)

The referenced section at the end of 2.5 seems to also suggest that
MH-SLDs do not (or don't have to?) implement the tunnel management
command set.  It sends us to the MLD command set, and SLDs don't get
managed like MLDs - ergo it's not relevant?

The final mention of MH-SLDs is mentioned in section 9.13.3

> 9.13.3 Dynamic Capacity Device
> ...
>  MH-SLD or MH-MLD based DCD shall forcefully release shared Dynamic
>  

[RFC] CXL: TCG/KVM instruction alignment issue discussion default

2023-02-21 Thread Gregory Price


Breaking this off into a separate thread for archival sake.

There's a bug with handling execution of instructions held in CXL
memory - specifically when an instruction crosses a page boundary.

The result of this is that type-3 devices cannot use KVM at all at the
moment, and require the attached patch to run in TCG-only mode.


CXL memory devices are presently emulated as MMIO, and MMIO has no
coherency guarantees, so TCG doesn't cache the results of translating
an instruction, meaning execution is incredibly slow (orders of
magnitude slower than KVM).


Request for comments:


First there's the stability issue:

0) TCG cannot handle instructions across a page boundary spanning ram and
   MMIO. See attached patch for hotfix.  This basically solves the page
   boundary issue by reverting the entire block to MMIO-mode if the
   problem is detected.

1) KVM needs to be investigated.  It's likely the same/similar issue,
   but it's not confirmed.



Second there's the performance issue:

0) Do we actually care about performance? How likely are users to
   attempt to run software out of CXL memory?

1) If we do care, is there a potential for converting CXL away from the
   MMIO design?  The issue is coherency for shared memory. Emulating
   coherency is a) hard, and b) a ton of work for little gain.

   Presently marking CXL memory as MMIO basically enforces coherency by
   preventing caching, though it's unclear how this is enforced
   by KVM (or if it is, I have to imagine it is).



It might be nice to solve this for non-shared memory regions, but
testing functionality >>> performance at this point, so it might not be
worth the investment.


Just tossing this out for discussion
~Gregory




diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 061519691f..cd383d7125 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -171,8 +171,16 @@ static void *translator_access(CPUArchState *env, 
DisasContextBase *db,
 if (host == NULL) {
 tb_page_addr_t phys_page =
 get_page_addr_code_hostp(env, base, &db->host_addr[1]);
-/* We cannot handle MMIO as second page. */
-assert(phys_page != -1);
+
+/*
+* If the second page is MMIO, treat as if the first page
+* was MMIO as well, so that we do not cache the TB.
+*/
+if (unlikely(phys_page == -1)) {
+tb_set_page_addr0(tb, -1);
+return NULL;
+}
+
 tb_set_page_addr1(tb, phys_page);
 #ifdef CONFIG_USER_ONLY
 page_protect(end);



Re: [PATCH 00/18] CXL RAM and the 'Soft Reserved' => 'System RAM' default

2023-02-21 Thread Gregory Price
On Wed, Feb 15, 2023 at 10:03:27AM +, Jonathan Cameron wrote:
> On Tue, 14 Feb 2023 16:54:02 -0500
> Gregory Price  wrote:
> 
> > Just clarifying one thing:  Even with the patch, KVM blows up.
> > Disabling KVM fixes this entirely.  I haven't tested without KVM but
> > with the patch, i will do that now.
> 
> yup.  The patch only fixes TCG so that's expected behavior.
> 
> Fingers crossed on this 'working'.
> 
> I'm open to suggestions on how to work around the problem with KVM
> or indeed allow TCG to cache the instructions (right not it has
> to fetch and emulate each instruction on it's own).
> 
> I can envision how we might do it for KVM with userspace page fault handling
> used to get a fault up to QEMU which can then stitch in a cache
> of the underlying memory as a stage 2 translation to the page (a little
> bit like how post migration copy works) though I've not prototyped
> anything...
> 

Just following up.  With the patch applied and KVM turned off, no crash.
I've been working with this for a while.

We should move the instruction alignment issue into a separate
discussion thread.



Re: [PATCH 2/2] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)

2023-02-17 Thread Gregory Price
On Fri, Feb 17, 2023 at 04:16:17PM +, Jonathan Cameron via wrote:
> On Tue, 31 Jan 2023 16:38:47 +
> Jonathan Cameron via  wrote:
> 
> > From: Gregory Price 
> > 
> > This commit enables each CXL Type-3 device to contain one volatile
> > memory region and one persistent region.
> > 
> > Two new properties have been added to cxl-type3 device initialization:
> > [volatile-memdev] and [persistent-memdev]
> > 
> > The existing [memdev] property has been deprecated and will default the
> > memory region to a persistent memory region (although a user may assign
> > the region to a ram or file backed region). It cannot be used in
> > combination with the new [persistent-memdev] property.
> > 
> > Partitioning volatile memory from persistent memory is not yet supported.
> > 
> > Volatile memory is mapped at DPA(0x0), while Persistent memory is mapped
> > at DPA(vmem->size), per CXL Spec 8.2.9.8.2.0 - Get Partition Info.
> > 
> > Signed-off-by: Gregory Price 
> > Signed-off-by: Jonathan Cameron 
> > 
> Hi Gregory,
> 
> I've added support for multiple HDM decoders and hence can now
> test both volatile and non volatile on same device.
> It very nearly all works. With one exception which is I couldn't
> poke the first byte of the non volatile region.
> 
> I think we have an off by one in a single check.
> 
> Interestingly it makes no difference when creating an FS on top
> (which was my standard test) so I only noticed when poking memory
> addresses directly to sanity check the HDM decoder setup.
> 
> I'll roll a v2 if no one shouts out that I'm wrong.
> 
> Note that adding multiple HDM decoders massively increases
> the number of test cases over what we had before to poke all the
> corners so I may well be missing stuff.  Hopefully can send an RFC
> of that support out next week.
> 
> Jonathan
> 

Very cool! Thanks for pushing this over the finishing line.

All my testing so far has been really smooth since getting the TCG issue
worked out.

> > -MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
> > -   unsigned size, MemTxAttrs attrs)
[...]
> > +if (vmr) {
> > +if (*dpa_offset <= int128_get64(vmr->size)) {
> 
> Off by one I think.  < 
> 

Yes that makes sense, should be <.  Derp derp.

Though I think this may allude to more off-by-one issues?  This says

if (dpa_offset < vmr->size)

but dpa_offset should be (hpa - memory_region_base),

The HPA is used by memory access routing for the whole system to determine
what device it should access.

If that corner case is being hit, doesn't it imply the higher level code
is also susceptible to this, and is routing accesses to the wrong device?

~Gregory



Re: CXL 2.0 memory pooling emulation

2023-02-17 Thread Gregory Price
On Fri, Feb 17, 2023 at 11:14:18AM +, Jonathan Cameron wrote:
> On Thu, 16 Feb 2023 15:52:31 -0500
> Gregory Price  wrote:
> 
> > 
> > I agree, it's certainly "not pretty".
> > 
> > I'd go so far as to call the baby ugly :].  Like i said: "The Hackiest way"
> > 
> > My understanding from looking around at some road shows is that some
> > of these early multi-headed devices are basically just SLD's with multiple
> > heads. Most of these devices had to be developed well before DCD's and
> > therefore the FM-API were placed in the spec, and we haven't seen or
> > heard of any of these early devices having any form of switch yet.
> > 
> > I don't see how this type of device is feasible unless it's either 
> > statically
> > provisioned (change firmware settings from bios on reboot) or implements
> > custom firmware commands to implement some form of exclusivity controls over
> > memory regions.
> > 
> > The former makes it not really a useful pooling device, so I'm sorta 
> > guessing
> > we'll see most of these early devices implement custom commands.
> > 
> > I'm just not sure these early MHD's are going to have any real form of
> > FM-API, but it would still be nice to emulate them.
> > 
> Makes sense.  I'd be fine with adding any necessary hooks to allow that
> in the QEMU emulation, but probably not upstreaming the custom stuff.
> 
> Jonathan
> 

I'll have to give it some thought.  The "custom stuff" is mostly init
code, mailbox commands, and the fields those mailbox commands twiddle.

I guess we could create a wrapper-device that hooks raw commands?  Is
that what raw commands are intended for? Notably the kernel has to be
compiled with raw command support, which is disabled by default, but
that's fine.

Dunno, spitballing, but I'm a couple days away from a first pass at a
MHD, though I'll need to spend quite a bit of time cleaning it up before
I can push an RFC.

~Gregory



Re: CXL 2.0 memory pooling emulation

2023-02-16 Thread Gregory Price
On Thu, Feb 16, 2023 at 06:00:57PM +, Jonathan Cameron wrote:
> On Wed, 15 Feb 2023 04:10:20 -0500
> Gregory Price  wrote:
> 
> > On Wed, Feb 15, 2023 at 03:18:54PM +, Jonathan Cameron via wrote:
> > > On Wed, 8 Feb 2023 16:28:44 -0600
> > > zhiting zhu  wrote:
> > >   
> > > 1) Emulate an Multi Headed Device.
> > >Initially connect two heads to different host bridges on a single QEMU
> > >machine.  That lets us test most of the code flows without needing
> > >to handle tests that involve multiple machines.
> > >Later, we could add a means to connect between two instances of QEMU.  
> > 
> > Hackiest way to do this is to connect the same memory backend to two
> > type-3 devices, with the obvious caveat that the device state will not
> > be consistent between views.
> > 
> > But we could, for example, just put the relevant shared state into an
> > optional shared memory area instead of a normally allocated region.
> > 
> > i can imagine this looking something like
> > 
> > memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
> > cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken
> > 
> > then you can have multiple qemu instances hook their relevant devices up
> > to a a backend that points to the same file, and instantiate their
> > shared state in the region shmget(mytoken).
> 
> That's not pretty.  For local instance I was thinking a primary device
> which also has the FM-API tunneled access via mailbox, and secondary devices
> that don't.  That would also apply to remote. The secondary device would
> then just receive some control commands on what to expose up to it's host.
> Not sure what convention on how to do that is in QEMU. Maybe a socket
> interface like is done for swtpm? With some ordering constraints on startup.
> 

I agree, it's certainly "not pretty".

I'd go so far as to call the baby ugly :].  Like i said: "The Hackiest way"

My understanding from looking around at some road shows is that some
of these early multi-headed devices are basically just SLD's with multiple
heads. Most of these devices had to be developed well before DCDs (and
therefore the FM-API) were placed in the spec, and we haven't seen or
heard of any of these early devices having any form of switch yet.

I don't see how this type of device is feasible unless it's either statically
provisioned (change firmware settings from bios on reboot) or implements
custom firmware commands to implement some form of exclusivity controls over
memory regions.

The former makes it not really a useful pooling device, so I'm sorta guessing
we'll see most of these early devices implement custom commands.

I'm just not sure these early MHD's are going to have any real form of
FM-API, but it would still be nice to emulate them.

~Gregory



Re: CXL 2.0 memory pooling emulation

2023-02-15 Thread Gregory Price
On Wed, Feb 15, 2023 at 03:18:54PM +, Jonathan Cameron via wrote:
> On Wed, 8 Feb 2023 16:28:44 -0600
> zhiting zhu  wrote:
> 
> > Hi,
> > 
> > I saw a PoC:
> > https://lore.kernel.org/qemu-devel/20220525121422.3...@huawei.com/T/ to
> > implement memory pooling and fabric manager on qemu. Is there any further
> > development on this? Can qemu emulate a memory pooling on a simple case
> > that two virtual machines connected to a CXL switch where some memory
> > devices are attached to?
> > 
> > Best,
> > Zhiting
> [... snip ...]
> 
> Note though that there is a long way to go before we can do what you
> want.  The steps I'd expect to see along the way:
> 
> 1) Emulate an Multi Headed Device.
>Initially connect two heads to different host bridges on a single QEMU
>machine.  That lets us test most of the code flows without needing
>to handle tests that involve multiple machines.
>Later, we could add a means to connect between two instances of QEMU.

I've been playing with this a bit.

Hackiest way to do this is to connect the same memory backend to two
type-3 devices, with the obvious caveat that the device state will not
be consistent between views.

But we could, for example, just put the relevant shared state into an
optional shared memory area instead of a normally allocated region.

i can imagine this looking something like

memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken

then you can have multiple qemu instances hook their relevant devices up
to a backend that points to the same file, and instantiate their
shared state in the region shmget(mytoken).
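
(A rough sketch of that attach step, with made-up names for the key
derivation and shared layout; the mhd_shared_state struct and
mhd_attach_shared() helper are placeholders, not an existing QEMU or spec
interface:)

#include <stddef.h>
#include <stdint.h>
#include <sys/ipc.h>
#include <sys/shm.h>

struct mhd_shared_state {
    uint32_t nr_heads;
    uint8_t  section_bitmap[];   /* exclusivity tracking between heads */
};

static struct mhd_shared_state *mhd_attach_shared(const char *token_path)
{
    /* The token doubles as an existing file path for ftok() */
    key_t key = ftok(token_path, 0x42);
    if (key == (key_t)-1) {
        return NULL;
    }

    int id = shmget(key, 4096, IPC_CREAT | 0600);
    if (id < 0) {
        return NULL;
    }

    void *p = shmat(id, NULL, 0);
    return p == (void *)-1 ? NULL : p;
}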

Additionally, these devices will require a set of what amounts to
vendor-specific mailbox commands - since the spec doesn't really define
what multi-headed devices "should do" to manage exclusivity.

Not sure if this would be upstream-worthy, or if we'd want to fork
mem/cxl-type3.c into like mem/cxl-type3-multihead.c or something.

The base type3 device is going to end up overloaded at some point I
think, and we'll want to look at trying to abstract it.

~Gregory



Re: [PATCH 2/2] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)

2023-01-31 Thread Gregory Price
On Tue, Jan 31, 2023 at 04:38:47PM +, Jonathan Cameron wrote:
> From: Gregory Price 
> 
> This commit enables each CXL Type-3 device to contain one volatile
> memory region and one persistent region.
>
> ... snip ...
> 

I have no objections to the changes made.  I'll test when I finish up a
few other tasks.



Re: [PATCH 1/2] tests/qtest/cxl-test: whitespace, line ending cleanup

2023-01-31 Thread Gregory Price
On Tue, Jan 31, 2023 at 04:38:46PM +, Jonathan Cameron wrote:
> From: Gregory Price 
> 
> Defines are starting to exceed line length limits, align them for
> cleanliness before making modifications.
> 
> Signed-off-by: Gregory Price 
> Signed-off-by: Jonathan Cameron 
> ---
> Changes since RFC v4:
>   Naming consistency improvements.
>  
>  tests/qtest/cxl-test.c | 84 +++---
>  1 file changed, 46 insertions(+), 38 deletions(-)
> 
> // ... snip ...
>

changes lgtm



Re: [PATCH v3 00/10] hw/cxl: CXL emulation cleanups and minor fixes for upstream

2023-01-30 Thread Gregory Price


Tested and reviewed this series (except my own patches, obviously).

Reviewed-by: Gregory Price 
Tested-by: Gregory Price 

On Mon, Jan 30, 2023 at 02:36:55PM +, Jonathan Cameron wrote:
> V3: Thanks to Michael Tsirkin
>  - Update tests/data/acpi/q35/DSDT.cxl to reflect dropping of the duplicate 
> _UID.
>Usual dance with marking table to be ignored by test then making change 
> and finally
>updating the table with the new version and dropping the entry preventing 
> the tests
>from running.
>  
> V2:
> - Various minor issues found by Philippe, see individual patches.
>   Note that the const_le64() patch matches changes in a set of Philippe's 
> that was
>   never applied. Philippe may send an update of that series before this 
> merges.
>   If that occurs, drop "qemu/bswap: Add const_le64()"
> - Picked up tags.
> 
> V1 Cover letter.
> 
> A small collection of misc fixes and tidying up pulled out from various
> series. I've pulled this to the top of my queue of CXL related work
> as they stand fine on their own and it will reduce the noise in
> the larger patch sets if these go upstream first.
> 
> Gregory's patches were posted as part of his work on adding volatile support.
> https://lore.kernel.org/linux-cxl/20221006233702.18532-1-gregory.pr...@memverge.com/
> https://lore.kernel.org/linux-cxl/20221128150157.97724-2-gregory.pr...@memverge.com/
> I might propose this for upstream inclusion this cycle, but testing is
> currently limited by lack of suitable kernel support.
> 
> Ira's patches were part of his event injection series.
> https://lore.kernel.org/linux-cxl/20221221-ira-cxl-events-2022-11-17-v2-0-2ce2ecc06...@intel.com/
> Intent is to propose for upstream the rest of that series shortly after
> some minor changes from earlier review.
> 
> My three patches have not previously been posted.
> 
> For the curious, the current state of QEMU CXL emulation that we are working
> through the backlog wrt to final cleanup before proposing for upstreaming can 
> be found at.
> 
> https://gitlab.com/jic23/qemu/-/commits/cxl-2023-01-11
> 
> 
> Gregory Price (2):
>   hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL
>   hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition
> 
> Ira Weiny (3):
>   qemu/bswap: Add const_le64()
>   qemu/uuid: Add UUID static initializer
>   hw/cxl/mailbox: Use new UUID network order define for cel_uuid
> 
> Jonathan Cameron (5):
>   hw/mem/cxl_type3: Improve error handling in realize()
>   hw/pci-bridge/cxl_downstream: Fix type naming mismatch
>   tests/acpi: Allow update of q35/DSDT.cxl
>   hw/i386/acpi: Drop duplicate _UID entry for CXL root bridge
>   tests: acpi: Update q35/DSDT.cxl for removed duplicate UID
> 
>  hw/cxl/cxl-device-utils.c  |   2 +-
>  hw/cxl/cxl-mailbox-utils.c |  28 +++-
>  hw/i386/acpi-build.c   |   1 -
>  hw/mem/cxl_type3.c |  15 +++
>  hw/pci-bridge/cxl_downstream.c |   2 +-
>  include/hw/cxl/cxl_device.h|   2 +-
>  include/qemu/bswap.h   |  12 +++-
>  include/qemu/uuid.h|  12 
>  tests/data/acpi/q35/DSDT.cxl   | Bin 9636 -> 9622 bytes
>  9 files changed, 52 insertions(+), 22 deletions(-)
> 
> -- 
> 2.37.2
> 



Re: [RFC v4 3/3] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)

2023-01-30 Thread Gregory Price
On Mon, Jan 30, 2023 at 01:24:51PM +, Jonathan Cameron wrote:
> 
> > diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
> > index e59ba22387..6893f54e28 100644
> > --- a/tests/qtest/cxl-test.c
> > +++ b/tests/qtest/cxl-test.c
> > @@ -40,32 +40,46 @@
> >"-device cxl-rp,id=rp2,bus=cxl.1,chassis=0,slot=2 " \
> >"-device cxl-rp,id=rp3,bus=cxl.1,chassis=0,slot=3 "
> >  
> > -#define QEMU_T3D \
> > +#define QEMU_T3D_DEPRECATED \
> >"-object memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
> > -  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "\
> > +  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
> >"-device cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 "
> >  
> > +#define QEMU_T3D_PMEM \
> > +  "-object memory-backend-file,id=m0,mem-path=%s,size=256M " \
> > +  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " \
> > +  "-device cxl-type3,bus=rp0,persistent-memdev=cxl-m0,lsa=lsa0,id=pmem0 "
> naming mismatch. I've fixed up with mem0 as the device name. 
> The naming isn't totally consistent so I may tweak this some more before
> sending for upstream.
> 
>

ack



Re: [RFC v4 2/3] tests/qtest/cxl-test: whitespace, line ending cleanup

2023-01-30 Thread Gregory Price
On Mon, Jan 30, 2023 at 01:11:50PM +, Jonathan Cameron wrote:
> On Thu, 5 Jan 2023 14:38:07 +
> Jonathan Cameron  wrote:
> 
> > On Mon, 28 Nov 2022 10:01:56 -0500
> > Gregory Price  wrote:
> > 
> > > Defines are starting to exceed line length limits, align them for
> > > cleanliness before making modifications.
> > > 
> > > Signed-off-by: Gregory Price   
> > 
> > Hi Gregory,
> > 
> > I was just reordering my tree and noticed that you've only
> > gone with 2 space indent.  Given 4 spaces is the convention in QEMU
> > for other uses, I've switched my local copy of this over to 4 spaces.
> > 
> > Note there was also a single inconsistent 1 space indent - see below.
> > 
> > Jonathan
> > 
> > > 
> > > ---
> > >  tests/qtest/cxl-test.c | 99 +++---
> > >  1 file changed, 54 insertions(+), 45 deletions(-)
> > > 
> > > diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
> > > index c54f18e76b..e59ba22387 100644
> > > --- a/tests/qtest/cxl-test.c
> > > +++ b/tests/qtest/cxl-test.c
> > > @@ -8,55 +8,64 @@
> > >  #include "qemu/osdep.h"
> > >  #include "libqtest-single.h"
> > >  
> > > -#define QEMU_PXB_CMD "-machine q35,cxl=on " \
> > > - "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "  \
> > > - "-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G "
> > > -
> > > -#define QEMU_2PXB_CMD "-machine q35,cxl=on "\
> > > -  "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "  \
> > > -  "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 " \
> > > -  "-M 
> > > cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=4G "
> > > -
> > > -#define QEMU_VIRT_2PXB_CMD "-machine virt,cxl=on "  \
> > > -  "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "  \
> > > -  "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 "  \
> > > -  "-M 
> > > cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=4G "
> > > -
> > > -#define QEMU_RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 "
> > > +#define QEMU_PXB_CMD \
> > > +  "-machine q35,cxl=on " \
> > > +  "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 " \
> > > +  "-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G "
> > > +
> > > +#define QEMU_2PXB_CMD \
> > > +  "-machine q35,cxl=on " \
> > > +  "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 " \
> > > +  "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 " \
> > > + "- M 
> > > cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=4G "  
> > This one only has one space.
> It also has a space after the - that I somehow missed. Fixed up in the 
> version I'm
> carrying. Will push out a new tree once I've caught up with some other 
> pending items.

ack



Re: cxl nvdimm Potential probe ordering issues.

2023-01-23 Thread Gregory Price
On Fri, Jan 20, 2023 at 02:41:05PM -0800, Dan Williams wrote:
> 
> Which mode are you referring?
> 
> The next steps for the kernel enabling relevant to this thread are:
> 
> * ram region discovery (platform firmware or kexec established)
> * ram region creation
> * pmem region discovery (from labels)

On some BIOSes there is an option for "legacy cxl" which produces the
nvdimm-esque behavior described above.  After further investigation,
supporting this would require a bigger lift than I expected, since the
existing emulated CXL devices don't carry any kind of node assignment
and we'd have to plumb that through.

Not worth it, better off working towards the new stuff.



Re: cxl nvdimm Potential probe ordering issues.

2023-01-20 Thread Gregory Price
On Fri, Jan 20, 2023 at 09:38:13AM -0800, Dan Williams wrote:
> As it stands currently that dax device and the cxl device are not
> related since a default dax-device is loaded just based on the presence
> of an EFI_MEMORY_SP address range in the address map. With the new ram
> enabling that default device will be elided and CXL will register a
> dax-device parented by a cxl region.
> 
> >  - The memory *does not* auto-online, instead the dax device can be
> >onlined as system-ram *manually* via ndctl and friends
> 
> That *manually* part is the problem that needs distro help to solve. It
> should be the case that by default all Linux distributions auto-online
> all dax-devices. If that happens to online memory that is too slow for
> general use, or too high-performance / precious for general purpose use
> then the administrator can set policy after the fact. Unfortunately user
> policy can not be applied if these memory ranges were onlined by the
> kernel at boot , so that's why the kernel policy defaults to not-online.
> 
> In other words, there is no guarantee that memory that was assigned to
> the general purpose pool at boot can be removed. The only guaranteed
> behavior is to never give the memory to the core kernel in the first
> instance and always let user policy route the memory.
> 
> > 3) The code creates an nvdimm_bridge IFF a CFMW is defined - regardless
> >of the type-3 device configuration (pmem-only or vmem-only)
> 
> Correct, the top-level bus code (cxl_acpi) and the endpoint code
> (cxl_mem, cxl_port) need to handshake before establishing regions. For
> pmem regions the platform needs to claim the availability of a pmem
> capable CXL window.
> 
> > 4) As you can see above, multiple decoders are registered.  I'm not sure
> >if that's correct or not, but it does seem odd given there's only one
> >  cxl type-3 device.  Odd that decoder0.0 shows up when CFMW is there,
> >  but not when it isn't.
> 
> CXL windows are modeled as decoders hanging off the CXL root device
> (ACPI0017 on ACPI based platforms). An endpoint decoder can then map a
> selection of that window.
> 
> > Don't know why I haven't thought of this until now, but is the CFMW code
> > reporting something odd about what's behind it?  Is it assuming the
> > devices are pmem?
> 
> No, the cxl_acpi code is just advertising platform decode possibilities
> independent of what devices show up. Think of this like the PCI MMIO
> space that gets allocated to a root bridge at the beginning of time.
> That space may or may not get consumed based on what devices show up
> downstream.

Thank you for the explanation Dan, and thank you for your patience
@JCameron.  I'm fairly sure I grok it now.

Summarizing to make sure: the cxl driver is providing what would be the
CXL.io (control) path, and the CXL.mem path is basically being simulated
by what otherwise would be a traditional PCI memory region. This explains
why turning off Legacy mode drops the dax devices, and why the topology
looks strange - the devices are basically attached in 2 different ways.

Might there be interest from the QEMU community to implement this
legacy-style setup in the short term, in an effort to test the
control path of type-3 devices while we wait for the kernel to catch up?

Or should we forget this mode ever existed and just barrel forward
with HDM decoders and writing the kernel code to hook up the underlying
devices in drivers/cxl?



Re: cxl nvdimm Potential probe ordering issues.

2023-01-19 Thread Gregory Price
On Thu, Jan 19, 2023 at 04:17:11PM +, Jonathan Cameron wrote:
> 
> Whilst I still have no idea if this is the same problem, I have identified
> what goes wrong if there is a module probe ordering issue.
> https://elixir.bootlin.com/linux/v6.2-rc4/source/drivers/cxl/core/pmem.c#L306
> 
>   /*
>    * The two actions below arrange for @cxl_nvd to be deleted when either
>    * the top-level PMEM bridge goes down, or the endpoint device goes
>    * through ->remove().
>    */
>   device_lock(&cxl_nvb->dev);
>   if (cxl_nvb->dev.driver)
>           rc = devm_add_action_or_reset(&cxl_nvb->dev, cxl_nvd_unregister,
>                                         cxl_nvd);
>   else
> // bridge driver not loaded, so we hit this path.
>           rc = -ENXIO;
>   device_unlock(&cxl_nvb->dev);
> 
>   if (rc)
> /// and this one
>           goto err_alloc;
> 
>   /* @cxlmd carries a reference on @cxl_nvb until cxlmd_release_nvdimm */
>   return devm_add_action_or_reset(&cxlmd->dev, cxlmd_release_nvdimm, cxlmd);
> 
> err:
>   put_device(dev);
> err_alloc:
>   cxlmd->cxl_nvb = NULL;
>   cxlmd->cxl_nvd = NULL;
>   put_device(&cxl_nvb->dev);
> // whilst we scrub the pointers we don't actually get rid of the
> // cxl_nvd that we registered.  Hence later load of the driver tries to
> // attach to that and boom because we've scrubbed these pointers here.
> // A quick hack is to just call device_del(&cxl_nvd->dev) if rc = -ENXIO here.
> // There may well be a races though
>   return rc;
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_add_nvdimm, CXL);
> 
> 
> Of course this "fix" just stops things blowing up, it doesn't leave things
> in a remotely useful state.  If it's triggered because someone
> is messing with the load order that's fine.  If the same issue
> is occurring for Gregory, not so much. 
> 
> Jonathan
> 
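
For what it's worth, spelling that quick hack out as a diff (illustrative
only - this is just my reading of your suggestion above, untested and not
something in the tree):

diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c
--- a/drivers/cxl/core/pmem.c
+++ b/drivers/cxl/core/pmem.c
@@ ... @@ int devm_cxl_add_nvdimm(struct cxl_memdev *cxlmd)
 err:
 	put_device(dev);
 err_alloc:
+	/*
+	 * Quick hack only: cxl_nvd was already device_add()'d, so tear it
+	 * down when the bridge driver wasn't loaded, otherwise a later
+	 * cxl_pmem load tries to attach to the stale device and blows up.
+	 */
+	if (rc == -ENXIO)
+		device_del(&cxl_nvd->dev);
 	cxlmd->cxl_nvb = NULL;
 	cxlmd->cxl_nvd = NULL;
 	put_device(&cxl_nvb->dev);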

mild hint in the devm_cxl_add_nvdimm_bridge path

drivers/cxl/acpi.c

static int cxl_acpi_probe(struct platform_device *pdev)
{
... snip ...
  if (IS_ENABLED(CONFIG_CXL_PMEM))
    rc = device_for_each_child(&root_port->dev, root_port,
                               add_root_nvdimm_bridge);
  if (rc < 0)
    return rc;

  /* In case PCI is scanned before ACPI re-trigger memdev attach */
  cxl_bus_rescan();
  return 0;
}


if PCI is presently written in a way that it's expecting nvdimm_bridge
to be present (via acpi_probe), then clearly this would break.

From the other discussion here... that seems to be the issue?  If that's
an issue, I also imagine there are other parts that may be subject to
the same problem.


static int cxl_pmem_region_probe(struct device *dev)
{
  struct nd_mapping_desc mappings[CXL_DECODER_MAX_INTERLEAVE];
  struct cxl_pmem_region *cxlr_pmem = to_cxl_pmem_region(dev);
  struct cxl_region *cxlr = cxlr_pmem->cxlr;
  struct cxl_nvdimm_bridge *cxl_nvb = cxlr->cxl_nvb;


this may be unreachable due to prior stack traces, but you get the
point.

Reiterating my confusion a bit: I don't have an nvdimm, why am I getting
an nvdimm_bridge?  The reason it no longer appears to trigger on my
memexp example is because it doesn't go down this path:

static int cxl_mem_probe(struct device *dev)
{
... snip ...

  // resource size is 0 here due to type3dev->persistent_capacity=0
  if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM)) {
rc = devm_cxl_add_nvdimm(cxlmd);
if (rc == -ENODEV)
  dev_info(dev, "PMEM disabled by platform\n");
else
  return rc;
  }
... snip ...
}

This seems like more than an ordering issue.



Re: cxl nvdimm Potential probe ordering issues.

2023-01-19 Thread Gregory Price
On Thu, Jan 19, 2023 at 03:04:49PM +, Jonathan Cameron wrote:
> Gregory, would you mind checking if
> cxl_nvb is NULL here...
> https://elixir.bootlin.com/linux/v6.2-rc4/source/drivers/cxl/pmem.c#L67
> (printk before it is used should work).
> 
> Might also be worth checking cxl_nvd and cxl_ds
> but my guess is cxl_nvb is our problem (it is when I deliberate change
> the load order).
> 
> Jonathan
> 

This is exactly the issue.  cxl_nvb is null, the rest appear fine.
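
For the record, the check was nothing fancier than a dev_info() right
after the pointers are set up at the top of cxl_nvdimm_probe() - roughly
the following, purely for debug:

  /* debug only: print the pointers before the first dereference */
  dev_info(dev, "cxl_nvd=%p cxlmd=%p cxl_nvb=%p\n", cxl_nvd, cxlmd, cxl_nvb);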

Also, note that, weirdly, the non-volatile bridge shows up when launching
this in volatile mode, but no stack trace appears.

¯\_(ツ)_/¯

After spending way too much time tracing through the current cxl driver
code, I have only really determined that:

1) The code is very pmem oriented, and it's unclear to me how the driver
   as-is differentiates a persistent device from a volatile device. That
   code path still completely escapes me.  The only differentiating code
   I see is in the memdev probe path that creates mem#/pmem and mem#/ram

2) The code successfully manages to probe, enable, and mount a REAL device
   - cxl memdev appears (/sys/bus/cxl/devices/mem0)
   - a dax device appears (/sys/bus/dax/devices/)
     This happens at boot, which I assume must be BIOS related
   - The memory *does not* auto-online, instead the dax device can be
     onlined as system-ram *manually* via ndctl and friends

3) The code creates an nvdimm_bridge IFF a CFMW is defined - regardless
   of the type-3 device configuration (pmem-only or vmem-only)

   # CFMW defined
   [root@fedora ~]# ls /sys/bus/cxl/devices/
   decoder0.0  decoder2.0  mem0            port1
   decoder1.0  endpoint2   nvdimm-bridge0  root0

   # CFMW not defined
   [root@fedora ~]# ls /sys/bus/cxl/devices/
   decoder1.0  decoder2.0  endpoint2  mem0  port1  root0

4) As you can see above, multiple decoders are registered.  I'm not sure
   if that's correct or not, but it does seem odd given there's only one
   cxl type-3 device.  Odd that decoder0.0 shows up when CFMW is there,
   but not when it isn't.

   Note: All these tests have two root ports:
   -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
   -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
   -device cxl-rp,id=rp1,bus=cxl.0,chassis=0,port=1,slot=1 \


Don't know why I haven't thought of this until now, but is the CFMW code
reporting something odd about what's behind it?  Is it assuming the
devices are pmem?




Re: [RFC v4 3/3] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)

2023-01-19 Thread Gregory Price
On Thu, Jan 19, 2023 at 05:31:12PM +, Jonathan Cameron wrote:
> On Thu, 19 Jan 2023 12:15:45 -0500
> Gregory Price  wrote:
> 
> > Found a bug, not sure how we missed this, probably happened with rebasing
> > and some fixups. We're presently reporting the volatile region as
> > non-volatile, 1 line patch.
> > 
> > Jonathan do you want a separate patch shipped or would you rather just
> > apply a fixup to the commit in your current branch?
> I'll fix up as I'd only squash the patch in anyway.
> 
> If tomorrow is slightly less crazy busy than today I'll push out a new
> tree with this and the pass through decoders support RFC
> (I'll post that to the lists as well)
> 
> Jonathan
> 

Aye aye! 

One other change to consider: the .EFI_memory_type_attr right now is set
to RESERVED.  Should this field actually be EFI_MEMORY_SP? Or is there a
reason for explicitly Reserved?

0: EfiConventionalMemory
1: EfiConventionalMemory w/ EFI_MEMORY_SP Attribute
2: EfiReservedMemoryType

I remember reading a while back that the intended marking is
special-purpose rather than reserved, but I'm hitting my wall on
knowledge.

Dan may know, but I couldn't divine the correct setting from the kernel
(obviously this is EFI level code, so I didn't expect to).
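
Just to make the intent concrete, the change I'm suggesting boils down to
picking the attribute from the persistence of the region - a sketch only,
the helper name below is made up and nothing like it exists in the tree:

/*
 * Illustrative only: select the DSEMTS EFI memory type attribute for a
 * CDAT entry based on whether the backing region is persistent.
 *   0: EfiConventionalMemory
 *   1: EfiConventionalMemory w/ EFI_MEMORY_SP attribute
 *   2: EfiReservedMemoryType
 */
static uint8_t cdat_efi_memory_type_attr(bool is_pmem)
{
    return is_pmem ? 2 : 1;
}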



One other thing that I am noticing:  When a CFMW is registered, an
nvdimm_bridge device becomes present in /sys/bus/cxl/devices -
regardless of whether the type3 device is persistent or volatile.

This makes me believe we aren't setting something up correctly in the
CDAT or something, but other than the below changes everything else
looks correct.  This could imply a kernel driver bug, but I've been
validating against real hardware and this behavior is not seen, even
with functional CXL memory expander devices (which the BIOS obviously
has a hand in setting up).

I started validating the DVSECs, but likewise I didn't see any
indication of error either.



diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 919cdf141e..4daa0cf0f6 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -132,8 +132,9 @@ static int ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
 .length = sizeof(*dsemts),
 },
 .DSMAS_handle = dsmad_handle,
-/* Reserved - the non volatile from DSMAS matters */
-.EFI_memory_type_attr = 2,
+/* Reserved if NV - the non volatile from DSMAS matters
+ * otherwise label this EFI_MEMORY_SP (special purpose) */
+.EFI_memory_type_attr = is_pmem ? 2 : 1,
 .DPA_offset = 0,
 .DPA_length = int128_get64(mr->size),
 };
@@ -187,7 +188,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
 /* Now fill them in */
 if (volatile_mr) {
 rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
-   true, 0);
+   false, 0);
 if (rc < 0) {
 return rc;
 }



Re: [RFC v4 3/3] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent)

2023-01-19 Thread Gregory Price


Found a bug, not sure how we missed this, probably happened with rebasing
and some fixups. We're presently reporting the volatile region as
non-volatile, 1 line patch.

Jonathan do you want a separate patch shipped or would you rather just
apply a fixup to the commit in your current branch?

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 919cdf141e..1c5168e2e4 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -187,7 +187,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
 /* Now fill them in */
 if (volatile_mr) {
 rc = ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
-   true, 0);
+   false, 0);
 if (rc < 0) {
 return rc;
 }



Re: [PATCH 0/8] hw/cxl: CXL emulation cleanups and minor fixes for upstream

2023-01-19 Thread Gregory Price
On Thu, Jan 19, 2023 at 06:48:11AM -0500, Michael S. Tsirkin wrote:
> > 
> > I clearly need to run some more rests as not seen this issue (and I've
> > had a couple of reports of it now).  I guess I never tend to be running
> > a completely clean tree on either side + testing is mostly on arm64
> > (though I doubt that matters). It's not the pass through decoder issues as
> > that is visible only when bringing up a region.
> 
> OK so I should drop this from my queue for now?
> 

Even with this trace, I'm fairly confident that the DOE line + this have
fixed other issues. I'm seeing less general instability on the x86 than
I did before, and when loading in non-volatile mode I don't see any
stability issues.



Re: [PATCH 0/8] hw/cxl: CXL emulation cleanups and minor fixes for upstream

2023-01-19 Thread Gregory Price
On Thu, Jan 19, 2023 at 10:19:46AM +, Jonathan Cameron wrote:
> Even if everything else worked, it will currently fail because of the
> issue with pass through decoders.
> (Kernel assumes always pass through for single rp, qemu assumes never
>  pass through - both are valid under spec).
> Add a second rp for now.  I should be able to post some patches
> to work around that shortly. 
> 

*facepalm* I knew this and yet I still kept old configs around.

I'll do some more testing today with 2 root ports.



Re: [PATCH 0/8] hw/cxl: CXL emulation cleanups and minor fixes for upstream

2023-01-18 Thread Gregory Price
I apparently forgot an intro lol

I tested the DOE linux branch with the 2023-1-11 QEMU branch in
volatile, non-volatile, and "legacy" (pre-my-patch) non-volatile modes.

1) *In volatile mode, there are no stack traces present (during boot)*

On Wed, Jan 18, 2023 at 02:22:08PM -0500, Gregory Price wrote:
> 
> 1) No stack traces present
> 2) Device usage appears to work, but cxl-cli fails to create a region, I
> haven't checked why yet (also tried ndctl-75, same results)
> 3) There seems to be some other regression with the cxl_pmem_init
> routine, because I get a stack trace in this setup regardless of whether
> I apply the type-3 device commit.
> 
> 
> All tests below with the previously posted DOE linux branch.
> Base QEMU branch was Jonathan's 2023-1-11
> 
> 
> DOE Branch - 2023-1-11 (HEAD) (all commits)
> 
> QEMU Config:
> sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
> -drive 
> file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
> -m 3G,slots=4,maxmem=8G \
> -smp 4 \
> -machine type=q35,accel=kvm,cxl=on \
> -enable-kvm \
> -nographic \
> -object memory-backend-ram,id=mem0,size=1G,share=on \
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \
> -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
> -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
> 
> Result:  This worked, but cxl-cli could not create a region (will look
> into this further later).
> 
> 
> 
> 
> When running with a persistent memory configuration, I'm seeing a
> kernel stack trace on cxl_pmem_init
> 
> Config:
> sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
> -drive 
> file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
> -m 3G,slots=4,maxmem=4G \
> -smp 4 \
> -machine type=q35,accel=kvm,cxl=on \
> -enable-kvm \
> -nographic \
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> -device cxl-rp,port=0,id=rp0,bus=cxl.0,chassis=0,slot=0 \
> -object memory-backend-file,id=cxl-mem0,mem-path=/tmp/mem0,size=1G \
> -object memory-backend-file,id=cxl-lsa0,mem-path=/tmp/lsa0,size=1G \
> -device 
> cxl-type3,bus=rp0,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
> -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
> 
> 
> [   62.167518] BUG: kernel NULL pointer dereference, address: 04c0
> [   62.185069] #PF: supervisor read access in kernel mode
> [   62.198502] #PF: error_code(0x) - not-present page
> [   62.211019] PGD 0 P4D 0
> [   62.220521] Oops:  [#1] PREEMPT SMP PTI
> [   62.233457] CPU: 3 PID: 558 Comm: systemd-udevd Not tainted 6.2.0-rc1+ #1
> [   62.252886] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
> [   62.258432] Adding 2939900k swap on /dev/zram0.  Priority:100 extents:1 
> across:2939900k SSDscFS
> [   62.285513] RIP: 0010:cxl_nvdimm_probe+0x8d/0x130 [cxl_pmem]
> [   62.285529] Code: 85 c0 0f 85 90 00 00 00 f0 80 0c 24 40 f0 80 4c 24 08 10 
> f0 80 4c 24 08 20 f0 80 4c 24 08 40 49 8d 84 24 b8 04 00 00 4c 89 0
> [   62.285531] RSP: 0018:acff0141fc38 EFLAGS: 00010202
> [   62.285534] RAX: 97a8a37b84b8 RBX: 97a8a37b8000 RCX: 
> 
> [   62.285536] RDX: 0001 RSI: 97a8a37b8000 RDI: 
> 
> [   62.285537] RBP: 97a8a37b8000 R08: 0001 R09: 
> 0001
> [   62.285538] R10: 0001 R11:  R12: 
> 97a8a37b8000
> [   62.285539] R13: 97a982c3dc28 R14:  R15: 
> 
> [   62.285541] FS:  7f2619829580() GS:97a9bca0() 
> knlGS:
> [   62.285542] CS:  0010 DS:  ES:  CR0: 80050033
> [   62.285544] CR2: 04c0 CR3: 0001056a8000 CR4: 
> 06e0
> [   62.285653] Call Trace:
> [   62.285656]  
> [   62.285660]  cxl_bus_probe+0x17/0x50
> [   62.285691]  really_probe+0xde/0x380
> [   62.285695]  ? pm_runtime_barrier+0x50/0x90
> [   62.285700]  __driver_probe_device+0x78/0x170
> [   62.285846]  driver_probe_device+0x1f/0x90
> [   62.285850]  __driver_attach+0xd2/0x1c0
> [   62.285853]  ? __pfx___driver_attach+0x10/0x10
> [   62.285856]  bus_for_each_dev+0x76/0xa0
> [   62.285860]  bus_add_driver+0x1b1/0x200
> [   62.285863]  driver_register+0x89/0xe0
> [   62.285868]  ? __pfx_init_module+0x10/0x10 [cxl_pmem]
> [   62.285874]  cxl_pmem_init+0x50/0xff0 [cxl_pmem]
> [   62.285880]  do_one_initcall+0x6e/0x330
> [   62.285888]  do_init_module+0x4a/0x200
> [   62.285892]  __do_sys_finit_module+0x93/0xf0
> [   62.285899]  do_syscall_64+0x5b/0x80
> [   62.285904]  ? do_syscall_64+0x67/0x80
> [   62.285906]  ? asm

Re: [PATCH 0/8] hw/cxl: CXL emulation cleanups and minor fixes for upstream

2023-01-18 Thread Gregory Price


1) No stack traces present
2) Device usage appears to work, but cxl-cli fails to create a region, I
haven't checked why yet (also tried ndctl-75, same results)
3) There seems to be some other regression with the cxl_pmem_init
routine, because I get a stack trace in this setup regardless of whether
I apply the type-3 device commit.


All tests below with the previously posted DOE linux branch.
Base QEMU branch was Jonathan's 2023-1-11


DOE Branch - 2023-1-11 (HEAD) (all commits)

QEMU Config:
sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
-drive 
file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
-m 3G,slots=4,maxmem=8G \
-smp 4 \
-machine type=q35,accel=kvm,cxl=on \
-enable-kvm \
-nographic \
-object memory-backend-ram,id=mem0,size=1G,share=on \
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \
-device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G

Result:  This worked, but cxl-cli could not create a region (will look
into this further later).




When running with a persistent memory configuration, I'm seeing a
kernel stack trace on cxl_pmem_init

Config:
sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
-drive 
file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
-m 3G,slots=4,maxmem=4G \
-smp 4 \
-machine type=q35,accel=kvm,cxl=on \
-enable-kvm \
-nographic \
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
-device cxl-rp,port=0,id=rp0,bus=cxl.0,chassis=0,slot=0 \
-object memory-backend-file,id=cxl-mem0,mem-path=/tmp/mem0,size=1G \
-object memory-backend-file,id=cxl-lsa0,mem-path=/tmp/lsa0,size=1G \
-device cxl-type3,bus=rp0,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \
-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G


[   62.167518] BUG: kernel NULL pointer dereference, address: 04c0
[   62.185069] #PF: supervisor read access in kernel mode
[   62.198502] #PF: error_code(0x) - not-present page
[   62.211019] PGD 0 P4D 0
[   62.220521] Oops:  [#1] PREEMPT SMP PTI
[   62.233457] CPU: 3 PID: 558 Comm: systemd-udevd Not tainted 6.2.0-rc1+ #1
[   62.252886] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[   62.258432] Adding 2939900k swap on /dev/zram0.  Priority:100 extents:1 
across:2939900k SSDscFS
[   62.285513] RIP: 0010:cxl_nvdimm_probe+0x8d/0x130 [cxl_pmem]
[   62.285529] Code: 85 c0 0f 85 90 00 00 00 f0 80 0c 24 40 f0 80 4c 24 08 10 
f0 80 4c 24 08 20 f0 80 4c 24 08 40 49 8d 84 24 b8 04 00 00 4c 89 0
[   62.285531] RSP: 0018:acff0141fc38 EFLAGS: 00010202
[   62.285534] RAX: 97a8a37b84b8 RBX: 97a8a37b8000 RCX: 
[   62.285536] RDX: 0001 RSI: 97a8a37b8000 RDI: 
[   62.285537] RBP: 97a8a37b8000 R08: 0001 R09: 0001
[   62.285538] R10: 0001 R11:  R12: 97a8a37b8000
[   62.285539] R13: 97a982c3dc28 R14:  R15: 
[   62.285541] FS:  7f2619829580() GS:97a9bca0() 
knlGS:
[   62.285542] CS:  0010 DS:  ES:  CR0: 80050033
[   62.285544] CR2: 04c0 CR3: 0001056a8000 CR4: 06e0
[   62.285653] Call Trace:
[   62.285656]  
[   62.285660]  cxl_bus_probe+0x17/0x50
[   62.285691]  really_probe+0xde/0x380
[   62.285695]  ? pm_runtime_barrier+0x50/0x90
[   62.285700]  __driver_probe_device+0x78/0x170
[   62.285846]  driver_probe_device+0x1f/0x90
[   62.285850]  __driver_attach+0xd2/0x1c0
[   62.285853]  ? __pfx___driver_attach+0x10/0x10
[   62.285856]  bus_for_each_dev+0x76/0xa0
[   62.285860]  bus_add_driver+0x1b1/0x200
[   62.285863]  driver_register+0x89/0xe0
[   62.285868]  ? __pfx_init_module+0x10/0x10 [cxl_pmem]
[   62.285874]  cxl_pmem_init+0x50/0xff0 [cxl_pmem]
[   62.285880]  do_one_initcall+0x6e/0x330
[   62.285888]  do_init_module+0x4a/0x200
[   62.285892]  __do_sys_finit_module+0x93/0xf0
[   62.285899]  do_syscall_64+0x5b/0x80
[   62.285904]  ? do_syscall_64+0x67/0x80
[   62.285906]  ? asm_exc_page_fault+0x22/0x30
[   62.285910]  ? lockdep_hardirqs_on+0x7d/0x100
[   62.285914]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   62.285917] RIP: 0033:0x7f2619b0afbd
[   62.285920] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 8
[   62.285922] RSP: 002b:7ffcc516bf58 EFLAGS: 0246 ORIG_RAX: 
0139
[   62.285924] RAX: ffda RBX: 5557c0dcaa60 RCX: 7f2619b0afbd
[   62.285925] RDX:  RSI: 7f261a18743c RDI: 0006
[   62.285926] RBP: 7f261a18743c R08:  R09: 7f261a17bb52
[   62.285927] R10: 0006 R11: 0246 R12: 0002
[   62.285928] R13: 5557c0dbbce0 R14:  R15: 5557c0dc18a0
[   62.285932]  
[   62.285933] Modules linked in: 

Re: [PATCH 0/8] hw/cxl: CXL emulation cleanups and minor fixes for upstream

2023-01-13 Thread Gregory Price
On Fri, Jan 13, 2023 at 09:12:13AM +, Jonathan Cameron wrote:
> 
> Just to check, are these different from the on stack problem you reported
> previously?  Doesn't look like the fix for that has made it upstream yet.
> 
> What kernel are you running?
> 
> 

The prior issue I saw was related to the CXL Fixed Memory Window having
an e820 region registered during machine initialization.  That fix is
upstream.

On the 2023-1-11 branch it is commit 2486dd045794d65598fbca9fd1224c27b9732dce

This one appears when registering any kind of type-3 device, during
boot.


