Re: [PATCH] mem/cxl-type3: Add sn option to provide serial number for PCI ecap

2022-09-27 Thread Ben Widawsky

On 22-09-23 17:18:35, Jonathan Cameron wrote:
> The Device Serial Number Extended Capability PCI r6.0 sec 7.9.3
> provides a standard way to provide a device serial number as
> an IEEE defined 64-bit extended unique identifier EUI-64.
> 
> CXL 2.0 section 8.1.12.2 Memory Device PCIe Capabilities and
> Extended Capabilities requires this to be used to uniquely
> identify CXL memory devices.
> 
> Signed-off-by: Jonathan Cameron 

Reviewed-by: Ben Widawsky 

> ---
> 
> This is the missing element to be able to use the Linux kernel
> support for PMEM region creation.  Without this you can create
> a region, but not remount it after reboot (as the label
> is not valid).
> 
>  hw/mem/cxl_type3.c  | 14 +-
>  include/hw/cxl/cxl_device.h |  1 +
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 3bf2869573..e0c1535b73 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -14,6 +14,12 @@
>  #include "sysemu/hostmem.h"
>  #include "hw/cxl/cxl.h"
>  
> +/*
> + * Null value of all Fs suggested by IEEE RA guidelines for use of
> + * EU, OUI and CID
> + */
> +#define UI64_NULL ~(0ULL)
> +
>  static void build_dvsecs(CXLType3Dev *ct3d)
>  {
>  CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
> @@ -149,7 +155,12 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
>  pci_config_set_class(pci_conf, PCI_CLASS_MEMORY_CXL);
>  
>  pcie_endpoint_cap_init(pci_dev, 0x80);
> -cxl_cstate->dvsec_offset = 0x100;
> +if (ct3d->sn != UI64_NULL) {
> +pcie_dev_ser_num_init(pci_dev, 0x100, ct3d->sn);
> +cxl_cstate->dvsec_offset = 0x100 + 0x0c;
> +} else {
> +cxl_cstate->dvsec_offset = 0x100;
> +}

Perhaps just always make it 0x10c to keep it simple and debuggable?

>  
>  ct3d->cxl_cstate.pdev = pci_dev;
>  build_dvsecs(ct3d);
> @@ -275,6 +286,7 @@ static Property ct3_props[] = {
>   HostMemoryBackend *),
>  DEFINE_PROP_LINK("lsa", CXLType3Dev, lsa, TYPE_MEMORY_BACKEND,
>   HostMemoryBackend *),
> +DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 1e141b6621..e4d221cdb3 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -237,6 +237,7 @@ struct CXLType3Dev {
>  /* Properties */
>  HostMemoryBackend *hostmem;
>  HostMemoryBackend *lsa;
> +uint64_t sn;
>  
>  /* State */
>  AddressSpace hostmem_as;
> -- 
> 2.32.0
>
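
For readers following the offset discussion above, here is a minimal sketch of the two placements being compared. It is not the QEMU code: EXT_CAP_BASE, DSN_CAP_SIZE and the function names are made up for illustration. Only UI64_NULL, the 0x100 base and the 0x0c step come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones value the patch uses to mean "no serial number supplied". */
#define UI64_NULL (~0ULL)

/* Hypothetical names: start of PCIe extended config space, and the size of
 * the Device Serial Number capability (4-byte header + 8-byte EUI-64). */
#define EXT_CAP_BASE 0x100u
#define DSN_CAP_SIZE 0x0cu

/* Variant in the patch: the DVSECs move only when a serial number is set. */
static uint16_t dvsec_offset_conditional(uint64_t sn)
{
    return sn != UI64_NULL ? EXT_CAP_BASE + DSN_CAP_SIZE : EXT_CAP_BASE;
}

/* Variant suggested in review: always leave room for the capability. */
static uint16_t dvsec_offset_fixed(void)
{
    return EXT_CAP_BASE + DSN_CAP_SIZE;
}

/* Small self-check showing where the DVSECs land in each case. */
static void check_dvsec_offsets(void)
{
    assert(dvsec_offset_conditional(UI64_NULL) == 0x100);
    assert(dvsec_offset_conditional(0x1234) == 0x10c);
    assert(dvsec_offset_fixed() == 0x10c);
}
```

With the fixed variant the DVSEC offsets are the same whether or not "sn" is given, which is the debuggability argument made in the reply.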

[PATCH v2] MAINTAINERS: change Ben Widawsky's email address

2022-06-08 Thread Ben Widawsky via

ben.widaw...@intel.com will stop working on 2022-06-20, change it to my
personal email address.

Update .mailmap to handle previously authored commits.

Acked-by: Jonathan Cameron 
Signed-off-by: Ben Widawsky 

---
v2:
  Fix typo in commit message
  change author to b...@bwidawsk.net from @intel.com
  Swap mailmap direction (Jonathan)
---
 .mailmap| 1 +
 MAINTAINERS | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/.mailmap b/.mailmap
index 8c326709cfab..e92e268b9230 100644
--- a/.mailmap
+++ b/.mailmap
@@ -54,6 +54,7 @@ Aleksandar Rikalo  

 Aleksandar Rikalo  
 Alexander Graf  
 Anthony Liguori  Anthony Liguori 
+Ben Widawsky  
 Christian Borntraeger  
 Filip Bozuta  
 Frederic Konrad  
diff --git a/MAINTAINERS b/MAINTAINERS
index 5580a36b68e1..89da5755116b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2574,7 +2574,7 @@ F: qapi/transaction.json
 T: git https://repo.or.cz/qemu/armbru.git block-next
 
 Compute Express Link
-M: Ben Widawsky 
+M: Ben Widawsky 
 M: Jonathan Cameron 
 S: Supported
 F: hw/cxl/

base-commit: 9b1f58854959c5a9bdb347e3e04c252ab7fc9ef5
-- 
2.36.1

Re: [PATCH] MAINTAINERS: change Ben Widawsky's email address

2022-06-07 Thread Ben Widawsky via

On 22-06-07 17:50:35, Jonathan Cameron wrote:
> On Tue, 7 Jun 2022 09:26:28 -0700
> Ben Widawsky  wrote:
> 
> > ben@widaw...@intel.com will stop working on 2022-06-20, change it to my
> > personal email address.
> > 
> > Update .mailmap to handle previously authored commits.
> > 
> > Signed-off-by: Ben Widawsky 
> 
> With below question addressed,
> Acked-by: Jonathan Cameron 
> 
> Probably cc Michael Tsirkin as well given he picked up the
> patch that introduced this maintainers entry recently.

Okay. Luckily I had a typo in the commit message anyway, so it needed a respin.

> 
> Thanks,
> 
> Jonathan
> 
> 
> > ---
> >  .mailmap| 1 +
> >  MAINTAINERS | 2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/.mailmap b/.mailmap
> > index 8c326709cfab..0dec7b156999 100644
> > --- a/.mailmap
> > +++ b/.mailmap
> > @@ -54,6 +54,7 @@ Aleksandar Rikalo  
> > 
> >  Aleksandar Rikalo  
> > 
> >  Alexander Graf  
> >  Anthony Liguori  Anthony Liguori 
> > 
> > +Ben Widawsky  
> 
> > Is this backwards as you will (I think) want scripts to output
> > b...@bwidawsk.net as your canonical email address going forwards?

I guess? I simply read the comment to determine order. I should have gone with
what I knew rather than tried to figure out what was meant.

"# Next, replace old addresses by a more recent one."

> 
> >  Christian Borntraeger  
> >  Filip Bozuta  
> >  Frederic Konrad  
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 5580a36b68e1..89da5755116b 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2574,7 +2574,7 @@ F: qapi/transaction.json
> >  T: git https://repo.or.cz/qemu/armbru.git block-next
> >  
> >  Compute Express Link
> > -M: Ben Widawsky 
> > +M: Ben Widawsky 
> >  M: Jonathan Cameron 
> >  S: Supported
> >  F: hw/cxl/
> > 
> > base-commit: 9b1f58854959c5a9bdb347e3e04c252ab7fc9ef5
>

[PATCH] MAINTAINERS: change Ben Widawsky's email address

2022-06-07 Thread Ben Widawsky via

ben@widaw...@intel.com will stop working on 2022-06-20, change it to my
personal email address.

Update .mailmap to handle previously authored commits.

Signed-off-by: Ben Widawsky 
---
 .mailmap| 1 +
 MAINTAINERS | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/.mailmap b/.mailmap
index 8c326709cfab..0dec7b156999 100644
--- a/.mailmap
+++ b/.mailmap
@@ -54,6 +54,7 @@ Aleksandar Rikalo  

 Aleksandar Rikalo  
 Alexander Graf  
 Anthony Liguori  Anthony Liguori 
+Ben Widawsky  
 Christian Borntraeger  
 Filip Bozuta  
 Frederic Konrad  
diff --git a/MAINTAINERS b/MAINTAINERS
index 5580a36b68e1..89da5755116b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2574,7 +2574,7 @@ F: qapi/transaction.json
 T: git https://repo.or.cz/qemu/armbru.git block-next
 
 Compute Express Link
-M: Ben Widawsky 
+M: Ben Widawsky 
 M: Jonathan Cameron 
 S: Supported
 F: hw/cxl/

base-commit: 9b1f58854959c5a9bdb347e3e04c252ab7fc9ef5
-- 
2.36.1

Re: [PATCH v3] hw/cxl: Fix missing write mask for HDM decoder target list registers

2022-06-07 Thread Ben Widawsky

On 22-06-07 17:37:02, Jonathan Cameron wrote:
> On Tue, 7 Jun 2022 09:19:28 -0700
> Ben Widawsky  wrote:
> 
> > On 22-06-07 17:07:47, Jonathan Cameron wrote:
> > > Without being able to write these registers, no interleaving is possible.
> > > More refined checks of HDM register state on commit to follow.
> > > 
> > > Signed-off-by: Jonathan Cameron 
> > > ---
> > > v3: Actually pass the parameter to the call...
> > > v2: (Ben Widawsky)
> > > - Correctly set a tighter write mask for the endpoint devices where this
> > >   register has a different use.
> > >   
> > >  hw/cxl/cxl-component-utils.c | 11 +--
> > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
> > > index 7985c9bfca..2208284ee6 100644
> > > --- a/hw/cxl/cxl-component-utils.c
> > > +++ b/hw/cxl/cxl-component-utils.c
> > > @@ -154,7 +154,8 @@ static void ras_init_common(uint32_t *reg_state, 
> > > uint32_t *write_msk)
> > >  reg_state[R_CXL_RAS_ERR_CAP_CTRL] = 0x00;
> > >  }
> > >  
> > > -static void hdm_init_common(uint32_t *reg_state, uint32_t *write_msk)
> > > +static void hdm_init_common(uint32_t *reg_state, uint32_t *write_msk,
> > > +enum reg_type type)
> > >  {
> > >  int decoder_count = 1;
> > >  int i;
> > > @@ -174,6 +175,12 @@ static void hdm_init_common(uint32_t *reg_state, 
> > > uint32_t *write_msk)
> > >  write_msk[R_CXL_HDM_DECODER0_SIZE_LO + i * 0x20] = 0xf000;
> > >  write_msk[R_CXL_HDM_DECODER0_SIZE_HI + i * 0x20] = 0x;
> > >  write_msk[R_CXL_HDM_DECODER0_CTRL + i * 0x20] = 0x13ff;
> > > +if (type == CXL2_DEVICE) {
> > > > +write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_LO + i * 0x20] = 0xf000;
> > > > +} else {
> > > > +write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_LO + i * 0x20] = 0x;
> > > > +}
> > > > +write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_HI + i * 0x20] = 0x;
> > 
> > Should it be (type == CXL2_DEVICE || type == CXL2_TYPE3_DEVICE) ?
> 
> Good point, but also for consistency I think we need 
> type == CXL2_LOGICAL_DEVICE as well.

I was looking at this and I am not sure, but I defer to you.

> 
> We will only exercise the match to CXL2_TYPE3_DEVICE currently
> as we don't have any emulation for MLDs (and hence LD) or type 1/2 devices
> (CXL2_DEVICE).
> 
> I'll send a v4 out tomorrow.
> 

Sounds good, feel free to keep the r-b tag.

> > 
> > Otherwise,
> > Reviewed-by: Ben Widawsky 
> > 
> > >  }
> > >  }
> > >  
> > > @@ -239,7 +246,7 @@ void cxl_component_register_init_common(uint32_t 
> > > *reg_state, uint32_t *write_msk
> > >  }
> > >  
> > >  init_cap_reg(HDM, 5, 1);
> > > -hdm_init_common(reg_state, write_msk);
> > > +hdm_init_common(reg_state, write_msk, type);
> > >  
> > >  if (caps < 5) {
> > >  return;
> > > -- 
> > > 2.32.0
> > >   
>
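
To make the type check discussed above concrete, here is a small sketch of one way to group the endpoint-like register types. The three CXL2_* endpoint names come from the thread. The enum values, the CXL2_ROOT_PORT entry used for contrast, and the helper name are assumptions for illustration only, and whether CXL2_LOGICAL_DEVICE belongs in the set is the point the thread leaves to Jonathan.

```c
#include <assert.h>
#include <stdbool.h>

/* Names taken from the thread; the numeric values here are arbitrary. */
enum reg_type {
    CXL2_DEVICE,
    CXL2_TYPE3_DEVICE,
    CXL2_LOGICAL_DEVICE,
    CXL2_ROOT_PORT, /* assumed non-endpoint entry, included for contrast */
};

/* Hypothetical helper: true when the target list register should get the
 * tighter, endpoint-style write mask. */
static bool uses_endpoint_target_list_mask(enum reg_type type)
{
    switch (type) {
    case CXL2_DEVICE:
    case CXL2_TYPE3_DEVICE:
    case CXL2_LOGICAL_DEVICE:
        return true;
    default:
        return false;
    }
}

/* Self-check covering each case mentioned in the discussion. */
static void check_endpoint_types(void)
{
    assert(uses_endpoint_target_list_mask(CXL2_DEVICE));
    assert(uses_endpoint_target_list_mask(CXL2_TYPE3_DEVICE));
    assert(uses_endpoint_target_list_mask(CXL2_LOGICAL_DEVICE));
    assert(!uses_endpoint_target_list_mask(CXL2_ROOT_PORT));
}
```

A switch keeps the list of endpoint types in one place, so adding a type later is a one-line change.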

Re: [PATCH v3] hw/cxl: Fix missing write mask for HDM decoder target list registers

2022-06-07 Thread Ben Widawsky

On 22-06-07 17:07:47, Jonathan Cameron wrote:
> Without being able to write these registers, no interleaving is possible.
> More refined checks of HDM register state on commit to follow.
> 
> Signed-off-by: Jonathan Cameron 
> ---
> v3: Actually pass the parameter to the call...
> v2: (Ben Widawsky)
> - Correctly set a tighter write mask for the endpoint devices where this
>   register has a different use.
>   
>  hw/cxl/cxl-component-utils.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
> index 7985c9bfca..2208284ee6 100644
> --- a/hw/cxl/cxl-component-utils.c
> +++ b/hw/cxl/cxl-component-utils.c
> @@ -154,7 +154,8 @@ static void ras_init_common(uint32_t *reg_state, uint32_t *write_msk)
>  reg_state[R_CXL_RAS_ERR_CAP_CTRL] = 0x00;
>  }
>  
> -static void hdm_init_common(uint32_t *reg_state, uint32_t *write_msk)
> +static void hdm_init_common(uint32_t *reg_state, uint32_t *write_msk,
> +enum reg_type type)
>  {
>  int decoder_count = 1;
>  int i;
> @@ -174,6 +175,12 @@ static void hdm_init_common(uint32_t *reg_state, uint32_t *write_msk)
>  write_msk[R_CXL_HDM_DECODER0_SIZE_LO + i * 0x20] = 0xf000;
>  write_msk[R_CXL_HDM_DECODER0_SIZE_HI + i * 0x20] = 0x;
>  write_msk[R_CXL_HDM_DECODER0_CTRL + i * 0x20] = 0x13ff;
> +if (type == CXL2_DEVICE) {
> +write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_LO + i * 0x20] = 0xf000;
> +} else {
> +write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_LO + i * 0x20] = 0x;
> +}
> +write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_HI + i * 0x20] = 0x;

Should it be (type == CXL2_DEVICE || type == CXL2_TYPE3_DEVICE) ?

Otherwise,
Reviewed-by: Ben Widawsky 

>  }
>  }
>  
> @@ -239,7 +246,7 @@ void cxl_component_register_init_common(uint32_t *reg_state, uint32_t *write_msk
>  }
>  
>  init_cap_reg(HDM, 5, 1);
> -hdm_init_common(reg_state, write_msk);
> +hdm_init_common(reg_state, write_msk, type);
>  
>  if (caps < 5) {
>  return;
> -- 
> 2.32.0
>

Re: [PATCH] hw/cxl: Fix missing write mask for HDM decoder target list registers

2022-06-06 Thread Ben Widawsky

On 22-05-31 13:39:53, Jonathan Cameron wrote:
> Without being able to write these registers, no interleaving is possible.
> More refined checks of HDM register state on commit to follow.
> 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/cxl/cxl-component-utils.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
> index 7985c9bfca..993248b5c0 100644
> --- a/hw/cxl/cxl-component-utils.c
> +++ b/hw/cxl/cxl-component-utils.c
> @@ -174,6 +174,8 @@ static void hdm_init_common(uint32_t *reg_state, uint32_t *write_msk)
>  write_msk[R_CXL_HDM_DECODER0_SIZE_LO + i * 0x20] = 0xf000;
>  write_msk[R_CXL_HDM_DECODER0_SIZE_HI + i * 0x20] = 0x;
>  write_msk[R_CXL_HDM_DECODER0_CTRL + i * 0x20] = 0x13ff;
> +write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_LO + i * 0x20] = 0x;
> +write_msk[R_CXL_HDM_DECODER0_TARGET_LIST_HI + i * 0x20] = 0x;

I wonder if this should be 0. It will be weird for endpoints to have a skip
value of 0xff.

>  }
>  }
>  
> -- 
> 2.32.0
>
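
As background for why a missing write mask makes a register effectively read-only, here is a generic sketch of write-mask semantics. It is an illustration under the usual convention (mask bit set means writable), not a copy of how QEMU applies write_msk. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Apply a guest write to a 32-bit register, honouring its write mask:
 * bits set in mask take the new value, all other bits keep the old one. */
static uint32_t masked_write(uint32_t old, uint32_t value, uint32_t mask)
{
    return (old & ~mask) | (value & mask);
}

/* Self-check: a zero mask drops the write, a full mask accepts all of it. */
static void check_masked_write(void)
{
    assert(masked_write(0xAAAA5555u, 0xFFFFFFFFu, 0x00000000u) == 0xAAAA5555u);
    assert(masked_write(0x00000000u, 0x12345678u, 0xFFFFFFFFu) == 0x12345678u);
    assert(masked_write(0x0000FFFFu, 0xABCD0000u, 0xFF000000u) == 0xAB00FFFFu);
}
```

With a mask of zero, as the target list registers had before this patch, every guest write is dropped, so a decoder's target list could never be programmed.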

Re: [PATCH v2 0/8] hw/cxl: Move CXL emulation options and state to machines.

2022-06-06 Thread Ben Widawsky

On 22-06-01 17:42:27, Jonathan Cameron wrote:
> Changes since v1 (thanks to Paolo Bonzini)
> * Update 'description' of cxl-fmw as suggested to mention it's an array.
> * Add a wrapper cxl_hook_up_pxb_registers() to cxl-host.c as it'll be common
>   for all machines using CXL with PXB.
> 
> Run through the CI at:
> https://gitlab.com/jic23/qemu/-/pipelines/553257456
>  
> V1 Cover letter:
> 
> Currently only machine with CXL support upstream is i386/pc but arm/virt
> patches have been posted and once this is merged an updated series will
> follow. Switch support is queued behind this as well because they both
> include documentation updates.
> 
> Paolo Bonzini highlighted a couple of issues with the current CXL
> emulation code.
> 
> * Top level parameter rather than machine for fixed memory windows
> 
>   The --cxl-fixed-memory-window top level command line parameters won't play
>   well with efforts to make it possible to instantiate entire machines via
>   RPC. Better to move these to be machine configuration.  This change is
>   relatively straight forward, but does result in very long command lines
>   (cannot break fixed window setup into multiple -M entries).
> 
> * Move all CXL stuff to machine specific code and helpers
> 
>   To simplify the various interactions between machine setup and host
>   bridges etc, currently various CXL steps are called from the generic
>   core/machine.c and softmmu/vl.c + there are CXL elements in MachineState.
> 
>   Much of this is straight forward to do with one exception:
>   The CXL pci_expander_bridge host bridges require MMIO register space.
>   This series does this by walking the bus and filling the register space
>   in via the machine_done callback. This is similar to the walk done for
>   identifying host bridges in the ACPI building code but it is rather ugly
>   and postpones rejection of PXB_CXL instances where cxl=off (default).
> 
> All comments welcome, but the first patch at least changes the command-line
> so to avoid having to add backwards compatibility code, it would be great
> to merge that before 7.1 is released.
> 

LGTM overall. I'm not thrilled with introducing another [sub]acronym "fmw", but
otherwise, no complaints.
Series is:
Reviewed-by: Ben Widawsky 

> Thanks,
> 
> Jonathan
> 
> Jonathan Cameron (8):
>   hw/cxl: Make the CXL fixed memory window setup a machine parameter.
>   hw/acpi/cxl: Pass in the CXLState directly rather than MachineState
>   hw/cxl: Push linking of CXL targets into i386/pc rather than in
> machine.c
>   tests/acpi: Allow modification of q35 CXL CEDT table.
>   pci/pci_expander_bridge: For CXL HB delay the HB register memory
> region setup.
>   tests/acpi: Update q35/CEDT.cxl for new memory addresses.
>   hw/cxl: Move the CXLState from MachineState to machine type specific
> state.
>   hw/machine: Drop cxl_supported flag as no longer useful
> 
>  docs/system/devices/cxl.rst |   4 +-
>  hw/acpi/cxl.c   |   9 +-
>  hw/core/machine.c   |  28 --
>  hw/cxl/cxl-host-stubs.c |   9 +-
>  hw/cxl/cxl-host.c   | 100 ++--
>  hw/i386/acpi-build.c|   8 +-
>  hw/i386/pc.c|  31 +++---
>  hw/pci-bridge/meson.build   |   5 +-
>  hw/pci-bridge/pci_expander_bridge.c |  32 ---
>  hw/pci-bridge/pci_expander_bridge_stubs.c   |  14 +++
>  include/hw/acpi/cxl.h   |   5 +-
>  include/hw/boards.h |   3 +-
>  include/hw/cxl/cxl.h|   9 +-
>  include/hw/cxl/cxl_host.h   |  23 +
>  include/hw/i386/pc.h|   2 +
>  include/hw/pci-bridge/pci_expander_bridge.h |  12 +++
>  qapi/machine.json   |  13 +++
>  softmmu/vl.c|  46 -
>  tests/data/acpi/q35/CEDT.cxl| Bin 184 -> 184 bytes
>  tests/qtest/bios-tables-test.c  |   4 +-
>  tests/qtest/cxl-test.c  |   4 +-
>  21 files changed, 222 insertions(+), 139 deletions(-)
>  create mode 100644 hw/pci-bridge/pci_expander_bridge_stubs.c
>  create mode 100644 include/hw/cxl/cxl_host.h
>  create mode 100644 include/hw/pci-bridge/pci_expander_bridge.h
> 
> -- 
> 2.32.0
>

Re: [PATCH 1/8] hw/cxl: Make the CXL fixed memory window setup a machine parameter.

2022-06-06 Thread Ben Widawsky

On 22-05-31 09:26:27, Paolo Bonzini wrote:
> On 5/30/22 15:45, Jonathan Cameron via wrote:
> > +object_property_add(obj, "cxl-fmw", "CXLFixedMemoryWindow",
> > +machine_get_cfmw, machine_set_cfmw,
> > +NULL, state);
> > +object_property_set_description(obj, "cxl-fmw",
> > +"CXL Fixed Memory Window");
> 
> Perhaps "CXL fixed memory windows (array)" or something like that?
> 
> Paolo

I had a mail which I apparently never sent. I'd like to see 'fmw' renamed, since
that has no decoder ring in any spec that I'm aware of.

Why not keep cfmws nomenclature? It's well defined.

Ben

Re: [RFC PATCH 2/2] arm/virt: Add aspeed-i2c controller and MCTP EP to enable MCTP testing

2022-05-24 Thread Ben Widawsky

On 22-05-20 18:01:28, Jonathan Cameron wrote:
> As the only I2C emulation in QEMU that supports being both
> a master and a slave, suitable for MCTP over i2c is aspeed-i2c
> add this controller to the arm virt model and hook up our new
> i2c_mctp_cxl_fmapi device.
> 
> The current Linux driver for aspeed-i2c has a hard requirement on
> a reset controller.  Throw down the simplest reset controller
> I could find so as to avoid need to make any chance to the kernel
> code.

s/chance/change

> 
> Patch also builds appropriate device tree.  Unfortunately for CXL
> we need to use ACPI (no DT bindings yet defined). Enabling this will
> either require appropriate support for MCTP on an i2c master that
> has ACPI bindings, or modifications of the kernel driver to support
> ACPI with aspeed-i2c (which might be a little controversial ;)

I'm naive to what DT defines, but I assume what's there already is insufficient
to make the bindings for CXL. I say this because I believe it wouldn't be too
bad at all to make a cxl_dt.ko, and it's certainly less artificial than
providing ACPI support for things which don't naturally have ACPI support.

> 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/arm/Kconfig|  1 +
>  hw/arm/virt.c | 77 +++
>  include/hw/arm/virt.h |  2 ++
>  3 files changed, 80 insertions(+)
> 
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index 219262a8da..4a733298cd 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -30,6 +30,7 @@ config ARM_VIRT
>  select ACPI_VIOT
>  select VIRTIO_MEM_SUPPORTED
>  select ACPI_CXL
> +select I2C_MCTP_CXL_FMAPI
>  
>  config CHEETAH
>  bool
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index d818131b57..ea04279515 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -80,6 +80,9 @@
>  #include "hw/char/pl011.h"
>  #include "hw/cxl/cxl.h"
>  #include "qemu/guest-random.h"
> +#include "hw/i2c/i2c.h"
> +#include "hw/i2c/aspeed_i2c.h"
> +#include "hw/misc/i2c_mctp_cxl_fmapi.h"
>  
>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
>  static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> @@ -156,6 +159,8 @@ static const MemMapEntry base_memmap[] = {
>  [VIRT_PVTIME] = { 0x090a, 0x0001 },
>  [VIRT_SECURE_GPIO] ={ 0x090b, 0x1000 },
>  [VIRT_MMIO] =   { 0x0a00, 0x0200 },
> +[VIRT_I2C] ={ 0x0b00, 0x4000 },
> +[VIRT_RESET_FAKE] = { 0x0b004000, 0x0010 },
>  /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
>  [VIRT_PLATFORM_BUS] =   { 0x0c00, 0x0200 },
>  [VIRT_SECURE_MEM] = { 0x0e00, 0x0100 },
> @@ -192,6 +197,7 @@ static const int a15irqmap[] = {
>  [VIRT_GPIO] = 7,
>  [VIRT_SECURE_UART] = 8,
>  [VIRT_ACPI_GED] = 9,
> +[VIRT_I2C] = 10,
>  [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
>  [VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
>  [VIRT_SMMU] = 74,/* ...to 74 + NUM_SMMU_IRQS - 1 */
> @@ -1996,6 +2002,75 @@ static void virt_cpu_post_init(VirtMachineState *vms, MemoryRegion *sysmem)
>  }
>  }
>  
> +static void create_mctp_test(MachineState *ms)
> +{
> +VirtMachineState *vms = VIRT_MACHINE(ms);
> +MemoryRegion *sysmem = get_system_memory();
> +AspeedI2CState *aspeedi2c;
> +struct DeviceState  *dev;
> +char *nodename_i2c_master;
> +char *nodename_i2c_sub;
> +char *nodename_reset;
> +uint32_t clk_phandle, reset_phandle;
> +MemoryRegion *sysmem2;
> +   
> +dev = qdev_new("aspeed.i2c-ast2600");
> +aspeedi2c = ASPEED_I2C(dev);
> +object_property_set_link(OBJECT(dev), "dram", OBJECT(ms->ram), &error_fatal);
> +sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
> +sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, vms->memmap[VIRT_I2C].base);
> +sysbus_connect_irq(SYS_BUS_DEVICE(&aspeedi2c->busses[0]), 0, qdev_get_gpio_in(vms->gic, vms->irqmap[VIRT_I2C]));
> +
> +/* I2C bus DT */
> +reset_phandle = qemu_fdt_alloc_phandle(ms->fdt);
> +nodename_reset = g_strdup_printf("/reset@%" PRIx64, vms->memmap[VIRT_RESET_FAKE].base);
> +qemu_fdt_add_subnode(ms->fdt, nodename_reset);
> +qemu_fdt_setprop_string(ms->fdt, nodename_reset, "compatible", "snps,dw-low-reset");
> +qemu_fdt_setprop_sized_cells(ms->fdt, nodename_reset, "reg",
> + 2, vms->memmap[VIRT_RESET_FAKE].base,
> + 2, vms->memmap[VIRT_RESET_FAKE].size);
> +qemu_fdt_setprop_cell(ms->fdt, nodename_reset, "#reset-cells", 0x1);
> +qemu_fdt_setprop_cell(ms->fdt, nodename_reset, "phandle", reset_phandle);
> +sysmem2 =  g_new(MemoryRegion, 1);
> +memory_region_init_ram(sysmem2, NULL, "reset", vms->memmap[VIRT_RESET_FAKE].size, NULL);
> +memory_region_add_subregion(sysmem, vms->memmap[VIRT_RESET_FAKE].base, sysmem2);
> +
> +

Re: [PATCH v5 20/43] hw/cxl/device: Add a memory device (8.2.8.5)

2022-02-11 Thread Ben Widawsky

On 22-02-11 16:45:19, Jonathan Cameron wrote:
> On Fri, 11 Feb 2022 07:50:00 -0800
> Ben Widawsky  wrote:
> 
> > On 22-02-02 14:10:14, Jonathan Cameron wrote:
> > > From: Ben Widawsky 
> > > 
> > > A CXL memory device (AKA Type 3) is a CXL component that contains some
> > > combination of volatile and persistent memory. It also implements the
> > > previously defined mailbox interface as well as the memory device
> > > firmware interface.
> > > 
> > > Although the memory device is configured like a normal PCIe device, the
> > > memory traffic is on an entirely separate bus conceptually (using the
> > > same physical wires as PCIe, but different protocol).
> > > 
> > > > Once the CXL topology is fully configured and address decoders committed,
> > > the guest physical address for the memory device is part of a larger
> > > window which is owned by the platform.  The creation of these windows
> > > is later in this series.
> > > 
> > > The following example will create a 256M device in a 512M window:
> > > > -object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
> > > -device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0"
> > > 
> > > Note: Dropped PCDIMM info interfaces for now.  They can be added if
> > > appropriate at a later date.
> > > 
> > > Signed-off-by: Ben Widawsky 
> > > Signed-off-by: Jonathan Cameron 
> > > ---
> > >  hw/cxl/cxl-mailbox-utils.c |  47 ++
> > >  hw/mem/Kconfig |   5 ++
> > >  hw/mem/cxl_type3.c | 170 +
> > >  hw/mem/meson.build |   1 +
> > >  include/hw/cxl/cxl.h   |   1 +
> > >  include/hw/cxl/cxl_pci.h   |  22 +
> > >  include/hw/pci/pci_ids.h   |   1 +
> > >  7 files changed, 247 insertions(+)
> > >  create mode 100644 hw/mem/cxl_type3.c
> > > 
> > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > > index 16bb998735..808faec114 100644
> > > --- a/hw/cxl/cxl-mailbox-utils.c
> > > +++ b/hw/cxl/cxl-mailbox-utils.c
> > > @@ -50,6 +50,8 @@ enum {
> > >  LOGS= 0x04,
> > >  #define GET_SUPPORTED 0x0
> > >  #define GET_LOG   0x1
> > > +IDENTIFY= 0x40,
> > > +#define MEMORY_DEVICE 0x0
> > >  };
> > >  
> > >  /* 8.2.8.4.5.1 Command Return Codes */
> > > @@ -216,6 +218,48 @@ static ret_code cmd_logs_get_log(struct cxl_cmd *cmd,
> > >  return CXL_MBOX_SUCCESS;
> > >  }
> > >  
> > > +/* 8.2.9.5.1.1 */
> > > +static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
> > > +   CXLDeviceState *cxl_dstate,
> > > +   uint16_t *len)
> > > +{
> > > +struct {
> > > +char fw_revision[0x10];
> > > +uint64_t total_capacity;
> > > +uint64_t volatile_capacity;
> > > +uint64_t persistent_capacity;
> > > +uint64_t partition_align;
> > > +uint16_t info_event_log_size;
> > > +uint16_t warning_event_log_size;
> > > +uint16_t failure_event_log_size;
> > > +uint16_t fatal_event_log_size;
> > > +uint32_t lsa_size;
> > > +uint8_t poison_list_max_mer[3];
> > > +uint16_t inject_poison_limit;
> > > +uint8_t poison_caps;
> > > +uint8_t qos_telemetry_caps;
> > > +} __attribute__((packed)) *id;
> > > +_Static_assert(sizeof(*id) == 0x43, "Bad identify size");
> > > +
> > > +uint64_t size = cxl_dstate->pmem_size;
> > > +
> > > +if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
> > > +return CXL_MBOX_INTERNAL_ERROR;
> > > +}
> > > +
> > > +id = (void *)cmd->payload;
> > > +memset(id, 0, sizeof(*id));
> > > +
> > > +/* PMEM only */
> > > +snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
> > > +
> > > +id->total_capacity = size / (256 << 20);
> > > +id->persistent_capacity = size / (256 << 20);
> > > +
> > > +*len = sizeof(*id);
> > > +return CXL_MBOX_SUCCESS;
> > > +}
> > > +
> > >  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
> > > >
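
The 0x43 size asserted in the quoted patch can be checked by hand: 16 + 4*8 + 4*2 + 4 + 3 + 2 + 1 + 1 = 67 bytes. Below is a standalone sketch of that layout check. The field list is copied from the patch. The struct tag and check function are hypothetical, and the packed attribute assumes a GCC- or Clang-compatible compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field list as in the quoted patch; the tag name is made up here. */
struct identify_payload {
    char fw_revision[0x10];
    uint64_t total_capacity;
    uint64_t volatile_capacity;
    uint64_t persistent_capacity;
    uint64_t partition_align;
    uint16_t info_event_log_size;
    uint16_t warning_event_log_size;
    uint16_t failure_event_log_size;
    uint16_t fatal_event_log_size;
    uint32_t lsa_size;
    uint8_t poison_list_max_mer[3];
    uint16_t inject_poison_limit;
    uint8_t poison_caps;
    uint8_t qos_telemetry_caps;
} __attribute__((packed));

/* Compile-time check, as in the patch: the payload must be 0x43 bytes. */
_Static_assert(sizeof(struct identify_payload) == 0x43, "Bad identify size");

/* Runtime spot checks of a few offsets implied by the packed layout. */
static void check_identify_layout(void)
{
    assert(offsetof(struct identify_payload, total_capacity) == 0x10);
    assert(offsetof(struct identify_payload, lsa_size) == 0x38);
    assert(offsetof(struct identify_payload, inject_poison_limit) == 0x3f);
}
```

Without the packed attribute the 3-byte poison_list_max_mer array would be followed by padding before the uint16_t field, and the size assertion would fail.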

Re: [PATCH v5 20/43] hw/cxl/device: Add a memory device (8.2.8.5)

2022-02-11 Thread Ben Widawsky

On 22-02-02 14:10:14, Jonathan Cameron wrote:
> From: Ben Widawsky 
> 
> A CXL memory device (AKA Type 3) is a CXL component that contains some
> combination of volatile and persistent memory. It also implements the
> previously defined mailbox interface as well as the memory device
> firmware interface.
> 
> Although the memory device is configured like a normal PCIe device, the
> memory traffic is on an entirely separate bus conceptually (using the
> same physical wires as PCIe, but different protocol).
> 
> Once the CXL topology is fully configured and address decoders committed,
> the guest physical address for the memory device is part of a larger
> window which is owned by the platform.  The creation of these windows
> is later in this series.
> 
> The following example will create a 256M device in a 512M window:
> -object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
> -device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0"
> 
> Note: Dropped PCDIMM info interfaces for now.  They can be added if
> appropriate at a later date.
> 
> Signed-off-by: Ben Widawsky 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/cxl/cxl-mailbox-utils.c |  47 ++
>  hw/mem/Kconfig |   5 ++
>  hw/mem/cxl_type3.c | 170 +
>  hw/mem/meson.build |   1 +
>  include/hw/cxl/cxl.h   |   1 +
>  include/hw/cxl/cxl_pci.h   |  22 +
>  include/hw/pci/pci_ids.h   |   1 +
>  7 files changed, 247 insertions(+)
>  create mode 100644 hw/mem/cxl_type3.c
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 16bb998735..808faec114 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -50,6 +50,8 @@ enum {
>  LOGS= 0x04,
>  #define GET_SUPPORTED 0x0
>  #define GET_LOG   0x1
> +IDENTIFY= 0x40,
> +#define MEMORY_DEVICE 0x0
>  };
>  
>  /* 8.2.8.4.5.1 Command Return Codes */
> @@ -216,6 +218,48 @@ static ret_code cmd_logs_get_log(struct cxl_cmd *cmd,
>  return CXL_MBOX_SUCCESS;
>  }
>  
> +/* 8.2.9.5.1.1 */
> +static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
> +   CXLDeviceState *cxl_dstate,
> +   uint16_t *len)
> +{
> +struct {
> +char fw_revision[0x10];
> +uint64_t total_capacity;
> +uint64_t volatile_capacity;
> +uint64_t persistent_capacity;
> +uint64_t partition_align;
> +uint16_t info_event_log_size;
> +uint16_t warning_event_log_size;
> +uint16_t failure_event_log_size;
> +uint16_t fatal_event_log_size;
> +uint32_t lsa_size;
> +uint8_t poison_list_max_mer[3];
> +uint16_t inject_poison_limit;
> +uint8_t poison_caps;
> +uint8_t qos_telemetry_caps;
> +} __attribute__((packed)) *id;
> +_Static_assert(sizeof(*id) == 0x43, "Bad identify size");
> +
> +uint64_t size = cxl_dstate->pmem_size;
> +
> +if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
> +return CXL_MBOX_INTERNAL_ERROR;
> +}
> +
> +id = (void *)cmd->payload;
> +memset(id, 0, sizeof(*id));
> +
> +/* PMEM only */
> +snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
> +
> +id->total_capacity = size / (256 << 20);
> +id->persistent_capacity = size / (256 << 20);
> +
> +*len = sizeof(*id);
> +return CXL_MBOX_SUCCESS;
> +}
> +
>  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
>  #define IMMEDIATE_LOG_CHANGE (1 << 4)
> @@ -233,8 +277,11 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
>  [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, IMMEDIATE_POLICY_CHANGE },
>  [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 0 },
>  [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
> +[IDENTIFY][MEMORY_DEVICE] = { "IDENTIFY_MEMORY_DEVICE",
> +cmd_identify_memory_device, 0, 0 },
>  };
>  
> +
>  void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
>  {
>  uint16_t ret = CXL_MBOX_SUCCESS;
> diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
> index 03dbb3c7df..73c5ae8ad9 100644
> --- a/hw/mem/Kconfig
> +++ b/hw/mem/Kconfig
> @@ -11,3 +11,8 @@ config NVDIMM
>  
>  config SPARSE_MEM
>  bool
> +
> +config CXL_MEM_DEVICE
> +bool
> +default y if CXL
> +select MEM_DEVICE
> diff --git a/hw/mem/cxl_type3.
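
The identify handler quoted above reports capacity in multiples of 256 MiB and rejects sizes that are not aligned to that unit. A minimal standalone sketch of that conversion follows. CAPACITY_UNIT and the function names are hypothetical, and the boolean return stands in for the mailbox error code used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capacity fields are expressed in units of 256 MiB. */
#define CAPACITY_UNIT (256ULL << 20)

/* Convert a byte size to 256 MiB units. Returns false, leaving *units
 * untouched, when the size is not a multiple of the unit. */
static bool bytes_to_capacity_units(uint64_t bytes, uint64_t *units)
{
    if (bytes % CAPACITY_UNIT != 0) {
        return false;
    }
    *units = bytes / CAPACITY_UNIT;
    return true;
}

/* Self-check: a 512 MiB backend reports 2 units, an odd size is rejected. */
static void check_capacity_units(void)
{
    uint64_t units = 0;

    assert(bytes_to_capacity_units(512ULL << 20, &units) && units == 2);
    assert(!bytes_to_capacity_units((256ULL << 20) + 1, &units));
}
```

So a memory backend created with size=512M, as in the commit message example, would report a total and persistent capacity of 2.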

Re: [PATCH v4 00/42] CXl 2.0 emulation Support

2022-01-25 Thread Ben Widawsky

On 22-01-25 11:18:08, Ben Widawsky wrote:
> Really awesome work Jonathan. Dan and I are wrapping up some of the kernel bits,
> so all I'll do for now is try to run this, but I hope to be able to review the
> parts I'm familiar with at least.
> 
> On 22-01-24 17:16:23, Jonathan Cameron wrote:
> > Previous version was RFC v3: CXL 2.0 Support.
> > No longer an RFC as I would consider the vast majority of this
> > to be ready for detailed review. There are still questions called
> > out in some patches however.
> > 
> > Looking in particular for:
> > * Review of the PCI interactions
> > * x86 and ARM machine interactions (particularly the memory maps)
> > * Review of the interleaving approach - is the basic idea
> >   acceptable?
> > * Review of the command line interface.
> > * CXL related review welcome but much of that got reviewed
> >   in earlier versions and hasn't changed substantially.
> > 
> > Main changes:
> > * The CXL fixed memory windows are now instantiated via a
> >   -cxl-fixed-memory-window command line option.  As they are host level
> >   entities, not associated with a particular hardware entity a top
> >   level parameter seems the most natural way to describe them.
> >   This is also much closer to how it works on a real host than the
> >   previous assignment of a physical address window to all components
> >   along the CXL path.
> 
> Excellent.
> 
> > * Dynamic host memory physical address space allocation both for
> >   the CXL host bridge MMIO space and the CFMWS windows.
> 
> I thought I had done the host bridge MMIO, but perhaps I was mistaken. Either
> way, this is an important step to support all platforms more generally.
> 
> > * Interleaving support (based loosely on Philippe Mathieu-Daudé's
> >   earlier work on an interleaved memory device).  Note this is rudimentary
> >   and low performance but it may be sufficient for test purposes.
> 
> I'll have to look at this further. I had some thoughts about how we might make
> this fast, but it would be more of fake interleaving. How low is "low"?
> 
> > * Additional PCI and memory related utility functions needed for the
> >   interleaving.
> > * Various minor cleanup and increase in scope of tests.
> > * For now dropped the support for presenting CXL type 3 devices
> >   as memory devices in various QEMU interfaces.
> 
> What are the downsides to this? I only used the memory interface originally
> because it seemed like a natural fit, but looking back I'm not sure we gain
> much (though my memory is very lossy).
> 
> > * Dropped the patch letting UID be different from bus_nr.  Whilst
> >   it may be a useful thing to have, we don't need it for this series
> >   and so should be handled separately.
> > 
> > I've called out patches with major changes by marking them as
> > co-developed or introducing them as new patches. The original
> > memory window code has been dropped
> > 
> > After discussions at plumbers and more recently on the mailing list
> > it was clear that there was interest in getting emulation for CXL 2.0
> > upstream in QEMU.  This version resolves many of the outstanding issues
> > and enables the following features:
> > 
> > * Support on both x86/pc and ARM/virt with relevant ACPI tables
> >   generated in QEMU.
> > * Host bridge based on the existing PCI Expander Bridge PXB.
> > * CXL fixed memory windows, allowing host to describe interleaving
> >   across multiple CXL host bridges.
> > * pxb-cxl CXL host bridge support including MMIO region for control
> > >   and HDM (Host-managed Device Memory - basically interleaving / routing)
> >   decoder configuration.
> > * Basic CXL Root port support.
> > * CXL Type 3 device support with persistent memory regions (backed by
> >   hostmem backend).
> > * Pulled MAINTAINERS entry out to a separate patch and add myself as
> >   a co-maintainer at Ben's suggestion.
> > 
> > Big TODOs:
> > 
> > * Volatile memory devices (easy but it's more code so left for now).
> > * Switch support.
> > * Hotplug?  May not need much but it's not tested yet!
> > * More tests and tighter verification that values written to hardware
> >   are actually valid - stuff that real hardware would check.
> > * Main host bridge support (not a priority for me...)
> 
> I originally cared about this for the sake of making a system more realistic. I
> now believe we should drop this entirely.
> 
> > * Testing, testing and more testing.  I have been running a basic
> >   set of ARM and x86 tests on th

Re: [PATCH v4 00/42] CXl 2.0 emulation Support

2022-01-25 Thread Ben Widawsky

Really awesome work Jonathan. Dan and I are wrapping up some of the kernel bits,
so all I'll do for now is try to run this, but I hope to be able to review the
parts I'm familiar with at least.

On 22-01-24 17:16:23, Jonathan Cameron wrote:
> Previous version was RFC v3: CXL 2.0 Support.
> No longer an RFC as I would consider the vast majority of this
> to be ready for detailed review. There are still questions called
> out in some patches however.
> 
> Looking in particular for:
> * Review of the PCI interactions
> * x86 and ARM machine interactions (particularly the memory maps)
> * Review of the interleaving approach - is the basic idea
>   acceptable?
> * Review of the command line interface.
> * CXL related review welcome but much of that got reviewed
>   in earlier versions and hasn't changed substantially.
> 
> Main changes:
> * The CXL fixed memory windows are now instantiated via a
>   -cxl-fixed-memory-window command line option.  As they are host level
>   entities, not associated with a particular hardware entity a top
>   level parameter seems the most natural way to describe them.
>   This is also much closer to how it works on a real host than the
>   previous assignment of a physical address window to all components
>   along the CXL path.

Excellent.

> * Dynamic host memory physical address space allocation both for
>   the CXL host bridge MMIO space and the CFMWS windows.

I thought I had done the host bridge MMIO, but perhaps I was mistaken. Either
way, this is an important step to support all platforms more generally.

> * Interleaving support (based loosely on Philippe Mathieu-Daudé's
>   earlier work on an interleaved memory device).  Note this is rudimentary
>   and low performance but it may be sufficient for test purposes.

I'll have to look at this further. I had some thoughts about how we might make
this fast, but it would be more of fake interleaving. How low is "low"?

> * Additional PCI and memory related utility functions needed for the
>   interleaving.
> * Various minor cleanup and increase in scope of tests.
> * For now dropped the support for presenting CXL type 3 devices
>   as memory devices in various QEMU interfaces.

What are the downsides to this? I only used the memory interface originally
because it seemed like a natural fit, but looking back I'm not sure we gain
much (though my memory is very lossy).

> * Dropped the patch letting UID be different from bus_nr.  Whilst
>   it may be a useful thing to have, we don't need it for this series
>   and so should be handled separately.
> 
> I've called out patches with major changes by marking them as
> co-developed or introducing them as new patches. The original
> memory window code has been dropped
> 
> After discussions at plumbers and more recently on the mailing list
> it was clear that there was interest in getting emulation for CXL 2.0
> upstream in QEMU.  This version resolves many of the outstanding issues
> and enables the following features:
> 
> * Support on both x86/pc and ARM/virt with relevant ACPI tables
>   generated in QEMU.
> * Host bridge based on the existing PCI Expander Bridge PXB.
> * CXL fixed memory windows, allowing host to describe interleaving
>   across multiple CXL host bridges.
> * pxb-cxl CXL host bridge support including MMIO region for control
> >   and HDM (Host-managed Device Memory - basically interleaving / routing)
>   decoder configuration.
> * Basic CXL Root port support.
> * CXL Type 3 device support with persistent memory regions (backed by
>   hostmem backend).
> * Pulled MAINTAINERS entry out to a separate patch and add myself as
>   a co-maintainer at Ben's suggestion.
> 
> Big TODOs:
> 
> * Volatile memory devices (easy but it's more code so left for now).
> * Switch support.
> * Hotplug?  May not need much but it's not tested yet!
> * More tests and tighter verification that values written to hardware
>   are actually valid - stuff that real hardware would check.
> * Main host bridge support (not a priority for me...)

I originally cared about this for the sake of making a system more realistic. I
now believe we should drop this entirely.

> * Testing, testing and more testing.  I have been running a basic
>   set of ARM and x86 tests on this, but there is always room for
>   more tests and greater automation.
> 
> Why do we want QEMU emulation of CXL?
> 
> As Ben stated in V3, QEMU support has been critical to getting OS
> software written given lack of availability of hardware supporting the
> latest CXL features (coupled with very high demand for support being
> ready in a timely fashion). What has become clear since Ben's v3
> is that situation is a continuous one.  Whilst we can't talk about
> them yet, CXL 3.0 features and OS support have been prototyped on
> top of this support and a lot of the ongoing kernel work is being
> tested against these patches.
> 
> Other features on the qemu-list that build on these include PCI-DOE
> /CDAT support from the Avery Design

Re: Follow-up on the CXL discussion at OFTC

2021-11-30 Thread Ben Widawsky

On 21-11-30 13:09:56, Jonathan Cameron wrote:
> On Mon, 29 Nov 2021 18:28:43 +
> Alex Bennée  wrote:
> 
> > Ben Widawsky  writes:
> > 
> > > On 21-11-26 12:08:08, Alex Bennée wrote:  
> > >> 
> > >> Ben Widawsky  writes:
> > >>   
> > >> > On 21-11-19 02:29:51, Shreyas Shah wrote:  
> > >> >> Hi Ben
> > >> >> 
> > >> >> Are you planning to add the CXL2.0 switch inside QEMU or already 
> > >> >> added in one of the version? 
> > >> >>
> > >> >
> > >> > From me, there are no plans for QEMU anything until/unless upstream
> > >> > thinks it will merge the existing patches, or provide feedback as to
> > >> > what it would take to get them merged. If upstream doesn't see a point
> > >> > in these patches, then I really don't see much value in continuing to
> > >> > further them. Once hardware comes out, the value proposition is
> > >> > certainly less.
> > >> 
> > >> I take it:
> > >> 
> > >>   Subject: [RFC PATCH v3 00/31] CXL 2.0 Support
> > >>   Date: Mon,  1 Feb 2021 16:59:17 -0800
> > >>   Message-Id: <20210202005948.241655-1-ben.widaw...@intel.com>
> > >> 
> > >> is the current state of the support? I saw there was a fair amount of
> > >> discussion on the thread so assumed there would be a v4 forthcoming at
> > >> some point.  
> > >
> > > Hi Alex,
> > >
> > > There is a v4, however, we never really had a solid plan for the primary
> > > issue which was around handling CXL memory expander devices properly (both
> > > from an interleaving standpoint as well as having a device which hosts
> > > multiple memory capacities, persistent and volatile). I didn't feel it was
> > > worth sending a v4 unless someone could say
> > >
> > > 1. we will merge what's there and fix later, or
> > > 2. you must have a more perfect emulation in place, or
> > > 3. we want to see usages for a real guest  
> > 
> > I think 1. is acceptable if the community is happy there will be ongoing
> > development and it's not just a code dump. Given it will have a
> > MAINTAINERS entry I think that is demonstrated.
> 
> My thought is also 1.  There are a few hacks we need to clean out but
> nothing that should take too long.  I'm sure it'll take a rev or two more.
> Right now for example, I've added support to arm-virt and maybe need to
> move that over to a different machine model...
> 

The most annoying thing about rebasing it is passing the ACPI tests. They keep
changing upstream. Being able to at least merge up to there would be huge.

> > 
> > What's the current use case? Testing drivers before real HW comes out?
> > Will it still be useful after real HW comes out for people wanting to
> > debug things without HW?
> 
> CXL is continuing to expand in scope and capabilities and I don't see that
> reducing any time soon (My guess is 3 years+ to just catch up with what is
> under discussion today).  So I see two long term use cases:
> 
> 1) Automated verification that we haven't broken things.  I suspect no
> one person is going to have a test farm covering all the corner cases.
> So we'll need emulation + firmware + kernel based testing.
> 

Does this exist in other forms? AFAICT for x86, there isn't much example of
this.

> 2) New feature prove out.  We have already used it for some features that
> will appear in the next spec version. Obviously I can't say what or
> send that code out yet.  It's very useful and the spec draft has changed
> in various ways as a result.  I can't commit others, but Huawei will be
> doing more of this going forwards.  For that we need a stable base to
> which we add the new stuff once spec publication allows it.
> 

I can't commit for Intel but I will say there's more latitude now to work on
projects like this compared to when I first wrote the patches. I have
interest in continuing to develop this as well. I'm very interested in
supporting interleave and hotplug specifically.

> > 
> > >
> > > I had hoped we could merge what was there mostly as is and fix it up as
> > > we go. It's useful in the state it is now, and as time goes on, we find
> > > more usecases for it in a VMM, and not just driver development.
> > >  
> > >> 
> > >&

Re: Follow-up on the CXL discussion at OFTC

2021-11-29 Thread Ben Widawsky

On 21-11-26 12:08:08, Alex Bennée wrote:
> 
> Ben Widawsky  writes:
> 
> > On 21-11-19 02:29:51, Shreyas Shah wrote:
> >> Hi Ben
> >> 
> >> Are you planning to add the CXL2.0 switch inside QEMU or already added in 
> >> one of the version? 
> >>  
> >
> > From me, there are no plans for QEMU anything until/unless upstream thinks
> > it will merge the existing patches, or provide feedback as to what it would
> > take to get them merged. If upstream doesn't see a point in these patches,
> > then I really don't see much value in continuing to further them. Once
> > hardware comes out, the value proposition is certainly less.
> 
> I take it:
> 
>   Subject: [RFC PATCH v3 00/31] CXL 2.0 Support
>   Date: Mon,  1 Feb 2021 16:59:17 -0800
>   Message-Id: <20210202005948.241655-1-ben.widaw...@intel.com>
> 
> is the current state of the support? I saw there was a fair amount of
> discussion on the thread so assumed there would be a v4 forthcoming at
> some point.

Hi Alex,

There is a v4, however, we never really had a solid plan for the primary issue
which was around handling CXL memory expander devices properly (both from an
interleaving standpoint as well as having a device which hosts multiple memory
capacities, persistent and volatile). I didn't feel it was worth sending a v4
unless someone could say
1. we will merge what's there and fix later, or
2. you must have a more perfect emulation in place, or
3. we want to see usages for a real guest

I had hoped we could merge what was there mostly as is and fix it up as we go.
It's useful in the state it is now, and as time goes on, we find more usecases
for it in a VMM, and not just driver development.

> 
> Adding new subsystems to QEMU does seem to be a pain point for new
> contributors. Patches tend to fall through the cracks of existing
> maintainers who spend most of their time looking at stuff that directly
> touches their files. There is also a reluctance to merge large chunks of
> functionality without an identified maintainer (and maybe reviewers) who
> can be the contact point for new patches. So in short you need:
> 
>  - Maintainer Reviewed-by/Acked-by on patches that touch other sub-systems

This is the challenging one. I have Cc'd the relevant maintainers (hw/pci and
hw/mem are the two) in the past, but I think their interest is lacking (and
reasonably so, it is an entirely different subsystem).

>  - Reviewed-by tags on the new sub-system patches from anyone who
>    understands CXL

I have/had those from Jonathan.

>  - Some* in-tree testing (so it doesn't quietly bitrot)

We had this, but it's stale now. We can bring this back up.

>  - A patch adding the sub-system to MAINTAINERS with identified people

That was there too. Since the original posting, I'd be happy to sign Jonathan up
to this if he's willing.

> 
> * Some means at least ensuring qtest can instantiate the device and not
>   fall over. Obviously more testing is better but it can always be
>   expanded on in later series.

This was in the patch series. It could use more testing for sure, but I had
basic functional testing in place via qtest.

> 
> Is that the feedback you were looking for?

You validated my assumptions as to what's needed, but your first bullet is the
one I can't seem to pin down.

Thanks.
Ben

Re: Follow-up on the CXL discussion at OFTC

2021-11-19 Thread Ben Widawsky

On 21-11-19 18:53:43, Jonathan Cameron wrote:
> On Thu, 18 Nov 2021 17:52:07 -0800
> Ben Widawsky  wrote:
> 
> > On 21-11-18 15:20:34, Saransh Gupta1 wrote:
> > > Hi Ben and Jonathan,
> > > 
> > > Thanks for your replies. I'm looking forward to the patches.
> > > 
> > > For QEMU, I see hotplug support as an item on the list and would like to 
> > > start working on it. It would be great if you can provide some pointers 
> > > about how I should go about it.  
> > 
> > It's been a while, so I can't recall what's actually missing. I think it
> > should mostly behave like a normal PCIe endpoint.
> > 
> > > Also, which version of kernel and QEMU (maybe Jonathan's upcoming
> > > version) would be a good starting point for it?
> > 
> > If he rebased and claims it works I have no reason to doubt it :-). I have a
> > small fix on my v4 branch if you want to use the latest port patches.
> 
> Thanks. I'd missed that one. Now pushed down into the original patch.
> 
> It occurred to me that technically I only know my rebase works on Arm64...
> Fingers crossed for x86.
> 
> Anyhow, I'll run more tests on it next week (possibly even including x86),
> 
> Available at: 
> https://github.com/hisilicon/qemu/tree/cxl-hacks
> 
> For arm64 the description at
> https://people.kernel.org/jic23/ will almost work with this. 
> There is a bug however that I need to track down which currently means you
> need to set the pxb uid to the same as the bus number.   Shouldn't take
> long to fix but it's Friday evening...
> (add uid=0x80 to the options for pxb-cxl)
> 
> I dropped the CMA patch from Avery from this tree as need to improve
> the way it's getting hold of some parts of libSPDM and move to the current
> version of that library (rather than the old openSPDM)
> 
> Ben, if you don't mind me trying to push this forwards, I'll do a bit
> of cleanup and reordering then make use of the QEMU folks we have / know and
> try and start getting your hard work upstream.

I don't mind at all.

> 
> Whilst I've not poked the various interfaces yet, this is working with
> a kernel tree that is current cxl/next + Ira's DOE series and Ben's region series
> + (for fun) my SPDM series.  That tree's a franken monster so I'm not planning
> to share it unless anyone has particular need of it.  Hopefully the various
> parts will move forwards this cycle anyway so I can stop having to spend
> as much time on rebases!
> 
> Jonathan 
> 
> > 
> > > 
> > > Thanks,
> > > Saransh
> > > 
> > > 
> > > 
> > > From:   "Jonathan Cameron" 
> > > To: "Ben Widawsky" 
> > > Cc: "Saransh Gupta1" , , 
> > > 
> > > Date:   11/17/2021 09:32 AM
> > > Subject:[EXTERNAL] Re: Follow-up on the CXL discussion at OFTC
> > > 
> > > 
> > > 
> > > On Wed, 17 Nov 2021 08:57:19 -0800
> > > Ben Widawsky  wrote:
> > >   
> > > > Hi Saransh. Please add the list for these kinds of questions. I've
> > > > converted your HTML mail, but going forward, the list will eat it, so
> > > > please use text only.
> > > > 
> > > > On 21-11-16 00:14:33, Saransh Gupta1 wrote:
> > > > >Hi Ben,
> > > > > 
> > > > >This is Saransh from IBM. Sorry to have (unintentionally) dropped out
> > > > >of the conversation on OFTC, I'm new to IRC.
> > > > >Just wanted to follow-up on the discussion there. We discussed about
> > > > >helping with linux patches reviews. On that front, I have identified
> > > > >some colleague(s) who can help me with this. Let me know if/how you
> > > > >want to proceed with that.
> > > > 
> > > > Currently the ball is in my court to re-roll the RFC v2 patches [1]
> > > > based on feedback from Dan. I've implemented all/most of it, but I'm
> > > > still debugging some issues with the result.
> > > > 
> > > > > 
> > > > >Maybe not urgently, but my team would also like to get an understanding
> > > > >of the missing pieces in QEMU. Initially our focus is on type3 memory
> > > > >access and hotplug support. Most of the work that my team does is
> > &g

Re: Follow-up on the CXL discussion at OFTC

2021-11-18 Thread Ben Widawsky

On 21-11-19 02:29:51, Shreyas Shah wrote:
> Hi Ben
> 
> Are you planning to add the CXL2.0 switch inside QEMU or already added in one 
> of the version? 
>  

From me, there are no plans for QEMU anything until/unless upstream thinks it
will merge the existing patches, or provide feedback as to what it would take to
get them merged. If upstream doesn't see a point in these patches, then I really
don't see much value in continuing to further them. Once hardware comes out, the
value proposition is certainly less.

Having said that, once I get the port/region patches merged for the Linux
driver, I do intend to go back and try to implement a basic switch so that we
can test those flows.

I admit, I'm curious why you're interested in switches.

> Regards,
> Shreyas
> 
> -Original Message-
> From: Ben Widawsky  
> Sent: Thursday, November 18, 2021 5:48 PM
> To: Shreyas Shah 
> Cc: Saransh Gupta1 ; Jonathan Cameron 
> ; linux-...@vger.kernel.org; 
> qemu-devel@nongnu.org
> Subject: Re: Follow-up on the CXL discussion at OFTC
> 
> On 21-11-18 22:52:56, Shreyas Shah wrote:
> > Hello Folks,
> > 
> > Any plan to add CXL2.0 switch ports in QEMU? 
> 
> What's your definition of plan?
> 
> > 
> > Regards,
> > Shreyas
> 
> [snip]

Re: Follow-up on the CXL discussion at OFTC

2021-11-18 Thread Ben Widawsky

On 21-11-18 15:20:34, Saransh Gupta1 wrote:
> Hi Ben and Jonathan,
> 
> Thanks for your replies. I'm looking forward to the patches.
> 
> For QEMU, I see hotplug support as an item on the list and would like to 
> start working on it. It would be great if you can provide some pointers 
> about how I should go about it.

It's been a while, so I can't recall what's actually missing. I think it should
mostly behave like a normal PCIe endpoint.

> Also, which version of kernel and QEMU (maybe Jonathan's upcoming version) 
> would be a good starting point for it?

If he rebased and claims it works I have no reason to doubt it :-). I have a
small fix on my v4 branch if you want to use the latest port patches.

> 
> Thanks,
> Saransh
> 
> 
> 
> From:   "Jonathan Cameron" 
> To: "Ben Widawsky" 
> Cc: "Saransh Gupta1" , , 
> 
> Date:   11/17/2021 09:32 AM
> Subject:[EXTERNAL] Re: Follow-up on the CXL discussion at OFTC
> 
> 
> 
> On Wed, 17 Nov 2021 08:57:19 -0800
> Ben Widawsky  wrote:
> 
> > Hi Saransh. Please add the list for these kinds of questions. I've
> > converted your HTML mail, but going forward, the list will eat it, so
> > please use text only.
> > 
> > On 21-11-16 00:14:33, Saransh Gupta1 wrote:
> > >Hi Ben,
> > > 
> > >This is Saransh from IBM. Sorry to have (unintentionally) dropped out
> > >of the conversation on OFTC, I'm new to IRC.
> > >Just wanted to follow-up on the discussion there. We discussed about
> > >helping with linux patches reviews. On that front, I have identified
> > >some colleague(s) who can help me with this. Let me know if/how you
> > >want to proceed with that.
> > 
> > Currently the ball is in my court to re-roll the RFC v2 patches [1]
> > based on feedback from Dan. I've implemented all/most of it, but I'm
> > still debugging some issues with the result.
> > 
> > > 
> > >Maybe not urgently, but my team would also like to get an understanding
> > >of the missing pieces in QEMU. Initially our focus is on type3 memory
> > >access and hotplug support. Most of the work that my team does is
> > >open-source, so contributing to the QEMU effort is another possible
> > >line of collaboration.
> > 
> > If you haven't seen it already, check out my LPC talk [2]. The QEMU
> > patches could use a lot of love. Mostly, I have little/no motivation
> > until upstream shows an interest because I don't have time currently to
> > make sure I don't break vs. upstream. If you want more details here, I
> > can provide them, and I will Cc the qemu-devel mailing list; the end of
> > the LPC talk [2] does have a list.
> 
> Hi Ben, Saransh
> 
> I have a forward port of the series + DOE etc to near current QEMU that is
> lightly tested, and can look to push that out publicly later this week.
> 
> I'd also like to push QEMU support forwards and to start getting this
> upstream in QEMU + fill in some of the missing parts.
> 
> Was aiming to make progress on this a few weeks ago, but as ever other
> stuff got in the way.
> 
> +CC qemu-devel in case anyone else also looking at this.
> 
> Jonathan
> 
> 
> 
> > 
> > > 
> > >Thanks for your help and guidance!
> > > 
> > >Best,
> > >Saransh Gupta
> > >Research Staff Member, IBM Research 
> > 
> > [1]: https://lore.kernel.org/linux-cxl/20211022183709.1199701-1-ben.widaw...@intel.com/T/#t
> > [2]: https://www.youtube.com/watch?v=g89SLjt5Bd4&list=PLVsQ_xZBEyN3wA8Ej4BUjudXFbXuxhnfc&index=49

Re: Follow-up on the CXL discussion at OFTC

2021-11-18 Thread Ben Widawsky

On 21-11-18 22:52:56, Shreyas Shah wrote:
> Hello Folks,
> 
> Any plan to add CXL2.0 switch ports in QEMU? 

What's your definition of plan?

> 
> Regards,
> Shreyas

[snip]

Re: [CXL volatile MEM] - Qemu command to turn on HMAT and NUMA fails with assertion

2021-08-10 Thread Ben Widawsky

Thanks Dave.

Samarth,

Easiest is to just use our run_qemu and figure out the diffs (--cmdline will
print the qemu commandline):
https://github.com/pmem/run_qemu

If you're not able to figure it out after that, please let me know.

On 21-08-10 17:38:16, Samarth Saxena wrote:
> Thanks Dave,
> 
> The Qemu version is qemu-6.0.50.
> 
> I am trying to capture the stack and will place it ASAP.
> 
> Regards,
> Samarth
> 
> 
> -Original Message-
> From: Dr. David Alan Gilbert  
> Sent: Tuesday, August 10, 2021 4:58 PM
> To: Samarth Saxena ; ben.widaw...@intel.com
> Cc: qemu-devel@nongnu.org
> Subject: Re: [CXL volatile MEM] - Qemu command to turn on HMAT and NUMA fails 
> with assertion
> 
> * Samarth Saxena (samar...@cadence.com) wrote:
> > Hi All,
> > 
> > I am trying the following command to run Qemu with Kernel 5.14.
> 
> cc'ing in Ben who I think owns the CXL stuff.
> 
> > qemu-system-x86_64 -M q35,accel=kvm,nvdimm=on,cxl=on,hmat=on -m 
> > 8448M,slots=2,maxmem=16G -smp 8,sockets=2,cores=2,threads=2 -hda 
> > /lan/dscratch/singhabh/shradha/ubuntu-20.10-64_with_orig_driver_backup
> > .qcow2 -enable-kvm -usbdevice tablet -L $VB_ROOT/etc/vm/common/ 
> > -object memory-backend-ram,id=cxl-ram,share=on,size=256M -device 
> > "pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-window-base=1,window-
> > base[0]=0x4c000,memdev[0]=cxl-ram" -device 
> > cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0 -device 
> > cxl-type3,bus=rp0,memdev=cxl-ram,id=cxl-vmem0,size=256M -numa 
> > node,nodeid=0,memdev=cxl-ram -object 
> > memory-backend-ram,id=other-ram,size=8G,prealloc=n,share=off -numa 
> > node,nodeid=1,memdev=other-ram,initiator=0 -numa 
> > cpu,node-id=0,socket-id=0 -numa cpu,node-id=0,socket-id=1
> 
> You probably need to state which qemu tree and version you're using here.
> 
> > I get the following crash before the machine starts to boot.
> > 
> > qemu-system-x86_64: ../softmmu/memory.c:2443: 
> > memory_region_add_subregion_common: Assertion `!subregion->container' 
> > failed.
> 
> It's probably best to check with Ben, but you'll want a backtrace and figure 
> out which subregion and region you're dealing with.
> 
> Dave
> 
> > 
> > Please help me find what's wrong here.
> > 
> > Regards,
> > Samarth Saxena
> > Sr Principal Software Engineer
> > T: +911204308300
> > 
> > 
> > 
> > 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>
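
A note on the assertion quoted above: it fires when a region that already has a
container is added to another one. The toy model below illustrates that
invariant only. `toy_region` and `toy_add_subregion` are invented names for
this note, not QEMU's MemoryRegion API, and where QEMU aborts the sketch simply
returns false. The quoted command line hands memdev=cxl-ram to pxb-cxl,
cxl-type3 and a NUMA node; whether that reuse is what trips the assertion here
is a guess that only the backtrace can confirm.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory region; hypothetical, not QEMU's MemoryRegion. */
struct toy_region {
    const char *name;
    struct toy_region *container; /* NULL until the region is mapped somewhere */
};

/*
 * Map 'sub' into 'parent'. A region may live in at most one container,
 * so a second attempt is refused. QEMU asserts at this point instead of
 * returning an error.
 */
static bool toy_add_subregion(struct toy_region *parent, struct toy_region *sub)
{
    if (sub->container != NULL) {
        return false;
    }
    sub->container = parent;
    return true;
}
```

In this model, mapping the same backing region under a second parent fails,
which is the situation the assertion message describes.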

Re: CXL 2.0 memory device design

2021-03-19 Thread Ben Widawsky

On 21-03-19 18:07:05, Igor Mammedov wrote:
> On Wed, 17 Mar 2021 14:40:58 -0700
> Ben Widawsky  wrote:
> 
> > Phil, Igor, Markus
> > 
> > TL;DR: What to do about multiple capacities in a single device, and what to
> > do about interleave?
> > 
> > I've hacked together a basic CXL 2.0 implementation which exposes a CXL
> > "Type 3" memory device (CXL 2.0 Chapter 2.3). For what we were trying to do
> > this was sufficient. There are two main capabilities that CXL spec exposes
> > which I've not implemented that I'd like to start working toward and am
> > realizing that what I have so far might not be able to carry forward to
> > that next milestone.
> > 
> > Capability 1. A CXL memory device may have both a volatile, and a persistent
> >   capacity. https://bwidawsk.net/HDM_decoders.svg (lower right
> >   side). The current work only supports a single persistent
> >   capacity.
> > Capability 2. CXL topologies can be interleaved. Basic example:
> >   https://bwidawsk.net/HDM_decoders.svg (lower left side)
> > 
> > Memory regions are configured via a CXL spec defined HDM decoder. The HDM
> > decoder which is minimally implemented supports all the functionality
> > mentioned above (base, size, interleave, type, etc.). A CXL component may
> > have up to 10 HDMs.
> > 
> > What I have today: https://bwidawsk.net/QEMU_objects.svg
> > There's a single memory backend device for each host bridge. That backend is
> > passed to any CXL component that is part of the hierarchy underneath that
> > hostbridge. In the case of a Type 3 device memory capacity a subregion is
> > created for that capacity from within the main backend. The device itself
> > implements the TYPE_MEMORY_DEVICE interface. This allows me to utilize the
> > existing memory region code to determine the next free address, and warn on
> > overlaps. It hopefully will help when I'm ready to support hotplug.
> 
> As was mentioned on IRC (and maybe on my first attempt to review your patches)
> 
> Backends are for managing host resource (RAM/file/fd) and its properties.
> A backend should match a corresponding device model (frontend/emulated hw,
> i.e. CXL type 3 device), the latter should manage how it looks to guest.
> 
> i.e. in CXL case I'd imagine CLI adding memory look like:
> 
> -machine cxl=on \
> -device cxl-host-bridge,id=foo \
> -device cxl-rp,id=rp0,bus="foo" \
> -object memory-backend-file,mem-path=somefile,id=mem1 \
> -device cxl-mem,backend=mem1[,bus=rp0]
> 
> if you need to add CXL memory you add pair memory-backend-file + cxl-mem
> (which practically reflects what would happen on real hw)

Conceptually this is fine with me and I agree it more accurately reflects real
hardware. The issue has been more around how to implement that model.

> 
> Sharing a single backend between several CXL devices as a means to implement
> interleaving, looks to me as abusing backend concept.
> (that's not how it's done on real hw, memory chips (backend) that belong to a
> CXL memory card are not shared with other CXL devices). It's probably address
> space that gets partitioned in small chunks to map them to one or another CXL
> memory dev.

Yes, it is an address space that gets partitioned. Is the recommendation then to
create a new address space for each of these regions?

> 
> I'd suggest to forget about interleaving for now and implement
> a simplified variant without it.

That's fine for me, I'm just hoping if we ever get to the point of implementing
interleave, we don't have to start entirely over.

> 
> > Where I've gotten stuck: A Memory Device expects only to have one region of
> > memory. Trying to add a second breaks pretty much everything.
> 
> Memory device has very simplistic rules to map devices in address space
> (we basically open-coded part of 'memory controller' into machine code
> to do address allocation/mapping, due to PC machine historically not having
> it implemented properly).
> 
> > I'm hoping to start the discussion about what the right way is to emulate
> > this in QEMU. Ideally something upstreamable would be great. I think adding
> > a secondary (or more) capacity to a memory class device is doable, but
> > probably not the right approach.
> 
> Also earlier you mentioned that it's guest who programs where CXL memory is
> mapped,
> that isn't compatible with simplistic Memory device interface where guest
> has no say where memory is mapped, in Memory Device case, machine code picks
> the next free gap in fixed hotplug region and maps it

Re: CXL 2.0 memory device design

2021-03-18 Thread Ben Widawsky

On 21-03-17 14:40:58, Ben Widawsky wrote:
> Phil, Igor, Markus
> 
> TL;DR: What to do about multiple capacities in a single device, and what to do
> about interleave?
> 
> I've hacked together a basic CXL 2.0 implementation which exposes a CXL "Type
> 3" memory device (CXL 2.0 Chapter 2.3). For what we were trying to do this was
> sufficient. There are two main capabilities that CXL spec exposes which I've
> not implemented that I'd like to start working toward and am realizing that
> what I have so far might not be able to carry forward to that next milestone.
> 
> Capability 1. A CXL memory device may have both a volatile, and a persistent
> capacity. https://bwidawsk.net/HDM_decoders.svg (lower right
> side). The current work only supports a single persistent
> capacity.
> Capability 2. CXL topologies can be interleaved. Basic example:
>   https://bwidawsk.net/HDM_decoders.svg (lower left side)
> 
> Memory regions are configured via a CXL spec defined HDM decoder. The HDM
> decoder which is minimally implemented supports all the functionality
> mentioned above (base, size, interleave, type, etc.). A CXL component may have
> up to 10 HDMs.
> 
> What I have today: https://bwidawsk.net/QEMU_objects.svg
> There's a single memory backend device for each host bridge. That backend is
> passed to any CXL component that is part of the hierarchy underneath that
> hostbridge. In the case of a Type 3 device memory capacity a subregion is
> created for that capacity from within the main backend. The device itself
> implements the TYPE_MEMORY_DEVICE interface. This allows me to utilize the
> existing memory region code to determine the next free address, and warn on
> overlaps. It hopefully will help when I'm ready to support hotplug.
> 
> Where I've gotten stuck: A Memory Device expects only to have one region of
> memory. Trying to add a second breaks pretty much everything.
> 
> I'm hoping to start the discussion about what the right way to emulate this in
> QEMU is. Ideally something upstreamable would be great. I think adding a
> secondary (or more) capacity to a memory class device is doable, but probably
> not the right approach.
> 
> For context, I've posted v3 previously. Here's a link to v4 which has some
> minor changes as well as moving back to using subregions instead of aliases:
> https://gitlab.com/bwidawsk/qemu/-/tree/cxl-2.0v4
> 
> Thanks.
> Ben
> 

Hello.

I spent some time thinking about this. Right now I have a CXL type 3 memory
device which implements the TYPE_MEMORY_DEVICE interface. Perhaps the easiest
solution would be to keep that same device but have it not implement
TYPE_MEMORY_DEVICE, and instead use object_initialize_child_with_props() to
create child objects along the lines of TYPE_PC_DIMM and TYPE_NVDIMM. In the
current design, those would be subclassed (or simply rewritten) to not have
their own memory backend, and would carve their capacity out of the main memory
backend as I describe above.

Thanks. I'm looking forward to hearing some feedback or other suggestions.
Ben

CXL 2.0 memory device design

2021-03-17 Thread Ben Widawsky

Phil, Igor, Markus

TL;DR: What to do about multiple capacities in a single device, and what to do
about interleave?

I've hacked together a basic CXL 2.0 implementation which exposes a CXL "Type 3"
memory device (CXL 2.0 Chapter 2.3). For what we were trying to do this was
sufficient. There are two main capabilities that CXL spec exposes which I've not
implemented that I'd like to start working toward and am realizing that what I
have so far might not be able to carry forward to that next milestone.

Capability 1. A CXL memory device may have both a volatile, and a persistent
  capacity. https://bwidawsk.net/HDM_decoders.svg (lower right
  side). The current work only supports a single persistent
  capacity.
Capability 2. CXL topologies can be interleaved. Basic example:
  https://bwidawsk.net/HDM_decoders.svg (lower left side)

Memory regions are configured via a CXL spec defined HDM decoder. The HDM
decoder which is minimally implemented supports all the functionality mentioned
above (base, size, interleave, type, etc.). A CXL component may have up to 10
HDMs.
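
As an illustration of what a decoder with a base, a size and interleave
settings has to decide, here is a minimal standalone sketch. The struct and
function names are hypothetical and do not follow the CXL register layout, and
plain modulo arithmetic is an assumption made for readability; it is not taken
from the spec or from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified decoder state; not the CXL register layout. */
typedef struct {
    uint64_t base;        /* start of the decoded address range */
    uint64_t size;        /* length of the range in bytes */
    unsigned ways;        /* number of interleave targets (1 = no interleave) */
    uint64_t granularity; /* bytes sent to one target before switching */
} HdmDecoderSketch;

/* True if addr falls inside the range this decoder claims. */
static bool hdm_sketch_hit(const HdmDecoderSketch *d, uint64_t addr)
{
    return addr >= d->base && addr - d->base < d->size;
}

/* Index of the interleave target that would service addr. */
static unsigned hdm_sketch_target(const HdmDecoderSketch *d, uint64_t addr)
{
    assert(hdm_sketch_hit(d, addr));
    assert(d->ways > 0 && d->granularity > 0);
    return (unsigned)(((addr - d->base) / d->granularity) % d->ways);
}
```

With two ways and a 256 byte granularity, consecutive 256 byte chunks of the
range alternate between target 0 and target 1.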

What I have today: https://bwidawsk.net/QEMU_objects.svg
There's a single memory backend device for each host bridge. That backend is
passed to any CXL component that is part of the hierarchy underneath that
hostbridge. In the case of a Type 3 device memory capacity a subregion is
created for that capacity from within the main backend. The device itself
implements the TYPE_MEMORY_DEVICE interface. This allows me to utilize the
existing memory region code to determine the next free address, and warn on
overlaps. It hopefully will help when I'm ready to support hotplug.
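
The overlap warning mentioned above boils down to an interval test. A minimal
standalone sketch follows; RegionSketch and regions_overlap() are hypothetical
names used only here, not QEMU's memory API, and address wraparound is
deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one region inside a container window. */
typedef struct {
    uint64_t addr; /* offset of the region inside its container */
    uint64_t size; /* length in bytes, must be non-zero */
} RegionSketch;

/*
 * Two half-open ranges [addr, addr + size) overlap when each one starts
 * before the other one ends. Assumes addr + size does not wrap around.
 */
static bool regions_overlap(const RegionSketch *a, const RegionSketch *b)
{
    assert(a->size > 0 && b->size > 0);
    return a->addr < b->addr + b->size && b->addr < a->addr + a->size;
}
```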

Where I've gotten stuck: A Memory Device expects only to have one region of
memory. Trying to add a second breaks pretty much everything.

I'm hoping to start the discussion about what the right way to emulate this in
QEMU is. Ideally something upstreamable would be great. I think adding a
secondary (or more) capacity to a memory class device is doable, but probably
not the right approach.

For context, I've posted v3 previously. Here's a link to v4 which has some minor
changes as well as moving back to using subregions instead of aliases:
https://gitlab.com/bwidawsk/qemu/-/tree/cxl-2.0v4

Thanks.
Ben

[RFC PATCH] hw/mem/cxl_type3: Go back to subregions

2021-03-11 Thread Ben Widawsky

Each device allocates its memory (persistent only for now) out of a
container memory that represents a "window" that would be defined by the
host bridge. For example, a host bridge may claim all traffic from 0x0 -
0x4000; it might then also direct 0x1000-0x1fff to a specific CXL
device. Change the memory region type claiming 0x1000-0x1fff from an
alias, to a subregion.

v1 and v2 of the patch series both used a subregion. There were two
issues with this and so for v3, an alias was chosen mimicking nvdimm.

The switch to alias left an issue in the implementation. There's logic
that checks to make sure two memory regions (i.e. two devices under the
same host bridge) can't collide. While hardware doesn't make this
guarantee, it's nice for driver debug. There is no clean way to do that
with an alias.

More importantly, this change was inspired by implementing support for
both volatile and persistent memory. In that case, you may have multiple
devices which the BIOS is going to assign address ranges to. Since we
are the BIOS in this case, having a way of finding used space in the
memory window so that you can allocate the next chunk is easily
accomplished here.
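
To make "finding used space in the memory window" concrete, here is a
standalone first-fit sketch. UsedRange, NO_SPACE and window_next_free() are
hypothetical names, and the sorted, non-overlapping input is an assumption of
the sketch; the patch itself relies on the existing memory device code for
this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one already allocated chunk of the window. */
typedef struct {
    uint64_t addr; /* offset inside the window */
    uint64_t size; /* length in bytes */
} UsedRange;

#define NO_SPACE UINT64_MAX

/*
 * Return the lowest offset inside a window of window_size bytes that has
 * `want` free bytes, or NO_SPACE if nothing fits.
 * used[] must be sorted by addr and its entries must not overlap.
 */
static uint64_t window_next_free(uint64_t window_size, const UsedRange *used,
                                 size_t n, uint64_t want)
{
    uint64_t cursor = 0;

    for (size_t i = 0; i < n; i++) {
        /* Gap between the end of the previous chunk and this one. */
        if (used[i].addr - cursor >= want) {
            return cursor;
        }
        cursor = used[i].addr + used[i].size;
    }

    /* Tail of the window after the last used chunk. */
    if (cursor <= window_size && window_size - cursor >= want) {
        return cursor;
    }
    return NO_SPACE;
}
```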

With this, I have the following output from `info mtree`
004c-004c1fff (prio 0, ram): cxl-mem1
  004c-004c0fff (prio 1, i/o): cxl_type3-memory

And `info memory-devices`
Memory device [cxl]: "cxl-pmem0"
  addr: 0x4c
  slot: 0
  node: 0
  size: 268435456
  memdev: /objects/cxl-mem1
  hotplugged: false
  hotpluggable: true

This functionality has been tested with a WIP linux patch which amounts
to this:
   reg = readl(cxlm->regs.hdm_decoder + CXL_HDM_DECODER_CAP_OFFSET);

   dev_err(dev, "decoder cap:\n"
           "\tcount: %lu\n",
           FIELD_GET(CXL_HDM_DECODER_COUNT_MASK, reg));

   writel(0x4c, cxlm->regs.hdm_decoder + CXL_HDM_DECODER0_BASE_HIGH_OFFSET);
   writel(0, cxlm->regs.hdm_decoder + CXL_HDM_DECODER0_BASE_LOW_OFFSET);
   writel(0, cxlm->regs.hdm_decoder + CXL_HDM_DECODER0_SIZE_HIGH_OFFSET);
   writel(256 << 20, cxlm->regs.hdm_decoder + CXL_HDM_DECODER0_SIZE_LOW_OFFSET);
   writel(BIT(9),  cxlm->regs.hdm_decoder + CXL_HDM_DECODER0_CTRL_OFFSET);

   tmp = ioremap_uc(0x4c, 4096);
       writel(0x20, tmp);
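
The snippet above programs a 64-bit base and size through pairs of 32-bit
registers. A minimal standalone sketch of that split is below; the helper names
are hypothetical and nothing here touches real hardware or the kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits, as would be written to a *_HIGH register. */
static uint32_t addr_high32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Lower 32 bits, as would be written to a *_LOW register. */
static uint32_t addr_low32(uint64_t v)
{
    return (uint32_t)(v & 0xffffffffu);
}

/* Recombine the two halves, as the device side would. */
static uint64_t addr_join(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}
```

Writing 0x4c to the high register and 0 to the low register, as the snippet
does, corresponds to a base of 0x4c << 32; a size of 256 << 20 fits entirely in
the low register.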

Cc: Jonathan Cameron 
Signed-off-by: Ben Widawsky 
---
 hw/mem/cxl_type3.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index bf33ddb915..33991079a6 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -46,7 +46,9 @@ static void build_dvsecs(CXLType3Dev *ct3d)
 static void cxl_set_addr(CXLType3Dev *ct3d, hwaddr addr, Error **errp)
 {
 MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(ct3d);
-mdc->set_addr(MEMORY_DEVICE(ct3d), addr, errp);
+MemoryRegion *mr = host_memory_backend_get_memory(ct3d->hostmem);
+
+mdc->set_addr(MEMORY_DEVICE(ct3d), addr - mr->addr, errp);
 }
 
 static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
@@ -180,13 +182,13 @@ static void cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 
 memory_region_set_nonvolatile(pmem, true);
 memory_region_set_enabled(pmem, false);
-memory_region_init_alias(pmem, OBJECT(ct3d), "cxl_type3-memory", mr, 0,
- ct3d->size);
+memory_region_init(pmem, OBJECT(ct3d), "cxl_type3-memory", ct3d->size);
+memory_region_add_subregion_overlap(mr, 0, pmem, 1);
 ct3d->cxl_dstate.pmem = pmem;
 
 #ifdef SET_PMEM_PADDR
 /* This path will initialize the memory device as if BIOS had done it */
-cxl_set_addr(ct3d, mr->addr + offset, errp);
+cxl_set_addr(ct3d, offset, errp);
 memory_region_set_enabled(pmem, true);
 #endif
 }
@@ -246,8 +248,11 @@ static uint64_t cxl_md_get_addr(const MemoryDeviceState *md)
 CXLType3Dev *ct3d = CT3(md);
 MemoryRegion *pmem = ct3d->cxl_dstate.pmem;
 
-assert(pmem->alias);
-return pmem->alias_offset;
+if (pmem) {
+return pmem->addr + pmem->container->addr;
+}
+
+return 0;
 }
 
 static void cxl_md_set_addr(MemoryDeviceState *md, uint64_t addr, Error **errp)
@@ -255,8 +260,6 @@ static void cxl_md_set_addr(MemoryDeviceState *md, uint64_t addr, Error **errp)
 CXLType3Dev *ct3d = CT3(md);
 MemoryRegion *pmem = ct3d->cxl_dstate.pmem;
 
-assert(pmem->alias);
-memory_region_set_alias_offset(pmem, addr);
 memory_region_set_address(pmem, addr);
 }
 
-- 
2.30.2

Re: [RFC PATCH v3 05/31] hw/cxl/device: Implement basic mailbox (8.2.8.4)

2021-02-17 Thread Ben Widawsky

On 21-02-11 17:46:39, Jonathan Cameron wrote:
> On Tue, 2 Feb 2021 14:58:30 +
> Jonathan Cameron  wrote:
> 
> > On Mon, 1 Feb 2021 16:59:22 -0800
> > Ben Widawsky  wrote:
> > 
> > > This is the beginning of implementing mailbox support for CXL 2.0
> > > devices. The implementation recognizes when the doorbell is rung,
> > > handles the command/payload, clears the doorbell while returning error
> > > codes and data.
> > > 
> > > Generally the mailbox mechanism is designed to permit communication
> > > between the host OS and the firmware running on the device. For our
> > > purposes, we emulate both the firmware, implemented primarily in
> > > cxl-mailbox-utils.c, and the hardware.
> > > 
> > > No commands are implemented yet.
> > > 
> > > Signed-off-by: Ben Widawsky 
> > > ---
> > >  hw/cxl/cxl-device-utils.c   | 125 ++-
> > >  hw/cxl/cxl-mailbox-utils.c  | 197 
> > >  hw/cxl/meson.build  |   1 +
> > >  include/hw/cxl/cxl.h|   3 +
> > >  include/hw/cxl/cxl_device.h |  28 -
> > >  5 files changed, 349 insertions(+), 5 deletions(-)
> > >  create mode 100644 hw/cxl/cxl-mailbox-utils.c
> > > 
> > > diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
> > > index bb15ad9a0f..6602606f3d 100644
> > > --- a/hw/cxl/cxl-device-utils.c
> > > +++ b/hw/cxl/cxl-device-utils.c
> > > @@ -40,6 +40,111 @@ static uint64_t dev_reg_read(void *opaque, hwaddr offset, unsigned size)
> > >  return 0;
> > >  }
> > >  
> > > +static uint64_t mailbox_reg_read(void *opaque, hwaddr offset, unsigned size)
> > > +{
> > > +CXLDeviceState *cxl_dstate = opaque;
> > > +
> > > +switch (size) {
> 
> As per the thread on the linux driver and the infinite loop I saw there
> as a result of doing 1 byte reads.
> 
> With the current setup of min_access_size = 4 and this
> function QEMU will helpfully issue a series of unaligned 4 byte
> reads to this function. It will then mask those down to 1 byte each
> and combine them.  Given the integer division that results in
> the bottom byte of offset / 4 being returned up to 4 times.
> 
> To handle 2 and 1 byte reads we need explicit support in here and
> the MemoryRegionOps need to reflect that as well.
> 
> All the similar cases where such reads are allowed need to do the
> same.
> 

From the documentation (memory.rst):
- .impl.min_access_size, .impl.max_access_size define the access sizes
  (in bytes) supported by the *implementation*; other access sizes will be
  emulated using the ones available. For example a 4-byte write will be
  emulated using four 1-byte writes, if .impl.max_access_size = 1.

I'm intentionally not looking at the implementation. As I read this, the
behavior you describe is either a QEMU bug, or poor documentation.

Am I missing something?
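
For reference, the symptom described above comes from indexing with offset / 4
and dropping the low offset bits. Below is a minimal standalone sketch of a
read handler that honors them. reg_read_sketch() and the register array are
hypothetical; this is not QEMU's memory core and not the code in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read 1, 2 or 4 bytes at a byte offset from a register file that is kept
 * as 32-bit words with little-endian byte numbering within each word.
 */
static uint64_t reg_read_sketch(const uint32_t *regs, uint64_t offset,
                                unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    assert(offset % size == 0); /* naturally aligned accesses only */

    uint32_t word = regs[offset / 4];            /* containing register */
    unsigned shift = (unsigned)(offset % 4) * 8; /* byte lane inside it */
    uint32_t mask = size == 4 ? UINT32_MAX : (1u << (size * 8)) - 1;

    return (word >> shift) & mask;
}
```

With regs[0] = 0x44332211, 1-byte reads at offsets 0 through 3 return 0x11,
0x22, 0x33 and 0x44, rather than the bottom byte four times.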

> > > +case 8:
> > > +return cxl_dstate->mbox_reg_state64[offset / 8];
> > > +case 4:
> > > +return cxl_dstate->mbox_reg_state32[offset / 4];  
> > 
> > Numeric order seems more natural and I can't see a reason not to do it.
> > + you do them in that order below.
> > 
> > > +default:
> > > +g_assert_not_reached();
> > > +}
> > > +}
> > > +
> > > +static void mailbox_mem_writel(uint32_t *reg_state, hwaddr offset,
> > > +   uint64_t value)
> > > +{
> > > +switch (offset) {
> > > +case A_CXL_DEV_MAILBOX_CTRL:
> > > +/* fallthrough */
> > > +case A_CXL_DEV_MAILBOX_CAP:
> > > +/* RO register */
> > > +break;
> > > +default:
> > > +qemu_log_mask(LOG_UNIMP,
> > > +  "%s Unexpected 32-bit access to 0x%" PRIx64 " (WI)\n",
> > > +  __func__, offset);
> > > +break;
> > > +}
> > > +
> > > +reg_state[offset / 4] = value;
> > > +}
> > > +
> > > +static void mailbox_mem_writeq(uint64_t *reg_state, hwaddr offset,
> > > +   uint64_t value)
> > > +{
> > > +switch (offset) {
> > > +case A_CXL_DEV_MAILBOX_CMD:
> > > +break;
> > > +case A_CXL_DEV_BG_CMD_STS:
> > > +/* BG not supported */
> > > +/* fallthro

Re: [RFC PATCH v3 04/31] hw/cxl/device: Implement the CAP array (8.2.8.1-2)

2021-02-17 Thread Ben Widawsky

On 21-02-02 12:23:50, Jonathan Cameron wrote:
> On Mon, 1 Feb 2021 16:59:21 -0800
> Ben Widawsky  wrote:
> 
> > This implements all device MMIO up to the first capability. That
> > includes the CXL Device Capabilities Array Register, as well as all of
> > the CXL Device Capability Header Registers. The latter are filled in as
> > they are implemented in the following patches.
> > 
> > Endianness and alignment are managed by softmmu memory core.
> > 
> > Signed-off-by: Ben Widawsky 
> A few trivials
> > ---
> >  hw/cxl/cxl-device-utils.c   | 105 
> >  hw/cxl/meson.build  |   1 +
> >  include/hw/cxl/cxl_device.h |  27 +-
> >  3 files changed, 132 insertions(+), 1 deletion(-)
> >  create mode 100644 hw/cxl/cxl-device-utils.c
> > 
> > diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
> > new file mode 100644
> > index 00..bb15ad9a0f
> > --- /dev/null
> > +++ b/hw/cxl/cxl-device-utils.c
> > @@ -0,0 +1,105 @@
> > +/*
> > + * CXL Utility library for devices
> > + *
> > + * Copyright(C) 2020 Intel Corporation.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2. See the
> > + * COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/log.h"
> > +#include "hw/cxl/cxl.h"
> > +
> > +/*
> > + * Device registers have no restrictions per the spec, and so fall back to the
> > + * default memory mapped register rules in 8.2:
> > + *   Software shall use CXL.io Memory Read and Write to access memory mapped
> > + *   register defined in this section. Unless otherwise specified, software
> > + *   shall restrict the accesses width based on the following:
> > + *   • A 32 bit register shall   be accessed as a 1 Byte, 2 Bytes or 4 Bytes
> 
> odd spacing
> 
> > + * quantity.
> > + *   • A 64 bit register shall be accessed as a 1 Byte, 2 Bytes, 4 Bytes or 8
> > + * Bytes
> > + *   • The address shall be a multiple of the access width, e.g. when
> > + * accessing a register as a 4 Byte quantity, the address shall be
> > + * multiple of 4.
> > + *   • The accesses shall map to contiguous bytes.If these rules are not
> > + * followed, the behavior is undefined
> > + */
> > +
> > +static uint64_t caps_reg_read(void *opaque, hwaddr offset, unsigned size)
> > +{
> > +CXLDeviceState *cxl_dstate = opaque;
> > +
> > +return cxl_dstate->caps_reg_state32[offset / 4];
> > +}
> > +
> > +static uint64_t dev_reg_read(void *opaque, hwaddr offset, unsigned size)
> > +{
> > +return 0;
> > +}
> > +
> > +static const MemoryRegionOps dev_ops = {
> > +.read = dev_reg_read,
> > +.write = NULL, /* status register is read only */
> > +.endianness = DEVICE_LITTLE_ENDIAN,
> > +.valid = {
> > +.min_access_size = 1,
> > +.max_access_size = 8,
> > +.unaligned = false,
> > +},
> > +.impl = {
> > +.min_access_size = 1,
> > +.max_access_size = 8,
> > +},
> > +};
> > +
> > +static const MemoryRegionOps caps_ops = {
> > +.read = caps_reg_read,
> > +.write = NULL, /* caps registers are read only */
> > +.endianness = DEVICE_LITTLE_ENDIAN,
> > +.valid = {
> > +.min_access_size = 1,
> > +.max_access_size = 8,
> > +.unaligned = false,
> > +},
> > +.impl = {
> > +.min_access_size = 4,
> > +.max_access_size = 4,
> > +},
> > +};
> > +
> > +void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
> > +{
> > +/* This will be a BAR, so needs to be rounded up to pow2 for PCI spec */
> > +memory_region_init(&cxl_dstate->device_registers, obj, "device-registers",
> > +   pow2ceil(CXL_MMIO_SIZE));
> > +
> > +memory_region_init_io(&cxl_dstate->caps, obj, &caps_ops, cxl_dstate,
> > +  "cap-array", CXL_DEVICE_REGISTERS_OFFSET - 0);
> 
> Specifying a size in terms of the offset of another region isn't exactly 
> intuitive so perhaps a comment on why or better yet actually use a size
> parameter covering what is there rather than simply the region below
> the CXL_DEVICE_REGISTERS_OFFSET.
> 

I didn't have a simple way to accommodat

Re: [RFC PATCH v3 02/31] hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)

2021-02-17 Thread Ben Widawsky

On 21-02-02 11:48:15, Jonathan Cameron wrote:
> On Mon, 1 Feb 2021 16:59:19 -0800
> Ben Widawsky  wrote:
> 
> > A CXL 2.0 component is any entity in the CXL topology. All components
> > have an analogous function in PCIe. Except for the CXL host bridge, all
> > have a PCIe config space that is accessible via the common PCIe
> > mechanisms. CXL components are enumerated via DVSEC fields in the
> > extended PCIe header space. CXL components will minimally implement some
> > subset of CXL.mem and CXL.cache registers defined in 8.2.5 of the CXL
> > 2.0 specification. Two headers and a utility library are introduced to
> > support the minimum functionality needed to enumerate components.
> > 
> > The cxl_pci header manages bits associated with PCI, specifically the
> > DVSEC and related fields. The cxl_component.h variant has data
> > structures and APIs that are useful for drivers implementing any of the
> > CXL 2.0 components. The library takes care of making use of the DVSEC
> > bits and the CXL.[mem|cache] registers. Per spec, the registers are
> > little endian.
> > 
> > None of the mechanisms required to enumerate a CXL capable hostbridge
> > are introduced at this point.
> > 
> > Note that the CXL.mem and CXL.cache registers used are always 4B wide.
> > It's possible in the future that this constraint will not hold.
> > 
> > Signed-off-by: Ben Widawsky 
> 
> A few minor discrepancies from the spec, + naming suggestions.
> 
> Otherwise LGTM.
> 
> > ---
> >  MAINTAINERS|   6 +
> >  hw/Kconfig |   1 +
> >  hw/cxl/Kconfig |   3 +
> >  hw/cxl/cxl-component-utils.c   | 208 +
> >  hw/cxl/meson.build |   3 +
> >  hw/meson.build |   1 +
> >  include/hw/cxl/cxl.h   |  17 +++
> >  include/hw/cxl/cxl_component.h | 187 +
> >  include/hw/cxl/cxl_pci.h   | 138 ++
> >  9 files changed, 564 insertions(+)
> >  create mode 100644 hw/cxl/Kconfig
> >  create mode 100644 hw/cxl/cxl-component-utils.c
> >  create mode 100644 hw/cxl/meson.build
> >  create mode 100644 include/hw/cxl/cxl.h
> >  create mode 100644 include/hw/cxl/cxl_component.h
> >  create mode 100644 include/hw/cxl/cxl_pci.h
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index bcd88668bc..981dc92e25 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2234,6 +2234,12 @@ F: qapi/block*.json
> >  F: qapi/transaction.json
> >  T: git https://repo.or.cz/qemu/armbru.git block-next
> >  
> > +Compute Express Link
> > +M: Ben Widawsky 
> > +S: Supported
> > +F: hw/cxl/
> > +F: include/hw/cxl/
> > +
> >  Dirty Bitmaps
> >  M: Eric Blake 
> >  M: Vladimir Sementsov-Ogievskiy 
> > diff --git a/hw/Kconfig b/hw/Kconfig
> > index 5ad3c6b5a4..c03650c5ed 100644
> > --- a/hw/Kconfig
> > +++ b/hw/Kconfig
> > @@ -6,6 +6,7 @@ source audio/Kconfig
> >  source block/Kconfig
> >  source char/Kconfig
> >  source core/Kconfig
> > +source cxl/Kconfig
> >  source display/Kconfig
> >  source dma/Kconfig
> >  source gpio/Kconfig
> > diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
> > new file mode 100644
> > index 00..8e67519b16
> > --- /dev/null
> > +++ b/hw/cxl/Kconfig
> > @@ -0,0 +1,3 @@
> > +config CXL
> > +bool
> > +default y if PCI_EXPRESS
> > diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
> > new file mode 100644
> > index 00..8d56ad5c7d
> > --- /dev/null
> > +++ b/hw/cxl/cxl-component-utils.c
> > @@ -0,0 +1,208 @@
> > +/*
> > + * CXL Utility library for components
> > + *
> > + * Copyright(C) 2020 Intel Corporation.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2. See the
> > + * COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/log.h"
> > +#include "hw/pci/pci.h"
> > +#include "hw/cxl/cxl.h"
> > +
> > +static uint64_t cxl_cache_mem_read_reg(void *opaque, hwaddr offset,
> > +   unsigned size)
> > +{
> > +CXLComponentState *cxl_cstate = opaque;
> > +ComponentRegisters *cregs = &cxl_cstate->crb;
> > +
> > +assert(size == 4);
> > +
> > +if (cregs->special_ops && cregs->special_ops->read)

Re: [RFC PATCH v3 02/31] hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)

2021-02-17 Thread Ben Widawsky

On 21-02-11 17:08:45, Jonathan Cameron wrote:
> On Mon, 1 Feb 2021 16:59:19 -0800
> Ben Widawsky  wrote:
> 
> > A CXL 2.0 component is any entity in the CXL topology. All components
> > have an analogous function in PCIe. Except for the CXL host bridge, all
> > have a PCIe config space that is accessible via the common PCIe
> > mechanisms. CXL components are enumerated via DVSEC fields in the
> > extended PCIe header space. CXL components will minimally implement some
> > subset of CXL.mem and CXL.cache registers defined in 8.2.5 of the CXL
> > 2.0 specification. Two headers and a utility library are introduced to
> > support the minimum functionality needed to enumerate components.
> > 
> > The cxl_pci header manages bits associated with PCI, specifically the
> > DVSEC and related fields. The cxl_component.h variant has data
> > structures and APIs that are useful for drivers implementing any of the
> > CXL 2.0 components. The library takes care of making use of the DVSEC
> > bits and the CXL.[mem|cache] registers. Per spec, the registers are
> > little endian.
> > 
> > None of the mechanisms required to enumerate a CXL capable hostbridge
> > are introduced at this point.
> > 
> > Note that the CXL.mem and CXL.cache registers used are always 4B wide.
> > It's possible in the future that this constraint will not hold.
> > 
> > Signed-off-by: Ben Widawsky 
> A few additions to previous comments.

Thanks for continuing to look.

> 
> > ---
> >  MAINTAINERS|   6 +
> >  hw/Kconfig |   1 +
> >  hw/cxl/Kconfig |   3 +
> >  hw/cxl/cxl-component-utils.c   | 208 +
> >  hw/cxl/meson.build |   3 +
> >  hw/meson.build |   1 +
> >  include/hw/cxl/cxl.h   |  17 +++
> >  include/hw/cxl/cxl_component.h | 187 +
> >  include/hw/cxl/cxl_pci.h   | 138 ++
> >  9 files changed, 564 insertions(+)
> >  create mode 100644 hw/cxl/Kconfig
> >  create mode 100644 hw/cxl/cxl-component-utils.c
> >  create mode 100644 hw/cxl/meson.build
> >  create mode 100644 include/hw/cxl/cxl.h
> >  create mode 100644 include/hw/cxl/cxl_component.h
> >  create mode 100644 include/hw/cxl/cxl_pci.h
> > 
> 
> 
> > diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
> > new file mode 100644
> > index 00..8d56ad5c7d
> > --- /dev/null
> > +++ b/hw/cxl/cxl-component-utils.c
> > @@ -0,0 +1,208 @@
> > +/*
> > + * CXL Utility library for components
> > + *
> > + * Copyright(C) 2020 Intel Corporation.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2. See the
> > + * COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/log.h"
> > +#include "hw/pci/pci.h"
> > +#include "hw/cxl/cxl.h"
> > +
> > +static uint64_t cxl_cache_mem_read_reg(void *opaque, hwaddr offset,
> > +   unsigned size)
> > +{
> > +CXLComponentState *cxl_cstate = opaque;
> > +ComponentRegisters *cregs = &cxl_cstate->crb;
> > +
> > +assert(size == 4);
> > +
> > +if (cregs->special_ops && cregs->special_ops->read) {
> > +return cregs->special_ops->read(cxl_cstate, offset, size);
> > +} else {
> > +return cregs->cache_mem_registers[offset / 4];
> > +}
> > +}
> > +
> > +static void cxl_cache_mem_write_reg(void *opaque, hwaddr offset, uint64_t value,
> > +unsigned size)
> > +{
> > +CXLComponentState *cxl_cstate = opaque;
> > +ComponentRegisters *cregs = &cxl_cstate->crb;
> > +
> > +assert(size == 4);
> > +
> > +if (cregs->special_ops && cregs->special_ops->write) {
> > +cregs->special_ops->write(cxl_cstate, offset, value, size);
> > +} else {
> > +cregs->cache_mem_registers[offset / 4] = value;
> > +}
> > +}
> > +
> > +/*
> > + * 8.2.3
> > + *   The access restrictions specified in Section 8.2.2 also apply to CXL 2.0
> > + *   Component Registers.
> > + *
> > + * 8.2.2
> > + *   • A 32 bit register shall be accessed as a 4 Bytes quantity. Partial
> > + *   reads are not permitted.
> > + *   • A 64 bit register shall be accessed as a 8

Re: [RFC v2 2/2] Basic CXL DOE for CDAT and Compliance Mode

2021-02-09 Thread Ben Widawsky

A couple of high level comments below. Overall your approach was what I had
imagined originally. The approach Jonathan took is likely more versatile (but
harder to read, for sure).

I'm fine with either and I hope you two can come to an agreement on what the
best way forward is.

My ultimate goal was to be able to take a CDAT from a real device and load it as
a blob into the ct3d for regression testing. Not sure if that's actually
possible or not.
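
One check a blob loader would probably want is a checksum over the loaded
bytes. The sketch below assumes an ACPI-style rule where all bytes of the table
sum to zero modulo 256; that rule, the function names, and the idea that the
caller supplies the checksum offset are assumptions of this sketch, not
something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes in the blob, modulo 256. */
static uint8_t blob_byte_sum(const uint8_t *blob, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + blob[i]);
    }
    return sum;
}

/* A non-empty blob is accepted when its bytes sum to zero. */
static bool blob_checksum_ok(const uint8_t *blob, size_t len)
{
    return len > 0 && blob_byte_sum(blob, len) == 0;
}

/* Store at csum_off the value that makes the whole blob sum to zero. */
static void blob_fix_checksum(uint8_t *blob, size_t len, size_t csum_off)
{
    assert(csum_off < len);
    blob[csum_off] = 0;
    blob[csum_off] = (uint8_t)(0u - blob_byte_sum(blob, len));
}
```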

Thanks.
Ben

On 21-02-09 15:36:03, Chris Browy wrote:
> ---
>  hw/cxl/cxl-component-utils.c   | 132 +++
>  hw/mem/cxl_type3.c | 172 
>  include/hw/cxl/cxl_cdat.h  | 120 +
>  include/hw/cxl/cxl_compl.h | 289 +
>  include/hw/cxl/cxl_component.h | 126 ++
>  include/hw/cxl/cxl_device.h|   3 +
>  include/hw/cxl/cxl_pci.h   |   4 +
>  7 files changed, 846 insertions(+)
>  create mode 100644 include/hw/cxl/cxl_cdat.h
>  create mode 100644 include/hw/cxl/cxl_compl.h
> 
> diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
> index e1bcee5..fc6c538 100644
> --- a/hw/cxl/cxl-component-utils.c
> +++ b/hw/cxl/cxl-component-utils.c
> @@ -195,3 +195,135 @@ void cxl_component_create_dvsec(CXLComponentState *cxl, uint16_t length,
>  range_init_nofail(&cxl->dvsecs[type], cxl->dvsec_offset, length);
>  cxl->dvsec_offset += length;
>  }
> +
> +/* Return the sum of bytes */
> +static void cdat_ent_init(CDATStruct *cs, void *base, uint32_t len)
> +{
> +cs->base = base;
> +cs->length = len;
> +}
> +
> +void cxl_doe_cdat_init(CXLComponentState *cxl_cstate)
> +{
> +uint8_t sum = 0;
> +uint32_t len = 0;
> +int i, j;
> +
> +cxl_cstate->cdat_ent_len = 7;
> +cxl_cstate->cdat_ent =
> +g_malloc0(sizeof(CDATStruct) * cxl_cstate->cdat_ent_len);
> +
> +cdat_ent_init(&cxl_cstate->cdat_ent[0],
> +  &cxl_cstate->cdat_header, sizeof(cxl_cstate->cdat_header));
> +cdat_ent_init(&cxl_cstate->cdat_ent[1],
> +  &cxl_cstate->dsmas, sizeof(cxl_cstate->dsmas));
> +cdat_ent_init(&cxl_cstate->cdat_ent[2],
> +  &cxl_cstate->dslbis, sizeof(cxl_cstate->dslbis));
> +cdat_ent_init(&cxl_cstate->cdat_ent[3],
> +  &cxl_cstate->dsmscis, sizeof(cxl_cstate->dsmscis));
> +cdat_ent_init(&cxl_cstate->cdat_ent[4],
> +  &cxl_cstate->dsis, sizeof(cxl_cstate->dsis));
> +cdat_ent_init(&cxl_cstate->cdat_ent[5],
> +  &cxl_cstate->dsemts, sizeof(cxl_cstate->dsemts));
> +cdat_ent_init(&cxl_cstate->cdat_ent[6],
> +  &cxl_cstate->sslbis, sizeof(cxl_cstate->sslbis));
> +
> +/* Set the DSMAS entry, ent = 1 */
> +cxl_cstate->dsmas.header.type = CDAT_TYPE_DSMAS;
> +cxl_cstate->dsmas.header.reserved = 0x0;
> +cxl_cstate->dsmas.header.length = sizeof(cxl_cstate->dsmas);
> +cxl_cstate->dsmas.DSMADhandle = 0x0;
> +cxl_cstate->dsmas.flags = 0x0;
> +cxl_cstate->dsmas.reserved2 = 0x0;
> +cxl_cstate->dsmas.DPA_base = 0x0;
> +cxl_cstate->dsmas.DPA_length = 0x4;
> +
> +/* Set the DSLBIS entry, ent = 2 */
> +cxl_cstate->dslbis.header.type = CDAT_TYPE_DSLBIS;
> +cxl_cstate->dslbis.header.reserved = 0;
> +cxl_cstate->dslbis.header.length = sizeof(cxl_cstate->dslbis);
> +cxl_cstate->dslbis.handle = 0;
> +cxl_cstate->dslbis.flags = 0;
> +cxl_cstate->dslbis.data_type = 0;
> +cxl_cstate->dslbis.reserved2 = 0;
> +cxl_cstate->dslbis.entry_base_unit = 0;
> +cxl_cstate->dslbis.entry[0] = 0;
> +cxl_cstate->dslbis.entry[1] = 0;
> +cxl_cstate->dslbis.entry[2] = 0;
> +cxl_cstate->dslbis.reserved3 = 0;
> +
> +/* Set the DSMSCIS entry, ent = 3 */
> +cxl_cstate->dsmscis.header.type = CDAT_TYPE_DSMSCIS;
> +cxl_cstate->dsmscis.header.reserved = 0;
> +cxl_cstate->dsmscis.header.length = sizeof(cxl_cstate->dsmscis);
> +cxl_cstate->dsmscis.DSMASH_handle = 0;
> +cxl_cstate->dsmscis.reserved2[0] = 0;
> +cxl_cstate->dsmscis.reserved2[1] = 0;
> +cxl_cstate->dsmscis.reserved2[2] = 0;
> +cxl_cstate->dsmscis.memory_side_cache_size = 0;
> +cxl_cstate->dsmscis.cache_attributes = 0;
> +
> +/* Set the DSIS entry, ent = 4 */
> +cxl_cstate->dsis.header.type = CDAT_TYPE_DSIS;
> +cxl_cstate->dsis.header.reserved = 0;
> +cxl_cstate->dsis.header.length = sizeof(cxl_cstate->dsis);
> +cxl_cstate->dsis.flags = 0;
> +cxl_cstate->dsis.handle = 0;
> +cxl_cstate->dsis.reserved2 = 0;
> +
> +/* Set the DSEMTS entry, ent = 5 */
> +cxl_cstate->dsemts.header.type = CDAT_TYPE_DSEMTS;
> +cxl_cstate->dsemts.header.reserved = 0;
> +cxl_cstate->dsemts.header.length = sizeof(cxl_cstate->dsemts);
> +cxl_cstate->dsemts.DSMAS_handle = 0;
> +cxl_cstate->dsemts.EFI_memory_type_attr = 0;
> +cxl_cstate->dsemts.reserved2 = 0;
> +cxl_cstate->dsemts.DPA_offset = 0;
> +cxl_cstate->dsemts.DPA_length = 0;
> +
> +/* Set the SSLBIS

Re: [RFC PATCH v2 1/2] Basic PCIe DOE support

2021-02-09 Thread Ben Widawsky

Have you/Jonathan come to consensus about which implementation is going forward?
I'd rather not have to review two :D

On 21-02-09 15:35:49, Chris Browy wrote:
> ---
>  MAINTAINERS   |   7 +
>  hw/pci/meson.build|   1 +
>  hw/pci/pcie.c |   2 +-
>  hw/pci/pcie_doe.c | 414 ++
>  include/hw/pci/pci_ids.h  |   2 +
>  include/hw/pci/pcie.h |   1 +
>  include/hw/pci/pcie_doe.h | 166 
>  include/hw/pci/pcie_regs.h|   4 +
>  include/standard-headers/linux/pci_regs.h |   3 +-
>  9 files changed, 598 insertions(+), 2 deletions(-)
>  create mode 100644 hw/pci/pcie_doe.c
>  create mode 100644 include/hw/pci/pcie_doe.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 981dc92..4fb865e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1655,6 +1655,13 @@ F: docs/pci*
>  F: docs/specs/*pci*
>  F: default-configs/pci.mak
>  
> +PCIE DOE
> +M: Huai-Cheng Kuo 
> +M: Chris Browy 
> +S: Supported
> +F: include/hw/pci/pcie_doe.h
> +F: hw/pci/pcie_doe.c
> +
>  ACPI/SMBIOS
>  M: Michael S. Tsirkin 
>  M: Igor Mammedov 
> diff --git a/hw/pci/meson.build b/hw/pci/meson.build
> index 5c4bbac..115e502 100644
> --- a/hw/pci/meson.build
> +++ b/hw/pci/meson.build
> @@ -12,6 +12,7 @@ pci_ss.add(files(
>  # allow plugging PCIe devices into PCI buses, include them even if
>  # CONFIG_PCI_EXPRESS=n.
>  pci_ss.add(files('pcie.c', 'pcie_aer.c'))
> +pci_ss.add(files('pcie_doe.c'))

It looks like this should be like the below line:
softmmu_ss.add(when: 'CONFIG_PCI_EXPRESS', if_true: files('pcie_doe.c'))

>  softmmu_ss.add(when: 'CONFIG_PCI_EXPRESS', if_true: files('pcie_port.c', 
> 'pcie_host.c'))
>  softmmu_ss.add_all(when: 'CONFIG_PCI', if_true: pci_ss)
>  
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 1ecf6f6..f7516c4 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -735,7 +735,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
>  
>  hotplug_event_notify(dev);
>  
> -/* 
> +/*

Please drop this.

>   * 6.7.3.2 Command Completed Events
>   *
>   * Software issues a command to a hot-plug capable Downstream Port by
> diff --git a/hw/pci/pcie_doe.c b/hw/pci/pcie_doe.c
> new file mode 100644
> index 000..df8e92e
> --- /dev/null
> +++ b/hw/pci/pcie_doe.c
> @@ -0,0 +1,414 @@
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qemu/error-report.h"
> +#include "qapi/error.h"
> +#include "qemu/range.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci/pcie.h"
> +#include "hw/pci/pcie_doe.h"
> +#include "hw/pci/msi.h"
> +#include "hw/pci/msix.h"
> +
> +/*
> + * DOE Default Protocols (Discovery, CMA)
> + */
> +/* Discovery Request Object */
> +struct doe_discovery {
> +DOEHeader header;
> +uint8_t index;
> +uint8_t reserved[3];
> +} QEMU_PACKED;
> +
> +/* Discovery Response Object */
> +struct doe_discovery_rsp {
> +DOEHeader header;
> +uint16_t vendor_id;
> +uint8_t doe_type;
> +uint8_t next_index;
> +} QEMU_PACKED;
> +
> +/* Callback for Discovery */
> +static bool pcie_doe_discovery_rsp(DOECap *doe_cap)
> +{
> +PCIEDOE *doe = doe_cap->doe;
> +struct doe_discovery *req = pcie_doe_get_req(doe_cap);
> +uint8_t index = req->index;
> +DOEProtocol *prot = NULL;
> +
> +/* Request length mismatch, discard */
> +if (req->header.length < dwsizeof(struct doe_discovery)) {

Use DIV_ROUND_UP instead of rolling your own thing.
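
For reference, a standalone sketch of what that could look like. DIV_ROUND_UP
is redefined locally (in the same shape as QEMU's) only so this compiles outside
the tree, and bytes_to_dwords is a made-up name, not something from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy so the sketch stands alone; QEMU already provides this macro. */
#ifndef DIV_ROUND_UP
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#endif

/* Hypothetical helper: byte count -> number of 32-bit dwords, rounded up. */
static inline uint32_t bytes_to_dwords(size_t bytes)
{
    return (uint32_t)DIV_ROUND_UP(bytes, sizeof(uint32_t));
}
```

With that, the length check would compare against something like
bytes_to_dwords(sizeof(struct doe_discovery)) instead of a private macro.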

> +return DOE_DISCARD;
> +}
> +
> +/* Point to the requested protocol */
> +if (index < doe->protocol_num) {
> +prot = &doe->protocols[index];
> +}

What happens on else, should that still return DOE_SUCCESS?

> +
> +struct doe_discovery_rsp rsp = {
> +.header = {
> +.vendor_id = PCI_VENDOR_ID_PCI_SIG,
> +.doe_type = PCI_SIG_DOE_DISCOVERY,
> +.reserved = 0x0,
> +.length = dwsizeof(struct doe_discovery_rsp),
> +},

Mixed declarations are not allowed.
Also, DIV_ROUND_UP applies here too.

> +.vendor_id = (prot) ? prot->vendor_id : 0x,
> +.doe_type = (prot) ? prot->doe_type : 0xFF,
> +.next_index = (index + 1) < doe->protocol_num ?
> +  (index + 1) : 0,
> +};

I prefer:
next_index = (index + 1) % doe->protocol_num
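
A quick standalone comparison of the two forms, in case it helps. The function
names are invented for this sketch. Assuming index is already known to be less
than protocol_num, both give the same answer; they differ for an out-of-range
index, so the modulo form wants the range check done first.

```c
#include <assert.h>
#include <stdint.h>

/* The form used in the patch: wrap to 0 once we run off the end. */
static inline uint8_t next_index_ternary(uint8_t index, uint8_t protocol_num)
{
    return (index + 1) < protocol_num ? (uint8_t)(index + 1) : 0;
}

/* The modulo form; protocol_num must be non-zero to avoid dividing by zero. */
static inline uint8_t next_index_modulo(uint8_t index, uint8_t protocol_num)
{
    assert(protocol_num > 0);
    return (uint8_t)((index + 1) % protocol_num);
}
```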

> +
> +pcie_doe_set_rsp(doe_cap, &rsp);
> +
> +return DOE_SUCCESS;
> +}
> +
> +/* Callback for CMA */
> +static bool pcie_doe_cma_rsp(DOECap *doe_cap)
> +{
> +doe_cap->status.error = 1;
> +
> +memset(doe_cap->read_mbox, 0,
> +   PCI_DOE_MAX_DW_SIZE * sizeof(uint32_t));
> +
> +doe_cap->write_mbox_len = 0;
> +
> +return DOE_DISCARD;
> +}
> +
> +/*
> + * DOE Utilities
> + */
> +static void pcie_doe_reset_mbox(DOECap *st)
> +{
> +st->read_mbox_idx = 0;
> +
> +st->read_mbox_len = 0;
> +st->write_mbox_len = 0;
> +
> +memset(st->read_mbox, 0,

Re: [RFC PATCH v1 01/01] PCIe DOE for PCIe and CXL 2.0

2021-02-08 Thread Ben Widawsky

On 21-02-08 10:55:51, Jonathan Cameron wrote:
> ...
> 
> > 
> > >   
> > >>   
> >  
> >  Just like you we feel what's most important is to have DOE supported 
> >  so that
> >  UEFI and Linux kernel and drivers can progress.  We're also 
> >  contributing to
> >  writing compliance tests for the CXL Compliance Software Development 
> >  WG.
> > >>> 
> > >>> Great.
> > >> 
> > >> Is anyone doing the kernel enabling for it?  
> > > 
> > > Planning to look at this but plenty of other things on my todo list if 
> > > someone
> > > else gets to it first.
> > > 
> > > Generic DOE support should be straight forward (the infrastructure).
> > > Parsing CDAT also straight forward.
> > > Doing something with the results is hard unless we just provide an 
> > > interface for
> > > userspace to query them for a given device - or dump the table
> > > (I think we do want to be able to do that). 
> > > 
> > > What I'm really not sure on is how to handle NUMA domains that are 
> > > created late
> > > in the kernel boot sequence.  The  ACPI flow is set up with the assumption
> > > that we can get them from SRAT very early in boot. Need to figure out how 
> > > to
> > > work around that. (e.g. preallocate a bunch of spare nodes for example 
> > > though that's
> > > ugly).  Note IIRC the kernel doesn't do runtime update of any of the ACPI
> > > performance parameters yet (_SLI, _HMA) so there probably isn't any 
> > > infrastructure
> > > that we can reuse.
> > > 
> > > There is also the firmware based enumeration and description option (OS 
> > > not necessarily
> > > aware of CXL) in which this is all up to EDK2 and the kernel gets it all 
> > > presented
> > > as standard tables.  
> > 
> > Do we know who’s on this as part of the EDK2 development?  It would be 
> > great if they could
> > address the SRAT/HMAT generation from reading CDAT.  EDK2 does address CXL 
> > 1.1 now.
> 
> No idea who, if anyone, is looking at this currently.  Perhaps ask on the 
> EDK2 list?
> 
> Jonathan
> 

I did ping the folks at #edk2 in OFTC a few months back and got basically no
response. Mailing list might be best though.

> > 
> > > 
> > > As you can perhaps tell from my half done reviews, this week disappeared 
> > > in
> > > other things so bit of catch up for me to do next week.
> > > 
> > > Thanks,
> > > 
> > > Jonathan
> > > 
> ...
>

Re: [RFC PATCH v1 01/01] PCIe DOE for PCIe and CXL 2.0

2021-02-05 Thread Ben Widawsky

On 21-02-05 16:09:54, Jonathan Cameron wrote:
> On Wed, 3 Feb 2021 23:53:53 -0500
> Chris Browy  wrote:
> 
> > Hi Jonathan,
> >   
> > Thanks for the review comments and we'll put out a v2 patch series
> > based on a genuine git send-email flow in a day or so and plan to include
> > - functionally separate patches
> > - new MSI-X support
> > - few bugs found in CDAT table header + checksum generation
> > - more fully respond to review comments (thanks again!)
> > 
> > After the SSWG responds to your email on spec clarifications we'll work on
> > adding user-defined CDAT entries.  Thanks for raising the issues with SSWG!
> > 
> > It would be good to collaborate on how best to specify external CDAT files.
> > One idea is to provide -device command line property for filenames.  Files
> > could be ascii format specifying the CDAT struct instances with named 
> > fields and
> > value pairs.  Some checks could be added when reading in the files.  Users 
> > could
> > specify the CDAT structure types in any order and have multiple instances.
> 
> I'd keep away from ascii files for this. Whilst it is horrible in some ways
> we should stick to command line ops.  If we need a more structured format then
> similar to what was proposed with hmat, via libvirt.
> 
> Alternatively we could use compiled tables though we'd end up having to parse
> them to some degree.
> 

Why parse? Initially (6 months ago), I was thinking CDAT could just be a blob.
The thing I liked about that approach was that when real devices came along, we
could dump their CDATs and use it directly.

> > 
> > Just like you we feel what's most important is to have DOE supported so that
> > UEFI and Linux kernel and drivers can progress.  We're also contributing to
> > writing compliance tests for the CXL Compliance Software Development WG.
> 
> Great.

Is anyone doing the kernel enabling for it?

> 
> > 
> > Note your email did not post to lore.kernel.org/qemu-devel despite being 
> > CC’d.
> > Maybe a --in-reply-to issue.  I’ve restored that here in this email reply.
> 
> Thanks Chris.  The rejection was due to an unintended attachment.  Please 
> ignore.
> 
> Thanks,
> 
> Jonathan
> 
> 
> 
> > 
> > Best Regards,
> > Chris
> > 
> > 
> > On 2/3/21, 12:19 PM, "Jonathan Cameron"  wrote:
> > 
> > On Tue, 2 Feb 2021 15:43:28 -0500
> > Chris Browy  wrote:
> > 
> > Hi Chris,
> > 
> > Whilst I appreciate that this is very much an RFC and so not in the
> > form you would eventually aim to present it in, please look for
> > a v2 to break this into a series of functionally separate patches.
> > Probably.
> > 
> > 1. Introduce DOE support with no users - probably including the
> >discovery protocol
> > 2. CMA support
> > 3. CDAT support for CXL
> > 4. Compliance part.
> > 
> > It's also well worth jumping through the hoops needed to get a
> > git send-email workflow up and running as you seem to have had some
> > trouble with getting the thread to send in one go etc.
> > 
> > Clearly we now have two possible implementations for this functionality.
> > Personally I don't care which one we take forwards - if nothing else
> > the exercise has highlighted some disagreements in spec interpretation
> > that need clearing up.  I've mailed one big one to the SSWG list today.
> > 
> > I found a few things I definitely got wrong as well whilst reading this 
> > :)
> > Always advantages in having multiple implementations given we don't have
> > hardware yet.
> > 
> > Jonathan
> > 
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index 981dc92e25..4fb865e0b3 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -1655,6 +1655,13 @@ F: docs/pci*
> > >   F: docs/specs/*pci*
> > >   F: default-configs/pci.mak
> > > 
> > > +PCIE DOE
> > > +M: Huai-Cheng Kuo 
> > > +M: Chris Browy 
> > > +S: Supported
> > > +F: include/hw/pci/pcie_doe.h
> > > +F: hw/pci/pcie_doe.c
> > > +
> > >   ACPI/SMBIOS
> > >   M: Michael S. Tsirkin 
> > >   M: Igor Mammedov 
> > > diff --git a/hw/cxl/cxl-component-utils.c 
> > b/hw/cxl/cxl-component-utils.c
> > > index e1bcee5bdb..c49d2aa896 100644
> > > --- a/hw/cxl/cxl-component-utils.c
> > > +++ b/hw/cxl/cxl-component-utils.c
> > > @@ -195,3 +195,154 @@ void 
> > cxl_component_create_dvsec(CXLComponentState *cxl, uint16_t length,
> > >   range_init_nofail(&cxl->dvsecs[type], cxl->dvsec_offset, 
> > length);
> > >   cxl->dvsec_offset += length;
> > >   }
> > > +
> > > +uint32_t cxl_doe_compliance_init(CXLComponentState *cxl_cstate)
> > > +{
> > > +PCIDevice *pci_dev = cxl_cstate->pdev;
> > > +uint32_t req;
> > > +uint32_t byte_cnt = 0;
> > > +
> > > +DOE_DBG(">> %s\n",  __func__);
> > > +
> > > +req = ((struct cxl_compliance_mode_cap 
> > *)pcie_doe_get_req(pci_dev))
> > > +

Re: [RFC PATCH v3 00/31] CXL 2.0 Support

2021-02-03 Thread Ben Widawsky

I've started a barebones project plan:
https://gitlab.com/bwidawsk/qemu/-/snippets/2070304

Jonathan, if you have a moment, perhaps you can send an MR summarizing CDAT/DOE
work from you and Chris?

If folks feel priorities are drastically off, we can discuss it in the snippet
comments.

As for wider acceptance, if I'm looking at this from the QEMU community
perspective, better test cases are really needed. If your fingers are itching
for some typing, might I suggest starting with that.

I've opted not to use the issue tracker for this because I am hopeful this won't
be a long-lived GitLab project.

On 21-02-01 16:59:17, Ben Widawsky wrote:
> Major changes since v2 [1]:
>  * Removed all register endian/alignment/size checking. Using core functionality
>    instead. This is untested on big endian hosts, but Should Work(tm).
>  * Fix component capability header generation (off by 1).
>  * Fixed HDM programming (multiple issues).
>  * Fixed timestamp command implementations.
>  * Added commands: GET_FIRMWARE_UPDATE_INFO, GET_PARTITION_INFO, GET_LSA, 
> SET_LSA
> 
> Things have remained fairly stable since v2. The biggest change here is
> definitely the HDM programming which has received limited (but not 0) testing 
> in
> the Linux driver.
> 
> Jonathan Cameron has gotten this patch series working on ARM [2], and added 
> some
> much sought after functionality [3].
> 
> ---
> 
> I've started #cxl on OFTC IRC for discussion. Please feel free to use that
> channel for questions or suggestions in addition to #qemu.
> 
> ---
> 
> Introduce emulation of Compute Express Link 2.0
> (https://www.computeexpresslink.org/). Specifically, add support for Type 3
> memory expanders with persistent memory.
> 
> The emulation has been critical to get the Linux enabling started [4], it 
> would
> be an ideal place to land regression tests for different topology handling, 
> and
> there may be applications for this emulation as a way for a guest to 
> manipulate
> its address space relative to different performance memories.
> 
> Three of the five CXL component types are emulated with some level of
> functionality: host bridge, root port, and memory device. All components and
> devices implement basic MMIO. Devices/memory devices implement the mailbox
> interface. Basic ACPI support is also included. Upstream ports and downstream
> ports aren't implemented (the two components needed to make up a switch).
> 
> CXL 2.0 is built on top of PCIe (see spec for details). As a result, much of 
> the
> implementation utilizes existing PCI paradigms. To implement the host bridge,
> I've chosen to use PXB (PCI Expander Bridge). It seemed to be the most natural
> fit even though it doesn't directly map to how hardware will work. For
> persistent capacity of the memory device, I utilized the memory subsystem
> (hw/mem).
> 
> We have 3 reasons why this work is valuable:
> 1. Linux driver feature development benefits from emulation both due to a lack
>of initial hardware availability, but also, as is seen with NVDIMM/PMEM
>emulation, there is value in being able to share topologies with
>system-software developers even after hardware is available.
> 
> 2. The Linux kernel's unit test suite for NVDIMM/PMEM ended up injecting fake
>resources via custom modules (nfit_test). In retrospect a QEMU emulation of
>nfit_test capabilities would have made the test environment more portable,
>and allowed for easier community contributions of example configurations.
> 
> 3. This is still being fleshed out, but in short it provides a standardized
>mechanism for the guest to provide feedback to the host about size and
>placement needs of the memory. After the host gives the guest a physical
>window mapping to the CXL device, the emulated HDM decoders allow the 
> guest a
>way to tell the host how much it wants and where. There are likely simpler
>ways to do this, but they'd require inventing a new interface and you'd 
> need
>to have diverging driver code in the guest programming of the HDM decoder 
> vs.
>the host. Since we've already done this work, why not use it?
> 
> There is quite a long list of work to do for full spec compliance, but I don't
> believe that any of it precludes merging. Off the top of my head:
> - Main host bridge support (WIP)
> - Interleaving
> - Better Tests
> - Hot plug support
> - Emulating volatile capacity
> - CDAT emulation [3]
> 
> The flow of the patches in general is to define all the data structures and
> registers associated with the various components in a top down manner. Host
> bridge, component, ports, devices. Then, the actual implementation is done in
> the same order.
> 
> The summary is:
>

Re: [RFC PATCH 3/3] hw/cxl/cxl-device-utils: Allow incorrect read lengths

2021-02-03 Thread Ben Widawsky

On 21-02-01 23:26:55, Jonathan Cameron wrote:
> This is currently needed to avoid an issue in the Linux RFC
> in which a read is issued that is not a multiple of DW.
> On arm64 that results in byte reads being issued and a bus
> error returned.
> 
> It is not yet obvious at what level this should be fixed,
> so paper over it to get things working.
> 
> Not-signed-off-by: Jonathan Cameron 
> ---
>  hw/cxl/cxl-device-utils.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
> index d0d0a47122..52dd03384a 100644
> --- a/hw/cxl/cxl-device-utils.c
> +++ b/hw/cxl/cxl-device-utils.c
> @@ -181,11 +181,11 @@ static const MemoryRegionOps mailbox_ops = {
>  .write = mailbox_reg_write,
>  .endianness = DEVICE_LITTLE_ENDIAN,
>  .valid = {
> -.min_access_size = 4,
> +.min_access_size = 1,
>  .max_access_size = 8,
>  },
>  .impl = {
> -.min_access_size = 4,
> +.min_access_size = 1,
>  .max_access_size = 8,
>  },
>  };

I think this is now addressed in my v3. I'm happy to carry these patches around
in my branch if it helps.

They all lgtm

Re: [RFC PATCH v3 17/31] hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)

2021-02-02 Thread Ben Widawsky

On 21-02-02 20:43:38, Jonathan Cameron wrote:
> On Tue, 2 Feb 2021 11:45:05 -0800
> Ben Widawsky  wrote:
> 
> > On 21-02-02 19:21:35, Jonathan Cameron wrote:
> > > On Mon, 1 Feb 2021 16:59:34 -0800
> > > Ben Widawsky  wrote:
> > >   
> > > > CXL host bridges themselves may have MMIO. Since host bridges don't have
> > > > a BAR they are treated as special for MMIO.
> > > > 
> > > > Signed-off-by: Ben Widawsky 
> > > > 
> > > > --
> > > > 
> > > > It's arbitrarily chosen here to pick 0xD000 as the base for the host
> > > > bridge MMIO. I'm not sure what the right way to find free space for
> > > > platform hardcoded things like this is.  
> > > 
> > > Seems like this needs to come from the machine definition.
> > > This is fairly easy for arm/virt, where there is a clearly laid out 
> > > memory map.
> > > For hw/i386 I'm less sure on how to do it.  
> > 
> > I think this is how to do it :D
> 
> It may well be, but then you'll need to find a suitable region and document
> it and ensure no one else ever tramples on it.  Easy to do on a physical 
> system,
> bit trickier in emulation.
> 

Maybe? x86 is full of magic physical address holes. As long as it's conveyed to
EDK via _CRS, I think it's pretty safe. If something else tries to use the same
address, you should get a fairly obvious error.

Document somehow, yes please.

> > 
> > > 
> > > Having said that, for this particular magic device, we do have a PCI EP
> > > associated with it.  How about putting all the host bridge MMIO into a
> > > BAR of that rather than having it separate.   
> > > That has the added advantage of making it discoverable from firmware.
> > > 
> > > Any normal system is going to have this as impdef for discovery anyway.
> > >   
> > 
> > This is not how it's expected to work for Intel at least. If the device was
> > discoverable you wouldn't need CEDT/CHBS. The magic host bridges are only
> > advertised via the CEDT.
> 
> I agree on a normal system (i.e. a real one) this doesn't work,
> but then a normal system doesn't involve a magic PCI RCiEP that also happens
> to instantiate an extra host bridge. This is what pxb_pcie is doing and
> what your pxb_cxl is almost doing.
> 
> > 
> > When I build and run QEMU for x86_64, I do not see the host bridge in the 
> > pci
> > topology, do you (it's meant to not be there)?
> > 
> > 00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM 
> > Controller
> > ...
> > 34:00.0 PCI bridge: Intel Corporation Device 7075
> > 35:00.0 Memory controller [0502]: Intel Corporation Device 0d93 (rev 01)
> > 
> > That's Q35, Root Port, and Type 3 device respectively.
> 
> You don't see the host bridge, for pxb_cxl, but for pxb_pcie you do.
> 00:06.0 Host bridge: Red Hat, Inc QEMU PCIe Expander bridge.
> If you have another device after your pxb-cxl you'll also notice that there
> is a hole punched in the list where you'd expect pxb-cxl to be (device number
> skipped).  (that had me confused earlier).
> 
> This seems to be because no VID etc (unlike pxb-pcie).
> 

Right. This was in an earlier version of the series and you made me realize if I
got rid of them that it disappears. I really like that this more accurately
represents the hardware.

I agree it can be implemented more simply, but why do it if you can accurately
model it?

> I gave vendor and device IDs (and a bar to test that could be done) and it
> then appears just like pxb_pcie does.  Hence handy place to hang our
> magic memory off so that EDK2 or similar can work with it and indeed
> construct the CHBS as needed so we can get to this via the same paths as
> a normal system.  It's a bit convoluted but in some ways more elegant. 
> 

What are you looking to get out of EDK2 or similar? Anything you want to convey
should work with _CRS, I think. That was the path I was going down.

> Jonathan
> 
> > 
> > > That would then let you drop the separate definition of CXLHost structure
> > > though it needs a bit of figuring out what to do with the memory window
> > > setup etc.
> > > 
> > > I tried hacking it together, but not gotten it working yet.
> > >   
> > > > ---
> > > >  hw/pci-bridge/pci_expander_bridge.c | 53 -
> > > >  include/hw/cxl/cxl.h|  2 ++
> > > >  2 files changed, 54 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/pci-bridge/pci_expa

Re: [RFC PATCH v3 17/31] hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)

2021-02-02 Thread Ben Widawsky

On 21-02-02 19:21:35, Jonathan Cameron wrote:
> On Mon, 1 Feb 2021 16:59:34 -0800
> Ben Widawsky  wrote:
> 
> > CXL host bridges themselves may have MMIO. Since host bridges don't have
> > a BAR they are treated as special for MMIO.
> > 
> > Signed-off-by: Ben Widawsky 
> > 
> > --
> > 
> > It's arbitrarily chosen here to pick 0xD000 as the base for the host
> > bridge MMIO. I'm not sure what the right way to find free space for
> > platform hardcoded things like this is.
> 
> Seems like this needs to come from the machine definition.
> This is fairly easy for arm/virt, where there is a clearly laid out memory 
> map.
> For hw/i386 I'm less sure on how to do it.

I think this is how to do it :D

> 
> Having said that, for this particular magic device, we do have a PCI EP
> associated with it.  How about putting all the host bridge MMIO into a
> BAR of that rather than having it separate.   
> That has the added advantage of making it discoverable from firmware.
> 
> Any normal system is going to have this as impdef for discovery anyway.
> 

This is not how it's expected to work for Intel at least. If the device was
discoverable you wouldn't need CEDT/CHBS. The magic host bridges are only
advertised via the CEDT.

When I build and run QEMU for x86_64, I do not see the host bridge in the pci
topology, do you (it's meant to not be there)?

00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
...
34:00.0 PCI bridge: Intel Corporation Device 7075
35:00.0 Memory controller [0502]: Intel Corporation Device 0d93 (rev 01)

That's Q35, Root Port, and Type 3 device respectively.

> That would then let you drop the separate definition of CXLHost structure
> though it needs a bit of figuring out what to do with the memory window
> setup etc.
> 
> I tried hacking it together, but not gotten it working yet.
> 
> > ---
> >  hw/pci-bridge/pci_expander_bridge.c | 53 -
> >  include/hw/cxl/cxl.h|  2 ++
> >  2 files changed, 54 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/pci-bridge/pci_expander_bridge.c 
> > b/hw/pci-bridge/pci_expander_bridge.c
> > index 5021b60435..226a8a5fff 100644
> > --- a/hw/pci-bridge/pci_expander_bridge.c
> > +++ b/hw/pci-bridge/pci_expander_bridge.c
> > @@ -17,6 +17,7 @@
> >  #include "hw/pci/pci_host.h"
> >  #include "hw/qdev-properties.h"
> >  #include "hw/pci/pci_bridge.h"
> > +#include "hw/cxl/cxl.h"
> >  #include "qemu/range.h"
> >  #include "qemu/error-report.h"
> >  #include "qemu/module.h"
> > @@ -70,6 +71,12 @@ struct PXBDev {
> >  int32_t uid;
> >  };
> >  
> > +typedef struct CXLHost {
> > +PCIHostState parent_obj;
> > +
> > +CXLComponentState cxl_cstate;
> > +} CXLHost;
> > +
> >  static PXBDev *convert_to_pxb(PCIDevice *dev)
> >  {
> >  /* A CXL PXB's parent bus is PCIe, so the normal check won't work */
> > @@ -85,6 +92,9 @@ static GList *pxb_dev_list;
> >  
> >  #define TYPE_PXB_HOST "pxb-host"
> >  
> > +#define TYPE_PXB_CXL_HOST "pxb-cxl-host"
> > +#define PXB_CXL_HOST(obj) OBJECT_CHECK(CXLHost, (obj), TYPE_PXB_CXL_HOST)
> > +
> >  static int pxb_bus_num(PCIBus *bus)
> >  {
> >  PXBDev *pxb = convert_to_pxb(bus->parent_dev);
> > @@ -198,6 +208,46 @@ static const TypeInfo pxb_host_info = {
> >  .class_init= pxb_host_class_init,
> >  };
> >  
> > +static void pxb_cxl_realize(DeviceState *dev, Error **errp)
> > +{
> > +SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
> > +PCIHostState *phb = PCI_HOST_BRIDGE(dev);
> > +CXLHost *cxl = PXB_CXL_HOST(dev);
> > +CXLComponentState *cxl_cstate = &cxl->cxl_cstate;
> > +struct MemoryRegion *mr = &cxl_cstate->crb.component_registers;
> > +
> > +cxl_component_register_block_init(OBJECT(dev), cxl_cstate,
> > +  TYPE_PXB_CXL_HOST);
> > +sysbus_init_mmio(sbd, mr);
> > +
> > +/* FIXME: support multiple host bridges. */
> > +sysbus_mmio_map(sbd, 0, CXL_HOST_BASE +
> > +memory_region_size(mr) * 
> > pci_bus_uid(phb->bus));
> > +}
> > +
> > +static void pxb_cxl_host_class_init(ObjectClass *class, void *data)
> > +{
> > +DeviceClass *dc = DEVICE_CLASS(class);
> > +PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(class);
> > +
> > +hc->root_bus_path = pxb_host_root_bus_path;
> > +dc->fw_name = "cxl"

Re: [RFC PATCH 3/4] hw/cxl/cxl-cdat: Initial CDAT implementation for use by CXL devices

2021-02-02 Thread Ben Widawsky

On 21-02-01 23:16:28, Jonathan Cameron wrote:
> CDAT is an ACPI like format defined by the CXL consortium. It is
> available from
> 
> https://www.uefi.org/node/4093
> 
> Here support for managing all the entries is introduced, along with
> an implementation of a callback for a DOE mailbox which may be
> used to read these values from CXL hardware by either firmware or
> an OS.
> 
> Signed-off-by: Jonathan Cameron 

I seem to be missing one critical thing, where is the CDAT header (Table 1 from
the spec) actually populated, length, revision, checksum, etc?
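
To make the question concrete, here is roughly what I'd expect to find
somewhere. This is only a sketch: the struct layout is my reading of Table 1
(16 bytes: length, revision, checksum, reserved, sequence), the names are
invented, and it assumes the checksum is chosen so that all bytes of the table
sum to zero. It also stores fields in host byte order, where a real
implementation would convert to little endian explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the CDAT header; field names are illustrative only. */
struct cdat_header_sketch {
    uint32_t length;      /* total table length in bytes, header included */
    uint8_t revision;
    uint8_t checksum;     /* makes the byte sum of the whole table zero */
    uint8_t reserved[6];
    uint32_t sequence;
};

/* Sum of all bytes, modulo 256. */
static inline uint8_t cdat_byte_sum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + buf[i]);
    }
    return sum;
}

/* Write the header at the start of buf; entries are assumed to follow it. */
static inline void cdat_fill_header(uint8_t *buf, size_t total_len,
                                    uint8_t revision, uint32_t sequence)
{
    struct cdat_header_sketch hdr;

    assert(total_len >= sizeof(hdr));
    memset(&hdr, 0, sizeof(hdr));
    hdr.length = (uint32_t)total_len;   /* host order, see note above */
    hdr.revision = revision;
    hdr.sequence = sequence;
    memcpy(buf, &hdr, sizeof(hdr));

    /* The checksum byte is still zero here, so negate the running sum. */
    buf[offsetof(struct cdat_header_sketch, checksum)] =
        (uint8_t)(0u - cdat_byte_sum(buf, total_len));
}
```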

General, probably underthought-out, comment:

With the CXLType3Class I added since you probably wrote these patches, I wonder
if that'd be a better fit for populating some of these tables, having them
populate dynamically when the DOE/CDAT request actually comes in.

I have no strong opinion. The one advantage I see to using the class is
operations like managing handles, length calculations, etc can be managed by the
helper library rather than the device implementation having to do it. OTOH if
you have devices that want to do weird things, they might lose some flexibility.

I think with just a few methods added to the CXLType3Class you could pretty
trivially build up a nice sane default CDAT for any CXL device.

> ---
>  hw/cxl/cxl-cdat.c | 252 ++
>  hw/cxl/meson.build|   1 +
>  include/hw/cxl/cxl_cdat.h | 101 +++
>  3 files changed, 354 insertions(+)
> 
> diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
> new file mode 100644
> index 00..6ed4c15cc0
> --- /dev/null
> +++ b/hw/cxl/cxl-cdat.c
> @@ -0,0 +1,252 @@
> +/*
> + * Support for CDAT entries as defined in
> + * Coherent Device Attribute Table (CDAT) Specification rev 1.02
> + * Available from uefi.org.
> + *
> + * Copyright (c) 2021 Jonathan Cameron 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu/units.h"
> +#include "qemu/error-report.h"
> +#include "hw/mem/memory-device.h"
> +#include "hw/mem/pc-dimm.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci/doe.h"
> +#include "hw/qdev-properties.h"
> +#include "qapi/error.h"
> +#include "qemu/log.h"
> +#include "qemu/module.h"
> +#include "qemu/range.h"
> +#include "qemu/rcu.h"
> +#include "sysemu/hostmem.h"
> +#include "sysemu/numa.h"
> +#include "hw/cxl/cxl.h"
> +
> +void cdat_add_dsmas(CXLCDAT *cdat, uint8_t dsmad_handle, uint8_t flags,
> +uint64_t dpa_base, uint64_t dpa_length)
> +{
> +struct cxl_cdat_dsmas *dsmas = g_malloc0(sizeof(*dsmas));
> +dsmas->dsmad_handle = dsmad_handle;
> +dsmas->flags = flags;
> +dsmas->dpa_base = dpa_base;
> +dsmas->dpa_length = dpa_length;
> +cdat->dsmas_list = g_list_append(cdat->dsmas_list, dsmas);
> +}
> +
> +void cdat_add_dslbis(CXLCDAT *cdat, uint8_t handle, uint8_t flags,
> + uint8_t data_type, uint64_t base_unit,
> + uint16_t entry0, uint16_t entry1, uint16_t entry2)
> +{
> +struct cxl_cdat_dslbis *dslbis = g_malloc0(sizeof(*dslbis));
> +dslbis->handle = handle;
> +dslbis->flags = flags;
> +dslbis->data_type = data_type;
> +dslbis->entry_base_unit = base_unit;
> +dslbis->entries[0] = entry0;
> +dslbis->entries[1] = entry1;
> +dslbis->entries[2] = entry2;
> +cdat->dslbis_list = g_list_append(cdat->dslbis_list, dslbis);
> +}
> +
> +void cdat_add_dsmscis(CXLCDAT *cdat, uint8_t dsmas_handle,
> +  uint64_t memory_sc_size, uint64_t cache_attrs)
> +{
> +struct cxl_cdat_dsmscis *dsmscis = g_malloc(sizeof(*dsmscis));
> +dsmscis->dsmas_handle = dsmas_handle;
> +dsmscis->memory_side_cache_size = memory_sc_size;
> +dsmscis->cache_attributes = cache_attrs;
> +cdat->dsmscis_list = g_list_append(cdat->dsmscis_list, dsmscis);
> +}
> +
> +void cdat_add_dsis(CXLCDAT *cdat, uint8_t flags, uint8_t handle)
> +{
> +struct cxl_cdat_dsis *dsis = g_malloc(sizeof(*dsis));
> +dsis->flags = flags;
> +dsis->handle = handle;
> +cdat->dsis_list = g_list_append(cdat->dsis_list, dsis);
> +}
> +
> +void cdat_add_dsemts(CXLCDAT *cdat, uint8_t dsmas_handle,
> + uint8_t efi_mem_type_attr, uint64_t dpa_offset,
> + uint64_t dpa_length)
> +{
> +struct cxl_cdat_dsemts *dsemts = g_malloc(sizeof(*dsemts));
> +dsemts->dsmas_handle = dsmas_handle;
> +dsemts->efi_mem_type_attr = efi_mem_type_attr;
> +dsemts->dpa_offset = dpa_offset;
> +dsemts->dpa_length = dpa_length;
> +cdat->dsemts_list = g_list_append(cdat->dsemts_list, dsemts);
> +}
> +
> +struct cxl_cdat_sslbis *cdat_add_sslbis(CXLCDAT *cdat, uint8_t num_entries,
> +uint8_t data_type, uint64_t 
> base_unit)
> +{
> +struct cxl_cdat_sslbis *sslbis =
> +g_malloc(sizeof(*sslbis) + num_entries *

Re: [RFC PATCH 2/4] hw/pci/pcie_doe: Introduce utility functions for PCIe DOE

2021-02-02 Thread Ben Widawsky

This was a bit more complicated than I was anticipating :-)

On 21-02-01 23:16:27, Jonathan Cameron wrote:
> This implements the ECN to the PCI 5.0 specification available at
> https://members.pcisig.com/wg/PCI-SIG/document/14143
> 
> Does not currently support interrupts.
> 
> Note that currently no attempt is made to clean up allocated memory.
> 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/pci/meson.build   |   2 +-
>  hw/pci/pcie_doe.c| 257 +++
>  include/hw/pci/doe.h |  40 ++
>  include/hw/pci/pci_ids.h |   2 +
>  4 files changed, 300 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/pci/meson.build b/hw/pci/meson.build
> index 5c4bbac817..7336620ee3 100644
> --- a/hw/pci/meson.build
> +++ b/hw/pci/meson.build
> @@ -11,7 +11,7 @@ pci_ss.add(files(
>  # The functions in these modules can be used by devices too.  Since we
>  # allow plugging PCIe devices into PCI buses, include them even if
>  # CONFIG_PCI_EXPRESS=n.
> -pci_ss.add(files('pcie.c', 'pcie_aer.c'))
> +pci_ss.add(files('pcie.c', 'pcie_aer.c',  'pcie_doe.c'))
>  softmmu_ss.add(when: 'CONFIG_PCI_EXPRESS', if_true: files('pcie_port.c', 
> 'pcie_host.c'))
>  softmmu_ss.add_all(when: 'CONFIG_PCI', if_true: pci_ss)
>  
> diff --git a/hw/pci/pcie_doe.c b/hw/pci/pcie_doe.c
> new file mode 100644
> index 00..8739c41280
> --- /dev/null
> +++ b/hw/pci/pcie_doe.c
> @@ -0,0 +1,257 @@
> +/*
> + * pcie_doe.c
> + * utility functions for pci express data object exchange introduced
> + * in PCI 5.0 Data Object Exchange (DOE) ECN
> + *
> + * Copyright (c) 2021 Jonathan Cameron 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/units.h"
> +#include "qemu/error-report.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci/doe.h"
> +#include "hw/qdev-properties.h"
> +#include "qapi/error.h"
> +#include "qemu/log.h"
> +#include "qemu/module.h"
> +#include "qemu/range.h"
> +#include "qemu/rcu.h"
> +#include "sysemu/hostmem.h"
> +

I know it's RFC and quickly thrown together, but:

/* VID and Type */
#define DOE_DATA_OBJECT_HDR1 0
/* Length */
#define DOE_DATA_OBJECT_HDR2 1
/* Index */
#define DOE_DATA_OBJECT_REQUEST_DATA 2
#define DOE_DATA_OBJECT_RESPONSE_DATA 3

Then use that throughout below, please?

> +struct doe_handler {
> +uint16_t vendor_id;
> +uint8_t object_type;
> +doe_msg_handler_t handler;
> +void *priv;
> +};
> +
> +static void doe_set_ctl(PCIEDOE *doe, uint32_t val)
> +{
> +/* Abort */
> +if (val & PCI_DOE_CTRL_DOE_ABORT) {
> +doe->req_index = 0;
> +doe->rsp_index = 0;
> +doe->req_length = 0;
> +doe->error = false;
> +doe->data_object_ready = false;
> +}
> +
> +if (val & PCI_DOE_CTRL_DOE_GO) {
> +GList *l;
> +uint16_t vendor_id = doe->store[0] & PCI_DATA_OBJ_DW0_VID;
> +uint8_t object_type = (doe->store[0] & PCI_DATA_OBJ_DW0_TYPE) >>
> +ctz32(PCI_DATA_OBJ_DW0_TYPE);

I think it'd be much nicer to read this as REG32/FIELD_EX32.
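
Something along these lines, as a standalone sketch rather than the real thing: DOE_FIELD_EX32() below is a stand-in I made up so the snippet compiles on its own (the actual patch would use REG32/FIELD/FIELD_EX32 from hw/registerfields.h), and the bit positions assume VID in DW0[15:0] and object type in DW0[23:16].

```c
#include <stdint.h>

/* Hypothetical stand-in for FIELD_EX32(): extract len bits starting at shift. */
#define DOE_FIELD_EX32(val, shift, len) \
    (((uint32_t)(val) >> (shift)) & ((UINT32_C(1) << (len)) - 1))

/* Assumed DW0 layout: vendor ID in bits 15:0, object type in bits 23:16. */
#define DOE_DW0_VID_SHIFT   0
#define DOE_DW0_VID_LEN     16
#define DOE_DW0_TYPE_SHIFT  16
#define DOE_DW0_TYPE_LEN    8

/* Vendor ID from the first header dword. */
static inline uint16_t doe_hdr_vendor_id(uint32_t dw0)
{
    return (uint16_t)DOE_FIELD_EX32(dw0, DOE_DW0_VID_SHIFT, DOE_DW0_VID_LEN);
}

/* Data object type from the first header dword. */
static inline uint8_t doe_hdr_object_type(uint32_t dw0)
{
    return (uint8_t)DOE_FIELD_EX32(dw0, DOE_DW0_TYPE_SHIFT, DOE_DW0_TYPE_LEN);
}
```

The point is just that the shift and width live next to the field name instead of being derived with ctz32() at each use.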

> +if ((doe->req_index != 3) || (doe->req_length != 3)) {
> +/*
> + * Not entirely clear what should happen if req_length is correct
> + * buf insufficient data has been received.

s/buf/but

Also, maybe for more resiliency and readability, with a comment about why '3':
if ((doe->req_index < 3) ...

I think it'd be much more readable if you pulled the index directly out of the
register store here instead of when you set it.

len = doe->store[DOE_DATA_OBJECT_HDR2] & PCI_DATA_OBJ_DW1_LEN;
if (!len)
len = 1<<18;
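
Spelled out as a helper, roughly. This is a sketch only: doe_object_len_dw() is a made-up name, and the mask assumes the Length field is the low 18 bits of the second header dword, with 0 encoding 2^18 dwords.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOE_DATA_OBJECT_HDR2  1                    /* dword index of Length */
#define DOE_LEN_MASK          UINT32_C(0x0003ffff) /* assumed 18-bit field */
#define DOE_MAX_LEN_DW        (UINT32_C(1) << 18)

/* Return the data object length in dwords, read straight from the store. */
static uint32_t doe_object_len_dw(const uint32_t *store)
{
    uint32_t len;

    assert(store != NULL);
    len = store[DOE_DATA_OBJECT_HDR2] & DOE_LEN_MASK;

    /* A Length of 0 is the encoding for the maximum, 2^18 dwords. */
    return len ? len : DOE_MAX_LEN_DW;
}
```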

> + */
> +doe->error = true;
> +return;

I don't think this should be an error:
"If the Length is shorter than expected for a specific data object, then the
data object must be silently discarded.

If the Length is greater than expected for a specific data object, then the
portion of the data object up to the expected length must be processed normally
and the remainder of the data object must be silently discarded."
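
In code I'd expect the check to look more like the following. It's a hypothetical sketch: the enum and function names are mine, and expected_dw stands for whatever length the specific data object defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum doe_len_action {
    DOE_LEN_DISCARD,  /* too short: drop silently, do not set the error bit */
    DOE_LEN_PROCESS,  /* long enough: process the expected part only */
};

/*
 * Decide what to do with a received object of got_dw dwords when the
 * object type expects expected_dw dwords. On return, *process_dw holds
 * the number of dwords that should actually be handled.
 */
static enum doe_len_action doe_check_len(uint32_t got_dw, uint32_t expected_dw,
                                         uint32_t *process_dw)
{
    assert(process_dw != NULL);

    if (got_dw < expected_dw) {
        *process_dw = 0;
        return DOE_LEN_DISCARD;
    }

    /* Anything past expected_dw is ignored rather than flagged. */
    *process_dw = expected_dw;
    return DOE_LEN_PROCESS;
}
```

Neither branch touches doe->error, which is the behavior I read out of the quoted text.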

> +}
> +/* Discovery protocol - DOE ECN */
> +if (vendor_id == PCI_VENDOR_ID_PCI_SIG &&
> +object_type == PCI_DOE_DIS_OBJ_TYPE) {
> +uint8_t index = doe->store[2] & PCI_DOE_DIS_REQ_D0_DW0_INDEX;
> +doe->store[1] = 3;
> +if (index == 0) {
> +/* First entry is this one, the discovery protocol itself */
> +uint8_t next;
> +
> +if (doe->cb_list) {
> +next = index + 1;
> +} else {
> +next = 0;
> +}

I think a comment here that you're terminating the list if no callbacks are
registered would be good.
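
For example, something like this. It's a sketch under my own assumptions: the helper name is made up, and I'm assuming entry 0 is the discovery protocol with the registered handlers occupying the indices after it.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute the "next index" field of a discovery response. */
static uint8_t doe_discovery_next_index(uint8_t index, size_t num_handlers)
{
    /* Entry 0 is the discovery protocol itself; handlers follow it. */
    size_t total = 1 + num_handlers;

    /*
     * A next index of 0 terminates the list. That is what we return when
     * no callbacks are registered, or when index is already the last entry.
     */
    if ((size_t)index + 1 >= total) {
        return 0;
    }
    return (uint8_t)(index + 1);
}
```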

> +doe->store[2] =
> +(next << ctz32(PCI_DOE_DIS_RSP_D0_DW0_NEXT_INDEX)) |
> +

Re: [RFC PATCH v3 16/31] hw/pci: Plumb _UID through host bridges

2021-02-02 Thread Ben Widawsky

On 21-02-02 10:51:55, Michael S. Tsirkin wrote:
> On Tue, Feb 02, 2021 at 07:42:57AM -0800, Ben Widawsky wrote:
> > Thanks for looking! Mixing comments to Jonathan and Michael..
> > 
> > On 21-02-02 10:24:43, Michael S. Tsirkin wrote:
> > > On Tue, Feb 02, 2021 at 03:00:56PM +, Jonathan Cameron wrote:
> > > > On Mon, 1 Feb 2021 16:59:33 -0800
> > > > Ben Widawsky  wrote:
> > > > 
> > > > > Currently, QEMU makes _UID equivalent to the bus number (_BBN). While
> > > > > there is nothing wrong with doing it this way, CXL spec has a heavy
> > > > > reliance on _UID to identify host bridges and there is no link to the
> > > > > bus number. Having a distinct UID solves two problems. The first is it
> > > > > gets us around the limitation of 256 (current max bus number).
> > > 
> > > Not sure I understand. You want more than 256 host bridges?
> > > 
> > 
> > I don't want more than 256 host bridges, but I want the ability to 
> > disaggregate
> > _UID and bus (_BBN). The reasoning is just to align with the spec where 
> > _UID is
> > used to identify a CXL host bridge which is unrelated (perhaps) to the bus
> > number.
> 
> Which spec is that?
> 

CXL spec
https://www.computeexpresslink.org/download-the-specification

The spec introduces a new ACPI table which links information about a host bridge
based on the _UID. The _UID is 4 bytes.

"In an ACPI compliant system, there shall be one instance of CXL Host Bridge
Device object in ACPI namespace (HID=”ACPI0016”) for every CHBS entry. The _UID
object under a CXL Host Bridge object, when evaluated, shall match the UID field
in the associated CHBS entry."

"CXL Host Bridge Unique ID. Used to associate a CHBS instance with CXL Host
Bridge instance. The value of this field shall match the output of _UID under
the associated CXL Host Bridge in ACPI namespace"
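
To make the relationship concrete, here's a toy lookup. It's only a sketch: the struct and its fields are hypothetical, not the real table layout. The idea is that OSPM evaluates _UID on the host bridge device and uses that value to find the matching CHBS entry, with no bus number involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one CHBS entry. */
struct chbs_entry {
    uint32_t uid;   /* must equal the _UID of the associated host bridge */
    uint64_t base;  /* placeholder for the bridge's register block base */
};

/* Find the CHBS entry whose UID matches the _UID of a host bridge. */
static const struct chbs_entry *chbs_find(const struct chbs_entry *tbl,
                                          size_t n, uint32_t uid)
{
    size_t i;

    assert(tbl != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (tbl[i].uid == uid) {
            return &tbl[i];
        }
    }
    return NULL;  /* no CHBS entry for this host bridge */
}
```

Note the UID here is a full 32 bits, so it can't be represented by a bus number in general.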


> > > > The
> > > > > second is it allows us to replicate hardware configurations where bus
> > > > > number and uid aren't equivalent.
> > > 
> > > A bit more data on when this needs to be the case?
> > > 
> > 
> > Doesn't *need* to be the case. I was making a concerted effort to allow full
> > spec flexibility, but I don't believe it to be necessary unless we want to
> > accurately emulate a real platform.
> > 
> > > > The latter has benefits for our
> > > > > development and debugging using QEMU.
> > > > > 
> > > > > The other way to do this would be to implement the expanded bus
> > > > > numbering, but having an explicit uid makes more sense when trying to
> > > > > replicate real hardware configurations.
> > > > > 
> > > > > The QEMU commandline to utilize this would be:
> > > > >   -device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1,uid=x
> > > > > 
> > > > > Signed-off-by: Ben Widawsky 
> > > 
> > > However, if doing this how do we ensure UID is still unique?
> > > What do we do for cases where UID was not specified?
> > > One idea is to generate a string UID and just stick the bus #
> > > in there.
> > 
> > This is totally mishandled in the code currently. I like your idea though.
> > 
> > > 
> > > 
> > > > > --
> > > > > 
> > > > > I'm guessing this patch will be somewhat controversial. For early CXL
> > > > > work, this can be dropped without too much heartache.
> > > > 
> > > > Whilst I'm not personally against, this maybe best to drop for now as 
> > > > you
> > > > say.
> > > > 
> > 
> > I think it'd be good to understand from the PCIe experts if CXL matches in 
> > this
> > regard. If PCIe generally allows (and does in practice) _UID not matching 
> > _BBN,
> > perhaps this is an overall improvement to the code.
> 
> Well
> 
> 6.1.12 _UID (Unique ID)
>   This object provides OSPM with a logical device ID that does not change 
> across reboots. This object
>   is optional, but is required when the device has no other way to report 
> a persistent unique device ID.
>   The _UID must be unique across all devices with either a common _HID or 
> _CID. This is because a
>   device needs to be uniquely identified to the OSPM, which may match on 
> either a _HID or a _CID to
>   identify the device. The uniqueness match must be true regardless of 
> whether the OSPM uses the
>   _HID or t

Re: [RFC PATCH v3 16/31] hw/pci: Plumb _UID through host bridges

2021-02-02 Thread Ben Widawsky

Thanks for looking! Mixing comments to Jonathan and Michael..

On 21-02-02 10:24:43, Michael S. Tsirkin wrote:
> On Tue, Feb 02, 2021 at 03:00:56PM +, Jonathan Cameron wrote:
> > On Mon, 1 Feb 2021 16:59:33 -0800
> > Ben Widawsky  wrote:
> > 
> > > Currently, QEMU makes _UID equivalent to the bus number (_BBN). While
> > > there is nothing wrong with doing it this way, CXL spec has a heavy
> > > reliance on _UID to identify host bridges and there is no link to the
> > > bus number. Having a distinct UID solves two problems. The first is it
> > > gets us around the limitation of 256 (current max bus number).
> 
> Not sure I understand. You want more than 256 host bridges?
> 

I don't want more than 256 host bridges, but I want the ability to disaggregate
_UID and bus (_BBN). The reasoning is just to align with the spec where _UID is
used to identify a CXL host bridge which is unrelated (perhaps) to the bus
number.

> > The
> > > second is it allows us to replicate hardware configurations where bus
> > > number and uid aren't equivalent.
> 
> A bit more data on when this needs to be the case?
> 

Doesn't *need* to be the case. I was making a concerted effort to allow full
spec flexibility, but I don't believe it to be necessary unless we want to
accurately emulate a real platform.

> > The latter has benefits for our
> > > development and debugging using QEMU.
> > > 
> > > The other way to do this would be to implement the expanded bus
> > > numbering, but having an explicit uid makes more sense when trying to
> > > replicate real hardware configurations.
> > > 
> > > The QEMU commandline to utilize this would be:
> > >   -device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1,uid=x
> > > 
> > > Signed-off-by: Ben Widawsky 
> 
> However, if doing this how do we ensure UID is still unique?
> What do we do for cases where UID was not specified?
> One idea is to generate a string UID and just stick the bus #
> in there.

This is totally mishandled in the code currently. I like your idea though.
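
Something like this, maybe. Sketch only: the "PXB%02X" format and both helper names are placeholders I made up, not what the patch does. The first helper builds a string UID from the bus number for the case where no uid was given, and the second rejects an explicit uid that another bridge already uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build a default string _UID from the bus number, e.g. "PXB34" for bus
 * 0x34. Returns the number of characters snprintf() wanted to write.
 */
static int pxb_default_uid(char *buf, size_t len, uint8_t bus_nr)
{
    assert(buf != NULL);
    return snprintf(buf, len, "PXB%02X", (unsigned int)bus_nr);
}

/* True if uid is not already taken by one of the n existing bridges. */
static bool pxb_uid_is_unique(const uint32_t *used, size_t n, uint32_t uid)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (used[i] == uid) {
            return false;
        }
    }
    return true;
}
```

Since bus numbers are already unique per expander bridge, the generated strings can't collide with each other.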

> 
> 
> > > --
> > > 
> > > I'm guessing this patch will be somewhat controversial. For early CXL
> > > work, this can be dropped without too much heartache.
> > 
> > Whilst I'm not personally against, this maybe best to drop for now as you
> > say.
> > 

I think it'd be good to understand from the PCIe experts if CXL matches in this
regard. If PCIe generally allows (and does in practice) _UID not matching _BBN,
perhaps this is an overall improvement to the code.

> > > ---
> > >  hw/i386/acpi-build.c|  3 ++-
> > >  hw/pci-bridge/pci_expander_bridge.c | 19 +++
> > >  hw/pci/pci.c| 11 +++
> > >  include/hw/pci/pci.h|  1 +
> > >  include/hw/pci/pci_bus.h|  1 +
> > >  5 files changed, 34 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > index cf6eb54c22..145a503e92 100644
> > > --- a/hw/i386/acpi-build.c
> > > +++ b/hw/i386/acpi-build.c
> > > @@ -1343,6 +1343,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> > >  QLIST_FOREACH(bus, >child, sibling) {
> > >  uint8_t bus_num = pci_bus_num(bus);
> > >  uint8_t numa_node = pci_bus_numa_node(bus);
> > > +int32_t uid = pci_bus_uid(bus);
> > >  
> > >  /* look only for expander root buses */
> > >  if (!pci_bus_is_root(bus)) {
> > > @@ -1356,7 +1357,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> > >  scope = aml_scope("\\_SB");
> > >  dev = aml_device("PC%.02X", bus_num);
> > >  aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
> > > -init_pci_acpi(dev, bus_num, pci_bus_is_express(bus) ? PCIE : 
> > > PCI);
> > > +init_pci_acpi(dev, uid, pci_bus_is_express(bus) ? PCIE : 
> > > PCI);
> > >  
> > >  if (numa_node != NUMA_NODE_UNASSIGNED) {
> > >  aml_append(dev, aml_name_decl("_PXM", 
> > > aml_int(numa_node)));
> > > diff --git a/hw/pci-bridge/pci_expander_bridge.c 
> > > b/hw/pci-bridge/pci_expander_bridge.c
> > > index b42592e1ff..5021b60435 100644
> > > --- a/hw/pci-bridge/pci_expander_bridge.c
> > > +++ b/hw/pci-bridge/pci_expander_bridge.c
> > > @@ -67,6 +67,7 @@ struct PXBDev {

Re: [RFC PATCH 1/4] include/standard-headers/linux/pci_regs: temp hack to add necessary DOE definitions.

2021-02-02 Thread Ben Widawsky

On 21-02-01 23:16:26, Jonathan Cameron wrote:
> Signed-off-by: Jonathan Cameron 
> ---
>  include/standard-headers/linux/pci_regs.h | 33 ++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/include/standard-headers/linux/pci_regs.h 
> b/include/standard-headers/linux/pci_regs.h
> index e709ae8235..7e852d3dd0 100644
> --- a/include/standard-headers/linux/pci_regs.h
> +++ b/include/standard-headers/linux/pci_regs.h
> @@ -730,7 +730,8 @@
>  #define PCI_EXT_CAP_ID_DVSEC 0x23/* Designated Vendor-Specific */
>  #define PCI_EXT_CAP_ID_DLF   0x25/* Data Link Feature */
>  #define PCI_EXT_CAP_ID_PL_16GT   0x26/* Physical Layer 16.0 GT/s */
> -#define PCI_EXT_CAP_ID_MAX   PCI_EXT_CAP_ID_PL_16GT
> +#define PCI_EXT_CAP_ID_DOE   0x2E/* Data Object Exchange */
> +#define PCI_EXT_CAP_ID_MAX   PCI_EXT_CAP_ID_DOE
>  
>  #define PCI_EXT_CAP_DSN_SIZEOF   12
>  #define PCI_EXT_CAP_MCAST_ENDPOINT_SIZEOF 40
> @@ -1092,4 +1093,34 @@
>  #define  PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK  0x00F0
>  #define  PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT 4
>  
> +/* Data Object Exchange */
> +#define PCI_DOE_CAP  0x04
> +#define  PCI_DOE_CAP_INT_SUPPORT 0x0001
> +#define  PCI_DOE_CAP_INT_MSG_NUM 0x0FFE
> +
> +#define PCI_DOE_CTRL 0x08
> +#define  PCI_DOE_CTRL_DOE_ABORT  0x0001
> +#define  PCI_DOE_CTRL_DOE_INT_EN 0x0002
> +#define  PCI_DOE_CTRL_DOE_GO 0x8000
> +
> +#define PCI_DOE_STATUS   0x0c
> +#define  PCI_DOE_STATUS_DOE_BUSY 0x0001
> +#define  PCI_DOE_STATUS_INT_STATUS   0x0002
> +#define  PCI_DOE_STATUS_DOE_ERROR0x0004
> +#define  PCI_DOE_STATUS_DATA_OBJECT_READY0x8000
> +
> +#define PCI_DOE_WRITE_MAILBOX0x10
> +#define PCI_DOE_READ_MAILBOX 0x14
> +
> +/* Data Object Format DOE ECN 6.xx.1 */
> +#define PCI_DATA_OBJ_DW0_VID 0x
> +#define PCI_DATA_OBJ_DW0_TYPE0x00ff
> +#define PCI_DATA_OBJ_DW1_LEN 0x0003
> +
> +/* DOE Discover Data Object */
> +#define PCI_DOE_DIS_OBJ_TYPE  0x1
> +#define PCI_DOE_DIS_REQ_D0_DW0_INDEX 0x00ff
> +#define PCI_DOE_DIS_RSP_DO_DW0_VID   0x
> +#define PCI_DOE_DIS_RSP_D0_DW0_PROT  0x00ff
> +#define PCI_DOE_DIS_RSP_D0_DW0_NEXT_INDEX0xff00
>  #endif /* LINUX_PCI_REGS_H */

I think a lot of these should have had _MASK at the end.

As for the accuracy of the values, lgtm.

Re: [RFC PATCH v3 21/31] hw/cxl/device: Add a memory device (8.2.8.5)

2021-02-02 Thread Ben Widawsky

On 21-02-02 08:26:14, Eric Blake wrote:
> On 2/1/21 6:59 PM, Ben Widawsky wrote:
> > A CXL memory device (AKA Type 3) is a CXL component that contains some
> > combination of volatile and persistent memory. It also implements the
> > previously defined mailbox interface as well as the memory device
> > firmware interface.
> > 
> > Although the memory device is configured like a normal PCIe device, the
> > memory traffic is on an entirely separate bus conceptually (using the
> > same physical wires as PCIe, but different protocol).
> > 
> > The guest physical address for the memory device is part of a larger
> > window which is owned by the platform. Currently, this is hardcoded as
> > an object property on host bridge (PXB) creation, but that will need to
> > change for interleaving.
> > 
> > The following example will create a 256M device in a 512M window:
> > -object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
> > -device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"
> > 
> > Signed-off-by: Ben Widawsky 
> > ---
> 
> > +++ b/qapi/machine.json
> > @@ -1394,6 +1394,7 @@
> >  { 'union': 'MemoryDeviceInfo',
> >'data': { 'dimm': 'PCDIMMDeviceInfo',
> >  'nvdimm': 'PCDIMMDeviceInfo',
> > +'cxl': 'PCDIMMDeviceInfo',
> >  'virtio-pmem': 'VirtioPMEMDeviceInfo',
> >  'virtio-mem': 'VirtioMEMDeviceInfo'
> >}
> 
> Missing documentation that 'cxl' was introduced in 6.0.  Also, is it
> worth keeping the branches of this union in lexicographic order?
> 

Sure.

As discussed on the list previously, I think more thought needs to be put in
here, and I could really use some input.

A CXL type3 memory device can have both persistent and volatile capacity. As
such, I believe a single PCDIMMDeviceInfo is insufficient. The current code
supports persistent memory only, so this is fine for now.

I'd guess my best bet is to create a new CXLType3DeviceInfo, but I'm not
entirely sure of all the implications that has.

Any advice?

> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
> 
>

[RFC PATCH v3 31/31] WIP: i386/cxl: Initialize a host bridge

2021-02-01 Thread Ben Widawsky

This patch allows initializing the primary host bridge as a CXL-capable
host bridge.

Signed-off-by: Ben Widawsky 

--
This patch is WIP.
---
 hw/arm/virt.c|  1 +
 hw/core/machine.c| 26 ++
 hw/i386/acpi-build.c |  8 +++-
 hw/i386/microvm.c|  1 +
 hw/i386/pc.c |  1 +
 hw/ppc/spapr.c   |  2 ++
 include/hw/boards.h  |  2 ++
 include/hw/cxl/cxl.h |  4 
 8 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 399da73454..fd5f5b656c 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2547,6 +2547,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
 hc->unplug_request = virt_machine_device_unplug_request_cb;
 hc->unplug = virt_machine_device_unplug_cb;
 mc->nvdimm_supported = true;
+mc->cxl_supported = false;
 mc->auto_enable_numa_with_memhp = true;
 mc->auto_enable_numa_with_memdev = true;
 mc->default_ram_id = "mach-virt.ram";
diff --git a/hw/core/machine.c b/hw/core/machine.c
index de3b8f1b31..c739803854 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -30,6 +30,7 @@
 #include "sysemu/qtest.h"
 #include "hw/pci/pci.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/cxl/cxl.h"
 #include "migration/global_state.h"
 #include "migration/vmstate.h"
 
@@ -502,6 +503,20 @@ static void machine_set_nvdimm_persistence(Object *obj, const char *value,
 nvdimms_state->persistence_string = g_strdup(value);
 }
 
+static bool machine_get_cxl(Object *obj, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+return ms->cxl_devices_state->is_enabled;
+}
+
+static void machine_set_cxl(Object *obj, bool value, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+ms->cxl_devices_state->is_enabled = value;
+}
+
 void machine_class_allow_dynamic_sysbus_dev(MachineClass *mc, const char *type)
 {
 QAPI_LIST_PREPEND(mc->allowed_dynamic_sysbus_devices, g_strdup(type));
@@ -903,6 +918,16 @@ static void machine_initfn(Object *obj)
 "Valid values are cpu, mem-ctrl");
 }
 
+if (mc->cxl_supported) {
+Object *obj = OBJECT(ms);
+
+ms->cxl_devices_state = g_new0(CXLState, 1);
+object_property_add_bool(obj, "cxl", machine_get_cxl, machine_set_cxl);
+object_property_set_description(obj, "cxl",
+"Set on/off to enable/disable "
+"CXL instantiation");
+}
+
 if (mc->cpu_index_to_instance_props && mc->get_default_cpu_node_id) {
 ms->numa_state = g_new0(NumaState, 1);
 object_property_add_bool(obj, "hmat",
@@ -939,6 +964,7 @@ static void machine_finalize(Object *obj)
 g_free(ms->device_memory);
 g_free(ms->nvdimms_state);
 g_free(ms->numa_state);
+g_free(ms->cxl_devices_state);
 }
 
 bool machine_usb(MachineState *machine)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 7706856c49..2250e6d27b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -53,6 +53,7 @@
 #include "sysemu/numa.h"
 #include "sysemu/reset.h"
 #include "hw/hyperv/vmbus-bridge.h"
+#include "hw/cxl/cxl.h"
 
 /* Supported chipsets: */
 #include "hw/southbridge/piix.h"
@@ -1277,8 +1278,13 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 build_piix4_pci0_int(dsdt);
 } else {
 sb_scope = aml_scope("_SB");
+/*
+ * XXX: CXL spec calls this "CXL0", but that would require lots of
+ * changes throughout and so even for CXL enabled, we call it "PCI0"
+ */
 dev = aml_device("PCI0");
-init_pci_acpi(dev, 0, PCIE);
+init_pci_acpi(dev, 0,
+machine->cxl_devices_state->is_enabled ? CXL : PCIE);
 aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
 aml_append(sb_scope, dev);
 
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index edf2b0f061..970b299a69 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -688,6 +688,7 @@ static void microvm_class_init(ObjectClass *oc, void *data)
 mc->auto_enable_numa_with_memdev = false;
 mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
 mc->nvdimm_supported = false;
+mc->cxl_supported = false;
 mc->default_ram_id = "microvm.ram";
 
 /* Avoid relying too much on kernel components */
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5d41809b37..7350eeea9c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1725,6 +1725,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 hc->unplug = pc_machine_device_unplug_cb;
 mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
 mc->nvd

[RFC PATCH v3 27/31] hw/cxl/device: Add some trivial commands

2021-02-01 Thread Ben Widawsky

GET_FW_INFO and GET_PARTITION_INFO, for this emulation, are equivalent to
info already returned in the IDENTIFY command. To have a more robust
implementation, add those.

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-mailbox-utils.c | 65 ++
 1 file changed, 65 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index f92dfad882..dc8e0eb08e 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -43,6 +43,8 @@ enum {
 #define CLEAR_RECORDS   0x1
 #define GET_INTERRUPT_POLICY   0x2
 #define SET_INTERRUPT_POLICY   0x3
+FIRMWARE_UPDATE = 0x02,
+#define GET_INFO  0x0
 TIMESTAMP   = 0x03,
 #define GET   0x0
 #define SET   0x1
@@ -51,6 +53,8 @@ enum {
 #define GET_LOG   0x1
 IDENTIFY= 0x40,
 #define MEMORY_DEVICE 0x0
+CCLS= 0x41,
+#define GET_PARTITION_INFO 0x0
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -125,11 +129,13 @@ define_mailbox_handler_zeroed(EVENTS_GET_RECORDS, 0x20);
 define_mailbox_handler_nop(EVENTS_CLEAR_RECORDS);
 define_mailbox_handler_zeroed(EVENTS_GET_INTERRUPT_POLICY, 4);
 define_mailbox_handler_nop(EVENTS_SET_INTERRUPT_POLICY);
+declare_mailbox_handler(FIRMWARE_UPDATE_GET_INFO);
 declare_mailbox_handler(TIMESTAMP_GET);
 declare_mailbox_handler(TIMESTAMP_SET);
 declare_mailbox_handler(LOGS_GET_SUPPORTED);
 declare_mailbox_handler(LOGS_GET_LOG);
 declare_mailbox_handler(IDENTIFY_MEMORY_DEVICE);
+declare_mailbox_handler(CCLS_GET_PARTITION_INFO);
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -143,15 +149,50 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 CXL_CMD(EVENTS, CLEAR_RECORDS, ~0, IMMEDIATE_LOG_CHANGE),
 CXL_CMD(EVENTS, GET_INTERRUPT_POLICY, 0, 0),
 CXL_CMD(EVENTS, SET_INTERRUPT_POLICY, 4, IMMEDIATE_CONFIG_CHANGE),
+CXL_CMD(FIRMWARE_UPDATE, GET_INFO, 0, 0),
 CXL_CMD(TIMESTAMP, GET, 0, 0),
 CXL_CMD(TIMESTAMP, SET, 8, IMMEDIATE_POLICY_CHANGE),
 CXL_CMD(LOGS, GET_SUPPORTED, 0, 0),
 CXL_CMD(LOGS, GET_LOG, 0x18, 0),
 CXL_CMD(IDENTIFY, MEMORY_DEVICE, 0, 0),
+CXL_CMD(CCLS, GET_PARTITION_INFO, 0, 0),
 };
 
 #undef CXL_CMD
 
+/*
+ * 8.2.9.2.1
+ */
+define_mailbox_handler(FIRMWARE_UPDATE_GET_INFO)
+{
+struct {
+uint8_t slots_supported;
+uint8_t slot_info;
+uint8_t caps;
+uint8_t rsvd[0xd];
+char fw_rev1[0x10];
+char fw_rev2[0x10];
+char fw_rev3[0x10];
+char fw_rev4[0x10];
+} __attribute__((packed)) *fw_info;
+_Static_assert(sizeof(*fw_info) == 0x50, "Bad firmware info size");
+
+if (memory_region_size(cxl_dstate->pmem) < (256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+fw_info = (void *)cmd->payload;
+memset(fw_info, 0, sizeof(*fw_info));
+
+fw_info->slots_supported = 2;
+fw_info->slot_info = BIT(0) | BIT(3);
+fw_info->caps = 0;
+snprintf(fw_info->fw_rev1, 0x10, "BWFW VERSION %02d", 0);
+
+*len = sizeof(*fw_info);
+return CXL_MBOX_SUCCESS;
+}
+
 /*
  * 8.2.9.3.1
  */
@@ -296,6 +337,30 @@ define_mailbox_handler(IDENTIFY_MEMORY_DEVICE)
 return CXL_MBOX_SUCCESS;
 }
 
+define_mailbox_handler(CCLS_GET_PARTITION_INFO)
+{
+struct {
+uint64_t active_vmem;
+uint64_t active_pmem;
+uint64_t next_vmem;
+uint64_t next_pmem;
+} __attribute__((packed)) *part_info = (void *)cmd->payload;
+_Static_assert(sizeof(*part_info) == 0x20, "Bad get partition info size");
+
+if (memory_region_size(cxl_dstate->pmem) < (256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+/* PMEM only */
+part_info->active_vmem = 0;
+part_info->next_vmem = 0;
+part_info->active_pmem = memory_region_size(cxl_dstate->pmem);
+part_info->next_pmem = part_info->active_pmem;
+
+*len = sizeof(*part_info);
+return CXL_MBOX_SUCCESS;
+}
+
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
 uint16_t ret = CXL_MBOX_SUCCESS;
-- 
2.30.0

[RFC PATCH v3 23/31] acpi/cxl: Add _OSC implementation (9.14.2)

2021-02-01 Thread Ben Widawsky

CXL 2.0 specification adds 2 new dwords to the existing _OSC definition
from PCIe. The new dwords are accessed with a new uuid. This
implementation supports what is in the specification.

We are currently working on a new definition for _OSC. See later work for an
explanation.

Signed-off-by: Ben Widawsky 
---
 hw/acpi/Kconfig   |   5 ++
 hw/acpi/cxl.c | 104 ++
 hw/acpi/meson.build   |   1 +
 hw/i386/acpi-build.c  |  12 -
 include/hw/acpi/cxl.h |  23 ++
 5 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 hw/acpi/cxl.c
 create mode 100644 include/hw/acpi/cxl.h

diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 1932f66af8..b27907953e 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -5,6 +5,7 @@ config ACPI_X86
 bool
 select ACPI
 select ACPI_NVDIMM
+select ACPI_CXL
 select ACPI_CPU_HOTPLUG
 select ACPI_MEMORY_HOTPLUG
 select ACPI_HMAT
@@ -42,3 +43,7 @@ config ACPI_VMGENID
 depends on PC
 
 config ACPI_HW_REDUCED
+
+config ACPI_CXL
+bool
+depends on ACPI
diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
new file mode 100644
index 00..7124d5a1a3
--- /dev/null
+++ b/hw/acpi/cxl.c
@@ -0,0 +1,104 @@
+/*
+ * CXL ACPI Implementation
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "hw/cxl/cxl.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "hw/acpi/cxl.h"
+#include "qapi/error.h"
+#include "qemu/uuid.h"
+
+static Aml *__build_cxl_osc_method(void)
+{
+Aml *method, *if_uuid, *else_uuid, *if_arg1_not_1, *if_cxl, *if_caps_masked;
+Aml *a_ctrl = aml_local(0);
+Aml *a_cdw1 = aml_name("CDW1");
+
+method = aml_method("_OSC", 4, AML_NOTSERIALIZED);
+aml_append(method, aml_create_dword_field(aml_arg(3), aml_int(0), "CDW1"));
+
+/* 9.14.2.1.4 */
+if_uuid = aml_if(
+aml_lor(aml_equal(aml_arg(0),
+  aml_touuid("33DB4D5B-1FF7-401C-9657-7441C03DD766")),
+aml_equal(aml_arg(0),
+  aml_touuid("68F2D50B-C469-4D8A-BD3D-941A103FD3FC"))));
+aml_append(if_uuid, aml_create_dword_field(aml_arg(3), aml_int(4), "CDW2"));
+aml_append(if_uuid, aml_create_dword_field(aml_arg(3), aml_int(8), "CDW3"));
+
+aml_append(if_uuid, aml_store(aml_name("CDW3"), a_ctrl));
+
+/* This is all the same as what's used for PCIe */
+aml_append(if_uuid,
+   aml_and(aml_name("CTRL"), aml_int(0x1F), aml_name("CTRL")));
+
+if_arg1_not_1 = aml_if(aml_lnot(aml_equal(aml_arg(1), aml_int(0x1))));
+/* Unknown revision */
+aml_append(if_arg1_not_1, aml_or(a_cdw1, aml_int(0x08), a_cdw1));
+aml_append(if_uuid, if_arg1_not_1);
+
+if_caps_masked = aml_if(aml_lnot(aml_equal(aml_name("CDW3"), a_ctrl)));
+/* Capability bits were masked */
+aml_append(if_caps_masked, aml_or(a_cdw1, aml_int(0x10), a_cdw1));
+aml_append(if_uuid, if_caps_masked);
+
+aml_append(if_uuid, aml_store(aml_name("CDW2"), aml_name("SUPP")));
+aml_append(if_uuid, aml_store(aml_name("CDW3"), aml_name("CTRL")));
+
+if_cxl = aml_if(aml_equal(
+aml_arg(0), aml_touuid("68F2D50B-C469-4D8A-BD3D-941A103FD3FC")));
+/* CXL support field */
+aml_append(if_cxl, aml_create_dword_field(aml_arg(3), aml_int(12), "CDW4"));
+/* CXL capabilities */
+aml_append(if_cxl, aml_create_dword_field(aml_arg(3), aml_int(16), "CDW5"));
+aml_append(if_cxl, aml_store(aml_name("CDW4"), aml_name("SUPC")));
+aml_append(if_cxl, aml_store(aml_name("CDW5"), aml_name("CTRC")));
+
+/* CXL 2.0 Port/Device Register access */
+aml_append(if_cxl,
+   aml_or(aml_name("CDW5"), aml_int(0x1), aml_name("CDW5")));
+aml_append(if_uuid, if_cxl);
+
+/* Update DWORD3 (the return value) */
+aml_append(if_uuid, aml_store(a_ctrl, aml_name("CDW3")));
+
+aml_append(if_u

[RFC PATCH v3 30/31] qtest/cxl: Add very basic sanity tests

2021-02-01 Thread Ben Widawsky

Signed-off-by: Ben Widawsky 
---
 tests/qtest/cxl-test.c  | 93 +
 tests/qtest/meson.build |  4 ++
 2 files changed, 97 insertions(+)
 create mode 100644 tests/qtest/cxl-test.c

diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
new file mode 100644
index 00..00eca14faa
--- /dev/null
+++ b/tests/qtest/cxl-test.c
@@ -0,0 +1,93 @@
+/*
+ * QTest testcase for CXL
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest-single.h"
+
+#define QEMU_PXB_CMD "-machine q35 -object memory-backend-file,id=cxl-mem1," \
+ "share,mem-path=%s,size=512M "  \
+ "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,"  \
+ "len-window-base=1,window-base[0]=0x4c000,memdev[0]=cxl-mem1"
+#define QEMU_RP "-device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0"
+
+#define QEMU_T3D "-device cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"
+
+static void cxl_basic_hb(void)
+{
+qtest_start("-machine q35,cxl");
+qtest_end();
+}
+
+static void cxl_basic_pxb(void)
+{
+qtest_start("-machine q35 -device pxb-cxl,bus=pcie.0,uid=0");
+qtest_end();
+}
+
+static void cxl_pxb_with_window(void)
+{
+GString *cmdline;
+char template[] = "/tmp/cxl-test-XXXXXX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+cmdline = g_string_new(NULL);
+g_string_printf(cmdline, QEMU_PXB_CMD, tmpfs);
+
+qtest_start(cmdline->str);
+qtest_end();
+
+g_string_free(cmdline, TRUE);
+}
+
+static void cxl_root_port(void)
+{
+GString *cmdline;
+char template[] = "/tmp/cxl-test-XXXXXX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+cmdline = g_string_new(NULL);
+g_string_printf(cmdline, QEMU_PXB_CMD " %s", tmpfs, QEMU_RP);
+
+qtest_start(cmdline->str);
+qtest_end();
+
+g_string_free(cmdline, TRUE);
+}
+
+static void cxl_t3d(void)
+{
+GString *cmdline;
+char template[] = "/tmp/cxl-test-XXXXXX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+cmdline = g_string_new(NULL);
+g_string_printf(cmdline, QEMU_PXB_CMD " %s %s", tmpfs, QEMU_RP, QEMU_T3D);
+
+qtest_start(cmdline->str);
+qtest_end();
+
+g_string_free(cmdline, TRUE);
+}
+
+int main(int argc, char **argv)
+{
+g_test_init(&argc, &argv, NULL);
+
+qtest_add_func("/pci/cxl/basic_hostbridge", cxl_basic_hb);
+qtest_add_func("/pci/cxl/basic_pxb", cxl_basic_pxb);
+qtest_add_func("/pci/cxl/pxb_with_window", cxl_pxb_with_window);
+qtest_add_func("/pci/cxl/root_port", cxl_root_port);
+qtest_add_func("/pci/cxl/type3_device", cxl_t3d);
+
+return g_test_run();
+}
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index c83bc211b6..554152b7c5 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -22,6 +22,9 @@ qtests_pci = \
   (config_all_devices.has_key('CONFIG_VGA') ? ['display-vga-test'] : []) + \
   (config_all_devices.has_key('CONFIG_IVSHMEM_DEVICE') ? ['ivshmem-test'] : [])
 
+qtests_cxl = \
+  (config_all_devices.has_key('CONFIG_CXL') ? ['cxl-test'] : [])
+
 qtests_i386 = \
   (slirp.found() ? ['pxe-test', 'test-netfilter'] : []) + \
   (config_host.has_key('CONFIG_POSIX') ? ['test-filter-mirror'] : []) + \
@@ -48,6 +51,7 @@ qtests_i386 = \
   (config_all_devices.has_key('CONFIG_TPM_TIS_ISA') ? ['tpm-tis-swtpm-test'] : []) + \
   (config_all_devices.has_key('CONFIG_RTL8139_PCI') ? ['rtl8139-test'] : []) + \
   qtests_pci + \
+  qtests_cxl + \
   ['fdc-test',
'ide-test',
'hd-geo-test',
-- 
2.30.0

[RFC PATCH v3 21/31] hw/cxl/device: Add a memory device (8.2.8.5)

2021-02-01 Thread Ben Widawsky

A CXL memory device (AKA Type 3) is a CXL component that contains some
combination of volatile and persistent memory. It also implements the
previously defined mailbox interface as well as the memory device
firmware interface.

Although the memory device is configured like a normal PCIe device, the
memory traffic is on an entirely separate bus conceptually (using the
same physical wires as PCIe, but different protocol).

The guest physical address for the memory device is part of a larger
window which is owned by the platform. Currently, this is hardcoded as
an object property on host bridge (PXB) creation, but that will need to
change for interleaving.

The following example will create a 256M device in a 512M window:
-object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
-device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"
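
The relationship between the device size and the window size in the example
above can be sketched as a simple containment check. This is a minimal
illustration only: the helper name cxl_dev_fits_window() and the notion of a
device offset inside the window are assumptions made for the sketch, not
something this patch implements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/*
 * Hypothetical helper (not part of the patch): returns true if a device of
 * dev_size bytes, placed dev_offset bytes into a window of window_size
 * bytes, lies entirely inside that window.
 */
static bool cxl_dev_fits_window(uint64_t window_size, uint64_t dev_offset,
                                uint64_t dev_size)
{
    assert(window_size > 0);

    if (dev_size == 0 || dev_offset > window_size) {
        return false;
    }

    /* Compare against the remaining space so offset + size cannot overflow. */
    return dev_size <= window_size - dev_offset;
}
```

With these assumptions, a 256M device at offset 0 of a 512M window fits,
while the same device at offset 384M does not.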

Signed-off-by: Ben Widawsky 
---
 hw/core/numa.c |   3 +
 hw/cxl/cxl-mailbox-utils.c |  41 ++
 hw/i386/pc.c   |   1 +
 hw/mem/Kconfig |   5 +
 hw/mem/cxl_type3.c | 281 +
 hw/mem/meson.build |   1 +
 hw/pci/pcie.c  |  30 
 include/hw/cxl/cxl.h   |   2 +
 include/hw/cxl/cxl_pci.h   |  22 +++
 include/hw/pci/pci_ids.h   |   1 +
 monitor/hmp-cmds.c |  15 ++
 qapi/machine.json  |   1 +
 12 files changed, 403 insertions(+)
 create mode 100644 hw/mem/cxl_type3.c

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 68cee65f61..cd7df371e6 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -770,6 +770,9 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
 node_mem[pcdimm_info->node].node_plugged_mem +=
 pcdimm_info->size;
 break;
+case MEMORY_DEVICE_INFO_KIND_CXL:
+/* FINISHME */
+break;
 case MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM:
 vpi = value->u.virtio_pmem.data;
 /* TODO: once we support numa, assign to right node */
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 3f0ae8b9e5..f92dfad882 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -49,6 +49,8 @@ enum {
 LOGS= 0x04,
 #define GET_SUPPORTED 0x0
 #define GET_LOG   0x1
+IDENTIFY= 0x40,
+#define MEMORY_DEVICE 0x0
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -127,6 +129,7 @@ declare_mailbox_handler(TIMESTAMP_GET);
 declare_mailbox_handler(TIMESTAMP_SET);
 declare_mailbox_handler(LOGS_GET_SUPPORTED);
 declare_mailbox_handler(LOGS_GET_LOG);
+declare_mailbox_handler(IDENTIFY_MEMORY_DEVICE);
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -144,6 +147,7 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 CXL_CMD(TIMESTAMP, SET, 8, IMMEDIATE_POLICY_CHANGE),
 CXL_CMD(LOGS, GET_SUPPORTED, 0, 0),
 CXL_CMD(LOGS, GET_LOG, 0x18, 0),
+CXL_CMD(IDENTIFY, MEMORY_DEVICE, 0, 0),
 };
 
 #undef CXL_CMD
@@ -255,6 +259,43 @@ define_mailbox_handler(LOGS_GET_LOG)
 return CXL_MBOX_SUCCESS;
 }
 
+/* 8.2.9.5.1.1 */
+define_mailbox_handler(IDENTIFY_MEMORY_DEVICE)
+{
+struct {
+char fw_revision[0x10];
+uint64_t total_capacity;
+uint64_t volatile_capacity;
+uint64_t persistent_capacity;
+uint64_t partition_align;
+uint16_t info_event_log_size;
+uint16_t warning_event_log_size;
+uint16_t failure_event_log_size;
+uint16_t fatal_event_log_size;
+uint32_t lsa_size;
+uint8_t poison_list_max_mer[3];
+uint16_t inject_poison_limit;
+uint8_t poison_caps;
+uint8_t qos_telemetry_caps;
+} __attribute__((packed)) *id;
+_Static_assert(sizeof(*id) == 0x43, "Bad identify size");
+
+if (memory_region_size(cxl_dstate->pmem) < (256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+id = (void *)cmd->payload;
+memset(id, 0, sizeof(*id));
+
+/* PMEM only */
+snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
+id->total_capacity = memory_region_size(cxl_dstate->pmem);
+id->persistent_capacity = memory_region_size(cxl_dstate->pmem);
+
+*len = sizeof(*id);
+return CXL_MBOX_SUCCESS;
+}
+
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
 uint16_t ret = CXL_MBOX_SUCCESS;
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5458f61d10..5d41809b37 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -79,6 +79,7 @@
 #include "acpi-build.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/cxl/cxl.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-common.h"
 #include "qapi/visitor.h"
diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
index a0ef2cf648..7d9d1ced3e 100644
---

[RFC PATCH v3 19/31] hw/pxb/cxl: Add "windows" for host bridges

2021-02-01 Thread Ben Widawsky

In a bare metal CXL capable system, system firmware will program
physical address ranges on the host. This is done by programming
internal registers that aren't typically known to the OS. These address
ranges might be contiguous or interleaved across host bridges.

For a QEMU guest a new construct is introduced allowing passing a memory
backend to the host bridge for this same purpose. Each memory backend
needs to be passed to the host bridge as well as any device that will be
emulating that memory (not implemented here).

I'm hopeful the interleaving work in the link can be re-purposed here
(see Link).

An example to create a host bridge with a 512M window at 0x4c000:
 -object memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M
 -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-memory-base=1,memory-base\[0\]=0x4c000,memory\[0\]=cxl-mem1
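
To make the idea of per-host-bridge windows concrete, here is a small sketch
of how a guest physical address could be matched against a set of fixed
windows. The struct and function names are hypothetical and the layout is an
assumption for illustration; the patch itself stores window bases and memory
backends on the PXB device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one fixed memory window. */
struct sketch_cxl_window {
    uint64_t base; /* guest physical base address */
    uint64_t size; /* window length in bytes; 0 means "unused slot" */
};

/*
 * Returns the index of the window containing addr, or -1 if no window
 * covers it. Unused slots (size == 0) are skipped.
 */
static int sketch_cxl_find_window(const struct sketch_cxl_window *windows,
                                  size_t count, uint64_t addr)
{
    assert(windows != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (windows[i].size == 0) {
            continue;
        }
        /* addr - base cannot underflow once addr >= base is known. */
        if (addr >= windows[i].base &&
            addr - windows[i].base < windows[i].size) {
            return (int)i;
        }
    }
    return -1;
}
```

Interleaving across host bridges would need more than this linear lookup,
which is one reason the hardcoded window scheme is expected to change.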

Link: https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg03680.html
Signed-off-by: Ben Widawsky 
---
 hw/pci-bridge/pci_expander_bridge.c | 65 +++--
 include/hw/cxl/cxl.h|  1 +
 2 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index 226a8a5fff..af1450c69d 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -69,12 +69,19 @@ struct PXBDev {
 uint8_t bus_nr;
 uint16_t numa_node;
 int32_t uid;
+struct cxl_dev {
+HostMemoryBackend *memory_window[CXL_WINDOW_MAX];
+
+uint32_t num_windows;
+hwaddr *window_base[CXL_WINDOW_MAX];
+} cxl;
 };
 
 typedef struct CXLHost {
 PCIHostState parent_obj;
 
 CXLComponentState cxl_cstate;
+PXBDev *dev;
 } CXLHost;
 
 static PXBDev *convert_to_pxb(PCIDevice *dev)
@@ -213,16 +220,31 @@ static void pxb_cxl_realize(DeviceState *dev, Error 
**errp)
 SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
 PCIHostState *phb = PCI_HOST_BRIDGE(dev);
 CXLHost *cxl = PXB_CXL_HOST(dev);
+struct cxl_dev *cxl_dev = &cxl->dev->cxl;
 CXLComponentState *cxl_cstate = &cxl->cxl_cstate;
 struct MemoryRegion *mr = &cxl_cstate->crb.component_registers;
+int uid = pci_bus_uid(phb->bus);
 
 cxl_component_register_block_init(OBJECT(dev), cxl_cstate,
   TYPE_PXB_CXL_HOST);
 sysbus_init_mmio(sbd, mr);
 
-/* FIXME: support multiple host bridges. */
-sysbus_mmio_map(sbd, 0, CXL_HOST_BASE +
-memory_region_size(mr) * pci_bus_uid(phb->bus));
+sysbus_mmio_map(sbd, 0, CXL_HOST_BASE + memory_region_size(mr) * uid);
+
+/*
+ * A CXL host bridge can exist without a fixed memory window, but it would
+ * only operate in legacy PCIe mode.
+ */
+if (!cxl_dev->memory_window[uid]) {
+warn_report(
+"CXL expander bridge created without window. Consider using %s",
+"memdev[0]=");
+return;
+}
+
+mr = host_memory_backend_get_memory(cxl_dev->memory_window[uid]);
+sysbus_init_mmio(sbd, mr);
+sysbus_mmio_map(sbd, 1 + uid, *cxl_dev->window_base[uid]);
 }
 
 static void pxb_cxl_host_class_init(ObjectClass *class, void *data)
@@ -328,6 +350,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, enum 
BusType type,
 } else if (type == CXL) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_CXL_BUS);
 bus->flags |= PCI_BUS_CXL;
+PXB_CXL_HOST(ds)->dev = PXB_CXL_DEV(dev);
 } else {
 bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, 
TYPE_PXB_BUS);
 bds = qdev_new("pci-bridge");
@@ -389,6 +412,8 @@ static Property pxb_dev_properties[] = {
 DEFINE_PROP_UINT8("bus_nr", PXBDev, bus_nr, 0),
 DEFINE_PROP_UINT16("numa_node", PXBDev, numa_node, NUMA_NODE_UNASSIGNED),
 DEFINE_PROP_INT32("uid", PXBDev, uid, -1),
+DEFINE_PROP_ARRAY("window-base", PXBDev, cxl.num_windows, cxl.window_base,
+  qdev_prop_uint64, hwaddr),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -460,7 +485,9 @@ static const TypeInfo pxb_pcie_dev_info = {
 
 static void pxb_cxl_dev_realize(PCIDevice *dev, Error **errp)
 {
-PXBDev *pxb = convert_to_pxb(dev);
+PXBDev *pxb = PXB_CXL_DEV(dev);
+struct cxl_dev *cxl = &pxb->cxl;
+int count = 0;
 
 /* A CXL PXB's parent bus is still PCIe */
 if (!pci_bus_is_express(pci_get_bus(dev))) {
@@ -476,6 +503,23 @@ static void pxb_cxl_dev_realize(PCIDevice *dev, Error 
**errp)
 /* FIXME: Check that uid doesn't collide with UIDs of other host bridges */
 
 pxb_dev_realize_common(dev, CXL, errp);
+
+for (unsigned i = 0; i < CXL_WINDOW_MAX; i++) {
+if (!cxl->memory_window[i]) {
+continue;
+}
+
+count++;
+}
+
+if (!count) {
+warn_report("memory-windows should be set when creating CXL host 
bridg

[RFC PATCH v3 29/31] hw/cxl/device: Implement get/set LSA

2021-02-01 Thread Ben Widawsky

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-mailbox-utils.c  | 50 +
 hw/mem/cxl_type3.c  | 56 -
 include/hw/cxl/cxl_device.h |  9 ++
 3 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 2637250c7b..c133cf0341 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -55,6 +55,8 @@ enum {
 #define MEMORY_DEVICE 0x0
 CCLS= 0x41,
 #define GET_PARTITION_INFO 0x0
+#define GET_LSA   0x2
+#define SET_LSA   0x3
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -136,8 +138,11 @@ declare_mailbox_handler(LOGS_GET_SUPPORTED);
 declare_mailbox_handler(LOGS_GET_LOG);
 declare_mailbox_handler(IDENTIFY_MEMORY_DEVICE);
 declare_mailbox_handler(CCLS_GET_PARTITION_INFO);
+declare_mailbox_handler(CCLS_GET_LSA);
+declare_mailbox_handler(CCLS_SET_LSA);
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
 
@@ -156,6 +161,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 CXL_CMD(LOGS, GET_LOG, 0x18, 0),
 CXL_CMD(IDENTIFY, MEMORY_DEVICE, 0, 0),
 CXL_CMD(CCLS, GET_PARTITION_INFO, 0, 0),
+CXL_CMD(CCLS, GET_LSA, 0, 0),
+CXL_CMD(CCLS, SET_LSA, ~0, IMMEDIATE_CONFIG_CHANGE | 
IMMEDIATE_DATA_CHANGE),
 };
 
 #undef CXL_CMD
@@ -365,6 +372,49 @@ define_mailbox_handler(CCLS_GET_PARTITION_INFO)
 return CXL_MBOX_SUCCESS;
 }
 
+define_mailbox_handler(CCLS_GET_LSA)
+{
+struct {
+uint32_t offset;
+uint32_t length;
+} __attribute__((packed, __aligned__(16))) *get_lsa;
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_DEV_GET_CLASS(ct3d);
+uint32_t offset, length;
+
+get_lsa = (void *)cmd->payload;
+offset = get_lsa->offset;
+length = get_lsa->length;
+
+*len = 0;
+if (offset + length > cvc->get_lsa_size(ct3d)) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+*len = cvc->get_lsa(ct3d, get_lsa, length, offset);
+return CXL_MBOX_SUCCESS;
+}
+
+define_mailbox_handler(CCLS_SET_LSA)
+{
+struct {
+uint32_t offset;
+uint32_t rsvd;
+void *data;
+} __attribute__((packed, __aligned__(16))) *set_lsa = (void *)cmd->payload;
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_DEV_GET_CLASS(ct3d);
+uint16_t plen = *len;
+
+*len = 0;
+if ((set_lsa->offset + plen) > cvc->get_lsa_size(ct3d)) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+cvc->set_lsa(ct3d, set_lsa->data, plen, set_lsa->offset);
+return CXL_MBOX_SUCCESS;
+}
+
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
 uint16_t ret = CXL_MBOX_SUCCESS;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 074d1dd41f..d091e645aa 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -8,6 +8,7 @@
 #include "qapi/error.h"
 #include "qemu/log.h"
 #include "qemu/module.h"
+#include "qemu/pmem.h"
 #include "qemu/range.h"
 #include "qemu/rcu.h"
 #include "sysemu/hostmem.h"
@@ -148,6 +149,11 @@ static void cxl_setup_memory(CXLType3Dev *ct3d, Error 
**errp)
 return;
 }
 
+if (!ct3d->lsa) {
+error_setg(errp, "lsa property must be set");
+return;
+}
+
 /* FIXME: need to check mr is the host bridge's MR */
 mr = host_memory_backend_get_memory(ct3d->hostmem);
 
@@ -267,6 +273,8 @@ static Property ct3_props[] = {
 DEFINE_PROP_SIZE("size", CXLType3Dev, size, -1),
 DEFINE_PROP_LINK("memdev", CXLType3Dev, hostmem, TYPE_MEMORY_BACKEND,
  HostMemoryBackend *),
+DEFINE_PROP_LINK("lsa", CXLType3Dev, lsa, TYPE_MEMORY_BACKEND,
+ HostMemoryBackend *),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -297,7 +305,51 @@ static void pc_dimm_md_fill_device_info(const 
MemoryDeviceState *md,
 
 static uint64_t get_lsa_size(CXLType3Dev *ct3d)
 {
-return 0;
+MemoryRegion *mr;
+
+mr = host_memory_backend_get_memory(ct3d->lsa);
+return memory_region_size(mr);
+}
+
+static void validate_lsa_access(MemoryRegion *mr, uint64_t size,
+uint64_t offset)
+{
+assert(offset + size <= memory_region_size(mr));
+assert(offset + size > offset);
+}
+
+static uint64_t get_lsa(CXLType3Dev *ct3d, void *buf, uint64_t size,
+uint64_t offset)
+{
+MemoryRegion *mr;
+void *lsa;
+
+mr = host_memory_backend_get_memory(ct3d->lsa);
+validate_lsa_access(mr, size, offset);
+
+lsa = memory_region_get_ram_ptr(mr) + offset;
+memcpy(buf, lsa, size);
+
+return size;
+}
+
+static void

[RFC PATCH v3 20/31] hw/cxl/rp: Add a root port

2021-02-01 Thread Ben Widawsky

This adds just enough of a root port implementation to be able to
enumerate root ports (creating the required DVSEC entries). What's not
here yet is the MMIO or the ability to write some of the DVSEC entries.

A root port can be added on the QEMU command line by attaching it to a
specific CXL host bridge. For example:
  -device cxl-rp,id=rp0,bus="cxl.0",addr=0.0,chassis=4

Like the host bridge patch, the ACPI tables aren't generated at this
point and so system software cannot use it.

Signed-off-by: Ben Widawsky 
---
 hw/pci-bridge/Kconfig  |   5 +
 hw/pci-bridge/cxl_root_port.c  | 231 +
 hw/pci-bridge/meson.build  |   1 +
 hw/pci-bridge/pcie_root_port.c |   6 +-
 hw/pci/pci.c   |   4 +-
 5 files changed, 245 insertions(+), 2 deletions(-)
 create mode 100644 hw/pci-bridge/cxl_root_port.c

diff --git a/hw/pci-bridge/Kconfig b/hw/pci-bridge/Kconfig
index f8df4315ba..02614f49aa 100644
--- a/hw/pci-bridge/Kconfig
+++ b/hw/pci-bridge/Kconfig
@@ -27,3 +27,8 @@ config DEC_PCI
 
 config SIMBA
 bool
+
+config CXL
+bool
+default y if PCI_EXPRESS && PXB
+depends on PCI_EXPRESS && MSI_NONBROKEN && PXB
diff --git a/hw/pci-bridge/cxl_root_port.c b/hw/pci-bridge/cxl_root_port.c
new file mode 100644
index 00..6c3b215bb3
--- /dev/null
+++ b/hw/pci-bridge/cxl_root_port.c
@@ -0,0 +1,231 @@
+/*
+ * CXL 2.0 Root Port Implementation
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/range.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pcie_port.h"
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
+#include "qapi/error.h"
+#include "hw/cxl/cxl.h"
+
+#define CXL_ROOT_PORT_DID 0x7075
+
+/* Copied from the gen root port which we derive */
+#define GEN_PCIE_ROOT_PORT_AER_OFFSET 0x100
+#define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
+(GEN_PCIE_ROOT_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
+#define CXL_ROOT_PORT_DVSEC_OFFSET \
+(GEN_PCIE_ROOT_PORT_ACS_OFFSET + PCI_ACS_SIZEOF)
+
+typedef struct CXLRootPort {
+/*< private >*/
+PCIESlot parent_obj;
+
+CXLComponentState cxl_cstate;
+PCIResReserve res_reserve;
+} CXLRootPort;
+
+#define TYPE_CXL_ROOT_PORT "cxl-rp"
+DECLARE_INSTANCE_CHECKER(CXLRootPort, CXL_ROOT_PORT, TYPE_CXL_ROOT_PORT)
+
+static void latch_registers(CXLRootPort *crp)
+{
+uint32_t *reg_state = crp->cxl_cstate.crb.cache_mem_registers;
+
+cxl_component_register_init_common(reg_state, CXL2_ROOT_PORT);
+}
+
+static void build_dvsecs(CXLComponentState *cxl)
+{
+uint8_t *dvsec;
+
+dvsec = (uint8_t *)&(struct extensions_dvsec_port){ 0 };
+cxl_component_create_dvsec(cxl, EXTENSIONS_PORT_DVSEC_LENGTH,
+   EXTENSIONS_PORT_DVSEC,
+   EXTENSIONS_PORT_DVSEC_REVID, dvsec);
+
+dvsec = (uint8_t *)&(struct dvsec_port_gpf){
+.rsvd= 0,
+.phase1_ctrl = 1, /* 1μs timeout */
+.phase2_ctrl = 1, /* 1μs timeout */
+};
+cxl_component_create_dvsec(cxl, GPF_PORT_DVSEC_LENGTH, GPF_PORT_DVSEC,
+   GPF_PORT_DVSEC_REVID, dvsec);
+
+dvsec = (uint8_t *)&(struct dvsec_port_flexbus){
+.cap  = 0x26, /* IO, Mem, non-MLD */
+.ctrl = 0,
+.status   = 0x26, /* same */
+.rcvd_mod_ts_data = 0xef, /* WTF? */
+};
+cxl_component_create_dvsec(cxl, PCIE_FLEXBUS_PORT_DVSEC_LENGTH_2_0,
+   PCIE_FLEXBUS_PORT_DVSEC,
+   PCIE_FLEXBUS_PORT_DVSEC_REVID_2_0, dvsec);
+
+dvsec = (uint8_t *)&(struct dvsec_register_locator){
+.rsvd = 0,
+.reg0_base_lo = RBI_COMPONENT_REG | COMPONENT_REG_BAR_IDX,
+.reg0_base_hi = 0,
+};
+cxl_component_create_dvsec(cxl, REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
+   REG_LOC_DVSEC_REVID, dvsec);
+}
+
+static void cxl_rp_realize(DeviceState *dev, Error **errp)
+{
+PCIDevice *pci_dev = PCI_DEVICE(dev);
+PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
+CXLRootPort *crp   = CXL_ROO

[RFC PATCH v3 15/31] tests/acpi: remove stale allowed tables

2021-02-01 Thread Ben Widawsky

 Scope (_SB)
 {
 Device (PCI0)
 {
 Name (_HID, EisaId ("PNP0A03") /* PCI Bus */)  // _HID: Hardware ID
-Name (_ADR, Zero)  // _ADR: Address
 Name (_UID, Zero)  // _UID: Unique ID
+Name (_ADR, Zero)  // _ADR: Address

Signed-off-by: Ben Widawsky 
---
 tests/data/acpi/pc/DSDT | Bin 5065 -> 5065 bytes
 tests/data/acpi/pc/DSDT.acpihmat| Bin 6390 -> 6390 bytes
 tests/data/acpi/pc/DSDT.bridge  | Bin 6924 -> 6924 bytes
 tests/data/acpi/pc/DSDT.cphp| Bin 5529 -> 5529 bytes
 tests/data/acpi/pc/DSDT.dimmpxm | Bin 6719 -> 6719 bytes
 tests/data/acpi/pc/DSDT.hpbridge| Bin 5026 -> 5026 bytes
 tests/data/acpi/pc/DSDT.hpbrroot| Bin 3084 -> 3084 bytes
 tests/data/acpi/pc/DSDT.ipmikcs | Bin 5137 -> 5137 bytes
 tests/data/acpi/pc/DSDT.memhp   | Bin 6424 -> 6424 bytes
 tests/data/acpi/pc/DSDT.numamem | Bin 5071 -> 5071 bytes
 tests/data/acpi/pc/DSDT.roothp  | Bin 5261 -> 5261 bytes
 tests/data/acpi/q35/DSDT| Bin 7801 -> 7801 bytes
 tests/data/acpi/q35/DSDT.acpihmat   | Bin 9126 -> 9126 bytes
 tests/data/acpi/q35/DSDT.bridge | Bin 7819 -> 7819 bytes
 tests/data/acpi/q35/DSDT.cphp   | Bin 8265 -> 8265 bytes
 tests/data/acpi/q35/DSDT.dimmpxm| Bin 9455 -> 9455 bytes
 tests/data/acpi/q35/DSDT.ipmibt | Bin 7876 -> 7876 bytes
 tests/data/acpi/q35/DSDT.memhp  | Bin 9160 -> 9160 bytes
 tests/data/acpi/q35/DSDT.mmio64 | Bin 8932 -> 8932 bytes
 tests/data/acpi/q35/DSDT.numamem| Bin 7807 -> 7807 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |  21 
 21 files changed, 21 deletions(-)

diff --git a/tests/data/acpi/pc/DSDT b/tests/data/acpi/pc/DSDT
index 
f6173df1d598767a79aa34ad7585ad7d45c5d4f3..b516745128e3f1a297b6327e9057026a2d16229c
 100644
GIT binary patch
delta 20
bcmX@9eo}oxJ7=h;3j;^Iqf5}n36{bDOsEE~

delta 20
bcmX@9eo}oxJEx;d5CcbisHe-u36{bDOlAhI

diff --git a/tests/data/acpi/pc/DSDT.acpihmat b/tests/data/acpi/pc/DSDT.acpihmat
index 
67f3f7249eaaa9404ebf0f2d0a324b8c8e3bd445..aeae285c6434ae6cf3c53660e34425727a497871
 100644
GIT binary patch
delta 20
bcmexn_|0%aJ7=h;3j;^Iqf5}n3271lRUHRT

delta 20
bcmexn_|0%aJEx;d5CcbisHe-u3271lRNDtm

diff --git a/tests/data/acpi/pc/DSDT.bridge b/tests/data/acpi/pc/DSDT.bridge
index 
643390f4c4138b37fc481656d3f555d0eeedcb02..4cd26a87dd11d96e10bf6de786b9d56ebfe0a4f9
 100644
GIT binary patch
delta 20
bcmeA%>oJ?q_I!oU=n}MXLX8vvMneXi

delta 20
bcmeA%>oJ?q#J~|B>glp^LX8vvMgaz#

diff --git a/tests/data/acpi/pc/DSDT.cphp b/tests/data/acpi/pc/DSDT.cphp
index 
1ddcf7d8812f5d8d4d38fe7e7b35fd5885806046..fecb784812cbb2308ef58acf4a2c580f56d35c39
 100644
GIT binary patch
delta 20
bcmbQKJyUx^J7=h;3j;^Iqf5}n37nz;MY;wk

delta 20
bcmbQKJyUx^JEx;d5CcbisHe-u37nz;MR*1%

diff --git a/tests/data/acpi/pc/DSDT.dimmpxm b/tests/data/acpi/pc/DSDT.dimmpxm
index 
c44385cc01879324738ffb7f997b8cdd762cbf97..f2c31e150ead16e4931367a6dab42704950a21e9
 100644
GIT binary patch
delta 20
bcmdmQvfpGvJ7=h;3j;^Iqf5}n3F{>RP4WjY

delta 20
bcmdmQvfpGvJEx;d5CcbisHe-u3F{>RO|Sglp^LJcglp^!e3zkMNS6(

diff --git a/tests/data/acpi/q35/DSDT b/tests/data/acpi/q35/DSDT
index 
d25cd7072932886d6967f4023faac1e1fa6e836c..17e2aebde98e0a3161d93e9b2e200737b13699ac
 100644
GIT binary patch
delta 21
dcmexq^V4R+R}_#>)Z#RX+z<

diff --git a/tests/data/acpi/q35/DSDT.acpihmat 
b/tests/data/acpi/q35/DSDT.acpihmat
index 
722e06af83abcde203a2b96a8ec81fd3bab9fc98..7b3d659352a0923822f6a5db1dbd0a6ad853c446
 100644
GIT binary patch
delta 21
dcmZ4HzRZ2XR}__9y`WOK1lA

diff --git a/tests/data/acpi/q35/DSDT.bridge b/tests/data/acpi/q35/DSDT.bridge
index 
06bac139d668ddfc7914e258b471a303c9dbd192..5961b55b1067c3090b2f1f4cd3386d71efee241d
 100644
GIT binary patch
delta 21
ccmeCS?Y5mTdE(4QHja2lmmr4CQjCSN09fk={{R30

delta 19
acmeCS?Y5mTnZ?m1h+*Qy=FL)!g|Yxf4F-?^

diff --git a/tests/data/acpi/q35/DSDT.cphp b/tests/data/acpi/q35/DSDT.cphp
index 
2b933ac482e6883efccbd7d6c96089602f2c0b4d..09c92d52f92bb346ed807945b9638cad958446f8
 100644
GIT binary patch
delta 21
dcmX@R}_>dONFPN@dc

diff --git a/tests/data/acpi/q35/DSDT.dimmpxm b/tests/data/acpi/q35/DSDT.dimmpxm
index 
bd8f8305b028ef20f9b6d1a0c69ac428d027e3d1..1da97afb32dddafefe7f27934acbcb7d56a67489
 100644
GIT binary patch
delta 21
dcmaFw`QCHFR}_UR4GFR)YuH

diff --git a/tests/data/acpi/q35/DSDT.ipmibt b/tests/data/acpi/q35/DSDT.ipmibt
index 
a8f868e23c25688ab1c0371016c071f23e9d732f..c7e68432b66e7b4d03284c882c65bbf3066825dc
 100644
GIT binary patch
delta 21
dcmX?NdR}_u95`+PJ;(K

diff --git a/tests/data/acpi/q35/DSDT.memhp b/tests/data/acpi/q35/DSDT.memhp
index 
9a802e4c67022386442976d5cb997ea3fc57b58f..3af457dd550461b2d2ea85aa85d7740452913b34
 100644
GIT binary patch
delta 21
dcmX@%e!_jiR}_u2

[RFC PATCH v3 28/31] hw/cxl/device: Plumb real LSA sizing

2021-02-01 Thread Ben Widawsky

This should introduce no change. Subsequent work will make use of this
new class member.

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-mailbox-utils.c  |  4 
 hw/mem/cxl_type3.c  | 24 +---
 include/hw/cxl/cxl.h|  1 -
 include/hw/cxl/cxl_device.h | 24 
 4 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index dc8e0eb08e..2637250c7b 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -321,6 +321,9 @@ define_mailbox_handler(IDENTIFY_MEMORY_DEVICE)
 } __attribute__((packed)) *id;
 _Static_assert(sizeof(*id) == 0x43, "Bad identify size");
 
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_DEV_GET_CLASS(ct3d);
+
 if (memory_region_size(cxl_dstate->pmem) < (256 << 20)) {
 return CXL_MBOX_INTERNAL_ERROR;
 }
@@ -332,6 +335,7 @@ define_mailbox_handler(IDENTIFY_MEMORY_DEVICE)
 snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 id->total_capacity = memory_region_size(cxl_dstate->pmem);
 id->persistent_capacity = memory_region_size(cxl_dstate->pmem);
+id->lsa_size = cvc->get_lsa_size(ct3d);
 
 *len = sizeof(*id);
 return CXL_MBOX_SUCCESS;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index fe02c3b63c..074d1dd41f 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -13,21 +13,6 @@
 #include "sysemu/hostmem.h"
 #include "hw/cxl/cxl.h"
 
-typedef struct cxl_type3_dev {
-/* Private */
-PCIDevice parent_obj;
-
-/* Properties */
-uint64_t size;
-HostMemoryBackend *hostmem;
-
-/* State */
-CXLComponentState cxl_cstate;
-CXLDeviceState cxl_dstate;
-} CXLType3Dev;
-
-#define CT3(obj) OBJECT_CHECK(CXLType3Dev, (obj), TYPE_CXL_TYPE3_DEV)
-
 static void build_dvsecs(CXLType3Dev *ct3d)
 {
 CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
@@ -310,11 +295,17 @@ static void pc_dimm_md_fill_device_info(const 
MemoryDeviceState *md,
 info->type = MEMORY_DEVICE_INFO_KIND_CXL;
 }
 
+static uint64_t get_lsa_size(CXLType3Dev *ct3d)
+{
+return 0;
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
 MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(oc);
+CXLType3Class *cvc = CXL_TYPE3_DEV_CLASS(oc);
 
 pc->realize = ct3_realize;
 pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
@@ -332,11 +323,14 @@ static void ct3_class_init(ObjectClass *oc, void *data)
 mdc->fill_device_info = pc_dimm_md_fill_device_info;
 mdc->get_plugged_size = memory_device_get_region_size;
 mdc->set_addr = cxl_md_set_addr;
+
+cvc->get_lsa_size = get_lsa_size;
 }
 
 static const TypeInfo ct3d_info = {
 .name = TYPE_CXL_TYPE3_DEV,
 .parent = TYPE_PCI_DEVICE,
+.class_size = sizeof(struct CXLType3Class),
 .class_init = ct3_class_init,
 .instance_size = sizeof(CXLType3Dev),
 .instance_init = ct3_instance_init,
diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 809ed7de60..c7ca42930f 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -23,4 +23,3 @@
 #define CXL_WINDOW_MAX 10
 
 #endif
-
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index ca5328a581..a79a0f106c 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -219,4 +219,28 @@ REG32(CXL_MEM_DEV_STS, 0)
 FIELD(CXL_MEM_DEV_STS, MBOX_READY, 4, 1)
 FIELD(CXL_MEM_DEV_STS, RESET_NEEDED, 5, 3)
 
+typedef struct cxl_type3_dev {
+/* Private */
+PCIDevice parent_obj;
+
+/* Properties */
+uint64_t size;
+HostMemoryBackend *hostmem;
+HostMemoryBackend *lsa;
+
+/* State */
+CXLComponentState cxl_cstate;
+CXLDeviceState cxl_dstate;
+} CXLType3Dev;
+
+#define CT3(obj) OBJECT_CHECK(CXLType3Dev, (obj), TYPE_CXL_TYPE3_DEV)
+
+struct CXLType3Class {
+/* Private */
+PCIDeviceClass parent_class;
+
+/* public */
+uint64_t (*get_lsa_size)(CXLType3Dev *ct3d);
+};
+
 #endif
-- 
2.30.0

[RFC PATCH v3 18/31] acpi/pxb/cxl: Reserve host bridge MMIO

2021-02-01 Thread Ben Widawsky

For all host bridges, reserve MMIO space with _CRS. The MMIO for the
host bridge lives in a magically hard coded space in the system's
physical address space. The standard mechanism to tell the OS about
regions which can't be used for host bridges is _CRS.
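
As a rough sketch of the address arithmetic involved, each host bridge gets a
fixed-size MMIO block at an offset derived from its UID, and that inclusive
range is what gets reserved. The base address and block size used below are
placeholders chosen for the example, not the values of CXL_HOST_BASE or the
per-bridge block size in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive MMIO range, in the style of a _CRS memory range. */
struct sketch_mmio_range {
    uint64_t base;
    uint64_t limit; /* last valid address, inclusive */
};

/*
 * Hypothetical helper: compute the MMIO block reserved for the host bridge
 * with the given uid, assuming blocks are packed back to back starting at
 * host_base.
 */
static struct sketch_mmio_range
sketch_cxl_host_mmio_range(uint64_t host_base, uint64_t block_size,
                           uint32_t uid)
{
    struct sketch_mmio_range r;

    assert(block_size > 0);

    r.base = host_base + (uint64_t)uid * block_size;
    r.limit = r.base + block_size - 1;
    return r;
}
```

Using an inclusive limit avoids an off-by-one when adjacent bridges occupy
neighbouring blocks.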

Signed-off-by: Ben Widawsky 
---
 hw/i386/acpi-build.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 145a503e92..ecdc10b148 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -28,6 +28,7 @@
 #include "qemu/bitmap.h"
 #include "qemu/error-report.h"
 #include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
 #include "hw/core/cpu.h"
 #include "target/i386/cpu.h"
 #include "hw/misc/pvpanic.h"
@@ -1194,7 +1195,7 @@ static void build_smb0(Aml *table, I2CBus *smbus, int 
devnr, int func)
 aml_append(table, scope);
 }
 
-enum { PCI, PCIE };
+enum { PCI, PCIE, CXL };
 static void init_pci_acpi(Aml *dev, int uid, int type)
 {
 if (type == PCI) {
@@ -1344,20 +1345,28 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 uint8_t bus_num = pci_bus_num(bus);
 uint8_t numa_node = pci_bus_numa_node(bus);
 int32_t uid = pci_bus_uid(bus);
+int type;
 
 /* look only for expander root buses */
 if (!pci_bus_is_root(bus)) {
 continue;
 }
 
+type = pci_bus_is_cxl(bus) ? CXL :
+ pci_bus_is_express(bus) ? PCIE : PCI;
+
 if (bus_num < root_bus_limit) {
 root_bus_limit = bus_num - 1;
 }
 
 scope = aml_scope("\\_SB");
-dev = aml_device("PC%.02X", bus_num);
+if (type == CXL) {
+dev = aml_device("CXL%.01X", pci_bus_uid(bus));
+} else {
+dev = aml_device("PC%.02X", bus_num);
+}
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
-init_pci_acpi(dev, uid, pci_bus_is_express(bus) ? PCIE : PCI);
+init_pci_acpi(dev, uid, type);
 
 if (numa_node != NUMA_NODE_UNASSIGNED) {
 aml_append(dev, aml_name_decl("_PXM", aml_int(numa_node)));
@@ -1369,6 +1378,13 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 aml_append(dev, aml_name_decl("_CRS", crs));
 aml_append(scope, dev);
 aml_append(dsdt, scope);
+
+/* Handle the ranges for the PXB expanders */
+if (type == CXL) {
+uint64_t base = CXL_HOST_BASE + uid * 0x1;
+crs_range_insert(crs_range_set.mem_ranges, base,
+ base + 0x1 - 1);
+}
 }
 }
 
-- 
2.30.0

[RFC PATCH v3 24/31] tests/acpi: allow CEDT table addition

2021-02-01 Thread Ben Widawsky

Signed-off-by: Ben Widawsky 
---
 tests/data/acpi/pc/CEDT | 0
 tests/data/acpi/q35/CEDT| 0
 tests/qtest/bios-tables-test-allowed-diff.h | 2 ++
 3 files changed, 2 insertions(+)
 create mode 100644 tests/data/acpi/pc/CEDT
 create mode 100644 tests/data/acpi/q35/CEDT

diff --git a/tests/data/acpi/pc/CEDT b/tests/data/acpi/pc/CEDT
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/q35/CEDT b/tests/data/acpi/q35/CEDT
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..9b07f1e1ff 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,3 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/CEDT",
+"tests/data/acpi/q35/CEDT",
-- 
2.30.0

[RFC PATCH v3 13/31] qtest: allow DSDT acpi table changes

2021-02-01 Thread Ben Widawsky

Signed-off-by: Ben Widawsky 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..5c695cdf37 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,22 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/DSDT",
+"tests/data/acpi/pc/DSDT.acpihmat",
+"tests/data/acpi/pc/DSDT.bridge",
+"tests/data/acpi/pc/DSDT.cphp",
+"tests/data/acpi/pc/DSDT.dimmpxm",
+"tests/data/acpi/pc/DSDT.hpbridge",
+"tests/data/acpi/pc/DSDT.hpbrroot",
+"tests/data/acpi/pc/DSDT.ipmikcs",
+"tests/data/acpi/pc/DSDT.memhp",
+"tests/data/acpi/pc/DSDT.numamem",
+"tests/data/acpi/pc/DSDT.roothp",
+"tests/data/acpi/q35/DSDT",
+"tests/data/acpi/q35/DSDT.acpihmat",
+"tests/data/acpi/q35/DSDT.bridge",
+"tests/data/acpi/q35/DSDT.cphp",
+"tests/data/acpi/q35/DSDT.dimmpxm",
+"tests/data/acpi/q35/DSDT.ipmibt",
+"tests/data/acpi/q35/DSDT.memhp",
+"tests/data/acpi/q35/DSDT.mmio64",
+"tests/data/acpi/q35/DSDT.numamem",
+"tests/data/acpi/q35/DSDT.tis",
-- 
2.30.0

[RFC PATCH v3 26/31] tests/acpi: Add new CEDT files

2021-02-01 Thread Ben Widawsky

Signed-off-by: Ben Widawsky 
---
 tests/data/acpi/pc/CEDT | Bin 0 -> 36 bytes
 tests/data/acpi/q35/CEDT| Bin 0 -> 36 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   2 --
 3 files changed, 2 deletions(-)

diff --git a/tests/data/acpi/pc/CEDT b/tests/data/acpi/pc/CEDT
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..ebf9b54b0b27d9efca53359c3c2e560511f0e165
 100644
GIT binary patch
literal 36
kcmZ>EbqP^nU|?X};NEbqP^nU|?X};N

[RFC PATCH v3 11/31] hw/pci/cxl: Create a CXL bus type

2021-02-01 Thread Ben Widawsky

The easiest way to differentiate a CXL bus from a PCIe bus is with a
flag. A CXL bus, in hardware, is backward compatible with PCIe, and
therefore the code tries pretty hard to keep them in sync as much as
possible.

The other way to implement this would be to try to cast the bus to the
correct type. The flag approach is less code and is useful for debugging,
since one can simply look at the flags.
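
The flag approach can be sketched in isolation as follows. The flag values
mirror the ones in this patch, but the struct, enum, and function names are
stand-ins invented for the example, and the is_express argument is an
assumption replacing the real pci_bus_is_express() query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for PCIBusFlags; the values mirror the patch. */
enum sketch_bus_flags {
    SKETCH_BUS_IS_ROOT               = 0x0001,
    SKETCH_BUS_EXTENDED_CONFIG_SPACE = 0x0002,
    SKETCH_BUS_CXL                   = 0x0004,
};

/* Stand-in for PCIBus, reduced to the one field the sketch needs. */
struct sketch_bus {
    unsigned int flags;
};

enum sketch_bus_kind { SKETCH_KIND_PCI, SKETCH_KIND_PCIE, SKETCH_KIND_CXL };

/* A single bit test identifies a CXL bus; no type cast is required. */
static bool sketch_bus_is_cxl(const struct sketch_bus *bus)
{
    assert(bus != NULL);
    return (bus->flags & SKETCH_BUS_CXL) != 0;
}

/* CXL is checked first because a CXL bus is also a PCIe bus. */
static enum sketch_bus_kind sketch_classify_bus(const struct sketch_bus *bus,
                                                bool is_express)
{
    if (sketch_bus_is_cxl(bus)) {
        return SKETCH_KIND_CXL;
    }
    return is_express ? SKETCH_KIND_PCIE : SKETCH_KIND_PCI;
}
```

Because the flag lives in the bus itself, printing the flags word in a
debugger is enough to tell the bus kinds apart.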

Signed-off-by: Ben Widawsky 
---
 hw/pci-bridge/pci_expander_bridge.c | 9 -
 include/hw/pci/pci_bus.h| 7 +++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index 232b7ce305..88c45dc3b5 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -24,7 +24,7 @@
 #include "hw/boards.h"
 #include "qom/object.h"
 
-enum BusType { PCI, PCIE };
+enum BusType { PCI, PCIE, CXL };
 
 #define TYPE_PXB_BUS "pxb-bus"
 typedef struct PXBBus PXBBus;
@@ -35,6 +35,10 @@ DECLARE_INSTANCE_CHECKER(PXBBus, PXB_BUS,
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_PCIE_BUS,
  TYPE_PXB_PCIE_BUS)
 
+#define TYPE_PXB_CXL_BUS "pxb-cxl-bus"
+DECLARE_INSTANCE_CHECKER(PXBBus, PXB_CXL_BUS,
+ TYPE_PXB_CXL_BUS)
+
 struct PXBBus {
 /*< private >*/
 PCIBus parent_obj;
@@ -244,6 +248,9 @@ static void pxb_dev_realize_common(PCIDevice *dev, enum BusType type,
 ds = qdev_new(TYPE_PXB_HOST);
 if (type == PCIE) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
+} else if (type == CXL) {
+bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_CXL_BUS);
+bus->flags |= PCI_BUS_CXL;
 } else {
 bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, TYPE_PXB_BUS);
 bds = qdev_new("pci-bridge");
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 347440d42c..eb94e7e85c 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -24,6 +24,8 @@ enum PCIBusFlags {
 PCI_BUS_IS_ROOT = 0x0001,
 /* PCIe extended configuration space is accessible on this bus */
 PCI_BUS_EXTENDED_CONFIG_SPACE   = 0x0002,
+/* This is a CXL Type BUS */
+PCI_BUS_CXL = 0x0004,
 };
 
 struct PCIBus {
@@ -53,6 +55,11 @@ struct PCIBus {
 Notifier machine_done;
 };
 
+static inline bool pci_bus_is_cxl(PCIBus *bus)
+{
+return !!(bus->flags & PCI_BUS_CXL);
+}
+
 static inline bool pci_bus_is_root(PCIBus *bus)
 {
 return !!(bus->flags & PCI_BUS_IS_ROOT);
-- 
2.30.0

[RFC PATCH v3 14/31] acpi/pci: Consolidate host bridge setup

2021-02-01 Thread Ben Widawsky

This cleanup will make it easier to add support for CXL to the mix.

Signed-off-by: Ben Widawsky 
---
 hw/i386/acpi-build.c | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index f56d699c7f..cf6eb54c22 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1194,6 +1194,20 @@ static void build_smb0(Aml *table, I2CBus *smbus, int devnr, int func)
 aml_append(table, scope);
 }
 
+enum { PCI, PCIE };
+static void init_pci_acpi(Aml *dev, int uid, int type)
+{
+if (type == PCI) {
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
+aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
+} else {
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
+aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
+aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
+aml_append(dev, build_q35_osc_method());
+}
+}
+
 static void
 build_dsdt(GArray *table_data, BIOSLinker *linker,
AcpiPmInfo *pm, AcpiMiscInfo *misc,
@@ -1222,9 +1236,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 if (misc->is_piix4) {
 sb_scope = aml_scope("_SB");
 dev = aml_device("PCI0");
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
+init_pci_acpi(dev, 0, PCI);
 aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
-aml_append(dev, aml_name_decl("_UID", aml_int(0)));
 aml_append(sb_scope, dev);
 aml_append(dsdt, sb_scope);
 
@@ -1238,11 +1251,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 } else {
 sb_scope = aml_scope("_SB");
 dev = aml_device("PCI0");
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
-aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
+init_pci_acpi(dev, 0, PCIE);
 aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
-aml_append(dev, aml_name_decl("_UID", aml_int(0)));
-aml_append(dev, build_q35_osc_method());
 aml_append(sb_scope, dev);
 
 if (pm->smi_on_cpuhp) {
@@ -1345,15 +1355,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 
 scope = aml_scope("\\_SB");
 dev = aml_device("PC%.02X", bus_num);
-aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
-if (pci_bus_is_express(bus)) {
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
-aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
-aml_append(dev, build_q35_osc_method());
-} else {
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
-}
+init_pci_acpi(dev, bus_num, pci_bus_is_express(bus) ? PCIE : PCI);
 
 if (numa_node != NUMA_NODE_UNASSIGNED) {
 aml_append(dev, aml_name_decl("_PXM", aml_int(numa_node)));
-- 
2.30.0

[RFC PATCH v3 12/31] hw/pxb: Allow creation of a CXL PXB (host bridge)

2021-02-01 Thread Ben Widawsky

This works like adding a typical pxb device, except the name is
'pxb-cxl' instead of 'pxb-pcie'. An example command line would be as
follows:
  -device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1

A CXL PXB is backward compatible with PCIe. What this means in practice
is that an operating system that is unaware of CXL should still be able
to enumerate this topology as if it were PCIe.

One can create multiple CXL PXB host bridges, but a host bridge can only
be connected to the main root bus. Host bridges cannot appear elsewhere
in the topology.

Note that as of this patch, the ACPI tables needed for the host bridge
(specifically, an ACPI object in _SB named ACPI0016 and the CEDT) aren't
created. So while this patch internally creates it, it cannot be
properly used by an operating system or other system software.

Upcoming patches will allow creating multiple host bridges.

v2: Remove vendor and device ID (Ben)

Signed-off-by: Ben Widawsky 
---
 hw/pci-bridge/pci_expander_bridge.c | 67 -
 hw/pci/pci.c|  7 +++
 include/hw/pci/pci.h|  6 +++
 3 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index 88c45dc3b5..b42592e1ff 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -56,6 +56,10 @@ DECLARE_INSTANCE_CHECKER(PXBDev, PXB_DEV,
 DECLARE_INSTANCE_CHECKER(PXBDev, PXB_PCIE_DEV,
  TYPE_PXB_PCIE_DEVICE)
 
+#define TYPE_PXB_CXL_DEVICE "pxb-cxl"
+DECLARE_INSTANCE_CHECKER(PXBDev, PXB_CXL_DEV,
+ TYPE_PXB_CXL_DEVICE)
+
 struct PXBDev {
 /*< private >*/
 PCIDevice parent_obj;
@@ -67,6 +71,11 @@ struct PXBDev {
 
 static PXBDev *convert_to_pxb(PCIDevice *dev)
 {
+/* A CXL PXB's parent bus is PCIe, so the normal check won't work */
+if (object_dynamic_cast(OBJECT(dev), TYPE_PXB_CXL_DEVICE)) {
+return PXB_CXL_DEV(dev);
+}
+
 return pci_bus_is_express(pci_get_bus(dev))
 ? PXB_PCIE_DEV(dev) : PXB_DEV(dev);
 }
@@ -111,11 +120,20 @@ static const TypeInfo pxb_pcie_bus_info = {
 .class_init= pxb_bus_class_init,
 };
 
+static const TypeInfo pxb_cxl_bus_info = {
+.name  = TYPE_PXB_CXL_BUS,
+.parent= TYPE_CXL_BUS,
+.instance_size = sizeof(PXBBus),
+.class_init= pxb_bus_class_init,
+};
+
 static const char *pxb_host_root_bus_path(PCIHostState *host_bridge,
   PCIBus *rootbus)
 {
-PXBBus *bus = pci_bus_is_express(rootbus) ?
-  PXB_PCIE_BUS(rootbus) : PXB_BUS(rootbus);
+PXBBus *bus = pci_bus_is_cxl(rootbus) ?
+  PXB_CXL_BUS(rootbus) :
+  pci_bus_is_express(rootbus) ? PXB_PCIE_BUS(rootbus) :
+PXB_BUS(rootbus);
 
 snprintf(bus->bus_path, 8, ":%02x", pxb_bus_num(rootbus));
 return bus->bus_path;
@@ -380,13 +398,58 @@ static const TypeInfo pxb_pcie_dev_info = {
 },
 };
 
+static void pxb_cxl_dev_realize(PCIDevice *dev, Error **errp)
+{
+/* A CXL PXB's parent bus is still PCIe */
+if (!pci_bus_is_express(pci_get_bus(dev))) {
+error_setg(errp, "pxb-cxl devices cannot reside on a PCI bus");
+return;
+}
+
+pxb_dev_realize_common(dev, CXL, errp);
+}
+
+static void pxb_cxl_dev_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc   = DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+k->realize = pxb_cxl_dev_realize;
+k->exit= pxb_dev_exitfn;
+/*
+ * XXX: These types of bridges don't actually show up in the hierarchy so
+ * vendor, device, class, etc. ids are intentionally left out.
+ */
+
+dc->desc = "CXL Host Bridge";
+device_class_set_props(dc, pxb_dev_properties);
+set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
+
+/* Host bridges aren't hotpluggable. FIXME: spec reference */
+dc->hotpluggable = false;
+}
+
+static const TypeInfo pxb_cxl_dev_info = {
+.name  = TYPE_PXB_CXL_DEVICE,
+.parent= TYPE_PCI_DEVICE,
+.instance_size = sizeof(PXBDev),
+.class_init= pxb_cxl_dev_class_init,
+.interfaces =
+(InterfaceInfo[]){
+{ INTERFACE_CONVENTIONAL_PCI_DEVICE },
+{},
+},
+};
+
 static void pxb_register_types(void)
 {
 type_register_static(&pxb_bus_info);
 type_register_static(&pxb_pcie_bus_info);
+type_register_static(&pxb_cxl_bus_info);
 type_register_static(&pxb_host_info);
 type_register_static(&pxb_dev_info);
 type_register_static(&pxb_pcie_dev_info);
+type_register_static(&pxb_cxl_dev_info);
 }
 
 type_init(pxb_register_types)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index a45ca326ed..adbe8aa260 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -222,6 +222,12 @@ s

[RFC PATCH v3 25/31] acpi/cxl: Create the CEDT (9.14.1)

2021-02-01 Thread Ben Widawsky

The CXL Early Discovery Table is defined in the CXL 2.0 specification as
a way for the OS to get CXL specific information from the system
firmware.

CXL 2.0 specification adds an _HID, ACPI0016, for CXL capable host
bridges, with a _CID of PNP0A08 (PCIe host bridge). CXL aware software
is able to use this to initiate the proper _OSC method, and get the _UID
which is referenced by the CEDT. Therefore the existence of an ACPI0016
device allows a CXL aware driver to perform the necessary actions. For a
CXL capable OS, this works. For a CXL unaware OS, this works.

CEDT awareness requires more. The motivation for ACPI0017 is to provide
the possibility of having a Linux CXL module that can work on a legacy
Linux kernel. Linux core PCI/ACPI, which won't be built as a module,
will see the _CID of PNP0A08 and bind a driver to it. If we later loaded
a driver for ACPI0016, Linux wouldn't be able to bind it to the hardware
because it has already bound the PNP0A08 driver. The ACPI0017 device
provides an object that a Linux driver can bind to; that driver can then
walk the CXL topology and do everything that we would have preferred to
do with ACPI0016.

There is another motivation for an ACPI0017 device which isn't
implemented here. An operating system needs an attach point for a
non-volatile region provider that understands cross-hostbridge
interleaving. Since QEMU emulation doesn't support interleaving yet,
this is more important on the OS side, for now.

As of CXL 2.0 spec, only 1 sub structure is defined, the CXL Host Bridge
Structure (CHBS) which is primarily useful for telling the OS exactly
where the MMIO for the host bridge is.
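
As a rough standalone sketch of the CHBS record layout, the following
packs the same fields, in the same order and widths, that
cedt_build_chbs() in this patch appends. The field widths are taken from
this patch rather than re-derived from the spec, and every example_*
name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_CHBS_LEN 32

/* Store 'size' bytes of 'value' in little-endian order at buf[*pos]. */
static void example_put_le(uint8_t *buf, size_t *pos, uint64_t value,
                           size_t size)
{
    for (size_t i = 0; i < size; i++) {
        buf[(*pos)++] = (uint8_t)(value >> (8 * i));
    }
}

/* Fill one 32-byte CHBS record; returns the number of bytes written. */
static size_t example_build_chbs(uint8_t buf[EXAMPLE_CHBS_LEN], uint32_t uid,
                                 uint64_t base, uint64_t length)
{
    size_t pos = 0;

    example_put_le(buf, &pos, 0, 1);                /* Type: CHBS */
    example_put_le(buf, &pos, 0, 1);                /* Reserved */
    example_put_le(buf, &pos, EXAMPLE_CHBS_LEN, 2); /* Record length */
    example_put_le(buf, &pos, uid, 4);              /* Host bridge _UID */
    example_put_le(buf, &pos, 1, 4);                /* Version */
    example_put_le(buf, &pos, 0, 4);                /* Reserved */
    example_put_le(buf, &pos, base, 8);             /* MMIO base */
    example_put_le(buf, &pos, length, 8);           /* MMIO length */

    assert(pos == EXAMPLE_CHBS_LEN);
    return pos;
}
```

The final assert documents that the field widths add up to the record
length written into the record itself.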

v2: Update CHBS to spec released definition
v3: squash ACPI0017 in now that it's ratified.

Link: https://lore.kernel.org/linux-cxl/20210115034911.nkgpzc756d6qm...@intel.com/T/#t
Signed-off-by: Ben Widawsky 
---
 hw/acpi/cxl.c   | 69 +
 hw/i386/acpi-build.c| 25 ++-
 hw/pci-bridge/pci_expander_bridge.c | 21 +
 include/hw/acpi/cxl.h   |  4 ++
 include/hw/pci/pci_bridge.h | 25 +++
 5 files changed, 123 insertions(+), 21 deletions(-)

diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
index 7124d5a1a3..68db0fe3a8 100644
--- a/hw/acpi/cxl.c
+++ b/hw/acpi/cxl.c
@@ -18,14 +18,83 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_host.h"
 #include "hw/cxl/cxl.h"
+#include "hw/mem/memory-device.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/acpi/cxl.h"
+#include "hw/acpi/cxl.h"
 #include "qapi/error.h"
 #include "qemu/uuid.h"
 
+static void cedt_build_chbs(GArray *table_data, PXBDev *cxl)
+{
+SysBusDevice *sbd = SYS_BUS_DEVICE(cxl->cxl.cxl_host_bridge);
+struct MemoryRegion *mr = sbd->mmio[0].memory;
+
+/* Type */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Record Length */
+build_append_int_noprefix(table_data, 32, 2);
+
+/* UID */
+build_append_int_noprefix(table_data, cxl->uid, 4);
+
+/* Version */
+build_append_int_noprefix(table_data, 1, 4);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Base */
+build_append_int_noprefix(table_data, mr->addr, 8);
+
+/* Length */
+build_append_int_noprefix(table_data, memory_region_size(mr), 8);
+}
+
+static int cxl_foreach_pxb_hb(Object *obj, void *opaque)
+{
+Aml *cedt = opaque;
+
+if (object_dynamic_cast(obj, TYPE_PXB_CXL_DEVICE)) {
+PXBDev *pxb = PXB_CXL_DEV(obj);
+
+cedt_build_chbs(cedt->buf, pxb);
+}
+
+return 0;
+}
+
+void cxl_build_cedt(GArray *table_offsets, GArray *table_data,
+BIOSLinker *linker)
+{
+const int cedt_start = table_data->len;
+Aml *cedt;
+
+cedt = init_aml_allocator();
+
+/* reserve space for CEDT header */
+acpi_add_table(table_offsets, table_data);
+acpi_data_push(cedt->buf, sizeof(AcpiTableHeader));
+
+object_child_foreach_recursive(object_get_root(), cxl_foreach_pxb_hb, cedt);
+
+/* copy AML table into ACPI tables blob and patch header there */
+g_array_append_vals(table_data, cedt->buf->data, cedt->buf->len);
+build_header(linker, table_data, (void *)(table_data->data + cedt_start),
+ "CEDT", table_data->len - cedt_start, 1, NULL, NULL);
+free_aml_allocator();
+}
+
 static Aml *__build_cxl_osc_method(void)
 {
 Aml *method, *if_uuid, *else_uuid, *if_arg1_not_1, *if_cxl, *if_caps_masked;
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 2c2293b55f..7706856c49 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -75,

[RFC PATCH v3 08/31] hw/cxl/device: Timestamp implementation (8.2.9.3)

2021-02-01 Thread Ben Widawsky

Per spec, the timestamp appears to be a free-running counter from a value
set by the host via the Set Timestamp command (0301h). There are
references to the epoch, which seem like a red herring. Therefore, the
implementation models the timestamp as a free-running counter starting
from the last value that was issued by the Set Timestamp command.
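
The arithmetic behind that model can be sketched in isolation. This is
a minimal sketch that assumes the caller passes in the current time in
nanoseconds; the ExampleTimestamp type and the example_* functions are
hypothetical and only mirror the set/last_set/host_set fields added by
this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the timestamp state added to the device state. */
typedef struct ExampleTimestamp {
    bool set;             /* has the host issued Set Timestamp yet? */
    uint64_t last_set_ns; /* local clock reading when it was set */
    uint64_t host_set_ns; /* value the host asked for */
} ExampleTimestamp;

/* Set Timestamp: remember the host value and when it arrived. */
static void example_timestamp_set(ExampleTimestamp *t, uint64_t host_value_ns,
                                  uint64_t now_ns)
{
    assert(t != NULL);
    t->set = true;
    t->last_set_ns = now_ns;
    t->host_set_ns = host_value_ns;
}

/* Get Timestamp: host value plus the time elapsed since it was set. */
static uint64_t example_timestamp_get(const ExampleTimestamp *t,
                                      uint64_t now_ns)
{
    assert(t != NULL);
    if (!t->set) {
        return 0; /* never set: report zero, as the handler does */
    }
    return t->host_set_ns + (now_ns - t->last_set_ns);
}
```

Passing the clock reading as a parameter is a choice made for this
sketch so the arithmetic can be checked without a real clock; the patch
itself reads CLOCK_REALTIME inside the handlers.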

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-mailbox-utils.c  | 53 +
 include/hw/cxl/cxl_device.h |  6 +
 2 files changed, 59 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 7c939a1851..3d36614c0c 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -43,6 +43,9 @@ enum {
 #define CLEAR_RECORDS   0x1
 #define GET_INTERRUPT_POLICY   0x2
 #define SET_INTERRUPT_POLICY   0x3
+TIMESTAMP   = 0x03,
+#define GET   0x0
+#define SET   0x1
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -117,8 +120,11 @@ define_mailbox_handler_zeroed(EVENTS_GET_RECORDS, 0x20);
 define_mailbox_handler_nop(EVENTS_CLEAR_RECORDS);
 define_mailbox_handler_zeroed(EVENTS_GET_INTERRUPT_POLICY, 4);
 define_mailbox_handler_nop(EVENTS_SET_INTERRUPT_POLICY);
+declare_mailbox_handler(TIMESTAMP_GET);
+declare_mailbox_handler(TIMESTAMP_SET);
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
 
 #define CXL_CMD(s, c, in, cel_effect) \
@@ -129,10 +135,57 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 CXL_CMD(EVENTS, CLEAR_RECORDS, ~0, IMMEDIATE_LOG_CHANGE),
 CXL_CMD(EVENTS, GET_INTERRUPT_POLICY, 0, 0),
 CXL_CMD(EVENTS, SET_INTERRUPT_POLICY, 4, IMMEDIATE_CONFIG_CHANGE),
+CXL_CMD(TIMESTAMP, GET, 0, 0),
+CXL_CMD(TIMESTAMP, SET, 8, IMMEDIATE_POLICY_CHANGE),
 };
 
 #undef CXL_CMD
 
+/*
+ * 8.2.9.3.1
+ */
+define_mailbox_handler(TIMESTAMP_GET)
+{
+struct timespec ts;
+uint64_t delta;
+
+if (!cxl_dstate->timestamp.set) {
+*(uint64_t *)cmd->payload = 0;
+goto done;
+}
+
+/* First find the delta from the last time the host set the time. */
+clock_gettime(CLOCK_REALTIME, &ts);
+delta = (ts.tv_sec * NANOSECONDS_PER_SECOND + ts.tv_nsec) -
+cxl_dstate->timestamp.last_set;
+
+/* Then adjust the actual time */
+stq_le_p(cmd->payload, cxl_dstate->timestamp.host_set + delta);
+
+done:
+*len = 8;
+return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * 8.2.9.3.2
+ */
+define_mailbox_handler(TIMESTAMP_SET)
+{
+struct timespec ts;
+
+clock_gettime(CLOCK_REALTIME, &ts);
+
+cxl_dstate->timestamp.set = true;
+cxl_dstate->timestamp.last_set =
+ts.tv_sec * NANOSECONDS_PER_SECOND + ts.tv_nsec;
+
+cxl_dstate->timestamp.host_set = le64_to_cpu(*(uint64_t *)cmd->payload);
+
+*len = 0;
+return CXL_MBOX_SUCCESS;
+}
+
 QemuUUID cel_uuid;
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 0cc5354ba4..ca5328a581 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -107,6 +107,12 @@ typedef struct cxl_device_state {
 size_t cel_size;
 };
 
+struct {
+bool set;
+uint64_t last_set;
+uint64_t host_set;
+} timestamp;
+
 /* memory region for persistent memory, HDM */
 MemoryRegion *pmem;
 
-- 
2.30.0

[RFC PATCH v3 09/31] hw/cxl/device: Add log commands (8.2.9.4) + CEL

2021-02-01 Thread Ben Widawsky

CXL specification provides for the ability to obtain logs from the
device. Logs are either spec defined, like the "Command Effects Log"
(CEL), or vendor specific. UUIDs are defined for all log types.

The CEL is a mechanism to provide information to the host about which
commands are supported. It is useful both to determine which spec'd
optional commands are supported, as well as provide a list of vendor
specified commands that might be used. The CEL is already created as
part of mailbox initialization, but here it is now exported to hosts
that use these log commands.
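
One way to express the bounds check that Get Log needs is sketched
below. Note this is an assumption-laden variant and not what the patch
does: the patch compares offset + length against the mailbox payload
size, whereas this hypothetical helper checks against the log size as
well and avoids overflow in the addition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: accept a Get Log request for the byte range
 * [offset, offset + length) only if it lies inside the log and the
 * result fits in the mailbox payload.
 */
static bool example_get_log_range_ok(uint32_t offset, uint32_t length,
                                     uint32_t log_size, uint32_t payload_size)
{
    /* A mailbox with no payload area cannot return anything. */
    assert(payload_size > 0);

    if (offset > log_size) {
        return false;
    }
    /* Subtracting first avoids overflow in offset + length. */
    if (length > log_size - offset) {
        return false;
    }
    return length <= payload_size;
}
```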

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-mailbox-utils.c | 67 ++
 1 file changed, 67 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 3d36614c0c..3f0ae8b9e5 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -46,6 +46,9 @@ enum {
 TIMESTAMP   = 0x03,
 #define GET   0x0
 #define SET   0x1
+LOGS= 0x04,
+#define GET_SUPPORTED 0x0
+#define GET_LOG   0x1
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -122,6 +125,8 @@ define_mailbox_handler_zeroed(EVENTS_GET_INTERRUPT_POLICY, 4);
 define_mailbox_handler_nop(EVENTS_SET_INTERRUPT_POLICY);
 declare_mailbox_handler(TIMESTAMP_GET);
 declare_mailbox_handler(TIMESTAMP_SET);
+declare_mailbox_handler(LOGS_GET_SUPPORTED);
+declare_mailbox_handler(LOGS_GET_LOG);
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -137,6 +142,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 CXL_CMD(EVENTS, SET_INTERRUPT_POLICY, 4, IMMEDIATE_CONFIG_CHANGE),
 CXL_CMD(TIMESTAMP, GET, 0, 0),
 CXL_CMD(TIMESTAMP, SET, 8, IMMEDIATE_POLICY_CHANGE),
+CXL_CMD(LOGS, GET_SUPPORTED, 0, 0),
+CXL_CMD(LOGS, GET_LOG, 0x18, 0),
 };
 
 #undef CXL_CMD
@@ -188,6 +195,66 @@ define_mailbox_handler(TIMESTAMP_SET)
 
 QemuUUID cel_uuid;
 
+/* 8.2.9.4.1 */
+define_mailbox_handler(LOGS_GET_SUPPORTED)
+{
+struct {
+uint16_t entries;
+uint8_t rsvd[6];
+struct {
+QemuUUID uuid;
+uint32_t size;
+} log_entries[1];
+} __attribute__((packed)) *supported_logs = (void *)cmd->payload;
+_Static_assert(sizeof(*supported_logs) == 0x1c, "Bad supported log size");
+
+supported_logs->entries = 1;
+supported_logs->log_entries[0].uuid = cel_uuid;
+supported_logs->log_entries[0].size = 4 * cxl_dstate->cel_size;
+
+*len = sizeof(*supported_logs);
+return CXL_MBOX_SUCCESS;
+}
+
+/* 8.2.9.4.2 */
+define_mailbox_handler(LOGS_GET_LOG)
+{
+struct {
+QemuUUID uuid;
+uint32_t offset;
+uint32_t length;
+} __attribute__((packed, __aligned__(16))) *get_log = (void *)cmd->payload;
+
+/*
+ * 8.2.9.4.2
+ *   The device shall return Invalid Parameter if the Offset or Length
+ *   fields attempt to access beyond the size of the log as reported by Get
+ *   Supported Logs.
+ *
+ * XXX: Spec is wrong, "Invalid Parameter" isn't a thing.
+ * XXX: Spec doesn't address incorrect UUID incorrectness.
+ *
+ * The CEL buffer is large enough to fit all commands in the emulation, so
+ * the only possible failure would be if the mailbox itself isn't big
+ * enough.
+ */
+if (get_log->offset + get_log->length > cxl_dstate->payload_size) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+if (!qemu_uuid_is_equal(&get_log->uuid, &cel_uuid)) {
+return CXL_MBOX_UNSUPPORTED;
+}
+
+/* Store off everything to local variables so we can wipe out the payload */
+*len = get_log->length;
+
+memmove(cmd->payload, cxl_dstate->cel_log + get_log->offset,
+   get_log->length);
+
+return CXL_MBOX_SUCCESS;
+}
+
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
 uint16_t ret = CXL_MBOX_SUCCESS;
-- 
2.30.0

[RFC PATCH v3 10/31] hw/pxb: Use a type for realizing expanders

2021-02-01 Thread Ben Widawsky

This opens up the possibility for more types of expanders (other than
PCI and PCIe). We'll need this to create a CXL expander.

Signed-off-by: Ben Widawsky 
---
 hw/pci-bridge/pci_expander_bridge.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index aedded1064..232b7ce305 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -24,6 +24,8 @@
 #include "hw/boards.h"
 #include "qom/object.h"
 
+enum BusType { PCI, PCIE };
+
 #define TYPE_PXB_BUS "pxb-bus"
 typedef struct PXBBus PXBBus;
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_BUS,
@@ -214,7 +216,8 @@ static gint pxb_compare(gconstpointer a, gconstpointer b)
0;
 }
 
-static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
+static void pxb_dev_realize_common(PCIDevice *dev, enum BusType type,
+   Error **errp)
 {
 PXBDev *pxb = convert_to_pxb(dev);
 DeviceState *ds, *bds = NULL;
@@ -239,7 +242,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
 }
 
 ds = qdev_new(TYPE_PXB_HOST);
-if (pcie) {
+if (type == PCIE) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
 } else {
 bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, TYPE_PXB_BUS);
@@ -287,7 +290,7 @@ static void pxb_dev_realize(PCIDevice *dev, Error **errp)
 return;
 }
 
-pxb_dev_realize_common(dev, false, errp);
+pxb_dev_realize_common(dev, PCI, errp);
 }
 
 static void pxb_dev_exitfn(PCIDevice *pci_dev)
@@ -339,7 +342,7 @@ static void pxb_pcie_dev_realize(PCIDevice *dev, Error **errp)
 return;
 }
 
-pxb_dev_realize_common(dev, true, errp);
+pxb_dev_realize_common(dev, PCIE, errp);
 }
 
 static void pxb_pcie_dev_class_init(ObjectClass *klass, void *data)
-- 
2.30.0

[RFC PATCH v3 22/31] hw/cxl/device: Implement MMIO HDM decoding (8.2.5.12)

2021-02-01 Thread Ben Widawsky

A device's volatile and persistent memory are known as Host-managed
Device Memory (HDM) regions. The mechanism by which the device is
programmed to claim the addresses associated with those regions is
dedicated logic known as the HDM decoder. In order to allow the OS to
properly program the HDMs, the HDM decoders must be modeled.

There are two ways the HDM decoders can be implemented: the legacy
mechanism is through the PCIe DVSEC programming from CXL 1.1 (8.1.3.8),
and the MMIO mechanism is found in 8.2.5.12 of the spec. For now, 8.1.3.8
is not implemented.

Much of CXL device logic is implemented in cxl-utils. The HDM decoder
however is implemented directly by the device implementation. The
generic cxl-utils is probably the correct place to put this since
HDM decoders aren't unique to a type3 device. It is however easier at
the moment, and requires less design consideration, to simply implement
it in the device, and figure out how to consolidate it later.
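
The two calculations at the heart of a decoder commit, joining the
HI/LO register halves and checking that the programmed range fits inside
the backing window, can be sketched on their own. This is a standalone
sketch with hypothetical example_* helpers; the patch itself relies on
QEMU's Range helpers for the containment test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Join the high and low 32-bit register halves into one 64-bit value. */
static uint64_t example_reg_pair(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/*
 * True if [inner_base, inner_base + inner_size) lies entirely inside
 * [outer_base, outer_base + outer_size). Written with subtractions so
 * that no intermediate sum can overflow.
 */
static bool example_range_contains(uint64_t outer_base, uint64_t outer_size,
                                   uint64_t inner_base, uint64_t inner_size)
{
    /* This sketch assumes the outer window itself is not empty. */
    assert(outer_size > 0);

    if (inner_size == 0 || inner_size > outer_size) {
        return false;
    }
    if (inner_base < outer_base) {
        return false;
    }
    return inner_base - outer_base <= outer_size - inner_size;
}
```

Rejecting a zero-sized inner range is a choice made for this sketch, on
the assumption that committing an empty decoder is not useful.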

Signed-off-by: Ben Widawsky 
---
 hw/mem/cxl_type3.c | 92 ++
 1 file changed, 84 insertions(+), 8 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 4e9a016448..fe02c3b63c 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -57,6 +57,84 @@ static void build_dvsecs(CXLType3Dev *ct3d)
REG_LOC_DVSEC_REVID, dvsec);
 }
 
+static void cxl_set_addr(CXLType3Dev *ct3d, hwaddr addr, Error **errp)
+{
+MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(ct3d);
+mdc->set_addr(MEMORY_DEVICE(ct3d), addr, errp);
+}
+
+static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
+{
+MemoryRegion *pmem = ct3d->cxl_dstate.pmem;
+MemoryRegion *mr = host_memory_backend_get_memory(ct3d->hostmem);
+Range window, device;
+ComponentRegisters *cregs = &ct3d->cxl_cstate.crb;
+uint32_t *cache_mem = cregs->cache_mem_registers;
+uint64_t offset, size;
+Error *err = NULL;
+
+assert(which == 0);
+
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMIT, 0);
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERROR, 0);
+
+offset = ((uint64_t)cache_mem[R_CXL_HDM_DECODER0_BASE_HI] << 32) |
+ cache_mem[R_CXL_HDM_DECODER0_BASE_LO];
+size = ((uint64_t)cache_mem[R_CXL_HDM_DECODER0_SIZE_HI] << 32) |
+   cache_mem[R_CXL_HDM_DECODER0_SIZE_LO];
+
+range_init_nofail(&window, mr->addr, memory_region_size(mr));
+range_init_nofail(&device, offset, size);
+
+if (!range_contains_range(&window, &device)) {
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERROR, 1);
+return;
+}
+
+/*
+ * FIXME: Support resizing.
+ * Maybe just memory_region_ram_resize(pmem, size, &err)?
+ */
+if (size != memory_region_size(pmem)) {
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERROR, 1);
+return;
+}
+
+cxl_set_addr(ct3d, offset, &err);
+if (err) {
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERROR, 1);
+return;
+}
+memory_region_set_enabled(pmem, true);
+
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
+}
+
+static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value, unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+CXLType3Dev *ct3d = container_of(cxl_cstate, CXLType3Dev, cxl_cstate);
+uint32_t *cache_mem = cregs->cache_mem_registers;
+bool should_commit = false;
+int which_hdm = -1;
+
+assert(size == 4);
+
+switch (offset) {
+case A_CXL_HDM_DECODER0_CTRL:
+should_commit = FIELD_EX32(value, CXL_HDM_DECODER0_CTRL, COMMIT);
+which_hdm = 0;
+break;
+default:
+break;
+}
+
+stl_le_p((uint8_t *)cache_mem + offset, value);
+if (should_commit)
+hdm_decoder_commit(ct3d, which_hdm);
+}
+
 static void ct3_instance_init(Object *obj)
 {
 /* MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(obj); */
@@ -65,18 +143,13 @@ static void ct3_instance_init(Object *obj)
 static void ct3_finalize(Object *obj)
 {
 CXLType3Dev *ct3d = CT3(obj);
+CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
+ComponentRegisters *regs = &cxl_cstate->crb;
 
+g_free((void *)regs->special_ops);
 g_free(ct3d->cxl_dstate.pmem);
 }
 
-#ifdef SET_PMEM_PADDR
-static void cxl_set_addr(CXLType3Dev *ct3d, hwaddr addr, Error **errp)
-{
-MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(ct3d);
-mdc->set_addr(MEMORY_DEVICE(ct3d), addr, errp);
-}
-#endif
-
 static void cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
 MemoryRegionSection mrs;
@@ -160,6 +233,9 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 ct3d->cxl_cstate.pdev = pci_dev;
 build_dvsecs(ct3d);
 
+regs->special_ops = g_new0(MemoryRegionOps, 1);
+regs->special_ops->write = ct3d_reg_write;
+
 cxl_component_register_block_init(OBJECT(pci_dev), cxl_cs

[RFC PATCH v3 04/31] hw/cxl/device: Implement the CAP array (8.2.8.1-2)

2021-02-01 Thread Ben Widawsky

This implements all device MMIO up to the first capability. That
includes the CXL Device Capabilities Array Register, as well as all of
the CXL Device Capability Header Registers. The latter are filled in as
they are implemented in the following patches.

Endianness and alignment are managed by softmmu memory core.
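
The access rules quoted from section 8.2 in the new file can be written
down as a small predicate. This is only a sketch of those rules as
quoted in this patch; example_access_ok() is a hypothetical helper and
is not something the memory core exposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check of the quoted rules: a register that is reg_width
 * bytes wide (4 or 8) may be accessed as 1, 2, 4 or (for 64-bit
 * registers) 8 bytes, and the address must be a multiple of the access
 * width.
 */
static bool example_access_ok(uint64_t addr, unsigned access_size,
                              unsigned reg_width)
{
    assert(reg_width == 4 || reg_width == 8);

    if (access_size != 1 && access_size != 2 &&
        access_size != 4 && access_size != 8) {
        return false;
    }
    if (access_size > reg_width) {
        return false;
    }
    return (addr % access_size) == 0;
}
```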

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-device-utils.c   | 105 
 hw/cxl/meson.build  |   1 +
 include/hw/cxl/cxl_device.h |  27 +-
 3 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 hw/cxl/cxl-device-utils.c

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
new file mode 100644
index 00..bb15ad9a0f
--- /dev/null
+++ b/hw/cxl/cxl-device-utils.c
@@ -0,0 +1,105 @@
+/*
+ * CXL Utility library for devices
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/cxl/cxl.h"
+
+/*
+ * Device registers have no restrictions per the spec, and so fall back to the
+ * default memory mapped register rules in 8.2:
+ *   Software shall use CXL.io Memory Read and Write to access memory mapped
+ *   register defined in this section. Unless otherwise specified, software
+ *   shall restrict the accesses width based on the following:
+ *   • A 32 bit register shall be accessed as a 1 Byte, 2 Bytes or 4 Bytes
+ * quantity.
+ *   • A 64 bit register shall be accessed as a 1 Byte, 2 Bytes, 4 Bytes or 8
+ * Bytes
+ *   • The address shall be a multiple of the access width, e.g. when
+ * accessing a register as a 4 Byte quantity, the address shall be
+ * multiple of 4.
+ *   • The accesses shall map to contiguous bytes. If these rules are not
+ * followed, the behavior is undefined
+ */
+
+static uint64_t caps_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+return cxl_dstate->caps_reg_state32[offset / 4];
+}
+
+static uint64_t dev_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+return 0;
+}
+
+static const MemoryRegionOps dev_ops = {
+.read = dev_reg_read,
+.write = NULL, /* status register is read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 8,
+},
+};
+
+static const MemoryRegionOps caps_ops = {
+.read = caps_reg_read,
+.write = NULL, /* caps registers are read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
+};
+
+void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
+{
+/* This will be a BAR, so needs to be rounded up to pow2 for PCI spec */
+memory_region_init(&cxl_dstate->device_registers, obj, "device-registers",
+   pow2ceil(CXL_MMIO_SIZE));
+
+memory_region_init_io(&cxl_dstate->caps, obj, &caps_ops, cxl_dstate,
+  "cap-array", CXL_DEVICE_REGISTERS_OFFSET - 0);
+memory_region_init_io(&cxl_dstate->device, obj, &dev_ops, cxl_dstate,
+  "device-status", CXL_DEVICE_REGISTERS_LENGTH);
+
+memory_region_add_subregion(&cxl_dstate->device_registers, 0,
+&cxl_dstate->caps);
+memory_region_add_subregion(&cxl_dstate->device_registers,
+CXL_DEVICE_REGISTERS_OFFSET,
+&cxl_dstate->device);
+}
+
+static void device_reg_init_common(CXLDeviceState *cxl_dstate) { }
+
+void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
+{
+uint32_t *cap_hdrs = cxl_dstate->caps_reg_state32;
+const int cap_count = 1;
+
+/* CXL Device Capabilities Array Register */
+ARRAY_FIELD_DP32(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_ID, 0);
+ARRAY_FIELD_DP32(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_VERSION, 1);
+ARRAY_FIELD_DP32(cap_hdrs, CXL_DEV_CAP_ARRAY2, CAP_COUNT, cap_count);
+
+cxl_device_cap_init(cxl_dstate, DEVICE, 1);
+device_reg_init_common(cxl_dstate);
+}
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index 00c3876a0f..47154d6850 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -1,3 +1,4 @@
 softmmu_ss.add(when: 'CONFIG_CXL', if_true: files(
   'cxl-component-utils.c',
+  'cxl-device-utils.c',
 ))
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index a85f250503..f3bcf19410 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -58,6 +58,8 @@
 #define CXL_DEVICE_CAP_HDR1_OFFSET 0x10 /* Figure 138 */
 #define CXL_DEVIC

[RFC PATCH v3 06/31] hw/cxl/device: Add memory device utilities

2021-02-01 Thread Ben Widawsky

Memory devices implement extra capabilities on top of CXL devices. This
adds support for that.

A large part of memory devices is the mailbox/command interface. All of
the mailbox handling is done in the mailbox-utils library. Longer term,
new CXL devices that are being emulated may want to handle commands
differently, and therefore would need a mechanism to opt in/out of the
specific generic handlers. As such, this is considered sufficient for
now, but may need more depth in the future.
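
To make the opt in/out idea concrete, here is one possible shape for a
per-device command table, where a device registers or clears a handler
for a given command set and command. This is purely a sketch under
stated assumptions: every example_* name and both return-code values
are hypothetical, and it is not how mailbox-utils is structured today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for this sketch only. */
enum {
    EXAMPLE_MBOX_SUCCESS = 0,
    EXAMPLE_MBOX_UNSUPPORTED = 1,
};

typedef int (*example_mbox_handler)(uint8_t *payload, uint16_t *len);

/* One slot per (command set, command) pair; NULL means "not handled". */
static example_mbox_handler example_cmd_table[256][256];

/* Opt in (non-NULL handler) or out (NULL handler) of one command. */
static void example_register_handler(uint8_t set, uint8_t cmd,
                                     example_mbox_handler handler)
{
    example_cmd_table[set][cmd] = handler;
}

/* Look the command up and run it, or report it as unsupported. */
static int example_dispatch(uint8_t set, uint8_t cmd, uint8_t *payload,
                            uint16_t *len)
{
    example_mbox_handler handler = example_cmd_table[set][cmd];

    assert(len != NULL);
    if (handler == NULL) {
        *len = 0;
        return EXAMPLE_MBOX_UNSUPPORTED;
    }
    return handler(payload, len);
}

/* A generic handler that succeeds and returns no payload. */
static int example_nop_handler(uint8_t *payload, uint16_t *len)
{
    (void)payload;
    *len = 0;
    return EXAMPLE_MBOX_SUCCESS;
}
```

A device that wants different behaviour for one command would register
its own handler in that slot and leave the generic ones in place for the
rest.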

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-device-utils.c   | 38 -
 include/hw/cxl/cxl_device.h | 18 +-
 2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 6602606f3d..639ace523d 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -130,6 +130,31 @@ static void mailbox_reg_write(void *opaque, hwaddr offset, uint64_t value,
 cxl_process_mailbox(cxl_dstate);
 }
 
+static uint64_t mdev_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+uint64_t retval = 0;
+
+retval = FIELD_DP64(retval, CXL_MEM_DEV_STS, MEDIA_STATUS, 1);
+retval = FIELD_DP64(retval, CXL_MEM_DEV_STS, MBOX_READY, 1);
+
+return retval;
+}
+
+static const MemoryRegionOps mdev_ops = {
+.read = mdev_reg_read,
+.write = NULL, /* memory device register is read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 8,
+.max_access_size = 8,
+},
+};
+
 static const MemoryRegionOps mailbox_ops = {
 .read = mailbox_reg_read,
 .write = mailbox_reg_write,
@@ -187,6 +212,9 @@ void cxl_device_register_block_init(Object *obj, 
CXLDeviceState *cxl_dstate)
   "device-status", CXL_DEVICE_REGISTERS_LENGTH);
 memory_region_init_io(&cxl_dstate->mailbox, obj, &mailbox_ops, cxl_dstate,
   "mailbox", CXL_MAILBOX_REGISTERS_LENGTH);
+memory_region_init_io(&cxl_dstate->memory_device, obj, &mdev_ops,
+  cxl_dstate, "memory device caps",
+  CXL_MEMORY_DEVICE_REGISTERS_LENGTH);
 
 memory_region_add_subregion(&cxl_dstate->device_registers, 0,
 &cxl_dstate->caps);
@@ -196,6 +224,9 @@ void cxl_device_register_block_init(Object *obj, 
CXLDeviceState *cxl_dstate)
 memory_region_add_subregion(&cxl_dstate->device_registers,
 CXL_MAILBOX_REGISTERS_OFFSET,
 &cxl_dstate->mailbox);
+memory_region_add_subregion(&cxl_dstate->device_registers,
+CXL_MEMORY_DEVICE_REGISTERS_OFFSET,
+&cxl_dstate->memory_device);
 }
 
 static void device_reg_init_common(CXLDeviceState *cxl_dstate) { }
@@ -208,10 +239,12 @@ static void mailbox_reg_init_common(CXLDeviceState 
*cxl_dstate)
 cxl_dstate->payload_size = CXL_MAILBOX_MAX_PAYLOAD_SIZE;
 }
 
+static void memdev_reg_init_common(CXLDeviceState *cxl_dstate) { }
+
 void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
 {
 uint32_t *cap_hdrs = cxl_dstate->caps_reg_state32;
-const int cap_count = 2;
+const int cap_count = 3;
 
 /* CXL Device Capabilities Array Register */
 ARRAY_FIELD_DP32(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_ID, 0);
@@ -224,5 +257,8 @@ void cxl_device_register_init_common(CXLDeviceState 
*cxl_dstate)
 cxl_device_cap_init(cxl_dstate, MAILBOX, 2);
 mailbox_reg_init_common(cxl_dstate);
 
+cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000);
+memdev_reg_init_common(cxl_dstate);
+
 assert(cxl_initialize_mailbox(cxl_dstate) == 0);
 }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index af91bec10c..0cc5354ba4 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -72,15 +72,20 @@
 #define CXL_MAILBOX_REGISTERS_LENGTH \
 (CXL_MAILBOX_REGISTERS_SIZE + CXL_MAILBOX_MAX_PAYLOAD_SIZE)
 
+#define CXL_MEMORY_DEVICE_REGISTERS_OFFSET \
+(CXL_MAILBOX_REGISTERS_OFFSET + CXL_MAILBOX_REGISTERS_LENGTH)
+#define CXL_MEMORY_DEVICE_REGISTERS_LENGTH 0x8
+
 #define CXL_MMIO_SIZE   \
 CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_REGISTERS_LENGTH + \
-CXL_MAILBOX_REGISTERS_LENGTH
+CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH
 
 typedef struct cxl_device_state {
 MemoryRegion device_registers;
 
 /* mmio for device capabilities array - 8.2.8.2 */
 MemoryRegion device;
+MemoryRegion memory_device;
 struct {
 MemoryRegion caps;
 uint32_t caps_reg_state32[CXL_CAPS_SIZE / 4];
@@ -145,6 +150,9 @@ REG32(CXL_DEV_CAP_ARRAY2, 4) /* We're going to pretend it's 
64b */
 CXL_DEVICE_CAPABILITY_HEADER_REGISTER(DEVICE, CXL_DEVICE_CAP

[RFC PATCH v3 17/31] hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)

2021-02-01 Thread Ben Widawsky

CXL host bridges themselves may have MMIO. Since host bridges don't have
a BAR they are treated as special for MMIO.
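
For illustration, the per-bridge mapping done in pxb_cxl_realize() boils down
to a base plus a stride. The sketch below is standalone C, not QEMU code: the
sketch_ names are hypothetical, the base is the value as it appears in this
posting, and the region size is simply a parameter here.

```c
#include <assert.h>
#include <stdint.h>

/* Base as it appears in this posting; treat the value as illustrative. */
#define SKETCH_CXL_HOST_BASE 0xD000ull

/*
 * Each host bridge gets its own window: the windows are laid out back to
 * back, indexed by the bridge's uid.
 */
static inline uint64_t sketch_host_bridge_mmio_addr(uint64_t region_size,
                                                    int32_t uid)
{
    assert(uid >= 0); /* a negative uid means "unassigned" in this series */
    return SKETCH_CXL_HOST_BASE + region_size * (uint64_t)uid;
}
```

Two bridges with uid 0 and uid 1 therefore never overlap as long as the uids
are unique, which is why the uid plumbing in patch 16 matters here.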

Signed-off-by: Ben Widawsky 

--

0xD000 is chosen arbitrarily here as the base for the host bridge MMIO.
I'm not sure what the right way is to find free space for
platform-hardcoded things like this.
---
 hw/pci-bridge/pci_expander_bridge.c | 53 -
 include/hw/cxl/cxl.h|  2 ++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index 5021b60435..226a8a5fff 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -17,6 +17,7 @@
 #include "hw/pci/pci_host.h"
 #include "hw/qdev-properties.h"
 #include "hw/pci/pci_bridge.h"
+#include "hw/cxl/cxl.h"
 #include "qemu/range.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
@@ -70,6 +71,12 @@ struct PXBDev {
 int32_t uid;
 };
 
+typedef struct CXLHost {
+PCIHostState parent_obj;
+
+CXLComponentState cxl_cstate;
+} CXLHost;
+
 static PXBDev *convert_to_pxb(PCIDevice *dev)
 {
 /* A CXL PXB's parent bus is PCIe, so the normal check won't work */
@@ -85,6 +92,9 @@ static GList *pxb_dev_list;
 
 #define TYPE_PXB_HOST "pxb-host"
 
+#define TYPE_PXB_CXL_HOST "pxb-cxl-host"
+#define PXB_CXL_HOST(obj) OBJECT_CHECK(CXLHost, (obj), TYPE_PXB_CXL_HOST)
+
 static int pxb_bus_num(PCIBus *bus)
 {
 PXBDev *pxb = convert_to_pxb(bus->parent_dev);
@@ -198,6 +208,46 @@ static const TypeInfo pxb_host_info = {
 .class_init= pxb_host_class_init,
 };
 
+static void pxb_cxl_realize(DeviceState *dev, Error **errp)
+{
+SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+PCIHostState *phb = PCI_HOST_BRIDGE(dev);
+CXLHost *cxl = PXB_CXL_HOST(dev);
+CXLComponentState *cxl_cstate = &cxl->cxl_cstate;
+struct MemoryRegion *mr = &cxl_cstate->crb.component_registers;
+
+cxl_component_register_block_init(OBJECT(dev), cxl_cstate,
+  TYPE_PXB_CXL_HOST);
+sysbus_init_mmio(sbd, mr);
+
+/* FIXME: support multiple host bridges. */
+sysbus_mmio_map(sbd, 0, CXL_HOST_BASE +
+memory_region_size(mr) * pci_bus_uid(phb->bus));
+}
+
+static void pxb_cxl_host_class_init(ObjectClass *class, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(class);
+PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(class);
+
+hc->root_bus_path = pxb_host_root_bus_path;
+dc->fw_name = "cxl";
+dc->realize = pxb_cxl_realize;
+/* Reason: Internal part of the pxb/pxb-pcie device, not usable by itself 
*/
+dc->user_creatable = false;
+}
+
+/*
+ * This is a device to handle the MMIO for a CXL host bridge. It does nothing
+ * else.
+ */
+static const TypeInfo cxl_host_info = {
+.name  = TYPE_PXB_CXL_HOST,
+.parent= TYPE_PCI_HOST_BRIDGE,
+.instance_size = sizeof(CXLHost),
+.class_init= pxb_cxl_host_class_init,
+};
+
 /*
  * Registers the PXB bus as a child of pci host root bus.
  */
@@ -272,7 +322,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, enum 
BusType type,
 dev_name = dev->qdev.id;
 }
 
-ds = qdev_new(TYPE_PXB_HOST);
+ds = qdev_new(type == CXL ? TYPE_PXB_CXL_HOST : TYPE_PXB_HOST);
 if (type == PCIE) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
 } else if (type == CXL) {
@@ -466,6 +516,7 @@ static void pxb_register_types(void)
 type_register_static(&pxb_pcie_bus_info);
 type_register_static(&pxb_cxl_bus_info);
 type_register_static(&pxb_host_info);
+type_register_static(&cxl_host_info);
 type_register_static(&pxb_dev_info);
 type_register_static(&pxb_pcie_dev_info);
 type_register_static(&pxb_cxl_dev_info);
diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 362cda40de..6bc344f205 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -17,5 +17,7 @@
 #define COMPONENT_REG_BAR_IDX 0
 #define DEVICE_REG_BAR_IDX 2
 
+#define CXL_HOST_BASE 0xD000
+
 #endif
 
-- 
2.30.0

[RFC PATCH v3 03/31] hw/cxl/device: Introduce a CXL device (8.2.8)

2021-02-01 Thread Ben Widawsky

A CXL device is a type of CXL component. Conceptually, a CXL device
would be a leaf node in a CXL topology. From an emulation perspective,
CXL devices are the most complex and so the actual implementation is
reserved for discrete commits.

This new device type is specifically catered towards the eventual
implementation of a Type3 CXL.mem device, 8.2.8.5 in the CXL 2.0
specification.

Signed-off-by: Ben Widawsky 
---
 include/hw/cxl/cxl.h|   1 +
 include/hw/cxl/cxl_device.h | 155 
 2 files changed, 156 insertions(+)
 create mode 100644 include/hw/cxl/cxl_device.h

diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 55f6cc30a5..23f52c4cf9 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -12,6 +12,7 @@
 
 #include "cxl_pci.h"
 #include "cxl_component.h"
+#include "cxl_device.h"
 
 #endif
 
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
new file mode 100644
index 00..a85f250503
--- /dev/null
+++ b/include/hw/cxl/cxl_device.h
@@ -0,0 +1,155 @@
+/*
+ * QEMU CXL Devices
+ *
+ * Copyright (c) 2020 Intel
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef CXL_DEVICE_H
+#define CXL_DEVICE_H
+
+#include "hw/register.h"
+
+/*
+ * The following is how a CXL device's MMIO space is laid out. The only
+ * requirement from the spec is that the capabilities array and the capability
+ * headers start at offset 0 and are contiguously packed. The headers 
themselves
+ * provide offsets to the register fields. For this emulation, registers will
+ * start at offset 0x80 (m == 0x80). No secondary mailbox is implemented which
+ * means that n = m + sizeof(mailbox registers) + sizeof(device registers).
+ *
+ * This is roughly described in 8.2.8 Figure 138 of the CXL 2.0 spec.
+ *
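+ * n + PAYLOAD_SIZE_MAX  +---------------------------------+
+ *                       |                                 |
+ *                  ^    |                                 |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  |    |         Command Payload         |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  n    +---------------------------------+
+ *                  ^    |                                 |
+ *                  |    |    Device Capability Registers  |
+ *                  |    |    x, mailbox, y                |
+ *                  |    |                                 |
+ *                  m    +---------------------------------+
+ *                  ^    |  Device Capability Header y     |
+ *                  |    +---------------------------------+
+ *                  |    | Device Capability Header Mailbox|
+ *                  |    +---------------------------------+
+ *                  |    |  Device Capability Header x     |
+ *                  |    +---------------------------------+
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  |    |      Device Cap Array[0..n]     |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  |    |                                 |
+ *                  0    +---------------------------------+
+ */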
+
+#define CXL_DEVICE_CAP_HDR1_OFFSET 0x10 /* Figure 138 */
+#define CXL_DEVICE_CAP_REG_SIZE 0x10 /* 8.2.8.2 */
+#define CXL_DEVICE_CAPS_MAX 4 /* 8.2.8.2.1 + 8.2.8.5 */
+
+#define CXL_DEVICE_REGISTERS_OFFSET 0x80 /* Read comment above */
+#define CXL_DEVICE_REGISTERS_LENGTH 0x8 /* 8.2.8.3.1 */
+
+#define CXL_MAILBOX_REGISTERS_OFFSET \
+(CXL_DEVICE_REGISTERS_OFFSET + CXL_DEVICE_REGISTERS_LENGTH)
+#define CXL_MAILBOX_REGISTERS_SIZE 0x20
+#define CXL_MAILBOX_PAYLOAD_SHIFT 11
+#define CXL_MAILBOX_MAX_PAYLOAD_SIZE (1 << CXL_MAILBOX_PAYLOAD_SHIFT)
+#define CXL_MAILBOX_REGISTERS_LENGTH \
+(CXL_MAILBOX_REGISTERS_SIZE + CXL_MAILBOX_MAX_PAYLOAD_SIZE)
+
+typedef struct cxl_device_state {
+MemoryRegion device_registers;
+
+/* mmio for device capabilities array - 8.2.8.2 */
+MemoryRegion caps;
+
+/* mmio for the device status registers 8.2.8.3 */
+MemoryRegion device;
+
+/* mmio for the mailbox registers 8.2.8.4 */
+MemoryRegion mailbox;
+
+/* memory region for persistent memory, HDM */
+MemoryRegion *pmem;
+
+/* memory region for volatile  memory, HDM */
+MemoryRegion *vmem;
+} CXLDe

[RFC PATCH v3 05/31] hw/cxl/device: Implement basic mailbox (8.2.8.4)

2021-02-01 Thread Ben Widawsky

This is the beginning of implementing mailbox support for CXL 2.0
devices. The implementation recognizes when the doorbell is rung,
handles the command/payload, and clears the doorbell while returning
error codes and data.

Generally the mailbox mechanism is designed to permit communication
between the host OS and the firmware running on the device. For our
purposes, we emulate both the firmware, implemented primarily in
cxl-mailbox-utils.c, and the hardware.

No commands are implemented yet.
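
To make the flow concrete, here is a minimal standalone sketch of the
doorbell handshake described above. It is not the code from this patch: the
sketch_ types, the payload size and the "unsupported" return value are all
hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAYLOAD_MAX    64
#define SKETCH_RC_SUCCESS     0x0
#define SKETCH_RC_UNSUPPORTED 0x1 /* hypothetical value, illustration only */

struct sketch_mailbox {
    int      doorbell;
    uint16_t return_code;
    uint32_t payload_len;
    uint8_t  payload[SKETCH_PAYLOAD_MAX];
};

typedef uint16_t (*sketch_handler)(struct sketch_mailbox *mb);

/* Example handler: returns a 4-byte all-zero payload. */
static uint16_t sketch_cmd_zeroed(struct sketch_mailbox *mb)
{
    memset(mb->payload, 0, 4);
    mb->payload_len = 4;
    return SKETCH_RC_SUCCESS;
}

/* "Firmware" side: run the command, publish the result, clear the doorbell. */
static inline void sketch_process_mailbox(struct sketch_mailbox *mb,
                                          sketch_handler h)
{
    assert(mb->doorbell); /* only called once the doorbell has been rung */

    if (h) {
        mb->return_code = h(mb);
    } else {
        mb->return_code = SKETCH_RC_UNSUPPORTED;
        mb->payload_len = 0;
    }
    mb->doorbell = 0;
}

/* "Hardware" side: the host's register write sets the doorbell bit. */
static inline void sketch_ring_doorbell(struct sketch_mailbox *mb,
                                        sketch_handler h)
{
    mb->doorbell = 1;
    sketch_process_mailbox(mb, h);
}
```

Because the command is processed synchronously in the write path, the host
always observes the doorbell as already cleared when its write returns. That
matches the "no background commands" simplification in this patch.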

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-device-utils.c   | 125 ++-
 hw/cxl/cxl-mailbox-utils.c  | 197 
 hw/cxl/meson.build  |   1 +
 include/hw/cxl/cxl.h|   3 +
 include/hw/cxl/cxl_device.h |  28 -
 5 files changed, 349 insertions(+), 5 deletions(-)
 create mode 100644 hw/cxl/cxl-mailbox-utils.c

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index bb15ad9a0f..6602606f3d 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -40,6 +40,111 @@ static uint64_t dev_reg_read(void *opaque, hwaddr offset, 
unsigned size)
 return 0;
 }
 
+static uint64_t mailbox_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+switch (size) {
+case 8:
+return cxl_dstate->mbox_reg_state64[offset / 8];
+case 4:
+return cxl_dstate->mbox_reg_state32[offset / 4];
+default:
+g_assert_not_reached();
+}
+}
+
+static void mailbox_mem_writel(uint32_t *reg_state, hwaddr offset,
+   uint64_t value)
+{
+switch (offset) {
+case A_CXL_DEV_MAILBOX_CTRL:
+/* fallthrough */
+case A_CXL_DEV_MAILBOX_CAP:
+/* RO register */
+break;
+default:
+qemu_log_mask(LOG_UNIMP,
+  "%s Unexpected 32-bit access to 0x%" PRIx64 " (WI)\n",
+  __func__, offset);
+break;
+}
+
+reg_state[offset / 4] = value;
+}
+
+static void mailbox_mem_writeq(uint64_t *reg_state, hwaddr offset,
+   uint64_t value)
+{
+switch (offset) {
+case A_CXL_DEV_MAILBOX_CMD:
+break;
+case A_CXL_DEV_BG_CMD_STS:
+/* BG not supported */
+/* fallthrough */
+case A_CXL_DEV_MAILBOX_STS:
+/* Read only register, will get updated by the state machine */
+return;
+default:
+qemu_log_mask(LOG_UNIMP,
+  "%s Unexpected 64-bit access to 0x%" PRIx64 " (WI)\n",
+  __func__, offset);
+return;
+}
+
+
+reg_state[offset / 8] = value;
+}
+
+static void mailbox_reg_write(void *opaque, hwaddr offset, uint64_t value,
+  unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+if (offset >= A_CXL_DEV_CMD_PAYLOAD) {
+memcpy(cxl_dstate->mbox_reg_state + offset, &value, size);
+return;
+}
+
+/*
+ * Lock is needed to prevent concurrent writes as well as to prevent writes
+ * coming in while the firmware is processing. Without background commands
+ * or the second mailbox implemented, this serves no purpose since the
+ * memory access is synchronized at a higher level (per memory region).
+ */
+RCU_READ_LOCK_GUARD();
+
+switch (size) {
+case 4:
+mailbox_mem_writel(cxl_dstate->mbox_reg_state32, offset, value);
+break;
+case 8:
+mailbox_mem_writeq(cxl_dstate->mbox_reg_state64, offset, value);
+break;
+default:
+g_assert_not_reached();
+}
+
+if (ARRAY_FIELD_EX32(cxl_dstate->mbox_reg_state32, CXL_DEV_MAILBOX_CTRL,
+ DOORBELL))
+cxl_process_mailbox(cxl_dstate);
+}
+
+static const MemoryRegionOps mailbox_ops = {
+.read = mailbox_reg_read,
+.write = mailbox_reg_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+};
+
 static const MemoryRegionOps dev_ops = {
 .read = dev_reg_read,
 .write = NULL, /* status register is read only */
@@ -80,20 +185,33 @@ void cxl_device_register_block_init(Object *obj, 
CXLDeviceState *cxl_dstate)
   "cap-array", CXL_DEVICE_REGISTERS_OFFSET - 0);
 memory_region_init_io(&cxl_dstate->device, obj, &dev_ops, cxl_dstate,
   "device-status", CXL_DEVICE_REGISTERS_LENGTH);
+memory_region_init_io(&cxl_dstate->mailbox, obj, &mailbox_ops, cxl_dstate,
+  "mailbox", CXL_MAILBOX_REGISTERS_LENGTH);
 
 memory_region_add_subregion(&cxl_dstate->device_registers, 0,
 &cxl_dstate->caps);
 memory_region_add_subregion(&cxl_dstate->device_registers,

[RFC PATCH v3 07/31] hw/cxl/device: Add cheap EVENTS implementation (8.2.9.1)

2021-02-01 Thread Ben Widawsky

Using the previously implemented stubbed helpers, it is now possible to
easily add the missing, required commands to the implementation.
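
The table-driven dispatch this relies on can be sketched in isolation as
follows. This is a simplified stand-in, not the code in cxl-mailbox-utils.c:
the sketch_ names and the handler signature are hypothetical, and only two of
the four EVENTS commands are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command-set and command numbers, modelled on the EVENTS set. */
enum { SKETCH_EVENTS = 0x01 };
enum { SKETCH_GET_RECORDS = 0x0, SKETCH_CLEAR_RECORDS = 0x1 };

typedef int (*sketch_cmd_fn)(uint8_t *payload, uint16_t *len);

struct sketch_cmd {
    const char *name;
    sketch_cmd_fn handler;
};

/* "Zeroed" style handler: returns an all-zero payload of a fixed size. */
static int sketch_get_records(uint8_t *payload, uint16_t *len)
{
    assert(payload && len);
    memset(payload, 0, 0x20);
    *len = 0x20;
    return 0;
}

/* "Nop" style handler: succeeds and returns no payload. */
static int sketch_clear_records(uint8_t *payload, uint16_t *len)
{
    (void)payload;
    assert(len);
    *len = 0;
    return 0;
}

/* Designated initializers keep the table sparse and indexed by opcode. */
#define SKETCH_CMD(s, c, fn) [SKETCH_##s][SKETCH_##c] = { #s "_" #c, fn }

static const struct sketch_cmd sketch_cmd_set[256][256] = {
    SKETCH_CMD(EVENTS, GET_RECORDS, sketch_get_records),
    SKETCH_CMD(EVENTS, CLEAR_RECORDS, sketch_clear_records),
};

/* Returns NULL for any (set, command) pair that has no handler. */
static inline const struct sketch_cmd *sketch_lookup(uint8_t set, uint8_t cmd)
{
    const struct sketch_cmd *c = &sketch_cmd_set[set][cmd];

    return c->handler ? c : NULL;
}
```

Unlisted entries are zero-initialized, so an unimplemented command is simply
one whose handler pointer is NULL.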

Signed-off-by: Ben Widawsky 
---
 hw/cxl/cxl-mailbox-utils.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 466055b01a..7c939a1851 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -37,6 +37,14 @@
  *  a register interface that already deals with it.
  */
 
+enum {
+EVENTS  = 0x01,
+#define GET_RECORDS   0x0
+#define CLEAR_RECORDS   0x1
+#define GET_INTERRUPT_POLICY   0x2
+#define SET_INTERRUPT_POLICY   0x3
+};
+
 /* 8.2.8.4.5.1 Command Return Codes */
 typedef enum {
 CXL_MBOX_SUCCESS = 0x0,
@@ -105,10 +113,23 @@ struct cxl_cmd {
 return CXL_MBOX_SUCCESS;  \
 }
 
+define_mailbox_handler_zeroed(EVENTS_GET_RECORDS, 0x20);
+define_mailbox_handler_nop(EVENTS_CLEAR_RECORDS);
+define_mailbox_handler_zeroed(EVENTS_GET_INTERRUPT_POLICY, 4);
+define_mailbox_handler_nop(EVENTS_SET_INTERRUPT_POLICY);
+
+#define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_LOG_CHANGE (1 << 4)
+
 #define CXL_CMD(s, c, in, cel_effect) \
 [s][c] = { stringify(s##_##c), cmd_##s##_##c, in, cel_effect }
 
-static struct cxl_cmd cxl_cmd_set[256][256] = {};
+static struct cxl_cmd cxl_cmd_set[256][256] = {
+CXL_CMD(EVENTS, GET_RECORDS, 1, 0),
+CXL_CMD(EVENTS, CLEAR_RECORDS, ~0, IMMEDIATE_LOG_CHANGE),
+CXL_CMD(EVENTS, GET_INTERRUPT_POLICY, 0, 0),
+CXL_CMD(EVENTS, SET_INTERRUPT_POLICY, 4, IMMEDIATE_CONFIG_CHANGE),
+};
 
 #undef CXL_CMD
 
-- 
2.30.0

[RFC PATCH v3 16/31] hw/pci: Plumb _UID through host bridges

2021-02-01 Thread Ben Widawsky

Currently, QEMU makes _UID equivalent to the bus number (_BBN). While
there is nothing wrong with doing it this way, CXL spec has a heavy
reliance on _UID to identify host bridges and there is no link to the
bus number. Having a distinct UID solves two problems. The first is it
gets us around the limitation of 256 (current max bus number). The
second is it allows us to replicate hardware configurations where bus
number and uid aren't equivalent. The latter has benefits for our
development and debugging using QEMU.

The other way to do this would be to implement the expanded bus
numbering, but having an explicit uid makes more sense when trying to
replicate real hardware configurations.

The QEMU commandline to utilize this would be:
  -device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1,uid=x
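
A small standalone sketch of the uid rules follows. The sketch_ names are
hypothetical, and the collision check is only one possible way to address
the FIXME in this patch; nothing like it is implemented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two PXBDev fields that matter here. */
struct sketch_bridge {
    uint8_t bus_nr; /* limited to 8 bits */
    int32_t uid;    /* -1 means "not set"; pxb-cxl requires uid >= 0 */
};

static inline bool sketch_uid_valid(const struct sketch_bridge *b)
{
    return b->uid >= 0;
}

/* True if bridge idx shares its uid with any other bridge in the array. */
static inline bool sketch_uid_collides(const struct sketch_bridge *all,
                                       size_t n, size_t idx)
{
    assert(idx < n);

    for (size_t i = 0; i < n; i++) {
        if (i != idx && all[i].uid == all[idx].uid) {
            return true;
        }
    }
    return false;
}
```

Note that a uid such as 300 is valid even though it could never be expressed
as a bus number.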

Signed-off-by: Ben Widawsky 

--

I'm guessing this patch will be somewhat controversial. For early CXL
work, this can be dropped without too much heartache.
---
 hw/i386/acpi-build.c|  3 ++-
 hw/pci-bridge/pci_expander_bridge.c | 19 +++
 hw/pci/pci.c| 11 +++
 include/hw/pci/pci.h|  1 +
 include/hw/pci/pci_bus.h|  1 +
 5 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index cf6eb54c22..145a503e92 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1343,6 +1343,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 QLIST_FOREACH(bus, &bus->child, sibling) {
 uint8_t bus_num = pci_bus_num(bus);
 uint8_t numa_node = pci_bus_numa_node(bus);
+int32_t uid = pci_bus_uid(bus);
 
 /* look only for expander root buses */
 if (!pci_bus_is_root(bus)) {
@@ -1356,7 +1357,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 scope = aml_scope("\\_SB");
 dev = aml_device("PC%.02X", bus_num);
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
-init_pci_acpi(dev, bus_num, pci_bus_is_express(bus) ? PCIE : PCI);
+init_pci_acpi(dev, uid, pci_bus_is_express(bus) ? PCIE : PCI);
 
 if (numa_node != NUMA_NODE_UNASSIGNED) {
 aml_append(dev, aml_name_decl("_PXM", aml_int(numa_node)));
diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index b42592e1ff..5021b60435 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -67,6 +67,7 @@ struct PXBDev {
 
 uint8_t bus_nr;
 uint16_t numa_node;
+int32_t uid;
 };
 
 static PXBDev *convert_to_pxb(PCIDevice *dev)
@@ -98,12 +99,20 @@ static uint16_t pxb_bus_numa_node(PCIBus *bus)
 return pxb->numa_node;
 }
 
+static int32_t pxb_bus_uid(PCIBus *bus)
+{
+PXBDev *pxb = convert_to_pxb(bus->parent_dev);
+
+return pxb->uid;
+}
+
 static void pxb_bus_class_init(ObjectClass *class, void *data)
 {
 PCIBusClass *pbc = PCI_BUS_CLASS(class);
 
 pbc->bus_num = pxb_bus_num;
 pbc->numa_node = pxb_bus_numa_node;
+pbc->uid = pxb_bus_uid;
 }
 
 static const TypeInfo pxb_bus_info = {
@@ -329,6 +338,7 @@ static Property pxb_dev_properties[] = {
 /* Note: 0 is not a legal PXB bus number. */
 DEFINE_PROP_UINT8("bus_nr", PXBDev, bus_nr, 0),
 DEFINE_PROP_UINT16("numa_node", PXBDev, numa_node, NUMA_NODE_UNASSIGNED),
+DEFINE_PROP_INT32("uid", PXBDev, uid, -1),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -400,12 +410,21 @@ static const TypeInfo pxb_pcie_dev_info = {
 
 static void pxb_cxl_dev_realize(PCIDevice *dev, Error **errp)
 {
+PXBDev *pxb = convert_to_pxb(dev);
+
 /* A CXL PXB's parent bus is still PCIe */
 if (!pci_bus_is_express(pci_get_bus(dev))) {
 error_setg(errp, "pxb-cxl devices cannot reside on a PCI bus");
 return;
 }
 
+if (pxb->uid < 0) {
+error_setg(errp, "pxb-cxl devices must have a valid uid 
(0-2147483647)");
+return;
+}
+
+/* FIXME: Check that uid doesn't collide with UIDs of other host bridges */
+
 pxb_dev_realize_common(dev, CXL, errp);
 }
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index adbe8aa260..bf019d91a0 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -170,6 +170,11 @@ static uint16_t pcibus_numa_node(PCIBus *bus)
 return NUMA_NODE_UNASSIGNED;
 }
 
+static int32_t pcibus_uid(PCIBus *bus)
+{
+return -1;
+}
+
 static void pci_bus_class_init(ObjectClass *klass, void *data)
 {
 BusClass *k = BUS_CLASS(klass);
@@ -184,6 +189,7 @@ static void pci_bus_class_init(ObjectClass *klass, void 
*data)
 
 pbc->bus_num = pcibus_num;
 pbc->numa_node = pcibus_numa_node;
+pbc->uid = pcibus_uid;
 }
 
 static const TypeInfo pci_bus_info = {
@@ -530,6 +536,11 @@ int pci_bus_numa_node(PCIBus *bus)
 return PCI_BUS_GET_CLASS(bus)->numa_node(bus);
 }

[RFC PATCH v3 01/31] hw/pci/cxl: Add a CXL component type (interface)

2021-02-01 Thread Ben Widawsky

A CXL component is a hardware entity that implements CXL component
registers from the CXL 2.0 spec (8.2.3). Currently these represent 3
general types.
1. Host Bridge
2. Ports (root, upstream, downstream)
3. Devices (memory, other)

A CXL component can be conceptually thought of as a PCIe device with
extra functionality when enumerated and enabled. For this reason, CXL
support adds on to existing PCI code paths here, and will continue to do so.

Host bridges will typically need to be handled specially and so they can
implement this newly introduced interface or not. All other components
should implement this interface. Implementing this interface allows the
core pci code to treat these devices as special where appropriate.

Signed-off-by: Ben Widawsky 
---
 hw/pci/pci.c | 10 ++
 include/hw/pci/pci.h |  8 
 2 files changed, 18 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 512e9042ff..a45ca326ed 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -194,6 +194,11 @@ static const TypeInfo pci_bus_info = {
 .class_init = pci_bus_class_init,
 };
 
+static const TypeInfo cxl_interface_info = {
+.name  = INTERFACE_CXL_DEVICE,
+.parent= TYPE_INTERFACE,
+};
+
 static const TypeInfo pcie_interface_info = {
 .name  = INTERFACE_PCIE_DEVICE,
 .parent= TYPE_INTERFACE,
@@ -2091,6 +2096,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
 }
 
+if (object_class_dynamic_cast(klass, INTERFACE_CXL_DEVICE)) {
+pci_dev->cap_present |= QEMU_PCIE_CAP_CXL;
+}
+
 pci_dev = do_pci_register_device(pci_dev,
  object_get_typename(OBJECT(qdev)),
  pci_dev->devfn, errp);
@@ -2817,6 +2826,7 @@ static void pci_register_types(void)
 type_register_static(&pci_bus_info);
 type_register_static(&pcie_bus_info);
 type_register_static(&conventional_pci_interface_info);
+type_register_static(&cxl_interface_info);
 type_register_static(&pcie_interface_info);
 type_register_static(&pci_device_type_info);
 }
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 66db08462f..528cef341c 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -195,6 +195,8 @@ enum {
 QEMU_PCIE_LNKSTA_DLLLA = (1 << QEMU_PCIE_LNKSTA_DLLLA_BITNR),
 #define QEMU_PCIE_EXTCAP_INIT_BITNR 9
 QEMU_PCIE_EXTCAP_INIT = (1 << QEMU_PCIE_EXTCAP_INIT_BITNR),
+#define QEMU_PCIE_CXL_BITNR 10
+QEMU_PCIE_CAP_CXL = (1 << QEMU_PCIE_CXL_BITNR),
 };
 
 #define TYPE_PCI_DEVICE "pci-device"
@@ -202,6 +204,12 @@ typedef struct PCIDeviceClass PCIDeviceClass;
 DECLARE_OBJ_CHECKERS(PCIDevice, PCIDeviceClass,
  PCI_DEVICE, TYPE_PCI_DEVICE)
 
+/*
+ * Implemented by devices that can be plugged on CXL buses. In the spec, this 
is
+ * actually a "CXL Component", but we name it device to match the PCI naming.
+ */
+#define INTERFACE_CXL_DEVICE "cxl-device"
+
 /* Implemented by devices that can be plugged on PCI Express buses */
 #define INTERFACE_PCIE_DEVICE "pci-express-device"
 
-- 
2.30.0

[RFC PATCH v3 00/31] CXL 2.0 Support

2021-02-01 Thread Ben Widawsky

Major changes since v2 [1]:
 * Removed all register endian/alignment/size checking. Using core functionality
   instead. This is untested on big endian hosts, but Should Work(tm).
 * Fix component capability header generation (off by 1).
 * Fixed HDM programming (multiple issues).
 * Fixed timestamp command implementations.
 * Added commands: GET_FIRMWARE_UPDATE_INFO, GET_PARTITION_INFO, GET_LSA, 
SET_LSA

Things have remained fairly stable since v2. The biggest change here is
definitely the HDM programming which has received limited (but not 0) testing in
the Linux driver.

Jonathan Cameron has gotten this patch series working on ARM [2], and added some
much sought after functionality [3].

---

I've started #cxl on OFTC IRC for discussion. Please feel free to use that
channel for questions or suggestions in addition to #qemu.

---

Introduce emulation of Compute Express Link 2.0
(https://www.computeexpresslink.org/). Specifically, add support for Type 3
memory expanders with persistent memory.

The emulation has been critical to get the Linux enabling started [4]. It would
be an ideal place to land regression tests for different topology handling, and
there may be applications for this emulation as a way for a guest to manipulate
its address space relative to different performance memories.

Three of the five CXL component types are emulated with some level of
functionality: host bridge, root port, and memory device. All components and
devices implement basic MMIO. Devices/memory devices implement the mailbox
interface. Basic ACPI support is also included. Upstream ports and downstream
ports aren't implemented (the two components needed to make up a switch).

CXL 2.0 is built on top of PCIe (see spec for details). As a result, much of the
implementation utilizes existing PCI paradigms. To implement the host bridge,
I've chosen to use PXB (PCI Expander Bridge). It seemed to be the most natural
fit even though it doesn't directly map to how hardware will work. For
persistent capacity of the memory device, I utilized the memory subsystem
(hw/mem).

We have 3 reasons why this work is valuable:
1. Linux driver feature development benefits from emulation both because of a
   lack of initial hardware availability and because, as is seen with
   NVDIMM/PMEM emulation, there is value in being able to share topologies with
   system-software developers even after hardware is available.

2. The Linux kernel's unit test suite for NVDIMM/PMEM ended up injecting fake
   resources via custom modules (nfit_test). In retrospect a QEMU emulation of
   nfit_test capabilities would have made the test environment more portable,
   and allowed for easier community contributions of example configurations.

3. This is still being fleshed out, but in short it provides a standardized
   mechanism for the guest to provide feedback to the host about size and
   placement needs of the memory. After the host gives the guest a physical
   window mapping to the CXL device, the emulated HDM decoders allow the guest a
   way to tell the host how much it wants and where. There are likely simpler
   ways to do this, but they'd require inventing a new interface and you'd need
   to have diverging driver code in the guest programming of the HDM decoder vs.
   the host. Since we've already done this work, why not use it?

There is quite a long list of work to do for full spec compliance, but I don't
believe that any of it precludes merging. Off the top of my head:
- Main host bridge support (WIP)
- Interleaving
- Better Tests
- Hot plug support
- Emulating volatile capacity
- CDAT emulation [3]

The flow of the patches in general is to define all the data structures and
registers associated with the various components in a top down manner. Host
bridge, component, ports, devices. Then, the actual implementation is done in
the same order.

The summary is:
1-5: Infrastructure for component and device emulation
6-9: Basic mailbox command implementations
10-19: Implement CXL host bridges as PXB devices
20: Implement a root port
21-22: Implement a memory device
23-26: ACPI bits
27-29: Add some more advanced mailbox command implementations
30: Start working on enabling the main host bridge
31: Basic test case

---

[1]: 
https://lore.kernel.org/qemu-devel/20210105165323.783725-1-ben.widaw...@intel.com/
[2]: 
https://lore.kernel.org/qemu-devel/20210201152655.31027-1-jonathan.came...@huawei.com/
[3]: 
https://lore.kernel.org/qemu-devel/20210201151629.29656-1-jonathan.came...@huawei.com/
[4]: 
https://lore.kernel.org/linux-cxl/20210130002438.1872527-1-ben.widaw...@intel.com/

---

Ben Widawsky (31):
  hw/pci/cxl: Add a CXL component type (interface)
  hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)
  hw/cxl/device: Introduce a CXL device (8.2.8)
  hw/cxl/device: Implement the CAP array (8.2.8.1-2)
  hw/cxl/device: Implement basic mailbox (8.2.8.4)
  hw/cxl/device: Add memory device utilities
  hw/cxl/device: Add cheap EVENTS implementation

[RFC PATCH v3 02/31] hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)

2021-02-01 Thread Ben Widawsky

A CXL 2.0 component is any entity in the CXL topology. All components
have an analogous function in PCIe. Except for the CXL host bridge, all
have a PCIe config space that is accessible via the common PCIe
mechanisms. CXL components are enumerated via DVSEC fields in the
extended PCIe header space. CXL components will minimally implement some
subset of CXL.mem and CXL.cache registers defined in 8.2.5 of the CXL
2.0 specification. Two headers and a utility library are introduced to
support the minimum functionality needed to enumerate components.

The cxl_pci header manages bits associated with PCI, specifically the
DVSEC and related fields. The cxl_component.h variant has data
structures and APIs that are useful for drivers implementing any of the
CXL 2.0 components. The library takes care of making use of the DVSEC
bits and the CXL.[mem|cache] registers. Per spec, the registers are
little endian.

None of the mechanisms required to enumerate a CXL capable hostbridge
are introduced at this point.

Note that the CXL.mem and CXL.cache registers used are always 4B wide.
It's possible in the future that this constraint will not hold.
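
As a sketch of what the 4B constraint means for the accessors, consider the
following standalone example. It is not the code from
cxl-component-utils.c: the sketch_ names and the register count are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_REGS 16 /* hypothetical register count */

struct sketch_regs {
    uint32_t regs[SKETCH_NUM_REGS];
};

/* Only naturally aligned 4-byte reads are accepted; no partial reads. */
static inline uint64_t sketch_reg_read(const struct sketch_regs *r,
                                       uint64_t offset, unsigned size)
{
    assert(size == 4);
    assert(offset % 4 == 0 && offset / 4 < SKETCH_NUM_REGS);
    return r->regs[offset / 4];
}

/* The value arrives as 64 bits from the bus; only the low 32 are stored. */
static inline void sketch_reg_write(struct sketch_regs *r, uint64_t offset,
                                    uint64_t value, unsigned size)
{
    assert(size == 4);
    assert(offset % 4 == 0 && offset / 4 < SKETCH_NUM_REGS);
    r->regs[offset / 4] = (uint32_t)value;
}
```

If 8-byte registers are ever added, the size assertion is the part that
would have to turn into a switch on the access size.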

Signed-off-by: Ben Widawsky 
---
 MAINTAINERS|   6 +
 hw/Kconfig |   1 +
 hw/cxl/Kconfig |   3 +
 hw/cxl/cxl-component-utils.c   | 208 +
 hw/cxl/meson.build |   3 +
 hw/meson.build |   1 +
 include/hw/cxl/cxl.h   |  17 +++
 include/hw/cxl/cxl_component.h | 187 +
 include/hw/cxl/cxl_pci.h   | 138 ++
 9 files changed, 564 insertions(+)
 create mode 100644 hw/cxl/Kconfig
 create mode 100644 hw/cxl/cxl-component-utils.c
 create mode 100644 hw/cxl/meson.build
 create mode 100644 include/hw/cxl/cxl.h
 create mode 100644 include/hw/cxl/cxl_component.h
 create mode 100644 include/hw/cxl/cxl_pci.h

diff --git a/MAINTAINERS b/MAINTAINERS
index bcd88668bc..981dc92e25 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2234,6 +2234,12 @@ F: qapi/block*.json
 F: qapi/transaction.json
 T: git https://repo.or.cz/qemu/armbru.git block-next
 
+Compute Express Link
+M: Ben Widawsky 
+S: Supported
+F: hw/cxl/
+F: include/hw/cxl/
+
 Dirty Bitmaps
 M: Eric Blake 
 M: Vladimir Sementsov-Ogievskiy 
diff --git a/hw/Kconfig b/hw/Kconfig
index 5ad3c6b5a4..c03650c5ed 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -6,6 +6,7 @@ source audio/Kconfig
 source block/Kconfig
 source char/Kconfig
 source core/Kconfig
+source cxl/Kconfig
 source display/Kconfig
 source dma/Kconfig
 source gpio/Kconfig
diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
new file mode 100644
index 00..8e67519b16
--- /dev/null
+++ b/hw/cxl/Kconfig
@@ -0,0 +1,3 @@
+config CXL
+bool
+default y if PCI_EXPRESS
diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
new file mode 100644
index 00..8d56ad5c7d
--- /dev/null
+++ b/hw/cxl/cxl-component-utils.c
@@ -0,0 +1,208 @@
+/*
+ * CXL Utility library for components
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
+
+static uint64_t cxl_cache_mem_read_reg(void *opaque, hwaddr offset,
+                                       unsigned size)
+{
+    CXLComponentState *cxl_cstate = opaque;
+    ComponentRegisters *cregs = &cxl_cstate->crb;
+
+    assert(size == 4);
+
+    if (cregs->special_ops && cregs->special_ops->read) {
+        return cregs->special_ops->read(cxl_cstate, offset, size);
+    } else {
+        return cregs->cache_mem_registers[offset / 4];
+    }
+}
+
+static void cxl_cache_mem_write_reg(void *opaque, hwaddr offset, uint64_t value,
+                                    unsigned size)
+{
+    CXLComponentState *cxl_cstate = opaque;
+    ComponentRegisters *cregs = &cxl_cstate->crb;
+
+    assert(size == 4);
+
+    if (cregs->special_ops && cregs->special_ops->write) {
+        cregs->special_ops->write(cxl_cstate, offset, value, size);
+    } else {
+        cregs->cache_mem_registers[offset / 4] = value;
+    }
+}
+
+/*
+ * 8.2.3
+ *   The access restrictions specified in Section 8.2.2 also apply to CXL 2.0
+ *   Component Registers.
+ *
+ * 8.2.2
+ *   • A 32 bit register shall be accessed as a 4 Bytes quantity. Partial
+ *   reads are not permitted.
+ *   • A 64 bit register shall be accessed as a 8 Bytes quantity. Partial
+ *   reads are not permitted.
+ *
+ * As of the spec defined today, only 4 byte registers exist.
+ */
+static const MemoryRegionOps cache_mem_ops = {
+    .read = cxl_cache_mem_read_reg,
+    .write = cxl_cache_mem_write_reg,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_

Re: [RFC PATCH v2 24/32] hw/cxl/device: Add a memory device (8.2.8.5)

2021-01-28 Thread Ben Widawsky

On 21-01-28 08:51:51, Ben Widawsky wrote:
> On 21-01-28 07:14:44, Ben Widawsky wrote:
> > On 21-01-28 07:03:18, Ben Widawsky wrote:
> > > On 21-01-28 10:25:38, Jonathan Cameron wrote:
> > > > On Wed, 27 Jan 2021 13:26:45 -0800
> > > > Ben Widawsky  wrote:
> > > > 
> > > > > On 21-01-27 22:03:12, Igor Mammedov wrote:
> > > > > > On Tue,  5 Jan 2021 08:53:15 -0800
> > > > > > Ben Widawsky  wrote:
> > > > > >   
> > > > > > > A CXL memory device (AKA Type 3) is a CXL component that contains some
> > > > > > > combination of volatile and persistent memory. It also implements the
> > > > > > > previously defined mailbox interface as well as the memory device
> > > > > > > firmware interface.
> > > > > > > 
> > > > > > > The following example will create a 256M device in a 512M window:
> > > > > > > 
> > > > > > > -object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
> > > > > > > -device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"
> > > > > > 
> > > > > > I'd expect whole backend used by frontend, so one would not need "size"
> > > > > > property on frontend (like we do with memory devices).
> > > > > > So question is why it partially uses memdev?
> > > > > 
> > > > > Answered in a separate thread...
> > > > 
> > > > One possible suggestion inline.
> > > > 
> > > > > > > +
> > > > > > > +static void cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> > > > > > > +{
> > > > > > > +MemoryRegionSection mrs;
> > > > > > > +MemoryRegion *mr;
> > > > > > > +uint64_t offset = 0;
> > > > > > > +size_t remaining_size;
> > > > > > > +
> > > > > > > +if (!ct3d->hostmem) {
> > > > > > > +error_setg(errp, "memdev property must be set");
> > > > > > > +return;
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* FIXME: need to check mr is the host bridge's MR */
> > > > > > > +mr = host_memory_backend_get_memory(ct3d->hostmem);
> > > > > > > +
> > > > > > > +/* Create our new subregion */
> > > > > > > +ct3d->cxl_dstate.pmem = g_new(MemoryRegion, 1);
> > > > > > > +
> > > > > > > +/* Find the first free space in the window */
> > > > > > > +WITH_RCU_READ_LOCK_GUARD()
> > > > > > > +{
> > > > > > > +mrs = memory_region_find(mr, offset, 1);
> > > > > > > +while (mrs.mr && mrs.mr != mr) {
> > > > > > > +offset += memory_region_size(mrs.mr);
> > > > > > > +mrs = memory_region_find(mr, offset, 1);
> > > > > > > +}
> > > > > > > +}
> > > > > > > +
> > > > > > > +remaining_size = memory_region_size(mr) - offset;
> > > > > > > +if (remaining_size < ct3d->size) {
> > > > > > > +g_free(ct3d->cxl_dstate.pmem);
> > > > > > > +error_setg(errp,
> > > > > > > +   "Not enough free space (%zd) required for device (%" PRId64  ")",
> > > > > > > +   remaining_size, ct3d->size);
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* Register our subregion as non-volatile */
> > > > > > > +memory_region_init_ram(ct3d->cxl_dstate.pmem, OBJECT(ct3d),
> > > > > > > +   "cxl_type3-memory", ct3d->size, errp);
> > > > > > this allocates ct3d->size of anon RAM, was this an intention?
> > > > > > If yes, can you clarify why extra RAM is

Re: [RFC PATCH v2 24/32] hw/cxl/device: Add a memory device (8.2.8.5)

2021-01-28 Thread Ben Widawsky

On 21-01-28 07:14:44, Ben Widawsky wrote:
> On 21-01-28 07:03:18, Ben Widawsky wrote:
> > On 21-01-28 10:25:38, Jonathan Cameron wrote:
> > > On Wed, 27 Jan 2021 13:26:45 -0800
> > > Ben Widawsky  wrote:
> > > 
> > > > On 21-01-27 22:03:12, Igor Mammedov wrote:
> > > > > On Tue,  5 Jan 2021 08:53:15 -0800
> > > > > Ben Widawsky  wrote:
> > > > >   
> > > > > > A CXL memory device (AKA Type 3) is a CXL component that contains some
> > > > > > combination of volatile and persistent memory. It also implements the
> > > > > > previously defined mailbox interface as well as the memory device
> > > > > > firmware interface.
> > > > > > 
> > > > > > The following example will create a 256M device in a 512M window:
> > > > > > 
> > > > > > -object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
> > > > > > -device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"
> > > > > 
> > > > > I'd expect whole backend used by frontend, so one would not need "size"
> > > > > property on frontend (like we do with memory devices).
> > > > > So question is why it partially uses memdev?
> > > > 
> > > > Answered in a separate thread...
> > > 
> > > One possible suggestion inline.
> > > 
> > > > > > +
> > > > > > +static void cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> > > > > > +{
> > > > > > +MemoryRegionSection mrs;
> > > > > > +MemoryRegion *mr;
> > > > > > +uint64_t offset = 0;
> > > > > > +size_t remaining_size;
> > > > > > +
> > > > > > +if (!ct3d->hostmem) {
> > > > > > +error_setg(errp, "memdev property must be set");
> > > > > > +return;
> > > > > > +}
> > > > > > +
> > > > > > +/* FIXME: need to check mr is the host bridge's MR */
> > > > > > +mr = host_memory_backend_get_memory(ct3d->hostmem);
> > > > > > +
> > > > > > +/* Create our new subregion */
> > > > > > +ct3d->cxl_dstate.pmem = g_new(MemoryRegion, 1);
> > > > > > +
> > > > > > +/* Find the first free space in the window */
> > > > > > +WITH_RCU_READ_LOCK_GUARD()
> > > > > > +{
> > > > > > +mrs = memory_region_find(mr, offset, 1);
> > > > > > +while (mrs.mr && mrs.mr != mr) {
> > > > > > +offset += memory_region_size(mrs.mr);
> > > > > > +mrs = memory_region_find(mr, offset, 1);
> > > > > > +}
> > > > > > +}
> > > > > > +
> > > > > > +remaining_size = memory_region_size(mr) - offset;
> > > > > > +if (remaining_size < ct3d->size) {
> > > > > > +g_free(ct3d->cxl_dstate.pmem);
> > > > > > +error_setg(errp,
> > > > > > +   "Not enough free space (%zd) required for device (%" PRId64  ")",
> > > > > > +   remaining_size, ct3d->size);
> > > > > > +}
> > > > > > +
> > > > > > +/* Register our subregion as non-volatile */
> > > > > > +memory_region_init_ram(ct3d->cxl_dstate.pmem, OBJECT(ct3d),
> > > > > > +   "cxl_type3-memory", ct3d->size, errp);  
> > > > > this allocates ct3d->size of anon RAM, was this an intention?
> > > > > If yes, can you clarify why extra RAM is used instead of using what
> > > > > backend provides?  
> > > > 
> > > > It sounds like I'm doing the wrong thing then. There should be one chunk of
> > > > memory which is a subset of the full memory backend object. Could you please
> > > > advise on what I should be doing instead? Is add_subregion() sufficient?
> > > 
> > > Taking inspiration from nvdimm I'm carrying a patch that us

Re: [RFC PATCH v2 24/32] hw/cxl/device: Add a memory device (8.2.8.5)

2021-01-28 Thread Ben Widawsky

On 21-01-28 07:03:18, Ben Widawsky wrote:
> On 21-01-28 10:25:38, Jonathan Cameron wrote:
> > On Wed, 27 Jan 2021 13:26:45 -0800
> > Ben Widawsky  wrote:
> > 
> > > On 21-01-27 22:03:12, Igor Mammedov wrote:
> > > > On Tue,  5 Jan 2021 08:53:15 -0800
> > > > Ben Widawsky  wrote:
> > > >   
> > > > > A CXL memory device (AKA Type 3) is a CXL component that contains some
> > > > > combination of volatile and persistent memory. It also implements the
> > > > > previously defined mailbox interface as well as the memory device
> > > > > firmware interface.
> > > > > 
> > > > > The following example will create a 256M device in a 512M window:
> > > > > 
> > > > > -object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
> > > > > -device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"
> > > > 
> > > > I'd expect whole backend used by frontend, so one would not need "size"
> > > > property on frontend (like we do with memory devices).
> > > > So question is why it partially uses memdev?
> > > 
> > > Answered in a separate thread...
> > 
> > One possible suggestion inline.
> > 
> > > > > +
> > > > > +static void cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> > > > > +{
> > > > > +MemoryRegionSection mrs;
> > > > > +MemoryRegion *mr;
> > > > > +uint64_t offset = 0;
> > > > > +size_t remaining_size;
> > > > > +
> > > > > +if (!ct3d->hostmem) {
> > > > > +error_setg(errp, "memdev property must be set");
> > > > > +return;
> > > > > +}
> > > > > +
> > > > > +/* FIXME: need to check mr is the host bridge's MR */
> > > > > +mr = host_memory_backend_get_memory(ct3d->hostmem);
> > > > > +
> > > > > +/* Create our new subregion */
> > > > > +ct3d->cxl_dstate.pmem = g_new(MemoryRegion, 1);
> > > > > +
> > > > > +/* Find the first free space in the window */
> > > > > +WITH_RCU_READ_LOCK_GUARD()
> > > > > +{
> > > > > +mrs = memory_region_find(mr, offset, 1);
> > > > > +while (mrs.mr && mrs.mr != mr) {
> > > > > +offset += memory_region_size(mrs.mr);
> > > > > +mrs = memory_region_find(mr, offset, 1);
> > > > > +}
> > > > > +}
> > > > > +
> > > > > +remaining_size = memory_region_size(mr) - offset;
> > > > > +if (remaining_size < ct3d->size) {
> > > > > +g_free(ct3d->cxl_dstate.pmem);
> > > > > +error_setg(errp,
> > > > > +   "Not enough free space (%zd) required for device (%" PRId64  ")",
> > > > > +   remaining_size, ct3d->size);
> > > > > +}
> > > > > +
> > > > > +/* Register our subregion as non-volatile */
> > > > > +memory_region_init_ram(ct3d->cxl_dstate.pmem, OBJECT(ct3d),
> > > > > +   "cxl_type3-memory", ct3d->size, errp);  
> > > > this allocates ct3d->size of anon RAM, was this an intention?
> > > > If yes, can you clarify why extra RAM is used instead of using what
> > > > backend provides?  
> > > 
> > > It sounds like I'm doing the wrong thing then. There should be one chunk of
> > > memory which is a subset of the full memory backend object. Could you please
> > > advise on what I should be doing instead? Is add_subregion() sufficient?
> > 
> > Taking inspiration from nvdimm I'm carrying a patch that uses
> > memory_region_init_alias(ct3d->cxl_dstate.pmem, OBJECT(ct3d),
> >  "cxl_type3-memory", mr, offset, ct3d->size);
> > 
> > I 'think' that's doing the right thing, but haven't fully tested it yet
> > so may be completely wrong :)
> > 
> > Then for the pmem addr, call memory_region_set_address() to put it
> > in a particular location.
> > 
> 
> Yes - this is wha

Re: [RFC PATCH v2 24/32] hw/cxl/device: Add a memory device (8.2.8.5)

2021-01-28 Thread Ben Widawsky

On 21-01-28 10:25:38, Jonathan Cameron wrote:
> On Wed, 27 Jan 2021 13:26:45 -0800
> Ben Widawsky  wrote:
> 
> > On 21-01-27 22:03:12, Igor Mammedov wrote:
> > > On Tue,  5 Jan 2021 08:53:15 -0800
> > > Ben Widawsky  wrote:
> > >   
> > > > A CXL memory device (AKA Type 3) is a CXL component that contains some
> > > > combination of volatile and persistent memory. It also implements the
> > > > previously defined mailbox interface as well as the memory device
> > > > firmware interface.
> > > > 
> > > > The following example will create a 256M device in a 512M window:
> > > > 
> > > > -object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
> > > > -device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"
> > > 
> > > I'd expect whole backend used by frontend, so one would not need "size"
> > > property on frontend (like we do with memory devices).
> > > So question is why it partially uses memdev?
> > 
> > Answered in a separate thread...
> 
> One possible suggestion inline.
> 
> > > > +
> > > > +static void cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> > > > +{
> > > > +MemoryRegionSection mrs;
> > > > +MemoryRegion *mr;
> > > > +uint64_t offset = 0;
> > > > +size_t remaining_size;
> > > > +
> > > > +if (!ct3d->hostmem) {
> > > > +error_setg(errp, "memdev property must be set");
> > > > +return;
> > > > +}
> > > > +
> > > > +/* FIXME: need to check mr is the host bridge's MR */
> > > > +mr = host_memory_backend_get_memory(ct3d->hostmem);
> > > > +
> > > > +/* Create our new subregion */
> > > > +ct3d->cxl_dstate.pmem = g_new(MemoryRegion, 1);
> > > > +
> > > > +/* Find the first free space in the window */
> > > > +WITH_RCU_READ_LOCK_GUARD()
> > > > +{
> > > > +mrs = memory_region_find(mr, offset, 1);
> > > > +while (mrs.mr && mrs.mr != mr) {
> > > > +offset += memory_region_size(mrs.mr);
> > > > +mrs = memory_region_find(mr, offset, 1);
> > > > +}
> > > > +}
> > > > +
> > > > +remaining_size = memory_region_size(mr) - offset;
> > > > +if (remaining_size < ct3d->size) {
> > > > +g_free(ct3d->cxl_dstate.pmem);
> > > > +error_setg(errp,
> > > > +   "Not enough free space (%zd) required for device (%" PRId64  ")",
> > > > +   remaining_size, ct3d->size);
> > > > +}
> > > > +
> > > > +/* Register our subregion as non-volatile */
> > > > +memory_region_init_ram(ct3d->cxl_dstate.pmem, OBJECT(ct3d),
> > > > +   "cxl_type3-memory", ct3d->size, errp);  
> > > this allocates ct3d->size of anon RAM, was this an intention?
> > > If yes, can you clarify why extra RAM is used instead of using what
> > > backend provides?  
> > 
> > It sounds like I'm doing the wrong thing then. There should be one chunk of
> > memory which is a subset of the full memory backend object. Could you please
> > advise on what I should be doing instead? Is add_subregion() sufficient?
> 
> Taking inspiration from nvdimm I'm carrying a patch that uses
> memory_region_init_alias(ct3d->cxl_dstate.pmem, OBJECT(ct3d),
>"cxl_type3-memory", mr, offset, ct3d->size);
> 
> I 'think' that's doing the right thing, but haven't fully tested it yet
> so may be completely wrong :)
> 
> Then for the pmem addr, call memory_region_set_address() to put it
> in a particular location.
> 

Yes - this is what I'd like to do and what I initially tried, and I also believe
it's right, but it doesn't work.

range_invariant: Assertion `range->lob <= range->upb || range->lob == range->upb + 1' failed.

I was digging into this yesterday, but opted to start a new thread on the
matter.
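
For anyone reading along, here is a standalone model of the condition quoted in
that assertion message. It is only a sketch: SketchRange and the helpers are
hypothetical names, and it does not diagnose why the alias approach trips the
assert. It shows the one obvious way an inclusive [lob, upb] pair can violate
the condition, namely when base + size wraps around 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inclusive [lob, upb] address range. */
typedef struct {
    uint64_t lob; /* lower bound, inclusive */
    uint64_t upb; /* upper bound, inclusive */
} SketchRange;

/* Same condition as in the assertion text: non-empty, or the empty encoding. */
static bool sketch_range_is_valid(const SketchRange *r)
{
    return r->lob <= r->upb || r->lob == r->upb + 1;
}

/* Abort if the range does not satisfy the condition above. */
static void sketch_range_invariant(const SketchRange *r)
{
    assert(sketch_range_is_valid(r));
}

/* Build a range from base and size; upb wraps if base + size overflows. */
static SketchRange sketch_range_from_size(uint64_t base, uint64_t size)
{
    SketchRange r = { base, base + size - 1 };
    return r;
}
```

With this encoding a size of 0 yields upb == lob - 1, which the condition
accepts as an empty range, while a base near the top of the address space plus
a large size yields upb < lob - 1 and is rejected.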

> > 
> > 
> > >   
> > > > +memory_region_set_nonvolatile(ct3d->cxl_dstate.pmem, true);  
> > >   
> > > > +#ifdef SET_PMEM_PADDR
> > > > +memory_region_add_subregion(mr, offset, ct

[RFC] Set addresses for memory devices [CXL]

2021-01-27 Thread Ben Widawsky

Hi list, Igor.

I wanted to get some ideas on how to better handle this. Per the recent
discussion [1], it's become clear that there needs to be more thought put into
how to manage the address space for CXL memory devices. If you see the
discussion on interleave [2] there's a decent diagram for the problem statement.

A CXL topology looks just like a PCIe topology. A CXL memory device is a memory
expander. It's a byte addressable address range with a combination of persistent
and volatile memory. In a CXL capable system, you can effectively think of these
things as more configurable NVDIMMs. The memory devices have an interface that
allows the OS to program the base physical address range it claims, called an HDM
(Host-managed Device Memory) decoder. A larger address range is claimed by a host
bridge (or a combination of host bridges in the interleaved case) which is
platform specific.

Originally, my plan was to create a single memory backend for a "window" and
subregion the devices in there. So for example, if you had two devices under a
hostbridge, each of 256M size, the window would be some fixed GPA of 512M+ size
memory backend, and those memory devices would be a subregion of the
hostbridge's window. I thought this was working in my patch series, but as it
turns out, this doesn't actually work as I intended. `info mtree` looks good,
but `info memory-devices` doesn't.
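
To make the intended layout concrete, here is a small standalone model of the
window-and-subregion idea. It is a sketch only: SketchWindow, its helpers, and
the device limit are hypothetical names, and it uses none of the QEMU memory
API. Each device is placed at the first free offset of the window, which is
the effect the free-space loop in the patch is aiming for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DEVICES 8 /* hypothetical limit, not from the series */

/* Hypothetical model of a host bridge window carved into device subregions. */
typedef struct {
    uint64_t base;     /* guest physical base of the window */
    uint64_t size;     /* window size in bytes */
    uint64_t used;     /* bytes already handed out */
    unsigned ndevices; /* number of subregions placed so far */
    uint64_t dev_offset[SKETCH_MAX_DEVICES];
    uint64_t dev_size[SKETCH_MAX_DEVICES];
} SketchWindow;

/* Place a device at the first free offset; returns its index or -1 if full. */
static int sketch_window_add_device(SketchWindow *w, uint64_t size)
{
    assert(w != NULL && w->used <= w->size);

    if (size == 0 || w->ndevices >= SKETCH_MAX_DEVICES ||
        size > w->size - w->used) {
        return -1;
    }
    w->dev_offset[w->ndevices] = w->used;
    w->dev_size[w->ndevices] = size;
    w->used += size;
    return (int)w->ndevices++;
}

/* Guest physical address at which device idx starts. */
static uint64_t sketch_window_device_gpa(const SketchWindow *w, unsigned idx)
{
    assert(w != NULL && idx < w->ndevices);
    return w->base + w->dev_offset[idx];
}
```

With a 512M window, two 256M devices land at offsets 0 and 256M, and a third
request of any size is refused.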

So let me list the requirements and hopefully get some feedback on the best way
to handle it.
1. A PCIe like device has a persistent memory region (I don't care about
volatile at the moment).
2. The physical address base for the memory region is programmable.
3. Memory accesses will support interleaving across multiple host bridges.

As far as I can tell, there isn't anything that works quite like this today,
and my attempts so far haven't been correct.

Thanks.
Ben

References:
[1] https://lore.kernel.org/qemu-devel/20210126213013.6v24im4sler3q...@mail.bwidawsk.net/
[2] https://lore.kernel.org/qemu-devel/c51b000e-80db-40e9-d878-f260c49e4...@amsat.org/

Other:
https://lore.kernel.org/qemu-devel/20210105165323.783725-23-ben.widaw...@intel.com/
https://lore.kernel.org/qemu-devel/20210105165323.783725-26-ben.widaw...@intel.com/

Re: Handling multiple inheritance [for CXL]

2021-01-27 Thread Ben Widawsky

On 21-01-27 22:33:37, Igor Mammedov wrote:
> On Wed, 27 Jan 2021 12:25:44 -0800
> Ben Widawsky  wrote:
> 
> > On 21-01-27 21:18:24, Igor Mammedov wrote:
> > > On Tue, 26 Jan 2021 13:33:52 -0800
> > > Ben Widawsky  wrote:
> > >   
> > > > I'm working on CXL 2.0 type 3 memory devices [1]. In short, these are PCIe
> > > > devices that have persistent memory on them. As such, it would be nice to
> > > > inherit from both a PCI_DEVICE class as well as an NVDIMM device class.
> > > > 
> > > > Truth be told, using TYPE_MEMORY_DEVICE as the interface does provide most
> > > > of what I need.
> > > could you be more specific on what you need from it?
> > >   
> > 
> > I'm trying to register my persistent memory as normal system memory. I assume
> > it's required that I implement the memory interface to do that. If it's not,
> > that's fine too.
> > 
> > For reference:
> > https://gitlab.com/bwidawsk/qemu/-/blob/cxl-2.0v3/hw/mem/cxl_type3.c
> 
> if you use TYPE_MEMORY_DEVICE machinery, then address/(max)size a device
> takes in hotplug ram window, is fixed at device creation time.
> If you use PCI BAR to map memory, it should be possible to reprogram BAR
> anywhere in PCI address space at runtime.
> 

This is not part of the PCI address space. I believe there will be quite a bit
of work to support hotplug properly for CXL devices, but I believe making it a
PCI BAR is the wrong approach.

If you're not yet familiar with the spec, it might make some sense to take a
look as I think I'm not doing a good job conveying what this hardware is.

https://www.computeexpresslink.org/download-the-specification

> > > > > I'm wondering what the best way to handle this is. Currently, the
> > > > > only thing NVDIMM class provides is write/read_label_data, this is driven by
> > > > > _DSM. For CXL, the mechanism to read/write the equivalent area is not done via
> > > > > _DSM, but done directly via a mailbox interface. However, the intent is the
> > > > > same, and so utilizing similar code seems ideal.
> > > > > 
> > > > > If there's a desire to unify these code paths, I'd need something like multiple
> > > > > inheritance. I'm looking for some feedback here on how to do it.
> > > > 
> > > > Thanks.
> > > > Ben
> > > > 
> > > > [1]: https://lore.kernel.org/qemu-devel/20210105165323.783725-1-ben.widaw...@intel.com/
> > > >   
> > >   
> > 
>

Re: [RFC PATCH v2 24/32] hw/cxl/device: Add a memory device (8.2.8.5)

2021-01-27 Thread Ben Widawsky

On 21-01-27 22:21:04, Igor Mammedov wrote:
> On Wed, 27 Jan 2021 13:11:16 -0800
> Ben Widawsky  wrote:
> 
> > On 21-01-27 22:03:12, Igor Mammedov wrote:
> > > On Tue,  5 Jan 2021 08:53:15 -0800
> > > Ben Widawsky  wrote:
> > >   
> > > > A CXL memory device (AKA Type 3) is a CXL component that contains some
> > > > combination of volatile and persistent memory. It also implements the
> > > > previously defined mailbox interface as well as the memory device
> > > > firmware interface.
> > > > 
> > > > The following example will create a 256M device in a 512M window:
> > > > 
> > > > -object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
> > > > -device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"
> > > 
> > > I'd expect whole backend used by frontend, so one would not need "size"
> > > property on frontend (like we do with memory devices).
> > > So question is why it partially uses memdev?
> > 
> > A CXL memory device may participate in an interleave set. In such a case, it
> > would be < the total size of the memory window.
> > 
> > This isn't implemented in the code yet, but it is planned.
> 
> could you add here how it supposed to look like CLI interface wise? 
> 
> also see other questions below.
> 

My mistake on the other questions. I forked another thread for those.

Interleave is still being fleshed out. But generally, to set up a 512M address
range that interleaves across 2 devices, each 256M, and each connected to a root
port on the host bridge:

# Memory backend
-object memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M

# Host Bridge
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52,uid=0,len-window-base=1,window-base[0]=0x4c000,memdev[0]=cxl-mem1

# 2 root ports
-device cxl-rp,id=rp0,bus=cxl.0,addr=0.0,chassis=0,slot=0,memdev=cxl-mem1
-device cxl-rp,id=rp1,bus=cxl.0,addr=0.1,chassis=0,slot=1,memdev=cxl-mem1

# 2 PMEM devices
-device cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M
-device cxl-type3,bus=rp1,memdev=cxl-mem1,id=cxl-pmem1,size=256M
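
To illustrate what "interleaves across 2 devices" would mean for an address in
the window, here is a sketch of plain modulo interleaving. It is an assumption
for illustration, not the decode logic of the series or of the spec: the
SketchInterleave type, the helper name, and the choice of granularity are all
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one interleaved window. */
typedef struct {
    uint64_t base;        /* window base address */
    uint64_t size;        /* window size in bytes */
    unsigned ways;        /* number of devices in the interleave set */
    uint64_t granularity; /* bytes routed to one device before rotating */
} SketchInterleave;

/* Result of decoding one address. */
typedef struct {
    unsigned target;     /* index of the device that owns the address */
    uint64_t dev_offset; /* offset inside that device's memory */
} SketchTarget;

/* Decode addr; returns false if addr lies outside the window. */
static bool sketch_interleave_decode(const SketchInterleave *il, uint64_t addr,
                                     SketchTarget *out)
{
    assert(il != NULL && out != NULL);
    assert(il->ways > 0 && il->granularity > 0);

    if (addr < il->base || addr - il->base >= il->size) {
        return false;
    }

    uint64_t off = addr - il->base;          /* offset into the window */
    uint64_t chunk = off / il->granularity;  /* which granule overall */

    out->target = (unsigned)(chunk % il->ways);
    out->dev_offset = (chunk / il->ways) * il->granularity +
                      off % il->granularity;
    return true;
}
```

With 2 ways over a 512M window, each device ends up backing exactly 256M, which
is why the device size in the command line above is smaller than the window.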

Re: [RFC PATCH v2 24/32] hw/cxl/device: Add a memory device (8.2.8.5)

2021-01-27 Thread Ben Widawsky

On 21-01-27 22:03:12, Igor Mammedov wrote:
> On Tue,  5 Jan 2021 08:53:15 -0800
> Ben Widawsky  wrote:
> 
> > A CXL memory device (AKA Type 3) is a CXL component that contains some
> > combination of volatile and persistent memory. It also implements the
> > previously defined mailbox interface as well as the memory device
> > firmware interface.
> > 
> > The following example will create a 256M device in a 512M window:
> > 
> > -object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
> > -device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"
> 
> I'd expect whole backend used by frontend, so one would not need "size"
> property on frontend (like we do with memory devices).
> So question is why it partially uses memdev?

Answered in a separate thread...

> 
> 
> > Signed-off-by: Ben Widawsky 
> > ---
> >  hw/core/numa.c |   3 +
> >  hw/cxl/cxl-mailbox-utils.c |  41 ++
> >  hw/i386/pc.c   |   1 +
> >  hw/mem/Kconfig |   5 +
> >  hw/mem/cxl_type3.c | 262 +
> >  hw/mem/meson.build |   1 +
> >  hw/pci/pcie.c  |  30 +
> >  include/hw/cxl/cxl.h   |   2 +
> >  include/hw/cxl/cxl_pci.h   |  22 
> >  include/hw/pci/pci_ids.h   |   1 +
> >  monitor/hmp-cmds.c |  15 +++
> >  qapi/machine.json  |   1 +
> >  12 files changed, 384 insertions(+)
> >  create mode 100644 hw/mem/cxl_type3.c
> > 
> > diff --git a/hw/core/numa.c b/hw/core/numa.c
> > index 68cee65f61..cd7df371e6 100644
> > --- a/hw/core/numa.c
> > +++ b/hw/core/numa.c
> > @@ -770,6 +770,9 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
> >  node_mem[pcdimm_info->node].node_plugged_mem +=
> >  pcdimm_info->size;
> >  break;
> > +case MEMORY_DEVICE_INFO_KIND_CXL:
> > +/* FINISHME */
> > +break;
> >  case MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM:
> >  vpi = value->u.virtio_pmem.data;
> >  /* TODO: once we support numa, assign to right node */
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index f68ec5b5b9..eeb10b8943 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -49,6 +49,8 @@ enum {
> >  LOGS= 0x04,
> >  #define GET_SUPPORTED 0x0
> >  #define GET_LOG   0x1
> > +IDENTIFY= 0x40,
> > +#define MEMORY_DEVICE 0x0
> >  };
> >  
> >  /* 8.2.8.4.5.1 Command Return Codes */
> > @@ -127,6 +129,7 @@ declare_mailbox_handler(TIMESTAMP_GET);
> >  declare_mailbox_handler(TIMESTAMP_SET);
> >  declare_mailbox_handler(LOGS_GET_SUPPORTED);
> >  declare_mailbox_handler(LOGS_GET_LOG);
> > +declare_mailbox_handler(IDENTIFY_MEMORY_DEVICE);
> >  
> >  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
> >  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> > @@ -144,6 +147,7 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
> >  CXL_CMD(TIMESTAMP, SET, 8, IMMEDIATE_POLICY_CHANGE),
> >  CXL_CMD(LOGS, GET_SUPPORTED, 0, 0),
> >  CXL_CMD(LOGS, GET_LOG, 0x18, 0),
> > +CXL_CMD(IDENTIFY, MEMORY_DEVICE, 0, 0),
> >  };
> >  
> >  #undef CXL_CMD
> > @@ -262,6 +266,43 @@ define_mailbox_handler(LOGS_GET_LOG)
> >  return CXL_MBOX_SUCCESS;
> >  }
> >  
> > +/* 8.2.9.5.1.1 */
> > +define_mailbox_handler(IDENTIFY_MEMORY_DEVICE)
> > +{
> > +struct {
> > +char fw_revision[0x10];
> > +uint64_t total_capacity;
> > +uint64_t volatile_capacity;
> > +uint64_t persistent_capacity;
> > +uint64_t partition_align;
> > +uint16_t info_event_log_size;
> > +uint16_t warning_event_log_size;
> > +uint16_t failure_event_log_size;
> > +uint16_t fatal_event_log_size;
> > +uint32_t lsa_size;
> > +uint8_t poison_list_max_mer[3];
> > +uint16_t inject_poison_limit;
> > +uint8_t poison_caps;
> > +uint8_t qos_telemetry_caps;
> > +} __attribute__((packed)) *id;
> > +_Static_assert(sizeof(*id) == 0x43, "Bad identify size");
> > +
> > +if (memory_region_size(cxl_dstate->pmem) < (256 << 20)) {
> > +return CXL_MBOX_INTERNAL_ERROR;
> > +}
> > +
> > +i

Re: [RFC PATCH v2 24/32] hw/cxl/device: Add a memory device (8.2.8.5)

2021-01-27 Thread Ben Widawsky

On 21-01-27 22:03:12, Igor Mammedov wrote:
> On Tue,  5 Jan 2021 08:53:15 -0800
> Ben Widawsky  wrote:
> 
> > A CXL memory device (AKA Type 3) is a CXL component that contains some
> > combination of volatile and persistent memory. It also implements the
> > previously defined mailbox interface as well as the memory device
> > firmware interface.
> > 
> > The following example will create a 256M device in a 512M window:
> > 
> > -object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
> > -device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0,size=256M"
> 
> I'd expect whole backend used by frontend, so one would not need "size"
> property on frontend (like we do with memory devices).
> So question is why it partially uses memdev?

A CXL memory device may participate in an interleave set. In such a case, it
would be < the total size of the memory window.

This isn't implemented in the code yet, but it is planned.

> 
> 
> > Signed-off-by: Ben Widawsky 
> > ---
> >  hw/core/numa.c |   3 +
> >  hw/cxl/cxl-mailbox-utils.c |  41 ++
> >  hw/i386/pc.c   |   1 +
> >  hw/mem/Kconfig |   5 +
> >  hw/mem/cxl_type3.c | 262 +
> >  hw/mem/meson.build |   1 +
> >  hw/pci/pcie.c  |  30 +
> >  include/hw/cxl/cxl.h   |   2 +
> >  include/hw/cxl/cxl_pci.h   |  22 
> >  include/hw/pci/pci_ids.h   |   1 +
> >  monitor/hmp-cmds.c |  15 +++
> >  qapi/machine.json  |   1 +
> >  12 files changed, 384 insertions(+)
> >  create mode 100644 hw/mem/cxl_type3.c
> > 
> > diff --git a/hw/core/numa.c b/hw/core/numa.c
> > index 68cee65f61..cd7df371e6 100644
> > --- a/hw/core/numa.c
> > +++ b/hw/core/numa.c
> > @@ -770,6 +770,9 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
> >  node_mem[pcdimm_info->node].node_plugged_mem +=
> >  pcdimm_info->size;
> >  break;
> > +case MEMORY_DEVICE_INFO_KIND_CXL:
> > +/* FINISHME */
> > +break;
> >  case MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM:
> >  vpi = value->u.virtio_pmem.data;
> >  /* TODO: once we support numa, assign to right node */
> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> > index f68ec5b5b9..eeb10b8943 100644
> > --- a/hw/cxl/cxl-mailbox-utils.c
> > +++ b/hw/cxl/cxl-mailbox-utils.c
> > @@ -49,6 +49,8 @@ enum {
> >  LOGS= 0x04,
> >  #define GET_SUPPORTED 0x0
> >  #define GET_LOG   0x1
> > +IDENTIFY= 0x40,
> > +#define MEMORY_DEVICE 0x0
> >  };
> >  
> >  /* 8.2.8.4.5.1 Command Return Codes */
> > @@ -127,6 +129,7 @@ declare_mailbox_handler(TIMESTAMP_GET);
> >  declare_mailbox_handler(TIMESTAMP_SET);
> >  declare_mailbox_handler(LOGS_GET_SUPPORTED);
> >  declare_mailbox_handler(LOGS_GET_LOG);
> > +declare_mailbox_handler(IDENTIFY_MEMORY_DEVICE);
> >  
> >  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
> >  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> > @@ -144,6 +147,7 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
> >  CXL_CMD(TIMESTAMP, SET, 8, IMMEDIATE_POLICY_CHANGE),
> >  CXL_CMD(LOGS, GET_SUPPORTED, 0, 0),
> >  CXL_CMD(LOGS, GET_LOG, 0x18, 0),
> > +CXL_CMD(IDENTIFY, MEMORY_DEVICE, 0, 0),
> >  };
> >  
> >  #undef CXL_CMD
> > @@ -262,6 +266,43 @@ define_mailbox_handler(LOGS_GET_LOG)
> >  return CXL_MBOX_SUCCESS;
> >  }
> >  
> > +/* 8.2.9.5.1.1 */
> > +define_mailbox_handler(IDENTIFY_MEMORY_DEVICE)
> > +{
> > +struct {
> > +char fw_revision[0x10];
> > +uint64_t total_capacity;
> > +uint64_t volatile_capacity;
> > +uint64_t persistent_capacity;
> > +uint64_t partition_align;
> > +uint16_t info_event_log_size;
> > +uint16_t warning_event_log_size;
> > +uint16_t failure_event_log_size;
> > +uint16_t fatal_event_log_size;
> > +uint32_t lsa_size;
> > +uint8_t poison_list_max_mer[3];
> > +uint16_t inject_poison_limit;
> > +uint8_t poison_caps;
> > +uint8_t qos_telemetry_caps;
> > +} __attribute__((packed)) *id;
> > +_Static_assert(sizeof(*id) == 0x43, "Bad identify size");
> > +
> > +if (memory_regi

Re: Handling multiple inheritance [for CXL]

2021-01-27 Thread Ben Widawsky

On 21-01-27 21:18:24, Igor Mammedov wrote:
> On Tue, 26 Jan 2021 13:33:52 -0800
> Ben Widawsky  wrote:
> 
> > I'm working on CXL 2.0 type 3 memory devices [1]. In short, these are PCIe
> > devices that have persistent memory on them. As such, it would be nice to
> > inherit from both a PCI_DEVICE class as well as an NVDIMM device class.
> > 
> > Truth be told, using TYPE_MEMORY_DEVICE as the interface does provide most
> > of what I need.
> could you be more specific on what you need from it?
> 

I'm trying to register my persistent memory as normal system memory. I assume
it's required that I implement the memory interface to do that. If it's not,
that's fine too.

For reference:
https://gitlab.com/bwidawsk/qemu/-/blob/cxl-2.0v3/hw/mem/cxl_type3.c

> > I'm wondering what the best way to handle this is. Currently, the
> > only thing NVDIMM class provides is write/read_label_data, this is driven by
> > _DSM. For CXL, the mechanism to read/write the equivalent area is not done via
> > _DSM, but done directly via a mailbox interface. However, the intent is the
> > same, and so utilizing similar code seems ideal.
> > 
> > If there's a desire to unify these code paths, I'd need something like multiple
> > inheritance. I'm looking for some feedback here on how to do it.
> > 
> > Thanks.
> > Ben
> > 
> > [1]: 
> > https://lore.kernel.org/qemu-devel/20210105165323.783725-1-ben.widaw...@intel.com/
> > 
>

Re: Handling multiple inheritance [for CXL]

2021-01-27 Thread Ben Widawsky

On 21-01-27 10:06:48, Daniel P. Berrangé wrote:
> On Tue, Jan 26, 2021 at 01:33:52PM -0800, Ben Widawsky wrote:
> > I'm working on CXL 2.0 type 3 memory devices [1]. In short, these are PCIe
> > devices that have persistent memory on them. As such, it would be nice to
> > inherit from both a PCI_DEVICE class as well as an NVDIMM device class.
> > 
> > Truth be told, using TYPE_MEMORY_DEVICE as the interface does provide most
> > of what I need. I'm wondering what the best way to handle this is.
> > Currently, the only thing NVDIMM class provides is write/read_label_data,
> > this is driven by _DSM. For CXL, the mechanism to read/write the equivalent
> > area is not done via _DSM, but done directly via a mailbox interface.
> > However, the intent is the same, and so utilizing similar code seems ideal.
> > 
> > If there's a desire to unify these code paths, I'd need something like
> > multiple inheritance. I'm looking for some feedback here on how to do it.
> 
> We don't have a direct concept of multiple inheritance in QOM.
> 
> The closest you can get is to turn the NVDIMM class into an
> interface. You can inherit from PCI_DEVICE and then implement
> the NVDIMM interface.
> 
> Regards,
> Daniel

Is there a concise summary of what the tradeoffs would be of moving NVDIMM to an
interface? AFAICT, there's a lot of things done through subclassing that can
just as easily be done as an interface, but I don't understand the reason for
that.
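
For reference, the shape Daniel describes (one parent class plus an
implemented interface) can be sketched in plain C, outside of QOM. This is only
an illustration of the pattern: it does not use QOM's real type registration,
and every name in it (FakePciDevice, LabelOps, FakeCxlType3, FAKE_LSA_SIZE) is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the single parent class (think PCI_DEVICE). */
typedef struct FakePciDevice {
    uint16_t vendor_id;
    uint16_t device_id;
} FakePciDevice;

/* Hypothetical label-area interface: nothing but a table of methods. */
typedef struct LabelOps {
    int (*read_label)(void *dev, uint8_t *buf, size_t len, size_t off);
    int (*write_label)(void *dev, const uint8_t *buf, size_t len, size_t off);
} LabelOps;

#define FAKE_LSA_SIZE 64 /* assumed label storage size, for illustration */

typedef struct FakeCxlType3 {
    FakePciDevice parent;      /* single inheritance: parent comes first */
    const LabelOps *label_ops; /* interface this type chooses to implement */
    uint8_t lsa[FAKE_LSA_SIZE];
} FakeCxlType3;

static int cxl_read_label(void *dev, uint8_t *buf, size_t len, size_t off)
{
    FakeCxlType3 *ct3d = dev;

    if (off > FAKE_LSA_SIZE || len > FAKE_LSA_SIZE - off) {
        return -1; /* out of range */
    }
    memcpy(buf, ct3d->lsa + off, len);
    return 0;
}

static int cxl_write_label(void *dev, const uint8_t *buf, size_t len,
                           size_t off)
{
    FakeCxlType3 *ct3d = dev;

    if (off > FAKE_LSA_SIZE || len > FAKE_LSA_SIZE - off) {
        return -1; /* out of range */
    }
    memcpy(ct3d->lsa + off, buf, len);
    return 0;
}

static const LabelOps cxl_label_ops = {
    .read_label = cxl_read_label,
    .write_label = cxl_write_label,
};

static void fake_cxl_type3_init(FakeCxlType3 *ct3d)
{
    assert(ct3d != NULL);
    memset(ct3d, 0, sizeof(*ct3d));
    ct3d->label_ops = &cxl_label_ops;
}
```

Code that only cares about labels would take a LabelOps pointer and a device
pointer, regardless of whether the call originated from _DSM or from a
mailbox command.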

Handling multiple inheritance [for CXL]

2021-01-26 Thread Ben Widawsky

I'm working on CXL 2.0 type 3 memory devices [1]. In short, these are PCIe
devices that have persistent memory on them. As such, it would be nice to
inherit from both a PCI_DEVICE class as well as an NVDIMM device class.

Truth be told, using TYPE_MEMORY_DEVICE as the interface does provide most of
what I need. I'm wondering what the best way to handle this is. Currently, the
only thing NVDIMM class provides is write/read_label_data, this is driven by
_DSM. For CXL, the mechanism to read/write the equivalent area is not done via
_DSM, but done directly via a mailbox interface. However, the intent is the
same, and so utilizing similar code seems ideal.

If there's a desire to unify these code paths, I'd need something like multiple
inheritance. I'm looking for some feedback here on how to do it.

Thanks.
Ben

[1]: 
https://lore.kernel.org/qemu-devel/20210105165323.783725-1-ben.widaw...@intel.com/

Re: [RFC PATCH] Makefile: add GNU global tags support

2021-01-12 Thread Ben Widawsky

On 21-01-12 09:27:39, Alex Bennée wrote:
> 
> Ben Widawsky  writes:
> 
> > On 21-01-08 22:30:59, Alex Bennée wrote:
> >> 
> >> Ben Widawsky  writes:
> >> 
> >> > On 21-01-08 12:19:35, Alex Bennée wrote:
> >> >> GNU Global is another tags engine which is more like cscope in being
> >> >> able to support finding both references and definitions. You will be
> >> >> un-surprised to know it also integrates well with Emacs.
> >> >> 
> >> >> The main benefit of integrating it into find-src-path is it takes less
> >> >> time to rebuild the database from scratch when you have a lot of build
> >> >> directories under your source tree.
> >> >> 
> >> >> Signed-off-by: Alex Bennée 
> >> >
> >> > It might be worth mentioning that the Linux kernel has supported this
> >> > for a long time now (10+ years).
> >> >
> >> > Having switched to gtags about 3 years ago, I think it's summarily
> >> > better and would really like this to get merged.
> >> 
> >> So I take it that's a reviewed-by and a tested-by tag from you?
> >> 
> >
> > It doesn't actually work correctly for me, I just like the idea :-)
> >
> > make gtags 2>&1  | grep ignored | wc -l
> > 6266
> >
> > Warning: '/home/bwidawsk/work/clk/qemu/accel/qtest/qtest.c' is out of
> > source tree. ignored.
> 
> Did you run this in the build directory by any chance? I tested in the
> source directory because that's generally where you want the tags.
> 
> I wonder what the best solution is to this? Always force ourselves to be
> in the source dir? Or error out when we are not in the source tree?

I was in the build directory. With ctags, that works for me in both source and
build directory.

It does indeed work from the source directory.

I'm wondering how gtags can't seem to do this (I wasn't able to figure it out,
at least).

I'd be in favor of error.


> 
> 
> >
> >> >
> >> >> ---
> >> >>  Makefile   | 9 -
> >> >>  .gitignore | 3 +++
> >> >>  2 files changed, 11 insertions(+), 1 deletion(-)
> >> >> 
> >> >> diff --git a/Makefile b/Makefile
> >> >> index fb9923ff22..66eec99685 100644
> >> >> --- a/Makefile
> >> >> +++ b/Makefile
> >> >> @@ -253,6 +253,13 @@ ctags:
> >> >> rm -f "$(SRC_PATH)/"tags
> >> >> $(find-src-path) -exec ctags -f "$(SRC_PATH)/"tags --append {} +
> >> >>  
> >> >> +.PHONY: gtags
> >> >> +gtags:
> >> >> +   rm -f "$(SRC_PATH)/"GTAGS
> >> >> +   rm -f "$(SRC_PATH)/"GRTAGS
> >> >> +   rm -f "$(SRC_PATH)/"GPATH
> >> >> +   $(find-src-path) | gtags -f -
> >> >> +
> >> >>  .PHONY: TAGS
> >> >>  TAGS:
> >> >> rm -f "$(SRC_PATH)/"TAGS
> >> >> @@ -279,7 +286,7 @@ help:
> >> >> $(call print-help,all,Build all)
> >> >> $(call print-help,dir/file.o,Build specified target only)
> >> >> $(call print-help,install,Install QEMU, documentation and tools)
> >> >> -   $(call print-help,ctags/TAGS,Generate tags file for editors)
> >> >> +   $(call print-help,ctags/gtags/TAGS,Generate tags file for editors)
> >> >> $(call print-help,cscope,Generate cscope index)
> >> >> $(call print-help,sparse,Run sparse on the QEMU source)
> >> >> @echo  ''
> >> >> diff --git a/.gitignore b/.gitignore
> >> >> index b32bca1315..75a4be0724 100644
> >> >> --- a/.gitignore
> >> >> +++ b/.gitignore
> >> >> @@ -7,6 +7,9 @@
> >> >>  cscope.*
> >> >>  tags
> >> >>  TAGS
> >> >> +GPATH
> >> >> +GRTAGS
> >> >> +GTAGS
> >> >>  *~
> >> >>  *.ast_raw
> >> >>  *.depend_raw
> >> >> -- 
> >> >> 2.20.1
> >> >> 
> >> >> 
> >> 
> >> 
> >> -- 
> >> Alex Bennée
> 
> 
> -- 
> Alex Bennée

Re: [RFC PATCH] Makefile: add GNU global tags support

2021-01-08 Thread Ben Widawsky

On 21-01-08 22:30:59, Alex Bennée wrote:
> 
> Ben Widawsky  writes:
> 
> > On 21-01-08 12:19:35, Alex Bennée wrote:
> >> GNU Global is another tags engine which is more like cscope in being
> >> able to support finding both references and definitions. You will be
> >> un-surprised to know it also integrates well with Emacs.
> >> 
> >> The main benefit of integrating it into find-src-path is it takes less
> >> time to rebuild the database from scratch when you have a lot of build
> >> directories under your source tree.
> >> 
> >> Signed-off-by: Alex Bennée 
> >
> > It might be worth mentioning that the Linux kernel has supported this for a
> > long time now (10+ years).
> >
> > Having switched to gtags about 3 years ago, I think it's summarily better
> > and would really like this to get merged.
> 
> So I take it that's a reviewed-by and a tested-by tag from you?
> 

It doesn't actually work correctly for me, I just like the idea :-)

make gtags 2>&1  | grep ignored | wc -l
6266

Warning: '/home/bwidawsk/work/clk/qemu/accel/qtest/qtest.c' is out of source tree. ignored.

> >
> >> ---
> >>  Makefile   | 9 -
> >>  .gitignore | 3 +++
> >>  2 files changed, 11 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/Makefile b/Makefile
> >> index fb9923ff22..66eec99685 100644
> >> --- a/Makefile
> >> +++ b/Makefile
> >> @@ -253,6 +253,13 @@ ctags:
> >>rm -f "$(SRC_PATH)/"tags
> >>$(find-src-path) -exec ctags -f "$(SRC_PATH)/"tags --append {} +
> >>  
> >> +.PHONY: gtags
> >> +gtags:
> >> +  rm -f "$(SRC_PATH)/"GTAGS
> >> +  rm -f "$(SRC_PATH)/"GRTAGS
> >> +  rm -f "$(SRC_PATH)/"GPATH
> >> +  $(find-src-path) | gtags -f -
> >> +
> >>  .PHONY: TAGS
> >>  TAGS:
> >>rm -f "$(SRC_PATH)/"TAGS
> >> @@ -279,7 +286,7 @@ help:
> >>$(call print-help,all,Build all)
> >>$(call print-help,dir/file.o,Build specified target only)
> >>$(call print-help,install,Install QEMU, documentation and tools)
> >> -  $(call print-help,ctags/TAGS,Generate tags file for editors)
> >> +  $(call print-help,ctags/gtags/TAGS,Generate tags file for editors)
> >>$(call print-help,cscope,Generate cscope index)
> >>$(call print-help,sparse,Run sparse on the QEMU source)
> >>@echo  ''
> >> diff --git a/.gitignore b/.gitignore
> >> index b32bca1315..75a4be0724 100644
> >> --- a/.gitignore
> >> +++ b/.gitignore
> >> @@ -7,6 +7,9 @@
> >>  cscope.*
> >>  tags
> >>  TAGS
> >> +GPATH
> >> +GRTAGS
> >> +GTAGS
> >>  *~
> >>  *.ast_raw
> >>  *.depend_raw
> >> -- 
> >> 2.20.1
> >> 
> >> 
> 
> 
> -- 
> Alex Bennée

Re: [RFC PATCH v2 00/32] CXL 2.0 Support

2021-01-08 Thread Ben Widawsky

On 21-01-08 18:44:04, Jonathan Cameron wrote:
> On Tue, 5 Jan 2021 08:52:51 -0800
> Ben Widawsky  wrote:
> 
> > Fixes since v1 [1]:
> >  * Defer introducing some commands/registers not yet used (Ben)
> >  * Add stubbed device_reg_init_common() (Ben)
> >  * Improve assertions in DVSEC creation (Jonathan)
> >  * Use 'n' for HDM register offsets (Jonathan)
> >  * Correct revision ID for extensions (Jonathan)
> >  * Minor cleanups and clarifications (Jonathan)
> >  * Remove error codes not yet used (Jonathan)
> >  * Fix interrupt enable bit width (Jonathan)
> >  * Add comment for weird register size (Jonathan)
> >  * Break out register alignment checks (Jonathan)
> >  * Use the reg alignment helper (Jonathan)
> >  * Rename error codes to match spec
> >  * Fix cap count mid series (Jonathan)
> > 
> > New since v1 [1]:
> >  * Entirely reworked framework for firmware handling
> >  * Implemented more device commands
> >  * CEL support
> > 
> > (There are some new patches that I just named 'v2' for simplicity's sake)
> > 
> > Introduce emulation of Compute Express Link 2.0
> > (https://www.computeexpresslink.org/).
> 
> Hi Ben,
> 
> In the interests of avoiding duplication, I thought I'd give a quick update
> on where I am. I have what is here working on arm64 (enabling is fairly
> simple). I'll have a bit more feedback next week and hopefully have time for
> an in-depth review.

Thanks for this. Dan recommended I send out an email with plans for what's
needed, what we intend to work on and so on. I'll try to get that out next week
also.

> 
> I did run into some issues around alignment for the persistent memory that
> (I think) are down to the fact the memory-backend isn't quite tied up to the
> device.

I did have plans to change this to support interleaving.

> 
> Flows wise, I initially started hacking in NFIT building changes in qemu but
> I'm not yet sure how the hotplug flows are supposed to work and who is
> responsible for setting up a hotplugged device (OS or firmware).

It won't be NFIT :-). I'd redirect this question to the linux-cxl mailing list,
as it's a good one. The spec is somewhat handwavy around this.

> 
> Plans wise, I was thinking I could look at DOE emulation as seems that we'll
> be needing that fairly soon to get any useful memory usecases up and running.
> Let me know if you already have that underway.  
> 
> Jonathan
> 

Chris in the Cc had some plans to enable DOE for their purposes. It'd be good to
sync with him. AFAIK though, nobody has plans to enable DOE in Linux, which
might be a good target next.

> > 
> > The emulation has been critical to get the Linux enabling started [2], it
> > would be an ideal place to land regression tests for different topology
> > handling, and there may be applications for this emulation as a way for a
> > guest to manipulate its address space relative to different performance
> > memories.
> > 
> > Three of the five CXL component types are emulated with some level of
> > functionality: host bridge, root port, and memory device. Upstream ports and
> > downstream ports aren't implemented (the two components needed to make up a
> > switch).
> > 
> > CXL 2.0 is built on top of PCIe (see spec for details). As a result, much
> > of the implementation utilizes existing PCI paradigms. To implement the host
> > bridge, I've chosen to use PXB (PCI Expander Bridge). It seemed to be the
> > most natural fit even though it doesn't directly map to how hardware will
> > work. For persistent capacity of the memory device, I utilized the memory
> > subsystem (hw/mem).
> > 
> > We have 3 reasons why this work is valuable:
> > 1. OS driver development and testing
> > 2. OS driver regression testing
> > 3. Possible guest support for HDMs
> > 
> > As mentioned above there are three benefits to carrying this enabling in
> > upstream QEMU:
> > 
> > 1. Linux driver feature development benefits from emulation both due to
> > a lack of initial hardware availability, but also, as is seen with
> > NVDIMM/PMEM emulation, there is value in being able to share
> > topologies with system-software developers even after hardware is
> > available.
> > 
> > 2. The Linux kernel's unit test suite for NVDIMM/PMEM ended up injecting 
> > fake
> > resources via custom modules (nfit_test). In retrospect a QEMU emulation of
> > nfit_test capabilities would have made the test environment more portable, 
> > and
> > allowed for easier community contributions of example configuratio

Re: [RFC PATCH] Makefile: add GNU global tags support

2021-01-08 Thread Ben Widawsky

On 21-01-08 12:19:35, Alex Bennée wrote:
> GNU Global is another tags engine which is more like cscope in being
> able to support finding both references and definitions. You will be
> un-surprised to know it also integrates well with Emacs.
> 
> The main benefit of integrating it into find-src-path is it takes less
> time to rebuild the database from scratch when you have a lot of build
> directories under your source tree.
> 
> Signed-off-by: Alex Bennée 

It might be worth mentioning that the Linux kernel has supported this for a long
time now (10+ years).

Having switched to gtags about 3 years ago, I think it's summarily better and
would really like this to get merged.

> ---
>  Makefile   | 9 -
>  .gitignore | 3 +++
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/Makefile b/Makefile
> index fb9923ff22..66eec99685 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -253,6 +253,13 @@ ctags:
>   rm -f "$(SRC_PATH)/"tags
>   $(find-src-path) -exec ctags -f "$(SRC_PATH)/"tags --append {} +
>  
> +.PHONY: gtags
> +gtags:
> + rm -f "$(SRC_PATH)/"GTAGS
> + rm -f "$(SRC_PATH)/"GRTAGS
> + rm -f "$(SRC_PATH)/"GPATH
> + $(find-src-path) | gtags -f -
> +
>  .PHONY: TAGS
>  TAGS:
>   rm -f "$(SRC_PATH)/"TAGS
> @@ -279,7 +286,7 @@ help:
>   $(call print-help,all,Build all)
>   $(call print-help,dir/file.o,Build specified target only)
>   $(call print-help,install,Install QEMU, documentation and tools)
> - $(call print-help,ctags/TAGS,Generate tags file for editors)
> + $(call print-help,ctags/gtags/TAGS,Generate tags file for editors)
>   $(call print-help,cscope,Generate cscope index)
>   $(call print-help,sparse,Run sparse on the QEMU source)
>   @echo  ''
> diff --git a/.gitignore b/.gitignore
> index b32bca1315..75a4be0724 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -7,6 +7,9 @@
>  cscope.*
>  tags
>  TAGS
> +GPATH
> +GRTAGS
> +GTAGS
>  *~
>  *.ast_raw
>  *.depend_raw
> -- 
> 2.20.1
> 
>

Re: [Linuxarm] Re: [RFC PATCH v2 07/32] hw/cxl/device: Implement basic mailbox (8.2.8.4)

2021-01-07 Thread Ben Widawsky

On 21-01-06 11:08:28, Ben Widawsky wrote:
> On 21-01-06 10:05:57, Ben Widawsky wrote:
> > On 21-01-06 17:40:14, Jonathan Cameron wrote:
> > > On Wed, 6 Jan 2021 13:21:23 +
> > > Jonathan Cameron  wrote:
> > > 
> > > > On Tue, 5 Jan 2021 08:52:58 -0800
> > > > Ben Widawsky  wrote:
> > > > 
> 
> [snip]
> 
> > 
> > I'm sorry you had to debug this. I had fixed this previously and it got
> > lost. I'm currently between test applications, so my regression testing
> > isn't great.
> > 
> > I think the fix should be something like this, but I can't easily test at
> > the moment:
> > 
> > diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
> > index c515d45d20..b38e9b4c17 100644
> > --- a/hw/cxl/cxl-device-utils.c
> > +++ b/hw/cxl/cxl-device-utils.c
> > @@ -102,6 +102,9 @@ static void mailbox_reg_write(void *opaque, hwaddr 
> > offset, uint64_t value,
> >  {
> >  CXLDeviceState *cxl_dstate = opaque;
> > 
> > +if (offset >= A_CXL_DEV_CMD_PAYLOAD)
> > +stn_le_p(cxl_dstate->mbox_reg_state, size, value);
> > +
> >  /*
> >   * Lock is needed to prevent concurrent writes as well as to prevent 
> > writes
> >   * coming in while the firmware is processing. Without background 
> > commands
> > 
> > 
> > 
> 
> +if (offset >= A_CXL_DEV_CMD_PAYLOAD) {
> +stn_le_p(cxl_dstate->mbox_reg_state, size, value);
> +return;
> +}
> +

Last time's a charm
 stn_le_p(cxl_dstate->mbox_reg_state + offset, size, value);
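
To spell out why the "+ offset" matters: without it, every payload write lands
at the start of the register state instead of at the written offset. Below is a
minimal standalone sketch of the intended behavior. store_le() is a
hypothetical stand-in for QEMU's stn_le_p(), and FAKE_PAYLOAD_OFFSET and
FAKE_MBOX_LEN are assumed values chosen for illustration, not the real register
map.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAKE_PAYLOAD_OFFSET 0x20 /* assumed start of the payload area */
#define FAKE_MBOX_LEN       0x40 /* assumed total size of the fake state */

/* Store the low `size` bytes of `value` at dst, least significant first. */
static void store_le(uint8_t *dst, unsigned size, uint64_t value)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    for (unsigned i = 0; i < size; i++) {
        dst[i] = (uint8_t)(value >> (8 * i));
    }
}

/*
 * Returns 1 if the write hit the payload area and was stored at
 * base + offset, 0 if it is a non-payload (or out of range) write that the
 * caller still has to handle.
 */
static int fake_mailbox_write(uint8_t *mbox_reg_state, uint64_t offset,
                              unsigned size, uint64_t value)
{
    if (offset >= FAKE_PAYLOAD_OFFSET && offset + size <= FAKE_MBOX_LEN) {
        store_le(mbox_reg_state + offset, size, value);
        return 1;
    }
    return 0;
}
```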

> 
> [snip]
> 
> >

Re: [Linuxarm] Re: [RFC PATCH v2 07/32] hw/cxl/device: Implement basic mailbox (8.2.8.4)

2021-01-06 Thread Ben Widawsky

On 21-01-06 10:05:57, Ben Widawsky wrote:
> On 21-01-06 17:40:14, Jonathan Cameron wrote:
> > On Wed, 6 Jan 2021 13:21:23 +
> > Jonathan Cameron  wrote:
> > 
> > > On Tue, 5 Jan 2021 08:52:58 -0800
> > > Ben Widawsky  wrote:
> > > 

[snip]

> 
> I'm sorry you had to debug this. I had fixed this previously and it got lost.
> I'm currently between test applications, so my regression testing isn't great.
> 
> I think the fix should be something like this, but I can't easily test at the
> moment:
> 
> diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
> index c515d45d20..b38e9b4c17 100644
> --- a/hw/cxl/cxl-device-utils.c
> +++ b/hw/cxl/cxl-device-utils.c
> @@ -102,6 +102,9 @@ static void mailbox_reg_write(void *opaque, hwaddr 
> offset, uint64_t value,
>  {
>  CXLDeviceState *cxl_dstate = opaque;
> 
> +if (offset >= A_CXL_DEV_CMD_PAYLOAD)
> +stn_le_p(cxl_dstate->mbox_reg_state, size, value);
> +
>  /*
>   * Lock is needed to prevent concurrent writes as well as to prevent 
> writes
>   * coming in while the firmware is processing. Without background 
> commands
> 
> 
> 

+if (offset >= A_CXL_DEV_CMD_PAYLOAD) {
+stn_le_p(cxl_dstate->mbox_reg_state, size, value);
+return;
+}
+

[snip]

>

Re: [Linuxarm] Re: [RFC PATCH v2 07/32] hw/cxl/device: Implement basic mailbox (8.2.8.4)

2021-01-06 Thread Ben Widawsky

On 21-01-06 17:40:14, Jonathan Cameron wrote:
> On Wed, 6 Jan 2021 13:21:23 +
> Jonathan Cameron  wrote:
> 
> > On Tue, 5 Jan 2021 08:52:58 -0800
> > Ben Widawsky  wrote:
> > 
> > > This is the beginning of implementing mailbox support for CXL 2.0
> > > devices.
> > > 
> > > v2: Use register alignment helper (Ben)
> > > Minor cleanups (Jonathan)
> > > Rename error codes to match spec (Jonathan)
> > > Update cap count from 1 to 2 (Jonathan)
> > > Add infra to support CEL (Ben)
> > > Add more of the actual mailbox handling from later patch (Ben)
> > > 
> > > Signed-off-by: Ben Widawsky   
> > 
> > Hi Ben,
> > 
> > I hacked support in for ARM64 to give this a spin and ran into an
> > interesting problem around read sizes.
> > 
> > The mailbox registers space allows 4 or 8 byte reads, but in the kernel
> > driver (I think I have the right version from your github) you do
> > the payload drain with
> > memcpy_from_io()
> > 
> > If the size of the payload is not a multiple of 8 bytes, on ARM64 that
> > results in byte reads and an exception.  This happens with some of the
> > existing calls which happen to have non multiple of 8 payload sizes.
> > 
> > I hacked below to allow 1 byte reads from that region but that's probably
> > not the right fix.  I found a statement in the CXL spec saying maximum read
> > size from this register block was 8 bytes but couldn't immediately see a 
> > minimum.
> > (I haven't looked that hard yet though!)
> > 
> > Various approaches in kernel could also be used:
> > 1) Change the payload drain to have specific handling for the end few bytes.
> > 2) Pad the various structures to ensure payloads are always 8 byte multiples
> > in length (nasty).
> 
> > A bit more testing and another little thing below.
> 
> J
> 
> > 
> > > ---
> > >  hw/cxl/cxl-device-utils.c   | 122 -
> > >  hw/cxl/cxl-mailbox-utils.c  | 173 
> > >  hw/cxl/meson.build  |   1 +
> > >  include/hw/cxl/cxl.h|   3 +
> > >  include/hw/cxl/cxl_device.h |  27 +-
> > >  5 files changed, 322 insertions(+), 4 deletions(-)
> > >  create mode 100644 hw/cxl/cxl-mailbox-utils.c
> > > 
> > > diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
> > > index b86e5466bd..642e3c2617 100644
> > > --- a/hw/cxl/cxl-device-utils.c
> > > +++ b/hw/cxl/cxl-device-utils.c
> > > @@ -44,6 +44,108 @@ static uint64_t dev_reg_read(void *opaque, hwaddr 
> > > offset, unsigned size)
> > >  return ldn_le_p(, size);
> > >  }
> > >  
> > > +static uint64_t mailbox_reg_read(void *opaque, hwaddr offset, unsigned 
> > > size)
> > > +{
> > > +CXLDeviceState *cxl_dstate = opaque;
> > > +
> > > +if (cxl_device_check_register_alignment(offset, size)) {
> > > +qemu_log_mask(LOG_UNIMP, "Unaligned register read\n");
> > > +return 0;
> > > +}
> > > +
> > > +return ldn_le_p(cxl_dstate->mbox_reg_state + offset, size);
> > > +}
> > > +
> > > +static void mailbox_mem_writel(uint32_t *reg_state, hwaddr offset,
> > > +   uint64_t value)
> > > +{
> > > +switch (offset) {
> > > +case A_CXL_DEV_MAILBOX_CTRL:
> > > +/* fallthrough */
> > > +case A_CXL_DEV_MAILBOX_CAP:
> > > +/* RO register */
> > > +break;
> > > +default:
> > > +qemu_log_mask(LOG_UNIMP,
> > > +  "%s Unexpected 32-bit access to 0x%" PRIx64 " 
> > > (WI)\n",
> > > +  __func__, offset);
> > > +break;
> > > +}
> > > +
> > > +stl_le_p((uint8_t *)reg_state + offset, value);
> > > +}
> > > +
> > > +static void mailbox_mem_writeq(uint8_t *reg_state, hwaddr offset,
> > > +   uint64_t value)
> > > +{
> > > +switch (offset) {
> > > +case A_CXL_DEV_MAILBOX_CMD:
> > > +break;
> > > +case A_CXL_DEV_BG_CMD_STS:
> > > +/* BG not supported */
> > > +/* fallthrough */
> > > +case A_CXL_DEV_MAILBOX_STS:
> > > +/* Read only register, will get updated by the state machine */
> > > +re

Re: [RFC PATCH v2 05/32] hw/cxl/device: Implement the CAP array (8.2.8.1-2)

2021-01-06 Thread Ben Widawsky

On 21-01-06 17:06:41, Jonathan Cameron wrote:
> On Wed, 6 Jan 2021 08:49:48 -0800
> Ben Widawsky  wrote:
> 
> > On 21-01-06 13:28:05, Jonathan Cameron wrote:
> > > On Tue, 5 Jan 2021 08:52:56 -0800
> > > Ben Widawsky  wrote:
> > >   
> > > > This implements all device MMIO up to the first capability. That
> > > > includes the CXL Device Capabilities Array Register, as well as all of
> > > > the CXL Device Capability Header Registers. The latter are filled in as
> > > > they are implemented in the following patches.
> > > > 
> > > > v2: Break out register alignment checks (Jonathan)
> > > > 
> > > > Signed-off-by: Ben Widawsky   
> > > Hi Ben,
> > > 
> > > One buglet / inconsistency inline that I spotted whilst chasing that issue
> > > with size of reads.
> > > 
> > > Will get to a full review after messing around ('testing') this a bit 
> > > more ;)
> > > 
> > > Jonathan
> > >   
> > > > ---
> > > >  hw/cxl/cxl-device-utils.c | 72 +++
> > > >  hw/cxl/meson.build|  1 +
> > > >  2 files changed, 73 insertions(+)
> > > >  create mode 100644 hw/cxl/cxl-device-utils.c
> > > > 
> > > > diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
> > > > new file mode 100644
> > > > index 00..d1b1371e66
> > > > --- /dev/null
> > > > +++ b/hw/cxl/cxl-device-utils.c
> > > > @@ -0,0 +1,72 @@
> > > > +/*
> > > > + * CXL Utility library for devices
> > > > + *
> > > > + * Copyright(C) 2020 Intel Corporation.
> > > > + *
> > > > + * This work is licensed under the terms of the GNU GPL, version 2. 
> > > > See the
> > > > + * COPYING file in the top-level directory.
> > > > + */
> > > > +
> > > > +#include "qemu/osdep.h"
> > > > +#include "qemu/log.h"
> > > > +#include "hw/cxl/cxl.h"
> > > > +
> > > > +static int cxl_device_check_register_alignment(hwaddr offset, unsigned 
> > > > size)
> > > > +{
> > > > +if (unlikely(offset & (size - 1))) {
> > > > +return 1;
> > > > +}
> > > > +
> > > > +return 0;
> > > > +}
> > > > +
> > > > +static uint64_t caps_reg_read(void *opaque, hwaddr offset, unsigned 
> > > > size)
> > > > +{
> > > > +CXLDeviceState *cxl_dstate = opaque;
> > > > +
> > > > +if (cxl_device_check_register_alignment(offset, size)) {
> > > > +qemu_log_mask(LOG_UNIMP, "Unaligned register read\n");
> > > > +return 0;
> > > > +}
> > > > +
> > > > +return ldn_le_p(cxl_dstate->caps_reg_state + offset, size);
> > > > +}
> > > > +
> > > > +static const MemoryRegionOps caps_ops = {
> > > > +.read = caps_reg_read,
> > > > +.write = NULL,
> > > > +.endianness = DEVICE_LITTLE_ENDIAN,
> > > > +.valid = {
> > > > +.min_access_size = 4,
> > > > +.max_access_size = 8,
> > > > +},
> > > > +.impl = {
> > > > +.min_access_size = 4,
> > > > +.max_access_size = 8,
> > > > +},
> > > > +};
> > > > +
> > > > +void cxl_device_register_block_init(Object *obj, CXLDeviceState 
> > > > *cxl_dstate)
> > > > +{
> > > > +/* This will be a BAR, so needs to be rounded up to pow2 for PCI 
> > > > spec */
> > > > +memory_region_init(
> > > > +_dstate->device_registers, obj, "device-registers",
> > > > +pow2ceil(CXL_MAILBOX_REGISTERS_LENGTH + 
> > > > CXL_MAILBOX_REGISTERS_OFFSET));  
> > > 
> > > I can see why you jumped directly to sizing this for the whole region, 
> > > but the snag
> > > is that means I think you missed the fact that patch 8 adds a region 
> > > after the end
> > > of the mailbox.   Doesn't result in an actual bug because the ceil above 
> > > takes
> > > you way past the space needed though (the memory device region is only 8 
> > > bytes long).
> > > 
> > >   
> > 
> > Maybe I misunderstan

Re: [RFC PATCH v2 05/32] hw/cxl/device: Implement the CAP array (8.2.8.1-2)

2021-01-06 Thread Ben Widawsky

On 21-01-06 13:28:05, Jonathan Cameron wrote:
> On Tue, 5 Jan 2021 08:52:56 -0800
> Ben Widawsky  wrote:
> 
> > This implements all device MMIO up to the first capability. That
> > includes the CXL Device Capabilities Array Register, as well as all of
> > the CXL Device Capability Header Registers. The latter are filled in as
> > they are implemented in the following patches.
> > 
> > v2: Break out register alignment checks (Jonathan)
> > 
> > Signed-off-by: Ben Widawsky 
> Hi Ben,
> 
> One buglet / inconsistency inline that I spotted whilst chasing that issue
> with size of reads.
> 
> Will get to a full review after messing around ('testing') this a bit more ;)
> 
> Jonathan
> 
> > ---
> >  hw/cxl/cxl-device-utils.c | 72 +++
> >  hw/cxl/meson.build|  1 +
> >  2 files changed, 73 insertions(+)
> >  create mode 100644 hw/cxl/cxl-device-utils.c
> > 
> > diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
> > new file mode 100644
> > index 00..d1b1371e66
> > --- /dev/null
> > +++ b/hw/cxl/cxl-device-utils.c
> > @@ -0,0 +1,72 @@
> > +/*
> > + * CXL Utility library for devices
> > + *
> > + * Copyright(C) 2020 Intel Corporation.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2. See the
> > + * COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/log.h"
> > +#include "hw/cxl/cxl.h"
> > +
> > +static int cxl_device_check_register_alignment(hwaddr offset, unsigned 
> > size)
> > +{
> > +if (unlikely(offset & (size - 1))) {
> > +return 1;
> > +}
> > +
> > +return 0;
> > +}
> > +
> > +static uint64_t caps_reg_read(void *opaque, hwaddr offset, unsigned size)
> > +{
> > +CXLDeviceState *cxl_dstate = opaque;
> > +
> > +if (cxl_device_check_register_alignment(offset, size)) {
> > +qemu_log_mask(LOG_UNIMP, "Unaligned register read\n");
> > +return 0;
> > +}
> > +
> > +return ldn_le_p(cxl_dstate->caps_reg_state + offset, size);
> > +}
> > +
> > +static const MemoryRegionOps caps_ops = {
> > +.read = caps_reg_read,
> > +.write = NULL,
> > +.endianness = DEVICE_LITTLE_ENDIAN,
> > +.valid = {
> > +.min_access_size = 4,
> > +.max_access_size = 8,
> > +},
> > +.impl = {
> > +.min_access_size = 4,
> > +.max_access_size = 8,
> > +},
> > +};
> > +
> > +void cxl_device_register_block_init(Object *obj, CXLDeviceState 
> > *cxl_dstate)
> > +{
> > +/* This will be a BAR, so needs to be rounded up to pow2 for PCI spec 
> > */
> > +memory_region_init(
> > +_dstate->device_registers, obj, "device-registers",
> > +pow2ceil(CXL_MAILBOX_REGISTERS_LENGTH + 
> > CXL_MAILBOX_REGISTERS_OFFSET));
> 
> I can see why you jumped directly to sizing this for the whole region, but
> the snag is that means I think you missed the fact that patch 8 adds a region
> after the end of the mailbox. Doesn't result in an actual bug because the
> ceil above takes you way past the space needed though (the memory device
> region is only 8 bytes long).
> 
> 

Maybe I misunderstand, but this is the intended behavior.
cxl_dstate->device_registers is the MemoryRegion container for all the
subregions that are the actual device MMIO.

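                     +--------------------------+
                  ^  |                          |
                  |  |          unused          |
                  |  +--------------------------+
                  |  |       memdev regs        |
                  |  +--------------------------+
                  |  |                          |
                  |  | +----------------------+ |
 cxl_dstate->     |  | |                      | |
 device_registers |  | |                      | |
                  |  | |       payload        | |
                  |  | |    (2k currently)    | |
                  |  | |                      | |
                  |  | |                      | |
                  |  | +----------------------+ |
                  |  |       mailbox regs       |
                  |  +--------------------------+
                  |  |       device regs        |
                  v  +--------------------------+
                     |        caps regs         |
             BAR --> +--------------------------+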


Perhaps I should add this as a comment in the code?
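
As a rough illustration of the container idea: the subregions are stacked from
the BAR base upward, and the container is sized by rounding the end of the last
subregion up to a power of two. The lengths below are made up for the sketch
(only the 2k payload and the 8 byte memdev region come from this thread), and
pow2ceil_sketch() is a hypothetical stand-in for QEMU's pow2ceil().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed lengths, for illustration only. */
#define FAKE_CAPS_LEN        0x20
#define FAKE_DEVICE_LEN      0x60
#define FAKE_MAILBOX_HDR_LEN 0x20
#define FAKE_PAYLOAD_LEN     0x800 /* "2k currently" */
#define FAKE_MEMDEV_LEN      0x8   /* "only 8 bytes long" */

typedef struct FakeLayout {
    uint64_t caps_off;
    uint64_t device_off;
    uint64_t mailbox_off;
    uint64_t memdev_off;
    uint64_t end; /* first byte past the last subregion */
} FakeLayout;

/* Smallest power of two >= v, for v up to 2^63. */
static uint64_t pow2ceil_sketch(uint64_t v)
{
    uint64_t p = 1;

    assert(v <= (1ULL << 63));
    while (p < v) {
        p <<= 1;
    }
    return p;
}

/* Stack the subregions back to back, starting at the BAR base. */
static FakeLayout fake_layout(void)
{
    FakeLayout l;

    l.caps_off = 0;
    l.device_off = l.caps_off + FAKE_CAPS_LEN;
    l.mailbox_off = l.device_off + FAKE_DEVICE_LEN;
    l.memdev_off = l.mailbox_off + FAKE_MAILBOX_HDR_LEN + FAKE_PAYLOAD_LEN;
    l.end = l.memdev_off + FAKE_MEMDEV_LEN;
    return l;
}
```

With these assumed lengths the subregions end at 0x8a8, so the container (and
hence the BAR) would be 0x1000 bytes, with the remainder left as the "unused"
area at the top of the diagram.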

> > +
> > +memory_region_init_io(_dstate->caps, obj, _ops, cxl_dstate,
> > +  "ca

Re: [RFC PATCH v2 07/32] hw/cxl/device: Implement basic mailbox (8.2.8.4)

2021-01-06 Thread Ben Widawsky

On 21-01-06 13:21:23, Jonathan Cameron wrote:
> On Tue, 5 Jan 2021 08:52:58 -0800
> Ben Widawsky  wrote:
> 
> > This is the beginning of implementing mailbox support for CXL 2.0
> > devices.
> > 
> > v2: Use register alignment helper (Ben)
> > Minor cleanups (Jonathan)
> > Rename error codes to match spec (Jonathan)
> > Update cap count from 1 to 2 (Jonathan)
> > Add infra to support CEL (Ben)
> > Add more of the actual mailbox handling from later patch (Ben)
> > 
> > Signed-off-by: Ben Widawsky 
> 
> Hi Ben,
> 
> I hacked support in for ARM64 to give this a spin and ran into an
> interesting problem around read sizes.

Cool!

> 
> The mailbox registers space allows 4 or 8 byte reads, but in the kernel
> driver (I think I have the right version from your github) you do
> the payload drain with
> memcpy_from_io()

https://gitlab.com/bwidawsk/linux/-/tree/cxl-2.0v3

Vishal in the Cc is finishing up some work on adding support to ndctl, after
which I'll be sending that (or something closely related) to the mailing list.

> 
> If the size of the payload is not a multiple of 8 bytes, on ARM64 that
> results in byte reads and an exception.  This happens with some of the
> existing calls which happen to have non multiple of 8 payload sizes.

Could you add the backtrace here? You caused me to go look at what x86 does. I
actually assumed on modern platforms it did movsb (byte moves), which are highly
optimized in microcode. In fact, it does some amount of word moves too, which
was new info to me.

So I'm not really sure yet why ARM64 is special here.

> 
> I hacked below to allow 1 byte reads from that region but that's probably
> not the right fix.  I found a statement in the CXL spec saying maximum read
> size from this register block was 8 bytes but couldn't immediately see a
> minimum. (I haven't looked that hard yet though!)

8.2
• A 32 bit register shall be accessed as a 1 Byte, 2 Bytes or 4 Bytes quantity.
• A 64 bit register shall be accessed as a 1 Byte, 2 Bytes, 4 Bytes or 8 Bytes quantity.

Interesting though, I really thought QEMU should do the right thing here and you
wouldn't need to allow 1-byte accesses.
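
The access rule quoted above can be written down as a small check. This is only a sketch: cxl_access_ok() is a hypothetical helper, not something from the patch, and the natural-alignment test at the end is an assumption rather than part of the two quoted bullets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): returns true when an access of
 * `size` bytes at `offset` is allowed for a register that is `reg_width`
 * bytes wide (4 or 8), following the two rules quoted from section 8.2.
 */
static bool cxl_access_ok(uint64_t offset, unsigned size, unsigned reg_width)
{
    assert(reg_width == 4 || reg_width == 8);

    /* Only 1, 2, 4 or 8 byte quantities are ever valid. */
    if (size != 1 && size != 2 && size != 4 && size != 8) {
        return false;
    }
    /* An access may not be wider than the register itself. */
    if (size > reg_width) {
        return false;
    }
    /* Assumption: accesses are also expected to be naturally aligned. */
    return (offset & (size - 1)) == 0;
}
```

With this rule, a 1-byte read of a 64-bit mailbox register is legal, while an 8-byte read of a 32-bit register is not.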

> 
> Various approaches in kernel could also be used:
> 1) Change the payload drain to have specific handling for the end few bytes.
> 2) Pad the various structures to ensure payloads are always 8 byte multiples
> in length (nasty).

I think the right solution is to support/allow byte and word access. I just
thought it should work in the existing code. I'd like to try to figure out
what's going on.
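
For comparison, option 1 above (special handling for the last few bytes of the payload) might look roughly like the sketch below. It is not the kernel driver's code: mmio_read64() is a hypothetical stand-in for a real 64-bit MMIO accessor and is simulated here with memcpy(), and the sketch assumes the payload area is padded to a multiple of 8 bytes so that the final full-width read stays in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a 64-bit MMIO read. A real driver would use its
 * platform's accessor; memcpy() is used here only to simulate it.
 */
static uint64_t mmio_read64(const void *addr)
{
    uint64_t v;

    memcpy(&v, addr, sizeof(v));
    return v;
}

/*
 * Copy `len` payload bytes using only 8-byte reads from the source.
 * Assumption: the source region is readable up to the next multiple of 8.
 */
static void drain_payload(void *dst, const void *mmio, size_t len)
{
    uint8_t *out = dst;
    const uint8_t *in = mmio;
    size_t i = 0;

    /* Bulk of the payload: whole 8-byte words. */
    for (; i + 8 <= len; i += 8) {
        uint64_t v = mmio_read64(in + i);

        memcpy(out + i, &v, sizeof(v));
    }

    /* Tail: one more full-width read, keeping only the bytes wanted. */
    if (i < len) {
        uint64_t v = mmio_read64(in + i);

        memcpy(out + i, &v, len - i);
    }
}
```

The device never sees a byte-sized read this way, at the cost of reading a few bytes past the end of the payload.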

How much effort would it be for me to reproduce what you've done?

> 
> > ---
> >  hw/cxl/cxl-device-utils.c   | 122 -
> >  hw/cxl/cxl-mailbox-utils.c  | 173 
> >  hw/cxl/meson.build  |   1 +
> >  include/hw/cxl/cxl.h|   3 +
> >  include/hw/cxl/cxl_device.h |  27 +-
> >  5 files changed, 322 insertions(+), 4 deletions(-)
> >  create mode 100644 hw/cxl/cxl-mailbox-utils.c
> > 
> > diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
> > index b86e5466bd..642e3c2617 100644
> > --- a/hw/cxl/cxl-device-utils.c
> > +++ b/hw/cxl/cxl-device-utils.c
> > @@ -44,6 +44,108 @@ static uint64_t dev_reg_read(void *opaque, hwaddr offset, unsigned size)
> >  return ldn_le_p(, size);
> >  }
> >  
> > +static uint64_t mailbox_reg_read(void *opaque, hwaddr offset, unsigned size)
> > +{
> > +CXLDeviceState *cxl_dstate = opaque;
> > +
> > +if (cxl_device_check_register_alignment(offset, size)) {
> > +qemu_log_mask(LOG_UNIMP, "Unaligned register read\n");
> > +return 0;
> > +}
> > +
> > +return ldn_le_p(cxl_dstate->mbox_reg_state + offset, size);
> > +}
> > +
> > +static void mailbox_mem_writel(uint32_t *reg_state, hwaddr offset,
> > +   uint64_t value)
> > +{
> > +switch (offset) {
> > +case A_CXL_DEV_MAILBOX_CTRL:
> > +/* fallthrough */
> > +case A_CXL_DEV_MAILBOX_CAP:
> > +/* RO register */
> > +break;
> > +default:
> > +qemu_log_mask(LOG_UNIMP,
> > +  "%s Unexpected 32-bit access to 0x%" PRIx64 " (WI)\n",
> > +  __func__, offset);
> > +break;
> > +}
> > +
> > +stl_le_p((uint8_t *)reg_state + offset, value);
> > +}
> > +
> > +static void mailbox_mem_writeq(uint8_t *reg_state, hwaddr offset,
> > +

[RFC PATCH v2 28/32] acpi/cxl: Create the CEDT (9.14.1)

2021-01-05 Thread Ben Widawsky

The CXL Early Discovery Table is defined in the CXL 2.0 specification as
a way for the OS to get CXL specific information from the system
firmware.

As of CXL 2.0 spec, only 1 sub structure is defined, the CXL Host Bridge
Structure (CHBS) which is primarily useful for telling the OS exactly
where the MMIO for the host bridge is.
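
As a rough illustration of the CHBS record this patch emits, the sketch below packs the same fields, in the same order and with the same sizes as cedt_build_chbs(), into a flat little-endian buffer. The names put_le() and chbs_pack() are hypothetical and exist only for this example; the field values (type 0, record length 32, version 1) are taken from the patch, not independently from the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHBS_LEN 32 /* record length used by the patch */

/* Append `size` bytes of `val` in little-endian order; returns new offset. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t val, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(val >> (8 * i));
    }
    return off + size;
}

/* Hypothetical packer mirroring the field order in cedt_build_chbs(). */
static size_t chbs_pack(uint8_t buf[CHBS_LEN], uint32_t uid,
                        uint64_t base, uint64_t length)
{
    size_t off = 0;

    assert(buf != NULL);

    off = put_le(buf, off, 0, 1);        /* Type */
    off = put_le(buf, off, 0, 1);        /* Reserved */
    off = put_le(buf, off, CHBS_LEN, 2); /* Record Length */
    off = put_le(buf, off, uid, 4);      /* UID */
    off = put_le(buf, off, 1, 4);        /* Version */
    off = put_le(buf, off, 0, 4);        /* Reserved */
    off = put_le(buf, off, base, 8);     /* Base */
    off = put_le(buf, off, length, 8);   /* Length */

    return off; /* 1 + 1 + 2 + 4 + 4 + 4 + 8 + 8 = 32 */
}
```

The field sizes add up to exactly the 32 bytes written into the Record Length field.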

v2: Update CHBS to spec released definition

Signed-off-by: Ben Widawsky 
---
 hw/acpi/cxl.c   | 69 +
 hw/i386/acpi-build.c|  6 ++-
 hw/pci-bridge/pci_expander_bridge.c | 21 +
 include/hw/acpi/cxl.h   |  4 ++
 include/hw/pci/pci_bridge.h | 25 +++
 5 files changed, 104 insertions(+), 21 deletions(-)

diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
index 7124d5a1a3..68db0fe3a8 100644
--- a/hw/acpi/cxl.c
+++ b/hw/acpi/cxl.c
@@ -18,14 +18,83 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_host.h"
 #include "hw/cxl/cxl.h"
+#include "hw/mem/memory-device.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/acpi/cxl.h"
+#include "hw/acpi/cxl.h"
 #include "qapi/error.h"
 #include "qemu/uuid.h"
 
+static void cedt_build_chbs(GArray *table_data, PXBDev *cxl)
+{
+SysBusDevice *sbd = SYS_BUS_DEVICE(cxl->cxl.cxl_host_bridge);
+struct MemoryRegion *mr = sbd->mmio[0].memory;
+
+/* Type */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Record Length */
+build_append_int_noprefix(table_data, 32, 2);
+
+/* UID */
+build_append_int_noprefix(table_data, cxl->uid, 4);
+
+/* Version */
+build_append_int_noprefix(table_data, 1, 4);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Base */
+build_append_int_noprefix(table_data, mr->addr, 8);
+
+/* Length */
+build_append_int_noprefix(table_data, memory_region_size(mr), 8);
+}
+
+static int cxl_foreach_pxb_hb(Object *obj, void *opaque)
+{
+Aml *cedt = opaque;
+
+if (object_dynamic_cast(obj, TYPE_PXB_CXL_DEVICE)) {
+PXBDev *pxb = PXB_CXL_DEV(obj);
+
+cedt_build_chbs(cedt->buf, pxb);
+}
+
+return 0;
+}
+
+void cxl_build_cedt(GArray *table_offsets, GArray *table_data,
+BIOSLinker *linker)
+{
+const int cedt_start = table_data->len;
+Aml *cedt;
+
+cedt = init_aml_allocator();
+
+/* reserve space for CEDT header */
+acpi_add_table(table_offsets, table_data);
+acpi_data_push(cedt->buf, sizeof(AcpiTableHeader));
+
+object_child_foreach_recursive(object_get_root(), cxl_foreach_pxb_hb, cedt);
+
+/* copy AML table into ACPI tables blob and patch header there */
+g_array_append_vals(table_data, cedt->buf->data, cedt->buf->len);
+build_header(linker, table_data, (void *)(table_data->data + cedt_start),
+ "CEDT", table_data->len - cedt_start, 1, NULL, NULL);
+free_aml_allocator();
+}
+
 static Aml *__build_cxl_osc_method(void)
 {
 Aml *method, *if_uuid, *else_uuid, *if_arg1_not_1, *if_cxl, *if_caps_masked;
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 3eb07b9741..49242eb8f3 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -75,6 +75,8 @@
 #include "hw/acpi/ipmi.h"
 #include "hw/acpi/hmat.h"
 
+#include "hw/acpi/cxl.h"
+
 /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
  * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
  * a little bit, there should be plenty of free space since the DSDT
@@ -1371,7 +1373,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 
 scope = aml_scope("\\_SB");
 if (type == CXL) {
-dev = aml_device("CXL%.01X", pci_bus_uid(bus));
+dev = aml_device("CXL%.01X", uid);
 } else {
 dev = aml_device("PC%.02X", bus_num);
 }
@@ -2277,6 +2279,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
   machine->nvdimms_state, machine->ram_slots);
 }
 
+cxl_build_cedt(table_offsets, tables_blob, tables->linker);
+
 acpi_add_table(table_offsets, tables_blob);
 build_waet(tables_blob, tables->linker);
 
diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index af1450c69d..6458d5b76e 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -57,26 +57,6 @@ DECLARE_INSTANCE_CHECKER(PXBDev, PXB_DEV,
 DECLARE_INSTANCE_CHECKER(PXBDev, PXB_PCIE_DEV,
  TYPE_PXB_PCIE_DEVICE)
 
-#define TYPE_PXB_CXL_D

[RFC PATCH v2 27/32] tests/acpi: allow CEDT table addition

2021-01-05 Thread Ben Widawsky

Signed-off-by: Ben Widawsky 
---
 tests/data/acpi/pc/CEDT | 0
 tests/data/acpi/q35/CEDT| 0
 tests/qtest/bios-tables-test-allowed-diff.h | 2 ++
 3 files changed, 2 insertions(+)
 create mode 100644 tests/data/acpi/pc/CEDT
 create mode 100644 tests/data/acpi/q35/CEDT

diff --git a/tests/data/acpi/pc/CEDT b/tests/data/acpi/pc/CEDT
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/data/acpi/q35/CEDT b/tests/data/acpi/q35/CEDT
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..9b07f1e1ff 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,3 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/CEDT",
+"tests/data/acpi/q35/CEDT",
-- 
2.30.0

[RFC PATCH v2 21/32] acpi/pxb/cxl: Reserve host bridge MMIO

2021-01-05 Thread Ben Widawsky

For all host bridges, reserve MMIO space with _CRS. The MMIO for the
host bridge lives in a magically hard coded space in the system's
physical address space. The standard mechanism to tell the OS about
regions which can't be used for host bridges is _CRS.
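
The per-bridge reservation can be pictured as one fixed-size MMIO window per host bridge UID, laid out back to back from a fixed base. The sketch below assumes exactly that layout; cxl_hb_mmio_range() and struct mmio_range are hypothetical names, and the base and window size are passed in as parameters rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive [start, end] range, the form a _CRS range insert expects. */
struct mmio_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical helper: compute the MMIO window of host bridge `uid`,
 * assuming windows of `hb_size` bytes packed contiguously from `host_base`.
 */
static struct mmio_range cxl_hb_mmio_range(uint64_t host_base, uint32_t uid,
                                           uint64_t hb_size)
{
    struct mmio_range r;

    assert(hb_size > 0);

    r.start = host_base + (uint64_t)uid * hb_size;
    r.end = r.start + hb_size - 1; /* inclusive upper bound */
    return r;
}
```

For example, with an illustrative base of 0x1000 and window size of 0x100, UID 2 would get the range 0x1200 to 0x12ff.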

Signed-off-by: Ben Widawsky 
---
 hw/i386/acpi-build.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 26e4ddd025..16cde677a0 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -28,6 +28,7 @@
 #include "qemu/bitmap.h"
 #include "qemu/error-report.h"
 #include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
 #include "hw/core/cpu.h"
 #include "target/i386/cpu.h"
 #include "hw/misc/pvpanic.h"
@@ -1194,7 +1195,7 @@ static void build_smb0(Aml *table, I2CBus *smbus, int devnr, int func)
 aml_append(table, scope);
 }
 
-enum { PCI, PCIE };
+enum { PCI, PCIE, CXL };
 static void init_pci_acpi(Aml *dev, int uid, int type)
 {
 if (type == PCI) {
@@ -1344,20 +1345,28 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 uint8_t bus_num = pci_bus_num(bus);
 uint8_t numa_node = pci_bus_numa_node(bus);
 int32_t uid = pci_bus_uid(bus);
+int type;
 
 /* look only for expander root buses */
 if (!pci_bus_is_root(bus)) {
 continue;
 }
 
+type = pci_bus_is_cxl(bus) ? CXL :
+ pci_bus_is_express(bus) ? PCIE : PCI;
+
 if (bus_num < root_bus_limit) {
 root_bus_limit = bus_num - 1;
 }
 
 scope = aml_scope("\\_SB");
-dev = aml_device("PC%.02X", bus_num);
+if (type == CXL) {
+dev = aml_device("CXL%.01X", pci_bus_uid(bus));
+} else {
+dev = aml_device("PC%.02X", bus_num);
+}
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
-init_pci_acpi(dev, uid, pci_bus_is_express(bus) ? PCIE : PCI);
+init_pci_acpi(dev, uid, type);
 
 if (numa_node != NUMA_NODE_UNASSIGNED) {
 aml_append(dev, aml_name_decl("_PXM", aml_int(numa_node)));
@@ -1368,6 +1377,13 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 aml_append(dev, aml_name_decl("_CRS", crs));
 aml_append(scope, dev);
 aml_append(dsdt, scope);
+
+/* Handle the ranges for the PXB expanders */
+if (type == CXL) {
+uint64_t base = CXL_HOST_BASE + uid * 0x1;
+crs_range_insert(crs_range_set.mem_ranges, base,
+ base + 0x1 - 1);
+}
 }
 }
 
-- 
2.30.0
