Re: [PATCH v3 14/26] memory: Add Error** argument to .log_global*() handlers

2024-03-04 Thread Peter Xu
On Mon, Mar 04, 2024 at 01:28:32PM +0100, Cédric Le Goater wrote:
> @@ -2936,15 +2975,22 @@ void memory_global_dirty_log_start(unsigned int flags)
>  trace_global_dirty_changed(global_dirty_tracking);
>  
>  if (!old_flags) {
> -MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward);
> +MEMORY_LISTENER_CALL_LOG_GLOBAL(log_global_start, Forward,
> +&local_err);
> +if (local_err) {
> +error_report_err(local_err);
> +return;

Returning here means global_dirty_tracking will keep the new value even if
it's not truly committed globally (in memory_region_transaction_commit()
later below).  I think it'll cause an inconsistency: global_dirty_tracking
should reflect the global status of dirty tracking, and that should match
the MR status cached in the FlatViews (which the memory core uses to
reflect address space translations).

For some details on how that flag is applied to each MR, feel free to have a
quick look at the "else if (frold && frnew && flatrange_equal(frold, frnew))"
branch in address_space_update_topology_pass().

Here, IIUC, to fully support a graceful failure (which IIUC is the goal for
VFIO, and this op should be easily triggerable by the user), we need to do a
proper unwind of both:

  - Calling a proper log_global_stop() on those listeners that have already
    been started successfully before the current failed log_global_start(),
    then,

  - Resetting global_dirty_tracking to old_flags before returning.

We may want to make sure trace_global_dirty_changed() is only called once
everything has succeeded; a rough sketch of the whole unwind follows below.
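A rough sketch of that unwind (editor's illustration only, not the actual
patch; for simplicity the rollback below asks every listener to stop, while
a complete fix would only stop those whose log_global_start() had already
succeeded):

    if (!old_flags) {
        MEMORY_LISTENER_CALL_LOG_GLOBAL(log_global_start, Forward, &local_err);
        if (local_err) {
            error_report_err(local_err);
            /* unwind listeners that already enabled dirty tracking */
            MEMORY_LISTENER_CALL_GLOBAL(log_global_stop, Reverse);
            /* keep global_dirty_tracking consistent with the FlatViews */
            global_dirty_tracking = old_flags;
            return;
        }
        memory_region_transaction_begin();
        memory_region_update_pending = true;
        memory_region_transaction_commit();
    }
    trace_global_dirty_changed(global_dirty_tracking);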

I don't have a strong opinion on whether we need similar error-reporting
interfaces for _stop() and _log_sync().  I'd still suggest, as before, that
we drop them to keep the patch simpler and only add such error reports for
log_global_start().  If they never get triggered they're dead code anyway,
so I don't think "having errp for all APIs" is a must-have, at least to me.

Thanks,

> +}
>  memory_region_transaction_begin();
>  memory_region_update_pending = true;
>  memory_region_transaction_commit();
>  }
>  }

-- 
Peter Xu




Re: [PATCH v3 10/26] migration: Move cleanup after error reporting in qemu_savevm_state_setup()

2024-03-04 Thread Cédric Le Goater

On 3/5/24 04:32, Peter Xu wrote:

On Mon, Mar 04, 2024 at 01:28:28PM +0100, Cédric Le Goater wrote:

This will help preserve the error set by .save_setup() handlers.

Signed-off-by: Cédric Le Goater 


IIUC this is about the next patch.  I got fully confused before reading
into the next one.  IMHO we can squash it into where it's used.


That's where the change was initially ... I thought extracting it into its
own patch would clarify things. Oh well, never mind, I will put it back and
add a comment in the commit log.

Thanks,

C.





Thanks,


---
  migration/savevm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 31ce9391d49c825d4ec835e26ac0246e192783a0..e400706e61e06d2d1d03a11aed14f30a243833f2 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1740,10 +1740,10 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
  qemu_savevm_state_complete_precopy(f, false, false);
  ret = qemu_file_get_error(f);
  }
-qemu_savevm_state_cleanup();
  if (ret != 0) {
  error_setg_errno(errp, -ret, "Error while writing VM state");
  }
+qemu_savevm_state_cleanup();
  
  if (ret != 0) {

  status = MIGRATION_STATUS_FAILED;
--
2.44.0









Re: [PATCH v4 1/3] qga/commands-win32: Declare const qualifier before type

2024-03-04 Thread Yan Vugenfirer
On Mon, Mar 4, 2024 at 3:45 PM Konstantin Kostiuk  wrote:
>
> From: Philippe Mathieu-Daudé 
>
> Most of the code base use the 'const' qualifier *before*
> the type being qualified. Use the same style to unify.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> Message-ID: <20240222152835.72095-2-phi...@linaro.org>
> Reviewed-by: Konstantin Kostiuk 
> Signed-off-by: Konstantin Kostiuk 
>
> ---
>  qga/commands-win32.c | 22 +++---
>  1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> index a1015757d8..79b5a580c9 100644
> --- a/qga/commands-win32.c
> +++ b/qga/commands-win32.c
> @@ -2120,11 +2120,11 @@ GuestUserList *qmp_guest_get_users(Error **errp)
>  typedef struct _ga_matrix_lookup_t {
>  int major;
>  int minor;
> -char const *version;
> -char const *version_id;
> +const char *version;
> +const char *version_id;
>  } ga_matrix_lookup_t;
>
> -static ga_matrix_lookup_t const WIN_VERSION_MATRIX[2][7] = {
> +static const ga_matrix_lookup_t WIN_VERSION_MATRIX[2][7] = {
>  {
>  /* Desktop editions */
>  { 5, 0, "Microsoft Windows 2000",   "2000"},
> @@ -2148,18 +2148,18 @@ static ga_matrix_lookup_t const 
> WIN_VERSION_MATRIX[2][7] = {
>
>  typedef struct _ga_win_10_0_t {
>  int first_build;
> -char const *version;
> -char const *version_id;
> +const char *version;
> +const char *version_id;
>  } ga_win_10_0_t;
>
> -static ga_win_10_0_t const WIN_10_0_SERVER_VERSION_MATRIX[4] = {
> +static const ga_win_10_0_t WIN_10_0_SERVER_VERSION_MATRIX[4] = {
>  {14393, "Microsoft Windows Server 2016","2016"},
>  {17763, "Microsoft Windows Server 2019","2019"},
>  {20344, "Microsoft Windows Server 2022","2022"},
>  {0, 0}
>  };
>
> -static ga_win_10_0_t const WIN_10_0_CLIENT_VERSION_MATRIX[3] = {
> +static const ga_win_10_0_t WIN_10_0_CLIENT_VERSION_MATRIX[3] = {
>  {10240, "Microsoft Windows 10","10"},
>  {22000, "Microsoft Windows 11","11"},
>  {0, 0}
> @@ -2185,16 +2185,16 @@ static void ga_get_win_version(RTL_OSVERSIONINFOEXW 
> *info, Error **errp)
>  return;
>  }
>
> -static char *ga_get_win_name(OSVERSIONINFOEXW const *os_version, bool id)
> +static char *ga_get_win_name(const OSVERSIONINFOEXW *os_version, bool id)
>  {
>  DWORD major = os_version->dwMajorVersion;
>  DWORD minor = os_version->dwMinorVersion;
>  DWORD build = os_version->dwBuildNumber;
>  int tbl_idx = (os_version->wProductType != VER_NT_WORKSTATION);
> -ga_matrix_lookup_t const *table = WIN_VERSION_MATRIX[tbl_idx];
> -ga_win_10_0_t const *win_10_0_table = tbl_idx ?
> +const ga_matrix_lookup_t *table = WIN_VERSION_MATRIX[tbl_idx];
> +const ga_win_10_0_t *win_10_0_table = tbl_idx ?
>  WIN_10_0_SERVER_VERSION_MATRIX : WIN_10_0_CLIENT_VERSION_MATRIX;
> -ga_win_10_0_t const *win_10_0_version = NULL;
> +const ga_win_10_0_t *win_10_0_version = NULL;
>  while (table->version != NULL) {
>  if (major == 10 && minor == 0) {
>  while (win_10_0_table->version != NULL) {
> --
> 2.44.0
>

Reviewed-by: Yan Vugenfirer 




Re: [PATCH v4 3/3] qga-win: Add support of Windows Server 2025 in get-osinfo command

2024-03-04 Thread Yan Vugenfirer
On Mon, Mar 4, 2024 at 3:45 PM Konstantin Kostiuk  wrote:
>
> From: Dehan Meng 
>
> Add support of Windows Server 2025 in get-osinfo command
>
> Signed-off-by: Dehan Meng 
> Message-ID: <20240222152835.72095-4-phi...@linaro.org>
> Signed-off-by: Philippe Mathieu-Daudé 
> Reviewed-by: Konstantin Kostiuk 
> Signed-off-by: Konstantin Kostiuk 
> ---
>  qga/commands-win32.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> index a830f1494e..d1cf1a87db 100644
> --- a/qga/commands-win32.c
> +++ b/qga/commands-win32.c
> @@ -2153,6 +2153,7 @@ static const ga_win_10_0_t 
> WIN_10_0_SERVER_VERSION_MATRIX[] = {
>  {14393, "Microsoft Windows Server 2016","2016"},
>  {17763, "Microsoft Windows Server 2019","2019"},
>  {20344, "Microsoft Windows Server 2022","2022"},
> +{26040, "MIcrosoft Windows Server 2025","2025"},
>  { }
>  };
>
> --
> 2.44.0
>

Reviewed-by: Yan Vugenfirer 




Re: [PATCH v4 2/3] qga/commands-win32: Do not set matrix_lookup_t/win_10_0_t arrays size

2024-03-04 Thread Yan Vugenfirer
On Mon, Mar 4, 2024 at 3:45 PM Konstantin Kostiuk  wrote:
>
> From: Philippe Mathieu-Daudé 
>
> ga_get_win_name() iterates over all elements in the arrays by
> checking the 'version' field is non-NULL. Since the arrays are
> guarded by a NULL terminating element, we don't need to specify
> their size:
>
>   static char *ga_get_win_name(...)
>   {
>   ...
>   const ga_matrix_lookup_t *table = WIN_VERSION_MATRIX[tbl_idx];
>   const ga_win_10_0_t *win_10_0_table = ...
>   ...
>   while (table->version != NULL) {
> ^^^
>   while (win_10_0_table->version != NULL) {
>  ^^^
>
> This will simplify maintenance when adding new entries to these
> arrays.
>
> Split WIN_VERSION_MATRIX into WIN_CLIENT_VERSION_MATRIX and
> WIN_SERVER_VERSION_MATRIX because a multidimensional array must
> have bounds for all dimensions except the first.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> Message-ID: <20240222152835.72095-3-phi...@linaro.org>
> Reviewed-by: Konstantin Kostiuk 
> Signed-off-by: Konstantin Kostiuk 
> ---
>  qga/commands-win32.c | 52 +---
>  1 file changed, 25 insertions(+), 27 deletions(-)
>
> diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> index 79b5a580c9..a830f1494e 100644
> --- a/qga/commands-win32.c
> +++ b/qga/commands-win32.c
> @@ -2124,45 +2124,42 @@ typedef struct _ga_matrix_lookup_t {
>  const char *version_id;
>  } ga_matrix_lookup_t;
>
> -static const ga_matrix_lookup_t WIN_VERSION_MATRIX[2][7] = {
> -{
> -/* Desktop editions */
> -{ 5, 0, "Microsoft Windows 2000",   "2000"},
> -{ 5, 1, "Microsoft Windows XP", "xp"},
> -{ 6, 0, "Microsoft Windows Vista",  "vista"},
> -{ 6, 1, "Microsoft Windows 7"   "7"},
> -{ 6, 2, "Microsoft Windows 8",  "8"},
> -{ 6, 3, "Microsoft Windows 8.1","8.1"},
> -{ 0, 0, 0}
> -},{
> -/* Server editions */
> -{ 5, 2, "Microsoft Windows Server 2003","2003"},
> -{ 6, 0, "Microsoft Windows Server 2008","2008"},
> -{ 6, 1, "Microsoft Windows Server 2008 R2", "2008r2"},
> -{ 6, 2, "Microsoft Windows Server 2012","2012"},
> -{ 6, 3, "Microsoft Windows Server 2012 R2", "2012r2"},
> -{ 0, 0, 0},
> -{ 0, 0, 0}
> -}
> +static const ga_matrix_lookup_t WIN_CLIENT_VERSION_MATRIX[] = {
> +{ 5, 0, "Microsoft Windows 2000",   "2000"},
> +{ 5, 1, "Microsoft Windows XP", "xp"},
> +{ 6, 0, "Microsoft Windows Vista",  "vista"},
> +{ 6, 1, "Microsoft Windows 7"   "7"},
> +{ 6, 2, "Microsoft Windows 8",  "8"},
> +{ 6, 3, "Microsoft Windows 8.1","8.1"},
> +{ }
> +};
> +
> +static const ga_matrix_lookup_t WIN_SERVER_VERSION_MATRIX[] = {
> +{ 5, 2, "Microsoft Windows Server 2003","2003"},
> +{ 6, 0, "Microsoft Windows Server 2008","2008"},
> +{ 6, 1, "Microsoft Windows Server 2008 R2", "2008r2"},
> +{ 6, 2, "Microsoft Windows Server 2012","2012"},
> +{ 6, 3, "Microsoft Windows Server 2012 R2", "2012r2"},
> +{ },
>  };
>
>  typedef struct _ga_win_10_0_t {
>  int first_build;
> -const char *version;
> -const char *version_id;
> +char const *version;
> +char const *version_id;
>  } ga_win_10_0_t;
>
> -static const ga_win_10_0_t WIN_10_0_SERVER_VERSION_MATRIX[4] = {
> +static const ga_win_10_0_t WIN_10_0_SERVER_VERSION_MATRIX[] = {
>  {14393, "Microsoft Windows Server 2016","2016"},
>  {17763, "Microsoft Windows Server 2019","2019"},
>  {20344, "Microsoft Windows Server 2022","2022"},
> -{0, 0}
> +{ }
>  };
>
> -static const ga_win_10_0_t WIN_10_0_CLIENT_VERSION_MATRIX[3] = {
> +static const ga_win_10_0_t WIN_10_0_CLIENT_VERSION_MATRIX[] = {
>  {10240, "Microsoft Windows 10","10"},
>  {22000, "Microsoft Windows 11","11"},
> -{0, 0}
> +{ }
>  };
>
>  static void ga_get_win_version(RTL_OSVERSIONINFOEXW *info, Error **errp)
> @@ -2191,7 +2188,8 @@ static char *ga_get_win_name(const OSVERSIONINFOEXW 
> *os_version, bool id)
>  DWORD minor = os_version->dwMinorVersion;
>  DWORD build = os_version->dwBuildNumber;
>  int tbl_idx = (os_version->wProductType != VER_NT_WORKSTATION);
> -const ga_matrix_lookup_t *table = WIN_VERSION_MATRIX[tbl_idx];
> +const ga_matrix_lookup_t *table = tbl_idx ?
> +WIN_SERVER_VERSION_MATRIX : WIN_CLIENT_VERSION_MATRIX;
>  const ga_win_10_0_t *win_10_0_table = tbl_idx ?
>  WIN_10_0_SERVER_VERSION_MATRIX : WIN_10_0_CLIENT_VERSION_MATRIX;
>  const ga_win_10_0_t *win_10_0_version = NULL;
> --
> 2.44.0
>

Reviewed-by: Yan Vugenfirer 




Re: [PATCH v3 05/26] migration: Add Error** argument to vmstate_save()

2024-03-04 Thread Prasad Pandit
On Mon, 4 Mar 2024 at 19:38, Cédric Le Goater  wrote:
> This will prepare ground for futur changes adding an Error** argument

* futur -> future

>
> -static int vmstate_save(QEMUFile *f, SaveStateEntry *se, JSONWriter *vmdesc)
> +static int vmstate_save(QEMUFile *f, SaveStateEntry *se, JSONWriter *vmdesc,
> +Error **errp)
>  {
>  int ret;
> -Error *local_err = NULL;
> -MigrationState *s = migrate_get_current();
>
>  if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
>  return 0;
> @@ -1034,10 +1033,9 @@ static int vmstate_save(QEMUFile *f, SaveStateEntry 
> *se, JSONWriter *vmdesc)
>  if (!se->vmsd) {
>  vmstate_save_old_style(f, se, vmdesc);
>  } else {
> -ret = vmstate_save_state_with_err(f, se->vmsd, se->opaque, vmdesc, 
> &local_err);
> +ret = vmstate_save_state_with_err(f, se->vmsd, se->opaque, vmdesc,
> +  errp);
>  if (ret) {
> -migrate_set_error(s, local_err);
> -error_report_err(local_err);
>  return ret;
>  }
>  }
> @@ -1324,8 +1322,10 @@ void qemu_savevm_state_setup(QEMUFile *f)
>  trace_savevm_state_setup();
>  QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>  if (se->vmsd && se->vmsd->early_setup) {
> -ret = vmstate_save(f, se, ms->vmdesc);
> +ret = vmstate_save(f, se, ms->vmdesc, &local_err);
>  if (ret) {
> +migrate_set_error(ms, local_err);
> +error_report_err(local_err);
>  qemu_file_set_error(f, ret);
>  break;
>  }
> @@ -1540,6 +1540,7 @@ int 
> qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>  JSONWriter *vmdesc = ms->vmdesc;
>  int vmdesc_len;
>  SaveStateEntry *se;
> +Error *local_err = NULL;
>  int ret;
>
>  QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> @@ -1550,8 +1551,10 @@ int 
> qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>
>  start_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
>
> -ret = vmstate_save(f, se, vmdesc);
> +ret = vmstate_save(f, se, vmdesc, &local_err);
>  if (ret) {
> +migrate_set_error(ms, local_err);
> +error_report_err(local_err);
>  qemu_file_set_error(f, ret);
>  return ret;
>  }
> @@ -1566,7 +1569,6 @@ int 
> qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>   * bdrv_activate_all() on the other end won't fail. */
>  ret = bdrv_inactivate_all();
>  if (ret) {
> -Error *local_err = NULL;
>  error_setg(&local_err, "%s: bdrv_inactivate_all() failed (%d)",
> __func__, ret);
>  migrate_set_error(ms, local_err);
> @@ -1762,6 +1764,8 @@ void qemu_savevm_live_state(QEMUFile *f)
>
>  int qemu_save_device_state(QEMUFile *f)
>  {
> +MigrationState *ms = migrate_get_current();
> +Error *local_err = NULL;
>  SaveStateEntry *se;
>
>  if (!migration_in_colo_state()) {
> @@ -1776,8 +1780,10 @@ int qemu_save_device_state(QEMUFile *f)
>  if (se->is_ram) {
>  continue;
>  }
> -ret = vmstate_save(f, se, NULL);
> +ret = vmstate_save(f, se, NULL, &local_err);
>  if (ret) {
> +migrate_set_error(ms, local_err);
> +error_report_err(local_err);
>  return ret;
>  }
>  }
> --

Reviewed-by: Prasad Pandit 

Thank you.
---
  - Prasad




Re: [External] Re: [PATCH v1 0/1] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-04 Thread Huang, Ying
"Ho-Ren (Jack) Chuang"  writes:

> On Mon, Mar 4, 2024 at 10:36 PM Huang, Ying  wrote:
>>
>> "Ho-Ren (Jack) Chuang"  writes:
>>
>> > On Sun, Mar 3, 2024 at 6:47 PM Huang, Ying  wrote:
>> >>
>> >> "Ho-Ren (Jack) Chuang"  writes:
>> >>
>> >> > The memory tiering component in the kernel is functionally useless for
>> >> > CPUless memory/non-DRAM devices like CXL1.1 type3 memory because the 
>> >> > nodes
>> >> > are lumped together in the DRAM tier.
>> >> > https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/
>> >>
>> >> I think that it's unfair to call it "useless".  Yes, it doesn't work if
>> >> the CXL memory devices are not enumerated via drivers/dax/kmem.c.  So,
>> >> please be specific about in which cases it doesn't work instead of the
>> >> too-general "useless".
>> >>
>> >
>> > Thank you and I didn't mean anything specific. I simply reused phrases
>> > we discussed
>> > earlier in the previous patchset. I will change them to the following in 
>> > v2:
>> > "At boot time, current memory tiering assigns all detected memory nodes
>> > to the same DRAM tier. This results in CPUless memory/non-DRAM devices,
>> > such as CXL1.1 type3 memory, being unable to be assigned to the
>> > correct memory tier,
>> > leading to the inability to migrate pages between different types of 
>> > memory."
>> >
>> > Please see if this looks more specific.
>>
>> I don't think that the description above is accurate.  In fact, there
>> are 2 ways to enumerate the memory device,
>>
>> 1. Mark it as reserved memory (E820_TYPE_SOFT_RESERVED, etc.) in E820
>>table or something similar.
>>
>> 2. Mark it as normal memory (E820_TYPE_RAM) in E820 table or something
>>similar
>>
>> For 1, the memory device (including CXL memory) is onlined via
>> drivers/dax/kmem.c, so will be put in proper memory tiers.  For 2, the
>> memory device is indistinguishable from normal DRAM with the current
>> implementation.  And this is what this patch is working on.
>>
>> Right?
>
> Good point! How about this?:
> "
> When a memory device, such as CXL1.1 type3 memory, is emulated as
> normal memory (E820_TYPE_RAM), the memory device is indistinguishable
> from normal DRAM in terms of memory tiering with the current implementation.
> The current memory tiering assigns all detected normal memory nodes
> to the same DRAM tier. This results in normal memory devices with
> different attributions being unable to be assigned to the correct memory tier,
> leading to the inability to migrate pages between different types of memory.
> "

Looks good to me!  Thanks!

--
Best Regards,
Huang, Ying



[PATCH v2 03/13] contrib/elf2dmp: Continue even contexts are lacking

2024-03-04 Thread Akihiko Odaki
Let fill_context() continue even if it fails to fill contexts of some
CPUs. A dump may still contain valuable information even if it lacks
contexts of some CPUs due to dump corruption or a failure before
starting CPUs.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Peter Maydell 
---
 contrib/elf2dmp/main.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/contrib/elf2dmp/main.c b/contrib/elf2dmp/main.c
index 9b278f392e39..89bf4e23566b 100644
--- a/contrib/elf2dmp/main.c
+++ b/contrib/elf2dmp/main.c
@@ -336,7 +336,12 @@ static int fill_header(WinDumpHeader64 *hdr, struct 
pa_space *ps,
 return 0;
 }
 
-static int fill_context(KDDEBUGGER_DATA64 *kdbg,
+/*
+ * fill_context() continues even if it fails to fill contexts of some CPUs.
+ * A dump may still contain valuable information even if it lacks contexts of
+ * some CPUs due to dump corruption or a failure before starting CPUs.
+ */
+static void fill_context(KDDEBUGGER_DATA64 *kdbg,
 struct va_space *vs, QEMU_Elf *qe)
 {
 int i;
@@ -350,7 +355,7 @@ static int fill_context(KDDEBUGGER_DATA64 *kdbg,
 if (va_space_rw(vs, kdbg->KiProcessorBlock + sizeof(Prcb) * i,
&Prcb, sizeof(Prcb), 0)) {
 eprintf("Failed to read CPU #%d PRCB location\n", i);
-return 1;
+continue;
 }
 
 if (!Prcb) {
@@ -361,7 +366,7 @@ static int fill_context(KDDEBUGGER_DATA64 *kdbg,
 if (va_space_rw(vs, Prcb + kdbg->OffsetPrcbContext,
&Context, sizeof(Context), 0)) {
 eprintf("Failed to read CPU #%d ContextFrame location\n", i);
-return 1;
+continue;
 }
 
 printf("Filling context for CPU #%d...\n", i);
@@ -369,11 +374,9 @@ static int fill_context(KDDEBUGGER_DATA64 *kdbg,
 
 if (va_space_rw(vs, Context, &ctx, sizeof(ctx), 1)) {
 eprintf("Failed to fill CPU #%d context\n", i);
-return 1;
+continue;
 }
 }
-
-return 0;
 }
 
 static int pe_get_data_dir_entry(uint64_t base, void *start_addr, int idx,
@@ -619,9 +622,7 @@ int main(int argc, char *argv[])
 goto out_kdbg;
 }
 
-if (fill_context(kdbg, &vs, &qemu_elf)) {
-goto out_kdbg;
-}
+fill_context(kdbg, &vs, &qemu_elf);
 
 if (write_dump(&ps, &header, argv[2])) {
 eprintf("Failed to save dump\n");

-- 
2.44.0




[PATCH v2 07/13] contrib/elf2dmp: Ensure segment fits in file

2024-03-04 Thread Akihiko Odaki
This makes elf2dmp more robust against corrupted inputs.

Signed-off-by: Akihiko Odaki 
---
 contrib/elf2dmp/addrspace.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/contrib/elf2dmp/addrspace.c b/contrib/elf2dmp/addrspace.c
index e01860d15b07..81295a11534a 100644
--- a/contrib/elf2dmp/addrspace.c
+++ b/contrib/elf2dmp/addrspace.c
@@ -88,11 +88,12 @@ void pa_space_create(struct pa_space *ps, QEMU_Elf 
*qemu_elf)
 ps->block = g_new(struct pa_block, ps->block_nr);
 
 for (i = 0; i < phdr_nr; i++) {
-if (phdr[i].p_type == PT_LOAD) {
+if (phdr[i].p_type == PT_LOAD && phdr[i].p_offset < qemu_elf->size) {
 ps->block[block_i] = (struct pa_block) {
 .addr = (uint8_t *)qemu_elf->map + phdr[i].p_offset,
 .paddr = phdr[i].p_paddr,
-.size = phdr[i].p_filesz,
+.size = MIN(phdr[i].p_filesz,
+qemu_elf->size - phdr[i].p_offset),
 };
 pa_block_align(&ps->block[block_i]);
 block_i = ps->block[block_i].size ? (block_i + 1) : block_i;

-- 
2.44.0




[PATCH v2 05/13] contrib/elf2dmp: Always check for PA resolution failure

2024-03-04 Thread Akihiko Odaki
Not checking for PA resolution failure can result in a NULL dereference.

Signed-off-by: Akihiko Odaki 
---
 contrib/elf2dmp/addrspace.c | 46 -
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/contrib/elf2dmp/addrspace.c b/contrib/elf2dmp/addrspace.c
index c995c723ae80..e01860d15b07 100644
--- a/contrib/elf2dmp/addrspace.c
+++ b/contrib/elf2dmp/addrspace.c
@@ -22,7 +22,7 @@ static struct pa_block *pa_space_find_block(struct pa_space 
*ps, uint64_t pa)
 return NULL;
 }
 
-static uint8_t *pa_space_resolve(struct pa_space *ps, uint64_t pa)
+static void *pa_space_resolve(struct pa_space *ps, uint64_t pa)
 {
 struct pa_block *block = pa_space_find_block(ps, pa);
 
@@ -33,6 +33,19 @@ static uint8_t *pa_space_resolve(struct pa_space *ps, 
uint64_t pa)
 return block->addr + (pa - block->paddr);
 }
 
+static bool pa_space_read64(struct pa_space *ps, uint64_t pa, uint64_t *value)
+{
+uint64_t *resolved = pa_space_resolve(ps, pa);
+
+if (!resolved) {
+return false;
+}
+
+*value = *resolved;
+
+return true;
+}
+
 static void pa_block_align(struct pa_block *b)
 {
 uint64_t low_align = ((b->paddr - 1) | ELF2DMP_PAGE_MASK) + 1 - b->paddr;
@@ -106,19 +119,20 @@ void va_space_create(struct va_space *vs, struct pa_space 
*ps, uint64_t dtb)
 va_space_set_dtb(vs, dtb);
 }
 
-static uint64_t get_pml4e(struct va_space *vs, uint64_t va)
+static bool get_pml4e(struct va_space *vs, uint64_t va, uint64_t *value)
 {
 uint64_t pa = (vs->dtb & 0xffffffffff000) | ((va & 0xff8000000000) >> 36);
 
-return *(uint64_t *)pa_space_resolve(vs->ps, pa);
+return pa_space_read64(vs->ps, pa, value);
 }
 
-static uint64_t get_pdpi(struct va_space *vs, uint64_t va, uint64_t pml4e)
+static bool get_pdpi(struct va_space *vs, uint64_t va, uint64_t pml4e,
+uint64_t *value)
 {
 uint64_t pdpte_paddr = (pml4e & 0xffffffffff000) |
 ((va & 0x7FC0000000) >> 27);
 
-return *(uint64_t *)pa_space_resolve(vs->ps, pdpte_paddr);
+return pa_space_read64(vs->ps, pdpte_paddr, value);
 }
 
 static uint64_t pde_index(uint64_t va)
@@ -131,11 +145,12 @@ static uint64_t pdba_base(uint64_t pdpe)
 return pdpe & 0xFFFFFFFFFF000;
 }
 
-static uint64_t get_pgd(struct va_space *vs, uint64_t va, uint64_t pdpe)
+static bool get_pgd(struct va_space *vs, uint64_t va, uint64_t pdpe,
+   uint64_t *value)
 {
 uint64_t pgd_entry = pdba_base(pdpe) + pde_index(va) * 8;
 
-return *(uint64_t *)pa_space_resolve(vs->ps, pgd_entry);
+return pa_space_read64(vs->ps, pgd_entry, value);
 }
 
 static uint64_t pte_index(uint64_t va)
@@ -148,11 +163,12 @@ static uint64_t ptba_base(uint64_t pde)
 return pde & 0xFFFFFFFFFF000;
 }
 
-static uint64_t get_pte(struct va_space *vs, uint64_t va, uint64_t pgd)
+static bool get_pte(struct va_space *vs, uint64_t va, uint64_t pgd,
+   uint64_t *value)
 {
 uint64_t pgd_val = ptba_base(pgd) + pte_index(va) * 8;
 
-return *(uint64_t *)pa_space_resolve(vs->ps, pgd_val);
+return pa_space_read64(vs->ps, pgd_val, value);
 }
 
 static uint64_t get_paddr(uint64_t va, uint64_t pte)
@@ -184,13 +200,11 @@ static uint64_t va_space_va2pa(struct va_space *vs, 
uint64_t va)
 {
 uint64_t pml4e, pdpe, pgd, pte;
 
-pml4e = get_pml4e(vs, va);
-if (!is_present(pml4e)) {
+if (!get_pml4e(vs, va, &pml4e) || !is_present(pml4e)) {
 return INVALID_PA;
 }
 
-pdpe = get_pdpi(vs, va, pml4e);
-if (!is_present(pdpe)) {
+if (!get_pdpi(vs, va, pml4e, &pdpe) || !is_present(pdpe)) {
 return INVALID_PA;
 }
 
@@ -198,8 +212,7 @@ static uint64_t va_space_va2pa(struct va_space *vs, 
uint64_t va)
 return get_1GB_paddr(va, pdpe);
 }
 
-pgd = get_pgd(vs, va, pdpe);
-if (!is_present(pgd)) {
+if (!get_pgd(vs, va, pdpe, &pgd) || !is_present(pgd)) {
 return INVALID_PA;
 }
 
@@ -207,8 +220,7 @@ static uint64_t va_space_va2pa(struct va_space *vs, 
uint64_t va)
 return get_2MB_paddr(va, pgd);
 }
 
-pte = get_pte(vs, va, pgd);
-if (!is_present(pte)) {
+if (!get_pte(vs, va, pgd, &pte) || !is_present(pte)) {
 return INVALID_PA;
 }
 

-- 
2.44.0




[PATCH v2 08/13] contrib/elf2dmp: Use lduw_le_p() to read PDB

2024-03-04 Thread Akihiko Odaki
This resolves UBSan warnings.

Signed-off-by: Akihiko Odaki 
---
 contrib/elf2dmp/pdb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/elf2dmp/pdb.c b/contrib/elf2dmp/pdb.c
index 1c5051425185..492aca4434c8 100644
--- a/contrib/elf2dmp/pdb.c
+++ b/contrib/elf2dmp/pdb.c
@@ -19,6 +19,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bswap.h"
 
 #include "pdb.h"
 #include "err.h"
@@ -186,7 +187,7 @@ static bool pdb_init_symbols(struct pdb_reader *r)
 
 r->symbols = symbols;
 
-r->segments = *(uint16_t *)((const char *)symbols + sizeof(PDB_SYMBOLS) +
+r->segments = lduw_le_p((const char *)symbols + sizeof(PDB_SYMBOLS) +
 symbols->module_size + symbols->offset_size +
 symbols->hash_size + symbols->srcmodule_size +
 symbols->pdbimport_size + symbols->unknown2_size +

-- 
2.44.0
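[Editor's note] For context, lduw_le_p() reads a 16-bit little-endian value
without requiring an aligned pointer, conceptually along these lines (sketch
only; the real helper in include/qemu/bswap.h is built on the generic
lduw_he_p()/byte-swap machinery):

    static inline uint16_t lduw_le_p_sketch(const void *ptr)
    {
        uint16_t v;

        /* byte-wise copy: no unaligned 16-bit load, hence no UBSan warning */
        memcpy(&v, ptr, sizeof(v));
        /* swap on big-endian hosts, no-op on little-endian ones */
        return le16_to_cpu(v);
    }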




[PATCH v2 06/13] contrib/elf2dmp: Always destroy PA space

2024-03-04 Thread Akihiko Odaki
Destroy the PA space even if the paging base couldn't be found, fixing a
memory leak.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Peter Maydell 
---
 contrib/elf2dmp/main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/contrib/elf2dmp/main.c b/contrib/elf2dmp/main.c
index 140ac6e00cfe..25cf0fdff724 100644
--- a/contrib/elf2dmp/main.c
+++ b/contrib/elf2dmp/main.c
@@ -550,7 +550,7 @@ int main(int argc, char *argv[])
 va_space_create(&vs, &ps, state->cr[3]);
 if (!fix_dtb(&vs, &qemu_elf)) {
 eprintf("Failed to find paging base\n");
-goto out_elf;
+goto out_ps;
 }
 
 printf("CPU #0 IDT is at 0x%016"PRIx64"\n", state->idt.base);
@@ -635,7 +635,6 @@ out_pdb_file:
 unlink(PDB_NAME);
 out_ps:
 pa_space_destroy(&ps);
-out_elf:
 QEMU_Elf_exit(&qemu_elf);
 
 return err;

-- 
2.44.0




[PATCH v2 12/13] contrib/elf2dmp: Use GPtrArray

2024-03-04 Thread Akihiko Odaki
This removes the need to enumerate QEMUCPUState twice and saves code.

Signed-off-by: Akihiko Odaki 
---
 contrib/elf2dmp/qemu_elf.c | 25 -
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/contrib/elf2dmp/qemu_elf.c b/contrib/elf2dmp/qemu_elf.c
index a22c057d3ec3..7d896cac5b15 100644
--- a/contrib/elf2dmp/qemu_elf.c
+++ b/contrib/elf2dmp/qemu_elf.c
@@ -66,7 +66,7 @@ static bool init_states(QEMU_Elf *qe)
 Elf64_Nhdr *start = (void *)((uint8_t *)qe->map + phdr[0].p_offset);
 Elf64_Nhdr *end = (void *)((uint8_t *)start + phdr[0].p_memsz);
 Elf64_Nhdr *nhdr;
-size_t cpu_nr = 0;
+GPtrArray *states;
 
 if (phdr[0].p_type != PT_NOTE) {
 eprintf("Failed to find PT_NOTE\n");
@@ -74,38 +74,29 @@ static bool init_states(QEMU_Elf *qe)
 }
 
 qe->has_kernel_gs_base = 1;
+states = g_ptr_array_new();
 
 for (nhdr = start; nhdr < end; nhdr = nhdr_get_next(nhdr)) {
 if (!strcmp(nhdr_get_name(nhdr), QEMU_NOTE_NAME)) {
 QEMUCPUState *state = nhdr_get_desc(nhdr);
 
 if (state->size < sizeof(*state)) {
-eprintf("CPU #%zu: QEMU CPU state size %u doesn't match\n",
-cpu_nr, state->size);
+eprintf("CPU #%u: QEMU CPU state size %u doesn't match\n",
+states->len, state->size);
 /*
  * We assume either every QEMU CPU state has KERNEL_GS_BASE or
  * no one has.
  */
 qe->has_kernel_gs_base = 0;
 }
-cpu_nr++;
+g_ptr_array_add(states, state);
 }
 }
 
-printf("%zu CPU states has been found\n", cpu_nr);
+printf("%u CPU states has been found\n", states->len);
 
-qe->state = g_new(QEMUCPUState*, cpu_nr);
-
-cpu_nr = 0;
-
-for (nhdr = start; nhdr < end; nhdr = nhdr_get_next(nhdr)) {
-if (!strcmp(nhdr_get_name(nhdr), QEMU_NOTE_NAME)) {
-qe->state[cpu_nr] = nhdr_get_desc(nhdr);
-cpu_nr++;
-}
-}
-
-qe->state_nr = cpu_nr;
+qe->state_nr = states->len;
+qe->state = (void *)g_ptr_array_free(states, FALSE);
 
 return true;
 }

-- 
2.44.0
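[Editor's note] The last hunk relies on g_ptr_array_free(array, FALSE)
freeing only the GPtrArray wrapper while returning the underlying pointer
segment, so the collected QEMUCPUState pointers stay valid in qe->state.
A minimal usage sketch (variable names are illustrative):

    GPtrArray *states = g_ptr_array_new();
    g_ptr_array_add(states, state_ptr);           /* collect element pointers */
    QEMUCPUState **kept = (QEMUCPUState **)g_ptr_array_free(states, FALSE);
    /* 'kept' now owns the element segment and must later be g_free()d */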




[PATCH v2 13/13] contrib/elf2dmp: Clamp QEMU note to file size

2024-03-04 Thread Akihiko Odaki
This fixes crashes with truncated dumps.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2202
Signed-off-by: Akihiko Odaki 
---
 contrib/elf2dmp/qemu_elf.c | 87 +-
 1 file changed, 55 insertions(+), 32 deletions(-)

diff --git a/contrib/elf2dmp/qemu_elf.c b/contrib/elf2dmp/qemu_elf.c
index 7d896cac5b15..8d750adf904a 100644
--- a/contrib/elf2dmp/qemu_elf.c
+++ b/contrib/elf2dmp/qemu_elf.c
@@ -6,6 +6,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/host-utils.h"
 #include "err.h"
 #include "qemu_elf.h"
 
@@ -15,36 +16,11 @@
 #define ROUND_UP(n, d) (((n) + (d) - 1) & -(0 ? (n) : (d)))
 #endif
 
-#ifndef DIV_ROUND_UP
-#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
-#endif
-
-#define ELF_NOTE_SIZE(hdr_size, name_size, desc_size)   \
-((DIV_ROUND_UP((hdr_size), 4) + \
-  DIV_ROUND_UP((name_size), 4) +\
-  DIV_ROUND_UP((desc_size), 4)) * 4)
-
 int is_system(QEMUCPUState *s)
 {
 return s->gs.base >> 63;
 }
 
-static char *nhdr_get_name(Elf64_Nhdr *nhdr)
-{
-return (char *)nhdr + ROUND_UP(sizeof(*nhdr), 4);
-}
-
-static void *nhdr_get_desc(Elf64_Nhdr *nhdr)
-{
-return nhdr_get_name(nhdr) + ROUND_UP(nhdr->n_namesz, 4);
-}
-
-static Elf64_Nhdr *nhdr_get_next(Elf64_Nhdr *nhdr)
-{
-return (void *)((uint8_t *)nhdr + ELF_NOTE_SIZE(sizeof(*nhdr),
-nhdr->n_namesz, nhdr->n_descsz));
-}
-
 Elf64_Phdr *elf64_getphdr(void *map)
 {
 Elf64_Ehdr *ehdr = map;
@@ -60,13 +36,35 @@ Elf64_Half elf_getphdrnum(void *map)
 return ehdr->e_phnum;
 }
 
+static bool advance_note_offset(uint64_t *offsetp, uint64_t size, uint64_t end)
+{
+uint64_t offset = *offsetp;
+
+if (uadd64_overflow(offset, size, &offset) || offset > UINT64_MAX - 3) {
+return false;
+}
+
+offset = ROUND_UP(offset, 4);
+
+if (offset > end) {
+return false;
+}
+
+*offsetp = offset;
+
+return true;
+}
+
 static bool init_states(QEMU_Elf *qe)
 {
 Elf64_Phdr *phdr = elf64_getphdr(qe->map);
-Elf64_Nhdr *start = (void *)((uint8_t *)qe->map + phdr[0].p_offset);
-Elf64_Nhdr *end = (void *)((uint8_t *)start + phdr[0].p_memsz);
 Elf64_Nhdr *nhdr;
 GPtrArray *states;
+QEMUCPUState *state;
+uint32_t state_size;
+uint64_t offset;
+uint64_t end_offset;
+char *name;
 
 if (phdr[0].p_type != PT_NOTE) {
 eprintf("Failed to find PT_NOTE\n");
@@ -74,15 +72,40 @@ static bool init_states(QEMU_Elf *qe)
 }
 
 qe->has_kernel_gs_base = 1;
+offset = phdr[0].p_offset;
 states = g_ptr_array_new();
 
-for (nhdr = start; nhdr < end; nhdr = nhdr_get_next(nhdr)) {
-if (!strcmp(nhdr_get_name(nhdr), QEMU_NOTE_NAME)) {
-QEMUCPUState *state = nhdr_get_desc(nhdr);
+if (uadd64_overflow(offset, phdr[0].p_memsz, &end_offset) ||
+end_offset > qe->size) {
+end_offset = qe->size;
+}
+
+while (offset < end_offset) {
+nhdr = (void *)((uint8_t *)qe->map + offset);
+
+if (!advance_note_offset(&offset, sizeof(*nhdr), end_offset)) {
+break;
+}
+
+name = (char *)qe->map + offset;
+
+if (!advance_note_offset(&offset, nhdr->n_namesz, end_offset)) {
+break;
+}
+
+state = (void *)((uint8_t *)qe->map + offset);
+
+if (!advance_note_offset(&offset, nhdr->n_descsz, end_offset)) {
+break;
+}
+
+if (!strcmp(name, QEMU_NOTE_NAME) &&
+nhdr->n_descsz >= offsetof(QEMUCPUState, kernel_gs_base)) {
+state_size = MIN(state->size, nhdr->n_descsz);
 
-if (state->size < sizeof(*state)) {
+if (state_size < sizeof(*state)) {
 eprintf("CPU #%u: QEMU CPU state size %u doesn't match\n",
-states->len, state->size);
+states->len, state_size);
 /*
  * We assume either every QEMU CPU state has KERNEL_GS_BASE or
  * no one has.

-- 
2.44.0




[PATCH v2 11/13] contrib/elf2dmp: Build only for little endian host

2024-03-04 Thread Akihiko Odaki
elf2dmp assumes a little-endian host in many places.

Signed-off-by: Akihiko Odaki 
---
 contrib/elf2dmp/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/elf2dmp/meson.build b/contrib/elf2dmp/meson.build
index 6707d43c4fa5..046569861f7a 100644
--- a/contrib/elf2dmp/meson.build
+++ b/contrib/elf2dmp/meson.build
@@ -1,4 +1,4 @@
-if curl.found()
+if curl.found() and host_machine.endian() == 'little'
   executable('elf2dmp', files('main.c', 'addrspace.c', 'download.c', 'pdb.c', 
'qemu_elf.c'), genh,
  dependencies: [glib, curl],
  install: true)

-- 
2.44.0




[PATCH v2 10/13] MAINTAINERS: Add Akihiko Odaki as a elf2dmp reviewer

2024-03-04 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
Reviewed-by: Peter Maydell 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 65dfdc9677e4..d25403f3709b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3583,6 +3583,7 @@ F: util/iova-tree.c
 
 elf2dmp
 M: Viktor Prutyanov 
+R: Akihiko Odaki 
 S: Maintained
 F: contrib/elf2dmp/
 

-- 
2.44.0




[PATCH v2 09/13] contrib/elf2dmp: Use rol64() to decode

2024-03-04 Thread Akihiko Odaki
rol64() is robust against too-large shift values and fixes UBSan
warnings.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Peter Maydell 
---
 contrib/elf2dmp/main.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/contrib/elf2dmp/main.c b/contrib/elf2dmp/main.c
index 25cf0fdff724..20547fd8f819 100644
--- a/contrib/elf2dmp/main.c
+++ b/contrib/elf2dmp/main.c
@@ -6,6 +6,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bitops.h"
 
 #include "err.h"
 #include "addrspace.h"
@@ -47,11 +48,6 @@ static const uint64_t SharedUserData = 0xf780;
 s ? printf(#s" = 0x%016"PRIx64"\n", s) :\
 eprintf("Failed to resolve "#s"\n"), s)
 
-static uint64_t rol(uint64_t x, uint64_t y)
-{
-return (x << y) | (x >> (64 - y));
-}
-
 /*
  * Decoding algorithm can be found in Volatility project
  */
@@ -64,7 +60,7 @@ static void kdbg_decode(uint64_t *dst, uint64_t *src, size_t 
size,
 uint64_t block;
 
 block = src[i];
-block = rol(block ^ kwn, (uint8_t)kwn);
+block = rol64(block ^ kwn, kwn);
 block = __builtin_bswap64(block ^ kdbe) ^ kwa;
 dst[i] = block;
 }

-- 
2.44.0
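[Editor's note] rol64() from include/qemu/bitops.h masks the shift count, so
rotating by 0 or by 64 and more is well defined instead of hitting undefined
behaviour; roughly along these lines (sketch, not a verbatim copy of the
header):

    static inline uint64_t rol64_sketch(uint64_t word, unsigned int shift)
    {
        shift &= 63;
        return (word << shift) | (word >> (-shift & 63));
    }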




[PATCH v2 01/13] contrib/elf2dmp: Remove unnecessary err flags

2024-03-04 Thread Akihiko Odaki
They are always evaluated to 1.

Signed-off-by: Akihiko Odaki 
---
 contrib/elf2dmp/pdb.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/contrib/elf2dmp/pdb.c b/contrib/elf2dmp/pdb.c
index 40991f5f4c34..abf17c2e7c12 100644
--- a/contrib/elf2dmp/pdb.c
+++ b/contrib/elf2dmp/pdb.c
@@ -177,7 +177,6 @@ static int pdb_init_segments(struct pdb_reader *r)
 
 static int pdb_init_symbols(struct pdb_reader *r)
 {
-int err = 0;
 PDB_SYMBOLS *symbols;
 
 symbols = pdb_ds_read_file(r, 3);
@@ -196,7 +195,6 @@ static int pdb_init_symbols(struct pdb_reader *r)
 /* Read global symbol table */
 r->modimage = pdb_ds_read_file(r, symbols->gsym_file);
 if (!r->modimage) {
-err = 1;
 goto out_symbols;
 }
 
@@ -205,7 +203,7 @@ static int pdb_init_symbols(struct pdb_reader *r)
 out_symbols:
 g_free(symbols);
 
-return err;
+return 1;
 }
 
 static int pdb_reader_ds_init(struct pdb_reader *r, PDB_DS_HEADER *hdr)
@@ -228,7 +226,6 @@ static int pdb_reader_ds_init(struct pdb_reader *r, 
PDB_DS_HEADER *hdr)
 
 static int pdb_reader_init(struct pdb_reader *r, void *data)
 {
-int err = 0;
 const char pdb7[] = "Microsoft C/C++ MSF 7.00";
 
 if (memcmp(data, pdb7, sizeof(pdb7) - 1)) {
@@ -241,17 +238,14 @@ static int pdb_reader_init(struct pdb_reader *r, void 
*data)
 
 r->ds.root = pdb_ds_read_file(r, 1);
 if (!r->ds.root) {
-err = 1;
 goto out_ds;
 }
 
 if (pdb_init_symbols(r)) {
-err = 1;
 goto out_root;
 }
 
 if (pdb_init_segments(r)) {
-err = 1;
 goto out_sym;
 }
 
@@ -264,7 +258,7 @@ out_root:
 out_ds:
 pdb_reader_ds_exit(r);
 
-return err;
+return 1;
 }
 
 static void pdb_reader_exit(struct pdb_reader *r)
@@ -278,7 +272,6 @@ static void pdb_reader_exit(struct pdb_reader *r)
 int pdb_init_from_file(const char *name, struct pdb_reader *reader)
 {
 GError *gerr = NULL;
-int err = 0;
 void *map;
 
 reader->gmf = g_mapped_file_new(name, TRUE, &gerr);
@@ -291,7 +284,6 @@ int pdb_init_from_file(const char *name, struct pdb_reader 
*reader)
 reader->file_size = g_mapped_file_get_length(reader->gmf);
 map = g_mapped_file_get_contents(reader->gmf);
 if (pdb_reader_init(reader, map)) {
-err = 1;
 goto out_unmap;
 }
 
@@ -300,7 +292,7 @@ int pdb_init_from_file(const char *name, struct pdb_reader 
*reader)
 out_unmap:
 g_mapped_file_unref(reader->gmf);
 
-return err;
+return 1;
 }
 
 void pdb_exit(struct pdb_reader *reader)

-- 
2.44.0




[PATCH v2 02/13] contrib/elf2dmp: Assume error by default

2024-03-04 Thread Akihiko Odaki
A common construct in contrib/elf2dmp is to set an "err" flag and goto
in error paths. In such a construct there is only one successful path
but several error paths, so it is simpler to initialize the "err" flag
as set and clear it on the successful path.

Signed-off-by: Akihiko Odaki 
---
 contrib/elf2dmp/download.c |  4 +---
 contrib/elf2dmp/main.c | 15 +++
 2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/contrib/elf2dmp/download.c b/contrib/elf2dmp/download.c
index bd7650a7a27f..902dc04ffa5c 100644
--- a/contrib/elf2dmp/download.c
+++ b/contrib/elf2dmp/download.c
@@ -11,7 +11,7 @@
 
 int download_url(const char *name, const char *url)
 {
-int err = 0;
+int err = 1;
 FILE *file;
 CURL *curl = curl_easy_init();
 
@@ -21,7 +21,6 @@ int download_url(const char *name, const char *url)
 
 file = fopen(name, "wb");
 if (!file) {
-err = 1;
 goto out_curl;
 }
 
@@ -33,7 +32,6 @@ int download_url(const char *name, const char *url)
 || curl_easy_perform(curl) != CURLE_OK) {
 unlink(name);
 fclose(file);
-err = 1;
 } else {
 err = fclose(file);
 }
diff --git a/contrib/elf2dmp/main.c b/contrib/elf2dmp/main.c
index cbc38a7c103a..9b278f392e39 100644
--- a/contrib/elf2dmp/main.c
+++ b/contrib/elf2dmp/main.c
@@ -511,7 +511,7 @@ static void pe_get_pdb_symstore_hash(OMFSignatureRSDS 
*rsds, char *hash)
 
 int main(int argc, char *argv[])
 {
-int err = 0;
+int err = 1;
 QEMU_Elf qemu_elf;
 struct pa_space ps;
 struct va_space vs;
@@ -542,7 +542,6 @@ int main(int argc, char *argv[])
 
 if (pa_space_create(&ps, &qemu_elf)) {
 eprintf("Failed to initialize physical address space\n");
-err = 1;
 goto out_elf;
 }
 
@@ -552,7 +551,6 @@ int main(int argc, char *argv[])
 va_space_create(&vs, &ps, state->cr[3]);
 if (fix_dtb(&vs, &qemu_elf)) {
 eprintf("Failed to find paging base\n");
-err = 1;
 goto out_elf;
 }
 
@@ -561,7 +559,6 @@ int main(int argc, char *argv[])
 if (va_space_rw(&vs, state->idt.base,
 &first_idt_desc, sizeof(first_idt_desc), 0)) {
 eprintf("Failed to get CPU #0 IDT[0]\n");
-err = 1;
 goto out_ps;
 }
 printf("CPU #0 IDT[0] -> 0x%016"PRIx64"\n", idt_desc_addr(first_idt_desc));
@@ -586,7 +583,6 @@ int main(int argc, char *argv[])
 
 if (!kernel_found) {
 eprintf("Failed to find NT kernel image\n");
-err = 1;
 goto out_ps;
 }
 
@@ -600,45 +596,40 @@ int main(int argc, char *argv[])
 
 if (download_url(PDB_NAME, pdb_url)) {
 eprintf("Failed to download PDB file\n");
-err = 1;
 goto out_ps;
 }
 
 if (pdb_init_from_file(PDB_NAME, &pdb)) {
 eprintf("Failed to initialize PDB reader\n");
-err = 1;
 goto out_pdb_file;
 }
 
 if (!SYM_RESOLVE(KernBase, &pdb, KdDebuggerDataBlock) ||
 !SYM_RESOLVE(KernBase, &pdb, KdVersionBlock)) {
-err = 1;
 goto out_pdb;
 }
 
 kdbg = get_kdbg(KernBase, &pdb, &vs, KdDebuggerDataBlock);
 if (!kdbg) {
-err = 1;
 goto out_pdb;
 }
 
 if (fill_header(&header, &ps, &vs, KdDebuggerDataBlock, kdbg,
 KdVersionBlock, qemu_elf.state_nr)) {
-err = 1;
 goto out_kdbg;
 }
 
 if (fill_context(kdbg, &vs, &qemu_elf)) {
-err = 1;
 goto out_kdbg;
 }
 
 if (write_dump(&ps, &header, argv[2])) {
 eprintf("Failed to save dump\n");
-err = 1;
 goto out_kdbg;
 }
 
+err = 0;
+
 out_kdbg:
 g_free(kdbg);
 out_pdb:

-- 
2.44.0




[PATCH v2 04/13] contrib/elf2dmp: Conform to the error reporting pattern

2024-03-04 Thread Akihiko Odaki
include/qapi/error.h says:
> We recommend
> * bool-valued functions return true on success / false on failure,
> ...

Signed-off-by: Akihiko Odaki 
---
 contrib/elf2dmp/addrspace.h |   6 +--
 contrib/elf2dmp/download.h  |   2 +-
 contrib/elf2dmp/pdb.h   |   2 +-
 contrib/elf2dmp/qemu_elf.h  |   2 +-
 contrib/elf2dmp/addrspace.c |  12 ++---
 contrib/elf2dmp/download.c  |  10 ++--
 contrib/elf2dmp/main.c  | 114 +---
 contrib/elf2dmp/pdb.c   |  50 +--
 contrib/elf2dmp/qemu_elf.c  |  32 ++---
 9 files changed, 112 insertions(+), 118 deletions(-)

diff --git a/contrib/elf2dmp/addrspace.h b/contrib/elf2dmp/addrspace.h
index 039c70c5b079..2ad30a9da48a 100644
--- a/contrib/elf2dmp/addrspace.h
+++ b/contrib/elf2dmp/addrspace.h
@@ -33,13 +33,13 @@ struct va_space {
 struct pa_space *ps;
 };
 
-int pa_space_create(struct pa_space *ps, QEMU_Elf *qemu_elf);
+void pa_space_create(struct pa_space *ps, QEMU_Elf *qemu_elf);
 void pa_space_destroy(struct pa_space *ps);
 
 void va_space_create(struct va_space *vs, struct pa_space *ps, uint64_t dtb);
 void va_space_set_dtb(struct va_space *vs, uint64_t dtb);
 void *va_space_resolve(struct va_space *vs, uint64_t va);
-int va_space_rw(struct va_space *vs, uint64_t addr,
-void *buf, size_t size, int is_write);
+bool va_space_rw(struct va_space *vs, uint64_t addr,
+ void *buf, size_t size, int is_write);
 
 #endif /* ADDRSPACE_H */
diff --git a/contrib/elf2dmp/download.h b/contrib/elf2dmp/download.h
index 5c274925f7aa..f65adb5d0894 100644
--- a/contrib/elf2dmp/download.h
+++ b/contrib/elf2dmp/download.h
@@ -8,6 +8,6 @@
 #ifndef DOWNLOAD_H
 #define DOWNLOAD_H
 
-int download_url(const char *name, const char *url);
+bool download_url(const char *name, const char *url);
 
 #endif /* DOWNLOAD_H */
diff --git a/contrib/elf2dmp/pdb.h b/contrib/elf2dmp/pdb.h
index 2a50da56ac96..feddf1862f08 100644
--- a/contrib/elf2dmp/pdb.h
+++ b/contrib/elf2dmp/pdb.h
@@ -233,7 +233,7 @@ struct pdb_reader {
 size_t segs_size;
 };
 
-int pdb_init_from_file(const char *name, struct pdb_reader *reader);
+bool pdb_init_from_file(const char *name, struct pdb_reader *reader);
 void pdb_exit(struct pdb_reader *reader);
 uint64_t pdb_resolve(uint64_t img_base, struct pdb_reader *r, const char 
*name);
 uint64_t pdb_find_public_v3_symbol(struct pdb_reader *reader, const char 
*name);
diff --git a/contrib/elf2dmp/qemu_elf.h b/contrib/elf2dmp/qemu_elf.h
index afa75f10b2d2..adc50238b46b 100644
--- a/contrib/elf2dmp/qemu_elf.h
+++ b/contrib/elf2dmp/qemu_elf.h
@@ -42,7 +42,7 @@ typedef struct QEMU_Elf {
 int has_kernel_gs_base;
 } QEMU_Elf;
 
-int QEMU_Elf_init(QEMU_Elf *qe, const char *filename);
+bool QEMU_Elf_init(QEMU_Elf *qe, const char *filename);
 void QEMU_Elf_exit(QEMU_Elf *qe);
 
 Elf64_Phdr *elf64_getphdr(void *map);
diff --git a/contrib/elf2dmp/addrspace.c b/contrib/elf2dmp/addrspace.c
index 6f608a517b1e..c995c723ae80 100644
--- a/contrib/elf2dmp/addrspace.c
+++ b/contrib/elf2dmp/addrspace.c
@@ -57,7 +57,7 @@ static void pa_block_align(struct pa_block *b)
 b->paddr += low_align;
 }
 
-int pa_space_create(struct pa_space *ps, QEMU_Elf *qemu_elf)
+void pa_space_create(struct pa_space *ps, QEMU_Elf *qemu_elf)
 {
 Elf64_Half phdr_nr = elf_getphdrnum(qemu_elf->map);
 Elf64_Phdr *phdr = elf64_getphdr(qemu_elf->map);
@@ -87,8 +87,6 @@ int pa_space_create(struct pa_space *ps, QEMU_Elf *qemu_elf)
 }
 
 ps->block_nr = block_i;
-
-return 0;
 }
 
 void pa_space_destroy(struct pa_space *ps)
@@ -228,8 +226,8 @@ void *va_space_resolve(struct va_space *vs, uint64_t va)
 return pa_space_resolve(vs->ps, pa);
 }
 
-int va_space_rw(struct va_space *vs, uint64_t addr,
-void *buf, size_t size, int is_write)
+bool va_space_rw(struct va_space *vs, uint64_t addr,
+ void *buf, size_t size, int is_write)
 {
 while (size) {
 uint64_t page = addr & ELF2DMP_PFN_MASK;
@@ -240,7 +238,7 @@ int va_space_rw(struct va_space *vs, uint64_t addr,
 
 ptr = va_space_resolve(vs, addr);
 if (!ptr) {
-return 1;
+return false;
 }
 
 if (is_write) {
@@ -254,5 +252,5 @@ int va_space_rw(struct va_space *vs, uint64_t addr,
 addr += s;
 }
 
-return 0;
+return true;
 }
diff --git a/contrib/elf2dmp/download.c b/contrib/elf2dmp/download.c
index 902dc04ffa5c..ec8d33ba1e4b 100644
--- a/contrib/elf2dmp/download.c
+++ b/contrib/elf2dmp/download.c
@@ -9,14 +9,14 @@
 #include 
 #include "download.h"
 
-int download_url(const char *name, const char *url)
+bool download_url(const char *name, const char *url)
 {
-int err = 1;
+bool success = false;
 FILE *file;
 CURL *curl = curl_easy_init();
 
 if (!curl) {
-return 1;
+return success;
 }
 
 file = fopen(name, "wb");
@@ -33,11 +33,11 @@ int download_url(const char *name, const char *url)
 unlink(name);
 

[PATCH v2 00/13] contrib/elf2dmp: Improve robustness

2024-03-04 Thread Akihiko Odaki
elf2dmp sometimes fails to work with partially corrupted dumps, and also
emits warnings when sanitizers are in use. This series is a collection
of changes to improve the situation.

Signed-off-by: Akihiko Odaki 
---
Changes in v2:
- Added patch "contrib/elf2dmp: Remove unnecessary err flags".
- Added patch "contrib/elf2dmp: Assume error by default".
- Added patch "contrib/elf2dmp: Conform to the error reporting pattern".
- Added patch "contrib/elf2dmp: Build only for little endian host".
- Added patch "contrib/elf2dmp: Use GPtrArray".
- Added patch "contrib/elf2dmp: Clamp QEMU note to file size".
- Changed error handling in patch "contrib/elf2dmp: Ensure segment fits
  in file" (Peter Maydell)
- Added a comment to fill_context() that it continues on failure.
  (Peter Maydell)
- Link to v1: 
https://lore.kernel.org/r/20240303-elf2dmp-v1-0-bea6649fe...@daynix.com

---
Akihiko Odaki (13):
  contrib/elf2dmp: Remove unnecessary err flags
  contrib/elf2dmp: Assume error by default
  contrib/elf2dmp: Continue even contexts are lacking
  contrib/elf2dmp: Conform to the error reporting pattern
  contrib/elf2dmp: Always check for PA resolution failure
  contrib/elf2dmp: Always destroy PA space
  contrib/elf2dmp: Ensure segment fits in file
  contrib/elf2dmp: Use lduw_le_p() to read PDB
  contrib/elf2dmp: Use rol64() to decode
  MAINTAINERS: Add Akihiko Odaki as a elf2dmp reviewer
  contrib/elf2dmp: Build only for little endian host
  contrib/elf2dmp: Use GPtrArray
  contrib/elf2dmp: Clamp QEMU note to file size

 MAINTAINERS |   1 +
 contrib/elf2dmp/addrspace.h |   6 +-
 contrib/elf2dmp/download.h  |   2 +-
 contrib/elf2dmp/pdb.h   |   2 +-
 contrib/elf2dmp/qemu_elf.h  |   2 +-
 contrib/elf2dmp/addrspace.c |  63 ++
 contrib/elf2dmp/download.c  |  12 ++--
 contrib/elf2dmp/main.c  | 159 
 contrib/elf2dmp/pdb.c   |  61 -
 contrib/elf2dmp/qemu_elf.c  | 142 +--
 contrib/elf2dmp/meson.build |   2 +-
 11 files changed, 226 insertions(+), 226 deletions(-)
---
base-commit: bfe8020c814a30479a4241aaa78b63960655962b
change-id: 20240301-elf2dmp-1a6a551f8663

Best regards,
-- 
Akihiko Odaki 




Re: [PATCH] hw/core/machine-smp: Remove deprecated "parameter=0" SMP configurations

2024-03-04 Thread Zhao Liu
Hi Prasad,

> On Mon, 4 Mar 2024 at 12:19, Zhao Liu  wrote:
> > > unsigned maxcpus = config->has_maxcpus ? config->maxcpus : 0;
> >
> > This indicates the default maxcpus is initialized as 0 if user doesn't
> > specifies it.
> 
> * 'has_maxcpus' should be set only if maxcpus > 0. If maxcpus == 0,
> then setting 'has_maxcpus=1' seems convoluted.

After a simple test: if the user sets maxcpus to 0, has_maxcpus will be
true as well... I think it's related to the QAPI code generation logic.

> > However, we could initialize maxcpus as other default value, e.g.,
> >
> > maxcpus = config->has_maxcpus ? config->maxcpus : 1.
> ===
> hw/core/machine.c
>  machine_initfn
> /* default to mc->default_cpus */
> ms->smp.cpus = mc->default_cpus;
> ms->smp.max_cpus = mc->default_cpus;
> 
>static void machine_class_base_init(ObjectClass *oc, void *data)
>{
>MachineClass *mc = MACHINE_CLASS(oc);
>mc->max_cpus = mc->max_cpus ?: 1;
>mc->min_cpus = mc->min_cpus ?: 1;
>mc->default_cpus = mc->default_cpus ?: 1;
>}
> ===
> * Looking at the above bits, it seems smp.cpus & smp.max_cpus are
> initialised to 1 via default_cpus in MachineClass object.

Yes.

The maxcpus I mentioned is a local variable in
machine_parse_smp_config(), which is used to do the sanity checks.

In machine_parse_smp_config(), once we can confirm the topology is
valid, ms->smp.cpus and ms->smp.max_cpus are set from the valid local
variables (cpus and maxcpus).

> >>  if (config->has_maxcpus && config->maxcpus == 0)
> > This check only wants to identify the case where the user sets 0.
> > If the default maxcpus is initialized as 0, then (maxcpus == 0) will
> > fail if user doesn't set maxcpus.
> >
> > But it is still necessary to distinguish whether maxcpus is user-set or
> > auto-initialized.
> 
> * If it is set to zero(0) either by the user or by auto-initialisation, it is
> still invalid, right?

The latter, "auto-initialise", means user could omit "cpus" and "maxcpus"
parameters in -smp.

Even though the local variables "cpus" and "maxcpus" are initialized to
0, ms->smp.cpus and ms->smp.max_cpus will eventually still end up with
valid values.

> > If it is user-set, -smp should fail if there's an invalid maxcpus/invalid
> > topology.
> >
> > Otherwise, if it is auto-initialized, its value should be adjusted based
> > on other topology components as the above calculation in (*).
> 
> * Why have such diverging ways?
> * Could we simplify it as
>- If cpus/maxcpus==0, it is invalid, show an error and exit.

Hmm, the original behavior means that if the user doesn't set
cpus=*/maxcpus=* in -smp, QEMU will auto-complete these 2 fields.

If we also returned an error for the case where the user omits the cpus
and maxcpus parameters, that would change QEMU's API, and we would need
to mark the ability to omit cpus/maxcpus as deprecated before removing
it, just like what I did in this patch for the zeroed-parameter case.

I feel if there's no issue then it's not necessary to change the API. Do
you agree?

>- If cpus/maxcpus > 0, but incorrect for topology, then
> re-calculate the correct value based on topology parameters. If the
> re-calculated value is still incorrect or unsatisfactory, then show an
> error and exit.

Yes, this case is right.

> * Saying that user setting cpu/maxcpus=0 is invalid and
> auto-initialising it to zero(0) is valid, is not consistent.
>

I think "auto-initialising it to zero(0)" doesn't means we re-initialize
ms->smp.cpus and ms->smp.max_cpus as 0 (these 2 fields store actual basic
topology information and they're defult as 1 as you said above).

Does my explanation address your concern? ;-)
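[Editor's note] A rough sketch of the distinction discussed above, as it
could look in machine_parse_smp_config() (illustration only; the error
wording and exact placement are assumptions):

    /* an explicit 0 from the user is rejected ... */
    if ((config->has_cpus && config->cpus == 0) ||
        (config->has_maxcpus && config->maxcpus == 0)) {
        error_setg(errp, "Invalid SMP configuration: CPU counts must be > 0");
        return;
    }

    /* ... while omitted parameters stay 0 in the local variables and are
     * auto-completed from the other topology members further down */
    unsigned cpus    = config->has_cpus ? config->cpus : 0;
    unsigned maxcpus = config->has_maxcpus ? config->maxcpus : 0;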

Thanks,
Zhao




Re: [PATCH 3/3] hw/mem/cxl_type3: Fix problem with g_steal_pointer()

2024-03-04 Thread Thomas Huth

On 04/03/2024 16.10, Jonathan Cameron wrote:

On Mon,  4 Mar 2024 11:44:06 +0100
Thomas Huth  wrote:


When setting GLIB_VERSION_MAX_ALLOWED to GLIB_VERSION_2_58 or higher,
glib adds type safety checks to the g_steal_pointer() macro. This
triggers errors in the ct3_build_cdat_entries_for_mr() function which
uses the g_steal_pointer() for type-casting from one pointer type to
the other (which also looks quite weird since the local pointers have
all been declared with g_autofree though they are never freed here).
Fix it by using a proper typecast instead. For making this possible, we
have to remove the QEMU_PACKED attribute from some structs since GCC
otherwise complains that the source and destination pointer might
have different alignment restrictions. Removing the QEMU_PACKED should
be fine here since the structs are already naturally aligned. Anyway,
add some QEMU_BUILD_BUG_ON() statements to make sure that we've got
the right sizes (without padding in the structs).


I missed these as well when getting rid of the false handling
of failure of g_new0 calls.

Another alternative would be to point to the head structures rather
than the containing structure - that would avoid the need to cast.
That might be neater?  It would, I think, also remove the alignment
question?


I gave it a try, but it does not help against the alignment issue, I still get:

../../devel/qemu/hw/mem/cxl_type3.c: In function 
‘ct3_build_cdat_entries_for_mr’:
../../devel/qemu/hw/mem/cxl_type3.c:138:34: error: taking address of packed 
member of ‘struct CDATDsmas’ may result in an unaligned pointer value 
[-Werror=address-of-packed-member]

  138 | cdat_table[CT3_CDAT_DSMAS] = &dsmas->header;
  |  ^~

From my experience, it's better anyway to avoid __attribute__((packed)) on
structures unless it is really, really required. At least we should avoid it
as much as possible as long as we still support running QEMU on Sparc hosts
(which don't support misaligned memory accesses), since otherwise you can end
up with non-working code there, see e.g.:


 https://www.mail-archive.com/qemu-devel@nongnu.org/msg439899.html

or:

 https://gitlab.com/qemu-project/qemu/-/commit/cb89b349074310ff9eb7ebe18a

Thus I'd rather prefer to keep this patch as it is right now.

 Thomas
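[Editor's note] A minimal sketch of the approach Thomas describes
(illustration only, not the actual patch; the struct sizes asserted below
are assumptions used just to show the padding check):

    /* in ct3_build_cdat_entries_for_mr() */
    QEMU_BUILD_BUG_ON(sizeof(CDATSubHeader) != 4);   /* assumed natural size */
    QEMU_BUILD_BUG_ON(sizeof(CDATDsmas) != 24);      /* assumed natural size */

    g_autofree CDATDsmas *dsmas = g_new0(CDATDsmas, 1);
    /* ... fill in *dsmas ... */

    /* explicit cast instead of abusing g_steal_pointer() for type punning */
    cdat_table[CT3_CDAT_DSMAS] = (CDATSubHeader *)g_steal_pointer(&dsmas);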




[RFC PATCH v6 10/23] hw/arm/virt: Wire NMI and VNMI irq lines from GIC to CPU

2024-03-04 Thread Jinjie Ruan via
Wire the new NMI and VNMI interrupt line from the GIC to each CPU.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v4:
- Add Reviewed-by.
v3:
- Also add VNMI wire.
---
 hw/arm/virt.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 0af1943697..2d4a187fd5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -804,7 +804,8 @@ static void create_gic(VirtMachineState *vms, MemoryRegion 
*mem)
 
 /* Wire the outputs from each CPU's generic timer and the GICv3
  * maintenance interrupt signal to the appropriate GIC PPI inputs,
- * and the GIC's IRQ/FIQ/VIRQ/VFIQ interrupt outputs to the CPU's inputs.
+ * and the GIC's IRQ/FIQ/VIRQ/VFIQ/NMI/VNMI interrupt outputs to the
+ * CPU's inputs.
  */
 for (i = 0; i < smp_cpus; i++) {
 DeviceState *cpudev = DEVICE(qemu_get_cpu(i));
@@ -848,6 +849,10 @@ static void create_gic(VirtMachineState *vms, MemoryRegion 
*mem)
qdev_get_gpio_in(cpudev, ARM_CPU_VIRQ));
 sysbus_connect_irq(gicbusdev, i + 3 * smp_cpus,
qdev_get_gpio_in(cpudev, ARM_CPU_VFIQ));
+sysbus_connect_irq(gicbusdev, i + 4 * smp_cpus,
+   qdev_get_gpio_in(cpudev, ARM_CPU_NMI));
+sysbus_connect_irq(gicbusdev, i + 5 * smp_cpus,
+   qdev_get_gpio_in(cpudev, ARM_CPU_VNMI));
 }
 
 fdt_add_gic_node(vms);
-- 
2.34.1




[RFC PATCH v6 22/23] target/arm: Add FEAT_NMI to max

2024-03-04 Thread Jinjie Ruan via
Enable FEAT_NMI on the 'max' CPU.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v3:
- Add Reviewed-by.
- Sorted to last.
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/tcg/cpu64.c| 1 +
 2 files changed, 2 insertions(+)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index f67aea2d83..91baf7ad69 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -63,6 +63,7 @@ the following architecture extensions:
 - FEAT_MTE (Memory Tagging Extension)
 - FEAT_MTE2 (Memory Tagging Extension)
 - FEAT_MTE3 (MTE Asymmetric Fault Handling)
+- FEAT_NMI (Non-maskable Interrupt)
 - FEAT_NV (Nested Virtualization)
 - FEAT_NV2 (Enhanced nested virtualization support)
 - FEAT_PACIMP (Pointer authentication - IMPLEMENTATION DEFINED algorithm)
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 5fba2c0f04..60f0dcd799 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -1175,6 +1175,7 @@ void aarch64_max_tcg_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64PFR1, RAS_FRAC, 0);  /* FEAT_RASv1p1 + 
FEAT_DoubleFault */
 t = FIELD_DP64(t, ID_AA64PFR1, SME, 1);   /* FEAT_SME */
 t = FIELD_DP64(t, ID_AA64PFR1, CSV2_FRAC, 0); /* FEAT_CSV2_2 */
+t = FIELD_DP64(t, ID_AA64PFR1, NMI, 1);   /* FEAT_NMI */
 cpu->isar.id_aa64pfr1 = t;
 
 t = cpu->isar.id_aa64mmfr0;
-- 
2.34.1




[RFC PATCH v6 21/23] hw/intc/arm_gicv3: Report the VNMI interrupt

2024-03-04 Thread Jinjie Ruan via
In vCPU Interface, if the vIRQ has the superpriority property, report
vNMI to the corresponding vPE.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v6:
- Add Reviewed-by.
---
 hw/intc/arm_gicv3_cpuif.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index 483b1bc4a3..f55e8fd277 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -465,6 +465,7 @@ void gicv3_cpuif_virt_irq_fiq_update(GICv3CPUState *cs)
 int idx;
 int irqlevel = 0;
 int fiqlevel = 0;
+int nmilevel = 0;
 
 idx = hppvi_index(cs);
 trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx,
@@ -482,9 +483,17 @@ void gicv3_cpuif_virt_irq_fiq_update(GICv3CPUState *cs)
 uint64_t lr = cs->ich_lr_el2[idx];
 
 if (icv_hppi_can_preempt(cs, lr)) {
-/* Virtual interrupts are simple: G0 are always FIQ, and G1 IRQ */
+/*
+ * Virtual interrupts are simple: G0 are always FIQ, and G1 are
+ * IRQ or NMI which depends on the ICH_LR_EL2.NMI to have
+ * non-maskable property.
+ */
 if (lr & ICH_LR_EL2_GROUP) {
-irqlevel = 1;
+if (cs->gic->nmi_support && (lr & ICH_LR_EL2_NMI)) {
+nmilevel = 1;
+} else {
+irqlevel = 1;
+}
 } else {
 fiqlevel = 1;
 }
@@ -494,6 +503,7 @@ void gicv3_cpuif_virt_irq_fiq_update(GICv3CPUState *cs)
 trace_gicv3_cpuif_virt_set_irqs(gicv3_redist_affid(cs), fiqlevel, 
irqlevel);
 qemu_set_irq(cs->parent_vfiq, fiqlevel);
 qemu_set_irq(cs->parent_virq, irqlevel);
+qemu_set_irq(cs->parent_vnmi, nmilevel);
 }
 
 static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
-- 
2.34.1




Re: [PATCH v3 01/26] s390/stattrib: Add Error** argument to set_migrationmode() handler

2024-03-04 Thread Cédric Le Goater

On 3/4/24 21:49, Fabiano Rosas wrote:

Cédric Le Goater  writes:


This will prepare the ground for future changes adding an Error** argument
to the save_setup() handler. We need to make sure that on failure,
set_migrationmode() always sets a new error. See the Rules section in
qapi/error.h.

Cc: Halil Pasic 
Cc: Christian Borntraeger 
Cc: Thomas Huth 
Signed-off-by: Cédric Le Goater 
---
  include/hw/s390x/storage-attributes.h |  2 +-
  hw/s390x/s390-stattrib-kvm.c  | 12 ++--
  hw/s390x/s390-stattrib.c  | 14 +-
  3 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/hw/s390x/storage-attributes.h 
b/include/hw/s390x/storage-attributes.h
index 
5239eb538c1b087797867a247abfc14551af6a4d..8921a04d514bf64a3113255ee10ed33fc598ae06
 100644
--- a/include/hw/s390x/storage-attributes.h
+++ b/include/hw/s390x/storage-attributes.h
@@ -39,7 +39,7 @@ struct S390StAttribClass {
  int (*set_stattr)(S390StAttribState *sa, uint64_t start_gfn,
uint32_t count, uint8_t *values);
  void (*synchronize)(S390StAttribState *sa);
-int (*set_migrationmode)(S390StAttribState *sa, bool value);
+int (*set_migrationmode)(S390StAttribState *sa, bool value, Error **errp);
  int (*get_active)(S390StAttribState *sa);
  long long (*get_dirtycount)(S390StAttribState *sa);
  };
diff --git a/hw/s390x/s390-stattrib-kvm.c b/hw/s390x/s390-stattrib-kvm.c
index 
24cd01382e2d74d62c2d7e980eb6aca1077d893d..357cea2c987213b867c81b0e258f7d0c293fe665
 100644
--- a/hw/s390x/s390-stattrib-kvm.c
+++ b/hw/s390x/s390-stattrib-kvm.c
@@ -17,6 +17,7 @@
  #include "sysemu/kvm.h"
  #include "exec/ram_addr.h"
  #include "kvm/kvm_s390x.h"
+#include "qapi/error.h"
  
  Object *kvm_s390_stattrib_create(void)

  {
@@ -137,14 +138,21 @@ static void 
kvm_s390_stattrib_synchronize(S390StAttribState *sa)
  }
  }
  
-static int kvm_s390_stattrib_set_migrationmode(S390StAttribState *sa, bool val)

+static int kvm_s390_stattrib_set_migrationmode(S390StAttribState *sa, bool val,
+   Error **errp)
  {
  struct kvm_device_attr attr = {
  .group = KVM_S390_VM_MIGRATION,
  .attr = val,
  .addr = 0,
  };
-return kvm_vm_ioctl(kvm_state, KVM_SET_DEVICE_ATTR, &attr);
+int r;
+
+r = kvm_vm_ioctl(kvm_state, KVM_SET_DEVICE_ATTR, &attr);
+if (r) {
+error_setg_errno(errp, -r, "KVM_S390_SET_CMMA_BITS failed");


Did you mean KVM_SET_DEVICE_ATTR?


Drat. Copy paste :)


Thanks,

C.






+}
+return r;
  }
  
  static long long kvm_s390_stattrib_get_dirtycount(S390StAttribState *sa)

diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 
c483b62a9b5f71772639fc180bdad15ecb6711cb..e99de190332a82363b1388bbc450013149295bc0
 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -60,11 +60,12 @@ void hmp_migrationmode(Monitor *mon, const QDict *qdict)
  S390StAttribState *sas = s390_get_stattrib_device();
  S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
  uint64_t what = qdict_get_int(qdict, "mode");
+Error *local_err = NULL;
  int r;
  
-r = sac->set_migrationmode(sas, what);

+r = sac->set_migrationmode(sas, what, &local_err);
  if (r < 0) {
-monitor_printf(mon, "Error: %s", strerror(-r));
+monitor_printf(mon, "Error: %s", error_get_pretty(local_err));
  }
  }
  
@@ -170,13 +171,15 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)

  {
  S390StAttribState *sas = S390_STATTRIB(opaque);
  S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
+Error *local_err = NULL;
  int res;
  /*
   * Signal that we want to start a migration, thus needing PGSTE dirty
   * tracking.
   */
-res = sac->set_migrationmode(sas, 1);
+res = sac->set_migrationmode(sas, true, &local_err);
  if (res) {
+error_report_err(local_err);
  return res;
  }
  qemu_put_be64(f, STATTR_FLAG_EOS);
@@ -260,7 +263,7 @@ static void cmma_save_cleanup(void *opaque)
  {
  S390StAttribState *sas = S390_STATTRIB(opaque);
  S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
-sac->set_migrationmode(sas, 0);
+sac->set_migrationmode(sas, false, NULL);
  }
  
  static bool cmma_active(void *opaque)

@@ -293,7 +296,8 @@ static long long 
qemu_s390_get_dirtycount_stub(S390StAttribState *sa)
  {
  return 0;
  }
-static int qemu_s390_set_migrationmode_stub(S390StAttribState *sa, bool value)
+static int qemu_s390_set_migrationmode_stub(S390StAttribState *sa, bool value,
+Error **errp)
  {
  return 0;
  }







[RFC PATCH v6 17/23] hw/intc/arm_gicv3: Add NMI handling CPU interface registers

2024-03-04 Thread Jinjie Ruan via
Add the NMIAR CPU interface registers which deal with acknowledging NMI.

When the NMI interrupt is introduced, there are some updates to the semantics
of the ICC_IAR1_EL1 and ICC_HPPIR1_EL1 registers. The ICC_IAR1_EL1 register
should return 1022 if the intid has superpriority, and the ICC_NMIAR1_EL1
register should return 1023 if the intid does not have superpriority.
However, these changes are not necessary for the ICC_HPPIR1_EL1 register.
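
For reference, a minimal sketch of the acknowledge-time selection described
above, assuming the architectural special INTIDs 1022 (NMI) and 1023
(spurious); the helper and parameter names are illustrative only, not the
QEMU functions:

    #include <stdbool.h>
    #include <stdint.h>

    #define INTID_NMI       1022  /* special INTID: a pending NMI          */
    #define INTID_SPURIOUS  1023  /* special INTID: nothing to acknowledge */

    /* Illustrative: which INTID a Group 1 acknowledge returns when the
     * highest priority pending interrupt may have superpriority and
     * SCTLR_ELx.NMI is set. */
    static uint64_t ack_group1(bool reading_nmiar1, bool hppi_has_superprio,
                               bool sctlr_nmi, uint64_t hppi_intid)
    {
        if (reading_nmiar1) {
            /* ICC_NMIAR1_EL1 only acknowledges interrupts with superpriority */
            return hppi_has_superprio ? hppi_intid : INTID_SPURIOUS;
        }
        /* ICC_IAR1_EL1 reports a pending interrupt with superpriority as 1022 */
        if (hppi_has_superprio && sctlr_nmi) {
            return INTID_NMI;
        }
        return hppi_intid;
    }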

Signed-off-by: Jinjie Ruan 
---
v4:
- Define ICC_NMIAR1_EL1 only if FEAT_GICv3_NMI is implemented.
- Check SCTLR_ELx.NMI to return 1022 for icc_iar1_read().
- Add gicv3_icc_nmiar1_read() trace event.
- Do not check icc_hppi_can_preempt() for icc_nmiar1_read().
- Add icv_nmiar1_read() and call it when EL2Enabled() and HCR_EL2.IMO == '1'

Signed-off-by: Jinjie Ruan 
---
 hw/intc/arm_gicv3_cpuif.c | 59 +--
 hw/intc/gicv3_internal.h  |  1 +
 hw/intc/trace-events  |  1 +
 3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index e1a60d8c15..df82a413c6 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -795,6 +795,13 @@ static uint64_t icv_iar_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 return intid;
 }
 
+static uint64_t icv_nmiar1_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+/* todo */
+uint64_t intid = INTID_SPURIOUS;
+return intid;
+}
+
 static uint32_t icc_fullprio_mask(GICv3CPUState *cs)
 {
 /*
@@ -1097,7 +1104,8 @@ static uint64_t icc_hppir0_value(GICv3CPUState *cs, 
CPUARMState *env)
 return cs->hppi.irq;
 }
 
-static uint64_t icc_hppir1_value(GICv3CPUState *cs, CPUARMState *env)
+static uint64_t icc_hppir1_value(GICv3CPUState *cs, CPUARMState *env,
+ bool is_nmi, bool is_hppi)
 {
 /* Return the highest priority pending interrupt register value
  * for group 1.
@@ -1108,6 +1116,19 @@ static uint64_t icc_hppir1_value(GICv3CPUState *cs, 
CPUARMState *env)
 return INTID_SPURIOUS;
 }
 
+if (!is_hppi) {
+int el = arm_current_el(env);
+
+if (is_nmi && (!cs->hppi.superprio)) {
+return INTID_SPURIOUS;
+}
+
+if ((!is_nmi) && cs->hppi.superprio
+&& env->cp15.sctlr_el[el] & SCTLR_NMI) {
+return INTID_NMI;
+}
+}
+
 /* Check whether we can return the interrupt or if we should return
  * a special identifier, as per the CheckGroup1ForSpecialIdentifiers
  * pseudocode. (We can simplify a little because for us ICC_SRE_EL1.RM
@@ -1168,7 +1189,7 @@ static uint64_t icc_iar1_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 if (!icc_hppi_can_preempt(cs)) {
 intid = INTID_SPURIOUS;
 } else {
-intid = icc_hppir1_value(cs, env);
+intid = icc_hppir1_value(cs, env, false, false);
 }
 
 if (!gicv3_intid_is_special(intid)) {
@@ -1179,6 +1200,25 @@ static uint64_t icc_iar1_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 return intid;
 }
 
+static uint64_t icc_nmiar1_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+GICv3CPUState *cs = icc_cs_from_env(env);
+uint64_t intid;
+
+if (icv_access(env, HCR_IMO)) {
+return icv_nmiar1_read(env, ri);
+}
+
+intid = icc_hppir1_value(cs, env, true, false);
+
+if (!gicv3_intid_is_special(intid)) {
+icc_activate_irq(cs, intid);
+}
+
+trace_gicv3_icc_nmiar1_read(gicv3_redist_affid(cs), intid);
+return intid;
+}
+
 static void icc_drop_prio(GICv3CPUState *cs, int grp)
 {
 /* Drop the priority of the currently active interrupt in
@@ -1555,7 +1595,7 @@ static uint64_t icc_hppir1_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 return icv_hppir_read(env, ri);
 }
 
-value = icc_hppir1_value(cs, env);
+value = icc_hppir1_value(cs, env, false, true);
 trace_gicv3_icc_hppir1_read(gicv3_redist_affid(cs), value);
 return value;
 }
@@ -2482,6 +2522,15 @@ static const ARMCPRegInfo 
gicv3_cpuif_icc_apxr23_reginfo[] = {
 },
 };
 
+static const ARMCPRegInfo gicv3_cpuif_gicv3_nmi_reginfo[] = {
+{ .name = "ICC_NMIAR1_EL1", .state = ARM_CP_STATE_BOTH,
+  .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 9, .opc2 = 5,
+  .type = ARM_CP_IO | ARM_CP_NO_RAW,
+  .access = PL1_R, .accessfn = gicv3_irq_access,
+  .readfn = icc_nmiar1_read,
+},
+};
+
 static uint64_t ich_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
 GICv3CPUState *cs = icc_cs_from_env(env);
@@ -2838,6 +2887,10 @@ void gicv3_init_cpuif(GICv3State *s)
  */
 define_arm_cp_regs(cpu, gicv3_cpuif_reginfo);
 
+if (s->nmi_support) {
+define_arm_cp_regs(cpu, gicv3_cpuif_gicv3_nmi_reginfo);
+}
+
 /*
  * The CPU implementation specifies the number of supported
  * bits of physical priority. For backwards compatibility
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index 

Re: [External] Re: [PATCH v1 0/1] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-04 Thread Ho-Ren (Jack) Chuang
On Mon, Mar 4, 2024 at 10:36 PM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > On Sun, Mar 3, 2024 at 6:47 PM Huang, Ying  wrote:
> >>
> >> "Ho-Ren (Jack) Chuang"  writes:
> >>
> >> > The memory tiering component in the kernel is functionally useless for
> >> > CPUless memory/non-DRAM devices like CXL1.1 type3 memory because the 
> >> > nodes
> >> > are lumped together in the DRAM tier.
> >> > https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/
> >>
> >> I think that it's unfair to call it "useless".  Yes, it doesn't work if
> >> the CXL memory device are not enumerate via drivers/dax/kmem.c.  So,
> >> please be specific about in which cases it doesn't work instead of too
> >> general "useless".
> >>
> >
> > Thank you and I didn't mean anything specific. I simply reused phrases
> > we discussed
> > earlier in the previous patchset. I will change them to the following in v2:
> > "At boot time, current memory tiering assigns all detected memory nodes
> > to the same DRAM tier. This results in CPUless memory/non-DRAM devices,
> > such as CXL1.1 type3 memory, being unable to be assigned to the
> > correct memory tier,
> > leading to the inability to migrate pages between different types of 
> > memory."
> >
> > Please see if this looks more specific.
>
> I don't think that the description above is accurate.  In fact, there
> are 2 ways to enumerate the memory device,
>
> 1. Mark it as reserved memory (E820_TYPE_SOFT_RESERVED, etc.) in E820
>table or something similar.
>
> 2. Mark it as normal memory (E820_TYPE_RAM) in E820 table or something
>similar
>
> For 1, the memory device (including CXL memory) is onlined via
> drivers/dax/kmem.c, so will be put in proper memory tiers.  For 2, the
> memory device is indistinguishable from normal DRAM with the current
> implementation.  And this is what this patch is working on.
>
> Right?

Good point! How about this?:
"
When a memory device, such as CXL1.1 type3 memory, is emulated as
normal memory (E820_TYPE_RAM), the memory device is indistinguishable
from normal DRAM in terms of memory tiering with the current implementation.
The current memory tiering assigns all detected normal memory nodes
to the same DRAM tier. This results in normal memory devices with
different attributes being unable to be assigned to the correct memory tier,
leading to the inability to migrate pages between different types of memory.
"

--
Best regards,
Ho-Ren (Jack) Chuang
莊賀任



[RFC PATCH v6 11/23] hw/intc/arm_gicv3: Add external IRQ lines for NMI

2024-03-04 Thread Jinjie Ruan via
Augment the GICv3's QOM device interface by adding two new sets of
sysbus IRQ lines, to signal NMI and VNMI to each CPU.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v4:
- Add Reviewed-by.
v3:
- Add support for VNMI.
---
 hw/intc/arm_gicv3_common.c | 6 ++
 include/hw/intc/arm_gic_common.h   | 2 ++
 include/hw/intc/arm_gicv3_common.h | 2 ++
 3 files changed, 10 insertions(+)

diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index cb55c72681..c52f060026 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -299,6 +299,12 @@ void gicv3_init_irqs_and_mmio(GICv3State *s, 
qemu_irq_handler handler,
 for (i = 0; i < s->num_cpu; i++) {
 sysbus_init_irq(sbd, &s->cpu[i].parent_vfiq);
 }
+for (i = 0; i < s->num_cpu; i++) {
+sysbus_init_irq(sbd, &s->cpu[i].parent_nmi);
+}
+for (i = 0; i < s->num_cpu; i++) {
+sysbus_init_irq(sbd, &s->cpu[i].parent_vnmi);
+}
 
 memory_region_init_io(&s->iomem_dist, OBJECT(s), ops, s,
                       "gicv3_dist", 0x10000);
diff --git a/include/hw/intc/arm_gic_common.h b/include/hw/intc/arm_gic_common.h
index 7080375008..97fea4102d 100644
--- a/include/hw/intc/arm_gic_common.h
+++ b/include/hw/intc/arm_gic_common.h
@@ -71,6 +71,8 @@ struct GICState {
 qemu_irq parent_fiq[GIC_NCPU];
 qemu_irq parent_virq[GIC_NCPU];
 qemu_irq parent_vfiq[GIC_NCPU];
+qemu_irq parent_nmi[GIC_NCPU];
+qemu_irq parent_vnmi[GIC_NCPU];
 qemu_irq maintenance_irq[GIC_NCPU];
 
 /* GICD_CTLR; for a GIC with the security extensions the NS banked version
diff --git a/include/hw/intc/arm_gicv3_common.h 
b/include/hw/intc/arm_gicv3_common.h
index 4e2fb518e7..7324c7d983 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -155,6 +155,8 @@ struct GICv3CPUState {
 qemu_irq parent_fiq;
 qemu_irq parent_virq;
 qemu_irq parent_vfiq;
+qemu_irq parent_nmi;
+qemu_irq parent_vnmi;
 
 /* Redistributor */
 uint32_t level;  /* Current IRQ level */
-- 
2.34.1




[RFC PATCH v6 07/23] target/arm: Add support for NMI in arm_phys_excp_target_el()

2024-03-04 Thread Jinjie Ruan via
According to Arm GIC section 4.6.3 Interrupt superpriority, the interrupt
with superpriority is always IRQ, never FIQ, so handle NMI the same as IRQ in
arm_phys_excp_target_el().

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v4:
- Add Reviewed-by.
v3:
- Remove nmi_is_irq flag in CPUARMState.
- Handle NMI same as IRQ in arm_phys_excp_target_el().
---
 target/arm/helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 4b4c8e279d..7cdc90e9e3 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -10570,6 +10570,7 @@ uint32_t arm_phys_excp_target_el(CPUState *cs, uint32_t 
excp_idx,
 hcr_el2 = arm_hcr_el2_eff(env);
 switch (excp_idx) {
 case EXCP_IRQ:
+case EXCP_NMI:
 scr = ((env->cp15.scr_el3 & SCR_IRQ) == SCR_IRQ);
 hcr = hcr_el2 & HCR_IMO;
 break;
-- 
2.34.1




[RFC PATCH v6 12/23] target/arm: Handle NMI in arm_cpu_do_interrupt_aarch64()

2024-03-04 Thread Jinjie Ruan via
According to Arm GIC section 4.6.3 Interrupt superpriority, the interrupt
with superpriority is always IRQ, never FIQ, so the NMI exception trap entry
behaves like IRQ. A VNMI (vIRQ with superpriority) can be raised from the
GIC or come from the HCRX_EL2.VINMI bit.

Signed-off-by: Jinjie Ruan 
---
v6:
- Not combine VFNMI with CPU_INTERRUPT_VNMI.
v4:
- Also handle VNMI in arm_cpu_do_interrupt_aarch64().
v3:
- Remove the FIQ NMI handle.
---
 target/arm/helper.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index c5af859c35..e6d5326c92 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11460,6 +11460,8 @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
 break;
 case EXCP_IRQ:
 case EXCP_VIRQ:
+case EXCP_NMI:
+case EXCP_VNMI:
 addr += 0x80;
 break;
 case EXCP_FIQ:
-- 
2.34.1




Re: [PATCH v7 2/2] hw/acpi: Implement the SRAT GI affinity structure

2024-03-04 Thread Cédric Le Goater

On 3/5/24 06:59, Ankit Agrawal wrote:

One thing I forgot.

Please add a test.  tests/qtest/bios-tables-test.c
+ relevant table dumps.


Here I need to add a test that creates a vfio-pci device and numa
nodes and links them using the acpi-generic-initiator object. One thing
here is that the -device vfio-pci needs a host= argument. I
probably cannot provide the device bdf from my local setup. So
I am not sure how I can add this test to tests/qtest/bios-tables-test.c.
FYI, the following is a sample args we use for the
acpi-generic-initiator object.

-numa node,nodeid=2
-device vfio-pci-nohotplug,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \

Moreover, based on a quick grep, I don't see any other test that
has a -device vfio-pci argument.

Jonathan, Alex, do you know how we may add tests that is dependent
on the vfio-pci device?


There are none.

This would require a host device always available for passthrough and
there is no simple solution for this problem. Such tests would need to
run in a nested environment under avocado: a pc/virt machine with an
igb device and use the PF and/or VFs to check device assignment in
nested guests.

PPC just introduced new tests to check nested guest support on two
different HV implementations. If you have time, please take a look
at tests/avocado/ppc_hv_tests.py for the framework.

I will try to propose a new test when I am done with the reviews,
not before 9.0 soft freeze though.

Thanks,

C.





[RFC PATCH v6 01/23] target/arm: Handle HCR_EL2 accesses for bits introduced with FEAT_NMI

2024-03-04 Thread Jinjie Ruan via
FEAT_NMI defines another three new bits in HCRX_EL2: TALLINT, HCRX_VINMI and
HCRX_VFNMI. When the feature is enabled, allow these bits to be written in
HCRX_EL2.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v4:
- Update the comment for FEAT_NMI in hcrx_write().
- Update the commit message, s/thress/three/g.
v3:
- Add Reviewed-by.
- Add HCRX_VINMI and HCRX_VFNMI support in HCRX_EL2.
- Update the commit message.
---
 target/arm/cpu-features.h | 5 +
 target/arm/helper.c   | 5 +
 2 files changed, 10 insertions(+)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index 7567854db6..2ad1179be7 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -681,6 +681,11 @@ static inline bool isar_feature_aa64_sme(const 
ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, SME) != 0;
 }
 
+static inline bool isar_feature_aa64_nmi(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, NMI) != 0;
+}
+
 static inline bool isar_feature_aa64_tgran4_lpa2(const ARMISARegisters *id)
 {
 return FIELD_SEX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN4) >= 1;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 90c4fb72ce..affa493141 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6056,6 +6056,11 @@ static void hcrx_write(CPUARMState *env, const 
ARMCPRegInfo *ri,
 valid_mask |= HCRX_MSCEN | HCRX_MCE2;
 }
 
+/* FEAT_NMI adds TALLINT, VINMI and VFNMI */
+if (cpu_isar_feature(aa64_nmi, env_archcpu(env))) {
+valid_mask |= HCRX_TALLINT | HCRX_VINMI | HCRX_VFNMI;
+}
+
 /* Clear RES0 bits.  */
 env->cp15.hcrx_el2 = value & valid_mask;
 }
-- 
2.34.1




[RFC PATCH v6 23/23] hw/arm/virt: Add FEAT_GICv3_NMI feature support in virt GIC

2024-03-04 Thread Jinjie Ruan via
A PE that implements FEAT_NMI and FEAT_GICv3 also implements
FEAT_GICv3_NMI. A PE that does not implement FEAT_NMI does not implement
FEAT_GICv3_NMI.

So include support for the FEAT_GICv3_NMI feature as part of virt platform
GIC initialization if FEAT_NMI and FEAT_GICv3 are supported.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v4:
- Add Reviewed-by.
v3:
- Adjust to be the last after add FEAT_NMI to max.
- Check whether support FEAT_NMI and FEAT_GICv3 for FEAT_GICv3_NMI.
---
 hw/arm/virt.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 2d4a187fd5..c12307ccd9 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -712,6 +712,19 @@ static void create_v2m(VirtMachineState *vms)
 vms->msi_controller = VIRT_MSI_CTRL_GICV2M;
 }
 
+/*
+ * A PE that implements FEAT_NMI and FEAT_GICv3 also implements
+ * FEAT_GICv3_NMI. A PE that does not implement FEAT_NMI, does not implement
+ * FEAT_GICv3_NMI.
+ */
+static bool gicv3_nmi_present(VirtMachineState *vms)
+{
+ARMCPU *cpu = ARM_CPU(qemu_get_cpu(0));
+
+return cpu_isar_feature(aa64_nmi, cpu) &&
+   (vms->gic_version != VIRT_GIC_VERSION_2);
+}
+
 static void create_gic(VirtMachineState *vms, MemoryRegion *mem)
 {
 MachineState *ms = MACHINE(vms);
@@ -785,6 +798,11 @@ static void create_gic(VirtMachineState *vms, MemoryRegion 
*mem)
   vms->virt);
 }
 }
+
+if (gicv3_nmi_present(vms)) {
+qdev_prop_set_bit(vms->gic, "has-nmi", true);
+}
+
 gicbusdev = SYS_BUS_DEVICE(vms->gic);
 sysbus_realize_and_unref(gicbusdev, _fatal);
 sysbus_mmio_map(gicbusdev, 0, vms->memmap[VIRT_GIC_DIST].base);
-- 
2.34.1




[RFC PATCH v6 18/23] hw/intc/arm_gicv3: Handle icv_nmiar1_read() for icc_nmiar1_read()

2024-03-04 Thread Jinjie Ruan via
Implement icv_nmiar1_read() for icc_nmiar1_read(), and add definitions for
the ICH_LR_EL2.NMI and ICH_AP1R_EL2.NMI bits.

If FEAT_GICv3_NMI is supported, ich_ap_write() should consider the
ICH_AP1R_EL2.NMI bit. In icv_activate_irq() and icv_eoir_write(), the
ICH_AP1R_EL2.NMI bit should be set or cleared according to the
superpriority information.

Also add the gicv3_icv_nmiar1_read trace event.

Signed-off-by: Jinjie Ruan 
---
v6:
- Implement icv_nmiar1_read().
---
 hw/intc/arm_gicv3_cpuif.c | 50 ++-
 hw/intc/gicv3_internal.h  |  3 +++
 hw/intc/trace-events  |  1 +
 3 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index df82a413c6..9a7d089dea 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -728,7 +728,7 @@ static uint64_t icv_hppir_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 return value;
 }
 
-static void icv_activate_irq(GICv3CPUState *cs, int idx, int grp)
+static void icv_activate_irq(GICv3CPUState *cs, int idx, int grp, bool nmi)
 {
 /* Activate the interrupt in the specified list register
  * by moving it from Pending to Active state, and update the
@@ -742,7 +742,12 @@ static void icv_activate_irq(GICv3CPUState *cs, int idx, 
int grp)
 
 cs->ich_lr_el2[idx] &= ~ICH_LR_EL2_STATE_PENDING_BIT;
 cs->ich_lr_el2[idx] |= ICH_LR_EL2_STATE_ACTIVE_BIT;
-cs->ich_apr[grp][regno] |= (1 << regbit);
+
+if (cs->gic->nmi_support) {
+cs->ich_apr[grp][regno] |= (1 << regbit) | (nmi ? ICH_AP1R_EL2_NMI : 
0);
+} else {
+cs->ich_apr[grp][regno] |= (1 << regbit);
+}
 }
 
 static void icv_activate_vlpi(GICv3CPUState *cs)
@@ -775,8 +780,8 @@ static uint64_t icv_iar_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 
 if (thisgrp == grp && icv_hppi_can_preempt(cs, lr)) {
 intid = ich_lr_vintid(lr);
-if (!gicv3_intid_is_special(intid)) {
-icv_activate_irq(cs, idx, grp);
+if (!gicv3_intid_is_special(intid) && !(lr & ICH_LR_EL2_NMI)) {
+icv_activate_irq(cs, idx, grp, false);
 } else {
 /* Interrupt goes from Pending to Invalid */
 cs->ich_lr_el2[idx] &= ~ICH_LR_EL2_STATE_PENDING_BIT;
@@ -797,8 +802,32 @@ static uint64_t icv_iar_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 
 static uint64_t icv_nmiar1_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
-/* todo */
+GICv3CPUState *cs = icc_cs_from_env(env);
+int idx = hppvi_index(cs);
 uint64_t intid = INTID_SPURIOUS;
+
+if (idx >= 0 && idx != HPPVI_INDEX_VLPI) {
+uint64_t lr = cs->ich_lr_el2[idx];
+int thisgrp = (lr & ICH_LR_EL2_GROUP) ? GICV3_G1NS : GICV3_G0;
+
+if ((thisgrp == GICV3_G1NS) && (lr & ICH_LR_EL2_NMI)) {
+intid = ich_lr_vintid(lr);
+if (!gicv3_intid_is_special(intid)) {
+icv_activate_irq(cs, idx, GICV3_G1NS, true);
+} else {
+/* Interrupt goes from Pending to Invalid */
+cs->ich_lr_el2[idx] &= ~ICH_LR_EL2_STATE_PENDING_BIT;
+/* We will now return the (bogus) ID from the list register,
+ * as per the pseudocode.
+ */
+}
+}
+}
+
+trace_gicv3_icv_nmiar1_read(gicv3_redist_affid(cs), intid);
+
+gicv3_cpuif_virt_update(cs);
+
 return intid;
 }
 
@@ -1403,6 +1432,11 @@ static int icv_drop_prio(GICv3CPUState *cs)
 return (apr0count + i * 32) << (icv_min_vbpr(cs) + 1);
 } else {
 *papr1 &= *papr1 - 1;
+
+if (cs->gic->nmi_support && (*papr1 & ICH_AP1R_EL2_NMI)) {
+*papr1 &= ~ICH_AP1R_EL2_NMI;
+}
+
 return (apr1count + i * 32) << (icv_min_vbpr(cs) + 1);
 }
 }
@@ -2552,7 +2586,11 @@ static void ich_ap_write(CPUARMState *env, const 
ARMCPRegInfo *ri,
 
 trace_gicv3_ich_ap_write(ri->crm & 1, regno, gicv3_redist_affid(cs), 
value);
 
-cs->ich_apr[grp][regno] = value & 0xf0000000U;
+if (cs->gic->nmi_support) {
+cs->ich_apr[grp][regno] = value & (0xf0000000U | ICH_AP1R_EL2_NMI);
+} else {
+cs->ich_apr[grp][regno] = value & 0xf0000000U;
+}
 gicv3_cpuif_virt_irq_fiq_update(cs);
 }
 
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index 93e56b3726..5e2b32861d 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -242,6 +242,7 @@ FIELD(GICR_VPENDBASER, VALID, 63, 1)
 #define ICH_LR_EL2_PRIORITY_SHIFT 48
 #define ICH_LR_EL2_PRIORITY_LENGTH 8
 #define ICH_LR_EL2_PRIORITY_MASK (0xffULL << ICH_LR_EL2_PRIORITY_SHIFT)
+#define ICH_LR_EL2_NMI (1ULL << 59)
 #define ICH_LR_EL2_GROUP (1ULL << 60)
 #define ICH_LR_EL2_HW (1ULL << 61)
 #define ICH_LR_EL2_STATE_SHIFT 62
@@ -273,6 +274,8 @@ FIELD(GICR_VPENDBASER, VALID, 63, 1)
 #define ICH_VTR_EL2_PREBITS_SHIFT 26
 #define ICH_VTR_EL2_PRIBITS_SHIFT 29
 
+#define ICH_AP1R_EL2_NMI (1ULL 

[RFC PATCH v6 04/23] target/arm: Implement ALLINT MSR (immediate)

2024-03-04 Thread Jinjie Ruan via
Add ALLINT MSR (immediate) to decodetree, in which the CRm is 0b000x. The
EL0 check is necessary for ALLINT, and the EL1 check is necessary when
imm == 1. So implement it inline for EL2/3, or EL1 with imm==0. Avoid the
unconditional write to pc and use raise_exception_ra to unwind.
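
A rough sketch of the dispatch described above; the helper names below are
made up for illustration and are not the actual decodetree or TCG helper
interfaces. Only the EL1, imm == 1 case can trap on HCRX_EL2.TALLINT, so it
is the only case that needs the out-of-line helper:

    #include <stdbool.h>

    /* Illustrative stand-ins for the real translation-time plumbing. */
    static void clear_allint_inline(void) { }
    static void set_allint_inline(void) { }
    static void call_set_allint_el1_helper(void) { } /* may raise the TALLINT trap */

    /* Illustrative dispatch for MSR ALLINT, #imm. */
    static bool handle_msr_allint_imm(int current_el, bool feat_nmi, int imm)
    {
        if (!feat_nmi || current_el == 0) {
            return false;                  /* not decoded: UNDEFINED here        */
        }
        if (imm == 0) {
            clear_allint_inline();         /* clearing never traps               */
        } else if (current_el > 1) {
            set_allint_inline();           /* EL2/EL3: no TALLINT trap possible  */
        } else {
            call_set_allint_el1_helper();  /* EL1, imm == 1: may trap to EL2     */
        }
        return true;
    }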

Signed-off-by: Jinjie Ruan 
---
v6:
- Fix DISAS_TOO_MANY to DISAS_UPDATE_EXIT and add the comment.
v5:
- Drop the & 1 in trans_MSR_i_ALLINT().
- Simplify and merge msr_i_allint() and allint_check().
- Rename msr_i_allint() to msr_set_allint_el1().
v4:
- Fix the ALLINT MSR (immediate) decodetree implementation.
- Remove arm_is_el2_enabled() check in allint_check().
- Update env->allint to env->pstate.
- Only call allint_check() when imm == 1.
- Simplify the allint_check() to not pass "op" and extract.
- Implement it inline for EL2/3, or EL1 with imm==0.
- Pass (a->imm & 1) * PSTATE_ALLINT (i64) to simplify the ALLINT set/clear.
v3:
- Remove EL0 check in allint_check().
- Add TALLINT check for EL1 in allint_check().
- Remove unnecessarily arm_rebuild_hflags() in msr_i_allint helper.
---
 target/arm/tcg/a64.decode  |  1 +
 target/arm/tcg/helper-a64.c| 12 
 target/arm/tcg/helper-a64.h|  1 +
 target/arm/tcg/translate-a64.c | 19 +++
 4 files changed, 33 insertions(+)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 8a20dce3c8..0e7656fd15 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -207,6 +207,7 @@ MSR_i_DIT   1101 0101  0 011 0100  010 1 
@msr_i
 MSR_i_TCO   1101 0101  0 011 0100  100 1 @msr_i
 MSR_i_DAIFSET   1101 0101  0 011 0100  110 1 @msr_i
 MSR_i_DAIFCLEAR 1101 0101  0 011 0100  111 1 @msr_i
+MSR_i_ALLINT1101 0101  0 001 0100 000 imm:1 000 1
 MSR_i_SVCR  1101 0101  0 011 0100 0 mask:2 imm:1 011 1
 
 # MRS, MSR (register), SYS, SYSL. These are all essentially the
diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index ebaa7f00df..7818537890 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -66,6 +66,18 @@ void HELPER(msr_i_spsel)(CPUARMState *env, uint32_t imm)
 update_spsel(env, imm);
 }
 
+void HELPER(msr_set_allint_el1)(CPUARMState *env)
+{
+/* ALLINT update to PSTATE. */
+if (arm_hcrx_el2_eff(env) & HCRX_TALLINT) {
+raise_exception_ra(env, EXCP_UDEF,
+   syn_aa64_sysregtrap(0, 1, 0, 4, 1, 0x1f, 0),
+   exception_target_el(env), GETPC());
+}
+
+env->pstate |= PSTATE_ALLINT;
+}
+
 static void daif_check(CPUARMState *env, uint32_t op,
uint32_t imm, uintptr_t ra)
 {
diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index 575a5dab7d..0518165399 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -22,6 +22,7 @@ DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_2(msr_i_spsel, void, env, i32)
 DEF_HELPER_2(msr_i_daifset, void, env, i32)
 DEF_HELPER_2(msr_i_daifclear, void, env, i32)
+DEF_HELPER_1(msr_set_allint_el1, void, env)
 DEF_HELPER_3(vfp_cmph_a64, i64, f16, f16, ptr)
 DEF_HELPER_3(vfp_cmpeh_a64, i64, f16, f16, ptr)
 DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 340265beb0..21758b290d 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -2036,6 +2036,25 @@ static bool trans_MSR_i_DAIFCLEAR(DisasContext *s, arg_i 
*a)
 return true;
 }
 
+static bool trans_MSR_i_ALLINT(DisasContext *s, arg_i *a)
+{
+if (!dc_isar_feature(aa64_nmi, s) || s->current_el == 0) {
+return false;
+}
+
+if (a->imm == 0) {
+clear_pstate_bits(PSTATE_ALLINT);
+} else if (s->current_el > 1) {
+set_pstate_bits(PSTATE_ALLINT);
+} else {
+gen_helper_msr_set_allint_el1(tcg_env);
+}
+
+/* Exit the cpu loop to re-evaluate pending IRQs. */
+s->base.is_jmp = DISAS_UPDATE_EXIT;
+return true;
+}
+
 static bool trans_MSR_i_SVCR(DisasContext *s, arg_MSR_i_SVCR *a)
 {
 if (!dc_isar_feature(aa64_sme, s) || a->mask == 0) {
-- 
2.34.1




[RFC PATCH v6 14/23] hw/intc/arm_gicv3_redist: Implement GICR_INMIR0

2024-03-04 Thread Jinjie Ruan via
Add the GICR_INMIR0 register and support access to it.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v6:
- Add Reviewed-by.
v4:
- Make the GICR_INMIR0 implementation clearer.
---
 hw/intc/arm_gicv3_redist.c | 19 +++
 hw/intc/gicv3_internal.h   |  1 +
 2 files changed, 20 insertions(+)

diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
index 8153525849..7a16a058b1 100644
--- a/hw/intc/arm_gicv3_redist.c
+++ b/hw/intc/arm_gicv3_redist.c
@@ -35,6 +35,15 @@ static int gicr_ns_access(GICv3CPUState *cs, int irq)
 return extract32(cs->gicr_nsacr, irq * 2, 2);
 }
 
+static void gicr_write_bitmap_reg(GICv3CPUState *cs, MemTxAttrs attrs,
+  uint32_t *reg, uint32_t val)
+{
+/* Helper routine to implement writing to a "set" register */
+val &= mask_group(cs, attrs);
+*reg = val;
+gicv3_redist_update(cs);
+}
+
 static void gicr_write_set_bitmap_reg(GICv3CPUState *cs, MemTxAttrs attrs,
   uint32_t *reg, uint32_t val)
 {
@@ -406,6 +415,10 @@ static MemTxResult gicr_readl(GICv3CPUState *cs, hwaddr 
offset,
 *data = value;
 return MEMTX_OK;
 }
+case GICR_INMIR0:
+*data = cs->gic->nmi_support ?
+gicr_read_bitmap_reg(cs, attrs, cs->gicr_isuperprio) : 0;
+return MEMTX_OK;
 case GICR_ICFGR0:
 case GICR_ICFGR1:
 {
@@ -555,6 +568,12 @@ static MemTxResult gicr_writel(GICv3CPUState *cs, hwaddr 
offset,
 gicv3_redist_update(cs);
 return MEMTX_OK;
 }
+case GICR_INMIR0:
+if (cs->gic->nmi_support) {
+gicr_write_bitmap_reg(cs, attrs, &cs->gicr_isuperprio, value);
+}
+return MEMTX_OK;
+
 case GICR_ICFGR0:
 /* Register is all RAZ/WI or RAO/WI bits */
 return MEMTX_OK;
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index 29d5cdc1b6..f35b7d2f03 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -109,6 +109,7 @@
 #define GICR_ICFGR1   (GICR_SGI_OFFSET + 0x0C04)
 #define GICR_IGRPMODR0(GICR_SGI_OFFSET + 0x0D00)
 #define GICR_NSACR(GICR_SGI_OFFSET + 0x0E00)
+#define GICR_INMIR0   (GICR_SGI_OFFSET + 0x0F80)
 
 /* VLPI redistributor registers, offsets from VLPI_base */
 #define GICR_VPROPBASER   (GICR_VLPI_OFFSET + 0x70)
-- 
2.34.1




[RFC PATCH v6 08/23] target/arm: Handle IS/FS in ISR_EL1 for NMI

2024-03-04 Thread Jinjie Ruan via
Add the IS and FS bits in ISR_EL1 and handle the read. With CPU_INTERRUPT_NMI or
CPU_INTERRUPT_VNMI, both CPSR_I and ISR_IS must be set. With
CPU_INTERRUPT_VFIQ and HCRX_EL2.VFNMI set, both CPSR_F and ISR_FS must be set.
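
A condensed sketch of the mapping described above; the bit values for CPSR_I
and CPSR_F are the usual PSTATE positions, and the helper itself is
illustrative rather than the QEMU read function:

    #include <stdbool.h>
    #include <stdint.h>

    #define CPSR_F (1U << 6)
    #define CPSR_I (1U << 7)
    #define ISR_FS (1U << 9)
    #define ISR_IS (1U << 10)

    /* Illustrative: ISR_EL1 bits derived from the pending interrupt state. */
    static uint32_t isr_bits(bool nmi_or_vnmi_pending, bool vfiq_pending,
                             bool vfiq_has_superprio)
    {
        uint32_t ret = 0;

        if (nmi_or_vnmi_pending) {
            ret |= CPSR_I | ISR_IS;   /* NMI/VNMI: report I together with IS */
        }
        if (vfiq_pending) {
            ret |= CPSR_F;
            if (vfiq_has_superprio) {
                ret |= ISR_FS;        /* VFIQ with superpriority: also set FS */
            }
        }
        return ret;
    }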

Signed-off-by: Jinjie Ruan 
---
v6:
- Verify that HCR_EL2.VF is set before checking VFNMI.
v4;
- Also handle VNMI.
v3:
- CPU_INTERRUPT_NMI do not set FIQ, so remove it.
- With CPU_INTERRUPT_NMI, both CPSR_I and ISR_IS must be set.
---
 target/arm/cpu.h|  2 ++
 target/arm/helper.c | 14 ++
 2 files changed, 16 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index de9446c68c..97f276559f 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1467,6 +1467,8 @@ FIELD(CPTR_EL3, TCPAC, 31, 1)
 #define CPSR_N (1U << 31)
 #define CPSR_NZCV (CPSR_N | CPSR_Z | CPSR_C | CPSR_V)
 #define CPSR_AIF (CPSR_A | CPSR_I | CPSR_F)
+#define ISR_FS (1U << 9)
+#define ISR_IS (1U << 10)
 
 #define CPSR_IT (CPSR_IT_0_1 | CPSR_IT_2_7)
 #define CACHED_CPSR_BITS (CPSR_T | CPSR_AIF | CPSR_GE | CPSR_IT | CPSR_Q \
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 7cdc90e9e3..7e73edfde3 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -2018,15 +2018,29 @@ static uint64_t isr_read(CPUARMState *env, const 
ARMCPRegInfo *ri)
 if (cs->interrupt_request & CPU_INTERRUPT_VIRQ) {
 ret |= CPSR_I;
 }
+if (cs->interrupt_request & CPU_INTERRUPT_VNMI) {
+ret |= ISR_IS;
+ret |= CPSR_I;
+}
 } else {
 if (cs->interrupt_request & CPU_INTERRUPT_HARD) {
 ret |= CPSR_I;
 }
+
+if (cs->interrupt_request & CPU_INTERRUPT_NMI) {
+ret |= ISR_IS;
+ret |= CPSR_I;
+}
 }
 
 if (hcr_el2 & HCR_FMO) {
 if (cs->interrupt_request & CPU_INTERRUPT_VFIQ) {
 ret |= CPSR_F;
+
+if ((arm_hcr_el2_eff(env) & HCR_VF) &&
+(env->cp15.hcrx_el2 & HCRX_VFNMI)) {
+ret |= ISR_FS;
+}
 }
 } else {
 if (cs->interrupt_request & CPU_INTERRUPT_FIQ) {
-- 
2.34.1




[RFC PATCH v6 15/23] hw/intc/arm_gicv3: Implement GICD_INMIR

2024-03-04 Thread Jinjie Ruan via
Add the GICD_INMIR and GICD_INMIRnE registers and support access to GICD_INMIR.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v4:
- Make the GICD_INMIR implementation clearer.
- Update the commit message.
v3:
- Add Reviewed-by.
---
 hw/intc/arm_gicv3_dist.c | 34 ++
 hw/intc/gicv3_internal.h |  2 ++
 2 files changed, 36 insertions(+)

diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
index 35e850685c..9739404e35 100644
--- a/hw/intc/arm_gicv3_dist.c
+++ b/hw/intc/arm_gicv3_dist.c
@@ -89,6 +89,29 @@ static int gicd_ns_access(GICv3State *s, int irq)
 return extract32(s->gicd_nsacr[irq / 16], (irq % 16) * 2, 2);
 }
 
+static void gicd_write_bitmap_reg(GICv3State *s, MemTxAttrs attrs,
+  uint32_t *bmp, maskfn *maskfn,
+  int offset, uint32_t val)
+{
+/*
+ * Helper routine to implement writing to a "set" register
+ * (GICD_INMIR, etc).
+ * Semantics implemented here:
+ * RAZ/WI for SGIs, PPIs, unimplemented IRQs
+ * Bits corresponding to Group 0 or Secure Group 1 interrupts RAZ/WI.
+ * offset should be the offset in bytes of the register from the start
+ * of its group.
+ */
+int irq = offset * 8;
+
+if (irq < GIC_INTERNAL || irq >= s->num_irq) {
+return;
+}
+val &= mask_group_and_nsacr(s, attrs, maskfn, irq);
+*gic_bmp_ptr32(bmp, irq) = val;
+gicv3_update(s, irq, 32);
+}
+
 static void gicd_write_set_bitmap_reg(GICv3State *s, MemTxAttrs attrs,
   uint32_t *bmp,
   maskfn *maskfn,
@@ -543,6 +566,11 @@ static bool gicd_readl(GICv3State *s, hwaddr offset,
 /* RAZ/WI since affinity routing is always enabled */
 *data = 0;
 return true;
+case GICD_INMIR ... GICD_INMIR + 0x7f:
+*data = (!s->nmi_support) ? 0 :
+gicd_read_bitmap_reg(s, attrs, s->superprio, NULL,
+ offset - GICD_INMIR);
+return true;
 case GICD_IROUTER ... GICD_IROUTER + 0x1fdf:
 {
 uint64_t r;
@@ -752,6 +780,12 @@ static bool gicd_writel(GICv3State *s, hwaddr offset,
 case GICD_SPENDSGIR ... GICD_SPENDSGIR + 0xf:
 /* RAZ/WI since affinity routing is always enabled */
 return true;
+case GICD_INMIR ... GICD_INMIR + 0x7f:
+if (s->nmi_support) {
+gicd_write_bitmap_reg(s, attrs, s->superprio, NULL,
+  offset - GICD_INMIR, value);
+}
+return true;
 case GICD_IROUTER ... GICD_IROUTER + 0x1fdf:
 {
 uint64_t r;
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index f35b7d2f03..a1fc34597e 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -52,6 +52,8 @@
 #define GICD_SGIR0x0F00
 #define GICD_CPENDSGIR   0x0F10
 #define GICD_SPENDSGIR   0x0F20
+#define GICD_INMIR   0x0F80
+#define GICD_INMIRnE 0x3B00
 #define GICD_IROUTER 0x6000
 #define GICD_IDREGS  0xFFD0
 
-- 
2.34.1




[RFC PATCH v6 20/23] hw/intc/arm_gicv3: Report the NMI interrupt in gicv3_cpuif_update()

2024-03-04 Thread Jinjie Ruan via
In the CPU interface, if the IRQ has the superpriority property, report
an NMI to the corresponding PE.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v6:
- Add Reviewed-by.
v4:
- Swap the ordering of the IFs.
v3:
- Remove handling nmi_is_irq flag.
---
 hw/intc/arm_gicv3_cpuif.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index 9a7d089dea..483b1bc4a3 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -967,6 +967,7 @@ void gicv3_cpuif_update(GICv3CPUState *cs)
 /* Tell the CPU about its highest priority pending interrupt */
 int irqlevel = 0;
 int fiqlevel = 0;
+int nmilevel = 0;
 ARMCPU *cpu = ARM_CPU(cs->cpu);
 CPUARMState *env = &cpu->env;
 
@@ -1005,6 +1006,8 @@ void gicv3_cpuif_update(GICv3CPUState *cs)
 
 if (isfiq) {
 fiqlevel = 1;
+} else if (cs->hppi.superprio) {
+nmilevel = 1;
 } else {
 irqlevel = 1;
 }
@@ -1014,6 +1017,7 @@ void gicv3_cpuif_update(GICv3CPUState *cs)
 
 qemu_set_irq(cs->parent_fiq, fiqlevel);
 qemu_set_irq(cs->parent_irq, irqlevel);
+qemu_set_irq(cs->parent_nmi, nmilevel);
 }
 
 static uint64_t icc_pmr_read(CPUARMState *env, const ARMCPRegInfo *ri)
-- 
2.34.1




[RFC PATCH v6 02/23] target/arm: Add PSTATE.ALLINT

2024-03-04 Thread Jinjie Ruan via
When PSTATE.ALLINT is set, an IRQ or FIQ interrupt that is targeted to
ELx, with or without superpriority, is masked.

As Richard suggested, place ALLINT bit in PSTATE in env->pstate.

With the change to pstate_read/write, exception entry
and return are automatically handled.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v5:
- Remove the ALLINT comment, as it is covered by "all other bits".
- Add Reviewed-by.
v4:
- Keep PSTATE.ALLINT in env->pstate but not env->allint.
- Update the commit message.
v3:
- Remove ALLINT dump in aarch64_cpu_dump_state().
- Update the commit message.
---
 target/arm/cpu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index a5b3d8f7da..9402d23a93 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1536,6 +1536,7 @@ FIELD(VTCR, SL2, 33, 1)
 #define PSTATE_D (1U << 9)
 #define PSTATE_BTYPE (3U << 10)
 #define PSTATE_SSBS (1U << 12)
+#define PSTATE_ALLINT (1U << 13)
 #define PSTATE_IL (1U << 20)
 #define PSTATE_SS (1U << 21)
 #define PSTATE_PAN (1U << 22)
-- 
2.34.1




[RFC PATCH v6 05/23] target/arm: Support MSR access to ALLINT

2024-03-04 Thread Jinjie Ruan via
Support ALLINT MSR access as follows:
    mrs <Xt>, ALLINT    // read ALLINT
    msr ALLINT, <Xt>    // write ALLINT

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v5:
- Add Reviewed-by.
v4:
- Remove arm_is_el2_enabled() check in allint_check().
- Change to env->pstate instead of env->allint.
v3:
- Remove the EL0 check in aa64_allint_access(), which is already covered by .access
  PL1_RW.
- Use arm_hcrx_el2_eff() in aa64_allint_access() instead of env->cp15.hcrx_el2.
- Make ALLINT msr access function controlled by aa64_nmi.
---
 target/arm/helper.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index affa493141..497b6e4bdf 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4618,6 +4618,36 @@ static void aa64_daif_write(CPUARMState *env, const 
ARMCPRegInfo *ri,
 env->daif = value & PSTATE_DAIF;
 }
 
+static void aa64_allint_write(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
+{
+env->pstate = (env->pstate & ~PSTATE_ALLINT) | (value & PSTATE_ALLINT);
+}
+
+static uint64_t aa64_allint_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+return env->pstate & PSTATE_ALLINT;
+}
+
+static CPAccessResult aa64_allint_access(CPUARMState *env,
+ const ARMCPRegInfo *ri, bool isread)
+{
+if (arm_current_el(env) == 1 && (arm_hcrx_el2_eff(env) & HCRX_TALLINT)) {
+return CP_ACCESS_TRAP_EL2;
+}
+return CP_ACCESS_OK;
+}
+
+static const ARMCPRegInfo nmi_reginfo[] = {
+{ .name = "ALLINT", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .opc2 = 0, .crn = 4, .crm = 3,
+  .type = ARM_CP_NO_RAW,
+  .access = PL1_RW, .accessfn = aa64_allint_access,
+  .fieldoffset = offsetof(CPUARMState, pstate),
+  .writefn = aa64_allint_write, .readfn = aa64_allint_read,
+  .resetfn = arm_cp_reset_ignore },
+};
+
 static uint64_t aa64_pan_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
 return env->pstate & PSTATE_PAN;
@@ -9724,6 +9754,10 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 if (cpu_isar_feature(aa64_nv2, cpu)) {
 define_arm_cp_regs(cpu, nv2_reginfo);
 }
+
+if (cpu_isar_feature(aa64_nmi, cpu)) {
+define_arm_cp_regs(cpu, nmi_reginfo);
+}
 #endif
 
 if (cpu_isar_feature(any_predinv, cpu)) {
-- 
2.34.1




[RFC PATCH v6 16/23] hw/intc: Enable FEAT_GICv3_NMI Feature

2024-03-04 Thread Jinjie Ruan via
Add a property to enable the FEAT_GICv3_NMI feature, and set up the
distributor and redistributor registers to indicate NMI support.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v4:
- Add Reviewed-by.
---
 hw/intc/arm_gicv3_common.c | 1 +
 hw/intc/arm_gicv3_dist.c   | 2 ++
 hw/intc/gicv3_internal.h   | 1 +
 include/hw/intc/arm_gicv3_common.h | 1 +
 4 files changed, 5 insertions(+)

diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index c52f060026..2d2cea6858 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -569,6 +569,7 @@ static Property arm_gicv3_common_properties[] = {
 DEFINE_PROP_UINT32("num-irq", GICv3State, num_irq, 32),
 DEFINE_PROP_UINT32("revision", GICv3State, revision, 3),
 DEFINE_PROP_BOOL("has-lpi", GICv3State, lpi_enable, 0),
+DEFINE_PROP_BOOL("has-nmi", GICv3State, nmi_support, 0),
 DEFINE_PROP_BOOL("has-security-extensions", GICv3State, security_extn, 0),
 /*
  * Compatibility property: force 8 bits of physical priority, even
diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
index 9739404e35..c4e28d209a 100644
--- a/hw/intc/arm_gicv3_dist.c
+++ b/hw/intc/arm_gicv3_dist.c
@@ -412,6 +412,7 @@ static bool gicd_readl(GICv3State *s, hwaddr offset,
  *  by GICD_TYPER.IDbits)
  * MBIS == 0 (message-based SPIs not supported)
  * SecurityExtn == 1 if security extns supported
+ * NMI = 1 if Non-maskable interrupt property is supported
  * CPUNumber == 0 since for us ARE is always 1
  * ITLinesNumber == (((max SPI IntID + 1) / 32) - 1)
  */
@@ -425,6 +426,7 @@ static bool gicd_readl(GICv3State *s, hwaddr offset,
 bool dvis = s->revision >= 4;
 
 *data = (1 << 25) | (1 << 24) | (dvis << 18) | (sec_extn << 10) |
+(s->nmi_support << GICD_TYPER_NMI_SHIFT) |
 (s->lpi_enable << GICD_TYPER_LPIS_SHIFT) |
 (0xf << 19) | itlinesnumber;
 return true;
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index a1fc34597e..8d793243f4 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -70,6 +70,7 @@
 #define GICD_CTLR_E1NWF (1U << 7)
 #define GICD_CTLR_RWP   (1U << 31)
 
+#define GICD_TYPER_NMI_SHIFT   9
 #define GICD_TYPER_LPIS_SHIFT  17
 
 /* 16 bits EventId */
diff --git a/include/hw/intc/arm_gicv3_common.h 
b/include/hw/intc/arm_gicv3_common.h
index df4380141d..16c5fa7256 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -251,6 +251,7 @@ struct GICv3State {
 uint32_t num_irq;
 uint32_t revision;
 bool lpi_enable;
+bool nmi_support;
 bool security_extn;
 bool force_8bit_prio;
 bool irq_reset_nonsecure;
-- 
2.34.1




[RFC PATCH v6 09/23] target/arm: Handle PSTATE.ALLINT on taking an exception

2024-03-04 Thread Jinjie Ruan via
Set or clear PSTATE.ALLINT on taking an exception to ELx according to the
SCTLR_ELx.SPINTMASK bit.
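
A short sketch of the rule, under the assumption that it only applies when
FEAT_NMI is implemented and SCTLR_ELx.NMI is set for the target EL; the
helper is illustrative, not the QEMU code:

    #include <stdbool.h>
    #include <stdint.h>

    #define PSTATE_ALLINT (1U << 13)

    /* Illustrative: PSTATE.ALLINT in the new mode on entry to the target EL. */
    static uint32_t allint_on_entry(uint32_t new_mode, bool feat_nmi,
                                    bool sctlr_nmi, bool sctlr_spintmask)
    {
        if (feat_nmi && sctlr_nmi) {
            if (!sctlr_spintmask) {
                new_mode |= PSTATE_ALLINT;   /* SPINTMASK == 0: mask on entry  */
            } else {
                new_mode &= ~PSTATE_ALLINT;  /* SPINTMASK == 1: leave unmasked */
            }
        }
        return new_mode;
    }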

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v3:
- Add Reviewed-by.
---
 target/arm/helper.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 7e73edfde3..c5af859c35 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11540,6 +11540,15 @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
 }
 }
 
+if (cpu_isar_feature(aa64_nmi, cpu) &&
+(env->cp15.sctlr_el[new_el] & SCTLR_NMI)) {
+if (!(env->cp15.sctlr_el[new_el] & SCTLR_SPINTMASK)) {
+new_mode |= PSTATE_ALLINT;
+} else {
+new_mode &= ~PSTATE_ALLINT;
+}
+}
+
 pstate_write(env, PSTATE_DAIF | new_mode);
 env->aarch64 = true;
 aarch64_restore_sp(env, new_el);
-- 
2.34.1




[RFC PATCH v6 13/23] hw/intc/arm_gicv3: Add irq superpriority information

2024-03-04 Thread Jinjie Ruan via
An SPI, PPI or SGI interrupt can have a superpriority property. So
maintain superpriority information in PendingIrq and GICR/GICD.

Signed-off-by: Jinjie Ruan 
Acked-by: Richard Henderson 
---
v3:
- Place this ahead of implement GICR_INMIR.
- Add Acked-by.
---
 include/hw/intc/arm_gicv3_common.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/hw/intc/arm_gicv3_common.h 
b/include/hw/intc/arm_gicv3_common.h
index 7324c7d983..df4380141d 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -146,6 +146,7 @@ typedef struct {
 int irq;
 uint8_t prio;
 int grp;
+bool superprio;
 } PendingIrq;
 
 struct GICv3CPUState {
@@ -172,6 +173,7 @@ struct GICv3CPUState {
 uint32_t gicr_ienabler0;
 uint32_t gicr_ipendr0;
 uint32_t gicr_iactiver0;
+uint32_t gicr_isuperprio;
 uint32_t edge_trigger; /* ICFGR0 and ICFGR1 even bits */
 uint32_t gicr_igrpmodr0;
 uint32_t gicr_nsacr;
@@ -274,6 +276,7 @@ struct GICv3State {
 GIC_DECLARE_BITMAP(active);   /* GICD_ISACTIVER */
 GIC_DECLARE_BITMAP(level);/* Current level */
 GIC_DECLARE_BITMAP(edge_trigger); /* GICD_ICFGR even bits */
+GIC_DECLARE_BITMAP(superprio);/* GICD_INMIR */
 uint8_t gicd_ipriority[GICV3_MAXIRQ];
 uint64_t gicd_irouter[GICV3_MAXIRQ];
 /* Cached information: pointer to the cpu i/f for the CPUs specified
@@ -313,6 +316,7 @@ GICV3_BITMAP_ACCESSORS(pending)
 GICV3_BITMAP_ACCESSORS(active)
 GICV3_BITMAP_ACCESSORS(level)
 GICV3_BITMAP_ACCESSORS(edge_trigger)
+GICV3_BITMAP_ACCESSORS(superprio)
 
 #define TYPE_ARM_GICV3_COMMON "arm-gicv3-common"
 typedef struct ARMGICv3CommonClass ARMGICv3CommonClass;
-- 
2.34.1




[RFC PATCH v6 19/23] hw/intc/arm_gicv3: Implement NMI interrupt priority

2024-03-04 Thread Jinjie Ruan via
If the GICD_CTLR_DS bit is zero and the NMI is non-secure, the NMI priority
is higher than 0x80, otherwise it is higher than 0x0. Save the NMI
superpriority information in hppi.superprio to deliver the NMI exception.
Since both the GICR and the GICD can deliver an NMI, it is necessary to check
whether the pending IRQ is an NMI in both gicv3_redist_update_noirqset and
gicv3_update_noirqset. In irqbetter(), only a non-NMI with the same
priority and a smaller interrupt number can be preempted, but not an NMI.
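
A compact sketch of the priority rule above, assuming the usual GIC
convention that a numerically lower value is a higher priority; the helper
is illustrative, not the QEMU implementation:

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative: effective priority used when comparing an interrupt with
     * superpriority against ordinary interrupts. */
    static uint8_t nmi_effective_priority(bool gicd_ctlr_ds, bool nonsecure_group1)
    {
        if (!gicd_ctlr_ds && nonsecure_group1) {
            return 0x80;   /* DS == 0, Non-secure NMI: above the 0x80..0xff range */
        }
        return 0x00;       /* otherwise: above every ordinary priority */
    }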

Signed-off-by: Jinjie Ruan 
---
v6:
- Put the "extract superprio info" logic into gicv3_get_priority().
- Update the comment in irqbetter().
- Reset the cs->hppi.superprio to 0x0.
- Set hppi.superprio to false for LPI.
v4:
- Replace is_nmi with has_superprio to avoid mixing NMI and superpriority.
- Update the comment in irqbetter().
- Extract gicv3_get_priority() to avoid code repeat.
---
v3:
- Add missing brace
---
 hw/intc/arm_gicv3.c| 70 +-
 hw/intc/arm_gicv3_common.c |  1 +
 2 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
index 0b8f79a122..c3d266af23 100644
--- a/hw/intc/arm_gicv3.c
+++ b/hw/intc/arm_gicv3.c
@@ -21,7 +21,8 @@
 #include "hw/intc/arm_gicv3.h"
 #include "gicv3_internal.h"
 
-static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t prio)
+static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t prio,
+  bool has_superprio)
 {
 /* Return true if this IRQ at this priority should take
  * precedence over the current recorded highest priority
@@ -33,11 +34,22 @@ static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t 
prio)
 if (prio < cs->hppi.prio) {
 return true;
 }
+
+/*
+ * The same priority IRQ with superpriority should signal to the CPU
+ * as it have the priority higher than the labelled 0x80 or 0x00.
+ */
+if (prio == cs->hppi.prio && !cs->hppi.superprio && has_superprio) {
+return true;
+}
+
 /* If multiple pending interrupts have the same priority then it is an
  * IMPDEF choice which of them to signal to the CPU. We choose to
- * signal the one with the lowest interrupt number.
+ * signal the one with the lowest interrupt number if they don't have
+ * superpriority.
  */
-if (prio == cs->hppi.prio && irq <= cs->hppi.irq) {
+if (prio == cs->hppi.prio && !cs->hppi.superprio &&
+!has_superprio && irq <= cs->hppi.irq) {
 return true;
 }
 return false;
@@ -129,6 +141,43 @@ static uint32_t gicr_int_pending(GICv3CPUState *cs)
 return pend;
 }
 
+static bool gicv3_get_priority(GICv3CPUState *cs, bool is_redist,
+   uint8_t *prio, int irq)
+{
+bool has_superprio = false;
+uint32_t superprio = 0x0;
+
+if (is_redist) {
+superprio = extract32(cs->gicr_isuperprio, irq, 1);
+} else {
+superprio = *gic_bmp_ptr32(cs->gic->superprio, irq);
+superprio = superprio & (1 << (irq & 0x1f));
+}
+
+if (superprio) {
+has_superprio = true;
+
+/* DS = 0 & Non-secure NMI */
+if (!(cs->gic->gicd_ctlr & GICD_CTLR_DS) &&
+((is_redist && extract32(cs->gicr_igroupr0, irq, 1)) ||
+ (!is_redist && gicv3_gicd_group_test(cs->gic, irq {
+*prio = 0x80;
+} else {
+*prio = 0x0;
+}
+} else {
+has_superprio = false;
+
+if (is_redist) {
+*prio = cs->gicr_ipriorityr[irq];
+} else {
+*prio = cs->gic->gicd_ipriority[irq];
+}
+}
+
+return has_superprio;
+}
+
 /* Update the interrupt status after state in a redistributor
  * or CPU interface has changed, but don't tell the CPU i/f.
  */
@@ -141,6 +190,7 @@ static void gicv3_redist_update_noirqset(GICv3CPUState *cs)
 uint8_t prio;
 int i;
 uint32_t pend;
+bool has_superprio = false;
 
 /* Find out which redistributor interrupts are eligible to be
  * signaled to the CPU interface.
@@ -152,10 +202,11 @@ static void gicv3_redist_update_noirqset(GICv3CPUState 
*cs)
 if (!(pend & (1 << i))) {
 continue;
 }
-prio = cs->gicr_ipriorityr[i];
-if (irqbetter(cs, i, prio)) {
+has_superprio = gicv3_get_priority(cs, true, , i);
+if (irqbetter(cs, i, prio, has_superprio)) {
 cs->hppi.irq = i;
 cs->hppi.prio = prio;
+cs->hppi.superprio = has_superprio;
 seenbetter = true;
 }
 }
@@ -168,9 +219,10 @@ static void gicv3_redist_update_noirqset(GICv3CPUState *cs)
 if ((cs->gicr_ctlr & GICR_CTLR_ENABLE_LPIS) && cs->gic->lpi_enable &&
 (cs->gic->gicd_ctlr & GICD_CTLR_EN_GRP1NS) &&
 (cs->hpplpi.prio != 0xff)) {
-if (irqbetter(cs, cs->hpplpi.irq, cs->hpplpi.prio)) {
+if (irqbetter(cs, cs->hpplpi.irq, cs->hpplpi.prio, false)) {
   

[RFC PATCH v6 03/23] target/arm: Add support for FEAT_NMI, Non-maskable Interrupt

2024-03-04 Thread Jinjie Ruan via
Add support for FEAT_NMI. NMI (FEAT_NMI) is a mandatory feature in
ARMv8.8-A and ARMv9.3-A.

Signed-off-by: Jinjie Ruan 
Reviewed-by: Richard Henderson 
---
v3:
- Add Reviewed-by.
- Adjust to before the MSR patches.
---
 target/arm/internals.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 860bcc0c66..980af3c1c1 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1078,6 +1078,9 @@ static inline uint32_t aarch64_pstate_valid_mask(const 
ARMISARegisters *id)
 if (isar_feature_aa64_mte(id)) {
 valid |= PSTATE_TCO;
 }
+if (isar_feature_aa64_nmi(id)) {
+valid |= PSTATE_ALLINT;
+}
 
 return valid;
 }
-- 
2.34.1




[RFC PATCH v6 06/23] target/arm: Add support for Non-maskable Interrupt

2024-03-04 Thread Jinjie Ruan via
This only implements the external delivery method via the GICv3.

Signed-off-by: Jinjie Ruan 
---
v6:
- env->cp15.hcr_el2 -> arm_hcr_el2_eff().
- env->cp15.hcrx_el2 -> arm_hcrx_el2_eff().
- Not include VF && VFNMI in CPU_INTERRUPT_VNMI.
v4:
- Accept NMI unconditionally for arm_cpu_has_work() but add comment.
- Change from & to && for EXCP_IRQ or EXCP_FIQ.
- Refator nmi mask in arm_excp_unmasked().
- Also handle VNMI in arm_cpu_exec_interrupt() and arm_cpu_set_irq().
- Rename virtual to Virtual.
v3:
- Not include CPU_INTERRUPT_NMI when FEAT_NMI not enabled
- Add ARM_CPU_VNMI.
- Refactor nmi mask in arm_excp_unmasked().
- Test SCTLR_ELx.NMI for ALLINT mask for NMI.
---
 target/arm/cpu-qom.h   |  4 +-
 target/arm/cpu.c   | 85 +++---
 target/arm/cpu.h   |  4 ++
 target/arm/helper.c|  2 +
 target/arm/internals.h |  9 +
 5 files changed, 97 insertions(+), 7 deletions(-)

diff --git a/target/arm/cpu-qom.h b/target/arm/cpu-qom.h
index 8e032691db..e0c9e18036 100644
--- a/target/arm/cpu-qom.h
+++ b/target/arm/cpu-qom.h
@@ -36,11 +36,13 @@ DECLARE_CLASS_CHECKERS(AArch64CPUClass, AARCH64_CPU,
 #define ARM_CPU_TYPE_SUFFIX "-" TYPE_ARM_CPU
 #define ARM_CPU_TYPE_NAME(name) (name ARM_CPU_TYPE_SUFFIX)
 
-/* Meanings of the ARMCPU object's four inbound GPIO lines */
+/* Meanings of the ARMCPU object's six inbound GPIO lines */
 #define ARM_CPU_IRQ 0
 #define ARM_CPU_FIQ 1
 #define ARM_CPU_VIRQ 2
 #define ARM_CPU_VFIQ 3
+#define ARM_CPU_NMI 4
+#define ARM_CPU_VNMI 5
 
 /* For M profile, some registers are banked secure vs non-secure;
  * these are represented as a 2-element array where the first element
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index b2ea5d6513..779e365819 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -122,6 +122,13 @@ void arm_restore_state_to_opc(CPUState *cs,
 }
 #endif /* CONFIG_TCG */
 
+/*
+ * With SCTLR_ELx.NMI == 0, IRQ with Superpriority is masked identically with
+ * IRQ without Superpriority. Moreover, if the GIC is configured so that
+ * FEAT_GICv3_NMI is only set if FEAT_NMI is set, then we won't ever see
+ * CPU_INTERRUPT_*NMI anyway. So we might as well accept NMI here
+ * unconditionally.
+ */
 static bool arm_cpu_has_work(CPUState *cs)
 {
 ARMCPU *cpu = ARM_CPU(cs);
@@ -129,6 +136,7 @@ static bool arm_cpu_has_work(CPUState *cs)
 return (cpu->power_state != PSCI_OFF)
 && cs->interrupt_request &
 (CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD
+ | CPU_INTERRUPT_NMI | CPU_INTERRUPT_VNMI
  | CPU_INTERRUPT_VFIQ | CPU_INTERRUPT_VIRQ | CPU_INTERRUPT_VSERR
  | CPU_INTERRUPT_EXITTB);
 }
@@ -668,6 +676,7 @@ static inline bool arm_excp_unmasked(CPUState *cs, unsigned 
int excp_idx,
 CPUARMState *env = cpu_env(cs);
 bool pstate_unmasked;
 bool unmasked = false;
+bool allIntMask = false;
 
 /*
  * Don't take exceptions if they target a lower EL.
@@ -678,13 +687,31 @@ static inline bool arm_excp_unmasked(CPUState *cs, 
unsigned int excp_idx,
 return false;
 }
 
+if (cpu_isar_feature(aa64_nmi, env_archcpu(env)) &&
+env->cp15.sctlr_el[target_el] & SCTLR_NMI && cur_el == target_el) {
+allIntMask = env->pstate & PSTATE_ALLINT ||
+ ((env->cp15.sctlr_el[target_el] & SCTLR_SPINTMASK) &&
+  (env->pstate & PSTATE_SP));
+}
+
 switch (excp_idx) {
+case EXCP_NMI:
+pstate_unmasked = !allIntMask;
+break;
+
+case EXCP_VNMI:
+if ((!(hcr_el2 & HCR_IMO) && !(hcr_el2 & HCR_FMO)) ||
+ (hcr_el2 & HCR_TGE)) {
+/* VNMIs(VIRQs or VFIQs) are only taken when hypervized.  */
+return false;
+}
+return !allIntMask;
 case EXCP_FIQ:
-pstate_unmasked = !(env->daif & PSTATE_F);
+pstate_unmasked = (!(env->daif & PSTATE_F)) && (!allIntMask);
 break;
 
 case EXCP_IRQ:
-pstate_unmasked = !(env->daif & PSTATE_I);
+pstate_unmasked = (!(env->daif & PSTATE_I)) && (!allIntMask);
 break;
 
 case EXCP_VFIQ:
@@ -692,13 +719,13 @@ static inline bool arm_excp_unmasked(CPUState *cs, 
unsigned int excp_idx,
 /* VFIQs are only taken when hypervized.  */
 return false;
 }
-return !(env->daif & PSTATE_F);
+return !(env->daif & PSTATE_F) && (!allIntMask);
 case EXCP_VIRQ:
 if (!(hcr_el2 & HCR_IMO) || (hcr_el2 & HCR_TGE)) {
 /* VIRQs are only taken when hypervized.  */
 return false;
 }
-return !(env->daif & PSTATE_I);
+return !(env->daif & PSTATE_I) && (!allIntMask);
 case EXCP_VSERR:
 if (!(hcr_el2 & HCR_AMO) || (hcr_el2 & HCR_TGE)) {
 /* VIRQs are only taken when hypervized.  */
@@ -804,6 +831,24 @@ static bool arm_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
 
 /* The prioritization of interrupts is IMPLEMENTATION DEFINED. */
 
+if 

Re: [PATCH v5 1/5] target/riscv: Fix the predicate functions for mhpmeventhX CSRs

2024-03-04 Thread LIU Zhiwei



On 2024/2/29 2:51, Atish Patra wrote:

mhpmeventhX CSRs are available for RV32. The predicate function
should check that first before checking sscofpmf extension.

Fixes: 14664483457b ("target/riscv: Add sscofpmf extension support")
Reviewed-by: Daniel Henrique Barboza 
Reviewed-by: Alistair Francis 


Reviewed-by: LIU Zhiwei 

Zhiwei


Signed-off-by: Atish Patra 
---
  target/riscv/csr.c | 67 ++
  1 file changed, 38 insertions(+), 29 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index d4e8ac13b90c..a3d979c4c72c 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -227,6 +227,15 @@ static RISCVException sscofpmf(CPURISCVState *env, int csrno)
  return RISCV_EXCP_NONE;
  }
  
+static RISCVException sscofpmf_32(CPURISCVState *env, int csrno)

+{
+if (riscv_cpu_mxl(env) != MXL_RV32) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+return sscofpmf(env, csrno);
+}
+
  static RISCVException any(CPURISCVState *env, int csrno)
  {
  return RISCV_EXCP_NONE;
@@ -5035,91 +5044,91 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
  [CSR_MHPMEVENT31]= { "mhpmevent31",any,read_mhpmevent,
   write_mhpmevent   },
  
-[CSR_MHPMEVENT3H]= { "mhpmevent3h",sscofpmf,  read_mhpmeventh,

+[CSR_MHPMEVENT3H]= { "mhpmevent3h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT4H]= { "mhpmevent4h",sscofpmf,  read_mhpmeventh,
+[CSR_MHPMEVENT4H]= { "mhpmevent4h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT5H]= { "mhpmevent5h",sscofpmf,  read_mhpmeventh,
+[CSR_MHPMEVENT5H]= { "mhpmevent5h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT6H]= { "mhpmevent6h",sscofpmf,  read_mhpmeventh,
+[CSR_MHPMEVENT6H]= { "mhpmevent6h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT7H]= { "mhpmevent7h",sscofpmf,  read_mhpmeventh,
+[CSR_MHPMEVENT7H]= { "mhpmevent7h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT8H]= { "mhpmevent8h",sscofpmf,  read_mhpmeventh,
+[CSR_MHPMEVENT8H]= { "mhpmevent8h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT9H]= { "mhpmevent9h",sscofpmf,  read_mhpmeventh,
+[CSR_MHPMEVENT9H]= { "mhpmevent9h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT10H]   = { "mhpmevent10h",sscofpmf,  read_mhpmeventh,
+[CSR_MHPMEVENT10H]   = { "mhpmevent10h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT11H]   = { "mhpmevent11h",sscofpmf,  read_mhpmeventh,
+[CSR_MHPMEVENT11H]   = { "mhpmevent11h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT12H]   = { "mhpmevent12h",sscofpmf,  read_mhpmeventh,
+[CSR_MHPMEVENT12H]   = { "mhpmevent12h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT13H]   = { "mhpmevent13h",sscofpmf,  read_mhpmeventh,
+[CSR_MHPMEVENT13H]   = { "mhpmevent13h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT14H]   = { "mhpmevent14h",sscofpmf,  read_mhpmeventh,
+[CSR_MHPMEVENT14H]   = { "mhpmevent14h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT15H]   = { "mhpmevent15h",sscofpmf,  read_mhpmeventh,
+[CSR_MHPMEVENT15H]   = { "mhpmevent15h",sscofpmf_32,  read_mhpmeventh,
   write_mhpmeventh,
   .min_priv_ver = PRIV_VERSION_1_12_0},
-[CSR_MHPMEVENT16H]   = { "mhpmevent16h",

Re: [PATCH v5 2/5] target/riscv: Add cycle & instret privilege mode filtering properties

2024-03-04 Thread LIU Zhiwei



On 2024/2/29 2:51, Atish Patra wrote:

From: Kaiwen Xue 

This adds the properties for ISA extension smcntrpmf. Patches
implementing it will follow.

Signed-off-by: Atish Patra 
Signed-off-by: Kaiwen Xue 
---
  target/riscv/cpu.c | 2 ++
  target/riscv/cpu_cfg.h | 1 +
  2 files changed, 3 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1b8d001d237f..f9d3c80597fc 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -169,6 +169,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx),
  ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin),
  ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia),
+ISA_EXT_DATA_ENTRY(smcntrpmf, PRIV_VERSION_1_12_0, ext_smcntrpmf),
  ISA_EXT_DATA_ENTRY(smepmp, PRIV_VERSION_1_12_0, ext_smepmp),
  ISA_EXT_DATA_ENTRY(smstateen, PRIV_VERSION_1_12_0, ext_smstateen),
  ISA_EXT_DATA_ENTRY(ssaia, PRIV_VERSION_1_12_0, ext_ssaia),
@@ -1447,6 +1448,7 @@ const char *riscv_get_misa_ext_description(uint32_t bit)
  const RISCVCPUMultiExtConfig riscv_cpu_extensions[] = {
  /* Defaults for standard extensions */
  MULTI_EXT_CFG_BOOL("sscofpmf", ext_sscofpmf, false),
+MULTI_EXT_CFG_BOOL("smcntrpmf", ext_smcntrpmf, false),
  MULTI_EXT_CFG_BOOL("zifencei", ext_zifencei, true),
  MULTI_EXT_CFG_BOOL("zicsr", ext_zicsr, true),
  MULTI_EXT_CFG_BOOL("zihintntl", ext_zihintntl, true),


We should not add the configuration option for users before the feature
has been implemented, for bisect reasons.


Thanks,
Zhiwei


diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 833bf5821708..0828841445c5 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -73,6 +73,7 @@ struct RISCVCPUConfig {
  bool ext_zihpm;
  bool ext_smstateen;
  bool ext_sstc;
+bool ext_smcntrpmf;
  bool ext_svadu;
  bool ext_svinval;
  bool ext_svnapot;




Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-04 Thread Thomas Huth

On 05/03/2024 04.21, Xinying Yu wrote:

One more thing, I would like to ask how I can get the full series of patches.
Do I copy the RFC line by line from this link [1]?


For getting patches that you might have missed on the mailing list, I 
recommend lore.kernel.org :



https://lore.kernel.org/qemu-devel/20240301134330.4191007-1-jonah.pal...@oracle.com/

You can download mbox files there that you can apply locally with "git am".

 HTH,
  Thomas




Re: [PATCH v5 5/5] target/riscv: Implement privilege mode filtering for cycle/instret

2024-03-04 Thread LIU Zhiwei



On 2024/2/29 2:51, Atish Patra wrote:

Privilege mode filtering can also be emulated for cycle/instret by
tracking host_ticks/icount during each privilege mode switch. This
patch implements that for both cycle/instret and mhpmcounters. The
first one requires Smcntrpmf while the other one requires Sscofpmf
to be enabled.

The cycle/instret are still computed using host ticks when icount
is not enabled. Otherwise, they are computed using raw icount which
is more accurate in icount mode.

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Atish Patra 
---
  target/riscv/cpu.h| 11 +
  target/riscv/cpu_bits.h   |  5 ++
  target/riscv/cpu_helper.c | 17 ++-
  target/riscv/csr.c| 96 ++-
  target/riscv/pmu.c| 64 ++
  target/riscv/pmu.h|  2 +
  6 files changed, 171 insertions(+), 24 deletions(-)
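
Note: as a rough standalone illustration of the bookkeeping described above
(the names and layout here are made up for the sketch and are not the code
in this patch), the accounting boils down to charging the elapsed ticks to
the privilege mode being left on every switch, and filtering out inhibited
modes when the counter is read:

/*
 * Illustrative sketch only, not the actual patch.
 */
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t acc[4];      /* ticks accumulated per privilege mode */
    uint64_t baseline;    /* tick value at the last privilege switch */
} FixedCtrSketch;

/* Charge the elapsed ticks to the mode we are leaving. */
static void fixed_ctr_switch(FixedCtrSketch *c, unsigned old_priv, uint64_t now)
{
    c->acc[old_priv] += now - c->baseline;
    c->baseline = now;
}

/* Counter value seen by software, with inhibited modes filtered out. */
static uint64_t fixed_ctr_read(const FixedCtrSketch *c, unsigned cur_priv,
                               uint64_t now, const bool inhibit[4])
{
    uint64_t total = 0;

    for (unsigned i = 0; i < 4; i++) {
        uint64_t v = c->acc[i];
        if (i == cur_priv) {
            v += now - c->baseline;   /* include the slice in progress */
        }
        if (!inhibit[i]) {
            total += v;
        }
    }
    return total;
}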

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 174e8ba8e847..9e21d7f7d635 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -157,6 +157,15 @@ typedef struct PMUCTRState {
  target_ulong irq_overflow_left;
  } PMUCTRState;
  
+typedef struct PMUFixedCtrState {

+/* Track cycle and icount for each privilege mode */
+uint64_t counter[4];
+uint64_t counter_prev[4];
+/* Track cycle and icount for each privilege mode when V = 1*/
+uint64_t counter_virt[2];
+uint64_t counter_virt_prev[2];
+} PMUFixedCtrState;
+
  struct CPUArchState {
  target_ulong gpr[32];
  target_ulong gprh[32]; /* 64 top bits of the 128-bit registers */
@@ -353,6 +362,8 @@ struct CPUArchState {
  /* PMU event selector configured values for RV32 */
  target_ulong mhpmeventh_val[RV_MAX_MHPMEVENTS];
  
+PMUFixedCtrState pmu_fixed_ctrs[2];

+
  target_ulong sscratch;
  target_ulong mscratch;
  
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h

index e866c60a400c..5fe349e313dc 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -920,6 +920,11 @@ typedef enum RISCVException {
  #define MHPMEVENT_BIT_VUINH    BIT_ULL(58)
  #define MHPMEVENTH_BIT_VUINH   BIT(26)
  
+#define MHPMEVENT_FILTER_MASK  (MHPMEVENT_BIT_MINH  | \

+MHPMEVENT_BIT_SINH  | \
+MHPMEVENT_BIT_UINH  | \
+MHPMEVENT_BIT_VSINH | \
+MHPMEVENT_BIT_VUINH)
  #define MHPMEVENT_SSCOF_MASK   _ULL(0x)
  #define MHPMEVENT_IDX_MASK 0xF
  #define MHPMEVENT_SSCOF_RESVD  16
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index d462d95ee165..33965d843d46 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -718,8 +718,21 @@ void riscv_cpu_set_mode(CPURISCVState *env, target_ulong newpriv)
  {
  g_assert(newpriv <= PRV_M && newpriv != PRV_RESERVED);
  
-if (icount_enabled() && newpriv != env->priv) {

-riscv_itrigger_update_priv(env);
+/*
+ * Invoke the cycle/instret update on privilege mode changes, on a
+ * VS->HS mode transition (where the SPV bit must be set), and on an
+ * HS->VS mode transition (where virt_enabled must be set).
+ * In both of the latter cases, priv will be S mode only.
+ */
+if (newpriv != env->priv ||
+   (env->priv == PRV_S && newpriv == PRV_S &&
+(env->virt_enabled || get_field(env->hstatus, HSTATUS_SPV)))) {
+if (icount_enabled()) {
+riscv_itrigger_update_priv(env);
+riscv_pmu_icount_update_priv(env, newpriv);
+} else {
+riscv_pmu_cycle_update_priv(env, newpriv);
+}
  }
  /* tlb_flush is unnecessary as mode is contained in mmu_idx */
  env->priv = newpriv;
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index ff9bac537593..482e212c5f74 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -788,32 +788,16 @@ static RISCVException write_vcsr(CPURISCVState *env, int csrno,
  return RISCV_EXCP_NONE;
  }
  
+#if defined(CONFIG_USER_ONLY)

  /* User Timers and Counters */
  static target_ulong get_ticks(bool shift)
  {
-int64_t val;
-target_ulong result;
-
-#if !defined(CONFIG_USER_ONLY)
-if (icount_enabled()) {
-val = icount_get();
-} else {
-val = cpu_get_host_ticks();
-}
-#else
-val = cpu_get_host_ticks();
-#endif
-
-if (shift) {
-result = val >> 32;
-} else {
-result = val;
-}
+int64_t val = cpu_get_host_ticks();
+target_ulong result = shift ? val >> 32 : val;
  
  return result;

  }
  
-#if defined(CONFIG_USER_ONLY)

  static RISCVException read_time(CPURISCVState *env, int csrno,
  target_ulong *val)
  {
@@ -952,6 +936,71 @@ static RISCVException write_mhpmeventh(CPURISCVState *env, int csrno,
  return RISCV_EXCP_NONE;
  }
  
+static 

Re: [External] Re: [PATCH v1 0/1] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-04 Thread Huang, Ying
"Ho-Ren (Jack) Chuang"  writes:

> On Sun, Mar 3, 2024 at 6:47 PM Huang, Ying  wrote:
>>
>> "Ho-Ren (Jack) Chuang"  writes:
>>
>> > The memory tiering component in the kernel is functionally useless for
>> > CPUless memory/non-DRAM devices like CXL1.1 type3 memory because the nodes
>> > are lumped together in the DRAM tier.
>> > https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/
>>
>> I think that it's unfair to call it "useless".  Yes, it doesn't work if
>> the CXL memory device are not enumerate via drivers/dax/kmem.c.  So,
>> please be specific about in which cases it doesn't work instead of too
>> general "useless".
>>
>
> Thank you and I didn't mean anything specific. I simply reused phrases
> we discussed
> earlier in the previous patchset. I will change them to the following in v2:
> "At boot time, current memory tiering assigns all detected memory nodes
> to the same DRAM tier. This results in CPUless memory/non-DRAM devices,
> such as CXL1.1 type3 memory, being unable to be assigned to the
> correct memory tier,
> leading to the inability to migrate pages between different types of memory."
>
> Please see if this looks more specific.

I don't think that the description above is accurate.  In fact, there
are 2 ways to enumerate the memory device,

1. Mark it as reserved memory (E820_TYPE_SOFT_RESERVED, etc.) in E820
   table or something similar.

2. Mark it as normal memory (E820_TYPE_RAM) in E820 table or something
   similar

For 1, the memory device (including CXL memory) is onlined via
drivers/dax/kmem.c, so will be put in proper memory tiers.  For 2, the
memory device is indistinguishable with normal DRAM with current
implementation.  And this is what this patch is working on.

Right?

--
Best Regards,
Huang, Ying

>> > This patchset automatically resolves the issues. It delays the 
>> > initialization
>> > of memory tiers for CPUless NUMA nodes until they obtain HMAT information
>> > at boot time, eliminating the need for user intervention.
>> > If no HMAT specified, it falls back to using `default_dram_type`.
>> >
>> > Example usecase:
>> > We have CXL memory on the host, and we create VMs with a new system memory
>> > device backed by host CXL memory. We inject CXL memory performance 
>> > attributes
>> > through QEMU, and the guest now sees memory nodes with performance 
>> > attributes
>> > in HMAT. With this change, we enable the guest kernel to construct
>> > the correct memory tiering for the memory nodes.
>> >
>> > Ho-Ren (Jack) Chuang (1):
>> >   memory tier: acpi/hmat: create CPUless memory tiers after obtaining
>> > HMAT info
>> >
>> >  drivers/acpi/numa/hmat.c |  3 ++
>> >  include/linux/memory-tiers.h |  6 +++
>> >  mm/memory-tiers.c| 76 
>> >  3 files changed, 77 insertions(+), 8 deletions(-)
>>
>> --
>> Best Regards,
>> Huang, Ying



[PATCH v3] target/loongarch: Add TCG macro in structure CPUArchState

2024-03-04 Thread Bibo Mao
In structure CPUArchState some struct elements are only used in TCG
mode and are not used in KVM mode. Macro CONFIG_TCG is added to make
this simpler in KVM mode; the same modification is made in the C code
where these struct elements are used.

When the VM runs in KVM mode, TLB entries are not used and do not need
to be migrated. They are only useful when it runs in TCG mode.

Signed-off-by: Bibo Mao 
---
v2 --> v3:
- Remove print info about fp_status in loongarch_cpu_dump_state() since
it is always zero.
- Return tcg_enabled() directly in tlb_needed()

v1 --> v2:
- Add field needed in structure vmstate_tlb, dynamically judge whether
tlb should be migrated, since mostly qemu-system-loongarch64 is compiled
with both kvm and tcg accl enabled.

Signed-off-by: Bibo Mao 
---
 target/loongarch/cpu.c|  7 +--
 target/loongarch/cpu.h| 16 ++--
 target/loongarch/cpu_helper.c |  9 +
 target/loongarch/machine.c| 30 +-
 4 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index bc2684179f..6d0349ded2 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -517,7 +517,9 @@ static void loongarch_cpu_reset_hold(Object *obj)
 lacc->parent_phases.hold(obj);
 }
 
+#ifdef CONFIG_TCG
 env->fcsr0_mask = FCSR0_M1 | FCSR0_M2 | FCSR0_M3;
+#endif
 env->fcsr0 = 0x0;
 
 int n;
@@ -562,7 +564,9 @@ static void loongarch_cpu_reset_hold(Object *obj)
 
 #ifndef CONFIG_USER_ONLY
 env->pc = 0x1c00;
+#ifdef CONFIG_TCG
 memset(env->tlb, 0, sizeof(env->tlb));
+#endif
 if (kvm_enabled()) {
 kvm_arch_reset_vcpu(env);
 }
@@ -699,8 +703,7 @@ void loongarch_cpu_dump_state(CPUState *cs, FILE *f, int flags)
 int i;
 
 qemu_fprintf(f, " PC=%016" PRIx64 " ", env->pc);
-qemu_fprintf(f, " FCSR0 0x%08x  fp_status 0x%02x\n", env->fcsr0,
- get_float_exception_flags(&env->fp_status));
+qemu_fprintf(f, " FCSR0 0x%08x\n", env->fcsr0);
 
 /* gpr */
 for (i = 0; i < 32; i++) {
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index ec37579fd6..c25ad112b1 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -272,6 +272,7 @@ union fpr_t {
 VReg  vreg;
 };
 
+#ifdef CONFIG_TCG
 struct LoongArchTLB {
 uint64_t tlb_misc;
 /* Fields corresponding to CSR_TLBELO0/1 */
@@ -279,23 +280,18 @@ struct LoongArchTLB {
 uint64_t tlb_entry1;
 };
 typedef struct LoongArchTLB LoongArchTLB;
+#endif
 
 typedef struct CPUArchState {
 uint64_t gpr[32];
 uint64_t pc;
 
 fpr_t fpr[32];
-float_status fp_status;
 bool cf[8];
-
 uint32_t fcsr0;
-uint32_t fcsr0_mask;
 
 uint32_t cpucfg[21];
 
-uint64_t lladdr; /* LL virtual address compared against SC */
-uint64_t llval;
-
 /* LoongArch CSRs */
 uint64_t CSR_CRMD;
 uint64_t CSR_PRMD;
@@ -352,8 +348,16 @@ typedef struct CPUArchState {
 uint64_t CSR_DERA;
 uint64_t CSR_DSAVE;
 
+#ifdef CONFIG_TCG
+float_status fp_status;
+uint32_t fcsr0_mask;
+uint64_t lladdr; /* LL virtual address compared against SC */
+uint64_t llval;
+#endif
 #ifndef CONFIG_USER_ONLY
+#ifdef CONFIG_TCG
 LoongArchTLB  tlb[LOONGARCH_TLB_MAX];
+#endif
 
 AddressSpace *address_space_iocsr;
 bool load_elf;
diff --git a/target/loongarch/cpu_helper.c b/target/loongarch/cpu_helper.c
index 45f821d086..d1cdbe30ba 100644
--- a/target/loongarch/cpu_helper.c
+++ b/target/loongarch/cpu_helper.c
@@ -11,6 +11,7 @@
 #include "internals.h"
 #include "cpu-csr.h"
 
+#ifdef CONFIG_TCG
 static int loongarch_map_tlb_entry(CPULoongArchState *env, hwaddr *physical,
int *prot, target_ulong address,
int access_type, int index, int mmu_idx)
@@ -154,6 +155,14 @@ static int loongarch_map_address(CPULoongArchState *env, hwaddr *physical,
 
 return TLBRET_NOMATCH;
 }
+#else
+static int loongarch_map_address(CPULoongArchState *env, hwaddr *physical,
+ int *prot, target_ulong address,
+ MMUAccessType access_type, int mmu_idx)
+{
+return TLBRET_NOMATCH;
+}
+#endif
 
 static hwaddr dmw_va2pa(CPULoongArchState *env, target_ulong va,
 target_ulong dmw)
diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
index c7029fb9b4..9cd9e848d6 100644
--- a/target/loongarch/machine.c
+++ b/target/loongarch/machine.c
@@ -8,6 +8,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "migration/cpu.h"
+#include "sysemu/tcg.h"
 #include "vec.h"
 
 static const VMStateDescription vmstate_fpu_reg = {
@@ -109,9 +110,15 @@ static const VMStateDescription vmstate_lasx = {
 },
 };
 
+#if defined(CONFIG_TCG) && !defined(CONFIG_USER_ONLY)
+static bool tlb_needed(void *opaque)
+{
+return tcg_enabled();
+}
+
 /* TLB state */
-const VMStateDescription vmstate_tlb = {
-.name = "cpu/tlb",
+static 

Re: [External] Re: [PATCH v1 0/1] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-04 Thread Ho-Ren (Jack) Chuang
On Sun, Mar 3, 2024 at 6:47 PM Huang, Ying  wrote:
>
> "Ho-Ren (Jack) Chuang"  writes:
>
> > The memory tiering component in the kernel is functionally useless for
> > CPUless memory/non-DRAM devices like CXL1.1 type3 memory because the nodes
> > are lumped together in the DRAM tier.
> > https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/
>
> I think that it's unfair to call it "useless".  Yes, it doesn't work if
> the CXL memory device are not enumerate via drivers/dax/kmem.c.  So,
> please be specific about in which cases it doesn't work instead of too
> general "useless".
>

Thank you and I didn't mean anything specific. I simply reused phrases
we discussed
earlier in the previous patchset. I will change them to the following in v2:
"At boot time, current memory tiering assigns all detected memory nodes
to the same DRAM tier. This results in CPUless memory/non-DRAM devices,
such as CXL1.1 type3 memory, being unable to be assigned to the
correct memory tier,
leading to the inability to migrate pages between different types of memory."

Please see if this looks more specific.

> > This patchset automatically resolves the issues. It delays the 
> > initialization
> > of memory tiers for CPUless NUMA nodes until they obtain HMAT information
> > at boot time, eliminating the need for user intervention.
> > If no HMAT specified, it falls back to using `default_dram_type`.
> >
> > Example usecase:
> > We have CXL memory on the host, and we create VMs with a new system memory
> > device backed by host CXL memory. We inject CXL memory performance 
> > attributes
> > through QEMU, and the guest now sees memory nodes with performance 
> > attributes
> > in HMAT. With this change, we enable the guest kernel to construct
> > the correct memory tiering for the memory nodes.
> >
> > Ho-Ren (Jack) Chuang (1):
> >   memory tier: acpi/hmat: create CPUless memory tiers after obtaining
> > HMAT info
> >
> >  drivers/acpi/numa/hmat.c |  3 ++
> >  include/linux/memory-tiers.h |  6 +++
> >  mm/memory-tiers.c| 76 
> >  3 files changed, 77 insertions(+), 8 deletions(-)
>
> --
> Best Regards,
> Huang, Ying

-- 
---
Best regards,
Ho-Ren (Jack) Chuang
莊賀任



Re: [PATCH V2 1/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-04 Thread lixianglai

Hi Richard:

On 3/4/24 17:51, Xianglai Li wrote:

When we use QEMU TCG emulation, the page size of the BIOS is 4KB.
When a level 2 super-large page (page size 1G) is used to create the
page table, the content of the corresponding address space is found to
be abnormal, so the BIOS cannot start the operating system and the
graphical interface normally.


The lddir and ldpte instruction emulation has a problem handling
super-large pages above level 2: the page size is not calculated
correctly, so the TLB entry that is found has the wrong page size.

Signed-off-by: Xianglai Li 
Cc: maob...@loongson.cn
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: zhaotian...@loongson.cn
---
  target/loongarch/cpu.h    |  1 +
  target/loongarch/tcg/tlb_helper.c | 21 -
  2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index ec37579fd6..eab3e41c71 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -292,6 +292,7 @@ typedef struct CPUArchState {
  uint32_t fcsr0_mask;
    uint32_t cpucfg[21];
+    uint32_t lddir_ps;


This magical cpu state does not appear in the manual.


The hardware instruction manual is hosted on github at

https://github.com/loongson/LoongArch-Documentation

Are you sure that large pages above level 2 are really supported by 
LDDIR?



Yes, we have done tests on the physical CPU of loongarch64 and
it works fine with a level 2 large page on the physical CPU.




Some explanation from the hardware engineering side is required.


The description of lddir hardware manual is as follows:


Instruction formats:

lddir rd, rj, level

The LDDIR instruction is used for accessing directory entries during
software page table walking.

If bit [6] of the general register rj is 0, it means that the content
of rj is the physical address of the base address of the level page
table at this time. In this case, the LDDIR instruction will access the
level page table according to the current TLB refill address, retrieve
the base address of the corresponding level+1 page table, and write it
to the general register rd.


reference:

https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html


 4.2.5.1. LDDIR

Thanks,

Xianglai.




r~


Re: [PATCH v7 7/9] misc: Add a pca9554 GPIO device model

2024-03-04 Thread Cédric Le Goater

On 3/4/24 23:32, Paolo Bonzini wrote:

On 1/25/24 23:48, Glenn Miles wrote:

Specs are available here:

 https://www.nxp.com/docs/en/data-sheet/PCA9554_9554A.pdf

This is a simple model supporting the basic registers for GPIO
mode.  The device also supports an interrupt output line but the
model does not yet support this.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Glenn Miles 
---

No changes from previous version

  MAINTAINERS    |  10 +-
  hw/misc/pca9554.c  | 328 +


Not a huge deal, but this should have been in hw/gpio.


I started by putting the pca9552 model under misc :/

Glenn,

FYI, there are 2 issues reported by Coverity. See below. Maybe we could
move both files under gpio while at it?

Thanks,

C.



** CID 1534919:  Error handling issues  (CHECKED_RETURN)
/builds/bonzini/qemu/hw/misc/pca9554.c: 171 in pca9554_get_pin()



*** CID 1534919:  Error handling issues  (CHECKED_RETURN)
/builds/bonzini/qemu/hw/misc/pca9554.c: 171 in pca9554_get_pin()
165 return;
166 }
167
168 state = pca9554_read(s, PCA9554_CONFIG);
169 state |= pca9554_read(s, PCA9554_OUTPUT);
170 state = (state >> pin) & 0x1;

CID 1534919:  Error handling issues  (CHECKED_RETURN)
Calling "visit_type_str" without checking return value (as is done 
elsewhere 689 out of 740 times).

171 visit_type_str(v, name, (char **)&pin_state[state], errp);
172 }
173
174 static void pca9554_set_pin(Object *obj, Visitor *v, const char *name,
175 void *opaque, Error **errp)
176 {




** CID 1534917:  Integer handling issues  (BAD_SHIFT)
/builds/bonzini/qemu/hw/misc/pca9554.c: 170 in pca9554_get_pin()



*** CID 1534917:  Integer handling issues  (BAD_SHIFT)
/builds/bonzini/qemu/hw/misc/pca9554.c: 170 in pca9554_get_pin()
164 error_setg(errp, "%s invalid pin %s", __func__, name);
165 return;
166 }
167
168 state = pca9554_read(s, PCA9554_CONFIG);
169 state |= pca9554_read(s, PCA9554_OUTPUT);

CID 1534917:  Integer handling issues  (BAD_SHIFT)
In expression "state >> pin", right shifting "state" by more than 7 bits always 
yields zero.  The shift amount, "pin", is as much as 8.

170 state = (state >> pin) & 0x1;
171 visit_type_str(v, name, (char **)&pin_state[state], errp);
172 }
173
174 static void pca9554_set_pin(Object *obj, Visitor *v, const char *name,
175 void *opaque, Error **errp)
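
A possible shape for the fix, with the pin index bounded before the shift
and the visitor's return value checked (a sketch only, reusing the names
from the excerpt above; how the pin index is parsed from "name" is an
assumption here, not the actual patch):

/* Sketch: clamp the pin index and honour the visitor's return value. */
static void pca9554_get_pin(Object *obj, Visitor *v, const char *name,
                            void *opaque, Error **errp)
{
    PCA9554State *s = PCA9554(obj);
    int pin = 0;
    uint8_t state;

    if (sscanf(name, "pin%2d", &pin) != 1 ||
        pin < 0 || pin >= PCA9554_PIN_COUNT) {
        error_setg(errp, "%s invalid pin %s", __func__, name);
        return;
    }

    state = pca9554_read(s, PCA9554_CONFIG);
    state |= pca9554_read(s, PCA9554_OUTPUT);
    state = (state >> pin) & 0x1;
    if (!visit_type_str(v, name, (char **)&pin_state[state], errp)) {
        return;
    }
}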





Paolo


  include/hw/misc/pca9554.h  |  36 
  include/hw/misc/pca9554_regs.h |  19 ++
  4 files changed, 391 insertions(+), 2 deletions(-)
  create mode 100644 hw/misc/pca9554.c
  create mode 100644 include/hw/misc/pca9554.h
  create mode 100644 include/hw/misc/pca9554_regs.h

diff --git a/MAINTAINERS b/MAINTAINERS
index dfaca8323e..51861e3c7d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1169,9 +1169,7 @@ R: Joel Stanley 
  L: qemu-...@nongnu.org
  S: Maintained
  F: hw/*/*aspeed*
-F: hw/misc/pca9552.c
  F: include/hw/*/*aspeed*
-F: include/hw/misc/pca9552*.h
  F: hw/net/ftgmac100.c
  F: include/hw/net/ftgmac100.h
  F: docs/system/arm/aspeed.rst
@@ -1540,6 +1538,14 @@ F: include/hw/pci-host/pnv*
  F: pc-bios/skiboot.lid
  F: tests/qtest/pnv*
+pca955x
+M: Glenn Miles 
+L: qemu-...@nongnu.org
+L: qemu-...@nongnu.org
+S: Odd Fixes
+F: hw/misc/pca955*.c
+F: include/hw/misc/pca955*.h
+
  virtex_ml507
  M: Edgar E. Iglesias 
  L: qemu-...@nongnu.org
diff --git a/hw/misc/pca9554.c b/hw/misc/pca9554.c
new file mode 100644
index 00..778b32e443
--- /dev/null
+++ b/hw/misc/pca9554.c
@@ -0,0 +1,328 @@
+/*
+ * PCA9554 I/O port
+ *
+ * Copyright (c) 2023, IBM Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qemu/bitops.h"
+#include "hw/qdev-properties.h"
+#include "hw/misc/pca9554.h"
+#include "hw/misc/pca9554_regs.h"
+#include "hw/irq.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "trace.h"
+#include "qom/object.h"
+
+struct PCA9554Class {
+    /*< private >*/
+    I2CSlaveClass parent_class;
+    /*< public >*/
+};
+typedef struct PCA9554Class PCA9554Class;
+
+DECLARE_CLASS_CHECKERS(PCA9554Class, PCA9554,
+   TYPE_PCA9554)
+
+#define PCA9554_PIN_LOW  0x0
+#define PCA9554_PIN_HIZ  0x1
+
+static const char *pin_state[] = {"low", "high"};
+
+static void pca9554_update_pin_input(PCA9554State *s)
+{
+    int i;
+    uint8_t config = s->regs[PCA9554_CONFIG];
+    uint8_t output = s->regs[PCA9554_OUTPUT];
+    uint8_t internal_state = config | output;
+
+    for (i = 0; i < PCA9554_PIN_COUNT; i++) {
+    



Re: [PATCH 14/19] smbios: in case of entry point is 'auto' try to build v2 tables 1st

2024-03-04 Thread Ani Sinha



> On 27-Feb-2024, at 21:17, Igor Mammedov  wrote:
> 
> QEMU for some time now uses SMBIOS 3.0 for PC/Q35 machines by
> default, however Windows has a bug in locating SMBIOS 3.0
> entrypoint and fails to find tables when booted on SeaBIOS
> (on UEFI SMBIOS 3.0 tables work fine since firmware hands
> over tables in another way)
> 
> Missing SMBIOS tables may lead to some issues for the guest
> though (the worst are: possible reactivation, and the inability to
> get virtio drivers from 'Windows Update')
> 
> It's unclear at this point if MS will fix the issue on their
> side. So instead of that (or rather in addition), this patch
> will try to work around the issue.
> 
> aka, use smbios-entry-point-type=auto to make QEMU try
> generating conservative SMBIOS 2.0 tables and if that
> fails (due to limits/requested configuration) fallback
> to SMBIOS 3.0 tables.
> 
> With this in place the majority of users will use SMBIOS 2.0
> tables, which work fine with (Windows + legacy BIOS).
> The configurations that are not possible to describe
> with SMBIOS 2.0 will switch automatically to SMBIOS 3.0
> (which will trigger the Windows bug, but there is nothing
> QEMU can do here, so go and ask Microsoft for a real fix).
> 
> Signed-off-by: Igor Mammedov 

Reviewed-by: Ani Sinha 

> ---
> hw/smbios/smbios.c | 52 +++---
> 1 file changed, 49 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> index 5a791fd9eb..e54a9f21e6 100644
> --- a/hw/smbios/smbios.c
> +++ b/hw/smbios/smbios.c
> @@ -959,7 +959,7 @@ static void smbios_entry_point_setup(SmbiosEntryPointType ep_type)
> }
> }
> 
> -void smbios_get_tables(MachineState *ms,
> +static bool smbios_get_tables_ep(MachineState *ms,
>SmbiosEntryPointType ep_type,
>const struct smbios_phys_mem_area *mem_array,
>const unsigned int mem_array_size,
> @@ -968,6 +968,7 @@ void smbios_get_tables(MachineState *ms,
>Error **errp)
> {
> unsigned i, dimm_cnt, offset;
> +ERRP_GUARD();
> 
> assert(ep_type == SMBIOS_ENTRY_POINT_TYPE_32 ||
>ep_type == SMBIOS_ENTRY_POINT_TYPE_64);
> @@ -1052,11 +1053,56 @@ void smbios_get_tables(MachineState *ms,
> abort();
> }
> 
> -return;
> +return true;
> err_exit:
> g_free(smbios_tables);
> smbios_tables = NULL;
> -return;
> +return false;
> +}
> +
> +void smbios_get_tables(MachineState *ms,
> +   SmbiosEntryPointType ep_type,
> +   const struct smbios_phys_mem_area *mem_array,
> +   const unsigned int mem_array_size,
> +   uint8_t **tables, size_t *tables_len,
> +   uint8_t **anchor, size_t *anchor_len,
> +   Error **errp)
> +{
> +Error *local_err = NULL;
> +bool is_valid;
> +ERRP_GUARD();
> +
> +switch (ep_type) {
> +case SMBIOS_ENTRY_POINT_TYPE_AUTO:
> +case SMBIOS_ENTRY_POINT_TYPE_32:
> +is_valid = smbios_get_tables_ep(ms, SMBIOS_ENTRY_POINT_TYPE_32,
> +mem_array, mem_array_size,
> +tables, tables_len,
> +anchor, anchor_len,
> +&local_err);
> +if (is_valid || ep_type != SMBIOS_ENTRY_POINT_TYPE_AUTO) {
> +break;
> +}
> +/*
> + * fall through in case AUTO endpoint is selected and
> + * SMBIOS 2.x tables can't be generated, to try if SMBIOS 3.x
> + * tables would work
> + */
> +case SMBIOS_ENTRY_POINT_TYPE_64:
> +error_free(local_err);
> +local_err = NULL;
> +is_valid = smbios_get_tables_ep(ms, SMBIOS_ENTRY_POINT_TYPE_64,
> +mem_array, mem_array_size,
> +tables, tables_len,
> +anchor, anchor_len,
> +&local_err);
> +break;
> +default:
> +abort();
> +}
> +if (!is_valid) {
> +error_propagate(errp, local_err);
> +}
> }
> 
> static void save_opt(const char **dest, QemuOpts *opts, const char *name)
> -- 
> 2.39.3
> 




Re: [PATCH v7 2/2] hw/acpi: Implement the SRAT GI affinity structure

2024-03-04 Thread Ankit Agrawal
> One thing I forgot.
>
> Please add a test.  tests/qtest/bios-tables-test.c
> + relevant table dumps.

Here I need to add a test that creates a vfio-pci device and NUMA
nodes and links them using the acpi-generic-initiator object. One thing
here is that the -device vfio-pci needs a host= argument. I
probably cannot provide the device BDF from my local setup. So
I am not sure how I can add this test to tests/qtest/bios-tables-test.c.
FYI, the following is a sample args we use for the
acpi-generic-initiator object.

   -numa node,nodeid=2
   -device vfio-pci-nohotplug,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
   -object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \

Moreover, based on a quick grep, I don't see any other test that
has a -device vfio-pci argument.

Jonathan, Alex, do you know how we may add tests that depend
on the vfio-pci device?
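
One pattern I was considering (just a sketch, with an assumed
QTEST_VFIO_PCI_BDF environment variable; the test body and name are
placeholders) is to skip the test unless a host device is explicitly
provided and vfio-pci is actually available:

/* Sketch only: skip gracefully when no VFIO device is available. */
static void test_acpi_q35_tcg_acpi_generic_initiator(void)
{
    const char *bdf = getenv("QTEST_VFIO_PCI_BDF");
    g_autofree char *params = NULL;

    if (!bdf || !qtest_has_device("vfio-pci-nohotplug")) {
        g_test_skip("no VFIO PCI device available for this test");
        return;
    }

    params = g_strdup_printf(
        " -numa node,nodeid=2"
        " -device vfio-pci-nohotplug,host=%s,bus=pcie.0,addr=04.0,"
        "rombar=0,id=dev0"
        " -object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2", bdf);
    /* ... run the usual bios-tables-test machinery with 'params' ... */
}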


Re: [RISC-V][tech-server-soc] [RFC 2/2] target/riscv: Add server platform reference cpu

2024-03-04 Thread Wu, Fei
On 3/5/2024 3:43 AM, Daniel Henrique Barboza wrote:
> 
> 
> On 3/4/24 07:25, Fei Wu wrote:
>> The hart requirements of the RISC-V server platform [1] require RVA23 ISA
>> profile support, plus Sv48, Svadu, H, Sscofpmf, etc. This patch provides
>> a virt CPU type (rvsp-ref) that is as compliant as possible.
>>
>> [1]
>> https://github.com/riscv-non-isa/riscv-server-platform/blob/main/server_platform_requirements.adoc
>>
>> Signed-off-by: Fei Wu 
>> ---
>>   hw/riscv/server_platform_ref.c |  6 +++-
>>   target/riscv/cpu-qom.h |  1 +
>>   target/riscv/cpu.c | 62 ++
>>   3 files changed, 68 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/riscv/server_platform_ref.c
>> b/hw/riscv/server_platform_ref.c
>> index ae90c4b27a..52ec607cee 100644
>> --- a/hw/riscv/server_platform_ref.c
>> +++ b/hw/riscv/server_platform_ref.c
>> @@ -1205,11 +1205,15 @@ static void
>> rvsp_ref_machine_class_init(ObjectClass *oc, void *data)
>>   {
>>   char str[128];
>>   MachineClass *mc = MACHINE_CLASS(oc);
>> +    static const char * const valid_cpu_types[] = {
>> +    TYPE_RISCV_CPU_RVSP_REF,
>> +    };
>>     mc->desc = "RISC-V Server SoC Reference board";
>>   mc->init = rvsp_ref_machine_init;
>>   mc->max_cpus = RVSP_CPUS_MAX;
>> -    mc->default_cpu_type = TYPE_RISCV_CPU_BASE;
>> +    mc->default_cpu_type = TYPE_RISCV_CPU_RVSP_REF;
>> +    mc->valid_cpu_types = valid_cpu_types;
> 
> I suggest introducing this patch first, then the new machine type that
> will use it as a default
> CPU. The reason is to facilitate future bisects. If we introduce the
> board first, a future bisect
> might hit the previous patch, the board will be run using RV64 instead
> of the correct CPU, and
> we'll have different results because of it.
> 
Good suggestion.

>>   mc->pci_allow_0_address = true;
>>   mc->default_nic = "e1000e";
>>   mc->possible_cpu_arch_ids = riscv_numa_possible_cpu_arch_ids;
>> diff --git a/target/riscv/cpu-qom.h b/target/riscv/cpu-qom.h
>> index 3670cfe6d9..adb934d19e 100644
>> --- a/target/riscv/cpu-qom.h
>> +++ b/target/riscv/cpu-qom.h
>> @@ -49,6 +49,7 @@
>>   #define TYPE_RISCV_CPU_SIFIVE_U54  
>> RISCV_CPU_TYPE_NAME("sifive-u54")
>>   #define TYPE_RISCV_CPU_THEAD_C906  
>> RISCV_CPU_TYPE_NAME("thead-c906")
>>   #define TYPE_RISCV_CPU_VEYRON_V1   
>> RISCV_CPU_TYPE_NAME("veyron-v1")
>> +#define TYPE_RISCV_CPU_RVSP_REF RISCV_CPU_TYPE_NAME("rvsp-ref")
>>   #define TYPE_RISCV_CPU_HOST RISCV_CPU_TYPE_NAME("host")
>>     OBJECT_DECLARE_CPU_TYPE(RISCVCPU, RISCVCPUClass, RISCV_CPU)
>> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
>> index 5ff0192c52..bc91be702b 100644
>> --- a/target/riscv/cpu.c
>> +++ b/target/riscv/cpu.c
>> @@ -2282,6 +2282,67 @@ static void rva22s64_profile_cpu_init(Object *obj)
>>     RVA22S64.enabled = true;
>>   }
>> +
>> +static void rv64_rvsp_ref_cpu_init(Object *obj)
>> +{
>> +    CPURISCVState *env = &RISCV_CPU(obj)->env;
>> +    RISCVCPU *cpu = RISCV_CPU(obj);
>> +
>> +    riscv_cpu_set_misa_ext(env, RVG | RVC | RVS | RVU | RVH | RVV);
>> +
>> +    /* FIXME: change to 1.13 */
>> +    env->priv_ver = PRIV_VERSION_1_12_0;
>> +
>> +    /* RVA22U64 */
>> +    cpu->cfg.mmu = true;
>> +    cpu->cfg.ext_zifencei = true;
>> +    cpu->cfg.ext_zicsr = true;
>> +    cpu->cfg.ext_zicntr = true;
>> +    cpu->cfg.ext_zihpm = true;
>> +    cpu->cfg.ext_zihintpause = true;
>> +    cpu->cfg.ext_zba = true;
>> +    cpu->cfg.ext_zbb = true;
>> +    cpu->cfg.ext_zbs = true;
>> +    cpu->cfg.zic64b = true;
>> +    cpu->cfg.ext_zicbom = true;
>> +    cpu->cfg.ext_zicbop = true;
>> +    cpu->cfg.ext_zicboz = true;
>> +    cpu->cfg.cbom_blocksize = 64;
>> +    cpu->cfg.cbop_blocksize = 64;
>> +    cpu->cfg.cboz_blocksize = 64;
>> +    cpu->cfg.ext_zfhmin = true;
>> +    cpu->cfg.ext_zkt = true;
> 
> You can change this whole block with:
> 
> RVA22U64.enabled = true;
> 
> 
> riscv_cpu_add_profiles() will check if we have a profile enabled and, if
> that's the
> case, we'll enable all its extensions in the CPU.
> 
> In the near future, when we implement a proper RVA23 support, we'll be
> able to just do
> a single RVA23S64.enabled = true in this cpu_init(). But for now we can
> at least declare
> RVA22U64 (perhaps RVA22S64) support for this CPU.
> 
Let me try.
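
For reference, the profile-based shape would look roughly like this (a
sketch only, reusing the RVA22S64 profile object visible in the quoted
hunk; the extensions beyond RVA22 would still be set by hand, and the
list below is just an excerpt):

/* Sketch of the suggested shape, not the final patch. */
static void rv64_rvsp_ref_cpu_init(Object *obj)
{
    CPURISCVState *env = &RISCV_CPU(obj)->env;
    RISCVCPU *cpu = RISCV_CPU(obj);

    riscv_cpu_set_misa_ext(env, RVG | RVC | RVS | RVU | RVH | RVV);
    env->priv_ver = PRIV_VERSION_1_12_0;

    /* Pull in everything mandated by the RVA22 profiles. */
    RVA22S64.enabled = true;

    /* Extensions beyond RVA22 (RVA23 candidates) still set explicitly. */
    cpu->cfg.ext_zvbb = true;
    cpu->cfg.ext_zicond = true;
    cpu->cfg.ext_sscofpmf = true;
    /* ... */
}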

Thanks,
Fei.

> 
> Thanks,
> 
> Daniel
> 
> 
>> +
>> +    /* RVA23U64 */
>> +    cpu->cfg.ext_zvfhmin = true;
>> +    cpu->cfg.ext_zvbb = true;
>> +    cpu->cfg.ext_zvkt = true;
>> +    cpu->cfg.ext_zihintntl = true;
>> +    cpu->cfg.ext_zicond = true;
>> +    cpu->cfg.ext_zcb = true;
>> +    cpu->cfg.ext_zfa = true;
>> +    cpu->cfg.ext_zawrs = true;
>> +
>> +    /* RVA23S64 */
>> +    cpu->cfg.ext_zifencei = true;
>> +    cpu->cfg.svade = true;
>> +    cpu->cfg.ext_svpbmt = true;
>> +    cpu->cfg.ext_svinval = true;
>> +    cpu->cfg.ext_svnapot = true;
>> +    cpu->cfg.ext_sstc = true;
>> +    cpu->cfg.ext_sscofpmf = true;
>> +    cpu->cfg.ext_smstateen = 

Re: [RFC 1/2] hw/riscv: Add server platform reference machine

2024-03-04 Thread Wu, Fei
On 3/5/2024 3:35 AM, Daniel Henrique Barboza wrote:
> 
> 
> On 3/4/24 07:25, Fei Wu wrote:
>> The RISC-V Server Platform specification [1] defines a standardized set
>> of hardware and software capabilities that portable system software,
>> such as OSes and hypervisors, can rely on being present in a RISC-V
>> server platform.
>>
>> A corresponding QEMU RISC-V server platform reference (rvsp-ref for
>> short) machine type is added to provide an environment for firmware/OS
>> development and testing. The main features included in rvsp-ref are:
>>
>>   - Based on riscv virt machine type
>>   - A new memory map as close as virt machine as possible
>>   - A new virt CPU type rvsp-ref-cpu for server platform compliance
>>   - AIA
>>   - PCIe AHCI
>>   - PCIe NIC
>>   - No virtio device
>>   - No fw_cfg device
>>   - No ACPI table provided
>>   - Only minimal device tree nodes
>>
>> [1] https://github.com/riscv-non-isa/riscv-server-platform
>>
>> Signed-off-by: Fei Wu 
>> ---
>>   configs/devices/riscv64-softmmu/default.mak |    1 +
>>   hw/riscv/Kconfig    |   13 +
>>   hw/riscv/meson.build    |    1 +
>>   hw/riscv/server_platform_ref.c  | 1244 +++
>>   4 files changed, 1259 insertions(+)
>>   create mode 100644 hw/riscv/server_platform_ref.c
>>
>> diff --git a/configs/devices/riscv64-softmmu/default.mak
>> b/configs/devices/riscv64-softmmu/default.mak
>> index 3f68059448..a1d98e49ef 100644
>> --- a/configs/devices/riscv64-softmmu/default.mak
>> +++ b/configs/devices/riscv64-softmmu/default.mak
>> @@ -10,5 +10,6 @@ CONFIG_SPIKE=y
>>   CONFIG_SIFIVE_E=y
>>   CONFIG_SIFIVE_U=y
>>   CONFIG_RISCV_VIRT=y
>> +CONFIG_SERVER_PLATFORM_REF=y
>>   CONFIG_MICROCHIP_PFSOC=y
>>   CONFIG_SHAKTI_C=y
>> diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
>> index 5d644eb7b1..debac5a7f5 100644
>> --- a/hw/riscv/Kconfig
>> +++ b/hw/riscv/Kconfig
>> @@ -48,6 +48,19 @@ config RISCV_VIRT
>>   select ACPI
>>   select ACPI_PCI
>>   +config SERVER_PLATFORM_REF
>> +    bool
>> +    select RISCV_NUMA
>> +    select GOLDFISH_RTC
>> +    select PCI
>> +    select PCI_EXPRESS_GENERIC_BRIDGE
>> +    select PFLASH_CFI01
>> +    select SERIAL
>> +    select RISCV_ACLINT
>> +    select RISCV_APLIC
>> +    select RISCV_IMSIC
>> +    select SIFIVE_TEST
>> +
>>   config SHAKTI_C
>>   bool
>>   select RISCV_ACLINT
>> diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build
>> index 2f7ee81be3..bb3aff91ea 100644
>> --- a/hw/riscv/meson.build
>> +++ b/hw/riscv/meson.build
>> @@ -4,6 +4,7 @@ riscv_ss.add(when: 'CONFIG_RISCV_NUMA', if_true:
>> files('numa.c'))
>>   riscv_ss.add(files('riscv_hart.c'))
>>   riscv_ss.add(when: 'CONFIG_OPENTITAN', if_true: files('opentitan.c'))
>>   riscv_ss.add(when: 'CONFIG_RISCV_VIRT', if_true: files('virt.c'))
>> +riscv_ss.add(when: 'CONFIG_SERVER_PLATFORM_REF', if_true:
>> files('server_platform_ref.c'))
>>   riscv_ss.add(when: 'CONFIG_SHAKTI_C', if_true: files('shakti_c.c'))
>>   riscv_ss.add(when: 'CONFIG_SIFIVE_E', if_true: files('sifive_e.c'))
>>   riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: files('sifive_u.c'))
>> diff --git a/hw/riscv/server_platform_ref.c
>> b/hw/riscv/server_platform_ref.c
>> new file mode 100644
>> index 00..ae90c4b27a
>> --- /dev/null
>> +++ b/hw/riscv/server_platform_ref.c
>> @@ -0,0 +1,1244 @@
>> +/*
>> + * QEMU RISC-V Server Platform (RVSP) Reference Board
>> + *
>> + * Copyright (c) 2024 Intel, Inc.
>> + *
>> + * This board is compliant with the RISC-V Server Platform specification
>> + * and leverages a lot of riscv virt code.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but
>> WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
>> License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> along with
>> + * this program.  If not, see .
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/units.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/guest-random.h"
>> +#include "qapi/error.h"
>> +#include "qapi/qapi-visit-common.h"
>> +#include "hw/boards.h"
>> +#include "hw/loader.h"
>> +#include "hw/sysbus.h"
>> +#include "hw/qdev-properties.h"
>> +#include "hw/char/serial.h"
>> +#include "hw/block/flash.h"
>> +#include "hw/ide/pci.h"
>> +#include "hw/ide/ahci-pci.h"
>> +#include "hw/pci/pci.h"
>> +#include "hw/pci-host/gpex.h"
>> +#include "hw/core/sysbus-fdt.h"
>> +#include "hw/riscv/riscv_hart.h"
>> +#include "hw/riscv/boot.h"
>> +#include "hw/riscv/numa.h"
>> +#include "hw/intc/riscv_aclint.h"
>> +#include "hw/intc/riscv_aplic.h"
>> +#include 

Re: [RFC PATCH] tests: bump QOS_PATH_MAX_ELEMENT_SIZE again

2024-03-04 Thread Thomas Huth

On 04/03/2024 20.37, Alex Bennée wrote:

We "fixed" a bug with LTO builds with 100c459f194 (tests/qtest: bump
up QOS_PATH_MAX_ELEMENT_SIZE) but it seems it has triggered again.
Let's be more assertive and raise QOS_PATH_MAX_ELEMENT_SIZE to make it go
away again.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1186 (again)
Signed-off-by: Alex Bennée 
---
  tests/qtest/libqos/qgraph.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/libqos/qgraph.h b/tests/qtest/libqos/qgraph.h
index 287022a67c1..1b5de02e7be 100644
--- a/tests/qtest/libqos/qgraph.h
+++ b/tests/qtest/libqos/qgraph.h
@@ -24,7 +24,7 @@
  #include "libqos-malloc.h"
  
  /* maximum path length */

-#define QOS_PATH_MAX_ELEMENT_SIZE 64
+#define QOS_PATH_MAX_ELEMENT_SIZE 128



Reviewed-by: Thomas Huth 




[PATCH v4 6/8] migration/multifd: implement qpl compression and decompression

2024-03-04 Thread Yuan Liu
Each qpl job is used to (de)compress a normal page and can be
processed independently by the IAA hardware. All qpl jobs are
submitted to the hardware at once, and then we wait for all jobs
to complete.
Signed-off-by: Yuan Liu 
Reviewed-by: Nanhai Zou 
---
 migration/multifd-qpl.c | 219 +++-
 1 file changed, 215 insertions(+), 4 deletions(-)

diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
index f4db97ca01..eb815ea3be 100644
--- a/migration/multifd-qpl.c
+++ b/migration/multifd-qpl.c
@@ -167,6 +167,112 @@ static void qpl_send_cleanup(MultiFDSendParams *p, Error **errp)
 p->data = NULL;
 }
 
+static inline void prepare_job(qpl_job *job, uint8_t *input, uint32_t input_len,
+   uint8_t *output, uint32_t output_len,
+   bool is_compression)
+{
+job->op = is_compression ? qpl_op_compress : qpl_op_decompress;
+job->next_in_ptr = input;
+job->next_out_ptr = output;
+job->available_in = input_len;
+job->available_out = output_len;
+job->flags = QPL_FLAG_FIRST | QPL_FLAG_LAST | QPL_FLAG_OMIT_VERIFY;
+/* only supports one compression level */
+job->level = 1;
+}
+
+/**
+ * set_raw_data_hdr: set the length of raw data
+ *
+ * If the length of the compressed output data is greater than or equal to
+ * the page size, then set the compressed data length to the data size and
+ * send raw data directly.
+ *
+ * @qpl: pointer to the qpl_data structure
+ * @index: the index of the compression job header
+ */
+static inline void set_raw_data_hdr(struct qpl_data *qpl, uint32_t index)
+{
+assert(index < qpl->job_num);
+qpl->zbuf_hdr[index] = cpu_to_be32(qpl->data_size);
+}
+
+/**
+ * is_raw_data: check if the data is raw data
+ *
+ * The raw data length is always equal to data size, which is the
+ * size of one page.
+ *
+ * Returns true if the data is raw data, otherwise false
+ *
+ * @qpl: pointer to the qpl_data structure
+ * @index: the index of the decompressed job header
+ */
+static inline bool is_raw_data(struct qpl_data *qpl, uint32_t index)
+{
+assert(index < qpl->job_num);
+return qpl->zbuf_hdr[index] == qpl->data_size;
+}
+
+static int run_comp_jobs(MultiFDSendParams *p, Error **errp)
+{
+qpl_status status;
+struct qpl_data *qpl = p->data;
+MultiFDPages_t *pages = p->pages;
+uint32_t job_num = pages->num;
+qpl_job *job = NULL;
+uint32_t off = 0;
+
+assert(job_num <= qpl->job_num);
+/* submit all compression jobs */
+for (int i = 0; i < job_num; i++) {
+job = qpl->job_array[i];
+/* the compressed data size should be less than one page */
+prepare_job(job, pages->block->host + pages->offset[i], qpl->data_size,
+qpl->zbuf + off, qpl->data_size - 1, true);
+retry:
+status = qpl_submit_job(job);
+if (status == QPL_STS_OK) {
+off += qpl->data_size;
+} else if (status == QPL_STS_QUEUES_ARE_BUSY_ERR) {
+goto retry;
+} else {
+error_setg(errp, "multifd %u: qpl_submit_job failed with error %d",
+   p->id, status);
+return -1;
+}
+}
+
+/* wait all jobs to complete */
+for (int i = 0; i < job_num; i++) {
+job = qpl->job_array[i];
+status = qpl_wait_job(job);
+if (status == QPL_STS_OK) {
+qpl->zbuf_hdr[i] = cpu_to_be32(job->total_out);
+p->iov[p->iovs_num].iov_len = job->total_out;
+p->iov[p->iovs_num].iov_base = qpl->zbuf + (qpl->data_size * i);
+p->next_packet_size += job->total_out;
+} else if (status == QPL_STS_MORE_OUTPUT_NEEDED) {
+/*
+ * the compression job does not fail, the output data
+ * size is larger than the provided memory size. In this
+ * case, raw data is sent directly to the destination.
+ */
+set_raw_data_hdr(qpl, i);
+p->iov[p->iovs_num].iov_len = qpl->data_size;
+p->iov[p->iovs_num].iov_base = pages->block->host +
+   pages->offset[i];
+p->next_packet_size += qpl->data_size;
+} else {
+error_setg(errp, "multifd %u: qpl_wait_job failed with error %d",
+   p->id, status);
+return -1;
+}
+p->iovs_num++;
+}
+return 0;
+}
+
 /**
  * qpl_send_prepare: prepare data to be able to send
  *
@@ -180,8 +286,25 @@ static void qpl_send_cleanup(MultiFDSendParams *p, Error **errp)
  */
 static int qpl_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-/* Implement in next patch */
-return -1;
+struct qpl_data *qpl = p->data;
+uint32_t hdr_size = p->pages->num * sizeof(uint32_t);
+
+multifd_send_prepare_header(p);
+
+assert(p->pages->num <= qpl->job_num);
+/* prepare the header that stores the lengths of all compressed data */
+

[PATCH v4 5/8] migration/multifd: implement initialization of qpl compression

2024-03-04 Thread Yuan Liu
The qpl initialization includes memory allocation for the compressed
data and the qpl job initialization.

The qpl initialization checks whether the In-Memory Analytics
Accelerator (IAA) hardware is available; if the platform does not
have IAA hardware or the IAA hardware is not available, the QPL
compression initialization will fail.

Signed-off-by: Yuan Liu 
Reviewed-by: Nanhai Zou 
---
 migration/multifd-qpl.c | 128 ++--
 1 file changed, 122 insertions(+), 6 deletions(-)

diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
index 6b94e732ac..f4db97ca01 100644
--- a/migration/multifd-qpl.c
+++ b/migration/multifd-qpl.c
@@ -33,6 +33,100 @@ struct qpl_data {
 uint32_t *zbuf_hdr;
 };
 
+static void free_zbuf(struct qpl_data *qpl)
+{
+if (qpl->zbuf != NULL) {
+munmap(qpl->zbuf, qpl->job_num * qpl->data_size);
+qpl->zbuf = NULL;
+}
+if (qpl->zbuf_hdr != NULL) {
+g_free(qpl->zbuf_hdr);
+qpl->zbuf_hdr = NULL;
+}
+}
+
+static int alloc_zbuf(struct qpl_data *qpl, uint8_t chan_id, Error **errp)
+{
+int flags = MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS;
+uint32_t size = qpl->job_num * qpl->data_size;
+uint8_t *buf;
+
+buf = (uint8_t *) mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
+if (buf == MAP_FAILED) {
+error_setg(errp, "multifd: %u: alloc_zbuf failed, job num %u, size %u",
+   chan_id, qpl->job_num, qpl->data_size);
+return -1;
+}
+qpl->zbuf = buf;
+qpl->zbuf_hdr = g_new0(uint32_t, qpl->job_num);
+return 0;
+}
+
+static void free_jobs(struct qpl_data *qpl)
+{
+for (int i = 0; i < qpl->job_num; i++) {
+qpl_fini_job(qpl->job_array[i]);
+g_free(qpl->job_array[i]);
+qpl->job_array[i] = NULL;
+}
+g_free(qpl->job_array);
+qpl->job_array = NULL;
+}
+
+static int alloc_jobs(struct qpl_data *qpl, uint8_t chan_id, Error **errp)
+{
+qpl_status status;
+uint32_t job_size = 0;
+qpl_job *job = NULL;
+/* always use IAA hardware accelerator */
+qpl_path_t path = qpl_path_hardware;
+
+status = qpl_get_job_size(path, &job_size);
+if (status != QPL_STS_OK) {
+error_setg(errp, "multifd: %u: qpl_get_job_size failed with error %d",
+   chan_id, status);
+return -1;
+}
+qpl->job_array = g_new0(qpl_job *, qpl->job_num);
+for (int i = 0; i < qpl->job_num; i++) {
+job = g_malloc0(job_size);
+status = qpl_init_job(path, job);
+if (status != QPL_STS_OK) {
+error_setg(errp, "multifd: %u: qpl_init_job failed with error %d",
+   chan_id, status);
+free_jobs(qpl);
+return -1;
+}
+qpl->job_array[i] = job;
+}
+return 0;
+}
+
+static int init_qpl(struct qpl_data *qpl, uint32_t job_num, uint32_t data_size,
+uint8_t chan_id, Error **errp)
+{
+qpl->job_num = job_num;
+qpl->data_size = data_size;
+if (alloc_zbuf(qpl, chan_id, errp) != 0) {
+return -1;
+}
+if (alloc_jobs(qpl, chan_id, errp) != 0) {
+free_zbuf(qpl);
+return -1;
+}
+return 0;
+}
+
+static void deinit_qpl(struct qpl_data *qpl)
+{
+if (qpl != NULL) {
+free_jobs(qpl);
+free_zbuf(qpl);
+qpl->job_num = 0;
+qpl->data_size = 0;
+}
+}
+
 /**
  * qpl_send_setup: setup send side
  *
@@ -45,8 +139,15 @@ struct qpl_data {
  */
 static int qpl_send_setup(MultiFDSendParams *p, Error **errp)
 {
-/* Implement in next patch */
-return -1;
+struct qpl_data *qpl;
+
+qpl = g_new0(struct qpl_data, 1);
+if (init_qpl(qpl, p->page_count, p->page_size, p->id, errp) != 0) {
+g_free(qpl);
+return -1;
+}
+p->data = qpl;
+return 0;
 }
 
 /**
@@ -59,7 +160,11 @@ static int qpl_send_setup(MultiFDSendParams *p, Error **errp)
  */
 static void qpl_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
-/* Implement in next patch */
+struct qpl_data *qpl = p->data;
+
+deinit_qpl(qpl);
+g_free(p->data);
+p->data = NULL;
 }
 
 /**
@@ -91,8 +196,15 @@ static int qpl_send_prepare(MultiFDSendParams *p, Error **errp)
  */
 static int qpl_recv_setup(MultiFDRecvParams *p, Error **errp)
 {
-/* Implement in next patch */
-return -1;
+struct qpl_data *qpl;
+
+qpl = g_new0(struct qpl_data, 1);
+if (init_qpl(qpl, p->page_count, p->page_size, p->id, errp) != 0) {
+g_free(qpl);
+return -1;
+}
+p->data = qpl;
+return 0;
 }
 
 /**
@@ -104,7 +216,11 @@ static int qpl_recv_setup(MultiFDRecvParams *p, Error **errp)
  */
 static void qpl_recv_cleanup(MultiFDRecvParams *p)
 {
-/* Implement in next patch */
+struct qpl_data *qpl = p->data;
+
+deinit_qpl(qpl);
+g_free(p->data);
+p->data = NULL;
 }
 
 /**
-- 
2.39.3




[PATCH v4 2/8] migration/multifd: add get_iov_count in the multifd method

2024-03-04 Thread Yuan Liu
The new function get_iov_count is used to get the number of
IOVs required by a specified multifd method.

Different multifd methods may require different numbers of IOVs.
With the streaming compression of zlib and zstd, all pages are
compressed into one data block, so a single IOV is required to send
this block. With no compression, each IOV is used to send one page,
so the number of IOVs required is the same as the number of pages.

Signed-off-by: Yuan Liu 
Reviewed-by: Nanhai Zou 
---
 migration/multifd-zlib.c | 18 +-
 migration/multifd-zstd.c | 18 +-
 migration/multifd.c  | 24 +---
 migration/multifd.h  |  2 ++
 4 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 012e3bdea1..35187f2aff 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -313,13 +313,29 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
 return 0;
 }
 
+/**
+ * zlib_get_iov_count: get the count of IOVs
+ *
+ * For zlib streaming compression, all pages will be compressed into a data
+ * block, and an IOV is requested for sending this block.
+ *
+ * Returns the count of the IOVs
+ *
+ * @page_count: Indicate the maximum count of pages processed by multifd
+ */
+static uint32_t zlib_get_iov_count(uint32_t page_count)
+{
+return 1;
+}
+
 static MultiFDMethods multifd_zlib_ops = {
 .send_setup = zlib_send_setup,
 .send_cleanup = zlib_send_cleanup,
 .send_prepare = zlib_send_prepare,
 .recv_setup = zlib_recv_setup,
 .recv_cleanup = zlib_recv_cleanup,
-.recv_pages = zlib_recv_pages
+.recv_pages = zlib_recv_pages,
+.get_iov_count = zlib_get_iov_count
 };
 
 static void multifd_zlib_register(void)
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index dc8fe43e94..25ed1add2a 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -304,13 +304,29 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
 return 0;
 }
 
+/**
+ * zstd_get_iov_count: get the count of IOVs
+ *
+ * For zstd streaming compression, all pages will be compressed into a data
+ * block, and an IOV is requested for sending this block.
+ *
+ * Returns the count of the IOVs
+ *
+ * @page_count: Indicate the maximum count of pages processed by multifd
+ */
+static uint32_t zstd_get_iov_count(uint32_t page_count)
+{
+return 1;
+}
+
 static MultiFDMethods multifd_zstd_ops = {
 .send_setup = zstd_send_setup,
 .send_cleanup = zstd_send_cleanup,
 .send_prepare = zstd_send_prepare,
 .recv_setup = zstd_recv_setup,
 .recv_cleanup = zstd_recv_cleanup,
-.recv_pages = zstd_recv_pages
+.recv_pages = zstd_recv_pages,
+.get_iov_count = zstd_get_iov_count
 };
 
 static void multifd_zstd_register(void)
diff --git a/migration/multifd.c b/migration/multifd.c
index adfe8c9a0a..787402247e 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -209,13 +209,29 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
 return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
 }
 
+/**
+ * nocomp_get_iov_count: get the count of IOVs
+ *
+ * For no compression, the count of IOVs required is the same as the count of
+ * pages
+ *
+ * Returns the count of the IOVs
+ *
+ * @page_count: Indicate the maximum count of pages processed by multifd
+ */
+static uint32_t nocomp_get_iov_count(uint32_t page_count)
+{
+return page_count;
+}
+
 static MultiFDMethods multifd_nocomp_ops = {
 .send_setup = nocomp_send_setup,
 .send_cleanup = nocomp_send_cleanup,
 .send_prepare = nocomp_send_prepare,
 .recv_setup = nocomp_recv_setup,
 .recv_cleanup = nocomp_recv_cleanup,
-.recv_pages = nocomp_recv_pages
+.recv_pages = nocomp_recv_pages,
+.get_iov_count = nocomp_get_iov_count
 };
 
 static MultiFDMethods *multifd_ops[MULTIFD_COMPRESSION__MAX] = {
@@ -998,6 +1014,8 @@ bool multifd_send_setup(void)
 Error *local_err = NULL;
 int thread_count, ret = 0;
 uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
+/* We need one extra place for the packet header */
+uint32_t iov_count = 1;
 uint8_t i;
 
 if (!migrate_multifd()) {
@@ -1012,6 +1030,7 @@ bool multifd_send_setup(void)
 qemu_sem_init(&multifd_send_state->channels_ready, 0);
 qatomic_set(&multifd_send_state->exiting, 0);
 multifd_send_state->ops = multifd_ops[migrate_multifd_compression()];
+iov_count += multifd_send_state->ops->get_iov_count(page_count);
 
 for (i = 0; i < thread_count; i++) {
 MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -1026,8 +1045,7 @@ bool multifd_send_setup(void)
 p->packet->magic = cpu_to_be32(MULTIFD_MAGIC);
 p->packet->version = cpu_to_be32(MULTIFD_VERSION);
 p->name = g_strdup_printf("multifdsend_%d", i);
-/* We need one extra place for the packet header */
-p->iov = g_new0(struct iovec, page_count + 1);

[PATCH v4 8/8] tests/migration-test: add qpl compression test

2024-03-04 Thread Yuan Liu
Add qpl to the compression method tests for multifd migration.

Migration with qpl compression needs access to IAA hardware
resources, so please run "check-qtest" with sudo or root permission,
otherwise the migration test will fail.

Signed-off-by: Yuan Liu 
Reviewed-by: Nanhai Zou 
---
 tests/qtest/migration-test.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 23d50fe599..96842f9515 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2653,6 +2653,15 @@ test_migrate_precopy_tcp_multifd_zstd_start(QTestState *from,
 }
 #endif /* CONFIG_ZSTD */
 
+#ifdef CONFIG_QPL
+static void *
+test_migrate_precopy_tcp_multifd_qpl_start(QTestState *from,
+QTestState *to)
+{
+return test_migrate_precopy_tcp_multifd_start_common(from, to, "qpl");
+}
+#endif /* CONFIG_QPL */
+
 static void test_multifd_tcp_none(void)
 {
 MigrateCommon args = {
@@ -2688,6 +2697,17 @@ static void test_multifd_tcp_zstd(void)
 }
 #endif
 
+#ifdef CONFIG_QPL
+static void test_multifd_tcp_qpl(void)
+{
+MigrateCommon args = {
+.listen_uri = "defer",
+.start_hook = test_migrate_precopy_tcp_multifd_qpl_start,
+};
+test_precopy_common(&args);
+}
+#endif
+
 #ifdef CONFIG_GNUTLS
 static void *
 test_migrate_multifd_tcp_tls_psk_start_match(QTestState *from,
@@ -3574,6 +3594,10 @@ int main(int argc, char **argv)
 migration_test_add("/migration/multifd/tcp/plain/zstd",
test_multifd_tcp_zstd);
 #endif
+#ifdef CONFIG_QPL
+migration_test_add("/migration/multifd/tcp/plain/qpl",
+   test_multifd_tcp_qpl);
+#endif
 #ifdef CONFIG_GNUTLS
 migration_test_add("/migration/multifd/tcp/tls/psk/match",
test_multifd_tcp_tls_psk_match);
-- 
2.39.3




[PATCH v4 4/8] migration/multifd: add qpl compression method

2024-03-04 Thread Yuan Liu
Add the Query Processing Library (QPL) compression method.

Although both qpl and zlib support deflate compression, qpl only uses
the In-Memory Analytics Accelerator (IAA) for compression and
decompression, and IAA is not compatible with the zlib method used in
migration, so qpl is added as a new compression method for migration.

How to enable qpl compression during migration:
migrate_set_parameter multifd-compression qpl

QPL supports only one compression level, so no qpl compression level
parameter is added and users do not need to specify one.

Signed-off-by: Yuan Liu 
Reviewed-by: Nanhai Zou 
---
 hw/core/qdev-properties-system.c |   2 +-
 migration/meson.build|   1 +
 migration/multifd-qpl.c  | 158 +++
 migration/multifd.h  |   1 +
 qapi/migration.json  |   7 +-
 5 files changed, 167 insertions(+), 2 deletions(-)
 create mode 100644 migration/multifd-qpl.c

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 1a396521d5..b4f0e5cbdb 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -658,7 +658,7 @@ const PropertyInfo qdev_prop_fdc_drive_type = {
 const PropertyInfo qdev_prop_multifd_compression = {
 .name = "MultiFDCompression",
 .description = "multifd_compression values, "
-   "none/zlib/zstd",
+   "none/zlib/zstd/qpl",
 .enum_table = &MultiFDCompression_lookup,
 .get = qdev_propinfo_get_enum,
 .set = qdev_propinfo_set_enum,
diff --git a/migration/meson.build b/migration/meson.build
index 92b1cc4297..c155c2d781 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -40,6 +40,7 @@ if get_option('live_block_migration').allowed()
   system_ss.add(files('block.c'))
 endif
 system_ss.add(when: zstd, if_true: files('multifd-zstd.c'))
+system_ss.add(when: qpl, if_true: files('multifd-qpl.c'))
 
 specific_ss.add(when: 'CONFIG_SYSTEM_ONLY',
 if_true: files('ram.c',
diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
new file mode 100644
index 00..6b94e732ac
--- /dev/null
+++ b/migration/multifd-qpl.c
@@ -0,0 +1,158 @@
+/*
+ * Multifd qpl compression accelerator implementation
+ *
+ * Copyright (c) 2023 Intel Corporation
+ *
+ * Authors:
+ *  Yuan Liu
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/rcu.h"
+#include "exec/ramblock.h"
+#include "exec/target_page.h"
+#include "qapi/error.h"
+#include "migration.h"
+#include "trace.h"
+#include "options.h"
+#include "multifd.h"
+#include "qpl/qpl.h"
+
+struct qpl_data {
+qpl_job **job_array;
+/* the number of allocated jobs */
+uint32_t job_num;
+/* the size of data processed by a qpl job */
+uint32_t data_size;
+/* compressed data buffer */
+uint8_t *zbuf;
+/* the length of compressed data */
+uint32_t *zbuf_hdr;
+};
+
+/**
+ * qpl_send_setup: setup send side
+ *
+ * Setup each channel with QPL compression.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @errp: pointer to an error
+ */
+static int qpl_send_setup(MultiFDSendParams *p, Error **errp)
+{
+/* Implement in next patch */
+return -1;
+}
+
+/**
+ * qpl_send_cleanup: cleanup send side
+ *
+ * Close the channel and return memory.
+ *
+ * @p: Params for the channel that we are using
+ * @errp: pointer to an error
+ */
+static void qpl_send_cleanup(MultiFDSendParams *p, Error **errp)
+{
+/* Implement in next patch */
+}
+
+/**
+ * qpl_send_prepare: prepare data to be able to send
+ *
+ * Create a compressed buffer with all the pages that we are going to
+ * send.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @errp: pointer to an error
+ */
+static int qpl_send_prepare(MultiFDSendParams *p, Error **errp)
+{
+/* Implement in next patch */
+return -1;
+}
+
+/**
+ * qpl_recv_setup: setup receive side
+ *
+ * Create the compressed channel and buffer.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @errp: pointer to an error
+ */
+static int qpl_recv_setup(MultiFDRecvParams *p, Error **errp)
+{
+/* Implement in next patch */
+return -1;
+}
+
+/**
+ * qpl_recv_cleanup: setup receive side
+ *
+ * Close the channel and return memory.
+ *
+ * @p: Params for the channel that we are using
+ */
+static void qpl_recv_cleanup(MultiFDRecvParams *p)
+{
+/* Implement in next patch */
+}
+
+/**
+ * qpl_recv_pages: read the data from the channel into actual pages
+ *
+ * Read the compressed buffer, and uncompress it into the actual
+ * pages.
+ *
+ * Returns 0 for success or -1 for error
+ *
+ * @p: Params for the channel that we are using
+ * @errp: pointer to an error
+ */
+static int qpl_recv_pages(MultiFDRecvParams *p, Error 

[PATCH v4 3/8] configure: add --enable-qpl build option

2024-03-04 Thread Yuan Liu
Add --enable-qpl and --disable-qpl options to enable and disable
the QPL compression method for multifd migration.

The Query Processing Library (QPL) is an open-source library
that supports data compression and decompression features.

The QPL compression is based on the deflate compression algorithm
and uses Intel In-Memory Analytics Accelerator (IAA) hardware for
compression and decompression acceleration.

Please refer to the following for more information about QPL
https://intel.github.io/qpl/documentation/introduction_docs/introduction.html

Signed-off-by: Yuan Liu 
Reviewed-by: Nanhai Zou 
---
 meson.build   | 18 ++
 meson_options.txt |  2 ++
 scripts/meson-buildoptions.sh |  3 +++
 3 files changed, 23 insertions(+)

diff --git a/meson.build b/meson.build
index c1dc83e4c0..2dea1e6834 100644
--- a/meson.build
+++ b/meson.build
@@ -1197,6 +1197,22 @@ if not get_option('zstd').auto() or have_block
 required: get_option('zstd'),
 method: 'pkg-config')
 endif
+qpl = not_found
+if not get_option('qpl').auto()
+  libqpl = cc.find_library('qpl', required: false)
+  if not libqpl.found()
+error('libqpl not found, please install it from ' +
+'https://intel.github.io/qpl/documentation/get_started_docs/installation.html')
+  endif
+  libaccel = cc.find_library('accel-config', required: false)
+  if not libaccel.found()
+error('libaccel-config not found, please install it from ' +
+'https://github.com/intel/idxd-config')
+  endif
+  qpl = declare_dependency(dependencies: [libqpl, libaccel,
+cc.find_library('dl', required: get_option('qpl'))],
+link_args: ['-lstdc++'])
+endif
 virgl = not_found
 
 have_vhost_user_gpu = have_tools and host_os == 'linux' and pixman.found()
@@ -2298,6 +2314,7 @@ config_host_data.set('CONFIG_MALLOC_TRIM', has_malloc_trim)
 config_host_data.set('CONFIG_STATX', has_statx)
 config_host_data.set('CONFIG_STATX_MNT_ID', has_statx_mnt_id)
 config_host_data.set('CONFIG_ZSTD', zstd.found())
+config_host_data.set('CONFIG_QPL', qpl.found())
 config_host_data.set('CONFIG_FUSE', fuse.found())
 config_host_data.set('CONFIG_FUSE_LSEEK', fuse_lseek.found())
 config_host_data.set('CONFIG_SPICE_PROTOCOL', spice_protocol.found())
@@ -4438,6 +4455,7 @@ summary_info += {'snappy support':snappy}
 summary_info += {'bzip2 support': libbzip2}
 summary_info += {'lzfse support': liblzfse}
 summary_info += {'zstd support':  zstd}
+summary_info += {'Query Processing Library support': qpl}
 summary_info += {'NUMA host support': numa}
 summary_info += {'capstone':  capstone}
 summary_info += {'libpmem support':   libpmem}
diff --git a/meson_options.txt b/meson_options.txt
index 0a99a059ec..06cd675572 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -259,6 +259,8 @@ option('xkbcommon', type : 'feature', value : 'auto',
description: 'xkbcommon support')
 option('zstd', type : 'feature', value : 'auto',
description: 'zstd compression support')
+option('qpl', type : 'feature', value : 'auto',
+   description: 'Query Processing Library support')
 option('fuse', type: 'feature', value: 'auto',
description: 'FUSE block device export')
 option('fuse_lseek', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 680fa3f581..784f74fde9 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -222,6 +222,7 @@ meson_options_help() {
   printf "%s\n" '  Xen PCI passthrough support'
   printf "%s\n" '  xkbcommon   xkbcommon support'
   printf "%s\n" '  zstdzstd compression support'
+  printf "%s\n" '  qpl Query Processing Library support'
 }
 _meson_option_parse() {
   case $1 in
@@ -562,6 +563,8 @@ _meson_option_parse() {
 --disable-xkbcommon) printf "%s" -Dxkbcommon=disabled ;;
 --enable-zstd) printf "%s" -Dzstd=enabled ;;
 --disable-zstd) printf "%s" -Dzstd=disabled ;;
+--enable-qpl) printf "%s" -Dqpl=enabled ;;
+--disable-qpl) printf "%s" -Dqpl=disabled ;;
 *) return 1 ;;
   esac
 }
-- 
2.39.3




[PATCH v4 7/8] migration/multifd: fix zlib and zstd compression levels not working

2024-03-04 Thread Yuan Liu
Handle the zlib and zstd compression levels in the multifd parameter
test-apply and apply functions, and add compression level tests.

Signed-off-by: Yuan Liu 
Reviewed-by: Nanhai Zou 
Reported-by: Xiaohui Li 
---
 migration/options.c  | 12 
 tests/qtest/migration-test.c | 16 
 2 files changed, 28 insertions(+)

diff --git a/migration/options.c b/migration/options.c
index 3e3e0b93b4..1cd3cc7c33 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -1312,6 +1312,12 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
 if (params->has_multifd_compression) {
 dest->multifd_compression = params->multifd_compression;
 }
+if (params->has_multifd_zlib_level) {
+dest->multifd_zlib_level = params->multifd_zlib_level;
+}
+if (params->has_multifd_zstd_level) {
+dest->multifd_zstd_level = params->multifd_zstd_level;
+}
 if (params->has_xbzrle_cache_size) {
 dest->xbzrle_cache_size = params->xbzrle_cache_size;
 }
@@ -1447,6 +1453,12 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
 if (params->has_multifd_compression) {
 s->parameters.multifd_compression = params->multifd_compression;
 }
+if (params->has_multifd_zlib_level) {
+s->parameters.multifd_zlib_level = params->multifd_zlib_level;
+}
+if (params->has_multifd_zstd_level) {
+s->parameters.multifd_zstd_level = params->multifd_zstd_level;
+}
 if (params->has_xbzrle_cache_size) {
 s->parameters.xbzrle_cache_size = params->xbzrle_cache_size;
 xbzrle_cache_resize(params->xbzrle_cache_size, errp);
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 8a5bb1752e..23d50fe599 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2621,10 +2621,24 @@ test_migrate_precopy_tcp_multifd_start(QTestState *from,
 return test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
 }
 
+static void
+test_and_set_multifd_compression_level(QTestState *who, const char *param)
+{
+/* The default compression level is 1, test a level other than 1 */
+int level = 2;
+
+migrate_set_parameter_int(who, param, level);
+migrate_check_parameter_int(who, param, level);
+/* only test compression level 1 during migration */
+migrate_set_parameter_int(who, param, 1);
+}
+
 static void *
 test_migrate_precopy_tcp_multifd_zlib_start(QTestState *from,
 QTestState *to)
 {
+/* the compression level is used only on the source side. */
+test_and_set_multifd_compression_level(from, "multifd-zlib-level");
 return test_migrate_precopy_tcp_multifd_start_common(from, to, "zlib");
 }
 
@@ -2633,6 +2647,8 @@ static void *
 test_migrate_precopy_tcp_multifd_zstd_start(QTestState *from,
 QTestState *to)
 {
+/* the compression level is used only on the source side. */
+test_and_set_multifd_compression_level(from, "multifd-zstd-level");
 return test_migrate_precopy_tcp_multifd_start_common(from, to, "zstd");
 }
 #endif /* CONFIG_ZSTD */
-- 
2.39.3




[PATCH v4 1/8] docs/migration: add qpl compression feature

2024-03-04 Thread Yuan Liu
add QPL compression method introduction

Signed-off-by: Yuan Liu 
Reviewed-by: Nanhai Zou 
---
 docs/devel/migration/features.rst|   1 +
 docs/devel/migration/qpl-compression.rst | 231 +++
 2 files changed, 232 insertions(+)
 create mode 100644 docs/devel/migration/qpl-compression.rst

diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst
index a9acaf618e..9819393c12 100644
--- a/docs/devel/migration/features.rst
+++ b/docs/devel/migration/features.rst
@@ -10,3 +10,4 @@ Migration has plenty of features to support different use cases.
dirty-limit
vfio
virtio
+   qpl-compression
diff --git a/docs/devel/migration/qpl-compression.rst b/docs/devel/migration/qpl-compression.rst
new file mode 100644
index 00..42c7969d30
--- /dev/null
+++ b/docs/devel/migration/qpl-compression.rst
@@ -0,0 +1,231 @@
+===
+QPL Compression
+===
+The Intel Query Processing Library (Intel ``QPL``) is an open-source library to
+provide compression and decompression features and it is based on deflate
+compression algorithm (RFC 1951).
+
+The ``QPL`` compression relies on Intel In-Memory Analytics Accelerator(``IAA``)
+and Shared Virtual Memory(``SVM``) technology, they are new features supported
+from Intel 4th Gen Intel Xeon Scalable processors, codenamed Sapphire Rapids
+processor(``SPR``).
+
+For more ``QPL`` introduction, please refer to:
+
+https://intel.github.io/qpl/documentation/introduction_docs/introduction.html
+
+QPL Compression Framework
+=
+
+::
+
+  ++   +--+
+  | MultiFD Service|   |accel-config tool |
+  +---++   ++-+
+  | |
+  | |
+  +---++| Setup IAA
+  |  QPL library   || Resources
+  +---+---++|
+  |   | |
+  |   +-+---+
+  |   Open IAA  |
+  |   Devices +-+-+
+  |   |idxd driver|
+  |   +-+-+
+  | |
+  | |
+  |   +-+-+
+  +---+IAA Devices|
+  Submit jobs +---+
+  via enqcmd
+
+
+Intel In-Memory Analytics Accelerator (Intel IAA) Introduction
+
+
+Intel ``IAA`` is an accelerator that has been designed to help benefit
+in-memory databases and analytic workloads. There are three main areas
+that Intel ``IAA`` can assist with analytics primitives (scan, filter, etc.),
+sparse data compression and memory tiering.
+
+``IAA`` Manual Documentation:
+
+https://www.intel.com/content/www/us/en/content-details/721858/intel-in-memory-analytics-accelerator-architecture-specification
+
+IAA Device Enabling
+---
+
+- Enabling ``IAA`` devices for platform configuration, please refer to:
+
+https://www.intel.com/content/www/us/en/content-details/780887/intel-in-memory-analytics-accelerator-intel-iaa.html
+
+- ``IAA`` device driver is ``Intel Data Accelerator Driver (idxd)``, it is
+  recommended that the minimum version of Linux kernel is 5.18.
+
+- Add ``"intel_iommu=on,sm_on"`` parameter to kernel command line
+  for ``SVM`` feature enabling.
+
+Here is an easy way to verify ``IAA`` device driver and ``SVM``, refer to:
+
+https://github.com/intel/idxd-config/tree/stable/test
+
+IAA Device Management
+-
+
+The number of ``IAA`` devices will vary depending on the Xeon product model.
+On a ``SPR`` server, there can be a maximum of 8 ``IAA`` devices, with up to
+4 devices per socket.
+
+By default, all ``IAA`` devices are disabled and need to be configured and
+enabled by users manually.
+
+Check the number of devices through the following command
+
+.. code-block:: shell
+
+  # lspci -d 8086:0cfe
+  # 6a:02.0 System peripheral: Intel Corporation Device 0cfe
+  # 6f:02.0 System peripheral: Intel Corporation Device 0cfe
+  # 74:02.0 System peripheral: Intel Corporation Device 0cfe
+  # 79:02.0 System peripheral: Intel Corporation Device 0cfe
+  # e7:02.0 System peripheral: Intel Corporation Device 0cfe
+  # ec:02.0 System peripheral: Intel Corporation Device 0cfe
+  # f1:02.0 System peripheral: Intel Corporation Device 0cfe
+  # f6:02.0 System peripheral: Intel Corporation Device 0cfe
+
+IAA Device Configuration
+
+
+The ``accel-config`` tool is used to enable ``IAA`` devices and configure
+``IAA`` hardware resources(work queues and engines). One ``IAA`` device
+has 8 work queues and 8 processing engines, multiple engines can be assigned
+to a work queue via ``group`` attribute.
+
+One example of configuring and enabling an ``IAA`` device.
+
+.. code-block:: shell
+
+  # accel-config config-engine iax1/engine1.0 -g 0
+  # accel-config config-engine iax1/engine1.1 

[PATCH] qemu-options.hx: Fix uncorrect description of "-serial"

2024-03-04 Thread steven.s...@jaguarmicro.com
Before v2.12, serial_hds used MAX_SERIAL_PORTS(4) for serial port
resources, hence the description of the "-serial" option's limit:
"This option can be used several times to simulate up to 4 serial
ports."
In current QEMU, serial_hds has been replaced by "Chardev **" and is
now dynamically allocated through "g_renew", so that limitation no
longer applies.
Update the text to "This option can be used several times to simulate
multiple serial ports." to avoid misleading users.

Signed-off-by: Steven Shen 
---
 qemu-options.hx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 9a47385c15..ac4a30fa83 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4129,7 +4129,7 @@ SRST
 default device is ``vc`` in graphical mode and ``stdio`` in non
 graphical mode.
 
-This option can be used several times to simulate up to 4 serial
+This option can be used several times to simulate multiple serial
 ports.
 
 You can use ``-serial none`` to suppress the creation of default
-- 
2.36.0.windows.1




Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-04 Thread Xinying Yu
Of course, I am glad to do so. I need to clarify that our use case only
supports the VIRTIO_F_NOTIFICATION_DATA transport feature on the DPDK vDPA
framework, where the backend type is NET_CLIENT_DRIVER_VHOST_USER and
user_feature_bits is used. So the new feature added to vdpa_feature_bits
will not be exercised in our case. Not sure whether this meets your
expectations?
One more thing, how do I get the full patch series? Do I copy the RFC
line by line from this link[1]?

Thanks,
Xinying


[1] https://lists.nongnu.org/archive/html/qemu-devel/2024-03/msg00090.html


From: Eugenio Perez Martin 
Sent: Saturday, March 2, 2024 4:32 AM
To: Wentao Jia ; Rick Zhong 
; Xinying Yu 
Cc: Jonah Palmer ; qemu-devel@nongnu.org 
; m...@redhat.com ; jasow...@redhat.com 
; si-wei@oracle.com ; 
boris.ostrov...@oracle.com ; raph...@enfabrica.net 
; kw...@redhat.com ; hre...@redhat.com 
; pa...@linux.ibm.com ; 
borntrae...@linux.ibm.com ; far...@linux.ibm.com 
; th...@redhat.com ; 
richard.hender...@linaro.org ; da...@redhat.com 
; i...@linux.ibm.com ; coh...@redhat.com 
; pbonz...@redhat.com ; f...@euphon.net 
; stefa...@redhat.com ; 
qemu-bl...@nongnu.org ; qemu-s3...@nongnu.org 
; virtio...@lists.linux.dev 
Subject: Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support

Hi Wentao / Rick / Xinying Yu,

Would it work for you to test this series on your use cases, so we
make sure everything works as expected?

Thanks!

On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:
>
> The goal of these patches are to add support to a variety of virtio and
> vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
> feature indicates that a driver will pass extra data (instead of just a
> virtqueue's index) when notifying the corresponding device.
>
> The data passed in by the driver when this feature is enabled varies in
> format depending on if the device is using a split or packed virtqueue
> layout:
>
>  Split VQ
>   - Upper 16 bits: last_avail_idx
>   - Lower 16 bits: virtqueue index
>
>  Packed VQ
>   - Upper 16 bits: 1-bit wrap counter & 15-bit last_avail_idx
>   - Lower 16 bits: virtqueue index
>
> Also, due to the limitations of ioeventfd not being able to carry the
> extra provided by the driver, ioeventfd is left disabled for any devices
> using this feature.
>
> A significant aspect of this effort has been to maintain compatibility
> across different backends. As such, the feature is offered by backend
> devices only when supported, with fallback mechanisms where backend
> support is absent.
>

Hi Wentao,



Re: [PATCH v3 04/26] migration: Always report an error in ram_save_setup()

2024-03-04 Thread Prasad Pandit
On Mon, 4 Mar 2024 at 18:01, Cédric Le Goater  wrote:
> This will prepare ground for futur changes adding an Error** argument

* futur -> future

> +ret = qemu_fflush(f);
> +if (ret) {

* if (ret) -> if (ret < 0)

Thank you.
---
  - Prasad




RE: [PATCH v2 8/9] aspeed: Add an AST2700 eval board

2024-03-04 Thread Jamin Lin
> -Original Message-
> From: Cédric Le Goater 
> Sent: Monday, March 4, 2024 11:40 PM
> To: Jamin Lin ; Peter Maydell
> ; Andrew Jeffery ;
> Joel Stanley ; Alistair Francis ; open
> list:ASPEED BMCs ; open list:All patches CC here
> 
> Cc: Troy Lee ; Yunlin Tang
> 
> Subject: Re: [PATCH v2 8/9] aspeed: Add an AST2700 eval board
> 
> On 3/4/24 10:29, Jamin Lin wrote:
> > AST2700 CPU is ARM Cortex-A35 which is 64 bits.
> > Add TARGET_AARCH64 to build this machine.
> >
> > According to the design of ast2700, it has a bootmcu(riscv-32) which
> > is used for executing SPL.
> > Then, CPUs(cortex-a35) execute u-boot, kernel and rofs.
> >
> > Currently, qemu not support emulate two CPU architectures at the same
> > machine. Therefore, qemu will only support to emulate CPU(cortex-a35)
> > side for ast2700
> >
> > Signed-off-by: Troy Lee 
> > Signed-off-by: Jamin Lin 
> > ---
> >   hw/arm/aspeed.c | 32 
> >   1 file changed, 32 insertions(+)
> >
> > diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index
> > 8854581ca8..4544026d14 100644
> > --- a/hw/arm/aspeed.c
> > +++ b/hw/arm/aspeed.c
> > @@ -178,6 +178,12 @@ struct AspeedMachineState {
> >   #define AST2600_EVB_HW_STRAP1 0x00C0
> >   #define AST2600_EVB_HW_STRAP2 0x0003
> >
> > +#ifdef TARGET_AARCH64
> > +/* AST2700 evb hardware value */
> > +#define AST2700_EVB_HW_STRAP1 0x00C0 #define
> > +AST2700_EVB_HW_STRAP2 0x0003 #endif
> > +
> >   /* Tacoma hardware value */
> >   #define TACOMA_BMC_HW_STRAP1  0x
> >   #define TACOMA_BMC_HW_STRAP2  0x0040 @@ -1588,6
> +1594,26 @@
> > static void aspeed_minibmc_machine_ast1030_evb_class_init(ObjectClass
> *oc,
> >   aspeed_machine_class_init_cpus_defaults(mc);
> >   }
> >
> > +#ifdef TARGET_AARCH64
> > +static void aspeed_machine_ast2700_evb_class_init(ObjectClass *oc,
> > +void *data) {
> > +MachineClass *mc = MACHINE_CLASS(oc);
> > +AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
> > +
> > +mc->desc = "Aspeed AST2700 EVB (Cortex-A35)";
> > +amc->soc_name  = "ast2700-a0";
> > +amc->hw_strap1 = AST2700_EVB_HW_STRAP1;
> > +amc->hw_strap2 = AST2700_EVB_HW_STRAP2;
> > +amc->fmc_model = "w25q01jvq";
> > +amc->spi_model = "w25q512jv";
> > +amc->num_cs= 2;
> > +amc->macs_mask = ASPEED_MAC0_ON | ASPEED_MAC1_ON |
> ASPEED_MAC2_ON;
> > +amc->uart_default = ASPEED_DEV_UART12;
> > +mc->default_ram_size = 1 * GiB;
> 
> This seems low. What's the size on real HW  ?
> 
Hi Cedric,
Thanks for the review.
The default RAM size is 1 GiB on the AST2700 EVB, and the AST2700
supports a maximum DRAM size of 8 GiB.
> Anyhow,
> 
> 
> Reviewed-by: Cédric Le Goater 
> 
> Thanks,
> 
> C.
> 
> 
> > +aspeed_machine_class_init_cpus_defaults(mc);
> > +}
> > +#endif
> > +
> >   static void aspeed_machine_qcom_dc_scm_v1_class_init(ObjectClass
> *oc,
> >void
> *data)
> >   {
> > @@ -1711,6 +1737,12 @@ static const TypeInfo aspeed_machine_types[] = {
> >   .name   = MACHINE_TYPE_NAME("ast1030-evb"),
> >   .parent = TYPE_ASPEED_MACHINE,
> >   .class_init =
> aspeed_minibmc_machine_ast1030_evb_class_init,
> > +#ifdef TARGET_AARCH64
> > +}, {
> > +.name  = MACHINE_TYPE_NAME("ast2700-evb"),
> > +.parent= TYPE_ASPEED_MACHINE,
> > +.class_init= aspeed_machine_ast2700_evb_class_init,
> > +#endif
> >   }, {
> >   .name  = TYPE_ASPEED_MACHINE,
> >   .parent= TYPE_MACHINE,



Re: [PATCH V2 1/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-04 Thread Richard Henderson

On 3/4/24 17:51, Xianglai Li wrote:

When we use QEMU TCG emulation, the BIOS uses a 4KB page size.
When a level 2 huge page (1G page size) is used to create the page
table, the content of the corresponding address space turns out to be
abnormal, so the BIOS cannot start the operating system and the
graphical interface normally.

The lddir and ldpte instruction emulation has a problem handling huge
pages above level 2: the page size is not calculated correctly, so the
table entry found by the TLB has the wrong page size.

Signed-off-by: Xianglai Li 
Cc: maob...@loongson.cn
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: zhaotian...@loongson.cn
---
  target/loongarch/cpu.h|  1 +
  target/loongarch/tcg/tlb_helper.c | 21 -
  2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index ec37579fd6..eab3e41c71 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -292,6 +292,7 @@ typedef struct CPUArchState {
  uint32_t fcsr0_mask;
  
  uint32_t cpucfg[21];

+uint32_t lddir_ps;


This magical cpu state does not appear in the manual.
Are you sure that large pages above level 2 are really supported by LDDIR?

Some explanation from the hardware engineering side is required.


r~



Re: [RFC PATCH v5 18/22] hw/intc/arm_gicv3: Implement NMI interrupt prioirty

2024-03-04 Thread Jinjie Ruan via



On 2024/3/4 20:18, Jinjie Ruan wrote:
> 
> 
> On 2024/3/1 7:50, Richard Henderson wrote:
>> On 2/29/24 03:10, Jinjie Ruan via wrote:
>>> If GICD_CTLR_DS bit is zero and the NMI is non-secure, the NMI prioirty
>>> is higher than 0x80, otherwise it is higher than 0x0. And save NMI
>>> super prioirty information in hppi.superprio to deliver NMI exception.
>>> Since both GICR and GICD can deliver NMI, it is both necessary to check
>>> whether the pending irq is NMI in gicv3_redist_update_noirqset and
>>> gicv3_update_noirqset. And In irqbetter(), only a non-NMI with the same
>>> priority and a smaller interrupt number can be preempted but not NMI.
>>>
>>> Signed-off-by: Jinjie Ruan 
>>> ---
>>> v4:
>>> - Replace is_nmi with has_superprio to not a mix NMI and superpriority.
>>> - Update the comment in irqbetter().
>>> - Extract gicv3_get_priority() to avoid code repeat.
>>> ---
>>> v3:
>>> - Add missing brace
>>> ---
>>>   hw/intc/arm_gicv3.c | 71 -
>>>   1 file changed, 63 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
>>> index 0b8f79a122..1d16a53b23 100644
>>> --- a/hw/intc/arm_gicv3.c
>>> +++ b/hw/intc/arm_gicv3.c
>>> @@ -21,7 +21,8 @@
>>>   #include "hw/intc/arm_gicv3.h"
>>>   #include "gicv3_internal.h"
>>>   -static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t prio)
>>> +static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t prio,
>>> +  bool has_superprio)
>>>   {
>>>   /* Return true if this IRQ at this priority should take
>>>    * precedence over the current recorded highest priority
>>> @@ -33,11 +34,24 @@ static bool irqbetter(GICv3CPUState *cs, int irq,
>>> uint8_t prio)
>>>   if (prio < cs->hppi.prio) {
>>>   return true;
>>>   }
>>> +
>>> +    /*
>>> + * Current highest prioirity pending interrupt is an IRQ without
>>> + * superpriority, the new IRQ with superpriority has same priority
>>> + * should signal to the CPU as it have the priority higher than
>>> + * the labelled 0x80 or 0x00.
>>> + */
>>> +    if (prio == cs->hppi.prio && !cs->hppi.superprio && has_superprio) {
>>> +    return true;
>>> +    }
>>> +
>>>   /* If multiple pending interrupts have the same priority then it
>>> is an
>>>    * IMPDEF choice which of them to signal to the CPU. We choose to
>>> - * signal the one with the lowest interrupt number.
>>> + * signal the one with the lowest interrupt number if they don't
>>> have
>>> + * superpriority.
>>>    */
>>> -    if (prio == cs->hppi.prio && irq <= cs->hppi.irq) {
>>> +    if (prio == cs->hppi.prio && !cs->hppi.superprio &&
>>> +    !has_superprio && irq <= cs->hppi.irq) {
>>>   return true;
>>>   }
>>>   return false;
>>> @@ -129,6 +143,35 @@ static uint32_t gicr_int_pending(GICv3CPUState *cs)
>>>   return pend;
>>>   }
>>>   +static bool gicv3_get_priority(GICv3CPUState *cs, bool is_redist,
>>> +   uint32_t superprio, uint8_t *prio, int
>>> irq)
>>> +{
>>> +    bool has_superprio = false;
>>> +
>>> +    if (superprio) {
>>> +    has_superprio = true;
>>> +
>>> +    /* DS = 0 & Non-secure NMI */
>>> +    if (!(cs->gic->gicd_ctlr & GICD_CTLR_DS) &&
>>> +    ((is_redist && extract32(cs->gicr_igroupr0, irq, 1)) ||
>>> + (!is_redist && gicv3_gicd_group_test(cs->gic, irq {
>>> +    *prio = 0x80;
>>> +    } else {
>>> +    *prio = 0x0;
>>> +    }
>>> +    } else {
>>> +    has_superprio = false;
>>> +
>>> +    if (is_redist) {
>>> +    *prio = cs->gicr_ipriorityr[irq];
>>> +    } else {
>>> +    *prio = cs->gic->gicd_ipriority[irq];
>>> +    }
>>> +    }
>>> +
>>> +    return has_superprio;
>>> +}
>>
>> Did you not like the idea to map {priority, !superpriority} into a
>> single value?
>>
>> It would eliminate the change in irqbetter(), which is a bit more
>> complex than it needs to be.
> 
> I will try to change to implement this mapping scheme.

I tried to implement this mapping scheme, but it seems more complex and
requires changes on a larger scale, even though it would eliminate the
change in irqbetter(). The 8-bit priority is split into a group priority
and a subpriority, and the BPR semantics rely on that split as well.
Also, currently the invalid or initial value of all priorities in QEMU
is 0xff, and all priority mask operations use 0xff.

In particular, what should hppi.prio store: the original priority value
or the mapped one? If hppi.prio stores the mapped value, all of the
above needs to be updated. Otherwise, if hppi.prio stores the original
value, it is simpler, but it still needs to convert from the mapped
priority back to the original one when saving cs->hppi.prio and to the
mapped priority when comparing priorities in irqbetter(), so it is not
really more compact than the existing implementation.
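
For reference, one possible shape of such a combined key (a sketch of
the idea only, not code from this series; the helper name is made up):

    /*
     * Fold the 8-bit priority and the superpriority flag into one
     * comparable value, lower is better.  An NMI (superpriority)
     * always sorts ahead of a normal interrupt with the same 8-bit
     * priority, and a lower 8-bit priority still wins overall.
     */
    static inline uint16_t hppi_prio_key(uint8_t prio, bool superprio)
    {
        return ((uint16_t)prio << 1) | (superprio ? 0 : 1);
    }

Whether cs->hppi.prio then keeps the raw value (deriving the key on
demand) or stores the key itself is exactly the trade-off described
above.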

@Peter

> 
>>
>>> @@ 

RE: [PATCH v1 7/8] aspeed/soc: Add AST2700 support

2024-03-04 Thread Jamin Lin
> -Original Message-
> From: Philippe Mathieu-Daudé 
> Sent: Thursday, February 29, 2024 5:38 PM
> To: Jamin Lin ; Cédric Le Goater ;
> Peter Maydell ; Andrew Jeffery
> ; Joel Stanley ; Alistair
> Francis ; open list:ASPEED BMCs
> ; open list:All patches CC here
> 
> Cc: Troy Lee ; Yunlin Tang
> 
> Subject: Re: [PATCH v1 7/8] aspeed/soc: Add AST2700 support
> 
> Hi Jamin,
> 
> On 29/2/24 08:23, Jamin Lin via wrote:
> > Initial definitions for a simple machine using an AST2700 SOC (Cortex-a35
> CPU).
> >
> > AST2700 SOC and its interrupt controller are too complex to handle in
> > the common Aspeed SoC framework. We introduce a new ast2700 class with
> > instance_init and realize handlers.
> >
> > AST2700 is a 64 bits quad core cpus and support 8 watchdog.
> > Update maximum ASPEED_CPUS_NUM to 4 and ASPEED_WDTS_NUM to 8.
> > In addition, update AspeedSocState to support scuio, sli, sliio and intc.
> >
> > Update silicon_rev data type to 64bits from AspeedSoCClass and add
> > TYPE_ASPEED27X0_SOC machine type.
> >
> > Signed-off-by: Troy Lee 
> > Signed-off-by: Jamin Lin 
> > ---
> >   hw/arm/aspeed_ast27x0.c | 462
> 
> >   hw/arm/meson.build  |   1 +
> >   include/hw/arm/aspeed_soc.h |  26 +-
> >   3 files changed, 486 insertions(+), 3 deletions(-)
> >   create mode 100644 hw/arm/aspeed_ast27x0.c
> 
> 
> > +#define AST2700_MAX_IRQ 288
> > +
> > +/* Shared Peripheral Interrupt values below are offset by -32 from
> > +datasheet */ static const int aspeed_soc_ast2700_irqmap[] = {
> > +[ASPEED_DEV_UART0] = 132,
> > +[ASPEED_DEV_UART1] = 132,
> > +[ASPEED_DEV_UART2] = 132,
> > +[ASPEED_DEV_UART3] = 132,
> > +[ASPEED_DEV_UART4] = 8,
> > +[ASPEED_DEV_UART5] = 132,
> > +[ASPEED_DEV_UART6] = 132,
> > +[ASPEED_DEV_UART7] = 132,
> > +[ASPEED_DEV_UART8] = 132,
> > +[ASPEED_DEV_UART9] = 132,
> > +[ASPEED_DEV_UART10]= 132,
> > +[ASPEED_DEV_UART11]= 132,
> > +[ASPEED_DEV_UART12]= 132,
> 
> When multiple devices output IRQ lines are connected to the same input one,
> a IRQ OR gate has to be used.
> 
> See previous explanations here:
> https://lore.kernel.org/qemu-devel/5a7594d9-3fbd-4d90-a5f9-81b7b845fba7@
> linaro.org/
> 
Thanks for your review and suggestion.
I am studying this design and they will be modified 
in V3 patch series.
Thanks-Jamin

> (Pre-existing issue in aspeed_soc_ast2600_irqmap[])
> 
> > +[ASPEED_DEV_FMC]   = 131,
> > +[ASPEED_DEV_SDMC]  = 0,
> > +[ASPEED_DEV_SCU]   = 12,
> > +[ASPEED_DEV_ADC]   = 130,
> > +[ASPEED_DEV_XDMA]  = 5,
> > +[ASPEED_DEV_EMMC]  = 15,
> > +[ASPEED_DEV_GPIO]  = 11,
> > +[ASPEED_DEV_GPIO_1_8V] = 130,
> > +[ASPEED_DEV_RTC]   = 13,
> > +[ASPEED_DEV_TIMER1]= 16,
> > +[ASPEED_DEV_TIMER2]= 17,
> > +[ASPEED_DEV_TIMER3]= 18,
> > +[ASPEED_DEV_TIMER4]= 19,
> > +[ASPEED_DEV_TIMER5]= 20,
> > +[ASPEED_DEV_TIMER6]= 21,
> > +[ASPEED_DEV_TIMER7]= 22,
> > +[ASPEED_DEV_TIMER8]= 23,
> > +[ASPEED_DEV_WDT]   = 131,
> > +[ASPEED_DEV_PWM]   = 131,
> > +[ASPEED_DEV_LPC]   = 128,
> > +[ASPEED_DEV_IBT]   = 128,
> > +[ASPEED_DEV_I2C]   = 130,
> > +[ASPEED_DEV_PECI]  = 133,
> > +[ASPEED_DEV_ETH1]  = 132,
> > +[ASPEED_DEV_ETH2]  = 132,
> > +[ASPEED_DEV_ETH3]  = 132,
> > +[ASPEED_DEV_HACE]  = 4,
> > +[ASPEED_DEV_KCS]   = 128,
> > +[ASPEED_DEV_DP]= 28,
> > +[ASPEED_DEV_I3C]   = 131,
> > +};



Re: [PATCH 3/7] contrib/elf2dmp: Ensure segment fits in file

2024-03-04 Thread Akihiko Odaki

On 2024/03/05 2:52, Peter Maydell wrote:

On Sun, 3 Mar 2024 at 10:53, Akihiko Odaki  wrote:


This makes elf2dmp more robust against corrupted inputs.

Signed-off-by: Akihiko Odaki 
---
  contrib/elf2dmp/addrspace.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/contrib/elf2dmp/addrspace.c b/contrib/elf2dmp/addrspace.c
index 980a7aa5f8fb..d546a400dfda 100644
--- a/contrib/elf2dmp/addrspace.c
+++ b/contrib/elf2dmp/addrspace.c
@@ -88,11 +88,12 @@ int pa_space_create(struct pa_space *ps, QEMU_Elf *qemu_elf)
  ps->block = g_new(struct pa_block, ps->block_nr);

  for (i = 0; i < phdr_nr; i++) {
-if (phdr[i].p_type == PT_LOAD) {
+if (phdr[i].p_type == PT_LOAD && phdr[i].p_offset < qemu_elf->size) {
  ps->block[block_i] = (struct pa_block) {
  .addr = (uint8_t *)qemu_elf->map + phdr[i].p_offset,
  .paddr = phdr[i].p_paddr,
-.size = phdr[i].p_filesz,
+.size = MIN(phdr[i].p_filesz,
+qemu_elf->size - phdr[i].p_offset),


Shouldn't "p_filesz is smaller than the actual amount of data in the
file" be a failure condition? In include/hw/elf_ops.h we treat it
that way:

 mem_size = ph->p_memsz; /* Size of the ROM */
 file_size = ph->p_filesz; /* Size of the allocated data */
 data_offset = ph->p_offset; /* Offset where the data is located */

 if (file_size > 0) {
 if (g_mapped_file_get_length(mapped_file) <
 file_size + data_offset) {
 goto fail;
 }
 [etc]

Like that code, we could then only check if p_offset + p_filesz is off
the end of the file, rather than checking p_offset separately.


  };
  pa_block_align(&ps->block[block_i]);
  block_i = ps->block[block_i].size ? (block_i + 1) : block_i;


thanks
-- PMM


I'm making this permissive for corrupted dumps since they may still 
include valuable information.


It is different from include/hw/elf_ops.h, which is presumably used to 
load executables rather than dumps. Loading a corrupted executable does 
nothing good.
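
Concretely, a permissive pa_space_create() could clamp the segment to
what is actually present in the file and warn, rather than reject the
dump (a sketch only, not the exact patch):

    if (phdr[i].p_type == PT_LOAD && phdr[i].p_offset < qemu_elf->size) {
        uint64_t filesz = phdr[i].p_filesz;

        if (filesz > qemu_elf->size - phdr[i].p_offset) {
            /* Truncated dump: keep whatever data is really in the file. */
            fprintf(stderr, "warning: PT_LOAD segment exceeds file size, "
                    "truncating\n");
            filesz = qemu_elf->size - phdr[i].p_offset;
        }
        /* ... then use filesz for ps->block[block_i].size as in the patch. */
    }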


Regards,
Akihiko Odaki



[PATCH V2 0/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-04 Thread Xianglai Li
When we use QEMU TCG emulation, the BIOS uses a 4KB page size.
When a level 2 huge page (1G page size) is used to create the page
table, the content of the corresponding address space turns out to be
abnormal, so the BIOS cannot start the operating system and the
graphical interface normally.

The lddir and ldpte instruction emulation has a problem handling huge
pages above level 2: the page size is not calculated correctly, so the
table entry found by the TLB has the wrong page size.

Changelog:
V1->V2:
Modified the patch title format and enriched the commit message description.

Cc: maob...@loongson.cn
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: zhaotian...@loongson.cn

Xianglai Li (1):
  target/loongarch: Fixed tlb huge page loading issue

 target/loongarch/cpu.h|  1 +
 target/loongarch/tcg/tlb_helper.c | 21 -
 2 files changed, 13 insertions(+), 9 deletions(-)

-- 
2.39.1




[PATCH V2 1/1] target/loongarch: Fixed tlb huge page loading issue

2024-03-04 Thread Xianglai Li
When we use QEMU TCG emulation, the BIOS uses a 4KB page size.
When a level 2 huge page (1G page size) is used to create the page
table, the content of the corresponding address space turns out to be
abnormal, so the BIOS cannot start the operating system and the
graphical interface normally.

The lddir and ldpte instruction emulation has a problem handling huge
pages above level 2: the page size is not calculated correctly, so the
table entry found by the TLB has the wrong page size.

Signed-off-by: Xianglai Li 
Cc: maob...@loongson.cn
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: zhaotian...@loongson.cn
---
 target/loongarch/cpu.h|  1 +
 target/loongarch/tcg/tlb_helper.c | 21 -
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index ec37579fd6..eab3e41c71 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -292,6 +292,7 @@ typedef struct CPUArchState {
 uint32_t fcsr0_mask;
 
 uint32_t cpucfg[21];
+uint32_t lddir_ps;
 
 uint64_t lladdr; /* LL virtual address compared against SC */
 uint64_t llval;
diff --git a/target/loongarch/tcg/tlb_helper.c b/target/loongarch/tcg/tlb_helper.c
index a08c08b05a..3594c800b3 100644
--- a/target/loongarch/tcg/tlb_helper.c
+++ b/target/loongarch/tcg/tlb_helper.c
@@ -38,6 +38,7 @@ static void raise_mmu_exception(CPULoongArchState *env, target_ulong address,
 cs->exception_index = EXCCODE_PIF;
 }
 env->CSR_TLBRERA = FIELD_DP64(env->CSR_TLBRERA, CSR_TLBRERA, ISTLBR, 1);
+env->lddir_ps = 0;
 break;
 case TLBRET_INVALID:
 /* TLB match with no valid bit */
@@ -488,13 +489,6 @@ target_ulong helper_lddir(CPULoongArchState *env, target_ulong base,
 uint64_t dir_base, dir_width;
 bool huge = (base >> LOONGARCH_PAGE_HUGE_SHIFT) & 0x1;
 
-badvaddr = env->CSR_TLBRBADV;
-base = base & TARGET_PHYS_MASK;
-
-/* 0:64bit, 1:128bit, 2:192bit, 3:256bit */
-shift = FIELD_EX64(env->CSR_PWCL, CSR_PWCL, PTEWIDTH);
-shift = (shift + 1) * 3;
-
 if (huge) {
 return base;
 }
@@ -519,9 +513,18 @@ target_ulong helper_lddir(CPULoongArchState *env, target_ulong base,
 do_raise_exception(env, EXCCODE_INE, GETPC());
 return 0;
 }
+
+/* 0:64bit, 1:128bit, 2:192bit, 3:256bit */
+shift = FIELD_EX64(env->CSR_PWCL, CSR_PWCL, PTEWIDTH);
+shift = (shift + 1) * 3;
+badvaddr = env->CSR_TLBRBADV;
+base = base & TARGET_PHYS_MASK;
 index = (badvaddr >> dir_base) & ((1 << dir_width) - 1);
 phys = base | index << shift;
 ret = ldq_phys(cs->as, phys) & TARGET_PHYS_MASK;
+if (ret & BIT_ULL(LOONGARCH_PAGE_HUGE_SHIFT)) {
+env->lddir_ps = dir_base;
+}
 return ret;
 }
 
@@ -538,13 +541,13 @@ void helper_ldpte(CPULoongArchState *env, target_ulong base, target_ulong odd,
 base = base & TARGET_PHYS_MASK;
 
 if (huge) {
-/* Huge Page. base is paddr */
 tmp0 = base ^ (1 << LOONGARCH_PAGE_HUGE_SHIFT);
 /* Move Global bit */
 tmp0 = ((tmp0 & (1 << LOONGARCH_HGLOBAL_SHIFT))  >>
 LOONGARCH_HGLOBAL_SHIFT) << R_TLBENTRY_G_SHIFT |
 (tmp0 & (~(1 << LOONGARCH_HGLOBAL_SHIFT)));
-ps = ptbase + ptwidth - 1;
+
+ps = env->lddir_ps - 1;
 if (odd) {
 tmp0 += MAKE_64BIT_MASK(ps, 1);
 }
-- 
2.39.1




RE: [PATCH v2 0/9] Add AST2700 support

2024-03-04 Thread Jamin Lin
> -Original Message-
> From: Cédric Le Goater 
> Sent: Monday, March 4, 2024 11:54 PM
> To: Jamin Lin ; Peter Maydell
> ; Andrew Jeffery ;
> Joel Stanley ; Alistair Francis ; open
> list:ASPEED BMCs ; open list:All patches CC here
> 
> Cc: Troy Lee ; Yunlin Tang
> 
> Subject: Re: [PATCH v2 0/9] Add AST2700 support
> 
> Hello Jamin,
> 
> On 3/4/24 10:29, Jamin Lin wrote:
> > Changes from v1:
> > The patch series supports WDT, SDMC, SMC, SCU, SLI and INTC for AST2700
> SoC.
> >
> > Changes from v2:
> > - replace is_aarch64 with is_bus64bit for sdmc patch review.
> > - fix incorrect dram size for AST2700
> >
> > Test steps:
> > 1. Download openbmc image for AST2700 from
> > https://github.com/AspeedTech-BMC/openbmc/releases/tag/v09.00
> >
> https://github.com/AspeedTech-BMC/openbmc/releases/download/v09.00/
> > ast2700-default-obmc.tar.gz
> > 2. untar ast2700-default-obmc.tar.gz
> > ```
> > tar -xf ast2700-default-obmc.tar.gz
> > ```
> > 3. Run and the contents of scripts as following IMGDIR=ast2700-default
> > UBOOT_SIZE=$(stat --format=%s -L ${IMGDIR}/u-boot-nodtb.bin)
> > UBOOT_DTB_ADDR=$((0x4 + ${UBOOT_SIZE}))
> >
> > qemu-system-aarch64 -M ast2700-evb -nographic\
> >   -device
> loader,addr=0x4,file=${IMGDIR}/u-boot-nodtb.bin,force-raw=on\
> >   -device
> loader,addr=${UBOOT_DTB_ADDR},file=${IMGDIR}/u-boot.dtb,force-raw=on\
> >   -device loader,addr=0x43000,file=${IMGDIR}/bl31.bin,force-raw=on\
> >   -device
> loader,addr=0x43008,file=${IMGDIR}/optee/tee-raw.bin,force-raw=on\
> >   -device loader,addr=0x43000,cpu-num=0\
> >   -device loader,addr=0x43000,cpu-num=1\
> >   -device loader,addr=0x43000,cpu-num=2\
> >   -device loader,addr=0x43000,cpu-num=3\
> >   -smp 4\
> >   -drive file=${IMGDIR}/image-bmc,format=raw,if=mtd\
> >   -serial mon:stdio\
> >   -snapshot
> >
> > Known Issue:
> > 1. QEMU supports ARM Generic Interrupt Controller, version 3(GICv3)
> > but not support Shared Peripheral Interrupt (SPI), yet.
> > Added work around in INTC patch to set GICINT132[18] which was BMC
> > UART interrupt if it received GICINT132, so users are able to type any
> > key from keyboard to trigger GICINT132 interrupt until AST2700 boot
> > into login prompt. It is a temporary solution.
> > If users encounter boot stck and no booting log, please type any key
> > from keyboard.
> 
> I haven't looked at the GIC issue but I started reviewing what I received.
> 
> The mailer issue needs to be fixed before we consider this patches for merge.
> May be use an external email while keeping the same
> From: and Signed-off-by address.
>
Understood.
Thanks for your suggestion. I am asking our IT team to fix our SMTP
server issue. They are working on it. I will use my external account to
send the V3 patch series if the issue is not fixed by then.
> When you resend, could you please add an avocado test ?
> 
Sure, will create a patch for avocado test.
Thanks
> Thanks,
> 
> C.
> 
> 
> > Jamin Lin (9):
> >aspeed/wdt: Add AST2700 support
> >aspeed/sli: Add AST2700 support
> >aspeed/sdmc: Add AST2700 support
> >aspeed/smc: Add AST2700 support
> >aspeed/scu: Add AST2700 support
> >aspeed/intc: Add AST2700 support
> >aspeed/soc: Add AST2700 support
> >aspeed: Add an AST2700 eval board
> >aspeed/soc: fix incorrect dram size for AST2700
> >
> >   hw/arm/aspeed.c  |  32 ++
> >   hw/arm/aspeed_ast27x0.c  | 554
> +++
> >   hw/arm/meson.build   |   1 +
> >   hw/intc/aspeed_intc.c| 135 
> >   hw/intc/meson.build  |   1 +
> >   hw/misc/aspeed_scu.c | 306 -
> >   hw/misc/aspeed_sdmc.c| 215 ++--
> >   hw/misc/aspeed_sli.c | 179 ++
> >   hw/misc/meson.build  |   3 +-
> >   hw/misc/trace-events |  11 +
> >   hw/ssi/aspeed_smc.c  | 326 --
> >   hw/ssi/trace-events  |   2 +-
> >   hw/watchdog/wdt_aspeed.c |  24 ++
> >   include/hw/arm/aspeed_soc.h  |  27 +-
> >   include/hw/intc/aspeed_vic.h |  29 ++
> >   include/hw/misc/aspeed_scu.h |  47 ++-
> >   include/hw/misc/aspeed_sdmc.h|   4 +-
> >   include/hw/misc/aspeed_sli.h |  32 ++
> >   include/hw/ssi/aspeed_smc.h  |   1 +
> >   include/hw/watchdog/wdt_aspeed.h |   3 +-
> >   20 files changed, 1880 insertions(+), 52 deletions(-)
> >   create mode 100644 hw/arm/aspeed_ast27x0.c
> >   create mode 100644 hw/intc/aspeed_intc.c
> >   create mode 100644 hw/misc/aspeed_sli.c
> >   create mode 100644 include/hw/misc/aspeed_sli.h
> >



Re: [PATCH v3 10/26] migration: Move cleanup after after error reporting in qemu_savevm_state_setup()

2024-03-04 Thread Peter Xu
On Mon, Mar 04, 2024 at 01:28:28PM +0100, Cédric Le Goater wrote:
> This will help preserving the error set by .save_setup() handlers.
> 
> Signed-off-by: Cédric Le Goater 

IIUC this is about the next patch.  I got fully confused before reading
into the next one.  IMHO we can squash it into where it's used.

Thanks,

> ---
>  migration/savevm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 
> 31ce9391d49c825d4ec835e26ac0246e192783a0..e400706e61e06d2d1d03a11aed14f30a243833f2
>  100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1740,10 +1740,10 @@ static int qemu_savevm_state(QEMUFile *f, Error 
> **errp)
>  qemu_savevm_state_complete_precopy(f, false, false);
>  ret = qemu_file_get_error(f);
>  }
> -qemu_savevm_state_cleanup();
>  if (ret != 0) {
>  error_setg_errno(errp, -ret, "Error while writing VM state");
>  }
> +qemu_savevm_state_cleanup();
>  
>  if (ret != 0) {
>  status = MIGRATION_STATUS_FAILED;
> -- 
> 2.44.0
> 
> 

-- 
Peter Xu




RE: [PATCH v2 2/9] aspeed/sli: Add AST2700 support

2024-03-04 Thread Jamin Lin
> -Original Message-
> From: Cédric Le Goater 
> Sent: Monday, March 4, 2024 10:36 PM
> To: Jamin Lin ; Peter Maydell
> ; Andrew Jeffery ;
> Joel Stanley ; Alistair Francis ; open
> list:ASPEED BMCs ; open list:All patches CC here
> 
> Cc: Troy Lee ; Yunlin Tang
> 
> Subject: Re: [PATCH v2 2/9] aspeed/sli: Add AST2700 support
> 
> On 3/4/24 10:29, Jamin Lin wrote:
> > AST2700 SLI engine is designed to accelerate the throughput between
> > cross-die connections.
> > It have CPU_SLI at CPU die and IO_SLI at IO die.
> >
> > Introduce new ast2700_sli and ast2700_sliio class with instance_init
> > and realize handlers.
> 
> This should say that the implementation is a dummy one.
Will fix

> 
> >
> > Signed-off-by: Troy Lee 
> > Signed-off-by: Jamin Lin 
> > ---
> >   hw/misc/aspeed_sli.c | 179
> +++
> >   hw/misc/meson.build  |   3 +-
> >   hw/misc/trace-events |   7 ++
> >   include/hw/misc/aspeed_sli.h |  32 +++
> >   4 files changed, 220 insertions(+), 1 deletion(-)
> >   create mode 100644 hw/misc/aspeed_sli.c
> >   create mode 100644 include/hw/misc/aspeed_sli.h
> >
> > diff --git a/hw/misc/aspeed_sli.c b/hw/misc/aspeed_sli.c new file mode
> > 100644 index 00..4af42f145c
> > --- /dev/null
> > +++ b/hw/misc/aspeed_sli.c
> > @@ -0,0 +1,179 @@
> > +/*
> > + * ASPEED SLI Controller
> > + *
> > + * Copyright (C) 2024 ASPEED Technology Inc.
> > + *
> > + * This code is licensed under the GPL version 2 or later.  See
> > + * the COPYING file in the top-level directory.
> 
> In all files, the paragraph above can be replaced with :
> 
> * SPDX-License-Identifier: GPL-2.0-or-later
> 
Will fix
> 
> Thanks,
> 
> C.
> 
Thanks for the review; these will be fixed in the V3 patch series.
Jamin
> 
> 
> 
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/log.h"
> > +#include "qemu/error-report.h"
> > +#include "hw/qdev-properties.h"
> > +#include "hw/misc/aspeed_sli.h"
> > +#include "qapi/error.h"
> > +#include "migration/vmstate.h"
> > +#include "trace.h"
> > +
> > +#define SLI_REGION_SIZE 0x500
> > +#define TO_REG(addr) ((addr) >> 2)
> > +
> > +static uint64_t aspeed_sli_read(void *opaque, hwaddr addr, unsigned
> > +int size) {
> > +AspeedSLIState *s = ASPEED_SLI(opaque);
> > +int reg = TO_REG(addr);
> > +
> > +if (reg >= ARRAY_SIZE(s->regs)) {
> > +qemu_log_mask(LOG_GUEST_ERROR,
> > +  "%s: Out-of-bounds read at offset 0x%"
> HWADDR_PRIx "\n",
> > +  __func__, addr);
> > +return 0;
> > +}
> > +
> > +trace_aspeed_sli_read(addr, size, s->regs[reg]);
> > +return s->regs[reg];
> > +}
> > +
> > +static void aspeed_sli_write(void *opaque, hwaddr addr, uint64_t data,
> > +  unsigned int size) {
> > +AspeedSLIState *s = ASPEED_SLI(opaque);
> > +int reg = TO_REG(addr);
> > +
> > +if (reg >= ARRAY_SIZE(s->regs)) {
> > +qemu_log_mask(LOG_GUEST_ERROR,
> > +  "%s: Out-of-bounds write at offset 0x%"
> HWADDR_PRIx "\n",
> > +  __func__, addr);
> > +return;
> > +}
> > +
> > +trace_aspeed_sli_write(addr, size, data);
> > +s->regs[reg] = data;
> > +}
> > +
> > +static uint64_t aspeed_sliio_read(void *opaque, hwaddr addr, unsigned
> > +int size) {
> > +AspeedSLIState *s = ASPEED_SLI(opaque);
> > +int reg = TO_REG(addr);
> > +
> > +if (reg >= ARRAY_SIZE(s->regs)) {
> > +qemu_log_mask(LOG_GUEST_ERROR,
> > +  "%s: Out-of-bounds read at offset 0x%"
> HWADDR_PRIx "\n",
> > +  __func__, addr);
> > +return 0;
> > +}
> > +
> > +trace_aspeed_sliio_read(addr, size, s->regs[reg]);
> > +return s->regs[reg];
> > +}
> > +
> > +static void aspeed_sliio_write(void *opaque, hwaddr addr, uint64_t data,
> > +  unsigned int size) {
> > +AspeedSLIState *s = ASPEED_SLI(opaque);
> > +int reg = TO_REG(addr);
> > +
> > +if (reg >= ARRAY_SIZE(s->regs)) {
> > +qemu_log_mask(LOG_GUEST_ERROR,
> > +  "%s: Out-of-bounds write at offset 0x%"
> HWADDR_PRIx "\n",
> > +  __func__, addr);
> > +return;
> > +}
> > +
> > +trace_aspeed_sliio_write(addr, size, data);
> > +s->regs[reg] = data;
> > +}
> > +
> > +static const MemoryRegionOps aspeed_sli_ops = {
> > +.read = aspeed_sli_read,
> > +.write = aspeed_sli_write,
> > +.endianness = DEVICE_LITTLE_ENDIAN,
> > +.valid = {
> > +.min_access_size = 1,
> > +.max_access_size = 4,
> > +},
> > +};
> > +
> > +static const MemoryRegionOps aspeed_sliio_ops = {
> > +.read = aspeed_sliio_read,
> > +.write = aspeed_sliio_write,
> > +.endianness = DEVICE_LITTLE_ENDIAN,
> > +.valid = {
> > +.min_access_size = 1,
> > +.max_access_size = 4,
> > +},
> > +};
> > +
> > +static void aspeed_sli_realize(DeviceState *dev, Error 

Re: Why does the vmovdqu works for passthrough device but crashes for emulated device with "illegal operand" error (in x86_64 QEMU, -accel = kvm) ?

2024-03-04 Thread Jim Mattson
On Mon, Mar 4, 2024 at 6:11 PM Xu Liu  wrote:
>
> Hey Alex and Paolo,
>
> I saw there is some code related to AVX  
> https://elixir.bootlin.com/linux/latest/source/arch/x86/kvm/emulate.c#L668
>
> Does that mean in some special cases, kvm supports AVX instructions ?
> I didn’t really know the big picture, so just guess what it is doing .

The Avx bit was added in commit 1c11b37669a5 ("KVM: x86 emulator: add
support for vector alignment"). It is not used.

> Thanks,
> Xu
>
> > On Mar 4, 2024, at 6:39 PM, Paolo Bonzini  wrote:
> >
> > !---|
> > This Message Is From an External Sender
> >
> > |---!
> >
> > On 3/4/24 22:59, Alex Williamson wrote:
> >> Since you're not seeing a KVM_EXIT_MMIO I'd guess this is more of a KVM
> >> issue than QEMU (Cc kvm list).  Possibly KVM doesn't emulate vmovdqu
> >> relative to an MMIO access, but honestly I'm not positive that AVX
> >> instructions are meant to work on MMIO space.  I'll let x86 KVM experts
> >> more familiar with specific opcode semantics weigh in on that.
> >
> > Indeed, KVM's instruction emulator supports some SSE MOV instructions but 
> > not the corresponding AVX instructions.
> >
> > Vector instructions however do work on MMIO space, and they are used 
> > occasionally especially in combination with write-combining memory.  SSE 
> > support was added to KVM because some operating systems used SSE 
> > instructions to read and write to VRAM.  However, so far we've never 
> > received any reports of OSes using AVX instructions on devices that QEMU 
> > can emulate (as opposed to, for example, GPU VRAM that is passed through).
> >
> > Thanks,
> >
> > Paolo
> >
> >> Is your "program" just doing a memcpy() with an mmap() of the PCI BAR
> >> acquired through pci-sysfs or a userspace vfio-pci driver within the
> >> guest?
> >> In QEMU 4a2e242bbb30 ("memory: Don't use memcpy for ram_device
> >> regions") we resolved an issue[1] where QEMU itself was doing a memcpy()
> >> to assigned device MMIO space resulting in breaking functionality of
> >> the device.  IIRC memcpy() was using an SSE instruction that didn't
> >> fault, but didn't work correctly relative to MMIO space either.
> >> So I also wouldn't rule out that the program isn't inherently
> >> misbehaving by using memcpy() and thereby ignoring the nature of the
> >> device MMIO access semantics.  Thanks,
> >> Alex
> >> [1]https://bugs.launchpad.net/qemu/+bug/1384892
> >
>



RE: [PATCH v2 1/9] aspeed/wdt: Add AST2700 support

2024-03-04 Thread Jamin Lin
> -Original Message-
> From: Cédric Le Goater 
> Sent: Monday, March 4, 2024 10:32 PM
> To: Jamin Lin ; Peter Maydell
> ; Andrew Jeffery ;
> Joel Stanley ; Alistair Francis ; open
> list:ASPEED BMCs ; open list:All patches CC here
> 
> Cc: Troy Lee ; Yunlin Tang
> 
> Subject: Re: [PATCH v2 1/9] aspeed/wdt: Add AST2700 support
> 
> Hello Jamin,
> 
> On 3/4/24 10:29, Jamin Lin wrote:
> > AST2700 wdt controller is similiar to AST2600's wdt, but the AST2700
> > has 8 watchdogs, and they each have a 0x80 of registers.
> 
> ... they each have 0x80 registers.
> 
> > Introduce ast2700 object class and increse the number of regs(offset)
> > of
> 
> .. increase ...
> 
> > ast2700 model.
> >
> > Signed-off-by: Troy Lee 
> > Signed-off-by: Jamin Lin 
>
Thanks for the review; the typos will be fixed in the V3 patch series.
Jamin
> 
> Reviewed-by: Cédric Le Goater 
> 
> Thanks,
> 
> C.
> 
> 
> > ---
> >   hw/watchdog/wdt_aspeed.c | 24 
> >   include/hw/watchdog/wdt_aspeed.h |  3 ++-
> >   2 files changed, 26 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c index
> > d70b656f8e..75685c5647 100644
> > --- a/hw/watchdog/wdt_aspeed.c
> > +++ b/hw/watchdog/wdt_aspeed.c
> > @@ -422,12 +422,36 @@ static const TypeInfo aspeed_1030_wdt_info = {
> >   .class_init = aspeed_1030_wdt_class_init,
> >   };
> >
> > +static void aspeed_2700_wdt_class_init(ObjectClass *klass, void
> > +*data) {
> > +DeviceClass *dc = DEVICE_CLASS(klass);
> > +AspeedWDTClass *awc = ASPEED_WDT_CLASS(klass);
> > +
> > +dc->desc = "ASPEED 2700 Watchdog Controller";
> > +awc->iosize = 0x80;
> > +awc->ext_pulse_width_mask = 0xf; /* TODO */
> > +awc->reset_ctrl_reg = AST2600_SCU_RESET_CONTROL1;
> > +awc->reset_pulse = aspeed_2500_wdt_reset_pulse;
> > +awc->wdt_reload = aspeed_wdt_reload_1mhz;
> > +awc->sanitize_ctrl = aspeed_2600_sanitize_ctrl;
> > +awc->default_status = 0x014FB180;
> > +awc->default_reload_value = 0x014FB180; }
> > +
> > +static const TypeInfo aspeed_2700_wdt_info = {
> > +.name = TYPE_ASPEED_2700_WDT,
> > +.parent = TYPE_ASPEED_WDT,
> > +.instance_size = sizeof(AspeedWDTState),
> > +.class_init = aspeed_2700_wdt_class_init, };
> > +
> >   static void wdt_aspeed_register_types(void)
> >   {
> >   type_register_static(&aspeed_wdt_info);
> >   type_register_static(&aspeed_2400_wdt_info);
> >   type_register_static(&aspeed_2500_wdt_info);
> >   type_register_static(&aspeed_2600_wdt_info);
> > +type_register_static(&aspeed_2700_wdt_info);
> >   type_register_static(&aspeed_1030_wdt_info);
> >   }
> >
> > diff --git a/include/hw/watchdog/wdt_aspeed.h
> > b/include/hw/watchdog/wdt_aspeed.h
> > index e90ef86651..830b0a7936 100644
> > --- a/include/hw/watchdog/wdt_aspeed.h
> > +++ b/include/hw/watchdog/wdt_aspeed.h
> > @@ -19,9 +19,10 @@ OBJECT_DECLARE_TYPE(AspeedWDTState,
> AspeedWDTClass, ASPEED_WDT)
> >   #define TYPE_ASPEED_2400_WDT TYPE_ASPEED_WDT "-ast2400"
> >   #define TYPE_ASPEED_2500_WDT TYPE_ASPEED_WDT "-ast2500"
> >   #define TYPE_ASPEED_2600_WDT TYPE_ASPEED_WDT "-ast2600"
> > +#define TYPE_ASPEED_2700_WDT TYPE_ASPEED_WDT "-ast2700"
> >   #define TYPE_ASPEED_1030_WDT TYPE_ASPEED_WDT "-ast1030"
> >
> > -#define ASPEED_WDT_REGS_MAX(0x30 / 4)
> > +#define ASPEED_WDT_REGS_MAX(0x80 / 4)
> >
> >   struct AspeedWDTState {
> >   /*< private >*/



Re: [PATCH 1/1] kvm: add support for guest physical bits

2024-03-04 Thread Xiaoyao Li

On 3/4/2024 10:58 PM, Gerd Hoffmann wrote:

On Mon, Mar 04, 2024 at 09:54:40AM +0800, Xiaoyao Li wrote:

On 3/1/2024 6:17 PM, Gerd Hoffmann wrote:

query kvm for supported guest physical address bits using
KVM_CAP_VM_GPA_BITS.  Expose the value to the guest via cpuid
(leaf 0x80000008, eax, bits 16-23).

Signed-off-by: Gerd Hoffmann 
---
   target/i386/cpu.h | 1 +
   target/i386/cpu.c | 1 +
   target/i386/kvm/kvm.c | 8 
   3 files changed, 10 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 952174bb6f52..d427218827f6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2026,6 +2026,7 @@ struct ArchCPU {
   /* Number of physical address bits supported */
   uint32_t phys_bits;
+uint32_t guest_phys_bits;
   /* in order to simplify APIC support, we leave this pointer to the
  user */
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2666ef380891..1a6cfc75951e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6570,6 +6570,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
   if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
   /* 64 bit processor */
*eax |= (cpu_x86_virtual_addr_width(env) << 8);
+ *eax |= (cpu->guest_phys_bits << 16);


I think you misunderstand this field.

If you expose this field to the guest, it is information for the nested guest,
i.e., a guest that itself runs as a hypervisor will learn that its nested
guest can have guest_phys_bits of physical address.


I think those limits (l1 + l2 guest phys-bits) are identical, no?


Sorry, I didn't know this patch was based on the off-list proposal made 
by Paolo to change the definition of CPUID.0x80000008:EAX[23:16] to 
advertise the "maximum usable physical address bits".


If you call out this in the change log, it can avoid the misunderstanding.

As I replied to KVM series, I think the info is better to setup by KVM 
and reported by GET_SUPPORTED_CPUID.



The problem this tries to solve is that making the guest phys-bits
smaller than the host phys-bits is problematic (which why we have
allow_smaller_maxphyaddr), but nevertheless there are cases where
the usable guest physical address space is smaller than the host
physical address space.  One case is intel processors with phys-bits
larger than 48 and 4-level EPT.  Another case is amd processors with
phys-bits larger than 48 and the l0 hypervisor using 4-level paging.

The guest needs to know that limit, specifically the guest firmware
so it knows where it can map PCI bars.
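
(For illustration only: a minimal sketch of how guest firmware could read that
limit from the proposed CPUID field. The cpuid() wrapper and the function name
below are assumptions; only the leaf and bit layout come from the patch.)

    #include <stdint.h>

    /* Hypothetical CPUID wrapper for the sketch. */
    static inline void cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                             uint32_t *ecx, uint32_t *edx)
    {
        __asm__ volatile("cpuid"
                         : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
                         : "0"(leaf), "2"(0));
    }

    /* Guest physical address bits as proposed: CPUID.0x80000008:EAX[23:16],
     * where 0 means the field is not provided. */
    static uint32_t guest_phys_bits(void)
    {
        uint32_t eax, ebx, ecx, edx;

        cpuid(0x80000008, &eax, &ebx, &ecx, &edx);
        return (eax >> 16) & 0xff;
    }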

take care,
   Gerd






[PATCH] target/arm: Fix 32-bit SMOPA

2024-03-04 Thread Richard Henderson
While the 8-bit input elements are sequential in the input vector,
the 32-bit output elements are not sequential in the output matrix.
Do not attempt to compute 2 32-bit outputs at the same time.

Cc: qemu-sta...@nongnu.org
Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/sme_helper.c   | 77 ++-
 tests/tcg/aarch64/sme-smopa-1.c   | 47 +++
 tests/tcg/aarch64/sme-smopa-2.c   | 54 ++
 tests/tcg/aarch64/Makefile.target |  2 +-
 4 files changed, 147 insertions(+), 33 deletions(-)
 create mode 100644 tests/tcg/aarch64/sme-smopa-1.c
 create mode 100644 tests/tcg/aarch64/sme-smopa-2.c

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index 904bfdac43..ef39eee48d 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -1083,11 +1083,32 @@ void HELPER(sme_bfmopa)(void *vza, void *vzn, void 
*vzm, void *vpn,
 }
 }
 
-typedef uint64_t IMOPFn(uint64_t, uint64_t, uint64_t, uint8_t, bool);
+typedef uint32_t IMOPFn32(uint32_t, uint32_t, uint32_t, uint8_t, bool);
+static inline void do_imopa_s(uint32_t *za, uint32_t *zn, uint32_t *zm,
+  uint8_t *pn, uint8_t *pm,
+  uint32_t desc, IMOPFn32 *fn)
+{
+intptr_t row, col, oprsz = simd_oprsz(desc) / 4;
+bool neg = simd_data(desc);
 
-static inline void do_imopa(uint64_t *za, uint64_t *zn, uint64_t *zm,
-uint8_t *pn, uint8_t *pm,
-uint32_t desc, IMOPFn *fn)
+for (row = 0; row < oprsz; ++row) {
+uint8_t pa = pn[H1(row >> 1)] >> ((row & 1) * 4);
+uint32_t *za_row = &za[H4(tile_vslice_index(row))];
+uint32_t n = zn[H4(row)];
+
+for (col = 0; col < oprsz; ++col) {
+uint8_t pb = pm[H1(col >> 1)] >> ((col & 1) * 4);
+uint32_t *a = &za_row[col];
+
+*a = fn(n, zm[H4(col)], *a, pa & pb & 0xf, neg);
+}
+}
+}
+
+typedef uint64_t IMOPFn64(uint64_t, uint64_t, uint64_t, uint8_t, bool);
+static inline void do_imopa_d(uint64_t *za, uint64_t *zn, uint64_t *zm,
+  uint8_t *pn, uint8_t *pm,
+  uint32_t desc, IMOPFn64 *fn)
 {
 intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
 bool neg = simd_data(desc);
@@ -1107,25 +1128,16 @@ static inline void do_imopa(uint64_t *za, uint64_t *zn, 
uint64_t *zm,
 }
 
 #define DEF_IMOP_32(NAME, NTYPE, MTYPE) \
-static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \
+static uint32_t NAME(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) \
 {   \
-uint32_t sum0 = 0, sum1 = 0;\
+uint32_t sum = 0;   \
 /* Apply P to N as a mask, making the inactive elements 0. */   \
 n &= expand_pred_b(p);  \
-sum0 += (NTYPE)(n >> 0) * (MTYPE)(m >> 0);  \
-sum0 += (NTYPE)(n >> 8) * (MTYPE)(m >> 8);  \
-sum0 += (NTYPE)(n >> 16) * (MTYPE)(m >> 16);\
-sum0 += (NTYPE)(n >> 24) * (MTYPE)(m >> 24);\
-sum1 += (NTYPE)(n >> 32) * (MTYPE)(m >> 32);\
-sum1 += (NTYPE)(n >> 40) * (MTYPE)(m >> 40);\
-sum1 += (NTYPE)(n >> 48) * (MTYPE)(m >> 48);\
-sum1 += (NTYPE)(n >> 56) * (MTYPE)(m >> 56);\
-if (neg) {  \
-sum0 = (uint32_t)a - sum0, sum1 = (uint32_t)(a >> 32) - sum1;   \
-} else {\
-sum0 = (uint32_t)a + sum0, sum1 = (uint32_t)(a >> 32) + sum1;   \
-}   \
-return ((uint64_t)sum1 << 32) | sum0;   \
+sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0);   \
+sum += (NTYPE)(n >> 8) * (MTYPE)(m >> 8);   \
+sum += (NTYPE)(n >> 16) * (MTYPE)(m >> 16); \
+sum += (NTYPE)(n >> 24) * (MTYPE)(m >> 24); \
+return neg ? a - sum : a + sum; \
 }
 
 #define DEF_IMOP_64(NAME, NTYPE, MTYPE) \
@@ -1151,16 +1163,17 @@ DEF_IMOP_64(umopa_d, uint16_t, uint16_t)
 DEF_IMOP_64(sumopa_d, int16_t, uint16_t)
 DEF_IMOP_64(usmopa_d, uint16_t, int16_t)
 
-#define DEF_IMOPH(NAME) \
-void HELPER(sme_##NAME)(void *vza, void *vzn, void *vzm, void *vpn,  \
-

Re: [PATCH] migration/ram: add additional check

2024-03-04 Thread Peter Xu
On Mon, Mar 04, 2024 at 05:42:03PM +0300, Maksim Davydov wrote:
> If a migration stream is broken, the address and flag reads can return
> zero. Thus, an irrelevant flag error will be returned instead of EIO.
> This can be fixed by an additional check after the read.
> 
> Signed-off-by: Maksim Davydov 
> ---
>  migration/ram.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 45a00b45ed..95d8b19c3b 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3902,6 +3902,12 @@ static int ram_load_precopy(QEMUFile *f)
>  i++;
>  
>  addr = qemu_get_be64(f);
> +ret = qemu_file_get_error(f);
> +if (ret) {
> +error_report("Getting RAM address failed");
> +break;
> +}
> +
>  flags = addr & ~TARGET_PAGE_MASK;
>  addr &= TARGET_PAGE_MASK;
>  
> -- 
> 2.34.1
> 
> 

Queued, thanks.

-- 
Peter Xu




Re: [PATCH v2 0/3] migration: Don't serialize devices in qemu_savevm_state_iterate()

2024-03-04 Thread Peter Xu
On Mon, Mar 04, 2024 at 12:53:36PM +0200, Avihai Horon wrote:
> Hi,
> 
> This small series is v2 of the single patch I previously sent [1].
> 
> It removes device serialization in qemu_savevm_state_iterate() and does
> some VFIO migration touch ups. More info provided in the commit
> messages.
> 
> Thanks.
> 
> Changes from V1 -> V2:
> * Remove device serialization in qemu_savevm_state_iterate() always,
>   regardless of switchover-ack.
> * Refactor vfio_save_iterate() return value.
> * Add a note about migration rate limiting in vfio_save_iterate().
> 
> [1] 
> https://lore.kernel.org/qemu-devel/20240222155627.14563-1-avih...@nvidia.com/

Queued, thanks.

-- 
Peter Xu




Re: Why does the vmovdqu works for passthrough device but crashes for emulated device with "illegal operand" error (in x86_64 QEMU, -accel = kvm) ?

2024-03-04 Thread Xu Liu
Hey Alex and Paolo,

I saw there is some code related to AVX  
https://elixir.bootlin.com/linux/latest/source/arch/x86/kvm/emulate.c#L668

Does that mean that in some special cases, KVM supports AVX instructions?
I don't really know the big picture, so I am just guessing what it is doing.

Thanks,
Xu

> On Mar 4, 2024, at 6:39 PM, Paolo Bonzini  wrote:
> 
> 
> On 3/4/24 22:59, Alex Williamson wrote:
>> Since you're not seeing a KVM_EXIT_MMIO I'd guess this is more of a KVM
>> issue than QEMU (Cc kvm list).  Possibly KVM doesn't emulate vmovdqu
>> relative to an MMIO access, but honestly I'm not positive that AVX
>> instructions are meant to work on MMIO space.  I'll let x86 KVM experts
>> more familiar with specific opcode semantics weigh in on that.
> 
> Indeed, KVM's instruction emulator supports some SSE MOV instructions but not 
> the corresponding AVX instructions.
> 
> Vector instructions however do work on MMIO space, and they are used 
> occasionally especially in combination with write-combining memory.  SSE 
> support was added to KVM because some operating systems used SSE instructions 
> to read and write to VRAM.  However, so far we've never received any reports 
> of OSes using AVX instructions on devices that QEMU can emulate (as opposed 
> to, for example, GPU VRAM that is passed through).
> 
> Thanks,
> 
> Paolo
> 
>> Is your "program" just doing a memcpy() with an mmap() of the PCI BAR
>> acquired through pci-sysfs or a userspace vfio-pci driver within the
>> guest?
>> In QEMU 4a2e242bbb30 ("memory: Don't use memcpy for ram_device
>> regions") we resolved an issue[1] where QEMU itself was doing a memcpy()
>> to assigned device MMIO space resulting in breaking functionality of
>> the device.  IIRC memcpy() was using an SSE instruction that didn't
>> fault, but didn't work correctly relative to MMIO space either.
>> So I also wouldn't rule out that the program isn't inherently
>> misbehaving by using memcpy() and thereby ignoring the nature of the
>> device MMIO access semantics.  Thanks,
>> Alex
>> [1]https://bugs.launchpad.net/qemu/+bug/1384892
> 



Re: [PATCH v3 15/26] memory: Add Error** argument to the global_dirty_log routines

2024-03-04 Thread Yong Huang
On Mon, Mar 4, 2024 at 8:29 PM Cédric Le Goater  wrote:

> Now that the log_global*() handlers take an Error** parameter and
> return a bool, do the same for memory_global_dirty_log_start() and
> memory_global_dirty_log_stop(). The error is reported in the callers
> for now and it will be propagated in the call stack in the next
> changes.
>
> Of note, there is a functional change in ram_init_bitmaps(): if the dirty
> pages logger fails to start, there is no need to synchronize the dirty
> pages bitmaps. colo_incoming_start_dirty_log() could be modified in a
> similar way.
>
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Paul Durrant 
> Cc: "Michael S. Tsirkin" 
> Cc: Paolo Bonzini 
> Cc: David Hildenbrand 
> Cc: Hyman Huang 
> Signed-off-by: Cédric Le Goater 
> ---
>  include/exec/memory.h | 10 --
>  hw/i386/xen/xen-hvm.c |  4 ++--
>  migration/dirtyrate.c | 21 +
>  migration/ram.c   | 34 ++
>  system/memory.c   | 30 --
>  5 files changed, 69 insertions(+), 30 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index
> 4bc146c5ebdd377cd14a4e462f32cc945db5a0a8..8b019465ab13ce85c03075c80865a0865ea1feed
> 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2576,15 +2576,21 @@ void memory_listener_unregister(MemoryListener
> *listener);
>   * memory_global_dirty_log_start: begin dirty logging for all regions
>   *
>   * @flags: purpose of starting dirty log, migration or dirty rate
> + * @errp: pointer to Error*, to store an error if it happens.
> + *
> + * Return: true on success, else false setting @errp with error.
>   */
> -void memory_global_dirty_log_start(unsigned int flags);
> +bool memory_global_dirty_log_start(unsigned int flags, Error **errp);
>
>  /**
>   * memory_global_dirty_log_stop: end dirty logging for all regions
>   *
>   * @flags: purpose of stopping dirty log, migration or dirty rate
> + * @errp: pointer to Error*, to store an error if it happens.
> + *
> + * Return: true on success, else false setting @errp with error.
>   */
> -void memory_global_dirty_log_stop(unsigned int flags);
> +bool memory_global_dirty_log_stop(unsigned int flags, Error **errp);
>
>  void mtree_info(bool flatview, bool dispatch_tree, bool owner, bool
> disabled);
>
> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> index
> 925a207b494b4eed52d5f360b554f18ac8a9806d..286269b47572d90e57df5ff44835bb5f8e16c7ad
> 100644
> --- a/hw/i386/xen/xen-hvm.c
> +++ b/hw/i386/xen/xen-hvm.c
> @@ -655,9 +655,9 @@ void xen_hvm_modified_memory(ram_addr_t start,
> ram_addr_t length)
>  void qmp_xen_set_global_dirty_log(bool enable, Error **errp)
>  {
>  if (enable) {
> -memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
> +memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp);
>  } else {
> -memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION);
> +memory_global_dirty_log_stop(GLOBAL_DIRTY_MIGRATION, errp);
>  }
>  }
>
> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> index
> 1d2e85746fb7b10eb7f149976970f9a92125af8a..34f6d803ff5f4e6ccf2e06aaaed65a336c4be469
> 100644
> --- a/migration/dirtyrate.c
> +++ b/migration/dirtyrate.c
> @@ -90,11 +90,17 @@ static int64_t do_calculate_dirtyrate(DirtyPageRecord
> dirty_pages,
>
>  void global_dirty_log_change(unsigned int flag, bool start)
>  {
> +Error *local_err = NULL;
> +bool ret;
> +
>  bql_lock();
>  if (start) {
> -memory_global_dirty_log_start(flag);
> +ret = memory_global_dirty_log_start(flag, &local_err);
>  } else {
> -memory_global_dirty_log_stop(flag);
> +ret = memory_global_dirty_log_stop(flag, &local_err);
> +}
> +if (!ret) {
> +error_report_err(local_err);
>  }
>  bql_unlock();
>  }
> @@ -106,10 +112,14 @@ void global_dirty_log_change(unsigned int flag, bool
> start)
>   */
>  static void global_dirty_log_sync(unsigned int flag, bool one_shot)
>  {
> +Error *local_err = NULL;
> +
>  bql_lock();
>  memory_global_dirty_log_sync(false);
>  if (one_shot) {
> -memory_global_dirty_log_stop(flag);
> +if (!memory_global_dirty_log_stop(flag, &local_err)) {
> +error_report_err(local_err);
> +}
>  }
>  bql_unlock();
>  }
> @@ -608,9 +618,12 @@ static void calculate_dirtyrate_dirty_bitmap(struct
> DirtyRateConfig config)
>  {
>  int64_t start_time;
>  DirtyPageRecord dirty_pages;
> +Error *local_err = NULL;
>
>  bql_lock();
> -memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE);
> +if (!memory_global_dirty_log_start(GLOBAL_DIRTY_DIRTY_RATE,
> &local_err)) {
> +error_report_err(local_err);
> +}
>
>  /*
>   * 1'round of log sync may return all 1 bits with
> diff --git a/migration/ram.c b/migration/ram.c
> index
> 20c6ad9e759b2b8ec7ae26b7ca72d5cbd20d481f..3d9c08cfae8a59031a7c1b3c70721c2a90daceba
> 100644
> --- 

Re: [PATCH v6 00/23] migration: File based migration with multifd and mapped-ram

2024-03-04 Thread Peter Xu
On Mon, Mar 04, 2024 at 09:04:51PM +, Daniel P. Berrangé wrote:
> On Mon, Mar 04, 2024 at 05:15:05PM -0300, Fabiano Rosas wrote:
> > Peter Xu  writes:
> > 
> > > On Mon, Mar 04, 2024 at 08:53:24PM +0800, Peter Xu wrote:
> > >> On Mon, Mar 04, 2024 at 12:42:25PM +, Daniel P. Berrangé wrote:
> > >> > On Mon, Mar 04, 2024 at 08:35:36PM +0800, Peter Xu wrote:
> > >> > > Fabiano,
> > >> > > 
> > >> > > On Thu, Feb 29, 2024 at 12:29:54PM -0300, Fabiano Rosas wrote:
> > >> > > > => guest: 128 GB RAM - 120 GB dirty - 1 vcpu in tight loop 
> > >> > > > dirtying memory
> > >> > > 
> > >> > > I'm curious normally how much time does it take to do the final 
> > >> > > fdatasync()
> > >> > > for you when you did this test.
> > 
> > I measured and it takes ~4s for the live migration and ~2s for the
> > non-live. I didn't notice this before because the VM goes into
> > postmigrate, so it's paused anyway.

In my case it took tens of seconds at least, if not minutes; I didn't
measure it exactly.

I could have dirtied harder, or I just had a slower disk.  IIUC the worst
case is all of the cache being dirty (not yet written back by the kernel),
say 100GB; assuming a disk bandwidth of 1GB/s (that's the bw of my test
machine's hard drive with a 1M-chunk dd of a 10GB file, even without a
sync..), it could take 1min or more in reality.
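
(A rough, self-contained illustration of that arithmetic, not QEMU code; the
file name and sizes are made up. Every write() below only dirties the page
cache, and the single fdatasync() then pays for the whole writeback, roughly
dirty_bytes / disk_bandwidth seconds.)

    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("snapshot.img", O_WRONLY | O_CREAT | O_TRUNC, 0600);
        size_t chunk = 1 << 20;                /* 1M chunks, as in the dd test */
        char *buf = malloc(chunk);

        if (fd < 0 || !buf) {
            return 1;
        }
        memset(buf, 0xab, chunk);
        for (int i = 0; i < 10 * 1024; i++) {  /* ~10GB through the page cache */
            if (write(fd, buf, chunk) != (ssize_t)chunk) {
                return 1;
            }
        }
        fdatasync(fd);                         /* this is where the time goes */
        close(fd);
        free(buf);
        return 0;
    }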

> > 
> > >> > > 
> > >> > > I finally got a relatively large system today and gave it a quick 
> > >> > > shot over
> > >> > > 128G (100G busy dirty) mapped-ram snapshot with 8 multifd channels.  
> > >> > > The
> > >> > > migration save/load does all fine, so I don't think there's anything 
> > >> > > wrong
> > >> > > with the patchset, however when save completes (I'll need to stop the
> > >> > > workload as my disk isn't fast enough I guess..) I'll always hit a 
> > >> > > super
> > >> > > long hang of QEMU on fdatasync() on XFS during which the main thread 
> > >> > > is in
> > >> > > UNINTERRUPTIBLE state.
> > >> > 
> > >> > That isn't very surprising. If you don't have O_DIRECT enabled, then
> > >> > all that disk I/O from the migrate is going to be in RAM, and thus the
> > >> > fdatasync() is likely to trigger writing out alot of data.
> > >> > 
> > >> > Blocking the main QEMU thread though is pretty unhelpful. That suggests
> > >> > the data sync needs to be moved to a non-main thread.
> > >> 
> > >> Perhaps migration thread itself can also be a candidate, then.
> > >> 
> > >> > 
> > >> > With O_DIRECT meanwhile there should be essentially no hit from 
> > >> > fdatasync.
> > >> 
> > >> The update of COMPLETED status can be a good place of a marker point to
> > >> show such flush done if from the gut feeling of a user POV.  If that 
> > >> makes
> > >> sense, maybe we can do that sync before setting COMPLETED.
> > 
> > At the migration completion I believe the multifd threads will have
> > already cleaned up and dropped the reference to the channel, it might be
> > too late then.
> > 
> > In the multifd threads, we'll be wasting (like we are today) the extra
> > syscalls after the first sync succeeds.
> > 
> > >> 
> > >> No matter which thread does that sync, it's still a pity that it'll go 
> > >> into
> > >> UNINTERRUPTIBLE during fdatasync(), then whoever wants to e.g. attach a 
> > >> gdb
> > >> onto it to have a look will also hang.
> > >
> > > Or... would it be nicer we get rid of the fdatasync() but leave that for
> > > upper layers?  QEMU used to support file: migration already, it never
> > > manage cache behavior; it does smell like something shouldn't be done in
> > > QEMU when thinking about it, at least mapped-ram is nothing special to me
> > > from this regard.
> > >
> > > User should be able to control that either manually (sync), or Libvirt can
> > > do that after QEMU quits; after all Libvirt holds the fd itself?  It 
> > > should
> > > allow us to get rid of above UNINTERRUPTIBLE / un-debuggable period of 
> > > QEMU
> > > went away.  Another side benefit: rather than holding all of QEMU 
> > > resources
> > > (especially, guest RAM) when waiting for a super slow disk flush, Libvirt 
> > > /
> > > upper layer can do that separately after releasing all the QEMU resources
> > > first.
> > 
> > I like the idea of QEMU having a self-contained
> > implementation. Specially since we'll add O_DIRECT support, which is
> > already quite heavy-handed if we're talking about managing cache
> > behavior.

O_DIRECT is optionally selected by the user by setting the new parameter
first, so the user is still in full control - it's still user's decision on
how cache should be managed, even if QEMU needs explicit changes to support
and expose the new parameter.

For fdatasync(), I think it's slightly different in that it doesn't require
anything implemented in QEMU, as the snapshot is always in the form of a
file, and file is pretty common concept which well supports sync semantics
separately.  Instead of providing yet another parameter to control it, we
can just avoid that 

Re: [PATCH] Fixed tlb huge page loading issue

2024-03-04 Thread lixianglai



Hi gaosong:

Hi,

Title 'target/loongarch: ' ...


OK! I will fix it in next version.

Thanks,

Xianglai.





Thanks.
Song Gao
在 2024/2/28 14:55, Xianglai Li 写道:

The lddir and ldpte instruction emulation has a problem in its handling
of huge pages above level 2: the page size is not correctly calculated,
resulting in the wrong page size for the table entry found by the TLB.

Signed-off-by: Xianglai Li 
---
  target/loongarch/cpu.h    |  1 +
  target/loongarch/tcg/tlb_helper.c | 21 -
  2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index ec37579fd6..eab3e41c71 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -292,6 +292,7 @@ typedef struct CPUArchState {
  uint32_t fcsr0_mask;
    uint32_t cpucfg[21];
+    uint32_t lddir_ps;
    uint64_t lladdr; /* LL virtual address compared against SC */
  uint64_t llval;
diff --git a/target/loongarch/tcg/tlb_helper.c 
b/target/loongarch/tcg/tlb_helper.c

index a08c08b05a..3594c800b3 100644
--- a/target/loongarch/tcg/tlb_helper.c
+++ b/target/loongarch/tcg/tlb_helper.c
@@ -38,6 +38,7 @@ static void raise_mmu_exception(CPULoongArchState 
*env, target_ulong address,

  cs->exception_index = EXCCODE_PIF;
  }
  env->CSR_TLBRERA = FIELD_DP64(env->CSR_TLBRERA, 
CSR_TLBRERA, ISTLBR, 1);

+    env->lddir_ps = 0;
  break;
  case TLBRET_INVALID:
  /* TLB match with no valid bit */
@@ -488,13 +489,6 @@ target_ulong helper_lddir(CPULoongArchState 
*env, target_ulong base,

  uint64_t dir_base, dir_width;
  bool huge = (base >> LOONGARCH_PAGE_HUGE_SHIFT) & 0x1;
  -    badvaddr = env->CSR_TLBRBADV;
-    base = base & TARGET_PHYS_MASK;
-
-    /* 0:64bit, 1:128bit, 2:192bit, 3:256bit */
-    shift = FIELD_EX64(env->CSR_PWCL, CSR_PWCL, PTEWIDTH);
-    shift = (shift + 1) * 3;
-
  if (huge) {
  return base;
  }
@@ -519,9 +513,18 @@ target_ulong helper_lddir(CPULoongArchState 
*env, target_ulong base,

  do_raise_exception(env, EXCCODE_INE, GETPC());
  return 0;
  }
+
+    /* 0:64bit, 1:128bit, 2:192bit, 3:256bit */
+    shift = FIELD_EX64(env->CSR_PWCL, CSR_PWCL, PTEWIDTH);
+    shift = (shift + 1) * 3;
+    badvaddr = env->CSR_TLBRBADV;
+    base = base & TARGET_PHYS_MASK;
  index = (badvaddr >> dir_base) & ((1 << dir_width) - 1);
  phys = base | index << shift;
  ret = ldq_phys(cs->as, phys) & TARGET_PHYS_MASK;
+    if (ret & BIT_ULL(LOONGARCH_PAGE_HUGE_SHIFT)) {
+    env->lddir_ps = dir_base;
+    }
  return ret;
  }
  @@ -538,13 +541,13 @@ void helper_ldpte(CPULoongArchState *env, 
target_ulong base, target_ulong odd,

  base = base & TARGET_PHYS_MASK;
    if (huge) {
-    /* Huge Page. base is paddr */
  tmp0 = base ^ (1 << LOONGARCH_PAGE_HUGE_SHIFT);
  /* Move Global bit */
  tmp0 = ((tmp0 & (1 << LOONGARCH_HGLOBAL_SHIFT))  >>
  LOONGARCH_HGLOBAL_SHIFT) << R_TLBENTRY_G_SHIFT |
  (tmp0 & (~(1 << LOONGARCH_HGLOBAL_SHIFT)));
-    ps = ptbase + ptwidth - 1;
+
+    ps = env->lddir_ps - 1;
  if (odd) {
  tmp0 += MAKE_64BIT_MASK(ps, 1);
  }





Re: [PATCH v2] target/loongarch: Add TCG macro in structure CPUArchState

2024-03-04 Thread maobibo




On 2024/3/5 上午12:53, Richard Henderson wrote:

On 3/3/24 16:18, Bibo Mao wrote:
@@ -696,11 +700,15 @@ void loongarch_cpu_dump_state(CPUState *cs, FILE 
*f, int flags)

  {
  LoongArchCPU *cpu = LOONGARCH_CPU(cs);
  CPULoongArchState *env = &cpu->env;
-    int i;
+    int i, fp_status;
+#ifdef CONFIG_TCG
+    fp_status = get_float_exception_flags(&env->fp_status);
+#else
+    fp_status = 0;
+#endif
  qemu_fprintf(f, " PC=%016" PRIx64 " ", env->pc);
-    qemu_fprintf(f, " FCSR0 0x%08x  fp_status 0x%02x\n", env->fcsr0,
- get_float_exception_flags(&env->fp_status));
+    qemu_fprintf(f, " FCSR0 0x%08x  fp_status 0x%02x\n", env->fcsr0, 
fp_status);


fp_status, I think, is unnecessary to print all of the time.

In update_fcsr0_mask, we ensure that fcsr0 is updated and 
fp_status.exception_flags is 0. So I would expect this field to be 0 all 
of the time -- anything else is a bug.


Yes, fp_status is a temporary status used during float instruction
translation; it is cleared to 0 when the fp instruction translation is done.


Will remove print sentence.


+#if defined(CONFIG_TCG) && !defined(CONFIG_USER_ONLY)
+static bool tlb_needed(void *opaque)
+{
+    if (kvm_enabled()) {
+    return false;
+    }
+
+    return true;
+}


Better as return tcg_enabled();

Will return tcg_enabled(); it is simpler.
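
(Concretely, the simplified handler agreed on above would be just the
following; a sketch of the change, not the final patch:)

    static bool tlb_needed(void *opaque)
    {
        return tcg_enabled();
    }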

Regards
Bibo Mao



r~





Re: [PATCH] oslib-posix: fix memory leak in touch_all_pages

2024-03-04 Thread Mark Kanda

On 3/4/24 4:48 PM, Paolo Bonzini wrote:

touch_all_pages() can return early, before creating threads.  In this case,
however, it leaks the MemsetContext that it has allocated at the
beginning of the function.

Reported by Coverity as CID 1534922.

Fixes: 04accf43df8 ("oslib-posix: initialize backend memory objects in 
parallel", 2024-02-06)
Cc: Mark Kanda
Signed-off-by: Paolo Bonzini

Reviewed-by: Mark Kanda 

Thanks/regards,
-Mark

---
  util/oslib-posix.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 3c379f96c26..e76441695bd 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -467,11 +467,13 @@ static int touch_all_pages(char *area, size_t hpagesize, 
size_t numpages,
   * preallocating synchronously.
   */
  if (context->num_threads == 1 && !async) {
+ret = 0;
  if (qemu_madvise(area, hpagesize * numpages,
   QEMU_MADV_POPULATE_WRITE)) {
-return -errno;
+ret = -errno;
  }
-return 0;
+g_free(context);
+return ret;
  }
  touch_fn = do_madv_populate_write_pages;
  } else {


Re: rutabaga 0.1.3

2024-03-04 Thread Gurchetan Singh
On Sat, Mar 2, 2024 at 6:38 AM Alyssa Ross  wrote:

> Hi Gurchetan,
>
> > >> > Would this be a suitable commit for the 0.1.3 release of rutabaga?
> > >> >
> > >> >
> https://chromium.googlesource.com/crosvm/crosvm/+/5dfd74a0680d317c6edf44138def886f47cb1c7c
> > >> >
> > >> > The gfxstream/AEMU commits would remain unchanged.
> > >>
> > >> That combination works for me.
> > >
> > > Just FYI, still working on it.  Could take 1-2 more weeks.
> >
> > FYI:
> >
> >
> https://android.googlesource.com/platform/hardware/google/gfxstream/+/refs/tags/v0.1.2-gfxstream-release
> >
> >
> https://android.googlesource.com/platform/hardware/google/aemu/+/refs/tags/v0.1.2-aemu-release
> >
> >
> https://chromium.googlesource.com/crosvm/crosvm/+/refs/tags/v0.1.3-rutabaga-release
>
> Unlike the commit I tested for you, the commit that ended up being
> tagged as v0.1.3-rutabaga-release doesn't work for me:
>
> qemu: The errno is EBADF: Bad file number
> qemu: CHECK failed in rutabaga_cmd_resource_map_blob()
> ../hw/display/virtio-gpu-rutabaga.c:655
> qemu: virtio_gpu_rutabaga_process_cmd: ctrl 0x208, error 0x1200
> qemu: CHECK failed in rutabaga_cmd_resource_unmap_blob()
> ../hw/display/virtio-gpu-rutabaga.c:723
> qemu: virtio_gpu_rutabaga_process_cmd: ctrl 0x209, error 0x1200
> qemu: The errno is EBADF: Bad file number
> qemu: CHECK failed in rutabaga_cmd_resource_map_blob()
> ../hw/display/virtio-gpu-rutabaga.c:655
> qemu: virtio_gpu_rutabaga_process_cmd: ctrl 0x208, error 0x1200
> qemu: CHECK failed in rutabaga_cmd_resource_unmap_blob()
> ../hw/display/virtio-gpu-rutabaga.c:723
> qemu: virtio_gpu_rutabaga_process_cmd: ctrl 0x209, error 0x1200
> qemu: The errno is EBADF: Bad file number
> qemu: CHECK failed in rutabaga_cmd_resource_map_blob()
> ../hw/display/virtio-gpu-rutabaga.c:655
> qemu: virtio_gpu_rutabaga_process_cmd: ctrl 0x208, error 0x1200
> qemu: invalid resource id
> qemu: CHECK failed in rutabaga_cmd_submit_3d()
> ../hw/display/virtio-gpu-rutabaga.c:341
> qemu: virtio_gpu_rutabaga_process_cmd: ctrl 0x207, error 0x1200
> qemu: CHECK failed in rutabaga_cmd_resource_unmap_blob()
> ../hw/display/virtio-gpu-rutabaga.c:723
> qemu: virtio_gpu_rutabaga_process_cmd: ctrl 0x209, error 0x1200
>

Thank you for the bug report .. does crrev.com/c/5342655 fix this for you?


I bisected it to:
>
> commit f3dbf20eedadb135e2fd813474fbb9731d465f3a
> Author: Andrew Walbran 
> Date:   Wed Nov 29 17:23:45 2023 +
>
> rutabaga_gfx: Uprev nix to 0.27.1
>
> The new version of nix uses OwnedFd in various places, which
> allows us
> to have less unsafe code.
>
> TEST=CQ
> BUG=b:293289578
>
> Change-Id: I61aa80c4105eaf1182c5c325109b5aba11cf60de
> Reviewed-on:
> https://chromium-review.googlesource.com/c/crosvm/crosvm/+/5072293
> Auto-Submit: Andrew Walbran 
> Reviewed-by: Gurchetan Singh 
> Reviewed-by: Frederick Mayle 
> Commit-Queue: Frederick Mayle 
>


Re: Why does the vmovdqu works for passthrough device but crashes for emulated device with "illegal operand" error (in x86_64 QEMU, -accel = kvm) ?

2024-03-04 Thread Xu Liu
Hey Paolo,
Thanks for confirming that AVX is not supported for MMIO space.

So for the emulated device, basically I have to force the compiler to avoid
using vmovdqu.
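
(One way to do that is a minimal copy loop that keeps the MMIO accesses as
plain, naturally aligned scalar stores, so the compiler cannot widen them into
SSE/AVX vector moves. The function name and the 32-bit access width below are
assumptions, not taken from this thread.)

    #include <stddef.h>
    #include <stdint.h>

    /* Copy to an MMIO BAR with 32-bit volatile stores; each store stays a
     * separate scalar access.  Assumes len is a multiple of 4 and that dst
     * and src are 4-byte aligned. */
    static void mmio_copy32(volatile uint32_t *dst, const uint32_t *src,
                            size_t len)
    {
        for (size_t i = 0; i < len / 4; i++) {
            dst[i] = src[i];
        }
    }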

I am curious about how KVM emulates those instructions. Do you mind sharing
some related code pointers?

Thanks,
Xu

On Mar 4, 2024, at 6:39 PM, Paolo Bonzini  wrote:


On 3/4/24 22:59, Alex Williamson wrote:
Since you're not seeing a KVM_EXIT_MMIO I'd guess this is more of a KVM
issue than QEMU (Cc kvm list).  Possibly KVM doesn't emulate vmovdqu
relative to an MMIO access, but honestly I'm not positive that AVX
instructions are meant to work on MMIO space.  I'll let x86 KVM experts
more familiar with specific opcode semantics weigh in on that.

Indeed, KVM's instruction emulator supports some SSE MOV instructions but not 
the corresponding AVX instructions.

Vector instructions however do work on MMIO space, and they are used 
occasionally especially in combination with write-combining memory.  SSE 
support was added to KVM because some operating systems used SSE instructions 
to read and write to VRAM.  However, so far we've never received any reports of 
OSes using AVX instructions on devices that QEMU can emulate (as opposed to, 
for example, GPU VRAM that is passed through).

Thanks,

Paolo

Is your "program" just doing a memcpy() with an mmap() of the PCI BAR
acquired through pci-sysfs or a userspace vfio-pci driver within the
guest?
In QEMU 4a2e242bbb30 ("memory: Don't use memcpy for ram_device
regions") we resolved an issue[1] where QEMU itself was doing a memcpy()
to assigned device MMIO space resulting in breaking functionality of
the device.  IIRC memcpy() was using an SSE instruction that didn't
fault, but didn't work correctly relative to MMIO space either.
So I also wouldn't rule out that the program isn't inherently
misbehaving by using memcpy() and thereby ignoring the nature of the
device MMIO access semantics.  Thanks,
Alex
[1]https://bugs.launchpad.net/qemu/+bug/1384892




Re: Why does the vmovdqu works for passthrough device but crashes for emulated device with "illegal operand" error (in x86_64 QEMU, -accel = kvm) ?

2024-03-04 Thread Xu Liu
Hey Alex,

Thanks for the detailed explanation!

First answer your question:
Is your "program" just doing a memcpy() with an mmap() of the PCI BAR
acquired through pci-sysfs or a userspace vfio-pci driver within the
guest?

My program is using a userspace vfio-pci driver for both the emulated device
and the assigned device within the guest OS.
The program is in Rust, and I am using std::ptr::write_volatile to do the
memory copy.

I tried "x-no-mmap=on” for the assigned device, as you mentioned it did behave 
same as the emulated device.


I am also suspecting this
Possibly KVM doesn't emulate vmovdqu
relative to an MMIO access, but honestly I'm not positive that AVX
instructions are meant to work on MMIO space.

Do you have any suggestions for verifying this? Or a code pointer to check?

Thanks ,
Xu

On Mar 4, 2024, at 4:59 PM, Alex Williamson  wrote:


On Sun, 3 Mar 2024 22:20:33 +
Xu Liu mailto:li...@meta.com>> wrote:

Hello,

Recently I have been running my programs in QEMU (x86_64) with "-accel=kvm".
The QEMU version is 6.0.0.

I run my programs in two ways:

1.   I pass my device through to QEMU with vfio-pci; this works well.

2.  I write an emulated PCI device for QEMU, and run my programs on
the emulated PCI device. This crashes when the code tries to do a memory
copy to the PCI device with a data length longer than 16 bytes,
while the passthrough device works well in the same situation.


After dumping the assembly code, I noticed that when the data is <= 16
bytes, a mov instruction is chosen, and it works well.

When the data is > 16 bytes, the vmovdqu instruction is chosen,
and it crashes with "illegal operand".

Given that the code and data are exactly the same for both the passthrough
device and the emulated device, I am curious why this happens.

After turning on the kernel trace for kvm via echo kvm:* >
/sys/kernel/debug/tracing/set_event, and rerunning QEMU and my code
for both the passthrough device and the emulated device, I noticed that:

1) for the passthrough device, I didn't see any trace events related to
my gva and gpa.  This makes me think that the memory copy to the PCI
device went through a different code path: it is handled by the guest
OS without an exit to VMX.

2) for the emulated device, if I use the compiler flag
target-feature=-avx,-avx2 to force the compiler to use mov instructions,
I can see the memory copy go through KVM_EXIT_MMIO, and
everything works well. If I don't force the compiler to use mov, the
compiler just chooses vmovdqu, which crashes the program,
and no KVM_EXIT_MMIO related to my memory copy appears in the trace
events. It looks like the guest OS handles the crash.


Any clue about why vmovdqu works for the passthrough device but not
for the emulated device?

For an assigned device, the device MMIO space will be directly mapped
into the VM address space (assuming the PCI BAR is at least PAGE_SIZE),
so there's no emulation of the access.  You can disable this with the
x-no-mmap=on option for the vfio-pci device, where then I'd guess this
behaves the same as your emulated device (assuming we really don't
reach QEMU for the access).

Since you're not seeing a KVM_EXIT_MMIO I'd guess this is more of a KVM
issue than QEMU (Cc kvm list).  Possibly KVM doesn't emulate vmovdqu
relative to an MMIO access, but honestly I'm not positive that AVX
instructions are meant to work on MMIO space.  I'll let x86 KVM experts
more familiar with specific opcode semantics weigh in on that.

Is your "program" just doing a memcpy() with an mmap() of the PCI BAR
acquired through pci-sysfs or a userspace vfio-pci driver within the
guest?

In QEMU 4a2e242bbb30 ("memory: Don't use memcpy for ram_device
regions") we resolved an issue[1] where QEMU itself was doing a memcpy()
to assigned device MMIO space resulting in breaking functionality of
the device.  IIRC memcpy() was using an SSE instruction that didn't
fault, but didn't work correctly relative to MMIO space either.

So I also wouldn't rule out that the program isn't inherently
misbehaving by using memcpy() and thereby ignoring the nature of the
device MMIO access semantics.  Thanks,

Alex

[1]https://bugs.launchpad.net/qemu/+bug/1384892


