[PATCH 0/3] migration: some small cleanups

2024-01-16 Thread peterx
From: Peter Xu 

I found three tiny little patches lying around probably for a long time,
sending it out before deleting the branch.  Review welcomed, thanks.

Peter Xu (3):
  migration: Make threshold_size an uint64_t
  migration: Drop unnecessary check in ram's pending_exact()
  analyze-migration.py: Remove trick on parsing ramblocks

 migration/migration.h|  2 +-
 migration/ram.c  |  9 -
 scripts/analyze-migration.py | 11 +++
 3 files changed, 8 insertions(+), 14 deletions(-)

-- 
2.43.0




[PATCH 3/3] analyze-migration.py: Remove trick on parsing ramblocks

2024-01-16 Thread peterx
From: Peter Xu 

RAM_SAVE_FLAG_MEM_SIZE contains the total length of ramblock idstr to know
whether scanning of ramblocks is complete.  Drop the trick.

Signed-off-by: Peter Xu 
---
 scripts/analyze-migration.py | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index a39dfb8766..8a254a5b6a 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -151,17 +151,12 @@ def read(self):
 addr &= ~(self.TARGET_PAGE_SIZE - 1)
 
 if flags & self.RAM_SAVE_FLAG_MEM_SIZE:
-while True:
+total_length = addr
+while total_length > 0:
 namelen = self.file.read8()
-# We assume that no RAM chunk is big enough to ever
-# hit the first byte of the address, so when we see
-# a zero here we know it has to be an address, not the
-# length of the next block.
-if namelen == 0:
-self.file.file.seek(-1, 1)
-break
 self.name = self.file.readstr(len = namelen)
 len = self.file.read64()
+total_length -= len
 self.sizeinfo[self.name] = '0x%016x' % len
 if self.write_memory:
 print(self.name)
-- 
2.43.0




Re: [PATCH v2 2/2] hw/pflash: implement update buffer for block writes

2024-01-16 Thread Peter Xu
On Tue, Jan 16, 2024 at 05:08:28PM +0100, Philippe Mathieu-Daudé wrote:
> On 12/1/24 17:54, Peter Maydell wrote:
> > On Mon, 8 Jan 2024 at 13:06, Philippe Mathieu-Daudé  
> > wrote:
> > > 
> > > Hi Gerd,
> > > 
> > > On 8/1/24 13:53, Philippe Mathieu-Daudé wrote:
> > > > From: Gerd Hoffmann 
> > > > 
> > > > Add an update buffer where all block updates are staged.
> > > > Flush or discard updates properly, so we should never see
> > > > half-completed block writes in pflash storage.
> > > > 
> > > > Drop a bunch of FIXME comments ;)
> > > > 
> > > > Signed-off-by: Gerd Hoffmann 
> > > > Message-ID: <20240105135855.268064-3-kra...@redhat.com>
> > > > Signed-off-by: Philippe Mathieu-Daudé 
> > > > ---
> > > >hw/block/pflash_cfi01.c | 106 
> > > > ++--
> > > >1 file changed, 80 insertions(+), 26 deletions(-)
> 
> 
> > > > +static const VMStateDescription vmstate_pflash_blk_write = {
> > > > +.name = "pflash_cfi01_blk_write",
> > > > +.version_id = 1,
> > > > +.minimum_version_id = 1,
> > > > +.needed = pflash_blk_write_state_needed,
> > > > +.fields = (const VMStateField[]) {
> > > > +VMSTATE_VBUFFER_UINT32(blk_bytes, PFlashCFI01, 0, NULL, 
> > > > writeblock_size),
> > > 
> > > I don't get the difference with VMSTATE_VBUFFER_ALLOC_UINT32() which
> > > sets VMS_ALLOC. In this case pflash_cfi01_realize() does the alloc so
> > > we don't need VMS_ALLOC?
> > 
> > Yes, that's the idea. A VMS_ALLOC vmstate type means "this
> > block of memory is dynamically sized at runtime, so when the
> > migration code is doing inbound migration it needs to
> > allocate a buffer of the right size first (based on some
> > state struct field we've already migrated) and then put the
> > incoming data into it". VMS_VBUFFER means "the size of the buffer
> > isn't a compile-time constant, so we need to fish it out of
> > some other state struct field". So:
> > 
> >   VMSTATE_VBUFFER_UINT32: we need to migrate (a pointer to) an array
> >   of uint32_t; the size of that is in some other struct field,
> >   but it's a runtime constant and we can assume the memory has
> >   already been allocated
> > 
> >   VMSTATE_VBUFFER_ALLOC_UINT32: we need to migrate an array
> >   of uint32_t of variable size dependent on the inbound migration
> >   data, and so the migration code must allocate it
> 
> Thanks Peter!
> 
> Do you mind if we commit your explanation as is? As:

Any attempt to provide more documents there for the macros would be nice if
anyone would like to post a patch.  Though some comments below for this
specific one.

> 
> -- >8 --
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 294d2d8486..5c6f6c5c32 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -573,4 +573,6 @@ extern const VMStateInfo vmstate_info_qlist;
> 
> -/* a variable length array (i.e. _type *_field) but we know the
> - * length
> +/**
> + * VMSTATE_STRUCT_VARRAY_POINTER_KNOWN:
> + *
> + * A variable length array (i.e. _type *_field) but we know the length.
>   */
> @@ -678,2 +680,10 @@ extern const VMStateInfo vmstate_info_qlist;
> 
> +/**
> + * VMSTATE_VBUFFER_UINT32:
> + *
> + * We need to migrate (a pointer to) an array of uint32_t; the size of

IIUC it's not an array of uint32_t, but a buffer with dynamic size.  the
"uint32_t" is trying to describe the type of the size field, afaict.  Same
issue for below one.

For example, comparing VMSTATE_VBUFFER_UINT32 vs VMSTATE_VBUFFER, we should
see the only difference is:

.size_offset  = vmstate_offset_value(_state, _field_size, int32_t),\

vs:

.size_offset  = vmstate_offset_value(_state, _field_size, uint32_t),\

I think it checks the size field to match that type.

Interestingly, when I look at the code to use the size, it looks like a bug
to me, where we always parse the size field to be int32_t..

static int vmstate_size(void *opaque, const VMStateField *field)
{
int size = field->size;

if (field->flags & VMS_VBUFFER) {
size = *(int32_t *)(opaque + field->size_offset); <--- see 
here..
if (field->flags & VMS_MULTIPLY) {
size *= field->size;
}
}

return size;
}

I think it'll start to crash things when bit32 start to be used.  And I had
a feeling that this is overlooked in the commit a19cbfb346 ("spice: add qxl
device") where VMSTATE_VBUFFER_UINT32 is introduced.

I don't have a good way to trivially fix it because we don't remember that
type of size in vmsd.  However a simple solution could be that we convert
all existing VMSTATE_VBUFFER() (potentially, VMSTATE_PARTIAL_VBUFFER users;
there seems to have only 1 caller..) to always use uint32_t.  I don't think
it's wise to try using a signed for size anyway, and it should be
compatible change as we doubled the size.

I'll hold a bit to see whether there's some comment, then I can try to post
a patch.

> + * that is in some other struct field, but it's a 

Re: [PULL 16/22] gdbstub: expose api to find registers

2024-01-16 Thread Akihiko Odaki

On 2024/01/16 19:48, Alex Bennée wrote:

Expose an internal API to QEMU to return all the registers for a vCPU.
The list containing the details required to called gdb_read_register().

Based-on: <20231025093128.33116-15-akihiko.od...@daynix.com>
Cc: Akihiko Odaki 
Message-Id: <20240103173349.398526-38-alex.ben...@linaro.org>
Signed-off-by: Alex Bennée 


I'm not for to pull this, "[PULL 17/22] plugins: add an API to read 
registers", and "[PULL 19/22] contrib/plugins: extend execlog to track 
register changes" in the current state. I have only commented the API 
aspect of these patches, but looking at them now, I see something wrong 
with their implementations. I'll add comments to respective patches.




diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index da9ddfe54c5..7bddea8259e 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -111,6 +111,53 @@ void gdb_feature_builder_end(const GDBFeatureBuilder 
*builder);
   */
  const GDBFeature *gdb_find_static_feature(const char *xmlname);
  
+/**

+ * gdb_find_feature() - Find a feature associated with a CPU.
+ * @cpu: The CPU associated with the feature.
+ * @name: The feature's name.
+ *
+ * Return: The feature's number.
+ */
+int gdb_find_feature(CPUState *cpu, const char *name);


This function is not used.


+
+/**
+ * gdb_find_feature_register() - Find a register associated with a CPU.
+ * @cpu: The CPU associated with the register.
+ * @feature: The feature's number returned by gdb_find_feature().
+ * @name: The register's name.
+ *
+ * Return: The register's number.
+ */
+int gdb_find_feature_register(CPUState *cpu, int feature, const char *name);


This function is also no longer needed.

Regards,
Akihiko Odaki



Re: Re: [PATCH] tests/docker: Add sqlite3 module to openSUSE Leap container

2024-01-16 Thread Andrea Bolognani
On Wed, Jan 17, 2024 at 08:18:27AM +0100, Thomas Huth wrote:
> On 17/01/2024 00.09, Fabiano Rosas wrote:
> > Avocado needs sqlite3:
> >
> >Failed to load plugin from module "avocado.plugins.journal":
> >ImportError("Module 'sqlite3' is not installed.
> >Use: sudo zypper install python311 to install it")
> >
> > Include the appropriate package in the dockerfile.
> >
> >  From 'zypper info python311':
> >"This package supplies rich command line features provided by
> >readline, and sqlite3 support for the interpreter core, thus forming
> >a so called "extended" runtime."

Weird choice on Python's part to have sqlite3 support as part of the
standard library IMO, but that's "batteries included" for you :)

> > +++ b/tests/docker/dockerfiles/opensuse-leap.docker
> > @@ -90,6 +90,7 @@ RUN zypper update -y && \
> >  pcre-devel-static \
> >  pipewire-devel \
> >  pkgconfig \
> > +   python311 \
> >  python311-base \
> >  python311-pip \
> >  python311-setuptools \
>
> AFAIK tests/docker/dockerfiles/opensuse-leap.docker is auto-generated, so
> this will be lost once somebody else runs lcitool again...
>
> I don't really have a clue, but I guess this has to be fixed in the upstream
> lcitool first ( https://gitlab.com/libvirt/libvirt-ci ), and then we need to
> update our lcitool status in QEMU afterwards. Maybe Daniel can advise for
> the right stteps here...?

It looks like a bunch of mappings are maintained in
tests/lcitool/mappings.yml instead of the main lcitool repository. So
I think you need to apply the diff below, then run

  $ git submodule update --init tests/lcitool/libvirt-ci
  $ tests/lcitool/refresh

to propagate the changes to the generated files.


diff --git a/tests/lcitool/mappings.yml b/tests/lcitool/mappings.yml
index 0b908882f1..407c03301b 100644
--- a/tests/lcitool/mappings.yml
+++ b/tests/lcitool/mappings.yml
@@ -59,6 +59,10 @@ mappings:
 CentOSStream8:
 OpenSUSELeap15:

+  python3-sqlite3:
+CentOSStream8: python38
+OpenSUSELeap15: python311
+
   python3-tomli:
 # test using tomllib
 apk:
diff --git a/tests/lcitool/projects/qemu.yml b/tests/lcitool/projects/qemu.yml
index 82092c9f17..149b15de57 100644
--- a/tests/lcitool/projects/qemu.yml
+++ b/tests/lcitool/projects/qemu.yml
@@ -97,6 +97,7 @@ packages:
  - python3-pip
  - python3-sphinx
  - python3-sphinx-rtd-theme
+ - python3-sqlite3
  - python3-tomli
  - python3-venv
  - rpm2cpio
-- 
Andrea Bolognani / Red Hat / Virtualization




Re: [PATCH] coroutine-ucontext: Save fake stack for pooled coroutine

2024-01-16 Thread Akihiko Odaki

On 2024/01/16 17:42, Marc-André Lureau wrote:

Hi

On Mon, Jan 15, 2024 at 10:49 PM Stefan Hajnoczi  wrote:


On Fri, Jan 12, 2024 at 07:36:19PM +0900, Akihiko Odaki wrote:

Coroutine may be pooled even after COROUTINE_TERMINATE if
CONFIG_COROUTINE_POOL is enabled and fake stack should be saved in
such a case to keep AddressSanitizerUseAfterReturn working. Even worse,
I'm seeing stack corruption without fake stack being saved.

Signed-off-by: Akihiko Odaki 


Thanks Akihiko, this is solving a crash when enabling ASAN!


---
  util/coroutine-ucontext.c | 21 +++--
  1 file changed, 19 insertions(+), 2 deletions(-)


Adding Marc-André Lureau and Lingfeng Yang, who authored the code in
question.


Side note:
I am surprised that commit 0aebab04b9  "configure: add --enable-tsan
flag + fiber annotations" changed code like this:
  {
  #ifdef CONFIG_ASAN
-__sanitizer_start_switch_fiber(fake_stack_save, bottom, size);
+__sanitizer_start_switch_fiber(
+action == COROUTINE_TERMINATE ? NULL : fake_stack_save,
+bottom, size);
+#endif
+#ifdef CONFIG_TSAN
+void *curr_fiber =
+__tsan_get_current_fiber();
+__tsan_acquire(curr_fiber);
+
+*fake_stack_save = curr_fiber;
+__tsan_switch_to_fiber(new_fiber, 0);  /* 0=synchronize */
  #endif

*fake_stack_save = curr_fiber:
Is TSAN compatible with ASAN ? I guess not.


meson.build has the following:
if get_option('tsan')
  if get_option('sanitizers')
error('TSAN is not supported with other sanitizers')
  endif



It would probably help to have more explicit comments & errors if such
a case happens.


Added G_STATIC_ASSERT(!IS_ENABLED(CONFIG_TSAN)) in v2.





Stefan



diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 7b304c79d942..e62ced5d6779 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -124,8 +124,9 @@ void start_switch_fiber_asan(CoroutineAction action, void 
**fake_stack_save,
  {
  #ifdef CONFIG_ASAN
  __sanitizer_start_switch_fiber(
-action == COROUTINE_TERMINATE ? NULL : fake_stack_save,
-bottom, size);
+!IS_ENABLED(CONFIG_COROUTINE_POOL) && action == COROUTINE_TERMINATE ?
+NULL : fake_stack_save,
+bottom, size);



Ok, changing back the commit from Lingfeng when coroutine pools are enabled.


  #endif
  }

@@ -269,10 +270,26 @@ static inline void 
valgrind_stack_deregister(CoroutineUContext *co)
  #endif
  #endif

+#if defined(CONFIG_ASAN) && defined(CONFIG_COROUTINE_POOL)
+static void coroutine_fn terminate(void *opaque)
+{
+CoroutineUContext *to = DO_UPCAST(CoroutineUContext, base, opaque);
+
+__sanitizer_start_switch_fiber(NULL, to->stack, to->stack_size);
+siglongjmp(to->env, COROUTINE_ENTER);
+}


looking at 
https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/asan/TestCases/Linux/swapcontext_annotation.cpp,
that seems correct to me to destroy the fake_stack.

However, not switching back with qemu_coroutine_switch() may create
issues: set_current() (and tsan) not being called appropriately.


Thanks for catching this. Added missing set_current() call and 
G_STATIC_ASSERT(!IS_ENABLED(CONFIG_TSAN)) in v2 as I described earlier.




Should we introduce another action like COROUTINE_DELETE?


I don't think so. CoroutineAction is part of the common interface for 
coroutine backends. The need to switch back to the destroyed coroutine 
is a peculiarity unique to ucontext and better to be contained in 
coroutine-ucontext.c.


Alternatively we can add a bool parameter to qemu_coroutine_switch() to 
tell to destroy the fake stack, but probably it's not that worth to 
share qemu_coroutine_switch() code; while coroutine-ucontext.c already 
have a few places that call start_switch_fiber_asan() or siglongjmp() 
and they are somewhat similar, they are still too diverged to unify.




Re: [PATCH v2] coroutine-ucontext: Save fake stack for pooled coroutine

2024-01-16 Thread Marc-André Lureau
Hi

On Wed, Jan 17, 2024 at 11:26 AM Marc-André Lureau
 wrote:
>
> Hi
>
> On Wed, Jan 17, 2024 at 11:06 AM Akihiko Odaki  
> wrote:
> >
> > Coroutine may be pooled even after COROUTINE_TERMINATE if
> > CONFIG_COROUTINE_POOL is enabled and fake stack should be saved in
> > such a case to keep AddressSanitizerUseAfterReturn working. Even worse,
> > I'm seeing stack corruption without fake stack being saved.
> >
> > Signed-off-by: Akihiko Odaki 
> > ---
> > Changes in v2:
> > - Added missing set_current() (Marc-André Lureau)
> > - Added G_STATIC_ASSERT(!IS_ENABLED(CONFIG_TSAN)) (Marc-André Lureau)
> > - Renamed terminate() to terminate_asan() for clarity and consistency.
> > - Changed terminate_asan() to call start_switch_fiber_asan() for
> >   consistency.
> > - Link to v1: 
> > https://lore.kernel.org/r/20240112-asan-v1-1-e330f0d00...@daynix.com
> > ---
> >  util/coroutine-ucontext.c | 35 ++-
> >  1 file changed, 26 insertions(+), 9 deletions(-)
> >
> > diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
> > index 7b304c79d942..8ef603d081ea 100644
> > --- a/util/coroutine-ucontext.c
> > +++ b/util/coroutine-ucontext.c
> > @@ -119,13 +119,11 @@ void finish_switch_fiber(void *fake_stack_save)
> >
> >  /* always_inline is required to avoid TSan runtime fatal errors. */
> >  static inline __attribute__((always_inline))
> > -void start_switch_fiber_asan(CoroutineAction action, void 
> > **fake_stack_save,
> > +void start_switch_fiber_asan(void **fake_stack_save,
> >   const void *bottom, size_t size)
> >  {
> >  #ifdef CONFIG_ASAN
> > -__sanitizer_start_switch_fiber(
> > -action == COROUTINE_TERMINATE ? NULL : fake_stack_save,
> > -bottom, size);
> > +__sanitizer_start_switch_fiber(fake_stack_save, bottom, size);
> >  #endif
> >  }
> >
> > @@ -165,7 +163,7 @@ static void coroutine_trampoline(int i0, int i1)
> >  if (!sigsetjmp(self->env, 0)) {
> >  CoroutineUContext *leaderp = get_ptr_leader();
> >
> > -start_switch_fiber_asan(COROUTINE_YIELD, _stack_save,
> > +start_switch_fiber_asan(_stack_save,
> >  leaderp->stack, leaderp->stack_size);
> >  start_switch_fiber_tsan(_stack_save, self, true); /* 
> > true=caller */
> >  siglongjmp(*(sigjmp_buf *)co->entry_arg, 1);
> > @@ -226,8 +224,7 @@ Coroutine *qemu_coroutine_new(void)
> >
> >  /* swapcontext() in, siglongjmp() back out */
> >  if (!sigsetjmp(old_env, 0)) {
> > -start_switch_fiber_asan(COROUTINE_YIELD, _stack_save, 
> > co->stack,
> > -co->stack_size);
> > +start_switch_fiber_asan(_stack_save, co->stack, 
> > co->stack_size);
> >  start_switch_fiber_tsan(_stack_save,
> >  co, false); /* false=not caller */
> >
> > @@ -269,10 +266,28 @@ static inline void 
> > valgrind_stack_deregister(CoroutineUContext *co)
> >  #endif
> >  #endif
> >
> > +#if defined(CONFIG_ASAN) && defined(CONFIG_COROUTINE_POOL)
> > +static void coroutine_fn terminate_asan(void *opaque)
> > +{
> > +CoroutineUContext *to = DO_UPCAST(CoroutineUContext, base, opaque);
> > +
> > +set_current(opaque);
> > +start_switch_fiber_asan(NULL, to->stack, to->stack_size);
> > +G_STATIC_ASSERT(!IS_ENABLED(CONFIG_TSAN));
> > +siglongjmp(to->env, COROUTINE_ENTER);
> > +}
> > +#endif
> > +
> >  void qemu_coroutine_delete(Coroutine *co_)
> >  {
> >  CoroutineUContext *co = DO_UPCAST(CoroutineUContext, base, co_);
> >
> > +#if defined(CONFIG_ASAN) && defined(CONFIG_COROUTINE_POOL)
> > +co_->entry_arg = qemu_coroutine_self();
> > +co_->entry = terminate_asan;
> > +qemu_coroutine_switch(co_->entry_arg, co_, COROUTINE_ENTER);
> > +#endif
> > +
> >  #ifdef CONFIG_VALGRIND_H
> >  valgrind_stack_deregister(co);
> >  #endif
> > @@ -305,8 +320,10 @@ qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
> >
> >  ret = sigsetjmp(from->env, 0);
> >  if (ret == 0) {
> > -start_switch_fiber_asan(action, _stack_save, to->stack,
> > -to->stack_size);
> > +start_switch_fiber_asan(IS_ENABLED(CONFIG_COROUTINE_POOL) ||
> > +action != COROUTINE_TERMINATE ?
> > +_stack_save : NULL,
> > +to->stack, to->stack_size);
>
> given that the coroutine is reentered on delete to clear the fake
> stack, can we just pass _stack_save here?
>

Ah, terminate_asan() is only called when the pool is enabled.

Reviewed-by: Marc-André Lureau 




-- 
Marc-André Lureau



Re: [PATCH v2] coroutine-ucontext: Save fake stack for pooled coroutine

2024-01-16 Thread Marc-André Lureau
Hi

On Wed, Jan 17, 2024 at 11:06 AM Akihiko Odaki  wrote:
>
> Coroutine may be pooled even after COROUTINE_TERMINATE if
> CONFIG_COROUTINE_POOL is enabled and fake stack should be saved in
> such a case to keep AddressSanitizerUseAfterReturn working. Even worse,
> I'm seeing stack corruption without fake stack being saved.
>
> Signed-off-by: Akihiko Odaki 
> ---
> Changes in v2:
> - Added missing set_current() (Marc-André Lureau)
> - Added G_STATIC_ASSERT(!IS_ENABLED(CONFIG_TSAN)) (Marc-André Lureau)
> - Renamed terminate() to terminate_asan() for clarity and consistency.
> - Changed terminate_asan() to call start_switch_fiber_asan() for
>   consistency.
> - Link to v1: 
> https://lore.kernel.org/r/20240112-asan-v1-1-e330f0d00...@daynix.com
> ---
>  util/coroutine-ucontext.c | 35 ++-
>  1 file changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
> index 7b304c79d942..8ef603d081ea 100644
> --- a/util/coroutine-ucontext.c
> +++ b/util/coroutine-ucontext.c
> @@ -119,13 +119,11 @@ void finish_switch_fiber(void *fake_stack_save)
>
>  /* always_inline is required to avoid TSan runtime fatal errors. */
>  static inline __attribute__((always_inline))
> -void start_switch_fiber_asan(CoroutineAction action, void **fake_stack_save,
> +void start_switch_fiber_asan(void **fake_stack_save,
>   const void *bottom, size_t size)
>  {
>  #ifdef CONFIG_ASAN
> -__sanitizer_start_switch_fiber(
> -action == COROUTINE_TERMINATE ? NULL : fake_stack_save,
> -bottom, size);
> +__sanitizer_start_switch_fiber(fake_stack_save, bottom, size);
>  #endif
>  }
>
> @@ -165,7 +163,7 @@ static void coroutine_trampoline(int i0, int i1)
>  if (!sigsetjmp(self->env, 0)) {
>  CoroutineUContext *leaderp = get_ptr_leader();
>
> -start_switch_fiber_asan(COROUTINE_YIELD, _stack_save,
> +start_switch_fiber_asan(_stack_save,
>  leaderp->stack, leaderp->stack_size);
>  start_switch_fiber_tsan(_stack_save, self, true); /* 
> true=caller */
>  siglongjmp(*(sigjmp_buf *)co->entry_arg, 1);
> @@ -226,8 +224,7 @@ Coroutine *qemu_coroutine_new(void)
>
>  /* swapcontext() in, siglongjmp() back out */
>  if (!sigsetjmp(old_env, 0)) {
> -start_switch_fiber_asan(COROUTINE_YIELD, _stack_save, co->stack,
> -co->stack_size);
> +start_switch_fiber_asan(_stack_save, co->stack, co->stack_size);
>  start_switch_fiber_tsan(_stack_save,
>  co, false); /* false=not caller */
>
> @@ -269,10 +266,28 @@ static inline void 
> valgrind_stack_deregister(CoroutineUContext *co)
>  #endif
>  #endif
>
> +#if defined(CONFIG_ASAN) && defined(CONFIG_COROUTINE_POOL)
> +static void coroutine_fn terminate_asan(void *opaque)
> +{
> +CoroutineUContext *to = DO_UPCAST(CoroutineUContext, base, opaque);
> +
> +set_current(opaque);
> +start_switch_fiber_asan(NULL, to->stack, to->stack_size);
> +G_STATIC_ASSERT(!IS_ENABLED(CONFIG_TSAN));
> +siglongjmp(to->env, COROUTINE_ENTER);
> +}
> +#endif
> +
>  void qemu_coroutine_delete(Coroutine *co_)
>  {
>  CoroutineUContext *co = DO_UPCAST(CoroutineUContext, base, co_);
>
> +#if defined(CONFIG_ASAN) && defined(CONFIG_COROUTINE_POOL)
> +co_->entry_arg = qemu_coroutine_self();
> +co_->entry = terminate_asan;
> +qemu_coroutine_switch(co_->entry_arg, co_, COROUTINE_ENTER);
> +#endif
> +
>  #ifdef CONFIG_VALGRIND_H
>  valgrind_stack_deregister(co);
>  #endif
> @@ -305,8 +320,10 @@ qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
>
>  ret = sigsetjmp(from->env, 0);
>  if (ret == 0) {
> -start_switch_fiber_asan(action, _stack_save, to->stack,
> -to->stack_size);
> +start_switch_fiber_asan(IS_ENABLED(CONFIG_COROUTINE_POOL) ||
> +action != COROUTINE_TERMINATE ?
> +_stack_save : NULL,
> +to->stack, to->stack_size);

given that the coroutine is reentered on delete to clear the fake
stack, can we just pass _stack_save here?

otherwise, looks ok to me

>  start_switch_fiber_tsan(_stack_save,
>  to, false); /* false=not caller */
>  siglongjmp(to->env, action);
>
> ---
> base-commit: f614acb7450282a119d85d759f27eae190476058
> change-id: 20240112-asan-eb695c769f40
>
> Best regards,
> --
> Akihiko Odaki 
>


-- 
Marc-André Lureau



Re: [PATCH] tests/docker: Add sqlite3 module to openSUSE Leap container

2024-01-16 Thread Thomas Huth

On 17/01/2024 00.09, Fabiano Rosas wrote:

Avocado needs sqlite3:

   Failed to load plugin from module "avocado.plugins.journal":
   ImportError("Module 'sqlite3' is not installed.
   Use: sudo zypper install python311 to install it")

Include the appropriate package in the dockerfile.

 From 'zypper info python311':
   "This package supplies rich command line features provided by
   readline, and sqlite3 support for the interpreter core, thus forming
   a so called "extended" runtime."

Signed-off-by: Fabiano Rosas 
---
  tests/docker/dockerfiles/opensuse-leap.docker | 1 +
  1 file changed, 1 insertion(+)

diff --git a/tests/docker/dockerfiles/opensuse-leap.docker 
b/tests/docker/dockerfiles/opensuse-leap.docker
index dc0e36ce48..cf753383a4 100644
--- a/tests/docker/dockerfiles/opensuse-leap.docker
+++ b/tests/docker/dockerfiles/opensuse-leap.docker
@@ -90,6 +90,7 @@ RUN zypper update -y && \
 pcre-devel-static \
 pipewire-devel \
 pkgconfig \
+   python311 \
 python311-base \
 python311-pip \
 python311-setuptools \


AFAIK tests/docker/dockerfiles/opensuse-leap.docker is auto-generated, so 
this will be lost once somebody else runs lcitool again...


I don't really have a clue, but I guess this has to be fixed in the upstream 
lcitool first ( https://gitlab.com/libvirt/libvirt-ci ), and then we need to 
update our lcitool status in QEMU afterwards. Maybe Daniel can advise for 
the right stteps here...?


 Thomas




Re: Error messages in the "avocado-system-opensuse" QEMU CI job

2024-01-16 Thread Philippe Mathieu-Daudé

On 16/1/24 16:31, Thomas Huth wrote:


  Hi!

I noticed that there are some error messages about a missing python 
module in the "avocado-system-opensuse" QEMU CI job:


  https://gitlab.com/qemu-project/qemu/-/jobs/5911721890#L191

Anybody interested in fixing those?


Fabiano is looking at it:
https://lore.kernel.org/qemu-devel/20240116230924.23053-1-faro...@suse.de/




Re: [PATCH V2 11/11] vfio: allow cpr-reboot migration if suspended

2024-01-16 Thread Peter Xu
On Tue, Jan 16, 2024 at 03:37:33PM -0500, Steven Sistare wrote:
> On 1/15/2024 2:33 AM, Peter Xu wrote:
> > On Fri, Jan 12, 2024 at 07:05:10AM -0800, Steve Sistare wrote:
> >> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
> >> guest drivers' suspend methods flush outstanding requests and re-initialize
> >> the devices, and thus there is no device state to save and restore.  The
> >> user is responsible for suspending the guest before initiating cpr, such as
> >> by issuing guest-suspend-ram to the qemu guest agent.
> >>
> >> Relax the vfio blocker so it does not apply to cpr, and add a notifier that
> >> verifies the guest is suspended.  Skip dirty page tracking, which is N/A 
> >> for
> >> cpr, to avoid ioctl errors.
> >>
> >> Signed-off-by: Steve Sistare 
> >> ---
> >>  hw/vfio/common.c  |  2 +-
> >>  hw/vfio/cpr.c | 20 
> >>  hw/vfio/migration.c   |  2 +-
> >>  include/hw/vfio/vfio-common.h |  1 +
> >>  migration/ram.c   |  9 +
> >>  5 files changed, 28 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index 0b3352f..09af934 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice 
> >> *vbasedev, Error **errp)
> >>  error_setg(_devices_migration_blocker,
> >> "Multiple VFIO devices migration is supported only if all 
> >> of "
> >> "them support P2P migration");
> >> -ret = migrate_add_blocker(_devices_migration_blocker, errp);
> >> +ret = migrate_add_blocker_normal(_devices_migration_blocker, 
> >> errp);
> >>  
> >>  return ret;
> >>  }
> >> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
> >> index bbd1c7a..9f4b1fe 100644
> >> --- a/hw/vfio/cpr.c
> >> +++ b/hw/vfio/cpr.c
> >> @@ -7,13 +7,33 @@
> >>  
> >>  #include "qemu/osdep.h"
> >>  #include "hw/vfio/vfio-common.h"
> >> +#include "migration/misc.h"
> >>  #include "qapi/error.h"
> >> +#include "sysemu/runstate.h"
> >> +
> >> +static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
> >> +MigrationEvent *e, Error **errp)
> >> +{
> >> +if (e->state == MIGRATION_STATUS_SETUP &&
> >> +!runstate_check(RUN_STATE_SUSPENDED)) {
> >> +
> >> +error_setg(errp,
> >> +"VFIO device only supports cpr-reboot for runstate 
> >> suspended");
> >> +
> >> +return -1;
> >> +}
> > 
> > What happens if the guest is suspended during SETUP, but then quickly waked
> > up before CPR migration completes?
> 
> That is a management layer bug -- we told them the VM must be suspended.
> However, I will reject a wakeup request if migration is active and mode is 
> cpr.

Yes it needs to be enforced if it is required by cpr-reboot.  QEMU
hopefully should never rely on mgmt app behavior.

> 
> >> +return 0;
> >> +}
> >>  
> >>  int vfio_cpr_register_container(VFIOContainer *container, Error **errp)
> >>  {
> >> +migration_add_notifier_mode(>cpr_reboot_notifier,
> >> +vfio_cpr_reboot_notifier,
> >> +MIG_MODE_CPR_REBOOT);
> >>  return 0;
> >>  }
> >>  
> >>  void vfio_cpr_unregister_container(VFIOContainer *container)
> >>  {
> >> +migration_remove_notifier(>cpr_reboot_notifier);
> >>  }
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index 534fddf..488905d 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -895,7 +895,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, 
> >> Error *err, Error **errp)
> >>  vbasedev->migration_blocker = error_copy(err);
> >>  error_free(err);
> >>  
> >> -return migrate_add_blocker(>migration_blocker, errp);
> >> +return migrate_add_blocker_normal(>migration_blocker, errp);
> >>  }
> >>  
> >>  /* -- 
> >> */
> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >> index 1add5b7..7a46e24 100644
> >> --- a/include/hw/vfio/vfio-common.h
> >> +++ b/include/hw/vfio/vfio-common.h
> >> @@ -78,6 +78,7 @@ struct VFIOGroup;
> >>  typedef struct VFIOContainer {
> >>  VFIOContainerBase bcontainer;
> >>  int fd; /* /dev/vfio/vfio, empowered by the attached groups */
> >> +NotifierWithReturn cpr_reboot_notifier;
> >>  unsigned iommu_type;
> >>  QLIST_HEAD(, VFIOGroup) group_list;
> >>  } VFIOContainer;
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index 1923366..44ad324 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -2392,8 +2392,8 @@ static void ram_save_cleanup(void *opaque)
> >>  RAMState **rsp = opaque;
> >>  RAMBlock *block;
> >>  
> >> -/* We don't use dirty log with background snapshots */
> >> -if (!migrate_background_snapshot()) {
> >> +/* We don't use dirty log with background snapshots 

Re: [PATCH] tests/docker: Add sqlite3 module to openSUSE Leap container

2024-01-16 Thread Philippe Mathieu-Daudé

Hi Fabiano,

On 17/1/24 00:09, Fabiano Rosas wrote:

Avocado needs sqlite3:

   Failed to load plugin from module "avocado.plugins.journal":
   ImportError("Module 'sqlite3' is not installed.
   Use: sudo zypper install python311 to install it")

Include the appropriate package in the dockerfile.

 From 'zypper info python311':
   "This package supplies rich command line features provided by
   readline, and sqlite3 support for the interpreter core, thus forming
   a so called "extended" runtime."

Signed-off-by: Fabiano Rosas 
---
  tests/docker/dockerfiles/opensuse-leap.docker | 1 +
  1 file changed, 1 insertion(+)

diff --git a/tests/docker/dockerfiles/opensuse-leap.docker 
b/tests/docker/dockerfiles/opensuse-leap.docker
index dc0e36ce48..cf753383a4 100644
--- a/tests/docker/dockerfiles/opensuse-leap.docker
+++ b/tests/docker/dockerfiles/opensuse-leap.docker
@@ -90,6 +90,7 @@ RUN zypper update -y && \
 pcre-devel-static \
 pipewire-devel \
 pkgconfig \
+   python311 \
 python311-base \
 python311-pip \
 python311-setuptools \


See in this file header:

  # THIS FILE WAS AUTO-GENERATED
  #
  #  $ lcitool dockerfile --layers all opensuse-leap-15 qemu
  #
  # https://gitlab.com/libvirt/libvirt-ci

libvirt-ci maintains dependencies required to build QEMU,
in this case since it is a 'testing QEMU' dependency, you
might add it in the generate_dockerfile("opensuse-leap")
call in tests/lcitool/refresh (otherwise raise an issue
in libvirt-ci about it).

Regards,

Phil.



[PATCH v2] coroutine-ucontext: Save fake stack for pooled coroutine

2024-01-16 Thread Akihiko Odaki
Coroutine may be pooled even after COROUTINE_TERMINATE if
CONFIG_COROUTINE_POOL is enabled and fake stack should be saved in
such a case to keep AddressSanitizerUseAfterReturn working. Even worse,
I'm seeing stack corruption without fake stack being saved.

Signed-off-by: Akihiko Odaki 
---
Changes in v2:
- Added missing set_current() (Marc-André Lureau)
- Added G_STATIC_ASSERT(!IS_ENABLED(CONFIG_TSAN)) (Marc-André Lureau)
- Renamed terminate() to terminate_asan() for clarity and consistency.
- Changed terminate_asan() to call start_switch_fiber_asan() for
  consistency.
- Link to v1: 
https://lore.kernel.org/r/20240112-asan-v1-1-e330f0d00...@daynix.com
---
 util/coroutine-ucontext.c | 35 ++-
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 7b304c79d942..8ef603d081ea 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -119,13 +119,11 @@ void finish_switch_fiber(void *fake_stack_save)
 
 /* always_inline is required to avoid TSan runtime fatal errors. */
 static inline __attribute__((always_inline))
-void start_switch_fiber_asan(CoroutineAction action, void **fake_stack_save,
+void start_switch_fiber_asan(void **fake_stack_save,
  const void *bottom, size_t size)
 {
 #ifdef CONFIG_ASAN
-__sanitizer_start_switch_fiber(
-action == COROUTINE_TERMINATE ? NULL : fake_stack_save,
-bottom, size);
+__sanitizer_start_switch_fiber(fake_stack_save, bottom, size);
 #endif
 }
 
@@ -165,7 +163,7 @@ static void coroutine_trampoline(int i0, int i1)
 if (!sigsetjmp(self->env, 0)) {
 CoroutineUContext *leaderp = get_ptr_leader();
 
-start_switch_fiber_asan(COROUTINE_YIELD, _stack_save,
+start_switch_fiber_asan(_stack_save,
 leaderp->stack, leaderp->stack_size);
 start_switch_fiber_tsan(_stack_save, self, true); /* true=caller 
*/
 siglongjmp(*(sigjmp_buf *)co->entry_arg, 1);
@@ -226,8 +224,7 @@ Coroutine *qemu_coroutine_new(void)
 
 /* swapcontext() in, siglongjmp() back out */
 if (!sigsetjmp(old_env, 0)) {
-start_switch_fiber_asan(COROUTINE_YIELD, _stack_save, co->stack,
-co->stack_size);
+start_switch_fiber_asan(_stack_save, co->stack, co->stack_size);
 start_switch_fiber_tsan(_stack_save,
 co, false); /* false=not caller */
 
@@ -269,10 +266,28 @@ static inline void 
valgrind_stack_deregister(CoroutineUContext *co)
 #endif
 #endif
 
+#if defined(CONFIG_ASAN) && defined(CONFIG_COROUTINE_POOL)
+static void coroutine_fn terminate_asan(void *opaque)
+{
+CoroutineUContext *to = DO_UPCAST(CoroutineUContext, base, opaque);
+
+set_current(opaque);
+start_switch_fiber_asan(NULL, to->stack, to->stack_size);
+G_STATIC_ASSERT(!IS_ENABLED(CONFIG_TSAN));
+siglongjmp(to->env, COROUTINE_ENTER);
+}
+#endif
+
 void qemu_coroutine_delete(Coroutine *co_)
 {
 CoroutineUContext *co = DO_UPCAST(CoroutineUContext, base, co_);
 
+#if defined(CONFIG_ASAN) && defined(CONFIG_COROUTINE_POOL)
+co_->entry_arg = qemu_coroutine_self();
+co_->entry = terminate_asan;
+qemu_coroutine_switch(co_->entry_arg, co_, COROUTINE_ENTER);
+#endif
+
 #ifdef CONFIG_VALGRIND_H
 valgrind_stack_deregister(co);
 #endif
@@ -305,8 +320,10 @@ qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
 
 ret = sigsetjmp(from->env, 0);
 if (ret == 0) {
-start_switch_fiber_asan(action, _stack_save, to->stack,
-to->stack_size);
+start_switch_fiber_asan(IS_ENABLED(CONFIG_COROUTINE_POOL) ||
+action != COROUTINE_TERMINATE ?
+_stack_save : NULL,
+to->stack, to->stack_size);
 start_switch_fiber_tsan(_stack_save,
 to, false); /* false=not caller */
 siglongjmp(to->env, action);

---
base-commit: f614acb7450282a119d85d759f27eae190476058
change-id: 20240112-asan-eb695c769f40

Best regards,
-- 
Akihiko Odaki 




Re: [PATCH 2/2] i386/cpu: Mask with XCR0/XSS mask for FEAT_XSAVE_XCR0_HI and FEAT_XSAVE_XSS_HI leafs

2024-01-16 Thread Yang, Weijiang

On 1/15/2024 5:13 PM, Li, Xiaoyao wrote:

The value of FEAT_XSAVE_XCR0_HI leaf and FEAT_XSAVE_XSS_HI leaf also
need to be masked by XCR0 and XSS mask respectively, to make it
logically correct.

Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based features")
Signed-off-by: Xiaoyao Li 
---
  target/i386/cpu.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index b445e2957c4f..a5c08944a483 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6946,9 +6946,9 @@ static void x86_cpu_enable_xsave_components(X86CPU *cpu)
  }
  
  env->features[FEAT_XSAVE_XCR0_LO] = mask & CPUID_XSTATE_XCR0_MASK;

-env->features[FEAT_XSAVE_XCR0_HI] = mask >> 32;
+env->features[FEAT_XSAVE_XCR0_HI] = (mask & CPUID_XSTATE_XCR0_MASK) >> 32;
  env->features[FEAT_XSAVE_XSS_LO] = mask & CPUID_XSTATE_XSS_MASK;
-env->features[FEAT_XSAVE_XSS_HI] = mask >> 32;
+env->features[FEAT_XSAVE_XSS_HI] = (mask & CPUID_XSTATE_XSS_MASK) >> 32;
  }


Thanks for fixing this!
Reviewed-by: Yang Weijiang 

  
  /* Steps involved on loading and filtering CPUID data





Re: [PATCH 1/2] i386/cpu: Clear FEAT_XSAVE_XSS_LO/HI leafs when CPUID_EXT_XSAVE is not available

2024-01-16 Thread Yang, Weijiang

On 1/15/2024 5:13 PM, Li, Xiaoyao wrote:

Leaf FEAT_XSAVE_XSS_LO and FEAT_XSAVE_XSS_HI also need to be cleared
when CPUID_EXT_XSAVE is not set.

Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based features")
Signed-off-by: Xiaoyao Li 
---
  target/i386/cpu.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2524881ce245..b445e2957c4f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6926,6 +6926,8 @@ static void x86_cpu_enable_xsave_components(X86CPU *cpu)
  if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
  env->features[FEAT_XSAVE_XCR0_LO] = 0;
  env->features[FEAT_XSAVE_XCR0_HI] = 0;
+env->features[FEAT_XSAVE_XSS_LO] = 0;
+env->features[FEAT_XSAVE_XSS_HI] = 0;
  return;
  }


Thanks for fixing this!
Reviewed-by: Yang Weijiang 

  





[PATCH v2] acpi/tests/avocado/bits: wait for 200 seconds for SHUTDOWN event from bits VM

2024-01-16 Thread Ani Sinha
By default, the timeout to receive any specified event from the QEMU VM is 60
seconds set by the python avocado test framework. Please see event_wait() and
events_wait() in python/qemu/machine/machine.py. If the matching event is not
triggered within that interval, an asyncio.TimeoutError is generated. Since the
timeout for the bits avocado test is 200 secs, we need to make event_wait()
timeout of the same value as well so that an early timeout is not triggered by
the avocado framework.

CC: peter.mayd...@linaro.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2077
Signed-off-by: Ani Sinha 
---
 tests/avocado/acpi-bits.py | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

changelog:
v2: cosmetic comment updates in code and patch description.

diff --git a/tests/avocado/acpi-bits.py b/tests/avocado/acpi-bits.py
index 68b9e98d4e..efe4f52ee0 100644
--- a/tests/avocado/acpi-bits.py
+++ b/tests/avocado/acpi-bits.py
@@ -54,6 +54,8 @@
 deps = ["xorriso", "mformat"] # dependent tools needed in the test setup/box.
 supported_platforms = ['x86_64'] # supported test platforms.
 
+# default timeout of 120 secs is sometimes not enough for bits test.
+BITS_TIMEOUT = 200
 
 def which(tool):
 """ looks up the full path for @tool, returns None if not found
@@ -133,7 +135,7 @@ class AcpiBitsTest(QemuBaseTest): #pylint: 
disable=too-many-instance-attributes
 
 """
 # in slower systems the test can take as long as 3 minutes to complete.
-timeout = 200
+timeout = BITS_TIMEOUT
 
 def __init__(self, *args, **kwargs):
 super().__init__(*args, **kwargs)
@@ -400,7 +402,8 @@ def test_acpi_smbios_bits(self):
 
 # biosbits has been configured to run all the specified test suites
 # in batch mode and then automatically initiate a vm shutdown.
-# Rely on avocado's unit test timeout.
-self._vm.event_wait('SHUTDOWN')
+# Set timeout to BITS_TIMEOUT for SHUTDOWN event from bits VM at par
+# with the avocado test timeout.
+self._vm.event_wait('SHUTDOWN', timeout=BITS_TIMEOUT)
 self._vm.wait(timeout=None)
 self.parse_log()
-- 
2.39.2




Re: [PATCH] qdev: not add devices to bus in reverse order

2024-01-16 Thread Kai

On 1/9/24 17:52, kai.k...@windriver.com wrote:

From: Kai Kang 

When this section of source codes were added via commit:

* 02e2da45c4 Add common BusState

it added devices to bus with LIST_INSERT_HEAD() which operated on the
single direction list. It didn't have something like LIST_INSERT_TAIL()
at that time and kept that way when turned to QTAILQ.

Then it causes the fist device in qemu command line inserted at the end
of the bus child link list. And when realize them, the first device will
be the last one to be realized.

Replace QTAILQ_INSERT_HEAD_RCU() with QTAILQ_INSERT_TAIL_RCU() to make
sure that devices are added to bus with the sequence in the command
line.


Ping.




Signed-off-by: Kai Kang 
---
  hw/core/qdev.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 43d863b0c5..5e2ff43715 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -89,7 +89,7 @@ static void bus_add_child(BusState *bus, DeviceState *child)
  kid->child = child;
  object_ref(OBJECT(kid->child));
  
-QTAILQ_INSERT_HEAD_RCU(>children, kid, sibling);

+QTAILQ_INSERT_TAIL_RCU(>children, kid, sibling);
  
  /* This transfers ownership of kid->child to the property.  */

  snprintf(name, sizeof(name), "child[%d]", kid->index);



--
Kai Kang
Wind River Linux




Re: [PATCH v3 15/38] target/s390x: Improve general case of disas_jcc

2024-01-16 Thread Richard Henderson

On 1/17/24 09:19, Philippe Mathieu-Daudé wrote:

+    case 0x4 | 0x1: /* cc == 1 || cc == 3 => (cc & 1) != 0 */
+    cond = TCG_COND_TSTNE;
+    c->u.s32.b = tcg_constant_i32(1);


Don't we need to AND?

   c->u.s32.a = tcg_temp_new_i32();
   tcg_gen_andi_i32(c->u.s32.a, cc_op, 1);


No, that's the TSTNE cond there.


r~



Re: [PATCH 3/3] s390x/pci: drive ISM reset from subsystem reset

2024-01-16 Thread Eric Farman
On Tue, 2024-01-16 at 17:31 -0500, Matthew Rosato wrote:
> ISM devices are sensitive to manipulation of the IOMMU, so the ISM
> device
> needs to be reset before the vfio-pci device is reset (triggering a
> full
> UNMAP).  In order to ensure this occurs, trigger ISM device resets
> from
> subsystem_reset before triggering the PCI bus reset (which will also
> trigger vfio-pci reset).  This only needs to be done for ISM devices
> which were enabled for use by the guest.
> Further, ensure that AIF is disabled as part of the reset event.
> 
> Fixes: ef1535901a ("s390x: do a subsystem reset before the unprotect
> on reboot")
> Fixes: 03451953c7 ("s390x/pci: reset ISM passthrough devices on
> shutdown and system reset")
> Reported-by: Cédric Le Goater 
> Signed-off-by: Matthew Rosato 
> ---
>  hw/s390x/s390-pci-bus.c | 26 +-
>  hw/s390x/s390-virtio-ccw.c  |  2 ++
>  include/hw/s390x/s390-pci-bus.h |  1 +
>  3 files changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 347580ebac..3e57d5faca 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -151,20 +151,12 @@ static void s390_pci_shutdown_notifier(Notifier
> *n, void *opaque)
>  pci_device_reset(pbdev->pdev);
>  }
>  
> -static void s390_pci_reset_cb(void *opaque)
> -{
> -    S390PCIBusDevice *pbdev = opaque;
> -
> -    pci_device_reset(pbdev->pdev);
> -}
> -
>  static void s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
>  {
>  HotplugHandler *hotplug_ctrl;
>  
>  if (pbdev->pft == ZPCI_PFT_ISM) {
>  notifier_remove(>shutdown_notifier);
> -    qemu_unregister_reset(s390_pci_reset_cb, pbdev);
>  }
>  
>  /* Unplug the PCI device */
> @@ -1132,7 +1124,6 @@ static void s390_pcihost_plug(HotplugHandler
> *hotplug_dev, DeviceState *dev,
>  if (pbdev->pft == ZPCI_PFT_ISM) {
>  pbdev->shutdown_notifier.notify =
> s390_pci_shutdown_notifier;
>  qemu_register_shutdown_notifier(
> >shutdown_notifier);
> -    qemu_register_reset(s390_pci_reset_cb, pbdev);
>  }
>  } else {
>  pbdev->fh |= FH_SHM_EMUL;
> @@ -1279,6 +1270,23 @@ static void s390_pci_enumerate_bridge(PCIBus
> *bus, PCIDevice *pdev,
>  pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, s->bus_no,
> 1);
>  }
>  
> +void s390_pci_ism_reset(void)
> +{
> +    S390pciState *s = s390_get_phb();
> +
> +    S390PCIBusDevice *pbdev, *next;
> +
> +    /* Trigger reset event for each passthrough ISM device currently
> in-use */
> +    QTAILQ_FOREACH_SAFE(pbdev, >zpci_devs, link, next) {
> +    if (pbdev->interp && pbdev->pft == ZPCI_PFT_ISM &&
> +    pbdev->fh & FH_MASK_ENABLE) {
> +    s390_pci_kvm_aif_disable(pbdev);
> +
> +    pci_device_reset(pbdev->pdev);

Do we care about the loss of a reset for ISM devices in a
!interpretation case? (I seem to think such a configuration is not
possible today, and so we don't care, but could use a reminder.)

> +    }
> +    }
> +}
> +
>  static void s390_pcihost_reset(DeviceState *dev)
>  {
>  S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 1169e20b94..4de04f7e9f 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -118,6 +118,8 @@ static void subsystem_reset(void)
>  DeviceState *dev;
>  int i;
>  
> +    s390_pci_ism_reset();
> +
>  for (i = 0; i < ARRAY_SIZE(reset_dev_types); i++) {
>  dev = DEVICE(object_resolve_path_type("",
> reset_dev_types[i], NULL));
>  if (dev) {
> diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-
> pci-bus.h
> index 435e788867..2c43ea123f 100644
> --- a/include/hw/s390x/s390-pci-bus.h
> +++ b/include/hw/s390x/s390-pci-bus.h
> @@ -401,5 +401,6 @@ S390PCIBusDevice
> *s390_pci_find_dev_by_target(S390pciState *s,
>    const char *target);
>  S390PCIBusDevice *s390_pci_find_next_avail_dev(S390pciState *s,
>     S390PCIBusDevice
> *pbdev);
> +void s390_pci_ism_reset(void);
>  
>  #endif




Re: [PATCH V2 03/11] migration: convert to NotifierWithReturn

2024-01-16 Thread Peter Xu
On Tue, Jan 16, 2024 at 03:35:53PM -0500, Steven Sistare wrote:
> On 1/15/2024 1:44 AM, Peter Xu wrote:
> > On Fri, Jan 12, 2024 at 07:05:02AM -0800, Steve Sistare wrote:
> >> Change all migration notifiers to type NotifierWithReturn, so notifiers
> >> can return an error status in a future patch.  For now, pass NULL for the
> >> notifier error parameter, and do not check the return value.
> >>
> >> Signed-off-by: Steve Sistare 
> >> ---
> >>  hw/net/virtio-net.c|  4 +++-
> >>  hw/vfio/migration.c|  4 +++-
> >>  include/hw/vfio/vfio-common.h  |  2 +-
> >>  include/hw/virtio/virtio-net.h |  2 +-
> >>  include/migration/misc.h   |  6 +++---
> >>  migration/migration.c  | 16 
> >>  net/vhost-vdpa.c   |  6 --
> >>  ui/spice-core.c|  8 +---
> >>  8 files changed, 28 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> index 7a2846f..9570353 100644
> >> --- a/hw/net/virtio-net.c
> >> +++ b/hw/net/virtio-net.c
> >> @@ -3529,11 +3529,13 @@ static void 
> >> virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s)
> >>  }
> >>  }
> >>  
> >> -static void virtio_net_migration_state_notifier(Notifier *notifier, void 
> >> *data)
> >> +static int virtio_net_migration_state_notifier(NotifierWithReturn 
> >> *notifier,
> >> +   void *data, Error **errp)
> >>  {
> >>  MigrationState *s = data;
> >>  VirtIONet *n = container_of(notifier, VirtIONet, migration_state);
> >>  virtio_net_handle_migration_primary(n, s);
> >> +return 0;
> >>  }
> > 
> > I should have mentioned this earlier.. we have a trend recently to modify
> > retval to boolean when Error** existed, e.g.:
> > 
> > https://lore.kernel.org/all/20231017202633.296756-5-pet...@redhat.com/
> > 
> > Let's start using boolean too here?  Previous patches may need touch-ups
> > too for this.
> > 
> > Other than that it all looks good here.  Thanks,
> 
> Boolean makes sense when there is only one way to recover from failure.
> However, when the notifier may fail in different ways, and recovery differs
> for each, then the function should return an int errno.  NotifierWithReturn
> could have future uses that need multiple return values and multiple recovery 
> paths above the notifier_with_return_list_notify level, so IMO the function 
> should continue to return int for generality.

Ah ok.  Please then add a comment either in the commit message or code for
that.  Thanks.

-- 
Peter Xu




Re: [PATCH 2/3] s390x/pci: refresh fh before disabling aif

2024-01-16 Thread Eric Farman
On Tue, 2024-01-16 at 17:31 -0500, Matthew Rosato wrote:
> Typically we refresh the host fh during CLP enable, however it's
> possible
> that the device goes through multiple reset events before the guest
> performs another CLP enable.  Let's handle this for now by refreshing
> the
> host handle from vfio before disabling aif.
> 
> Fixes: 03451953c7 ("s390x/pci: reset ISM passthrough devices on
> shutdown and system reset")
> Reported-by: Cédric Le Goater 
> Signed-off-by: Matthew Rosato 
> ---
>  hw/s390x/s390-pci-kvm.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)

Reviewed-by: Eric Farman 



Re: [PATCH 1/3] s390x/pci: avoid double enable/disable of aif

2024-01-16 Thread Eric Farman
On Tue, 2024-01-16 at 17:31 -0500, Matthew Rosato wrote:
> Use a flag to keep track of whether AIF is currently enabled.  This
> can be
> used to avoid enabling/disabling AIF multiple times as well as to
> determine
> whether or not it should be disabled during reset processing.
> 
> Fixes: d0bc7091c2 ("s390x/pci: enable adapter event notification for
> interpreted devices")
> Reported-by: Cédric Le Goater 
> Signed-off-by: Matthew Rosato 
> ---
>  hw/s390x/s390-pci-kvm.c | 25 +++--
>  include/hw/s390x/s390-pci-bus.h |  1 +
>  2 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-kvm.c b/hw/s390x/s390-pci-kvm.c
> index ff41e4106d..f7e10cfa72 100644
> --- a/hw/s390x/s390-pci-kvm.c
> +++ b/hw/s390x/s390-pci-kvm.c
> @@ -27,6 +27,7 @@ bool s390_pci_kvm_interp_allowed(void)
>  
>  int s390_pci_kvm_aif_enable(S390PCIBusDevice *pbdev, ZpciFib *fib,
> bool assist)
>  {
> +    int rc;
>  struct kvm_s390_zpci_op args = {
>  .fh = pbdev->fh,
>  .op = KVM_S390_ZPCIOP_REG_AEN,
> @@ -38,15 +39,35 @@ int s390_pci_kvm_aif_enable(S390PCIBusDevice
> *pbdev, ZpciFib *fib, bool assist)
>  .u.reg_aen.flags = (assist) ? 0 :
> KVM_S390_ZPCIOP_REGAEN_HOST
>  };
>  
> -    return kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, );
> +    if (pbdev->aif) {
> +    return -EINVAL;
> +    }
> +
> +    rc = kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, );
> +    if (rc == 0) {
> +    pbdev->aif = true;
> +    }
> +
> +    return rc;
>  }
>  
>  int s390_pci_kvm_aif_disable(S390PCIBusDevice *pbdev)
>  {
> +    int rc;
> +
>  struct kvm_s390_zpci_op args = {
>  .fh = pbdev->fh,
>  .op = KVM_S390_ZPCIOP_DEREG_AEN
>  };
>  
> -    return kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, );
> +    if (!pbdev->aif) {
> +    return -EINVAL;
> +    }
> +
> +    rc = kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, );
> +    if (rc == 0) {
> +    pbev->aif = false;

s/pbev/pbdev/

You fix this in patch 2. :)

With that fixed:

Reviewed-by: Eric Farman 

> +    }
> +
> +    return rc;
>  }
> diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-
> pci-bus.h
> index b1bdbeaeb5..435e788867 100644
> --- a/include/hw/s390x/s390-pci-bus.h
> +++ b/include/hw/s390x/s390-pci-bus.h
> @@ -361,6 +361,7 @@ struct S390PCIBusDevice {
>  bool unplug_requested;
>  bool interp;
>  bool forwarding_assist;
> +    bool aif;
>  QTAILQ_ENTRY(S390PCIBusDevice) link;
>  };
>  




Re: [PATCH 1/2] linux-user/riscv: vdso: fix call frame info in __vdso_rt_sigreturn

2024-01-16 Thread Richard Henderson

On 1/17/24 10:52, Vineet Gupta wrote:

Ok it seems the issue is really subtle.

With 8.2 trunk, the NOP needed before signal trampoline seems to be be
factored into the unwind info for sigrestorer.

 003c 0098  CIE
   Version:   3
   Augmentation:  "zRS"
   Code alignment factor: 1
   Data alignment factor: -4
   Return address column: 64
   Augmentation data: 1b
   DW_CFA_def_cfa: r2 (sp) ofs 832
   DW_CFA_offset_extended: r64 at cfa-528
   DW_CFA_offset: r1 (ra) at cfa-520
   DW_CFA_offset: r2 (sp) at cfa-512
 ...
   DW_CFA_offset: r63 (ft11) at cfa-24
   DW_CFA_nop
   DW_CFA_nop

 00d8 0010 00a0 FDE cie=003c
 pc=066c..0678
  
 ^^^    <--- NOP included

   DW_CFA_nop
   DW_CFA_nop
   DW_CFA_nop

 0664 <__vdso_flush_icache>:
  664:    0513      li    a0,0
  668:    8067      ret
  66c:    0013      nop <--- this NOP

 0670 <__vdso_rt_sigreturn>:
  670:    08b00893      li    a7,139
  674:    0073      ecall


This is due to the .cfi_startproc bracketing. If we move the nop out of
the .cfi_{start,end}proc, things start to work as well.

 diff --git a/linux-user/riscv/vdso.S b/linux-user/riscv/vdso.S
 index 4b4e34aeea51..8c9f1038cb8c 100644
 --- a/linux-user/riscv/vdso.S
 +++ b/linux-user/riscv/vdso.S
 @@ -92,6 +92,8 @@ endf __vdso_flush_icache
  
     .cfi_endproc
  
 +   nop

 +
  /*
   * Start the unwind info at least one instruction before the signal
   * trampoline, because the unwinder will assume we are returning
 @@ -178,8 +180,6 @@ endf __vdso_flush_icache
     .cfi_offset 62, B_FR + 30 * sizeof_freg
     .cfi_offset 63, B_FR + 31 * sizeof_freg /* f31 */
  
 -   nop

 -
  __vdso_rt_sigreturn:
     raw_syscall __NR_rt_sigreturn
  endf __vdso_rt_sigreturn


This changes the cfi info slightly as follows:

00d8 0010 00a0 FDE cie=003c
pc=0670..0678  <-- excludes nop
   DW_CFA_nop
   DW_CFA_nop
   DW_CFA_nop


0664 <__vdso_flush_icache>:
  664:    0513      li    a0,0
  668:    8067      ret
  66c:    0013      nop

0670 <__vdso_rt_sigreturn>:
  670:    08b00893      li    a7,139
  674:    0073      ecall

I concur this is still not 100% explanation of why things are going off,
but I have exact same nop quirk for glibc ARC sigrestorer.
Would an updated patch along those lines be more palatable.


No.

The explanation is right there in the block comment: "Start the unwind info at least one 
instruction before...".  The unwind info is taken from that nop insn.


By moving the nop outside the unwind info, you remove the effect of the unwind info, as 
the nop is now outside of any unwind blocks.  It is the same as removing all of the unwind 
info entirely, which results in the (current) libgcc fallback information being used.



r~



Re: [PATCH 1/2] linux-user/riscv: vdso: fix call frame info in __vdso_rt_sigreturn

2024-01-16 Thread Vineet Gupta



On 1/15/24 15:18, Richard Henderson wrote:
> On 1/16/24 10:15, Vineet Gupta wrote:
>> When testing gcc testsuite against QEMU v8.2 we found some additional
>> failures vs. v8.1.2.
>>
>> | FAIL: gcc.dg/cleanup-10.c execution test
>> | FAIL: gcc.dg/cleanup-11.c execution test
>> | FAIL: gcc.dg/cleanup-8.c execution test
>> | FAIL: gcc.dg/cleanup-9.c execution test
>>
>> All of these tests involve unwinding off signal stack and v8.2 did
>> introduce a vdso with sigreturn trampoline and associated unwinding
>> info. It seems that info is not correct and making it similar to
>> to one in the linux kernel fixes the above failures.
> So.. you didn't actually determine what might be off in the unwind info?

Not yet. I just tried what kernel had and that worked.

>
>> +.cfi_startproc
>> +.cfi_signal_frame
>> +   raw_syscall __NR_rt_sigreturn
>> +   .cfi_endproc
> No, this is wrong.  It indicates that the unwind info is present and trivial.

Ok it seems the issue is really subtle.

With 8.2 trunk, the NOP needed before signal trampoline seems to be be
factored into the unwind info for sigrestorer.

003c 0098  CIE
  Version:   3
  Augmentation:  "zRS"
  Code alignment factor: 1
  Data alignment factor: -4
  Return address column: 64
  Augmentation data: 1b
  DW_CFA_def_cfa: r2 (sp) ofs 832
  DW_CFA_offset_extended: r64 at cfa-528
  DW_CFA_offset: r1 (ra) at cfa-520
  DW_CFA_offset: r2 (sp) at cfa-512
...
  DW_CFA_offset: r63 (ft11) at cfa-24
  DW_CFA_nop
  DW_CFA_nop

00d8 0010 00a0 FDE cie=003c
pc=066c..0678

 
^^^    <--- NOP included
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop

0664 <__vdso_flush_icache>:
 664:    0513      li    a0,0
 668:    8067      ret
 66c:    0013      nop <--- this NOP

0670 <__vdso_rt_sigreturn>:
 670:    08b00893      li    a7,139
 674:    0073      ecall


This is due to the .cfi_startproc bracketing. If we move the nop out of
the .cfi_{start,end}proc, things start to work as well.

diff --git a/linux-user/riscv/vdso.S b/linux-user/riscv/vdso.S
index 4b4e34aeea51..8c9f1038cb8c 100644
--- a/linux-user/riscv/vdso.S
+++ b/linux-user/riscv/vdso.S
@@ -92,6 +92,8 @@ endf __vdso_flush_icache
 
    .cfi_endproc
 
+   nop
+
 /*
  * Start the unwind info at least one instruction before the signal
  * trampoline, because the unwinder will assume we are returning
@@ -178,8 +180,6 @@ endf __vdso_flush_icache
    .cfi_offset 62, B_FR + 30 * sizeof_freg
    .cfi_offset 63, B_FR + 31 * sizeof_freg /* f31 */
 
-   nop
-
 __vdso_rt_sigreturn:
    raw_syscall __NR_rt_sigreturn
 endf __vdso_rt_sigreturn


This changes the cfi info slightly as follows:

00d8 0010 00a0 FDE cie=003c
pc=0670..0678  <-- excludes nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop


0664 <__vdso_flush_icache>:
 664:    0513      li    a0,0
 668:    8067      ret
 66c:    0013      nop

0670 <__vdso_rt_sigreturn>:
 670:    08b00893      li    a7,139
 674:    0073      ecall

I concur this is still not 100% explanation of why things are going off,
but I have exact same nop quirk for glibc ARC sigrestorer.
Would an updated patch along those lines be more palatable.

Thx,
-Vineet



Issue with PC updates.

2024-01-16 Thread Sid Manning
Hi Taylor,

I ran into an issue when a packet, not executed out of ram 
(get_page_addr_code_hostp returns -1, see translate-all.c) contains a fault.
This packet is an example:
{
p0 = cmp.eq(r6,#0x6)
if (p0.new) jump:t pass
memw(##0xf200) = r6
}

The above packet should always jump to "pass" since r6 is set to #0x6, but if 
the store faults, the jump is discarded.  This happens because 
do_raise_exception's call to cpu_loop_exit_restore is not able to find a TB to 
restore the PC to.  When an instruction is not associated with a physical RAM 
page translate-all will create a "one-shot" TB so when cpu_restore_state looks 
for the TB by calling tcg_tb_loopup none is found.  That keeps the PC from 
being restored.

The change attached restores some of the code from commit 
613653e500c0d482784f09aaa71f1297565b6815 / Hexagon (target/hexagon) Remove 
next_PC from runtime state.

There are two attachments, the qemu update also includes an update to 
translate-all.c that forces this problem to occur.  The second is the testcase 
which is built using vanilla llvm toolchain configured for hexagon.

Thanks,


pc-testcase.tar.gz
Description: pc-testcase.tar.gz


0001-Incorrect-PC-update-for-many-miss-packets.patch
Description: 0001-Incorrect-PC-update-for-many-miss-packets.patch


[PATCH] tests/docker: Add sqlite3 module to openSUSE Leap container

2024-01-16 Thread Fabiano Rosas
Avocado needs sqlite3:

  Failed to load plugin from module "avocado.plugins.journal":
  ImportError("Module 'sqlite3' is not installed.
  Use: sudo zypper install python311 to install it")

Include the appropriate package in the dockerfile.

>From 'zypper info python311':
  "This package supplies rich command line features provided by
  readline, and sqlite3 support for the interpreter core, thus forming
  a so called "extended" runtime."

Signed-off-by: Fabiano Rosas 
---
 tests/docker/dockerfiles/opensuse-leap.docker | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/docker/dockerfiles/opensuse-leap.docker 
b/tests/docker/dockerfiles/opensuse-leap.docker
index dc0e36ce48..cf753383a4 100644
--- a/tests/docker/dockerfiles/opensuse-leap.docker
+++ b/tests/docker/dockerfiles/opensuse-leap.docker
@@ -90,6 +90,7 @@ RUN zypper update -y && \
pcre-devel-static \
pipewire-devel \
pkgconfig \
+   python311 \
python311-base \
python311-pip \
python311-setuptools \
-- 
2.35.3




[PATCH 2/3] s390x/pci: refresh fh before disabling aif

2024-01-16 Thread Matthew Rosato
Typically we refresh the host fh during CLP enable, however it's possible
that the device goes through multiple reset events before the guest
performs another CLP enable.  Let's handle this for now by refreshing the
host handle from vfio before disabling aif.

Fixes: 03451953c7 ("s390x/pci: reset ISM passthrough devices on shutdown and 
system reset")
Reported-by: Cédric Le Goater 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-kvm.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-kvm.c b/hw/s390x/s390-pci-kvm.c
index f7e10cfa72..9eef4fc3ec 100644
--- a/hw/s390x/s390-pci-kvm.c
+++ b/hw/s390x/s390-pci-kvm.c
@@ -18,6 +18,7 @@
 #include "hw/s390x/s390-pci-bus.h"
 #include "hw/s390x/s390-pci-kvm.h"
 #include "hw/s390x/s390-pci-inst.h"
+#include "hw/s390x/s390-pci-vfio.h"
 #include "cpu_models.h"
 
 bool s390_pci_kvm_interp_allowed(void)
@@ -64,9 +65,17 @@ int s390_pci_kvm_aif_disable(S390PCIBusDevice *pbdev)
 return -EINVAL;
 }
 
+/*
+ * The device may have already been reset but we still want to relinquish
+ * the guest ISC, so always be sure to use an up-to-date host fh.
+ */
+if (!s390_pci_get_host_fh(pbdev, )) {
+return -EPERM;
+}
+
 rc = kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, );
 if (rc == 0) {
-pbev->aif = false;
+pbdev->aif = false;
 }
 
 return rc;
-- 
2.43.0




[PATCH 0/3] s390x/pci: fix ISM reset

2024-01-16 Thread Matthew Rosato
Commit ef1535901a0 (re-)introduced an issue where passthrough ISM devices
on s390x would enter an error state after reboot.  This was previously fixed
by 03451953c79e, using device reset callbacks, however the change in
ef1535901a0 effectively triggers a cold reset of the pci bus before the
device reset callbacks are triggered.

To resolve this, this series proposes to remove the use of the reset callback
for ISM cleanup and instead trigger ISM reset from subsystem_reset before 
triggering bus resets.  This has to happen before the bus resets because the
reset of s390-pcihost will trigger reset of the PCI bus followed by the
s390-pci bus, and the former will trigger vfio-pci reset / the aperture-wide
unmap that ISM gets upset about.
 
  /s390-pcihost (s390-pcihost)
/pci.0 (PCI)
/s390-pcibus.0 (s390-pcibus)

While fixing this, it was also noted that kernel warnings could be seen that
indicate a guest ISC reference count error.  That's because in some reset
cases we were not bothering to disable AIF, but would again re-enable it after
the reset (causing the reference count to grow erroneously).  This was a base
issue that went unnoticed because the kernel previously did not detect and
issue a warning for this scenario. 

Matthew Rosato (3):
  s390x/pci: avoid double enable/disable of aif
  s390x/pci: refresh fh before disabling aif
  s390x/pci: drive ISM reset from subsystem reset

 hw/s390x/s390-pci-bus.c | 26 -
 hw/s390x/s390-pci-kvm.c | 34 +++--
 hw/s390x/s390-virtio-ccw.c  |  2 ++
 include/hw/s390x/s390-pci-bus.h |  2 ++
 4 files changed, 53 insertions(+), 11 deletions(-)

-- 
2.43.0




[PATCH 1/3] s390x/pci: avoid double enable/disable of aif

2024-01-16 Thread Matthew Rosato
Use a flag to keep track of whether AIF is currently enabled.  This can be
used to avoid enabling/disabling AIF multiple times as well as to determine
whether or not it should be disabled during reset processing.

Fixes: d0bc7091c2 ("s390x/pci: enable adapter event notification for 
interpreted devices")
Reported-by: Cédric Le Goater 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-kvm.c | 25 +++--
 include/hw/s390x/s390-pci-bus.h |  1 +
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/s390-pci-kvm.c b/hw/s390x/s390-pci-kvm.c
index ff41e4106d..f7e10cfa72 100644
--- a/hw/s390x/s390-pci-kvm.c
+++ b/hw/s390x/s390-pci-kvm.c
@@ -27,6 +27,7 @@ bool s390_pci_kvm_interp_allowed(void)
 
 int s390_pci_kvm_aif_enable(S390PCIBusDevice *pbdev, ZpciFib *fib, bool assist)
 {
+int rc;
 struct kvm_s390_zpci_op args = {
 .fh = pbdev->fh,
 .op = KVM_S390_ZPCIOP_REG_AEN,
@@ -38,15 +39,35 @@ int s390_pci_kvm_aif_enable(S390PCIBusDevice *pbdev, 
ZpciFib *fib, bool assist)
 .u.reg_aen.flags = (assist) ? 0 : KVM_S390_ZPCIOP_REGAEN_HOST
 };
 
-return kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, );
+if (pbdev->aif) {
+return -EINVAL;
+}
+
+rc = kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, );
+if (rc == 0) {
+pbdev->aif = true;
+}
+
+return rc;
 }
 
 int s390_pci_kvm_aif_disable(S390PCIBusDevice *pbdev)
 {
+int rc;
+
 struct kvm_s390_zpci_op args = {
 .fh = pbdev->fh,
 .op = KVM_S390_ZPCIOP_DEREG_AEN
 };
 
-return kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, );
+if (!pbdev->aif) {
+return -EINVAL;
+}
+
+rc = kvm_vm_ioctl(kvm_state, KVM_S390_ZPCI_OP, );
+if (rc == 0) {
+pbev->aif = false;
+}
+
+return rc;
 }
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index b1bdbeaeb5..435e788867 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -361,6 +361,7 @@ struct S390PCIBusDevice {
 bool unplug_requested;
 bool interp;
 bool forwarding_assist;
+bool aif;
 QTAILQ_ENTRY(S390PCIBusDevice) link;
 };
 
-- 
2.43.0




[PATCH 3/3] s390x/pci: drive ISM reset from subsystem reset

2024-01-16 Thread Matthew Rosato
ISM devices are sensitive to manipulation of the IOMMU, so the ISM device
needs to be reset before the vfio-pci device is reset (triggering a full
UNMAP).  In order to ensure this occurs, trigger ISM device resets from
subsystem_reset before triggering the PCI bus reset (which will also
trigger vfio-pci reset).  This only needs to be done for ISM devices
which were enabled for use by the guest.
Further, ensure that AIF is disabled as part of the reset event.

Fixes: ef1535901a ("s390x: do a subsystem reset before the unprotect on reboot")
Fixes: 03451953c7 ("s390x/pci: reset ISM passthrough devices on shutdown and 
system reset")
Reported-by: Cédric Le Goater 
Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c | 26 +-
 hw/s390x/s390-virtio-ccw.c  |  2 ++
 include/hw/s390x/s390-pci-bus.h |  1 +
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 347580ebac..3e57d5faca 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -151,20 +151,12 @@ static void s390_pci_shutdown_notifier(Notifier *n, void 
*opaque)
 pci_device_reset(pbdev->pdev);
 }
 
-static void s390_pci_reset_cb(void *opaque)
-{
-S390PCIBusDevice *pbdev = opaque;
-
-pci_device_reset(pbdev->pdev);
-}
-
 static void s390_pci_perform_unplug(S390PCIBusDevice *pbdev)
 {
 HotplugHandler *hotplug_ctrl;
 
 if (pbdev->pft == ZPCI_PFT_ISM) {
 notifier_remove(>shutdown_notifier);
-qemu_unregister_reset(s390_pci_reset_cb, pbdev);
 }
 
 /* Unplug the PCI device */
@@ -1132,7 +1124,6 @@ static void s390_pcihost_plug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 if (pbdev->pft == ZPCI_PFT_ISM) {
 pbdev->shutdown_notifier.notify = s390_pci_shutdown_notifier;
 qemu_register_shutdown_notifier(>shutdown_notifier);
-qemu_register_reset(s390_pci_reset_cb, pbdev);
 }
 } else {
 pbdev->fh |= FH_SHM_EMUL;
@@ -1279,6 +1270,23 @@ static void s390_pci_enumerate_bridge(PCIBus *bus, 
PCIDevice *pdev,
 pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, s->bus_no, 1);
 }
 
+void s390_pci_ism_reset(void)
+{
+S390pciState *s = s390_get_phb();
+
+S390PCIBusDevice *pbdev, *next;
+
+/* Trigger reset event for each passthrough ISM device currently in-use */
+QTAILQ_FOREACH_SAFE(pbdev, >zpci_devs, link, next) {
+if (pbdev->interp && pbdev->pft == ZPCI_PFT_ISM &&
+pbdev->fh & FH_MASK_ENABLE) {
+s390_pci_kvm_aif_disable(pbdev);
+
+pci_device_reset(pbdev->pdev);
+}
+}
+}
+
 static void s390_pcihost_reset(DeviceState *dev)
 {
 S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 1169e20b94..4de04f7e9f 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -118,6 +118,8 @@ static void subsystem_reset(void)
 DeviceState *dev;
 int i;
 
+s390_pci_ism_reset();
+
 for (i = 0; i < ARRAY_SIZE(reset_dev_types); i++) {
 dev = DEVICE(object_resolve_path_type("", reset_dev_types[i], NULL));
 if (dev) {
diff --git a/include/hw/s390x/s390-pci-bus.h b/include/hw/s390x/s390-pci-bus.h
index 435e788867..2c43ea123f 100644
--- a/include/hw/s390x/s390-pci-bus.h
+++ b/include/hw/s390x/s390-pci-bus.h
@@ -401,5 +401,6 @@ S390PCIBusDevice *s390_pci_find_dev_by_target(S390pciState 
*s,
   const char *target);
 S390PCIBusDevice *s390_pci_find_next_avail_dev(S390pciState *s,
S390PCIBusDevice *pbdev);
+void s390_pci_ism_reset(void);
 
 #endif
-- 
2.43.0




Re: [PATCH v3 21/38] tcg/arm: Support TCG_COND_TST{EQ,NE}

2024-01-16 Thread Philippe Mathieu-Daudé

On 10/1/24 23:43, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
Message-Id: <20231028194522.245170-12-richard.hender...@linaro.org>
[PMD: Split from bigger patch, part 2/2]
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20231108145244.72421-2-phi...@linaro.org>
---
  tcg/arm/tcg-target.h |  2 +-
  tcg/arm/tcg-target.c.inc | 29 -
  2 files changed, 29 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 20/38] tcg/arm: Factor tcg_out_cmp() out

2024-01-16 Thread Philippe Mathieu-Daudé

On 10/1/24 23:43, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
Message-Id: <20231028194522.245170-12-richard.hender...@linaro.org>
[PMD: Split from bigger patch, part 1/2]
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20231108145244.72421-1-phi...@linaro.org>
---
  tcg/arm/tcg-target.c.inc | 32 +---
  1 file changed, 17 insertions(+), 15 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 15/38] target/s390x: Improve general case of disas_jcc

2024-01-16 Thread Philippe Mathieu-Daudé

Hi Richard,

On 10/1/24 23:43, Richard Henderson wrote:

Avoid code duplication by handling 7 of the 14 cases
by inverting the test for the other 7 cases.

Use TCG_COND_TSTNE for cc in {1,3}.
Use (cc - 1) <= 1 for cc in {1,2}.

Signed-off-by: Richard Henderson 
---
  target/s390x/tcg/translate.c | 82 +---
  1 file changed, 30 insertions(+), 52 deletions(-)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index ae4e7b27ec..168974f2e6 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -885,67 +885,45 @@ static void disas_jcc(DisasContext *s, DisasCompare *c, 
uint32_t mask)
  case CC_OP_STATIC:
  c->is_64 = false;
  c->u.s32.a = cc_op;
-switch (mask) {
-case 0x8 | 0x4 | 0x2: /* cc != 3 */
-cond = TCG_COND_NE;
+
+/* Fold half of the cases using bit 3 to invert. */
+switch (mask & 8 ? mask ^ 0xf : mask) {
+case 0x1: /* cc == 3 */
+cond = TCG_COND_EQ;
  c->u.s32.b = tcg_constant_i32(3);
  break;
-case 0x8 | 0x4 | 0x1: /* cc != 2 */
-cond = TCG_COND_NE;
-c->u.s32.b = tcg_constant_i32(2);
-break;
-case 0x8 | 0x2 | 0x1: /* cc != 1 */
-cond = TCG_COND_NE;
-c->u.s32.b = tcg_constant_i32(1);
-break;
-case 0x8 | 0x2: /* cc == 0 || cc == 2 => (cc & 1) == 0 */
-cond = TCG_COND_EQ;
-c->u.s32.a = tcg_temp_new_i32();
-c->u.s32.b = tcg_constant_i32(0);
-tcg_gen_andi_i32(c->u.s32.a, cc_op, 1);
-break;
-case 0x8 | 0x4: /* cc < 2 */
-cond = TCG_COND_LTU;
-c->u.s32.b = tcg_constant_i32(2);
-break;
-case 0x8: /* cc == 0 */
-cond = TCG_COND_EQ;
-c->u.s32.b = tcg_constant_i32(0);
-break;
-case 0x4 | 0x2 | 0x1: /* cc != 0 */
-cond = TCG_COND_NE;
-c->u.s32.b = tcg_constant_i32(0);
-break;
-case 0x4 | 0x1: /* cc == 1 || cc == 3 => (cc & 1) != 0 */
-cond = TCG_COND_NE;
-c->u.s32.a = tcg_temp_new_i32();
-c->u.s32.b = tcg_constant_i32(0);
-tcg_gen_andi_i32(c->u.s32.a, cc_op, 1);
-break;
-case 0x4: /* cc == 1 */
-cond = TCG_COND_EQ;
-c->u.s32.b = tcg_constant_i32(1);
-break;
-case 0x2 | 0x1: /* cc > 1 */
-cond = TCG_COND_GTU;
-c->u.s32.b = tcg_constant_i32(1);
-break;
  case 0x2: /* cc == 2 */
  cond = TCG_COND_EQ;
  c->u.s32.b = tcg_constant_i32(2);
  break;
-case 0x1: /* cc == 3 */
+case 0x4: /* cc == 1 */
  cond = TCG_COND_EQ;
-c->u.s32.b = tcg_constant_i32(3);
+c->u.s32.b = tcg_constant_i32(1);
+break;
+case 0x2 | 0x1: /* cc == 2 || cc == 3 => cc > 1 */
+cond = TCG_COND_GTU;
+c->u.s32.b = tcg_constant_i32(1);
+break;
+case 0x4 | 0x1: /* cc == 1 || cc == 3 => (cc & 1) != 0 */
+cond = TCG_COND_TSTNE;
+c->u.s32.b = tcg_constant_i32(1);


Don't we need to AND?

  c->u.s32.a = tcg_temp_new_i32();
  tcg_gen_andi_i32(c->u.s32.a, cc_op, 1);


+break;
+case 0x4 | 0x2: /* cc == 1 || cc == 2 => (cc - 1) <= 1 */
+cond = TCG_COND_LEU;
+c->u.s32.a = tcg_temp_new_i32();
+c->u.s32.b = tcg_constant_i32(1);
+tcg_gen_addi_i32(c->u.s32.a, cc_op, -1);
+break;
+case 0x4 | 0x2 | 0x1: /* cc != 0 */
+cond = TCG_COND_NE;
+c->u.s32.b = tcg_constant_i32(0);
  break;
  default:
-/* CC is masked by something else: (8 >> cc) & mask.  */
-cond = TCG_COND_NE;
-c->u.s32.a = tcg_temp_new_i32();
-c->u.s32.b = tcg_constant_i32(0);
-tcg_gen_shr_i32(c->u.s32.a, tcg_constant_i32(8), cc_op);
-tcg_gen_andi_i32(c->u.s32.a, c->u.s32.a, mask);
-break;
+/* case 0: never, handled above. */
+g_assert_not_reached();
+}
+if (mask & 8) {
+cond = tcg_invert_cond(cond);
  }
  break;
  





Re: [PATCH] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status

2024-01-16 Thread Stefan Hajnoczi
On Tue, Jan 16, 2024 at 04:48:39PM +0100, Fiona Ebner wrote:
> Using fleecing backup like in [0] on a qcow2 image (with metadata
> preallocation) can lead to the following assertion failure:
> 
> > bdrv_co_do_block_status: Assertion `!(ret & BDRV_BLOCK_ZERO)' failed.
> 
> In the reproducer [0], it happens because the BDRV_BLOCK_RECURSE flag
> will be set by the qcow2 driver, so the caller will recursively check
> the file child. Then the BDRV_BLOCK_ZERO set too. Later up the call
> chain, in bdrv_co_do_block_status() for the snapshot-access driver,
> the assertion failure will happen, because both flags are set.
> 
> To fix it, clear the recurse flag after the recursive check was done.

CCing Vladimir, who introduced the BDRV_BLOCK_RECURSE flag in commit
69f47505ee66 ("block: avoid recursive block_status call if possible").

> 
> In detail:
> 
> > #0  qcow2_co_block_status
> 
> Returns 0x45 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
> BDRV_BLOCK_OFFSET_VALID.
> 
> > #1  bdrv_co_do_block_status
> 
> Because of the data flag, bdrv_co_do_block_status() will now also set
> BDRV_BLOCK_ALLOCATED. Because of the recurse flag,
> bdrv_co_do_block_status() for the bdrv_file child will be called,
> which returns 0x16 = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
> BDRV_BLOCK_ZERO. Now the return value inherits the zero flag.
> 
> Returns 0x57 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
> BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.
> 
> > #2  bdrv_co_common_block_status_above
> > #3  bdrv_co_block_status_above
> > #4  bdrv_co_block_status
> > #5  cbw_co_snapshot_block_status
> > #6  bdrv_co_snapshot_block_status
> > #7  snapshot_access_co_block_status
> > #8  bdrv_co_do_block_status
> 
> Return value is propagated all the way up to here, where the assertion
> failure happens, because BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO are
> both set.
> 
> > #9  bdrv_co_common_block_status_above
> > #10 bdrv_co_block_status_above
> > #11 block_copy_block_status
> > #12 block_copy_dirty_clusters
> > #13 block_copy_common
> > #14 block_copy_async_co_entry
> > #15 coroutine_trampoline
> 
> [0]:
> 
> > #!/bin/bash
> > rm /tmp/disk.qcow2
> > ./qemu-img create /tmp/disk.qcow2 -o preallocation=metadata -f qcow2 1G
> > ./qemu-img create /tmp/fleecing.qcow2 -f qcow2 1G
> > ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
> > ./qemu-system-x86_64 --qmp stdio \
> > --blockdev 
> > qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
> > --blockdev 
> > qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
> > --blockdev 
> > qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
> > < > {"execute": "qmp_capabilities"}
> > {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", 
> > "file": "node0", "target": "node1", "node-name": "node3" } }
> > {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", 
> > "file": "node3", "node-name": "snap0" } }
> > {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": 
> > "node1", "sync": "full", "job-id": "backup0" } }
> > EOF
> 
> Signed-off-by: Fiona Ebner 
> ---
> 
> I'm new to this part of the code, so I'm not sure if it is actually
> safe to clear the flag? Intuitively, I'd expect it to be only relevant
> until it was acted upon, but no clue.
> 
>  block/io.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/block/io.c b/block/io.c
> index 8fa7670571..33150c0359 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2584,6 +2584,16 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool 
> want_zero,
>  ret |= (ret2 & BDRV_BLOCK_ZERO);
>  }
>  }
> +
> +/*
> + * Now that the recursive search was done, clear the flag. Otherwise,
> + * with more complicated block graphs like snapshot-access ->
> + * copy-before-write -> qcow2, where the return value will be 
> propagated
> + * further up to a parent bdrv_co_do_block_status() call, both the
> + * BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO flags would be set, which 
> is
> + * not allowed.
> + */
> +ret &= ~BDRV_BLOCK_RECURSE;
>  }
>  
>  out:
> -- 
> 2.39.2
> 
> 


signature.asc
Description: PGP signature


Re: [PATCH v3 12/38] target/m68k: Use TCG_COND_TST{EQ,NE} in gen_fcc_cond

2024-01-16 Thread Philippe Mathieu-Daudé

On 10/1/24 23:43, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  target/m68k/translate.c | 74 ++---
  1 file changed, 33 insertions(+), 41 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 09/38] target/alpha: Use TCG_COND_TST{EQ,NE} for BLB{C,S}

2024-01-16 Thread Philippe Mathieu-Daudé

On 10/1/24 23:43, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
Message-Id: <20231028194522.245170-33-richard.hender...@linaro.org>
[PMD: Split from bigger patch, part 2/2]
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20231108205247.83234-2-phi...@linaro.org>
---
  target/alpha/translate.c | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 07/38] tcg/optimize: Lower TCG_COND_TST{EQ,NE} if unsupported

2024-01-16 Thread Philippe Mathieu-Daudé

On 10/1/24 23:43, Richard Henderson wrote:

After having performed other simplifications, lower any
remaining test comparisons with AND.

Signed-off-by: Richard Henderson 
---
  tcg/tcg-internal.h |  2 ++
  tcg/optimize.c | 60 +++---
  tcg/tcg.c  |  2 +-
  3 files changed, 55 insertions(+), 9 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 08/38] target/alpha: Pass immediate value to gen_bcond_internal()

2024-01-16 Thread Philippe Mathieu-Daudé

On 10/1/24 23:43, Richard Henderson wrote:

Simplify gen_bcond() by passing an immediate value.

Signed-off-by: Richard Henderson 
Message-Id: <20231028194522.245170-33-richard.hender...@linaro.org>
[PMD: Split from bigger patch, part 1/2]
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20231108205247.83234-1-phi...@linaro.org>
---
  target/alpha/translate.c | 21 +++--
  1 file changed, 7 insertions(+), 14 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 36/38] tcg/s390x: Add TCG_CT_CONST_CMP

2024-01-16 Thread Philippe Mathieu-Daudé

On 10/1/24 23:44, Richard Henderson wrote:

Better constraint for tcg_out_cmp, based on the comparison.

Signed-off-by: Richard Henderson 
---
  tcg/s390x/tcg-target-con-set.h |  6 +--
  tcg/s390x/tcg-target-con-str.h |  1 +
  tcg/s390x/tcg-target.c.inc | 72 +-
  3 files changed, 58 insertions(+), 21 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 35/38] tcg/s390x: Split constraint A into J+U

2024-01-16 Thread Philippe Mathieu-Daudé

On 10/1/24 23:44, Richard Henderson wrote:

Signed 33-bit == signed 32-bit + unsigned 32-bit.

Signed-off-by: Richard Henderson 
---
  tcg/s390x/tcg-target-con-set.h |  8 
  tcg/s390x/tcg-target-con-str.h |  2 +-
  tcg/s390x/tcg-target.c.inc | 36 +-
  3 files changed, 23 insertions(+), 23 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 31/38] tcg/ppc: Use cr0 in tcg_to_bc and tcg_to_isel

2024-01-16 Thread Philippe Mathieu-Daudé

On 10/1/24 23:44, Richard Henderson wrote:

Using cr0 means we could choose to use rc=1 to compute the condition.
Adjust the tables and tcg_out_cmp that feeds them.

Signed-off-by: Richard Henderson 
---
  tcg/ppc/tcg-target.c.inc | 68 
  1 file changed, 34 insertions(+), 34 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 13/38] target/sparc: Use TCG_COND_TSTEQ in gen_op_mulscc

2024-01-16 Thread Philippe Mathieu-Daudé

On 10/1/24 23:43, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  target/sparc/translate.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 





Re: [PATCH v3 02/38] tcg: Introduce TCG_TARGET_HAS_tst

2024-01-16 Thread Philippe Mathieu-Daudé

On 10/1/24 23:43, Richard Henderson wrote:

Define as 0 for all tcg backends.

Signed-off-by: Richard Henderson 
---
  tcg/aarch64/tcg-target.h | 2 ++
  tcg/arm/tcg-target.h | 2 ++
  tcg/i386/tcg-target.h| 2 ++
  tcg/loongarch64/tcg-target.h | 2 ++
  tcg/mips/tcg-target.h| 2 ++
  tcg/ppc/tcg-target.h | 2 ++
  tcg/riscv/tcg-target.h   | 2 ++
  tcg/s390x/tcg-target.h   | 2 ++
  tcg/sparc64/tcg-target.h | 2 ++
  tcg/tci/tcg-target.h | 2 ++
  10 files changed, 20 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] accel: Do not set CPUState::can_do_io in non-TCG accels

2024-01-16 Thread Philippe Mathieu-Daudé

On 30/11/23 14:51, Richard Henderson wrote:

On 11/29/23 14:50, Philippe Mathieu-Daudé wrote:

'can_do_io' is specific to TCG. Having it set in non-TCG
code is confusing, so remove it from QTest / HVF / KVM.

Signed-off-by: Philippe Mathieu-Daudé 
---
  accel/dummy-cpus.c    | 1 -
  accel/hvf/hvf-accel-ops.c | 1 -
  accel/kvm/kvm-accel-ops.c | 1 -
  3 files changed, 3 deletions(-)

diff --git a/accel/dummy-cpus.c b/accel/dummy-cpus.c
index b75c919ac3..1005ec6f56 100644
--- a/accel/dummy-cpus.c
+++ b/accel/dummy-cpus.c
@@ -27,7 +27,6 @@ static void *dummy_cpu_thread_fn(void *arg)
  qemu_mutex_lock_iothread();
  qemu_thread_get_self(cpu->thread);
  cpu->thread_id = qemu_get_thread_id();
-    cpu->neg.can_do_io = true;
  current_cpu = cpu;


I expect this is ok...

When I was moving this variable around just recently, 464dacf6, I 
wondered about these other settings, and I wondered if they used to be 
required for implementing some i/o on behalf of hw accel.  Something 
that has now been factored out to e.g. kvm_handle_io, which now uses 
address_space_rw directly.


It was added by mistake in commit 99df7dce8a ("cpu: Move can_do_io
field from CPU_COMMON to CPUState", 2013) to cpu_common_reset() and
commit 626cf8f4c6 ("icount: set can_do_io outside TB execution", 2014),
then moved in commits 57038a92bb ("cpus: extract out kvm-specific code
to accel/kvm"), b86f59c715 ("accel: replace struct CpusAccel with
AccelOpsClass") and the one you mentioned, 464dacf609 ("accel/tcg: Move
can_do_io to CPUNegativeOffsetState").

The culprit is 626cf8f4c6 I guess, so maybe:
Fixes: 626cf8f4c6 ("icount: set can_do_io outside TB execution")

I see 3 reads of can_do_io: accel/tcg/{cputlb.c, icount-common.c} and 
system/watchpoint.c.  The final one is nested within 
replay_running_debug(), which implies icount and tcg.


Reviewed-by: Richard Henderson 


Thanks!



[PATCH v3 04/13] target/riscv/insn_trans/trans_rvbf16.c.inc: use cpu->cfg.vlenb

2024-01-16 Thread Daniel Henrique Barboza
Use ctx->cfg_ptr->vlenb instead of ctx->cfg_ptr->vlen / 8.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvbf16.c.inc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvbf16.c.inc 
b/target/riscv/insn_trans/trans_rvbf16.c.inc
index 4e39c00884..8ee99df3f3 100644
--- a/target/riscv/insn_trans/trans_rvbf16.c.inc
+++ b/target/riscv/insn_trans/trans_rvbf16.c.inc
@@ -83,8 +83,8 @@ static bool trans_vfncvtbf16_f_f_w(DisasContext *ctx, 
arg_vfncvtbf16_f_f_w *a)
 data = FIELD_DP32(data, VDATA, VMA, ctx->vma);
 tcg_gen_gvec_3_ptr(vreg_ofs(ctx, a->rd), vreg_ofs(ctx, 0),
vreg_ofs(ctx, a->rs2), tcg_env,
-   ctx->cfg_ptr->vlen / 8,
-   ctx->cfg_ptr->vlen / 8, data,
+   ctx->cfg_ptr->vlenb,
+   ctx->cfg_ptr->vlenb, data,
gen_helper_vfncvtbf16_f_f_w);
 mark_vs_dirty(ctx);
 gen_set_label(over);
@@ -112,8 +112,8 @@ static bool trans_vfwcvtbf16_f_f_v(DisasContext *ctx, 
arg_vfwcvtbf16_f_f_v *a)
 data = FIELD_DP32(data, VDATA, VMA, ctx->vma);
 tcg_gen_gvec_3_ptr(vreg_ofs(ctx, a->rd), vreg_ofs(ctx, 0),
vreg_ofs(ctx, a->rs2), tcg_env,
-   ctx->cfg_ptr->vlen / 8,
-   ctx->cfg_ptr->vlen / 8, data,
+   ctx->cfg_ptr->vlenb,
+   ctx->cfg_ptr->vlenb, data,
gen_helper_vfwcvtbf16_f_f_v);
 mark_vs_dirty(ctx);
 gen_set_label(over);
@@ -143,8 +143,8 @@ static bool trans_vfwmaccbf16_vv(DisasContext *ctx, 
arg_vfwmaccbf16_vv *a)
 tcg_gen_gvec_4_ptr(vreg_ofs(ctx, a->rd), vreg_ofs(ctx, 0),
vreg_ofs(ctx, a->rs1),
vreg_ofs(ctx, a->rs2), tcg_env,
-   ctx->cfg_ptr->vlen / 8,
-   ctx->cfg_ptr->vlen / 8, data,
+   ctx->cfg_ptr->vlenb,
+   ctx->cfg_ptr->vlenb, data,
gen_helper_vfwmaccbf16_vv);
 mark_vs_dirty(ctx);
 gen_set_label(over);
-- 
2.43.0




[PATCH v3 11/13] target/riscv: change vext_get_vlmax() arguments

2024-01-16 Thread Daniel Henrique Barboza
We'll re-use the logic froim vext_get_vlmax() in 2 other occurrences in
the next patch, but first we need to make it independent of both 'cpu'
and 'vtype'. To do that, add 'vlenb', 'vsew' and 'lmul' as parameters
instead.

Adapt the two existing callers. In cpu_get_tb_cpu_state(), rename 'sew'
to 'vsew' to be less ambiguous about what we're encoding into *pflags.

In HELPER(vsetvl) the following changes were made:

- add a 'vsew' var to store vsew. Use it in the shift to get 'sew';
- the existing 'lmul' var was renamed to 'vlmul';
- add a new 'lmul' var to store 'lmul' encoded like DisasContext:lmul.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.h   |  7 +++
 target/riscv/cpu_helper.c| 11 +++
 target/riscv/vector_helper.c | 16 ++--
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 3af61e0f94..9dcbc0649a 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -688,11 +688,10 @@ static inline RISCVMXL riscv_cpu_sxl(CPURISCVState *env)
  *   = 256 >> 7
  *   = 2
  */
-static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
+static inline uint32_t vext_get_vlmax(uint32_t vlenb, uint32_t vsew,
+  int8_t lmul)
 {
-uint8_t vsew = FIELD_EX64(vtype, VTYPE, VSEW);
-int8_t lmul = sextract32(FIELD_EX64(vtype, VTYPE, VLMUL), 0, 3);
-uint32_t vlen = cpu->cfg.vlenb << 3;
+uint32_t vlen = vlenb << 3;
 
 /*
  * We need to use 'vlen' instead of 'vlenb' to
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index c7cc7eb423..8da9104da4 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -81,13 +81,16 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
  * which is not supported by GVEC. So we set vl_eq_vlmax flag to true
  * only when maxsz >= 8 bytes.
  */
-uint32_t vlmax = vext_get_vlmax(cpu, env->vtype);
-uint32_t sew = FIELD_EX64(env->vtype, VTYPE, VSEW);
-uint32_t maxsz = vlmax << sew;
+
+/* lmul encoded as in DisasContext::lmul */
+int8_t lmul = sextract32(FIELD_EX64(env->vtype, VTYPE, VLMUL), 0, 3);
+uint32_t vsew = FIELD_EX64(env->vtype, VTYPE, VSEW);
+uint32_t vlmax = vext_get_vlmax(cpu->cfg.vlenb, vsew, lmul);
+uint32_t maxsz = vlmax << vsew;
 bool vl_eq_vlmax = (env->vstart == 0) && (vlmax == env->vl) &&
(maxsz >= 8);
 flags = FIELD_DP32(flags, TB_FLAGS, VILL, env->vill);
-flags = FIELD_DP32(flags, TB_FLAGS, SEW, sew);
+flags = FIELD_DP32(flags, TB_FLAGS, SEW, vsew);
 flags = FIELD_DP32(flags, TB_FLAGS, LMUL,
FIELD_EX64(env->vtype, VTYPE, VLMUL));
 flags = FIELD_DP32(flags, TB_FLAGS, VL_EQ_VLMAX, vl_eq_vlmax);
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index b13be1541a..718a0c711a 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -35,16 +35,18 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, 
target_ulong s1,
 {
 int vlmax, vl;
 RISCVCPU *cpu = env_archcpu(env);
-uint64_t lmul = FIELD_EX64(s2, VTYPE, VLMUL);
-uint16_t sew = 8 << FIELD_EX64(s2, VTYPE, VSEW);
+uint64_t vlmul = FIELD_EX64(s2, VTYPE, VLMUL);
+uint8_t vsew = FIELD_EX64(s2, VTYPE, VSEW);
+uint16_t sew = 8 << vsew;
 uint8_t ediv = FIELD_EX64(s2, VTYPE, VEDIV);
 int xlen = riscv_cpu_xlen(env);
 bool vill = (s2 >> (xlen - 1)) & 0x1;
 target_ulong reserved = s2 &
 MAKE_64BIT_MASK(R_VTYPE_RESERVED_SHIFT,
 xlen - 1 - R_VTYPE_RESERVED_SHIFT);
+int8_t lmul;
 
-if (lmul & 4) {
+if (vlmul & 4) {
 /*
  * Fractional LMUL, check:
  *
@@ -53,8 +55,8 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong 
s1,
  * (vlenb << 3) >> (8 - lmul) >= sew
  * vlenb >> (8 - 3 - lmul) >= sew
  */
-if (lmul == 4 ||
-cpu->cfg.vlenb >> (8 - 3 - lmul) < sew) {
+if (vlmul == 4 ||
+cpu->cfg.vlenb >> (8 - 3 - vlmul) < sew) {
 vill = true;
 }
 }
@@ -68,7 +70,9 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong 
s1,
 return 0;
 }
 
-vlmax = vext_get_vlmax(cpu, s2);
+/* lmul encoded as in DisasContext::lmul */
+lmul = sextract32(FIELD_EX64(s2, VTYPE, VLMUL), 0, 3);
+vlmax = vext_get_vlmax(cpu->cfg.vlenb, vsew, lmul);
 if (s1 <= vlmax) {
 vl = s1;
 } else {
-- 
2.43.0




[PATCH v3 13/13] target/riscv/cpu.c: remove cpu->cfg.vlen

2024-01-16 Thread Daniel Henrique Barboza
There is no need to keep both 'vlen' and 'vlenb'. All existing code
that requires 'vlen' is retrieving it via 'vlenb << 3'.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.c | 8 +++-
 target/riscv/cpu_cfg.h | 1 -
 target/riscv/tcg/tcg-cpu.c | 4 +++-
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f4261d2ffc..7b3f69d3fb 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1313,7 +1313,6 @@ static void riscv_cpu_init(Object *obj)
 
 /* Default values for non-bool cpu properties */
 cpu->cfg.pmu_mask = MAKE_64BIT_MASK(3, 16);
-cpu->cfg.vlen = 128;
 cpu->cfg.vlenb = 128 >> 3;
 cpu->cfg.elen = 64;
 cpu->env.vext_ver = VEXT_VERSION_1_00_0;
@@ -1802,22 +1801,21 @@ static void prop_vlen_set(Object *obj, Visitor *v, 
const char *name,
 return;
 }
 
-if (value != cpu->cfg.vlen && riscv_cpu_is_vendor(obj)) {
+if (value != cpu->cfg.vlenb && riscv_cpu_is_vendor(obj)) {
 cpu_set_prop_err(cpu, name, errp);
 error_append_hint(errp, "Current '%s' val: %u\n",
-  name, cpu->cfg.vlen);
+  name, cpu->cfg.vlenb << 3);
 return;
 }
 
 cpu_option_add_user_setting(name, value);
-cpu->cfg.vlen = value;
 cpu->cfg.vlenb = value >> 3;
 }
 
 static void prop_vlen_get(Object *obj, Visitor *v, const char *name,
  void *opaque, Error **errp)
 {
-uint16_t value = RISCV_CPU(obj)->cfg.vlen;
+uint16_t value = RISCV_CPU(obj)->cfg.vlenb << 3;
 
 visit_type_uint16(v, name, , errp);
 }
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 50479dd72f..e241922f89 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -139,7 +139,6 @@ struct RISCVCPUConfig {
 bool ext_XVentanaCondOps;
 
 uint32_t pmu_mask;
-uint16_t vlen;
 uint16_t vlenb;
 uint16_t elen;
 uint16_t cbom_blocksize;
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index daff0b8f60..667421b0b7 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -298,7 +298,9 @@ static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, 
Error **errp)
 static void riscv_cpu_validate_v(CPURISCVState *env, RISCVCPUConfig *cfg,
  Error **errp)
 {
-if (cfg->vlen > RV_VLEN_MAX || cfg->vlen < 128) {
+uint32_t vlen = cfg->vlenb << 3;
+
+if (vlen > RV_VLEN_MAX || vlen < 128) {
 error_setg(errp,
"Vector extension implementation only supports VLEN "
"in the range [128, %d]", RV_VLEN_MAX);
-- 
2.43.0




[PATCH v3 05/13] target/riscv/insn_trans/trans_rvv.c.inc: use 'vlenb'

2024-01-16 Thread Daniel Henrique Barboza
Use s->cfg_ptr->vlenb instead of "s->cfg_ptr->vlen / 8"  and
"s->cfg_ptr->vlen >> 3".

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvv.c.inc | 140 
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 3871f0ea73..d743675262 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -217,7 +217,7 @@ static bool trans_vsetivli(DisasContext *s, arg_vsetivli *a)
 /* vector register offset from env */
 static uint32_t vreg_ofs(DisasContext *s, int reg)
 {
-return offsetof(CPURISCVState, vreg) + reg * s->cfg_ptr->vlen / 8;
+return offsetof(CPURISCVState, vreg) + reg * s->cfg_ptr->vlenb;
 }
 
 /* check functions */
@@ -627,11 +627,11 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, 
uint32_t data,
  * As simd_desc supports at most 2048 bytes, and in this implementation,
  * the max vector group length is 4096 bytes. So split it into two parts.
  *
- * The first part is vlen in bytes, encoded in maxsz of simd_desc.
+ * The first part is vlen in bytes (vlenb), encoded in maxsz of simd_desc.
  * The second part is lmul, encoded in data of simd_desc.
  */
-desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlen / 8,
-  s->cfg_ptr->vlen / 8, data));
+desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlenb,
+  s->cfg_ptr->vlenb, data));
 
 tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, vd));
 tcg_gen_addi_ptr(mask, tcg_env, vreg_ofs(s, 0));
@@ -791,8 +791,8 @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, 
uint32_t rs2,
 mask = tcg_temp_new_ptr();
 base = get_gpr(s, rs1, EXT_NONE);
 stride = get_gpr(s, rs2, EXT_NONE);
-desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlen / 8,
-  s->cfg_ptr->vlen / 8, data));
+desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlenb,
+  s->cfg_ptr->vlenb, data));
 
 tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, vd));
 tcg_gen_addi_ptr(mask, tcg_env, vreg_ofs(s, 0));
@@ -897,8 +897,8 @@ static bool ldst_index_trans(uint32_t vd, uint32_t rs1, 
uint32_t vs2,
 mask = tcg_temp_new_ptr();
 index = tcg_temp_new_ptr();
 base = get_gpr(s, rs1, EXT_NONE);
-desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlen / 8,
-  s->cfg_ptr->vlen / 8, data));
+desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlenb,
+  s->cfg_ptr->vlenb, data));
 
 tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, vd));
 tcg_gen_addi_ptr(index, tcg_env, vreg_ofs(s, vs2));
@@ -1036,8 +1036,8 @@ static bool ldff_trans(uint32_t vd, uint32_t rs1, 
uint32_t data,
 dest = tcg_temp_new_ptr();
 mask = tcg_temp_new_ptr();
 base = get_gpr(s, rs1, EXT_NONE);
-desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlen / 8,
-  s->cfg_ptr->vlen / 8, data));
+desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlenb,
+  s->cfg_ptr->vlenb, data));
 
 tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, vd));
 tcg_gen_addi_ptr(mask, tcg_env, vreg_ofs(s, 0));
@@ -1086,7 +1086,7 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, 
uint32_t nf,
  uint32_t width, gen_helper_ldst_whole *fn,
  DisasContext *s, bool is_store)
 {
-uint32_t evl = (s->cfg_ptr->vlen / 8) * nf / width;
+uint32_t evl = s->cfg_ptr->vlenb * nf / width;
 TCGLabel *over = gen_new_label();
 tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, evl, over);
 
@@ -1096,8 +1096,8 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, 
uint32_t nf,
 
 uint32_t data = FIELD_DP32(0, VDATA, NF, nf);
 dest = tcg_temp_new_ptr();
-desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlen / 8,
-  s->cfg_ptr->vlen / 8, data));
+desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlenb,
+  s->cfg_ptr->vlenb, data));
 
 base = get_gpr(s, rs1, EXT_NONE);
 tcg_gen_addi_ptr(dest, tcg_env, vreg_ofs(s, vd));
@@ -1199,8 +1199,8 @@ do_opivv_gvec(DisasContext *s, arg_rmrr *a, GVecGen3Fn 
*gvec_fn,
 data = FIELD_DP32(data, VDATA, VMA, s->vma);
 tcg_gen_gvec_4_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0),
vreg_ofs(s, a->rs1), vreg_ofs(s, a->rs2),
-   tcg_env, s->cfg_ptr->vlen / 8,
-   s->cfg_ptr->vlen / 8, data, fn);
+   tcg_env, s->cfg_ptr->vlenb,
+   s->cfg_ptr->vlenb, data, fn);
 }
 mark_vs_dirty(s);
 gen_set_label(over);
@@ -1248,8 +1248,8 @@ static bool opivx_trans(uint32_t vd, uint32_t rs1, 

[PATCH v3 00/13] target/riscv: add 'cpu->cfg.vlenb', remove 'cpu->cfg.vlen'

2024-01-16 Thread Daniel Henrique Barboza
Hi,

In this v3 the most significant change is with vext_get_vlmax() from
cpu.h. The logic used in this function is also used in at least two
other places, trans_vrgather_vi() and trans_vrgather_vx(), and we need
to make changes in them to remove 'vlen' occurrences.

Instead, we're adding an extra patch (11) to rework vext_get_vlmax()
arguments to make the function usable in trans_vrgather_v*(). This
rework includes some naming changes in local variables - we're using
'vsew' and 'vlmul' more often to be less ambiguous when reading code. 

Series based on Alistair's riscv-to-apply.next.

Patches missing review: patches 10, 11, 12.

Changes from v3:
- patch 8:
  - changed fractional LMUL comment to show the expansion
- patches 9 and 10: switched places
- patch 10 (former 9):
  - use 'vlen' in vext_get_vlmax() to avoid a negative shift 
- patch 11 (new):
  - change vext_get_vlmax() to use 'vlenb', 'vsew' and 'lmul'
- patch 12 (former 11):
  - use vext_get_vlmax() instead of calculating vlmax manually
- v2 link: 
https://lore.kernel.org/qemu-riscv/20240115222528.257342-1-dbarb...@ventanamicro.com/


Daniel Henrique Barboza (13):
  target/riscv: add 'vlenb' field in cpu->cfg
  target/riscv/csr.c: use 'vlenb' instead of 'vlen'
  target/riscv/gdbstub.c: use 'vlenb' instead of shifting 'vlen'
  target/riscv/insn_trans/trans_rvbf16.c.inc: use cpu->cfg.vlenb
  target/riscv/insn_trans/trans_rvv.c.inc: use 'vlenb'
  target/riscv/insn_trans/trans_rvvk.c.inc: use 'vlenb'
  target/riscv/vector_helper.c: use 'vlenb'
  target/riscv/vector_helper.c: use vlenb in HELPER(vsetvl)
  target/riscv/insn_trans/trans_rvv.c.inc: use 'vlenb' in MAXSZ()
  target/riscv/cpu.h: use 'vlenb' in vext_get_vlmax()
  target/riscv: change vext_get_vlmax() arguments
  trans_rvv.c.inc: use vext_get_vlmax() in trans_vrgather_v*()
  target/riscv/cpu.c: remove cpu->cfg.vlen

 target/riscv/cpu.c |  12 +-
 target/riscv/cpu.h |  14 +-
 target/riscv/cpu_cfg.h |   2 +-
 target/riscv/cpu_helper.c  |  11 +-
 target/riscv/csr.c |   4 +-
 target/riscv/gdbstub.c |   6 +-
 target/riscv/insn_trans/trans_rvbf16.c.inc |  12 +-
 target/riscv/insn_trans/trans_rvv.c.inc| 152 ++---
 target/riscv/insn_trans/trans_rvvk.c.inc   |  16 +--
 target/riscv/tcg/tcg-cpu.c |   4 +-
 target/riscv/vector_helper.c   |  43 +++---
 11 files changed, 148 insertions(+), 128 deletions(-)

-- 
2.43.0




[PATCH v3 06/13] target/riscv/insn_trans/trans_rvvk.c.inc: use 'vlenb'

2024-01-16 Thread Daniel Henrique Barboza
Use s->cfg_ptr->vlenb instead of s->cfg_ptr->vlen / 8.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvvk.c.inc | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc 
b/target/riscv/insn_trans/trans_rvvk.c.inc
index 3801c16829..a5cdd1b67f 100644
--- a/target/riscv/insn_trans/trans_rvvk.c.inc
+++ b/target/riscv/insn_trans/trans_rvvk.c.inc
@@ -174,7 +174,7 @@ GEN_OPIVX_GVEC_TRANS_CHECK(vandn_vx, andcs, zvkb_vx_check)
 data = FIELD_DP32(data, VDATA, VMA, s->vma);   \
 tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, 0), \
vreg_ofs(s, a->rs2), tcg_env,   \
-   s->cfg_ptr->vlen / 8, s->cfg_ptr->vlen / 8, \
+   s->cfg_ptr->vlenb, s->cfg_ptr->vlenb,   \
data, fns[s->sew]); \
 mark_vs_dirty(s);  \
 gen_set_label(over);   \
@@ -267,7 +267,7 @@ GEN_OPIVI_WIDEN_TRANS(vwsll_vi, IMM_ZX, vwsll_vx, 
vwsll_vx_check)
 rd_v = tcg_temp_new_ptr();\
 rs2_v = tcg_temp_new_ptr();   \
 desc = tcg_constant_i32(  \
-simd_desc(s->cfg_ptr->vlen / 8, s->cfg_ptr->vlen / 8, data)); \
+simd_desc(s->cfg_ptr->vlenb, s->cfg_ptr->vlenb, data));   \
 tcg_gen_addi_ptr(rd_v, tcg_env, vreg_ofs(s, a->rd));  \
 tcg_gen_addi_ptr(rs2_v, tcg_env, vreg_ofs(s, a->rs2));\
 gen_helper_##NAME(rd_v, rs2_v, tcg_env, desc);\
@@ -345,7 +345,7 @@ GEN_V_UNMASKED_TRANS(vaesem_vs, vaes_check_vs, ZVKNED_EGS)
 rs2_v = tcg_temp_new_ptr();   \
 uimm_v = tcg_constant_i32(a->rs1);\
 desc = tcg_constant_i32(  \
-simd_desc(s->cfg_ptr->vlen / 8, s->cfg_ptr->vlen / 8, data)); \
+simd_desc(s->cfg_ptr->vlenb, s->cfg_ptr->vlenb, data));   \
 tcg_gen_addi_ptr(rd_v, tcg_env, vreg_ofs(s, a->rd));  \
 tcg_gen_addi_ptr(rs2_v, tcg_env, vreg_ofs(s, a->rs2));\
 gen_helper_##NAME(rd_v, rs2_v, uimm_v, tcg_env, desc);\
@@ -413,7 +413,7 @@ GEN_VI_UNMASKED_TRANS(vaeskf2_vi, vaeskf2_check, ZVKNED_EGS)
   \
 tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs1),   \
vreg_ofs(s, a->rs2), tcg_env,  \
-   s->cfg_ptr->vlen / 8, s->cfg_ptr->vlen / 8,\
+   s->cfg_ptr->vlenb, s->cfg_ptr->vlenb,  \
data, gen_helper_##NAME);  \
   \
 mark_vs_dirty(s); \
@@ -466,8 +466,8 @@ static bool trans_vsha2cl_vv(DisasContext *s, arg_rmrr *a)
 data = FIELD_DP32(data, VDATA, VMA, s->vma);
 
 tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs1),
-vreg_ofs(s, a->rs2), tcg_env, s->cfg_ptr->vlen / 8,
-s->cfg_ptr->vlen / 8, data,
+vreg_ofs(s, a->rs2), tcg_env, s->cfg_ptr->vlenb,
+s->cfg_ptr->vlenb, data,
 s->sew == MO_32 ?
 gen_helper_vsha2cl32_vv : gen_helper_vsha2cl64_vv);
 
@@ -500,8 +500,8 @@ static bool trans_vsha2ch_vv(DisasContext *s, arg_rmrr *a)
 data = FIELD_DP32(data, VDATA, VMA, s->vma);
 
 tcg_gen_gvec_3_ptr(vreg_ofs(s, a->rd), vreg_ofs(s, a->rs1),
-vreg_ofs(s, a->rs2), tcg_env, s->cfg_ptr->vlen / 8,
-s->cfg_ptr->vlen / 8, data,
+vreg_ofs(s, a->rs2), tcg_env, s->cfg_ptr->vlenb,
+s->cfg_ptr->vlenb, data,
 s->sew == MO_32 ?
 gen_helper_vsha2ch32_vv : gen_helper_vsha2ch64_vv);
 
-- 
2.43.0




[PATCH v3 10/13] target/riscv/cpu.h: use 'vlenb' in vext_get_vlmax()

2024-01-16 Thread Daniel Henrique Barboza
Rename the existing 'sew' variable to 'vsew' for extra clarity.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.h | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 11df226a00..3af61e0f94 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -690,9 +690,16 @@ static inline RISCVMXL riscv_cpu_sxl(CPURISCVState *env)
  */
 static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype)
 {
-uint8_t sew = FIELD_EX64(vtype, VTYPE, VSEW);
+uint8_t vsew = FIELD_EX64(vtype, VTYPE, VSEW);
 int8_t lmul = sextract32(FIELD_EX64(vtype, VTYPE, VLMUL), 0, 3);
-return cpu->cfg.vlen >> (sew + 3 - lmul);
+uint32_t vlen = cpu->cfg.vlenb << 3;
+
+/*
+ * We need to use 'vlen' instead of 'vlenb' to
+ * preserve the '+ 3' in the formula. Otherwise
+ * we risk a negative shift if vsew < lmul.
+ */
+return vlen >> (vsew + 3 - lmul);
 }
 
 void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
-- 
2.43.0




[PATCH v3 12/13] trans_rvv.c.inc: use vext_get_vlmax() in trans_vrgather_v*()

2024-01-16 Thread Daniel Henrique Barboza
Use the helper instead of calculating vlmax by hand.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/insn_trans/trans_rvv.c.inc | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index b4663b6e1f..9e101ab434 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -3535,8 +3535,7 @@ static bool trans_vrgather_vx(DisasContext *s, arg_rmrr 
*a)
 }
 
 if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) {
-int scale = s->lmul - (s->sew + 3);
-int vlmax = s->cfg_ptr->vlen >> -scale;
+int vlmax = vext_get_vlmax(s->cfg_ptr->vlenb, s->sew, s->lmul);
 TCGv_i64 dest = tcg_temp_new_i64();
 
 if (a->rs1 == 0) {
@@ -3566,8 +3565,7 @@ static bool trans_vrgather_vi(DisasContext *s, arg_rmrr 
*a)
 }
 
 if (a->vm && s->vl_eq_vlmax && !(s->vta && s->lmul < 0)) {
-int scale = s->lmul - (s->sew + 3);
-int vlmax = s->cfg_ptr->vlen >> -scale;
+int vlmax = vext_get_vlmax(s->cfg_ptr->vlenb, s->sew, s->lmul);
 if (a->rs1 >= vlmax) {
 tcg_gen_gvec_dup_imm(MO_64, vreg_ofs(s, a->rd),
  MAXSZ(s), MAXSZ(s), 0);
-- 
2.43.0




[PATCH v3 07/13] target/riscv/vector_helper.c: use 'vlenb'

2024-01-16 Thread Daniel Henrique Barboza
Use 'cpu->cfg.vlenb' instead of 'cpu->cfg.vlen >> 3'.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
---
 target/riscv/vector_helper.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index c1c3a4d1ea..cb944229b0 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -558,7 +558,7 @@ vext_ldst_whole(void *vd, target_ulong base, CPURISCVState 
*env, uint32_t desc,
 {
 uint32_t i, k, off, pos;
 uint32_t nf = vext_nf(desc);
-uint32_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
+uint32_t vlenb = riscv_cpu_cfg(env)->vlenb;
 uint32_t max_elems = vlenb >> log2_esz;
 
 k = env->vstart / max_elems;
@@ -929,7 +929,7 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void *vs2, 
  \
 { \
 uint32_t vl = env->vl;\
 uint32_t vm = vext_vm(desc);  \
-uint32_t total_elems = riscv_cpu_cfg(env)->vlen;  \
+uint32_t total_elems = riscv_cpu_cfg(env)->vlenb << 3;\
 uint32_t vta_all_1s = vext_vta_all_1s(desc);  \
 uint32_t i;   \
   \
@@ -967,7 +967,7 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1,  
\
 {   \
 uint32_t vl = env->vl;  \
 uint32_t vm = vext_vm(desc);\
-uint32_t total_elems = riscv_cpu_cfg(env)->vlen;\
+uint32_t total_elems = riscv_cpu_cfg(env)->vlenb << 3;  \
 uint32_t vta_all_1s = vext_vta_all_1s(desc);\
 uint32_t i; \
 \
@@ -1171,7 +1171,7 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void 
*vs2,   \
 { \
 uint32_t vm = vext_vm(desc);  \
 uint32_t vl = env->vl;\
-uint32_t total_elems = riscv_cpu_cfg(env)->vlen;  \
+uint32_t total_elems = riscv_cpu_cfg(env)->vlenb << 3;\
 uint32_t vta_all_1s = vext_vta_all_1s(desc);  \
 uint32_t vma = vext_vma(desc);\
 uint32_t i;   \
@@ -1236,7 +1236,7 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong s1, 
void *vs2,   \
 {   \
 uint32_t vm = vext_vm(desc);\
 uint32_t vl = env->vl;  \
-uint32_t total_elems = riscv_cpu_cfg(env)->vlen;\
+uint32_t total_elems = riscv_cpu_cfg(env)->vlenb << 3;  \
 uint32_t vta_all_1s = vext_vta_all_1s(desc);\
 uint32_t vma = vext_vma(desc);  \
 uint32_t i; \
@@ -3971,7 +3971,7 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1, void 
*vs2,   \
 { \
 uint32_t vm = vext_vm(desc);  \
 uint32_t vl = env->vl;\
-uint32_t total_elems = riscv_cpu_cfg(env)->vlen;  \
+uint32_t total_elems = riscv_cpu_cfg(env)->vlenb << 3;\
 uint32_t vta_all_1s = vext_vta_all_1s(desc);  \
 uint32_t vma = vext_vma(desc);\
 uint32_t i;   \
@@ -4011,7 +4011,7 @@ void HELPER(NAME)(void *vd, void *v0, uint64_t s1, void 
*vs2,   \
 {   \
 uint32_t vm = vext_vm(desc);\
 uint32_t vl = env->vl;  \
-uint32_t total_elems = riscv_cpu_cfg(env)->vlen;\
+uint32_t total_elems = riscv_cpu_cfg(env)->vlenb << 3;  \
 uint32_t vta_all_1s = vext_vta_all_1s(desc);\
 uint32_t vma = vext_vma(desc);  \
 uint32_t i; \
@@ -4528,7 +4528,7 @@ void HELPER(NAME)(void *vd, void *v0, void *vs1,  
\
   uint32_t desc)  \
 { \
 uint32_t vl = env->vl;\
-uint32_t total_elems = riscv_cpu_cfg(env)->vlen;  \
+uint32_t total_elems = riscv_cpu_cfg(env)->vlenb << 3;\
 uint32_t vta_all_1s = vext_vta_all_1s(desc);  \
 uint32_t i; 

[PATCH v3 08/13] target/riscv/vector_helper.c: use vlenb in HELPER(vsetvl)

2024-01-16 Thread Daniel Henrique Barboza
Use the new 'vlenb' CPU config to validate fractional LMUL. The original
comparison is done with 'vlen' and 'sew', both in bits. Adjust the shift
to use vlenb.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
---
 target/riscv/vector_helper.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index cb944229b0..b13be1541a 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -45,9 +45,16 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong 
s1,
 xlen - 1 - R_VTYPE_RESERVED_SHIFT);
 
 if (lmul & 4) {
-/* Fractional LMUL - check LMUL * VLEN >= SEW */
+/*
+ * Fractional LMUL, check:
+ *
+ * VLEN * LMUL >= SEW
+ * VLEN >> (8 - lmul) >= sew
+ * (vlenb << 3) >> (8 - lmul) >= sew
+ * vlenb >> (8 - 3 - lmul) >= sew
+ */
 if (lmul == 4 ||
-cpu->cfg.vlen >> (8 - lmul) < sew) {
+cpu->cfg.vlenb >> (8 - 3 - lmul) < sew) {
 vill = true;
 }
 }
-- 
2.43.0




[PATCH v3 03/13] target/riscv/gdbstub.c: use 'vlenb' instead of shifting 'vlen'

2024-01-16 Thread Daniel Henrique Barboza
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
---
 target/riscv/gdbstub.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
index 58b3ace0fe..5ab0abda19 100644
--- a/target/riscv/gdbstub.c
+++ b/target/riscv/gdbstub.c
@@ -130,7 +130,7 @@ static int riscv_gdb_set_fpu(CPURISCVState *env, uint8_t 
*mem_buf, int n)
 
 static int riscv_gdb_get_vector(CPURISCVState *env, GByteArray *buf, int n)
 {
-uint16_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
+uint16_t vlenb = riscv_cpu_cfg(env)->vlenb;
 if (n < 32) {
 int i;
 int cnt = 0;
@@ -146,7 +146,7 @@ static int riscv_gdb_get_vector(CPURISCVState *env, 
GByteArray *buf, int n)
 
 static int riscv_gdb_set_vector(CPURISCVState *env, uint8_t *mem_buf, int n)
 {
-uint16_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
+uint16_t vlenb = riscv_cpu_cfg(env)->vlenb;
 if (n < 32) {
 int i;
 for (i = 0; i < vlenb; i += 8) {
@@ -266,7 +266,7 @@ static int ricsv_gen_dynamic_vector_xml(CPUState *cs, int 
base_reg)
 RISCVCPU *cpu = RISCV_CPU(cs);
 GString *s = g_string_new(NULL);
 g_autoptr(GString) ts = g_string_new("");
-int reg_width = cpu->cfg.vlen;
+int reg_width = cpu->cfg.vlenb << 3;
 int num_regs = 0;
 int i;
 
-- 
2.43.0




[PATCH v3 02/13] target/riscv/csr.c: use 'vlenb' instead of 'vlen'

2024-01-16 Thread Daniel Henrique Barboza
As a bonus, we're being more idiomatic using cpu->cfg.vlenb when
reading CSR_VLENB.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
---
 target/riscv/csr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 674ea075a4..5c8d22452b 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -683,7 +683,7 @@ static RISCVException read_vl(CPURISCVState *env, int csrno,
 
 static int read_vlenb(CPURISCVState *env, int csrno, target_ulong *val)
 {
-*val = riscv_cpu_cfg(env)->vlen >> 3;
+*val = riscv_cpu_cfg(env)->vlenb;
 return RISCV_EXCP_NONE;
 }
 
@@ -738,7 +738,7 @@ static RISCVException write_vstart(CPURISCVState *env, int 
csrno,
  * The vstart CSR is defined to have only enough writable bits
  * to hold the largest element index, i.e. lg2(VLEN) bits.
  */
-env->vstart = val & ~(~0ULL << ctzl(riscv_cpu_cfg(env)->vlen));
+env->vstart = val & ~(~0ULL << ctzl(riscv_cpu_cfg(env)->vlenb << 3));
 return RISCV_EXCP_NONE;
 }
 
-- 
2.43.0




[PATCH v3 01/13] target/riscv: add 'vlenb' field in cpu->cfg

2024-01-16 Thread Daniel Henrique Barboza
Our usage of 'vlenb' is overwhelming superior than the use of 'vlen'.
We're using 'vlenb' most of the time, having to do 'vlen >> 3' or
'vlen / 8' in every instance.

In hindsight we would be better if the 'vlenb' property  was introduced
instead of 'vlen'. That's not what happened, and now we can't easily get
rid of it due to user scripts all around. What we can do, however, is to
change our internal representation to use 'vlenb'.

Add a 'vlenb' field in cpu->cfg. It'll be set via the existing 'vlen'
property, i.e. setting 'vlen' will also set 'vlenb'.

We'll replace all 'vlen >> 3' code to use 'vlenb' directly. Start with
the single instance we have in target/riscv/cpu.c.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
---
 target/riscv/cpu.c | 4 +++-
 target/riscv/cpu_cfg.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8d3ec74a1c..f4261d2ffc 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -847,7 +847,7 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int 
flags)
  csr_ops[csrno].name, val);
 }
 }
-uint16_t vlenb = cpu->cfg.vlen >> 3;
+uint16_t vlenb = cpu->cfg.vlenb;
 
 for (i = 0; i < 32; i++) {
 qemu_fprintf(f, " %-8s ", riscv_rvv_regnames[i]);
@@ -1314,6 +1314,7 @@ static void riscv_cpu_init(Object *obj)
 /* Default values for non-bool cpu properties */
 cpu->cfg.pmu_mask = MAKE_64BIT_MASK(3, 16);
 cpu->cfg.vlen = 128;
+cpu->cfg.vlenb = 128 >> 3;
 cpu->cfg.elen = 64;
 cpu->env.vext_ver = VEXT_VERSION_1_00_0;
 }
@@ -1810,6 +1811,7 @@ static void prop_vlen_set(Object *obj, Visitor *v, const 
char *name,
 
 cpu_option_add_user_setting(name, value);
 cpu->cfg.vlen = value;
+cpu->cfg.vlenb = value >> 3;
 }
 
 static void prop_vlen_get(Object *obj, Visitor *v, const char *name,
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index fea14c275f..50479dd72f 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -140,6 +140,7 @@ struct RISCVCPUConfig {
 
 uint32_t pmu_mask;
 uint16_t vlen;
+uint16_t vlenb;
 uint16_t elen;
 uint16_t cbom_blocksize;
 uint16_t cbop_blocksize;
-- 
2.43.0




[PATCH v3 09/13] target/riscv/insn_trans/trans_rvv.c.inc: use 'vlenb' in MAXSZ()

2024-01-16 Thread Daniel Henrique Barboza
Calculate the maximum vector size possible, 'max_sz', which is the size
in bytes 'vlenb' multiplied by the max value of LMUL (LMUL = 8, when
s->lmul = 3).

'max_sz' is then shifted right by 'scale', expressed as '3 - s->lmul',
which is clearer than doing 'scale = lmul - 3' and then using '-scale'
in the shift right.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvv.c.inc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index d743675262..b4663b6e1f 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -1160,12 +1160,12 @@ GEN_LDST_WHOLE_TRANS(vs8r_v, 8, 1, true)
 /*
  * MAXSZ returns the maximum vector size can be operated in bytes,
  * which is used in GVEC IR when vl_eq_vlmax flag is set to true
- * to accerlate vector operation.
+ * to accelerate vector operation.
  */
 static inline uint32_t MAXSZ(DisasContext *s)
 {
-int scale = s->lmul - 3;
-return s->cfg_ptr->vlen >> -scale;
+int max_sz = s->cfg_ptr->vlenb * 8;
+return max_sz >> (3 - s->lmul);
 }
 
 static bool opivv_check(DisasContext *s, arg_rmrr *a)
-- 
2.43.0




Re: [PATCH V2 11/11] vfio: allow cpr-reboot migration if suspended

2024-01-16 Thread Steven Sistare



On 1/16/2024 3:37 PM, Steven Sistare wrote:
> On 1/15/2024 2:33 AM, Peter Xu wrote:
>> On Fri, Jan 12, 2024 at 07:05:10AM -0800, Steve Sistare wrote:
>>> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
>>> guest drivers' suspend methods flush outstanding requests and re-initialize
>>> the devices, and thus there is no device state to save and restore.  The
>>> user is responsible for suspending the guest before initiating cpr, such as
>>> by issuing guest-suspend-ram to the qemu guest agent.
>>>
>>> Relax the vfio blocker so it does not apply to cpr, and add a notifier that
>>> verifies the guest is suspended.  Skip dirty page tracking, which is N/A for
>>> cpr, to avoid ioctl errors.
>>>
>>> Signed-off-by: Steve Sistare 
>>> ---
>>>  hw/vfio/common.c  |  2 +-
>>>  hw/vfio/cpr.c | 20 
>>>  hw/vfio/migration.c   |  2 +-
>>>  include/hw/vfio/vfio-common.h |  1 +
>>>  migration/ram.c   |  9 +
>>>  5 files changed, 28 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 0b3352f..09af934 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice 
>>> *vbasedev, Error **errp)
>>>  error_setg(_devices_migration_blocker,
>>> "Multiple VFIO devices migration is supported only if all 
>>> of "
>>> "them support P2P migration");
>>> -ret = migrate_add_blocker(_devices_migration_blocker, errp);
>>> +ret = migrate_add_blocker_normal(_devices_migration_blocker, 
>>> errp);
>>>  
>>>  return ret;
>>>  }
>>> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
>>> index bbd1c7a..9f4b1fe 100644
>>> --- a/hw/vfio/cpr.c
>>> +++ b/hw/vfio/cpr.c
>>> @@ -7,13 +7,33 @@
>>>  
>>>  #include "qemu/osdep.h"
>>>  #include "hw/vfio/vfio-common.h"
>>> +#include "migration/misc.h"
>>>  #include "qapi/error.h"
>>> +#include "sysemu/runstate.h"
>>> +
>>> +static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
>>> +MigrationEvent *e, Error **errp)
>>> +{
>>> +if (e->state == MIGRATION_STATUS_SETUP &&
>>> +!runstate_check(RUN_STATE_SUSPENDED)) {
>>> +
>>> +error_setg(errp,
>>> +"VFIO device only supports cpr-reboot for runstate suspended");
>>> +
>>> +return -1;
>>> +}
>>
>> What happens if the guest is suspended during SETUP, but then quickly waked
>> up before CPR migration completes?
> 
> That is a management layer bug -- we told them the VM must be suspended.
> However, I will reject a wakeup request if migration is active and mode is 
> cpr.
> 
>>> +return 0;
>>> +}
>>>  
>>>  int vfio_cpr_register_container(VFIOContainer *container, Error **errp)
>>>  {
>>> +migration_add_notifier_mode(>cpr_reboot_notifier,
>>> +vfio_cpr_reboot_notifier,
>>> +MIG_MODE_CPR_REBOOT);
>>>  return 0;
>>>  }
>>>  
>>>  void vfio_cpr_unregister_container(VFIOContainer *container)
>>>  {
>>> +migration_remove_notifier(>cpr_reboot_notifier);
>>>  }
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index 534fddf..488905d 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -895,7 +895,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, 
>>> Error *err, Error **errp)
>>>  vbasedev->migration_blocker = error_copy(err);
>>>  error_free(err);
>>>  
>>> -return migrate_add_blocker(>migration_blocker, errp);
>>> +return migrate_add_blocker_normal(>migration_blocker, errp);
>>>  }
>>>  
>>>  /* -- 
>>> */
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 1add5b7..7a46e24 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -78,6 +78,7 @@ struct VFIOGroup;
>>>  typedef struct VFIOContainer {
>>>  VFIOContainerBase bcontainer;
>>>  int fd; /* /dev/vfio/vfio, empowered by the attached groups */
>>> +NotifierWithReturn cpr_reboot_notifier;
>>>  unsigned iommu_type;
>>>  QLIST_HEAD(, VFIOGroup) group_list;
>>>  } VFIOContainer;
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index 1923366..44ad324 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -2392,8 +2392,8 @@ static void ram_save_cleanup(void *opaque)
>>>  RAMState **rsp = opaque;
>>>  RAMBlock *block;
>>>  
>>> -/* We don't use dirty log with background snapshots */
>>> -if (!migrate_background_snapshot()) {
>>> +/* We don't use dirty log with background snapshots or cpr */
>>> +if (!migrate_background_snapshot() && migrate_mode() == 
>>> MIG_MODE_NORMAL) {
>>
>> Same question here, on what happens if the user resumes the VM before
>> migration completes?  IIUC shared-ram is not required, then it means if
>> that 

Re: [PATCH v2 08/12] target/riscv/vector_helper.c: use vlenb in HELPER(vsetvl)

2024-01-16 Thread Daniel Henrique Barboza




On 1/15/24 19:57, Richard Henderson wrote:

On 1/16/24 09:25, Daniel Henrique Barboza wrote:

Use the new 'vlenb' CPU config to validate fractional LMUL. The original
comparison is done with 'vlen' and 'sew', both in bits. Adjust the shift
to use vlenb.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/vector_helper.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index cb944229b0..9e3ae4b5d3 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -45,9 +45,13 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, target_ulong 
s1,
  xlen - 1 - 
R_VTYPE_RESERVED_SHIFT);
  if (lmul & 4) {
-    /* Fractional LMUL - check LMUL * VLEN >= SEW */
+    /*
+ * Fractional LMUL: check VLEN * LMUL >= SEW,
+ * or VLEN * (8 - lmul) >= SEW. Using VLENB we
+ * need 3 less shifts rights.


The last sentence is structured oddly.  Perhaps

   Using VLENB, we decrease the right shift by 3

or perhaps just show the expansion:

/*
  * Fractional LMUL, check
  *
  *    VLEN * LMUL >= SEW
  *    VLEN >> (8 - lmul) >= sew
  *    (vlenb << 3) >> (8 - lmul) >= sew
  *    vlenb >> (8 - 3 - lmul) >= sew
  */


Just changed the comment to show the expansion. Thanks,


Daniel



Anyway,
Reviewed-by: Richard Henderson 

r~



+ */
  if (lmul == 4 ||
-    cpu->cfg.vlen >> (8 - lmul) < sew) {
+    cpu->cfg.vlenb >> (8 - 3 - lmul) < sew) {
  vill = true;
  }
  }






Re: [PATCH V2 11/11] vfio: allow cpr-reboot migration if suspended

2024-01-16 Thread Steven Sistare
On 1/15/2024 2:33 AM, Peter Xu wrote:
> On Fri, Jan 12, 2024 at 07:05:10AM -0800, Steve Sistare wrote:
>> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
>> guest drivers' suspend methods flush outstanding requests and re-initialize
>> the devices, and thus there is no device state to save and restore.  The
>> user is responsible for suspending the guest before initiating cpr, such as
>> by issuing guest-suspend-ram to the qemu guest agent.
>>
>> Relax the vfio blocker so it does not apply to cpr, and add a notifier that
>> verifies the guest is suspended.  Skip dirty page tracking, which is N/A for
>> cpr, to avoid ioctl errors.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  hw/vfio/common.c  |  2 +-
>>  hw/vfio/cpr.c | 20 
>>  hw/vfio/migration.c   |  2 +-
>>  include/hw/vfio/vfio-common.h |  1 +
>>  migration/ram.c   |  9 +
>>  5 files changed, 28 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 0b3352f..09af934 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -128,7 +128,7 @@ int vfio_block_multiple_devices_migration(VFIODevice 
>> *vbasedev, Error **errp)
>>  error_setg(_devices_migration_blocker,
>> "Multiple VFIO devices migration is supported only if all of 
>> "
>> "them support P2P migration");
>> -ret = migrate_add_blocker(_devices_migration_blocker, errp);
>> +ret = migrate_add_blocker_normal(_devices_migration_blocker, 
>> errp);
>>  
>>  return ret;
>>  }
>> diff --git a/hw/vfio/cpr.c b/hw/vfio/cpr.c
>> index bbd1c7a..9f4b1fe 100644
>> --- a/hw/vfio/cpr.c
>> +++ b/hw/vfio/cpr.c
>> @@ -7,13 +7,33 @@
>>  
>>  #include "qemu/osdep.h"
>>  #include "hw/vfio/vfio-common.h"
>> +#include "migration/misc.h"
>>  #include "qapi/error.h"
>> +#include "sysemu/runstate.h"
>> +
>> +static int vfio_cpr_reboot_notifier(NotifierWithReturn *notifier,
>> +MigrationEvent *e, Error **errp)
>> +{
>> +if (e->state == MIGRATION_STATUS_SETUP &&
>> +!runstate_check(RUN_STATE_SUSPENDED)) {
>> +
>> +error_setg(errp,
>> +"VFIO device only supports cpr-reboot for runstate suspended");
>> +
>> +return -1;
>> +}
> 
> What happens if the guest is suspended during SETUP, but then quickly waked
> up before CPR migration completes?

That is a management layer bug -- we told them the VM must be suspended.
However, I will reject a wakeup request if migration is active and mode is cpr.

>> +return 0;
>> +}
>>  
>>  int vfio_cpr_register_container(VFIOContainer *container, Error **errp)
>>  {
>> +migration_add_notifier_mode(>cpr_reboot_notifier,
>> +vfio_cpr_reboot_notifier,
>> +MIG_MODE_CPR_REBOOT);
>>  return 0;
>>  }
>>  
>>  void vfio_cpr_unregister_container(VFIOContainer *container)
>>  {
>> +migration_remove_notifier(>cpr_reboot_notifier);
>>  }
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 534fddf..488905d 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -895,7 +895,7 @@ static int vfio_block_migration(VFIODevice *vbasedev, 
>> Error *err, Error **errp)
>>  vbasedev->migration_blocker = error_copy(err);
>>  error_free(err);
>>  
>> -return migrate_add_blocker(>migration_blocker, errp);
>> +return migrate_add_blocker_normal(>migration_blocker, errp);
>>  }
>>  
>>  /* -- */
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 1add5b7..7a46e24 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -78,6 +78,7 @@ struct VFIOGroup;
>>  typedef struct VFIOContainer {
>>  VFIOContainerBase bcontainer;
>>  int fd; /* /dev/vfio/vfio, empowered by the attached groups */
>> +NotifierWithReturn cpr_reboot_notifier;
>>  unsigned iommu_type;
>>  QLIST_HEAD(, VFIOGroup) group_list;
>>  } VFIOContainer;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 1923366..44ad324 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2392,8 +2392,8 @@ static void ram_save_cleanup(void *opaque)
>>  RAMState **rsp = opaque;
>>  RAMBlock *block;
>>  
>> -/* We don't use dirty log with background snapshots */
>> -if (!migrate_background_snapshot()) {
>> +/* We don't use dirty log with background snapshots or cpr */
>> +if (!migrate_background_snapshot() && migrate_mode() == 
>> MIG_MODE_NORMAL) {
> 
> Same question here, on what happens if the user resumes the VM before
> migration completes?  IIUC shared-ram is not required, then it means if
> that happens the cpr migration image can contain corrupted data, and that
> may be a problem.
> 
> Background snapshot is special in that it relies on totally different
> tracking facilities 

Re: [PATCH V2 04/11] migration: remove migration_in_postcopy parameter

2024-01-16 Thread Steven Sistare
On 1/15/2024 1:48 AM, Peter Xu wrote:
> On Fri, Jan 12, 2024 at 07:05:03AM -0800, Steve Sistare wrote:
>>  bool migration_in_incoming_postcopy(void)
>> diff --git a/ui/spice-core.c b/ui/spice-core.c
>> index b3cd229..e43a93f 100644
>> --- a/ui/spice-core.c
>> +++ b/ui/spice-core.c
>> @@ -580,7 +580,7 @@ static int migration_state_notifier(NotifierWithReturn 
>> *notifier,
>>  if (migration_in_setup(s)) {
>>  spice_server_migrate_start(spice_server);
>>  } else if (migration_has_finished(s) ||
>> -   migration_in_postcopy_after_devices(s)) {
>> +   migration_in_postcopy_after_devices()) {
> 
> This can be a reply also to your other email: my previous suggestion of
> using PRECOPY_DONE should apply here, where we can convert this chunk into:
> 
>   } else if (event == MIG_EVENT_PRECOPY_DONE) {...}
> 
> Because PRECOPY_DONE should also cover the notification from
> postcopy_start(), then we can drop migration_in_postcopy_after_devices()
> completely, I think.

Yes, that works.  I will define and use MIG_EVENT_PRECOPY_DONE and friends in 
"MigrationEvent for notifiers".

- Steve



Re: [PATCH V2 00/11] allow cpr-reboot for vfio

2024-01-16 Thread Steven Sistare
On 1/12/2024 4:38 PM, Alex Williamson wrote:
> On Fri, 12 Jan 2024 07:04:59 -0800
> Steve Sistare  wrote:
> 
>> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
>> guest drivers' suspend methods flush outstanding requests and re-initialize
>> the devices, and thus there is no device state to save and restore.  The
>> user is responsible for suspending the guest before initiating cpr, such as
>> by issuing guest-suspend-ram to the qemu guest agent.
>>
>> Most of the patches in this series enhance migration notifiers so they can
>> return an error status and message.  The last two patches register a notifier
>> for vfio that returns an error if the guest is not suspended.
> 
> Hi Steve,
> 
> This appears to only support legacy container mode and not cdev/iommufd
> mode.  Shouldn't this support both?  Thanks,

My bad, thanks! I'll move cpr_reboot_notifier to the base container, and also 
call
cpr register/unregister from iommufd_cdev_attach/iommufd_cdev_detach.

- Steve

>> Steve Sistare (11):
>>   notify: pass error to notifier with return
>>   migration: remove error from notifier data
>>   migration: convert to NotifierWithReturn
>>   migration: remove migration_in_postcopy parameter
>>   migration: MigrationEvent for notifiers
>>   migration: MigrationNotifyFunc
>>   migration: per-mode notifiers
>>   migration: refactor migrate_fd_connect failures
>>   migration: notifier error checking
>>   vfio: register container for cpr
>>   vfio: allow cpr-reboot migration if suspended
>>
>>  hw/net/virtio-net.c|  14 ++---
>>  hw/vfio/common.c   |   2 +-
>>  hw/vfio/container.c|  11 +++-
>>  hw/vfio/cpr.c  |  39 ++
>>  hw/vfio/meson.build|   1 +
>>  hw/vfio/migration.c|  13 +++--
>>  hw/virtio/vhost-user.c |  10 ++--
>>  hw/virtio/virtio-balloon.c |   3 +-
>>  include/hw/vfio/vfio-common.h  |   6 ++-
>>  include/hw/virtio/virtio-net.h |   2 +-
>>  include/migration/misc.h   |  21 +---
>>  include/qemu/notify.h  |   7 ++-
>>  migration/migration.c  | 117 
>> +++--
>>  migration/postcopy-ram.c   |   3 +-
>>  migration/postcopy-ram.h   |   1 -
>>  migration/ram.c|  12 ++---
>>  net/vhost-vdpa.c   |  15 +++---
>>  ui/spice-core.c|  19 +++
>>  util/notify.c  |   5 +-
>>  19 files changed, 206 insertions(+), 95 deletions(-)
>>  create mode 100644 hw/vfio/cpr.c
>>
> 



Re: [PATCH V2 03/11] migration: convert to NotifierWithReturn

2024-01-16 Thread Steven Sistare
On 1/15/2024 1:44 AM, Peter Xu wrote:
> On Fri, Jan 12, 2024 at 07:05:02AM -0800, Steve Sistare wrote:
>> Change all migration notifiers to type NotifierWithReturn, so notifiers
>> can return an error status in a future patch.  For now, pass NULL for the
>> notifier error parameter, and do not check the return value.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  hw/net/virtio-net.c|  4 +++-
>>  hw/vfio/migration.c|  4 +++-
>>  include/hw/vfio/vfio-common.h  |  2 +-
>>  include/hw/virtio/virtio-net.h |  2 +-
>>  include/migration/misc.h   |  6 +++---
>>  migration/migration.c  | 16 
>>  net/vhost-vdpa.c   |  6 --
>>  ui/spice-core.c|  8 +---
>>  8 files changed, 28 insertions(+), 20 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 7a2846f..9570353 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -3529,11 +3529,13 @@ static void 
>> virtio_net_handle_migration_primary(VirtIONet *n, MigrationState *s)
>>  }
>>  }
>>  
>> -static void virtio_net_migration_state_notifier(Notifier *notifier, void 
>> *data)
>> +static int virtio_net_migration_state_notifier(NotifierWithReturn *notifier,
>> +   void *data, Error **errp)
>>  {
>>  MigrationState *s = data;
>>  VirtIONet *n = container_of(notifier, VirtIONet, migration_state);
>>  virtio_net_handle_migration_primary(n, s);
>> +return 0;
>>  }
> 
> I should have mentioned this earlier.. we have a trend recently to modify
> retval to boolean when Error** existed, e.g.:
> 
> https://lore.kernel.org/all/20231017202633.296756-5-pet...@redhat.com/
> 
> Let's start using boolean too here?  Previous patches may need touch-ups
> too for this.
> 
> Other than that it all looks good here.  Thanks,

Boolean makes sense when there is only one way to recover from failure.
However, when the notifier may fail in different ways, and recovery differs
for each, then the function should return an int errno.  NotifierWithReturn
could have future uses that need multiple return values and multiple recovery 
paths above the notifier_with_return_list_notify level, so IMO the function 
should continue to return int for generality.

- Steve



Re: [PATCH V2 08/11] migration: refactor migrate_fd_connect failures

2024-01-16 Thread Steven Sistare
On 1/15/2024 2:37 AM, Peter Xu wrote:
> On Fri, Jan 12, 2024 at 07:05:07AM -0800, Steve Sistare wrote:
>> Move common code for the error path in migrate_fd_connect to a shared
>> fail label.  No functional change.
>>
>> Signed-off-by: Steve Sistare 
> 
> Reviewed-by: Peter Xu 
> 
> One nitpick below:
> 
>> ---
>>  migration/migration.c | 22 +++---
>>  1 file changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index e9914aa..c828ba7 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3549,6 +3549,7 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  Error *local_err = NULL;
>>  uint64_t rate_limit;
>>  bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
>> +int expect_state = s->state;
> 
> IMHO we can drop this.
> 
>>  
>>  /*
>>   * If there's a previous error, free it and prepare for another one.
>> @@ -3603,11 +3604,7 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  if (migrate_postcopy_ram() || migrate_return_path()) {
>>  if (open_return_path_on_source(s)) {
>>  error_setg(_err, "Unable to open return-path for 
>> postcopy");
>> -migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED);
>> -migrate_set_error(s, local_err);
>> -error_report_err(local_err);
>> -migrate_fd_cleanup(s);
>> -return;
>> +goto fail;
>>  }
>>  }
>>  
>> @@ -3629,12 +3626,8 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  }
>>  
>>  if (multifd_save_setup(_err) != 0) {
>> -migrate_set_error(s, local_err);
>> -error_report_err(local_err);
>> -migrate_set_state(>state, MIGRATION_STATUS_SETUP,
>> -  MIGRATION_STATUS_FAILED);
>> -migrate_fd_cleanup(s);
>> -return;
>> +expect_state = MIGRATION_STATUS_SETUP;
> 
> Then drop this.
> 
>> +goto fail;
>>  }
>>  
>>  if (migrate_background_snapshot()) {
>> @@ -3645,6 +3638,13 @@ void migrate_fd_connect(MigrationState *s, Error 
>> *error_in)
>>  migration_thread, s, QEMU_THREAD_JOINABLE);
>>  }
>>  s->migration_thread_running = true;
>> +return;
>> +
>> +fail:
>> +migrate_set_error(s, local_err);
>> +migrate_set_state(>state, expect_state, MIGRATION_STATUS_FAILED);
> 
> Then constantly use s->state.

Yes, if you are OK with it, I also prefer to simplify here.

- Steve

>> +error_report_err(local_err);
>> +migrate_fd_cleanup(s);
>>  }
>>  
>>  static void migration_class_init(ObjectClass *klass, void *data)
>> -- 
>> 1.8.3.1
>>
> 



Re: [RFC PATCH v3 18/30] migration/multifd: Allow receiving pages without packets

2024-01-16 Thread Fabiano Rosas
Peter Xu  writes:

> On Mon, Nov 27, 2023 at 05:26:00PM -0300, Fabiano Rosas wrote:
>> Currently multifd does not need to have knowledge of pages on the
>> receiving side because all the information needed is within the
>> packets that come in the stream.
>> 
>> We're about to add support to fixed-ram migration, which cannot use
>> packets because it expects the ramblock section in the migration file
>> to contain only the guest pages data.
>> 
>> Add a data structure to transfer pages between the ram migration code
>> and the multifd receiving threads.
>> 
>> We don't want to reuse MultiFDPages_t for two reasons:
>> 
>> a) multifd threads don't really need to know about the data they're
>>receiving.
>> 
>> b) the receiving side has to be stopped to load the pages, which means
>>we can experiment with larger granularities than page size when
>>transferring data.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>> - stopped using MultiFDPages_t and added a new structure which can
>>   take offset + size
>> ---
>>  migration/multifd.c | 122 ++--
>>  migration/multifd.h |  20 
>>  2 files changed, 138 insertions(+), 4 deletions(-)
>> 
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index c1381bdc21..7dfab2367a 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -142,17 +142,36 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
>>  static int nocomp_recv_data(MultiFDRecvParams *p, Error **errp)
>>  {
>>  uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
>> +ERRP_GUARD();
>>  
>>  if (flags != MULTIFD_FLAG_NOCOMP) {
>>  error_setg(errp, "multifd %u: flags received %x flags expected %x",
>> p->id, flags, MULTIFD_FLAG_NOCOMP);
>>  return -1;
>>  }
>> -for (int i = 0; i < p->normal_num; i++) {
>> -p->iov[i].iov_base = p->host + p->normal[i];
>> -p->iov[i].iov_len = p->page_size;
>> +
>> +if (!migrate_multifd_packets()) {
>> +MultiFDRecvData *data = p->data;
>> +size_t ret;
>> +
>> +ret = qio_channel_pread(p->c, (char *) data->opaque,
>> +data->size, data->file_offset, errp);
>> +if (ret != data->size) {
>> +error_prepend(errp,
>> +  "multifd recv (%u): read 0x%zx, expected 0x%zx",
>> +  p->id, ret, data->size);
>> +return -1;
>> +}
>> +
>> +return 0;
>> +} else {
>> +for (int i = 0; i < p->normal_num; i++) {
>> +p->iov[i].iov_base = p->host + p->normal[i];
>> +p->iov[i].iov_len = p->page_size;
>> +}
>> +
>> +return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
>>  }
>
> I guess you managed to squash the file loads into "no compression" handler
> of multifd, but IMHO it's not as clean.
>
> Firstly, if to do so, we'd better make sure multifd-compression is not
> enabled anywhere together with fixed-ram.  I didn't yet see such protection
> in the series.  I think if it happens we should expect crashes because
> they'll go into zlib/zstd paths for the file.

Yes, we need some checks around this.

>
> IMHO the only model fixed-ram can share with multifd is the task management
> part, mutexes, semaphores, etc..

AFAIU, that's what multifd *is*. Compression would just be another
client of this task management code. This "nocomp" thing always felt off
to me.

> IIRC I used to mention that it'll be nice
> if we have simply a pool of threads so we can enqueue tasks.

Right, I don't disagree. However I don't think we can just show up with
a thread pool and start moving stuff into it. I think the safest way to
do this is to:

1- Adapt multifd so that the client code is the sole responsible for the
   data being sent. No data knowledge by the multifd thread.

   With this, nothing should need to touch multifd threads code
   anymore. New clients just define their methods and prepare the data
   as they please.

2- Move everything that is left into multifd. Zero pages, postcopy, etc.

With 1 and 2 we'd have a pretty good picture of what kinds of operations
we need to do and what are the requirements for the thread
infrastructure.

3- Try to use existing abstractions within QEMU to replace
   multifd. Write from scratch if none are suitable.

What do you think? We could put an action plan in place and start
picking at it. My main concern is about what sorts of hidden assumptions
are present in the current code that we'd start discovering if we just
replaced it with something new.

> If that's too
> far away, would something like below closer to that?  What I'm thinking:
>
>   - patch 1: rename MultiFDMethods to MultiFDCompressMethods, this can
> replace the other patch to do s/recv_pages/recv_data/
>
>   - patch 2: introduce MultiFDMethods (on top of MultiFDCompressMethods),
> refactor the current code to provide the socket 

Re: [PATCH v10 4/9] hw/fsi: Introduce IBM's FSI master

2024-01-16 Thread Ninad Palsule

Hello Cedric,




[ clg: - move FSICFAMState object under FSIMasterState
    - introduced fsi_master_init()
    - reworked fsi_master_realize()
    - dropped FSIBus definition ]


Move the list down before my S-o-b.

Done.



Signed-off-by: Andrew Jeffery 
Reviewed-by: Joel Stanley 


I think you can drop Joel's reviews, the code has changed a lot.

Removed Joel's review tag.



+static void fsi_master_reset(DeviceState *dev)
+{
+    FSIMasterState *s = FSI_MASTER(dev);
+    int i;
+
+    /* Initialize registers */
+    for (i = 0; i < FSI_MASTER_NR_REGS; i++) {
+    s->regs[i] = 0;
+    }


memset(s->regs, 0, sizeof(s->regs));


Replaced for loop with memset.

Thanks for the review.

Regards,

Ninad





Re: [PATCH v10 7/9] hw/fsi: Added qtest

2024-01-16 Thread Ninad Palsule

Hi Cedric,

On 1/12/24 10:02, Cédric Le Goater wrote:

On 1/11/24 00:15, Ninad Palsule wrote:

Added basic qtests for FSI model.

Acked-by: Thomas Huth 
Signed-off-by: Cédric Le Goater 


Please remove mu S-o-b.


Removed.

Thanks for the review.

Regards,

Ninad





Re: [PATCH v10 8/9] hw/fsi: Added FSI documentation

2024-01-16 Thread Ninad Palsule

Hello Cedric,

On 1/12/24 10:02, Cédric Le Goater wrote:

On 1/11/24 00:15, Ninad Palsule wrote:

Documentation for IBM FSI model.

Signed-off-by: Cédric Le Goater 


Please remove mu S-o-b.


Removed.

Thanks for the review.

Regards,

Ninad





Re: [PATCH v10 9/9] hw/fsi: Update MAINTAINER list

2024-01-16 Thread Ninad Palsule

Hello Cedric,

On 1/12/24 10:03, Cédric Le Goater wrote:

On 1/11/24 00:15, Ninad Palsule wrote:

Added maintainer for IBM FSI model

Signed-off-by: Cédric Le Goater 


Please remove my S-o-b.

Removed.




  F: tests/qtest/isl_pmbus_vr-test.c
  +FSI
+M: Ninad Palsule 


and add me as Reviewer.


Added you as a reviewer.

Thanks for the review.

Regards,

Ninad








[PATCH 0/3] monitor: only run coroutine commands in qemu_aio_context

2024-01-16 Thread Stefan Hajnoczi
Several bugs have been reported related to how QMP commands are rescheduled in
qemu_aio_context:
- https://gitlab.com/qemu-project/qemu/-/issues/1933
- https://issues.redhat.com/browse/RHEL-17369
- https://bugzilla.redhat.com/show_bug.cgi?id=2215192
- https://bugzilla.redhat.com/show_bug.cgi?id=2214985

The first instance of the bug interacted with drain_call_rcu() temporarily
dropping the BQL and resulted in vCPU threads entering device emulation code
simultaneously (something that should never happen). I set out to make
drain_call_rcu() safe to use in this environment, but Paolo and Kevin discussed
the possibility of avoiding rescheduling the monitor_qmp_dispatcher_co()
coroutine for non-coroutine commands. This would prevent monitor commands from
running during vCPU thread aio_poll() entirely and addresses the root cause.

This patch series implements this idea. qemu-iotests is sensitive to the exact
order in which QMP events and responses are emitted. Running QMP handlers in
the iohandler AioContext causes some QMP events to be ordered differently than
before. It is therefore necessary to adjust the reference output in many test
cases. The actual QMP code change is small and everything else is just to make
qemu-iotests happy.

If you have bugs related to the same issue, please retest them with these
patches. Thanks!

Stefan Hajnoczi (3):
  iotests: add filter_qmp_generated_node_ids()
  iotests: port 141 to Python for reliable QMP testing
  monitor: only run coroutine commands in qemu_aio_context

 monitor/qmp.c |  17 -
 qapi/qmp-dispatch.c   |  24 +-
 tests/qemu-iotests/060.out|   4 +-
 tests/qemu-iotests/071.out|   4 +-
 tests/qemu-iotests/081.out|  16 +-
 tests/qemu-iotests/087.out|  12 +-
 tests/qemu-iotests/108.out|   2 +-
 tests/qemu-iotests/109|   4 +-
 tests/qemu-iotests/109.out|  78 ++---
 tests/qemu-iotests/117.out|   2 +-
 tests/qemu-iotests/120.out|   2 +-
 tests/qemu-iotests/127.out|   2 +-
 tests/qemu-iotests/140.out|   2 +-
 tests/qemu-iotests/141| 297 +++---
 tests/qemu-iotests/141.out| 190 +++
 tests/qemu-iotests/143.out|   2 +-
 tests/qemu-iotests/156.out|   2 +-
 tests/qemu-iotests/176.out|  16 +-
 tests/qemu-iotests/182.out|   2 +-
 tests/qemu-iotests/183.out|   4 +-
 tests/qemu-iotests/184.out|  32 +-
 tests/qemu-iotests/185|   6 +-
 tests/qemu-iotests/185.out|  45 ++-
 tests/qemu-iotests/191.out|  16 +-
 tests/qemu-iotests/195.out|  16 +-
 tests/qemu-iotests/223.out|  16 +-
 tests/qemu-iotests/227.out|  32 +-
 tests/qemu-iotests/247.out|   2 +-
 tests/qemu-iotests/273.out|   8 +-
 tests/qemu-iotests/308|   4 +-
 tests/qemu-iotests/308.out|   4 +-
 tests/qemu-iotests/iotests.py |   7 +
 tests/qemu-iotests/tests/file-io-error|   5 +-
 tests/qemu-iotests/tests/iothreads-resize.out |   2 +-
 tests/qemu-iotests/tests/qsd-jobs.out |   4 +-
 35 files changed, 375 insertions(+), 506 deletions(-)

-- 
2.43.0




[PATCH 2/3] iotests: port 141 to Python for reliable QMP testing

2024-01-16 Thread Stefan Hajnoczi
The common.qemu bash functions allow tests to interact with the QMP
monitor of a QEMU process. I spent two days trying to update 141 when
the order of the test output changed, but found it would still fail
occassionally because printf() and QMP events race with synchronous QMP
communication.

I gave up and ported 141 to the existing Python API for QMP tests. The
Python API is less affected by the order in which QEMU prints output
because it does not print all QMP traffic by default.

The next commit changes the order in which QMP messages are received.
Make 141 reliable first.

Cc: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/141 | 297 +++--
 tests/qemu-iotests/141.out | 190 +---
 2 files changed, 161 insertions(+), 326 deletions(-)

diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
index a37030ee17..4248d7c696 100755
--- a/tests/qemu-iotests/141
+++ b/tests/qemu-iotests/141
@@ -1,9 +1,12 @@
-#!/usr/bin/env bash
+#!/usr/bin/env python3
 # group: rw auto quick
 #
 # Test case for ejecting BDSs with block jobs still running on them
 #
-# Copyright (C) 2016 Red Hat, Inc.
+# Originally written in bash by Hanna Czenczek, ported to Python by Stefan
+# Hajnoczi.
+#
+# Copyright Red Hat
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -19,177 +22,119 @@
 # along with this program.  If not, see .
 #
 
-# creator
-owner=hre...@redhat.com
-
-seq="$(basename $0)"
-echo "QA output created by $seq"
-
-status=1   # failure is the default!
-
-_cleanup()
-{
-_cleanup_qemu
-_cleanup_test_img
-for img in "$TEST_DIR"/{b,m,o}.$IMGFMT; do
-_rm_test_img "$img"
-done
-}
-trap "_cleanup; exit \$status" 0 1 2 3 15
-
-# get standard environment, filters and checks
-. ./common.rc
-. ./common.filter
-. ./common.qemu
-
-# Needs backing file and backing format support
-_supported_fmt qcow2 qed
-_supported_proto file
-_supported_os Linux
-
-
-test_blockjob()
-{
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'blockdev-add',
-  'arguments': {
-  'node-name': 'drv0',
-  'driver': '$IMGFMT',
-  'file': {
-  'driver': 'file',
-  'filename': '$TEST_IMG'
-  }}}" \
-'return'
-
-# If "$2" is an event, we may or may not see it before the
-# {"return": {}}.  Therefore, filter the {"return": {}} out both
-# here and in the next command.  (Naturally, if we do not see it
-# here, we will see it before the next command can be executed,
-# so it will appear in the next _send_qemu_cmd's output.)
-_send_qemu_cmd $QEMU_HANDLE \
-"$1" \
-"$2" \
-| _filter_img_create | _filter_qmp_empty_return
-
-# We want this to return an error because the block job is still running
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'blockdev-del',
-  'arguments': {'node-name': 'drv0'}}" \
-'error' | _filter_generated_node_ids | _filter_qmp_empty_return
-
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'block-job-cancel',
-  'arguments': {'device': 'job0'}}" \
-"$3"
-
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'blockdev-del',
-  'arguments': {'node-name': 'drv0'}}" \
-'return'
-}
-
-
-TEST_IMG="$TEST_DIR/b.$IMGFMT" _make_test_img 1M
-TEST_IMG="$TEST_DIR/m.$IMGFMT" _make_test_img -b "$TEST_DIR/b.$IMGFMT" -F 
$IMGFMT 1M
-_make_test_img -b "$TEST_DIR/m.$IMGFMT" 1M -F $IMGFMT
-
-_launch_qemu -nodefaults
-
-_send_qemu_cmd $QEMU_HANDLE \
-"{'execute': 'qmp_capabilities'}" \
-'return'
-
-echo
-echo '=== Testing drive-backup ==='
-echo
-
-# drive-backup will not send BLOCK_JOB_READY by itself, and cancelling the job
-# will consequently result in BLOCK_JOB_CANCELLED being emitted.
-
-test_blockjob \
-"{'execute': 'drive-backup',
-  'arguments': {'job-id': 'job0',
-'device': 'drv0',
-'target': '$TEST_DIR/o.$IMGFMT',
-'format': '$IMGFMT',
-'sync': 'none'}}" \
-'return' \
-'"status": "null"'
-
-echo
-echo '=== Testing drive-mirror ==='
-echo
-
-# drive-mirror will send BLOCK_JOB_READY basically immediately, and cancelling
-# the job will consequently result in BLOCK_JOB_COMPLETED being emitted.
-
-test_blockjob \
-"{'execute': 'drive-mirror',
-  'arguments': {'job-id': 'job0',
-'device': 'drv0',
-'target': '$TEST_DIR/o.$IMGFMT',
-'format': '$IMGFMT',
-'sync': 'none'}}" \
-'BLOCK_JOB_READY' \
-'"status": "null"'
-
-echo
-echo '=== Testing active block-commit ==='
-echo
-
-# An active block-commit will send BLOCK_JOB_READY basically immediately, and
-# cancelling the job will consequently result in 

[PATCH 3/3] monitor: only run coroutine commands in qemu_aio_context

2024-01-16 Thread Stefan Hajnoczi
monitor_qmp_dispatcher_co() runs in the iohandler AioContext that is not
polled during nested event loops. The coroutine currently reschedules
itself in the main loop's qemu_aio_context AioContext, which is polled
during nested event loops. One known problem is that QMP device-add
calls drain_call_rcu(), which temporarily drops the BQL, leading to all
sorts of havoc like other vCPU threads re-entering device emulation code
while another vCPU thread is waiting in device emulation code with
aio_poll().

Paolo Bonzini suggested running non-coroutine QMP handlers in the
iohandler AioContext. This avoids trouble with nested event loops. His
original idea was to move coroutine rescheduling to
monitor_qmp_dispatch(), but I resorted to moving it to qmp_dispatch()
because we don't know if the QMP handler needs to run in coroutine
context in monitor_qmp_dispatch(). monitor_qmp_dispatch() would have
been nicer since it's associated with the monitor implementation and not
as general as qmp_dispatch(), which is also used by qemu-ga.

A number of qemu-iotests need updated .out files because the order of
QMP events vs QMP responses has changed.

Solves Issue #1933.

Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use 
drain_call_rcu in in qmp_device_add")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
Buglink: https://issues.redhat.com/browse/RHEL-17369
Signed-off-by: Stefan Hajnoczi 
---
 monitor/qmp.c | 17 
 qapi/qmp-dispatch.c   | 24 +-
 tests/qemu-iotests/060.out|  4 +-
 tests/qemu-iotests/071.out|  4 +-
 tests/qemu-iotests/081.out| 16 ++--
 tests/qemu-iotests/087.out| 12 +--
 tests/qemu-iotests/108.out|  2 +-
 tests/qemu-iotests/109|  4 +-
 tests/qemu-iotests/109.out| 78 ---
 tests/qemu-iotests/117.out|  2 +-
 tests/qemu-iotests/120.out|  2 +-
 tests/qemu-iotests/127.out|  2 +-
 tests/qemu-iotests/140.out|  2 +-
 tests/qemu-iotests/143.out|  2 +-
 tests/qemu-iotests/156.out|  2 +-
 tests/qemu-iotests/176.out| 16 ++--
 tests/qemu-iotests/182.out|  2 +-
 tests/qemu-iotests/183.out|  4 +-
 tests/qemu-iotests/184.out| 32 
 tests/qemu-iotests/185|  6 +-
 tests/qemu-iotests/185.out| 45 +--
 tests/qemu-iotests/191.out| 16 ++--
 tests/qemu-iotests/195.out| 16 ++--
 tests/qemu-iotests/223.out| 16 ++--
 tests/qemu-iotests/227.out| 32 
 tests/qemu-iotests/247.out|  2 +-
 tests/qemu-iotests/273.out|  8 +-
 tests/qemu-iotests/308|  4 +-
 tests/qemu-iotests/308.out|  4 +-
 tests/qemu-iotests/tests/file-io-error|  5 +-
 tests/qemu-iotests/tests/iothreads-resize.out |  2 +-
 tests/qemu-iotests/tests/qsd-jobs.out |  4 +-
 32 files changed, 207 insertions(+), 180 deletions(-)

diff --git a/monitor/qmp.c b/monitor/qmp.c
index 6eee450fe4..a239945e8d 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -321,14 +321,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
 qemu_coroutine_yield();
 }
 
-/*
- * Move the coroutine from iohandler_ctx to qemu_aio_context for
- * executing the command handler so that it can make progress if it
- * involves an AIO_WAIT_WHILE().
- */
-aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
-qemu_coroutine_yield();
-
 /* Process request */
 if (req_obj->req) {
 if (trace_event_get_state(TRACE_MONITOR_QMP_CMD_IN_BAND)) {
@@ -355,15 +347,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
 }
 
 qmp_request_free(req_obj);
-
-/*
- * Yield and reschedule so the main loop stays responsive.
- *
- * Move back to iohandler_ctx so that nested event loops for
- * qemu_aio_context don't start new monitor commands.
- */
-aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
-qemu_coroutine_yield();
 }
 qatomic_set(_dispatcher_co, NULL);
 }
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 28b6bb..176b549473 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -206,9 +206,31 @@ QDict *coroutine_mixed_fn qmp_dispatch(const 
QmpCommandList *cmds, QObject *requ
 assert(!(oob && qemu_in_coroutine()));
 assert(monitor_cur() == NULL);
 if (!!(cmd->options & QCO_COROUTINE) == qemu_in_coroutine()) {
+if 

[PATCH 1/3] iotests: add filter_qmp_generated_node_ids()

2024-01-16 Thread Stefan Hajnoczi
Add a filter function for QMP responses that contain QEMU's
automatically generated node ids. The ids change between runs and must
be masked in the reference output.

The next commit will use this new function.

Signed-off-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/iotests.py | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index e5c5798c71..ea48af4a7b 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -651,6 +651,13 @@ def _filter(_key, value):
 def filter_generated_node_ids(msg):
 return re.sub("#block[0-9]+", "NODE_NAME", msg)
 
+def filter_qmp_generated_node_ids(qmsg):
+def _filter(_key, value):
+if is_str(value):
+return filter_generated_node_ids(value)
+return value
+return filter_qmp(qmsg, _filter)
+
 def filter_img_info(output: str, filename: str,
 drop_child_info: bool = True) -> str:
 lines = []
-- 
2.43.0




Re: [PATCH] string-output-visitor: Fix (pseudo) struct handling

2024-01-16 Thread Stefan Hajnoczi
On Tue, Jan 09, 2024 at 07:17:17PM +0100, Kevin Wolf wrote:
> Commit ff32bb53 tried to get minimal struct support into the string
> output visitor by just making it return "". Unfortunately, it
> forgot that the caller will still make more visitor calls for the
> content of the struct.
> 
> If the struct is contained in a list, such as IOThreadVirtQueueMapping,
> in the better case its fields show up as separate list entries. In the
> worse case, it contains another list, and the string output visitor
> doesn't support nested lists and asserts that this doesn't happen. So as
> soon as the optional "vqs" field in IOThreadVirtQueueMapping is
> specified, we get a crash.
> 
> This can be reproduced with the following command line:
> 
>   echo "info qtree" | ./qemu-system-x86_64 \
> -object iothread,id=t0 \
> -blockdev null-co,node-name=disk \
> -device '{"driver": "virtio-blk-pci", "drive": "disk",
>   "iothread-vq-mapping": [{"iothread": "t0", "vqs": [0]}]}' \
> -monitor stdio
> 
> Fix the problem by counting the nesting level of structs and ignoring
> any visitor calls for values (apart from start/end_struct) while we're
> not on the top level.
> 
> Fixes: ff32bb53476539d352653f4ed56372dced73a388
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2069
> Reported-by: Aihua Liang 
> Signed-off-by: Kevin Wolf 
> ---
>  qapi/string-output-visitor.c | 46 
>  1 file changed, 46 insertions(+)

Thanks for getting to this before I could:

Reviewed-by: Stefan Hajnoczi 

> diff --git a/qapi/string-output-visitor.c b/qapi/string-output-visitor.c
> index f0c1dea89e..5115536b15 100644
> --- a/qapi/string-output-visitor.c
> +++ b/qapi/string-output-visitor.c
> @@ -65,6 +65,7 @@ struct StringOutputVisitor
>  } range_start, range_end;
>  GList *ranges;
>  void *list; /* Only needed for sanity checking the caller */
> +unsigned int struct_nesting;
>  };
>  
>  static StringOutputVisitor *to_sov(Visitor *v)
> @@ -144,6 +145,10 @@ static bool print_type_int64(Visitor *v, const char 
> *name, int64_t *obj,
>  StringOutputVisitor *sov = to_sov(v);
>  GList *l;
>  
> +if (sov->struct_nesting) {
> +return true;
> +}
> +
>  switch (sov->list_mode) {
>  case LM_NONE:
>  string_output_append(sov, *obj);
> @@ -231,6 +236,10 @@ static bool print_type_size(Visitor *v, const char 
> *name, uint64_t *obj,
>  uint64_t val;
>  char *out, *psize;
>  
> +if (sov->struct_nesting) {
> +return true;
> +}
> +
>  if (!sov->human) {
>  out = g_strdup_printf("%"PRIu64, *obj);
>  string_output_set(sov, out);
> @@ -250,6 +259,11 @@ static bool print_type_bool(Visitor *v, const char 
> *name, bool *obj,
>  Error **errp)
>  {
>  StringOutputVisitor *sov = to_sov(v);
> +
> +if (sov->struct_nesting) {
> +return true;
> +}
> +
>  string_output_set(sov, g_strdup(*obj ? "true" : "false"));
>  return true;
>  }
> @@ -260,6 +274,10 @@ static bool print_type_str(Visitor *v, const char *name, 
> char **obj,
>  StringOutputVisitor *sov = to_sov(v);
>  char *out;
>  
> +if (sov->struct_nesting) {
> +return true;
> +}
> +
>  if (sov->human) {
>  out = *obj ? g_strdup_printf("\"%s\"", *obj) : g_strdup("");
>  } else {
> @@ -273,6 +291,11 @@ static bool print_type_number(Visitor *v, const char 
> *name, double *obj,
>Error **errp)
>  {
>  StringOutputVisitor *sov = to_sov(v);
> +
> +if (sov->struct_nesting) {
> +return true;
> +}
> +
>  string_output_set(sov, g_strdup_printf("%.17g", *obj));
>  return true;
>  }
> @@ -283,6 +306,10 @@ static bool print_type_null(Visitor *v, const char 
> *name, QNull **obj,
>  StringOutputVisitor *sov = to_sov(v);
>  char *out;
>  
> +if (sov->struct_nesting) {
> +return true;
> +}
> +
>  if (sov->human) {
>  out = g_strdup("");
>  } else {
> @@ -295,6 +322,9 @@ static bool print_type_null(Visitor *v, const char *name, 
> QNull **obj,
>  static bool start_struct(Visitor *v, const char *name, void **obj,
>   size_t size, Error **errp)
>  {
> +StringOutputVisitor *sov = to_sov(v);
> +
> +sov->struct_nesting++;
>  return true;
>  }
>  
> @@ -302,6 +332,10 @@ static void end_struct(Visitor *v, void **obj)
>  {
>  StringOutputVisitor *sov = to_sov(v);
>  
> +if (--sov->struct_nesting) {
> +return;
> +}
> +
>  /* TODO actually print struct fields */
>  string_output_set(sov, g_strdup(""));
>  }
> @@ -312,6 +346,10 @@ start_list(Visitor *v, const char *name, GenericList 
> **list, size_t size,
>  {
>  StringOutputVisitor *sov = to_sov(v);
>  
> +if (sov->struct_nesting) {
> +return true;
> +}
> +
>  /* we can't traverse a list in a list */
>  assert(sov->list_mode == LM_NONE);
>  /* We 

Re: [RFC PATCH v3 15/30] io: Add a pwritev/preadv version that takes a discontiguous iovec

2024-01-16 Thread Fabiano Rosas
Peter Xu  writes:

> On Mon, Nov 27, 2023 at 05:25:57PM -0300, Fabiano Rosas wrote:
>> For the upcoming support to fixed-ram migration with multifd, we need
>> to be able to accept an iovec array with non-contiguous data.
>> 
>> Add a pwritev and preadv version that splits the array into contiguous
>> segments before writing. With that we can have the ram code continue
>> to add pages in any order and the multifd code continue to send large
>> arrays for reading and writing.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>> - split the API that was merged into a single function
>> - use uintptr_t for compatibility with 32-bit
>> ---
>>  include/io/channel.h | 26 
>>  io/channel.c | 70 
>>  2 files changed, 96 insertions(+)
>> 
>> diff --git a/include/io/channel.h b/include/io/channel.h
>> index 7986c49c71..25383db5aa 100644
>> --- a/include/io/channel.h
>> +++ b/include/io/channel.h
>> @@ -559,6 +559,19 @@ int qio_channel_close(QIOChannel *ioc,
>>  ssize_t qio_channel_pwritev(QIOChannel *ioc, const struct iovec *iov,
>>  size_t niov, off_t offset, Error **errp);
>>  
>> +/**
>> + * qio_channel_pwritev_all:
>> + * @ioc: the channel object
>> + * @iov: the array of memory regions to write data from
>> + * @niov: the length of the @iov array
>> + * @offset: the iovec offset in the file where to write the data
>> + * @errp: pointer to a NULL-initialized error object
>> + *
>> + * Returns: 0 if all bytes were written, or -1 on error
>> + */
>> +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
>> +size_t niov, off_t offset, Error **errp);
>> +
>>  /**
>>   * qio_channel_pwrite
>>   * @ioc: the channel object
>> @@ -595,6 +608,19 @@ ssize_t qio_channel_pwrite(QIOChannel *ioc, char *buf, 
>> size_t buflen,
>>  ssize_t qio_channel_preadv(QIOChannel *ioc, const struct iovec *iov,
>> size_t niov, off_t offset, Error **errp);
>>  
>> +/**
>> + * qio_channel_preadv_all:
>> + * @ioc: the channel object
>> + * @iov: the array of memory regions to read data to
>> + * @niov: the length of the @iov array
>> + * @offset: the iovec offset in the file from where to read the data
>> + * @errp: pointer to a NULL-initialized error object
>> + *
>> + * Returns: 0 if all bytes were read, or -1 on error
>> + */
>> +int qio_channel_preadv_all(QIOChannel *ioc, const struct iovec *iov,
>> +   size_t niov, off_t offset, Error **errp);
>> +
>>  /**
>>   * qio_channel_pread
>>   * @ioc: the channel object
>> diff --git a/io/channel.c b/io/channel.c
>> index a1f12f8e90..2f1745d052 100644
>> --- a/io/channel.c
>> +++ b/io/channel.c
>> @@ -472,6 +472,69 @@ ssize_t qio_channel_pwritev(QIOChannel *ioc, const 
>> struct iovec *iov,
>>  return klass->io_pwritev(ioc, iov, niov, offset, errp);
>>  }
>>  
>> +static int qio_channel_preadv_pwritev_contiguous(QIOChannel *ioc,
>> + const struct iovec *iov,
>> + size_t niov, off_t offset,
>> + bool is_write, Error 
>> **errp)
>> +{
>> +ssize_t ret = -1;
>> +int i, slice_idx, slice_num;
>> +uintptr_t base, next, file_offset;
>> +size_t len;
>> +
>> +slice_idx = 0;
>> +slice_num = 1;
>> +
>> +/*
>> + * If the iov array doesn't have contiguous elements, we need to
>> + * split it in slices because we only have one (file) 'offset' for
>> + * the whole iov. Do this here so callers don't need to break the
>> + * iov array themselves.
>> + */
>> +for (i = 0; i < niov; i++, slice_num++) {
>> +base = (uintptr_t) iov[i].iov_base;
>> +
>> +if (i != niov - 1) {
>> +len = iov[i].iov_len;
>> +next = (uintptr_t) iov[i + 1].iov_base;
>> +
>> +if (base + len == next) {
>> +continue;
>> +}
>> +}
>> +
>> +/*
>> + * Use the offset of the first element of the segment that
>> + * we're sending.
>> + */
>> +file_offset = offset + (uintptr_t) iov[slice_idx].iov_base;
>> +
>> +if (is_write) {
>> +ret = qio_channel_pwritev(ioc, [slice_idx], slice_num,
>> +  file_offset, errp);
>> +} else {
>> +ret = qio_channel_preadv(ioc, [slice_idx], slice_num,
>> + file_offset, errp);
>> +}
>> +
>> +if (ret < 0) {
>> +break;
>> +}
>> +
>> +slice_idx += slice_num;
>> +slice_num = 0;
>> +}
>> +
>> +return (ret < 0) ? -1 : 0;
>> +}
>> +
>> +int qio_channel_pwritev_all(QIOChannel *ioc, const struct iovec *iov,
>> +size_t niov, off_t offset, Error **errp)
>> +{
>> +return qio_channel_preadv_pwritev_contiguous(ioc, iov, niov,
>> + 

Re: [PATCH] tests/unit/test-qga: do not qualify executable paths

2024-01-16 Thread Konstantin Kostiuk
Reviewed-by: Konstantin Kostiuk 

On Wed, Jan 3, 2024 at 7:34 PM Philippe Mathieu-Daudé 
wrote:

> On 3/1/24 17:51, Samuel Tardieu wrote:
> > guest-exec invocation does not need the full path of the executable to
> > execute. Using only the command names ensures correct execution of the
> > test on systems not adhering to the FHS.
> >
> > Signed-off-by: Samuel Tardieu 
> > ---
> >   tests/unit/test-qga.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
>
> Reviewed-by: Philippe Mathieu-Daudé 
>
>
>


Re: Qemu setting "-cpu host" seems broken with Windows vms

2024-01-16 Thread Paolo Bonzini
On Fri, Dec 29, 2023 at 2:10 PM Stefan Hajnoczi  wrote:
> > First, performance: since some years ago, since prior to qemu 6.2 until
> > latest 8.2, win10 and win11 vms always worked slower than expected. This
> > could be noticed by comparing booting/starting times between vm and a
> > bare metal installation, but I particularly measured it when installing
> > windows cumulative updates through windows update. On vm, from
> > downloading to finishing rebooting it always took 1.5 circa 1.5 hours,
> > while just 40 minutes on bare metal.

One possibility is that you have Hyper-V enabled with -cpu host but
not with other CPU models. That's because "-cpu host" enables nested
virtualization.

Try "-cpu host,-vmx" and it should be clear if that's the case.

Based on the pastie that you prepared, that's the main difference
between -cpu host and -cpu Broadwell-noTSX-IBRS. Nothing else (see
list below) should have any substantial performance impact; even less
so should they make things worse.

Paolo

   "avx512-vp2intersect": true,
   "avx512-vpopcntdq": true,
   "avx512bitalg": true,
   "avx512bw": true,
   "avx512cd": true,
   "avx512dq": true,
   "avx512f": true,
   "avx512ifma": true,
   "avx512vbmi": true,
   "avx512vbmi2": true,
   "avx512vl": true,
   "avx512vnni": true,
   "full-width-write": true,
   "gfni": true,
   "vaes": true,
   "vpclmulqdq": true,

   "clflushopt": true,
   "clwb": true,

   "fsrm": true,

   "host-cache-info": false,
   "host-phys-bits": true,

   "amd-ssbd": true,
   "amd-stibp": true,
   "arch-capabilities": true,
   "ibpb": true,
   "ibrs": true,
   "ibrs-all": true,
   "ssbd": true,
   "stibp": true,

   "kvm-pv-ipi": true,
   "kvm-pv-sched-yield": true,
   "kvm-pv-tlb-flush": true,
   "kvm-pv-unhalt": true,

   "lmce": true,
   "md-clear": true,
   "mds-no": true,
   "movdir64b": true,
   "movdiri": true,
   "pdcm": true,
   "pdpe1gb": true,

   "pdcm": false,
   "pdpe1gb": false,
   "pku": true,
   "pmu": true,
   "pschange-mc-no": true,
   "rdctl-no": true,
   "rdpid": true,
   "sha-ni": true,
   "ss": true,
   "tsc-adjust": true,
   "umip": true,
   "vmx": true,
   "xgetbv1": true,
   "xsavec": true,
   "xsaves": true,

(skipped everything vmx-related, since they don't matter with vmx
itself being false)




Re: [PATCH v2 09/12] target/riscv/cpu.h: use 'vlenb' in vext_get_vlmax()

2024-01-16 Thread Daniel Henrique Barboza




On 1/15/24 20:14, Richard Henderson wrote:

On 1/16/24 09:25, Daniel Henrique Barboza wrote:

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.h | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 11df226a00..7304e478c2 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -692,7 +692,11 @@ static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, 
target_ulong vtype)
  {
  uint8_t sew = FIELD_EX64(vtype, VTYPE, VSEW);
  int8_t lmul = sextract32(FIELD_EX64(vtype, VTYPE, VLMUL), 0, 3);
-    return cpu->cfg.vlen >> (sew + 3 - lmul);
+    /*
+ * vlmax = vlen >> (sew + 3 - lmul). With vlenb,
+ * 3 less shifts: vlenb >> (sew + 3 - 3 - lmul)
+ */
+    return cpu->cfg.vlenb >> (sew - lmul);
  }


I take it back -- this doesn't work without the + 3:

   sew = 0
   lmul = 3

    vlenb >> (0 - 3)
  = vlenb >> -3

Need

   vlen = vlenb << 3
   vlmax = vlen >> (0 + 3 - 3)
     = vlen >> 0
     = vlen


I got tricked by the 'sew' variable name there. This formula is using vsew.

I'll fix and re-send. Thanks,


Daniel





r~






Re: Possible race condition in aspeed ast2600 smp boot on TCG QEMU

2024-01-16 Thread Stephen Longfield
On Mon, Jan 15, 2024 at 12:36 AM Troy Lee  wrote:
>
> Hi Stephen and Cedric,
>
> This issue haven't been found in real platform but sometime happens in
> emulator, e.g. Simic.

Do you have a workaround that you use in other simulators that we
could also try using in QEMU?

> > Adding Aspeed Engineers. This reminds me of a discussion a while ago.
> >
> > On 1/11/24 18:38, Stephen Longfield wrote:
> > > We’ve noticed inconsistent behavior when running a large number of aspeed
> > ast2600 executions, that seems to be tied to a race condition in the smp 
> > boot
> > when executing on TCG-QEMU, and were wondering what a good mediation
> > strategy might be.
> > >
> > > The problem first shows up as part of SMP boot. On a run that’s likely to
> > later run into issues, we’ll see something like:
> > >
> > > ```
> > > [0.008350] smp: Bringing up secondary CPUs ...
> > > [1.168584] CPU1: failed to come online [1.187277] smp: Brought
> > > up 1 node, 1 CPU ```
> > >
> > > Compared to the more likely to succeed:
> > >
> > > ```
> > > [0.080313] smp: Bringing up secondary CPUs ...
> > > [0.093166] smp: Brought up 1 node, 2 CPUs [0.093345] SMP:
> > > Total of 2 processors activated (4800.00 BogoMIPS).
> > > ```
> > >
> > > It’s somewhat reliably reproducible by running the ast2600-evb with an
> > OpenBMC image, using ‘-icount auto’ to slow execution and make the race
> > condition more frequent (it happens without this, just easier to debug if we
> > can reproduce):
> > >
> > >
> > > ```
> > > ./aarch64-softmmu/qemu-system-aarch64 -machine ast2600-evb -
> > nographic
> > > -drive
> > > file=~/bmc-bin/image-obmc-ast2600,if=mtd,bus=0,unit=0,snapshot=on -nic
> > > user -icount auto ```
>
> Have you try to run qemu with "-smp 2"?

`-smp 2` lowers the probability, but does not get rid of the race.

> > >
> > > Our current hypothesis is that the problem comes up in the platform
> > uboot.  As part of the boot, the secondary core waits for the smp mailbox to
> > get a magic number written by the primary core:
> > >
> > > https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-
> > v2019.04/a
> > > rch/arm/mach-aspeed/ast2600/platform.S#L168
> > >  > v2019.04/
> > > arch/arm/mach-aspeed/ast2600/platform.S#L168>
> > >
> > > However, this memory address is cleared on boot:
> > >
> > > https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-
> > v2019.04/a
> > > rch/arm/mach-aspeed/ast2600/platform.S#L146
> > >  > v2019.04/
> > > arch/arm/mach-aspeed/ast2600/platform.S#L146>
> > >
> > > The race condition occurs if the primary core runs far ahead of the 
> > > secondary
> > core: if the primary core gets to the point where it signals the secondary 
> > core’s
> > mailbox before the secondary core gets past the point where it does the 
> > initial
> > reset and starts waiting, the reset will clear the signal, and then the 
> > secondary
> > core will never get past the point where it’s looping in
> > `poll_smp_mbox_ready`.
> > >
> > > We’ve observed this race happening by dumping all SCU reads and writes,
> > and validated that this is the problem by using a modified `platform.S` that
> > doesn’t clear the =SCU_SMP_READY mailbox on reset, but would rather not
> > have to use a modified version of SMP boot just for QEMU-TCG execution.
>
> To prevent the race condition described, SCU188 zeroization is conducted
> as early as possible by both CPU#0 and CPU#1. After that, there are at
> least 100 instructions for CPU#0 to execute before it get the chance to
> set SCU188 to 0xbabecafe. For real, parallel HW, it is unusual that CPU#1
> will be slower than CPU#0 by 100 instruction cycles.
>

This doesn't really prevent the race, it just lowers the probability
of it causing problems.

If in hardware, the cores begin execution at the same time, then in
normal execution of Platform.S, the SCU188 zeroing will happen before
the scu_unlock macro runs. If I understand the SCU lock correctly:
that would mean that the store-zero to SCU188 normally has no effect
(since the registers are read only at that time), and only really
participates in this race. Since you mentioned seeing this same race
in other simulation environments, have you considered removing that
write?

> >
> > you could use '-trace aspeed_scu*' to collect the MMIO accesses on the SCU
> > unit. A TCG plugin also.
> >
> > > Is there a way to have QEMU insert a barrier synchronization at some point
> > in the bootloader?  I think getting both cores past the =SCU_SMP_READY reset
> > would get rid of this race, but I’m not aware of a way to do that kind of 
> > thing
> > in QEMU-TCG.
> > >
> > > Thanks for any insights!
> >
> > Could we change the default value to registers 0x180 ... 0x18C in
> > hw/misc/aspeed_scu.c to make sure the SMP regs are immune to the race ?

I don't believe that changing the default values would help. 

Re: [PATCH] Fixed '-serial none' usage breaks following '-serial ...' usage

2024-01-16 Thread Daniel P . Berrangé
On Mon, Jan 15, 2024 at 04:14:30PM +, Peter Maydell wrote:
> (I've cc'd a few people who might have opinions on possible
> command-line compatibility breakage.)
> 
> On Wed, 10 Jan 2024 at 14:38, Bohdan Kostiv  wrote:
> >
> > Hello,
> >
> > I have faced an issue in using serial ports when I need to skip a couple of 
> > ports in the CLI.
> >
> > For example the ARM machine netduinoplus2 supports up to 7 UARTS.
> > Following case works (the first UART is used to send data in the firmware):
> > qemu-system-arm -machine netduinoplus2 -nographic -serial mon:stdio -kernel 
> > path-to-fw/firmware.elf
> > But this one doesn't  (the third UART is used to send data in the firmware):
> > qemu-system-arm -machine netduinoplus2 -nographic -serial none -serial none 
> > -serial mon:stdio -kernel path-to-fw/firmware.elf
> 
> Putting the patch inline for more convenient discussion:
> 
> > Subject: [PATCH] Fixed '-serial none' usage breaks following '-serial ...' 
> > usage
> >
> > Signed-off-by: Bohdan Kostiv 
> > ---
> >  system/vl.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/system/vl.c b/system/vl.c
> > index 2bcd9efb9a..b8744475cd 100644
> > --- a/system/vl.c
> > +++ b/system/vl.c
> > @@ -1442,8 +1442,11 @@ static int serial_parse(const char *devname)
> >  int index = num_serial_hds;
> >  char label[32];
> >
> > -if (strcmp(devname, "none") == 0)
> > +if (strcmp(devname, "none") == 0) {
> > +num_serial_hds++;
> >  return 0;
> > +}
> > +
> >  snprintf(label, sizeof(label), "serial%d", index);
> >  serial_hds = g_renew(Chardev *, serial_hds, index + 1);
> >
> > --
> > 2.39.3 (Apple Git-145)
> 
> I agree that it's the right thing to do -- '-serial none
> -serial foo' ought to set serial_hds(0) as 'none' and
> serial_hds(1) as 'foo'.
> 
> My only concern here is that this is a very very
> longstanding bug -- as far as I can see it was
> introduced in commit 998bbd74b9d81 in 2009. So I am
> a little worried that maybe some existing command lines
> accidentally rely on the current behaviour.
> 
> I think the current behaviour is:
> 
>  * "-serial none -serial something" is the same as
>"-serial something"
>  * "-serial none" on its own disables the default serial
>device (the docs say it will "disable all serial ports"
>but I don't think that is correct...)
> which amounts to "the only effectively useful use of
> '-serial none' is to disable the default serial device"
> 
> and if we apply this patch:
>  * "-serial none -serial something" has the sensible behaviour
>of "first serial port not connected/present, second serial
>port exists" (which of those you get depends on the machine
>model)
>  * "-serial none" on its own has no behaviour change
> 
> So I think the only affected users would be anybody who
> accidentally had an extra "-serial none" in their command
> line that was previously being overridden by a later
> "-serial" option. That doesn't seem very likely to me,
> so I think I'd be in favour of making this change and
> having something in the release notes about it.

If we don't apply this patch, then the valid use case of the reporter
here is impossible to achieve. We would have to invent a new syntax
to enable what 'serial none -serial something' should have already
been doing.  There is nothing users can do without us applying a fix
of some kind, either as proposed in the patch here, or something that
is functionally identical with different cli.


If we do apply this patch and someone was (mistakenly) relying on
'-serial none -serial something' being functionally equivalent to
'-serial something', they have an easy fix. They can just remove
the redundant '-serial none' they have and this works with any
QEMU they might see, whether ancient, current or future.


On balance, I think it is the right tradeoff to apply the proposed
patch, and accept the small risk of breaking someone who was
mistakenly relying on the broken behaviour, since the impact to
those people should be small.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] target/arm: Fix VNCR fault detection logic

2024-01-16 Thread Jonathan Cameron via
On Tue, 16 Jan 2024 16:56:05 +
Peter Maydell  wrote:

> In arm_deliver_fault() we check for whether the fault is caused
> by a data abort due to an access to a FEAT_NV2 sysreg in the
> memory pointed to by the VNCR. Unfortunately part of the
> condition checks the wrong argument to the function, meaning
> that it would spuriously trigger, resulting in some instruction
> aborts being taken to the wrong EL and reported incorrectly.
> 
> Use the right variable in the condition.
> 
> Fixes: 674e5345275d425 ("target/arm: Report VNCR_EL2 based faults correctly")
> Reported-by: Jonathan Cameron 
> Signed-off-by: Peter Maydell 

Matches what I have locally from discussion earlier.

Tested-by: Jonathan Cameron 
Reviewed-by: Jonathan Cameron 

Thanks

> ---
> In less lax languages the compiler might have pointed out that
> the type of the LHS and the RHS in the comparison didn't match :-)
> ---
>  target/arm/tcg/tlb_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/arm/tcg/tlb_helper.c b/target/arm/tcg/tlb_helper.c
> index dd5de74ffb7..5477c7fb7dc 100644
> --- a/target/arm/tcg/tlb_helper.c
> +++ b/target/arm/tcg/tlb_helper.c
> @@ -184,7 +184,7 @@ void arm_deliver_fault(ARMCPU *cpu, vaddr addr,
>   * (and indeed syndrome does not have the EC field in it,
>   * because we masked that out in disas_set_insn_syndrome())
>   */
> -bool is_vncr = (mmu_idx != MMU_INST_FETCH) &&
> +bool is_vncr = (access_type != MMU_INST_FETCH) &&
>  (env->exception.syndrome & ARM_EL_VNCR);
>  
>  if (is_vncr) {




Re: [PATCH] qga-win: Fix guest-get-fsinfo multi-disks collection

2024-01-16 Thread Konstantin Kostiuk
Reviewed-by: Konstantin Kostiuk 

On Wed, Jan 10, 2024 at 9:42 AM Peng Ji  wrote:

> ping !
> please review this patch : 
> https://patchew.org/QEMU/20231227071540.4035803-1-peng...@smartx.com/
>
> thanks
>
>
>
> On Fri, Jan 5, 2024 at 9:47 PM Philippe Mathieu-Daudé 
> wrote:
>
>> On 27/12/23 08:15, peng...@smartx.com wrote:
>> > From: Peng Ji 
>> >
>> > When a volume has more than one disk, all disks cannot be
>> > returned correctly because there is not enough malloced memory
>> > for disk extents, so before executing DeviceIoControl for the
>> > second time, get the correct size of the required memory space
>> > to store all disk extents.
>> >
>> > Signed-off-by: Peng Ji 
>>
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2075
>>
>> > ---
>> >   qga/commands-win32.c | 2 ++
>> >   1 file changed, 2 insertions(+)
>> >
>> > diff --git a/qga/commands-win32.c b/qga/commands-win32.c
>> > index 697c65507c..a1015757d8 100644
>> > --- a/qga/commands-win32.c
>> > +++ b/qga/commands-win32.c
>> > @@ -935,6 +935,8 @@ static GuestDiskAddressList
>> *build_guest_disk_info(char *guid, Error **errp)
>> >   DWORD last_err = GetLastError();
>> >   if (last_err == ERROR_MORE_DATA) {
>> >   /* Try once more with big enough buffer */
>> > +size = sizeof(VOLUME_DISK_EXTENTS) +
>> > +   (sizeof(DISK_EXTENT) * (extents->NumberOfDiskExtents -
>> 1));
>> >   g_free(extents);
>> >   extents = g_malloc0(size);
>> >   if (!DeviceIoControl(
>>
>>


[PATCH] target/arm: Fix VNCR fault detection logic

2024-01-16 Thread Peter Maydell
In arm_deliver_fault() we check for whether the fault is caused
by a data abort due to an access to a FEAT_NV2 sysreg in the
memory pointed to by the VNCR. Unfortunately part of the
condition checks the wrong argument to the function, meaning
that it would spuriously trigger, resulting in some instruction
aborts being taken to the wrong EL and reported incorrectly.

Use the right variable in the condition.

Fixes: 674e5345275d425 ("target/arm: Report VNCR_EL2 based faults correctly")
Reported-by: Jonathan Cameron 
Signed-off-by: Peter Maydell 
---
In less lax languages the compiler might have pointed out that
the type of the LHS and the RHS in the comparison didn't match :-)
---
 target/arm/tcg/tlb_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/tcg/tlb_helper.c b/target/arm/tcg/tlb_helper.c
index dd5de74ffb7..5477c7fb7dc 100644
--- a/target/arm/tcg/tlb_helper.c
+++ b/target/arm/tcg/tlb_helper.c
@@ -184,7 +184,7 @@ void arm_deliver_fault(ARMCPU *cpu, vaddr addr,
  * (and indeed syndrome does not have the EC field in it,
  * because we masked that out in disas_set_insn_syndrome())
  */
-bool is_vncr = (mmu_idx != MMU_INST_FETCH) &&
+bool is_vncr = (access_type != MMU_INST_FETCH) &&
 (env->exception.syndrome & ARM_EL_VNCR);
 
 if (is_vncr) {
-- 
2.34.1




Re: [PATCH v3 00/14] hw/arm: Prefer arm_feature() over object_property_find()

2024-01-16 Thread Peter Maydell
On Tue, 16 Jan 2024 at 16:20, Philippe Mathieu-Daudé  wrote:
>
> On 13/1/24 14:36, Peter Maydell wrote:
> > On Wed, 10 Jan 2024 at 19:53, Philippe Mathieu-Daudé  
> > wrote:
> >>
> >> Since v2 [2]:
> >> - Dropped "Simplify checking A64_MTE bit in FEATURE_ID register"
> >> - Correct object_property_get_bool() uses
> >> - Update ARM_FEATURE_AARCH64 && aa64_mte
> >>
> >> Since RFC [1]:
> >> - Split one patch per feature
> >> - Addressed Peter's review comments
> >>
> >> [1] 
> >> https://lore.kernel.org/qemu-devel/20231214171447.44025-1-phi...@linaro.org/
> >> [2] 
> >> https://lore.kernel.org/qemu-devel/20240109180930.90793-1-phi...@linaro.org/
> >>
> >> Based-on: <20231123143813.42632-1-phi...@linaro.org>
> >>"hw: Simplify accesses to CPUState::'start-powered-off' property"
> >>
> >> Philippe Mathieu-Daudé (14):
> >>hw/arm/armv7m: Introduce cpudev variable in armv7m_realize()
> >>hw/arm/armv7m: Ensure requested CPU type implements ARM_FEATURE_M
> >>hw/arm/armv7m: Move code setting 'start-powered-off' property around
> >>hw/arm/armv7m: Always set 'init-nsvtor' property for Cortex-M CPUs
>
>
> > The first part of this is fine and reasonable cleanup, but I
> > continue to disagree about the later parts. What we want to do is
> > "if this property is present, then set it", and that's what we do.
> > Conversion to "if  > decide whether to define the property> then set it" is duplicating
> > the condition logic in two places and opening the door for bugs
> > where we change the condition in one place and not in the other.
> > Where the  is a simple "feature X is set" it doesn't
> > look too bad, but where it gets more complex it makes it IMHO more
> > obvious that this is a bad idea, for example with:
> >
> > -if (object_property_find(cpuobj, "reset-cbar")) {
> > +if (arm_feature(>env, ARM_FEATURE_CBAR) ||
> > +arm_feature(>env, ARM_FEATURE_CBAR_RO)) {
>
> For that we could add helpers such
>
>static inline bool arm_feature_cbar(CPUState *cpu) {
>  return arm_feature(>env, ARM_FEATURE_CBAR) ||
> arm_feature(>env, ARM_FEATURE_CBAR_RO);
>}
>
> and use it in the target/ code where we register the property
> and in the hw/ code where we set it.

Well, we could, but why? The question we're trying to
answer is "can we set this property?" and the simplest
and most logical way to test that is "does the object
have the property?". I really don't understand why we
would want to change the code at all.

> Do you mind taking the cleanup patches (1-4) meanwhile?

Yes, I can take patches 1-4.

thanks
-- PMM



[PATCH] acpi/tests/avocado/bits: wait for 200 seconds for SHUTDOWN event from bits VM

2024-01-16 Thread Ani Sinha
By default, the timeout to receive any specified event from the QEMU VM is 60
seconds set by the python avocado test framework. Please see event_wait() and
events_wait() in python/qemu/machine/machine.py. If the matching event is not
triggered within that interval, an asyncio.TimeoutError is generated. Since the
default timeout for the bits avocado test is 200 secs, we need to make
event_wait() timeout the same value as well so that an early timeout is not
triggered by the avocado framework.

CC: peter.mayd...@linaro.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2077
Signed-off-by: Ani Sinha 
---
 tests/avocado/acpi-bits.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/avocado/acpi-bits.py b/tests/avocado/acpi-bits.py
index 68b9e98d4e..870cd2e36c 100644
--- a/tests/avocado/acpi-bits.py
+++ b/tests/avocado/acpi-bits.py
@@ -53,7 +53,7 @@
 
 deps = ["xorriso", "mformat"] # dependent tools needed in the test setup/box.
 supported_platforms = ['x86_64'] # supported test platforms.
-
+BITS_TIMEOUT = 200
 
 def which(tool):
 """ looks up the full path for @tool, returns None if not found
@@ -133,7 +133,7 @@ class AcpiBitsTest(QemuBaseTest): #pylint: 
disable=too-many-instance-attributes
 
 """
 # in slower systems the test can take as long as 3 minutes to complete.
-timeout = 200
+timeout = BITS_TIMEOUT
 
 def __init__(self, *args, **kwargs):
 super().__init__(*args, **kwargs)
@@ -401,6 +401,6 @@ def test_acpi_smbios_bits(self):
 # biosbits has been configured to run all the specified test suites
 # in batch mode and then automatically initiate a vm shutdown.
 # Rely on avocado's unit test timeout.
-self._vm.event_wait('SHUTDOWN')
+self._vm.event_wait('SHUTDOWN', timeout=BITS_TIMEOUT)
 self._vm.wait(timeout=None)
 self.parse_log()
-- 
2.39.2




Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU

2024-01-16 Thread Xiaoyao Li

On 1/15/2024 11:18 PM, Zhao Liu wrote:

Hi Xiaoyao,

On Mon, Jan 15, 2024 at 03:45:58PM +0800, Xiaoyao Li wrote:

Date: Mon, 15 Jan 2024 15:45:58 +0800
From: Xiaoyao Li 
Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU

On 1/15/2024 1:59 PM, Zhao Liu wrote:

(Also cc "machine core" maintainers.)
u
Hi Xiaoyao,

On Mon, Jan 15, 2024 at 12:18:17PM +0800, Xiaoyao Li wrote:

Date: Mon, 15 Jan 2024 12:18:17 +0800
From: Xiaoyao Li 
Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU

On 1/15/2024 11:27 AM, Zhao Liu wrote:

On Sun, Jan 14, 2024 at 09:49:18PM +0800, Xiaoyao Li wrote:

Date: Sun, 14 Jan 2024 21:49:18 +0800
From: Xiaoyao Li 
Subject: Re: [PATCH v7 10/16] i386/cpu: Introduce cluster-id to X86CPU

On 1/8/2024 4:27 PM, Zhao Liu wrote:

From: Zhuocheng Ding 

Introduce cluster-id other than module-id to be consistent with
CpuInstanceProperties.cluster-id, and this avoids the confusion
of parameter names when hotplugging.


I don't think reusing 'cluster' from arm for x86's 'module' is a good idea.
It introduces confusion around the code.


There is a precedent: generic "socket" v.s. i386 "package".


It's not the same thing. "socket" vs "package" is just software people and
hardware people chose different name. It's just different naming issue.


No, it's a similar issue. Same physical device, different name only.

Furthermore, the topology was introduced for resource layout and silicon
fabrication, and similar design ideas and fabrication processes are fairly
consistent across common current arches. Therefore, it is possible to
abstract similar topological hierarchies for different arches.



however, here it's reusing name issue while 'cluster' has been defined for
x86. It does introduce confusion.


There's nothing fundamentally different between the x86 module and the
generic cluster, is there? This is the reason that I don't agree with
introducing "modules" in -smp.


generic cluster just means the cluster of processors, i.e, a group of
cpus/lps. It is just a middle level between die and core.


Not sure if you mean the "cluster" device for TCG GDB? "cluster" device
is different with "cluster" option in -smp.


No, I just mean the word 'cluster'. And I thought what you called 
"generic cluster" means "a cluster of logical processors"


Below I quote the description of Yanan's commit 864c3b5c32f0:

A cluster generally means a group of CPU cores which share L2 cache
or other mid-level resources, and it is the shared resources that
is used to improve scheduler's behavior. From the point of view of
the size range, it's between CPU die and CPU core. For example, on
some ARM64 Kunpeng servers, we have 6 clusters in each NUMA node,
and 4 CPU cores in each cluster. The 4 CPU cores share a separate
L2 cache and a L3 cache tag, which brings cache affinity advantage.

What I get from it, is, cluster is just a middle level between CPU die 
and CPU core. The cpu cores inside one cluster shares some mid-level 
resource. L2 cache is just one example of the shared mid-level resource. 
So it can be either module level, or tile level in x86, or even the 
diegrp level you mentioned below.



When Yanan introduced the "cluster" option in -smp, he mentioned that it
is for sharing L2 and L3 tags, which roughly corresponds to our module.



It can be the module level in intel, or tile level. Further, if per die lp
number increases in the future, there might be more middle levels in intel
between die and core. Then at that time, how to decide what level should
cluster be mapped to?


Currently, there're 3 levels defined in SDM which are between die and
core: diegrp, tile and module. In our products, L2 is just sharing on the
module, so the intel's module and the general cluster are the best match.


you said 'generic cluster' a lot of times. But from my point of view, 
you are referring to current ARM's cluster instead of *generic* cluster.


Anyway, cluster is just a mid-level between die and core. We should not 
associate it any specific resource. A resource is shared in what level 
can change, e.g., initially L3 cache is shared in a physical package. 
When multi-die got supported, L3 cache is shared in one die. Now, on AMD 
product, L3 cache is shared in one complex, and one die can have 2 
complexs thus 2 separate L3 cache in one die.


It doesn't matter calling it cluster, or module, or xyz. It is just a 
name to represent a cpu topology level between die and core. What 
matters is, once it gets accepted, it becomes formal ABI for users that 
'cluster' means 'module' for x86 users. This is definitely a big 
confusion for people. Maybe people try to figure out why, and find the 
reason is that 'cluster' means the level at which L2 cache is shared and 
that's just the module level in x86 shares L2 cache. Maybe in the 
future, "L2 is shared in module" get changed just like the example I 
give for L3 above. Then, that's really the big confusion, and 

Re: [PATCH] hw/elf_ops: Ignore loadable segments with zero size

2024-01-16 Thread Richard Henderson

On 1/17/24 02:50, Bin Meng wrote:

Some ELF files really do have segments of zero size, e.g.:

Program Headers:
   Type   Offset VirtAddr   PhysAddr
  FileSizMemSiz  Flags  Align
   RISCV_ATTRIBUT 0x25b8 0x 0x
  0x003e 0x  R  0x1
   LOAD   0x1000 0x8020 0x8020
  0x01d1 0x01d1  R E0x1000
   LOAD   0x11d1 0x802001d1 0x802001d1
  0x0e37 0x0e37  RW 0x1000
   LOAD   0x0120 0x 0x
  0x 0x 0x1000

The current logic does not check for this condition, resulting in
the incorrect assignment of 'lowaddr' as zero.

There is already a piece of codes inside the segment traversal loop
that checks for zero-sized loadable segments for not creating empty
ROM blobs. Let's move this check to the beginning of the loop to
cover both scenarios.

Signed-off-by: Bin Meng 


Reviewed-by: Richard Henderson 

But please report this as a bug to whatever tool produced such nonsense.


r~



Re: [PULL 00/10] Hppa fixes 8.2 patches

2024-01-16 Thread Peter Maydell
On Sat, 13 Jan 2024 at 05:59,  wrote:
>
> From: Helge Deller 
>
> The following changes since commit 7425b6277f12e82952cede1f531bfc689bf77fb1:
>
>   Merge tag 'tracing-pull-request' of https://gitlab.com/stefanha/qemu into 
> staging (2023-12-27 05:15:32 -0500)
>
> are available in the Git repository at:
>
>   https://github.com/hdeller/qemu-hppa.git tags/hppa-fixes-8.2-pull-request
>
> for you to fetch changes up to 4bda8224fa89ab28958644c5f1a4117886fe8418:
>
>   target/hppa: Update SeaBIOS-hppa to version 15 (2024-01-13 06:49:18 +0100)
>
> 
> target/hppa qemu v8.2 regression fixes
>
> There were some regressions introduced with Qemu v8.2 on the hppa/hppa64
> target, e.g.:
>
> - 32-bit HP-UX crashes on B160L (32-bit) machine
> - NetBSD boot failure due to power button in page zero
> - NetBSD FPU detection failure
> - OpenBSD 7.4 boot failure
>
> This patch series fixes those known regressions and additionally:
>
> - allows usage of the max. 3840MB of memory (instead of 3GB),
> - adds support for the qemu --nodefaults option (to debug other devices)
>
> This patch set will not fix those known (non-regression) bugs:
> - HP-UX and NetBSD still fail to boot on the new 64-bit C3700 machine
> - Linux kernel will still fail to boot on C3700 as long as kernel modules are 
> used.
>
> Changes v2->v3:
> - Added comment about Figures H-10 and H-11 in the parisc2.0 spec
>   in patch which calculate PDC address translation if PSW.W=0
> - Introduce and use hppa_set_ior_and_isr()
> - Use drive_get_max_bus(IF_SCSI), nd_table[] and serial_hd() to check
>   if default devices should be created
> - Added Tested-by and Reviewed-by tags
>
> Changes v1->v2:
> - fix OpenBSD boot with SeaBIOS v15 instead of v14
> - commit message enhancements suggested by BALATON Zoltan
> - use uint64_t for ram_max in patch #1
>



Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM



Re: [PULL 00/20] Migration 20240116 patches

2024-01-16 Thread Peter Maydell
On Tue, 16 Jan 2024 at 03:19,  wrote:
>
> From: Peter Xu 
>
> The following changes since commit 977542ded7e6b28d2bc077bcda24568c716e393c:
>
>   Merge tag 'pull-testing-updates-120124-2' of 
> https://gitlab.com/stsquad/qemu into staging (2024-01-12 14:02:53 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/peterx/qemu.git tags/migration-20240116-pull-request
>
> for you to fetch changes up to 44ce1b5d2fc77343f6a318cb3de613336a240048:
>
>   migration/rdma: define htonll/ntohll only if not predefined (2024-01-16 
> 11:16:10 +0800)
>
> 
> Migration pull request 2nd batch for 9.0
>
> - Het's cleanup on migration qmp command paths
> - Fabiano's migration cleanups and test improvements
> - Fabiano's patch to re-enable multifd-cancel test
> - Peter's migration doc reorganizations
> - Nick Briggs's fix for Solaries build on rdma
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM



Re: [PATCH 14/33] hw/cpu/arm: Handle 'has_el2/3' properties once in MPCore parent

2024-01-16 Thread Philippe Mathieu-Daudé

On 12/1/24 22:33, Fabiano Rosas wrote:

Philippe Mathieu-Daudé  writes:


Move the 'has_el2' and 'has_el3' properties to the abstract
QOM parent.

Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/cpu/cortex_mpcore.h |  5 +
  hw/arm/exynos4210.c| 10 --
  hw/arm/vexpress.c  |  6 ++
  hw/arm/xilinx_zynq.c   |  6 ++
  hw/cpu/a15mpcore.c | 18 ++
  hw/cpu/a9mpcore.c  |  5 +
  hw/cpu/cortex_mpcore.c |  3 +++
  7 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/include/hw/cpu/cortex_mpcore.h b/include/hw/cpu/cortex_mpcore.h
index 0e7cca9e93..387552468c 100644
--- a/include/hw/cpu/cortex_mpcore.h
+++ b/include/hw/cpu/cortex_mpcore.h
@@ -30,6 +30,8 @@
   * QEMU interface:
   *  + QOM property "num-cores" which set the number of cores present in
   *the cluster.
+ *  + QOM properties "cpu-has-el3", "cpu-has-el2" which set whether the CPUs
+ *have the exception level features present.
   */
  #define TYPE_CORTEX_MPCORE_PRIV "cortex_mpcore_priv"
  OBJECT_DECLARE_TYPE(CortexMPPrivState, CortexMPPrivClass, CORTEX_MPCORE_PRIV)
@@ -53,6 +55,9 @@ struct CortexMPPrivState {
  
  /* Properties */

  uint32_t num_cores;
+
+bool cpu_has_el3;
+bool cpu_has_el2;
  };




diff --git a/hw/cpu/a9mpcore.c b/hw/cpu/a9mpcore.c
index 9c138f4442..54949314f8 100644
--- a/hw/cpu/a9mpcore.c
+++ b/hw/cpu/a9mpcore.c
@@ -51,7 +51,6 @@ static void a9mp_priv_realize(DeviceState *dev, Error **errp)
  SysBusDevice *scubusdev, *gicbusdev, *gtimerbusdev, *mptimerbusdev,
   *wdtbusdev;
  Error *local_err = NULL;
-bool has_el3;
  CPUState *cpu0;
  Object *cpuobj;
  
@@ -86,9 +85,7 @@ static void a9mp_priv_realize(DeviceState *dev, Error **errp)

  /* Make the GIC's TZ support match the CPUs. We assume that
   * either all the CPUs have TZ, or none do.
   */
-has_el3 = object_property_find(cpuobj, "has_el3") &&
-object_property_get_bool(cpuobj, "has_el3", _abort);
-qdev_prop_set_bit(gicdev, "has-security-extensions", has_el3);
+qdev_prop_set_bit(gicdev, "has-security-extensions", c->cpu_has_el3);
  
  if (!sysbus_realize(SYS_BUS_DEVICE(>gic), errp)) {

  return;
diff --git a/hw/cpu/cortex_mpcore.c b/hw/cpu/cortex_mpcore.c
index d7ea633e4e..549b81f708 100644
--- a/hw/cpu/cortex_mpcore.c
+++ b/hw/cpu/cortex_mpcore.c
@@ -27,6 +27,9 @@ static Property cortex_mpcore_priv_properties[] = {
  DEFINE_PROP_UINT32("num-cores", CortexMPPrivState, num_cores, 1),
  DEFINE_PROP_UINT32("num-cpu", CortexMPPrivState, num_cores, 1), /* alias 
*/
  
+DEFINE_PROP_BOOL("cpu-has-el3", CortexMPPrivState, cpu_has_el3, true),

+DEFINE_PROP_BOOL("cpu-has-el2", CortexMPPrivState, cpu_has_el2, false),


Are we missing setting cpu_has_el2 somewhere else? This^ results in fewer
cpregs being registered and is what breaks migration.

You can test with:

$ (echo "migrate file:migfile"; echo "quit") | ./qemu-system-arm -M ast2600-evb 
-monitor stdio
$ ./scripts/analyze-migration.py -f migfile | grep array_len

Before series:
$ ./scripts/analyze-migration.py -f migfile | grep array_len
 "cpreg_vmstate_array_len": "0x010a",
 "cpreg_vmstate_array_len": "0x010a",

After series:
$ ./scripts/analyze-migration.py -f migfile | grep array_len
 "cpreg_vmstate_array_len": "0x00df",
 "cpreg_vmstate_array_len": "0x00df",


Thank you Fabiano for helping debugging. Indeed there is a problem
with properties, so this patch is bogus.

The good news is the QOM reparenting happened 2 commits earlier,
so this discarded the doubts on qom-path change possibly affecting
migration :)

Regards,

Phil.



Re: [PATCH v3 00/14] hw/arm: Prefer arm_feature() over object_property_find()

2024-01-16 Thread Philippe Mathieu-Daudé

On 13/1/24 14:36, Peter Maydell wrote:

On Wed, 10 Jan 2024 at 19:53, Philippe Mathieu-Daudé  wrote:


Since v2 [2]:
- Dropped "Simplify checking A64_MTE bit in FEATURE_ID register"
- Correct object_property_get_bool() uses
- Update ARM_FEATURE_AARCH64 && aa64_mte

Since RFC [1]:
- Split one patch per feature
- Addressed Peter's review comments

[1] https://lore.kernel.org/qemu-devel/20231214171447.44025-1-phi...@linaro.org/
[2] https://lore.kernel.org/qemu-devel/20240109180930.90793-1-phi...@linaro.org/

Based-on: <20231123143813.42632-1-phi...@linaro.org>
   "hw: Simplify accesses to CPUState::'start-powered-off' property"

Philippe Mathieu-Daudé (14):
   hw/arm/armv7m: Introduce cpudev variable in armv7m_realize()
   hw/arm/armv7m: Ensure requested CPU type implements ARM_FEATURE_M
   hw/arm/armv7m: Move code setting 'start-powered-off' property around
   hw/arm/armv7m: Always set 'init-nsvtor' property for Cortex-M CPUs




The first part of this is fine and reasonable cleanup, but I
continue to disagree about the later parts. What we want to do is
"if this property is present, then set it", and that's what we do.
Conversion to "if  then set it" is duplicating
the condition logic in two places and opening the door for bugs
where we change the condition in one place and not in the other.
Where the  is a simple "feature X is set" it doesn't
look too bad, but where it gets more complex it makes it IMHO more
obvious that this is a bad idea, for example with:

-if (object_property_find(cpuobj, "reset-cbar")) {
+if (arm_feature(>env, ARM_FEATURE_CBAR) ||
+arm_feature(>env, ARM_FEATURE_CBAR_RO)) {


For that we could add helpers such

  static inline bool arm_feature_cbar(CPUState *cpu) {
return arm_feature(>env, ARM_FEATURE_CBAR) ||
   arm_feature(>env, ARM_FEATURE_CBAR_RO);
  }

and use it in the target/ code where we register the property
and in the hw/ code where we set it.


Anyway I'll wait to see how Kevin's effort is going before
insisting with this series.
Do you mind taking the cleanup patches (1-4) meanwhile?

Thanks,

Phil.



Re: [PATCH 0/4] esp-pci: fixes for Linux and MS-DOS

2024-01-16 Thread Philippe Mathieu-Daudé

On 12/1/24 14:15, Mark Cave-Ayland wrote:

This series contains fixes for the esp-pci device (am53c974 or dc390) for a
few issues spotted whilst testing the previous ESP series.

Patches 1-3 are fixes for issues found by Helge/Guenter whilst testing the
hppa C3700 machine with the amd53c974/dc390 devices under Linux, whilst patch
4 fixes an issue that was exposed by testing MS-DOS and Windows drivers.

With this series applied on top of the reworked ESP device, it is possible to
boot Linux under qemu-system-hppa without any errors and also boot and install
Win98SE from a DC390 PCI SCSI controller (no IDE!) using an MS-DOS boot floppy.

Signed-off-by: Mark Cave-Ayland 
Based-on: 20240112125420.514425-1-mark.cave-ayl...@ilande.co.uk


Mark Cave-Ayland (4):
   esp-pci.c: use correct address register for PCI DMA transfers
   esp-pci.c: generate PCI interrupt from separate ESP and PCI sources
   esp-pci.c: synchronise setting of DMA_STAT_DONE with ESP completion
 interrupt
   esp-pci.c: set DMA_STAT_BCMBLT when BLAST command issued

  hw/scsi/esp-pci.c | 61 +++
  1 file changed, 41 insertions(+), 20 deletions(-)


Series queued to my hw-misc tree, thanks!





Re: [PATCH v2 2/2] hw/pflash: implement update buffer for block writes

2024-01-16 Thread Peter Maydell
On Tue, 16 Jan 2024 at 16:08, Philippe Mathieu-Daudé  wrote:
>
> On 12/1/24 17:54, Peter Maydell wrote:
> > On Mon, 8 Jan 2024 at 13:06, Philippe Mathieu-Daudé  
> > wrote:
> >>
> >> Hi Gerd,
> >>
> >> On 8/1/24 13:53, Philippe Mathieu-Daudé wrote:
> >>> From: Gerd Hoffmann 
> >>>
> >>> Add an update buffer where all block updates are staged.
> >>> Flush or discard updates properly, so we should never see
> >>> half-completed block writes in pflash storage.
> >>>
> >>> Drop a bunch of FIXME comments ;)
> >>>
> >>> Signed-off-by: Gerd Hoffmann 
> >>> Message-ID: <20240105135855.268064-3-kra...@redhat.com>
> >>> Signed-off-by: Philippe Mathieu-Daudé 
> >>> ---
> >>>hw/block/pflash_cfi01.c | 106 ++--
> >>>1 file changed, 80 insertions(+), 26 deletions(-)
>
>
> >>> +static const VMStateDescription vmstate_pflash_blk_write = {
> >>> +.name = "pflash_cfi01_blk_write",
> >>> +.version_id = 1,
> >>> +.minimum_version_id = 1,
> >>> +.needed = pflash_blk_write_state_needed,
> >>> +.fields = (const VMStateField[]) {
> >>> +VMSTATE_VBUFFER_UINT32(blk_bytes, PFlashCFI01, 0, NULL, 
> >>> writeblock_size),
> >>
> >> I don't get the difference with VMSTATE_VBUFFER_ALLOC_UINT32() which
> >> sets VMS_ALLOC. In this case pflash_cfi01_realize() does the alloc so
> >> we don't need VMS_ALLOC?
> >
> > Yes, that's the idea. A VMS_ALLOC vmstate type means "this
> > block of memory is dynamically sized at runtime, so when the
> > migration code is doing inbound migration it needs to
> > allocate a buffer of the right size first (based on some
> > state struct field we've already migrated) and then put the
> > incoming data into it". VMS_VBUFFER means "the size of the buffer
> > isn't a compile-time constant, so we need to fish it out of
> > some other state struct field". So:
> >
> >   VMSTATE_VBUFFER_UINT32: we need to migrate (a pointer to) an array
> >   of uint32_t; the size of that is in some other struct field,
> >   but it's a runtime constant and we can assume the memory has
> >   already been allocated
> >
> >   VMSTATE_VBUFFER_ALLOC_UINT32: we need to migrate an array
> >   of uint32_t of variable size dependent on the inbound migration
> >   data, and so the migration code must allocate it
>
> Thanks Peter!
>
> Do you mind if we commit your explanation as is? As:
>
> -- >8 --
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 294d2d8486..5c6f6c5c32 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -573,4 +573,6 @@ extern const VMStateInfo vmstate_info_qlist;
>
> -/* a variable length array (i.e. _type *_field) but we know the
> - * length
> +/**
> + * VMSTATE_STRUCT_VARRAY_POINTER_KNOWN:
> + *
> + * A variable length array (i.e. _type *_field) but we know the length.
>*/
> @@ -678,2 +680,10 @@ extern const VMStateInfo vmstate_info_qlist;
>
> +/**
> + * VMSTATE_VBUFFER_UINT32:
> + *
> + * We need to migrate (a pointer to) an array of uint32_t; the size of
> + * that is in some other struct field, but it's a runtime constant and
> + * we can assume the memory has already been allocated.
> +*/
> +
>   #define VMSTATE_VBUFFER_UINT32(_field, _state, _version, _test,
> _field_size) { \
> @@ -688,2 +698,9 @@ extern const VMStateInfo vmstate_info_qlist;
>
> +/**
> + * VMSTATE_VBUFFER_ALLOC_UINT32:
> + *
> + * We need to migrate an array of uint32_t of variable size dependent
> + * on the inbound migration data, and so the migration code must
> + * allocate it.
> +*/
>   #define VMSTATE_VBUFFER_ALLOC_UINT32(_field, _state, _version,   \
> ---

Gerd is probably a better source of comments for these macros;
there are some things I don't entirely understand about all
the inner workings, so my comments above are restricted to
only the bit that matters for this particular distinction,
and aren't really intended as documentation of the macro in
general.

-- PMM



Re: [PATCH v2 2/2] hw/pflash: implement update buffer for block writes

2024-01-16 Thread Philippe Mathieu-Daudé

On 12/1/24 17:54, Peter Maydell wrote:

On Mon, 8 Jan 2024 at 13:06, Philippe Mathieu-Daudé  wrote:


Hi Gerd,

On 8/1/24 13:53, Philippe Mathieu-Daudé wrote:

From: Gerd Hoffmann 

Add an update buffer where all block updates are staged.
Flush or discard updates properly, so we should never see
half-completed block writes in pflash storage.

Drop a bunch of FIXME comments ;)

Signed-off-by: Gerd Hoffmann 
Message-ID: <20240105135855.268064-3-kra...@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé 
---
   hw/block/pflash_cfi01.c | 106 ++--
   1 file changed, 80 insertions(+), 26 deletions(-)




+static const VMStateDescription vmstate_pflash_blk_write = {
+.name = "pflash_cfi01_blk_write",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = pflash_blk_write_state_needed,
+.fields = (const VMStateField[]) {
+VMSTATE_VBUFFER_UINT32(blk_bytes, PFlashCFI01, 0, NULL, 
writeblock_size),


I don't get the difference with VMSTATE_VBUFFER_ALLOC_UINT32() which
sets VMS_ALLOC. In this case pflash_cfi01_realize() does the alloc so
we don't need VMS_ALLOC?


Yes, that's the idea. A VMS_ALLOC vmstate type means "this
block of memory is dynamically sized at runtime, so when the
migration code is doing inbound migration it needs to
allocate a buffer of the right size first (based on some
state struct field we've already migrated) and then put the
incoming data into it". VMS_VBUFFER means "the size of the buffer
isn't a compile-time constant, so we need to fish it out of
some other state struct field". So:

  VMSTATE_VBUFFER_UINT32: we need to migrate (a pointer to) an array
  of uint32_t; the size of that is in some other struct field,
  but it's a runtime constant and we can assume the memory has
  already been allocated

  VMSTATE_VBUFFER_ALLOC_UINT32: we need to migrate an array
  of uint32_t of variable size dependent on the inbound migration
  data, and so the migration code must allocate it


Thanks Peter!

Do you mind if we commit your explanation as is? As:

-- >8 --
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 294d2d8486..5c6f6c5c32 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -573,4 +573,6 @@ extern const VMStateInfo vmstate_info_qlist;

-/* a variable length array (i.e. _type *_field) but we know the
- * length
+/**
+ * VMSTATE_STRUCT_VARRAY_POINTER_KNOWN:
+ *
+ * A variable length array (i.e. _type *_field) but we know the length.
  */
@@ -678,2 +680,10 @@ extern const VMStateInfo vmstate_info_qlist;

+/**
+ * VMSTATE_VBUFFER_UINT32:
+ *
+ * We need to migrate (a pointer to) an array of uint32_t; the size of
+ * that is in some other struct field, but it's a runtime constant and
+ * we can assume the memory has already been allocated.
+*/
+
 #define VMSTATE_VBUFFER_UINT32(_field, _state, _version, _test, 
_field_size) { \

@@ -688,2 +698,9 @@ extern const VMStateInfo vmstate_info_qlist;

+/**
+ * VMSTATE_VBUFFER_ALLOC_UINT32:
+ *
+ * We need to migrate an array of uint32_t of variable size dependent
+ * on the inbound migration data, and so the migration code must
+ * allocate it.
+*/
 #define VMSTATE_VBUFFER_ALLOC_UINT32(_field, _state, _version,   \
---




Re: [PATCH RFC V2 04/37] arm/virt,target/arm: Machine init time change common to vCPU {cold|hot}-plug

2024-01-16 Thread Jonathan Cameron via
On Mon, 2 Oct 2023 17:12:43 +0100
Salil Mehta  wrote:

> Hi Gavin,
> 
> > From: Gavin Shan 
> > Sent: Wednesday, September 27, 2023 7:29 AM
> > To: Salil Mehta ; qemu-devel@nongnu.org; qemu-
> > a...@nongnu.org
> > Cc: m...@kernel.org; jean-phili...@linaro.org; Jonathan Cameron
> > ; lpieral...@kernel.org;
> > peter.mayd...@linaro.org; richard.hender...@linaro.org;
> > imamm...@redhat.com; andrew.jo...@linux.dev; da...@redhat.com;
> > phi...@linaro.org; eric.au...@redhat.com; w...@kernel.org; a...@kernel.org;
> > oliver.up...@linux.dev; pbonz...@redhat.com; m...@redhat.com;
> > raf...@kernel.org; borntrae...@linux.ibm.com; alex.ben...@linaro.org;
> > li...@armlinux.org.uk; dar...@os.amperecomputing.com;
> > il...@os.amperecomputing.com; vis...@os.amperecomputing.com;
> > karl.heub...@oracle.com; miguel.l...@oracle.com; salil.me...@opnsrc.net;
> > zhukeqian ; wangxiongfeng (C)
> > ; wangyanan (Y) ;
> > jiakern...@gmail.com; maob...@loongson.cn; lixiang...@loongson.cn
> > Subject: Re: [PATCH RFC V2 04/37] arm/virt,target/arm: Machine init time
> > change common to vCPU {cold|hot}-plug
> > 
> > Hi Salil,
> > 
> > On 9/26/23 20:04, Salil Mehta wrote:  
> > > Refactor and introduce the common logic required during the  
> > initialization of  
> > > both cold and hot plugged vCPUs. Also initialize the *disabled* state of 
> > > the
> > > vCPUs which shall be used further during init phases of various other 
> > > components
> > > like GIC, PMU, ACPI etc as part of the virt machine initialization.
> > >
> > > KVM vCPUs corresponding to unplugged/yet-to-be-plugged QOM CPUs are kept 
> > > in
> > > powered-off state in the KVM Host and do not run the guest code. Plugged 
> > > vCPUs
> > > are also kept in powered-off state but vCPU threads exist and is kept 
> > > sleeping.
> > >
> > > TBD:
> > > For the cold booted vCPUs, this change also exists in the 
> > > arm_load_kernel()
> > > in boot.c but for the hotplugged CPUs this change should still remain 
> > > part of
> > > the pre-plug phase. We are duplicating the powering-off of the cold 
> > > booted CPUs.
> > > Shall we remove the duplicate change from boot.c?
> > >
> > > Co-developed-by: Salil Mehta 
> > > Signed-off-by: Salil Mehta 
> > > Co-developed-by: Keqian Zhu 
> > > Signed-off-by: Keqian Zhu 
> > > Reported-by: Gavin Shan 
> > > [GS: pointed the assertion due to wrong range check]
> > > Signed-off-by: Salil Mehta 
> > > ---
> > >   hw/arm/virt.c  | 149 -
> > >   target/arm/cpu.c   |   7 +++
> > >   target/arm/cpu64.c |  14 +
> > >   3 files changed, 156 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > > index 0eb6bf5a18..3668ad27ec 100644
> > > --- a/hw/arm/virt.c
> > > +++ b/hw/arm/virt.c
> > > @@ -221,6 +221,7 @@ static const char *valid_cpus[] = {
> > >   ARM_CPU_TYPE_NAME("max"),
> > >   };
> > >
> > > +static CPUArchId *virt_find_cpu_slot(MachineState *ms, int vcpuid);
> > >   static int virt_get_socket_id(const MachineState *ms, int cpu_index);
> > >   static int virt_get_cluster_id(const MachineState *ms, int cpu_index);
> > >   static int virt_get_core_id(const MachineState *ms, int cpu_index);
> > > @@ -2154,6 +2155,14 @@ static void machvirt_init(MachineState *machine)
> > >   exit(1);
> > >   }
> > >
> > > +finalize_gic_version(vms);
> > > +if (tcg_enabled() || hvf_enabled() || qtest_enabled() ||
> > > +(vms->gic_version < VIRT_GIC_VERSION_3)) {
> > > +machine->smp.max_cpus = smp_cpus;
> > > +mc->has_hotpluggable_cpus = false;
> > > +warn_report("cpu hotplug feature has been disabled");
> > > +}
> > > +  
> > 
> > Comments needed here to explain why @mc->has_hotpluggable_cpus is set to 
> > false.
> > I guess it's something related to TODO list, mentioned in the cover letter. 
> >  
> 
> 
> I can put a comment explaining the checks as to why feature has been disabled.
> BTW, isn't code self-explanatory here?

Would be good to gate that warn_report() on whether any attempt to enable
CPU hotplug has been made if (max_cpus > smp for example).
Right now it's noise on a lot of pre existing configurations.




Re: [PATCH] hw/arm/musicpal: Convert to qemu_add_kbd_event_handler()

2024-01-16 Thread Philippe Mathieu-Daudé

Cc'ing Gerd & Marc-André for UI.

On 3/11/23 19:27, Peter Maydell wrote:

Convert the musicpal key input device to use
qemu_add_kbd_event_handler().  This lets us simplify it because we no
longer need to track whether we're in the middle of a PS/2 multibyte
key sequence.

In the conversion we move the keyboard handler registration from init
to realize, because devices shouldn't disturb the state of the
simulation by doing things like registering input handlers until
they're realized, so that device objects can be introspected
safely.

The behaviour where key-repeat is permitted for the arrow-keys only
is intentional (added in commit 7c6ce4baedfcd0c), so we retain it,
and add a comment to that effect.

This is a migration compatibility break for musicpal.

Signed-off-by: Peter Maydell 
---
  hw/arm/musicpal.c | 131 +-
  1 file changed, 61 insertions(+), 70 deletions(-)

diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
index 9703bfb97fb..e8c1250ab04 100644
--- a/hw/arm/musicpal.c
+++ b/hw/arm/musicpal.c
@@ -1043,20 +1043,6 @@ static const TypeInfo musicpal_gpio_info = {
  };
  
  /* Keyboard codes & masks */

-#define KEY_RELEASED0x80
-#define KEY_CODE0x7f
-
-#define KEYCODE_TAB 0x0f
-#define KEYCODE_ENTER   0x1c
-#define KEYCODE_F   0x21
-#define KEYCODE_M   0x32
-
-#define KEYCODE_EXTENDED0xe0
-#define KEYCODE_UP  0x48
-#define KEYCODE_DOWN0x50
-#define KEYCODE_LEFT0x4b
-#define KEYCODE_RIGHT   0x4d
-
  #define MP_KEY_WHEEL_VOL   (1 << 0)
  #define MP_KEY_WHEEL_VOL_INV   (1 << 1)
  #define MP_KEY_WHEEL_NAV   (1 << 2)
@@ -1074,67 +1060,66 @@ struct musicpal_key_state {
  SysBusDevice parent_obj;
  /*< public >*/
  
-uint32_t kbd_extended;

  uint32_t pressed_keys;
  qemu_irq out[8];
  };
  
-static void musicpal_key_event(void *opaque, int keycode)

+static void musicpal_key_event(DeviceState *dev, QemuConsole *src,
+   InputEvent *evt)
  {
-musicpal_key_state *s = opaque;
+musicpal_key_state *s = MUSICPAL_KEY(dev);
+InputKeyEvent *key = evt->u.key.data;
+int qcode = qemu_input_key_value_to_qcode(key->key);
  uint32_t event = 0;
  int i;
  
-if (keycode == KEYCODE_EXTENDED) {

-s->kbd_extended = 1;
-return;
+switch (qcode) {
+case Q_KEY_CODE_UP:
+event = MP_KEY_WHEEL_NAV | MP_KEY_WHEEL_NAV_INV;
+break;
+
+case Q_KEY_CODE_DOWN:
+event = MP_KEY_WHEEL_NAV;
+break;
+
+case Q_KEY_CODE_LEFT:
+event = MP_KEY_WHEEL_VOL | MP_KEY_WHEEL_VOL_INV;
+break;
+
+case Q_KEY_CODE_RIGHT:
+event = MP_KEY_WHEEL_VOL;
+break;
+
+case Q_KEY_CODE_F:
+event = MP_KEY_BTN_FAVORITS;
+break;
+
+case Q_KEY_CODE_TAB:
+event = MP_KEY_BTN_VOLUME;
+break;
+
+case Q_KEY_CODE_RET:
+event = MP_KEY_BTN_NAVIGATION;
+break;
+
+case Q_KEY_CODE_M:
+event = MP_KEY_BTN_MENU;
+break;
  }
  
-if (s->kbd_extended) {

-switch (keycode & KEY_CODE) {
-case KEYCODE_UP:
-event = MP_KEY_WHEEL_NAV | MP_KEY_WHEEL_NAV_INV;
-break;
-
-case KEYCODE_DOWN:
-event = MP_KEY_WHEEL_NAV;
-break;
-
-case KEYCODE_LEFT:
-event = MP_KEY_WHEEL_VOL | MP_KEY_WHEEL_VOL_INV;
-break;
-
-case KEYCODE_RIGHT:
-event = MP_KEY_WHEEL_VOL;
-break;
-}
-} else {
-switch (keycode & KEY_CODE) {
-case KEYCODE_F:
-event = MP_KEY_BTN_FAVORITS;
-break;
-
-case KEYCODE_TAB:
-event = MP_KEY_BTN_VOLUME;
-break;
-
-case KEYCODE_ENTER:
-event = MP_KEY_BTN_NAVIGATION;
-break;
-
-case KEYCODE_M:
-event = MP_KEY_BTN_MENU;
-break;
-}
-/* Do not repeat already pressed buttons */
-if (!(keycode & KEY_RELEASED) && (s->pressed_keys & event)) {
+/*
+ * We allow repeated wheel-events when the arrow keys are held down,
+ * but do not repeat already-pressed buttons for the other key inputs.
+ */
+if (!(event & (MP_KEY_WHEEL_NAV | MP_KEY_WHEEL_VOL))) {
+if (key->down && (s->pressed_keys & event)) {
  event = 0;
  }
  }
  
  if (event) {

  /* Raise GPIO pin first if repeating a key */
-if (!(keycode & KEY_RELEASED) && (s->pressed_keys & event)) {
+if (key->down && (s->pressed_keys & event)) {
  for (i = 0; i <= 7; i++) {
  if (event & (1 << i)) {
  qemu_set_irq(s->out[i], 1);
@@ -1143,17 +1128,15 @@ static void musicpal_key_event(void *opaque, int 
keycode)
  }
  for (i = 0; i <= 7; i++) {
  if (event & (1 << i)) {
- 

Re: [PATCH v2 04/19] qapi/schema: declare type for QAPISchemaObjectTypeMember.type

2024-01-16 Thread John Snow
On Mon, Jan 15, 2024 at 8:53 AM Markus Armbruster  wrote:
>
> John Snow  writes:
>
> > declare, but don't initialize the type of "type" to be QAPISchemaType -
>
> Declare
>
> > and allow the value to be initialized during check(). This creates a
> > form of delayed initialization for QAPISchemaType objects where the
> > static typing only represents the fully-realized object, which occurs
> > after check() has been called.
> >
> > This avoids the need for several "assert type is not None" statements
> > littered throughout the code by asserting it "will always be set."
>
> Uh, I find "will always be set" confusing.
>
> I think you mean "cannot be None".

Yuh

>
> But "be set" can also be read as "will have a value".

...yuh

>
> It actually doesn't exist until .check(), and once it exists, it cannot
> be None.
>
> > Note that the static typing information for this object will be
> > incorrect prior to check() being called. If this field is accessed
> > before it is initialized in check(), you'll be treated to an
> > AttributeError exception.
>
> We could now enter a philosophical debate on whether the static typing
> is actually incorrect.  Clearly v: T asserts that the value of @v is
> always a T, and the type checker proves this.  Does it also assert that
> @v exists?  The type checker doesn't care, and that's a feature.

*clutches TAPL and cries*

>
> Maybe start with the problem, and then develop the solution:
>
>   A QAPISchemaObjectTypeMember's type gets resolved only during .check().
>   We have QAPISchemaObjectTypeMember.__init__() initialize self.type =
>   None, and .check() assign the actual type.  Using .type before .check()
>   is wrong, and hopefully crashes due to the value being None.  Works.
>
>   However, it makes for awkward typing.  With .type:
>   Optional[QAPISchemaType], mypy is of course unable to see that it's None
>   before .check(), and a QAPISchemaType after.  To help it over the hump,
>   we'd have to assert self.type is not None before all the (valid) uses.
>   The assertion catches invalid uses, but only at run time; mypy can't
>   flag them.
>
>   Instead, declare .type in .__init__() as QAPISchemaType *without*
>   initializing it.  Using .type before .check() now certainly crashes,
>   which is an improvement.  Mypy still can't flag invalid uses, but that's
>   okay.
>

OK.

--js

> > Fixes stuff like this:
> >
> > qapi/schema.py:657: error: "None" has no attribute "alternate_qtype"  
> > [attr-defined]
> > qapi/schema.py:662: error: "None" has no attribute "describe"  
> > [attr-defined]
> >
> > Signed-off-by: John Snow 
> > ---
> >  scripts/qapi/schema.py | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
> > index e39ed972a80..48a51dcd188 100644
> > --- a/scripts/qapi/schema.py
> > +++ b/scripts/qapi/schema.py
> > @@ -794,7 +794,7 @@ def __init__(self, name, info, typ, optional, 
> > ifcond=None, features=None):
> >  assert isinstance(f, QAPISchemaFeature)
> >  f.set_defined_in(name)
> >  self._type_name = typ
> > -self.type = None
> > +self.type: QAPISchemaType  # set during check()
> >  self.optional = optional
> >  self.features = features or []
>




[PATCH] hw/elf_ops: Ignore loadable segments with zero size

2024-01-16 Thread Bin Meng
Some ELF files really do have segments of zero size, e.g.:

Program Headers:
  Type   Offset VirtAddr   PhysAddr
 FileSizMemSiz  Flags  Align
  RISCV_ATTRIBUT 0x25b8 0x 0x
 0x003e 0x  R  0x1
  LOAD   0x1000 0x8020 0x8020
 0x01d1 0x01d1  R E0x1000
  LOAD   0x11d1 0x802001d1 0x802001d1
 0x0e37 0x0e37  RW 0x1000
  LOAD   0x0120 0x 0x
 0x 0x 0x1000

The current logic does not check for this condition, resulting in
the incorrect assignment of 'lowaddr' as zero.

There is already a piece of codes inside the segment traversal loop
that checks for zero-sized loadable segments for not creating empty
ROM blobs. Let's move this check to the beginning of the loop to
cover both scenarios.

Signed-off-by: Bin Meng 
---

 include/hw/elf_ops.h | 75 +++-
 1 file changed, 39 insertions(+), 36 deletions(-)

diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h
index 0a5c258fe6..f014399b50 100644
--- a/include/hw/elf_ops.h
+++ b/include/hw/elf_ops.h
@@ -427,6 +427,16 @@ static ssize_t glue(load_elf, SZ)(const char *name, int fd,
 file_size = ph->p_filesz; /* Size of the allocated data */
 data_offset = ph->p_offset; /* Offset where the data is located */
 
+/*
+ * Some ELF files really do have segments of zero size;
+ * just ignore them rather than trying to set the wrong addr,
+ * or create empty ROM blobs, because the zero-length blob can
+ * falsely trigger the overlapping-ROM-blobs check.
+ */
+if (mem_size == 0) {
+continue;
+}
+
 if (file_size > 0) {
 if (g_mapped_file_get_length(mapped_file) <
 file_size + data_offset) {
@@ -530,45 +540,38 @@ static ssize_t glue(load_elf, SZ)(const char *name, int 
fd,
 *pentry = ehdr.e_entry - ph->p_vaddr + ph->p_paddr;
 }
 
-/* Some ELF files really do have segments of zero size;
- * just ignore them rather than trying to create empty
- * ROM blobs, because the zero-length blob can falsely
- * trigger the overlapping-ROM-blobs check.
- */
-if (mem_size != 0) {
-if (load_rom) {
-g_autofree char *label =
-g_strdup_printf("%s ELF program header segment %d",
-name, i);
-
-/*
- * rom_add_elf_program() takes its own reference to
- * 'mapped_file'.
- */
-rom_add_elf_program(label, mapped_file, data, file_size,
-mem_size, addr, as);
-} else {
-MemTxResult res;
-
-res = address_space_write(as ? as : _space_memory,
-  addr, MEMTXATTRS_UNSPECIFIED,
-  data, file_size);
+if (load_rom) {
+g_autofree char *label =
+g_strdup_printf("%s ELF program header segment %d",
+name, i);
+
+/*
+ * rom_add_elf_program() takes its own reference to
+ * 'mapped_file'.
+ */
+rom_add_elf_program(label, mapped_file, data, file_size,
+mem_size, addr, as);
+} else {
+MemTxResult res;
+
+res = address_space_write(as ? as : _space_memory,
+  addr, MEMTXATTRS_UNSPECIFIED,
+  data, file_size);
+if (res != MEMTX_OK) {
+goto fail;
+}
+/*
+ * We need to zero'ify the space that is not copied
+ * from file
+ */
+if (file_size < mem_size) {
+res = address_space_set(as ? as : _space_memory,
+addr + file_size, 0,
+mem_size - file_size,
+MEMTXATTRS_UNSPECIFIED);
 if (res != MEMTX_OK) {
 goto fail;
 }
-/*
- * We need to zero'ify the space that is not copied
- * from file
-

Re: [PATCH v2 03/19] qapi: create QAPISchemaDefinition

2024-01-16 Thread John Snow
On Tue, Jan 16, 2024 at 2:22 AM Markus Armbruster  wrote:
>
> John Snow  writes:
>
> > On Mon, Jan 15, 2024 at 8:16 AM Markus Armbruster  wrote:
> >>
> >> John Snow  writes:
> >>
> >> > Include entities don't have names, but we generally expect "entities" to
> >> > have names. Reclassify all entities with names as *definitions*, leaving
> >> > the nameless include entities as QAPISchemaEntity instances.
> >> >
> >> > This is primarily to help simplify typing around expectations of what
> >> > callers expect for properties of an "entity".
> >> >
> >> > Suggested-by: Markus Armbruster 
> >> > Signed-off-by: John Snow 
> >> > ---
> >> >  scripts/qapi/schema.py | 117 -
> >> >  1 file changed, 70 insertions(+), 47 deletions(-)
> >> >
> >> > diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
> >> > index b7830672e57..e39ed972a80 100644
> >> > --- a/scripts/qapi/schema.py
> >> > +++ b/scripts/qapi/schema.py
> >> > @@ -55,14 +55,14 @@ def is_present(self):
> >> >
> >> >
> >> >  class QAPISchemaEntity:
> >> > -meta: Optional[str] = None
> >> > +"""
> >> > +QAPISchemaEntity represents all schema entities.
> >> >
> >> > -def __init__(self, name: str, info, doc, ifcond=None, 
> >> > features=None):
> >> > -assert name is None or isinstance(name, str)
> >> > -for f in features or []:
> >> > -assert isinstance(f, QAPISchemaFeature)
> >> > -f.set_defined_in(name)
> >> > -self.name = name
> >> > +Notably, this includes both named and un-named entities, such as
> >> > +include directives. Entities with names are represented by the
> >> > +more specific sub-class `QAPISchemaDefinition` instead.
> >> > +"""
> >>
> >> Hmm.  What about:
> >>
> >>"""
> >>A schema entity.
> >>
> >>This is either a directive, such as include, or a definition.
> >>The latter use sub-class `QAPISchemaDefinition`.
>
> Or is it "uses"?  Not a native speaker...

American English: collective nouns are treated as singular ("Congress
has passed a law")
British English: collective nouns are treated as plural ("Parliament
have enacted a new regulation")

>
> >>"""
> >>
> >
> > Sure. Key point was just the cookie crumb to the sub-class.
> >
> >> > +def __init__(self, info):
> >> >  self._module = None
> >> >  # For explicitly defined entities, info points to the (explicit)
> >> >  # definition.  For builtins (and their arrays), info is None.
> >> > @@ -70,14 +70,50 @@ def __init__(self, name: str, info, doc, 
> >> > ifcond=None, features=None):
> >> >  # triggered the implicit definition (there may be more than one
> >> >  # such place).
> >> >  self.info = info
> >> > +self._checked = False
> >> > +
> >> > +def __repr__(self):
> >> > +return "<%s at 0x%x>" % (type(self).__name__, id(self))
> >> > +
> >> > +def check(self, schema):
> >> > +# pylint: disable=unused-argument
> >> > +self._checked = True
> >> > +
> >> > +def connect_doc(self, doc=None):
> >> > +pass
> >> > +
> >> > +def check_doc(self):
> >> > +pass
> >> > +
> >> > +def _set_module(self, schema, info):
> >> > +assert self._checked
> >> > +fname = info.fname if info else 
> >> > QAPISchemaModule.BUILTIN_MODULE_NAME
> >> > +self._module = schema.module_by_fname(fname)
> >> > +self._module.add_entity(self)
> >> > +
> >> > +def set_module(self, schema):
> >> > +self._set_module(schema, self.info)
> >> > +
> >> > +def visit(self, visitor):
> >> > +# pylint: disable=unused-argument
> >> > +assert self._checked
> >> > +
> >> > +
> >> > +class QAPISchemaDefinition(QAPISchemaEntity):
> >> > +meta: Optional[str] = None
> >> > +
> >> > +def __init__(self, name: str, info, doc, ifcond=None, 
> >> > features=None):
> >> > +assert isinstance(name, str)
> >> > +super().__init__(info)
> >> > +for f in features or []:
> >> > +assert isinstance(f, QAPISchemaFeature)
> >> > +f.set_defined_in(name)
> >> > +self.name = name
> >> >  self.doc = doc
> >> >  self._ifcond = ifcond or QAPISchemaIfCond()
> >> >  self.features = features or []
> >> > -self._checked = False
> >> >
> >> >  def __repr__(self):
> >> > -if self.name is None:
> >> > -return "<%s at 0x%x>" % (type(self).__name__, id(self))
> >> >  return "<%s:%s at 0x%x>" % (type(self).__name__, self.name,
> >> >  id(self))
> >> >
> >> > @@ -102,15 +138,6 @@ def check_doc(self):
> >>def check(self, schema):
> >># pylint: disable=unused-argument
> >>assert not self._checked
> >>seen = {}
> >>for f in self.features:
> >>f.check_clash(self.info, seen)
> >>self._checked = 

  1   2   3   >