Re: CXL volatile memory is not listed

2023-08-17 Thread Maverickk 78
Hi Jonathan,

The use case of a CXL switch will always need some sort of management
agent + FM configuring the available CXL memory connected to it.

In most cases it would be a BMC controller configuring MLDs/MHDs to
hosts, and in very rare scenarios it may be one of the hosts interacting
with the FM firmware inside the switch, which would do the trick.

Another use case is a static, hard-coded mapping between CXL memory and
a host in a built-in CXL switch.

There is no scenario where the BIOS of one of the hosts can assign the
selected CXL memory to itself.


Is my understanding correct?



On Fri, 11 Aug 2023 at 19:25, Jonathan Cameron
 wrote:
>
> On Fri, 11 Aug 2023 08:04:26 +0530
> Maverickk 78  wrote:
>
> > Jonathan,
> >
> > > More generally for the flow that would bring the memory up as system ram
> > > you would typically need the bios to have done the CXL enumeration or
> > > a bunch of scripts in the kernel to have done it.  In general it can't
> > > be fully automated, because there are policy decisions to make on things 
> > > like
> > > interleaving.
> >
> > BIOS CXL enumeration? is CEDT not enough? or BIOS further needs to
> > create an entry
> > in the e820 table?
> On intel platforms 'maybe' :)  I know how it works on those that just
> use the nice standard EFI tables - less familiar with the e820 stuff :)
>
> CEDT says where to find the various bits of system-related CXL stuff.
> Nothing in there on the configuration that should be used such as interleaving
> as that depends on what the administrator wants. Or on what the BIOS has
> decided the users should have.
>
> >
> > >
> > > I'm not aware of any open source BIOSs that do it yet.  So you have
> > > to rely on the same kernel paths as for persistent memory - manual 
> > > configuration
> > > etc in the kernel.
> > >
> > Manual works with "cxl create-region"  :)
> Great.
>
> Jonathan
>
> >
> > On Thu, 10 Aug 2023 at 16:05, Jonathan Cameron
> >  wrote:
> > >
> > > On Wed, 9 Aug 2023 04:21:47 +0530
> > > Maverickk 78  wrote:
> > >
> > > > Hello,
> > > >
> > > > I am running qemu-system-x86_64
> > > >
> > > > qemu-system-x86_64 --version
> > > > QEMU emulator version 8.0.92 (v8.1.0-rc2-80-g0450cf0897)
> > > >
> > > +Cc linux-cxl as the answer is more todo with linux than qemu.
> > >
> > > > qemu-system-x86_64 \
> > > > -m 2G,slots=4,maxmem=4G \
> > > > -smp 4 \
> > > > -machine type=q35,accel=kvm,cxl=on \
> > > > -enable-kvm \
> > > > -nographic \
> > > > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> > > > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
> > > > -object 
> > > > memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=1G,share=true \
> > > > -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
> > > > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
> > >
> > > There are some problems upstream at the moment (probably not cxl related 
> > > but
> > > I'm digging). So today I can't boot an x86 machine. (goody)
> > >
> > >
> > > More generally for the flow that would bring the memory up as system ram
> > > you would typically need the bios to have done the CXL enumeration or
> > > a bunch of scripts in the kernel to have done it.  In general it can't
> > > be fully automated, because there are policy decisions to make on things 
> > > like
> > > interleaving.
> > >
> > > I'm not aware of any open source BIOSs that do it yet.  So you have
> > > to rely on the same kernel paths as for persistent memory - manual 
> > > configuration
> > > etc in the kernel.
> > >
> > > There is support in ndctl for those enabling flows, so I'd look there
> > > for more information
> > >
> > > Jonathan
> > >
> > >
> > > >
> > > >
> > > > I was expecting the CXL memory to be listed in "System Ram", the lsmem
> > > > shows only 2G memory which is System RAM, it's not listing the CXL
> > > > memory.
> > > >
> > > > Do I need to pass any particular parameter in the kernel command line?
> > > >
> > > > Is there any documentation available? I followed the inputs provided in
> > > >
> > > > https://lore.kernel.org/linux-mm/y+csoehvlkudn...@kroah.com/T/
> > > >
> > > > Is there any documentation/blog listed?
> > >
>



Re: [PATCH] HDA codec: Fix wanted_r/w position overflow

2023-08-17 Thread Volker Rümelin

Hi,


From: zeroway 

When the duration (now - buft_start) becomes large enough, the
multiplication hda_bytes_per_second(st) * (now - buft_start) overflows.
Instead of calculating wanted_r/wpos from the start time to the current
time, calculate the per-timer-tick delta in wanted_r/wpos_delta and
accumulate it into wanted_r/wpos to avoid the overflow.


You could avoid the multiplication overflow with the following code:

#include "qemu/host-utils.h"

    int64_t wanted_rpos = muldiv64(now - buft_start, hda_bytes_per_second(st),
                                   NANOSECONDS_PER_SECOND);

and

    int64_t wanted_wpos = muldiv64(now - buft_start, hda_bytes_per_second(st),
                                   NANOSECONDS_PER_SECOND);

This would be a less intrusive change. With your code, wanted_pos will
grow more slowly than with the original code, because you sum up
truncated results of the division by NANOSECONDS_PER_SECOND that are
additionally rounded down to the next multiple of 4.
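
To make the rounding point concrete, here is a stand-alone sketch (not part
of the patch; the byte rate and tick length below are made-up example
values) showing how summing per-tick truncated deltas falls behind a single
division over the whole elapsed time, which is what muldiv64() computes
with a 128-bit intermediate product:

/*
 * Stand-alone sketch of the drift described above: summing per-tick
 * truncated deltas falls behind a single division over the whole
 * elapsed time. The values are arbitrary examples, not taken from
 * hw/audio/hda-codec.c.
 */
#include <inttypes.h>
#include <stdio.h>

#define NANOSECONDS_PER_SECOND 1000000000LL

int main(void)
{
    const int64_t bytes_per_second = 176400; /* 44.1 kHz, 16-bit stereo */
    const int64_t tick_ns = 3333333;         /* ~3.3 ms timer tick */
    int64_t summed = 0, now = 0;

    for (int i = 0; i < 1000; i++) {
        now += tick_ns;
        /* per-tick delta, truncated by the division on every tick */
        summed += bytes_per_second * tick_ns / NANOSECONDS_PER_SECOND;
    }

    /* one division over the whole interval */
    int64_t direct = bytes_per_second * now / NANOSECONDS_PER_SECOND;

    printf("summed = %" PRId64 ", direct = %" PRId64 "\n", summed, direct);
    return 0;
}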


With best regards,
Volker



Signed-off-by: zeroway 
---
  hw/audio/hda-codec.c | 24 
  1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/hw/audio/hda-codec.c b/hw/audio/hda-codec.c
index c51d8ba617..747188221a 100644
--- a/hw/audio/hda-codec.c
+++ b/hw/audio/hda-codec.c
@@ -169,6 +169,8 @@ struct HDAAudioStream {
  uint8_t buf[8192]; /* size must be power of two */
  int64_t rpos;
  int64_t wpos;
+int64_t wanted_rpos;
+int64_t wanted_wpos;
  QEMUTimer *buft;
  int64_t buft_start;
  };
@@ -226,16 +228,18 @@ static void hda_audio_input_timer(void *opaque)
  int64_t wpos = st->wpos;
  int64_t rpos = st->rpos;
  
-int64_t wanted_rpos = hda_bytes_per_second(st) * (now - buft_start)

+int64_t wanted_rpos_delta = hda_bytes_per_second(st) * (now - buft_start)
/ NANOSECONDS_PER_SECOND;
-wanted_rpos &= -4; /* IMPORTANT! clip to frames */
+st->wanted_rpos += wanted_rpos_delta;
+st->wanted_rpos &= -4; /* IMPORTANT! clip to frames */
  
-if (wanted_rpos <= rpos) {

+st->buft_start = now;
+if (st->wanted_rpos <= rpos) {
  /* we already transmitted the data */
  goto out_timer;
  }
  
-int64_t to_transfer = MIN(wpos - rpos, wanted_rpos - rpos);

+int64_t to_transfer = MIN(wpos - rpos, st->wanted_rpos - rpos);
  while (to_transfer) {
  uint32_t start = (rpos & B_MASK);
  uint32_t chunk = MIN(B_SIZE - start, to_transfer);
@@ -290,16 +294,18 @@ static void hda_audio_output_timer(void *opaque)
  int64_t wpos = st->wpos;
  int64_t rpos = st->rpos;
  
-int64_t wanted_wpos = hda_bytes_per_second(st) * (now - buft_start)

+int64_t wanted_wpos_delta = hda_bytes_per_second(st) * (now - buft_start)
/ NANOSECONDS_PER_SECOND;
-wanted_wpos &= -4; /* IMPORTANT! clip to frames */
+st->wanted_wpos += wanted_wpos_delta;
+st->wanted_wpos &= -4; /* IMPORTANT! clip to frames */
  
-if (wanted_wpos <= wpos) {

+st->buft_start = now;
+if (st->wanted_wpos <= wpos) {
  /* we already received the data */
  goto out_timer;
  }
  
-int64_t to_transfer = MIN(B_SIZE - (wpos - rpos), wanted_wpos - wpos);

+int64_t to_transfer = MIN(B_SIZE - (wpos - rpos), st->wanted_wpos - wpos);
  while (to_transfer) {
  uint32_t start = (wpos & B_MASK);
  uint32_t chunk = MIN(B_SIZE - start, to_transfer);
@@ -420,6 +426,8 @@ static void hda_audio_set_running(HDAAudioStream *st, bool 
running)
  int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
  st->rpos = 0;
  st->wpos = 0;
+st->wanted_rpos = 0;
+st->wanted_wpos = 0;
  st->buft_start = now;
  timer_mod_anticipate_ns(st->buft, now + HDA_TIMER_TICKS);
  } else {





Re: CXL volatile memory is not listed

2023-08-17 Thread Maverickk 78
Hi Fan

Awesome, thanks for the info!

On Fri, 11 Aug 2023 at 22:19, Fan Ni  wrote:
>
> On Fri, Aug 11, 2023 at 07:52:25AM +0530, Maverickk 78 wrote:
> > Thanks Fan,
> >
> > cxl create-region works like a charm :)
> >
> > Since this gets listed as "System Ram(kmem)", I guess the kernel
> > treats it as regular memory and
> > allocates it to the applications when needed?
> > or is there an extra effort needed to make it available for
> > applications on the host?
> >
>
> Yes. Once it is onlined, you can use it as regular memory.
> CXL memory will serve as a zero-CPU memory-only NUMA node.
> You can check it with numactl -H.
>
> To use the cxl memory with an app, you can use
> numactl --membind=numa_id app_name
> #numa_id is the dedicated numa node where cxl memory sits.
>
> One thing to note: KVM will not work correctly with QEMU emulation when
> you try to use CXL memory for an application, so do not enable KVM.
>
> Fan
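
For reference, the end-to-end flow described above might look roughly like
this (the device, decoder and NUMA node names are illustrative, and the
exact create-region arguments depend on the topology and on the cxl/ndctl
version in use; see "cxl create-region --help"):

# create a RAM region from the volatile type-3 memdev
cxl create-region -d decoder0.0 -m mem0
# confirm that a new zero-CPU NUMA node appeared
numactl -H
# bind an application to that node (node id 1 is just an example)
numactl --membind=1 ./app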
>
> > On Thu, 10 Aug 2023 at 22:03, Fan Ni  wrote:
> > >
> > > On Wed, Aug 09, 2023 at 04:21:47AM +0530, Maverickk 78 wrote:
> > > > Hello,
> > > >
> > > > I am running qemu-system-x86_64
> > > >
> > > > qemu-system-x86_64 --version
> > > > QEMU emulator version 8.0.92 (v8.1.0-rc2-80-g0450cf0897)
> > > >
> > > > qemu-system-x86_64 \
> > > > -m 2G,slots=4,maxmem=4G \
> > > > -smp 4 \
> > > > -machine type=q35,accel=kvm,cxl=on \
> > > > -enable-kvm \
> > > > -nographic \
> > > > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> > > > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
> > > > -object 
> > > > memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=1G,share=true \
> > > > -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
> > > > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
> > > >
> > > >
> > > > I was expecting the CXL memory to be listed in "System Ram", the lsmem
> > > > shows only 2G memory which is System RAM, it's not listing the CXL
> > > > memory.
> > > >
> > > > Do I need to pass any particular parameter in the kernel command line?
> > > >
> > > > Is there any documentation available? I followed the inputs provided in
> > > >
> > > > https://lore.kernel.org/linux-mm/y+csoehvlkudn...@kroah.com/T/
> > > >
> > > > Is there any documentation/blog listed?
> > >
> > > If I remember it correctly, for volatile cxl memory, we need to create a
> > > region and then it will be discovered as system memory and shows up.
> > >
> > > Try to create a region with "cxl create-region".
> > >
> > > Fan
> > > >



Re: [PATCH 4/4] replay: simple auto-snapshot mode for record

2023-08-17 Thread Pavel Dovgalyuk

On 14.08.2023 19:31, Nicholas Piggin wrote:

record makes an initial snapshot when the machine is created, to enable
reverse-debugging. Often the issue being debugged appears near the end of
the trace, so it is important for performance to keep snapshots close to
the end.

This implements a periodic snapshot mode that keeps a rolling set of
recent snapshots.

Arguably this should be done by the debugger or a program that talks to
QMP, but for setting up simple scenarios and tests, it is convenient to
have this feature.
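
For illustration, a record invocation using the new options could look
roughly like this; the option names come from this patch, while the rest of
the command line (file names included) is just an assumed example following
docs/system/replay.rst:

qemu-system-x86_64 \
  -icount shift=auto,rr=record,rrfile=replay.bin,rrsnapshot=init,\
rrsnapmode=periodic,rrsnaptime=60,rrsnapcount=5 \
  -drive file=empty.qcow2,if=none,id=rr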

Signed-off-by: Nicholas Piggin 
---
  docs/system/replay.rst   |  5 
  include/sysemu/replay.h  | 11 
  qemu-options.hx  |  9 +--
  replay/replay-snapshot.c | 57 
  replay/replay.c  | 25 ++
  softmmu/vl.c |  9 +++
  6 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/docs/system/replay.rst b/docs/system/replay.rst
index 3105327423..bef9ea4171 100644
--- a/docs/system/replay.rst
+++ b/docs/system/replay.rst
@@ -156,6 +156,11 @@ for storing VM snapshots. Here is the example of the 
command line for this:
  ``empty.qcow2`` drive does not connected to any virtual block device and used
  for VM snapshots only.
  
+``rrsnapmode`` can be used to select just an initial snapshot or periodic

+snapshots, with ``rrsnapcount`` specifying the number of periodic snapshots
+to maintain, and ``rrsnaptime`` the amount of run time in seconds between
+periodic snapshots.
+
  .. _network-label:
  
  Network devices

diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 08aae5869f..a83e54afc6 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -45,6 +45,17 @@ typedef enum ReplayCheckpoint ReplayCheckpoint;
  
  typedef struct ReplayNetState ReplayNetState;
  
+enum ReplaySnapshotMode {

+REPLAY_SNAPSHOT_MODE_INITIAL,
+REPLAY_SNAPSHOT_MODE_PERIODIC,
+};
+typedef enum ReplaySnapshotMode ReplaySnapshotMode;
+
+extern ReplaySnapshotMode replay_snapshot_mode;
+
+extern uint64_t replay_snapshot_periodic_delay;
+extern int replay_snapshot_periodic_nr_keep;
+


It seems that none of these have to be exported;
you can add them to the internal replay header.


  /* Name of the initial VM snapshot */
  extern char *replay_snapshot
  
diff --git a/qemu-options.hx b/qemu-options.hx

index 29b98c3d4c..0dce93eeab 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4530,13 +4530,13 @@ SRST
  ERST
  
  DEF("icount", HAS_ARG, QEMU_OPTION_icount, \

-"-icount 
[shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=[,rrsnapshot=]]\n"
 \
+"-icount 
[shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=[,rrsnapshot=][,rrsnapmode=initial|periodic][,rrsnaptime=secs][,rrsnapcount=N]\n"
 \
  "enable virtual instruction counter with 2^N clock ticks 
per\n" \
  "instruction, enable aligning the host and virtual 
clocks\n" \
  "or disable real time cpu sleeping, and optionally 
enable\n" \
  "record-and-replay mode\n", QEMU_ARCH_ALL)
  SRST
-``-icount 
[shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=filename[,rrsnapshot=snapshot]]``
+``-icount 
[shift=N|auto][,align=on|off][,sleep=on|off][,rr=record|replay,rrfile=filename[,rrsnapshot=snapshot][,rrsnapmode=initial|periodic][,rrsnaptime=secs][,rrsnapcount=N]]``
  Enable virtual instruction counter. The virtual cpu will execute one
  instruction every 2^N ns of virtual time. If ``auto`` is specified
  then the virtual cpu speed will be automatically adjusted to keep
@@ -4578,6 +4578,11 @@ SRST
  name. In record mode, a new VM snapshot with the given name is created
  at the start of execution recording. In replay mode this option
  specifies the snapshot name used to load the initial VM state.
+``rrsnapmode=periodic`` will additionally cause a periodic snapshot to
+be created after ``rrsnaptime=secs`` seconds of real runtime. The last
+``rrsnapcount=N`` periodic snapshots (not including the initial) will
+be kept (0 for infinite). Periodic snapshots are useful to speed
+reverse debugging operations near the end of the recorded trace.
  ERST
  
  DEF("watchdog-action", HAS_ARG, QEMU_OPTION_watchdog_action, \

diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
index 10a7cf7992..38eac61c43 100644
--- a/replay/replay-snapshot.c
+++ b/replay/replay-snapshot.c
@@ -69,6 +69,53 @@ void replay_vmstate_register(void)
  vmstate_register(NULL, 0, &vmstate_replay, &replay_state);
  }
  
+static QEMUTimer *replay_snapshot_timer;

+static int replay_snapshot_count;
+
+static void replay_snapshot_timer_cb(void *opaque)
+{
+Error *err = NULL;
+char *name;
+
+if (!replay_can_snapshot()) {
+/* Try again soon */
+timer_mod(replay_snapshot_timer,
+  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+  

Re: [PATCH 3/4] replay: allow runstate shutdown->running when replaying trace

2023-08-17 Thread Pavel Dovgalyuk

Acked-by: Pavel Dovgalyuk 

On 14.08.2023 19:31, Nicholas Piggin wrote:

When replaying a trace, it is possible to go from shutdown to
running with a reverse-debugging step. This can be useful if the
problem being debugged triggers a reset or shutdown.

Signed-off-by: Nicholas Piggin 
---
  include/sysemu/runstate.h |  1 +
  replay/replay.c   |  2 ++
  softmmu/runstate.c| 19 +++
  3 files changed, 22 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 7beb29c2e2..85a1167ccb 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -9,6 +9,7 @@ void runstate_set(RunState new_state);
  RunState runstate_get(void);
  bool runstate_is_running(void);
  bool runstate_needs_reset(void);
+void runstate_replay_enable(void);
  
  typedef void VMChangeStateHandler(void *opaque, bool running, RunState state);
  
diff --git a/replay/replay.c b/replay/replay.c

index 0f7d766efe..e64f71209a 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -272,6 +272,8 @@ static void replay_enable(const char *fname, int mode)
  /* go to the beginning */
  fseek(replay_file, HEADER_SIZE, SEEK_SET);
  replay_fetch_data_kind();
+
+runstate_replay_enable();
  }
  
  replay_init_events();

diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index f3bd862818..9fd3e57485 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -174,6 +174,12 @@ static const RunStateTransition runstate_transitions_def[] 
= {
  { RUN_STATE__MAX, RUN_STATE__MAX },
  };
  
+static const RunStateTransition replay_runstate_transitions_def[] = {

+{ RUN_STATE_SHUTDOWN, RUN_STATE_RUNNING},
+
+{ RUN_STATE__MAX, RUN_STATE__MAX },
+};
+
  static bool runstate_valid_transitions[RUN_STATE__MAX][RUN_STATE__MAX];
  
  bool runstate_check(RunState state)

@@ -181,6 +187,19 @@ bool runstate_check(RunState state)
  return current_run_state == state;
  }
  
+void runstate_replay_enable(void)

+{
+const RunStateTransition *p;
+
+assert(replay_mode == REPLAY_MODE_PLAY);
+
+for (p = &replay_runstate_transitions_def[0]; p->from != RUN_STATE__MAX;
+ p++) {
+runstate_valid_transitions[p->from][p->to] = true;
+}
+
+}
+
  static void runstate_init(void)
  {
  const RunStateTransition *p;





Re: [PATCH 2/4] tests/avocado: replay_linux.py add replay-dump.py test

2023-08-17 Thread Pavel Dovgalyuk

On 14.08.2023 19:31, Nicholas Piggin wrote:

This runs replay-dump.py after recording a trace, and fails the test if
the script fails.

replay-dump.py is modified to exit with non-zero if an error is
encountered while parsing.


I would like to have a separate test for replay-dump, because the
replay-linux tests are very heavy to replay, and knowing the exact
reason for a failure in advance would be more convenient.

What do you think of splitting the test?



Signed-off-by: Nicholas Piggin 
---
It's possible this could introduce failures in the existing test if an
unimplemented event gets recorded. I would make a new test for this, but
it takes quite a while to record a long enough trace that includes some
block and net events to exercise the script.

Thanks,
Nick

  scripts/replay-dump.py|  6 --
  tests/avocado/replay_linux.py | 16 +++-
  2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index 937ae19ff1..8f4715632a 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -21,6 +21,7 @@
  import argparse
  import struct
  import os
+import sys
  from collections import namedtuple
  
  # This mirrors some of the global replay state which some of the

@@ -97,7 +98,7 @@ def call_decode(table, index, dumpfile):
  print("Could not decode index: %d" % (index))
  print("Entry is: %s" % (decoder))
  print("Decode Table is:\n%s" % (table))
-return False
+sys.exit(1)
  else:
  return decoder.fn(decoder.eid, decoder.name, dumpfile)
  
@@ -118,7 +119,7 @@ def print_event(eid, name, string=None, event_count=None):

  def decode_unimp(eid, name, _unused_dumpfile):
  "Unimplimented decoder, will trigger exit"
  print("%s not handled - will now stop" % (name))
-return False
+sys.exit(1)
  
  # Checkpoint decoder

  def swallow_async_qword(eid, name, dumpfile):
@@ -401,3 +402,4 @@ def decode_file(filename):
  if __name__ == "__main__":
  args = parse_arguments()
  decode_file(args.file)
+sys.exit(0)
diff --git a/tests/avocado/replay_linux.py b/tests/avocado/replay_linux.py
index a76dd507fc..12937ce0ec 100644
--- a/tests/avocado/replay_linux.py
+++ b/tests/avocado/replay_linux.py
@@ -11,6 +11,7 @@
  import os
  import logging
  import time
+import subprocess
  
  from avocado import skipUnless

  from avocado_qemu import BUILD_DIR
@@ -21,6 +22,11 @@
  from avocado.utils.path import find_command
  from avocado_qemu import LinuxTest
  
+from pathlib import Path

+
+self_dir = Path(__file__).parent
+src_dir = self_dir.parent.parent
+
  class ReplayLinux(LinuxTest):
  """
  Boots a Linux system, checking for a successful initialization
@@ -94,7 +100,7 @@ def launch_and_wait(self, record, args, shift):
  else:
  vm.event_wait('SHUTDOWN', self.timeout)
  vm.shutdown(True)
-logger.info('successfully fihished the replay')
+logger.info('successfully finished the replay')
  elapsed = time.time() - start_time
  logger.info('elapsed time %.2f sec' % elapsed)
  return elapsed
@@ -105,6 +111,14 @@ def run_rr(self, args=None, shift=7):
  logger = logging.getLogger('replay')
  logger.info('replay overhead {:.2%}'.format(t2 / t1 - 1))
  
+try:

+replay_path = os.path.join(self.workdir, 'replay.bin')
+subprocess.check_call(["./scripts/replay-dump.py",
+   "-f", replay_path],
+  cwd=src_dir, stdout=subprocess.DEVNULL)
+except subprocess.CalledProcessError:
+self.fail('replay-dump.py failed')
+
  @skipUnless(os.getenv('AVOCADO_TIMEOUT_EXPECTED'), 'Test might timeout')
  class ReplayLinuxX8664(ReplayLinux):
  """





Re: [PATCH 1/4] scripts/replay_dump.sh: Update to current rr record format

2023-08-17 Thread Pavel Dovgalyuk

On 14.08.2023 19:31, Nicholas Piggin wrote:

This thing seems to have fallen by the wayside. This gets it working with
the current format, although it does not quite implement all events.

Signed-off-by: Nicholas Piggin 


The code looks ok, therefore
Reviewed-by: Pavel Dovgalyuk 

However, there is one thing about the idea of the replay-dump script.
I think it will never be used for parsing older versions of the log.
Record/replay can only replay a log generated by the same
QEMU version. Any virtual hw behavior change or main loop
refactoring may break the replay.

That is why I think that supporting past replay log
formats is useless.


---
My python skills are not good. Any help on this or patch 2 is
appreciated.

Thanks,
Nick

  scripts/replay-dump.py | 107 ++---
  1 file changed, 101 insertions(+), 6 deletions(-)

diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
index 3ba97a6d30..937ae19ff1 100755
--- a/scripts/replay-dump.py
+++ b/scripts/replay-dump.py
@@ -20,6 +20,7 @@
  
  import argparse

  import struct
+import os
  from collections import namedtuple
  
  # This mirrors some of the global replay state which some of the

@@ -62,6 +63,10 @@ def read_byte(fin):
  "Read a single byte"
  return struct.unpack('>B', fin.read(1))[0]
  
+def read_bytes(fin, nr):

+"Read a nr bytes"
+return fin.read(nr)
+
  def read_event(fin):
  "Read a single byte event, but save some state"
  if replay_state.already_read:
@@ -122,12 +127,18 @@ def swallow_async_qword(eid, name, dumpfile):
  print("  %s(%d) @ %d" % (name, eid, step_id))
  return True
  
+def swallow_bytes(eid, name, dumpfile, nr):

+"Swallow nr bytes of data without looking at it"
+dumpfile.seek(nr, os.SEEK_CUR)
+return True
+
  async_decode_table = [ Decoder(0, "REPLAY_ASYNC_EVENT_BH", 
swallow_async_qword),
-   Decoder(1, "REPLAY_ASYNC_INPUT", decode_unimp),
-   Decoder(2, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp),
-   Decoder(3, "REPLAY_ASYNC_CHAR_READ", decode_unimp),
-   Decoder(4, "REPLAY_ASYNC_EVENT_BLOCK", decode_unimp),
-   Decoder(5, "REPLAY_ASYNC_EVENT_NET", decode_unimp),
+   Decoder(1, "REPLAY_ASYNC_BH_ONESHOT", decode_unimp),
+   Decoder(2, "REPLAY_ASYNC_INPUT", decode_unimp),
+   Decoder(3, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp),
+   Decoder(4, "REPLAY_ASYNC_CHAR_READ", decode_unimp),
+   Decoder(5, "REPLAY_ASYNC_EVENT_BLOCK", decode_unimp),
+   Decoder(6, "REPLAY_ASYNC_EVENT_NET", decode_unimp),
  ]
  # See replay_read_events/replay_read_event
  def decode_async(eid, name, dumpfile):
@@ -156,6 +167,13 @@ def decode_audio_out(eid, name, dumpfile):
  print_event(eid, name, "%d" % (audio_data))
  return True
  
+def decode_random(eid, name, dumpfile):

+ret = read_dword(dumpfile)
+size = read_dword(dumpfile)
+swallow_bytes(eid, name, dumpfile, size)
+print_event(eid, name, "%d %d" % (ret, size))
+return True
+
  def decode_checkpoint(eid, name, dumpfile):
  """Decode a checkpoint.
  
@@ -184,6 +202,38 @@ def decode_interrupt(eid, name, dumpfile):

  print_event(eid, name)
  return True
  
+def decode_exception(eid, name, dumpfile):

+print_event(eid, name)
+return True
+
+def decode_shutdown(eid, name, dumpfile):
+print_event(eid, name)
+return True
+
+def decode_end(eid, name, dumpfile):
+print_event(eid, name)
+return False
+
+def decode_char_write(eid, name, dumpfile):
+res = read_dword(dumpfile)
+offset = read_dword(dumpfile)
+print_event(eid, name)
+return True
+
+def decode_async_char_read(eid, name, dumpfile):
+char_id = read_byte(dumpfile)
+size = read_dword(dumpfile)
+print_event(eid, name, "device:%x chars:%s" % (char_id, 
read_bytes(dumpfile, size)))
+return True
+
+def decode_async_net(eid, name, dumpfile):
+net_id = read_byte(dumpfile)
+flags = read_dword(dumpfile)
+size = read_dword(dumpfile)
+swallow_bytes(eid, name, dumpfile, size)
+print_event(eid, name, "net:%x flags:%x bytes:%d" % (net_id, flags, size))
+return True
+
  def decode_clock(eid, name, dumpfile):
  clock_data = read_qword(dumpfile)
  print_event(eid, name, "0x%x" % (clock_data))
@@ -268,6 +318,48 @@ def decode_clock(eid, name, dumpfile):
Decoder(28, "EVENT_CP_RESET", decode_checkpoint),
  ]
  
+v12_event_table = [Decoder(0, "EVENT_INSTRUCTION", decode_instruction),

+  Decoder(1, "EVENT_INTERRUPT", decode_interrupt),
+  Decoder(2, "EVENT_EXCEPTION", decode_exception),
+  Decoder(3, "EVENT_ASYNC_BH", swallow_async_qword),
+  Decoder(4, "EVENT_ASYNC_BH_ONESHOT", swallow_async_qword),
+  Decoder(5, "EVENT_ASYNC_INPUT", 

[PATCH RESEND v5 13/26] hw/core/cpu: Return static value with gdb_arch_name()

2023-08-17 Thread Akihiko Odaki
All implementations of gdb_arch_name() return dynamic duplicates of
static strings. It's also unlikely that there will be an implementation
of gdb_arch_name() that returns a truly dynamic value, due to the nature
of the function returning well-known identifiers. Qualify the value
returned by gdb_arch_name() with const and make all of its
implementations return static strings.

Signed-off-by: Akihiko Odaki 
---
 include/hw/core/cpu.h  | 2 +-
 target/ppc/internal.h  | 2 +-
 gdbstub/gdbstub.c  | 4 +---
 target/arm/cpu.c   | 6 +++---
 target/arm/cpu64.c | 4 ++--
 target/i386/cpu.c  | 6 +++---
 target/loongarch/cpu.c | 4 ++--
 target/ppc/gdbstub.c   | 6 +++---
 target/riscv/cpu.c | 6 +++---
 target/s390x/cpu.c | 4 ++--
 target/tricore/cpu.c   | 4 ++--
 11 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 84219c1885..09f1aca624 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -165,7 +165,7 @@ struct CPUClass {
 vaddr (*gdb_adjust_breakpoint)(CPUState *cpu, vaddr addr);
 
 const GDBFeature *gdb_core_feature;
-gchar * (*gdb_arch_name)(CPUState *cpu);
+const gchar * (*gdb_arch_name)(CPUState *cpu);
 const char * (*gdb_get_dynamic_xml)(CPUState *cpu, const char *xmlname);
 
 void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 57acb3212c..974b37aa60 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -221,7 +221,7 @@ void destroy_ppc_opcodes(PowerPCCPU *cpu);
 
 /* gdbstub.c */
 void ppc_gdb_init(CPUState *cs, PowerPCCPUClass *ppc);
-gchar *ppc_gdb_arch_name(CPUState *cs);
+const gchar *ppc_gdb_arch_name(CPUState *cs);
 
 /**
  * prot_for_access_type:
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index ee6b8b98c8..5656a44970 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -378,11 +378,9 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 ""
 "");
 if (cc->gdb_arch_name) {
-gchar *arch = cc->gdb_arch_name(cpu);
 pstrcat(buf, buf_sz, "<architecture>");
-gchar *arch = cc->gdb_arch_name(cpu);
+pstrcat(buf, buf_sz, cc->gdb_arch_name(cpu));
 pstrcat(buf, buf_sz, "</architecture>");
-g_free(arch);
 }
 pstrcat(buf, buf_sz, "gdb_core_feature->xmlname);
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index a206ab6b1b..5f07133419 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2281,15 +2281,15 @@ static Property arm_cpu_properties[] = {
 DEFINE_PROP_END_OF_LIST()
 };
 
-static gchar *arm_gdb_arch_name(CPUState *cs)
+static const gchar *arm_gdb_arch_name(CPUState *cs)
 {
 ARMCPU *cpu = ARM_CPU(cs);
 CPUARMState *env = &cpu->env;
 
 if (arm_feature(env, ARM_FEATURE_IWMMXT)) {
-return g_strdup("iwmmxt");
+return "iwmmxt";
 }
-return g_strdup("arm");
+return "arm";
 }
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 9c2a226159..65f84bfb18 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -743,9 +743,9 @@ static void aarch64_cpu_finalizefn(Object *obj)
 {
 }
 
-static gchar *aarch64_gdb_arch_name(CPUState *cs)
+static const gchar *aarch64_gdb_arch_name(CPUState *cs)
 {
-return g_strdup("aarch64");
+return "aarch64";
 }
 
 static void aarch64_cpu_class_init(ObjectClass *oc, void *data)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 069410985f..c1a7667f4a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5910,12 +5910,12 @@ static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel 
*model)
 memset(>user_features, 0, sizeof(env->user_features));
 }
 
-static gchar *x86_gdb_arch_name(CPUState *cs)
+static const gchar *x86_gdb_arch_name(CPUState *cs)
 {
 #ifdef TARGET_X86_64
-return g_strdup("i386:x86-64");
+return "i386:x86-64";
 #else
-return g_strdup("i386");
+return "i386";
 #endif
 }
 
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index b204cb279d..6c76d14e43 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -692,9 +692,9 @@ static const struct SysemuCPUOps loongarch_sysemu_ops = {
 };
 #endif
 
-static gchar *loongarch_gdb_arch_name(CPUState *cs)
+static const gchar *loongarch_gdb_arch_name(CPUState *cs)
 {
-return g_strdup("loongarch64");
+return "loongarch64";
 }
 
 static void loongarch_cpu_class_init(ObjectClass *c, void *data)
diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
index 70cac919e0..dbdee7d56e 100644
--- a/target/ppc/gdbstub.c
+++ b/target/ppc/gdbstub.c
@@ -572,12 +572,12 @@ static int gdb_set_vsx_reg(CPUPPCState *env, uint8_t 
*mem_buf, int n)
 return 0;
 }
 
-gchar *ppc_gdb_arch_name(CPUState *cs)
+const gchar *ppc_gdb_arch_name(CPUState *cs)
 {
 #if defined(TARGET_PPC64)
-return g_strdup("powerpc:common64");
+return "powerpc:common64";
 

[PATCH RESEND v5 24/26] contrib/plugins: Allow to log registers

2023-08-17 Thread Akihiko Odaki
This demonstrates how a register can be read from a plugin.

Signed-off-by: Akihiko Odaki 
---
 docs/devel/tcg-plugins.rst |  10 ++-
 contrib/plugins/execlog.c  | 140 -
 2 files changed, 117 insertions(+), 33 deletions(-)

diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
index 81dcd43a61..c9f8b27590 100644
--- a/docs/devel/tcg-plugins.rst
+++ b/docs/devel/tcg-plugins.rst
@@ -497,6 +497,15 @@ arguments if required::
   $ qemu-system-arm $(QEMU_ARGS) \
 -plugin ./contrib/plugins/libexeclog.so,ifilter=st1w,afilter=0x40001808 -d 
plugin
 
+This plugin can also dump a specified register. The specification of register
+follows `GDB standard target features
+<https://sourceware.org/gdb/onlinedocs/gdb/Standard-Target-Features.html>`__.
+
+Specify the name of the feature that contains the register and the name of the
+register with ``rfile`` and ``reg`` options, respectively::
+
+  $ qemu-system-arm $(QEMU_ARGS) \
+-plugin ./contrib/plugins/libexeclog.so,rfile=org.gnu.gdb.arm.core,reg=sp 
-d plugin
+
 - contrib/plugins/cache.c
 
 Cache modelling plugin that measures the performance of a given L1 cache
@@ -583,4 +592,3 @@ The following API is generated from the inline 
documentation in
 include the full kernel-doc annotations.
 
 .. kernel-doc:: include/qemu/qemu-plugin.h
-
diff --git a/contrib/plugins/execlog.c b/contrib/plugins/execlog.c
index 82dc2f584e..aa05840fd0 100644
--- a/contrib/plugins/execlog.c
+++ b/contrib/plugins/execlog.c
@@ -15,27 +15,43 @@
 
 #include 
 
+typedef struct CPU {
+/* Store last executed instruction on each vCPU as a GString */
+GString *last_exec;
+GByteArray *reg_history[2];
+
+int reg;
+} CPU;
+
 QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
 
-/* Store last executed instruction on each vCPU as a GString */
-static GPtrArray *last_exec;
+static CPU *cpus;
+static int num_cpus;
 static GRWLock expand_array_lock;
 
 static GPtrArray *imatches;
 static GArray *amatches;
 
+static char *rfile_name;
+static char *reg_name;
+
 /*
- * Expand last_exec array.
+ * Expand cpu array.
  *
  * As we could have multiple threads trying to do this we need to
  * serialise the expansion under a lock.
  */
-static void expand_last_exec(int cpu_index)
+static void expand_cpu(int cpu_index)
 {
 g_rw_lock_writer_lock(&expand_array_lock);
-while (cpu_index >= last_exec->len) {
-GString *s = g_string_new(NULL);
-g_ptr_array_add(last_exec, s);
+if (cpu_index >= num_cpus) {
+cpus = g_realloc_n(cpus, cpu_index + 1, sizeof(*cpus));
+while (cpu_index >= num_cpus) {
+cpus[num_cpus].last_exec = g_string_new(NULL);
+cpus[num_cpus].reg_history[0] = g_byte_array_new();
+cpus[num_cpus].reg_history[1] = g_byte_array_new();
+num_cpus++;
+}
 }
 g_rw_lock_writer_unlock(&expand_array_lock);
 }
@@ -50,8 +66,8 @@ static void vcpu_mem(unsigned int cpu_index, 
qemu_plugin_meminfo_t info,
 
 /* Find vCPU in array */
 g_rw_lock_reader_lock(&expand_array_lock);
-g_assert(cpu_index < last_exec->len);
-s = g_ptr_array_index(last_exec, cpu_index);
+g_assert(cpu_index < num_cpus);
+s = cpus[cpu_index].last_exec;
 g_rw_lock_reader_unlock(&expand_array_lock);
 
 /* Indicate type of memory access */
@@ -77,28 +93,42 @@ static void vcpu_mem(unsigned int cpu_index, 
qemu_plugin_meminfo_t info,
  */
 static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
 {
-GString *s;
+int n;
+int i;
 
-/* Find or create vCPU in array */
 g_rw_lock_reader_lock(&expand_array_lock);
-if (cpu_index >= last_exec->len) {
-g_rw_lock_reader_unlock(&expand_array_lock);
-expand_last_exec(cpu_index);
-g_rw_lock_reader_lock(&expand_array_lock);
-}
-s = g_ptr_array_index(last_exec, cpu_index);
-g_rw_lock_reader_unlock(&expand_array_lock);
 
 /* Print previous instruction in cache */
-if (s->len) {
-qemu_plugin_outs(s->str);
+if (cpus[cpu_index].last_exec->len) {
+if (cpus[cpu_index].reg >= 0) {
+GByteArray *current = cpus[cpu_index].reg_history[0];
+GByteArray *last = cpus[cpu_index].reg_history[1];
+
+g_byte_array_set_size(current, 0);
+n = qemu_plugin_read_register(current, cpus[cpu_index].reg);
+
+if (n != last->len || memcmp(current->data, last->data, n)) {
+g_string_append(cpus[cpu_index].last_exec, ", reg,");
+for (i = 0; i < n; i++) {
+g_string_append_printf(cpus[cpu_index].last_exec, " %02x",
+   current->data[i]);
+}
+}
+
+cpus[cpu_index].reg_history[0] = last;
+cpus[cpu_index].reg_history[1] = current;
+}
+
+qemu_plugin_outs(cpus[cpu_index].last_exec->str);
 qemu_plugin_outs("\n");
 }
 
 /* Store new instruction in cache */
 /* vcpu_mem will 

[PATCH RESEND v5 08/26] target/arm: Use GDBFeature for dynamic XML

2023-08-17 Thread Akihiko Odaki
In preparation for a change to use GDBFeature as a parameter of
gdb_register_coprocessor(), convert the internal representation of
dynamic feature from plain XML to GDBFeature.

Signed-off-by: Akihiko Odaki 
Acked-by: Richard Henderson 
---
 target/arm/cpu.h   |  20 +++---
 target/arm/internals.h |   2 +-
 target/arm/gdbstub.c   | 134 ++---
 target/arm/gdbstub64.c |  90 ---
 4 files changed, 108 insertions(+), 138 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 88e5accda6..d6c2378d05 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -136,23 +136,21 @@ enum {
  */
 
 /**
- * DynamicGDBXMLInfo:
- * @desc: Contains the XML descriptions.
- * @num: Number of the registers in this XML seen by GDB.
+ * DynamicGDBFeatureInfo:
+ * @desc: Contains the feature descriptions.
  * @data: A union with data specific to the set of registers
  *@cpregs_keys: Array that contains the corresponding Key of
  *  a given cpreg with the same order of the cpreg
  *  in the XML description.
  */
-typedef struct DynamicGDBXMLInfo {
-char *desc;
-int num;
+typedef struct DynamicGDBFeatureInfo {
+GDBFeature desc;
 union {
 struct {
 uint32_t *keys;
 } cpregs;
 } data;
-} DynamicGDBXMLInfo;
+} DynamicGDBFeatureInfo;
 
 /* CPU state for each instance of a generic timer (in cp15 c14) */
 typedef struct ARMGenericTimer {
@@ -881,10 +879,10 @@ struct ArchCPU {
 uint64_t *cpreg_vmstate_values;
 int32_t cpreg_vmstate_array_len;
 
-DynamicGDBXMLInfo dyn_sysreg_xml;
-DynamicGDBXMLInfo dyn_svereg_xml;
-DynamicGDBXMLInfo dyn_m_systemreg_xml;
-DynamicGDBXMLInfo dyn_m_secextreg_xml;
+DynamicGDBFeatureInfo dyn_sysreg_feature;
+DynamicGDBFeatureInfo dyn_svereg_feature;
+DynamicGDBFeatureInfo dyn_m_systemreg_feature;
+DynamicGDBFeatureInfo dyn_m_secextreg_feature;
 
 /* Timers used by the generic (architected) timer */
 QEMUTimer *gt_timer[NUM_GTIMERS];
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 0f01bc32a8..ca20e4fd1e 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1388,7 +1388,7 @@ static inline uint64_t pmu_counter_mask(CPUARMState *env)
 }
 
 #ifdef TARGET_AARCH64
-int arm_gen_dynamic_svereg_xml(CPUState *cpu, int base_reg);
+GDBFeature *arm_gen_dynamic_svereg_feature(CPUState *cpu);
 int aarch64_gdb_get_sve_reg(CPUARMState *env, GByteArray *buf, int reg);
 int aarch64_gdb_set_sve_reg(CPUARMState *env, uint8_t *buf, int reg);
 int aarch64_gdb_get_fpu_reg(CPUARMState *env, GByteArray *buf, int reg);
diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
index f421c5d041..daa68ead66 100644
--- a/target/arm/gdbstub.c
+++ b/target/arm/gdbstub.c
@@ -25,11 +25,10 @@
 #include "internals.h"
 #include "cpregs.h"
 
-typedef struct RegisterSysregXmlParam {
+typedef struct RegisterSysregFeatureParam {
 CPUState *cs;
-GString *s;
-int n;
-} RegisterSysregXmlParam;
+GDBFeatureBuilder builder;
+} RegisterSysregFeatureParam;
 
 /* Old gdb always expect FPA registers.  Newer (xml-aware) gdb only expect
whatever the target description contains.  Due to a historical mishap
@@ -243,7 +242,7 @@ static int arm_gdb_get_sysreg(CPUARMState *env, GByteArray 
*buf, int reg)
 const ARMCPRegInfo *ri;
 uint32_t key;
 
-key = cpu->dyn_sysreg_xml.data.cpregs.keys[reg];
+key = cpu->dyn_sysreg_feature.data.cpregs.keys[reg];
 ri = get_arm_cp_reginfo(cpu->cp_regs, key);
 if (ri) {
 if (cpreg_field_is_64bit(ri)) {
@@ -260,34 +259,32 @@ static int arm_gdb_set_sysreg(CPUARMState *env, uint8_t 
*buf, int reg)
 return 0;
 }
 
-static void arm_gen_one_xml_sysreg_tag(GString *s, DynamicGDBXMLInfo *dyn_xml,
+static void arm_gen_one_feature_sysreg(GDBFeatureBuilder *builder,
+   DynamicGDBFeatureInfo *dyn_feature,
ARMCPRegInfo *ri, uint32_t ri_key,
-   int bitsize, int regnum)
+   int bitsize)
 {
-g_string_append_printf(s, "<reg name=\"%s\"", ri->name);
-g_string_append_printf(s, " bitsize=\"%d\"", bitsize);
-g_string_append_printf(s, " regnum=\"%d\"", regnum);
-g_string_append_printf(s, " group=\"cp_regs\"/>");
-dyn_xml->data.cpregs.keys[dyn_xml->num] = ri_key;
-dyn_xml->num++;
+dyn_feature->data.cpregs.keys[dyn_feature->desc.num_regs] = ri_key;
+
+gdb_feature_builder_append_reg(builder, ri->name, bitsize,
+   "int", "cp_regs");
 }
 
-static void arm_register_sysreg_for_xml(gpointer key, gpointer value,
-gpointer p)
+static void arm_register_sysreg_for_feature(gpointer key, gpointer value,
+gpointer p)
 {
 uint32_t ri_key = (uintptr_t)key;
 ARMCPRegInfo *ri = value;
-RegisterSysregXmlParam 

[PATCH RESEND v5 23/26] plugins: Allow to read registers

2023-08-17 Thread Akihiko Odaki
It is based on the GDB protocol to ensure interface stability.

The timing of the vcpu init hook is also changed so that the hook will
get called after GDB features are initialized.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1706
Signed-off-by: Akihiko Odaki 
---
 include/qemu/qemu-plugin.h   | 65 ++--
 plugins/api.c| 40 ++
 plugins/qemu-plugins.symbols |  2 ++
 3 files changed, 104 insertions(+), 3 deletions(-)
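
As a rough illustration of how a plugin could use the proposed interface,
here is a minimal sketch. The feature name ("org.gnu.gdb.arm.core"), the
register name ("pc") and the plugin body are assumptions made up for the
example; only the qemu_plugin_* calls are taken from this patch and the
existing plugin API.

#include <string.h>
#include <glib.h>
#include <qemu-plugin.h>

QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

static int pc_reg = -1;

/* Resolve the register number once a vCPU (and its GDB features) exists. */
static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
{
    int n;
    qemu_plugin_register_file_t *files =
        qemu_plugin_get_register_files(vcpu_index, &n);

    for (int i = 0; i < n; i++) {
        if (strcmp(files[i].name, "org.gnu.gdb.arm.core") != 0) {
            continue;
        }
        for (int j = 0; j < files[i].num_regs; j++) {
            /* @regs may have holes, so check for NULL entries */
            if (files[i].regs[j] && strcmp(files[i].regs[j], "pc") == 0) {
                pc_reg = files[i].base_reg + j;
            }
        }
    }
    g_free(files);
}

QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                           const qemu_info_t *info,
                                           int argc, char **argv)
{
    qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
    return 0;
}

The resolved pc_reg could then be passed to qemu_plugin_read_register()
from a callback that was registered with register-read access.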

diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index 50a9957279..214b12bfd6 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -11,6 +11,7 @@
 #ifndef QEMU_QEMU_PLUGIN_H
 #define QEMU_QEMU_PLUGIN_H
 
+#include <glib.h>
 #include 
 #include 
 #include 
@@ -51,7 +52,7 @@ typedef uint64_t qemu_plugin_id_t;
 
 extern QEMU_PLUGIN_EXPORT int qemu_plugin_version;
 
-#define QEMU_PLUGIN_VERSION 1
+#define QEMU_PLUGIN_VERSION 2
 
 /**
  * struct qemu_info_t - system information for plugins
@@ -218,8 +219,8 @@ struct qemu_plugin_insn;
  * @QEMU_PLUGIN_CB_R_REGS: callback reads the CPU's regs
  * @QEMU_PLUGIN_CB_RW_REGS: callback reads and writes the CPU's regs
  *
- * Note: currently unused, plugins cannot read or change system
- * register state.
+ * Note: currently QEMU_PLUGIN_CB_RW_REGS is unused, plugins cannot change
+ * system register state.
  */
 enum qemu_plugin_cb_flags {
 QEMU_PLUGIN_CB_NO_REGS,
@@ -664,4 +665,62 @@ uint64_t qemu_plugin_end_code(void);
  */
 uint64_t qemu_plugin_entry_code(void);
 
+/**
+ * struct qemu_plugin_register_file_t - register information
+ *
+ * This structure identifies registers. The identifiers included in this
+ * structure are identical with names used in GDB's standard target features
+ * with some extensions. For details, see:
+ * https://sourceware.org/gdb/onlinedocs/gdb/Standard-Target-Features.html
+ *
+ * A register is uniquely identified with the combination of a feature name
+ * and a register name or a register number. It is recommended to derive
+ * register numbers from feature names and register names each time a new vcpu
+ * starts.
+ *
+ * To derive the register number from a feature name and a register name,
+ * first look up qemu_plugin_register_file_t with the feature name, and then
+ * look up the register name in its @regs. The sum of the @base_reg and the
+ * index in the @reg is the register number.
+ *
+ * Note that @regs may have holes; some elements of @regs may be NULL.
+ */
+typedef struct qemu_plugin_register_file_t {
+/** @name: feature name */
+const char *name;
+/** @regs: register names */
+const char * const *regs;
+/** @base_reg: the base identified number */
+int base_reg;
+/** @num_regs: the number of elements in @regs */
+int num_regs;
+} qemu_plugin_register_file_t;
+
+/**
+ * qemu_plugin_get_register_files() - returns register information
+ *
+ * @vcpu_index: the index of the vcpu context
+ * @size: the pointer to the variable to hold the number of returned elements
+ *
+ * Returns an array of qemu_plugin_register_file_t. The user should g_free()
+ * the array once no longer needed.
+ */
+qemu_plugin_register_file_t *
+qemu_plugin_get_register_files(unsigned int vcpu_index, int *size);
+
+/**
+ * qemu_plugin_read_register() - read register
+ *
+ * @buf: the byte array to append the read register content to.
+ * @reg: the register identifier determined with
+ *   qemu_plugin_get_register_files().
+ *
+ * This function is only available in a context that register read access is
+ * explicitly requested.
+ *
+ * Returns the size of the read register. The content of @buf is in target byte
+ * order.
+ */
+int qemu_plugin_read_register(GByteArray *buf, int reg);
+
 #endif /* QEMU_QEMU_PLUGIN_H */
diff --git a/plugins/api.c b/plugins/api.c
index 2078b16edb..e1b22c98f5 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -35,6 +35,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "qemu/plugin.h"
 #include "qemu/log.h"
 #include "tcg/tcg.h"
@@ -442,3 +443,42 @@ uint64_t qemu_plugin_entry_code(void)
 #endif
 return entry;
 }
+
+static void count_gdb_feature(void *opaque, const GDBFeature *feature,
+  int base_reg)
+{
+(*(int *)opaque)++;
+}
+
+static void map_gdb_feature(void *opaque, const GDBFeature *feature,
+int base_reg)
+{
+qemu_plugin_register_file_t **cursor = opaque;
+(*cursor)->name = feature->name;
+(*cursor)->regs = feature->regs;
+(*cursor)->base_reg = base_reg;
+(*cursor)->num_regs = feature->num_regs;
+(*cursor)++;
+}
+
+qemu_plugin_register_file_t *
+qemu_plugin_get_register_files(unsigned int vcpu_index, int *size)
+{
+QEMU_IOTHREAD_LOCK_GUARD();
+
+*size = 0;
+gdb_foreach_feature(qemu_get_cpu(vcpu_index), count_gdb_feature, size);
+
+qemu_plugin_register_file_t *files =
+g_new(qemu_plugin_register_file_t, *size);
+
+

[PATCH RESEND v5 22/26] cpu: Call plugin hooks only when ready

2023-08-17 Thread Akihiko Odaki
The initialization and exit hooks will not affect the state of the vCPU,
but they may depend on it. Therefore, it's better to call the plugin
hooks after the vCPU state is fully initialized and before it gets
uninitialized.

Signed-off-by: Akihiko Odaki 
---
 cpu.c| 11 ---
 hw/core/cpu-common.c | 10 ++
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/cpu.c b/cpu.c
index 1c948d1161..2552c85249 100644
--- a/cpu.c
+++ b/cpu.c
@@ -42,7 +42,6 @@
 #include "hw/core/accel-cpu.h"
 #include "trace/trace-root.h"
 #include "qemu/accel.h"
-#include "qemu/plugin.h"
 
 uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
@@ -148,11 +147,6 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
 /* Wait until cpu initialization complete before exposing cpu. */
 cpu_list_add(cpu);
 
-/* Plugin initialization must wait until cpu_index assigned. */
-if (tcg_enabled()) {
-qemu_plugin_vcpu_init_hook(cpu);
-}
-
 #ifdef CONFIG_USER_ONLY
 assert(qdev_get_vmsd(DEVICE(cpu)) == NULL ||
qdev_get_vmsd(DEVICE(cpu))->unmigratable);
@@ -179,11 +173,6 @@ void cpu_exec_unrealizefn(CPUState *cpu)
 }
 #endif
 
-/* Call the plugin hook before clearing cpu->cpu_index in cpu_list_remove 
*/
-if (tcg_enabled()) {
-qemu_plugin_vcpu_exit_hook(cpu);
-}
-
 cpu_list_remove(cpu);
 /*
  * Now that the vCPU has been removed from the RCU list, we can call
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index ced66c2b34..be1544687e 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -209,6 +209,11 @@ static void cpu_common_realizefn(DeviceState *dev, Error 
**errp)
 cpu_resume(cpu);
 }
 
+/* Plugin initialization must wait until the cpu is fully realized. */
+if (tcg_enabled()) {
+qemu_plugin_vcpu_init_hook(cpu);
+}
+
 /* NOTE: latest generic point where the cpu is fully realized */
 }
 
@@ -216,6 +221,11 @@ static void cpu_common_unrealizefn(DeviceState *dev)
 {
 CPUState *cpu = CPU(dev);
 
+/* Call the plugin hook before clearing the cpu is fully unrealized */
+if (tcg_enabled()) {
+qemu_plugin_vcpu_exit_hook(cpu);
+}
+
 /* NOTE: latest generic point before the cpu is fully unrealized */
 cpu_exec_unrealizefn(cpu);
 }
-- 
2.41.0




[PATCH RESEND v5 26/26] contrib/plugins: Add cc plugin

2023-08-17 Thread Akihiko Odaki
This demonstrates how to write a plugin in C++.

Signed-off-by: Akihiko Odaki 
---
 docs/devel/tcg-plugins.rst |  8 
 configure  | 15 ---
 contrib/plugins/Makefile   |  5 +
 contrib/plugins/cc.cc  | 17 +
 tests/tcg/Makefile.target  |  3 +++
 5 files changed, 45 insertions(+), 3 deletions(-)
 create mode 100644 contrib/plugins/cc.cc

diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
index c9f8b27590..0a11f8036c 100644
--- a/docs/devel/tcg-plugins.rst
+++ b/docs/devel/tcg-plugins.rst
@@ -584,6 +584,14 @@ The plugin has a number of arguments, all of them are 
optional:
   configuration arguments implies ``l2=on``.
   (default: N = 2097152 (2MB), B = 64, A = 16)
 
+- contrib/plugins/cc.cc
+
+cc plugin demonstrates how to write a plugin in C++. It simply outputs
+"hello, world" to the plugin log::
+
+  $ qemu-system-arm $(QEMU_ARGS) \
+-plugin ./contrib/plugins/libcc.so -d plugin
+
 API
 ---
 
diff --git a/configure b/configure
index 26ec5e4f54..0065b0dfe0 100755
--- a/configure
+++ b/configure
@@ -293,10 +293,18 @@ else
   cc="${CC-${cross_prefix}gcc}"
 fi
 
-if test -z "${CXX}${cross_prefix}"; then
-  cxx="c++"
+if test -n "${CXX+x}"; then
+  cxx="$CXX"
 else
-  cxx="${CXX-${cross_prefix}g++}"
+  if test -n "${cross_prefix}"; then
+cxx="${cross_prefix}g++"
+  else
+cxx="c++"
+  fi
+
+  if ! has "$cxx"; then
+cxx=
+  fi
 fi
 
 # Preferred ObjC compiler:
@@ -1702,6 +1710,7 @@ echo "MESON=$meson" >> $config_host_mak
 echo "NINJA=$ninja" >> $config_host_mak
 echo "PKG_CONFIG=${pkg_config}" >> $config_host_mak
 echo "CC=$cc" >> $config_host_mak
+echo "CXX=$cxx" >> $config_host_mak
 echo "EXESUF=$EXESUF" >> $config_host_mak
 
 # use included Linux headers
diff --git a/contrib/plugins/Makefile b/contrib/plugins/Makefile
index b2b9db9f51..93d86b3d07 100644
--- a/contrib/plugins/Makefile
+++ b/contrib/plugins/Makefile
@@ -21,6 +21,9 @@ NAMES += lockstep
 NAMES += hwprofile
 NAMES += cache
 NAMES += drcov
+ifneq ($(CXX),)
+NAMES += cc
+endif
 
 SONAMES := $(addsuffix .so,$(addprefix lib,$(NAMES)))
 
@@ -31,6 +34,8 @@ CFLAGS += -fPIC -Wall
 CFLAGS += $(if $(CONFIG_DEBUG_TCG), -ggdb -O0)
 CFLAGS += -I$(SRC_PATH)/include/qemu
 
+CXXFLAGS := $(CFLAGS)
+
 all: $(SONAMES)
 
 %.o: %.c
diff --git a/contrib/plugins/cc.cc b/contrib/plugins/cc.cc
new file mode 100644
index 00..83a5528db0
--- /dev/null
+++ b/contrib/plugins/cc.cc
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include <qemu-plugin.h>
+
+extern "C" {
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
+   const qemu_info_t *info, int argc,
+   char **argv)
+{
+qemu_plugin_outs("hello, world\n");
+return 0;
+}
+
+};
diff --git a/tests/tcg/Makefile.target b/tests/tcg/Makefile.target
index 462289f47c..3d7837d3b8 100644
--- a/tests/tcg/Makefile.target
+++ b/tests/tcg/Makefile.target
@@ -149,6 +149,9 @@ PLUGIN_SRC=$(SRC_PATH)/tests/plugin
 PLUGIN_LIB=../../plugin
 VPATH+=$(PLUGIN_LIB)
 PLUGINS=$(patsubst %.c, lib%.so, $(notdir $(wildcard $(PLUGIN_SRC)/*.c)))
+ifneq ($(CXX),)
+PLUGINS+=$(patsubst %.cc, lib%.so, $(notdir $(wildcard $(PLUGIN_SRC)/*.cc)))
+endif
 
 # We need to ensure expand the run-plugin-TEST-with-PLUGIN
 # pre-requistes manually here as we can't use stems to handle it. We
-- 
2.41.0




[PATCH RESEND v5 12/26] gdbstub: Use GDBFeature for GDBRegisterState

2023-08-17 Thread Akihiko Odaki
Simplify GDBRegisterState by replacing num_regs and xml members with
one member that points to GDBFeature.

Signed-off-by: Akihiko Odaki 
---
 gdbstub/gdbstub.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index b62002bc34..ee6b8b98c8 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -47,10 +47,9 @@
 
 typedef struct GDBRegisterState {
 int base_reg;
-int num_regs;
 gdb_get_reg_cb get_reg;
 gdb_set_reg_cb set_reg;
-const char *xml;
+const GDBFeature *feature;
 struct GDBRegisterState *next;
 } GDBRegisterState;
 
@@ -390,7 +389,7 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 pstrcat(buf, buf_sz, "\"/>");
 for (r = cpu->gdb_regs; r; r = r->next) {
 pstrcat(buf, buf_sz, "<xi:include href=\"");
-pstrcat(buf, buf_sz, r->xml);
+pstrcat(buf, buf_sz, r->feature->xmlname);
 pstrcat(buf, buf_sz, "\"/>");
 }
 pstrcat(buf, buf_sz, "");
@@ -497,7 +496,7 @@ static int gdb_read_register(CPUState *cpu, GByteArray 
*buf, int reg)
 }
 
 for (r = cpu->gdb_regs; r; r = r->next) {
-if (r->base_reg <= reg && reg < r->base_reg + r->num_regs) {
+if (r->base_reg <= reg && reg < r->base_reg + r->feature->num_regs) {
 return r->get_reg(env, buf, reg - r->base_reg);
 }
 }
@@ -515,7 +514,7 @@ static int gdb_write_register(CPUState *cpu, uint8_t 
*mem_buf, int reg)
 }
 
 for (r = cpu->gdb_regs; r; r = r->next) {
-if (r->base_reg <= reg && reg < r->base_reg + r->num_regs) {
+if (r->base_reg <= reg && reg < r->base_reg + r->feature->num_regs) {
 return r->set_reg(env, mem_buf, reg - r->base_reg);
 }
 }
@@ -538,17 +537,16 @@ void gdb_register_coprocessor(CPUState *cpu,
 p = &cpu->gdb_regs;
 while (*p) {
 /* Check for duplicates.  */
-if (strcmp((*p)->xml, feature->xmlname) == 0)
+if ((*p)->feature == feature)
 return;
 p = &(*p)->next;
 }
 
 s = g_new0(GDBRegisterState, 1);
 s->base_reg = cpu->gdb_num_regs;
-s->num_regs = feature->num_regs;
 s->get_reg = get_reg;
 s->set_reg = set_reg;
-s->xml = feature->xml;
+s->feature = feature;
 
 /* Add to end of list.  */
 cpu->gdb_num_regs += feature->num_regs;
-- 
2.41.0




[PATCH RESEND v5 09/26] target/ppc: Use GDBFeature for dynamic XML

2023-08-17 Thread Akihiko Odaki
In preparation for a change to use GDBFeature as a parameter of
gdb_register_coprocessor(), convert the internal representation of
dynamic feature from plain XML to GDBFeature.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Richard Henderson 
---
 target/ppc/cpu-qom.h  |  3 +--
 target/ppc/cpu.h  |  2 +-
 target/ppc/cpu_init.c |  2 +-
 target/ppc/gdbstub.c  | 45 ++-
 4 files changed, 17 insertions(+), 35 deletions(-)

diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
index be33786bd8..633fb402b5 100644
--- a/target/ppc/cpu-qom.h
+++ b/target/ppc/cpu-qom.h
@@ -186,8 +186,7 @@ struct PowerPCCPUClass {
 int bfd_mach;
 uint32_t l1_dcache_size, l1_icache_size;
 #ifndef CONFIG_USER_ONLY
-unsigned int gdb_num_sprs;
-const char *gdb_spr_xml;
+GDBFeature gdb_spr;
 #endif
 const PPCHash64Options *hash64_opts;
 struct ppc_radix_page_info *radix_page_info;
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 25fac9577a..5f251bdffe 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1381,7 +1381,7 @@ int ppc_cpu_gdb_write_register(CPUState *cpu, uint8_t 
*buf, int reg);
 int ppc_cpu_gdb_write_register_apple(CPUState *cpu, uint8_t *buf, int reg);
 #ifndef CONFIG_USER_ONLY
 hwaddr ppc_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
-void ppc_gdb_gen_spr_xml(PowerPCCPU *cpu);
+void ppc_gdb_gen_spr_feature(PowerPCCPU *cpu);
 const char *ppc_gdb_get_dynamic_xml(CPUState *cs, const char *xml_name);
 #endif
 int ppc64_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index eb56226865..938cd2b7e1 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6673,7 +6673,7 @@ static void init_ppc_proc(PowerPCCPU *cpu)
 (*pcc->init_proc)(env);
 
 #if !defined(CONFIG_USER_ONLY)
-ppc_gdb_gen_spr_xml(cpu);
+ppc_gdb_gen_spr_feature(cpu);
 #endif
 
 /* MSR bits & flags consistency checks */
diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
index ca39efdc35..0ef484dbee 100644
--- a/target/ppc/gdbstub.c
+++ b/target/ppc/gdbstub.c
@@ -318,15 +318,21 @@ int ppc_cpu_gdb_write_register_apple(CPUState *cs, 
uint8_t *mem_buf, int n)
 }
 
 #ifndef CONFIG_USER_ONLY
-void ppc_gdb_gen_spr_xml(PowerPCCPU *cpu)
+void ppc_gdb_gen_spr_feature(PowerPCCPU *cpu)
 {
 PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
 CPUPPCState *env = &cpu->env;
-GString *xml;
-char *spr_name;
+GDBFeatureBuilder builder;
 unsigned int num_regs = 0;
 int i;
 
+if (pcc->gdb_spr.xml) {
+return;
+}
+
+gdb_feature_builder_init(&builder, &pcc->gdb_spr,
+ "org.qemu.power.spr", "power-spr.xml");
+
 for (i = 0; i < ARRAY_SIZE(env->spr_cb); i++) {
 ppc_spr_t *spr = &env->spr_cb[i];
 
@@ -344,35 +350,12 @@ void ppc_gdb_gen_spr_xml(PowerPCCPU *cpu)
  */
 spr->gdb_id = num_regs;
 num_regs++;
-}
-
-if (pcc->gdb_spr_xml) {
-return;
-}
-
-xml = g_string_new("");
-g_string_append(xml, "");
-g_string_append(xml, "");
 
-for (i = 0; i < ARRAY_SIZE(env->spr_cb); i++) {
-ppc_spr_t *spr = &env->spr_cb[i];
-
-if (!spr->name) {
-continue;
-}
-
-spr_name = g_ascii_strdown(spr->name, -1);
-g_string_append_printf(xml, "");
+gdb_feature_builder_append_reg(&builder, g_ascii_strdown(spr->name, -1),
+   TARGET_LONG_BITS, "int", "spr");
 }
 
-g_string_append(xml, "");
-
-pcc->gdb_num_sprs = num_regs;
-pcc->gdb_spr_xml = g_string_free(xml, false);
+gdb_feature_builder_end(&builder);
 }
 
 const char *ppc_gdb_get_dynamic_xml(CPUState *cs, const char *xml_name)
@@ -380,7 +363,7 @@ const char *ppc_gdb_get_dynamic_xml(CPUState *cs, const 
char *xml_name)
 PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
 
 if (strcmp(xml_name, "power-spr.xml") == 0) {
-return pcc->gdb_spr_xml;
+return pcc->gdb_spr.xml;
 }
 return NULL;
 }
@@ -618,6 +601,6 @@ void ppc_gdb_init(CPUState *cs, PowerPCCPUClass *pcc)
 }
 #ifndef CONFIG_USER_ONLY
 gdb_register_coprocessor(cs, gdb_get_spr_reg, gdb_set_spr_reg,
- pcc->gdb_num_sprs, "power-spr.xml", 0);
+ pcc->gdb_spr.num_regs, "power-spr.xml", 0);
 #endif
 }
-- 
2.41.0




[PATCH v2] target/riscv: Allocate itrigger timers only once

2023-08-17 Thread Akihiko Odaki
riscv_trigger_init() had been called on reset events that can happen
several times for a CPU and it allocated timers for itrigger. If old
timers were present, they were simply overwritten by the new timers,
resulting in a memory leak.

Divide riscv_trigger_init() into two functions, namely
riscv_trigger_realize() and riscv_trigger_reset_hold(), and call them
at the appropriate times. The timer allocation will happen only once
for a CPU in riscv_trigger_realize().

Fixes: 5a4ae64cac ("target/riscv: Add itrigger support when icount is enabled")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/debug.h |  3 ++-
 target/riscv/cpu.c   |  8 +++-
 target/riscv/debug.c | 15 ---
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/target/riscv/debug.h b/target/riscv/debug.h
index c471748d5a..5794aa6ee5 100644
--- a/target/riscv/debug.h
+++ b/target/riscv/debug.h
@@ -143,7 +143,8 @@ void riscv_cpu_debug_excp_handler(CPUState *cs);
 bool riscv_cpu_debug_check_breakpoint(CPUState *cs);
 bool riscv_cpu_debug_check_watchpoint(CPUState *cs, CPUWatchpoint *wp);
 
-void riscv_trigger_init(CPURISCVState *env);
+void riscv_trigger_realize(CPURISCVState *env);
+void riscv_trigger_reset_hold(CPURISCVState *env);
 
 bool riscv_itrigger_enabled(CPURISCVState *env);
 void riscv_itrigger_update_priv(CPURISCVState *env);
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index e12b6ef7f6..7e0512dd5f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -904,7 +904,7 @@ static void riscv_cpu_reset_hold(Object *obj)
 
 #ifndef CONFIG_USER_ONLY
 if (cpu->cfg.debug) {
-riscv_trigger_init(env);
+riscv_trigger_reset_hold(env);
 }
 
 if (kvm_enabled()) {
@@ -1475,6 +1475,12 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 
 riscv_cpu_register_gdb_regs_for_features(cs);
 
+#ifndef CONFIG_USER_ONLY
+if (cpu->cfg.debug) {
+riscv_trigger_realize(&cpu->env);
+}
+#endif
+
 qemu_init_vcpu(cs);
 cpu_reset(cs);
 
diff --git a/target/riscv/debug.c b/target/riscv/debug.c
index 75ee1c4971..ddd46b2d3e 100644
--- a/target/riscv/debug.c
+++ b/target/riscv/debug.c
@@ -903,7 +903,17 @@ bool riscv_cpu_debug_check_watchpoint(CPUState *cs, 
CPUWatchpoint *wp)
 return false;
 }
 
-void riscv_trigger_init(CPURISCVState *env)
+void riscv_trigger_realize(CPURISCVState *env)
+{
+int i;
+
+for (i = 0; i < RV_MAX_TRIGGERS; i++) {
+env->itrigger_timer[i] = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+  riscv_itrigger_timer_cb, env);
+}
+}
+
+void riscv_trigger_reset_hold(CPURISCVState *env)
 {
 target_ulong tdata1 = build_tdata1(env, TRIGGER_TYPE_AD_MATCH, 0, 0);
 int i;
@@ -928,7 +938,6 @@ void riscv_trigger_init(CPURISCVState *env)
 env->tdata3[i] = 0;
 env->cpu_breakpoint[i] = NULL;
 env->cpu_watchpoint[i] = NULL;
-env->itrigger_timer[i] = timer_new_ns(QEMU_CLOCK_VIRTUAL,
-  riscv_itrigger_timer_cb, env);
+timer_del(env->itrigger_timer[i]);
 }
 }
-- 
2.41.0




[PATCH RESEND v5 11/26] gdbstub: Use GDBFeature for gdb_register_coprocessor

2023-08-17 Thread Akihiko Odaki
This is a tree-wide change to introduce a GDBFeature parameter to
gdb_register_coprocessor(). The new parameter just replaces the num_regs
and xml parameters for now. GDBFeature will be used to simplify the XML
lookup in a following change.

Signed-off-by: Akihiko Odaki 
Acked-by: Alex Bennée 
---
 include/exec/gdbstub.h |  2 +-
 gdbstub/gdbstub.c  | 13 +++--
 target/arm/gdbstub.c   | 34 ++
 target/hexagon/cpu.c   |  3 +--
 target/loongarch/gdbstub.c |  2 +-
 target/m68k/helper.c   |  6 +++---
 target/microblaze/cpu.c|  5 +++--
 target/ppc/gdbstub.c   | 11 ++-
 target/riscv/gdbstub.c | 18 ++
 target/s390x/gdbstub.c | 28 +++-
 10 files changed, 57 insertions(+), 65 deletions(-)

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 1f4608d4f9..572abada63 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -27,7 +27,7 @@ typedef int (*gdb_get_reg_cb)(CPUArchState *env, GByteArray 
*buf, int reg);
 typedef int (*gdb_set_reg_cb)(CPUArchState *env, uint8_t *buf, int reg);
 void gdb_register_coprocessor(CPUState *cpu,
   gdb_get_reg_cb get_reg, gdb_set_reg_cb set_reg,
-  int num_regs, const char *xml, int g_pos);
+  const GDBFeature *feature, int g_pos);
 
 /**
  * gdbserver_start: start the gdb server
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 909e3cd655..b62002bc34 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -530,7 +530,7 @@ static int gdb_write_register(CPUState *cpu, uint8_t 
*mem_buf, int reg)
 
 void gdb_register_coprocessor(CPUState *cpu,
   gdb_get_reg_cb get_reg, gdb_set_reg_cb set_reg,
-  int num_regs, const char *xml, int g_pos)
+  const GDBFeature *feature, int g_pos)
 {
 GDBRegisterState *s;
 GDBRegisterState **p;
@@ -538,25 +538,26 @@ void gdb_register_coprocessor(CPUState *cpu,
 p = &cpu->gdb_regs;
 while (*p) {
 /* Check for duplicates.  */
-if (strcmp((*p)->xml, xml) == 0)
+if (strcmp((*p)->xml, feature->xmlname) == 0)
 return;
 p = &(*p)->next;
 }
 
 s = g_new0(GDBRegisterState, 1);
 s->base_reg = cpu->gdb_num_regs;
-s->num_regs = num_regs;
+s->num_regs = feature->num_regs;
 s->get_reg = get_reg;
 s->set_reg = set_reg;
-s->xml = xml;
+s->xml = feature->xml;
 
 /* Add to end of list.  */
-cpu->gdb_num_regs += num_regs;
+cpu->gdb_num_regs += feature->num_regs;
 *p = s;
 if (g_pos) {
 if (g_pos != s->base_reg) {
 error_report("Error: Bad gdb register numbering for '%s', "
- "expected %d got %d", xml, g_pos, s->base_reg);
+ "expected %d got %d", feature->xml,
+ g_pos, s->base_reg);
 } else {
 cpu->gdb_num_g_regs = cpu->gdb_num_regs;
 }
diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
index daa68ead66..791784dffe 100644
--- a/target/arm/gdbstub.c
+++ b/target/arm/gdbstub.c
@@ -500,14 +500,14 @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
  */
 #ifdef TARGET_AARCH64
 if (isar_feature_aa64_sve(&cpu->isar)) {
-int nreg = arm_gen_dynamic_svereg_feature(cs)->num_regs;
+GDBFeature *feature = arm_gen_dynamic_svereg_feature(cs);
 gdb_register_coprocessor(cs, aarch64_gdb_get_sve_reg,
- aarch64_gdb_set_sve_reg, nreg,
- "sve-registers.xml", 0);
+ aarch64_gdb_set_sve_reg, feature, 0);
 } else {
 gdb_register_coprocessor(cs, aarch64_gdb_get_fpu_reg,
  aarch64_gdb_set_fpu_reg,
- 34, "aarch64-fpu.xml", 0);
+ gdb_find_static_feature("aarch64-fpu.xml"),
+ 0);
 }
 /*
  * Note that we report pauth information via the feature name
@@ -518,19 +518,22 @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
 if (isar_feature_aa64_pauth(&cpu->isar)) {
 gdb_register_coprocessor(cs, aarch64_gdb_get_pauth_reg,
  aarch64_gdb_set_pauth_reg,
- 4, "aarch64-pauth.xml", 0);
+ gdb_find_static_feature("aarch64-pauth.xml"),
+ 0);
 }
 #endif
 } else {
 if (arm_feature(env, ARM_FEATURE_NEON)) {
 gdb_register_coprocessor(cs, vfp_gdb_get_reg, vfp_gdb_set_reg,
- 49, "arm-neon.xml", 0);
+ gdb_find_static_feature("arm-neon.xml"),
+   

[PATCH RESEND v5 15/26] gdbstub: Simplify XML lookup

2023-08-17 Thread Akihiko Odaki
Now we know all instances of GDBFeature that are used by a CPU, so we
can traverse them to find the XML. This removes the need for a
CPU-specific lookup function for dynamic XML.

Signed-off-by: Akihiko Odaki 
---
 gdbstub/gdbstub.c | 24 
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 031ad89c7d..4648a56088 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -354,8 +354,7 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
GDBProcess *process)
 {
 size_t len;
-int i;
-const char *name;
+GDBRegisterState *r;
 CPUState *cpu = gdb_get_first_cpu_in_process(process);
 CPUClass *cc = CPU_GET_CLASS(cpu);
 
@@ -364,7 +363,6 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 len++;
 *newp = p + len;
 
-name = NULL;
 if (strncmp(p, "target.xml", len) == 0) {
 /* Generate the XML description for this CPU.  */
 if (!process->target_xml) {
@@ -398,21 +396,15 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 }
 return process->target_xml;
 }
-if (cc->gdb_get_dynamic_xml) {
-char *xmlname = g_strndup(p, len);
-const char *xml = cc->gdb_get_dynamic_xml(cpu, xmlname);
-
-g_free(xmlname);
-if (xml) {
-return xml;
-}
+if (strncmp(p, cc->gdb_core_feature->xmlname, len) == 0) {
+return cc->gdb_core_feature->xml;
 }
-for (i = 0; ; i++) {
-name = gdb_static_features[i].xmlname;
-if (!name || (strncmp(name, p, len) == 0 && strlen(name) == len))
-break;
+for (r = cpu->gdb_regs; r; r = r->next) {
+if (strncmp(p, r->feature->xmlname, len) == 0) {
+return r->feature->xml;
+}
 }
-return name ? gdb_static_features[i].xml : NULL;
+return NULL;
 }
 
 void gdb_feature_builder_init(GDBFeatureBuilder *builder, GDBFeature *feature,
-- 
2.41.0




[PATCH RESEND v5 00/26] plugins: Allow to read registers

2023-08-17 Thread Akihiko Odaki
Other people and I at the University of Tokyo, where I research processor
design, have found TCG plugins very useful for processor design exploration.

The feature we find missing is the capability to read registers from
plugins. In this series, I propose to add such a capability by reusing
gdbstub code.

The reuse of gdbstub code ensures the long-term stability of the TCG plugin
interface for register access without incurring the burden of maintaining
yet another register access interface.

Adding this TCG plugin capability involves four major changes. The first one
is to add the GDBFeature structure that represents a GDB feature, which
usually includes registers. A GDBFeature can be generated from static XML
files or dynamically generated by architecture-specific code. In fact, this
is a refactoring independent of the feature this series adds, and it is
potentially beneficial even without the plugin feature. The plugin feature
will use this new structure to describe registers exposed to plugins; the
resulting structure is sketched below for orientation.
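
For orientation, the structure as it looks by the end of the series can be
sketched like this (fields collected from patches 02, 03 and 17; the comments
are illustrative, not taken from the patches):

typedef struct GDBFeature {
    const char *xmlname;       /* XML file name, e.g. "riscv-csr.xml" (patch 02) */
    const char *xml;           /* XML contents served to GDB (patch 02) */
    const char *name;          /* feature name, e.g. "org.gnu.gdb.riscv.csr" (patch 17) */
    const char * const *regs;  /* register names, indexed from the base register (patch 17) */
    int num_regs;              /* number of registers described (patch 03) */
} GDBFeature;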

The second one is to make gdb_read_register/gdb_write_register usable
outside of the gdbstub context.

The third one is to actually make registers readable for plugins.

The last one is to allow implementing a QEMU plugin in C++. A plugin that
I'll describe later is written in C++.

The below is a summary of patches:
Patch 01 fixes a bug in execlog plugin.
Patch [02, 16] introduce GDBFeature.
Patch 17 adds information useful for plugins to GDBFeature.
Patch [18, 21] make registers readable outside of gdbstub context.
Patch [22, 24] add the feature to read registers from plugins.
Patch [25, 26] make it possible to write plugins in C++.

The execlog plugin will have new options to demonstrate the new feature.
I also have a plugin that uses this new feature to generate execution
traces for the Sniper processor simulator, which is available at:
https://github.com/shioya-lab/sniper/tree/akihikodaki/bb
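
Roughly, a plugin built on top of this series could look like the sketch
below. The register index (0) and the callback structure are invented for
illustration; only qemu_plugin_read_register(), added later in the series,
is assumed beyond the existing plugin API.

#include <glib.h>
#include <qemu-plugin.h>

QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

/* Runs after every executed instruction; reads one register of the
 * current vCPU through the interface added by this series. */
static void insn_exec(unsigned int vcpu_index, void *udata)
{
    GByteArray *buf = g_byte_array_new();

    /* Register 0 is an arbitrary example; the index follows the
     * GDBFeature-based numbering exposed to plugins. */
    int size = qemu_plugin_read_register(buf, 0);
    if (size > 0) {
        /* buf->data now holds 'size' bytes of the register value. */
    }

    g_byte_array_free(buf, TRUE);
}

static void tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    for (size_t i = 0; i < qemu_plugin_tb_n_insns(tb); i++) {
        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
        qemu_plugin_register_vcpu_insn_exec_cb(insn, insn_exec,
                                               QEMU_PLUGIN_CB_R_REGS, NULL);
    }
}

QEMU_PLUGIN_EXPORT
int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
                        int argc, char **argv)
{
    qemu_plugin_register_vcpu_tb_trans_cb(id, tb_trans);
    return 0;
}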

V4 -> V5:
  Corrected g_rw_lock_writer_lock() call. (Richard Henderson)
  Replaced abort() with g_assert_not_reached(). (Richard Henderson)
  Fixed CSR name leak in target/riscv. (Richard Henderson)
  Removed gdb_has_xml variable.

V3 -> V4:
  Added execlog changes I forgot to include in the last version.

V2 -> V3:
  Added patch "hw/core/cpu: Return static value with gdb_arch_name()".
  Added patch "gdbstub: Dynamically allocate target.xml buffer".
  (Alex Bennée)
  Added patch "gdbstub: Introduce GDBFeatureBuilder". (Alex Bennée)
  Dropped Reviewed-by tags for "target/*: Use GDBFeature for dynamic XML".
  Changed gdb_find_static_feature() to abort on failure. (Alex Bennée)
  Changed the execlog plugin to log the register value only when changed.
  (Alex Bennée)
  Dropped 0x prefixes for register value logs for conciseness.

V1 -> V2:
  Added SPDX-License-Identifier: GPL-2.0-or-later. (Philippe Mathieu-Daudé)
  Split long lines. (Philippe Mathieu-Daudé)
  Renamed gdb_features to gdb_static_features (Philippe Mathieu-Daudé)
  Dropped RFC.

Akihiko Odaki (26):
  contrib/plugins: Use GRWLock in execlog
  gdbstub: Introduce GDBFeature structure
  gdbstub: Add num_regs member to GDBFeature
  gdbstub: Introduce gdb_find_static_feature()
  target/arm: Move the reference to arm-core.xml
  hw/core/cpu: Replace gdb_core_xml_file with gdb_core_feature
  gdbstub: Introduce GDBFeatureBuilder
  target/arm: Use GDBFeature for dynamic XML
  target/ppc: Use GDBFeature for dynamic XML
  target/riscv: Use GDBFeature for dynamic XML
  gdbstub: Use GDBFeature for gdb_register_coprocessor
  gdbstub: Use GDBFeature for GDBRegisterState
  hw/core/cpu: Return static value with gdb_arch_name()
  gdbstub: Dynamically allocate target.xml buffer
  gdbstub: Simplify XML lookup
  hw/core/cpu: Remove gdb_get_dynamic_xml member
  gdbstub: Add members to identify registers to GDBFeature
  target/arm: Remove references to gdb_has_xml
  target/ppc: Remove references to gdb_has_xml
  gdbstub: Remove gdb_has_xml variable
  gdbstub: Expose functions to read registers
  cpu: Call plugin hooks only when ready
  plugins: Allow to read registers
  contrib/plugins: Allow to log registers
  plugins: Support C++
  contrib/plugins: Add cc plugin

 MAINTAINERS  |   2 +-
 docs/devel/tcg-plugins.rst   |  18 +++-
 configure|  15 ++-
 meson.build  |   2 +-
 gdbstub/internals.h  |   2 +-
 include/exec/gdbstub.h   |  51 +++--
 include/hw/core/cpu.h|  11 +-
 include/qemu/qemu-plugin.h   |  69 +++-
 target/arm/cpu.h |  26 ++---
 target/arm/internals.h   |   2 +-
 target/ppc/cpu-qom.h |   3 +-
 target/ppc/cpu.h |   3 +-
 target/ppc/internal.h|   2 +-
 target/riscv/cpu.h   |   4 +-
 target/s390x/cpu.h   |   2 -
 contrib/plugins/execlog.c| 150 --
 cpu.c|  11 --
 gdbstub/gdbstub.c| 198 +++---
 gdbstub/softmmu.c|   3 +-
 gdbstub/user.c   

[PATCH RESEND v5 05/26] target/arm: Move the reference to arm-core.xml

2023-08-17 Thread Akihiko Odaki
Some subclasses overwrite the gdb_core_xml_file member but others don't.
Always initialize the member in the subclasses for consistency.

This especially helps for AArch64; in a following change, the file
specified by gdb_core_xml_file is always looked up even if it's going to
be overwritten later. Looking up arm-core.xml results in an error as
it will not be embedded in the AArch64 build.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 93c28d50e5..d71a162070 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2354,7 +2354,6 @@ static void arm_cpu_class_init(ObjectClass *oc, void 
*data)
 cc->sysemu_ops = _sysemu_ops;
 #endif
 cc->gdb_num_core_regs = 26;
-cc->gdb_core_xml_file = "arm-core.xml";
 cc->gdb_arch_name = arm_gdb_arch_name;
 cc->gdb_get_dynamic_xml = arm_gdb_get_dynamic_xml;
 cc->gdb_stop_before_watchpoint = true;
@@ -2376,8 +2375,10 @@ static void arm_cpu_instance_init(Object *obj)
 static void cpu_register_class_init(ObjectClass *oc, void *data)
 {
 ARMCPUClass *acc = ARM_CPU_CLASS(oc);
+CPUClass *cc = CPU_CLASS(acc);
 
 acc->info = data;
+cc->gdb_core_xml_file = "arm-core.xml";
 }
 
 void arm_cpu_register(const ARMCPUInfo *info)
-- 
2.41.0




[PATCH RESEND v5 02/26] gdbstub: Introduce GDBFeature structure

2023-08-17 Thread Akihiko Odaki
Before this change, the information from an XML file was stored in an
array that is not descriptive. Introduce a dedicated structure type to
make it easier to understand and to extend with more fields.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 MAINTAINERS |  2 +-
 meson.build |  2 +-
 include/exec/gdbstub.h  |  9 --
 gdbstub/gdbstub.c   |  4 +--
 stubs/gdbstub.c |  6 ++--
 scripts/feature_to_c.py | 48 
 scripts/feature_to_c.sh | 69 -
 7 files changed, 62 insertions(+), 78 deletions(-)
 create mode 100755 scripts/feature_to_c.py
 delete mode 100644 scripts/feature_to_c.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 12e59b6b27..514ac74101 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2826,7 +2826,7 @@ F: include/exec/gdbstub.h
 F: include/gdbstub/*
 F: gdb-xml/
 F: tests/tcg/multiarch/gdbstub/
-F: scripts/feature_to_c.sh
+F: scripts/feature_to_c.py
 F: scripts/probe-gdb-support.py
 
 Memory API
diff --git a/meson.build b/meson.build
index 98e68ef0b1..5c633f7e01 100644
--- a/meson.build
+++ b/meson.build
@@ -3683,7 +3683,7 @@ common_all = static_library('common',
 dependencies: common_all.dependencies(),
 name_suffix: 'fa')
 
-feature_to_c = find_program('scripts/feature_to_c.sh')
+feature_to_c = find_program('scripts/feature_to_c.py')
 
 if targetos == 'darwin'
   entitlement = find_program('scripts/entitlement.sh')
diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 7d743fe1e9..3f08093321 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -10,6 +10,11 @@
 #define GDB_WATCHPOINT_READ  3
 #define GDB_WATCHPOINT_ACCESS4
 
+typedef struct GDBFeature {
+const char *xmlname;
+const char *xml;
+} GDBFeature;
+
 
 /* Get or set a register.  Returns the size of the register.  */
 typedef int (*gdb_get_reg_cb)(CPUArchState *env, GByteArray *buf, int reg);
@@ -38,7 +43,7 @@ void gdb_set_stop_cpu(CPUState *cpu);
  */
 extern bool gdb_has_xml;
 
-/* in gdbstub-xml.c, generated by scripts/feature_to_c.sh */
-extern const char *const xml_builtin[][2];
+/* in gdbstub-xml.c, generated by scripts/feature_to_c.py */
+extern const GDBFeature gdb_static_features[];
 
 #endif
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 6911b73c07..2772f07bbe 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -407,11 +407,11 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 }
 }
 for (i = 0; ; i++) {
-name = xml_builtin[i][0];
+name = gdb_static_features[i].xmlname;
 if (!name || (strncmp(name, p, len) == 0 && strlen(name) == len))
 break;
 }
-return name ? xml_builtin[i][1] : NULL;
+return name ? gdb_static_features[i].xml : NULL;
 }
 
 static int gdb_read_register(CPUState *cpu, GByteArray *buf, int reg)
diff --git a/stubs/gdbstub.c b/stubs/gdbstub.c
index 2b7aee50d3..580e20702b 100644
--- a/stubs/gdbstub.c
+++ b/stubs/gdbstub.c
@@ -1,6 +1,6 @@
 #include "qemu/osdep.h"
-#include "exec/gdbstub.h"   /* xml_builtin */
+#include "exec/gdbstub.h"   /* gdb_static_features */
 
-const char *const xml_builtin[][2] = {
-  { NULL, NULL }
+const GDBFeature gdb_static_features[] = {
+  { NULL }
 };
diff --git a/scripts/feature_to_c.py b/scripts/feature_to_c.py
new file mode 100755
index 00..bcbcb83beb
--- /dev/null
+++ b/scripts/feature_to_c.py
@@ -0,0 +1,48 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import os, sys
+
+def writeliteral(indent, bytes):
+sys.stdout.write(' ' * indent)
+sys.stdout.write('"')
+quoted = True
+
+for c in bytes:
+if not quoted:
+sys.stdout.write('\n')
+sys.stdout.write(' ' * indent)
+sys.stdout.write('"')
+quoted = True
+
+if c == b'"'[0]:
+sys.stdout.write('\\"')
+elif c == b'\\'[0]:
+sys.stdout.write('\\\\')
+elif c == b'\n'[0]:
+sys.stdout.write('\\n"')
+quoted = False
+elif c >= 32 and c < 127:
+sys.stdout.write(c.to_bytes(1, 'big').decode())
+else:
+sys.stdout.write(f'\\{c:03o}')
+
+if quoted:
+sys.stdout.write('"')
+
+sys.stdout.write('#include "qemu/osdep.h"\n' \
+ '#include "exec/gdbstub.h"\n' \
+ '\n'
+ 'const GDBFeature gdb_static_features[] = {\n')
+
+for input in sys.argv[1:]:
+with open(input, 'rb') as file:
+read = file.read()
+
+sys.stdout.write('{\n')
+writeliteral(8, bytes(os.path.basename(input), 'utf-8'))
+sys.stdout.write(',\n')
+writeliteral(8, read)
+sys.stdout.write('\n},\n')
+
+sys.stdout.write('{ NULL }\n};\n')
diff --git a/scripts/feature_to_c.sh b/scripts/feature_to_c.sh

[PATCH RESEND v5 07/26] gdbstub: Introduce GDBFeatureBuilder

2023-08-17 Thread Akihiko Odaki
GDBFeatureBuilder unifies the logic to generate dynamic GDBFeature.
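
A minimal usage sketch, not part of the patch; the feature name, XML name
and register names are invented for illustration:

static void gen_example_feature(GDBFeature *feature)
{
    GDBFeatureBuilder builder;

    gdb_feature_builder_init(&builder, feature,
                             "org.example.demo", "demo.xml");

    /* Each appended register is counted in feature->num_regs. */
    gdb_feature_builder_append_reg(&builder, "reg0", 64, "int", NULL);
    gdb_feature_builder_append_reg(&builder, "reg1", 64, "int", "demo");

    gdb_feature_builder_end(&builder);
}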

Signed-off-by: Akihiko Odaki 
Reviewed-by: Richard Henderson 
---
 include/exec/gdbstub.h | 20 ++
 gdbstub/gdbstub.c  | 59 ++
 2 files changed, 79 insertions(+)

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index d0dcc99ed4..1f4608d4f9 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -16,6 +16,11 @@ typedef struct GDBFeature {
 int num_regs;
 } GDBFeature;
 
+typedef struct GDBFeatureBuilder {
+GDBFeature *feature;
+GPtrArray *xml;
+} GDBFeatureBuilder;
+
 
 /* Get or set a register.  Returns the size of the register.  */
 typedef int (*gdb_get_reg_cb)(CPUArchState *env, GByteArray *buf, int reg);
@@ -34,6 +39,21 @@ void gdb_register_coprocessor(CPUState *cpu,
  */
 int gdbserver_start(const char *port_or_device);
 
+void gdb_feature_builder_init(GDBFeatureBuilder *builder, GDBFeature *feature,
+  const char *name, const char *xmlname);
+
+void gdb_feature_builder_append_tag(const GDBFeatureBuilder *builder,
+const char *format, ...)
+G_GNUC_PRINTF(2, 3);
+
+void gdb_feature_builder_append_reg(const GDBFeatureBuilder *builder,
+const char *name,
+int bitsize,
+const char *type,
+const char *group);
+
+void gdb_feature_builder_end(const GDBFeatureBuilder *builder);
+
 const GDBFeature *gdb_find_static_feature(const char *xmlname);
 
 void gdb_set_stop_cpu(CPUState *cpu);
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 94f218039b..909e3cd655 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -414,6 +414,65 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 return name ? gdb_static_features[i].xml : NULL;
 }
 
+void gdb_feature_builder_init(GDBFeatureBuilder *builder, GDBFeature *feature,
+  const char *name, const char *xmlname)
+{
+char *header = g_markup_printf_escaped(
+""
+""
+"",
+name);
+
+builder->feature = feature;
+builder->xml = g_ptr_array_new();
+g_ptr_array_add(builder->xml, header);
+feature->xmlname = xmlname;
+feature->num_regs = 0;
+}
+
+void gdb_feature_builder_append_tag(const GDBFeatureBuilder *builder,
+const char *format, ...)
+{
+va_list ap;
+va_start(ap, format);
+g_ptr_array_add(builder->xml, g_markup_vprintf_escaped(format, ap));
+va_end(ap);
+}
+
+void gdb_feature_builder_append_reg(const GDBFeatureBuilder *builder,
+const char *name,
+int bitsize,
+const char *type,
+const char *group)
+{
+if (group) {
+gdb_feature_builder_append_tag(
+builder,
+"",
+name, bitsize, type, group);
+} else {
+gdb_feature_builder_append_tag(
+builder, "",
+name, bitsize, type);
+}
+
+builder->feature->num_regs++;
+}
+
+void gdb_feature_builder_end(const GDBFeatureBuilder *builder)
+{
+g_ptr_array_add(builder->xml, (void *)"");
+g_ptr_array_add(builder->xml, NULL);
+
+builder->feature->xml = g_strjoinv(NULL, (void *)builder->xml->pdata);
+
+for (guint i = 0; i < builder->xml->len - 2; i++) {
+g_free(g_ptr_array_index(builder->xml, i));
+}
+
+g_ptr_array_free(builder->xml, TRUE);
+}
+
 const GDBFeature *gdb_find_static_feature(const char *xmlname)
 {
 const GDBFeature *feature;
-- 
2.41.0




[PATCH RESEND v5 25/26] plugins: Support C++

2023-08-17 Thread Akihiko Odaki
Make qemu-plugin.h consumable by C++ code.

Signed-off-by: Akihiko Odaki 
---
 include/qemu/qemu-plugin.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index 214b12bfd6..8637e3d8cf 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -16,6 +16,8 @@
 #include 
 #include 
 
+G_BEGIN_DECLS
+
 /*
  * For best performance, build the plugin with -fvisibility=hidden so that
  * QEMU_PLUGIN_LOCAL is implicit. Then, just mark qemu_plugin_install with
@@ -723,4 +725,6 @@ qemu_plugin_get_register_files(unsigned int vcpu_index, int 
*size);
  */
 int qemu_plugin_read_register(GByteArray *buf, int reg);
 
+G_END_DECLS
+
 #endif /* QEMU_QEMU_PLUGIN_H */
-- 
2.41.0




[PATCH RESEND v5 10/26] target/riscv: Use GDBFeature for dynamic XML

2023-08-17 Thread Akihiko Odaki
In preparation for a change to use GDBFeature as a parameter of
gdb_register_coprocessor(), convert the internal representation of
dynamic features from plain XML to GDBFeature.

Signed-off-by: Akihiko Odaki 
---
 target/riscv/cpu.h |  4 +--
 target/riscv/cpu.c |  4 +--
 target/riscv/gdbstub.c | 77 ++
 3 files changed, 37 insertions(+), 48 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 6ea22e0eea..f67751d5b7 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -391,8 +391,8 @@ struct ArchCPU {
 CPUNegativeOffsetState neg;
 CPURISCVState env;
 
-char *dyn_csr_xml;
-char *dyn_vreg_xml;
+GDBFeature dyn_csr_feature;
+GDBFeature dyn_vreg_feature;
 
 /* Configuration Settings */
 RISCVCPUConfig cfg;
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 36de35270d..ceca40cdd9 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1962,9 +1962,9 @@ static const char *riscv_gdb_get_dynamic_xml(CPUState 
*cs, const char *xmlname)
 RISCVCPU *cpu = RISCV_CPU(cs);
 
 if (strcmp(xmlname, "riscv-csr.xml") == 0) {
-return cpu->dyn_csr_xml;
+return cpu->dyn_csr_feature.xml;
 } else if (strcmp(xmlname, "riscv-vector.xml") == 0) {
-return cpu->dyn_vreg_xml;
+return cpu->dyn_vreg_feature.xml;
 }
 
 return NULL;
diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
index 524bede865..cdae406751 100644
--- a/target/riscv/gdbstub.c
+++ b/target/riscv/gdbstub.c
@@ -212,12 +212,13 @@ static int riscv_gdb_set_virtual(CPURISCVState *cs, 
uint8_t *mem_buf, int n)
 return 0;
 }
 
-static int riscv_gen_dynamic_csr_xml(CPUState *cs, int base_reg)
+static GDBFeature *riscv_gen_dynamic_csr_feature(CPUState *cs)
 {
 RISCVCPU *cpu = RISCV_CPU(cs);
 CPURISCVState *env = &cpu->env;
-GString *s = g_string_new(NULL);
+GDBFeatureBuilder builder;
 riscv_csr_predicate_fn predicate;
+const char *name;
 int bitsize = 16 << env->misa_mxl_max;
 int i;
 
@@ -230,9 +231,8 @@ static int riscv_gen_dynamic_csr_xml(CPUState *cs, int 
base_reg)
 bitsize = 64;
 }
 
-g_string_printf(s, "");
-g_string_append_printf(s, "");
-g_string_append_printf(s, "");
+gdb_feature_builder_init(&builder, &cpu->dyn_csr_feature,
+ "org.gnu.gdb.riscv.csr", "riscv-csr.xml");
 
 for (i = 0; i < CSR_TABLE_SIZE; i++) {
 if (env->priv_ver < csr_ops[i].min_priv_ver) {
@@ -240,72 +240,63 @@ static int riscv_gen_dynamic_csr_xml(CPUState *cs, int 
base_reg)
 }
 predicate = csr_ops[i].predicate;
 if (predicate && (predicate(env, i) == RISCV_EXCP_NONE)) {
-if (csr_ops[i].name) {
-g_string_append_printf(s, "", base_reg + i);
+
+gdb_feature_builder_append_reg(&builder, name, bitsize,
+   "int", NULL);
 }
 }
 
-g_string_append_printf(s, "");
-
-cpu->dyn_csr_xml = g_string_free(s, false);
+gdb_feature_builder_end();
 
 #if !defined(CONFIG_USER_ONLY)
 env->debugger = false;
 #endif
 
-return CSR_TABLE_SIZE;
+return &cpu->dyn_csr_feature;
 }
 
-static int ricsv_gen_dynamic_vector_xml(CPUState *cs, int base_reg)
+static GDBFeature *ricsv_gen_dynamic_vector_feature(CPUState *cs)
 {
 RISCVCPU *cpu = RISCV_CPU(cs);
-GString *s = g_string_new(NULL);
-g_autoptr(GString) ts = g_string_new("");
+GDBFeatureBuilder builder;
 int reg_width = cpu->cfg.vlen;
-int num_regs = 0;
 int i;
 
-g_string_printf(s, "");
-g_string_append_printf(s, "");
-g_string_append_printf(s, "");
+gdb_feature_builder_init(&builder, &cpu->dyn_vreg_feature,
+ "org.gnu.gdb.riscv.vector", "riscv-vector.xml");
 
 /* First define types and totals in a whole VL */
 for (i = 0; i < ARRAY_SIZE(vec_lanes); i++) {
 int count = reg_width / vec_lanes[i].size;
-g_string_printf(ts, "%s", vec_lanes[i].id);
-g_string_append_printf(s,
-   "",
-   ts->str, vec_lanes[i].gdb_type, count);
+gdb_feature_builder_append_tag(
+, "",
+vec_lanes[i].id, vec_lanes[i].gdb_type, count);
 }
 
 /* Define unions */
-g_string_append_printf(s, "");
+gdb_feature_builder_append_tag(&builder, "");
 for (i = 0; i < ARRAY_SIZE(vec_lanes); i++) {
-g_string_append_printf(s, "",
-   vec_lanes[i].suffix,
-   vec_lanes[i].id);
+gdb_feature_builder_append_tag(&builder,
+   "",
+   vec_lanes[i].suffix, vec_lanes[i].id);
 }
-g_string_append(s, "");
+gdb_feature_builder_append_tag(&builder, "");
 
 /* Define vector registers */
 for (i = 0; i < 32; i++) {
-g_string_append_printf(s,
-   "",
-   i, 

[PATCH RESEND v5 14/26] gdbstub: Dynamically allocate target.xml buffer

2023-08-17 Thread Akihiko Odaki
There is no guarantee that target.xml fits in 1024 bytes, and the fixed
buffer length requires tedious buffer overflow checks. Dynamically
allocate the target.xml buffer to resolve these problems.

Suggested-by: Alex Bennée 
Signed-off-by: Akihiko Odaki 
---
 gdbstub/internals.h |  2 +-
 gdbstub/gdbstub.c   | 44 
 gdbstub/softmmu.c   |  2 +-
 3 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index f2b46cce41..4876ebd74f 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -33,7 +33,7 @@ typedef struct GDBProcess {
 uint32_t pid;
 bool attached;
 
-char target_xml[1024];
+char *target_xml;
 } GDBProcess;
 
 enum RSState {
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 5656a44970..031ad89c7d 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -366,33 +366,37 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 
 name = NULL;
 if (strncmp(p, "target.xml", len) == 0) {
-char *buf = process->target_xml;
-const size_t buf_sz = sizeof(process->target_xml);
-
 /* Generate the XML description for this CPU.  */
-if (!buf[0]) {
+if (!process->target_xml) {
+g_autoptr(GPtrArray) a = g_ptr_array_new_with_free_func(g_free);
 GDBRegisterState *r;
 
-pstrcat(buf, buf_sz,
-""
-""
-"");
+g_ptr_array_add(
+a,
+g_strdup(""
+ ""
+ ""));
 if (cc->gdb_arch_name) {
-pstrcat(buf, buf_sz, "");
-pstrcat(buf, buf_sz, cc->gdb_arch_name(cpu));
-pstrcat(buf, buf_sz, "");
+g_ptr_array_add(
+a,
+g_markup_printf_escaped("%s",
+cc->gdb_arch_name(cpu)));
 }
-pstrcat(buf, buf_sz, "gdb_core_feature->xmlname);
-pstrcat(buf, buf_sz, "\"/>");
+g_ptr_array_add(
+a,
+g_markup_printf_escaped("",
+cc->gdb_core_feature->xmlname));
 for (r = cpu->gdb_regs; r; r = r->next) {
-pstrcat(buf, buf_sz, "feature->xmlname);
-pstrcat(buf, buf_sz, "\"/>");
+g_ptr_array_add(
+a,
+g_markup_printf_escaped("",
+r->feature->xmlname));
 }
-pstrcat(buf, buf_sz, "");
+g_ptr_array_add(a, g_strdup(""));
+g_ptr_array_add(a, NULL);
+process->target_xml = g_strjoinv(NULL, (void *)a->pdata);
 }
-return buf;
+return process->target_xml;
 }
 if (cc->gdb_get_dynamic_xml) {
 char *xmlname = g_strndup(p, len);
@@ -2270,6 +2274,6 @@ void gdb_create_default_process(GDBState *s)
 process = >processes[s->process_num - 1];
 process->pid = pid;
 process->attached = false;
-process->target_xml[0] = '\0';
+process->target_xml = NULL;
 }
 
diff --git a/gdbstub/softmmu.c b/gdbstub/softmmu.c
index f509b7285d..5282324764 100644
--- a/gdbstub/softmmu.c
+++ b/gdbstub/softmmu.c
@@ -293,7 +293,7 @@ static int find_cpu_clusters(Object *child, void *opaque)
 assert(cluster->cluster_id != UINT32_MAX);
 process->pid = cluster->cluster_id + 1;
 process->attached = false;
-process->target_xml[0] = '\0';
+process->target_xml = NULL;
 
 return 0;
 }
-- 
2.41.0




[PATCH RESEND v5 21/26] gdbstub: Expose functions to read registers

2023-08-17 Thread Akihiko Odaki
gdb_foreach_feature() enumerates features that are useful to identify
registers. gdb_read_register() actually reads registers.
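
As a rough sketch (not part of this patch), a caller outside the gdbstub
could, for example, total up a vCPU's registers using only the two
functions declared here:

static void count_feature(void *opaque, const GDBFeature *feature, int base_reg)
{
    int *total = opaque;

    /* Registers of this feature occupy the GDB register numbers
     * [base_reg, base_reg + feature->num_regs). */
    *total += feature->num_regs;
}

static int count_gdb_regs(CPUState *cpu)
{
    int total = 0;

    gdb_foreach_feature(cpu, count_feature, &total);
    return total;
}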

Signed-off-by: Akihiko Odaki 
---
 include/exec/gdbstub.h |  6 ++
 gdbstub/gdbstub.c  | 20 +++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 5cba2933d2..1208fafa33 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -59,6 +59,12 @@ void gdb_feature_builder_end(const GDBFeatureBuilder 
*builder);
 
 const GDBFeature *gdb_find_static_feature(const char *xmlname);
 
+void gdb_foreach_feature(CPUState *cpu,
+ void (* callback)(void *, const GDBFeature *, int),
+ void *opaque);
+
+int gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
+
 void gdb_set_stop_cpu(CPUState *cpu);
 
 /* in gdbstub-xml.c, generated by scripts/feature_to_c.py */
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 55819f4aba..41fad40b6c 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -481,7 +481,25 @@ const GDBFeature *gdb_find_static_feature(const char 
*xmlname)
 g_assert_not_reached();
 }
 
-static int gdb_read_register(CPUState *cpu, GByteArray *buf, int reg)
+void gdb_foreach_feature(CPUState *cpu,
+ void (* callback)(void *, const GDBFeature *, int),
+ void *opaque)
+{
+CPUClass *cc = CPU_GET_CLASS(cpu);
+GDBRegisterState *r;
+
+if (!cc->gdb_core_feature) {
+return;
+}
+
+callback(opaque, cc->gdb_core_feature, 0);
+
+for (r = cpu->gdb_regs; r; r = r->next) {
+callback(opaque, r->feature, r->base_reg);
+}
+}
+
+int gdb_read_register(CPUState *cpu, GByteArray *buf, int reg)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
 CPUArchState *env = cpu->env_ptr;
-- 
2.41.0




[PATCH RESEND v5 19/26] target/ppc: Remove references to gdb_has_xml

2023-08-17 Thread Akihiko Odaki
GDB has had XML support since 6.7, which was released in 2007.
It's time to remove support for old GDB versions without XML support.

Signed-off-by: Akihiko Odaki 
---
 target/ppc/gdbstub.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
index c86b7055ca..7e3b67a234 100644
--- a/target/ppc/gdbstub.c
+++ b/target/ppc/gdbstub.c
@@ -54,12 +54,6 @@ static int ppc_gdb_register_len(int n)
 case 0 ... 31:
 /* gprs */
 return sizeof(target_ulong);
-case 32 ... 63:
-/* fprs */
-if (gdb_has_xml) {
-return 0;
-}
-return 8;
 case 66:
 /* cr */
 case 69:
@@ -74,12 +68,6 @@ static int ppc_gdb_register_len(int n)
 case 68:
 /* ctr */
 return sizeof(target_ulong);
-case 70:
-/* fpscr */
-if (gdb_has_xml) {
-return 0;
-}
-return sizeof(target_ulong);
 default:
 return 0;
 }
-- 
2.41.0




[PATCH RESEND v5 20/26] gdbstub: Remove gdb_has_xml variable

2023-08-17 Thread Akihiko Odaki
GDB has had XML support since 6.7, which was released in 2007.
It's time to remove support for old GDB versions without XML support.

Signed-off-by: Akihiko Odaki 
---
 include/exec/gdbstub.h |  8 
 gdbstub/gdbstub.c  | 13 -
 gdbstub/softmmu.c  |  1 -
 gdbstub/user.c |  1 -
 4 files changed, 23 deletions(-)

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index f3f2c40b1a..5cba2933d2 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -61,14 +61,6 @@ const GDBFeature *gdb_find_static_feature(const char 
*xmlname);
 
 void gdb_set_stop_cpu(CPUState *cpu);
 
-/**
- * gdb_has_xml:
- * This is an ugly hack to cope with both new and old gdb.
- * If gdb sends qXfer:features:read then assume we're talking to a newish
- * gdb that understands target descriptions.
- */
-extern bool gdb_has_xml;
-
 /* in gdbstub-xml.c, generated by scripts/feature_to_c.py */
 extern const GDBFeature gdb_static_features[];
 
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index e52a739491..55819f4aba 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -74,8 +74,6 @@ void gdb_init_gdbserver_state(void)
 gdbserver_state.sstep_flags &= gdbserver_state.supported_sstep_flags;
 }
 
-bool gdb_has_xml;
-
 /* writes 2*len+1 bytes in buf */
 void gdb_memtohex(GString *buf, const uint8_t *mem, int len)
 {
@@ -1121,11 +1119,6 @@ static void handle_set_reg(GArray *params, void 
*user_ctx)
 {
 int reg_size;
 
-if (!gdb_has_xml) {
-gdb_put_packet("");
-return;
-}
-
 if (params->len != 2) {
 gdb_put_packet("E22");
 return;
@@ -1142,11 +1135,6 @@ static void handle_get_reg(GArray *params, void 
*user_ctx)
 {
 int reg_size;
 
-if (!gdb_has_xml) {
-gdb_put_packet("");
-return;
-}
-
 if (!params->len) {
 gdb_put_packet("E14");
 return;
@@ -1609,7 +1597,6 @@ static void handle_query_xfer_features(GArray *params, 
void *user_ctx)
 return;
 }
 
-gdb_has_xml = true;
 p = get_param(params, 0)->data;
 xml = get_feature_xml(p, , process);
 if (!xml) {
diff --git a/gdbstub/softmmu.c b/gdbstub/softmmu.c
index 5282324764..42645d2220 100644
--- a/gdbstub/softmmu.c
+++ b/gdbstub/softmmu.c
@@ -97,7 +97,6 @@ static void gdb_chr_event(void *opaque, QEMUChrEvent event)
 
 vm_stop(RUN_STATE_PAUSED);
 replay_gdb_attached();
-gdb_has_xml = false;
 break;
 default:
 break;
diff --git a/gdbstub/user.c b/gdbstub/user.c
index 5b375be1d9..7ab6e5d975 100644
--- a/gdbstub/user.c
+++ b/gdbstub/user.c
@@ -198,7 +198,6 @@ static void gdb_accept_init(int fd)
 gdbserver_state.c_cpu = gdb_first_attached_cpu();
 gdbserver_state.g_cpu = gdbserver_state.c_cpu;
 gdbserver_user_state.fd = fd;
-gdb_has_xml = false;
 }
 
 static bool gdb_accept_socket(int gdb_fd)
-- 
2.41.0




[PATCH RESEND v5 18/26] target/arm: Remove references to gdb_has_xml

2023-08-17 Thread Akihiko Odaki
GDB has had XML support since 6.7, which was released in 2007.
It's time to remove support for old GDB versions without XML support.

Signed-off-by: Akihiko Odaki 
---
 target/arm/gdbstub.c | 32 ++--
 1 file changed, 2 insertions(+), 30 deletions(-)

diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
index dbc396a88b..4cccaa42e0 100644
--- a/target/arm/gdbstub.c
+++ b/target/arm/gdbstub.c
@@ -45,21 +45,7 @@ int arm_cpu_gdb_read_register(CPUState *cs, GByteArray 
*mem_buf, int n)
 /* Core integer register.  */
 return gdb_get_reg32(mem_buf, env->regs[n]);
 }
-if (n < 24) {
-/* FPA registers.  */
-if (gdb_has_xml) {
-return 0;
-}
-return gdb_get_zeroes(mem_buf, 12);
-}
-switch (n) {
-case 24:
-/* FPA status register.  */
-if (gdb_has_xml) {
-return 0;
-}
-return gdb_get_reg32(mem_buf, 0);
-case 25:
+if (n == 25) {
 /* CPSR, or XPSR for M-profile */
 if (arm_feature(env, ARM_FEATURE_M)) {
 return gdb_get_reg32(mem_buf, xpsr_read(env));
@@ -99,21 +85,7 @@ int arm_cpu_gdb_write_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 env->regs[n] = tmp;
 return 4;
 }
-if (n < 24) { /* 16-23 */
-/* FPA registers (ignored).  */
-if (gdb_has_xml) {
-return 0;
-}
-return 12;
-}
-switch (n) {
-case 24:
-/* FPA status register (ignored).  */
-if (gdb_has_xml) {
-return 0;
-}
-return 4;
-case 25:
+if (n == 25) {
 /* CPSR, or XPSR for M-profile */
 if (arm_feature(env, ARM_FEATURE_M)) {
 /*
-- 
2.41.0




[PATCH RESEND v5 16/26] hw/core/cpu: Remove gdb_get_dynamic_xml member

2023-08-17 Thread Akihiko Odaki
This function is no longer used.

Signed-off-by: Akihiko Odaki 
---
 include/hw/core/cpu.h |  4 
 target/arm/cpu.h  |  6 --
 target/ppc/cpu.h  |  1 -
 target/arm/cpu.c  |  1 -
 target/arm/gdbstub.c  | 18 --
 target/ppc/cpu_init.c |  3 ---
 target/ppc/gdbstub.c  | 10 --
 target/riscv/cpu.c| 14 --
 8 files changed, 57 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 09f1aca624..8fc9a1a140 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -133,9 +133,6 @@ struct SysemuCPUOps;
  *   before the insn which triggers a watchpoint rather than after it.
  * @gdb_arch_name: Optional callback that returns the architecture name known
  * to GDB. The caller must free the returned string with g_free.
- * @gdb_get_dynamic_xml: Callback to return dynamically generated XML for the
- *   gdb stub. Returns a pointer to the XML contents for the specified XML file
- *   or NULL if the CPU doesn't have a dynamically generated content for it.
  * @disas_set_info: Setup architecture specific components of disassembly info
  * @adjust_watchpoint_address: Perform a target-specific adjustment to an
  * address before attempting to match it against watchpoints.
@@ -166,7 +163,6 @@ struct CPUClass {
 
 const GDBFeature *gdb_core_feature;
 const gchar * (*gdb_arch_name)(CPUState *cpu);
-const char * (*gdb_get_dynamic_xml)(CPUState *cpu, const char *xmlname);
 
 void (*disas_set_info)(CPUState *cpu, disassemble_info *info);
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index d6c2378d05..09bf82034d 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1131,12 +1131,6 @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cpu, 
vaddr addr,
 int arm_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int arm_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
 
-/* Returns the dynamically generated XML for the gdb stub.
- * Returns a pointer to the XML contents for the specified XML file or NULL
- * if the XML name doesn't match the predefined one.
- */
-const char *arm_gdb_get_dynamic_xml(CPUState *cpu, const char *xmlname);
-
 int arm_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
  int cpuid, DumpState *s);
 int arm_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 5f251bdffe..3dc6e545e3 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1382,7 +1382,6 @@ int ppc_cpu_gdb_write_register_apple(CPUState *cpu, 
uint8_t *buf, int reg);
 #ifndef CONFIG_USER_ONLY
 hwaddr ppc_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 void ppc_gdb_gen_spr_feature(PowerPCCPU *cpu);
-const char *ppc_gdb_get_dynamic_xml(CPUState *cs, const char *xml_name);
 #endif
 int ppc64_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
int cpuid, DumpState *s);
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 5f07133419..f26c0ded18 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2354,7 +2354,6 @@ static void arm_cpu_class_init(ObjectClass *oc, void 
*data)
 cc->sysemu_ops = _sysemu_ops;
 #endif
 cc->gdb_arch_name = arm_gdb_arch_name;
-cc->gdb_get_dynamic_xml = arm_gdb_get_dynamic_xml;
 cc->gdb_stop_before_watchpoint = true;
 cc->disas_set_info = arm_disas_set_info;
 
diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
index 791784dffe..fc5ed89e80 100644
--- a/target/arm/gdbstub.c
+++ b/target/arm/gdbstub.c
@@ -470,24 +470,6 @@ static GDBFeature 
*arm_gen_dynamic_m_secextreg_feature(CPUState *cs)
 #endif
 #endif /* CONFIG_TCG */
 
-const char *arm_gdb_get_dynamic_xml(CPUState *cs, const char *xmlname)
-{
-ARMCPU *cpu = ARM_CPU(cs);
-
-if (strcmp(xmlname, "system-registers.xml") == 0) {
-return cpu->dyn_sysreg_feature.desc.xml;
-} else if (strcmp(xmlname, "sve-registers.xml") == 0) {
-return cpu->dyn_svereg_feature.desc.xml;
-} else if (strcmp(xmlname, "arm-m-system.xml") == 0) {
-return cpu->dyn_m_systemreg_feature.desc.xml;
-#ifndef CONFIG_USER_ONLY
-} else if (strcmp(xmlname, "arm-m-secext.xml") == 0) {
-return cpu->dyn_m_secextreg_feature.desc.xml;
-#endif
-}
-return NULL;
-}
-
 void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
 {
 CPUState *cs = CPU(cpu);
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 938cd2b7e1..a3153c4e9f 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7370,9 +7370,6 @@ static void ppc_cpu_class_init(ObjectClass *oc, void 
*data)
 #endif
 
 cc->gdb_num_core_regs = 71;
-#ifndef CONFIG_USER_ONLY
-cc->gdb_get_dynamic_xml = ppc_gdb_get_dynamic_xml;
-#endif
 #ifdef USE_APPLE_GDB
 cc->gdb_read_register = ppc_cpu_gdb_read_register_apple;
 cc->gdb_write_register = ppc_cpu_gdb_write_register_apple;
diff --git a/target/ppc/gdbstub.c b/target/ppc/gdbstub.c
index dbdee7d56e..c86b7055ca 

[PATCH RESEND v5 06/26] hw/core/cpu: Replace gdb_core_xml_file with gdb_core_feature

2023-08-17 Thread Akihiko Odaki
This is a tree-wide change to replace gdb_core_xml_file, the path to the
GDB XML file, with gdb_core_feature, a pointer to GDBFeature. This
also replaces the values assigned to gdb_num_core_regs with the
num_regs member of GDBFeature where applicable to remove magic numbers.

A following change will utilize additional information provided by
GDBFeature to simplify XML file lookup.

Signed-off-by: Akihiko Odaki 
---
 include/hw/core/cpu.h   | 5 +++--
 target/s390x/cpu.h  | 2 --
 gdbstub/gdbstub.c   | 6 +++---
 target/arm/cpu.c| 4 ++--
 target/arm/cpu64.c  | 4 ++--
 target/arm/tcg/cpu32.c  | 3 ++-
 target/avr/cpu.c| 4 ++--
 target/hexagon/cpu.c| 2 +-
 target/i386/cpu.c   | 7 +++
 target/loongarch/cpu.c  | 4 ++--
 target/m68k/cpu.c   | 7 ---
 target/microblaze/cpu.c | 4 ++--
 target/ppc/cpu_init.c   | 4 ++--
 target/riscv/cpu.c  | 7 ---
 target/rx/cpu.c | 4 ++--
 target/s390x/cpu.c  | 4 ++--
 16 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index fdcbe87352..84219c1885 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -23,6 +23,7 @@
 #include "hw/qdev-core.h"
 #include "disas/dis-asm.h"
 #include "exec/cpu-common.h"
+#include "exec/gdbstub.h"
 #include "exec/hwaddr.h"
 #include "exec/memattrs.h"
 #include "qapi/qapi-types-run-state.h"
@@ -127,7 +128,7 @@ struct SysemuCPUOps;
  *   breakpoint.  Used by AVR to handle a gdb mis-feature with
  *   its Harvard architecture split code and data.
  * @gdb_num_core_regs: Number of core registers accessible to GDB.
- * @gdb_core_xml_file: File name for core registers GDB XML description.
+ * @gdb_core_feature: GDB core feature description.
  * @gdb_stop_before_watchpoint: Indicates whether GDB expects the CPU to stop
  *   before the insn which triggers a watchpoint rather than after it.
  * @gdb_arch_name: Optional callback that returns the architecture name known
@@ -163,7 +164,7 @@ struct CPUClass {
 int (*gdb_write_register)(CPUState *cpu, uint8_t *buf, int reg);
 vaddr (*gdb_adjust_breakpoint)(CPUState *cpu, vaddr addr);
 
-const char *gdb_core_xml_file;
+const GDBFeature *gdb_core_feature;
 gchar * (*gdb_arch_name)(CPUState *cpu);
 const char * (*gdb_get_dynamic_xml)(CPUState *cpu, const char *xmlname);
 
diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index eb5b65b7d3..c5bac3230c 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -451,8 +451,6 @@ static inline void cpu_get_tb_cpu_state(CPUS390XState *env, 
vaddr *pc,
 #define S390_R13_REGNUM 15
 #define S390_R14_REGNUM 16
 #define S390_R15_REGNUM 17
-/* Total Core Registers. */
-#define S390_NUM_CORE_REGS 18
 
 static inline void setcc(S390CPU *cpu, uint64_t cc)
 {
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index f0ba9efaff..94f218039b 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -386,7 +386,7 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 g_free(arch);
 }
 pstrcat(buf, buf_sz, "gdb_core_xml_file);
+pstrcat(buf, buf_sz, cc->gdb_core_feature->xmlname);
 pstrcat(buf, buf_sz, "\"/>");
 for (r = cpu->gdb_regs; r; r = r->next) {
 pstrcat(buf, buf_sz, "gdb_core_xml_file) {
+if (cc->gdb_core_feature) {
 g_string_append(gdbserver_state.str_buf, ";qXfer:features:read+");
 }
 
@@ -1548,7 +1548,7 @@ static void handle_query_xfer_features(GArray *params, 
void *user_ctx)
 
 process = gdb_get_cpu_process(gdbserver_state.g_cpu);
 cc = CPU_GET_CLASS(gdbserver_state.g_cpu);
-if (!cc->gdb_core_xml_file) {
+if (!cc->gdb_core_feature) {
 gdb_put_packet("");
 return;
 }
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index d71a162070..a206ab6b1b 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2353,7 +2353,6 @@ static void arm_cpu_class_init(ObjectClass *oc, void 
*data)
 #ifndef CONFIG_USER_ONLY
 cc->sysemu_ops = _sysemu_ops;
 #endif
-cc->gdb_num_core_regs = 26;
 cc->gdb_arch_name = arm_gdb_arch_name;
 cc->gdb_get_dynamic_xml = arm_gdb_get_dynamic_xml;
 cc->gdb_stop_before_watchpoint = true;
@@ -2378,7 +2377,8 @@ static void cpu_register_class_init(ObjectClass *oc, void 
*data)
 CPUClass *cc = CPU_CLASS(acc);
 
 acc->info = data;
-cc->gdb_core_xml_file = "arm-core.xml";
+cc->gdb_core_feature = gdb_find_static_feature("arm-core.xml");
+cc->gdb_num_core_regs = cc->gdb_core_feature->num_regs;
 }
 
 void arm_cpu_register(const ARMCPUInfo *info)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 96158093cc..9c2a226159 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -754,8 +754,8 @@ static void aarch64_cpu_class_init(ObjectClass *oc, void 
*data)
 
 cc->gdb_read_register = aarch64_cpu_gdb_read_register;
 cc->gdb_write_register = aarch64_cpu_gdb_write_register;
-

[PATCH RESEND v5 17/26] gdbstub: Add members to identify registers to GDBFeature

2023-08-17 Thread Akihiko Odaki
These members will be used to help plugins identify registers.
The added members in instances of GDBFeature dynamically generated by
CPUs will be filled in by later changes.

Signed-off-by: Akihiko Odaki 
---
 include/exec/gdbstub.h  |  3 +++
 gdbstub/gdbstub.c   |  8 ++--
 target/arm/gdbstub.c|  2 +-
 target/riscv/gdbstub.c  |  4 +---
 scripts/feature_to_c.py | 14 +-
 5 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 572abada63..f3f2c40b1a 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -13,12 +13,15 @@
 typedef struct GDBFeature {
 const char *xmlname;
 const char *xml;
+const char *name;
+const char * const *regs;
 int num_regs;
 } GDBFeature;
 
 typedef struct GDBFeatureBuilder {
 GDBFeature *feature;
 GPtrArray *xml;
+GPtrArray *regs;
 } GDBFeatureBuilder;
 
 
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 4648a56088..e52a739491 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -418,9 +418,10 @@ void gdb_feature_builder_init(GDBFeatureBuilder *builder, 
GDBFeature *feature,
 
 builder->feature = feature;
 builder->xml = g_ptr_array_new();
+builder->regs = g_ptr_array_new();
 g_ptr_array_add(builder->xml, header);
 feature->xmlname = xmlname;
-feature->num_regs = 0;
+feature->name = name;
 }
 
 void gdb_feature_builder_append_tag(const GDBFeatureBuilder *builder,
@@ -449,7 +450,7 @@ void gdb_feature_builder_append_reg(const GDBFeatureBuilder 
*builder,
 name, bitsize, type);
 }
 
-builder->feature->num_regs++;
+g_ptr_array_add(builder->regs, (void *)name);
 }
 
 void gdb_feature_builder_end(const GDBFeatureBuilder *builder)
@@ -464,6 +465,9 @@ void gdb_feature_builder_end(const GDBFeatureBuilder 
*builder)
 }
 
 g_ptr_array_free(builder->xml, TRUE);
+
+builder->feature->num_regs = builder->regs->len;
+builder->feature->regs = (void *)g_ptr_array_free(builder->regs, FALSE);
 }
 
 const GDBFeature *gdb_find_static_feature(const char *xmlname)
diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
index fc5ed89e80..dbc396a88b 100644
--- a/target/arm/gdbstub.c
+++ b/target/arm/gdbstub.c
@@ -264,7 +264,7 @@ static void arm_gen_one_feature_sysreg(GDBFeatureBuilder 
*builder,
ARMCPRegInfo *ri, uint32_t ri_key,
int bitsize)
 {
-dyn_feature->data.cpregs.keys[dyn_feature->desc.num_regs] = ri_key;
+dyn_feature->data.cpregs.keys[builder->regs->len] = ri_key;
 
 gdb_feature_builder_append_reg(builder, ri->name, bitsize,
"int", "cp_regs");
diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
index d4f9eb1516..a2ec9a3701 100644
--- a/target/riscv/gdbstub.c
+++ b/target/riscv/gdbstub.c
@@ -240,11 +240,9 @@ static GDBFeature *riscv_gen_dynamic_csr_feature(CPUState 
*cs)
 }
 predicate = csr_ops[i].predicate;
 if (predicate && (predicate(env, i) == RISCV_EXCP_NONE)) {
-g_autofree char *dynamic_name = NULL;
 name = csr_ops[i].name;
 if (!name) {
-dynamic_name = g_strdup_printf("csr%03x", i);
-name = dynamic_name;
+name = g_strdup_printf("csr%03x", i);
 }
 
 gdb_feature_builder_append_reg(, name, bitsize,
diff --git a/scripts/feature_to_c.py b/scripts/feature_to_c.py
index e04d6b2df7..807af0e685 100755
--- a/scripts/feature_to_c.py
+++ b/scripts/feature_to_c.py
@@ -50,7 +50,9 @@ def writeliteral(indent, bytes):
 sys.stderr.write(f'unexpected start tag: {element.tag}\n')
 exit(1)
 
+feature_name = element.attrib['name']
 regnum = 0
+regnames = []
 regnums = []
 tags = ['feature']
 for event, element in events:
@@ -67,6 +69,7 @@ def writeliteral(indent, bytes):
 if 'regnum' in element.attrib:
 regnum = int(element.attrib['regnum'])
 
+regnames.append(element.attrib['name'])
 regnums.append(regnum)
 regnum += 1
 
@@ -85,6 +88,15 @@ def writeliteral(indent, bytes):
 writeliteral(8, bytes(os.path.basename(input), 'utf-8'))
 sys.stdout.write(',\n')
 writeliteral(8, read)
-sys.stdout.write(f',\n{num_regs},\n}},\n')
+sys.stdout.write(',\n')
+writeliteral(8, bytes(feature_name, 'utf-8'))
+sys.stdout.write(',\n(const char * const []) {\n')
+
+for index, regname in enumerate(regnames):
+sys.stdout.write(f'[{regnums[index] - base_reg}] =\n')
+writeliteral(16, bytes(regname, 'utf-8'))
+sys.stdout.write(',\n')
+
+sys.stdout.write(f'}},\n{num_regs},\n}},\n')
 
 sys.stdout.write('{ NULL }\n};\n')
-- 
2.41.0




[PATCH RESEND v5 01/26] contrib/plugins: Use GRWLock in execlog

2023-08-17 Thread Akihiko Odaki
execlog had the following comment:
> As we could have multiple threads trying to do this we need to
> serialise the expansion under a lock. Threads accessing already
> created entries can continue without issue even if the ptr array
> gets reallocated during resize.

However, when the ptr array gets reallocated, the other threads may have
a stale reference to the old buffer. This results in use-after-free.

Use GRWLock to properly fix this issue.

Fixes: 3d7caf145e ("contrib/plugins: add execlog to log instruction execution and memory access")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Alex Bennée 
---
 contrib/plugins/execlog.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/contrib/plugins/execlog.c b/contrib/plugins/execlog.c
index 7129d526f8..82dc2f584e 100644
--- a/contrib/plugins/execlog.c
+++ b/contrib/plugins/execlog.c
@@ -19,7 +19,7 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_version = 
QEMU_PLUGIN_VERSION;
 
 /* Store last executed instruction on each vCPU as a GString */
 static GPtrArray *last_exec;
-static GMutex expand_array_lock;
+static GRWLock expand_array_lock;
 
 static GPtrArray *imatches;
 static GArray *amatches;
@@ -28,18 +28,16 @@ static GArray *amatches;
  * Expand last_exec array.
  *
  * As we could have multiple threads trying to do this we need to
- * serialise the expansion under a lock. Threads accessing already
- * created entries can continue without issue even if the ptr array
- * gets reallocated during resize.
+ * serialise the expansion under a lock.
  */
 static void expand_last_exec(int cpu_index)
 {
-g_mutex_lock(&expand_array_lock);
+g_rw_lock_writer_lock(&expand_array_lock);
 while (cpu_index >= last_exec->len) {
 GString *s = g_string_new(NULL);
 g_ptr_array_add(last_exec, s);
 }
-g_mutex_unlock(&expand_array_lock);
+g_rw_lock_writer_unlock(&expand_array_lock);
 }
 
 /**
@@ -51,8 +49,10 @@ static void vcpu_mem(unsigned int cpu_index, 
qemu_plugin_meminfo_t info,
 GString *s;
 
 /* Find vCPU in array */
+g_rw_lock_reader_lock(&expand_array_lock);
 g_assert(cpu_index < last_exec->len);
 s = g_ptr_array_index(last_exec, cpu_index);
+g_rw_lock_reader_unlock(&expand_array_lock);
 
 /* Indicate type of memory access */
 if (qemu_plugin_mem_is_store(info)) {
@@ -80,10 +80,14 @@ static void vcpu_insn_exec(unsigned int cpu_index, void 
*udata)
 GString *s;
 
 /* Find or create vCPU in array */
+g_rw_lock_reader_lock(&expand_array_lock);
+if (cpu_index >= last_exec->len) {
+g_rw_lock_reader_unlock(&expand_array_lock);
+expand_last_exec(cpu_index);
+g_rw_lock_reader_lock(&expand_array_lock);
+}
 s = g_ptr_array_index(last_exec, cpu_index);
+g_rw_lock_reader_unlock(&expand_array_lock);
 
 /* Print previous instruction in cache */
 if (s->len) {
-- 
2.41.0




[PATCH RESEND v5 04/26] gdbstub: Introduce gdb_find_static_feature()

2023-08-17 Thread Akihiko Odaki
This function is useful to determine the number of registers exposed to
GDB from the XML name.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 include/exec/gdbstub.h |  2 ++
 gdbstub/gdbstub.c  | 13 +
 2 files changed, 15 insertions(+)

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 9b484d7eef..d0dcc99ed4 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -34,6 +34,8 @@ void gdb_register_coprocessor(CPUState *cpu,
  */
 int gdbserver_start(const char *port_or_device);
 
+const GDBFeature *gdb_find_static_feature(const char *xmlname);
+
 void gdb_set_stop_cpu(CPUState *cpu);
 
 /**
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 2772f07bbe..f0ba9efaff 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -414,6 +414,19 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 return name ? gdb_static_features[i].xml : NULL;
 }
 
+const GDBFeature *gdb_find_static_feature(const char *xmlname)
+{
+const GDBFeature *feature;
+
+for (feature = gdb_static_features; feature->xmlname; feature++) {
+if (!strcmp(feature->xmlname, xmlname)) {
+return feature;
+}
+}
+
+g_assert_not_reached();
+}
+
 static int gdb_read_register(CPUState *cpu, GByteArray *buf, int reg)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
-- 
2.41.0




[PATCH RESEND v5 03/26] gdbstub: Add num_regs member to GDBFeature

2023-08-17 Thread Akihiko Odaki
Currently the number of registers exposed to GDB is written as magic
numbers in code. Derive the number of registers GDB actually sees from
the XML files so that the magic numbers in code can be replaced later.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
---
 include/exec/gdbstub.h  |  1 +
 scripts/feature_to_c.py | 46 +++--
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 3f08093321..9b484d7eef 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -13,6 +13,7 @@
 typedef struct GDBFeature {
 const char *xmlname;
 const char *xml;
+int num_regs;
 } GDBFeature;
 
 
diff --git a/scripts/feature_to_c.py b/scripts/feature_to_c.py
index bcbcb83beb..e04d6b2df7 100755
--- a/scripts/feature_to_c.py
+++ b/scripts/feature_to_c.py
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: GPL-2.0-or-later
 
-import os, sys
+import os, sys, xml.etree.ElementTree
 
 def writeliteral(indent, bytes):
 sys.stdout.write(' ' * indent)
@@ -39,10 +39,52 @@ def writeliteral(indent, bytes):
 with open(input, 'rb') as file:
 read = file.read()
 
+parser = xml.etree.ElementTree.XMLPullParser(['start', 'end'])
+parser.feed(read)
+events = parser.read_events()
+event, element = next(events)
+if event != 'start':
+sys.stderr.write(f'unexpected event: {event}\n')
+exit(1)
+if element.tag != 'feature':
+sys.stderr.write(f'unexpected start tag: {element.tag}\n')
+exit(1)
+
+regnum = 0
+regnums = []
+tags = ['feature']
+for event, element in events:
+if event == 'end':
+if element.tag != tags[len(tags) - 1]:
+sys.stderr.write(f'unexpected end tag: {element.tag}\n')
+exit(1)
+
+tags.pop()
+if element.tag == 'feature':
+break
+elif event == 'start':
+if len(tags) < 2 and element.tag == 'reg':
+if 'regnum' in element.attrib:
+regnum = int(element.attrib['regnum'])
+
+regnums.append(regnum)
+regnum += 1
+
+tags.append(element.tag)
+else:
+raise Exception(f'unexpected event: {event}\n')
+
+if len(tags):
+sys.stderr.write('unterminated feature tag\n')
+exit(1)
+
+base_reg = min(regnums)
+num_regs = max(regnums) - base_reg + 1 if len(regnums) else 0
+
 sys.stdout.write('{\n')
 writeliteral(8, bytes(os.path.basename(input), 'utf-8'))
 sys.stdout.write(',\n')
 writeliteral(8, read)
-sys.stdout.write('\n},\n')
+sys.stdout.write(f',\n{num_regs},\n}},\n')
 
 sys.stdout.write('{ NULL }\n};\n')
-- 
2.41.0




[PATCH] HDA codec: Fix wanted_r/w position overflow

2023-08-17 Thread M_O_Bz
From: zeroway 

When the duration (now - buft_start) becomes large enough, the
multiplication hda_bytes_per_second(st) * (now - buft_start) overflows.
Instead of calculating wanted_r/wpos from the start time to the current
time, calculate only the per-timer-tick delta in wanted_r/wpos_delta and
accumulate it into wanted_r/wpos, which avoids the overflow.
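
To get a feel for when the old computation overflows (the rate below is
assumed for illustration, not taken from the patch):

/* Back-of-the-envelope, assuming roughly 48 kHz stereo 16-bit audio,
 * i.e. about 192000 bytes per second:
 *
 *   hda_bytes_per_second(st) * (now - buft_start)  >  INT64_MAX
 *   once  (now - buft_start)  >  2^63 / 192000 ns  ~=  4.8e13 ns,
 *
 * which is on the order of 13 hours, after which the old wanted_r/wpos
 * value went negative.  Accumulating per-tick deltas keeps each
 * multiplication small.
 */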

Signed-off-by: zeroway 
---
 hw/audio/hda-codec.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/hw/audio/hda-codec.c b/hw/audio/hda-codec.c
index c51d8ba617..747188221a 100644
--- a/hw/audio/hda-codec.c
+++ b/hw/audio/hda-codec.c
@@ -169,6 +169,8 @@ struct HDAAudioStream {
 uint8_t buf[8192]; /* size must be power of two */
 int64_t rpos;
 int64_t wpos;
+int64_t wanted_rpos;
+int64_t wanted_wpos;
 QEMUTimer *buft;
 int64_t buft_start;
 };
@@ -226,16 +228,18 @@ static void hda_audio_input_timer(void *opaque)
 int64_t wpos = st->wpos;
 int64_t rpos = st->rpos;
 
-int64_t wanted_rpos = hda_bytes_per_second(st) * (now - buft_start)
+int64_t wanted_rpos_delta = hda_bytes_per_second(st) * (now - buft_start)
   / NANOSECONDS_PER_SECOND;
-wanted_rpos &= -4; /* IMPORTANT! clip to frames */
+st->wanted_rpos += wanted_rpos_delta;
+st->wanted_rpos &= -4; /* IMPORTANT! clip to frames */
 
-if (wanted_rpos <= rpos) {
+st->buft_start = now;
+if (st->wanted_rpos <= rpos) {
 /* we already transmitted the data */
 goto out_timer;
 }
 
-int64_t to_transfer = MIN(wpos - rpos, wanted_rpos - rpos);
+int64_t to_transfer = MIN(wpos - rpos, st->wanted_rpos - rpos);
 while (to_transfer) {
 uint32_t start = (rpos & B_MASK);
 uint32_t chunk = MIN(B_SIZE - start, to_transfer);
@@ -290,16 +294,18 @@ static void hda_audio_output_timer(void *opaque)
 int64_t wpos = st->wpos;
 int64_t rpos = st->rpos;
 
-int64_t wanted_wpos = hda_bytes_per_second(st) * (now - buft_start)
+int64_t wanted_wpos_delta = hda_bytes_per_second(st) * (now - buft_start)
   / NANOSECONDS_PER_SECOND;
-wanted_wpos &= -4; /* IMPORTANT! clip to frames */
+st->wanted_wpos += wanted_wpos_delta;
+st->wanted_wpos &= -4; /* IMPORTANT! clip to frames */
 
-if (wanted_wpos <= wpos) {
+st->buft_start = now;
+if (st->wanted_wpos <= wpos) {
 /* we already received the data */
 goto out_timer;
 }
 
-int64_t to_transfer = MIN(B_SIZE - (wpos - rpos), wanted_wpos - wpos);
+int64_t to_transfer = MIN(B_SIZE - (wpos - rpos), st->wanted_wpos - wpos);
 while (to_transfer) {
 uint32_t start = (wpos & B_MASK);
 uint32_t chunk = MIN(B_SIZE - start, to_transfer);
@@ -420,6 +426,8 @@ static void hda_audio_set_running(HDAAudioStream *st, bool 
running)
 int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 st->rpos = 0;
 st->wpos = 0;
+st->wanted_rpos = 0;
+st->wanted_wpos = 0;
 st->buft_start = now;
 timer_mod_anticipate_ns(st->buft, now + HDA_TIMER_TICKS);
 } else {
-- 
2.35.1




Re: [PATCH v5 1/5] ebpf: Added eBPF map update through mmap.

2023-08-17 Thread Andrew Melnichenko
Hi all,

On Wed, Aug 16, 2023 at 4:16 AM Jason Wang  wrote:
>
> On Mon, Aug 14, 2023 at 4:36 PM Andrew Melnichenko  wrote:
> >
> > Hi, all.
> >
> > I've researched an issue a bit. And what can we do?
> > In the case of an "old" kernel 5.4, we need to load RSS eBPF without
> > BPF_F_MAPPABLE
> > and use bpf syscall to update the maps. This requires additional 
> > capabilities,
> > and the libvirtd will never give any capabilities to Qemu.
> > So, the only case for "fallback" is running Qemu manually with
> > capabilities(or with root) on kernel 5.4.
> >
> > We can add hack/fallback to RSS ebpf loading routine with additional
> > checks and modify for BPF_F_MAPPABLE.
> > And we can add a fallback for mmap/syscall ebpf access.
> >
> > The problem is that we need kernel 5.5 with BPF_F_MAPPABLE in headers
> > to compile Qemu with fallback,
> > or move macro to the Qemu headers.
> >
> > It can be implemented something like this:
> > RSS eBPF open/load:
> >  * open the skeleton.
> >  * load the skeleton as is - it would fail because of an unknown 
> > BPF_F_MAPPABLE.
> >  * hack/modify map_flags for skeleton and try to reload.
> > RSS eBPF map update(this is straightforward):
> >  * check the mem pointer if null, use bpf syscall
> >
> > The advantage of hacks in Qemu is that we are aware of the eBPF context.
> > I suggest creating different series of patches that would implement
> > the hack/fallback,
> > If we really want to support eBPF on old kernels.
>
> So I think the simplest way is to disable eBPF RSS support on old
> kernels? (e.g during the configure)
>
> Thanks

I think it's possible to check for the BPF_F_MAPPABLE flag during configuration.
The absence of this flag would indicate that the kernel on the build
machine is probably old.

It wouldn't solve the issue of a "new" environment with an old
kernel (e.g. a fallback kernel),
or an "old" environment with a new kernel (e.g. a self-built one).
Also, the environment on the build maintainer's machine and on the end
system may be different
(assuming that the build machine is always up to date).
On the other hand, there is already a fallback to "in-qemu" RSS if eBPF fails.

If required, we can add the check; I don't see that it solves much
without the hack.
It will be required if we add the mmap/syscall hack for element updates.
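
For what it's worth, a minimal sketch of the mmap-versus-syscall fallback
being discussed could look like the following. The function name is
illustrative, the context fields match the patch quoted earlier in this
thread, the kernel flag is spelled BPF_F_MMAPABLE in the uapi headers, and
bpf_map_update_elem() is the libbpf call that needs the extra capabilities:

#include <string.h>
#include <bpf/bpf.h>    /* bpf_map_update_elem() */

/* Sketch only, untested: prefer the mmap()ed view of the map and fall
 * back to the bpf(2) syscall where the mapping is unavailable. */
static bool ebpf_rss_set_key(struct EBPFRSSContext *ctx,
                             const uint8_t *key, size_t len)
{
    if (ctx->mmap_toeplitz_key) {
        memcpy(ctx->mmap_toeplitz_key, key, len);
        return true;
    }

    /* Old-kernel fallback; requires capabilities for bpf(2). */
    uint32_t index = 0;
    return bpf_map_update_elem(ctx->map_toeplitz_key, &index, key, 0) == 0;
}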

>
> >
> > On Wed, Aug 9, 2023 at 5:21 AM Jason Wang  wrote:
> > >
> > > On Wed, Aug 9, 2023 at 7:15 AM Andrew Melnichenko  
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > On Tue, Aug 8, 2023 at 5:39 AM Jason Wang  wrote:
> > > > >
> > > > > On Thu, Aug 3, 2023 at 5:01 AM Andrew Melnychenko  
> > > > > wrote:
> > > > > >
> > > > > > Changed eBPF map updates through mmaped array.
> > > > > > Mmaped arrays provide direct access to map data.
> > > > > > It should omit using bpf_map_update_elem() call,
> > > > > > which may require capabilities that are not present.
> > > > > >
> > > > > > Signed-off-by: Andrew Melnychenko 
> > > > > > ---
> > > > > >  ebpf/ebpf_rss.c | 117 
> > > > > > ++--
> > > > > >  ebpf/ebpf_rss.h |   5 +++
> > > > > >  2 files changed, 99 insertions(+), 23 deletions(-)
> > > > > >
> > > > > > diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
> > > > > > index cee658c158b..247f5eee1b6 100644
> > > > > > --- a/ebpf/ebpf_rss.c
> > > > > > +++ b/ebpf/ebpf_rss.c
> > > > > > @@ -27,19 +27,83 @@ void ebpf_rss_init(struct EBPFRSSContext *ctx)
> > > > > >  {
> > > > > >  if (ctx != NULL) {
> > > > > >  ctx->obj = NULL;
> > > > > > +ctx->program_fd = -1;
> > > > > > +ctx->map_configuration = -1;
> > > > > > +ctx->map_toeplitz_key = -1;
> > > > > > +ctx->map_indirections_table = -1;
> > > > > > +
> > > > > > +ctx->mmap_configuration = NULL;
> > > > > > +ctx->mmap_toeplitz_key = NULL;
> > > > > > +ctx->mmap_indirections_table = NULL;
> > > > > >  }
> > > > > >  }
> > > > > >
> > > > > >  bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
> > > > > >  {
> > > > > > -return ctx != NULL && ctx->obj != NULL;
> > > > > > +return ctx != NULL && (ctx->obj != NULL || ctx->program_fd != 
> > > > > > -1);
> > > > > > +}
> > > > > > +
> > > > > > +static bool ebpf_rss_mmap(struct EBPFRSSContext *ctx)
> > > > > > +{
> > > > > > +if (!ebpf_rss_is_loaded(ctx)) {
> > > > > > +return false;
> > > > > > +}
> > > > > > +
> > > > > > +ctx->mmap_configuration = mmap(NULL, 
> > > > > > qemu_real_host_page_size(),
> > > > > > +   PROT_READ | PROT_WRITE, 
> > > > > > MAP_SHARED,
> > > > > > +   ctx->map_configuration, 0);
> > > > > > +if (ctx->mmap_configuration == MAP_FAILED) {
> > > > > > +trace_ebpf_error("eBPF RSS", "can not mmap eBPF 
> > > > > > configuration array");
> > > > > > +return false;
> > > > > > +}
> > > > > > +ctx->mmap_toeplitz_key = mmap(NULL, qemu_real_host_page_size(),
> > > > > > +   

Re: [PATCH 5/6] linux-user: Remove ELF_START_MMAP and image_info.start_mmap

2023-08-17 Thread Warner Losh
On Thu, Aug 17, 2023 at 6:19 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 8/17/23 02:00, Philippe Mathieu-Daudé wrote:
> > On 16/8/23 20:14, Richard Henderson wrote:
> >> The start_mmap value is write-only.
> >> Remove the field and the defines that populated it.
> >> Logically, this has been replaced by task_unmapped_base.
> >>
> >> Signed-off-by: Richard Henderson 
> >> ---
> >>   linux-user/qemu.h|  1 -
> >>   linux-user/elfload.c | 38 --
> >>   2 files changed, 39 deletions(-)
> >
> > Can we squash similar removal in bsd-user?
> > Either that or in a different patch:
> > Reviewed-by: Philippe Mathieu-Daudé 
> >
>
> A different patch, for sure.  I don't want trivial patches to interfere
> with the ongoing
> merge process.
>

CC me on the patch. I'll queue it with the other patches that have been
reviewed and act
as conductor to make sure there's no interference with ongoing work.

Warner


Re: [PATCH 5/6] linux-user: Remove ELF_START_MMAP and image_info.start_mmap

2023-08-17 Thread Richard Henderson

On 8/17/23 02:00, Philippe Mathieu-Daudé wrote:

On 16/8/23 20:14, Richard Henderson wrote:

The start_mmap value is write-only.
Remove the field and the defines that populated it.
Logically, this has been replaced by task_unmapped_base.

Signed-off-by: Richard Henderson 
---
  linux-user/qemu.h    |  1 -
  linux-user/elfload.c | 38 --
  2 files changed, 39 deletions(-)


Can we squash similar removal in bsd-user?
Either that or in a different patch:
Reviewed-by: Philippe Mathieu-Daudé 



A different patch, for sure.  I don't want trivial patches to interfere with the ongoing 
merge process.



r~



Re: [PATCH 3/6] linux-user: Adjust brk for load_bias

2023-08-17 Thread Richard Henderson

On 8/17/23 09:04, Michael Tokarev wrote:

16.08.2023 21:14, Richard Henderson wrote:

PIE executables are usually linked at offset 0 and are
relocated somewhere during load.  The hiaddr needs to
be adjusted to keep the brk next to the executable.

Cc: qemu-sta...@nongnu.org
Fixes: 1f356e8c013 ("linux-user: Adjust initial brk when interpreter is close to 
executable")


FWIW, 1f356e8c013 is v8.1.0-rc2-86, - why did you Cc qemu-stable@?

If this "Adjust brk for load_bias" fix isn't supposed to be part of 8.1.0 
release,
sure thing I'll pick it up for stable-8.1, but it looks like it should be in 
8.1.0.

Or are you saying 1f356e8c013 should be picked for stable-8.0, together with 
this one?

(We're yet to decide if stable-8.0 should have any recent linux-user changes).


This has missed 8.1.0-rc4 and therefore will not be in 8.1.0.
I have tagged it stable for stable-8.1 for 8.1.1.


r~




Re: [PATCH 3/6] linux-user: Adjust brk for load_bias

2023-08-17 Thread Richard Henderson

On 8/17/23 01:53, Philippe Mathieu-Daudé wrote:

On 16/8/23 20:14, Richard Henderson wrote:

PIE executables are usually linked at offset 0 and are
relocated somewhere during load.  The hiaddr needs to
be adjusted to keep the brk next to the executable.

Cc: qemu-sta...@nongnu.org
Fixes: 1f356e8c013 ("linux-user: Adjust initial brk when interpreter is close to 
executable")

Signed-off-by: Richard Henderson 
---
  linux-user/elfload.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index ccfbf82836..ab11f141c3 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3278,7 +3278,7 @@ static void load_elf_image(const char *image_name, const 
ImageSource *src,

  info->start_data = -1;
  info->end_data = 0;
  /* Usual start for brk is after all sections of the main executable. */
-    info->brk = TARGET_PAGE_ALIGN(hiaddr);
+    info->brk = TARGET_PAGE_ALIGN(hiaddr + load_bias);


Did you got some odd behavior or figured that by
code review?

Reviewed-by: Philippe Mathieu-Daudé 


Odd behaviour, easily seen by [heap] being weird or missing.


r~




Re: [PATCH v7 9/9] docs/system: add basic virtio-gpu documentation

2023-08-17 Thread Gurchetan Singh
On Wed, Aug 16, 2023 at 10:28 PM Akihiko Odaki 
wrote:

> On 2023/08/17 11:23, Gurchetan Singh wrote:
> > From: Gurchetan Singh 
> >
> > This adds basic documentation for virtio-gpu.
> >
> > Suggested-by: Akihiko Odaki 
> > Signed-off-by: Gurchetan Singh 
> > Tested-by: Alyssa Ross 
> > Tested-by: Emmanouil Pitsidianakis 
> > Reviewed-by: Emmanouil Pitsidianakis 
> > ---
> > v2: - Incorporated suggestions by Akihiko Odaki
> >  - Listed the currently supported capset_names (Bernard)
> >
> > v3: - Incorporated suggestions by Akihiko Odaki and Alyssa Ross
> >
> > v4: - Incorporated suggestions by Akihiko Odaki
> >
> > v5: - Removed pci suffix from examples
> >  - Verified that -device virtio-gpu-rutabaga works.  Strangely
> >enough, I don't remember changing anything, and I remember
> >it not working.  I did rebase to top of tree though.
> >  - Fixed meson examples in crosvm docs
> >
> >   docs/system/device-emulation.rst   |   1 +
> >   docs/system/devices/virtio-gpu.rst | 113 +
> >   2 files changed, 114 insertions(+)
> >   create mode 100644 docs/system/devices/virtio-gpu.rst
> >
> > diff --git a/docs/system/device-emulation.rst
> b/docs/system/device-emulation.rst
> > index 4491c4cbf7..1167f3a9f2 100644
> > --- a/docs/system/device-emulation.rst
> > +++ b/docs/system/device-emulation.rst
> > @@ -91,6 +91,7 @@ Emulated Devices
> >  devices/nvme.rst
> >  devices/usb.rst
> >  devices/vhost-user.rst
> > +   devices/virtio-gpu.rst
> >  devices/virtio-pmem.rst
> >  devices/vhost-user-rng.rst
> >  devices/canokey.rst
> > diff --git a/docs/system/devices/virtio-gpu.rst
> b/docs/system/devices/virtio-gpu.rst
> > new file mode 100644
> > index 00..8c5c708272
> > --- /dev/null
> > +++ b/docs/system/devices/virtio-gpu.rst
> > @@ -0,0 +1,113 @@
> > +..
> > +   SPDX-License-Identifier: GPL-2.0
> > +
> > +virtio-gpu
> > +==
> > +
> > +This document explains the setup and usage of the virtio-gpu device.
> > +The virtio-gpu device paravirtualizes the GPU and display controller.
> > +
> > +Linux kernel support
> > +
> > +
> > +virtio-gpu requires a guest Linux kernel built with the
> > +``CONFIG_DRM_VIRTIO_GPU`` option.
> > +
> > +QEMU virtio-gpu variants
> > +
> > +
> > +QEMU virtio-gpu device variants come in the following form:
> > +
> > + * ``virtio-vga[-BACKEND]``
> > + * ``virtio-gpu[-BACKEND][-INTERFACE]``
> > + * ``vhost-user-vga``
> > + * ``vhost-user-pci``
> > +
> > +**Backends:** QEMU provides a 2D virtio-gpu backend, and two accelerated
> > +backends: virglrenderer ('gl' device label) and rutabaga_gfx ('rutabaga'
> > +device label).  There is a vhost-user backend that runs the graphics
> stack
> > +in a separate process for improved isolation.
> > +
> > +**Interfaces:** QEMU further categorizes virtio-gpu device variants
> based
> > +on the interface exposed to the guest. The interfaces can be classified
> > +into VGA and non-VGA variants. The VGA ones are prefixed with virtio-vga
> > +or vhost-user-vga while the non-VGA ones are prefixed with virtio-gpu or
> > +vhost-user-gpu.
> > +
> > +The VGA ones always use the PCI interface, but for the non-VGA ones, the
> > +user can further pick between MMIO or PCI. For MMIO, the user can suffix
> > +the device name with -device, though vhost-user-gpu does not support
> MMIO.
> > +For PCI, the user can suffix it with -pci. Without these suffixes, the
> > +platform default will be chosen.
> > +
> > +virtio-gpu 2d
> > +-
> > +
> > +The default 2D backend only performs 2D operations. The guest needs to
> > +employ a software renderer for 3D graphics.
> > +
> > +Typically, the software renderer is provided by `Mesa`_ or
> `SwiftShader`_.
> > +Mesa's implementations (LLVMpipe, Lavapipe and virgl below) work out of
> box
> > +on typical modern Linux distributions.
> > +
> > +.. parsed-literal::
> > +-device virtio-gpu
> > +
> > +.. _Mesa: https://www.mesa3d.org/
> > +.. _SwiftShader: https://github.com/google/swiftshader
> > +
> > +virtio-gpu virglrenderer
> > +
> > +
> > +When using virgl accelerated graphics mode in the guest, OpenGL API
> calls
> > +are translated into an intermediate representation (see `Gallium3D`_).
> The
> > +intermediate representation is communicated to the host and the
> > +`virglrenderer`_ library on the host translates the intermediate
> > +representation back to OpenGL API calls.
> > +
> > +.. parsed-literal::
> > +-device virtio-gpu-gl
> > +
> > +.. _Gallium3D: https://www.freedesktop.org/wiki/Software/gallium/
> > +.. _virglrenderer: https://gitlab.freedesktop.org/virgl/virglrenderer/
> > +
> > +virtio-gpu rutabaga
> > +---
> > +
> > +virtio-gpu can also leverage `rutabaga_gfx`_ to provide `gfxstream`_
> > +rendering and `Wayland display passthrough`_.  With the gfxstream
> rendering
> > +mode, GLES and Vulkan calls are forwarded to the host with 

[PATCH v8 9/9] docs/system: add basic virtio-gpu documentation

2023-08-17 Thread Gurchetan Singh
This adds basic documentation for virtio-gpu.

Suggested-by: Akihiko Odaki 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 

v2: - Incorporated suggestions by Akihiko Odaki
- Listed the currently supported capset_names (Bernard)

v3: - Incorporated suggestions by Akihiko Odaki and Alyssa Ross

v4: - Incorporated suggestions by Akihiko Odaki

v5: - Removed pci suffix from examples
- Verified that -device virtio-gpu-rutabaga works.  Strangely
  enough, I don't remember changing anything, and I remember
  it not working.  I did rebase to top of tree though.
- Fixed meson examples in crosvm docs

v8: - Remove different links for "rutabaga_gfx" and
  "gfxstream-enabled rutabaga" (Akihiko)
---
 docs/system/device-emulation.rst   |   1 +
 docs/system/devices/virtio-gpu.rst | 112 +
 2 files changed, 113 insertions(+)
 create mode 100644 docs/system/devices/virtio-gpu.rst

diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 4491c4cbf7..1167f3a9f2 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -91,6 +91,7 @@ Emulated Devices
devices/nvme.rst
devices/usb.rst
devices/vhost-user.rst
+   devices/virtio-gpu.rst
devices/virtio-pmem.rst
devices/vhost-user-rng.rst
devices/canokey.rst
diff --git a/docs/system/devices/virtio-gpu.rst 
b/docs/system/devices/virtio-gpu.rst
new file mode 100644
index 00..2b3eb536f9
--- /dev/null
+++ b/docs/system/devices/virtio-gpu.rst
@@ -0,0 +1,112 @@
+..
+   SPDX-License-Identifier: GPL-2.0
+
+virtio-gpu
+==
+
+This document explains the setup and usage of the virtio-gpu device.
+The virtio-gpu device paravirtualizes the GPU and display controller.
+
+Linux kernel support
+
+
+virtio-gpu requires a guest Linux kernel built with the
+``CONFIG_DRM_VIRTIO_GPU`` option.
+
+QEMU virtio-gpu variants
+
+
+QEMU virtio-gpu device variants come in the following form:
+
+ * ``virtio-vga[-BACKEND]``
+ * ``virtio-gpu[-BACKEND][-INTERFACE]``
+ * ``vhost-user-vga``
+ * ``vhost-user-pci``
+
+**Backends:** QEMU provides a 2D virtio-gpu backend, and two accelerated
+backends: virglrenderer ('gl' device label) and rutabaga_gfx ('rutabaga'
+device label).  There is a vhost-user backend that runs the graphics stack
+in a separate process for improved isolation.
+
+**Interfaces:** QEMU further categorizes virtio-gpu device variants based
+on the interface exposed to the guest. The interfaces can be classified
+into VGA and non-VGA variants. The VGA ones are prefixed with virtio-vga
+or vhost-user-vga while the non-VGA ones are prefixed with virtio-gpu or
+vhost-user-gpu.
+
+The VGA ones always use the PCI interface, but for the non-VGA ones, the
+user can further pick between MMIO or PCI. For MMIO, the user can suffix
+the device name with -device, though vhost-user-gpu does not support MMIO.
+For PCI, the user can suffix it with -pci. Without these suffixes, the
+platform default will be chosen.
+
+virtio-gpu 2d
+-
+
+The default 2D backend only performs 2D operations. The guest needs to
+employ a software renderer for 3D graphics.
+
+Typically, the software renderer is provided by `Mesa`_ or `SwiftShader`_.
+Mesa's implementations (LLVMpipe, Lavapipe and virgl below) work out of box
+on typical modern Linux distributions.
+
+.. parsed-literal::
+-device virtio-gpu
+
+.. _Mesa: https://www.mesa3d.org/
+.. _SwiftShader: https://github.com/google/swiftshader
+
+virtio-gpu virglrenderer
+
+
+When using virgl accelerated graphics mode in the guest, OpenGL API calls
+are translated into an intermediate representation (see `Gallium3D`_). The
+intermediate representation is communicated to the host and the
+`virglrenderer`_ library on the host translates the intermediate
+representation back to OpenGL API calls.
+
+.. parsed-literal::
+-device virtio-gpu-gl
+
+.. _Gallium3D: https://www.freedesktop.org/wiki/Software/gallium/
+.. _virglrenderer: https://gitlab.freedesktop.org/virgl/virglrenderer/
+
+virtio-gpu rutabaga
+---
+
+virtio-gpu can also leverage rutabaga_gfx to provide `gfxstream`_
+rendering and `Wayland display passthrough`_.  With the gfxstream rendering
+mode, GLES and Vulkan calls are forwarded to the host with minimal
+modification.
+
+The crosvm book provides directions on how to build a `gfxstream-enabled
+rutabaga`_ and launch a `guest Wayland proxy`_.
+
+This device does require host blob support (``hostmem`` field below). The
+``hostmem`` field specifies the size of virtio-gpu host memory window.
+This is typically between 256M and 8G.
+
+At least one capset (see colon separated ``capset_names`` below) must be
+specified when starting the device.  The currently supported
+``capset_names`` are ``gfxstream-vulkan`` and ``cross-domain`` on Linux
+guests. 

Re: [PATCH 2/3] tcg: Fold deposit with zero to and

2023-08-17 Thread Richard Henderson

On 8/17/23 08:50, Peter Maydell wrote:

+if (arg_is_const(op->args[1])
+&& arg_info(op->args[1])->val == 0
+&& op->args[3] == 0) {
+uint64_t mask = MAKE_64BIT_MASK(0, op->args[4]);


The docs for the TCG deposit op don't say what the restrictions on the
immediate args are, but this will be UB for QEMU if args[4] is 0.
Have we already sanitized those somewhere?


tcg_gen_deposit_{i32,i64} do so.


r~



Re: [PATCH 1/3] tcg/i386: Drop BYTEH deposits for 64-bit

2023-08-17 Thread Richard Henderson

On 8/17/23 08:44, Peter Maydell wrote:

On Wed, 16 Aug 2023 at 16:01, Richard Henderson
 wrote:


It is more useful to allow low-part deposits into all registers
than to restrict allocation for high-byte deposits.



  #define TCG_TARGET_deposit_i32_valid(ofs, len) \
-(((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
- ((ofs) == 0 && (len) == 16))
+(((ofs) == 0 && ((len) == 8 || (len) == 16)) || \
+ (TCG_TARGET_REG_BITS == 32 && (ofs) == 8 && (len) == 8))
  #define TCG_TARGET_deposit_i64_validTCG_TARGET_deposit_i32_valid




@@ -2752,7 +2751,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
  if (args[3] == 0 && args[4] == 8) {
  /* load bits 0..7 */
  tcg_out_modrm(s, OPC_MOVB_EvGv | P_REXB_R | P_REXB_RM, a2, a0);
-} else if (args[3] == 8 && args[4] == 8) {
+} else if (TCG_TARGET_REG_BITS == 32 && args[3] == 8 && args[4] == 8) {


Should we assert(TCG_TARGET_REG_BITS == 32) rather than making it part of the
condition?


The if/else chain ends in g_assert_not_reached().


If I understand the change to the deposit_i32_valid macro above, we
should never get here with 8, 8 if TCG_TARGET_REG_BITS is 64.


Correct.


r~



Re: How to synchronize CPUs on MMIO read?

2023-08-17 Thread Richard Henderson

On 8/16/23 09:31, Igor Lesik wrote:

Hi.

I need to model some custom HW that synchronizes CPUs when they read MMIO register N: MMIO 
read does not return until another CPU writes to MMIO register M. I modeled this behavior 
with a) on MMIO read of N, save the CPU into a list of waiting CPUs and put it to sleep with 
cpu_interrupt(current_cpu, CPU_INTERRUPT_HALT) and b) on MMIO write to M, wake all waiting 
CPUs with cpu->halted = 0; qemu_cpu_kick(cpu). It seems to work fine. However, this HW has 
a twist: MMIO read of N returns a value that was written by MMIO write to M. Can anyone 
please advise how this could be done?


You'll want to add something to allow each cpu to latch the value written.

Something like

CPU_FOREACH(cpu) {
if (cpu != write_cpu) {
*cpu->mmio_latch = value;
qemu_cpu_kick(cpu);
}
}

where cpu->sync_latch = &cpu->env.reg[N] for the register destination of the 
MMIO read.

This is easy if you can identify the hw sync mmio during translation.  If this sync is 
mapped somewhere arbitrary within the address space, you may have to work harder.
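
Putting the two halves together, a very rough sketch (the device struct,
fields and callback names are invented for illustration, the arrays are
sized arbitrarily, and this is not a tested implementation):

typedef struct {
    bool waiting[64];      /* indexed by cpu_index; size is arbitrary */
    uint64_t latch[64];    /* value to hand back to each waiting vCPU */
} MySyncDev;

/* MMIO read of N: park the calling vCPU and remember that it is waiting. */
static uint64_t sync_read_n(void *opaque, hwaddr addr, unsigned size)
{
    MySyncDev *s = opaque;

    s->waiting[current_cpu->cpu_index] = true;
    cpu_interrupt(current_cpu, CPU_INTERRUPT_HALT);
    return 0;   /* superseded later via the latch */
}

/* MMIO write of M: latch the value for every waiter and kick it. */
static void sync_write_m(void *opaque, hwaddr addr, uint64_t val, unsigned size)
{
    MySyncDev *s = opaque;
    CPUState *cpu;

    CPU_FOREACH(cpu) {
        if (s->waiting[cpu->cpu_index]) {
            s->waiting[cpu->cpu_index] = false;
            s->latch[cpu->cpu_index] = val;
            cpu->halted = 0;
            qemu_cpu_kick(cpu);
        }
    }
}

Feeding s->latch[] back into the destination register of the original read
is the part that, as noted above, depends on being able to identify the
sync access (for example at translation time).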




r~



Re: [PATCH v3 2/2] target/i386: Avoid overflow of the cache parameter enumerated by leaf 4

2023-08-17 Thread Isaku Yamahata
On Wed, Aug 16, 2023 at 04:06:58PM +0800,
Qian Wen  wrote:

> According to SDM, CPUID.0x4:EAX[31:26] indicates the Maximum number of
> addressable IDs for processor cores in the physical package. If we
> launch over 64 cores VM, the 6-bit field will overflow, and the wrong
> core_id number will be reported.
> 
> Since the HW reports 0x3f when the intel processor has over 64 cores,
> limit the max value written to EBX[31:26] to 63, so max num_cores should
> be 64.
> 
> Signed-off-by: Qian Wen 
> Reviewed-by: Zhao Liu 
> ---
>  target/i386/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 5c008b9d7e..3b6854300a 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -248,7 +248,7 @@ static void encode_cache_cpuid4(CPUCacheInfo *cache,
>  *eax = CACHE_TYPE(cache->type) |
> CACHE_LEVEL(cache->level) |
> (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0) |
> -   ((num_cores - 1) << 26) |
> +   ((MIN(num_cores, 64) - 1) << 26) |
> ((num_apic_ids - 1) << 14);
>  
>  assert(cache->line_size > 0);
> -- 
> 2.25.1
> 
> 

Reviewed-by: Isaku Yamahata 
-- 
Isaku Yamahata 



Re: [PATCH v3 0/2] Fix overflow of the max number of IDs for logic processor and core

2023-08-17 Thread Isaku Yamahata
On Wed, Aug 16, 2023 at 04:06:56PM +0800,
Qian Wen  wrote:

> CPUID.1.EBX[23:16]: Maximum number of addressable IDs for logical
> processors in this physical package.
> CPUID.4:EAX[31:26]: Maximum number of addressable IDs for processor cores
> in the physical package.
> 
> The current qemu code doesn't limit the value written to these two fields.
> If the guest has a huge number of cores, APs (application processor) will
> fail to bring up and the wrong info will be reported.
> According to HW behavior, setting max value written to CPUID.1.EBX[23:16]
> to 255, and CPUID.4:EAX[31:26] to 63.
> 
> ---
> Changes v2 -> v3:
>   - Add patch 2.
>   - Revise the commit message and comment to be clearer.
>   - Using MIN() for limitation.
> Changes v1 -> v2:
>   - Revise the commit message and comment to more clearer.
>   - Rebased to v8.1.0-rc2.
> 
> Qian Wen (2):
>   target/i386: Avoid cpu number overflow in legacy topology
>   target/i386: Avoid overflow of the cache parameter enumerated by leaf 4
> 
>  target/i386/cpu.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> base-commit: 0d52116fd82cdd1f4a88837336af5b6290c364a4
> -- 
> 2.25.1
> 

The patch itself looks good. Can we add test cases?
We have some in qemu/tests/unit/test-x86-cpuid.c.
-- 
Isaku Yamahata 



Re: [PATCH v3 1/2] target/i386: Avoid cpu number overflow in legacy topology

2023-08-17 Thread Isaku Yamahata
On Wed, Aug 16, 2023 at 04:06:57PM +0800,
Qian Wen  wrote:

> The legacy topology enumerated by CPUID.1.EBX[23:16] is defined in SDM
> Vol2:
> 
> Bits 23-16: Maximum number of addressable IDs for logical processors in
> this physical package.
> 
> When threads_per_socket > 255, it will 1) overwrite bits[31:24] which is
> apic_id, 2) bits [23:16] get truncated.
> 
> Specifically, if launching the VM with -smp 256, the value written to
> EBX[23:16] is 0 because of data overflow. If the guest only supports
> legacy topology, without V2 Extended Topology enumerated by CPUID.0x1f
> or Extended Topology enumerated by CPUID.0x0b to support over 255 CPUs,
> the return of the kernel invoking cpu_smt_allowed() is false and APs
> (application processors) will fail to bring up. Then only CPU 0 is online,
> and others are offline.
> 
> For example, launch VM via:
> qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \
> -cpu qemu64,cpuid-0xb=off -smp 256 -m 32G \
> -drive file=guest.img,if=none,id=virtio-disk0,format=raw \
> -device virtio-blk-pci,drive=virtio-disk0,bootindex=1 --nographic
> 
> The guest shows:
> CPU(s):   256
> On-line CPU(s) list:  0
> Off-line CPU(s) list: 1-255
> 
> To avoid this issue caused by overflow, limit the max value written to
> EBX[23:16] to 255 as the HW does.
> 
> Signed-off-by: Qian Wen 
> Reviewed-by: Zhao Liu 
> ---
>  target/i386/cpu.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 97ad229d8b..5c008b9d7e 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6008,6 +6008,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> uint32_t count,
>  uint32_t die_offset;
>  uint32_t limit;
>  uint32_t signature[3];
> +uint32_t threads_per_socket;
>  X86CPUTopoInfo topo_info;
>  
>  topo_info.dies_per_pkg = env->nr_dies;
> @@ -6049,8 +6050,9 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> uint32_t count,
>  *ecx |= CPUID_EXT_OSXSAVE;
>  }
>  *edx = env->features[FEAT_1_EDX];
> -if (cs->nr_cores * cs->nr_threads > 1) {
> -*ebx |= (cs->nr_cores * cs->nr_threads) << 16;
> +threads_per_socket = cs->nr_cores * cs->nr_threads;
> +if (threads_per_socket > 1) {
> +*ebx |= MIN(threads_per_socket, 255) << 16;
>  *edx |= CPUID_HT;
>  }
>  if (!cpu->enable_pmu) {
> -- 
> 2.25.1
> 
> 

Reviewed-by: Isaku Yamahata 
-- 
Isaku Yamahata 



Re: [PATCH 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked

2023-08-17 Thread Eric Blake
On Thu, Aug 17, 2023 at 02:50:00PM +0200, Kevin Wolf wrote:
> This field has been unused since commit 72373e40fbc ('block:
> bdrv_reopen_multiple: refresh permissions on updated graph').
> Remove it.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Eric Blake 

Re: Tips for local testing guestfwd

2023-08-17 Thread Felix Wu
Hi Samuel,

Thanks for the clarification! I missed the email so didn't reply in time,
but was able to figure it out.

Hi everyone,
IPv6 guestfwd works in my local test but it has a weird bug: if you send
two requests, the first one gets the correct response, but the second one
gets stuck.
I am using a simple http server for this test, and just noticed this bug
also exists in IPv4 guestfwd. I've documented it in
https://gitlab.com/qemu-project/qemu/-/issues/1835.

Just want to check if anyone has seen the same issue before.

Thanks! Felix

On Thu, Jul 20, 2023 at 7:54 AM Samuel Thibault 
wrote:

> Hello,
>
> Felix Wu, le mar. 18 juil. 2023 18:12:16 -0700, a ecrit:
> > 02 == SYN so it looks good. But both tcpdump and wireshark (looking into
> packet
> > dump provided by QEMU invocation)
>
> Which packet dump?
>
> > I added multiple prints inside slirp and confirmed the ipv6 version of
> [1] was
> > reached.
> > in tcp_output function [2], I got following print:
> > qemu-system-aarch64: info: Slirp: AF_INET6 out dst ip =
> > fdb5:481:10ce:0:8c41:aaff:fea9:f674, port = 52190
> > qemu-system-aarch64: info: Slirp: AF_INET6 out src ip = fec0::105, port
> = 54322
> > It looks like there should be something being sent back to the guest,
>
> That's what it is.
>
> > unless my understanding of tcp_output is wrong.
>
> It looks so.
>
> > To understand the datapath of guestfwd better, I have the following
> questions:
> > 1. What's the meaning of tcp_input and tcp_output? My guess is the
> following
> > graph, but I would like to confirm.
>
> No, tcp_input is for packets that come from the guest, and tcp_output is
> for packets that are sent to the guest. So it's like that:
>
> >           tcp_input        write_cb        host send()
> >  QEMU ---------> slirp ---------> QEMU ---------> host
> >       <---------       <---------      <---------
> >       tcp_output   slirp_socket_recv   host recv()
>
> > 2. I don't see port 6655 in the above process. How does slirp know 6655
> is the
> > port that needs to be visited on the host side?
>
> Slirp itself *doesn't* know that port. The guestfwd piece just calls the
> SlirpWriteCb when it has data coming from the guest. See the
> documentation:
>
> /* Set up port forwarding between a port in the guest network and a
>  * callback that will receive the data coming from the port */
> SLIRP_EXPORT
> int slirp_add_guestfwd(Slirp *slirp, SlirpWriteCb write_cb, void *opaque,
>struct in_addr *guest_addr, int guest_port);
>
> and
>
> /* This is called by the application for a guestfwd, to provide the data
> to be
>  * sent on the forwarded port */
> SLIRP_EXPORT
> void slirp_socket_recv(Slirp *slirp, struct in_addr guest_addr, int
> guest_port,
>const uint8_t *buf, int size);
>
> Samuel
>
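
To make the quoted API concrete, a minimal sketch of a guestfwd callback
(the address, port and reply are made up; only the libslirp entry points
quoted above are assumed, and include paths vary by setup):

#include <arpa/inet.h>
#include "libslirp.h"   /* include path varies by setup */

/* Sketch only.  Data the guest sends to 10.0.2.4:6655 arrives in the
 * callback; slirp_socket_recv() pushes a reply back to the guest. */
static ssize_t my_guestfwd_write(const void *buf, size_t len, void *opaque)
{
    Slirp *slirp = opaque;
    struct in_addr guest = { .s_addr = htonl(0x0a000204) }; /* 10.0.2.4 */
    static const uint8_t reply[] = "hello from the host\n";

    slirp_socket_recv(slirp, guest, 6655, reply, sizeof(reply) - 1);
    return len;
}

/* registration, e.g. while setting up the netdev: */
struct in_addr guest = { .s_addr = htonl(0x0a000204) };
slirp_add_guestfwd(slirp, my_guestfwd_write, slirp, &guest, 6655);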


Re: [PATCH V3 01/10] vl: start on wakeup request

2023-08-17 Thread Peter Xu
On Mon, Aug 14, 2023 at 11:54:27AM -0700, Steve Sistare wrote:
> +void vm_wakeup(void)
> +{
> +if (!vm_started) {
> +vm_start();

(irrelevant of the global var that I wanted to remove..)

Calling vm_start() is wrong here, IMHO.

I think we need to notify everyone on the wakeup before really waking up
the vcpus:

notifier_list_notify(&wakeup_notifiers, &wakeup_reason);

There's resume_all_vcpus() after that.  I don't know the side effect of
resuming vcpus without such notifications, at least some acpi fields do not
seem to be updated so the vcpu can see stale values (acpi_notify_wakeup()).

> +} else {
> +runstate_set(RUN_STATE_RUNNING);
>  }
>  }

-- 
Peter Xu




Re: [PATCH V3 00/10] fix migration of suspended runstate

2023-08-17 Thread Peter Xu
On Mon, Aug 14, 2023 at 11:54:26AM -0700, Steve Sistare wrote:
> Migration of a guest in the suspended runstate is broken.  The incoming
> migration code automatically tries to wake the guest, which is wrong;
> the guest should end migration in the same runstate it started.  Further,
> for a restored snapshot, the automatic wakeup fails.  The runstate is
> RUNNING, but the guest is not.  See the commit messages for the details.

Hi Steve,

I drafted two small patches to show what I meant, on top of this series.
Before applying these two, one needs to revert patch 1 in this series.

After applied, it should also pass all three new suspend tests.  We can
continue the discussion here based on the patches.

Thanks,

===
From 2e495b08be4c56d5d8a47ba1657bae6e316c6254 Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Thu, 17 Aug 2023 12:32:00 -0400
Subject: [PATCH 1/2] cpus: Allow vm_prepare_start() to take vm state as input

It was by default always setting the state as RUNNING, but logically
SUSPENDED (acpi s1) should also fall into "vm running" case, where it's
only the vcpus that are stopped running (while everything else is).

Adding such a state parameter to be prepared when we want to prepare start
when not allowing vcpus to start yet (RUN_STATE_SUSPENDED).

Note: I found that not all vm state notifiers are ready for SUSPENDED when
having running=true set.  Here let's always pass in RUNNING irrelevant of
the state passed into vm_prepare_start(), and leave that for later to
figure out.  So far there should have (hopefully) no impact functional
wise.

For this specific patch, no functional change at all should be intended,
because all callers are still passing over RUNNING.

Signed-off-by: Peter Xu 
---
 include/sysemu/runstate.h |  3 ++-
 gdbstub/softmmu.c |  2 +-
 softmmu/cpus.c| 12 +---
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 7beb29c2e2..7d889ab7c7 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -39,8 +39,9 @@ void vm_start(void);
  * vm_prepare_start: Prepare for starting/resuming the VM
  *
  * @step_pending: whether any of the CPUs is about to be single-stepped by gdb
+ * @state: the vm state to setup
  */
-int vm_prepare_start(bool step_pending);
+int vm_prepare_start(bool step_pending, RunState state);
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
diff --git a/gdbstub/softmmu.c b/gdbstub/softmmu.c
index f509b7285d..a43e8328c0 100644
--- a/gdbstub/softmmu.c
+++ b/gdbstub/softmmu.c
@@ -565,7 +565,7 @@ int gdb_continue_partial(char *newstates)
 }
 }
 
-if (vm_prepare_start(step_requested)) {
+if (vm_prepare_start(step_requested, RUN_STATE_RUNNING)) {
 return 0;
 }
 
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index fed20ffb5d..000fac79b7 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -681,7 +681,7 @@ int vm_stop(RunState state)
  * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already
  * running or in case of an error condition), 0 otherwise.
  */
-int vm_prepare_start(bool step_pending)
+int vm_prepare_start(bool step_pending, RunState state)
 {
 RunState requested;
 
@@ -713,14 +713,20 @@ int vm_prepare_start(bool step_pending)
 qapi_event_send_resume();
 
 cpu_enable_ticks();
-runstate_set(RUN_STATE_RUNNING);
+runstate_set(state);
+/*
+ * FIXME: ignore "state" being passed in for now, notify always with
+ * RUNNING. Because some of the vm state change handlers may not expect
+ * other states (e.g. SUSPENDED) passed in with running=true.  This can
+ * be modified after proper investigation over all vm state notifiers.
+ */
 vm_state_notify(1, RUN_STATE_RUNNING);
 return 0;
 }
 
 void vm_start(void)
 {
-if (!vm_prepare_start(false)) {
+if (!vm_prepare_start(false, RUN_STATE_RUNNING)) {
 resume_all_vcpus();
 }
 }
-- 
2.41.0

===

From 4a0936eafd03952d58ab380271559c4a2049b96e Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Thu, 17 Aug 2023 12:44:29 -0400
Subject: [PATCH 2/2] fixup

Signed-off-by: Peter Xu 
---
 migration/migration.c| 9 +
 tests/qtest/migration-test.c | 9 +
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index e6b8024b03..b004475af6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -497,7 +497,7 @@ static void process_incoming_migration_bh(void *opaque)
 migration_incoming_disable_colo();
 vm_start();
 } else {
-runstate_set(global_state_get_runstate());
+vm_prepare_start(false, global_state_get_runstate());
 }
 /*
  * This must happen after any state changes since as soon as an external
@@ -1143,15 +1143,16 @@ void migrate_set_state(int *state, int old_state, int 
new_state)
 
 void 

Re: [PATCH V1 2/3] migration: fix suspended runstate

2023-08-17 Thread Peter Xu
On Wed, Aug 16, 2023 at 01:48:13PM -0400, Steven Sistare wrote:
> On 8/14/2023 3:37 PM, Peter Xu wrote:
> > On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote:
> >>> Can we just call vm_state_notify() earlier?
> >>
> >> We cannot.  The guest is not running yet, and will not be until later.
> >> We cannot call notifiers that perform actions that complete, or react to, 
> >> the guest entering a running state.
> > 
> > I tried to look at a few examples of the notifees and most of them I read
> > do not react to "vcpu running" but "vm running" (in which case I think
> > "suspended" mode falls into "vm running" case); most of them won't care on
> > the RunState parameter passed in, but only the bool "running".
> > 
> > In reality, when running=true, it must be RUNNING so far.
> > 
> > In that case does it mean we should notify right after the switchover,
> > since after migration the vm is indeed running only if the vcpus are not
> > during suspend?
> 
> I cannot parse your question, but maybe this answers it.
> If the outgoing VM is running and not suspended, then the incoming side
> tests for autostart==true and calls vm_start, which calls the notifiers,
> right after the switchover.

I meant IMHO SUSPENDED should be seen as "vm running" case to me, just like
RUNNING.  Then, we should invoke vm_prepare_start(), just need some touch
ups.

> 
> > One example (of possible issue) is vfio_vmstate_change(), where iiuc if we
> > try to suspend a VM it should keep to be VFIO_DEVICE_STATE_RUNNING for that
> > device; this kind of prove to me that SUSPEND is actually one of
> > running=true states.
> > 
> > If we postpone all notifiers here even after we switched over to dest qemu
> > to the next upcoming suspend wakeup, I think it means these devices will
> > not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps
> > VFIO_DEVICE_STATE_STOP.
> 
> or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup.
> AFAIK it is OK to remain in that state until wakeup is called later.

So let me provide another clue of why I think we should call
vm_prepare_start()..

Firstly, I think RESUME event should always be there right after we
switched over, no matter suspeneded or not.  I just noticed that your test
case would work because you put "wakeup" to be before RESUME.  I'm not sure
whether that's correct.  I'd bet people could rely on that RESUME to
identify the switchover.

More importantly, I'm wondering whether RTC should still be running during
the suspended mode?  Sorry again if my knowledge over there is just
limited, so correct me otherwise - but my understanding is during suspend
mode (s1 or s3, frankly I can't tell which one this belongs..), rtc should
still be running along with the system clock.  It means we _should_ at
least call cpu_enable_ticks() to enable rtc:

/*
 * enable cpu_get_ticks()
 * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
 */
void cpu_enable_ticks(void)

I think that'll enable cpu_get_tsc() and make it start to work right.

> 
> > Ideally I think we should here call vm_state_notify() with running=true and
> > state=SUSPEND, but since I do see some hooks are not well prepared for
> > SUSPEND over running=true, I'd think we should on the safe side call
> > vm_state_notify(running=true, state=RUNNING) even for SUSPEND at switch
> > over phase.  With that IIUC it'll naturally work (e.g. when wakeup again
> > later we just need to call no notifiers).
> 
> Notifiers are just one piece, all the code in vm_prepare_start must be called.
> Is it correct to call all of that long before we actually resume the CPUs in
> wakeup?  I don't know, but what is the point?

The point is not only for cleanliness (again, I really, really don't like that
new global.. sorry), but also now I think we should make the vm running.

> The wakeup code still needs
> modification to conditionally resume the vcpus.  The scheme would be roughly:
> 
> loadvm_postcopy_handle_run_bh()
> runstate = global_state_get_runstate();
> if (runstate == RUN_STATE_RUNNING) {
> vm_start()
> } else if (runstate == RUN_STATE_SUSPENDED)
> vm_prepare_start();   // the start of vm_start()
> }
> 
> qemu_system_wakeup_request()
> if (some condition)
> resume_all_vcpus();   // the remainder of vm_start()
> else
> runstate_set(RUN_STATE_RUNNING)

No it doesn't.  wakeup_reason is set there, main loop does the resuming.
See:

if (qemu_wakeup_requested()) {
pause_all_vcpus();
qemu_system_wakeup();
notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
wakeup_reason = QEMU_WAKEUP_REASON_NONE;
resume_all_vcpus();
qapi_event_send_wakeup();
}

> 
> How is that better than my patches
> [PATCH V3 01/10] vl: start on wakeup request
> [PATCH V3 02/10] migration: preserve suspended runstate
> 
> loadvm_postcopy_handle_run_bh()
> runstate = 

Re: [PATCH 0/6] linux-user: Rewrite open_self_maps

2023-08-17 Thread Ilya Leoshkevich
On Wed, Aug 16, 2023 at 11:14:31AM -0700, Richard Henderson wrote:
> Based-on: 20230816180338.572576-1-richard.hender...@linaro.org
> ("[PATCH v4 00/18] linux-user: Implement VDSOs")
> 
> As promised, a rewrite of /proc/self/{maps,smaps} emulation
> using interval trees.
> 
> Incorporate Helge's change to mark [heap], and also mark [vdso].
> 
> 
> r~
> 
> 
> Richard Henderson (6):
>   util/selfmap: Use dev_t and ino_t in MapInfo
>   linux-user: Use walk_memory_regions for open_self_maps
>   linux-user: Adjust brk for load_bias
>   linux-user: Show heap address in /proc/pid/maps
>   linux-user: Remove ELF_START_MMAP and image_info.start_mmap
>   linux-user: Show vdso address in /proc/pid/maps
> 
>  include/qemu/selfmap.h |   4 +-
>  linux-user/qemu.h  |   2 +-
>  linux-user/elfload.c   |  41 +
>  linux-user/syscall.c   | 194 +
>  util/selfmap.c |  12 +--
>  5 files changed, 131 insertions(+), 122 deletions(-)
> 
> -- 
> 2.34.1

As expected, this improved the situation with mappings on ppc64le.
Handling the errors from read_self_maps() is also a nice addition.

Reviewed-by: Ilya Leoshkevich 



Re: [PATCH] chardev/char-pty: Avoid losing bytes when the other side just (re-)connected

2023-08-17 Thread Thomas Huth

On 17/08/2023 15.47, Marc-André Lureau wrote:

Hi

On Thu, Aug 17, 2023 at 5:06 PM Daniel P. Berrangé  wrote:


On Thu, Aug 17, 2023 at 02:00:26PM +0200, Thomas Huth wrote:

On 17/08/2023 12.32, Daniel P. Berrangé wrote:

On Wed, Aug 16, 2023 at 11:07:43PM +0200, Thomas Huth wrote:

When starting a guest via libvirt with "virsh start --console ...",
the first second of the console output is missing. This is especially
annoying on s390x that only has a text console by default and no graphical
output - if the bios fails to boot here, the information about what went
wrong is completely lost.

One part of the problem (there is also some things to be done on the
libvirt side) is that QEMU only checks with a 1 second timer whether
the other side of the pty is already connected, so the first second of
the console output is always lost.

This likely used to work better in the past, since the code once checked
for a re-connection during write, but this has been removed in commit
f8278c7d74 ("char-pty: remove the check for connection on write") to avoid
some locking.

To ease the situation here at least a little bit, let's check with g_poll()
whether we could send out the data anyway, even if the connection has not
been marked as "connected" yet. The file descriptor is marked as non-blocking
anyway since commit fac6688a18 ("Do not hang on full PTY"), so this should
not cause any trouble if the other side is not ready for receiving yet.

With this patch applied, I can now successfully see the bios output of
a s390x guest when running it with "virsh start --console" (with a patched
version of virsh that fixes the remaining issues there, too).

Reported-by: Marc Hartmayer 
Signed-off-by: Thomas Huth 
---
   chardev/char-pty.c | 22 +++---
   1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/chardev/char-pty.c b/chardev/char-pty.c
index 4e5deac18a..fad12dfef3 100644
--- a/chardev/char-pty.c
+++ b/chardev/char-pty.c
@@ -106,11 +106,27 @@ static void pty_chr_update_read_handler(Chardev *chr)
   static int char_pty_chr_write(Chardev *chr, const uint8_t *buf, int len)
   {
   PtyChardev *s = PTY_CHARDEV(chr);
+GPollFD pfd;
+int rc;
-if (!s->connected) {
-return len;
+if (s->connected) {
+return io_channel_send(s->ioc, buf, len);
   }
-return io_channel_send(s->ioc, buf, len);
+
+/*
+ * The other side might already be re-connected, but the timer might
+ * not have fired yet. So let's check here whether we can write again:
+ */
+pfd.fd = QIO_CHANNEL_FILE(s->ioc)->fd;
+pfd.events = G_IO_OUT;
+pfd.revents = 0;
+rc = RETRY_ON_EINTR(g_poll(&pfd, 1, 0));
+g_assert(rc >= 0);
+if (!(pfd.revents & G_IO_HUP) && (pfd.revents & G_IO_OUT)) {


Should (can?) we call

 pty_chr_state(chr, 1);

here ?


As far as I understood commit f8278c7d74c6 and f7ea2038bea04628, this is not
possible anymore since the lock has been removed.


+io_channel_send(s->ioc, buf, len);


As it feels a little dirty to be sending data before setting the
'connected == 1' and thus issuing the 'CHR_EVENT_OPENED' event


I didn't find a really better solution so far. We could maybe introduce a
buffer in the char-pty code and store the last second of guest output, but
IMHO that's way more complex and thus somewhat ugly, too?


The orignal commit f8278c7d74c6 said

[quote]
 char-pty: remove the check for connection on write

 This doesn't help much compared to the 1 second poll PTY
 timer. I can't think of a use case where this would help.
[/quote]

We've now identified a use case where it is actually important.

IOW, there's a justification to revert both f7ea2038bea04628 and
f8278c7d74c6, re-adding the locking and write update logic.


Indeed. But isn't it possible to watch for IO_OUT and get rid of the timer?


It might be possible - Marc Hartmayer just sent me a draft patch today that 
uses qio_channel_add_watch() and gets rid of the timer ... I'll do some 
experiments with that and send it out if it works reliably.
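
For reference, the direction being discussed, replacing the 1-second timer
with an I/O watch, might look roughly like the following sketch (callback
name invented, untested, and not Marc Hartmayer's actual draft):

/* Sketch only: arm a G_IO_OUT watch instead of the 1s poll timer and
 * mark the chardev connected the first time the pty becomes writable. */
static gboolean pty_chr_writable(QIOChannel *ioc, GIOCondition cond,
                                 gpointer opaque)
{
    Chardev *chr = CHARDEV(opaque);

    pty_chr_state(chr, 1);      /* the other side is there */
    return G_SOURCE_REMOVE;     /* one-shot; re-arm when the peer drops */
}

/* ... instead of arming the rearm timer: */
guint tag = qio_channel_add_watch(s->ioc, G_IO_OUT, pty_chr_writable, chr, NULL);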


 Thomas




Re: [PATCH 3/3] tests/tcg/s390x: Test VSTRS

2023-08-17 Thread Ilya Leoshkevich
On Thu, Aug 17, 2023 at 11:37:29AM +0200, Claudio Fontana wrote:
> On 8/5/23 01:03, Ilya Leoshkevich wrote:
> > Add a small test to prevent regressions.
> > 
> > Signed-off-by: Ilya Leoshkevich 
> 
> Something seems off in the wiring of the make check target?
> 
> I built with:
> 
> ./configure --target-list=s390x-linux-user,s390x-softmmu
> 
> make -j
> make -j check-help
> 
> ...
> 
> Individual test suites:
>  make check-qtest-TARGET Run qtest tests for given target
>  make check-qtestRun qtest tests
>  make check-unit Run qobject tests
>  make check-qapi-schema  Run QAPI schema tests
>  make check-blockRun block tests
>  make check-tcg  Run TCG tests
> 
> 
> ...
> 
> make -j check-tcg
> 
> changing dir to build for make "check-tcg"...
> make[1]: Entering directory '/root/git/qemu/build'
> make[1]: Nothing to be done for 'check-tcg'.
> make[1]: Leaving directory '/root/git/qemu/build'
> 
> 
> Why is this not running any tests for tcg?
> 
> I tried also to run the general make check,
> but even in this case the tcg tests do not seem to trigger.
> 
> Thanks,
> 
> Claudio

Hi,

I believe you need either s390x-linux-gnu-gcc or docker/podman to run
the tcg tests.

Best regards,
Ilya



[PATCH v2 1/4] block: rename blk_io_plug_call() API to defer_call()

2023-08-17 Thread Stefan Hajnoczi
Prepare to move the blk_io_plug_call() API out of the block layer so
that other subsystems can use this deferred call mechanism. Rename it
to defer_call() but leave the code in block/plug.c.

The next commit will move the code out of the block layer.
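
The calling convention stays the same after the rename; roughly
(my_flush_fn and my_state are placeholders, not from this patch):

defer_call_begin();                  /* start of deferred section            */
...
defer_call(my_flush_fn, my_state);   /* queued, not invoked yet              */
defer_call(my_flush_fn, my_state);   /* same fn/opaque pair is coalesced     */
...
defer_call_end();                    /* my_flush_fn(my_state) runs once here */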

Suggested-by: Ilya Maximets 
Signed-off-by: Stefan Hajnoczi 
---
 include/sysemu/block-backend-io.h |   6 +-
 block/blkio.c |   8 +--
 block/io_uring.c  |   4 +-
 block/linux-aio.c |   4 +-
 block/nvme.c  |   4 +-
 block/plug.c  | 109 +++---
 hw/block/dataplane/xen-block.c|  10 +--
 hw/block/virtio-blk.c |   4 +-
 hw/scsi/virtio-scsi.c |   6 +-
 9 files changed, 76 insertions(+), 79 deletions(-)

diff --git a/include/sysemu/block-backend-io.h 
b/include/sysemu/block-backend-io.h
index be4dcef59d..cfcfd85c1d 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,9 +100,9 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-void blk_io_plug(void);
-void blk_io_unplug(void);
-void blk_io_plug_call(void (*fn)(void *), void *opaque);
+void defer_call_begin(void);
+void defer_call_end(void);
+void defer_call(void (*fn)(void *), void *opaque);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
diff --git a/block/blkio.c b/block/blkio.c
index 1dd495617c..7cf6d61f47 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -312,10 +312,10 @@ static void blkio_detach_aio_context(BlockDriverState *bs)
 }
 
 /*
- * Called by blk_io_unplug() or immediately if not plugged. Called without
- * blkio_lock.
+ * Called by defer_call_end() or immediately if not in a deferred section.
+ * Called without blkio_lock.
  */
-static void blkio_unplug_fn(void *opaque)
+static void blkio_deferred_fn(void *opaque)
 {
 BDRVBlkioState *s = opaque;
 
@@ -332,7 +332,7 @@ static void blkio_submit_io(BlockDriverState *bs)
 {
 BDRVBlkioState *s = bs->opaque;
 
-blk_io_plug_call(blkio_unplug_fn, s);
+defer_call(blkio_deferred_fn, s);
 }
 
 static int coroutine_fn
diff --git a/block/io_uring.c b/block/io_uring.c
index 69d9820928..8429f341be 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -306,7 +306,7 @@ static void ioq_init(LuringQueue *io_q)
 io_q->blocked = false;
 }
 
-static void luring_unplug_fn(void *opaque)
+static void luring_deferred_fn(void *opaque)
 {
 LuringState *s = opaque;
 trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
@@ -367,7 +367,7 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, 
LuringState *s,
 return ret;
 }
 
-blk_io_plug_call(luring_unplug_fn, s);
+defer_call(luring_deferred_fn, s);
 }
 return 0;
 }
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 561c71a9ae..9a08219db0 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -353,7 +353,7 @@ static uint64_t laio_max_batch(LinuxAioState *s, uint64_t 
dev_max_batch)
 return max_batch;
 }
 
-static void laio_unplug_fn(void *opaque)
+static void laio_deferred_fn(void *opaque)
 {
 LinuxAioState *s = opaque;
 
@@ -393,7 +393,7 @@ static int laio_do_submit(int fd, struct qemu_laiocb 
*laiocb, off_t offset,
 if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch)) {
 ioq_submit(s);
 } else {
-blk_io_plug_call(laio_unplug_fn, s);
+defer_call(laio_deferred_fn, s);
 }
 }
 
diff --git a/block/nvme.c b/block/nvme.c
index b6e95f0b7e..dfbd1085fd 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -476,7 +476,7 @@ static void nvme_trace_command(const NvmeCmd *cmd)
 }
 }
 
-static void nvme_unplug_fn(void *opaque)
+static void nvme_deferred_fn(void *opaque)
 {
 NVMeQueuePair *q = opaque;
 
@@ -503,7 +503,7 @@ static void nvme_submit_command(NVMeQueuePair *q, 
NVMeRequest *req,
 q->need_kick++;
+qemu_mutex_unlock(&q->lock);
 
-blk_io_plug_call(nvme_unplug_fn, q);
+defer_call(nvme_deferred_fn, q);
 }
 
 static void nvme_admin_cmd_sync_cb(void *opaque, int ret)
diff --git a/block/plug.c b/block/plug.c
index 98a155d2f4..f26173559c 100644
--- a/block/plug.c
+++ b/block/plug.c
@@ -1,24 +1,21 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 /*
- * Block I/O plugging
+ * Deferred calls
  *
  * Copyright Red Hat.
  *
- * This API defers a function call within a blk_io_plug()/blk_io_unplug()
+ * This API defers a function call within a defer_call_begin()/defer_call_end()
  * section, allowing multiple calls to batch up. This is a performance
  * optimization that is used in the block layer to submit several I/O requests
  * at once instead of individually:
  *
- *   blk_io_plug(); <-- start of plugged region
+ *   defer_call_begin(); <-- start of section
  *   ...
- *   blk_io_plug_call(my_func, my_obj); <-- deferred 

Re: [PATCH for-8.1] vfio/display: Fix missing update to set backing fields

2023-08-17 Thread Kim, Dongwon
OK, this regression happened not just because of the renaming. Originally, 
width and height represented the size of the whole surface that the guest 
shares, while scanout width and height were for each individual scanout. We 
realized backing_width/height are more commonly used to specify the size 
of the whole guest surface, so we put them in place of width/height and then 
replaced scanout_width/height with the plain width/height as well.


On 8/16/2023 3:31 PM, Philippe Mathieu-Daudé wrote:

On 16/8/23 23:55, Alex Williamson wrote:

The below referenced commit renames scanout_width/height to
backing_width/height, but also promotes these fields in various portions
of the egl interface.  Meanwhile vfio dmabuf support has never used the
previous scanout fields and is therefore missed in the update. This
results in a black screen when transitioning from ramfb to dmabuf 
display

when using Intel vGPU with these features.


Referenced commit isn't trivial. Maybe because it is too late here.
I'd have tried to split it. Anyhow, too late (again).

Is vhost-user-gpu also affected? (see VHOST_USER_GPU_DMABUF_SCANOUT
in vhost_user_gpu_handle_display()).


Yeah, backing_width/height should be programmed with plane.width/height 
as well in vhost_user_gpu_handle_display().


Link: https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg02726.html
Fixes: 9ac06df8b684 ("virtio-gpu-udmabuf: correct naming of 
QemuDmaBuf size properties")

Signed-off-by: Alex Williamson 
---

This fixes a regression in dmabuf/EGL support for Intel GVT-g and
potentially the mbochs mdev driver as well.  Once validated by those
that understand dmabuf/EGL integration, I'd welcome QEMU maintainers to
take this directly for v8.1 or queue it as soon as possible for v8.1.1.

  hw/vfio/display.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index bec864f482f4..837d9e6a309e 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -243,6 +243,8 @@ static VFIODMABuf 
*vfio_display_get_dmabuf(VFIOPCIDevice *vdev,

  dmabuf->dmabuf_id  = plane.dmabuf_id;
  dmabuf->buf.width  = plane.width;
  dmabuf->buf.height = plane.height;


One thing to note here is that the normal width and height in QemuDmaBuf 
describe a scanout, which could be just a partial area of the guest plane 
here. So we should not program plane.width/height into the normal width and 
height of the QemuDmaBuf unless it is guaranteed that the given guest 
surface (the plane in this case) always belongs to a single display.


https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg04737.html


+    dmabuf->buf.backing_width = plane.width;
+    dmabuf->buf.backing_height = plane.height;
  dmabuf->buf.stride = plane.stride;
  dmabuf->buf.fourcc = plane.drm_format;
  dmabuf->buf.modifier = plane.drm_format_mod;






Re: [PATCH 3/6] linux-user: Adjust brk for load_bias

2023-08-17 Thread Michael Tokarev

16.08.2023 21:14, Richard Henderson wrote:

PIE executables are usually linked at offset 0 and are
relocated somewhere during load.  The hiaddr needs to
be adjusted to keep the brk next to the executable.

Cc: qemu-sta...@nongnu.org
Fixes: 1f356e8c013 ("linux-user: Adjust initial brk when interpreter is close to 
executable")


FWIW, 1f356e8c013 is v8.1.0-rc2-86 - why did you Cc qemu-stable@?

If this "Adjust brk for load_bias" fix isn't supposed to be part of the 8.1.0 
release, sure thing I'll pick it up for stable-8.1, but it looks like it should be in 
8.1.0.

Or are you saying 1f356e8c013 should be picked for stable-8.0, together with 
this one?

(We're yet to decide if stable-8.0 should have any recent linux-user changes).

/mjt



[PATCH v2 4/4] virtio-blk: remove batch notification BH

2023-08-17 Thread Stefan Hajnoczi
There is a batching mechanism for virtio-blk Used Buffer Notifications
that is no longer needed because the previous commit added batching to
virtio_notify_irqfd().

Note that this mechanism was rarely used in practice because it is only
enabled when EVENT_IDX is not negotiated by the driver. Modern drivers
enable EVENT_IDX.

Signed-off-by: Stefan Hajnoczi 
---
 hw/block/dataplane/virtio-blk.c | 48 +
 1 file changed, 1 insertion(+), 47 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index da36fcfd0b..f83bb0f116 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -31,9 +31,6 @@ struct VirtIOBlockDataPlane {
 
 VirtIOBlkConf *conf;
 VirtIODevice *vdev;
-QEMUBH *bh; /* bh for guest notification */
-unsigned long *batch_notify_vqs;
-bool batch_notifications;
 
 /* Note that these EventNotifiers are assigned by value.  This is
  * fine as long as you do not call event_notifier_cleanup on them
@@ -47,36 +44,7 @@ struct VirtIOBlockDataPlane {
 /* Raise an interrupt to signal guest, if necessary */
 void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, VirtQueue *vq)
 {
-if (s->batch_notifications) {
-set_bit(virtio_get_queue_index(vq), s->batch_notify_vqs);
-qemu_bh_schedule(s->bh);
-} else {
-virtio_notify_irqfd(s->vdev, vq);
-}
-}
-
-static void notify_guest_bh(void *opaque)
-{
-VirtIOBlockDataPlane *s = opaque;
-unsigned nvqs = s->conf->num_queues;
-unsigned long bitmap[BITS_TO_LONGS(nvqs)];
-unsigned j;
-
-memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
-memset(s->batch_notify_vqs, 0, sizeof(bitmap));
-
-for (j = 0; j < nvqs; j += BITS_PER_LONG) {
-unsigned long bits = bitmap[j / BITS_PER_LONG];
-
-while (bits != 0) {
-unsigned i = j + ctzl(bits);
-VirtQueue *vq = virtio_get_queue(s->vdev, i);
-
-virtio_notify_irqfd(s->vdev, vq);
-
-bits &= bits - 1; /* clear right-most bit */
-}
-}
+virtio_notify_irqfd(s->vdev, vq);
 }
 
 /* Context: QEMU global mutex held */
@@ -126,9 +94,6 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
 } else {
 s->ctx = qemu_get_aio_context();
 }
-s->bh = aio_bh_new_guarded(s->ctx, notify_guest_bh, s,
-   &DEVICE(vdev)->mem_reentrancy_guard);
-s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
 *dataplane = s;
 
@@ -146,8 +111,6 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
 
 vblk = VIRTIO_BLK(s->vdev);
 assert(!vblk->dataplane_started);
-g_free(s->batch_notify_vqs);
-qemu_bh_delete(s->bh);
 if (s->iothread) {
 object_unref(OBJECT(s->iothread));
 }
@@ -173,12 +136,6 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 
 s->starting = true;
 
-if (!virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
-s->batch_notifications = true;
-} else {
-s->batch_notifications = false;
-}
-
 /* Set up guest notifier (irq) */
 r = k->set_guest_notifiers(qbus->parent, nvqs, true);
 if (r != 0) {
@@ -370,9 +327,6 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
 aio_context_release(s->ctx);
 
-qemu_bh_cancel(s->bh);
-notify_guest_bh(s); /* final chance to notify guest */
-
 /* Clean up guest notifier (irq) */
 k->set_guest_notifiers(qbus->parent, nvqs, false);
 
-- 
2.41.0




[PATCH v2 2/4] util/defer-call: move defer_call() to util/

2023-08-17 Thread Stefan Hajnoczi
The networking subsystem may wish to use defer_call(), so move the code
to util/ where it can be reused.

As a reminder of what defer_call() does:

This API defers a function call within a defer_call_begin()/defer_call_end()
section, allowing multiple calls to batch up. This is a performance
optimization that is used in the block layer to submit several I/O requests
at once instead of individually:

  defer_call_begin(); <-- start of section
  ...
  defer_call(my_func, my_obj); <-- deferred my_func(my_obj) call
  defer_call(my_func, my_obj); <-- another
  defer_call(my_func, my_obj); <-- another
  ...
  defer_call_end(); <-- end of section, my_func(my_obj) is called once
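
(A minimal stand-alone sketch of the batching idea, purely for intuition -- this
is not the code being moved; the per-thread array, the dedup loop and the
nesting counter below are illustrative assumptions only:)

#include <assert.h>
#include <stddef.h>

typedef struct {
    void (*fn)(void *);
    void *opaque;
} DeferredCall;

static __thread unsigned defer_depth;       /* nesting of begin/end sections */
static __thread DeferredCall pending[64];   /* per-thread pending calls */
static __thread size_t npending;

void defer_call_begin(void)
{
    defer_depth++;
}

void defer_call(void (*fn)(void *), void *opaque)
{
    if (defer_depth == 0) {
        fn(opaque);                 /* outside a section: run immediately */
        return;
    }
    for (size_t i = 0; i < npending; i++) {
        if (pending[i].fn == fn && pending[i].opaque == opaque) {
            return;                 /* already queued: coalesce */
        }
    }
    assert(npending < sizeof(pending) / sizeof(pending[0]));
    pending[npending++] = (DeferredCall){ fn, opaque };
}

void defer_call_end(void)
{
    if (--defer_depth == 0) {
        for (size_t i = 0; i < npending; i++) {
            pending[i].fn(pending[i].opaque);   /* each unique call runs once */
        }
        npending = 0;
    }
}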

Suggested-by: Ilya Maximets 
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS   |  3 ++-
 include/qemu/defer-call.h | 15 +++
 include/sysemu/block-backend-io.h |  4 
 block/blkio.c |  1 +
 block/io_uring.c  |  1 +
 block/linux-aio.c |  1 +
 block/nvme.c  |  1 +
 hw/block/dataplane/xen-block.c|  1 +
 hw/block/virtio-blk.c |  1 +
 hw/scsi/virtio-scsi.c |  1 +
 block/plug.c => util/defer-call.c |  2 +-
 block/meson.build |  1 -
 util/meson.build  |  1 +
 13 files changed, 26 insertions(+), 7 deletions(-)
 create mode 100644 include/qemu/defer-call.h
 rename block/plug.c => util/defer-call.c (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6111b6b4d9..7cd7132ffc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2676,12 +2676,13 @@ S: Supported
 F: util/async.c
 F: util/aio-*.c
 F: util/aio-*.h
+F: util/defer-call.c
 F: util/fdmon-*.c
 F: block/io.c
-F: block/plug.c
 F: migration/block*
 F: include/block/aio.h
 F: include/block/aio-wait.h
+F: include/qemu/defer-call.h
 F: scripts/qemugdb/aio.py
 F: tests/unit/test-fdmon-epoll.c
 T: git https://github.com/stefanha/qemu.git block
diff --git a/include/qemu/defer-call.h b/include/qemu/defer-call.h
new file mode 100644
index 00..291f86c987
--- /dev/null
+++ b/include/qemu/defer-call.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Deferred calls
+ *
+ * Copyright Red Hat.
+ */
+
+#ifndef QEMU_DEFER_CALL_H
+#define QEMU_DEFER_CALL_H
+
+void defer_call_begin(void);
+void defer_call_end(void);
+void defer_call(void (*fn)(void *), void *opaque);
+
+#endif /* QEMU_DEFER_CALL_H */
diff --git a/include/sysemu/block-backend-io.h 
b/include/sysemu/block-backend-io.h
index cfcfd85c1d..d174275a5c 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,10 +100,6 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-void defer_call_begin(void);
-void defer_call_end(void);
-void defer_call(void (*fn)(void *), void *opaque);
-
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
diff --git a/block/blkio.c b/block/blkio.c
index 7cf6d61f47..0a0a6c0f5f 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -13,6 +13,7 @@
 #include "block/block_int.h"
 #include "exec/memory.h"
 #include "exec/cpu-common.h" /* for qemu_ram_get_fd() */
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
diff --git a/block/io_uring.c b/block/io_uring.c
index 8429f341be..3a1e1f45b3 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -15,6 +15,7 @@
 #include "block/block.h"
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "sysemu/block-backend.h"
 #include "trace.h"
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 9a08219db0..62380593c8 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -14,6 +14,7 @@
 #include "block/raw-aio.h"
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "sysemu/block-backend.h"
 
diff --git a/block/nvme.c b/block/nvme.c
index dfbd1085fd..96b3f8f2fa 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -16,6 +16,7 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index e9dd8f8a99..c4bb28c66f 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -19,6 +19,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/memalign.h"
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 6a45033d15..a1f8e15522 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -12,6 +12,7 @@
  */
 
 

[PATCH v2 0/4] virtio-blk: use blk_io_plug_call() instead of notification BH

2023-08-17 Thread Stefan Hajnoczi
v2:
- Rename blk_io_plug() to defer_call() and move it to util/ so the net
  subsystem can use it [Ilya]
- Add defer_call_begin()/end() to thread_pool_completion_bh() to match Linux
  AIO and io_uring completion batching

Replace the seldom-used virtio-blk notification BH mechanism with
blk_io_plug(). This is part of an effort to enable the multi-queue block layer
in virtio-blk. The notification BH was not multi-queue friendly.

The blk_io_plug() mechanism improves fio rw=randread bs=4k iodepth=64 numjobs=8
IOPS by ~9% with a single IOThread and 8 vCPUs (this is not even a multi-queue
block layer configuration) compared to no completion batching. iodepth=1
decreases by ~1% but this could be noise. Benchmark details are available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd

Stefan Hajnoczi (4):
  block: rename blk_io_plug_call() API to defer_call()
  util/defer-call: move defer_call() to util/
  virtio: use defer_call() in virtio_irqfd_notify()
  virtio-blk: remove batch notification BH

 MAINTAINERS   |   3 +-
 include/qemu/defer-call.h |  15 +++
 include/sysemu/block-backend-io.h |   4 -
 block/blkio.c |   9 +-
 block/io_uring.c  |  11 ++-
 block/linux-aio.c |   9 +-
 block/nvme.c  |   5 +-
 block/plug.c  | 159 --
 hw/block/dataplane/virtio-blk.c   |  48 +
 hw/block/dataplane/xen-block.c|  11 ++-
 hw/block/virtio-blk.c |   5 +-
 hw/scsi/virtio-scsi.c |   7 +-
 hw/virtio/virtio.c|  11 ++-
 util/defer-call.c | 156 +
 util/thread-pool.c|   5 +
 block/meson.build |   1 -
 util/meson.build  |   1 +
 17 files changed, 227 insertions(+), 233 deletions(-)
 create mode 100644 include/qemu/defer-call.h
 delete mode 100644 block/plug.c
 create mode 100644 util/defer-call.c

-- 
2.41.0




[PATCH v2 3/4] virtio: use defer_call() in virtio_irqfd_notify()

2023-08-17 Thread Stefan Hajnoczi
virtio-blk and virtio-scsi invoke virtio_irqfd_notify() to send Used
Buffer Notifications from an IOThread. This involves an eventfd
write(2) syscall. Calling this repeatedly when completing multiple I/O
requests in a row is wasteful.

Use the defer_call() API to batch together virtio_irqfd_notify() calls
made during thread pool (aio=threads), Linux AIO (aio=native), and
io_uring (aio=io_uring) completion processing.

Behavior is unchanged for emulated devices that do not use
defer_call_begin()/defer_call_end() since defer_call() immediately
invokes the callback when called outside a
defer_call_begin()/defer_call_end() region.

fio rw=randread bs=4k iodepth=64 numjobs=8 IOPS increases by ~9% with a
single IOThread and 8 vCPUs. iodepth=1 decreases by ~1% but this could
be noise. Detailed performance data and configuration specifics are
available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd

This duplicates the BH that virtio-blk uses for batching. The next
commit will remove it.

Signed-off-by: Stefan Hajnoczi 
---
 block/io_uring.c   |  6 ++
 block/linux-aio.c  |  4 
 hw/virtio/virtio.c | 11 ++-
 util/thread-pool.c |  5 +
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index 3a1e1f45b3..7cdd00e9f1 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -125,6 +125,9 @@ static void luring_process_completions(LuringState *s)
 {
 struct io_uring_cqe *cqes;
 int total_bytes;
+
+defer_call_begin();
+
 /*
  * Request completion callbacks can run the nested event loop.
  * Schedule ourselves so the nested event loop will "see" remaining
@@ -217,7 +220,10 @@ end:
 aio_co_wake(luringcb->co);
 }
 }
+
 qemu_bh_cancel(s->completion_bh);
+
+defer_call_end();
 }
 
 static int ioq_submit(LuringState *s)
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 62380593c8..ab607ade6a 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -205,6 +205,8 @@ static void qemu_laio_process_completions(LinuxAioState *s)
 {
 struct io_event *events;
 
+defer_call_begin();
+
 /* Reschedule so nested event loops see currently pending completions */
 qemu_bh_schedule(s->completion_bh);
 
@@ -231,6 +233,8 @@ static void qemu_laio_process_completions(LinuxAioState *s)
  * own `for` loop.  If we are the last all counters droped to zero. */
 s->event_max = 0;
 s->event_idx = 0;
+
+defer_call_end();
 }
 
 static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 309038fd46..5eb1f91b41 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -15,6 +15,7 @@
 #include "qapi/error.h"
 #include "qapi/qapi-commands-virtio.h"
 #include "trace.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
@@ -28,6 +29,7 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-access.h"
+#include "sysemu/block-backend.h"
 #include "sysemu/dma.h"
 #include "sysemu/runstate.h"
 #include "virtio-qmp.h"
@@ -2426,6 +2428,13 @@ static bool virtio_should_notify(VirtIODevice *vdev, 
VirtQueue *vq)
 }
 }
 
+/* Batch irqs while inside a defer_call_begin()/defer_call_end() section */
+static void virtio_notify_irqfd_deferred_fn(void *opaque)
+{
+EventNotifier *notifier = opaque;
+event_notifier_set(notifier);
+}
+
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
 {
 WITH_RCU_READ_LOCK_GUARD() {
@@ -2452,7 +2461,7 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue 
*vq)
  * to an atomic operation.
  */
 virtio_set_isr(vq->vdev, 0x1);
-event_notifier_set(&vq->guest_notifier);
+defer_call(virtio_notify_irqfd_deferred_fn, &vq->guest_notifier);
 }
 
 static void virtio_irq(VirtQueue *vq)
diff --git a/util/thread-pool.c b/util/thread-pool.c
index e3d8292d14..d84961779a 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -15,6 +15,7 @@
  * GNU GPL, version 2 or (at your option) any later version.
  */
 #include "qemu/osdep.h"
+#include "qemu/defer-call.h"
 #include "qemu/queue.h"
 #include "qemu/thread.h"
 #include "qemu/coroutine.h"
@@ -175,6 +176,8 @@ static void thread_pool_completion_bh(void *opaque)
 ThreadPool *pool = opaque;
 ThreadPoolElement *elem, *next;
 
+defer_call_begin(); /* cb() may use defer_call() to coalesce work */
+
 restart:
 QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
 if (elem->state != THREAD_DONE) {
@@ -208,6 +211,8 @@ restart:
 qemu_aio_unref(elem);
 }
 }
+
+defer_call_end();
 }
 
 static void thread_pool_cancel(BlockAIOCB *acb)
-- 
2.41.0




Re: [PATCH 3/3] tcg/i386: Allow immediate as input to deposit_*

2023-08-17 Thread Peter Maydell
On Wed, 16 Aug 2023 at 15:58, Richard Henderson
 wrote:
>
> We can use MOVB and MOVW with an immediate just as easily
> as with a register input.
>
> Signed-off-by: Richard Henderson 
> ---


Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH 2/3] tcg: Fold deposit with zero to and

2023-08-17 Thread Peter Maydell
On Wed, 16 Aug 2023 at 15:58, Richard Henderson
 wrote:
>
> Inserting a zero into a value, or inserting a value
> into zero at offset 0 may be implemented with AND.
>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/optimize.c | 35 +++
>  1 file changed, 35 insertions(+)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index d2156367a3..956114b631 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -1279,6 +1279,8 @@ static bool fold_ctpop(OptContext *ctx, TCGOp *op)
>
>  static bool fold_deposit(OptContext *ctx, TCGOp *op)
>  {
> +TCGOpcode and_opc;
> +
>  if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
>  uint64_t t1 = arg_info(op->args[1])->val;
>  uint64_t t2 = arg_info(op->args[2])->val;
> @@ -1287,6 +1289,39 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
>  return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
>  }
>
> +switch (ctx->type) {
> +case TCG_TYPE_I32:
> +and_opc = INDEX_op_and_i32;
> +break;
> +case TCG_TYPE_I64:
> +and_opc = INDEX_op_and_i64;
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +
> +if (arg_is_const(op->args[1])
> +&& arg_info(op->args[1])->val == 0
> +&& op->args[3] == 0) {
> +uint64_t mask = MAKE_64BIT_MASK(0, op->args[4]);

The docs for the TCG deposit op don't say what the restrictions on the
immediate args are, but this will be UB for QEMU if args[4] is 0.
Have we already sanitized those somewhere?

> +
> +op->opc = and_opc;
> +op->args[1] = op->args[2];
> +op->args[2] = temp_arg(tcg_constant_internal(ctx->type, mask));
> +ctx->z_mask = mask & arg_info(op->args[1])->z_mask;
> +return false;
> +}
> +
> +if (arg_is_const(op->args[2])
> +&& arg_info(op->args[2])->val == 0) {
> +uint64_t mask = deposit64(-1, op->args[3], op->args[4], 0);
> +
> +op->opc = and_opc;
> +op->args[2] = temp_arg(tcg_constant_internal(ctx->type, mask));
> +ctx->z_mask = mask & arg_info(op->args[1])->z_mask;
> +return false;
> +}
> +
>  ctx->z_mask = deposit64(arg_info(op->args[1])->z_mask,
>  op->args[3], op->args[4],
>  arg_info(op->args[2])->z_mask);
> --

thanks
-- PMM



Re: [PATCH 1/3] tcg/i386: Drop BYTEH deposits for 64-bit

2023-08-17 Thread Peter Maydell
On Wed, 16 Aug 2023 at 16:01, Richard Henderson
 wrote:
>
> It is more useful to allow low-part deposits into all registers
> than to restrict allocation for high-byte deposits.

>  #define TCG_TARGET_deposit_i32_valid(ofs, len) \
> -(((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
> - ((ofs) == 0 && (len) == 16))
> +(((ofs) == 0 && ((len) == 8 || (len) == 16)) || \
> + (TCG_TARGET_REG_BITS == 32 && (ofs) == 8 && (len) == 8))
>  #define TCG_TARGET_deposit_i64_validTCG_TARGET_deposit_i32_valid


> @@ -2752,7 +2751,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
> opc,
>  if (args[3] == 0 && args[4] == 8) {
>  /* load bits 0..7 */
>  tcg_out_modrm(s, OPC_MOVB_EvGv | P_REXB_R | P_REXB_RM, a2, a0);
> -} else if (args[3] == 8 && args[4] == 8) {
> +} else if (TCG_TARGET_REG_BITS == 32 && args[3] == 8 && args[4] == 
> 8) {

Should we assert(TCG_TARGET_REG_BITS == 32) rather than making it part of the
condition? If I understand the change to the deposit_i32_valid macro above, we
should never get here with 8, 8 if TCG_TARGET_REG_BITS is 64.

>  /* load bits 8..15 */
>  tcg_out_modrm(s, OPC_MOVB_EvGv, a2, a0 + 4);
>  } else if (args[3] == 0 && args[4] == 16) {

Otherwise
Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread David Hildenbrand

On 17.08.23 17:31, Peter Xu wrote:

On Thu, Aug 17, 2023 at 05:15:52PM +0200, David Hildenbrand wrote:

I don't know how important that requirement was (that commit was a
request from Kata Containers).


Let me take a look if Kata passes "share=on,readonly=on" or
"share=off,readonly=off".


The question is whether it's good enough if we change the semantics as long
as we guarantee the original purposes of when introducing those flags would
be enough (nvdimm, kata, etc.), as anything introduced in qemu can
potentially be used elsewhere too.



Right. And we have to keep the R/O NVDIMM use case working as is apparently.


David, could you share your concern on simply "having a new flag, while
keeping all existing flags unchanged on behavior"?  You mentioned it's not
wanted, but I didn't yet see the reason behind.


I'm really having a hard time coming up with something reasonable to 
configure this. And apparently, we only want to configure 
"share=off,readonly=on".


The best I could come up with was "readonly=file-only", but I'm also not too 
happy about that. It doesn't make any sense for "share=on".


So if we could just let the memory backend do something reasonable and 
have the single existing consumer (R/O NVDIMM) handle the changed case 
explicitly internally, that turns out much cleaner.


IMHO, the user shouldn't have to worry about "how is it mmaped". "share" 
and "readonly" express the memory semantics and the file semantics.


A R/O DIMM on the other hand (unarmed=on), knows that it is R/O, and the 
user configured exactly that. So maybe it can simply expose it to the 
system as readonly by marking the memory region container as being a ROM.


I have not given up yet, but this case is starting to be annoying.

--
Cheers,

David / dhildenb




Re: [PATCH] hw/net/vmxnet3: Fix guest-triggerable assert()

2023-08-17 Thread Philippe Mathieu-Daudé

Hi Thomas,

On 17/8/23 14:56, Thomas Huth wrote:

The assert() that checks for valid MTU sizes can be triggered by
the guest (e.g. with the reproducer code from the bug ticket
https://gitlab.com/qemu-project/qemu/-/issues/517 ). Let's avoid
this problem by simply logging the error and refusing to activate
the device instead.

Fixes: d05dcd94ae ("net: vmxnet3: validate configuration values during 
activate")
Signed-off-by: Thomas Huth 
---
  hw/net/vmxnet3.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 5dfacb1098..6674122a7e 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1439,7 +1439,10 @@ static void vmxnet3_activate_device(VMXNET3State *s)
  vmxnet3_setup_rx_filtering(s);
  /* Cache fields from shared memory */
  s->mtu = VMXNET3_READ_DRV_SHARED32(d, s->drv_shmem, devRead.misc.mtu);
-assert(VMXNET3_MIN_MTU <= s->mtu && s->mtu <= VMXNET3_MAX_MTU);
+if (s->mtu < VMXNET3_MIN_MTU || s->mtu > VMXNET3_MAX_MTU) {
+qemu_log_mask(LOG_GUEST_ERROR, "vmxnet3: Bad MTU size: %d\n", s->mtu);


s->mtu is uint32_t, so the format string should use %u rather than %d; otherwise:

Reviewed-by: Philippe Mathieu-Daudé 


+return;
+}
  VMW_CFPRN("MTU is %u", s->mtu);
  
  s->max_rx_frags =





Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread Peter Xu
On Thu, Aug 17, 2023 at 05:15:52PM +0200, David Hildenbrand wrote:
> > I don't know how important that requirement was (that commit was a
> > request from Kata Containers).
> 
> Let me take a look if Kata passes "share=on,readonly=on" or
> "share=off,readonly=off".

The question is whether it's good enough if we change the semantics as long
as we guarantee the original purposes of when introducing those flags would
be enough (nvdimm, kata, etc.), as anything introduced in qemu can
potentially be used elsewhere too.

David, could you share your concern on simply "having a new flag, while
keeping all existing flags unchanged on behavior"?  You mentioned it's not
wanted, but I didn't yet see the reason behind.

Thanks,

-- 
Peter Xu




[PATCH] target/riscv: fix satp_mode_finalize() when satp_mode.supported = 0

2023-08-17 Thread Daniel Henrique Barboza
In the same emulated RISC-V host, the 'host' KVM CPU takes 4 times
longer to boot than the 'rv64' KVM CPU.

The reason is an unintended behavior of riscv_cpu_satp_mode_finalize()
when satp_mode.supported = 0, i.e. when cpu_init() does not set
satp_mode_max_supported(). satp_mode_max_from_map(map) does:

31 - __builtin_clz(map)

This means that, if satp_mode.supported = 0, satp_mode_supported_max
will be '31 - 32'. But this is C, so the value happily wraps around and
satp_mode_supported_max ends up as UINT_MAX (4294967295). After that, if the
user didn't set a
satp_mode, set_satp_mode_default_map(cpu) will make

cfg.satp_mode.map = cfg.satp_mode.supported

So satp_mode.map = 0. And then satp_mode_map_max will be set to
satp_mode_max_from_map(cpu->cfg.satp_mode.map), i.e. also UINT_MAX. The
guard "satp_mode_map_max > satp_mode_supported_max" doesn't protect us
here since both are UINT_MAX.

And finally we have 2 loops:

for (int i = satp_mode_map_max - 1; i >= 0; --i) {

Which are, in fact, 2 loops from UINT_MAX -1 to -1. This is where the
extra delay when booting the 'host' CPU is coming from.

Commit 43d1de32f8 already set a precedent for satp_mode.supported = 0
in a different manner. We're doing the same here. If supported == 0,
interpret as 'the CPU wants the OS to handle satp mode alone' and skip
satp_mode_finalize().

We'll also put a guard in satp_mode_max_from_map() to assert out if map
is 0 since the function is not ready to deal with it.

Cc: Alexandre Ghiti 
Fixes: 6f23aaeb9b ("riscv: Allow user to set the satp mode")
Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index d608026a28..86da93c7bc 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -349,6 +349,17 @@ static uint8_t satp_mode_from_str(const char 
*satp_mode_str)
 
 uint8_t satp_mode_max_from_map(uint32_t map)
 {
+/*
+ * 'map = 0' will make us return (31 - 32), which C will
+ * happily overflow to UINT_MAX. There's no good result to
+ * return if 'map = 0' (e.g. returning 0 will be ambiguous
+ * with the result for 'map = 1').
+ *
+ * Assert out if map = 0. Callers will have to deal with
+ * it outside of this function.
+ */
+g_assert(map > 0);
+
 /* map here has at least one bit set, so no problem with clz */
 return 31 - __builtin_clz(map);
 }
@@ -1387,9 +1398,15 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, 
Error **errp)
 static void riscv_cpu_satp_mode_finalize(RISCVCPU *cpu, Error **errp)
 {
 bool rv32 = riscv_cpu_mxl(&cpu->env) == MXL_RV32;
-uint8_t satp_mode_map_max;
-uint8_t satp_mode_supported_max =
-satp_mode_max_from_map(cpu->cfg.satp_mode.supported);
+uint8_t satp_mode_map_max, satp_mode_supported_max;
+
+/* The CPU wants the OS to decide which satp mode to use */
+if (cpu->cfg.satp_mode.supported == 0) {
+return;
+}
+
+satp_mode_supported_max =
+satp_mode_max_from_map(cpu->cfg.satp_mode.supported);
 
 if (cpu->cfg.satp_mode.map == 0) {
 if (cpu->cfg.satp_mode.init == 0) {
-- 
2.41.0




Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread David Hildenbrand




Commit 86635aa4e9d627d5142b81c57a33dd1f36627d07 mentions that we don't
want guests to be able to dirty pages on the host. The change you're
proposing would not protect against guests that dirty the memory.


The guest could write memory but not modify the file. Only with
"share=off,readonly=on" of course, not with "share=on,readonly=on".



I don't know how important that requirement was (that commit was a
request from Kata Containers).


Let me take a look if Kata passes "share=on,readonly=on" or
"share=off,readonly=off".



At least their R/O DIMM test generates:

-device nvdimm,id=nv0,memdev=mem0,unarmed=on -object 
memory-backend-file,id=mem0,mem-path=/root,size=65536,readonly=on


So they are assuming that readonly with the implied share=off creates ROM. If 
only they had specified share=on ...


One way would be letting the R/O nvdimm set the MR container to 
readonly. Then that guest also shouldn't be able to modify that memory. 
Let me think about that.


--
Cheers,

David / dhildenb




Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread David Hildenbrand

On 17.08.23 17:13, Stefan Hajnoczi wrote:

On Thu, 17 Aug 2023 at 05:08, David Hildenbrand  wrote:


@Stefan, see below on a R/O NVDIMM question.

We're discussing how to get a MAP_PRIVATE R/W mapping of a
memory-backend-file running when using R/O files.



This seems a good idea. I am good with the solution you proposed
here as well.


I was just going to get started working on that, when I realized
something important:


"@readonly: if true, the backing file is opened read-only; if false,
   it is opened read-write.  (default: false)"

"@share: if false, the memory is private to QEMU; if true, it is
   shared (default: false)"

So readonly is *all* about the file access mode already ... the mmap()
parameters are just a side-effect of that. Adding a new
"file-access-mode" or similar would be wrong.


Here are the combinations we have right now:

-object memory-backend-file,share=on,readonly=on

   -> Existing behavior: Open readonly, mmap readonly shared
   -> Makes sense, mmap'ing readwrite would fail

-object memory-backend-file,share=on,readonly=off

   -> Existing behavior: Open readwrite, mmap readwrite shared
   -> Mostly makes sense: why open a shared file R/W and not mmap it
  R/W?

-object memory-backend-file,share=off,readonly=off
   -> Existing behavior: Open readwrite, mmap readwrite private
   -> Mostly makes sense: why open a file R/W and not map it R/W (even if
  private)?

-object memory-backend-file,share=off,readonly=on
   -> Existing behavior: Open readonly, mmap readonly private
   -> That's the problematic one


So for your use case (VM templating using a readonly file), you
would actually want to use:

-object memory-backend-file,share=off,readonly=on

BUT, have the mmap be writable (instead of currently readonly).
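
(Just to make the pattern explicit with a stand-alone C sketch, not QEMU code --
open(O_RDONLY) plus a PROT_READ|PROT_WRITE MAP_PRIVATE mapping is valid, and
writes only touch anonymous CoW pages, never the file; the path is a
placeholder and the file is assumed to be at least one page long:)

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/path/to/template", O_RDONLY);   /* placeholder path */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* PROT_WRITE is allowed here: MAP_PRIVATE writes go to CoW pages */
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    p[0] = 'x';     /* modifies the private copy only; the file is untouched */
    munmap(p, 4096);
    close(fd);
    return 0;
}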

Assuming we would change the current behavior, what if someone would
specify:

-object memory-backend-file,readonly=on

(because the default is share=off ...) and using it for a R/O NVDIMM,
where we expect any write access to fail.


But let's look at the commit that added the "readonly" parameter:

commit 86635aa4e9d627d5142b81c57a33dd1f36627d07
Author: Stefan Hajnoczi 
Date:   Mon Jan 4 17:13:19 2021 +

  hostmem-file: add readonly=on|off option

  Let -object memory-backend-file work on read-only files when the
  readonly=on option is given. This can be used to share the contents of a
  file between multiple guests while preventing them from consuming
  Copy-on-Write memory if guests dirty the pages, for example.

That was part of

https://lore.kernel.org/all/20210104171320.575838-3-stefa...@redhat.com/T/#m712f995e6dcfdde433958bae5095b145dd0ee640

  From what I understand, for NVDIMMs we always use
"-object memory-backend-file,share=on", even when we want a
readonly NVDIMM.


So we have two options:

1) Change the current behavior of -object 
memory-backend-file,share=off,readonly=on:

-> Open the file r/o but mmap it writable


Commit 86635aa4e9d627d5142b81c57a33dd1f36627d07 mentions that we don't
want guests to be able to dirty pages on the host. The change you're
proposing would not protect against guests that dirty the memory.


The guest could write memory but not modify the file. Only with 
"share=off,readonly=on" of course, not with "share=on,readonly=on".




I don't know how important that requirement was (that commit was a
request from Kata Containers).


Let me take a look if Kata passes "share=on,readonly=on" or 
"share=off,readonly=off".


Thanks!

--
Cheers,

David / dhildenb




Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread Stefan Hajnoczi
On Thu, 17 Aug 2023 at 05:08, David Hildenbrand  wrote:
>
> @Stefan, see below on a R/O NVDIMM question.
>
> We're discussing how to get a MAP_PRIVATE R/W mapping of a
> memory-backend-file running when using R/O files.
>
> >
> > This seems a good idea. I am good with the solution you proposed
> > here as well.
>
> I was just going to get started working on that, when I realized
> something important:
>
>
> "@readonly: if true, the backing file is opened read-only; if false,
>   it is opened read-write.  (default: false)"
>
> "@share: if false, the memory is private to QEMU; if true, it is
>   shared (default: false)"
>
> So readonly is *all* about the file access mode already ... the mmap()
> parameters are just a side-effect of that. Adding a new
> "file-access-mode" or similar would be wrong.
>
>
> Here are the combinations we have right now:
>
> -object memory-backend-file,share=on,readonly=on
>
>   -> Existing behavior: Open readonly, mmap readonly shared
>   -> Makes sense, mmap'ing readwrite would fail
>
> -object memory-backend-file,share=on,readonly=off
>
>   -> Existing behavior: Open readwrite, mmap readwrite shared
>   -> Mostly makes sense: why open a shared file R/W and not mmap it
>  R/W?
>
> -object memory-backend-file,share=off,readonly=off
>   -> Existing behavior: Open readwrite, mmap readwrite private
>   -> Mostly makes sense: why open a file R/W and not map it R/W (even if
>  private)?
>
> -object memory-backend-file,share=off,readonly=on
>   -> Existing behavior: Open readonly, mmap readonly private
>   -> That's the problematic one
>
>
> So for your use case (VM templating using a readonly file), you
> would actually want to use:
>
> -object memory-backend-file,share=off,readonly=on
>
> BUT, have the mmap be writable (instead of currently readonly).
>
> Assuming we would change the current behavior, what if someone would
> specify:
>
> -object memory-backend-file,readonly=on
>
> (because the default is share=off ...) and using it for a R/O NVDIMM,
> where we expect any write access to fail.
>
>
> But let's look at the commit that added the "readonly" parameter:
>
> commit 86635aa4e9d627d5142b81c57a33dd1f36627d07
> Author: Stefan Hajnoczi 
> Date:   Mon Jan 4 17:13:19 2021 +
>
>  hostmem-file: add readonly=on|off option
>
>  Let -object memory-backend-file work on read-only files when the
>  readonly=on option is given. This can be used to share the contents of a
>  file between multiple guests while preventing them from consuming
>  Copy-on-Write memory if guests dirty the pages, for example.
>
> That was part of
>
> https://lore.kernel.org/all/20210104171320.575838-3-stefa...@redhat.com/T/#m712f995e6dcfdde433958bae5095b145dd0ee640
>
>  From what I understand, for NVDIMMs we always use
> "-object memory-backend-file,share=on", even when we want a
> readonly NVDIMM.
>
>
> So we have two options:
>
> 1) Change the current behavior of -object 
> memory-backend-file,share=off,readonly=on:
>
> -> Open the file r/o but mmap it writable

Commit 86635aa4e9d627d5142b81c57a33dd1f36627d07 mentions that we don't
want guests to be able to dirty pages on the host. The change you're
proposing would not protect against guests that dirty the memory.

I don't know how important that requirement was (that commit was a
request from Kata Containers).

>
> 2) Add a new property to configure the mmap accessibility. Not a big fan of 
> that.
>
>
> @Stefan, do you have any concern when we would do 1) ?
>
> As far as I can tell, we have to set the nvdimm to "unarmed=on" either way:
>
> +   "unarmed" controls the ACPI NFIT NVDIMM Region Mapping Structure "NVDIMM
> +   State Flags" Bit 3 indicating that the device is "unarmed" and cannot 
> accept
> +   persistent writes. Linux guest drivers set the device to read-only when 
> this
> +   bit is present. Set unarmed to on when the memdev has readonly=on.
>
> So changing the behavior would not really break the nvdimm use case.
>
> Further, we could warn in nvdimm code when we stumble over this configuration 
> with
> unarmed=on.
>
> --
> Cheers,
>
> David / dhildenb
>
>



Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread David Hildenbrand

On 17.08.23 16:41, Peter Xu wrote:

On Thu, Aug 17, 2023 at 11:07:23AM +0200, David Hildenbrand wrote:

@Stefan, see below on a R/O NVDIMM question.

We're discussing how to get a MAP_PRIVATE R/W mapping of a
memory-backend-file running when using R/O files.



This seems a good idea. I am good with the solution you proposed
here as well.


I was just going to get started working on that, when I realized
something important:


"@readonly: if true, the backing file is opened read-only; if false,
  it is opened read-write.  (default: false)"

"@share: if false, the memory is private to QEMU; if true, it is
  shared (default: false)"

So readonly is *all* about the file access mode already ... the mmap()
parameters are just a side-effect of that. Adding a new
"file-access-mode" or similar would be wrong.


Not exactly a side effect, IMHO.  IIUC it's simply because we didn't have a
need of using different perm for memory/file levels.  See the other patch
commit message from Stefan:

https://lore.kernel.org/all/20210104171320.575838-2-stefa...@redhat.com/

 There is currently no way to open(O_RDONLY) and mmap(PROT_READ) when
 [...]

So the goal at that time was to map/open both in RO mode, afaiu.  So one


Good point. And you can have both with "share=on,readonly=on" ever since 
Stefan introduced that flag, which that patch enabled.


Stefan didn't go into details to describe the required interactions 
between MAP_PRIVATE / MAP_SHARED.


[...]


-object memory-backend-file,share=off,readonly=on

BUT, have the mmap be writable (instead of currently readonly).

Assuming we would change the current behavior, what if someone would
specify:

-object memory-backend-file,readonly=on

(because the default is share=off ...) and using it for a R/O NVDIMM,
where we expect any write access to fail.


It will (as expected), right?  fallocate() will just fail on the RO files.


Yes, if the file was opened R/O, any fallocate() will fail.



To be explicit, maybe we should just remember the readonly attribute for a
ramblock and then we can even provide a more meaningful error log, like:


Hah, I have a patch that adds RAM_READONLY :) . But it expresses 
"mmapped shared" not "file opened shared".




diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3df73542e1..f8c11c8d54 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3424,9 +3424,13 @@ int qemu_ram_foreach_block(RAMBlockIterFunc func, void 
*opaque)
  int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
  {
  int ret = -1;
-
  uint8_t *host_startaddr = rb->host + start;
  
+if (rb->flags & RAM_READONLY) {
+// more meaningful error reports (though right now no errp passed in..)
+return -EACCES;
+}


Remembering whether a file is opened R/O might be reasonable to improve 
the error messages.


[...]



But let's look at the commit that added the "readonly" parameter:

commit 86635aa4e9d627d5142b81c57a33dd1f36627d07
Author: Stefan Hajnoczi 
Date:   Mon Jan 4 17:13:19 2021 +

 hostmem-file: add readonly=on|off option
 Let -object memory-backend-file work on read-only files when the
 readonly=on option is given. This can be used to share the contents of a
 file between multiple guests while preventing them from consuming
 Copy-on-Write memory if guests dirty the pages, for example.

That was part of

https://lore.kernel.org/all/20210104171320.575838-3-stefa...@redhat.com/T/#m712f995e6dcfdde433958bae5095b145dd0ee640

 From what I understand, for NVDIMMs we always use
"-object memory-backend-file,share=on", even when we want a
readonly NVDIMM.


So we have two options:

1) Change the current behavior of -object 
memory-backend-file,share=off,readonly=on:

-> Open the file r/o but mmap it writable

2) Add a new property to configure the mmap accessibility. Not a big fan of 
that.


@Stefan, do you have any concern when we would do 1) ?

As far as I can tell, we have to set the nvdimm to "unarmed=on" either way:

+   "unarmed" controls the ACPI NFIT NVDIMM Region Mapping Structure "NVDIMM
+   State Flags" Bit 3 indicating that the device is "unarmed" and cannot accept
+   persistent writes. Linux guest drivers set the device to read-only when this
+   bit is present. Set unarmed to on when the memdev has readonly=on.

So changing the behavior would not really break the nvdimm use case.

Further, we could warn in nvdimm code when we stumble over this configuration 
with
unarmed=on.


I'll leave the nvdimm-specific question to Stefan, but won't this also map
any readonly=on memory backend (besides nvdimm) with the memory
writable, which is still unexpected?


Note that libvirt *never* sets readonly=on for R/O NVDIMMs, and R/O 
NVDIMMs are really the only use case.


I don't think "open file read only" raises the expectation "map it read 
only". It certainly does for shared mappings, but not for private mappings.


To me, the expectation of "open the file read only" is that the 

Re: [PATCH v3 1/7] vdpa: Use iovec for vhost_vdpa_net_load_cmd()

2023-08-17 Thread Hawkins Jiawei
在 2023/8/17 22:05, Eugenio Perez Martin 写道:
> On Thu, Aug 17, 2023 at 2:42 PM Hawkins Jiawei  wrote:
>>
>> On 2023/8/17 17:23, Eugenio Perez Martin wrote:
>>> On Fri, Jul 7, 2023 at 5:27 PM Hawkins Jiawei  wrote:

 According to VirtIO standard, "The driver MUST follow
 the VIRTIO_NET_CTRL_MAC_TABLE_SET command by a le32 number,
 followed by that number of non-multicast MAC addresses,
 followed by another le32 number, followed by that number
 of multicast addresses."

 Considering that this data is not stored in contiguous memory,
 this patch refactors vhost_vdpa_net_load_cmd() to accept
 scattered data, eliminating the need for an additional data copy or
 for packing the data into s->cvq_cmd_out_buffer outside of
 vhost_vdpa_net_load_cmd().

 Signed-off-by: Hawkins Jiawei 
 ---
 v3:
 - rename argument name to `data_sg` and `data_num`
 - use iov_to_buf() suggested by Eugenio

 v2: 
 https://lore.kernel.org/all/6d3dc0fc076564a03501e222ef1102a6a7a643af.1688051252.git.yin31...@gmail.com/
 - refactor vhost_vdpa_load_cmd() to accept iovec suggested by
 Eugenio

net/vhost-vdpa.c | 33 +
1 file changed, 25 insertions(+), 8 deletions(-)

 diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
 index 373609216f..31ef6ad6ec 100644
 --- a/net/vhost-vdpa.c
 +++ b/net/vhost-vdpa.c
 @@ -620,29 +620,38 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState 
 *s, size_t out_len,
}

static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
 -   uint8_t cmd, const void *data,
 -   size_t data_size)
 +   uint8_t cmd, const struct iovec 
 *data_sg,
 +   size_t data_num)
{
const struct virtio_net_ctrl_hdr ctrl = {
.class = class,
.cmd = cmd,
};
 +size_t data_size = iov_size(data_sg, data_num);

assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - 
 sizeof(ctrl));

 +/* pack the CVQ command header */
    memcpy(s->cvq_cmd_out_buffer, &ctrl, sizeof(ctrl));
 -memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size);

 -return vhost_vdpa_net_cvq_add(s, sizeof(ctrl) + data_size,
 +/* pack the CVQ command command-specific-data */
 +iov_to_buf(data_sg, data_num, 0,
 +   s->cvq_cmd_out_buffer + sizeof(ctrl), data_size);
 +
 +return vhost_vdpa_net_cvq_add(s, data_size + sizeof(ctrl),
>>>
>>> Nit, any reason for changing the order of the addends? sizeof(ctrl) +
>>> data_size ?
>>
>> Hi Eugenio,
>>
>> Here the code should be changed to `sizeof(ctrl) + data_size` as you
>> point out.
>>
>> Since this patch series has already been merged into master, I will
>> submit a separate patch to correct this problem.
>>
>
> Ouch, I didn't realize that. No need to make it back again, I was just
> trying to reduce lines changed.

Ok, I got it. Regardless, thank you for your review!


>
>>>
  sizeof(virtio_net_ctrl_ack));
}

static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet 
 *n)
{
if (virtio_vdev_has_feature(>parent_obj, 
 VIRTIO_NET_F_CTRL_MAC_ADDR)) {
 +const struct iovec data = {
 +.iov_base = (void *)n->mac,
>>>
>>> Assign to void should always be valid, no need for casting here.
>>
>> Yes, assign to void should be valid for normal pointers.
>>
>> However, `n->mac` is an array and is treated as a const pointer. It will
>> trigger the warning "error: initialization discards ‘const’ qualifier
>> from pointer target type" if we don't add this cast.
>>
>
> Got it, I didn't realize it. Everything is ok then.
>
> Thanks!
>
>> Thanks!
>>
>>
>>>
 +.iov_len = sizeof(n->mac),
 +};
ssize_t dev_written = vhost_vdpa_net_load_cmd(s, 
 VIRTIO_NET_CTRL_MAC,
  
 VIRTIO_NET_CTRL_MAC_ADDR_SET,
 -  n->mac, sizeof(n->mac));
 +  &data, 1);
if (unlikely(dev_written < 0)) {
return dev_written;
}
 @@ -665,9 +674,13 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s,
}

mq.virtqueue_pairs = cpu_to_le16(n->curr_queue_pairs);
 +const struct iovec data = {
 +.iov_base = &mq,
 +.iov_len = sizeof(mq),
 +};
dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MQ,
 -  
 VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, ,
 -   

Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread David Hildenbrand

On 17.08.23 16:45, Daniel P. Berrangé wrote:

On Thu, Aug 17, 2023 at 04:37:52PM +0200, David Hildenbrand wrote:

On 17.08.23 16:37, Daniel P. Berrangé wrote:

On Thu, Aug 17, 2023 at 04:30:16PM +0200, David Hildenbrand wrote:



@Stefan, do you have any concern when we would do 1) ?

As far as I can tell, we have to set the nvdimm to "unarmed=on" either way:

+   "unarmed" controls the ACPI NFIT NVDIMM Region Mapping Structure "NVDIMM
+   State Flags" Bit 3 indicating that the device is "unarmed" and cannot accept
+   persistent writes. Linux guest drivers set the device to read-only when this
+   bit is present. Set unarmed to on when the memdev has readonly=on.

So changing the behavior would not really break the nvdimm use case.


Looking into the details, this seems to be the right thing to do.

This is what I have now as patch description, that also highlights how libvirt
doesn't even make use of readonly=true.


  From 42f272ace68e0cd660a8448adb5aefb3b9dd7005 Mon Sep 17 00:00:00 2001
From: David Hildenbrand 
Date: Thu, 17 Aug 2023 12:09:07 +0200
Subject: [PATCH v2 2/4] backends/hostmem-file: Make share=off,readonly=on
   result in RAM instead of ROM

For now, "share=off,readonly=on" would always result in us opening the
file R/O and mmap'ing the opened file MAP_PRIVATE R/O -- effectively
turning it into ROM.

As documented, readonly only specifies that we want to open the file R/O:

  @readonly: if true, the backing file is opened read-only; if false,
  it is opened read-write.  (default: false)

Especially for VM templating, "share=off" is a common use case. However,
that use case is impossible with files that lack write permissions,
because "share=off,readonly=off" will fail opening the file, and
"share=off,readonly=on" will give us ROM instead of RAM.

With MAP_PRIVATE we can easily open the file R/O and mmap it R/W, to
turn it into COW RAM: private changes don't affect the file after all and
don't require write permissions.
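
(Rough usage sketch only -- the paths are placeholders and the rest of the
command line is elided; this assumes the machine RAM is taken from the backend
via the machine's memory-backend property:)

  qemu-system-x86_64 \
    -m 4G \
    -object memory-backend-file,id=pc.ram,size=4G,mem-path=/path/to/template-ram,share=off,readonly=on \
    -machine q35,memory-backend=pc.ram \
    ...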

This implies that we only get ROM now via "share=on,readonly=on".
"share=off,readonly=on" will give us RAM.

The sole user of ROM via memory-backend-file are R/O NVDIMMs. They
also require "unarmed=on" to be set for the nvdimm device.

With this change, R/O NVDIMMs will continue working even if
"share=off,readonly=on" was specified similar to when simply
providing ordinary RAM to the nvdimm device and setting "unarmed=on".

Note that libvirt seems to default for a "readonly" nvdimm to
* -object memory-backend-file,share=off (implying readonly=off)
* -device nvdimm,unarmed=on
And never seems to even set "readonly=on" for memory-backend-file. So
this change won't affect libvirt, they already always get COW RAM -- not
modifying the underlying file but opening it R/O.

If someone really wants ROM, they can just use "share=on,readonly=on".
After all, there is no relevant difference between a R/O MAP_SHARED
file mapping and a R/O MAP_PRIVATE file mapping.

Signed-off-by: David Hildenbrand 


This still leaves the patch having a warn_report() which I think is
undesirable to emit in a valid / supported use case.


No warning.

Please elaborate on "valid/supported use case".


The usage scenario that this patch aims to enable. IIUC, it will follow
the codepath that leads to the warn_report() call in this patch.


It shouldn't but I will double check!

--
Cheers,

David / dhildenb




Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread Daniel P . Berrangé
On Thu, Aug 17, 2023 at 04:37:52PM +0200, David Hildenbrand wrote:
> On 17.08.23 16:37, Daniel P. Berrangé wrote:
> > On Thu, Aug 17, 2023 at 04:30:16PM +0200, David Hildenbrand wrote:
> > > 
> > > > @Stefan, do you have any concern when we would do 1) ?
> > > > 
> > > > As far as I can tell, we have to set the nvdimm to "unarmed=on" either 
> > > > way:
> > > > 
> > > > +   "unarmed" controls the ACPI NFIT NVDIMM Region Mapping Structure 
> > > > "NVDIMM
> > > > +   State Flags" Bit 3 indicating that the device is "unarmed" and 
> > > > cannot accept
> > > > +   persistent writes. Linux guest drivers set the device to read-only 
> > > > when this
> > > > +   bit is present. Set unarmed to on when the memdev has readonly=on.
> > > > 
> > > > So changing the behavior would not really break the nvdimm use case.
> > > 
> > > Looking into the details, this seems to be the right thing to do.
> > > 
> > > This is what I have now as patch description, that also highlights how 
> > > libvirt
> > > doesn't even make use of readonly=true.
> > > 
> > > 
> > >  From 42f272ace68e0cd660a8448adb5aefb3b9dd7005 Mon Sep 17 00:00:00 2001
> > > From: David Hildenbrand 
> > > Date: Thu, 17 Aug 2023 12:09:07 +0200
> > > Subject: [PATCH v2 2/4] backends/hostmem-file: Make share=off,readonly=on
> > >   result in RAM instead of ROM
> > > 
> > > For now, "share=off,readonly=on" would always result in us opening the
> > > file R/O and mmap'ing the opened file MAP_PRIVATE R/O -- effectively
> > > turning it into ROM.
> > > 
> > > As documented, readonly only specifies that we want to open the file R/O:
> > > 
> > >  @readonly: if true, the backing file is opened read-only; if false,
> > >  it is opened read-write.  (default: false)
> > > 
> > > Especially for VM templating, "share=off" is a common use case. However,
> > > that use case is impossible with files that lack write permissions,
> > > because "share=off,readonly=off" will fail opening the file, and
> > > "share=off,readonly=on" will give us ROM instead of RAM.
> > > 
> > > With MAP_PRIVATE we can easily open the file R/O and mmap it R/W, to
> > > turn it into COW RAM: private changes don't affect the file after all and
> > > don't require write permissions.
> > > 
> > > This implies that we only get ROM now via "share=on,readonly=on".
> > > "share=off,readonly=on" will give us RAM.
> > > 
> > > The sole user of ROM via memory-backend-file are R/O NVDIMMs. They
> > > also require "unarmed=on" to be set for the nvdimm device.
> > > 
> > > With this change, R/O NVDIMMs will continue working even if
> > > "share=off,readonly=on" was specified similar to when simply
> > > providing ordinary RAM to the nvdimm device and setting "unarmed=on".
> > > 
> > > Note that libvirt seems to default for a "readonly" nvdimm to
> > > * -object memory-backend-file,share=off (implying readonly=off)
> > > * -device nvdimm,unarmed=on
> > > And never seems to even set "readonly=on" for memory-backend-file. So
> > > this change won't affect libvirt, they already always get COW RAM -- not
> > > modifying the underlying file but opening it R/O.
> > > 
> > > If someone really wants ROM, they can just use "share=on,readonly=on".
> > > After all, there is not relevant difference between a R/O MAP_SHARED
> > > file mapping and a R/O MAP_PRIVATE file mapping.
> > > 
> > > Signed-off-by: David Hildenbrand 
> > 
> > This still leaves the patch having a warn_report() which I think is
> > undesirable to emit in a valid / supported use case.
> 
> No warning.
> 
> Please elaborate on "valid/supported use case".

The usage scenario that this patch aims to enable. IIUC, it will follow
the codepath that leads to the warn_report() call in this patch.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread Peter Xu
On Thu, Aug 17, 2023 at 11:07:23AM +0200, David Hildenbrand wrote:
> @Stefan, see below on a R/O NVDIMM question.
> 
> We're discussing how to get a MAP_PRIVATE R/W mapping of a
> memory-backend-file running when using R/O files.
> 
> > 
> > This seems a good idea. I am good with the solution you proposed
> > here as well.
> 
> I was just going to get started working on that, when I realized
> something important:
> 
> 
> "@readonly: if true, the backing file is opened read-only; if false,
>  it is opened read-write.  (default: false)"
> 
> "@share: if false, the memory is private to QEMU; if true, it is
>  shared (default: false)"
> 
> So readonly is *all* about the file access mode already ... the mmap()
> parameters are just a side-effect of that. Adding a new
> "file-access-mode" or similar would be wrong.

Not exactly a side effect, IMHO.  IIUC it's simply because we didn't have a
need of using different perm for memory/file levels.  See the other patch
commit message from Stefan:

https://lore.kernel.org/all/20210104171320.575838-2-stefa...@redhat.com/

There is currently no way to open(O_RDONLY) and mmap(PROT_READ) when
[...]

So the goal at that time was to map/open both in RO mode, afaiu.  So one
parameter was enough at that time.  It doesn't necessarily have to apply only
to the file permission or to the memory; in reality/code it does apply to both
for now, until we see a need to differentiate them for CoW purposes.

> 
> 
> Here are the combinations we have right now:
> 
> -object memory-backend-file,share=on,readonly=on
> 
>  -> Existing behavior: Open readonly, mmap readonly shared
>  -> Makes sense, mmap'ing readwrite would fail
> 
> -object memory-backend-file,share=on,readonly=off
> 
>  -> Existing behavior: Open readwrite, mmap readwrite shared
>  -> Mostly makes sense: why open a shared file R/W and not mmap it
> R/W?
> 
> -object memory-backend-file,share=off,readonly=off
>  -> Existing behavior: Open readwrite, mmap readwrite private
>  -> Mostly makes sense: why open a file R/W and not map it R/W (even if
> private)?
> 
> -object memory-backend-file,share=off,readonly=on
>  -> Existing behavior: Open readonly, mmap readonly private
>  -> That's the problematic one
> 
> 
> So for your use case (VM templating using a readonly file), you
> would actually want to use:
> 
> -object memory-backend-file,share=off,readonly=on
> 
> BUT, have the mmap be writable (instead of currently readonly).
> 
> Assuming we would change the current behavior, what if someone would
> specify:
> 
> -object memory-backend-file,readonly=on
> 
> (because the default is share=off ...) and using it for a R/O NVDIMM,
> where we expect any write access to fail.

It will (as expected), right?  fallocate() will just fail on the RO files.

To be explicit, maybe we should just remember the readonly attribute for a
ramblock and then we can even provide a more meaningful error log, like:

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3df73542e1..f8c11c8d54 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3424,9 +3424,13 @@ int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
 int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
 {
     int ret = -1;
-
     uint8_t *host_startaddr = rb->host + start;
 
+    if (rb->flags & RAM_READONLY) {
+        // more meaningful error reports (though right now no errp passed in..)
+        return -EACCES;
+    }
+

I see that Stefan even raised this question in the commit log:

No new RAMBlock flag is introduced for read-only because it's unclear
whether RAMBlocks need to know that they are read-only. Pass a bool
readonly argument instead.

Right now failing at fallocate() isn't that bad to me.

> 
> 
> But let's look at the commit that added the "readonly" parameter:
> 
> commit 86635aa4e9d627d5142b81c57a33dd1f36627d07
> Author: Stefan Hajnoczi 
> Date:   Mon Jan 4 17:13:19 2021 +
> 
> hostmem-file: add readonly=on|off option
> Let -object memory-backend-file work on read-only files when the
> readonly=on option is given. This can be used to share the contents of a
> file between multiple guests while preventing them from consuming
> Copy-on-Write memory if guests dirty the pages, for example.
> 
> That was part of
> 
> https://lore.kernel.org/all/20210104171320.575838-3-stefa...@redhat.com/T/#m712f995e6dcfdde433958bae5095b145dd0ee640
> 
> From what I understand, for NVDIMMs we always use
> "-object memory-backend-file,share=on", even when we want a
> readonly NVDIMM.
> 
> 
> So we have two options:
> 
> 1) Change the current behavior of -object 
> memory-backend-file,share=off,readonly=on:
> 
> -> Open the file r/o but mmap it writable
> 
> 2) Add a new property to configure the mmap accessibility. Not a big fan of 
> that.
> 
> 
> @Stefan, do you have any concern when we would do 1) ?
> 
> As far as I can tell, we have to set the nvdimm to "unarmed=on" either way:
> 
> +   "unarmed" 

Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread David Hildenbrand

On 17.08.23 16:37, Daniel P. Berrangé wrote:

On Thu, Aug 17, 2023 at 04:30:16PM +0200, David Hildenbrand wrote:



@Stefan, do you have any concern when we would do 1) ?

As far as I can tell, we have to set the nvdimm to "unarmed=on" either way:

+   "unarmed" controls the ACPI NFIT NVDIMM Region Mapping Structure "NVDIMM
+   State Flags" Bit 3 indicating that the device is "unarmed" and cannot accept
+   persistent writes. Linux guest drivers set the device to read-only when this
+   bit is present. Set unarmed to on when the memdev has readonly=on.

So changing the behavior would not really break the nvdimm use case.


Looking into the details, this seems to be the right thing to do.

This is what I have now as patch description, that also highlights how libvirt
doesn't even make use of readonly=true.


 From 42f272ace68e0cd660a8448adb5aefb3b9dd7005 Mon Sep 17 00:00:00 2001
From: David Hildenbrand 
Date: Thu, 17 Aug 2023 12:09:07 +0200
Subject: [PATCH v2 2/4] backends/hostmem-file: Make share=off,readonly=on
  result in RAM instead of ROM

For now, "share=off,readonly=on" would always result in us opening the
file R/O and mmap'ing the opened file MAP_PRIVATE R/O -- effectively
turning it into ROM.

As documented, readonly only specifies that we want to open the file R/O:

 @readonly: if true, the backing file is opened read-only; if false,
 it is opened read-write.  (default: false)

Especially for VM templating, "share=off" is a common use case. However,
that use case is impossible with files that lack write permissions,
because "share=off,readonly=off" will fail opening the file, and
"share=off,readonly=on" will give us ROM instead of RAM.

With MAP_PRIVATE we can easily open the file R/O and mmap it R/W, to
turn it into COW RAM: private changes don't affect the file after all and
don't require write permissions.

This implies that we only get ROM now via "share=on,readonly=on".
"share=off,readonly=on" will give us RAM.

The sole user of ROM via memory-backend-file is R/O NVDIMMs. They
also require "unarmed=on" to be set for the nvdimm device.

With this change, R/O NVDIMMs will continue working even if
"share=off,readonly=on" was specified, similar to when simply
providing ordinary RAM to the nvdimm device and setting "unarmed=on".

Note that libvirt seems to default for a "readonly" nvdimm to
* -object memory-backend-file,share=off (implying readonly=off)
* -device nvdimm,unarmed=on
And never seems to even set "readonly=on" for memory-backend-file. So
this change won't affect libvirt, they already always get COW RAM -- not
modifying the underlying file but opening it R/O.

If someone really wants ROM, they can just use "share=on,readonly=on".
After all, there is no relevant difference between a R/O MAP_SHARED
file mapping and a R/O MAP_PRIVATE file mapping.

Signed-off-by: David Hildenbrand 


This still leaves the patch having a warn_report() which I think is
undesirable to emit in a valid / supported use case.


No warning.

Please elaborate on "valid/supported use case".

--
Cheers,

David / dhildenb
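

A minimal standalone sketch (not QEMU code; the file path is made up) of the COW
mapping discussed above: open the file read-only, then mmap it MAP_PRIVATE with
PROT_READ|PROT_WRITE. Private writes succeed without write permission on the file,
while ftruncate()/fallocate() on the read-only fd would fail:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/tmp/template.ram";   /* hypothetical template file */
        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        off_t size = lseek(fd, 0, SEEK_END);
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(p, 0xaa, size);      /* private COW writes succeed               */
        /* ftruncate(fd, ...) or fallocate(fd, ...) would fail: fd is read-only */
        munmap(p, size);
        close(fd);
        return 0;
    }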




Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread Daniel P . Berrangé
On Thu, Aug 17, 2023 at 04:30:16PM +0200, David Hildenbrand wrote:
> 
> > @Stefan, do you have any concern when we would do 1) ?
> > 
> > As far as I can tell, we have to set the nvdimm to "unarmed=on" either way:
> > 
> > +   "unarmed" controls the ACPI NFIT NVDIMM Region Mapping Structure "NVDIMM
> > +   State Flags" Bit 3 indicating that the device is "unarmed" and cannot 
> > accept
> > +   persistent writes. Linux guest drivers set the device to read-only when 
> > this
> > +   bit is present. Set unarmed to on when the memdev has readonly=on.
> > 
> > So changing the behavior would not really break the nvdimm use case.
> 
> Looking into the details, this seems to be the right thing to do.
> 
> This is what I have now as patch description, that also highlights how libvirt
> doesn't even make use of readonly=true.
> 
> 
> From 42f272ace68e0cd660a8448adb5aefb3b9dd7005 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand 
> Date: Thu, 17 Aug 2023 12:09:07 +0200
> Subject: [PATCH v2 2/4] backends/hostmem-file: Make share=off,readonly=on
>  result in RAM instead of ROM
> 
> For now, "share=off,readonly=on" would always result in us opening the
> file R/O and mmap'ing the opened file MAP_PRIVATE R/O -- effectively
> turning it into ROM.
> 
> As documented, readonly only specifies that we want to open the file R/O:
> 
> @readonly: if true, the backing file is opened read-only; if false,
> it is opened read-write.  (default: false)
> 
> Especially for VM templating, "share=off" is a common use case. However,
> that use case is impossible with files that lack write permissions,
> because "share=off,readonly=off" will fail opening the file, and
> "share=off,readonly=on" will give us ROM instead of RAM.
> 
> With MAP_PRIVATE we can easily open the file R/O and mmap it R/W, to
> turn it into COW RAM: private changes don't affect the file after all and
> don't require write permissions.
> 
> This implies that we only get ROM now via "share=on,readonly=on".
> "share=off,readonly=on" will give us RAM.
> 
> The sole user of ROM via memory-backend-file is R/O NVDIMMs. They
> also require "unarmed=on" to be set for the nvdimm device.
> 
> With this change, R/O NVDIMMs will continue working even if
> "share=off,readonly=on" was specified, similar to when simply
> providing ordinary RAM to the nvdimm device and setting "unarmed=on".
> 
> Note that libvirt seems to default for a "readonly" nvdimm to
> * -object memory-backend-file,share=off (implying readonly=off)
> * -device nvdimm,unarmed=on
> And never seems to even set "readonly=on" for memory-backend-file. So
> this change won't affect libvirt, they already always get COW RAM -- not
> modifying the underlying file but opening it R/O.
> 
> If someone really wants ROM, they can just use "share=on,readonly=on".
> After all, there is no relevant difference between a R/O MAP_SHARED
> file mapping and a R/O MAP_PRIVATE file mapping.
> 
> Signed-off-by: David Hildenbrand 

This still leaves the patch having a warn_report() which I think is
undesirable to emit in a valid / supported use case.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] migrate/ram: let ram_save_target_page_legacy() return if qemu file got error

2023-08-17 Thread Guoyi Tu

I apologize for the previous email being cut off. I am resending it here.

It sounds very reasonable. The return value of the QEMUFile interface
cannot accurately reflect the actual situation, and the way these
interfaces are called during the migration process is also a
little bit weird.

I'm glad to see that you have plans to improve these interfaces. If you
need any assistance, I'd be more than happy to be involved.

On 2023/8/16 23:15, Fabiano Rosas wrote:

Peter Xu  writes:


On Tue, Aug 15, 2023 at 07:42:24PM -0300, Fabiano Rosas wrote:

Yep, I see that. I meant explicitly move the code into the loop. Feels a
bit weird to check the QEMUFile for errors first thing inside the
function when nothing around it should have touched the QEMUFile.


Valid point.  This reminded me that now we have one indirection into
->ram_save_target_page() which is a hook now.  Putting it in the caller will
work for all hooks, even though they don't exist yet.

But since we don't have any other hooks yet, it'll be the same for now.

Acked-by: Peter Xu 

For the long term: there's one more reason to rework qemu_put_byte()/... to
return error codes.  Then things like save_normal_page() can already simply
return negatives when they hit an error.

Fabiano - I see that you've done quite a few patches in reworking migration
code.  I had that for a long time in my todo, but if you're interested feel
free to look into it.

IIUC the idea is to introduce another similar layer of API for qemufile (I'd
call it qemu_put_1|2|4|8(), or anything better you can come up with..), then
let migration switch over to it, with retval reflecting errors.  Then we
should be able to drop this patch along with most of the explicit error
checks for the qemufile spread all over.


I was just ranting about this situation in another thread! Yes, we need
something like that. QEMUFile errors should only be set by code doing
actual IO and if we want to store the error for other parts of the code
to use, that should be another interface.

While reviewing this patch I noticed we have stuff like this:

pages = ram_find_and_save_block()
...
if (pages < 0) {
 qemu_file_set_error(f, pages);
 break;
}

So the low-level code sets the error, ram_save_target_page_legacy() sees
it and returns -1, and this^ code loses all track of the initial error
and inadvertently turns it into -EPERM!

I'll try to find some time to start cleaning this up
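

A rough sketch of the API shape being discussed, assuming hypothetical names
(qemu_put_4() is only a placeholder, not an existing QEMU interface); the put
helper returns 0 or the negative errno latched in the QEMUFile, so callers can
propagate the real error instead of a bare -1 (which reads as -EPERM):

    /* hypothetical retval-carrying wrapper around the existing API */
    static int qemu_put_4(QEMUFile *f, uint32_t v)
    {
        qemu_put_be32(f, v);             /* existing call, returns void */
        return qemu_file_get_error(f);   /* 0 or a negative errno       */
    }

    static int save_header_checked(QEMUFile *f, uint32_t header)
    {
        int ret = qemu_put_4(f, header);
        if (ret < 0) {
            return ret;                  /* keep -EIO/-EPIPE etc., not -1 */
        }
        return 0;
    }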




Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread David Hildenbrand




@Stefan, do you have any concern when we would do 1) ?

As far as I can tell, we have to set the nvdimm to "unarmed=on" either way:

+   "unarmed" controls the ACPI NFIT NVDIMM Region Mapping Structure "NVDIMM
+   State Flags" Bit 3 indicating that the device is "unarmed" and cannot accept
+   persistent writes. Linux guest drivers set the device to read-only when this
+   bit is present. Set unarmed to on when the memdev has readonly=on.

So changing the behavior would not really break the nvdimm use case.


Looking into the details, this seems to be the right thing to do.

This is what I have now as patch description, that also highlights how libvirt
doesn't even make use of readonly=true.


From 42f272ace68e0cd660a8448adb5aefb3b9dd7005 Mon Sep 17 00:00:00 2001
From: David Hildenbrand 
Date: Thu, 17 Aug 2023 12:09:07 +0200
Subject: [PATCH v2 2/4] backends/hostmem-file: Make share=off,readonly=on
 result in RAM instead of ROM

For now, "share=off,readonly=on" would always result in us opening the
file R/O and mmap'ing the opened file MAP_PRIVATE R/O -- effectively
turning it into ROM.

As documented, readonly only specifies that we want to open the file R/O:

@readonly: if true, the backing file is opened read-only; if false,
it is opened read-write.  (default: false)

Especially for VM templating, "share=off" is a common use case. However,
that use case is impossible with files that lack write permissions,
because "share=off,readonly=off" will fail opening the file, and
"share=off,readonly=on" will give us ROM instead of RAM.

With MAP_PRIVATE we can easily open the file R/O and mmap it R/W, to
turn it into COW RAM: private changes don't affect the file after all and
don't require write permissions.

This implies that we only get ROM now via "share=on,readonly=on".
"share=off,readonly=on" will give us RAM.

The sole user of ROM via memory-backend-file is R/O NVDIMMs. They
also require "unarmed=on" to be set for the nvdimm device.

With this change, R/O NVDIMMs will continue working even if
"share=off,readonly=on" was specified, similar to when simply
providing ordinary RAM to the nvdimm device and setting "unarmed=on".

Note that libvirt seems to default for a "readonly" nvdimm to
* -object memory-backend-file,share=off (implying readonly=off)
* -device nvdimm,unarmed=on
And never seems to even set "readonly=on" for memory-backend-file. So
this change won't affect libvirt, they already always get COW RAM -- not
modifying the underlying file but opening it R/O.

If someone really wants ROM, they can just use "share=on,readonly=on".
After all, there is no relevant difference between a R/O MAP_SHARED
file mapping and a R/O MAP_PRIVATE file mapping.

Signed-off-by: David Hildenbrand 

--
Cheers,

David / dhildenb




Re: [PATCH] migrate/ram: let ram_save_target_page_legacy() return if qemu file got error

2023-08-17 Thread Guoyi Tu
Thank you for the reminder. There might be some issues with the 
company's email service. I also noticed this morning that I missed 
receiving an email in response from Fabiano.



On 2023/8/17 21:35, Peter Xu wrote:

On Thu, Aug 17, 2023 at 10:19:19AM +0800, Guoyi Tu wrote:



On 2023/8/16 23:15, Fabiano Rosas wrote:

Peter Xu  writes:


On Tue, Aug 15, 2023 at 07:42:24PM -0300, Fabiano Rosas wrote:

Yep, I see that. I meant explicitly move the code into the loop. Feels a
bit weird to check the QEMUFile for errors first thing inside the
function when nothing around it should have touched the QEMUFile.


Valid point.  This reminded me that now we have one indirection into
->ram_save_target_page() which is a hook now.  Putting it in the caller will
work for all hooks, even though they don't exist yet.

But since we don't have any other hooks yet, it'll be the same for now


Guoyi,

Your email got cut from here.  Same thing happened on emails from Hyman
(also sent from a China Telecom email address); maybe your mail system did
something wrong.





Re: [PATCH v1 2/2] ui/vdagent: Unregister input handler of mouse during finalization

2023-08-17 Thread Marc-André Lureau
On Thu, Aug 17, 2023 at 6:24 PM  wrote:
>
> From: Guoyi Tu 
>
> The input handler resource should be released when the
> VDAgentChardev object is finalized.
>
> Signed-off-by: Guoyi Tu 
> Signed-off-by: dengpengcheng 

Reviewed-by: Marc-André Lureau 

> ---
>  ui/vdagent.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/ui/vdagent.c b/ui/vdagent.c
> index 4b9a1fb7c5..00d36a8677 100644
> --- a/ui/vdagent.c
> +++ b/ui/vdagent.c
> @@ -926,6 +926,9 @@ static void vdagent_chr_fini(Object *obj)
>
>  migrate_del_blocker(vd->migration_blocker);
>  vdagent_disconnect(vd);
> +if (vd->mouse_hs) {
> +qemu_input_handler_unregister(vd->mouse_hs);
> +}
>  buffer_free(&vd->outbuf);
>  error_free(vd->migration_blocker);
>  }
> --
> 2.27.0
>
>


-- 
Marc-André Lureau



Re: [PATCH v1 1/2] ui/vdagent: call vdagent_disconnect() when agent connection is lost

2023-08-17 Thread Marc-André Lureau
On Thu, Aug 17, 2023 at 6:24 PM  wrote:
>
> From: Guoyi Tu 
>
> When the agent connection is lost, the input handler of the mouse
> doesn't deactivate, which results in unresponsive mouse events in
> VNC windows.
>
> To fix this issue, call vdagent_disconnect() to reset the state
> each time the frontend disconnects.
>
> Signed-off-by: Guoyi Tu 
> Signed-off-by: dengpengcheng 

Reviewed-by: Marc-André Lureau 

> ---
>  ui/vdagent.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/ui/vdagent.c b/ui/vdagent.c
> index 8a651492f0..4b9a1fb7c5 100644
> --- a/ui/vdagent.c
> +++ b/ui/vdagent.c
> @@ -870,8 +870,11 @@ static void vdagent_disconnect(VDAgentChardev *vd)
>
>  static void vdagent_chr_set_fe_open(struct Chardev *chr, int fe_open)
>  {
> +VDAgentChardev *vd = QEMU_VDAGENT_CHARDEV(chr);
> +
>  if (!fe_open) {
>  trace_vdagent_close();
> +vdagent_disconnect(vd);
> >  /* To reset_serial, we CLOSED our side. Make sure the other end knows we
> >   * are ready again. */
>  qemu_chr_be_event(chr, CHR_EVENT_OPENED);
> --
> 2.27.0
>
>


-- 
Marc-André Lureau



Re: [PATCH v3 2/7] vdpa: Restore MAC address filtering state

2023-08-17 Thread Eugenio Perez Martin
On Thu, Aug 17, 2023 at 2:47 PM Hawkins Jiawei  wrote:
>
> On 2023/8/17 18:18, Eugenio Perez Martin wrote:
> > On Fri, Jul 7, 2023 at 5:27 PM Hawkins Jiawei  wrote:
> >>
> >> This patch refactors vhost_vdpa_net_load_mac() to
> >> restore the MAC address filtering state at device's startup.
> >>
> >> Signed-off-by: Hawkins Jiawei 
> >> ---
> >> v3:
> >>- return early if mismatch the condition suggested by Eugenio
> >>
> >> v2: 
> >> https://lore.kernel.org/all/2f2560f749186c0eb1055f9926f464587e419eeb.1688051252.git.yin31...@gmail.com/
> >>- use iovec suggested by Eugenio
> >>- avoid sending CVQ command in default state
> >>
> >> v1: 
> >> https://lore.kernel.org/all/00f72fe154a882fd6dc15bc39e3a1ac63f9dadce.1687402580.git.yin31...@gmail.com/
> >>
> >>   net/vhost-vdpa.c | 52 
> >>   1 file changed, 52 insertions(+)
> >>
> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> >> index 31ef6ad6ec..7189ccafaf 100644
> >> --- a/net/vhost-vdpa.c
> >> +++ b/net/vhost-vdpa.c
> >> @@ -660,6 +660,58 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState *s, 
> >> const VirtIONet *n)
> >>   }
> >>   }
> >>
> >> +/*
> >> + * According to VirtIO standard, "The device MUST have an
> >> + * empty MAC filtering table on reset.".
> >> + *
> >> + * Therefore, there is no need to send this CVQ command if the
> >> + * driver also sets an empty MAC filter table, which aligns with
> >> + * the device's defaults.
> >> + *
> >> + * Note that the device's defaults can mismatch the driver's
> >> + * configuration only at live migration.
> >> + */
> >> +if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_RX) ||
> >> +n->mac_table.in_use == 0) {
> >> +return 0;
> >> +}
> >> +
> >> +uint32_t uni_entries = n->mac_table.first_multi,
> >
> > QEMU coding style prefers declarations at the beginning of the code
> > block. Previous uses of these variable names would need to be
> > refactored to met this rule.
>
> Hi Eugenio,
>
> Thanks for the detailed explanation.
>
> Since this patch series has already been merged into master, I will
> submit a separate patch to correct this problem.
>
> I will take care of this problem in the future.
>

If the maintainer is ok with this, I'm totally ok with leaving the
code as it is right now.

Thanks!

> Thanks!
>
>
> >
> > Apart from that,
> >
> > Acked-by: Eugenio Pérez 
> >
> >> + uni_macs_size = uni_entries * ETH_ALEN,
> >> + mul_entries = n->mac_table.in_use - uni_entries,
> >> + mul_macs_size = mul_entries * ETH_ALEN;
> >> +struct virtio_net_ctrl_mac uni = {
> >> +.entries = cpu_to_le32(uni_entries),
> >> +};
> >> +struct virtio_net_ctrl_mac mul = {
> >> +.entries = cpu_to_le32(mul_entries),
> >> +};
> >> +const struct iovec data[] = {
> >> +{
> >> +.iov_base = &uni,
> >> +.iov_len = sizeof(uni),
> >> +}, {
> >> +.iov_base = n->mac_table.macs,
> >> +.iov_len = uni_macs_size,
> >> +}, {
> >> +.iov_base = &mul,
> >> +.iov_len = sizeof(mul),
> >> +}, {
> >> +.iov_base = &n->mac_table.macs[uni_macs_size],
> >> +.iov_len = mul_macs_size,
> >> +},
> >> +};
> >> +ssize_t dev_written = vhost_vdpa_net_load_cmd(s,
> >> +VIRTIO_NET_CTRL_MAC,
> >> +VIRTIO_NET_CTRL_MAC_TABLE_SET,
> >> +data, ARRAY_SIZE(data));
> >> +if (unlikely(dev_written < 0)) {
> >> +return dev_written;
> >> +}
> >> +if (*s->status != VIRTIO_NET_OK) {
> >> +return -EIO;
> >> +}
> >> +
> >>   return 0;
> >>   }
> >>
> >> --
> >> 2.25.1
> >>
> >
>
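
For reference, a tiny sketch of the coding-style point above (declarations at the
start of the block), reusing the same VirtIONet fields as the patch; the function
name is only illustrative:

    static uint32_t example_uni_macs_size(const VirtIONet *n)
    {
        uint32_t uni_entries, uni_macs_size;    /* declared up front ...               */

        if (n->mac_table.in_use == 0) {
            return 0;
        }
        uni_entries = n->mac_table.first_multi; /* ... assigned after the early return */
        uni_macs_size = uni_entries * ETH_ALEN;
        return uni_macs_size;
    }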




[PATCH v1 0/2] ui/vdagent: Fix two bugs about disconnect event handling

2023-08-17 Thread tugy
From: Guoyi Tu 

and resource leak

Guoyi Tu (2):
  ui/vdagent: call vdagent_disconnect() when agent connection is lost
  ui/vdagent: Unregister input handler of mouse during finalization

 ui/vdagent.c | 6 ++
 1 file changed, 6 insertions(+)

-- 
2.27.0




[PATCH v1 2/2] ui/vdagent: Unregister input handler of mouse during finalization

2023-08-17 Thread tugy
From: Guoyi Tu 

The input handler resource should be released when the
VDAgentChardev object is finalized.

Signed-off-by: Guoyi Tu 
Signed-off-by: dengpengcheng 
---
 ui/vdagent.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ui/vdagent.c b/ui/vdagent.c
index 4b9a1fb7c5..00d36a8677 100644
--- a/ui/vdagent.c
+++ b/ui/vdagent.c
@@ -926,6 +926,9 @@ static void vdagent_chr_fini(Object *obj)
 
 migrate_del_blocker(vd->migration_blocker);
 vdagent_disconnect(vd);
+if (vd->mouse_hs) {
+qemu_input_handler_unregister(vd->mouse_hs);
+}
 buffer_free(&vd->outbuf);
 error_free(vd->migration_blocker);
 }
-- 
2.27.0




[PATCH v1 1/2] ui/vdagent: call vdagent_disconnect() when agent connection is lost

2023-08-17 Thread tugy
From: Guoyi Tu 

When the agent connection is lost, the input handler of the mouse
doesn't deactivate, which results in unresponsive mouse events in
VNC windows.

To fix this issue, call vdagent_disconnect() to reset the state
each time the frontend disconnects.

Signed-off-by: Guoyi Tu 
Signed-off-by: dengpengcheng 
---
 ui/vdagent.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ui/vdagent.c b/ui/vdagent.c
index 8a651492f0..4b9a1fb7c5 100644
--- a/ui/vdagent.c
+++ b/ui/vdagent.c
@@ -870,8 +870,11 @@ static void vdagent_disconnect(VDAgentChardev *vd)
 
 static void vdagent_chr_set_fe_open(struct Chardev *chr, int fe_open)
 {
+VDAgentChardev *vd = QEMU_VDAGENT_CHARDEV(chr);
+
 if (!fe_open) {
 trace_vdagent_close();
+vdagent_disconnect(vd);
         /* To reset_serial, we CLOSED our side. Make sure the other end knows we
          * are ready again. */
 qemu_chr_be_event(chr, CHR_EVENT_OPENED);
-- 
2.27.0




Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread Daniel P . Berrangé
On Fri, Aug 11, 2023 at 09:00:54PM +0200, David Hildenbrand wrote:
> On 11.08.23 07:49, ThinerLogoer wrote:
> > At 2023-08-11 05:24:43, "Peter Xu"  wrote:
> > > On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
> > > > > I think we have the following options (there might be more)
> > > > > 
> > > > > 1) This patch.
> > > > > 
> > > > > 2) New flag for memory-backend-file. We already have "readonly" and
> > > > > "share=". I'm having a hard time coming up with a good name that 
> > > > > really
> > > > > describes the subtle difference.
> > > > > 
> > > > > 3) Glue behavior to the QEMU machine
> > > > > 
> > > > 
> > > > 4) '-deny-private-discard' argv, or environment variable, or both
> > > 
> > > I'd personally vote for (2).  How about "fdperm"?  To describe when we 
> > > want
> > > to use different rw permissions on the file (besides the access permission
> > > of the memory we already provided with "readonly"=XXX).  IIUC the only 
> > > sane
> > > value will be ro/rw/default, where "default" should just use the same rw
> > > permission as the memory ("readonly"=XXX).
> > > 
> > > Would that be relatively clean and also work in this use case?
> > > 
> > > (the other thing I'd wish we don't have that fallback is, as long as we
> > > have any of that "fallback" we'll need to be compatible with it since
> > > then, and for ever...)
> > 
> > If it must be (2), I would vote (2) + (4), with (4) adjusting the default
> > behavior of said `fdperm`.
> > Mainly because (private+discard) is itself not a good practice and (4) 
> > serves
> > as a good tool to help catch existing (private+discard) problems.
> 
> Instead of fdperm, maybe we could find a better name.
> 
> The man page of "open" says: The argument flags must include one of the
> following access modes: O_RDONLY, O_WRONLY, or O_RDWR.  These request
> opening the file read-only, write-only, or read/write, respectively.
> 
> So maybe something a bit more mouthful like "file-access-mode" would be
> better.

I don't think we should directly express the config in terms
of file-access-mode, as that's a low level impl detail. The
required file access mode is an artifact of the higher level
goal, i.e. whether the RAM should be process private vs shared,
and whether we want QEMU to be able to create the backing
file or use a pre-created one.

IOW, we should express whether or not we want QEMU to try to
pre-create the file.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread Daniel P . Berrangé
On Thu, Aug 10, 2023 at 04:19:45PM +0200, David Hildenbrand wrote:
> > > Most importantly, we won't be corrupting/touching the original file in any
> > > case, because it is R/O.
> > > 
> > > If we really want to be careful, we could clue that behavior to compat
> > > machines. I'm not really sure yet if we really have to go down that path.
> > > 
> > > Any other alternatives? I'd like to avoid new flags where not really
> > > required.
> > 
> > I was just thinking of a new flag. :) So have you already discussed that
> > possibility and decided that it's not a good idea?
> 
> Not really. I was briefly playing with that idea but already struggled to
> come up with a reasonable name :)
> 
> Less toggles and just have it working nice, if possible.

IMHO having a new flag is desirable, because it is directly
expressing the desired deployment scenario, such that we get
good error reporting upon deployment mistakes, while at the
same time allowing the readonly usage.

> > The root issue to me here is we actually have two resources (memory map of
> > the process, and the file) but we only have one way to describe the
> > permissions upon the two objects.  I'd think it makes a lot more sense if a
> > new flag is added, when there's a need to differentiate the two.
> > 
> > Consider if you see a bunch of qemu instances with:
> > 
> >-mem-path $RAM_FILE
> > 
> > On the same host, which can be as weird as it could be to me.  At least
> > '-mem-path' still looks like a way to exclusively own a ram file for an
> > instance. I worry the new fallback can confuse people too, while that's
> > so far not the major use case.
> 
> Once I learned that this is not a MAP_SHARED mapping, I was extremely
> confused. For example, vhost-user will absolutely not work
> with "-mem-path", even though the documentation explicitly spells that out
> (I still have to send a patch to fix that).
> 
> I guess "-mem-path" was primarily only used to consume hugetlb. Even for
> tmpfs it will already result in a double memory consumption, just like when
> using -memory-backend-memfd,share=no.
> 
> I guess deprecating it was the right decision.

Regardless of whether its deprecated or not, I think its fine to just
say people need to use the more verbose memory-backend-file syntax
if they want to use an unusual deployment configuration where there is
a readonly backing file.

> > Nobody may really rely on any existing behavior of the failure, but
> > changing existing behavior is just always not wanted.  The guideline here
> > to me is: whether we want existing "-mem-path XXX" users to start using the
> > fallback in general?  If it's "no", then maybe it implies a new flag is
> > better?
> 
> I think we have the following options (there might be more)
> 
> 1) This patch.
> 
> 2) New flag for memory-backend-file. We already have "readonly" and
> "share=". I'm having a hard time coming up with a good name that really
> describes the subtle difference.
> 
> 3) Glue behavior to the QEMU machine
> 
> 
> For 3), one option would be to always open a COW file readonly (as Thiner
> originally proposed). We could leave "-mem-path" behavior alone and only
> change memory-backend-file semantics. If the COW file does *not* exist yet,
> we would refuse to create the file like patch 2+3 do. Therefore, no
> ftruncate() errors, and fallocate() errors would always happen.

I'm for (2).


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[ANNOUNCE] QEMU 8.1.0-rc4 is now available

2023-08-17 Thread Michael Roth
Hello,

On behalf of the QEMU Team, I'd like to announce the availability of the
fifth release candidate for the QEMU 8.1 release. This release is meant
for testing purposes and should not be used in a production environment.

  http://download.qemu.org/qemu-8.1.0-rc4.tar.xz
  http://download.qemu.org/qemu-8.1.0-rc4.tar.xz.sig

You can help improve the quality of the QEMU 8.1 release by testing this
release and reporting bugs using our GitLab issue tracker:

  https://gitlab.com/qemu-project/qemu/-/milestones/8#tab-issues

The release plan, as well a documented known issues for release
candidates, are available at:

  http://wiki.qemu.org/Planning/8.1

Please add entries to the ChangeLog for the 8.1 release below:

  http://wiki.qemu.org/ChangeLog/8.1

Thank you to everyone involved!

Changes since rc3:

0d52116fd8: Update version for v8.1.0-rc4 release (Richard Henderson)
d3b41127c2: tcg/i386: Output %gs prefix in tcg_out_vex_opc (Richard Henderson)
b274c2388e: hw/riscv/virt.c: change 'aclint' TCG check (Daniel Henrique Barboza)
136cb9cc03: target/riscv/kvm.c: fix mvendorid size in vcpu_set_machine_ids() 
(Daniel Henrique Barboza)
0f936247e8: pci: Fix the update of interrupt disable bit in PCI_COMMAND 
register (Guoyi Tu)
3d449bc603: hw/pci-host: Allow extended config space access for Designware PCIe 
host (Jason Chien)



Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread David Hildenbrand

On 17.08.23 15:46, Daniel P. Berrangé wrote:

On Fri, Aug 11, 2023 at 09:00:54PM +0200, David Hildenbrand wrote:

On 11.08.23 07:49, ThinerLogoer wrote:

At 2023-08-11 05:24:43, "Peter Xu"  wrote:

On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:

I think we have the following options (there might be more)

1) This patch.

2) New flag for memory-backend-file. We already have "readonly" and
"share=". I'm having a hard time coming up with a good name that really
describes the subtle difference.

3) Glue behavior to the QEMU machine



4) '-deny-private-discard' argv, or environment variable, or both


I'd personally vote for (2).  How about "fdperm"?  To describe when we want
to use different rw permissions on the file (besides the access permission
of the memory we already provided with "readonly"=XXX).  IIUC the only sane
value will be ro/rw/default, where "default" should just use the same rw
permission as the memory ("readonly"=XXX).

Would that be relatively clean and also work in this use case?

(the other thing I'd wish we don't have that fallback is, as long as we
have any of that "fallback" we'll need to be compatible with it since
then, and for ever...)


If it must be (2), I would vote (2) + (4), with (4) adjusting the default behavior
of said `fdperm`.
Mainly because (private+discard) is itself not a good practice and (4) serves
as a good tool to help catch existing (private+discard) problems.


Instead of fdperm, maybe we could find a better name.

The man page of "open" says: The argument flags must include one of the
following access modes: O_RDONLY, O_WRONLY, or O_RDWR.  These request
opening the file read-only, write-only, or read/write, respectively.

So maybe something a bit more mouthful like "file-access-mode" would be
better.


I don't think we should directly express the config in terms
of file-access-mode, as that's a low level impl detail. The
required file access mode is an artifact of the higher level
goal, i.e. whether the RAM should be process private vs shared,
and whether we want QEMU to be able to create the backing
file or use a pre-created one.


See my other mails: "readonly" already expresses exactly that. So no need
for "file-access-mode".


(and as far as I can see, no need for any other flags)

--
Cheers,

David / dhildenb




[PATCH 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked

2023-08-17 Thread Kevin Wolf
This field has been unused since commit 72373e40fbc ('block:
bdrv_reopen_multiple: refresh permissions on updated graph').
Remove it.

Signed-off-by: Kevin Wolf 
---
 block.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block.c b/block.c
index a307c151a8..6376452768 100644
--- a/block.c
+++ b/block.c
@@ -2113,7 +2113,6 @@ static int bdrv_fill_options(QDict **options, const char 
*filename,
 
 typedef struct BlockReopenQueueEntry {
  bool prepared;
- bool perms_checked;
  BDRVReopenState state;
  QTAILQ_ENTRY(BlockReopenQueueEntry) entry;
 } BlockReopenQueueEntry;
-- 
2.41.0




Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread Daniel P . Berrangé
On Mon, Aug 07, 2023 at 09:07:32PM +0200, David Hildenbrand wrote:
> From: Thiner Logoer 
> 
> Users may specify
> * "-mem-path" or
> * "-object memory-backend-file,share=off,readonly=off"
> and expect such COW (MAP_PRIVATE) mappings to work, even if the user
> does not have write permissions to open the file.
> 
> For now, we would always fail in that case, always requiring file write
> permissions. Let's detect when that failure happens and fallback to opening
> the file readonly.
> 
> Warn the user, since there are other use cases where we want the file to
> be mapped writable: ftruncate() and fallocate() will fail if the file
> was not opened with write permissions.
> 
> Signed-off-by: Thiner Logoer 
> Co-developed-by: David Hildenbrand 
> Signed-off-by: David Hildenbrand 
> ---
>  softmmu/physmem.c | 26 ++
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 3df73542e1..d1ae694b20 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1289,8 +1289,7 @@ static int64_t get_file_align(int fd)
>  static int file_ram_open(const char *path,
>   const char *region_name,
>   bool readonly,
> - bool *created,
> - Error **errp)
> + bool *created)
>  {
>  char *filename;
>  char *sanitized_name;
> @@ -1334,10 +1333,7 @@ static int file_ram_open(const char *path,
>  g_free(filename);
>  }
>  if (errno != EEXIST && errno != EINTR) {
> -error_setg_errno(errp, errno,
> - "can't open backing store %s for guest RAM",
> - path);
> -return -1;
> +return -errno;
>  }
>  /*
>   * Try again on EINTR and EEXIST.  The latter happens when
> @@ -1946,9 +1942,23 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, 
> MemoryRegion *mr,
>  bool created;
>  RAMBlock *block;
>  
> -fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
> -   errp);
> +fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created);
> +if (fd == -EACCES && !(ram_flags & RAM_SHARED) && !readonly) {
> +/*
> + * We can have a writable MAP_PRIVATE mapping of a readonly file.
> + * However, some operations like ftruncate() or fallocate() might 
> fail
> + * later, let's warn the user.
> + */
> +fd = file_ram_open(mem_path, memory_region_name(mr), true, &created);
> +if (fd >= 0) {
> +warn_report("backing store %s for guest RAM (MAP_PRIVATE) opened"
> +" readonly because the file is not writable", 
> mem_path);

IIUC, from the description, the goal is that usage of a readonly
backing store is intended to be an explicitly supported deployment
configuration. At the same time though, this scenario could also be
a deployment mistake that we want to diagnose.

It is inappropriate to issue warn_report() for things that are
supported usage.

It is also undesirable to continue execution in the case of things
which are a deployment mistake.

These two scenarios are mutually incompatible, so I understand why
you choose to fudge it with a warn_report().

I wonder if this is pointing to the need for another configuration
knob for the memory backend, to express the different desired usage
models.

We want O_WRONLY when opening the file, either if we want the file
shared, or so that we can ftruncate it to the right size if it
does not exist. If share=off and the file is pre-created at the
right size, we should be able to use O_RDONLY even if the file is
writable.

So what if we added a 'create=yes|no' option to memory-backend-file

   -object memory-backend-file,share=off,readonly=off,create=yes

would imply need for O_WRONLY|O_RDONLY, so that ftruncate() can
do its work. 

With share=off,create=no, we could unconditionally open O_RDONLY,
even if the file is writable.

This would let us support read-only backing files, without any
warn_reports() for this usage, while also stopping execution
with deployment mistakes

This doesn't help -mem-path, since it doesn't take options, but
IMHO it would be acceptable to say users need to use the more
verbose '-object memory-backend-file' instead.

> +}
> +}
>  if (fd < 0) {
> +error_setg_errno(errp, -fd,
> + "can't open backing store %s for guest RAM",
> + mem_path);
>  return NULL;
>  }
>  
> -- 
> 2.41.0
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
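

To illustrate the proposal, a sketch of how the two cases could look on the command
line; note that 'create=' is only the option suggested above and does not exist in
QEMU, and the paths/sizes are made up:

    # QEMU may create/resize the backing file itself, so it must open it writable:
    -object memory-backend-file,id=m0,mem-path=/var/tmp/guest.ram,size=4G,share=off,readonly=off,create=yes

    # Pre-created (possibly read-only) file: QEMU could open it O_RDONLY and still
    # map it MAP_PRIVATE writable (COW), with no warning needed:
    -object memory-backend-file,id=m0,mem-path=/var/tmp/template.ram,size=4G,share=off,readonly=off,create=no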




Re: [PATCH] chardev/char-pty: Avoid losing bytes when the other side just (re-)connected

2023-08-17 Thread Marc-André Lureau
Hi

On Thu, Aug 17, 2023 at 5:06 PM Daniel P. Berrangé  wrote:
>
> On Thu, Aug 17, 2023 at 02:00:26PM +0200, Thomas Huth wrote:
> > On 17/08/2023 12.32, Daniel P. Berrangé wrote:
> > > On Wed, Aug 16, 2023 at 11:07:43PM +0200, Thomas Huth wrote:
> > > > When starting a guest via libvirt with "virsh start --console ...",
> > > > the first second of the console output is missing. This is especially
> > > > annoying on s390x that only has a text console by default and no 
> > > > graphical
> > > > output - if the bios fails to boot here, the information about what went
> > > > wrong is completely lost.
> > > >
> > > > One part of the problem (there is also some things to be done on the
> > > > libvirt side) is that QEMU only checks with a 1 second timer whether
> > > > the other side of the pty is already connected, so the first second of
> > > > the console output is always lost.
> > > >
> > > > This likely used to work better in the past, since the code once checked
> > > > for a re-connection during write, but this has been removed in commit
> > > > f8278c7d74 ("char-pty: remove the check for connection on write") to 
> > > > avoid
> > > > some locking.
> > > >
> > > > To ease the situation here at least a little bit, let's check with 
> > > > g_poll()
> > > > whether we could send out the data anyway, even if the connection has 
> > > > not
> > > > been marked as "connected" yet. The file descriptor is marked as 
> > > > non-blocking
> > > > anyway since commit fac6688a18 ("Do not hang on full PTY"), so this 
> > > > should
> > > > not cause any trouble if the other side is not ready for receiving yet.
> > > >
> > > > With this patch applied, I can now successfully see the bios output of
> > > > a s390x guest when running it with "virsh start --console" (with a 
> > > > patched
> > > > version of virsh that fixes the remaining issues there, too).
> > > >
> > > > Reported-by: Marc Hartmayer 
> > > > Signed-off-by: Thomas Huth 
> > > > ---
> > > >   chardev/char-pty.c | 22 +++---
> > > >   1 file changed, 19 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/chardev/char-pty.c b/chardev/char-pty.c
> > > > index 4e5deac18a..fad12dfef3 100644
> > > > --- a/chardev/char-pty.c
> > > > +++ b/chardev/char-pty.c
> > > > @@ -106,11 +106,27 @@ static void pty_chr_update_read_handler(Chardev 
> > > > *chr)
> > > >   static int char_pty_chr_write(Chardev *chr, const uint8_t *buf, int 
> > > > len)
> > > >   {
> > > >   PtyChardev *s = PTY_CHARDEV(chr);
> > > > +GPollFD pfd;
> > > > +int rc;
> > > > -if (!s->connected) {
> > > > -return len;
> > > > +if (s->connected) {
> > > > +return io_channel_send(s->ioc, buf, len);
> > > >   }
> > > > -return io_channel_send(s->ioc, buf, len);
> > > > +
> > > > +/*
> > > > + * The other side might already be re-connected, but the timer 
> > > > might
> > > > + * not have fired yet. So let's check here whether we can write 
> > > > again:
> > > > + */
> > > > +pfd.fd = QIO_CHANNEL_FILE(s->ioc)->fd;
> > > > +pfd.events = G_IO_OUT;
> > > > +pfd.revents = 0;
> > > > +rc = RETRY_ON_EINTR(g_poll(&pfd, 1, 0));
> > > > +g_assert(rc >= 0);
> > > > +if (!(pfd.revents & G_IO_HUP) && (pfd.revents & G_IO_OUT)) {
> > >
> > > Should (can?) we call
> > >
> > > pty_chr_state(chr, 1);
> > >
> > > here ?
> >
> > As far as I understood commit f8278c7d74c6 and f7ea2038bea04628, this is not
> > possible anymore since the lock has been removed.
> >
> > > > +io_channel_send(s->ioc, buf, len);
> > >
> > > As it feels a little dirty to be sending data before setting the
> > > 'connected == 1' and thus issuing the 'CHR_EVENT_OPENED' event
> >
> > I didn't find a really better solution so far. We could maybe introduce a
> > buffer in the char-pty code and store the last second of guest output, but
> > IMHO that's way more complex and thus somewhat ugly, too?
>
> The orignal commit f8278c7d74c6 said
>
> [quote]
> char-pty: remove the check for connection on write
>
> This doesn't help much compared to the 1 second poll PTY
> timer. I can't think of a use case where this would help.
> [/quote]
>
> We've now identified a use case where it is actually important.
>
> IOW, there's a justification to revert both f7ea2038bea04628 and
> f8278c7d74c6, re-adding the locking and write update logic.

Indeed. But isn't it possible to watch for IO_OUT and get rid of the timer?

The other thing I question is whether the serial device shouldn't return BUSY
if the chardev is disconnected.
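
One possible shape for the G_IO_OUT idea, as a hedged sketch only (not a tested
patch): arm a one-shot watch on the pty channel and mark the chardev connected
from its callback instead of waiting for the 1 second timer:

    /* sketch: pty_chr_state() and PtyChardev are the existing char-pty internals */
    static gboolean pty_out_ready(QIOChannel *ioc, GIOCondition cond, gpointer opaque)
    {
        Chardev *chr = CHARDEV(opaque);

        pty_chr_state(chr, 1);        /* mark connected; emits CHR_EVENT_OPENED */
        return G_SOURCE_REMOVE;       /* one-shot watch */
    }

    /* armed wherever the pty is (re)opened or found disconnected: */
    qio_channel_add_watch(s->ioc, G_IO_OUT, pty_out_ready, chr, NULL);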




Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-17 Thread David Hildenbrand

On 17.08.23 15:37, Daniel P. Berrangé wrote:

On Mon, Aug 07, 2023 at 09:07:32PM +0200, David Hildenbrand wrote:

From: Thiner Logoer 

Users may specify
* "-mem-path" or
* "-object memory-backend-file,share=off,readonly=off"
and expect such COW (MAP_PRIVATE) mappings to work, even if the user
does not have write permissions to open the file.

For now, we would always fail in that case, always requiring file write
permissions. Let's detect when that failure happens and fallback to opening
the file readonly.

Warn the user, since there are other use cases where we want the file to
be mapped writable: ftruncate() and fallocate() will fail if the file
was not opened with write permissions.

Signed-off-by: Thiner Logoer 
Co-developed-by: David Hildenbrand 
Signed-off-by: David Hildenbrand 
---
  softmmu/physmem.c | 26 ++
  1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3df73542e1..d1ae694b20 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1289,8 +1289,7 @@ static int64_t get_file_align(int fd)
  static int file_ram_open(const char *path,
   const char *region_name,
   bool readonly,
- bool *created,
- Error **errp)
+ bool *created)
  {
  char *filename;
  char *sanitized_name;
@@ -1334,10 +1333,7 @@ static int file_ram_open(const char *path,
  g_free(filename);
  }
  if (errno != EEXIST && errno != EINTR) {
-error_setg_errno(errp, errno,
- "can't open backing store %s for guest RAM",
- path);
-return -1;
+return -errno;
  }
  /*
   * Try again on EINTR and EEXIST.  The latter happens when
@@ -1946,9 +1942,23 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, 
MemoryRegion *mr,
  bool created;
  RAMBlock *block;
  
-fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
-   errp);
+fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created);
+if (fd == -EACCES && !(ram_flags & RAM_SHARED) && !readonly) {
+/*
+ * We can have a writable MAP_PRIVATE mapping of a readonly file.
+ * However, some operations like ftruncate() or fallocate() might fail
+ * later, let's warn the user.
+ */
+fd = file_ram_open(mem_path, memory_region_name(mr), true, &created);
+if (fd >= 0) {
+warn_report("backing store %s for guest RAM (MAP_PRIVATE) opened"
+" readonly because the file is not writable", 
mem_path);


IIUC, from the description, the goal is that usage of a readonly
backing store is intended to be an explicitly supported deployment
configuration. At the same time though, this scenario could also be
a deployment mistake that we want to diagnose.


FWIW, I abandoned this approach here and instead will look into making

memory-backend-file,readonly=on,share=off

create RAM instead of ROM.

The fallback was the wrong approach once I realized what "readonly" is
actually supposed to do.


I stared at libvirt, and even it never seems to set readonly=on for R/O
NVDIMMs, so you always get RAM and then tell the nvdimm device not to
perform any writes (unarmed=on).


--
Cheers,

David / dhildenb
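

For completeness, the R/O NVDIMM configuration mentioned above would look roughly
like this (paths/sizes are made up; per the documentation quoted earlier, readonly=on
on the memdev is paired with unarmed=on on the nvdimm device):

    -machine pc,nvdimm=on \
    -object memory-backend-file,id=nv0,mem-path=/path/to/nvdimm.img,size=4G,share=on,readonly=on \
    -device nvdimm,id=nvdimm0,memdev=nv0,unarmed=on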




Re: [PATCH v3 1/7] vdpa: Use iovec for vhost_vdpa_net_load_cmd()

2023-08-17 Thread Eugenio Perez Martin
On Thu, Aug 17, 2023 at 2:42 PM Hawkins Jiawei  wrote:
>
> On 2023/8/17 17:23, Eugenio Perez Martin wrote:
> > On Fri, Jul 7, 2023 at 5:27 PM Hawkins Jiawei  wrote:
> >>
> >> According to VirtIO standard, "The driver MUST follow
> >> the VIRTIO_NET_CTRL_MAC_TABLE_SET command by a le32 number,
> >> followed by that number of non-multicast MAC addresses,
> >> followed by another le32 number, followed by that number
> >> of multicast addresses."
> >>
> >> Considering that this data is not stored in contiguous memory,
> >> this patch refactors vhost_vdpa_net_load_cmd() to accept
> >> scattered data, eliminating the need for an additional data copy or
> >> for packing the data into s->cvq_cmd_out_buffer outside of
> >> vhost_vdpa_net_load_cmd().
> >>
> >> Signed-off-by: Hawkins Jiawei 
> >> ---
> >> v3:
> >>- rename argument name to `data_sg` and `data_num`
> >>- use iov_to_buf() suggested by Eugenio
> >>
> >> v2: 
> >> https://lore.kernel.org/all/6d3dc0fc076564a03501e222ef1102a6a7a643af.1688051252.git.yin31...@gmail.com/
> >>- refactor vhost_vdpa_load_cmd() to accept iovec suggested by
> >> Eugenio
> >>
> >>   net/vhost-vdpa.c | 33 +
> >>   1 file changed, 25 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> >> index 373609216f..31ef6ad6ec 100644
> >> --- a/net/vhost-vdpa.c
> >> +++ b/net/vhost-vdpa.c
> >> @@ -620,29 +620,38 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState 
> >> *s, size_t out_len,
> >>   }
> >>
> >>   static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, uint8_t class,
> >> -   uint8_t cmd, const void *data,
> >> -   size_t data_size)
> >> +   uint8_t cmd, const struct iovec 
> >> *data_sg,
> >> +   size_t data_num)
> >>   {
> >>   const struct virtio_net_ctrl_hdr ctrl = {
> >>   .class = class,
> >>   .cmd = cmd,
> >>   };
> >> +size_t data_size = iov_size(data_sg, data_num);
> >>
> >>   assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl));
> >>
> >> +/* pack the CVQ command header */
> >>   memcpy(s->cvq_cmd_out_buffer, , sizeof(ctrl));
> >> -memcpy(s->cvq_cmd_out_buffer + sizeof(ctrl), data, data_size);
> >>
> >> -return vhost_vdpa_net_cvq_add(s, sizeof(ctrl) + data_size,
> >> +/* pack the CVQ command command-specific-data */
> >> +iov_to_buf(data_sg, data_num, 0,
> >> +   s->cvq_cmd_out_buffer + sizeof(ctrl), data_size);
> >> +
> >> +return vhost_vdpa_net_cvq_add(s, data_size + sizeof(ctrl),
> >
> > Nit, any reason for changing the order of the addends? sizeof(ctrl) +
> > data_size ?
>
> Hi Eugenio,
>
> Here the code should be changed to `sizeof(ctrl) + data_size` as you
> point out.
>
> Since this patch series has already been merged into master, I will
> submit a separate patch to correct this problem.
>

Ouch, I didn't realize that. No need to make it back again, I was just
trying to reduce lines changed.

> >
> >> sizeof(virtio_net_ctrl_ack));
> >>   }
> >>
> >>   static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n)
> >>   {
> >>   if (virtio_vdev_has_feature(&n->parent_obj, 
> >> VIRTIO_NET_F_CTRL_MAC_ADDR)) {
> >> +const struct iovec data = {
> >> +.iov_base = (void *)n->mac,
> >
> > Assigning to void * should always be valid, no need for casting here.
>
> Yes, assigning to void * should be valid for normal pointers.
>
> However, `n->mac` is an array and is treated as a const pointer. It will
> trigger the warning "error: initialization discards ‘const’ qualifier
> from pointer target type" if we don't add this cast.
>

Got it, I didn't realize it. Everything is ok then.

Thanks!

> Thanks!
>
>
> >
> >> +.iov_len = sizeof(n->mac),
> >> +};
> >>   ssize_t dev_written = vhost_vdpa_net_load_cmd(s, 
> >> VIRTIO_NET_CTRL_MAC,
> >> 
> >> VIRTIO_NET_CTRL_MAC_ADDR_SET,
> >> -  n->mac, sizeof(n->mac));
> >> +  &data, 1);
> >>   if (unlikely(dev_written < 0)) {
> >>   return dev_written;
> >>   }
> >> @@ -665,9 +674,13 @@ static int vhost_vdpa_net_load_mq(VhostVDPAState *s,
> >>   }
> >>
> >>   mq.virtqueue_pairs = cpu_to_le16(n->curr_queue_pairs);
> >> +const struct iovec data = {
> >> +.iov_base = &mq,
> >> +.iov_len = sizeof(mq),
> >> +};
> >>   dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_MQ,
> >> -  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &mq,
> >> -  sizeof(mq));
> >> +  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET,
> >> +

Re: [PATCH 2/2] ui/vdagent: Unregister input handler of mouse during finalization

2023-08-17 Thread Marc-André Lureau
On Thu, Aug 17, 2023 at 3:33 PM  wrote:
>
> From: Guoyi Tu 
>
> The input handler resource should be released when the
> VDAgentChardev object is finalized.
>
> Signed-off-by: Guoyi Tu 
> Signed-off-by: dengpengcheng 

Reviewed-by: Marc-André Lureau 

> ---
>  ui/vdagent.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/ui/vdagent.c b/ui/vdagent.c
> index 386dc5abe0..4c9b3b7ba8 100644
> --- a/ui/vdagent.c
> +++ b/ui/vdagent.c
> @@ -924,6 +924,9 @@ static void vdagent_chr_fini(Object *obj)
>  {
>  VDAgentChardev *vd = QEMU_VDAGENT_CHARDEV(obj);
>
> +if (vd->mouse_hs) {
> +qemu_input_handler_unregister(vd->mouse_hs);
> +}
>  migrate_del_blocker(vd->migration_blocker);
>  buffer_free(&vd->outbuf);
>  error_free(vd->migration_blocker);
> --
> 2.27.0
>
>


-- 
Marc-André Lureau


