date:20211115

Re: [PATCH RFC 2/2] hw: Replace drive_get_next() by drive_get()

2021-11-15 Thread Markus Armbruster

Philippe Mathieu-Daudé  writes:

> On 11/15/21 16:57, Markus Armbruster wrote:
>> Philippe Mathieu-Daudé  writes:
>>> On 11/15/21 13:55, Markus Armbruster wrote:
 drive_get_next() is basically a bad idea.  It returns the "next" block
 backend of a certain interface type.  "Next" means bus=0,unit=N, where
 subsequent calls count N up from zero, per interface type.

 This lets you define unit numbers implicitly by execution order.  If the
 order changes, or new calls appear "in the middle", unit numbers change.
 ABI break.  Hard to spot in review.

 Explicit is better than implicit: use drive_get() directly.

 Signed-off-by: Markus Armbruster 
 ---
>
 @@ -435,11 +438,13 @@ static void aspeed_machine_init(MachineState 
 *machine)
  }

  for (i = 0; i < bmc->soc.sdhci.num_slots; i++) {
 -sdhci_attach_drive(>soc.sdhci.slots[i], 
 drive_get_next(IF_SD));
 +sdhci_attach_drive(>soc.sdhci.slots[i],
 +   drive_get(IF_SD, 0, i));
>>>
>>> If we put SD on bus #0, ...
>>>
  }

  if (bmc->soc.emmc.num_slots) {
 -sdhci_attach_drive(>soc.emmc.slots[0], 
 drive_get_next(IF_SD));
 +sdhci_attach_drive(>soc.emmc.slots[0],
 +   drive_get(IF_SD, 0, bmc->soc.sdhci.num_slots));
>>>
>>> ... we'd want to put eMMC on bus #1
>> 
>> Using separate buses for different kinds of devices would be neater, but
>> it also would be an incompatible change.  This patch keeps existing
>> bus/unit numbers working.  drive_get_next() can only use bus 0.
>> 
>>>  but I see having eMMC cards on a
>>> IF_SD bus as a bug, since these cards are soldered on the board.
>> 
>> IF_SD is not a bus, it's an "block interface type", which is really just
>> a user interface thing.
>
> Why are we discriminating by "block interface type" then?
>
> What is the difference between "block interfaces"? I see a block drive
> as a generic unit, usable on multiple hardware devices.
>
> I never really understood how this "block interface type" helps
> developers and users. I thought BlockInterfaceType and DriveInfo
> were legacy / deprecated APIs we want to get rid of; and we would
> come up with a replacement API using BlockDeviceInfo or providing
> a BlockFrontend state of the art object.
> Anyway, I suppose the explanation is buried in the git history
> before the last 8 years. I need to keep reading.

In the beginning (v0.4.2), there was -hda and -hdb, and life was simple.

Then there was -hdc, -hdd, -cdrom (v0.5.1), -fda, -fdb (v0.6.0),
-mtdblock, -sd, -pflash (v0.9.1).

All these options do two things: they create a block backend, and they
request the board to create a certain block frontend for it, similar to
other options of this vintage, like -serial, -parallel, and -net.
Boards generally ignore requests they don't understand, but that's just
sloppiness.

For each set of related options, there was a global variable holding the
requests: bs_table[] for -hda, -hdb, -hdc, -hdd, -cdrom; fd_table[]
-fda, -fdb; mtd_bdrv for -mtd; sd_drv for -ds; pflash_table[] for
-pflash.

The options replaced prior ones, except for -pflash, which appended to
its table.

bs_table[]'s index had a peculiar meaning: it's bus * MAX_IDE_DEVS +
unit.  This ensures that -hda (index 0) goes on IDE bus 0 as unit 0;
-hdb on bus 0, unit 1; -hdc on 1, 0; -hdc on 1, 1.

Life was now complicated enough for a generalization (v0.9.1), so there
was -drive (v0.9.1).  All the variables holding requests were fused into
drives_table[].  Table elements are identified by (type, bus, unit),
where type is an enum whose members correspond to the old global
variables: IF_IDE for bs_table[], IF_FLOPPY for fd_table[], and so
forth.  So:

-hda becomes type = IF_IDE, bus = 0, unit = 0
-hdb becomes type = IF_IDE, bus = 0, unit = 1
...
-sd becomes type = IF_SD, bus = 0, unit = 0
1st -pflash becomes type = IF_PFLASH, bus = 0, unit = 0
2nd -pflash becomes type = IF_PFLASH, bus = 0, unit = 1
...

Other mappings from old to new global variables would have been
possible.  I figure this one was chosen because it comes with a
reasonable user interface.  Identifying block devices by (interface
type, bus, unit) is certainly nicer than by index in bs_table[].  Since
bus and/or unit make sense only with some interface types, they are
optional.

Things calmed down for a couple of years, until -device appeared
(v0.12).  Now we needed a way to define just a backend, without
requesting a frontend from the board.  Instead of inventing a new
option, this became IF_NONE, with meaningless bus and unit.

Over the next years, the block layer outgrew -drive's limited
capabilities to define frontends.  -blockdev appeard (v1.7.0) and
matured over several releases.  I don't remember exactly when it became
stable, relegating -drive if=none to legacy status.

What's *not*

Re: [PATCH] dump-guest-memory: Use BQL to protect dump finalize process

2021-11-15 Thread Marc-André Lureau

Hi

On Tue, Nov 16, 2021 at 7:22 AM Peter Xu  wrote:
>
> When finalizing the dump-guest-memory with detached mode, we'll first set dump
> status to either FAIL or COMPLETE before doing the cleanup, however right 
> after
> the dump status change it's possible that another dump-guest-memory qmp 
> command
> is sent so both the main thread and dump thread (which is during cleanup) 
> could
> be accessing dump state in paralell.  That's racy.

parallel

>
> Fix it by protecting the finalizing phase of dump-guest-memory using BQL as
> well.  To do that, we expand the BQL from dump_cleanup() into dump_process(),
> so we will take the BQL longer than before.  The critical section must cover
> the status switch from ACTIVE->{FAIL|COMPLETE} so as to make sure there's no
> race any more.
>
> We can also just introduce a specific mutex to serialize the dump process, but
> BQL should be enough for now so far, not to mention vm_start() in dump_cleanup
> will need BQL anyway, so maybe easier to just use the same mutex.
>
> Reported-by: Laszlo Ersek 
> Signed-off-by: Peter Xu 

Reviewed-by: Marc-André Lureau 

> ---
>  dump/dump.c | 24 ++--
>  1 file changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/dump/dump.c b/dump/dump.c
> index 662d0a62cd..196b7b8ab9 100644
> --- a/dump/dump.c
> +++ b/dump/dump.c
> @@ -96,13 +96,7 @@ static int dump_cleanup(DumpState *s)
>  g_free(s->guest_note);
>  s->guest_note = NULL;
>  if (s->resume) {
> -if (s->detached) {
> -qemu_mutex_lock_iothread();
> -}
>  vm_start();
> -if (s->detached) {
> -qemu_mutex_unlock_iothread();
> -}
>  }
>  migrate_del_blocker(dump_migration_blocker);
>
> @@ -1873,6 +1867,11 @@ static void dump_process(DumpState *s, Error **errp)
>  Error *local_err = NULL;
>  DumpQueryResult *result = NULL;
>
> +/*
> + * When running with detached mode, these operations are not run with 
> BQL.
> + * It's still safe, because it's protected by setting s->state to ACTIVE,
> + * so dump_in_progress() check will block yet another dump-guest-memory.
> + */
>  if (s->has_format && s->format == DUMP_GUEST_MEMORY_FORMAT_WIN_DMP) {
>  #ifdef TARGET_X86_64
>  create_win_dump(s, _err);
> @@ -1883,6 +1882,15 @@ static void dump_process(DumpState *s, Error **errp)
>  create_vmcore(s, _err);
>  }
>
> +/*
> + * Serialize the finalizing of dump process using BQL to make sure no
> + * concurrent access to DumpState is allowed.  BQL is also required for
> + * dump_cleanup as vm_start() needs it.
> + */
> +if (s->detached) {
> +qemu_mutex_lock_iothread();
> +}
> +
>  /* make sure status is written after written_size updates */
>  smp_wmb();
>  qatomic_set(>status,
> @@ -1898,6 +1906,10 @@ static void dump_process(DumpState *s, Error **errp)
>
>  error_propagate(errp, local_err);
>  dump_cleanup(s);
> +
> +if (s->detached) {
> +qemu_mutex_unlock_iothread();
> +}
>  }
>
>  static void *dump_thread(void *data)
> --
> 2.32.0
>

Re: [PATCH v2 1/3] icount: preserve cflags when custom tb is about to execute

2021-11-15 Thread Pavel Dovgalyuk


On 11.11.2021 15:20, Alex Bennée wrote:


Pavel Dovgalyuk  writes:


When debugging with the watchpoints, qemu may need to create
TB with single instruction. This is achieved by setting cpu->cflags_next_tb.
But when this block is about to execute, it may be interrupted by another
thread. In this case cflags will be lost and next executed TB will not
be the special one.
This patch checks TB exit reason and restores cflags_next_tb to allow
finding the interrupted block.

Signed-off-by: Pavel Dovgalyuk 
---
  accel/tcg/cpu-exec.c |   10 ++
  1 file changed, 10 insertions(+)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 2d14d02f6c..df12452b8f 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -846,6 +846,16 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, 
TranslationBlock *tb,
   * cpu_handle_interrupt.  cpu_handle_interrupt will also
   * clear cpu->icount_decr.u16.high.
   */
+if (cpu->cflags_next_tb == -1
+&& (!use_icount || !(tb->cflags & CF_USE_ICOUNT)


Why check use_icount here? The cflags should always have CF_USE_ICOUNT
set when icount is enabled. Lets not over complicate the inverted ||
tests we have here.


Not really. Sometimes we use non-icount blocks in icount mode.
But AFAIR they are used only for triggering the exeptions, but not for 
real execution.





+|| cpu_neg(cpu)->icount_decr.u16.low >= tb->icount))
{


Is u16.low ever set when icount isn't enabled?


This condition is checked for icount mode only.
u16.low is not used without icount.




+/*
+ * icount is disabled or there are enough instructions
+ * in the budget, do not retranslate this block with
+ * different parameters.
+ */
+cpu->cflags_next_tb = tb->cflags;
+}
  return;
  }

[PATCH] migration: fix dump-vmstate with modules

2021-11-15 Thread Laurent Vivier

To work correctly -dump-vmstate and vmstate-static-checker.py need to
dump all the supported vmstates.

But as some devices can be modules, they are not loaded at startup and not
dumped. Fix that by loading all available modules before dumping the
machine vmstate.

Fixes: 7ab6e7fcce97 ("qdev: device module support")
Cc: kra...@redhat.com
Signed-off-by: Laurent Vivier 
---
 softmmu/vl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index 1159a64bce4e..620a1f1367e2 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -3766,6 +3766,7 @@ void qemu_init(int argc, char **argv, char **envp)
 
 if (vmstate_dump_file) {
 /* dump and exit */
+module_load_qom_all();
 dump_vmstate_json_to_file(vmstate_dump_file);
 exit(0);
 }
-- 
2.33.1

Re: [PATCH v5 0/6] Add vmnet.framework based network backend

2021-11-15 Thread Jason Wang

On Mon, Nov 15, 2021 at 6:45 PM Vladislav Yaroshchuk
 wrote:
>
>
>
> пн, 15 нояб. 2021 г. в 07:53, Jason Wang :
>>
>> On Fri, Nov 12, 2021 at 5:14 PM Vladislav Yaroshchuk
>>  wrote:
>> >
>> > macOS provides networking API for VMs called 'vmnet.framework':
>> > https://developer.apple.com/documentation/vmnet
>> >
>> > We can provide its support as the new QEMU network backends which
>> > represent three different vmnet.framework interface usage modes:
>> >
>> >   * `vmnet-shared`:
>> > allows the guest to communicate with other guests in shared mode and
>> > also with external network (Internet) via NAT. Has (macOS-provided)
>> > DHCP server; subnet mask and IP range can be configured;
>> >
>> >   * `vmnet-host`:
>> > allows the guest to communicate with other guests in host mode.
>> > By default has enabled DHCP as `vmnet-shared`, but providing
>> > network unique id (uuid) can make `vmnet-host` interfaces isolated
>> > from each other and also disables DHCP.
>> >
>> >   * `vmnet-bridged`:
>> > bridges the guest with a physical network interface.
>> >
>> > This backends cannot work on macOS Catalina 10.15 cause we use
>> > vmnet.framework API provided only with macOS 11 and newer. Seems
>> > that it is not a problem, because QEMU guarantees to work on two most
>> > recent versions of macOS which now are Big Sur (11) and Monterey (12).
>> >
>> > Also, we have one inconvenient restriction: vmnet.framework interfaces
>> > can create only privileged user:
>> > `$ sudo qemu-system-x86_64 -nic vmnet-shared`
>> >
>> > Attempt of `vmnet-*` netdev creation being unprivileged user fails with
>> > vmnet's 'general failure'.
>> >
>> > This happens because vmnet.framework requires `com.apple.vm.networking`
>> > entitlement which is: "restricted to developers of virtualization software.
>> > To request this entitlement, contact your Apple representative." as Apple
>> > documentation says:
>> > https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_vm_networking
>>
>> Do you know how multipass work? Looks like it uses vmnet without privileges.
>>
>
> I've just checked this, and they still need root privileges. They have a
> `multipassd` daemon which is launched as root by launchd by default.
>
> ```
> bash-5.1$ ps axo ruser,ruid,comm | grep multipass
> root 0 /Library/Application 
> Support/com.canonical.multipass/bin/multipassd
> root 0 /Library/Application 
> Support/com.canonical.multipass/bin/hyperkit
> ```
>
> That's the reason why it's required to 'enter a password' while multipass 
> installation:
> it creates launchd plist (kinda launch rule) and places it to 
> /Library/LaunchDaemons/,
> which can be done only with root privileges.
>
> ```
> bash-5.1$ ll /Library/LaunchDaemons | grep multipass
> -rw-r--r--   1 root  wheel   1.1K 15 Nov 12:47 com.canonical.multipassd.plist
> ```
>
> And after this launchd launches this multipassd daemon as root every boot.
>
> So an unprivileged user can launch a multipass VM instance, but actually the
> `hyperkit` process which interacts with vmnet is gonna be launched by 
> multipassd
> running as root.

I wonder how it passes the vmnet object to qemu? Nothing obvious from
the qemu command line that multipass launched:

-nic vmnet-macos,mode=shared,model=virtio-net-pci,mac=52:54:00:52:e8:e4

(But I haven't had time to check their qemu codes).

>
> tl;dr sadly, we can't interact with vmnet.framework without having our binary 
> correctly
> signed and being an unprivileged user. Root privileges or special signature 
> with
> entitlement is required.

This is something similar to what happens in other OS. E.g for the tap
backend, it can't be created without privileges. So qemu allows to:

1) the TAP to be created by privileged program like libvirt, and its
fd could be passed to qemu via SCM_RIGHTS
2) run a set-uid helper to create and config TAP

This is something we need to consider now or in the future probably.

Thanks

>
>
>> Thanks
>>
>
> Thank you for your review, I will check it this week and reply as soon as 
> possible.
>
>>
>> >
>> > One more note: we still have quite useful but not supported
>> > 'vmnet.framework' features as creating port forwarding rules, IPv6
>> > NAT prefix specifying and so on.
>> >
>> > Nevertheless, new backends work fine and tested within `qemu-system-x86-64`
>> > on macOS Bir Sur 11.5.2 host with such nic models:
>> >   * e1000-82545em
>> >   * virtio-net-pci
>> >   * vmxnet3
>> >
>> > The guests were:
>> >   * macOS 10.15.7
>> >   * Ubuntu Bionic (server cloudimg)
>> >
>> >
>> > This series partially reuses patches by Phillip Tennen:
>> > https://patchew.org/QEMU/20210218134947.1860-1-phillip.en...@gmail.com/
>> > So I included his signed-off line into one of the commit messages and
>> > also here.
>> >
>> > v1 -> v2:
>> >  Since v1 minor typos were fixed, patches rebased onto latest master,
>> >  redundant changes removed (small commits squashed)
>> >

Re: [PATCH v2 1/6] pci: implement power state

2021-11-15 Thread Gerd Hoffmann

  Hi,

> >  new_addr = pci_bar_address(d, i, r->type, r->size);
> > +if (!d->has_power) {
> > +new_addr = PCI_BAR_UNMAPPED;
> > +}
> >  
> >  /* This bar isn't changed */
> >  if (new_addr == r->addr)
> 
> I am a bit confused about why this is necessary.
> When power is removed device is reset, does not
> this disable device memory automatically?

Hmm.   While it clearly doesn't hurt it might not be needed indeed.  The
reset-on-poweroff should make sure both bars and dma are off, and due to
config space access being blocked the guest shouldn't be able to turn
them on while device power is off.

So, yes, in theory we should be able to drop this.  Assuming nothing in
qemu ever touches the bar mappings (host access to config space is not
blocked).

I'll have a look.

take care,
  Gerd

Re: [PATCH for 6.2 v4] nbd/server: Add --selinux-label option

2021-11-15 Thread Thomas Huth


On 15/11/2021 21.29, Eric Blake wrote:

From: "Richard W.M. Jones" 

Under SELinux, Unix domain sockets have two labels.  One is on the
disk and can be set with commands such as chcon(1).  There is a
different label stored in memory (called the process label).  This can
only be set by the process creating the socket.  When using SELinux +
SVirt and wanting qemu to be able to connect to a qemu-nbd instance,
you must set both labels correctly first.

For qemu-nbd the options to set the second label are awkward.  You can
create the socket in a wrapper program and then exec into qemu-nbd.
Or you could try something with LD_PRELOAD.

This commit adds the ability to set the label straightforwardly on the
command line, via the new --selinux-label flag.  (The name of the flag
is the same as the equivalent nbdkit option.)

...

@@ -3430,6 +3437,7 @@ summary_info += {'libdaxctl support': libdaxctl}
  summary_info += {'libudev':   libudev}
  # Dummy dependency, keep .found()
  summary_info += {'FUSE lseek':fuse_lseek.found()}
+summary_info += {'selinux':   selinux.found()}


It's nicer if you do it like this (i.e. without the .found()):

summary_info += {'selinux':   selinux}

... then meson prints out the version of the library, too.

Apart from that, patch looks fine to me:

Reviewed-by: Thomas Huth

Re: [PATCH] gitlab-ci: Test compilation on Windows with MSYS2

2021-11-15 Thread Marc-André Lureau

Hi

On Mon, Nov 15, 2021 at 6:31 PM Philippe Mathieu-Daudé 
wrote:

> On 11/15/21 15:06, Thomas Huth wrote:
> > Gitlab also provides runners with Windows, we can use them to
> > test compilation with MSYS2, in both, 64-bit and 32-bit.
> >
> > However, it takes quite a long time to set up the VM, so to
> > stay in the 1h time frame, we can only compile and check one
> > target here.
>

I wonder why gitlab does not offer the docker executor. On the freedesktop
gitlab instance, they have windows docker executor, which speeds up the
build time. Maybe we could also have our own Windows runner for qemu?

>
> > Signed-off-by: Thomas Huth 
> > ---
> >  "make check" recently broke on MSYS2, and nobody noticed since
> apparently
> >  hardly anybody looks at the cirrus-CI output ... so here's another try
> >  to get some more test coverage in this area in the gitlab-CI instead.
> >  Patch needs to be applied after the "tests/unit/test-smp-parse:
> >  Make an unified name for the tested machine" patch to get "make check"
> >  fixed first.
> >
> >  RFC -> v1:
> >  - Use cache to speed up installation a little bit
> >  - Add a 32-bit builder, too
> >
> >  .gitlab-ci.d/qemu-project.yml |  1 +
> >  .gitlab-ci.d/windows.yml  | 98 +++
> >  2 files changed, 99 insertions(+)
> >  create mode 100644 .gitlab-ci.d/windows.yml
> >
> > diff --git a/.gitlab-ci.d/qemu-project.yml
> b/.gitlab-ci.d/qemu-project.yml
> > index b3d79bc429..871262fe0e 100644
> > --- a/.gitlab-ci.d/qemu-project.yml
> > +++ b/.gitlab-ci.d/qemu-project.yml
> > @@ -11,3 +11,4 @@ include:
> >- local: '/.gitlab-ci.d/static_checks.yml'
> >- local: '/.gitlab-ci.d/custom-runners.yml'
> >- local: '/.gitlab-ci.d/cirrus.yml'
> > +  - local: '/.gitlab-ci.d/windows.yml'
> > diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
> > new file mode 100644
> > index 00..309f7e7fb8
> > --- /dev/null
> > +++ b/.gitlab-ci.d/windows.yml
> > @@ -0,0 +1,98 @@
> > +.shared_msys2_builder:
> > +  tags:
> > +  - shared-windows
> > +  - windows
> > +  - windows-1809
> > +  cache:
> > +key: "${CI_JOB_NAME}-cache"
>
> It would be nice to cache the shared 'before_script' part,
> but it doesn't seems trivial; meanwhile this patch works and
> is KISS, so:
>
> Reviewed-by: Philippe Mathieu-Daudé 
>
> > +paths:
> > +  - ${CI_PROJECT_DIR}/msys64/var/cache
> > +  needs: []
> > +  stage: build
> > +  timeout: 70m
> > +  before_script:
> > +  - If ( !(Test-Path -Path msys64\var\cache ) ) {
> > +  mkdir msys64\var\cache
> > +}
> > +  - If ( !(Test-Path -Path msys64\var\cache\msys2.exe ) ) {
> > +  Invoke-WebRequest
> > +  "
> https://github.com/msys2/msys2-installer/releases/download/2021-07-25/msys2-base-x86_64-20210725.sfx.exe
> "
> > +  -outfile "msys64\var\cache\msys2.exe"
> > +}
> > +  - msys64\var\cache\msys2.exe -y
> > +  - ((Get-Content -path .\msys64\etc\\post-install\\07-pacman-key.post
> -Raw)
> > +  -replace '--refresh-keys', '--version') |
> > + Set-Content -Path ${CI_PROJECT_DIR}\msys64\etc\\post-install\\
> 07-pacman-key.post
> > +  - .\msys64\usr\bin\bash -lc "sed -i 's/^CheckSpace/#CheckSpace/g'
> /etc/pacman.conf"
> > +  - .\msys64\usr\bin\bash -lc 'pacman --noconfirm -Syuu'  # Core update
> > +  - .\msys64\usr\bin\bash -lc 'pacman --noconfirm -Syuu'  # Normal
> update
> > +  - taskkill /F /FI "MODULES eq msys-2.0.dll"
> > +
> > +msys2-64bit:
> > +  extends: .shared_msys2_builder
> > +  script:
> > +  - .\msys64\usr\bin\bash -lc "pacman -Sy --noconfirm --needed
> > +  diffutils git grep make sed
> > +  mingw-w64-x86_64-capstone
> [...]
>
>

-- 
Marc-André Lureau

Re: [PATCH] qmp: Stabilize preconfig

2021-11-15 Thread Paolo Bonzini

El lun., 15 nov. 2021 16:40, Markus Armbruster  escribió:

> > Why do you care?  For another example, you can use "reboot" or
> > "systemctl isolate reboot.target" and they end up doing the same thing.
> >
> > As long as qemu_init invokes qmp_machine_set, qmp_accel_set,
> > qmp_device_add, qmp_plugin_add, qmp_cont, etc. to do its job, the
> > difference between qemu-system-* and qemu-qmp-* is a couple thousands
> > lines of boring code that all but disappears once the VM is up and
> > running.  IOW, with the right design (e.g. shortcut options for QOM
> > properties good; dozens of global variables bad), there's absolutely no
> > issue with some people using qemu-system-* and some using qemu-qmp-*.
>
> I think maintaining two binaries forever is madness.  I want the old one
> to wither away.


This seems a bit dogmatic to me. The difference between the two binaries
would be literally a single file, which basically disappears before
anything interesting happens.

Making the new binary capable of serving all use cases should not be
> hard, just work (see my design sketch).  I expect the result to serve
> *better* than the mess we have now.
>

Most of the mess is in the implementation. Not all, granted. But overall
softmmu/vl.c's ugliness is mostly due to the layers of backwards
compatibility, and that wouldn't go away very soon.

 My point is that we still have quite a few commands without
>  'allow-preconfig' mostly because we are afraid of allowing them in
>  preconfig state, not because of true phase dependencies.
> >>>
> >>> I think there's very few of them, if any (outside the block layer for
> >>> which patches exist), and those are due to distraction more than fear.
> >>
> >> qapi/*.json has 216 commands, of which 26 carry 'allow-preconfig'.
> >
> > Well,
> > https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg01597.html
> > alone would more than double that. :)
> >
> > Places like machine.json, machine-target.json, migration.json,
> > replay.json have a lot of commands that are, obviously, almost entirely
> > not suitable for preconfig.  I don't think there are many commands left,
> > I'd guess maybe 30 (meaning that ~60% are done).
>
> My point is that "very few" is not literally true, and I think you just
> confirmed it ;)


Ok, let me rephrase that as "most of the missing ones are block-layer
relates". ;)

Paolo

Re: [PATCH] q35: flip acpi-pci-hotplug-with-bridge-support default back to off

2021-11-15 Thread Ani Sinha




On Thu, 11 Nov 2021, Gerd Hoffmann wrote:

> Switch qemu 6.2 back to 6.0 behavior (aka native pcie hotplug) because
> acpi hotplug for pcie ports caused all kinds of regressions and a fix
> for those is not in sight.
>
> Add compat property for 6.1 to keep it enabled there.  Use a separate
> compat property list so we can apply it to 6.1 only.

I think we are not going this route anymore. ACPI will continue to be the
default for 6.2 and future and we will continue to fix the issues on the
ACPI hotplug side. I see Michael sent PRs for some of the fixes from Igor
and Julia.

>
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/641
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2006409
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/acpi/ich9.c   |  2 +-
>  hw/i386/pc.c |  1 -
>  hw/i386/pc_q35.c | 14 +-
>  3 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
> index 1ee2ba2c508c..6e7d4c9eb54a 100644
> --- a/hw/acpi/ich9.c
> +++ b/hw/acpi/ich9.c
> @@ -427,7 +427,7 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs 
> *pm)
>  pm->disable_s3 = 0;
>  pm->disable_s4 = 0;
>  pm->s4_val = 2;
> -pm->use_acpi_hotplug_bridge = true;
> +pm->use_acpi_hotplug_bridge = false;
>
>  object_property_add_uint32_ptr(obj, ACPI_PM_PROP_PM_IO_BASE,
> >pm_io_base, OBJ_PROP_FLAG_READ);
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 2592a821486f..4fed82dafcf0 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -106,7 +106,6 @@ GlobalProperty pc_compat_6_0[] = {
>  { "qemu64" "-" TYPE_X86_CPU, "model", "6" },
>  { "qemu64" "-" TYPE_X86_CPU, "stepping", "3" },
>  { TYPE_X86_CPU, "x-vendor-cpuid-only", "off" },
> -{ "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" },
>  };
>  const size_t pc_compat_6_0_len = G_N_ELEMENTS(pc_compat_6_0);
>
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 797e09500b15..735dd3cff4ed 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -375,8 +375,20 @@ static void pc_q35_6_1_machine_options(MachineClass *m)
>  m->smp_props.prefer_sockets = true;
>  }
>
> +/* 6.1 only compat property (not applied to 6.0 + older) */
> +static GlobalProperty pc_compat_6_1_only[] = {
> +{ "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "on" },
> +};
> +static const size_t pc_compat_6_1_only_len = 
> G_N_ELEMENTS(pc_compat_6_1_only);
> +
> +static void pc_q35_6_1_only_machine_options(MachineClass *m)
> +{
> +pc_q35_6_1_machine_options(m);
> +compat_props_add(m->compat_props, pc_compat_6_1_only, 
> pc_compat_6_1_only_len);
> +}
> +
>  DEFINE_Q35_MACHINE(v6_1, "pc-q35-6.1", NULL,
> -   pc_q35_6_1_machine_options);
> +   pc_q35_6_1_only_machine_options);
>
>  static void pc_q35_6_0_machine_options(MachineClass *m)
>  {
> --
> 2.33.1
>
>

[PATCH] dump-guest-memory: Use BQL to protect dump finalize process

2021-11-15 Thread Peter Xu

When finalizing the dump-guest-memory with detached mode, we'll first set dump
status to either FAIL or COMPLETE before doing the cleanup, however right after
the dump status change it's possible that another dump-guest-memory qmp command
is sent so both the main thread and dump thread (which is during cleanup) could
be accessing dump state in paralell.  That's racy.

Fix it by protecting the finalizing phase of dump-guest-memory using BQL as
well.  To do that, we expand the BQL from dump_cleanup() into dump_process(),
so we will take the BQL longer than before.  The critical section must cover
the status switch from ACTIVE->{FAIL|COMPLETE} so as to make sure there's no
race any more.

We can also just introduce a specific mutex to serialize the dump process, but
BQL should be enough for now so far, not to mention vm_start() in dump_cleanup
will need BQL anyway, so maybe easier to just use the same mutex.

Reported-by: Laszlo Ersek 
Signed-off-by: Peter Xu 
---
 dump/dump.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 662d0a62cd..196b7b8ab9 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -96,13 +96,7 @@ static int dump_cleanup(DumpState *s)
 g_free(s->guest_note);
 s->guest_note = NULL;
 if (s->resume) {
-if (s->detached) {
-qemu_mutex_lock_iothread();
-}
 vm_start();
-if (s->detached) {
-qemu_mutex_unlock_iothread();
-}
 }
 migrate_del_blocker(dump_migration_blocker);
 
@@ -1873,6 +1867,11 @@ static void dump_process(DumpState *s, Error **errp)
 Error *local_err = NULL;
 DumpQueryResult *result = NULL;
 
+/*
+ * When running with detached mode, these operations are not run with BQL.
+ * It's still safe, because it's protected by setting s->state to ACTIVE,
+ * so dump_in_progress() check will block yet another dump-guest-memory.
+ */
 if (s->has_format && s->format == DUMP_GUEST_MEMORY_FORMAT_WIN_DMP) {
 #ifdef TARGET_X86_64
 create_win_dump(s, _err);
@@ -1883,6 +1882,15 @@ static void dump_process(DumpState *s, Error **errp)
 create_vmcore(s, _err);
 }
 
+/*
+ * Serialize the finalizing of dump process using BQL to make sure no
+ * concurrent access to DumpState is allowed.  BQL is also required for
+ * dump_cleanup as vm_start() needs it.
+ */
+if (s->detached) {
+qemu_mutex_lock_iothread();
+}
+
 /* make sure status is written after written_size updates */
 smp_wmb();
 qatomic_set(>status,
@@ -1898,6 +1906,10 @@ static void dump_process(DumpState *s, Error **errp)
 
 error_propagate(errp, local_err);
 dump_cleanup(s);
+
+if (s->detached) {
+qemu_mutex_unlock_iothread();
+}
 }
 
 static void *dump_thread(void *data)
-- 
2.32.0

Re: [PATCH v4 07/20] target/riscv: Adjust csr write mask with XLEN

2021-11-15 Thread Alistair Francis

On Fri, Nov 12, 2021 at 1:58 AM LIU Zhiwei  wrote:
>
> Write mask is representing the bits we care about.
>
> Signed-off-by: LIU Zhiwei 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/insn_trans/trans_rvi.c.inc | 4 ++--
>  target/riscv/op_helper.c| 3 ++-
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
> b/target/riscv/insn_trans/trans_rvi.c.inc
> index e51dbc41c5..40c81421f2 100644
> --- a/target/riscv/insn_trans/trans_rvi.c.inc
> +++ b/target/riscv/insn_trans/trans_rvi.c.inc
> @@ -486,7 +486,7 @@ static bool trans_csrrw(DisasContext *ctx, arg_csrrw *a)
>  return do_csrw(ctx, a->csr, src);
>  }
>
> -TCGv mask = tcg_constant_tl(-1);
> +TCGv mask = tcg_constant_tl(get_xl(ctx) == MXL_RV32 ? UINT32_MAX : -1);
>  return do_csrrw(ctx, a->rd, a->csr, src, mask);
>  }
>
> @@ -537,7 +537,7 @@ static bool trans_csrrwi(DisasContext *ctx, arg_csrrwi *a)
>  return do_csrw(ctx, a->csr, src);
>  }
>
> -TCGv mask = tcg_constant_tl(-1);
> +TCGv mask = tcg_constant_tl(get_xl(ctx) == MXL_RV32 ? UINT32_MAX : -1);
>  return do_csrrw(ctx, a->rd, a->csr, src, mask);
>  }
>
> diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
> index 095d39671b..561e156bec 100644
> --- a/target/riscv/op_helper.c
> +++ b/target/riscv/op_helper.c
> @@ -50,7 +50,8 @@ target_ulong helper_csrr(CPURISCVState *env, int csr)
>
>  void helper_csrw(CPURISCVState *env, int csr, target_ulong src)
>  {
> -RISCVException ret = riscv_csrrw(env, csr, NULL, src, -1);
> +target_ulong mask = cpu_get_xl(env) == MXL_RV32 ? UINT32_MAX : -1;
> +RISCVException ret = riscv_csrrw(env, csr, NULL, src, mask);
>
>  if (ret != RISCV_EXCP_NONE) {
>  riscv_raise_exception(env, ret, GETPC());
> --
> 2.25.1
>
>

Re: [PATCH v4 06/20] target/riscv: Relax debug check for pm write

2021-11-15 Thread Alistair Francis

On Fri, Nov 12, 2021 at 2:12 AM LIU Zhiwei  wrote:
>
> Signed-off-by: LIU Zhiwei 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/csr.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 9f41954894..74c0b788fd 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -1445,6 +1445,9 @@ static bool check_pm_current_disabled(CPURISCVState 
> *env, int csrno)
>  int csr_priv = get_field(csrno, 0x300);
>  int pm_current;
>
> +if (env->debugger) {
> +return false;
> +}
>  /*
>   * If priv lvls differ that means we're accessing csr from higher priv 
> lvl,
>   * so allow the access
> --
> 2.25.1
>
>

Re: [PATCH v4 05/20] target/riscv: Use gdb xml according to max mxlen

2021-11-15 Thread Alistair Francis

On Fri, Nov 12, 2021 at 1:54 AM LIU Zhiwei  wrote:
>
> Signed-off-by: LIU Zhiwei 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/gdbstub.c | 71 +++---
>  1 file changed, 52 insertions(+), 19 deletions(-)
>
> diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
> index 23429179e2..8d0f9139d7 100644
> --- a/target/riscv/gdbstub.c
> +++ b/target/riscv/gdbstub.c
> @@ -24,11 +24,23 @@ int riscv_cpu_gdb_read_register(CPUState *cs, GByteArray 
> *mem_buf, int n)
>  {
>  RISCVCPU *cpu = RISCV_CPU(cs);
>  CPURISCVState *env = >env;
> +target_ulong tmp;
>
>  if (n < 32) {
> -return gdb_get_regl(mem_buf, env->gpr[n]);
> +tmp = env->gpr[n];
>  } else if (n == 32) {
> -return gdb_get_regl(mem_buf, env->pc);
> +tmp = env->pc;
> +} else {
> +return 0;
> +}
> +
> +switch (env->misa_mxl_max) {
> +case MXL_RV32:
> +return gdb_get_reg32(mem_buf, tmp);
> +case MXL_RV64:
> +return gdb_get_reg64(mem_buf, tmp);
> +default:
> +g_assert_not_reached();
>  }
>  return 0;
>  }
> @@ -37,18 +49,32 @@ int riscv_cpu_gdb_write_register(CPUState *cs, uint8_t 
> *mem_buf, int n)
>  {
>  RISCVCPU *cpu = RISCV_CPU(cs);
>  CPURISCVState *env = >env;
> -
> -if (n == 0) {
> -/* discard writes to x0 */
> -return sizeof(target_ulong);
> -} else if (n < 32) {
> -env->gpr[n] = ldtul_p(mem_buf);
> -return sizeof(target_ulong);
> +int length = 0;
> +target_ulong tmp;
> +
> +switch (env->misa_mxl_max) {
> +case MXL_RV32:
> +tmp = (int32_t)ldl_p(mem_buf);
> +length = 4;
> +break;
> +case MXL_RV64:
> +if (cpu_get_xl(env) < MXL_RV64) {
> +tmp = (int32_t)ldq_p(mem_buf);
> +} else {
> +tmp = ldq_p(mem_buf);
> +}
> +length = 8;
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +if (n > 0 && n < 32) {
> +env->gpr[n] = tmp;
>  } else if (n == 32) {
> -env->pc = ldtul_p(mem_buf);
> -return sizeof(target_ulong);
> +env->pc = tmp;
>  }
> -return 0;
> +
> +return length;
>  }
>
>  static int riscv_gdb_get_fpu(CPURISCVState *env, GByteArray *buf, int n)
> @@ -198,13 +224,20 @@ void riscv_cpu_register_gdb_regs_for_features(CPUState 
> *cs)
>  gdb_register_coprocessor(cs, riscv_gdb_get_fpu, riscv_gdb_set_fpu,
>   36, "riscv-32bit-fpu.xml", 0);
>  }
> -#if defined(TARGET_RISCV32)
> -gdb_register_coprocessor(cs, riscv_gdb_get_virtual, 
> riscv_gdb_set_virtual,
> - 1, "riscv-32bit-virtual.xml", 0);
> -#elif defined(TARGET_RISCV64)
> -gdb_register_coprocessor(cs, riscv_gdb_get_virtual, 
> riscv_gdb_set_virtual,
> - 1, "riscv-64bit-virtual.xml", 0);
> -#endif
> +switch (env->misa_mxl_max) {
> +case MXL_RV32:
> +gdb_register_coprocessor(cs, riscv_gdb_get_virtual,
> + riscv_gdb_set_virtual,
> + 1, "riscv-32bit-virtual.xml", 0);
> +break;
> +case MXL_RV64:
> +gdb_register_coprocessor(cs, riscv_gdb_get_virtual,
> + riscv_gdb_set_virtual,
> + 1, "riscv-64bit-virtual.xml", 0);
> +break;
> +default:
> +g_assert_not_reached();
> +}
>
>  gdb_register_coprocessor(cs, riscv_gdb_get_csr, riscv_gdb_set_csr,
>   riscv_gen_dynamic_csr_xml(cs, cs->gdb_num_regs),
> --
> 2.25.1
>
>

RE: [PATCH 2/2] virtio-blk: support BLKSECDISCARD

2021-11-15 Thread Qi, Yadong

> The Linux block layer shares the DISCARD queue limits with SECDISCARD.
> That's different from BLKZEROOUT (QEMU's WRITE_ZEROES). This is a Linux
> implementation detail but I guess virtio-blk can share the DISCARD limits with
> SECDISCARD too...
> 
> > @@ -622,6 +628,7 @@ static int virtio_blk_handle_request(VirtIOBlockReq
> *req, MultiReqBuffer *mrb)
> >  unsigned out_num = req->elem.out_num;
> >  VirtIOBlock *s = req->dev;
> >  VirtIODevice *vdev = VIRTIO_DEVICE(s);
> > +bool is_secdiscard = false;
> >
> >  if (req->elem.out_num < 1 || req->elem.in_num < 1) {
> >  virtio_error(vdev, "virtio-blk missing headers"); @@ -722,6
> > +729,9 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req,
> MultiReqBuffer *mrb)
> >   * VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch 
> > statement,
> >   * so we must mask it for these requests, then we will check if it is 
> > set.
> >   */
> > +case VIRTIO_BLK_T_SECDISCARD & ~VIRTIO_BLK_T_OUT:
> > +is_secdiscard = true;
> > +__attribute__((fallthrough));
> 
> The DISCARD case doesn't use __attribute__((fallthrough)) so this is 
> inconsistent.
> QEMU doesn't use __attribute__((fallthrough)) so I suggest dropping this.
Sure, will try to drop the fallthrough case.

> 
> > diff --git a/include/standard-headers/linux/virtio_blk.h
> > b/include/standard-headers/linux/virtio_blk.h
> > index 2dcc90826a..c55a07840c 100644
> > --- a/include/standard-headers/linux/virtio_blk.h
> > +++ b/include/standard-headers/linux/virtio_blk.h
> > @@ -40,6 +40,7 @@
> >  #define VIRTIO_BLK_F_MQ12  /* support more than one vq
> */
> >  #define VIRTIO_BLK_F_DISCARD   13  /* DISCARD is supported */
> >  #define VIRTIO_BLK_F_WRITE_ZEROES  14  /* WRITE ZEROES is
> supported */
> > +#define VIRTIO_BLK_F_SECDISCARD15  /* WRITE ZEROES is supported
> */
> 
> The comment is copy-pasted from WRITE_ZEROES.
Will fix it.

RE: [PATCH 1/2] block:hdev: support BLKSECDISCARD

2021-11-15 Thread Qi, Yadong

> > Add a new option "secdiscard" for block drive. Parse it and record it
> > in bs->open_flags as bit(BDRV_O_SECDISCARD).
> >
> > Add a new BDRV_REQ_SECDISCARD bit for secure discard request from
> > virtual device.
> >
> > For host_device backend: implement by ioctl(BLKSECDISCARD) on real
> > host block device.
> > For other backend, no implementation.
> >
> > E.g.:
> > qemu-system-x86_64 \
> > ... \
> > -drive
> file=/dev/mmcblk0p2,if=none,format=raw,discard=on,secdiscard=on,id=sd0 \
> > -device virtio-blk-pci,drive=sd0,id=sd0_vblk \
> > ...
> 
> I'm curious why there is explicit control over this feature (-drive
> secdiscard=on|off). For example, is there a reason why users would want to
> disable secdiscard on a drive that supports it? I guess there is no way to 
> emulate
> it correctly so secdiscard=on on a drive that doesn't support it produces an 
> error?

Yes. AFAIK, there is no way to emulate a "secure" discard. But it seems also 
hard to
detect whether a host drive support secdiscard unless it really perform a real
secdiscard. So I choose to add an option for user to enable it for virtual 
block.
There is an assumption that user knows whether host support secdiscard.


Best Regard
Yadong

RE: [PATCH 1/2] block:hdev: support BLKSECDISCARD

2021-11-15 Thread Qi, Yadong

> 
> Notably absent: qapi/block-core.json. Without changing this, the option can't 
> be
> available in -blockdev, which is the primary option to configure block device
> backends.
> 
> This patch seems to contain multiple logical changes that should be split into
> separate patches:
> 
> * Adding a flags parameter to .bdrv_co_pdiscard
> 
> * Support for the new feature in the core block layer (should be done
>   with -blockdev)
> 
> * Convenience magic for -drive (BDRV_O_SECDISCARD). It's not clear that
>   this should be done at all because the option is really specific to
>   one single block driver (file-posix). I think in your patch, all
>   other block drivers silently ignore the option, which is not what we
>   want.
Sorry for the absent for -blockdev. Will try add that.
Also I will try to split the patches.
And for the BDRV_O_SECDISCARD, it is specific for file-posix.c(host_device). 
Maybe
I need to add the option only for file-posix.c.

> 
> > diff --git a/block.c b/block.c
> > index 580cb77a70..4f05e96d12 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -1128,6 +1128,32 @@ int bdrv_parse_discard_flags(const char *mode,
> int *flags)
> >  return 0;
> >  }
> >
> > +/**
> > + * Set open flags for a given secdiscard mode
> > + *
> > + * Return 0 on success, -1 if the secdiscard mode was invalid.
> > + */
> > +int bdrv_parse_secdiscard_flags(const char *mode, int *flags, Error
> > +**errp) {
> > +*flags &= ~BDRV_O_SECDISCARD;
> > +
> > +if (!strcmp(mode, "off")) {
> > +/* do nothing */
> > +} else if (!strcmp(mode, "on")) {
> > +if (!(*flags & BDRV_O_UNMAP)) {
> > +error_setg(errp, "cannot enable secdiscard when discard is "
> > + "disabled!");
> > +return -1;
> > +}
> 
> This check has nothing to do with parsing the option, it's validating its 
> value.
> 
> You don't even need a new function to parse it, because there is already
> qemu_opt_get_bool(). Duplicating it means only that you're inconsistent with
> other boolean options, which alternatively accept "yes"/"no", "true"/"false",
> "y/n".
Sure. Will use qemu_opt_get_bool() instead.

> 
> > +
> > +*flags |= BDRV_O_SECDISCARD;
> > +} else {
> > +return -1;
> > +}
> > +
> > +return 0;
> > +}
> > +
> >  /**
> >   * Set open flags for a given cache mode
> >   *
> > @@ -1695,6 +1721,11 @@ QemuOptsList bdrv_runtime_opts = {
> >  .type = QEMU_OPT_STRING,
> >  .help = "discard operation (ignore/off, unmap/on)",
> >  },
> > +{
> > +.name = BDRV_OPT_SECDISCARD,
> > +.type = QEMU_OPT_STRING,
> > +.help = "secure discard operation (off, on)",
> > +},
> >  {
> >  .name = BDRV_OPT_FORCE_SHARE,
> >  .type = QEMU_OPT_BOOL,
> > @@ -1735,6 +1766,7 @@ static int bdrv_open_common(BlockDriverState *bs,
> BlockBackend *file,
> >  const char *driver_name = NULL;
> >  const char *node_name = NULL;
> >  const char *discard;
> > +const char *secdiscard;
> >  QemuOpts *opts;
> >  BlockDriver *drv;
> >  Error *local_err = NULL;
> > @@ -1829,6 +1861,16 @@ static int bdrv_open_common(BlockDriverState
> *bs, BlockBackend *file,
> >  }
> >  }
> >
> > +
> > +secdiscard = qemu_opt_get(opts, BDRV_OPT_SECDISCARD);
> > +if (secdiscard != NULL) {
> > +if (bdrv_parse_secdiscard_flags(secdiscard, >open_flags,
> > +errp) != 0) {
> > +ret = -EINVAL;
> > +goto fail_opts;
> > +}
> > +}
> > +
> >  bs->detect_zeroes =
> >  bdrv_parse_detect_zeroes(opts, bs->open_flags, _err);
> >  if (local_err) {
> > @@ -3685,6 +3727,10 @@ static BlockDriverState *bdrv_open_inherit(const
> char *filename,
> > , options, flags, options);
> >  }
> >
> > +if (g_strcmp0(qdict_get_try_str(options, BDRV_OPT_SECDISCARD), "on")) {
> > +flags |= BDRV_O_SECDISCARD;
> 
> Indentation is off.
Will fix it.

> 
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -160,6 +160,7 @@ typedef struct BDRVRawState {
> >  bool is_xfs:1;
> >  #endif
> >  bool has_discard:1;
> > +bool has_secdiscard:1;
> >  bool has_write_zeroes:1;
> >  bool discard_zeroes:1;
> >  bool use_linux_aio:1;
> 
> has_secdiscard is only set to false in raw_open_common() and never changed or
> used.
Will remove it.

> 
> > @@ -727,6 +728,7 @@ static int raw_open_common(BlockDriverState *bs,
> > QDict *options,  #endif /* !defined(CONFIG_LINUX_IO_URING) */
> >
> >  s->has_discard = true;
> > +s->has_secdiscard = false;
> >  s->has_write_zeroes = true;
> >  if ((bs->open_flags & BDRV_O_NOCACHE) != 0 && !dio_byte_aligned(s->fd))
> {
> >  s->needs_alignment = true;
> > @@ -765,6 +767,7 @@ static int raw_open_common(BlockDriverState *bs,
> QDict *options,
> >

RE: [PATCH 2/2] virtio-blk: support BLKSECDISCARD

2021-11-15 Thread Qi, Yadong

> On Mon, Nov 15, 2021 at 12:52:00PM +0800, yadong...@intel.com wrote:
> > From: Yadong Qi 
> >
> > Add new virtio feature: VIRTIO_BLK_F_SECDISCARD.
> > Add new virtio command: VIRTIO_BLK_T_SECDISCARD.
> >
> > This feature is disabled by default, it will check the backend
> > bs->open_flags & BDRV_O_SECDISCARD, enable it if BDRV_O_SECDISCARD
> > is supported.
> >
> > Signed-off-by: Yadong Qi 
> > ---
> >  hw/block/virtio-blk.c   | 26 +
> >  include/standard-headers/linux/virtio_blk.h |  4 
> 
> 
> Any changes to standard headers need to go to linux and virtio TC.
Sure. I will draft proposal to virtio-comment for review.

RE: [PATCH 2/2] virtio-blk: support BLKSECDISCARD

2021-11-15 Thread Qi, Yadong

> >Add new virtio feature: VIRTIO_BLK_F_SECDISCARD.
> >Add new virtio command: VIRTIO_BLK_T_SECDISCARD.
> 
> Has a proposal been sent out yet to virtio-comment mailing list for discussing
> these specification changes?
> 
Not yet. I will draft a proposal to virtio-comment if no big concern of this 
patch
>From maintainer.
> >
> >diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index
> >dbc4c5a3cd..7bc3484521 100644
> >--- a/hw/block/virtio-blk.c
> >+++ b/hw/block/virtio-blk.c
> >@@ -536,7 +536,8 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock
> >*dev,  }
> >
> > static uint8_t virtio_blk_handle_discard_write_zeroes(VirtIOBlockReq *req,
> >-struct virtio_blk_discard_write_zeroes *dwz_hdr, bool is_write_zeroes)
> >+struct virtio_blk_discard_write_zeroes *dwz_hdr, bool is_write_zeroes,
> >+bool is_secdiscard)
> 
> Since the function now handles 3 commands, I'm thinking if it's better to pass
> the command directly and then make a switch instead of using 2 booleans.
> 
Make sense.

> > {
> > VirtIOBlock *s = req->dev;
> > VirtIODevice *vdev = VIRTIO_DEVICE(s); @@ -577,8 +578,8 @@ static
> >uint8_t virtio_blk_handle_discard_write_zeroes(VirtIOBlockReq *req,
> > goto err;
> > }
> >
> >+int blk_aio_flags = 0;
> 
> Maybe better to move it to the beginning of the function.
Sure.

> 
> >
> >-blk_aio_pdiscard(s->blk, sector << BDRV_SECTOR_BITS, bytes, 0,
> >+if (is_secdiscard) {
> >+blk_aio_flags |= BDRV_REQ_SECDISCARD;
> >+}
> >+
> >+blk_aio_pdiscard(s->blk, sector << BDRV_SECTOR_BITS, bytes,
> >+ blk_aio_flags,
> >  virtio_blk_discard_write_zeroes_complete, req);
> > }
> >
> >@@ -622,6 +628,7 @@ static int virtio_blk_handle_request(VirtIOBlockReq
> *req, MultiReqBuffer *mrb)
> > unsigned out_num = req->elem.out_num;
> > VirtIOBlock *s = req->dev;
> > VirtIODevice *vdev = VIRTIO_DEVICE(s);
> >+bool is_secdiscard = false;
> >
> > if (req->elem.out_num < 1 || req->elem.in_num < 1) {
> > virtio_error(vdev, "virtio-blk missing headers"); @@ -722,6
> >+729,9 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req,
> MultiReqBuffer *mrb)
> >  * VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch 
> > statement,
> >  * so we must mask it for these requests, then we will check if it is 
> > set.
> >  */
> >+case VIRTIO_BLK_T_SECDISCARD & ~VIRTIO_BLK_T_OUT:
> >+is_secdiscard = true;
> >+__attribute__((fallthrough));
> 
> We can use QEMU_FALLTHROUGH here.
Sure.

> 
> >
> > err_status = virtio_blk_handle_discard_write_zeroes(req, _hdr,
> >-
> >is_write_zeroes);
> >+is_write_zeroes,
> >+
> >+ is_secdiscard);


> >
> >+if (blk_get_flags(conf->conf.blk) & BDRV_O_SECDISCARD)
> >+virtio_add_feature(>host_features,
> >VIRTIO_BLK_F_SECDISCARD);
> >+else
> >+virtio_clear_feature(>host_features,
> >+ VIRTIO_BLK_F_SECDISCARD);
> >+
> 
> IIUC here we set or not the feature if BDRV_O_SECDISCARD is set.
> 
> Should we keep it disabled if "secdiscard" is false? (e.g. to avoid migration
> problems)
Yes, BDRV_O_SECDISCARD=(secdiscard=="on") ? 1 : 0;

> 
> Otherwise what is the purpose of the "secdiscard" property?
I cannot find a good method to detect whether host device support BLKSECDISCARD.
So I add this "secdiscard" property to explicitly enable this feature.

Best Regard
Yadong
> 
> Thanks,
> Stefano

Re: [PATCH v9 68/76] target/riscv: gdb: support vector registers for rv64 & rv32

2021-11-15 Thread Alistair Francis

On Fri, Oct 29, 2021 at 8:24 PM  wrote:
>
> From: Hsiangkai Wang 
>
> Signed-off-by: Hsiangkai Wang 
> Signed-off-by: Greentime Hu 
> Signed-off-by: Frank Chang 

Acked-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c |   2 +
>  target/riscv/cpu.h |   1 +
>  target/riscv/gdbstub.c | 184 +
>  3 files changed, 187 insertions(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index c760ea08621..860f356bd99 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -675,6 +675,8 @@ static const char *riscv_gdb_get_dynamic_xml(CPUState 
> *cs, const char *xmlname)
>
>  if (strcmp(xmlname, "riscv-csr.xml") == 0) {
>  return cpu->dyn_csr_xml;
> +} else if (strcmp(xmlname, "riscv-vector.xml") == 0) {
> +return cpu->dyn_vreg_xml;
>  }
>
>  return NULL;
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 5d93ccdfa71..dc10f27093b 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -291,6 +291,7 @@ struct RISCVCPU {
>  CPURISCVState env;
>
>  char *dyn_csr_xml;
> +char *dyn_vreg_xml;
>
>  /* Configuration Settings */
>  struct {
> diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
> index 23429179e2e..881ab333924 100644
> --- a/target/riscv/gdbstub.c
> +++ b/target/riscv/gdbstub.c
> @@ -20,6 +20,32 @@
>  #include "exec/gdbstub.h"
>  #include "cpu.h"
>
> +struct TypeSize {
> +const char *gdb_type;
> +const char *id;
> +int size;
> +const char suffix;
> +};
> +
> +static const struct TypeSize vec_lanes[] = {
> +/* quads */
> +{ "uint128", "quads", 128, 'q' },
> +/* 64 bit */
> +{ "uint64", "longs", 64, 'l' },
> +/* 32 bit */
> +{ "uint32", "words", 32, 'w' },
> +/* 16 bit */
> +{ "uint16", "shorts", 16, 's' },
> +/*
> + * TODO: currently there is no reliable way of telling
> + * if the remote gdb actually understands ieee_half so
> + * we don't expose it in the target description for now.
> + * { "ieee_half", 16, 'h', 'f' },
> + */
> +/* bytes */
> +{ "uint8", "bytes", 8, 'b' },
> +};
> +
>  int riscv_cpu_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n)
>  {
>  RISCVCPU *cpu = RISCV_CPU(cs);
> @@ -101,6 +127,96 @@ static int riscv_gdb_set_fpu(CPURISCVState *env, uint8_t 
> *mem_buf, int n)
>  return 0;
>  }
>
> +/*
> + * Convert register index number passed by GDB to the correspond
> + * vector CSR number. Vector CSRs are defined after vector registers
> + * in dynamic generated riscv-vector.xml, thus the starting register index
> + * of vector CSRs is 32.
> + * Return 0 if register index number is out of range.
> + */
> +static int riscv_gdb_vector_csrno(int num_regs)
> +{
> +/*
> + * The order of vector CSRs in the switch case
> + * should match with the order defined in csr_ops[].
> + */
> +switch (num_regs) {
> +case 32:
> +return CSR_VSTART;
> +case 33:
> +return CSR_VXSAT;
> +case 34:
> +return CSR_VXRM;
> +case 35:
> +return CSR_VCSR;
> +case 36:
> +return CSR_VL;
> +case 37:
> +return CSR_VTYPE;
> +case 38:
> +return CSR_VLENB;
> +default:
> +/* Unknown register. */
> +return 0;
> +}
> +}
> +
> +static int riscv_gdb_get_vector(CPURISCVState *env, GByteArray *buf, int n)
> +{
> +uint16_t vlenb = env_archcpu(env)->cfg.vlen >> 3;
> +if (n < 32) {
> +int i;
> +int cnt = 0;
> +for (i = 0; i < vlenb; i += 8) {
> +cnt += gdb_get_reg64(buf,
> + env->vreg[(n * vlenb + i) / 8]);
> +}
> +return cnt;
> +}
> +
> +int csrno = riscv_gdb_vector_csrno(n);
> +
> +if (!csrno) {
> +return 0;
> +}
> +
> +target_ulong val = 0;
> +int result = riscv_csrrw_debug(env, csrno, , 0, 0);
> +
> +if (result == 0) {
> +return gdb_get_regl(buf, val);
> +}
> +
> +return 0;
> +}
> +
> +static int riscv_gdb_set_vector(CPURISCVState *env, uint8_t *mem_buf, int n)
> +{
> +uint16_t vlenb = env_archcpu(env)->cfg.vlen >> 3;
> +if (n < 32) {
> +int i;
> +for (i = 0; i < vlenb; i += 8) {
> +env->vreg[(n * vlenb + i) / 8] = ldq_p(mem_buf + i);
> +}
> +return vlenb;
> +}
> +
> +int csrno = riscv_gdb_vector_csrno(n);
> +
> +if (!csrno) {
> +return 0;
> +}
> +
> +target_ulong val = ldtul_p(mem_buf);
> +int result = riscv_csrrw_debug(env, csrno, NULL, val, -1);
> +
> +if (result == 0) {
> +return sizeof(target_ulong);
> +}
> +
> +return 0;
> +}
> +
>  static int riscv_gdb_get_csr(CPURISCVState *env, GByteArray *buf, int n)
>  {
>  if (n < CSR_TABLE_SIZE) {
> @@ -187,6 +303,68 @@ static int riscv_gen_dynamic_csr_xml(CPUState *cs, int 
> base_reg)
>  return CSR_TABLE_SIZE;
>  }
>
> +static int

Re: [PATCH v4 04/20] target/riscv: Extend pc for runtime pc write

2021-11-15 Thread Alistair Francis

On Fri, Nov 12, 2021 at 2:01 AM LIU Zhiwei  wrote:
>
> In some cases, we must restore the guest PC to the address of the start of
> the TB, such as when the instruction counter hits zero. So extend pc register
> according to current xlen for these cases.
>
> Signed-off-by: LIU Zhiwei 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c| 22 +++---
>  target/riscv/cpu.h|  2 ++
>  target/riscv/cpu_helper.c |  2 +-
>  3 files changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index f812998123..0d2d175fa2 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -319,7 +319,12 @@ static void riscv_cpu_set_pc(CPUState *cs, vaddr value)
>  {
>  RISCVCPU *cpu = RISCV_CPU(cs);
>  CPURISCVState *env = >env;
> -env->pc = value;
> +
> +if (cpu_get_xl(env) == MXL_RV32) {
> +env->pc = (int32_t)value;
> +} else {
> +env->pc = value;
> +}
>  }
>
>  static void riscv_cpu_synchronize_from_tb(CPUState *cs,
> @@ -327,7 +332,13 @@ static void riscv_cpu_synchronize_from_tb(CPUState *cs,
>  {
>  RISCVCPU *cpu = RISCV_CPU(cs);
>  CPURISCVState *env = >env;
> -env->pc = tb->pc;
> +RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
> +
> +if (xl == MXL_RV32) {
> +env->pc = (int32_t)tb->pc;
> +} else {
> +env->pc = tb->pc;
> +}
>  }
>
>  static bool riscv_cpu_has_work(CPUState *cs)
> @@ -348,7 +359,12 @@ static bool riscv_cpu_has_work(CPUState *cs)
>  void restore_state_to_opc(CPURISCVState *env, TranslationBlock *tb,
>target_ulong *data)
>  {
> -env->pc = data[0];
> +RISCVMXL xl = FIELD_EX32(tb->flags, TB_FLAGS, XL);
> +if (xl == MXL_RV32) {
> +env->pc = (int32_t)data[0];
> +} else {
> +env->pc = data[0];
> +}
>  }
>
>  static void riscv_cpu_reset(DeviceState *dev)
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 0760c0af93..8befff0166 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -420,6 +420,8 @@ static inline RISCVMXL riscv_cpu_mxl(CPURISCVState *env)
>  }
>  #endif
>
> +RISCVMXL cpu_get_xl(CPURISCVState *env);
> +
>  /*
>   * A simplification for VLMAX
>   * = (1 << LMUL) * VLEN / (8 * (1 << SEW))
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 4c048cc266..79aba9c880 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -35,7 +35,7 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
>  #endif
>  }
>
> -static RISCVMXL cpu_get_xl(CPURISCVState *env)
> +RISCVMXL cpu_get_xl(CPURISCVState *env)
>  {
>  #if defined(TARGET_RISCV32)
>  return MXL_RV32;
> --
> 2.25.1
>
>

Re: [PATCH v9 74/76] target/riscv: rvv-1.0: add vector unit-stride mask load/store insns

2021-11-15 Thread Alistair Francis

On Fri, Oct 29, 2021 at 8:38 PM  wrote:
>
> From: Frank Chang 
>
> Signed-off-by: Frank Chang 

Acked-by: Alistair Francis 

Alistair

> ---
>  target/riscv/helper.h   |  2 ++
>  target/riscv/insn32.decode  |  4 +++
>  target/riscv/insn_trans/trans_rvv.c.inc | 40 +
>  target/riscv/vector_helper.c| 21 +
>  4 files changed, 67 insertions(+)
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index ab283d12b79..6e58343af35 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -129,6 +129,8 @@ DEF_HELPER_5(vse8_v_mask, void, ptr, ptr, tl, env, i32)
>  DEF_HELPER_5(vse16_v_mask, void, ptr, ptr, tl, env, i32)
>  DEF_HELPER_5(vse32_v_mask, void, ptr, ptr, tl, env, i32)
>  DEF_HELPER_5(vse64_v_mask, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vlm_v, void, ptr, ptr, tl, env, i32)
> +DEF_HELPER_5(vsm_v, void, ptr, ptr, tl, env, i32)
>  DEF_HELPER_6(vlse8_v, void, ptr, ptr, tl, tl, env, i32)
>  DEF_HELPER_6(vlse16_v, void, ptr, ptr, tl, tl, env, i32)
>  DEF_HELPER_6(vlse32_v, void, ptr, ptr, tl, tl, env, i32)
> diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
> index 3b6524bad91..1a4a2871464 100644
> --- a/target/riscv/insn32.decode
> +++ b/target/riscv/insn32.decode
> @@ -305,6 +305,10 @@ vse16_v... 000 . 0 . 101 . 0100111 
> @r2_nfvm
>  vse32_v... 000 . 0 . 110 . 0100111 @r2_nfvm
>  vse64_v... 000 . 0 . 111 . 0100111 @r2_nfvm
>
> +# Vector unit-stride mask load/store insns.
> +vlm_v  000 000 1 01011 . 000 . 111 @r2
> +vsm_v  000 000 1 01011 . 000 . 0100111 @r2
> +
>  # Vector strided insns.
>  vlse8_v ... 010 . . . 000 . 111 @r_nfvm
>  vlse16_v... 010 . . . 101 . 111 @r_nfvm
> diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
> b/target/riscv/insn_trans/trans_rvv.c.inc
> index e540b5d33c2..97b1dc10265 100644
> --- a/target/riscv/insn_trans/trans_rvv.c.inc
> +++ b/target/riscv/insn_trans/trans_rvv.c.inc
> @@ -697,6 +697,46 @@ GEN_VEXT_TRANS(vse16_v, MO_16, r2nfvm, st_us_op, 
> st_us_check)
>  GEN_VEXT_TRANS(vse32_v, MO_32, r2nfvm, st_us_op, st_us_check)
>  GEN_VEXT_TRANS(vse64_v, MO_64, r2nfvm, st_us_op, st_us_check)
>
> +/*
> + *** unit stride mask load and store
> + */
> +static bool ld_us_mask_op(DisasContext *s, arg_vlm_v *a, uint8_t eew)
> +{
> +uint32_t data = 0;
> +gen_helper_ldst_us *fn = gen_helper_vlm_v;
> +
> +/* EMUL = 1, NFIELDS = 1 */
> +data = FIELD_DP32(data, VDATA, LMUL, 0);
> +data = FIELD_DP32(data, VDATA, NF, 1);
> +return ldst_us_trans(a->rd, a->rs1, data, fn, s, false);
> +}
> +
> +static bool ld_us_mask_check(DisasContext *s, arg_vlm_v *a, uint8_t eew)
> +{
> +/* EMUL = 1, NFIELDS = 1 */
> +return require_rvv(s) && vext_check_isa_ill(s);
> +}
> +
> +static bool st_us_mask_op(DisasContext *s, arg_vsm_v *a, uint8_t eew)
> +{
> +uint32_t data = 0;
> +gen_helper_ldst_us *fn = gen_helper_vsm_v;
> +
> +/* EMUL = 1, NFIELDS = 1 */
> +data = FIELD_DP32(data, VDATA, LMUL, 0);
> +data = FIELD_DP32(data, VDATA, NF, 1);
> +return ldst_us_trans(a->rd, a->rs1, data, fn, s, true);
> +}
> +
> +static bool st_us_mask_check(DisasContext *s, arg_vsm_v *a, uint8_t eew)
> +{
> +/* EMUL = 1, NFIELDS = 1 */
> +return require_rvv(s) && vext_check_isa_ill(s);
> +}
> +
> +GEN_VEXT_TRANS(vlm_v, MO_8, vlm_v, ld_us_mask_op, ld_us_mask_check)
> +GEN_VEXT_TRANS(vsm_v, MO_8, vsm_v, st_us_mask_op, st_us_mask_check)
> +
>  /*
>   *** stride load and store
>   */
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 83373ca6fc6..4c1a1310e63 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -345,6 +345,27 @@ GEN_VEXT_ST_US(vse16_v, int16_t, ste_h)
>  GEN_VEXT_ST_US(vse32_v, int32_t, ste_w)
>  GEN_VEXT_ST_US(vse64_v, int64_t, ste_d)
>
> +/*
> + *** unit stride mask load and store, EEW = 1
> + */
> +void HELPER(vlm_v)(void *vd, void *v0, target_ulong base,
> +CPURISCVState *env, uint32_t desc)
> +{
> +/* evl = ceil(vl/8) */
> +uint8_t evl = (env->vl + 7) >> 3;
> +vext_ldst_us(vd, base, env, desc, lde_b,
> + 0, evl, GETPC(), MMU_DATA_LOAD);
> +}
> +
> +void HELPER(vsm_v)(void *vd, void *v0, target_ulong base,
> +CPURISCVState *env, uint32_t desc)
> +{
> +/* evl = ceil(vl/8) */
> +uint8_t evl = (env->vl + 7) >> 3;
> +vext_ldst_us(vd, base, env, desc, ste_b,
> + 0, evl, GETPC(), MMU_DATA_STORE);
> +}
> +
>  /*
>   *** index: access vector element from indexed memory
>   */
> --
> 2.25.1
>
>

Re: [PATCH v9 73/76] target/riscv: rvv-1.0: add evl parameter to vext_ldst_us()

2021-11-15 Thread Alistair Francis

On Fri, Oct 29, 2021 at 8:11 PM  wrote:
>
> From: Frank Chang 
>
> Add supports of Vector unit-stride mask load/store instructions
> (vlm.v, vsm.v), which has:
> evl (effective vector length) = ceil(env->vl / 8).
>
> The new instructions operate the same as unmasked byte loads and stores.
> Add evl parameter to reuse vext_ldst_us().
>
> Signed-off-by: Frank Chang 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/vector_helper.c | 36 ++--
>  1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 946dca53ffd..83373ca6fc6 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -279,15 +279,15 @@ GEN_VEXT_ST_STRIDE(vsse64_v, int64_t, ste_d)
>  /* unmasked unit-stride load and store operation*/
>  static void
>  vext_ldst_us(void *vd, target_ulong base, CPURISCVState *env, uint32_t desc,
> - vext_ldst_elem_fn *ldst_elem,
> - uint32_t esz, uintptr_t ra, MMUAccessType access_type)
> + vext_ldst_elem_fn *ldst_elem, uint32_t esz, uint32_t evl,
> + uintptr_t ra, MMUAccessType access_type)
>  {
>  uint32_t i, k;
>  uint32_t nf = vext_nf(desc);
>  uint32_t max_elems = vext_max_elems(desc, esz);
>
>  /* load bytes from guest memory */
> -for (i = env->vstart; i < env->vl; i++, env->vstart++) {
> +for (i = env->vstart; i < evl; i++, env->vstart++) {
>  k = 0;
>  while (k < nf) {
>  target_ulong addr = base + ((i * nf + k) << esz);
> @@ -316,7 +316,7 @@ void HELPER(NAME)(void *vd, void *v0, target_ulong base,  
>   \
>CPURISCVState *env, uint32_t desc)\
>  {   \
>  vext_ldst_us(vd, base, env, desc, LOAD_FN,  \
> - ctzl(sizeof(ETYPE)), GETPC(), MMU_DATA_LOAD);  \
> + ctzl(sizeof(ETYPE)), env->vl, GETPC(), MMU_DATA_LOAD); \
>  }
>
>  GEN_VEXT_LD_US(vle8_v,  int8_t,  lde_b)
> @@ -324,20 +324,20 @@ GEN_VEXT_LD_US(vle16_v, int16_t, lde_h)
>  GEN_VEXT_LD_US(vle32_v, int32_t, lde_w)
>  GEN_VEXT_LD_US(vle64_v, int64_t, lde_d)
>
> -#define GEN_VEXT_ST_US(NAME, ETYPE, STORE_FN)   \
> -void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base, \
> - CPURISCVState *env, uint32_t desc) \
> -{   \
> -uint32_t stride = vext_nf(desc) << ctzl(sizeof(ETYPE)); \
> -vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,  \
> - ctzl(sizeof(ETYPE)), GETPC(), MMU_DATA_STORE); \
> -}   \
> -\
> -void HELPER(NAME)(void *vd, void *v0, target_ulong base,\
> -  CPURISCVState *env, uint32_t desc)\
> -{   \
> -vext_ldst_us(vd, base, env, desc, STORE_FN, \
> - ctzl(sizeof(ETYPE)), GETPC(), MMU_DATA_STORE); \
> +#define GEN_VEXT_ST_US(NAME, ETYPE, STORE_FN)\
> +void HELPER(NAME##_mask)(void *vd, void *v0, target_ulong base,  \
> + CPURISCVState *env, uint32_t desc)  \
> +{\
> +uint32_t stride = vext_nf(desc) << ctzl(sizeof(ETYPE));  \
> +vext_ldst_stride(vd, v0, base, stride, env, desc, false, STORE_FN,   \
> + ctzl(sizeof(ETYPE)), GETPC(), MMU_DATA_STORE);  \
> +}\
> + \
> +void HELPER(NAME)(void *vd, void *v0, target_ulong base, \
> +  CPURISCVState *env, uint32_t desc) \
> +{\
> +vext_ldst_us(vd, base, env, desc, STORE_FN,  \
> + ctzl(sizeof(ETYPE)), env->vl, GETPC(), MMU_DATA_STORE); \
>  }
>
>  GEN_VEXT_ST_US(vse8_v,  int8_t,  ste_b)
> --
> 2.25.1
>
>

Re: [PATCH v9 67/76] target/riscv: rvv-1.0: trigger illegal instruction exception if frm is not valid

2021-11-15 Thread Alistair Francis

On Fri, Oct 29, 2021 at 8:14 PM  wrote:
>
> From: Frank Chang 
>
> If the frm field contains an invalid rounding mode (101-111),
> attempting to execute any vector floating-point instruction, even
> those that do not depend on the rounding mode, will raise an illegal
> instruction exception.
>
> Call gen_set_rm() with DYN rounding mode to check and trigger illegal
> instruction exception if frm field contains invalid value at run-time
> for vector floating-point instructions.
>
> Signed-off-by: Frank Chang 

Acked-by: Alistair Francis 

Alistair

> ---
>  target/riscv/insn_trans/trans_rvv.c.inc | 22 ++
>  1 file changed, 22 insertions(+)
>
> diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
> b/target/riscv/insn_trans/trans_rvv.c.inc
> index 7589c8ce32a..53c8573f117 100644
> --- a/target/riscv/insn_trans/trans_rvv.c.inc
> +++ b/target/riscv/insn_trans/trans_rvv.c.inc
> @@ -2374,6 +2374,10 @@ static bool do_opfv(DisasContext *s, arg_rmr *a,
>  int rm)
>  {
>  if (checkfn(s, a)) {
> +if (rm != RISCV_FRM_DYN) {
> +gen_set_rm(s, RISCV_FRM_DYN);
> +}
> +
>  uint32_t data = 0;
>  TCGLabel *over = gen_new_label();
>  gen_set_rm(s, rm);
> @@ -2459,6 +2463,8 @@ static bool trans_vfmv_v_f(DisasContext *s, 
> arg_vfmv_v_f *a)
>  require_rvf(s) &&
>  vext_check_isa_ill(s) &&
>  require_align(a->rd, s->lmul)) {
> +gen_set_rm(s, RISCV_FRM_DYN);
> +
>  TCGv_i64 t1;
>
>  if (s->vl_eq_vlmax) {
> @@ -2540,6 +2546,10 @@ static bool opfv_widen_check(DisasContext *s, arg_rmr 
> *a)
>  static bool trans_##NAME(DisasContext *s, arg_rmr *a)  \
>  {  \
>  if (opfv_widen_check(s, a)) {  \
> +if (FRM != RISCV_FRM_DYN) {\
> +gen_set_rm(s, RISCV_FRM_DYN);  \
> +}  \
> +   \
>  uint32_t data = 0; \
>  static gen_helper_gvec_3_ptr * const fns[2] = {\
>  gen_helper_##HELPER##_h,   \
> @@ -2627,6 +2637,10 @@ static bool opfv_narrow_check(DisasContext *s, arg_rmr 
> *a)
>  static bool trans_##NAME(DisasContext *s, arg_rmr *a)  \
>  {  \
>  if (opfv_narrow_check(s, a)) { \
> +if (FRM != RISCV_FRM_DYN) {\
> +gen_set_rm(s, RISCV_FRM_DYN);  \
> +}  \
> +   \
>  uint32_t data = 0; \
>  static gen_helper_gvec_3_ptr * const fns[2] = {\
>  gen_helper_##HELPER##_h,   \
> @@ -2668,6 +2682,10 @@ static bool opxfv_narrow_check(DisasContext *s, 
> arg_rmr *a)
>  static bool trans_##NAME(DisasContext *s, arg_rmr *a)  \
>  {  \
>  if (opxfv_narrow_check(s, a)) {\
> +if (FRM != RISCV_FRM_DYN) {\
> +gen_set_rm(s, RISCV_FRM_DYN);  \
> +}  \
> +   \
>  uint32_t data = 0; \
>  static gen_helper_gvec_3_ptr * const fns[3] = {\
>  gen_helper_##HELPER##_b,   \
> @@ -3138,6 +3156,8 @@ static bool trans_vfmv_f_s(DisasContext *s, 
> arg_vfmv_f_s *a)
>  if (require_rvv(s) &&
>  require_rvf(s) &&
>  vext_check_isa_ill(s)) {
> +gen_set_rm(s, RISCV_FRM_DYN);
> +
>  unsigned int ofs = (8 << s->sew);
>  unsigned int len = 64 - ofs;
>  TCGv_i64 t_nan;
> @@ -3162,6 +3182,8 @@ static bool trans_vfmv_s_f(DisasContext *s, 
> arg_vfmv_s_f *a)
>  if (require_rvv(s) &&
>  require_rvf(s) &&
>  vext_check_isa_ill(s)) {
> +gen_set_rm(s, RISCV_FRM_DYN);
> +
>  /* The instructions ignore LMUL and vector register group. */
>  TCGv_i64 t1;
>  TCGLabel *over = gen_new_label();
> --
> 2.25.1
>
>

Re: [PATCH v9 66/76] target/riscv: rvv-1.0: implement vstart CSR

2021-11-15 Thread Alistair Francis

Alistair

On Fri, Oct 29, 2021 at 7:55 PM  wrote:
>
> From: Frank Chang 
>
> * Update and check vstart value for vector instructions.
> * Add whole register move instruction helper functions as we have to
>   call helper function for case where vstart is not zero.
> * Remove probe_pages() calls in vector load/store instructions
>   (except fault-only-first loads) to raise the memory access exception
>   at the exact processed vector element.
>
> Signed-off-by: Frank Chang 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/csr.c  |   6 +-
>  target/riscv/helper.h   |   5 +
>  target/riscv/insn_trans/trans_rvv.c.inc |  75 ++---
>  target/riscv/translate.c|   6 +-
>  target/riscv/vector_helper.c| 210 +++-
>  5 files changed, 199 insertions(+), 103 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 3dfbc177381..146447eac5d 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -343,7 +343,11 @@ static RISCVException write_vstart(CPURISCVState *env, 
> int csrno,
>  #if !defined(CONFIG_USER_ONLY)
>  env->mstatus |= MSTATUS_VS;
>  #endif
> -env->vstart = val;
> +/*
> + * The vstart CSR is defined to have only enough writable bits
> + * to hold the largest element index, i.e. lg2(VLEN) bits.
> + */
> +env->vstart = val & ~(~0ULL << ctzl(env_archcpu(env)->cfg.vlen));
>  return RISCV_EXCP_NONE;
>  }
>
> diff --git a/target/riscv/helper.h b/target/riscv/helper.h
> index 1a0d817f0f5..a717a87a0e0 100644
> --- a/target/riscv/helper.h
> +++ b/target/riscv/helper.h
> @@ -1073,6 +1073,11 @@ DEF_HELPER_6(vcompress_vm_h, void, ptr, ptr, ptr, ptr, 
> env, i32)
>  DEF_HELPER_6(vcompress_vm_w, void, ptr, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_6(vcompress_vm_d, void, ptr, ptr, ptr, ptr, env, i32)
>
> +DEF_HELPER_4(vmv1r_v, void, ptr, ptr, env, i32)
> +DEF_HELPER_4(vmv2r_v, void, ptr, ptr, env, i32)
> +DEF_HELPER_4(vmv4r_v, void, ptr, ptr, env, i32)
> +DEF_HELPER_4(vmv8r_v, void, ptr, ptr, env, i32)
> +
>  DEF_HELPER_5(vzext_vf2_h, void, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_5(vzext_vf2_w, void, ptr, ptr, ptr, env, i32)
>  DEF_HELPER_5(vzext_vf2_d, void, ptr, ptr, ptr, env, i32)
> diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
> b/target/riscv/insn_trans/trans_rvv.c.inc
> index be3f9f13275..7589c8ce32a 100644
> --- a/target/riscv/insn_trans/trans_rvv.c.inc
> +++ b/target/riscv/insn_trans/trans_rvv.c.inc
> @@ -490,7 +490,7 @@ static bool vext_check_sds(DisasContext *s, int vd, int 
> vs1, int vs2, int vm)
>   */
>  static bool vext_check_reduction(DisasContext *s, int vs2)
>  {
> -return require_align(vs2, s->lmul);
> +return require_align(vs2, s->lmul) && (s->vstart == 0);
>  }
>
>  /*
> @@ -2786,7 +2786,8 @@ GEN_MM_TRANS(vmxnor_mm)
>  static bool trans_vcpop_m(DisasContext *s, arg_rmr *a)
>  {
>  if (require_rvv(s) &&
> -vext_check_isa_ill(s)) {
> +vext_check_isa_ill(s) &&
> +s->vstart == 0) {
>  TCGv_ptr src2, mask;
>  TCGv dst;
>  TCGv_i32 desc;
> @@ -2817,7 +2818,8 @@ static bool trans_vcpop_m(DisasContext *s, arg_rmr *a)
>  static bool trans_vfirst_m(DisasContext *s, arg_rmr *a)
>  {
>  if (require_rvv(s) &&
> -vext_check_isa_ill(s)) {
> +vext_check_isa_ill(s) &&
> +s->vstart == 0) {
>  TCGv_ptr src2, mask;
>  TCGv dst;
>  TCGv_i32 desc;
> @@ -2852,7 +2854,8 @@ static bool trans_##NAME(DisasContext *s, arg_rmr *a)   
>\
>  if (require_rvv(s) &&  \
>  vext_check_isa_ill(s) &&   \
>  require_vm(a->vm, a->rd) &&\
> -(a->rd != a->rs2)) {   \
> +(a->rd != a->rs2) &&   \
> +(s->vstart == 0)) {\
>  uint32_t data = 0; \
>  gen_helper_gvec_3_ptr *fn = gen_helper_##NAME; \
>  TCGLabel *over = gen_new_label();  \
> @@ -2888,7 +2891,8 @@ static bool trans_viota_m(DisasContext *s, arg_viota_m 
> *a)
>  vext_check_isa_ill(s) &&
>  !is_overlapped(a->rd, 1 << MAX(s->lmul, 0), a->rs2, 1) &&
>  require_vm(a->vm, a->rd) &&
> -require_align(a->rd, s->lmul)) {
> +require_align(a->rd, s->lmul) &&
> +(s->vstart == 0)) {
>  uint32_t data = 0;
>  TCGLabel *over = gen_new_label();
>  tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
> @@ -3109,6 +3113,7 @@ static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x 
> *a)
>  TCGLabel *over = gen_new_label();
>
>  tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
> +tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
>
>  t1 =

Re: [PATCH for 6.2] nbd/server: Silence clang sanitizer warning

2021-11-15 Thread Philippe Mathieu-Daudé

On 11/15/21 23:39, Eric Blake wrote:
> clang's sanitizer is picky: memset(NULL, x, 0) is technically
> undefined behavior, even though no sane implementation of memset()
> deferences the NULL.  Caught by the nbd-qemu-allocation iotest.
> 
> The alternative to checking before each memset is to instead force an
> allocation of 1 element instead of g_new0(type, 0)'s behavior of
> returning NULL for a 0-length array.
> 
> Reported-by: Peter Maydell 
> Fixes: 3b1f244c59 (nbd: Allow export of multiple bitmaps for one device)
> Signed-off-by: Eric Blake 
> ---
>  nbd/server.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé

[PULL 2/3] tests/unit/test-smp-parse: QOM'ify smp_machine_class_init()

2021-11-15 Thread Philippe Mathieu-Daudé

smp_machine_class_init() is the actual TypeInfo::class_init().
Declare it as such in smp_machine_info, and avoid to call it
manually in each test. Move smp_machine_info definition just
before we register the type to avoid a forward declaration.

Reviewed-by: Andrew Jones 
Reviewed-by: Richard Henderson 
Reviewed-by: Yanan Wang 
Tested-by: Yanan Wang 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <2025145900.2531865-3-phi...@redhat.com>
---
 tests/unit/test-smp-parse.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
index 47774c1ee2a..3fff2af4e27 100644
--- a/tests/unit/test-smp-parse.c
+++ b/tests/unit/test-smp-parse.c
@@ -75,14 +75,6 @@ typedef struct SMPTestData {
 const char *expect_error;
 } SMPTestData;
 
-/* Type info of the tested machine */
-static const TypeInfo smp_machine_info = {
-.name = TYPE_MACHINE,
-.parent = TYPE_OBJECT,
-.class_size = sizeof(MachineClass),
-.instance_size = sizeof(MachineState),
-};
-
 /*
  * List all the possible valid sub-collections of the generic 5
  * topology parameters (i.e. cpus/maxcpus/sockets/cores/threads),
@@ -480,9 +472,10 @@ static void unsupported_params_init(MachineClass *mc, 
SMPTestData *data)
 }
 }
 
-/* Reset the related machine properties before each sub-test */
-static void smp_machine_class_init(MachineClass *mc)
+static void machine_base_class_init(ObjectClass *oc, void *data)
 {
+MachineClass *mc = MACHINE_CLASS(oc);
+
 mc->min_cpus = MIN_CPUS;
 mc->max_cpus = MAX_CPUS;
 
@@ -498,8 +491,6 @@ static void test_generic(void)
 SMPTestData *data = &(SMPTestData){{ }};
 int i;
 
-smp_machine_class_init(mc);
-
 for (i = 0; i < ARRAY_SIZE(data_generic_valid); i++) {
 *data = data_generic_valid[i];
 unsupported_params_init(mc, data);
@@ -539,7 +530,6 @@ static void test_with_dies(void)
 unsigned int num_dies = 2;
 int i;
 
-smp_machine_class_init(mc);
 /* Force the SMP compat properties */
 mc->smp_props.dies_supported = true;
 
@@ -586,12 +576,24 @@ static void test_with_dies(void)
 object_unref(obj);
 }
 
+/* Type info of the tested machine */
+static const TypeInfo smp_machine_types[] = {
+{
+.name   = TYPE_MACHINE,
+.parent = TYPE_OBJECT,
+.class_init = machine_base_class_init,
+.class_size = sizeof(MachineClass),
+.instance_size  = sizeof(MachineState),
+}
+};
+
+DEFINE_TYPES(smp_machine_types)
+
 int main(int argc, char *argv[])
 {
-g_test_init(, , NULL);
-
 module_call_init(MODULE_INIT_QOM);
-type_register_static(_machine_info);
+
+g_test_init(, , NULL);
 
 g_test_add_func("/test-smp-parse/generic", test_generic);
 g_test_add_func("/test-smp-parse/with_dies", test_with_dies);
-- 
2.31.1

Re: [PATCH] sev: allow capabilities to check for SEV-ES support

2021-11-15 Thread Eric Blake

On Mon, Nov 15, 2021 at 02:38:04PM -0500, Tyler Fanelli wrote:
> Probe for SEV-ES and SEV-SNP capabilities to distinguish between Rome,
> Naples, and Milan processors. Use the CPUID function to probe if a
> processor is capable of running SEV-ES or SEV-SNP, rather than if it
> actually is running SEV-ES or SEV-SNP.
> 
> Signed-off-by: Tyler Fanelli 
> ---
>  qapi/misc-target.json | 11 +--
>  target/i386/sev.c |  6 --
>  2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> index 5aa2b95b7d..c3e9bce12b 100644
> --- a/qapi/misc-target.json
> +++ b/qapi/misc-target.json
> @@ -182,13 +182,19 @@
>  # @reduced-phys-bits: Number of physical Address bit reduction when SEV is
>  # enabled
>  #
> +# @es: SEV-ES capability of the machine.
> +#
> +# @snp: SEV-SNP capability of the machine.
> +#

Missing '(since 7.0)' tags on the new members.

>  # Since: 2.12
>  ##
>  { 'struct': 'SevCapability',
>'data': { 'pdh': 'str',
>  'cert-chain': 'str',
>  'cbitpos': 'int',
> -'reduced-phys-bits': 'int'},
> +'reduced-phys-bits': 'int',
> +'es': 'bool',
> +'snp': 'bool'},
>'if': 'TARGET_I386' }
>  
>  ##
> @@ -205,7 +211,8 @@
>  #
>  # -> { "execute": "query-sev-capabilities" }
>  # <- { "return": { "pdh": "8CCDD8DDD", "cert-chain": "888CCCDDDEE",
> -#  "cbitpos": 47, "reduced-phys-bits": 5}}
> +#  "cbitpos": 47, "reduced-phys-bits": 5
> +#  "es": false, "snp": false}}

Invalid JSON, as you missed the comma needed after 5.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v4 00/11] tests/unit: Fix test-smp-parse

2021-11-15 Thread Philippe Mathieu-Daudé

On 11/15/21 15:58, Philippe Mathieu-Daudé wrote:
> Missing review: patches #4 to #8 (new)
> 
> Yet another approach to fix test-smp-parse.  v2 from Yanan Wang:
> https://lore.kernel.org/qemu-devel/2021024429.10568-1-wangyana...@huawei.com/
> 
> Here we use the QOM class_init() to avoid having to deal with
> object_unref() and deinit().
> 
> Since v3:
> - Restore 'dies_supported' field in test_with_dies (Yanan)
> - Add R-b tags
> - QOM-ify the TYPE_MACHINE classes
> 
> Philippe Mathieu-Daudé (11):
>   tests/unit/test-smp-parse: Restore MachineClass fields after modifying
>   tests/unit/test-smp-parse: QOM'ify smp_machine_class_init()
>   tests/unit/test-smp-parse: Explicit MachineClass name

Patches 1-3 queued.

[PULL 3/3] tests/unit/test-smp-parse: Explicit MachineClass name

2021-11-15 Thread Philippe Mathieu-Daudé

If the MachineClass::name pointer is not explicitly set, it is NULL.
Per the C standard, passing a NULL pointer to printf "%s" format is
undefined. Some implementations display it as 'NULL', other as 'null'.
Since we are comparing the formatted output, we need a stable value.
The easiest is to explicit a machine name string.

Reviewed-by: Andrew Jones 
Reviewed-by: Richard Henderson 
Reviewed-by: Yanan Wang 
Tested-by: Yanan Wang 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <2025145900.2531865-4-phi...@redhat.com>
---
 tests/unit/test-smp-parse.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
index 3fff2af4e27..b02450e25a3 100644
--- a/tests/unit/test-smp-parse.c
+++ b/tests/unit/test-smp-parse.c
@@ -23,6 +23,8 @@
 #define MIN_CPUS 1   /* set the min CPUs supported by the machine as 1 */
 #define MAX_CPUS 512 /* set the max CPUs supported by the machine as 512 */
 
+#define SMP_MACHINE_NAME "TEST-SMP"
+
 /*
  * Used to define the generic 3-level CPU topology hierarchy
  *  -sockets/cores/threads
@@ -307,13 +309,13 @@ static struct SMPTestData data_generic_invalid[] = {
  * should tweak the supported min CPUs to 2 for testing */
 .config = SMP_CONFIG_GENERIC(T, 1, F, 0, F, 0, F, 0, F, 0),
 .expect_error = "Invalid SMP CPUs 1. The min CPUs supported "
-"by machine '(null)' is 2",
+"by machine '" SMP_MACHINE_NAME "' is 2",
 }, {
 /* config: -smp 512
  * should tweak the supported max CPUs to 511 for testing */
 .config = SMP_CONFIG_GENERIC(T, 512, F, 0, F, 0, F, 0, F, 0),
 .expect_error = "Invalid SMP CPUs 512. The max CPUs supported "
-"by machine '(null)' is 511",
+"by machine '" SMP_MACHINE_NAME "' is 511",
 },
 };
 
@@ -481,6 +483,8 @@ static void machine_base_class_init(ObjectClass *oc, void 
*data)
 
 mc->smp_props.prefer_sockets = true;
 mc->smp_props.dies_supported = false;
+
+mc->name = g_strdup(SMP_MACHINE_NAME);
 }
 
 static void test_generic(void)
-- 
2.31.1

[PULL 1/3] tests/unit/test-smp-parse: Restore MachineClass fields after modifying

2021-11-15 Thread Philippe Mathieu-Daudé

There is a single MachineClass object, registered with
type_register_static(_machine_info). Since the same
object is used multiple times (an MachineState object
is instantiated in both test_generic and test_with_dies),
we should restore its internal state after modifying for
the test purpose.

Reviewed-by: Andrew Jones 
Reviewed-by: Richard Henderson 
Reviewed-by: Yanan Wang 
Tested-by: Yanan Wang 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <2025145900.2531865-2-phi...@redhat.com>
---
 tests/unit/test-smp-parse.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
index cbe0c990494..47774c1ee2a 100644
--- a/tests/unit/test-smp-parse.c
+++ b/tests/unit/test-smp-parse.c
@@ -512,7 +512,7 @@ static void test_generic(void)
 smp_parse_test(ms, data, true);
 }
 
-/* Reset the supported min CPUs and max CPUs */
+/* Force invalid min CPUs and max CPUs */
 mc->min_cpus = 2;
 mc->max_cpus = 511;
 
@@ -523,6 +523,10 @@ static void test_generic(void)
 smp_parse_test(ms, data, false);
 }
 
+/* Reset the supported min CPUs and max CPUs */
+mc->min_cpus = MIN_CPUS;
+mc->max_cpus = MAX_CPUS;
+
 object_unref(obj);
 }
 
@@ -536,6 +540,7 @@ static void test_with_dies(void)
 int i;
 
 smp_machine_class_init(mc);
+/* Force the SMP compat properties */
 mc->smp_props.dies_supported = true;
 
 for (i = 0; i < ARRAY_SIZE(data_generic_valid); i++) {
@@ -575,6 +580,9 @@ static void test_with_dies(void)
 smp_parse_test(ms, data, false);
 }
 
+/* Restore the SMP compat properties */
+mc->smp_props.dies_supported = false;
+
 object_unref(obj);
 }
 
-- 
2.31.1

[PULL 0/3] Machine-next patches for 2021-11-15

2021-11-15 Thread Philippe Mathieu-Daudé

The following changes since commit 42f6c9179be4401974dd3a75ee72defd16b5092d:

  Merge tag 'pull-ppc-2022' of https://github.com/legoater/qemu into 
staging (2021-11-12 12:28:25 +0100)

are available in the Git repository at:

  https://github.com/philmd/qemu.git tags/machine-core-2025

for you to fetch changes up to 7b6d1bc9629f3dd45647ec3418e0606a9248:

  tests/unit/test-smp-parse: Explicit MachineClass name (2021-11-15 21:49:16 
+0100)


Machine core patches

- Rework SMP parsing unit test to work on WinGW:

  https://github.com/qemu/qemu/runs/4078386652

  This fixes:

Test smp_parse failed!
Expected error report: Invalid SMP CPUs 1. The min CPUs supported by 
machine '(null)' is 2
  Output error report: Invalid SMP CPUs 1. The min CPUs supported by 
machine '(NULL)' is 2



Philippe Mathieu-Daudé (3):
  tests/unit/test-smp-parse: Restore MachineClass fields after modifying
  tests/unit/test-smp-parse: QOM'ify smp_machine_class_init()
  tests/unit/test-smp-parse: Explicit MachineClass name

 tests/unit/test-smp-parse.c | 52 +++--
 1 file changed, 33 insertions(+), 19 deletions(-)

-- 
2.31.1

Re: [PULL 00/20] pci,pc,virtio: bugfixes

2021-11-15 Thread Philippe Mathieu-Daudé

On 11/15/21 17:37, Michael S. Tsirkin wrote:
> The following changes since commit 0a70bcf18caf7a61d480f8448723c15209d128ef:
> 
>   Update version for v6.2.0-rc0 release (2021-11-09 18:22:57 +0100)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> 
> for you to fetch changes up to 18416c62e36a79823a9e28f6b2260aa13c25e1d9:
> 
>   pcie: expire pending delete (2021-11-15 11:10:11 -0500)
> 
> 
> pci,pc,virtio: bugfixes
> 
> pci power management fixes
> acpi hotplug fixes
> misc other fixes
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> 
> Eugenio Pérez (4):
>   vhost: Rename last_index to vq_index_end
>   vhost: Fix last vq queue index of devices with no cvq
>   vdpa: Replace qemu_open_old by qemu_open at
>   vdpa: Check for existence of opts.vhostdev
> 
> Gerd Hoffmann (6):
>   pci: implement power state
>   pcie: implement slot power control for pcie root ports
>   pcie: add power indicator blink check
>   pcie: factor out pcie_cap_slot_unplug()
>   pcie: fast unplug when slot power is off
>   pcie: expire pending delete
> 
> Igor Mammedov (2):
>   pcie: rename 'native-hotplug' to 'x-native-hotplug'
>   tests: bios-tables-test update expected blobs
> 
> Jason Wang (2):
>   virtio: use virtio accessor to access packed descriptor flags
>   virtio: use virtio accessor to access packed event
> 
> Julia Suvorova (3):
>   hw/acpi/ich9: Add compat prop to keep HPC bit set for 6.1 machine type
>   bios-tables-test: Allow changes in DSDT ACPI tables
>   hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC
> 
> Philippe Mathieu-Daudé (1):
>   hw/mem/pc-dimm: Restrict NUMA-specific code to NUMA machines
> 
> Stefan Hajnoczi (1):
>   softmmu/qdev-monitor: fix use-after-free in qdev_set_id()
> 
> Stefano Garzarella (1):
>   net/vhost-vdpa: fix memory leak in vhost_vdpa_get_max_queue_pairs()
> 
>  include/hw/acpi/ich9.h|   1 +
>  include/hw/pci/pci.h  |   2 +
>  include/hw/qdev-core.h|   1 +
>  include/hw/virtio/vhost.h |   4 +-
>  hw/acpi/ich9.c|  18 
>  hw/i386/acpi-build.c  |  12 --
>  hw/i386/pc.c  |   2 +
>  hw/i386/pc_q35.c  |   9 +++-
>  hw/mem/pc-dimm.c  |  23 ++
>  hw/net/vhost_net.c|  12 +++---
>  hw/pci/pci.c  |  25 ++-
>  hw/pci/pci_host.c |   6 ++-
>  hw/pci/pcie.c |  79 
> --
>  hw/pci/pcie_port.c|   2 +-
>  hw/virtio/vhost-vdpa.c|   2 +-
>  hw/virtio/virtio.c|  24 ---
>  net/vhost-vdpa.c  |   8 +++-
>  softmmu/qdev-monitor.c|   6 ++-

Cc'ing Alex for this apparently unrelated test failure:

make[2]: Entering directory 'build/tests/tcg/arm-linux-user'
timeout --foreground 60  tests/guest-debug/run-test.py --gdb
/usr/bin/gdb-multiarch --qemu build/qemu-arm --qargs "" --bin testthread
--test tests/tcg/multiarch/gdbstub/test-thread-breakpoint.py >
run-gdbstub-thread-breakpoint.out
make[2]: *** [tests/tcg/multiarch/Makefile.target:71:
run-gdbstub-thread-breakpoint] Error 1
make[1]: *** [tests/tcg/Makefile.qemu:102: run-guest-tests] Error 2
make: *** [tests/Makefile.include:63: run-tcg-tests-arm-linux-user] Error 2

https://gitlab.com/qemu-project/qemu/-/jobs/1785024040#L5986

[PATCH for 6.2] nbd/server: Silence clang sanitizer warning

2021-11-15 Thread Eric Blake

clang's sanitizer is picky: memset(NULL, x, 0) is technically
undefined behavior, even though no sane implementation of memset()
deferences the NULL.  Caught by the nbd-qemu-allocation iotest.

The alternative to checking before each memset is to instead force an
allocation of 1 element instead of g_new0(type, 0)'s behavior of
returning NULL for a 0-length array.

Reported-by: Peter Maydell 
Fixes: 3b1f244c59 (nbd: Allow export of multiple bitmaps for one device)
Signed-off-by: Eric Blake 
---
 nbd/server.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 6d03e8a4b436..d9164ee6d0da 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2020 Red Hat, Inc.
+ *  Copyright (C) 2016-2021 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori 
  *
  *  Network Block Device Server Side
@@ -879,7 +879,9 @@ static bool nbd_meta_qemu_query(NBDClient *client, 
NBDExportMetaContexts *meta,
 if (!*query) {
 if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
 meta->allocation_depth = meta->exp->allocation_depth;
-memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
+if (meta->exp->nr_export_bitmaps) {
+memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
+}
 }
 trace_nbd_negotiate_meta_query_parse("empty");
 return true;
@@ -894,7 +896,8 @@ static bool nbd_meta_qemu_query(NBDClient *client, 
NBDExportMetaContexts *meta,
 if (nbd_strshift(, "dirty-bitmap:")) {
 trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
 if (!*query) {
-if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
+if (client->opt == NBD_OPT_LIST_META_CONTEXT &&
+meta->exp->nr_export_bitmaps) {
 memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
 }
 trace_nbd_negotiate_meta_query_parse("empty");
@@ -1024,7 +1027,9 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
 /* enable all known contexts */
 meta->base_allocation = true;
 meta->allocation_depth = meta->exp->allocation_depth;
-memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
+if (meta->exp->nr_export_bitmaps) {
+memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);
+}
 } else {
 for (i = 0; i < nb_queries; ++i) {
 ret = nbd_negotiate_meta_query(client, meta, errp);
-- 
2.33.1

[PATCH-for-6.2? 2/2] hw/intc/arm_gicv3: Introduce CONFIG_ARM_GIC_TCG Kconfig selector

2021-11-15 Thread Philippe Mathieu-Daudé

The TYPE_ARM_GICV3 device is an emulated one.  When using
KVM, it is recommended to use the TYPE_KVM_ARM_GICV3 device
(which uses in-kernel support).

When using --with-devices-FOO, it is possible to build a
binary with a specific set of devices. When this binary is
restricted to KVM accelerator, the TYPE_ARM_GICV3 device is
irrelevant, and it is desirable to remove it from the binary.

Therefore introduce the CONFIG_ARM_GIC_TCG Kconfig selector
which select the files required to have the TYPE_ARM_GICV3
device, but also allowing to de-select this device.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/intc/arm_gicv3.c |  2 +-
 hw/intc/Kconfig |  5 +
 hw/intc/meson.build | 10 ++
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
index c6282984b1e..76950c6f724 100644
--- a/hw/intc/arm_gicv3.c
+++ b/hw/intc/arm_gicv3.c
@@ -1,5 +1,5 @@
 /*
- * ARM Generic Interrupt Controller v3
+ * ARM Generic Interrupt Controller v3 (emulation)
  *
  * Copyright (c) 2015 Huawei.
  * Copyright (c) 2016 Linaro Limited
diff --git a/hw/intc/Kconfig b/hw/intc/Kconfig
index 78aed93c454..010ded7eae0 100644
--- a/hw/intc/Kconfig
+++ b/hw/intc/Kconfig
@@ -25,6 +25,11 @@ config APIC
 select MSI_NONBROKEN
 select I8259
 
+config ARM_GIC_TCG
+bool
+default y
+depends on ARM_GIC && TCG
+
 config ARM_GIC_KVM
 bool
 default y
diff --git a/hw/intc/meson.build b/hw/intc/meson.build
index 11352806db2..70080bc161c 100644
--- a/hw/intc/meson.build
+++ b/hw/intc/meson.build
@@ -3,12 +3,14 @@
   'arm_gic.c',
   'arm_gic_common.c',
   'arm_gicv2m.c',
-  'arm_gicv3.c',
   'arm_gicv3_common.c',
-  'arm_gicv3_dist.c',
   'arm_gicv3_its_common.c',
-  'arm_gicv3_redist.c',
+))
+softmmu_ss.add(when: 'CONFIG_ARM_GIC_TCG', if_true: files(
+  'arm_gicv3.c',
+  'arm_gicv3_dist.c',
   'arm_gicv3_its.c',
+  'arm_gicv3_redist.c',
 ))
 softmmu_ss.add(when: 'CONFIG_ETRAXFS', if_true: files('etraxfs_pic.c'))
 softmmu_ss.add(when: 'CONFIG_HEATHROW_PIC', if_true: files('heathrow_pic.c'))
@@ -26,7 +28,7 @@
 specific_ss.add(when: 'CONFIG_ALLWINNER_A10_PIC', if_true: 
files('allwinner-a10-pic.c'))
 specific_ss.add(when: 'CONFIG_APIC', if_true: files('apic.c', 'apic_common.c'))
 specific_ss.add(when: 'CONFIG_ARM_GIC', if_true: 
files('arm_gicv3_cpuif_common.c'))
-specific_ss.add(when: 'CONFIG_ARM_GIC', if_true: files('arm_gicv3_cpuif.c'))
+specific_ss.add(when: 'CONFIG_ARM_GIC_TCG', if_true: 
files('arm_gicv3_cpuif.c'))
 specific_ss.add(when: 'CONFIG_ARM_GIC_KVM', if_true: files('arm_gic_kvm.c'))
 specific_ss.add(when: ['CONFIG_ARM_GIC_KVM', 'TARGET_AARCH64'], if_true: 
files('arm_gicv3_kvm.c', 'arm_gicv3_its_kvm.c'))
 specific_ss.add(when: 'CONFIG_ARM_V7M', if_true: files('armv7m_nvic.c'))
-- 
2.31.1

[PATCH-for-6.2? 0/2] hw/intc/arm_gicv3: Introduce CONFIG_ARM_GIC_TCG Kconfig selector

2021-11-15 Thread Philippe Mathieu-Daudé

The GICv3 ITS support has been introduced uring the 6.2 development
window (commits 18f6290a6a9..17fb5e36aab). This device is for
emulation. When building virtualization-only binary, it might be
desirable to not include this device.

Introduce the CONFIG_ARM_GIC_TCG Kconfig selector to allow downstream
distributions to deselect this device.

Based-on: pull-target-arm-2025-1

Philippe Mathieu-Daudé (2):
  hw/intc/arm_gicv3: Extract gicv3_set_gicv3state from arm_gicv3_cpuif.c
  hw/intc/arm_gicv3: Introduce CONFIG_ARM_GIC_TCG Kconfig selector

 hw/intc/arm_gicv3.c  |  2 +-
 hw/intc/arm_gicv3_cpuif.c| 10 +-
 hw/intc/arm_gicv3_cpuif_common.c | 22 ++
 hw/intc/Kconfig  |  5 +
 hw/intc/meson.build  | 11 +++
 5 files changed, 36 insertions(+), 14 deletions(-)
 create mode 100644 hw/intc/arm_gicv3_cpuif_common.c

-- 
2.31.1

[PATCH-for-6.2? 1/2] hw/intc/arm_gicv3: Extract gicv3_set_gicv3state from arm_gicv3_cpuif.c

2021-11-15 Thread Philippe Mathieu-Daudé

gicv3_set_gicv3state() is used by arm_gicv3_common.c in
arm_gicv3_common_realize(). Since we want to restrict
arm_gicv3_cpuif.c to TCG, extract gicv3_set_gicv3state()
to a new file. Add this file to the meson 'specific'
source set, since it needs access to "cpu.h".

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/intc/arm_gicv3_cpuif.c| 10 +-
 hw/intc/arm_gicv3_cpuif_common.c | 22 ++
 hw/intc/meson.build  |  1 +
 3 files changed, 24 insertions(+), 9 deletions(-)
 create mode 100644 hw/intc/arm_gicv3_cpuif_common.c

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index 3fe5de8ad7d..5e539e3b5ef 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -1,5 +1,5 @@
 /*
- * ARM Generic Interrupt Controller v3
+ * ARM Generic Interrupt Controller v3 (emulation)
  *
  * Copyright (c) 2016 Linaro Limited
  * Written by Peter Maydell
@@ -21,14 +21,6 @@
 #include "hw/irq.h"
 #include "cpu.h"
 
-void gicv3_set_gicv3state(CPUState *cpu, GICv3CPUState *s)
-{
-ARMCPU *arm_cpu = ARM_CPU(cpu);
-CPUARMState *env = _cpu->env;
-
-env->gicv3state = (void *)s;
-};
-
 static GICv3CPUState *icc_cs_from_env(CPUARMState *env)
 {
 return env->gicv3state;
diff --git a/hw/intc/arm_gicv3_cpuif_common.c b/hw/intc/arm_gicv3_cpuif_common.c
new file mode 100644
index 000..ff1239f65db
--- /dev/null
+++ b/hw/intc/arm_gicv3_cpuif_common.c
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * ARM Generic Interrupt Controller v3
+ *
+ * Copyright (c) 2016 Linaro Limited
+ * Written by Peter Maydell
+ *
+ * This code is licensed under the GPL, version 2 or (at your option)
+ * any later version.
+ */
+
+#include "qemu/osdep.h"
+#include "gicv3_internal.h"
+#include "cpu.h"
+
+void gicv3_set_gicv3state(CPUState *cpu, GICv3CPUState *s)
+{
+ARMCPU *arm_cpu = ARM_CPU(cpu);
+CPUARMState *env = _cpu->env;
+
+env->gicv3state = (void *)s;
+};
diff --git a/hw/intc/meson.build b/hw/intc/meson.build
index c89d2ca180e..11352806db2 100644
--- a/hw/intc/meson.build
+++ b/hw/intc/meson.build
@@ -25,6 +25,7 @@
 
 specific_ss.add(when: 'CONFIG_ALLWINNER_A10_PIC', if_true: 
files('allwinner-a10-pic.c'))
 specific_ss.add(when: 'CONFIG_APIC', if_true: files('apic.c', 'apic_common.c'))
+specific_ss.add(when: 'CONFIG_ARM_GIC', if_true: 
files('arm_gicv3_cpuif_common.c'))
 specific_ss.add(when: 'CONFIG_ARM_GIC', if_true: files('arm_gicv3_cpuif.c'))
 specific_ss.add(when: 'CONFIG_ARM_GIC_KVM', if_true: files('arm_gic_kvm.c'))
 specific_ss.add(when: ['CONFIG_ARM_GIC_KVM', 'TARGET_AARCH64'], if_true: 
files('arm_gicv3_kvm.c', 'arm_gicv3_its_kvm.c'))
-- 
2.31.1

Re: "make check" fails in a clang sanitizer build on "nbd-qemu-allocation" iotest

2021-11-15 Thread Eric Blake

On Mon, Nov 15, 2021 at 05:11:54PM +, Peter Maydell wrote:
> Hi; running a 'make check' on a clang sanitizer build one of
> the iotests falls over due to a NULL pointer being passed to
> memset():
> 
> 
>   TEST   iotest-qcow2: nbd-qemu-allocation [fail]

> +../../nbd/server.c:1027:16: runtime error: null pointer passed as
> argument 1, which is declared to never be null

The code in question:

if (client->opt == NBD_OPT_LIST_META_CONTEXT && !nb_queries) {
/* enable all known contexts */
meta->base_allocation = true;
meta->allocation_depth = meta->exp->allocation_depth;
memset(meta->bitmaps, 1, meta->exp->nr_export_bitmaps);

I suspect what is happening is that meta->bitmaps is NULL when
meta->exp->nr_export_bitmaps is 0. It's annoying that clang's
sanitizer whines even for a 0-length memset, but a strict reading of
POSIX says that we really are in the technically undefined behavior
when passing NULL (even with 0 length), so such whiny behavior is
permitted.  So I'll post a patch.

> 
> Does this look familiar ?

First I've heard of it; thanks for alerting me.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH RFC 2/2] hw: Replace drive_get_next() by drive_get()

2021-11-15 Thread Philippe Mathieu-Daudé

On 11/15/21 16:57, Markus Armbruster wrote:
> Philippe Mathieu-Daudé  writes:
>> On 11/15/21 13:55, Markus Armbruster wrote:
>>> drive_get_next() is basically a bad idea.  It returns the "next" block
>>> backend of a certain interface type.  "Next" means bus=0,unit=N, where
>>> subsequent calls count N up from zero, per interface type.
>>>
>>> This lets you define unit numbers implicitly by execution order.  If the
>>> order changes, or new calls appear "in the middle", unit numbers change.
>>> ABI break.  Hard to spot in review.
>>>
>>> Explicit is better than implicit: use drive_get() directly.
>>>
>>> Signed-off-by: Markus Armbruster 
>>> ---

>>> @@ -435,11 +438,13 @@ static void aspeed_machine_init(MachineState *machine)
>>>  }
>>>  
>>>  for (i = 0; i < bmc->soc.sdhci.num_slots; i++) {
>>> -sdhci_attach_drive(>soc.sdhci.slots[i], 
>>> drive_get_next(IF_SD));
>>> +sdhci_attach_drive(>soc.sdhci.slots[i],
>>> +   drive_get(IF_SD, 0, i));
>>
>> If we put SD on bus #0, ...
>>
>>>  }
>>>  
>>>  if (bmc->soc.emmc.num_slots) {
>>> -sdhci_attach_drive(>soc.emmc.slots[0], drive_get_next(IF_SD));
>>> +sdhci_attach_drive(>soc.emmc.slots[0],
>>> +   drive_get(IF_SD, 0, bmc->soc.sdhci.num_slots));
>>
>> ... we'd want to put eMMC on bus #1
> 
> Using separate buses for different kinds of devices would be neater, but
> it also would be an incompatible change.  This patch keeps existing
> bus/unit numbers working.  drive_get_next() can only use bus 0.
> 
>>  but I see having eMMC cards on a
>> IF_SD bus as a bug, since these cards are soldered on the board.
> 
> IF_SD is not a bus, it's an "block interface type", which is really just
> a user interface thing.

Why are we discriminating by "block interface type" then?

What is the difference between "block interfaces"? I see a block drive
as a generic unit, usable on multiple hardware devices.

I never really understood how this "block interface type" helps
developers and users. I thought BlockInterfaceType and DriveInfo
were legacy / deprecated APIs we want to get rid of; and we would
come up with a replacement API using BlockDeviceInfo or providing
a BlockFrontend state of the art object.
Anyway, I suppose the explanation is buried in the git history
before the last 8 years. I need to keep reading.

Re: [PULL 00/13] Block layer patches

2021-11-15 Thread Philippe Mathieu-Daudé

Hi Kevin,

On 11/15/21 15:53, Kevin Wolf wrote:
> The following changes since commit 42f6c9179be4401974dd3a75ee72defd16b5092d:
> 
>   Merge tag 'pull-ppc-2022' of https://github.com/legoater/qemu into 
> staging (2021-11-12 12:28:25 +0100)
> 
> are available in the Git repository at:
> 
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
> 
> for you to fetch changes up to 7461272c5f6032436ef9032c091c0118539483e4:
> 
>   softmmu/qdev-monitor: fix use-after-free in qdev_set_id() (2021-11-15 
> 15:49:46 +0100)
> 
> 
> Block layer patches
> 
> - Fixes to image streaming job and block layer reconfiguration to make
>   iotest 030 pass again
> - docs: Deprecate incorrectly typed device_add arguments
> - file-posix: Fix alignment after reopen changing O_DIRECT
> 
> 
> Hanna Reitz (10):
>   stream: Traverse graph after modification
>   block: Manipulate children list in .attach/.detach
>   block: Unite remove_empty_child and child_free
>   block: Drop detached child from ignore list
>   block: Pass BdrvChild ** to replace_child_noperm
>   block: Restructure remove_file_or_backing_child()
>   transactions: Invoke clean() after everything else
>   block: Let replace_child_tran keep indirect pointer
>   block: Let replace_child_noperm free children
>   iotests/030: Unthrottle parallel jobs in reverse
> 
> Kevin Wolf (2):
>   docs: Deprecate incorrectly typed device_add arguments
>   file-posix: Fix alignment after reopen changing O_DIRECT
> 
> Stefan Hajnoczi (1):
>   softmmu/qdev-monitor: fix use-after-free in qdev_set_id()
> 
>  docs/about/deprecated.rst   |  14 +++
>  include/qemu/transactions.h |   3 +
>  block.c | 233 
> +---
>  block/file-posix.c  |  20 +++-
>  block/stream.c  |   7 +-
>  softmmu/qdev-monitor.c  |   2 +-
>  util/transactions.c |   8 +-
>  tests/qemu-iotests/030  |  11 ++-
>  tests/qemu-iotests/142  |  22 +
>  tests/qemu-iotests/142.out  |  15 +++
>  10 files changed, 269 insertions(+), 66 deletions(-)

Looking at current /staging I noticed iotest#142 failed,
build-tcg-disabled job:

+++ 142.out.bad
@@ -750,6 +750,7 @@
 --- Alignment after changing O_DIRECT ---
+qemu-io: Cannot get 'write' permission without 'resize': Image size is
not a multiple of request alignment
 read 42/42 bytes at offset 42
 42 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 42/42 bytes at offset 42

https://gitlab.com/qemu-project/qemu/-/jobs/1784955950#L2794

Re: [PULL 00/13] Block layer patches

2021-11-15 Thread Richard Henderson


On 11/15/21 3:53 PM, Kevin Wolf wrote:

The following changes since commit 42f6c9179be4401974dd3a75ee72defd16b5092d:

   Merge tag 'pull-ppc-2022' of https://github.com/legoater/qemu into 
staging (2021-11-12 12:28:25 +0100)

are available in the Git repository at:

   git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 7461272c5f6032436ef9032c091c0118539483e4:

   softmmu/qdev-monitor: fix use-after-free in qdev_set_id() (2021-11-15 
15:49:46 +0100)


Block layer patches

- Fixes to image streaming job and block layer reconfiguration to make
   iotest 030 pass again
- docs: Deprecate incorrectly typed device_add arguments
- file-posix: Fix alignment after reopen changing O_DIRECT


Hanna Reitz (10):
   stream: Traverse graph after modification
   block: Manipulate children list in .attach/.detach
   block: Unite remove_empty_child and child_free
   block: Drop detached child from ignore list
   block: Pass BdrvChild ** to replace_child_noperm
   block: Restructure remove_file_or_backing_child()
   transactions: Invoke clean() after everything else
   block: Let replace_child_tran keep indirect pointer
   block: Let replace_child_noperm free children
   iotests/030: Unthrottle parallel jobs in reverse

Kevin Wolf (2):
   docs: Deprecate incorrectly typed device_add arguments
   file-posix: Fix alignment after reopen changing O_DIRECT

Stefan Hajnoczi (1):
   softmmu/qdev-monitor: fix use-after-free in qdev_set_id()

  docs/about/deprecated.rst   |  14 +++
  include/qemu/transactions.h |   3 +
  block.c | 233 +---
  block/file-posix.c  |  20 +++-
  block/stream.c  |   7 +-
  softmmu/qdev-monitor.c  |   2 +-
  util/transactions.c |   8 +-
  tests/qemu-iotests/030  |  11 ++-
  tests/qemu-iotests/142  |  22 +
  tests/qemu-iotests/142.out  |  15 +++
  10 files changed, 269 insertions(+), 66 deletions(-)


This is failing iotest 142 for build-tcg-disabled.
I did retry, in case it was transitory.

https://gitlab.com/qemu-project/qemu/-/jobs/1784955950


r~

[PATCH for 6.2 v4] nbd/server: Add --selinux-label option

2021-11-15 Thread Eric Blake

From: "Richard W.M. Jones" 

Under SELinux, Unix domain sockets have two labels.  One is on the
disk and can be set with commands such as chcon(1).  There is a
different label stored in memory (called the process label).  This can
only be set by the process creating the socket.  When using SELinux +
SVirt and wanting qemu to be able to connect to a qemu-nbd instance,
you must set both labels correctly first.

For qemu-nbd the options to set the second label are awkward.  You can
create the socket in a wrapper program and then exec into qemu-nbd.
Or you could try something with LD_PRELOAD.

This commit adds the ability to set the label straightforwardly on the
command line, via the new --selinux-label flag.  (The name of the flag
is the same as the equivalent nbdkit option.)

A worked example showing how to use the new option can be found in
this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1984938

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1984938
Signed-off-by: Richard W.M. Jones 
Reviewed-by: Daniel P. Berrangé 
Message-Id: <20210930084701.3899578-1-rjo...@redhat.com>
[eblake: rebase to configure changes, reject --selinux-label if it is
not compiled in or not used on a Unix socket]
Signed-off-by: Eric Blake 
---

v3 was here:
https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg07677.html

since then: another rework of the configure logic (now that
meson-buildoptions.sh exists), and a rework of the error reporting
(for now, loudly complain on all unsupported attempts at labeling. We
may later allow things that currently fail)

Candidate for 6.2 in spite of soft freeze because of earlier attempted
pull request here:
https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg07081.html

and my apologies for letting this slip a month without action.

 meson.build   | 10 +++-
 qemu-nbd.c| 46 +++
 meson_options.txt |  3 ++
 scripts/meson-buildoptions.sh |  3 ++
 tests/docker/dockerfiles/centos8.docker   |  1 +
 .../dockerfiles/fedora-i386-cross.docker  |  1 +
 tests/docker/dockerfiles/fedora.docker|  1 +
 tests/docker/dockerfiles/opensuse-leap.docker |  1 +
 tests/docker/dockerfiles/ubuntu1804.docker|  1 +
 tests/docker/dockerfiles/ubuntu2004.docker|  1 +
 10 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 2ece4fe0889a..73b7deaa0ef3 100644
--- a/meson.build
+++ b/meson.build
@@ -1201,6 +1201,11 @@ keyutils = dependency('libkeyutils', required: false,

 has_gettid = cc.has_function('gettid')

+# libselinux
+selinux = dependency('libselinux',
+ required: get_option('selinux'),
+ method: 'pkg-config', kwargs: static_kwargs)
+
 # Malloc tests

 malloc = []
@@ -1479,6 +1484,7 @@ config_host_data.set('CONFIG_SPICE_PROTOCOL', 
spice_protocol.found())
 config_host_data.set('CONFIG_SPICE', spice.found())
 config_host_data.set('CONFIG_X11', x11.found())
 config_host_data.set('CONFIG_CFI', get_option('cfi'))
+config_host_data.set('CONFIG_SELINUX', selinux.found())
 config_host_data.set('QEMU_VERSION', '"@0@"'.format(meson.project_version()))
 config_host_data.set('QEMU_VERSION_MAJOR', 
meson.project_version().split('.')[0])
 config_host_data.set('QEMU_VERSION_MINOR', 
meson.project_version().split('.')[1])
@@ -3054,7 +3060,8 @@ if have_tools
   qemu_io = executable('qemu-io', files('qemu-io.c'),
  dependencies: [block, qemuutil], install: true)
   qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
-   dependencies: [blockdev, qemuutil, gnutls], install: true)
+   dependencies: [blockdev, qemuutil, gnutls, selinux],
+   install: true)

   subdir('storage-daemon')
   subdir('contrib/rdmacm-mux')
@@ -3430,6 +3437,7 @@ summary_info += {'libdaxctl support': libdaxctl}
 summary_info += {'libudev':   libudev}
 # Dummy dependency, keep .found()
 summary_info += {'FUSE lseek':fuse_lseek.found()}
+summary_info += {'selinux':   selinux.found()}
 summary(summary_info, bool_yn: true, section: 'Dependencies')

 if not supported_cpus.contains(cpu)
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 9d895ba24b1e..c6c20df68a4d 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -47,6 +47,10 @@
 #include "trace/control.h"
 #include "qemu-version.h"

+#ifdef CONFIG_SELINUX
+#include 
+#endif
+
 #ifdef __linux__
 #define HAVE_NBD_DEVICE 1
 #else
@@ -64,6 +68,7 @@
 #define QEMU_NBD_OPT_FORK  263
 #define QEMU_NBD_OPT_TLSAUTHZ  264
 #define QEMU_NBD_OPT_PID_FILE  265
+#define QEMU_NBD_OPT_SELINUX_LABEL 266

 #define MBR_SIZE 512

@@ -116,6 +121,9 @@ static void usage(const char *name)
 "  --forkfork off the server process and exit the parent\n"
 "once the server is running\n"
 "  --pid-file=PATH   store the server's process ID in the given file\n"
+#ifdef

[PULL 4/4] hw/rtc/pl031: Send RTC_CHANGE QMP event

2021-11-15 Thread Peter Maydell

From: Eric Auger 

The PL031 currently is not able to report guest RTC change to the QMP
monitor as opposed to mc146818 or spapr RTCs. This patch adds the call
to qapi_event_send_rtc_change() when the Load Register is written. The
value which is reported corresponds to the difference between the guest
reference time and the reference time kept in softmmu/rtc.c.

For instance adding 20s to the guest RTC value will report 20. Adding
an extra 20s to the guest RTC value will report 20 + 20 = 40.

The inclusion of qapi/qapi-types-misc-target.h in hw/rtl/pl031.c
require to compile the PL031 with specific_ss.add() to avoid
./qapi/qapi-types-misc-target.h:18:13: error: attempt to use poisoned
"TARGET_".

Signed-off-by: Eric Auger 
Reviewed-by: Peter Maydell 
Message-id: 20210920122535.269988-1-eric.au...@redhat.com
Signed-off-by: Peter Maydell 
---
 hw/rtc/pl031.c | 10 +-
 hw/rtc/meson.build |  2 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/rtc/pl031.c b/hw/rtc/pl031.c
index 2bbb2062ac8..e7ced90b025 100644
--- a/hw/rtc/pl031.c
+++ b/hw/rtc/pl031.c
@@ -24,6 +24,7 @@
 #include "qemu/log.h"
 #include "qemu/module.h"
 #include "trace.h"
+#include "qapi/qapi-events-misc-target.h"
 
 #define RTC_DR  0x00/* Data read register */
 #define RTC_MR  0x04/* Match register */
@@ -136,10 +137,17 @@ static void pl031_write(void * opaque, hwaddr offset,
 trace_pl031_write(offset, value);
 
 switch (offset) {
-case RTC_LR:
+case RTC_LR: {
+struct tm tm;
+
 s->tick_offset += value - pl031_get_count(s);
+
+qemu_get_timedate(, s->tick_offset);
+qapi_event_send_rtc_change(qemu_timedate_diff());
+
 pl031_set_alarm(s);
 break;
+}
 case RTC_MR:
 s->mr = value;
 pl031_set_alarm(s);
diff --git a/hw/rtc/meson.build b/hw/rtc/meson.build
index 7cecdee5ddb..8fd8d8f9a71 100644
--- a/hw/rtc/meson.build
+++ b/hw/rtc/meson.build
@@ -2,7 +2,7 @@
 softmmu_ss.add(when: 'CONFIG_DS1338', if_true: files('ds1338.c'))
 softmmu_ss.add(when: 'CONFIG_M41T80', if_true: files('m41t80.c'))
 softmmu_ss.add(when: 'CONFIG_M48T59', if_true: files('m48t59.c'))
-softmmu_ss.add(when: 'CONFIG_PL031', if_true: files('pl031.c'))
+specific_ss.add(when: 'CONFIG_PL031', if_true: files('pl031.c'))
 softmmu_ss.add(when: 'CONFIG_TWL92230', if_true: files('twl92230.c'))
 softmmu_ss.add(when: ['CONFIG_ISA_BUS', 'CONFIG_M48T59'], if_true: 
files('m48t59-isa.c'))
 softmmu_ss.add(when: 'CONFIG_XLNX_ZYNQMP', if_true: files('xlnx-zynqmp-rtc.c'))
-- 
2.25.1

[PULL 0/4] target-arm queue

2021-11-15 Thread Peter Maydell

Hi; some minor changes for 6.2, which I think can be classified
as bug fixes and are OK for this point in the release cycle.
(Wouldn't be the end of the world if they slipped to 7.0.)

-- PMM

The following changes since commit 42f6c9179be4401974dd3a75ee72defd16b5092d:

  Merge tag 'pull-ppc-2022' of https://github.com/legoater/qemu into 
staging (2021-11-12 12:28:25 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-2025-1

for you to fetch changes up to 1adf528ec3bdf62ea3b580b7ad562534a3676ff5:

  hw/rtc/pl031: Send RTC_CHANGE QMP event (2021-11-15 18:53:00 +)


target-arm queue:
 * Support multiple redistributor regions for TCG GICv3
 * Send RTC_CHANGE QMP event from pl031


Eric Auger (1):
  hw/rtc/pl031: Send RTC_CHANGE QMP event

Peter Maydell (3):
  hw/intc/arm_gicv3: Move checking of redist-region-count to 
arm_gicv3_common_realize
  hw/intc/arm_gicv3: Set GICR_TYPER.Last correctly when nb_redist_regions > 
1
  hw/intc/arm_gicv3: Support multiple redistributor regions

 include/hw/intc/arm_gicv3_common.h | 14 --
 hw/intc/arm_gicv3.c| 12 +---
 hw/intc/arm_gicv3_common.c | 56 --
 hw/intc/arm_gicv3_kvm.c| 10 ++-
 hw/intc/arm_gicv3_redist.c | 40 +++
 hw/rtc/pl031.c | 10 ++-
 hw/rtc/meson.build |  2 +-
 7 files changed, 83 insertions(+), 61 deletions(-)

[PULL 2/4] hw/intc/arm_gicv3: Set GICR_TYPER.Last correctly when nb_redist_regions > 1

2021-11-15 Thread Peter Maydell

The 'Last' bit in the GICR_TYPER GICv3 redistributor register is
supposed to be set to 1 if this is the last redistributor in a series
of contiguous redistributor pages.  Currently we set Last only for
the redistributor for CPU (num_cpu - 1).  This only works if there is
a single redistributor region; if there are multiple redistributor
regions then we need to set the Last bit for the last redistributor
in each region.

This doesn't cause any problems currently because only the KVM GICv3
supports multiple redistributor regions, and it ignores the value in
GICv3State::gicr_typer.  But we need to fix this before we can enable
support for multiple regions in the emulated GICv3.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 hw/intc/arm_gicv3_common.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index 8e47809398b..8de9205b386 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -297,7 +297,7 @@ void gicv3_init_irqs_and_mmio(GICv3State *s, 
qemu_irq_handler handler,
 static void arm_gicv3_common_realize(DeviceState *dev, Error **errp)
 {
 GICv3State *s = ARM_GICV3_COMMON(dev);
-int i, rdist_capacity;
+int i, rdist_capacity, cpuidx;
 
 /* revision property is actually reserved and currently used only in order
  * to keep the interface compatible with GICv2 code, avoiding extra
@@ -355,7 +355,6 @@ static void arm_gicv3_common_realize(DeviceState *dev, 
Error **errp)
 for (i = 0; i < s->num_cpu; i++) {
 CPUState *cpu = qemu_get_cpu(i);
 uint64_t cpu_affid;
-int last;
 
 s->cpu[i].cpu = cpu;
 s->cpu[i].gic = s;
@@ -375,7 +374,6 @@ static void arm_gicv3_common_realize(DeviceState *dev, 
Error **errp)
  *  PLPIS == 0 (physical LPIs not supported)
  */
 cpu_affid = object_property_get_uint(OBJECT(cpu), "mp-affinity", NULL);
-last = (i == s->num_cpu - 1);
 
 /* The CPU mp-affinity property is in MPIDR register format; squash
  * the affinity bytes into 32 bits as the GICR_TYPER has them.
@@ -384,13 +382,22 @@ static void arm_gicv3_common_realize(DeviceState *dev, 
Error **errp)
  (cpu_affid & 0xFF);
 s->cpu[i].gicr_typer = (cpu_affid << 32) |
 (1 << 24) |
-(i << 8) |
-(last << 4);
+(i << 8);
 
 if (s->lpi_enable) {
 s->cpu[i].gicr_typer |= GICR_TYPER_PLPIS;
 }
 }
+
+/*
+ * Now go through and set GICR_TYPER.Last for the final
+ * redistributor in each region.
+ */
+cpuidx = 0;
+for (i = 0; i < s->nb_redist_regions; i++) {
+cpuidx += s->redist_region_count[i];
+s->cpu[cpuidx - 1].gicr_typer |= GICR_TYPER_LAST;
+}
 }
 
 static void arm_gicv3_finalize(Object *obj)
-- 
2.25.1

[PULL 1/4] hw/intc/arm_gicv3: Move checking of redist-region-count to arm_gicv3_common_realize

2021-11-15 Thread Peter Maydell

The GICv3 devices have an array property redist-region-count.
Currently we check this for errors (bad values) in
gicv3_init_irqs_and_mmio(), just before we use it.  Move this error
checking to the arm_gicv3_common_realize() function, where we
sanity-check all of the other base-class properties. (This will
always be before gicv3_init_irqs_and_mmio() is called, because
that function is called in the subclass realize methods, after
they have called the parent-class realize.)

The motivation for this refactor is:
 * we would like to use the redist_region_count[] values in
   arm_gicv3_common_realize() in a subsequent patch, so we need
   to have already done the sanity-checking first
 * this removes the only use of the Error** argument to
   gicv3_init_irqs_and_mmio(), so we can remove some error-handling
   boilerplate

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 include/hw/intc/arm_gicv3_common.h |  2 +-
 hw/intc/arm_gicv3.c|  6 +-
 hw/intc/arm_gicv3_common.c | 26 +-
 hw/intc/arm_gicv3_kvm.c|  6 +-
 4 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/include/hw/intc/arm_gicv3_common.h 
b/include/hw/intc/arm_gicv3_common.h
index aa4f0d67703..cb2b0d0ad45 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -306,6 +306,6 @@ struct ARMGICv3CommonClass {
 };
 
 void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler handler,
-  const MemoryRegionOps *ops, Error **errp);
+  const MemoryRegionOps *ops);
 
 #endif
diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
index 3f24707838c..bcf54a5f0a5 100644
--- a/hw/intc/arm_gicv3.c
+++ b/hw/intc/arm_gicv3.c
@@ -393,11 +393,7 @@ static void arm_gic_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-gicv3_init_irqs_and_mmio(s, gicv3_set_irq, gic_ops, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
+gicv3_init_irqs_and_mmio(s, gicv3_set_irq, gic_ops);
 
 gicv3_init_cpuif(s);
 }
diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index 223db16feca..8e47809398b 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -250,22 +250,11 @@ static const VMStateDescription vmstate_gicv3 = {
 };
 
 void gicv3_init_irqs_and_mmio(GICv3State *s, qemu_irq_handler handler,
-  const MemoryRegionOps *ops, Error **errp)
+  const MemoryRegionOps *ops)
 {
 SysBusDevice *sbd = SYS_BUS_DEVICE(s);
-int rdist_capacity = 0;
 int i;
 
-for (i = 0; i < s->nb_redist_regions; i++) {
-rdist_capacity += s->redist_region_count[i];
-}
-if (rdist_capacity < s->num_cpu) {
-error_setg(errp, "Capacity of the redist regions(%d) "
-   "is less than number of vcpus(%d)",
-   rdist_capacity, s->num_cpu);
-return;
-}
-
 /* For the GIC, also expose incoming GPIO lines for PPIs for each CPU.
  * GPIO array layout is thus:
  *  [0..N-1] spi
@@ -308,7 +297,7 @@ void gicv3_init_irqs_and_mmio(GICv3State *s, 
qemu_irq_handler handler,
 static void arm_gicv3_common_realize(DeviceState *dev, Error **errp)
 {
 GICv3State *s = ARM_GICV3_COMMON(dev);
-int i;
+int i, rdist_capacity;
 
 /* revision property is actually reserved and currently used only in order
  * to keep the interface compatible with GICv2 code, avoiding extra
@@ -350,6 +339,17 @@ static void arm_gicv3_common_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
+rdist_capacity = 0;
+for (i = 0; i < s->nb_redist_regions; i++) {
+rdist_capacity += s->redist_region_count[i];
+}
+if (rdist_capacity < s->num_cpu) {
+error_setg(errp, "Capacity of the redist regions(%d) "
+   "is less than number of vcpus(%d)",
+   rdist_capacity, s->num_cpu);
+return;
+}
+
 s->cpu = g_new0(GICv3CPUState, s->num_cpu);
 
 for (i = 0; i < s->num_cpu; i++) {
diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index 5c09f00dec2..ab58c73306d 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -787,11 +787,7 @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error 
**errp)
 return;
 }
 
-gicv3_init_irqs_and_mmio(s, kvm_arm_gicv3_set_irq, NULL, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-return;
-}
+gicv3_init_irqs_and_mmio(s, kvm_arm_gicv3_set_irq, NULL);
 
 for (i = 0; i < s->num_cpu; i++) {
 ARMCPU *cpu = ARM_CPU(qemu_get_cpu(i));
-- 
2.25.1

[PULL 3/4] hw/intc/arm_gicv3: Support multiple redistributor regions

2021-11-15 Thread Peter Maydell

Our GICv3 QOM interface includes an array property
redist-region-count which allows board models to specify that the
registributor registers are not in a single contiguous range, but
split into multiple pieces.  We implemented this for KVM, but
currently the TCG GICv3 model insists that there is only one region.
You can see the limit being hit with a setup like:
  qemu-system-aarch64 -machine virt,gic-version=3 -smp 124

Add support for split regions to the TCG GICv3.  To do this we switch
from allocating a simple array of MemoryRegions to an array of
GICv3RedistRegion structs so that we can use the GICv3RedistRegion as
the opaque pointer in the MemoryRegion read/write callbacks.  Each
GICv3RedistRegion contains the MemoryRegion, a backpointer allowing
the read/write callback to get hold of the GICv3State, and an index
which allows us to calculate which CPU's redistributor is being
accessed.

Note that arm_gicv3_kvm always passes in NULL as the ops argument
to gicv3_init_irqs_and_mmio(), so the only MemoryRegion read/write
callbacks we need to update to handle this new scheme are the
gicv3_redist_read/write functions used by the emulated GICv3.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 include/hw/intc/arm_gicv3_common.h | 12 -
 hw/intc/arm_gicv3.c|  6 -
 hw/intc/arm_gicv3_common.c | 15 ---
 hw/intc/arm_gicv3_kvm.c|  4 +--
 hw/intc/arm_gicv3_redist.c | 40 --
 5 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/include/hw/intc/arm_gicv3_common.h 
b/include/hw/intc/arm_gicv3_common.h
index cb2b0d0ad45..fc38e4b7dca 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -215,13 +215,23 @@ struct GICv3CPUState {
 bool seenbetter;
 };
 
+/*
+ * The redistributor pages might be split into more than one region
+ * on some machine types if there are many CPUs.
+ */
+typedef struct GICv3RedistRegion {
+GICv3State *gic;
+MemoryRegion iomem;
+uint32_t cpuidx; /* index of first CPU this region covers */
+} GICv3RedistRegion;
+
 struct GICv3State {
 /*< private >*/
 SysBusDevice parent_obj;
 /*< public >*/
 
 MemoryRegion iomem_dist; /* Distributor */
-MemoryRegion *iomem_redist; /* Redistributor Regions */
+GICv3RedistRegion *redist_regions; /* Redistributor Regions */
 uint32_t *redist_region_count; /* redistributor count within each region */
 uint32_t nb_redist_regions; /* number of redist regions */
 
diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
index bcf54a5f0a5..c6282984b1e 100644
--- a/hw/intc/arm_gicv3.c
+++ b/hw/intc/arm_gicv3.c
@@ -387,12 +387,6 @@ static void arm_gic_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-if (s->nb_redist_regions != 1) {
-error_setg(errp, "VGICv3 redist region number(%d) not equal to 1",
-   s->nb_redist_regions);
-return;
-}
-
 gicv3_init_irqs_and_mmio(s, gicv3_set_irq, gic_ops);
 
 gicv3_init_cpuif(s);
diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index 8de9205b386..9884d2e39b9 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -254,6 +254,7 @@ void gicv3_init_irqs_and_mmio(GICv3State *s, 
qemu_irq_handler handler,
 {
 SysBusDevice *sbd = SYS_BUS_DEVICE(s);
 int i;
+int cpuidx;
 
 /* For the GIC, also expose incoming GPIO lines for PPIs for each CPU.
  * GPIO array layout is thus:
@@ -282,14 +283,20 @@ void gicv3_init_irqs_and_mmio(GICv3State *s, 
qemu_irq_handler handler,
   "gicv3_dist", 0x1);
 sysbus_init_mmio(sbd, >iomem_dist);
 
-s->iomem_redist = g_new0(MemoryRegion, s->nb_redist_regions);
+s->redist_regions = g_new0(GICv3RedistRegion, s->nb_redist_regions);
+cpuidx = 0;
 for (i = 0; i < s->nb_redist_regions; i++) {
 char *name = g_strdup_printf("gicv3_redist_region[%d]", i);
+GICv3RedistRegion *region = >redist_regions[i];
 
-memory_region_init_io(>iomem_redist[i], OBJECT(s),
-  ops ? [1] : NULL, s, name,
+region->gic = s;
+region->cpuidx = cpuidx;
+cpuidx += s->redist_region_count[i];
+
+memory_region_init_io(>iomem, OBJECT(s),
+  ops ? [1] : NULL, region, name,
   s->redist_region_count[i] * GICV3_REDIST_SIZE);
-sysbus_init_mmio(sbd, >iomem_redist[i]);
+sysbus_init_mmio(sbd, >iomem);
 g_free(name);
 }
 }
diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index ab58c73306d..5ec5ff9ef6e 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -825,7 +825,7 @@ static void kvm_arm_gicv3_realize(DeviceState *dev, Error 
**errp)
 KVM_VGIC_V3_ADDR_TYPE_DIST, s->dev_fd, 0);
 
 if (!multiple_redist_region_allowed) {
-

[PATCH] sev: allow capabilities to check for SEV-ES support

2021-11-15 Thread Tyler Fanelli

Probe for SEV-ES and SEV-SNP capabilities to distinguish between Rome,
Naples, and Milan processors. Use the CPUID function to probe if a
processor is capable of running SEV-ES or SEV-SNP, rather than if it
actually is running SEV-ES or SEV-SNP.

Signed-off-by: Tyler Fanelli 
---
 qapi/misc-target.json | 11 +--
 target/i386/sev.c |  6 --
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 5aa2b95b7d..c3e9bce12b 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -182,13 +182,19 @@
 # @reduced-phys-bits: Number of physical Address bit reduction when SEV is
 # enabled
 #
+# @es: SEV-ES capability of the machine.
+#
+# @snp: SEV-SNP capability of the machine.
+#
 # Since: 2.12
 ##
 { 'struct': 'SevCapability',
   'data': { 'pdh': 'str',
 'cert-chain': 'str',
 'cbitpos': 'int',
-'reduced-phys-bits': 'int'},
+'reduced-phys-bits': 'int',
+'es': 'bool',
+'snp': 'bool'},
   'if': 'TARGET_I386' }
 
 ##
@@ -205,7 +211,8 @@
 #
 # -> { "execute": "query-sev-capabilities" }
 # <- { "return": { "pdh": "8CCDD8DDD", "cert-chain": "888CCCDDDEE",
-#  "cbitpos": 47, "reduced-phys-bits": 5}}
+#  "cbitpos": 47, "reduced-phys-bits": 5
+#  "es": false, "snp": false}}
 #
 ##
 { 'command': 'query-sev-capabilities', 'returns': 'SevCapability',
diff --git a/target/i386/sev.c b/target/i386/sev.c
index eede07f11d..6d78dcd744 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -506,7 +506,7 @@ static SevCapability *sev_get_capabilities(Error **errp)
 guchar *pdh_data = NULL;
 guchar *cert_chain_data = NULL;
 size_t pdh_len = 0, cert_chain_len = 0;
-uint32_t ebx;
+uint32_t eax, ebx;
 int fd;
 
 if (!kvm_enabled()) {
@@ -534,8 +534,10 @@ static SevCapability *sev_get_capabilities(Error **errp)
 cap->pdh = g_base64_encode(pdh_data, pdh_len);
 cap->cert_chain = g_base64_encode(cert_chain_data, cert_chain_len);
 
-host_cpuid(0x801F, 0, NULL, , NULL, NULL);
+host_cpuid(0x801F, 0, , , NULL, NULL);
 cap->cbitpos = ebx & 0x3f;
+cap->es = (eax & 0x8) ? true : false;
+cap->snp = (eax & 0x10) ? true : false;
 
 /*
  * When SEV feature is enabled, we loose one bit in guest physical
-- 
2.31.1

Re: [PATCH 0/2] hw/nvme: fix namespace attachment on controller hotplug

2021-11-15 Thread Klaus Jensen

On Sep 24 09:27, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> First patch from Hannes fixes the subsystem registration process such
> that shared (but non-detached) namespaces are automatically attached to
> hotplugged controllers.
> 
> The second patch changes the default for 'shared' such that namespaces
> are shared by default and will thus by default be attached to hotplugged
> controllers. This adds a compat property for older machine versions and
> updates the documentation to reflect this.
> 
> Hannes Reinecke (1):
>   hw/nvme: reattach subsystem namespaces on hotplug
> 
> Klaus Jensen (1):
>   hw/nvme: change nvme-ns 'shared' default
> 
>  docs/system/devices/nvme.rst | 24 ++--
>  hw/core/machine.c|  4 +++-
>  hw/nvme/ns.c |  8 +---
>  hw/nvme/subsys.c | 10 +-
>  4 files changed, 27 insertions(+), 19 deletions(-)
> 

Ping :)


signature.asc
Description: PGP signature

Re: [PATCH v2] hw/rtc/pl031: Send RTC_CHANGE QMP event

2021-11-15 Thread Peter Maydell

On Mon, 15 Nov 2021 at 12:49, Paolo Bonzini  wrote:
>
> On 11/1/21 17:04, Peter Maydell wrote:
> > On Thu, 23 Sept 2021 at 14:29, Peter Maydell  
> > wrote:
> >>
> >> On Mon, 20 Sept 2021 at 13:25, Eric Auger  wrote:
> >>>
> >>> The PL031 currently is not able to report guest RTC change to the QMP
> >>> monitor as opposed to mc146818 or spapr RTCs. This patch adds the call
> >>> to qapi_event_send_rtc_change() when the Load Register is written. The
> >>> value which is reported corresponds to the difference between the guest
> >>> reference time and the reference time kept in softmmu/rtc.c.
> >>>
> >>> For instance adding 20s to the guest RTC value will report 20. Adding
> >>> an extra 20s to the guest RTC value will report 20 + 20 = 40.
> >>>
> >>> The inclusion of qapi/qapi-types-misc-target.h in hw/rtl/pl031.c
> >>> require to compile the PL031 with specific_ss.add() to avoid
> >>> ./qapi/qapi-types-misc-target.h:18:13: error: attempt to use poisoned
> >>> "TARGET_".
> >>>
> >>> Signed-off-by: Eric Auger 
> >>> Signed-off-by: Peter Maydell 
> >>
> >> Thanks. This looks plausible to me (well, it would ;-)) but
> >> I would appreciate review from Paolo or somebody else who
> >> understands the rtc_change feature and handling.
> >
> > Ping? Review from somebody who understands rtc_change would
> > still be nice...
>
> The change looks good to me (sorry I missed this v2).  x86 also has some
> logic in the migration post-load, that might end up sending the event.
> However, that's best done separately after understanding and documenting
> exactly what x86 is doing.

Thanks; I've applied this to target-arm.next for 6.2.

-- PMM

Re: [PULL 15/26] configure, meson: move pthread_setname_np checks to Meson

2021-11-15 Thread Peter Maydell

On Mon, 15 Nov 2021 at 17:16, Daniel P. Berrangé  wrote:
>
> On Thu, Oct 14, 2021 at 06:29:27PM +0200, Paolo Bonzini wrote:
> > This makes the pthreads check dead in configure, so remove it
> > as well.
>
> This change appears broken
>
> On v6.1.0
>
> $ grep SETNAME_NP build/config-host.h
> #define CONFIG_PTHREAD_SETNAME_NP_W_TID 1
>
> While on git master
>
> $ grep SETNAME_NP build/config-host.h
> #define CONFIG_PTHREAD_SETNAME_NP_WO_TID
> #define CONFIG_PTHREAD_SETNAME_NP_W_TID
>
>
> it shoudn't be possible to have both these configs
> satisfied as they're 2 completely different impls.
>
> In the meson-log.txt we can see both tests passing, but with
> warnings
>
>   
> /home/berrange/src/virt/qemu/build/meson-private/tmpt191k3q1/testfile.c:9:5: 
> warning: implicit declaration of function 'pthread_setname_np'; did you mean 
> 'pthread_setcanceltype'? [-Wimplicit-function-declaration]
>
> So it isn't actually validating the function parameter
> signature, and then link is succeeding because the linker
> doesn't care about the incompatible function signatures.
>
> Peter meanwhile reports that he gets no pthread_Setname_np
> at all, and his meson-build.log shows -Werror set. This
> causes the missing fnuction signature to cause tests to
> fail in both cases.

More specifically, these tests get run with
"-Werror=implicit-function-declaration" (and this is not
because I've passed any kind of -Werror option in --extra-cflags;
I have passed --enable-werror but that is supposed to be the
default anyway...)

-- PMM

Re: [PULL 15/26] configure, meson: move pthread_setname_np checks to Meson

2021-11-15 Thread Daniel P . Berrangé

On Thu, Oct 14, 2021 at 06:29:27PM +0200, Paolo Bonzini wrote:
> This makes the pthreads check dead in configure, so remove it
> as well.

This change appears broken

On v6.1.0

$ grep SETNAME_NP build/config-host.h
#define CONFIG_PTHREAD_SETNAME_NP_W_TID 1

While on git master

$ grep SETNAME_NP build/config-host.h
#define CONFIG_PTHREAD_SETNAME_NP_WO_TID
#define CONFIG_PTHREAD_SETNAME_NP_W_TID


it shoudn't be possible to have both these configs
satisfied as they're 2 completely different impls.

In the meson-log.txt we can see both tests passing, but with
warnings

  /home/berrange/src/virt/qemu/build/meson-private/tmpt191k3q1/testfile.c:9:5: 
warning: implicit declaration of function 'pthread_setname_np'; did you mean 
'pthread_setcanceltype'? [-Wimplicit-function-declaration]

So it isn't actually validating the function parameter
signature, and then link is succeeding because the linker
doesn't care about the incompatible function signatures.

Peter meanwhile reports that he gets no pthread_Setname_np
at all, and his meson-build.log shows -Werror set. This
causes the missing fnuction signature to cause tests to
fail in both cases.

The original config.log shows _GNU_SOURCE being defined
for these tests, which would get the function signture
defined.

> 
> Reviewed-by: Marc-André Lureau 
> Message-Id: <20211007130829.632254-9-pbonz...@redhat.com>
> Reviewed-by: Thomas Huth 
> Signed-off-by: Paolo Bonzini 
> ---
>  configure| 78 
>  meson.build  | 23 
>  util/qemu-thread-posix.c |  5 ++-
>  3 files changed, 25 insertions(+), 81 deletions(-)
> 
> diff --git a/configure b/configure
> index e78f58978f..c7e95e59cc 100755
> --- a/configure
> +++ b/configure
> @@ -3148,71 +3148,6 @@ if test "$modules" = yes; then
>  fi
>  fi
>  
> -##
> -# pthread probe
> -PTHREADLIBS_LIST="-pthread -lpthread -lpthreadGC2"
> -
> -pthread=no
> -cat > $TMPC << EOF
> -#include 
> -static void *f(void *p) { return NULL; }
> -int main(void) {
> -  pthread_t thread;
> -  pthread_create(, 0, f, 0);
> -  return 0;
> -}
> -EOF
> -if compile_prog "" "" ; then
> -  pthread=yes
> -else
> -  for pthread_lib in $PTHREADLIBS_LIST; do
> -if compile_prog "" "$pthread_lib" ; then
> -  pthread=yes
> -  break
> -fi
> -  done
> -fi
> -
> -if test "$mingw32" != yes && test "$pthread" = no; then
> -  error_exit "pthread check failed" \
> -  "Make sure to have the pthread libs and headers installed."
> -fi
> -
> -# check for pthread_setname_np with thread id
> -pthread_setname_np_w_tid=no
> -cat > $TMPC << EOF
> -#include 
> -
> -static void *f(void *p) { return NULL; }
> -int main(void)
> -{
> -pthread_t thread;
> -pthread_create(, 0, f, 0);
> -pthread_setname_np(thread, "QEMU");
> -return 0;
> -}
> -EOF
> -if compile_prog "" "$pthread_lib" ; then
> -  pthread_setname_np_w_tid=yes
> -fi
> -
> -# check for pthread_setname_np without thread id
> -pthread_setname_np_wo_tid=no
> -cat > $TMPC << EOF
> -#include 
> -
> -static void *f(void *p) { pthread_setname_np("QEMU"); return NULL; }
> -int main(void)
> -{
> -pthread_t thread;
> -pthread_create(, 0, f, 0);
> -return 0;
> -}
> -EOF
> -if compile_prog "" "$pthread_lib" ; then
> -  pthread_setname_np_wo_tid=yes
> -fi
> -
>  ##
>  # libssh probe
>  if test "$libssh" != "no" ; then
> @@ -4498,19 +4433,6 @@ if test "$debug_mutex" = "yes" ; then
>echo "CONFIG_DEBUG_MUTEX=y" >> $config_host_mak
>  fi
>  
> -# Hold two types of flag:
> -#   CONFIG_THREAD_SETNAME_BYTHREAD  - we've got a way of setting the name on
> -# a thread we have a handle to
> -#   CONFIG_PTHREAD_SETNAME_NP_W_TID - A way of doing it on a particular
> -# platform
> -if test "$pthread_setname_np_w_tid" = "yes" ; then
> -  echo "CONFIG_THREAD_SETNAME_BYTHREAD=y" >> $config_host_mak
> -  echo "CONFIG_PTHREAD_SETNAME_NP_W_TID=y" >> $config_host_mak
> -elif test "$pthread_setname_np_wo_tid" = "yes" ; then
> -  echo "CONFIG_THREAD_SETNAME_BYTHREAD=y" >> $config_host_mak
> -  echo "CONFIG_PTHREAD_SETNAME_NP_WO_TID=y" >> $config_host_mak
> -fi
> -
>  if test "$bochs" = "yes" ; then
>echo "CONFIG_BOCHS=y" >> $config_host_mak
>  fi
> diff --git a/meson.build b/meson.build
> index e8e728bf72..26fc4e5792 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1584,6 +1584,29 @@ config_host_data.set('CONFIG_POSIX_MADVISE', 
> cc.links(gnu_source_prefix + '''
>#include 
>#include 
>int main(void) { return posix_madvise(NULL, 0, POSIX_MADV_DONTNEED); }'''))
> +
> +config_host_data.set('CONFIG_PTHREAD_SETNAME_NP_W_TID', cc.links('''
> +  #include 
> +
> +  static void *f(void *p) { return NULL; }
> +  int main(void)
> +  {
> +pthread_t thread;
> +pthread_create(, 0, f, 0);
> +pthread_setname_np(thread, "QEMU");
> +return 0;
> +  }''',

"make check" fails in a clang sanitizer build on "nbd-qemu-allocation" iotest

2021-11-15 Thread Peter Maydell

Hi; running a 'make check' on a clang sanitizer build one of
the iotests falls over due to a NULL pointer being passed to
memset():


  TEST   iotest-qcow2: nbd-qemu-allocation [fail]
QEMU  --
"/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/qemu-iotests/../../qemu-system-aarch64"
-nodefaults -display none -accel qtest -machine virt
QEMU_IMG  --
"/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/qemu-iotests/../../qemu-img"
QEMU_IO   --
"/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/qemu-iotests/../../qemu-io"
--cache writeback --aio threads -f qcow2
QEMU_NBD  --
"/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/qemu-iotests/../../qemu-nbd"
IMGFMT-- qcow2
IMGPROTO  -- file
PLATFORM  -- Linux/x86_64 e104462 5.4.0-89-generic
TEST_DIR  --
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/qemu-iotests/scratch
SOCK_DIR  -- /tmp/tmp13ihi_hj
GDB_OPTIONS   --
VALGRIND_QEMU --
PRINT_QEMU_OUTPUT --

--- 
/home/petmay01/linaro/qemu-from-laptop/qemu/tests/qemu-iotests/tests/nbd-qemu-allocation.out
+++ nbd-qemu-allocation.out.bad
@@ -14,6 +14,8 @@
 [{ "start": 0, "length": 1048576, "depth": 1, "present": true,
"zero": false, "data": true, "offset": 327680},
 { "start": 1048576, "length": 2097152, "depth": 0, "present": true,
"zero": false, "data": true, "offset": 327680},
 { "start": 3145728, "length": 1048576, "depth": 1, "present": false,
"zero": true, "data": false}]
+../../nbd/server.c:1027:16: runtime error: null pointer passed as
argument 1, which is declared to never be null
+/usr/include/string.h:61:62: note: nonnull attribute specified here
 exports available: 1
  export: ''
   size:  4194304
  TEST   iotest-qcow2: qsd-jobs
Not run: 172 186 192 220 287
Failures: nbd-qemu-allocation
Failed 1 of 118 iotests


Does this look familiar ?

-- PMM

Re: [RFC PATCH v2 1/5] virtio: introduce virtio_force_modern()

2021-11-15 Thread Cornelia Huck

On Mon, Nov 15 2021, Halil Pasic  wrote:

> On Fri, 12 Nov 2021 16:37:20 +0100
> Cornelia Huck  wrote:
>
>> On Fri, Nov 12 2021, Halil Pasic  wrote:
>> 
>> > Legacy vs modern should be detected via transport specific means. We
>> > can't wait till feature negotiation is done. Let us introduce
>> > virtio_force_modern() as a means for the transport code to signal
>> > that the device should operate in modern mode (because a modern driver
>> > was detected).
>> >
>> > A new callback is added for the situations where the device needs
>> > to do more than just setting the VIRTIO_F_VERSION_1 feature bit. For
>> > example, when vhost is involved, we may need to propagate the features
>> > to the vhost device.
>> >
>> > Signed-off-by: Halil Pasic 
>> > ---
>> >
>> > I'm still struggling with how to deal with vhost-user and co. The
>> > problem is that I'm not very familiar with the life-cycle of, let us
>> > say, a vhost_user device.
>> >
>> > Looks to me like the vhost part might be just an implementation detail,
>> > and could even become a hot swappable thing.
>> >
>> > Another thing is, that vhost processes set_features differently. It
>> > might or might not be a good idea to change this.
>> >
>> > Does anybody know why don't we propagate the features on features_set,
>> > but under a set of different conditions, one of which is the vhost
>> > device is started?
>> > ---
>> >  hw/virtio/virtio.c | 13 +
>> >  include/hw/virtio/virtio.h |  2 ++
>> >  2 files changed, 15 insertions(+)
>> >  
>> 
>> Did you see my feedback in
>> https://lore.kernel.org/qemu-devel/87tugzc26y@redhat.com/? I think
>> at least some of it still applies.
>> 
>
> Sure. My idea was to send out a v2 first which helps us think about the
> bigger picture, and then answer that mail. Now I realize I should have
> sent the response first, and then the v2 immediately afterwards.
>
>> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> > index 3a1f6c520c..26db1b31e6 100644
>> > --- a/hw/virtio/virtio.c
>> > +++ b/hw/virtio/virtio.c
>> > @@ -3281,6 +3281,19 @@ void virtio_init(VirtIODevice *vdev, const char 
>> > *name,
>> >  vdev->use_guest_notifier_mask = true;
>> >  }
>> >  
>> > +void  virtio_force_modern(VirtIODevice *vdev)  
>> 
>> I'd still prefer to call this virtio_indicate_modern: we don't really
>> force anything; the driver has simply already decided that it will use
>> the modern interface and we provide an early indication in the features
>> so that code looking at them makes the right decisions.
>
> I tried to explain my dislike for virtio_indicate_modern in my response
> to that email. In somewhat different words: IMHO indication is about an
> external observer and has a symbolic dimension to it. Please see
> https://www.oxfordlearnersdictionaries.com/definition/english/indicate
> This function is about changing the behavior of the device. Its
> post-condition is: the device acts compliant to virtio 1.0 or higher.

My personal preference is "indicate", I don't like "force". I don't want
a semantics discussion; I'll leave the decision to the virtio
maintainers.

>
>> 
>> > +{
>> > +VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>> > +
>> > +virtio_add_feature(>guest_features, VIRTIO_F_VERSION_1);
>> > +/* Let the device do it's normal thing. */
>> > +virtio_set_features(vdev, vdev->guest_features);  
>> 
>> I don't think this is substantially different from setting VERSION_1
>> only: At least the callers you introduce call this during reset,
>> i.e. when guest_features is 0 anyway. 
>
> I agree. Just wanted to be conservative, and preserve whatever is there.
>
>
>> We still have the whole processing
>> that is done after feature setting that may have effects different from
>> what the ultimate feature setting will give us.
>
> Yes, this is an intermediate state. As I pointed out, intermediate states
> are necessary.

Why? We just want VERSION_1 so that the checks work, why do we need to
fiddle with other settings? We only need to propagate it to e.g. vhost.

>
>> While I don't think
>> calling set_features twice is forbidden, that sequence is likely quite
>> untested, and I'm not sure we can exclude side effects.
>
> I can't disagree with that. But IMHO we can just say: such problems, if
> any, are bugs that need to be fixed.

Well, what about first fixing the endianness bugs, before we potentially
open up a can of worms?

>
> I think not doing the whole song-and-dance is conceptually more
> problematic because it is more likely to lead to inconsistent internal
> state. For example check out: vhost acked_features <-> guest_features.

What is wrong with verifying with one single feature?

Re: Guests wont start with 15 pcie-root-port devices

2021-11-15 Thread Brian Rak


Will this fix make it into 6.2?

On 11/12/2021 3:51 PM, Igor Mammedov wrote:

On Fri, 12 Nov 2021 17:53:42 +
Daniel P. Berrangé  wrote:


On Fri, Nov 12, 2021 at 12:35:07PM -0500, Brian Rak wrote:

In 6.1, a guest with 15 empty pcie-root-port devices will not boot properly
- it just hangs on "Guest has not initialized the display (yet).".  As soon
as I remove the last pcie-root-port, the guest begins starting up normally.

Yes, QEMU 6.1 has a regression

   https://gitlab.com/qemu-project/qemu/-/issues/641



commit e2a6290aab578b2170c1f5909fa556385dc0d820
Author: Marcel Apfelbaum 
Date:   Mon Aug 2 12:00:57 2021 +0300

     hw/pcie-root-port: Fix hotplug for PCI devices requiring IO

Although I can't say I really understand why that commit triggered it.

It caused the firmware to always allocate I/O space for every port
and there's limited total I/O space, so it runs out at 15 devices.

alternatively instead of reverting to native PCIe hotplug as in the issue
Daniel's mentioned, you can apply following fix
  https://patchew.org/QEMU/2022110857.3116853-1-imamm...@redhat.com/


Regards,
Daniel

Re: [PULL 21/26] configure, meson: move more compiler checks to Meson

2021-11-15 Thread Peter Maydell

On Mon, 15 Nov 2021 at 16:36, Peter Maydell  wrote:
>
> On Thu, 14 Oct 2021 at 17:49, Paolo Bonzini  wrote:
> >
> > Reviewed-by: Marc-André Lureau 
> > Message-Id: <20211007130829.632254-15-pbonz...@redhat.com>
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  configure| 91 
> >  meson.build  | 44 +++
> >  util/meson.build |  4 ++-
> >  3 files changed, 47 insertions(+), 92 deletions(-)
>
>
> > diff --git a/meson.build b/meson.build
> > index 6bf43e6d30..6b7487b725 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -1550,6 +1550,8 @@ config_host_data.set('CONFIG_INOTIFY',
> >   cc.has_header_symbol('sys/inotify.h', 'inotify_init'))
> >  config_host_data.set('CONFIG_INOTIFY1',
> >   cc.has_header_symbol('sys/inotify.h', 
> > 'inotify_init1'))
> > +config_host_data.set('CONFIG_IOVEC',
> > + cc.has_header_symbol('sys/uio.h', 'struct iovec'))
> >  config_host_data.set('CONFIG_MACHINE_BSWAP_H',
> >   cc.has_header_symbol('machine/bswap.h', 'bswap32',
> >prefix: '''#include 
> > 
>
> Hi -- I've just noticed that this change breaks compilation for me,
> because this test incorrectly fails to set CONFIG_IOVEC on a system
> where the header defines 'struct iovec'.

That wasn't quite right. On further investigation, the problem is that
the meson tests silently misbehave if you included "-Werror" in your
--extra-cflags. This results in meson building all the tests with -Werror,
but meson's generated code for tests can't handle that. (configure
gets this right because it has code in that specifically checks
"does this test case give the same result for -Werror and not".)

-- PMM

[PULL 03/20] vhost: Rename last_index to vq_index_end

2021-11-15 Thread Michael S. Tsirkin

From: Eugenio Pérez 

The doc of this field pointed out that last_index is the last vq index.
This is misleading, since it's actually one past the end of the vqs.

Renaming and modifying comment.

Signed-off-by: Eugenio Pérez 
Acked-by: Jason Wang 
Message-Id: <20211104085625.2054959-2-epere...@redhat.com>
Reviewed-by: Juan Quintela 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/vhost.h | 4 ++--
 hw/net/vhost_net.c| 4 ++--
 hw/virtio/vhost-vdpa.c| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 3fa0b554ef..58a73e7b7a 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -74,8 +74,8 @@ struct vhost_dev {
 unsigned int nvqs;
 /* the first virtqueue which would be used by this vhost dev */
 int vq_index;
-/* the last vq index for the virtio device (not vhost) */
-int last_index;
+/* one past the last vq index for the virtio device (not vhost) */
+int vq_index_end;
 /* if non-zero, minimum required value for max_queues */
 int num_queues;
 uint64_t features;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 0d888f29a6..29f2c4212f 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -232,10 +232,10 @@ fail:
 }
 
 static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index,
-   int last_index)
+   int vq_index_end)
 {
 net->dev.vq_index = vq_index;
-net->dev.last_index = last_index;
+net->dev.vq_index_end = vq_index_end;
 }
 
 static int vhost_net_start_one(struct vhost_net *net,
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 0d8051426c..bcaf00e09f 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -645,7 +645,7 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool 
started)
 vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
 }
 
-if (dev->vq_index + dev->nvqs != dev->last_index) {
+if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
 return 0;
 }
 
-- 
MST

Re: [PULL 21/26] configure, meson: move more compiler checks to Meson

2021-11-15 Thread Peter Maydell

On Mon, 15 Nov 2021 at 16:36, Peter Maydell  wrote:
>
> On Thu, 14 Oct 2021 at 17:49, Paolo Bonzini  wrote:
> >
> > Reviewed-by: Marc-André Lureau 
> > Message-Id: <20211007130829.632254-15-pbonz...@redhat.com>
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  configure| 91 
> >  meson.build  | 44 +++
> >  util/meson.build |  4 ++-
> >  3 files changed, 47 insertions(+), 92 deletions(-)
>
>
> > diff --git a/meson.build b/meson.build
> > index 6bf43e6d30..6b7487b725 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -1550,6 +1550,8 @@ config_host_data.set('CONFIG_INOTIFY',
> >   cc.has_header_symbol('sys/inotify.h', 'inotify_init'))
> >  config_host_data.set('CONFIG_INOTIFY1',
> >   cc.has_header_symbol('sys/inotify.h', 
> > 'inotify_init1'))
> > +config_host_data.set('CONFIG_IOVEC',
> > + cc.has_header_symbol('sys/uio.h', 'struct iovec'))
> >  config_host_data.set('CONFIG_MACHINE_BSWAP_H',
> >   cc.has_header_symbol('machine/bswap.h', 'bswap32',
> >prefix: '''#include 
> > 
>
> Hi -- I've just noticed that this change breaks compilation for me,
> because this test incorrectly fails to set CONFIG_IOVEC on a system
> where the header defines 'struct iovec'. This seems to be because
> "struct iovec" isn't a valid thing to test with has_header_symbol,
> because it provokes a compiler error from clang.

https://github.com/mesonbuild/meson/issues/1975 says that for gcc
it's actually going to be wrong the other way (always setting CONFIG_IOVEC
whether the system header has the struct or not), because "struct wombat;"
is syntactically OK as a *declaration*, not a use.

Maybe we can work around this by testing for the presence of something else,
eg IOV_MAX or the readv or writev functions ?

-- PMM

[PULL 16/20] pcie: implement slot power control for pcie root ports

2021-11-15 Thread Michael S. Tsirkin

From: Gerd Hoffmann 

With this patch hot-plugged pci devices will only be visible to the
guest if the guests hotplug driver has enabled slot power.

This should fix the hot-plug race which one can hit when hot-plugging
a pci device at boot, while the guest is in the middle of the pci bus
scan.

Signed-off-by: Gerd Hoffmann 
Message-Id: <2021130859.1171890-3-kra...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/pci/pcie.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 914a9bf3d1..13d11a57c7 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -366,6 +366,29 @@ static void hotplug_event_clear(PCIDevice *dev)
 }
 }
 
+static void pcie_set_power_device(PCIBus *bus, PCIDevice *dev, void *opaque)
+{
+bool *power = opaque;
+
+pci_set_power(dev, *power);
+}
+
+static void pcie_cap_update_power(PCIDevice *hotplug_dev)
+{
+uint8_t *exp_cap = hotplug_dev->config + hotplug_dev->exp.exp_cap;
+PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(hotplug_dev));
+uint32_t sltcap = pci_get_long(exp_cap + PCI_EXP_SLTCAP);
+uint16_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
+bool power = true;
+
+if (sltcap & PCI_EXP_SLTCAP_PCP) {
+power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON;
+}
+
+pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
+pcie_set_power_device, );
+}
+
 /*
  * A PCI Express Hot-Plug Event has occurred, so update slot status register
  * and notify OS of the event if necessary.
@@ -434,6 +457,7 @@ void pcie_cap_slot_plug_cb(HotplugHandler *hotplug_dev, 
DeviceState *dev,
 pci_word_test_and_set_mask(exp_cap + PCI_EXP_LNKSTA,
PCI_EXP_LNKSTA_DLLLA);
 }
+pcie_cap_update_power(hotplug_pdev);
 return;
 }
 
@@ -451,6 +475,7 @@ void pcie_cap_slot_plug_cb(HotplugHandler *hotplug_dev, 
DeviceState *dev,
 }
 pcie_cap_slot_event(hotplug_pdev,
 PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
+pcie_cap_update_power(hotplug_pdev);
 }
 }
 
@@ -625,6 +650,7 @@ void pcie_cap_slot_reset(PCIDevice *dev)
  PCI_EXP_SLTSTA_PDC |
  PCI_EXP_SLTSTA_ABP);
 
+pcie_cap_update_power(dev);
 hotplug_event_update_event_status(dev);
 }
 
@@ -705,6 +731,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
 pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
PCI_EXP_SLTSTA_PDC);
 }
+pcie_cap_update_power(dev);
 
 hotplug_event_notify(dev);
 
@@ -731,6 +758,7 @@ int pcie_cap_slot_post_load(void *opaque, int version_id)
 {
 PCIDevice *dev = opaque;
 hotplug_event_update_event_status(dev);
+pcie_cap_update_power(dev);
 return 0;
 }
 
-- 
MST

[PULL 20/20] pcie: expire pending delete

2021-11-15 Thread Michael S. Tsirkin

From: Gerd Hoffmann 

Add an expire time for pending delete, once the time is over allow
pressing the attention button again.

This makes pcie hotplug behave more like acpi hotplug, where one can
try sending an 'device_del' monitor command again in case the guest
didn't respond to the first attempt.

Signed-off-by: Gerd Hoffmann 
Message-Id: <2021130859.1171890-7-kra...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/qdev-core.h | 1 +
 hw/pci/pcie.c  | 2 ++
 softmmu/qdev-monitor.c | 4 +++-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 72622bd337..20d3066595 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -181,6 +181,7 @@ struct DeviceState {
 char *canonical_path;
 bool realized;
 bool pending_deleted_event;
+int64_t pending_deleted_expires_ms;
 QDict *opts;
 int hotplugged;
 bool allow_unplug_during_migration;
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index a930ac738a..c5ed266337 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -548,6 +548,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 }
 
 dev->pending_deleted_event = true;
+dev->pending_deleted_expires_ms =
+qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5000; /* 5 secs */
 
 /* In case user cancel the operation of multi-function hot-add,
  * remove the function that is unexposed to guest individually,
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 588a62b88d..5925f1ae5f 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -943,7 +943,9 @@ void qmp_device_del(const char *id, Error **errp)
 {
 DeviceState *dev = find_device_state(id, errp);
 if (dev != NULL) {
-if (dev->pending_deleted_event) {
+if (dev->pending_deleted_event &&
+(dev->pending_deleted_expires_ms == 0 ||
+ dev->pending_deleted_expires_ms > 
qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL))) {
 error_setg(errp, "Device %s is already in the "
  "process of unplug", id);
 return;
-- 
MST

[PULL 12/20] virtio: use virtio accessor to access packed event

2021-11-15 Thread Michael S. Tsirkin

From: Jason Wang 

We used to access packed descriptor event and off_wrap via
address_space_{write|read}_cached(). When we hit the cache, memcpy()
is used which is not atomic which may lead a wrong value to be read or
wrote.

This patch fixes this by switching to use
virito_{stw|lduw}_phys_cached() to make sure the access is atomic.

Fixes: 683f7665679c1 ("virtio: event suppression support for packed ring")
Cc: qemu-sta...@nongnu.org
Signed-off-by: Jason Wang 
Message-Id: <2021063854.29060-2-jasow...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 939bcbfeb9..ea7c079fb0 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -247,13 +247,10 @@ static void vring_packed_event_read(VirtIODevice *vdev,
 hwaddr off_off = offsetof(VRingPackedDescEvent, off_wrap);
 hwaddr off_flags = offsetof(VRingPackedDescEvent, flags);
 
-address_space_read_cached(cache, off_flags, >flags,
-  sizeof(e->flags));
+e->flags = virtio_lduw_phys_cached(vdev, cache, off_flags);
 /* Make sure flags is seen before off_wrap */
 smp_rmb();
-address_space_read_cached(cache, off_off, >off_wrap,
-  sizeof(e->off_wrap));
-virtio_tswap16s(vdev, >off_wrap);
+e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
 virtio_tswap16s(vdev, >flags);
 }
 
@@ -263,8 +260,7 @@ static void vring_packed_off_wrap_write(VirtIODevice *vdev,
 {
 hwaddr off = offsetof(VRingPackedDescEvent, off_wrap);
 
-virtio_tswap16s(vdev, _wrap);
-address_space_write_cached(cache, off, _wrap, sizeof(off_wrap));
+virtio_stw_phys_cached(vdev, cache, off, off_wrap);
 address_space_cache_invalidate(cache, off, sizeof(off_wrap));
 }
 
@@ -273,8 +269,7 @@ static void vring_packed_flags_write(VirtIODevice *vdev,
 {
 hwaddr off = offsetof(VRingPackedDescEvent, flags);
 
-virtio_tswap16s(vdev, );
-address_space_write_cached(cache, off, , sizeof(flags));
+virtio_stw_phys_cached(vdev, cache, off, flags);
 address_space_cache_invalidate(cache, off, sizeof(flags));
 }
 
-- 
MST

[PULL 19/20] pcie: fast unplug when slot power is off

2021-11-15 Thread Michael S. Tsirkin

From: Gerd Hoffmann 

In case the slot is powered off (and the power indicator turned off too)
we can unplug right away, without round-trip to the guest.

Also clear pending attention button press, there is nothing to care
about any more.

Signed-off-by: Gerd Hoffmann 
Message-Id: <2021130859.1171890-6-kra...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/pci/pcie.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 959bf074b2..a930ac738a 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -560,6 +560,16 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 return;
 }
 
+if (((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF) &&
+((sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_OFF)) {
+/* slot is powered off -> unplug without round-trip to the guest */
+pcie_cap_slot_do_unplug(hotplug_pdev);
+hotplug_event_notify(hotplug_pdev);
+pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_ABP);
+return;
+}
+
 pcie_cap_slot_push_attention_button(hotplug_pdev);
 }
 
-- 
MST

[PULL 11/20] virtio: use virtio accessor to access packed descriptor flags

2021-11-15 Thread Michael S. Tsirkin

From: Jason Wang 

We used to access packed descriptor flags via
address_space_{write|read}_cached(). When we hit the cache, memcpy()
is used which is not an atomic operation which may lead a wrong value
is read or wrote.

So this patch switches to use virito_{stw|lduw}_phys_cached() to make
sure the aceess is atomic.

Fixes: 86044b24e865f ("virtio: basic packed virtqueue support")
Cc: qemu-sta...@nongnu.org
Signed-off-by: Jason Wang 
Message-Id: <2021063854.29060-1-jasow...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index cc69a9b881..939bcbfeb9 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -507,11 +507,9 @@ static void vring_packed_desc_read_flags(VirtIODevice 
*vdev,
  MemoryRegionCache *cache,
  int i)
 {
-address_space_read_cached(cache,
-  i * sizeof(VRingPackedDesc) +
-  offsetof(VRingPackedDesc, flags),
-  flags, sizeof(*flags));
-virtio_tswap16s(vdev, flags);
+hwaddr off = i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, 
flags);
+
+*flags = virtio_lduw_phys_cached(vdev, cache, off);
 }
 
 static void vring_packed_desc_read(VirtIODevice *vdev,
@@ -564,8 +562,7 @@ static void vring_packed_desc_write_flags(VirtIODevice 
*vdev,
 {
 hwaddr off = i * sizeof(VRingPackedDesc) + offsetof(VRingPackedDesc, 
flags);
 
-virtio_tswap16s(vdev, >flags);
-address_space_write_cached(cache, off, >flags, sizeof(desc->flags));
+virtio_stw_phys_cached(vdev, cache, off, desc->flags);
 address_space_cache_invalidate(cache, off, sizeof(desc->flags));
 }
 
-- 
MST

[PULL 17/20] pcie: add power indicator blink check

2021-11-15 Thread Michael S. Tsirkin

From: Gerd Hoffmann 

Refuse to push the attention button in case the guest is busy with some
hotplug operation (as indicated by the power indicator blinking).

Signed-off-by: Gerd Hoffmann 
Message-Id: <2021130859.1171890-4-kra...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/pci/pcie.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 13d11a57c7..b92dbff118 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -506,6 +506,7 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
 uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
 uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
+uint16_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
 
 /* Check if hot-unplug is disabled on the slot */
 if ((sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
@@ -521,6 +522,12 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler 
*hotplug_dev,
 return;
 }
 
+if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
+error_setg(errp, "Hot-unplug failed: "
+   "guest is busy (power indicator blinking)");
+return;
+}
+
 dev->pending_deleted_event = true;
 
 /* In case user cancel the operation of multi-function hot-add,
-- 
MST

[PULL 15/20] pci: implement power state

2021-11-15 Thread Michael S. Tsirkin

From: Gerd Hoffmann 

This allows to power off pci devices.  In "off" state the devices will
not be visible.  No pci config space access, no pci bar access, no dma.

Default state is "on", so this patch (alone) should not change behavior.

Use case:  Allows hotplug controllers implement slot power.  Hotplug
controllers doing so should set the inital power state for devices in
the ->plug callback.

Signed-off-by: Gerd Hoffmann 
Message-Id: <2021130859.1171890-2-kra...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/pci/pci.h |  2 ++
 hw/pci/pci.c | 25 +++--
 hw/pci/pci_host.c|  6 --
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 5c4016b995..e7cdf2d5ec 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -268,6 +268,7 @@ typedef struct PCIReqIDCache PCIReqIDCache;
 struct PCIDevice {
 DeviceState qdev;
 bool partially_hotplugged;
+bool has_power;
 
 /* PCI config space */
 uint8_t *config;
@@ -908,5 +909,6 @@ extern const VMStateDescription vmstate_pci_device;
 }
 
 MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
+void pci_set_power(PCIDevice *pci_dev, bool state);
 
 #endif
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 4a84e478ce..e5993c1ef5 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1380,6 +1380,9 @@ static void pci_update_mappings(PCIDevice *d)
 continue;
 
 new_addr = pci_bar_address(d, i, r->type, r->size);
+if (!d->has_power) {
+new_addr = PCI_BAR_UNMAPPED;
+}
 
 /* This bar isn't changed */
 if (new_addr == r->addr)
@@ -1464,8 +1467,8 @@ void pci_default_write_config(PCIDevice *d, uint32_t 
addr, uint32_t val_in, int
 if (range_covers_byte(addr, l, PCI_COMMAND)) {
 pci_update_irq_disabled(d, was_irq_disabled);
 memory_region_set_enabled(>bus_master_enable_region,
-  pci_get_word(d->config + PCI_COMMAND)
-& PCI_COMMAND_MASTER);
+  (pci_get_word(d->config + PCI_COMMAND)
+   & PCI_COMMAND_MASTER) && d->has_power);
 }
 
 msi_write_config(d, addr, val_in, l);
@@ -2182,6 +2185,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 pci_qdev_unrealize(DEVICE(pci_dev));
 return;
 }
+
+pci_set_power(pci_dev, true);
 }
 
 PCIDevice *pci_new_multifunction(int devfn, bool multifunction,
@@ -2853,6 +2858,22 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int 
vector)
 return msg;
 }
 
+void pci_set_power(PCIDevice *d, bool state)
+{
+if (d->has_power == state) {
+return;
+}
+
+d->has_power = state;
+pci_update_mappings(d);
+memory_region_set_enabled(>bus_master_enable_region,
+  (pci_get_word(d->config + PCI_COMMAND)
+   & PCI_COMMAND_MASTER) && d->has_power);
+if (!d->has_power) {
+pci_device_reset(d);
+}
+}
+
 static const TypeInfo pci_device_type_info = {
 .name = TYPE_PCI_DEVICE,
 .parent = TYPE_DEVICE,
diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
index cf02f0d6a5..7beafd40a8 100644
--- a/hw/pci/pci_host.c
+++ b/hw/pci/pci_host.c
@@ -74,7 +74,8 @@ void pci_host_config_write_common(PCIDevice *pci_dev, 
uint32_t addr,
 /* non-zero functions are only exposed when function 0 is present,
  * allowing direct removal of unexposed functions.
  */
-if (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) {
+if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) ||
+!pci_dev->has_power) {
 return;
 }
 
@@ -97,7 +98,8 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, 
uint32_t addr,
 /* non-zero functions are only exposed when function 0 is present,
  * allowing direct removal of unexposed functions.
  */
-if (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) {
+if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) ||
+!pci_dev->has_power) {
 return ~0x0;
 }
 
-- 
MST

[PULL 07/20] hw/acpi/ich9: Add compat prop to keep HPC bit set for 6.1 machine type

2021-11-15 Thread Michael S. Tsirkin

From: Julia Suvorova 

To solve issues [1-2] the Hot Plug Capable bit in PCIe Slots will be
turned on, while the switch to ACPI Hot-plug will be done in the
DSDT table.

Introducing 'x-keep-native-hpc' property disables the HPC bit only
in 6.1 and as a result keeps the forced 'reserve-io' on
pcie-root-ports in 6.1 too.

[1] https://gitlab.com/qemu-project/qemu/-/issues/641
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2006409

Signed-off-by: Julia Suvorova 
Signed-off-by: Igor Mammedov 
Message-Id: <2022110857.3116853-3-imamm...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/acpi/ich9.h |  1 +
 hw/acpi/ich9.c | 18 ++
 hw/i386/pc.c   |  2 ++
 hw/i386/pc_q35.c   |  7 ++-
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/hw/acpi/ich9.h b/include/hw/acpi/ich9.h
index f04f1791bd..7ca92843c6 100644
--- a/include/hw/acpi/ich9.h
+++ b/include/hw/acpi/ich9.h
@@ -56,6 +56,7 @@ typedef struct ICH9LPCPMRegs {
 AcpiCpuHotplug gpe_cpu;
 CPUHotplugState cpuhp_state;
 
+bool keep_pci_slot_hpc;
 bool use_acpi_hotplug_bridge;
 AcpiPciHpState acpi_pci_hotplug;
 MemHotplugState acpi_memory_hotplug;
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 1ee2ba2c50..ebe08ed831 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -419,6 +419,20 @@ static void ich9_pm_set_acpi_pci_hotplug(Object *obj, bool 
value, Error **errp)
 s->pm.use_acpi_hotplug_bridge = value;
 }
 
+static bool ich9_pm_get_keep_pci_slot_hpc(Object *obj, Error **errp)
+{
+ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
+
+return s->pm.keep_pci_slot_hpc;
+}
+
+static void ich9_pm_set_keep_pci_slot_hpc(Object *obj, bool value, Error 
**errp)
+{
+ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
+
+s->pm.keep_pci_slot_hpc = value;
+}
+
 void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm)
 {
 static const uint32_t gpe0_len = ICH9_PMIO_GPE0_LEN;
@@ -428,6 +442,7 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm)
 pm->disable_s4 = 0;
 pm->s4_val = 2;
 pm->use_acpi_hotplug_bridge = true;
+pm->keep_pci_slot_hpc = true;
 
 object_property_add_uint32_ptr(obj, ACPI_PM_PROP_PM_IO_BASE,
>pm_io_base, OBJ_PROP_FLAG_READ);
@@ -454,6 +469,9 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm)
 object_property_add_bool(obj, ACPI_PM_PROP_ACPI_PCIHP_BRIDGE,
  ich9_pm_get_acpi_pci_hotplug,
  ich9_pm_set_acpi_pci_hotplug);
+object_property_add_bool(obj, "x-keep-pci-slot-hpc",
+ ich9_pm_get_keep_pci_slot_hpc,
+ ich9_pm_set_keep_pci_slot_hpc);
 }
 
 void ich9_pm_device_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2592a82148..a2ef40ecbc 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -98,6 +98,7 @@ GlobalProperty pc_compat_6_1[] = {
 { TYPE_X86_CPU, "hv-version-id-build", "0x1bbc" },
 { TYPE_X86_CPU, "hv-version-id-major", "0x0006" },
 { TYPE_X86_CPU, "hv-version-id-minor", "0x0001" },
+{ "ICH9-LPC", "x-keep-pci-slot-hpc", "false" },
 };
 const size_t pc_compat_6_1_len = G_N_ELEMENTS(pc_compat_6_1);
 
@@ -107,6 +108,7 @@ GlobalProperty pc_compat_6_0[] = {
 { "qemu64" "-" TYPE_X86_CPU, "stepping", "3" },
 { TYPE_X86_CPU, "x-vendor-cpuid-only", "off" },
 { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" },
+{ "ICH9-LPC", "x-keep-pci-slot-hpc", "true" },
 };
 const size_t pc_compat_6_0_len = G_N_ELEMENTS(pc_compat_6_0);
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index fc34b905ee..e1e100316d 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -137,6 +137,7 @@ static void pc_q35_init(MachineState *machine)
 DriveInfo *hd[MAX_SATA_PORTS];
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 bool acpi_pcihp;
+bool keep_pci_slot_hpc;
 
 /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
  * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
@@ -242,7 +243,11 @@ static void pc_q35_init(MachineState *machine)
   ACPI_PM_PROP_ACPI_PCIHP_BRIDGE,
   NULL);
 
-if (acpi_pcihp) {
+keep_pci_slot_hpc = object_property_get_bool(OBJECT(lpc),
+ "x-keep-pci-slot-hpc",
+ NULL);
+
+if (!keep_pci_slot_hpc && acpi_pcihp) {
 object_register_sugar_prop(TYPE_PCIE_SLOT, "x-native-hotplug",
"false", true);
 }
-- 
MST

[PULL 14/20] vdpa: Check for existence of opts.vhostdev

2021-11-15 Thread Michael S. Tsirkin

From: Eugenio Pérez 

Since net_init_vhost_vdpa is trying to open it. Not specifying it in the
command line crash qemu.

Fixes: 7327813d17 ("vhost-vdpa: open device fd in net_init_vhost_vdpa()")
Signed-off-by: Eugenio Pérez 
Message-Id: <2022193431.2379298-3-epere...@redhat.com>
Acked-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 net/vhost-vdpa.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 1a7250b980..2e3c22a8c7 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -260,6 +260,10 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
*name,
 
 assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 opts = >u.vhost_vdpa;
+if (!opts->vhostdev) {
+error_setg(errp, "vdpa character device not specified with vhostdev");
+return -1;
+}
 
 vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp);
 if (vdpa_device_fd == -1) {
-- 
MST

[PULL 02/20] softmmu/qdev-monitor: fix use-after-free in qdev_set_id()

2021-11-15 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

Reported by Coverity (CID 1465222).

Fixes: 4a1d937796de0fecd8b22d7dbebf87f38e8282fd ("softmmu/qdev-monitor: add 
error handling in qdev_set_id")
Cc: Damien Hedde 
Cc: Kevin Wolf 
Cc: Michael S. Tsirkin 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20211102163342.31162-1-stefa...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Kevin Wolf 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Damien Hedde 
Reviewed-by: Markus Armbruster 
---
 softmmu/qdev-monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index f8b3a4cd82..588a62b88d 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -593,8 +593,8 @@ const char *qdev_set_id(DeviceState *dev, char *id, Error 
**errp)
 if (prop) {
 dev->id = id;
 } else {
-g_free(id);
 error_setg(errp, "Duplicate device ID '%s'", id);
+g_free(id);
 return NULL;
 }
 } else {
-- 
MST

[PULL 13/20] vdpa: Replace qemu_open_old by qemu_open at

2021-11-15 Thread Michael S. Tsirkin

From: Eugenio Pérez 

There is no reason to keep using the old one, since we neither use the
variadics arguments nor open it with O_DIRECT.

Also, net_client_init1, the caller of net_init_vhost_vdpa, wants all
net_client_init_fun to use Error API, so it's a good step in that
direction.

Signed-off-by: Eugenio Pérez 
Message-Id: <2022193431.2379298-2-epere...@redhat.com>
Acked-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 net/vhost-vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 373b706b90..1a7250b980 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -261,7 +261,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
*name,
 assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 opts = >u.vhost_vdpa;
 
-vdpa_device_fd = qemu_open_old(opts->vhostdev, O_RDWR);
+vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp);
 if (vdpa_device_fd == -1) {
 return -errno;
 }
-- 
MST

[PULL 09/20] hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC

2021-11-15 Thread Michael S. Tsirkin

From: Julia Suvorova 

There are two ways to enable ACPI PCI Hot-plug:

* Disable the Hot-plug Capable bit on PCIe slots.

This was the first approach which led to regression [1-2], as
I/O space for a port is allocated only when it is hot-pluggable,
which is determined by HPC bit.

* Leave the HPC bit on and disable PCIe Native Hot-plug in _OSC
  method.

This removes the (future) ability of hot-plugging switches with PCIe
Native hotplug since ACPI PCI Hot-plug only works with cold-plugged
bridges. If the user wants to explicitely use this feature, they can
disable ACPI PCI Hot-plug with:
--global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off

Change the bit in _OSC method so that the OS selects ACPI PCI Hot-plug
instead of PCIe Native.

[1] https://gitlab.com/qemu-project/qemu/-/issues/641
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2006409

Signed-off-by: Julia Suvorova 
Signed-off-by: Igor Mammedov 
Message-Id: <2022110857.3116853-5-imamm...@redhat.com>
Reviewed-by: Ani Sinha 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/acpi-build.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a3ad6abd33..a99c6e4fe3 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1337,7 +1337,7 @@ static void build_x86_acpi_pci_hotplug(Aml *table, 
uint64_t pcihp_addr)
 aml_append(table, scope);
 }
 
-static Aml *build_q35_osc_method(void)
+static Aml *build_q35_osc_method(bool enable_native_pcie_hotplug)
 {
 Aml *if_ctx;
 Aml *if_ctx2;
@@ -1359,8 +1359,10 @@ static Aml *build_q35_osc_method(void)
 /*
  * Always allow native PME, AER (no dependencies)
  * Allow SHPC (PCI bridges can have SHPC controller)
+ * Disable PCIe Native Hot-plug if ACPI PCI Hot-plug is enabled.
  */
-aml_append(if_ctx, aml_and(a_ctrl, aml_int(0x1F), a_ctrl));
+aml_append(if_ctx, aml_and(a_ctrl,
+aml_int(0x1E | (enable_native_pcie_hotplug ? 0x1 : 0x0)), a_ctrl));
 
 if_ctx2 = aml_if(aml_lnot(aml_equal(aml_arg(1), aml_int(1;
 /* Unknown revision */
@@ -1449,7 +1451,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
 aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
 aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
-aml_append(dev, build_q35_osc_method());
+aml_append(dev, build_q35_osc_method(!pm->pcihp_bridge_en));
 aml_append(sb_scope, dev);
 if (mcfg_valid) {
 aml_append(sb_scope, build_q35_dram_controller());
@@ -1565,7 +1567,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 if (pci_bus_is_express(bus)) {
 aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
 aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
-aml_append(dev, build_q35_osc_method());
+
+/* Expander bridges do not have ACPI PCI Hot-plug enabled */
+aml_append(dev, build_q35_osc_method(true));
 } else {
 aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
 }
-- 
MST

[PULL 01/20] net/vhost-vdpa: fix memory leak in vhost_vdpa_get_max_queue_pairs()

2021-11-15 Thread Michael S. Tsirkin

From: Stefano Garzarella 

Use g_autofree to ensure that `config` is freed when
vhost_vdpa_get_max_queue_pairs() returns.

Reported-by: Coverity (CID 1465228: RESOURCE_LEAK)
Fixes: 402378407d ("vhost-vdpa: multiqueue support")
Signed-off-by: Stefano Garzarella 
Message-Id: <20211102155157.241034-1-sgarz...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Philippe Mathieu-Daudé 
Acked-by: Jason Wang 
---
 net/vhost-vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 49ab322511..373b706b90 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -214,7 +214,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
*peer,
 static int vhost_vdpa_get_max_queue_pairs(int fd, int *has_cvq, Error **errp)
 {
 unsigned long config_size = offsetof(struct vhost_vdpa_config, buf);
-struct vhost_vdpa_config *config;
+g_autofree struct vhost_vdpa_config *config = NULL;
 __virtio16 *max_queue_pairs;
 uint64_t features;
 int ret;
-- 
MST

[PULL 18/20] pcie: factor out pcie_cap_slot_unplug()

2021-11-15 Thread Michael S. Tsirkin

From: Gerd Hoffmann 

No functional change.

Signed-off-by: Gerd Hoffmann 
Message-Id: <2021130859.1171890-5-kra...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/pci/pcie.c | 32 
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index b92dbff118..959bf074b2 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -497,6 +497,25 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice 
*dev, void *opaque)
 object_unparent(OBJECT(dev));
 }
 
+static void pcie_cap_slot_do_unplug(PCIDevice *dev)
+{
+PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(dev));
+uint8_t *exp_cap = dev->config + dev->exp.exp_cap;
+uint32_t lnkcap = pci_get_long(exp_cap + PCI_EXP_LNKCAP);
+
+pci_for_each_device_under_bus(sec_bus, pcie_unplug_device, NULL);
+
+pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_PDS);
+if (dev->cap_present & QEMU_PCIE_LNKSTA_DLLLA ||
+(lnkcap & PCI_EXP_LNKCAP_DLLLARC)) {
+pci_word_test_and_clear_mask(exp_cap + PCI_EXP_LNKSTA,
+ PCI_EXP_LNKSTA_DLLLA);
+}
+pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
+   PCI_EXP_SLTSTA_PDC);
+}
+
 void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
  DeviceState *dev, Error **errp)
 {
@@ -676,7 +695,6 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
 uint32_t pos = dev->exp.exp_cap;
 uint8_t *exp_cap = dev->config + pos;
 uint16_t sltsta = pci_get_word(exp_cap + PCI_EXP_SLTSTA);
-uint32_t lnkcap = pci_get_long(exp_cap + PCI_EXP_LNKCAP);
 
 if (ranges_overlap(addr, len, pos + PCI_EXP_SLTSTA, 2)) {
 /*
@@ -726,17 +744,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
 (val & PCI_EXP_SLTCTL_PIC_OFF) == PCI_EXP_SLTCTL_PIC_OFF &&
 (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
 (old_slt_ctl & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_OFF)) {
-PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(dev));
-pci_for_each_device_under_bus(sec_bus, pcie_unplug_device, NULL);
-pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
- PCI_EXP_SLTSTA_PDS);
-if (dev->cap_present & QEMU_PCIE_LNKSTA_DLLLA ||
-(lnkcap & PCI_EXP_LNKCAP_DLLLARC)) {
-pci_word_test_and_clear_mask(exp_cap + PCI_EXP_LNKSTA,
- PCI_EXP_LNKSTA_DLLLA);
-}
-pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
-   PCI_EXP_SLTSTA_PDC);
+pcie_cap_slot_do_unplug(dev);
 }
 pcie_cap_update_power(dev);
 
-- 
MST

[PULL 10/20] tests: bios-tables-test update expected blobs

2021-11-15 Thread Michael S. Tsirkin

From: Igor Mammedov 

The changes are the result of
'hw/i386/acpi-build: Deny control on PCIe Native Hot-Plug in _OSC'
which hides PCIE hotplug bit in host-bridge _OSC

Method (_OSC, 4, NotSerialized)  // _OSC: Operating System Capabilities
 {
 CreateDWordField (Arg3, Zero, CDW1)
 If ((Arg0 == ToUUID ("33db4d5b-1ff7-401c-9657-7441c03dd766") 
/* PCI Host Bridge Device */))
 {
 CreateDWordField (Arg3, 0x04, CDW2)
 CreateDWordField (Arg3, 0x08, CDW3)
 Local0 = CDW3 /* \_SB_.PCI0._OSC.CDW3 */
-Local0 &= 0x1F
+Local0 &= 0x1E

Signed-off-by: Igor Mammedov 
Message-Id: <2022110857.3116853-6-imamm...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/bios-tables-test-allowed-diff.h |  16 
 tests/data/acpi/q35/DSDT| Bin 8289 -> 8289 bytes
 tests/data/acpi/q35/DSDT.acpihmat   | Bin 9614 -> 9614 bytes
 tests/data/acpi/q35/DSDT.bridge | Bin 11003 -> 11003 bytes
 tests/data/acpi/q35/DSDT.cphp   | Bin 8753 -> 8753 bytes
 tests/data/acpi/q35/DSDT.dimmpxm| Bin 9943 -> 9943 bytes
 tests/data/acpi/q35/DSDT.ipmibt | Bin 8364 -> 8364 bytes
 tests/data/acpi/q35/DSDT.ivrs   | Bin 8306 -> 8306 bytes
 tests/data/acpi/q35/DSDT.memhp  | Bin 9648 -> 9648 bytes
 tests/data/acpi/q35/DSDT.mmio64 | Bin 9419 -> 9419 bytes
 tests/data/acpi/q35/DSDT.multi-bridge   | Bin 8583 -> 8583 bytes
 tests/data/acpi/q35/DSDT.nohpet | Bin 8147 -> 8147 bytes
 tests/data/acpi/q35/DSDT.numamem| Bin 8295 -> 8295 bytes
 tests/data/acpi/q35/DSDT.tis.tpm12  | Bin 8894 -> 8894 bytes
 tests/data/acpi/q35/DSDT.tis.tpm2   | Bin 8894 -> 8894 bytes
 tests/data/acpi/q35/DSDT.xapic  | Bin 35652 -> 35652 bytes
 16 files changed, 16 deletions(-)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index 48e5634d4b..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,17 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/q35/DSDT",
-"tests/data/acpi/q35/DSDT.tis",
-"tests/data/acpi/q35/DSDT.bridge",
-"tests/data/acpi/q35/DSDT.mmio64",
-"tests/data/acpi/q35/DSDT.ipmibt",
-"tests/data/acpi/q35/DSDT.cphp",
-"tests/data/acpi/q35/DSDT.memhp",
-"tests/data/acpi/q35/DSDT.acpihmat",
-"tests/data/acpi/q35/DSDT.numamem",
-"tests/data/acpi/q35/DSDT.dimmpxm",
-"tests/data/acpi/q35/DSDT.nohpet",
-"tests/data/acpi/q35/DSDT.tis.tpm2",
-"tests/data/acpi/q35/DSDT.tis.tpm12",
-"tests/data/acpi/q35/DSDT.multi-bridge",
-"tests/data/acpi/q35/DSDT.ivrs",
-"tests/data/acpi/q35/DSDT.xapic",
diff --git a/tests/data/acpi/q35/DSDT b/tests/data/acpi/q35/DSDT
index 
281fc82c03b2562d2e6b7caec0d817b034a47138..c1965f6051ef2af81dd8412abe169d87845bb033
 100644
GIT binary patch
delta 24
gcmaFp@X&$FCDET<0BnZ{w*UYD

delta 24
gcmaFp@X&$FCDET<0BnK?w*UYD

diff --git a/tests/data/acpi/q35/DSDT.acpihmat 
b/tests/data/acpi/q35/DSDT.acpihmat
index 
8c1e05a11a328ec1cc6f86e36e52c28f41f9744e..f24d4874bff8d327a165ed7c36de507aea114edd
 100644
GIT binary patch
delta 24
fcmeD4?(^ny33dtTQ)OUa+&+=(3ZvY{`|DKzU@Hhn

delta 24
fcmeD4?(^ny33dtTQ)OUa+%}Qx3ZwkS`|DKzU?vDi

diff --git a/tests/data/acpi/q35/DSDT.bridge b/tests/data/acpi/q35/DSDT.bridge
index 
6f1464b6c712d7f33cb4b891b7ce76fe228f44c9..424d51bd1cb39ea73501ef7d0044ee52cec5bdac
 100644
GIT binary patch
delta 24
gcmewz`a6`%CDF$oF>WH)6-K#@_k$DxTWtqt

delta 24
fcmdn!veAXhCDF$oF?J%?6-N1u_k$DxTWAMo

diff --git a/tests/data/acpi/q35/DSDT.dimmpxm b/tests/data/acpi/q35/DSDT.dimmpxm
index 
fe5820d93d057ef09a001662369b15afbc5b87e2..76e451e829ec4c245315f7eed8731aa1be45a747
 100644
GIT binary patch
delta 24
gcmccad)=4ICDD~$3R?@yKo0BrFHn*aa+

diff --git a/tests/data/acpi/q35/DSDT.memhp b/tests/data/acpi/q35/DSDT.memhp
index 
9bc11518fc57687ca789dc70793b48b29a0d74ed..4e9cb3dc6896bb79ccac0fe342a404549f6610e8
 100644
GIT binary patch
delta 24
gcmdnsy}_HyCD7Sg;8$f{S^uTTRaEr

delta 24
fcmZp7Zg=K#33dr-S7cyd?48JUg;9Rv{S^uTTQ>*m

diff --git a/tests/data/acpi/q35/DSDT.nohpet b/tests/data/acpi/q35/DSDT.nohpet
index 
e8202e6ddfbe96071f32f1ec05758f650569943e..83d1aa00ac5686df479673fb0d7830f946e25dea
 100644
GIT binary patch
delta 24
gcmca?f7zbPCDn+a

delta 24
gcmaFv@Z5pRCDn+a

diff --git a/tests/data/acpi/q35/DSDT.tis.tpm12 
b/tests/data/acpi/q35/DSDT.tis.tpm12
index 
c96b5277a14ae98174408d690d6e0246bd932623..0ebdf6fbd77967f1ab5d5337b7b1fed314cfaca8
 100644
GIT binary patch
delta 24
gcmdnzy3du%CDyjp@iVCN7s?mk^h31_s6r6S=N1%5A)#+64f6_X

delta 26
icmX>yjp@iVCN7s?mk^h31_s9U6S=N1%5S`%+64f6@(F(c

-- 
MST

[PULL 05/20] hw/mem/pc-dimm: Restrict NUMA-specific code to NUMA machines

2021-11-15 Thread Michael S. Tsirkin

From: Philippe Mathieu-Daudé 

When trying to use the pc-dimm device on a non-NUMA machine, we get:

  $ qemu-system-arm -M none -cpu max -S \
  -object memory-backend-file,id=mem1,size=1M,mem-path=/tmp/1m \
  -device pc-dimm,id=dimm1,memdev=mem1
  Segmentation fault (core dumped)

  (gdb) bt
  #0  pc_dimm_realize (dev=0x56da3e90, errp=0x7fffcd10) at 
hw/mem/pc-dimm.c:184
  #1  0x55fe1f8f in device_set_realized (obj=0x56da3e90, 
value=true, errp=0x7fffce18) at hw/core/qdev.c:531
  #2  0x55feb4a9 in property_set_bool (obj=0x56da3e90, 
v=0x56e54420, name=0x563c3c41 "realized", opaque=0x56a704f0, 
errp=0x7fffce18) at qom/object.c:2257

To avoid that crash, restrict the pc-dimm NUMA check to machines
supporting NUMA, and do not allow the use of 'node' property on
non-NUMA machines.

Suggested-by: Igor Mammedov 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20211106145016.611332-1-f4...@amsat.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/mem/pc-dimm.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index a3a2560301..48b913aba6 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -181,7 +181,21 @@ static void pc_dimm_realize(DeviceState *dev, Error **errp)
 PCDIMMDevice *dimm = PC_DIMM(dev);
 PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
 MachineState *ms = MACHINE(qdev_get_machine());
-int nb_numa_nodes = ms->numa_state->num_nodes;
+
+if (ms->numa_state) {
+int nb_numa_nodes = ms->numa_state->num_nodes;
+
+if (((nb_numa_nodes > 0) && (dimm->node >= nb_numa_nodes)) ||
+(!nb_numa_nodes && dimm->node)) {
+error_setg(errp, "'DIMM property " PC_DIMM_NODE_PROP " has value %"
+   PRIu32 "' which exceeds the number of numa nodes: %d",
+   dimm->node, nb_numa_nodes ? nb_numa_nodes : 1);
+return;
+}
+} else if (dimm->node > 0) {
+error_setg(errp, "machine doesn't support NUMA");
+return;
+}
 
 if (!dimm->hostmem) {
 error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property is not set");
@@ -191,13 +205,6 @@ static void pc_dimm_realize(DeviceState *dev, Error **errp)
object_get_canonical_path_component(OBJECT(dimm->hostmem)));
 return;
 }
-if (((nb_numa_nodes > 0) && (dimm->node >= nb_numa_nodes)) ||
-(!nb_numa_nodes && dimm->node)) {
-error_setg(errp, "'DIMM property " PC_DIMM_NODE_PROP " has value %"
-   PRIu32 "' which exceeds the number of numa nodes: %d",
-   dimm->node, nb_numa_nodes ? nb_numa_nodes : 1);
-return;
-}
 
 if (ddc->realize) {
 ddc->realize(dimm, errp);
-- 
MST

[PULL 00/20] pci,pc,virtio: bugfixes

2021-11-15 Thread Michael S. Tsirkin

The following changes since commit 0a70bcf18caf7a61d480f8448723c15209d128ef:

  Update version for v6.2.0-rc0 release (2021-11-09 18:22:57 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to 18416c62e36a79823a9e28f6b2260aa13c25e1d9:

  pcie: expire pending delete (2021-11-15 11:10:11 -0500)


pci,pc,virtio: bugfixes

pci power management fixes
acpi hotplug fixes
misc other fixes

Signed-off-by: Michael S. Tsirkin 


Eugenio Pérez (4):
  vhost: Rename last_index to vq_index_end
  vhost: Fix last vq queue index of devices with no cvq
  vdpa: Replace qemu_open_old by qemu_open at
  vdpa: Check for existence of opts.vhostdev

Gerd Hoffmann (6):
  pci: implement power state
  pcie: implement slot power control for pcie root ports
  pcie: add power indicator blink check
  pcie: factor out pcie_cap_slot_unplug()
  pcie: fast unplug when slot power is off
  pcie: expire pending delete

Igor Mammedov (2):
  pcie: rename 'native-hotplug' to 'x-native-hotplug'
  tests: bios-tables-test update expected blobs

Jason Wang (2):
  virtio: use virtio accessor to access packed descriptor flags
  virtio: use virtio accessor to access packed event

Julia Suvorova (3):
  hw/acpi/ich9: Add compat prop to keep HPC bit set for 6.1 machine type
  bios-tables-test: Allow changes in DSDT ACPI tables
  hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC

Philippe Mathieu-Daudé (1):
  hw/mem/pc-dimm: Restrict NUMA-specific code to NUMA machines

Stefan Hajnoczi (1):
  softmmu/qdev-monitor: fix use-after-free in qdev_set_id()

Stefano Garzarella (1):
  net/vhost-vdpa: fix memory leak in vhost_vdpa_get_max_queue_pairs()

 include/hw/acpi/ich9.h|   1 +
 include/hw/pci/pci.h  |   2 +
 include/hw/qdev-core.h|   1 +
 include/hw/virtio/vhost.h |   4 +-
 hw/acpi/ich9.c|  18 
 hw/i386/acpi-build.c  |  12 --
 hw/i386/pc.c  |   2 +
 hw/i386/pc_q35.c  |   9 +++-
 hw/mem/pc-dimm.c  |  23 ++
 hw/net/vhost_net.c|  12 +++---
 hw/pci/pci.c  |  25 ++-
 hw/pci/pci_host.c |   6 ++-
 hw/pci/pcie.c |  79 --
 hw/pci/pcie_port.c|   2 +-
 hw/virtio/vhost-vdpa.c|   2 +-
 hw/virtio/virtio.c|  24 ---
 net/vhost-vdpa.c  |   8 +++-
 softmmu/qdev-monitor.c|   6 ++-
 tests/data/acpi/q35/DSDT  | Bin 8289 -> 8289 bytes
 tests/data/acpi/q35/DSDT.acpihmat | Bin 9614 -> 9614 bytes
 tests/data/acpi/q35/DSDT.bridge   | Bin 11003 -> 11003 bytes
 tests/data/acpi/q35/DSDT.cphp | Bin 8753 -> 8753 bytes
 tests/data/acpi/q35/DSDT.dimmpxm  | Bin 9943 -> 9943 bytes
 tests/data/acpi/q35/DSDT.ipmibt   | Bin 8364 -> 8364 bytes
 tests/data/acpi/q35/DSDT.ivrs | Bin 8306 -> 8306 bytes
 tests/data/acpi/q35/DSDT.memhp| Bin 9648 -> 9648 bytes
 tests/data/acpi/q35/DSDT.mmio64   | Bin 9419 -> 9419 bytes
 tests/data/acpi/q35/DSDT.multi-bridge | Bin 8583 -> 8583 bytes
 tests/data/acpi/q35/DSDT.nohpet   | Bin 8147 -> 8147 bytes
 tests/data/acpi/q35/DSDT.numamem  | Bin 8295 -> 8295 bytes
 tests/data/acpi/q35/DSDT.tis.tpm12| Bin 8894 -> 8894 bytes
 tests/data/acpi/q35/DSDT.tis.tpm2 | Bin 8894 -> 8894 bytes
 tests/data/acpi/q35/DSDT.xapic| Bin 35652 -> 35652 bytes
 33 files changed, 176 insertions(+), 60 deletions(-)

[PULL 08/20] bios-tables-test: Allow changes in DSDT ACPI tables

2021-11-15 Thread Michael S. Tsirkin

From: Julia Suvorova 

Prepare for changing the _OSC method in q35 DSDT.

Signed-off-by: Julia Suvorova 
Signed-off-by: Igor Mammedov 
Acked-by: Ani Sinha 
Message-Id: <2022110857.3116853-4-imamm...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..48e5634d4b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,17 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/q35/DSDT",
+"tests/data/acpi/q35/DSDT.tis",
+"tests/data/acpi/q35/DSDT.bridge",
+"tests/data/acpi/q35/DSDT.mmio64",
+"tests/data/acpi/q35/DSDT.ipmibt",
+"tests/data/acpi/q35/DSDT.cphp",
+"tests/data/acpi/q35/DSDT.memhp",
+"tests/data/acpi/q35/DSDT.acpihmat",
+"tests/data/acpi/q35/DSDT.numamem",
+"tests/data/acpi/q35/DSDT.dimmpxm",
+"tests/data/acpi/q35/DSDT.nohpet",
+"tests/data/acpi/q35/DSDT.tis.tpm2",
+"tests/data/acpi/q35/DSDT.tis.tpm12",
+"tests/data/acpi/q35/DSDT.multi-bridge",
+"tests/data/acpi/q35/DSDT.ivrs",
+"tests/data/acpi/q35/DSDT.xapic",
-- 
MST

[PULL 06/20] pcie: rename 'native-hotplug' to 'x-native-hotplug'

2021-11-15 Thread Michael S. Tsirkin

From: Igor Mammedov 

Mark property as experimental/internal adding 'x-' prefix.

Property was introduced in 6.1 and it should have provided
ability to turn on native PCIE hotplug on port even when
ACPI PCI hotplug is in use is user explicitly sets property
on CLI. However that never worked since slot is wired to
ACPI hotplug controller.
Another non-intended usecase: disable native hotplug on slot
when APCI based hotplug is disabled, which works but slot has
'hotplug' property for this taks.

It should be relatively safe to rename it to experimental
as no users should exist for it and given that the property
is broken we don't really want to leave it around for much
longer lest users start using it.

Signed-off-by: Igor Mammedov 
Reviewed-by: Ani Sinha 
Message-Id: <2022110857.3116853-2-imamm...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/pc_q35.c   | 2 +-
 hw/pci/pcie_port.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 797e09500b..fc34b905ee 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -243,7 +243,7 @@ static void pc_q35_init(MachineState *machine)
   NULL);
 
 if (acpi_pcihp) {
-object_register_sugar_prop(TYPE_PCIE_SLOT, "native-hotplug",
+object_register_sugar_prop(TYPE_PCIE_SLOT, "x-native-hotplug",
"false", true);
 }
 
diff --git a/hw/pci/pcie_port.c b/hw/pci/pcie_port.c
index da850e8dde..e95c1e5519 100644
--- a/hw/pci/pcie_port.c
+++ b/hw/pci/pcie_port.c
@@ -148,7 +148,7 @@ static Property pcie_slot_props[] = {
 DEFINE_PROP_UINT8("chassis", PCIESlot, chassis, 0),
 DEFINE_PROP_UINT16("slot", PCIESlot, slot, 0),
 DEFINE_PROP_BOOL("hotplug", PCIESlot, hotplug, true),
-DEFINE_PROP_BOOL("native-hotplug", PCIESlot, native_hotplug, true),
+DEFINE_PROP_BOOL("x-native-hotplug", PCIESlot, native_hotplug, true),
 DEFINE_PROP_END_OF_LIST()
 };
 
-- 
MST

[PULL 04/20] vhost: Fix last vq queue index of devices with no cvq

2021-11-15 Thread Michael S. Tsirkin

From: Eugenio Pérez 

The -1 assumes that cvq device model is accounted in data_queue_pairs,
if cvq does not exists, but it's actually the opposite: Devices with
!cvq are ok but devices with cvq does not add the last queue to
data_queue_pairs.

This is not a problem to vhost-net, but it is to vhost-vdpa:
* Devices with cvq gets initialized at last data vq device model, not
at cvq one.
* Devices with !cvq never gets initialized, since last_index is the
first queue of the last device model.

Because of that, the right change in last_index is to actually add the
cvq, not to remove the missing one.

This is not a problem to vhost-net, but it is to vhost-vdpa, which
device model trust to reach the last index to finish starting the
device.

Also, as the previous commit, rename it to index_end.

Tested with vp_vdpa with host's vhost=on and vhost=off, with ctrl_vq=on
and ctrl_vq=off.

Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the virtio 
device")
Reviewed-by: Juan Quintela 
Signed-off-by: Eugenio Pérez 
Message-Id: <20211104085625.2054959-3-epere...@redhat.com>
Acked-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/net/vhost_net.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 29f2c4212f..30379d2ca4 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -326,11 +326,11 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
*ncs,
 VirtIONet *n = VIRTIO_NET(dev);
 int nvhosts = data_queue_pairs + cvq;
 struct vhost_net *net;
-int r, e, i, last_index = data_queue_pairs * 2;
+int r, e, i, index_end = data_queue_pairs * 2;
 NetClientState *peer;
 
-if (!cvq) {
-last_index -= 1;
+if (cvq) {
+index_end += 1;
 }
 
 if (!k->set_guest_notifiers) {
@@ -347,7 +347,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 }
 
 net = get_vhost_net(peer);
-vhost_net_set_vq_index(net, i * 2, last_index);
+vhost_net_set_vq_index(net, i * 2, index_end);
 
 /* Suppress the masking guest notifiers on vhost user
  * because vhost user doesn't interrupt masking/unmasking
-- 
MST

Re: [PULL 21/26] configure, meson: move more compiler checks to Meson

2021-11-15 Thread Peter Maydell

On Thu, 14 Oct 2021 at 17:49, Paolo Bonzini  wrote:
>
> Reviewed-by: Marc-André Lureau 
> Message-Id: <20211007130829.632254-15-pbonz...@redhat.com>
> Signed-off-by: Paolo Bonzini 
> ---
>  configure| 91 
>  meson.build  | 44 +++
>  util/meson.build |  4 ++-
>  3 files changed, 47 insertions(+), 92 deletions(-)

> diff --git a/meson.build b/meson.build
> index 6bf43e6d30..6b7487b725 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1550,6 +1550,8 @@ config_host_data.set('CONFIG_INOTIFY',
>   cc.has_header_symbol('sys/inotify.h', 'inotify_init'))
>  config_host_data.set('CONFIG_INOTIFY1',
>   cc.has_header_symbol('sys/inotify.h', 'inotify_init1'))
> +config_host_data.set('CONFIG_IOVEC',
> + cc.has_header_symbol('sys/uio.h', 'struct iovec'))
>  config_host_data.set('CONFIG_MACHINE_BSWAP_H',
>   cc.has_header_symbol('machine/bswap.h', 'bswap32',
>prefix: '''#include 

Hi -- I've just noticed that this change breaks compilation for me,
because this test incorrectly fails to set CONFIG_IOVEC on a system
where the header defines 'struct iovec'. This seems to be because
"struct iovec" isn't a valid thing to test with has_header_symbol,
because it provokes a compiler error from clang.

The meson-log.txt shows:

Running compile:
Working directory:
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/meson-private/tmpfspzse_8
Command line:  clang-7 -m64 -mcx16
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/meson-private/tmpfspzse_8/testfile.c
-o /mnt/nvme
disk/linaro/qemu-from-laptop/qemu/build/arm-clang/meson-private/tmpfspzse_8/output.obj
-c -fsanitize=undefined -fno-sanitize=shift-base -Werror
 -D_FILE_OFFSET_BITS=64 -O0 -Werror=implicit-function-declaration
-Werror=unknown-warning-option -Werror=unused-command-line-argument
-Werror=i
gnored-optimization-argument -std=gnu11

Code:

#include 
int main(void) {
/* If it's not defined as a macro, try to use as a symbol */
#ifndef struct iovec
struct iovec;
#endif
return 0;
}
Compiler stdout:

Compiler stderr:

/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/meson-private/tmpfspzse_8/testfile.c:5:28:
error: extra tokens at end of #ifndef di
rective [-Werror,-Wextra-tokens]
#ifndef struct iovec
   ^
   //
1 error generated.

...skipping...
#ifndef struct iovec
struct iovec;
#endif
return 0;
}
Compiler stdout:

Compiler stderr:

/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/meson-private/tmpfspzse_8/testfile.c:5:28:
error: extra tokens at end of #ifndef r]
#ifndef struct iovec
   ^
   //
1 error generated.

Header  has symbol "struct iovec" : NO

For comparison, with a gcc build the test works because gcc
happens to merely warn rather than fail for the syntax issue:

Running compile:
Working directory:
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/meson-private/tmpidzebj6t
Command line:  ccache gcc -m64 -mcx16
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/meson-private/tmpidzebj6t/testfile.c
-o /mnt/nvmedis
k/linaro/qemu-from-laptop/qemu/build/x86/meson-private/tmpidzebj6t/output.obj
-c -D_FILE_OFFSET_BITS=64 -O0 -std=gnu11

Code:

#include 
int main(void) {
/* If it's not defined as a macro, try to use as a symbol */
 int main(void) { return 0; }

Compiler stdout:

Compiler stderr:
...skipping...
#ifndef struct iovec
struct iovec;
#endif
return 0;
}
Compiler stdout:

Compiler stderr:

/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/meson-private/tmpidzebj6t/testfile.c:
In function 'main':
/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/meson-private/tmpidzebj6t/testfile.c:5:28:
warning: extra tokens at end of #ifndef direcve
5 | #ifndef struct iovec
  |^

Header  has symbol "struct iovec" : YES

-- PMM

Re: [PATCH v4 00/25] block layer: split block APIs in global state and I/O

2021-11-15 Thread Daniel P . Berrangé

On Mon, Nov 15, 2021 at 05:03:28PM +0100, Hanna Reitz wrote:
> On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:
> > Currently, block layer APIs like block-backend.h contain a mix of
> > functions that are either running in the main loop and under the
> > BQL, or are thread-safe functions and run in iothreads performing I/O.
> > The functions running under BQL also take care of modifying the
> > block graph, by using drain and/or aio_context_acquire/release.
> > This makes it very confusing to understand where each function
> > runs, and what assumptions it provided with regards to thread
> > safety.
> > 
> > We call the functions running under BQL "global state (GS) API", and
> > distinguish them from the thread-safe "I/O API".
> > 
> > The aim of this series is to split the relevant block headers in
> > global state and I/O sub-headers.
> 
> Despite leaving quite some comments, the series and the split seem
> reasonable to me overall.  (This is a pretty big series, after all, so those
> “some comments” stack up against a majority of changes that seem OK to me.
> :))
> 
> One thing I noticed while reviewing is that it’s really hard to verify that
> no I/O function calls a GS function.  What would be wonderful is some
> function marker like coroutine_fn that marks GS functions (or I/O functions)
> and that we could then verify the call paths.  But AFAIU we’ve always wanted
> precisely that for coroutine_fn and still don’t have it, so this seems like
> extremely wishful thinking... :(

Even if we don't programmatically verify these rules, it would be a major
step forward if we simply adopted a standard naming convention for the
APIs that was essentally self-describing. eg block_io_XXX for all I/O
stuff and block_state_ for all global state ,and block_common if
we have stuff applicable to both use cases.

I wouldn't suggest doing it as part of this series, but I think it'd
be worthwhile in general for the medium-long term, despite the code
churn in updating all usage in the short term.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: does drive_get_next(IF_NONE) make sense?

2021-11-15 Thread Thomas Huth


On 15/11/2021 08.12, Alistair Francis wrote:

On Mon, Nov 15, 2021 at 3:32 PM Markus Armbruster  wrote:


Peter Maydell  writes:


On Fri, 12 Nov 2021 at 13:34, Markus Armbruster  wrote:


Thomas Huth  writes:


On 03/11/2021 09.41, Markus Armbruster wrote:

Peter Maydell  writes:


Does it make sense for a device/board to do drive_get_next(IF_NONE) ?

Short answer: hell, no!  ;)


Would it make sense to add an "assert(type != IF_NONE)" to drive_get()
to avoid such mistakes in the future?


Worth a try.


You need to fix the sifive_u_otp device first :-)


And for that, we may want Hao Wu's "[PATCH v4 5/7] blockdev: Add a new
IF type IF_OTHER" first.


I can fixup sifive_u_otp, just let me know what the prefered solution is


What kind of device is that OTP exactly? If it is some kind of non-serial 
flash device, maybe you could simply use IF_PFLASH instead?


 Thomas

Re: [PATCH v4 00/25] block layer: split block APIs in global state and I/O

2021-11-15 Thread Hanna Reitz


On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

Currently, block layer APIs like block-backend.h contain a mix of
functions that are either running in the main loop and under the
BQL, or are thread-safe functions and run in iothreads performing I/O.
The functions running under BQL also take care of modifying the
block graph, by using drain and/or aio_context_acquire/release.
This makes it very confusing to understand where each function
runs, and what assumptions it provided with regards to thread
safety.

We call the functions running under BQL "global state (GS) API", and
distinguish them from the thread-safe "I/O API".

The aim of this series is to split the relevant block headers in
global state and I/O sub-headers.


Despite leaving quite some comments, the series and the split seem 
reasonable to me overall.  (This is a pretty big series, after all, so 
those “some comments” stack up against a majority of changes that seem 
OK to me. :))


One thing I noticed while reviewing is that it’s really hard to verify 
that no I/O function calls a GS function.  What would be wonderful is 
some function marker like coroutine_fn that marks GS functions (or I/O 
functions) and that we could then verify the call paths.  But AFAIU 
we’ve always wanted precisely that for coroutine_fn and still don’t have 
it, so this seems like extremely wishful thinking... :(


I think most of the issues I found can be fixed (or are even 
irrelevant), the only thing that really worries me are the two places 
that are clearly I/O paths that call permission functions: Namely first 
block_crypto_amend_options_generic_luks() (part of the luks block 
driver’s .bdrv_co_amend implementation), which calls 
bdrv_child_refresh_perms(); and second fuse_do_truncate(), which calls 
blk_set_perm().


In the first case, we need this call so that we don’t permanently hog 
the WRITE permission for the luks file, which used to be a problem, I 
believe.  We want to unshare the WRITE permission (and apparently also 
CONSISTENT_READ) during the key update, so we need some way to 
temporarily update the permissions.


I only really see four solutions for this:
(1) We somehow make the amend job run in the main context under the BQL 
and have it prevent all concurrent I/O access (seems bad)
(2) We can make the permission functions part of the I/O path (seems 
wrong and probably impossible?)
(3) We can drop the permissions update and permanently require the 
permissions that we need when updating keys (I think this might break 
existing use cases)
(4) We can acquire the BQL around the permission update call and perhaps 
that works?


I don’t know how (4) would work but it’s basically the only reasonable 
solution I can come up with.  Would this be a way to call a BQL function 
from an I/O function?


As for the second case, the same applies as above, with the differences 
that we have no jobs, so this code must always run in the block device’s 
AioContext (I think), which rules out (1); but (3) would become easier 
(i.e. require the RESIZE permission all the time), although that too 
might have an impact on existing users (don’t think so, though).  In any 
case, if we could do (4), that would solve the problem here, too.



And finally, another notable thing I noticed is that the way how 
create-related functions are handled is inconsistent.  I believe they 
should all be GS functions; qmp_blockdev_create() seems to agree with me 
on this, but we currently seem to have some bugs there.  It’s possible 
to invoke blockdev-create on a block device that’s in an I/O thread, and 
then qemu crashes.  Oops.  (The comment in qmp_blockdev_create() says 
that the block drivers’ implementations should prevent this, but 
apparently they don’t...?) In any case, that’s a pre-existing bug, of 
course, that doesn’t concern this series (other than that it suggests 
that “create” functions should be classified as GS).


Hanna

Re: [PATCH RFC 0/2] Eliminate drive_get_next()

2021-11-15 Thread Markus Armbruster

Peter Maydell  writes:

> On Mon, 15 Nov 2021 at 12:55, Markus Armbruster  wrote:
>>
>> This is RFC because I'm unsure about the removal of
>>
>> /* Reason: init() method uses drive_get_next() */
>> dc->user_creatable = false;
>>
>> in PATCH 1.  Both users appear to wire up some GPIO.  If that's
>> necessary for the thing to work, we should just replace the comment.
>
> Looking at the code, it sort of is and sort of isn't. The GPIO line
> is the chip-select line. If you don't connect it then (because the
> ssi-sd device configures cs_polarity to SSI_CS_LOW, requesting an
> active-low chip-select) the device will always be selected. If
> the machine created an SSI bus with no SSI device attached to it
> then in theory the user could create an ssi-sd device and connect
> it there and have it work. But in practice it's really unlikely:
> machines create SSI buses with specific wired-in devices on them,
> and the guest OS knows what it has to do to enable the chip select
> for the device it wants to talk to (often some known GPIO pin on
> a GPIO controller).
>
> So I would keep the user_creatable = false, with a reason of
> "user should wire up GPIO chip-select line". But see below for

I'll do it this way.

> a pile of contrary precedent.
>
> (The chip-select GPIO is created in the parent class, incidentally.)
>
>> Aside: there may be devices that need manual wiring to work, yet don't
>> have user_creatable unset.  Bugs if you ask me.  I don't have smart
>> ideas on how to track them down.
>
> Me neither. I notice that the TYPE_M25P80 is also an SSI peripheral
> with an active-low chipselect but that one doesn't set user_creatable
> to false. TYPE_SSD0323 also is user-creatable and that one has an
> active-high chipselect, which means the user can create a device but
> it will then never be usable because it will ignore all transactions.
> (More generally, looks like most subclasses of TYPE_SSI_PERIPHERAL
> don't set user_creatable = false.)

For sysbus devices, we clear user_creatable by default, because sysbus
devices pretty much always[*] need wiring.  Is this the case for SSI bus
devices, too?


[*] The most prominent exception is "dynamic sysbus", which I believe
was a mistake.

Re: [PATCH RFC 2/2] hw: Replace drive_get_next() by drive_get()

2021-11-15 Thread Markus Armbruster

Philippe Mathieu-Daudé  writes:

> On 11/15/21 13:55, Markus Armbruster wrote:
>> drive_get_next() is basically a bad idea.  It returns the "next" block
>> backend of a certain interface type.  "Next" means bus=0,unit=N, where
>> subsequent calls count N up from zero, per interface type.
>> 
>> This lets you define unit numbers implicitly by execution order.  If the
>> order changes, or new calls appear "in the middle", unit numbers change.
>> ABI break.  Hard to spot in review.
>> 
>> Explicit is better than implicit: use drive_get() directly.
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  include/sysemu/blockdev.h   |  1 -
>>  blockdev.c  | 10 --
>>  hw/arm/aspeed.c | 21 +
>>  hw/arm/cubieboard.c |  2 +-
>>  hw/arm/imx25_pdk.c  |  2 +-
>>  hw/arm/integratorcp.c   |  2 +-
>>  hw/arm/mcimx6ul-evk.c   |  2 +-
>>  hw/arm/mcimx7d-sabre.c  |  2 +-
>>  hw/arm/msf2-som.c   |  2 +-
>>  hw/arm/npcm7xx_boards.c |  6 +++---
>>  hw/arm/orangepi.c   |  2 +-
>>  hw/arm/raspi.c  |  2 +-
>>  hw/arm/realview.c   |  2 +-
>>  hw/arm/sabrelite.c  |  2 +-
>>  hw/arm/versatilepb.c|  4 ++--
>>  hw/arm/vexpress.c   |  6 +++---
>>  hw/arm/xilinx_zynq.c| 16 +---
>>  hw/arm/xlnx-versal-virt.c   |  3 ++-
>>  hw/arm/xlnx-zcu102.c|  6 +++---
>>  hw/microblaze/petalogix_ml605_mmu.c |  2 +-
>>  hw/misc/sifive_u_otp.c  |  2 +-
>>  hw/riscv/microchip_pfsoc.c  |  2 +-
>>  hw/sparc64/niagara.c|  2 +-
>>  23 files changed, 49 insertions(+), 52 deletions(-)
>
>> @@ -435,11 +438,13 @@ static void aspeed_machine_init(MachineState *machine)
>>  }
>>  
>>  for (i = 0; i < bmc->soc.sdhci.num_slots; i++) {
>> -sdhci_attach_drive(>soc.sdhci.slots[i], drive_get_next(IF_SD));
>> +sdhci_attach_drive(>soc.sdhci.slots[i],
>> +   drive_get(IF_SD, 0, i));
>
> If we put SD on bus #0, ...
>
>>  }
>>  
>>  if (bmc->soc.emmc.num_slots) {
>> -sdhci_attach_drive(>soc.emmc.slots[0], drive_get_next(IF_SD));
>> +sdhci_attach_drive(>soc.emmc.slots[0],
>> +   drive_get(IF_SD, 0, bmc->soc.sdhci.num_slots));
>
> ... we'd want to put eMMC on bus #1

Using separate buses for different kinds of devices would be neater, but
it also would be an incompatible change.  This patch keeps existing
bus/unit numbers working.  drive_get_next() can only use bus 0.

>  but I see having eMMC cards on a
> IF_SD bus as a bug, since these cards are soldered on the board.

IF_SD is not a bus, it's an "block interface type", which is really just
a user interface thing.

>> --- a/hw/arm/vexpress.c
>> +++ b/hw/arm/vexpress.c
>> @@ -625,7 +625,7 @@ static void vexpress_common_init(MachineState *machine)
>>qdev_get_gpio_in(sysctl, 
>> ARM_SYSCTL_GPIO_MMC_WPROT));
>>  qdev_connect_gpio_out_named(dev, "card-inserted", 0,
>>qdev_get_gpio_in(sysctl, 
>> ARM_SYSCTL_GPIO_MMC_CARDIN));
>> -dinfo = drive_get_next(IF_SD);
>> +dinfo = drive_get(IF_SD, 0, 0);
>
> Can we have one interface refactor per patch (IF_SD, IF_PFLASH, IF_MTD...)?

Peter asked for one patch per "board/SoC model".  I'll do whatever helps
reviewers.

>> @@ -657,7 +657,7 @@ static void vexpress_common_init(MachineState *machine)
>>  
>>  sysbus_create_simple("pl111", map[VE_CLCD], pic[14]);
>>  
>> -dinfo = drive_get_next(IF_PFLASH);
>> +dinfo = drive_get(IF_PFLASH, 0, 0);
>
>> -static inline void zynq_init_spi_flashes(uint32_t base_addr, qemu_irq irq,
>> - bool is_qspi)
>> +static inline int zynq_init_spi_flashes(uint32_t base_addr, qemu_irq irq,
>> +bool is_qspi, int unit0)
>>  {
>> +int unit = unit0;
>>  DeviceState *dev;
>>  SysBusDevice *busdev;
>>  SSIBus *spi;
>> @@ -156,7 +157,7 @@ static inline void zynq_init_spi_flashes(uint32_t 
>> base_addr, qemu_irq irq,
>>  spi = (SSIBus *)qdev_get_child_bus(dev, bus_name);
>>  
>>  for (j = 0; j < num_ss; ++j) {
>> -DriveInfo *dinfo = drive_get_next(IF_MTD);
>> +DriveInfo *dinfo = drive_get(IF_MTD, 0, unit++);
>
>> diff --git a/hw/arm/xlnx-zcu102.c b/hw/arm/xlnx-zcu102.c
>> index 3dc2b5e8ca..45eb19ab3b 100644
>> --- a/hw/arm/xlnx-zcu102.c
>> +++ b/hw/arm/xlnx-zcu102.c
>> @@ -190,7 +190,7 @@ static void xlnx_zcu102_init(MachineState *machine)
>>  BusState *spi_bus;
>>  DeviceState *flash_dev;
>>  qemu_irq cs_line;
>> -DriveInfo *dinfo = drive_get_next(IF_MTD);
>> +DriveInfo *dinfo = drive_get(IF_MTD, 0, i);
>
> If this is bus #0, ...
>
>>

Re: [PATCH] qmp: Stabilize preconfig

2021-11-15 Thread Markus Armbruster

Paolo Bonzini  writes:

> On 11/13/21 08:52, Markus Armbruster wrote:
>> I'm not asking what to do "if it hurts", or "if you want a cold-plugged
>> device".  I'm asking whether there's a reason for ever wanting hot plug
>> instead of cold plug.  Or in other words, what can hot plug possibly
>> gain us over cold plug?
>> 
>> As far as I know, the answer is "nothing but trouble".
>
> Yes, I agree.
>
>> If that's true, then what we should tell users is to stick to -device
>> for initial configuration, and stay away from device_add.
>
> Yes, which is one issue with stabilizing -preconfig.  It's not clear 
> what exactly it is solving.
>
> The boat for this has sailed.  The only sane way to do this is a new 
> binary.

 "Ideally" still applies to any new binary.
>>>
>>> Well, "ideally" any new binary would only have a few command line
>>> options, and ordering would be mostly irrelevant.  For example I'd
>>> expect a QMP binary to only have a few options, mostly for
>>> debugging/development (-L, -trace) and for process-wide settings (such
>>> as -name).
>> 
>> This is where we disagree.  For me, a new, alternative qemu-system-FOO binary
>> should be able to replace the warty one we have.
>> 
>> One important kind of user is management applications.  Libvirt
>> developers tell us that they'd like to configure as much as possible via
>> QMP.  Another kind of user dear to me is me^H^Hdevelopers.  For ad hoc
>> testing, having to configure via QMP is a pain we'd rathe do without.  I
>> don't want to remain stuck on the traditional binary, I want to do this
>> with the new one.
>
> Why do you care?  For another example, you can use "reboot" or 
> "systemctl isolate reboot.target" and they end up doing the same thing.
>
> As long as qemu_init invokes qmp_machine_set, qmp_accel_set, 
> qmp_device_add, qmp_plugin_add, qmp_cont, etc. to do its job, the 
> difference between qemu-system-* and qemu-qmp-* is a couple thousands 
> lines of boring code that all but disappears once the VM is up and 
> running.  IOW, with the right design (e.g. shortcut options for QOM 
> properties good; dozens of global variables bad), there's absolutely no 
> issue with some people using qemu-system-* and some using qemu-qmp-*.

I think maintaining two binaries forever is madness.  I want the old one
to wither away.

Making the new binary capable of serving all use cases should not be
hard, just work (see my design sketch).  I expect the result to serve
*better* than the mess we have now.

>> Likewise, we'd fail QMP commands that are "out of phase".
>> @allow-preconfig is a crutch that only exists because we're afraid (with
>> reason) of hidden assumptions in QMP commands.
>
> At this point, it's not even like that anymore (except for block devices
> because my patches haven't been applied).

 My point is that we still have quite a few commands without
 'allow-preconfig' mostly because we are afraid of allowing them in
 preconfig state, not because of true phase dependencies.
>>>
>>> I think there's very few of them, if any (outside the block layer for
>>> which patches exist), and those are due to distraction more than fear.
>> 
>> qapi/*.json has 216 commands, of which 26 carry 'allow-preconfig'.
>
> Well, 
> https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg01597.html 
> alone would more than double that. :)
>
> Places like machine.json, machine-target.json, migration.json, 
> replay.json have a lot of commands that are, obviously, almost entirely 
> not suitable for preconfig.  I don't think there are many commands left, 
> I'd guess maybe 30 (meaning that ~60% are done).

My point is that "very few" is not literally true, and I think you just
confirmed it ;)

Re: does drive_get_next(IF_NONE) make sense?

2021-11-15 Thread Markus Armbruster

Peter Maydell  writes:

> On Mon, 15 Nov 2021 at 13:24, Kevin Wolf  wrote:
>> Same question as for Hao Wu's series: Wouldn't the proper solution be to
>> add a drive property to the machine type?
>>
>> If you can't use -blockdev, it's not done right.
>
> Is there an example of "doing it right" for built-in-to-the-machine
> devices?
>
> (My experience with the new-style options is that almost
> always they're designed for x86 where the device they're attached
> to is also created on the command line, and then handling of boards
> where the device is builtin is either an afterthought or forgotten.
> See also -netdev, where it took forever for built-in-ethernet to
> be usable.)

I'm afraid the situation for onboard block devices is far worse than it
ever was for NICs.  See my reply "On configuring onboard block devices
with -blockdev" to Kevin's other message on the topic.

To be fair, improving onboard device configuration is *hard*.  Our
general device configuration interface doesn't cover them, and we've so
far failed finding a general solution.  Without one, we're drowning in
the sheer number of boards and onboard devices.  Which is ever growing.

On configuring onboard block devices with -blockdev (was: [PATCH v4 6/7] hw/nvram: Update at24c EEPROM init function in NPCM7xx boards)

2021-11-15 Thread Markus Armbruster

Kevin Wolf  writes:

> Am 03.11.2021 um 23:01 hat Hao Wu geschrieben:
>> We made 3 changes to the at24c_eeprom_init function in
>> npcm7xx_boards.c:
>> 
>> 1. We allow the function to take a I2CBus* as parameter. This allows
>>us to attach an EEPROM device behind an I2C mux which is not
>>possible with the old method.
>> 
>> 2. We make at24c EEPROMs are backed by drives so that we can
>>specify the content of the EEPROMs.
>> 
>> 3. Instead of using i2c address as unit number, This patch assigns
>>unique unit numbers for each eeproms in each board. This avoids
>>conflict in providing multiple eeprom contents with the same address.
>>In the old method if we specify two drives with the same unit number,
>>the following error will occur: `Device with id 'none85' exists`.
>> 
>> Signed-off-by: Hao Wu 
>> ---
>>  hw/arm/npcm7xx_boards.c | 14 +-
>>  1 file changed, 9 insertions(+), 5 deletions(-)
>> 
>> diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
>> index dec7d16ae5..9121e081fa 100644
>> --- a/hw/arm/npcm7xx_boards.c
>> +++ b/hw/arm/npcm7xx_boards.c
>> @@ -126,13 +126,17 @@ static I2CBus *npcm7xx_i2c_get_bus(NPCM7xxState *soc, 
>> uint32_t num)
>>  return I2C_BUS(qdev_get_child_bus(DEVICE(>smbus[num]), "i2c-bus"));
>>  }
>>  
>> -static void at24c_eeprom_init(NPCM7xxState *soc, int bus, uint8_t addr,
>> -  uint32_t rsize)
>> +static void at24c_eeprom_init(I2CBus *i2c_bus, int bus, uint8_t addr,
>> +  uint32_t rsize, int unit_number)
>>  {
>> -I2CBus *i2c_bus = npcm7xx_i2c_get_bus(soc, bus);
>>  I2CSlave *i2c_dev = i2c_slave_new("at24c-eeprom", addr);
>>  DeviceState *dev = DEVICE(i2c_dev);
>> +DriveInfo *dinfo;
>>  
>> +dinfo = drive_get(IF_OTHER, bus, unit_number);
>> +if (dinfo) {
>> +qdev_prop_set_drive(dev, "drive", blk_by_legacy_dinfo(dinfo));
>> +}
>
> I may be missing the history of this series, but why do we have to use a
> legacy DriveInfo here instead of adding a drive property to the board to
> make this configurable with -blockdev?
>
> Adding even more devices that can only be configured with -drive feels
> like a step backwards to me.

More like sideways.

The big unfinished part of -blockdev is configuring onboard devices with
it.

I took a stab at one instance in commit ebc29e1bea "pc: Support firmware
configuration with -blockdev", 2019-03-11.  Quoting from the commit
message:

Mapping -drive if=none,... to -blockdev is a solved problem.  With
if=T other than if=none, -drive additionally configures a block device
frontend.  For non-onboard devices, that part maps to -device.  Also a
solved problem.  For onboard devices such as PC flash memory, we have
an unsolved problem.

This is actually an instance of a wider problem: our general device
configuration interface doesn't cover onboard devices.  Instead, we have
a zoo of ad hoc interfaces that are much more limited.  One of them is
-drive, which we'd rather deprecate, but can't until we have suitable
replacements for all its uses.

Sadly, I can't attack the wider problem today.  So back to the narrow
problem.

My first idea was to reduce it to its solved buddy by using pluggable
instead of onboard devices for the flash memory.  Workable, but it
requires some extra smarts in firmware descriptors and libvirt.  Paolo
had an idea that is simpler for libvirt: keep the devices onboard, and
add machine properties for their block backends.

The implementation is less than straightforward, I'm afraid.

First, block backend properties are *qdev* properties.  Machines can't
have those, as they're not devices.  I could duplicate these qdev
properties as QOM properties, but I hate that.

More seriously, the properties do not belong to the machine, they
belong to the onboard flash devices.  Adding them to the machine would
then require bad magic to somehow transfer them to the flash devices.
Fortunately, QOM provides the means to handle exactly this case: add
alias properties to the machine that forward to the onboard devices'
properties.

Properties need to be created in .instance_init() methods.  For PC
machines, that's pc_machine_initfn().  To make alias properties work,
we need to create the onboard flash devices there, too.  Requires
several bug fixes, in the previous commits.  We also have to realize
the devices.  More on that below.

The need to create onboard devices differently results in a non-trivial
refactoring.  The need for keeping -drive working complicates things
further.

The "several bug fixes" took me 26 preparatory patches, plus at least
three more to fix regressions caused by one of them.

I then did the same for ARM virt in commit e0561e60f1 "hw/arm/virt:
Support firmware configuration with -blockdev", 2019-05-07.  Only a few
preparatory patches, but the patch

Re: [PATCH v4 03/25] assertions for block global state API

2021-11-15 Thread Hanna Reitz


On 15.11.21 13:27, Emanuele Giuseppe Esposito wrote:


@@ -586,12 +589,14 @@ void bdrv_unapply_subtree_drain(BdrvChild 
*child, BlockDriverState *old_parent)

  void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
  {
  assert(qemu_in_coroutine());
+    assert(qemu_in_main_thread());
  bdrv_drained_begin(bs);
  bdrv_drained_end(bs);
  }
  void bdrv_drain(BlockDriverState *bs)
  {
+    assert(qemu_in_main_thread());
  bdrv_drained_begin(bs);
  bdrv_drained_end(bs);
  }


Why are these GS functions when both bdrv_drained_begin() and 
bdrv_drained_end() are I/O functions?


I can understand making the drain_all functions GS functions, but it 
seems weird to say it’s an I/O function when a single BDS is drained 
via bdrv_drained_begin() and bdrv_drained_end(), but not via 
bdrv_drain(), which just does both.


(I can see that there are no I/O path callers, but I still find it 
strange.)


The problem as you saw is on the path callers: bdrv_drain seems to be 
called only in main thread, while others or similar drains are also 
coming from I/O. I would say maybe it's a better idea to just put 
everything I/O, maybe (probably) there are cases not caught by the tests.


The only ones in global-state will be:
- bdrv_apply_subtree_drain and bdrv_unapply_subtree_drain (called only 
in .attach and .detach callbacks of BdrvChildClass, run under BQL).

- bdrv_drain_all_end_quiesce (called only by bdrv_close thus under BQL).
- bdrv_drain_all_begin, bdrv_drain_all_end and bdrv_drain_all (have 
already BQL assertions)


In general, this is the underlying problem with these assertions: if a 
test doesn't test a specific code path, an unneeded assertion might 
pass undetected.


I believe the I/O vs. GS classification should not only be done based on 
functional concerns, i.e. who calls this function (is it called by an 
I/O function?) and whether it can be thread-safe or not (does it call a 
BQL function?), but also needs to be done semantically.  I believe there 
are some functions that we consider thread-safe but are also only called 
by BQL functions, for example apparently bdrv_drain(), 
bdrv_apply_subtree_drain(), and bdrv_unapply_subtree_drain().  Such 
functions can functionally be considered both GS and I/O functions, so 
the decision should be done semantically.  I believe that it makes sense 
to say all drain-related functions for a single subtree are I/O functions.


OTOH, functions operating on multiple trees in the block graph (i.e. all 
functions that have some “_all_” in their name) should naturally be 
considered GS functions, regardless of whether their implementation is 
thread-safe or not, because they are by nature global functions.


That’s why I’m wondering why some drain functions are I/O and others are 
GS; or why this block-status function is I/O and this one is GS.  It may 
make sense functionally, but semantically it’s strange.


I understand it may be difficult for you to know which functions relate 
to each other and thus know these semantic relationships, but in most of 
the series the split seems very reasonable to me, semantically.  If it 
didn’t, I said so. :)


Hanna

[PATCH-for-7.0 v4 08/11] tests/unit/test-smp-parse: Add 'smp-without-dies-valid' machine type

2021-11-15 Thread Philippe Mathieu-Daudé

Keep the common TYPE_MACHINE class initialization in
machine_base_class_init(), make it abstract, and move
the non-common code to a new class: "smp-without-dies-valid".

Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/unit/test-smp-parse.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
index dfe7f1313b0..90a249fe8c8 100644
--- a/tests/unit/test-smp-parse.c
+++ b/tests/unit/test-smp-parse.c
@@ -478,13 +478,19 @@ static void machine_base_class_init(ObjectClass *oc, void 
*data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
 
+mc->smp_props.prefer_sockets = true;
+
+mc->name = g_strdup(SMP_MACHINE_NAME);
+}
+
+static void machine_without_dies_valid_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+
 mc->min_cpus = MIN_CPUS;
 mc->max_cpus = MAX_CPUS;
 
-mc->smp_props.prefer_sockets = true;
 mc->smp_props.dies_supported = false;
-
-mc->name = g_strdup(SMP_MACHINE_NAME);
 }
 
 static void machine_without_dies_invalid_class_init(ObjectClass *oc, void 
*data)
@@ -606,9 +612,14 @@ static const TypeInfo smp_machine_types[] = {
 {
 .name   = TYPE_MACHINE,
 .parent = TYPE_OBJECT,
+.abstract   = true,
 .class_init = machine_base_class_init,
 .class_size = sizeof(MachineClass),
 .instance_size  = sizeof(MachineState),
+}, {
+.name   = MACHINE_TYPE_NAME("smp-without-dies-valid"),
+.parent = TYPE_MACHINE,
+.class_init = machine_without_dies_valid_class_init,
 }, {
 .name   = MACHINE_TYPE_NAME("smp-without-dies-invalid"),
 .parent = TYPE_MACHINE,
@@ -629,7 +640,7 @@ int main(int argc, char *argv[])
 g_test_init(, , NULL);
 
 g_test_add_data_func("/test-smp-parse/generic/valid",
- TYPE_MACHINE,
+ MACHINE_TYPE_NAME("smp-without-dies-valid"),
  test_generic_valid);
 g_test_add_data_func("/test-smp-parse/generic/invalid",
  MACHINE_TYPE_NAME("smp-without-dies-invalid"),
-- 
2.31.1

[PATCH-for-7.0 v4 07/11] tests/unit/test-smp-parse: Add 'smp-without-dies-invalid' machine type

2021-11-15 Thread Philippe Mathieu-Daudé

Avoid modifying the MachineClass internals by adding the
'smp-without-dies-invalid' machine, which inherits from TYPE_MACHINE.

Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/unit/test-smp-parse.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
index ff61da06e3d..dfe7f1313b0 100644
--- a/tests/unit/test-smp-parse.c
+++ b/tests/unit/test-smp-parse.c
@@ -487,6 +487,17 @@ static void machine_base_class_init(ObjectClass *oc, void 
*data)
 mc->name = g_strdup(SMP_MACHINE_NAME);
 }
 
+static void machine_without_dies_invalid_class_init(ObjectClass *oc, void 
*data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+
+/* Force invalid min CPUs and max CPUs */
+mc->min_cpus = 2;
+mc->max_cpus = 511;
+
+mc->smp_props.dies_supported = false;
+}
+
 static void machine_with_dies_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -530,11 +541,6 @@ static void test_generic_invalid(const void *opaque)
 SMPTestData *data = &(SMPTestData){};
 int i;
 
-
-/* Force invalid min CPUs and max CPUs */
-mc->min_cpus = 2;
-mc->max_cpus = 511;
-
 for (i = 0; i < ARRAY_SIZE(data_generic_invalid); i++) {
 *data = data_generic_invalid[i];
 unsupported_params_init(mc, data);
@@ -542,10 +548,6 @@ static void test_generic_invalid(const void *opaque)
 smp_parse_test(ms, data, false);
 }
 
-/* Reset the supported min CPUs and max CPUs */
-mc->min_cpus = MIN_CPUS;
-mc->max_cpus = MAX_CPUS;
-
 object_unref(obj);
 }
 
@@ -607,6 +609,10 @@ static const TypeInfo smp_machine_types[] = {
 .class_init = machine_base_class_init,
 .class_size = sizeof(MachineClass),
 .instance_size  = sizeof(MachineState),
+}, {
+.name   = MACHINE_TYPE_NAME("smp-without-dies-invalid"),
+.parent = TYPE_MACHINE,
+.class_init = machine_without_dies_invalid_class_init,
 }, {
 .name   = MACHINE_TYPE_NAME("smp-with-dies"),
 .parent = TYPE_MACHINE,
@@ -626,7 +632,7 @@ int main(int argc, char *argv[])
  TYPE_MACHINE,
  test_generic_valid);
 g_test_add_data_func("/test-smp-parse/generic/invalid",
- TYPE_MACHINE,
+ MACHINE_TYPE_NAME("smp-without-dies-invalid"),
  test_generic_invalid);
 g_test_add_data_func("/test-smp-parse/with_dies",
  MACHINE_TYPE_NAME("smp-with-dies"),
-- 
2.31.1

[PATCH-for-6.2 v4 01/11] tests/unit/test-smp-parse: Restore MachineClass fields after modifying

2021-11-15 Thread Philippe Mathieu-Daudé

There is a single MachineClass object, registered with
type_register_static(_machine_info). Since the same
object is used multiple times (an MachineState object
is instantiated in both test_generic and test_with_dies),
we should restore its internal state after modifying for
the test purpose.

Reviewed-by: Andrew Jones 
Reviewed-by: Richard Henderson 
Reviewed-by: Yanan Wang 
Tested-by: Yanan Wang 
Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/unit/test-smp-parse.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
index cbe0c990494..47774c1ee2a 100644
--- a/tests/unit/test-smp-parse.c
+++ b/tests/unit/test-smp-parse.c
@@ -512,7 +512,7 @@ static void test_generic(void)
 smp_parse_test(ms, data, true);
 }
 
-/* Reset the supported min CPUs and max CPUs */
+/* Force invalid min CPUs and max CPUs */
 mc->min_cpus = 2;
 mc->max_cpus = 511;
 
@@ -523,6 +523,10 @@ static void test_generic(void)
 smp_parse_test(ms, data, false);
 }
 
+/* Reset the supported min CPUs and max CPUs */
+mc->min_cpus = MIN_CPUS;
+mc->max_cpus = MAX_CPUS;
+
 object_unref(obj);
 }
 
@@ -536,6 +540,7 @@ static void test_with_dies(void)
 int i;
 
 smp_machine_class_init(mc);
+/* Force the SMP compat properties */
 mc->smp_props.dies_supported = true;
 
 for (i = 0; i < ARRAY_SIZE(data_generic_valid); i++) {
@@ -575,6 +580,9 @@ static void test_with_dies(void)
 smp_parse_test(ms, data, false);
 }
 
+/* Restore the SMP compat properties */
+mc->smp_props.dies_supported = false;
+
 object_unref(obj);
 }
 
-- 
2.31.1

[PATCH v4 00/11] tests/unit: Fix test-smp-parse

2021-11-15 Thread Philippe Mathieu-Daudé

Missing review: patches #4 to #8 (new)

Yet another approach to fix test-smp-parse.  v2 from Yanan Wang:
https://lore.kernel.org/qemu-devel/2021024429.10568-1-wangyana...@huawei.com/

Here we use the QOM class_init() to avoid having to deal with
object_unref() and deinit().

Since v3:
- Restore 'dies_supported' field in test_with_dies (Yanan)
- Add R-b tags
- QOM-ify the TYPE_MACHINE classes

Philippe Mathieu-Daudé (11):
  tests/unit/test-smp-parse: Restore MachineClass fields after modifying
  tests/unit/test-smp-parse: QOM'ify smp_machine_class_init()
  tests/unit/test-smp-parse: Explicit MachineClass name
  tests/unit/test-smp-parse: Pass machine type as argument to tests
  tests/unit/test-smp-parse: Split the 'generic' test in valid / invalid
  tests/unit/test-smp-parse: Add 'smp-with-dies' machine type
  tests/unit/test-smp-parse: Add 'smp-without-dies-invalid' machine type
  tests/unit/test-smp-parse: Add 'smp-without-dies-valid' machine type
  tests/unit/test-smp-parse: Simplify pointer to compound literal use
  tests/unit/test-smp-parse: Constify some pointer/struct
  hw/core: Rename smp_parse() -> machine_parse_smp_config()

 include/hw/boards.h |   3 +-
 hw/core/machine-smp.c   |   6 +-
 hw/core/machine.c   |   2 +-
 tests/unit/test-smp-parse.c | 221 +++-
 4 files changed, 148 insertions(+), 84 deletions(-)

-- 
2.31.1

Re: does drive_get_next(IF_NONE) make sense?

2021-11-15 Thread Kevin Wolf

Am 15.11.2021 um 14:31 hat Peter Maydell geschrieben:
> On Mon, 15 Nov 2021 at 13:24, Kevin Wolf  wrote:
> > Same question as for Hao Wu's series: Wouldn't the proper solution be to
> > add a drive property to the machine type?
> >
> > If you can't use -blockdev, it's not done right.
> 
> Is there an example of "doing it right" for built-in-to-the-machine
> devices?
> 
> (My experience with the new-style options is that almost
> always they're designed for x86 where the device they're attached
> to is also created on the command line, and then handling of boards
> where the device is builtin is either an afterthought or forgotten.
> See also -netdev, where it took forever for built-in-ethernet to
> be usable.)

As long as we don't have a separate way to configure built-in devices
without creating them, for boards where the device is built in, the
properties for the device have to become machine properties.

I seem to remember that Markus implemented this for some boards.

Just doing without properties for these devices, either by just hard
coding things instead of providing options to the user, or by bypassing
the property system and accessing global state instead feels wrong.

Maybe once we have .instance_config (see my QOM/QAPI integration RFC),
forwarding these properties from -machine could actually become trivial,
just one call to set the properties for each built-in device. For now,
it's probably not as nice because each property needs to be forwarded
individually instead of just providing access to all properties of the
device.

Kevin

Re: [PATCH v4 24/25] job.h: split function pointers in JobDriver

2021-11-15 Thread Hanna Reitz


On 25.10.21 12:17, Emanuele Giuseppe Esposito wrote:

The job API will be handled separately in another serie.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
  include/qemu/job.h | 16 
  1 file changed, 16 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 6e67b6977f..7e9e59f4b8 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -169,12 +169,21 @@ typedef struct Job {
   * Callbacks and other information about a Job driver.
   */
  struct JobDriver {
+
+/* Fields initialized in struct definition and never changed. */


Like in patch 19, I’d prefer a slightly more verbose comment that I’d 
find more easily readable.



+
  /** Derived Job struct size */
  size_t instance_size;
  
  /** Enum describing the operation */

  JobType job_type;
  
+/*

+ * Functions run without regard to the BQL and may run in any


s/and/that/?


+ * arbitrary thread. These functions do not need to be thread-safe
+ * because the caller ensures that are invoked from one thread at time.


s/that/they/ (or “that they”)

I believe .run() must be run in the job’s context, though.  Not sure if 
that’s necessary to note, but it isn’t really an arbitrary thread, and 
block jobs certainly require this (because they run in the block 
device’s context).  Or is that something that’s going to change with I/O 
threading?


Hanna

[PULL 07/13] transactions: Invoke clean() after everything else

2021-11-15 Thread Kevin Wolf

From: Hanna Reitz 

Invoke the transaction drivers' .clean() methods only after all
.commit() or .abort() handlers are done.

This makes it easier to have nested transactions where the top-level
transactions pass objects to lower transactions that the latter can
still use throughout their commit/abort phases, while the top-level
transaction keeps a reference that is released in its .clean() method.

(Before this commit, that is also possible, but the top-level
transaction would need to take care to invoke tran_add() before the
lower-level transaction does.  This commit makes the ordering
irrelevant, which is just a bit nicer.)

Signed-off-by: Hanna Reitz 
Message-Id: <2021120829.81329-8-hre...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Kevin Wolf 
---
 include/qemu/transactions.h | 3 +++
 util/transactions.c | 8 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/qemu/transactions.h b/include/qemu/transactions.h
index 92c5965235..2f2060acd9 100644
--- a/include/qemu/transactions.h
+++ b/include/qemu/transactions.h
@@ -31,6 +31,9 @@
  * tran_create(), call your "prepare" functions on it, and finally call
  * tran_abort() or tran_commit() to finalize the transaction by corresponding
  * finalization actions in reverse order.
+ *
+ * The clean() functions registered by the drivers in a transaction are called
+ * last, after all abort() or commit() functions have been called.
  */
 
 #ifndef QEMU_TRANSACTIONS_H
diff --git a/util/transactions.c b/util/transactions.c
index d0bc9a3e73..2dbdedce95 100644
--- a/util/transactions.c
+++ b/util/transactions.c
@@ -61,11 +61,13 @@ void tran_abort(Transaction *tran)
 {
 TransactionAction *act, *next;
 
-QSLIST_FOREACH_SAFE(act, >actions, entry, next) {
+QSLIST_FOREACH(act, >actions, entry) {
 if (act->drv->abort) {
 act->drv->abort(act->opaque);
 }
+}
 
+QSLIST_FOREACH_SAFE(act, >actions, entry, next) {
 if (act->drv->clean) {
 act->drv->clean(act->opaque);
 }
@@ -80,11 +82,13 @@ void tran_commit(Transaction *tran)
 {
 TransactionAction *act, *next;
 
-QSLIST_FOREACH_SAFE(act, >actions, entry, next) {
+QSLIST_FOREACH(act, >actions, entry) {
 if (act->drv->commit) {
 act->drv->commit(act->opaque);
 }
+}
 
+QSLIST_FOREACH_SAFE(act, >actions, entry, next) {
 if (act->drv->clean) {
 act->drv->clean(act->opaque);
 }
-- 
2.31.1

Re: [PATCH v2 0/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k

2021-11-15 Thread Christian Schoenebeck

On Montag, 15. November 2021 12:54:32 CET Stefan Hajnoczi wrote:
> On Thu, Nov 11, 2021 at 06:54:03PM +0100, Christian Schoenebeck wrote:
> > On Donnerstag, 11. November 2021 17:31:52 CET Stefan Hajnoczi wrote:
> > > On Wed, Nov 10, 2021 at 04:53:33PM +0100, Christian Schoenebeck wrote:
> > > > On Mittwoch, 10. November 2021 16:14:19 CET Stefan Hajnoczi wrote:
> > > > > On Wed, Nov 10, 2021 at 02:14:43PM +0100, Christian Schoenebeck 
wrote:
> > > > > > On Mittwoch, 10. November 2021 11:05:50 CET Stefan Hajnoczi wrote:
> > > > > > As you are apparently reluctant for changing the virtio specs,
> > > > > > what
> > > > > > about
> > > > > > introducing those discussed virtio capabalities either as
> > > > > > experimental
> > > > > > ones
> > > > > > without specs changes, or even just as 9p specific device
> > > > > > capabilities
> > > > > > for
> > > > > > now. I mean those could be revoked on both sides at any time
> > > > > > anyway.
> > > > > 
> > > > > I would like to understand the root cause before making changes.
> > > > > 
> > > > > "It's faster when I do X" is useful information but it doesn't
> > > > > necessarily mean doing X is the solution. The "it's faster when I do
> > > > > X
> > > > > because Y" part is missing in my mind. Once there is evidence that
> > > > > shows
> > > > > Y then it will be clearer if X is a good solution, if there's a more
> > > > > general solution, or if it was just a side-effect.
> > > > 
> > > > I think I made it clear that the root cause of the observed
> > > > performance
> > > > gain with rising transmission size is latency (and also that
> > > > performance
> > > > is not the only reason for addressing this queue size issue).
> > > > 
> > > > Each request roundtrip has a certain minimum latency, the virtio ring
> > > > alone
> > > > has its latency, plus latency of the controller portion of the file
> > > > server
> > > > (e.g. permissions, sandbox checks, file IDs) that is executed with
> > > > *every*
> > > > request, plus latency of dispatching the request handling between
> > > > threads
> > > > several times back and forth (also for each request).
> > > > 
> > > > Therefore when you split large payloads (e.g. reading a large file)
> > > > into
> > > > smaller n amount of chunks, then that individual latency per request
> > > > accumulates to n times the individual latency, eventually leading to
> > > > degraded transmission speed as those requests are serialized.
> > > 
> > > It's easy to increase the blocksize in benchmarks, but real applications
> > > offer less control over the I/O pattern. If latency in the device
> > > implementation (QEMU) is the root cause then reduce the latency to speed
> > > up all applications, even those that cannot send huge requests.
> > 
> > Which I did, still do, and also mentioned before, e.g.:
> > 
> > 8d6cb100731c4d28535adbf2a3c2d1f29be3fef4 9pfs: reduce latency of Twalk
> > 0c4356ba7dafc8ecb5877a42fc0d68d45ccf5951 9pfs: T_readdir latency
> > optimization
> > 
> > Reducing overall latency is a process that is ongoing and will still take
> > a
> > very long development time. Not because of me, but because of lack of
> > reviewers. And even then, it does not make the effort to support higher
> > transmission sizes obsolete.
> > 
> > > One idea is request merging on the QEMU side. If the application sends
> > > 10 sequential read or write requests, coalesce them together before the
> > > main part of request processing begins in the device. Process a single
> > > large request to spread the cost of the file server over the 10
> > > requests. (virtio-blk has request merging to help with the cost of lots
> > > of small qcow2 I/O requests.) The cool thing about this is that the
> > > guest does not need to change its I/O pattern to benefit from the
> > > optimization.
> > > 
> > > Stefan
> > 
> > Ok, don't get me wrong: I appreciate that you are suggesting approaches
> > that could improve things. But I could already hand you over a huge list
> > of mine. The limiting factor here is not the lack of ideas of what could
> > be improved, but rather the lack of people helping out actively on 9p
> > side:
> > https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06452.html
> > 
> > The situation on kernel side is the same. I already have a huge list of
> > what could & should be improved. But there is basically no reviewer for
> > 9p patches on Linux kernel side either.
> > 
> > The much I appreciate suggestions of what could be improved, I would
> > appreciate much more if there was *anybody* actively assisting as well. In
> > the time being I have to work the list down in small patch chunks,
> > priority based.
> I see request merging as an alternative to this patch series, not as an
> additional idea.

It is not an alternative. Like I said before, even if it would solve the 
sequential I/O performance issue (by not simply moving the problem somewhere 
else), which I doubt, your suggestion would still not solve the semantical

[PULL 09/13] block: Let replace_child_noperm free children

2021-11-15 Thread Kevin Wolf

From: Hanna Reitz 

In most of the block layer, especially when traversing down from other
BlockDriverStates, we assume that BdrvChild.bs can never be NULL.  When
it becomes NULL, it is expected that the corresponding BdrvChild pointer
also becomes NULL and the BdrvChild object is freed.

Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
pointer to NULL, it should also immediately set the corresponding
BdrvChild pointer (like bs->file or bs->backing) to NULL.

In that context, it also makes sense for this function to free the
child.  Sometimes we cannot do so, though, because it is called in a
transactional context where the caller might still want to reinstate the
child in the abort branch (and free it only on commit), so this behavior
has to remain optional.

In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
that the BdrvChild passed to bdrv_replace_child_tran() must have had a
non-NULL .bs pointer initially.  Make a note of that and assert it.

Signed-off-by: Hanna Reitz 
Message-Id: <2021120829.81329-10-hre...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 block.c | 102 +++-
 1 file changed, 79 insertions(+), 23 deletions(-)

diff --git a/block.c b/block.c
index a40027161c..0ac5b163d2 100644
--- a/block.c
+++ b/block.c
@@ -87,8 +87,10 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 static bool bdrv_recurse_has_child(BlockDriverState *bs,
BlockDriverState *child);
 
+static void bdrv_child_free(BdrvChild *child);
 static void bdrv_replace_child_noperm(BdrvChild **child,
-  BlockDriverState *new_bs);
+  BlockDriverState *new_bs,
+  bool free_empty_child);
 static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
   BdrvChild *child,
   Transaction *tran);
@@ -2256,12 +2258,16 @@ typedef struct BdrvReplaceChildState {
 BdrvChild *child;
 BdrvChild **childp;
 BlockDriverState *old_bs;
+bool free_empty_child;
 } BdrvReplaceChildState;
 
 static void bdrv_replace_child_commit(void *opaque)
 {
 BdrvReplaceChildState *s = opaque;
 
+if (s->free_empty_child && !s->child->bs) {
+bdrv_child_free(s->child);
+}
 bdrv_unref(s->old_bs);
 }
 
@@ -2278,22 +2284,26 @@ static void bdrv_replace_child_abort(void *opaque)
  * modify the BdrvChild * pointer we indirectly pass to it, i.e. it
  * will not modify s->child.  From that perspective, it does not matter
  * whether we pass s->childp or >child.
- * (TODO: Right now, bdrv_replace_child_noperm() never modifies that
- * pointer anyway (though it will in the future), so at this point it
- * absolutely does not matter whether we pass s->childp or >child.)
  * (2) If new_bs is not NULL, s->childp will be NULL.  We then cannot use
  * it here.
  * (3) If new_bs is NULL, *s->childp will have been NULLed by
  * bdrv_replace_child_tran()'s bdrv_replace_child_noperm() call, and we
  * must not pass a NULL *s->childp here.
- * (TODO: In its current state, bdrv_replace_child_noperm() will not
- * have NULLed *s->childp, so this does not apply yet.  It will in the
- * future.)
  *
  * So whether new_bs was NULL or not, we cannot pass s->childp here; and in
  * any case, there is no reason to pass it anyway.
  */
-bdrv_replace_child_noperm(>child, s->old_bs);
+bdrv_replace_child_noperm(>child, s->old_bs, true);
+/*
+ * The child was pre-existing, so s->old_bs must be non-NULL, and
+ * s->child thus must not have been freed
+ */
+assert(s->child != NULL);
+if (!new_bs) {
+/* As described above, *s->childp was cleared, so restore it */
+assert(s->childp != NULL);
+*s->childp = s->child;
+}
 bdrv_unref(new_bs);
 }
 
@@ -2310,30 +2320,44 @@ static TransactionActionDrv bdrv_replace_child_drv = {
  *
  * The function doesn't update permissions, caller is responsible for this.
  *
+ * (*childp)->bs must not be NULL.
+ *
  * Note that if new_bs == NULL, @childp is stored in a state object attached
  * to @tran, so that the old child can be reinstated in the abort handler.
  * Therefore, if @new_bs can be NULL, @childp must stay valid until the
  * transaction is committed or aborted.
  *
- * (TODO: The reinstating does not happen yet, but it will once
- * bdrv_replace_child_noperm() NULLs *childp when new_bs is NULL.)
+ * If @free_empty_child is true and @new_bs is NULL, the BdrvChild is
+ * freed (on commit).  @free_empty_child should only be false if the
+ * caller will free the BDrvChild themselves (which may be important
+ * if this is in turn called in another transactional context).
  */
 static void

[PATCH-for-7.0 v4 09/11] tests/unit/test-smp-parse: Simplify pointer to compound literal use

2021-11-15 Thread Philippe Mathieu-Daudé

We can simply use a local variable (and pass its pointer) instead
of a pointer to a compound literal.

Reviewed-by: Andrew Jones 
Reviewed-by: Richard Henderson 
Reviewed-by: Yanan Wang 
Tested-by: Yanan Wang 
Signed-off-by: Philippe Mathieu-Daudé 
---
 tests/unit/test-smp-parse.c | 66 ++---
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
index 90a249fe8c8..2f3bcf198a5 100644
--- a/tests/unit/test-smp-parse.c
+++ b/tests/unit/test-smp-parse.c
@@ -520,19 +520,19 @@ static void test_generic_valid(const void *opaque)
 Object *obj = object_new(machine_type);
 MachineState *ms = MACHINE(obj);
 MachineClass *mc = MACHINE_GET_CLASS(obj);
-SMPTestData *data = &(SMPTestData){{ }};
+SMPTestData data = {};
 int i;
 
 for (i = 0; i < ARRAY_SIZE(data_generic_valid); i++) {
-*data = data_generic_valid[i];
-unsupported_params_init(mc, data);
+data = data_generic_valid[i];
+unsupported_params_init(mc, );
 
-smp_parse_test(ms, data, true);
+smp_parse_test(ms, , true);
 
 /* Unsupported parameters can be provided with their values as 1 */
-data->config.has_dies = true;
-data->config.dies = 1;
-smp_parse_test(ms, data, true);
+data.config.has_dies = true;
+data.config.dies = 1;
+smp_parse_test(ms, , true);
 }
 
 object_unref(obj);
@@ -544,14 +544,14 @@ static void test_generic_invalid(const void *opaque)
 Object *obj = object_new(machine_type);
 MachineState *ms = MACHINE(obj);
 MachineClass *mc = MACHINE_GET_CLASS(obj);
-SMPTestData *data = &(SMPTestData){};
+SMPTestData data = {};
 int i;
 
 for (i = 0; i < ARRAY_SIZE(data_generic_invalid); i++) {
-*data = data_generic_invalid[i];
-unsupported_params_init(mc, data);
+data = data_generic_invalid[i];
+unsupported_params_init(mc, );
 
-smp_parse_test(ms, data, false);
+smp_parse_test(ms, , false);
 }
 
 object_unref(obj);
@@ -563,45 +563,45 @@ static void test_with_dies(const void *opaque)
 Object *obj = object_new(machine_type);
 MachineState *ms = MACHINE(obj);
 MachineClass *mc = MACHINE_GET_CLASS(obj);
-SMPTestData *data = &(SMPTestData){{ }};
+SMPTestData data = {};
 unsigned int num_dies = 2;
 int i;
 
 for (i = 0; i < ARRAY_SIZE(data_generic_valid); i++) {
-*data = data_generic_valid[i];
-unsupported_params_init(mc, data);
+data = data_generic_valid[i];
+unsupported_params_init(mc, );
 
 /* when dies parameter is omitted, it will be set as 1 */
-data->expect_prefer_sockets.dies = 1;
-data->expect_prefer_cores.dies = 1;
+data.expect_prefer_sockets.dies = 1;
+data.expect_prefer_cores.dies = 1;
 
-smp_parse_test(ms, data, true);
+smp_parse_test(ms, , true);
 
 /* when dies parameter is specified */
-data->config.has_dies = true;
-data->config.dies = num_dies;
-if (data->config.has_cpus) {
-data->config.cpus *= num_dies;
+data.config.has_dies = true;
+data.config.dies = num_dies;
+if (data.config.has_cpus) {
+data.config.cpus *= num_dies;
 }
-if (data->config.has_maxcpus) {
-data->config.maxcpus *= num_dies;
+if (data.config.has_maxcpus) {
+data.config.maxcpus *= num_dies;
 }
 
-data->expect_prefer_sockets.dies = num_dies;
-data->expect_prefer_sockets.cpus *= num_dies;
-data->expect_prefer_sockets.max_cpus *= num_dies;
-data->expect_prefer_cores.dies = num_dies;
-data->expect_prefer_cores.cpus *= num_dies;
-data->expect_prefer_cores.max_cpus *= num_dies;
+data.expect_prefer_sockets.dies = num_dies;
+data.expect_prefer_sockets.cpus *= num_dies;
+data.expect_prefer_sockets.max_cpus *= num_dies;
+data.expect_prefer_cores.dies = num_dies;
+data.expect_prefer_cores.cpus *= num_dies;
+data.expect_prefer_cores.max_cpus *= num_dies;
 
-smp_parse_test(ms, data, true);
+smp_parse_test(ms, , true);
 }
 
 for (i = 0; i < ARRAY_SIZE(data_with_dies_invalid); i++) {
-*data = data_with_dies_invalid[i];
-unsupported_params_init(mc, data);
+data = data_with_dies_invalid[i];
+unsupported_params_init(mc, );
 
-smp_parse_test(ms, data, false);
+smp_parse_test(ms, , false);
 }
 
 object_unref(obj);
-- 
2.31.1

Re: [PATCH] gitlab-ci: Split custom-runners.yml in one file per runner

2021-11-15 Thread Philippe Mathieu-Daudé

On 11/15/21 15:33, Thomas Huth wrote:
> On 15/11/2021 10.56, Philippe Mathieu-Daudé wrote:
>> To ease maintenance, add the custom-runners/ directory and
>> split custom-runners.yml in 3 files, all included by the
>> current custom-runners.yml:
>>   - ubuntu-18.04-s390x.yml
>>   - ubuntu-20.04-aarch64.yml
>>   - centos-stream-8-x86_64.yml
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>> Based-on: <2021160501.862396-1-cr...@redhat.com>
>> https://gitlab.com/stsquad/qemu/-/tree/for-6.2/misc-fixes
>> ---
>>   .gitlab-ci.d/custom-runners.yml   | 268 +-
>>   .../custom-runners/centos-stream-8-x86_64.yml |  28 ++
>>   .../custom-runners/ubuntu-18.04-s390x.yml | 118 
>>   .../custom-runners/ubuntu-20.04-aarch64.yml   | 118 
>>   4 files changed, 268 insertions(+), 264 deletions(-)
>>   create mode 100644
>> .gitlab-ci.d/custom-runners/centos-stream-8-x86_64.yml
>>   create mode 100644 .gitlab-ci.d/custom-runners/ubuntu-18.04-s390x.yml
>>   create mode 100644 .gitlab-ci.d/custom-runners/ubuntu-20.04-aarch64.yml
> 
> Should there maybe be entries in MAINTAINERS for these new files, so
> that it is clear who takes care of the external runners?

Yes, this is the principal motivation behind this series.

I expect each maintainer to send their patch.

[PATCH-for-7.0 v4 11/11] hw/core: Rename smp_parse() -> machine_parse_smp_config()

2021-11-15 Thread Philippe Mathieu-Daudé

All methods related to MachineState are prefixed with "machine_".
smp_parse() does not need to be an exception. Rename it and
const'ify the SMPConfiguration argument, since it doesn't need
to be modified.

Reviewed-by: Andrew Jones 
Reviewed-by: Richard Henderson 
Reviewed-by: Yanan Wang 
Tested-by: Yanan Wang 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/boards.h | 3 ++-
 hw/core/machine-smp.c   | 6 --
 hw/core/machine.c   | 2 +-
 tests/unit/test-smp-parse.c | 8 
 4 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 9c1c1901046..7597cec4400 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -34,7 +34,8 @@ HotpluggableCPUList 
*machine_query_hotpluggable_cpus(MachineState *machine);
 void machine_set_cpu_numa_node(MachineState *machine,
const CpuInstanceProperties *props,
Error **errp);
-void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp);
+void machine_parse_smp_config(MachineState *ms,
+  const SMPConfiguration *config, Error **errp);
 
 /**
  * machine_class_allow_dynamic_sysbus_dev: Add type to list of valid devices
diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index 116a0cbbfab..2cbfd574293 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -44,7 +44,8 @@ static char *cpu_hierarchy_to_string(MachineState *ms)
 }
 
 /*
- * smp_parse - Generic function used to parse the given SMP configuration
+ * machine_parse_smp_config: Generic function used to parse the given
+ *   SMP configuration
  *
  * Any missing parameter in "cpus/maxcpus/sockets/cores/threads" will be
  * automatically computed based on the provided ones.
@@ -63,7 +64,8 @@ static char *cpu_hierarchy_to_string(MachineState *ms)
  * introduced topology members which are likely to be target specific should
  * be directly set as 1 if they are omitted (e.g. dies for PC since 4.1).
  */
-void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
+void machine_parse_smp_config(MachineState *ms,
+  const SMPConfiguration *config, Error **errp)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
 unsigned cpus= config->has_cpus ? config->cpus : 0;
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 26ec54e7261..a2d3c9969d9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -760,7 +760,7 @@ static void machine_set_smp(Object *obj, Visitor *v, const 
char *name,
 return;
 }
 
-smp_parse(ms, config, errp);
+machine_parse_smp_config(ms, config, errp);
 }
 
 static void machine_class_init(ObjectClass *oc, void *data)
diff --git a/tests/unit/test-smp-parse.c b/tests/unit/test-smp-parse.c
index 8f47a2e65f6..8e488e95145 100644
--- a/tests/unit/test-smp-parse.c
+++ b/tests/unit/test-smp-parse.c
@@ -337,7 +337,7 @@ static const struct SMPTestData data_with_dies_invalid[] = {
 },
 };
 
-static char *smp_config_to_string(SMPConfiguration *config)
+static char *smp_config_to_string(const SMPConfiguration *config)
 {
 return g_strdup_printf(
 "(SMPConfiguration) {\n"
@@ -371,7 +371,7 @@ static char *cpu_topology_to_string(const CpuTopology *topo)
 topo->cores, topo->threads, topo->max_cpus);
 }
 
-static void check_parse(MachineState *ms, SMPConfiguration *config,
+static void check_parse(MachineState *ms, const SMPConfiguration *config,
 const CpuTopology *expect_topo, const char *expect_err,
 bool is_valid)
 {
@@ -380,8 +380,8 @@ static void check_parse(MachineState *ms, SMPConfiguration 
*config,
 g_autofree char *output_topo_str = NULL;
 Error *err = NULL;
 
-/* call the generic parser smp_parse() */
-smp_parse(ms, config, );
+/* call the generic parser */
+machine_parse_smp_config(ms, config, );
 
 output_topo_str = cpu_topology_to_string(>smp);
 
-- 
2.31.1

1 2 >

1 - 100 of 191 matches

Mail list logo